Transcript for Episode #74:
Contributing to Open Source effectively

Recorded on Tuesday, Apr 17, 2018.

00:00 KENNEDY: Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 74, recorded on April 17, 2018. I'm Michael Kennedy.

00:12 OKKEN: And I'm Brian Okken.

00:12 HARRISON: And I am Matt Harrison.

00:14 KENNEDY: That's right, we've got a special guest here on the show, so Brian and I decided to invite Matt Harrison to mix things up a little bit and bring a slightly different perspective and we decided we're going to try this little experiment from time to time, once a month, once every six weeks, something like that. So Matt, welcome to the show.

00:30 HARRISON: Thanks. Thanks for having me. Good to be here.

00:32 KENNEDY: Great to have you here. Before we get into the topics, I just want to say thank you to Datadog. They're sponsoring this episode. Check them out at I'll tell you more about them later. Brian, you've been running your own open source project lately, and it's been fun, right?

00:47 OKKEN: Yeah, I started a little project I called cards, and the intent is to talk about how I'm going to go about testing it using all the, what I think of as good test methodologies. But doing it in the open on an open-source project, I'm getting contributors already, which is really cool.

01:07 KENNEDY: That's awesome.

01:07 OKKEN: I don't really know the conventions, and even though I've been programming for decades, I'm kind of new to actually contributing to open-source.

01:15 KENNEDY: And it's different on GitHub than it was, say, in Subversion or something like that, right? There's the whole GitFlow and PRs and that whole thing, right?

01:22 OKKEN: Yeah, and how to deal with branching and forking with GitHub versus doing something out in the open on GitHub is one thing, but actually running it like a normal open-source project is completely different. Actually, Anthony Shaw did a pull request on cards, and he did a really good job on it, and one of the things is he started it with a WIP for work in progress. I didn't know that was a convention already. I looked around and there's some, if people are new to this and want to learn more, I'm sure everybody out there already knows about open-source stuff, but anyway.

02:01 KENNEDY: I would not make that assumption. I mean, it feels that way, but I think it's more of a dark matter sort of experience, where you see the stuff that people are doing but that doesn't mean everyone is participating. And many people I think are just wanting to get into it. What do you think, Matt?

02:19 HARRISON: I've been using open-source since the mailing list days and you had to talk over the mailing list, and so GitHub, when it first came out, was sort of a change for me, and I think a lot of people use GitHub as just, I'm going to throw my source over the fence and see what happens to it rather than maintaining it as a real open-source project and trying to include community and get contributions, so I'm curious to see how this works, and I think like Michael said, don't make the assumption that everyone knows how to do these things, even though some people are doing it. There's certainly a lot of people who can learn from this.

02:52 OKKEN: There's a couple things I wanted to throw out. This is from 2015 already. But it's an article called "How to Write the Perfect Pull Request", and it kind of talks about the philosophy sort of both with approaching the pull request when you're first getting started, and it even talks about the WIP trick to tell the person that owns the repo, you're not done with it, you'd just kind of like early feedback on it. Then there's some advice on offering feedback to pull request submitters, and also responding to that feedback as the submitter, as well. It's a short read, but there's some good tips there. I'm also really excited about VM's new book coming out from Pragmatic called "Forge Your Future with Open Source", and it includes things like how to deal with pull requests and everything.

03:37 KENNEDY: That sounds like such an interesting book, I love it.

03:40 HARRISON: I'd just like to say, using open-source and getting involved for someone who wants to get a break into learning Python, or learning a library, or even getting a job, that's an awesome way to do it.

03:53 KENNEDY: I agree, and one of the problems people run into is they work in a place that doesn't use Python, and so they don't have a place to actually practice Python outside of just like, toy things. So contributing to open-source lets you make a meaningful contribution even though maybe you're a Java, or , and they're like, Python, though.

04:12 HARRISON: Or they might even be using Subversion or some other source control.

04:17 KENNEDY: Absolutely. Cool, this is a good one, Brian. Matt, you and I both do in-person training periodically. I just did a class last week, I guess, a short class, and it was a lot of folks coming from a traditional, Matlab? Mathematica background? And moving into things like Jupyter, and I think that this might be a trend. What do you think?

04:37 OKKEN: Yeah, I think we're seeing Jupyter changing from early adopter to sort of normalcy. I found a thing going around Twitter. Paul Romer, who's an economy professor at NYU, tweeted about his experience with Mathematica and Jupyter, and he referenced an article in The Atlantic about both of those products, Mathematica and Jupyter, and for me, like I said, I have been involved with open-source for a long time. You don't often see stuff in The Atlantic, or professors, economy professors. This isn't a computer science professor, this is an economy professor posting about Python and Jupyter. Really cool stuff there.

05:17 KENNEDY: This is such an interesting find. People should open up The Atlantic link that you're linking to here, because wow, that's a pretty provocative picture. There's like a formal paper that says the scientific paper is obsolete. There's like a paper, like an academic paper, that's literally on fire, like in an animated way. Like full screen, it's crazy.

05:39 OKKEN: Yeah, and it makes reference to the discovery of gravitational waves and how that, there was a paper on that but they also along with that published a Jupyter notebook where you could go out on your own and you could look at their code, look at their data, and it had embedded text in it as well, and basically discover gravitational waves, or go through the same sequence and reproduce their science. So I thought that was pretty cool. A quote from Mr. Romer, he says that "Jupyter is a new open-source alternative to Mathematica that is well on its way to becoming the standard for exchanging research results."

06:13 KENNEDY: I agree, I think academics has been too dependent on these couple of big, really expensive lock-in type things. Mathematica and Matlab, I'm also thinking of journals and stuff like that. This sort of open-source paper in the form of Jupyter kind of touches on both of those. Brian, what do you think?

06:30 HARRISON: I'm actually really excited about all that. I was just listening to a... topic not too long ago about how, actually, it was one of your podcasts, about the academic journals that are, a lot of times nobody actually follows the steps along. But having the code on Jupyter notebooks just allows everybody to go and follow along right there.

06:47 OKKEN: One of the main points of these articles isn't that there's a notebook per se, but that a compelling reason for using Python and Jupyter is not necessarily that the technology is better, but that there's a huge community around it. They make the argument that, the Wolfram notebook might be prettier or whatnot, but you have so many people who are contributing to these open-source projects. You have Matplotlib for graphics, SymPy for symbolic math, NumPy, SciPy, Pandas and NLTK, and if you look at PyPI, there's 135,000 packages last week on that, and it's really hard to compete with that. That's super compelling.

07:29 KENNEDY: Yeah. It is super compelling. Speaking of community, Brian, you know I love to pull on the stack overflow developer survey, and like try to dig out results from the community, right?

07:40 HARRISON: Yes.

07:40 KENNEDY: Yeah, that's always fun. So there's another one that just came out that gives us a different perspective, and also is more Python-focused than that one, right? That's like broad software development. JetBrains, the PyCharm team, teamed up with the PSF just at the end of last year to do a Python developer survey and so the thing I'm linking to is Python developer survey 2017 results, but it's like, December, right? It's pretty relevant still, it's pretty fresh. They were just talking about it on their blog. This is a really nice piece of sort of almost journalism around data science, I think. They've really written it up nice, they show you graphs and they're like, here's the main takeaway from this section, here's the main takeaway from that section, so how about I share some takeaways with you? Alright. The first one says, of the people that they interviewed, this is obviously a self-selecting crowd, but the question was, "You are obviously doing Python. Is this your main language or a secondary language?" They said 80% of the people, Python is their main language. That was pretty cool. They said data analysis is actually just as popular as web development, which is pretty cool. There's basically as many Python web developers as there are data scientists. Does that surprise either of you? To me, Python felt like a web thing for most folks.

09:03 HARRISON: Yeah, I mean recently, it wouldn't surprise me but if you would have said that two years ago, it probably wasn't the case two years ago, but now, yeah, it's not surprising to me.

09:13 KENNEDY: Yeah, they also talk about... The growth of Python, and Brian and I, you and I, we've touched on this a few times but they are also confirming like we think that massive hockey stick is largely data science people coming in.

09:24 HARRISON: Yeah, it could be. There's a lot of room to grow. There's a lot of people who are using Excel and some of these tools that you've mentioned who probably want to migrate to something like Python, per the libraries and machinery and capabilities.

09:36 KENNEDY: Yeah. Another interesting one was Python 3 vs. Legacy Python, so Python 3 is at 75% usage among this group and 25% for Python 2. If you look at the curve, that's like increasing in time the rate at which people are moving to Python 3. That's really cool.

09:57 HARRISON: Yeah, you wonder how much self-selecting is there, right, the Legacy laggers didn't want to participate in this, but....

10:04 KENNEDY: I don't need no stinkin' surveys. I wouldn't know anything about Python I need to know in 2008. OK, so they also talked about where code runs, where people run their Python code, and this is, I don't think it includes the hosted notebook type stuff. So, probably not that. But 67% AWS, Brian, does that surprise you?

10:22 OKKEN: I'm going to plead, I'm not in the field in the web sort of space to know, really, where it's running.

10:29 KENNEDY: My basis for sort of judging the use of AWS compared to other platforms is when AWS goes down, like, what parts of the internet are no longer accessible? And they're pretty broad.

10:41 HARRISON: Yeah, I would think that it would actually be a little bit higher than that. The one that surprised me was, you've got Google App Engine at 29, Heroku at 26, DigitalOcean at 23, and the last one they say is Microsoft, you're at 16. And I think that 16 is probably going to change a little bit. They've been doing a lot of hiring in the Python space and getting some prominent Python people, so there'll be a great push from Microsoft on the--

11:04 KENNEDY: They're definitely focused on Python in a lot of important ways. They now have Azure notebooks, they have Brett Cannon, Steve Dower, both Python core devs, working there, they brought the guy who did the Python extension for Visual Studio Code in-house, like, they're doing a bunch of cool stuff. Alright, so a few more takeaways. Team size, right, you think of how big of a group do you, how large of a team do you work on, and if you think about one of the advantages of Python is you don't need a large set of people to build something interesting, and I think that's reflected here. So it says team size, two to seven people, 75%, 74% of the respondents are in that two to seven group. Then eight to 12 is 16, and then basically above that, above 12 people, all the way up to 40 or larger, is 9%, at the, at the balance, basically. Really small teams, and operating systems, Brian, you've touched on this a lot. 49% of the people are still using, are currently using Windows as their OS. Then 19% for Linux, 15% for Mac. And like you said before, Windows often gets the short end of the stick in sort of testing and examples and stuff. But it probably shouldn't.

12:13 OKKEN: One of the things I want to go back to is the cloud platforms that we talked about. One of the things that's interesting there is that clearly some people are running on multiple platforms because that's over 100%.
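[Editor's note: the "over 100%" observation can be checked with quick arithmetic. The sketch below uses only the cloud platform percentages quoted earlier in this episode; the variable names are illustrative, and the "excess" figure is only the amount by which the shares overshoot 100, not a count of multi-platform respondents.]

```python
# Cloud platform shares as quoted in the episode, in percent of respondents
cloud_share = {
    "AWS": 67,
    "Google App Engine": 29,
    "Heroku": 26,
    "DigitalOcean": 23,
    "Microsoft": 16,
}

# If every respondent picked exactly one platform, the shares could not exceed 100
total = sum(cloud_share.values())
excess = total - 100

print(f"Shares add up to {total}%, which is {excess} points over 100%")
```

[A total above 100 is only possible if the question allowed more than one answer, which matches what the hosts describe doing themselves.]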

12:26 KENNEDY: Yeah, that's interesting. I could tell you for sure that if somebody asked me, which of these platforms do you use, I would definitely check the DigitalOcean and AWS boxes. For example, the main server for our podcast and the database server runs on DigitalOcean Droplets, but when you interact with it and you get, say, an email, especially around the training stuff, that goes through Amazon. Simple email notification service, things like that. There's this blend of them.

12:53 HARRISON: Yeah, similar. I've done Heroku and DigitalOcean and both had stuff in S3 as well, so, it's not a, either-or. One of the things I thought was interesting was the operating systems. I mean, like you said, Windows tends to get, people had something in their heart against it or whatnot, but I was surprised that Mac was so low on this. Interesting. I wouldn't have thought that at all, but...

13:18 KENNEDY: Telling you, dark matter developers, that's what it is. It's interesting, I think that story on Windows is going to get better. I believe the new version of Python is going to use MS build and not Visual Studio 2008 for its compilation stuff during install, which means like, modern versions of Windows will be able to install stuff without installing a 2008 version of Visual Studio, which will be real nice. Alright so before we get to the next one, I want to just tell you both a little about Datadog. Speaking of stuff hosted in the cloud and spanning multiple machines and things like that, Datadog is a monitoring solution that provides visibility and tracks down issues with distributed systems involving Python applications. Within just a few minutes, you can find bottlenecks in your code by exploring graphs on your dashboards, and you can visualize your whole performance across all of your apps, which, when you're doing distributed programming or distributed apps, microservice type things, that's a huge deal. And you can go to, do a quick little trial there and you'll get a free Datadog T-shirt, which is pretty cute. Check them out and let them know you appreciate them supporting the show. Alright, Brian. I'm a big fan of databases, especially shiny new ones. You've got a really new one, I can't even get this one yet. But it's still pretty cool.

14:39 OKKEN: Yeah, you can't get it. But one of our listeners, Arash. I think that's his name. Anyway, he let us know about EdgeDB, and EdgeDB has a blog post up, says, "EdgeDB: A New Beginning." At first I thought, yeah, okay, we'll keep an eye on this, and maybe we'll cover it later when we can actually play with it, because it's a new database that's not available to use yet. However, it's going to be open-source, and the reason why I brought it up now is because it's coming from some fairly interesting people.

15:10 KENNEDY: It has some pretty powerful Python origins, right?

15:13 OKKEN: Yeah, well, so for instance, Elvis and Yuri, and I'm not going to try to pronounce their last names or they will flame me, are part of this and they're the people that brought us asyncio and uvloop, so that's pretty impressive.

15:27 KENNEDY: That's very impressive.

15:28 OKKEN: One of the things that's interesting is looking at the kind of code you get with this. They're trying to attack, the problem they're addressing is that document databases have some issues just with scalability after your project gets larger. The schema part of it sometimes can be hard to deal with. A lot of people deal with it fine, but they see it as part of the problem, and relational databases are growing a lot, and Postgres, for instance, is keeping up to date. But the interface, how you interact with the database, the schemas, and the underlying API to the database hasn't changed much in a long time. They're trying to change that. And I forget what they call it, an object relational--?

16:14 KENNEDY: They call it an object database.

16:16 HARRISON: Object relational database, yeah.

16:18 KENNEDY: Yeah, not like the traditional ones, they say, from the '80s.

16:21 HARRISON: Yeah, so one of the things to look at, if you're going to look at anything, is to go to the link and look at the example. They have a new query language called EdgeQL, so they have a different way to write a schema that's fairly, doesn't look like Python but it's type based, and it's fairly expressive.

16:40 KENNEDY: It's pretty interesting. Instead of saying, let's have a class, and map that to the database, say, SQLAlchemy or mongoengine might, they said, we're going to define our own data definition language, our own DSL. But it's really incredibly simple. Doing the relationships with cascading this and that and SQLAlchemy, I always get that wrong, I have to look it up, and this is like, you want to fork your relationship, you say, link assignees, it goes to a user definition and the cardinality is double-star so I'm guessing multiple, many to many, sort of thing. And that's on, like, incredibly short there. My first impression was like, really? A new schema definition language? A new language? Seriously? Well, okay, I'm tired of SQL and I'm tired of the other ways of programming so we're going to invent another thing that people are going to get tired of. But it's starting to grow on me. Matt, what do you think about this?

17:35 HARRISON: Whenever you say, I'm going to invent something that's going to replace SQL, I think you hear a million developers cringe because... They all know SQL, right? But I think if you can get the five minute out of the box presentation where it's like, this is a compelling reason to use it, and everyone, at least most Python people I know want to use like an ORM and interact with the database that way, but there is this impedance mismatch with those. If you nail that down and have a really smooth five minute, out of the box experience with this, I think you could get a lot of people interested in it.

18:09 KENNEDY: Yeah, it's pretty interesting. I'm glad you brought it up, Brian. Thanks. Alright, Matt. You're a fan of the Wizard of Oz, is what I'm to draw from this next one.

18:16 HARRISON: Follow the yellow brick road. I do corporate training and I do consulting, and one of the things that I do when I'm doing data stuff is visualization. Visualization is pretty important. I mean, I've literally found bugs by visualizing something that we couldn't have found just by looking at the data, necessarily. And so visualization is also important in the evaluation of machine learning projects. And one of the projects that I have been liking and using recently is a project called Yellowbrick, so I guess this will take you to the Wizard of Oz if you follow it. It's not a new project necessarily, but it's a project that's alive and going and being worked on still, and what it does is it offers visualizations for various machine learning algorithms. If you use a tool like scikit-learn, you can go to their website and you'll have all these visualizations up there, but those aren't included in the library for scikit-learn.

19:08 KENNEDY: You got to, like, go create them yourself, right?

19:11 HARRISON: Yeah, you got to either copy and paste their code, or go find some Stack Overflow. So what I've been doing, I have a project on my GitHub, em-el-viz, that I just have my own, here's the visualizations that I commonly use, and then I use my little library. But I'm looking to replace it with this, and I've been using this for some of my training as well recently. It's got visualizations for classification, regression, clustering, and text. One of the cool things about it is if you're familiar with sk-learn or scikit-learn is that it has a similar API to that, so there's a fit. You can fit your visualization, you can transform it, and then you call this method called poof, and that will pull up a map.

19:49 OKKEN: Poof?

19:52 HARRISON: That's the magic method that they have.

19:54 OKKEN: How do you spell "poof"?

19:56 HARRISON: P-O-O-F.

19:57 KENNEDY: Yeah, poof. Gotcha. Yeah, perfect. Love it.

20:02 HARRISON: Just a nice little library to, one of those things that can be annoying, always go and copy and paste that code and you can just pip install this and use it, and it has a great interface, it makes your life a little bit easier.
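[Editor's note: the fit-then-poof calling convention described above can be sketched in plain Python. This is a toy stand-in, not Yellowbrick's code: the class name `ClassBalanceSketch` and its behavior are hypothetical, and it prints a text bar chart rather than opening a plot. It only illustrates the scikit-learn-style pattern where `fit` learns from data and returns the object, and a final call renders the result.]

```python
from collections import Counter


class ClassBalanceSketch:
    """Hypothetical visualizer used for illustration; not a Yellowbrick class."""

    def __init__(self):
        self.counts_ = None

    def fit(self, y):
        # Learn from the data and return self, as scikit-learn estimators do
        self.counts_ = Counter(y)
        return self

    def poof(self):
        # Render what was learned; a text bar chart stands in for a real figure
        if self.counts_ is None:
            raise RuntimeError("call fit() before poof()")
        lines = [f"{label}: {'#' * n}" for label, n in sorted(self.counts_.items())]
        output = "\n".join(lines)
        print(output)
        return output


viz = ClassBalanceSketch()
chart = viz.fit(["spam", "ham", "spam", "spam"]).poof()
```

[Because `fit` returns the object itself, the calls chain in one line, which is the same ergonomic idea the scikit-learn interface is known for.]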

20:14 KENNEDY: Yeah. Absolutely. Brian, the next one, the last one that I want to cover comes from the whole Alexa thing. We had a couple of people write us about interesting things like, say, Flask-Ask, and Alexa skills, right? This is one of the, little bit of a serious one, or at least addressing a serious problem, right? It's not like putting mustaches on cats, but it's actually trying to solve a problem. That would be a hard thing to do, audibly, on Alexa. This one is called Depression AI, and it's an Alexa skill. I apologize, everyone's little device is probably going off. It's an Amazon device skill for people who are suffering with depression. It's open-source, it's based on Flask-Ask, which I covered pretty deeply on episode 146 of Talk Python. That's basically the way to use Flask, to write these Amazon voice assistant skills, which is pretty cool. The idea is that if you are suffering from depression, one of the things that's really hard for people apparently who are suffering from depression is to, sort of go about your normal daily routine, right? Get up, make your bed, take a shower, it's like, easy to just sort of stay sprawled out on the couch or the bed or whatever. And it sort of helps encourage you to keep doing those things, and it's supposed to be able to detect your moods and kind of give you some feedback. What do you think?

21:45 HARRISON: I think that's super impressive. I mean, I have relatives who have dealt with these sorts of issues and I don't know that they're necessarily ones who would take to technology, but anytime you can get some help or get some feedback, or someone other than, you're not listening to yourself, it can be a good thing.

22:03 OKKEN: I think this is awesome. I think that there's a lot of people that would, aren't somebody that wants to go talk to somebody else but having, making the decision to put this in place when they're feeling good, and then have it help them through the hard times. This would be great.

22:18 KENNEDY: Yeah, and it's pretty cool. It won the Valley Hackathon, which I think is... I guess in Modesto? Sort of outside San Francisco. But this was apparently built in, what is that, a weekend or something? Which is also a pretty big testament. You can do things like, it'll evaluate your mood, it actually has suicidal intervention, it has a location-based recommendation, it mostly helps you with small activities. You can say things like, "Alexa, check on me." Or "I feel down." Or "Help me feel better." "I haven't gotten out of bed today." It'll ask you things like, have you gotten out of bed yet? Things like that. Yeah, if it's open-source then I'll get up. And based on Python. So if this is inspiring, even if it's a totally different subject area, take it and use it as an example. Well, that's it for our official news. Brian, you got anything you want to share with the world while we're here?

23:09 OKKEN: I've got some good news and some bad news. So the good news is, I went to an estate sale the other day and I bought a book called "How to Be Interesting in 10 Simple Steps." That's the good news. The bad news is I'm a really slow reader so it might take a while to take effect. Nah, I've just skimmed it so far. I haven't even started yet.

23:31 HARRISON: Well, what's a book?

23:35 OKKEN: It was printed a long time ago, before we had .

23:38 KENNEDY: Is it one of those things on paper? It's like a tablet, but it doesn't run out of batteries? Is that what you're talking about?

23:43 OKKEN: I've got another book author harassing me about physical books.

23:47 KENNEDY: You got any books lined up?

23:50 HARRISON: I'm working on revamping my Pandas one. Big demand for Pandas, and I want to update mine to the latest version, so it's .17 and that's a couple years out, so.

23:50 KENNEDY: Very cool. Oh yeah, we also have some news about maybe a course coming out for you. We'll leave that as a teaser, but I think a video course may be in the near future.

23:50 HARRISON: Yeah, yeah, maybe. I don't know. We'll have to see.

23:50 KENNEDY: We'll have to see if we can get our act together. Awesome. Alright. Well, Matt, thank you for joining us and dropping in on this podcast and Brian, thank you as always.

23:50 HARRISON: Yeah, thanks. My pleasure.

23:50 OKKEN: Thank you.

23:50 KENNEDY: Yep. Bye guys.

23:50 HARRISON: Bye.

23:50 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's pythonbytes as in B-Y-T-E-S. And get the full show notes at If you have a news item you want featured, just visit and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
