

Transcript #74: Contributing to Open Source effectively

Recorded on Tuesday, Apr 17, 2018.

00:00 - Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:06 This is episode 74, recorded on April 17th, 2018.

00:11 I'm Michael Kennedy.

00:11 - And I'm Brian Okken.

00:12 - And I'm Matt Harrison.

00:14 - Yeah, that's right, we got a special guest here on the show, so Brian and I decided to invite Matt Harrison to mix things up a little bit and bring a slightly different perspective.

00:23 And we decided we were gonna try this little experiment from time to time, you know, once a month, once every six weeks, something like that.

00:29 So Matt, welcome to the show.

00:30 - Thanks, thanks for having me.

00:32 Good to be here.

00:32 - Good to have you here.

00:33 So before we get into the topics, I just want to say thank you to Datadog.

00:36 They're sponsoring this episode.

00:38 Check them out at pythonbytes.fm/datadog.

00:41 I'll tell you more about them later.

00:43 Brian, you've been running your own open source project lately, and it's been fun, right?

00:47 - Yeah, I started a little project.

00:48 I called it Cards. It's on GitHub, and the intent is to talk about how I'm gonna go about testing it using what I think of as good test methodologies, but doing it in the open on an open-source project. I'm getting contributors already, which is really cool.

01:06 - That's awesome.

01:07 - I don't really know the conventions and even though I've been programming for decades, I'm new to actually contributing to open-source.

01:14 - Well, and it's different on GitHub than it was, say, in Subversion or something like that.

01:19 There's the whole GitFlow and PRs and that whole thing.

01:22 - Yeah, and how to deal with branching and forking with GitHub versus doing something out in the open on GitHub is one thing, but actually running it like a normal open-source project is completely different.

01:37 I had a, actually, Anthony Shaw did a pull request on Cards, and he did a really good job on it.

01:45 One of the things is he started it with a WIP for a work in progress, and I didn't know that was a convention already.

01:52 I looked around and there's some... if people are new to this and want to learn more. I'm sure everybody out there already knows about open source stuff.

02:00 But anyway.

02:01 I would not make that assumption.

02:03 I definitely wouldn't.

02:04 I mean, it feels that way, but I think it's more of a dark matter experience where you see the stuff that people are doing, but that doesn't mean everyone is participating.

02:15 Many people, I think, are just wanting to get into it.

02:18 What do you think, Matt?

02:19 I've been using open source since the mailing list days, and you had to talk over the mailing list.

02:24 And so GitHub, when it first came out, was sort of a big change for me.

02:28 And I think a lot of people use GitHub as just, "I'm going to throw my source over the fence and see what happens to it," rather than maintaining it as a real open source project and trying to include community and get contributions.

02:40 So I'm curious to see how this works.

02:43 And I think, like Michael said, don't make the assumption that everyone knows how to do these things, even though some people are doing it.

02:49 There's certainly a lot of people who can learn from this.

02:51 There's a couple of things I wanted to throw out.

02:53 This is from 2015 already, but it's an article called "How to Write the Perfect Pull Request." And it kind of talks about the philosophy of approaching the pull request when you're first getting started, and it even talks about the WIP trick to tell the person that owns the repo, "You're not done with it.

03:11 You'd just kind of like early feedback on it." And then there's some advice on offering feedback to pull request submitters and also responding to that feedback as the submitter as well.

03:23 It's a short read, but there's some good tips there.

03:27 Then I'm also really excited about VM Brasseur's new book coming out from Pragmatic called Forge Your Future with Open Source.

03:34 It includes things like how to deal with pull requests and everything.

03:37 - That sounds like such an interesting book. I love it.

03:40 - I'd just like to say that I think using open source and getting involved, for someone who wants to break into learning Python or learning a library or even getting a job, that's an awesome way to do it.

03:53 - I agree, and one of the problems people run into is they work in a place that doesn't use Python and so they don't have a place to actually practice Python outside of just like toy things.

04:04 So contributing to open source lets you make a meaningful contribution, even though maybe you're in a Java or a .NET job and they're like, Python, no.

04:12 - Yeah, they might even be using Subversion or some other source control.

04:17 - Absolutely.

04:18 Cool, this is a good one, Brian.

04:19 So Matt, you and I both do in-person training periodically, and I just did a class last week, I guess, a short class, and it was a lot of folks coming from a traditional sort of MATLAB, Mathematica background, and moving into things like Jupyter, and I think that this might be a trend.

04:36 What do you think?

04:37 - Yeah, I think we're seeing Jupyter changing from early adopter to sort of normalcy.

04:44 I found a thing going around Twitter.

04:45 Paul Romer, who's an economy professor at NYU, tweeted about his experience with Mathematica and Jupyter.

04:54 And he referenced an article in The Atlantic about both of those products, Mathematica and Jupyter.

05:00 And for me, like I said, I've been involved with open source for a long time.

05:04 You don't often see stuff in The Atlantic or professors, economy professors.

05:08 This isn't a computer science professor.

05:10 This is an economy professor posting about Python and Jupyter, so really cool stuff there.

05:17 - This is such an interesting find.

05:19 If you open, people should open up the Atlantic link that you're linking to here, because wow, that's a pretty provocative picture.

05:27 There's a formal paper, it says, "The scientific paper is obsolete." And there's a paper, like an academic paper, that's literally on fire, like in an animated way, like full screen, it's crazy.

05:39 - Yeah, and it makes reference to the discovery of gravitational waves and how that, there was a paper on that, but they also, along with that, published a Jupyter notebook where you could go out on your own and you could look at their code, look at their data, and it had embedded text in it as well, and basically discover gravitational waves, or go through the same sequence and reproduce their science.

06:02 So I thought that was pretty cool.

06:04 A quote from Mr. Romer, he says that, "Jupyter is the new open-source alternative to Mathematica that is well on its way to becoming the standard for exchanging research results."

06:13 - I agree, I think academics has been too dependent on these couple of big, really expensive lock-in type of things, like Mathematica and MATLAB.

06:23 I'm also thinking of journals and stuff like that.

06:25 This sort of open-source paper in the form of Jupyter kind of touches on both of those.

06:29 Brian, what do you think?

06:30 - I'm actually really excited about all that.

06:31 I was just listening to a topic not too long ago, actually it was one of your podcasts, about the academic journals, and a lot of times nobody actually follows the steps along, but having the code out on Jupyter Notebooks just allows everybody to go and follow along right there.

06:48 - And one of the main points of these articles isn't that there's a notebook per se, but the compelling reason for using Python and Jupyter is not necessarily that the technology is better, but that there's a huge community around it.

07:02 So, you know, they make the argument that the Wolfram notebook might be prettier or whatnot, but you have so many people who are contributing to these open source projects.

07:12 You've got Matplotlib for graphics, SymPy for symbolic math, NumPy, SciPy, Pandas, NLTK.

07:19 And if you look at PyPI, there's 135,000 packages last week on that, and it's really hard to compete with that.

07:28 That's super compelling, so really cool.

07:30 - Yeah, it is super compelling.

07:31 So speaking of community, Brian, you know I love to pull on the Stack Overflow Developer Survey and try to dig out results from the community, right?

07:40 - Yes.

07:41 - Yeah, that's always fun.

07:42 So there's another one that just came out that gives us a different perspective and also is more Python focused than that one, right?

07:50 That's like broad software development.

07:52 So JetBrains, the PyCharm team, teamed up with the PSF just at the end of last year to do a Python developer survey.

08:01 And so the thing I'm linking to is Python developer survey 2017 results, but it's like December, right?

08:06 So it's pretty relevant still, it's pretty fresh.

08:09 They were just talking about it on their blog.

08:11 And so this is a really nice piece of sort of almost journalism around data science, I think.

08:18 It's actually, they've really written it up nice.

08:19 They show you graphs and they're like, here's the main takeaway from this section, here's the main takeaway from that section.

08:24 So how about I share some takeaways with you?

08:27 - Yes. - Yeah.

08:28 - All right, so the first one says, of the people that they interviewed... Now, this is obviously a self-selecting crowd, but the question was, you're obviously doing Python.

08:40 Is this your main language or a secondary language?

08:42 They said 80% of the people, Python is their main language.

08:46 That was pretty cool.

08:47 They said data analysis is actually just as popular as web development, which is pretty cool.

08:55 So, there's basically as many Python web developers as there are data scientists.

08:59 Does that surprise either of you?

09:01 To me, Python felt like a web thing for most folks.

09:04 - Yeah, I mean, recently it wouldn't surprise me, but if you would have said that two years ago, it probably wasn't the case two years ago, but now, yeah, it's not surprising to me.

09:13 - Yeah, they also talk about the growth of Python, and Brian and I, you and I, we've touched on this a few times, but they're also confirming, like, we think that massive hockey stick growth is largely data science people coming in.

09:24 - Yeah, it could be.

09:25 I think there's a lot of room to grow.

09:27 There's a lot of people who are using Excel and some of these tools that you mentioned who probably want to migrate to something like Python for the libraries and machine learning capabilities.

09:36 - Yeah, another interesting one was Python versus Legacy Python.

09:40 So Python is at 75% usage among this group and 25% for Python 2.

09:48 And if you look at the curve, that's like increasing in time, like the rate at which people are moving to Python 3.

09:54 So that's really cool.

09:56 - Yeah, that's cool.

09:57 Yeah, you wonder how much self-selecting is there, right?

10:00 The legacy laggards didn't want to participate in this.

10:03 - That's right, I don't need no stinking surveys.

10:05 I learned everything about Python I need to know in 2008.

10:09 Okay, so they also talked about where code runs, where people run their Python code.

10:13 And this is, I don't think includes like the hosted notebook type stuff, so probably not that.

10:18 But 67% AWS.

10:21 Brian, does that surprise you?

10:22 - I'm gonna plead, I'm not in the field, in the web sort of space to know really where it's running.

10:29 My basis for judging the use of AWS compared to other platforms is when AWS goes down, what parts of the internet are no longer accessible?

10:39 And they're pretty broad.

10:40 - Yeah, I would think that it'd actually be a little bit higher than that.

10:43 The ones that surprised me was you've got Google App Engine at 29, Heroku at 26, DigitalOcean at 23, and the last one they say is Microsoft Azure at 16.

10:54 And I think that 16's probably gonna change a little bit.

10:57 They've been doing a lot of hiring in the Python space and getting some prominent Python people.

11:01 So there's some great--

11:02 - Yeah, there's some great--

11:03 - Push from Microsoft on that.

11:04 - They're definitely focused on Python in a lot of important ways.

11:08 They now have Azure Notebooks.

11:09 They have Brett Cannon, Steve Dower, both Python Core devs working there.

11:15 They brought the guy who did the Python extension for Visual Studio Code in-house.

11:18 Like, they're doing a bunch of cool stuff.

11:20 All right, so a few more takeaways.

11:22 Team size, right?

11:23 Like, you think of how big of a group do you, you know, like how large of a team do you work on?

11:29 And if you think about, like, one of the advantages of Python is you don't need a large set of people to build something interesting, and I think that's reflected here.

11:36 So it says, like, team size, two to seven people, 75%, 74% of the respondents are in that two to seven group.

11:43 And then eight to 12 is 16, and then basically above that, above 12 people, all the way up to like 40 or larger, is 9% of the balance, basically.

11:54 So really small teams.

11:56 And then operating systems, Brian, you touched on this a lot.

11:58 49% of the people are still using, are currently using Windows as their OS.

12:03 Then 19% for Linux, 15% for Mac.

12:06 And like you said before, Windows often gets the short end of the stick in sort of testing and examples and stuff, but they probably shouldn't.

12:13 - Yeah, one of the things I want to go back to is the cloud platforms that we talked about.

12:19 One of the things that's interesting there is that clearly some people are running on multiple platforms because that's over 100%.
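
The "over 100%" observation can be checked with a few lines of Python. This is only a sketch using the five percentages quoted in this episode; the survey may list other platforms that were not read out, and the variable names are made up for illustration.

```python
# Share of respondents per cloud platform, as quoted in the episode (percent).
cloud_usage = {
    "AWS": 67,
    "Google App Engine": 29,
    "Heroku": 26,
    "DigitalOcean": 23,
    "Microsoft Azure": 16,
}

total = sum(cloud_usage.values())

# If every respondent could pick only one platform, the total could not exceed 100.
# Anything above that must come from people who ticked more than one box.
extra_selections = total - 100

print(f"total: {total}%")
print(f"at least {extra_selections} extra selections per 100 respondents")
```

The total comes to 161, so the question must have allowed multiple answers, which matches the hosts' own experience of mixing providers.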

12:26 - Yeah, that's interesting.

12:27 I can tell you for sure that if somebody asked me which of these platforms do you use, I would definitely check the DigitalOcean and AWS boxes.

12:35 Because, for example, the main server for our podcast and the database server runs on DigitalOcean Droplets, but when you interact with it and you get, like say, an email, especially around the training stuff, that goes through Amazon Simple Email Notification Service and things like that.

12:51 There's this blend of them.

- Yeah, I'm similar. I've done Heroku and DigitalOcean, and both had stuff in S3 as well, so it's not an either/or. One of the things I thought was interesting was the operating systems. I mean, like you said, Windows tends to get, you know, people have something in their heart against it or whatnot, but I was surprised that Mac was so low on this.

- Yeah. Interesting. I wouldn't have thought that at all, but...

- You go to the comp... I'm telling you, dark matter developers, that's what it is.

13:22 - Yeah.

13:23 - It's interesting, I think the story on Windows is gonna get better.

13:26 I believe the new version of Python is gonna use MSBuild and not Visual Studio 2008 for its compilation stuff during install, which means like modern versions of Windows will be able to install stuff without like installing a 2008 version of Visual Studio, which will be real nice.

13:45 All right, so before we get to the next one, I wanted to just tell you both a little bit about Datadog.

13:50 So speaking of stuff hosted in the cloud and spinning multiple machines and things like that, Datadog is a monitoring solution that provides visibility and tracks down issues with distributed systems involving Python applications.

14:04 So within just a few minutes, you can find bottlenecks in your code by exploring graphs and rich dashboards, and you can visualize your whole performance across all of your apps, which, when you're doing distributed programming or distributed apps, microservice type things, that's a huge deal.

14:19 And you can go to pythonbytes.fm/datadog, do a quick little trial there, and you'll get a free Datadog t-shirt, which is pretty cute.

14:28 So check them out and let them know you appreciate them supporting the show.

14:32 All right, Brian, I'm a big fan of databases, especially shiny new ones.

14:36 You've got a really new one yet.

14:37 Like, I can't even get this one yet, but it's still pretty cool.

14:39 - Yeah, you can't get it, but one of our listeners, Arash, I think that's his name, anyway, he let us know about EdgeDB, and EdgeDB has a blog post up, which says, "EdgeDB, a new beginning." And at first I thought, yeah, okay, we'll keep an eye on this, and maybe we'll cover it later when we can actually play with it, because it's a new database that's not available to use yet.

15:03 However, it's gonna be open source, and the reason why I brought it up now is because it's coming from some fairly interesting people.

15:10 - It has some pretty powerful Python origins, right?

15:13 - Yeah, well, so like for instance, Elvis and Yury, and I'm not gonna try to pronounce their last names or they will flame me, are part of this, and they're the people that brought us asyncio and uvloop.

15:26 So that's pretty impressive.

15:27 - That's very impressive.

15:28 - One of the things that's interesting is looking at the kind of code that you get with this.

15:32 So they're trying to attack, the problem they're addressing is that document databases have some issues with just scalability after your project gets larger.

15:44 The schema-less part of it sometimes can be hard to deal with.

15:49 A lot of people deal with it fine, but they see it as part of a problem.

15:52 Relational databases are growing a lot and Postgres, for instance, is keeping up to date.

15:58 But the interface, how you interact with the database, the schemas, and the underlying API to the database, it hasn't changed much in a long time.

16:07 So they're trying to change that and I forget what they call it, an object relational?

16:14 - They call it an object database.

16:15 - Object relational database.

16:17 - Yeah.

16:18 - Yeah, not like the traditional ones they say from the '80s.

16:21 - Yeah. So one of the things to look at if you're going to look at anything is to go to the link and look at the example.

16:28 They have a new query language called EdgeQL.

16:30 So they have a different way to write a schema that's fairly... it doesn't really look like Python, but it's type-based and it's fairly expressive. It's pretty interesting.

- So instead of saying, like, let's have a class and map that to the database, like, say, SQLAlchemy or MongoEngine might, they said we're going to define our own data definition language, our own DSL. But it's really incredibly simple. Like, doing the relationships with cascading this and that in SQLAlchemy, I always get that wrong. I have to look it up.

17:03 And this is like, you want a foreign key relationship, you say link assignees, goes to a user definition, and the cardinality is double stars, so I'm guessing multiple, many to many sort of thing.

17:14 And that's on like, incredibly short there.

17:17 So my first impression was like, really, a new schema definition language?

17:21 A new query language?

17:24 Seriously, it's just like, okay, well, I'm tired of SQL, and I'm tired of the other ways of programming, so we're gonna invent another thing that people are gonna get tired of.

17:32 But it's starting to grow on me.

17:33 Matt, what do you think about this?

17:35 - Whenever you say I'm gonna invent something that's gonna replace SQL, I think you hear a million developers cringe because they all know SQL, right?

17:44 But I think if you can get the five minute out of the box presentation where it's like, this is a compelling reason to use it and everyone, or at least most Python people I know want to use an ORM and interact with the database that way, but there is this impedance mismatch with those.

17:59 So if you can nail that down and have a really smooth, five minute, out of the box experience with this, I think you could get a lot of people interested in them.

18:09 - Yeah, it's pretty interesting.

18:10 I'm glad you brought it, Brian, thanks.

18:11 All right, Matt, so you're a fan of the Wizard of the Oz, is what I'm to draw from this next one.

18:16 - Yeah, yeah, follow the yellow brick road.

18:18 So I do corporate training and I do consulting, and one of the things that I do when I'm doing data stuff is visualization, visualization's pretty important.

18:28 I mean, I've literally found bugs by visualizing something that we couldn't have found just by looking at the data necessarily.

18:35 And so visualization is also important in the evaluation of machine learning projects.

18:40 And one of the projects that I've been liking and using recently is a project called Yellowbrick.

18:44 So I guess this will take you to the Wizard of Oz if you follow it.

18:48 It's not a new project necessarily, but it's a project that's alive and going and being worked on still.

18:54 And what it does is it offers visualizations for various machine learning algorithms.

18:59 So if you use a tool like Scikit-learn, you can go to their website and they'll have all these visualizations up there, but those aren't included in the library for Scikit-learn.

19:08 You either need to--

19:09 - Right, you've gotta go create them yourself, right?

19:11 - Yeah, you gotta either copy and paste their code or go find some Stack Overflow answer.

19:15 So what I've been doing, I mean, I have a project on my GitHub, mlvis, that I just have my own, here's the visualizations that I commonly use, and then I use my little library.

19:25 But I'm looking to replace it with this and I've been using this for some of my training as well recently.

19:31 So it's got visualizations for classification, regression, clustering, and text.

19:36 One of the cool things about it, if you're familiar with sklearn or scikit-learn, is that it has a similar API to that.

19:44 So there's a fit, you can fit your visualization, you can transform it, and then you call this method called poof and that will pull up a Matplotlib plot for you. Poof.

19:52 That's the magic method that they have.

19:54 How do you spell poof?

19:55 P-O-O-F.

19:56 P-O-O-F.

19:57 Poof.

19:58 Yeah, poof.

19:59 Gotcha.

20:00 Yeah.

20:01 Perfect.

20:02 Love it.

20:03 So just a nice little library to, you know, one of those things that can be annoying or that you always go and copy and paste that code and if you can just pip install this and use it and it has a great interface, it makes your life a little bit easier.
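
To make the fit-then-poof pattern concrete, here is a minimal sketch of a visualizer that follows the same calling convention. `ToyClassBalance` is a hypothetical class written only for this illustration; it is not Yellowbrick's or scikit-learn's code, it skips the transform step, and it prints a text bar chart where a real visualizer would draw a Matplotlib figure. The data is made up.

```python
from collections import Counter


class ToyClassBalance:
    """Hypothetical text-only visualizer; not part of Yellowbrick or scikit-learn."""

    def fit(self, X, y):
        # Like a scikit-learn estimator, fit() learns state from the data
        # and returns self so that calls can be chained.
        self.counts_ = Counter(y)
        return self

    def poof(self):
        # Render the learned state. A real visualizer would draw a figure here;
        # this sketch builds one text bar per class label instead.
        lines = [
            f"{label:<6} {'#' * n} ({n})"
            for label, n in sorted(self.counts_.items())
        ]
        chart = "\n".join(lines)
        print(chart)
        return chart


X = [[5.1], [4.9], [6.3], [5.8], [7.1]]  # made-up feature rows
y = ["cat", "dog", "cat", "cat", "dog"]  # made-up class labels

chart = ToyClassBalance().fit(X, y).poof()
```

The point of the convention is that anyone who already knows how to call `fit` on an estimator can use a visualizer without learning a second interface.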

20:14 Yeah, absolutely.

20:15 So, Brian, the next one, the last one that I want to cover, comes from the whole Alexa thing.

20:22 A couple of people wrote us about interesting things with, like, say, Flask-Ask and Alexa skills, right?

20:27 - Yeah.

20:28 - Yeah, so this one is a little bit of a serious one or at least addressing a serious problem, right?

20:34 It's not like putting mustaches on cats but like it's actually trying to solve a problem.

20:38 That, although that would be a hard thing to do audibly on Alexa.

20:42 Nonetheless, so this one, this one is called Depression AI and it's an Alexa skill.

20:49 I apologize, everybody's little device is probably going off.

20:52 It's an Amazon device skill for people who are suffering with depression.

20:59 It's open source, it's based on Flask Ask, which I covered pretty deeply on episode 146 of Talk Python.

21:08 So that's basically a way to use Flask to write these Amazon voice assistant skills, which is pretty cool.

21:14 So the idea is that if you are suffering from depression, one of the things that's really hard for people, apparently, who are suffering from depression is to sort of go about your normal daily routine, right?

21:28 Get up, make your bed, take a shower.

21:30 It's like easy to just sort of like stay sprawled out on the couch or the bed or whatever.

21:34 And so it sort of helps to encourage you to keep doing those things, and it's supposed to be able to detect your moods and kind of give you some feedback.

21:44 What do you think?

21:45 - I think that's super impressive.

21:46 I mean, I have relatives who have dealt with these sorts of issues, and I don't know that they're necessarily ones who would take to technology, but anytime you can get some help or get, you know, some feedback from someone other... you know, you're not listening to yourself, it can be a good thing.

- I think this is awesome to have. I think there's a lot of people that aren't somebody that wants to go talk to somebody else, but making the decision to put this in place when they're feeling good and then have it help them through the hard times, this would be great.

- Yeah, it's pretty cool. It won the Valley Hackathon, which I think is, I think that's in Modesto, sort of outside San Francisco.

22:26 But this was apparently built like, what is that, a weekend or something?

22:30 Which is also a pretty big testament.

22:32 So you can do things like, it'll evaluate your mood.

22:36 It actually has suicidal intervention.

22:38 It has location-based recommendations, and mostly helps you with small activities.

22:41 So you can say things like, "Alexa, check on me," or "I feel down," or "Help me feel better," or "I haven't gotten out of bed today."

22:49 It'll ask you things like, have you gotten out of bed yet?

22:51 Things like that.

22:52 So it's pretty cool.

22:53 And it's also open--

22:55 Yeah, it's open source and on GitHub and based on Python.

22:58 So if this is inspiring, even if it's a totally different subject area, take it and use it as an example.

23:04 Nice, well, that's it for our official news.

23:07 Brian, you got anything you wanna share with the world while we're here?

23:09 - I've got some good news and some bad news.

23:12 So the good news is, I went to an estate sale the other day and I bought a book called How to be Interesting in 10 Simple Steps.

23:21 So that's the good news.

23:22 The bad news is I'm a really slow reader, so it might take a while to take effect.

23:26 No, I've just skimmed it so far, so I haven't even started yet.

23:31 - Well, what's a book?

23:32 - Yeah, very good.

23:33 Those are dangerous.

23:34 (laughing)

23:35 - It was printed a long time ago, before we had e-books.

23:38 - Is it one of those things on paper?

23:39 It's like a tablet, but it doesn't run out of batteries?

23:42 Is that what you're talking about?

23:43 - I've got, like, another book author harassing me about physical books. How about you, Matt? You got any books lined up?

23:49 - I'm working on revamping my pandas one. There's big demand for pandas, and I want to update mine to the latest version. So it's on point 17, and that's a couple of years out.

- Sure, very cool. Oh, yeah, and we also have some news about maybe a course coming out for you. Well, we'll leave that as a teaser, but I think a video course, maybe, in the near future.

- Yeah, maybe. I don't know. We'll have to see. We'll have to see if we can get our act together.

- Awesome. All right. Well, Matt, thank you for joining us and dropping in on this podcast. And Brian, thank you as always.

24:21 Yeah, thanks. My pleasure. Thank you. Yep. Bye guys. Bye.

24:24 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy.

24:48 Thank you for listening and sharing this podcast with your friends and colleagues.
