Transcript #32: 8 ways to contribute to open source when you have no time

Recorded on Thursday, Jun 29, 2017.

00:00 Michael KENNEDY: Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode #32, recorded on June 29, 2017. I’m Michael Kennedy.

00:00 Brian OKKEN: And I'm Brian Okken.

00:00 KENNEDY: We've got a bunch of great stuff lined up for you, but first I just want to say apologies for the slightly off audio on my end. I am not dialing in from the Python Bytes studio in Portland, Oregon. I'm actually on the road. So, Brian and I are doing it a little bit differently this week.

00:00 OKKEN: It’s ungodly early at 6 a.m. here.

00:00 KENNEDY: I don’t know what your problem is, it’s 2 in the afternoon over here in Ireland. (Laughs) I slept in.

00:00 OKKEN: The magic of Skype.

00:00 KENNEDY: We live in the future, we just don't really fully appreciate it.

00:00 Alright, let's talk about web apps. This time you are the one bringing up a web app.

00:00 OKKEN: Yeah, so this is pretty exciting. There's a Medium article called, “Introducing Dash”. Dash is an Open Source project from Plotly for building reactive web apps, and it looks really exciting. The graphics in the plots that you can do on this are kind of amazing. It looks like an interactive real time web page with interactive graphs, and you hook up input and output and data coming in and out. It's really kind of hard to describe but people should check it out because it's amazing.

00:00 KENNEDY: Yeah, it looks really cool and a lot of it is done in Python, right?

00:00 OKKEN: Yeah. There's Python and Pandas and Flask and React and JSON and all sorts of stuff like that involved to make this stuff work. It ends up being some fairly impressive demos with just a handful of lines of code.

00:00 KENNEDY: That’s super cool. So, basically if you're trying to do visualizations with some of the data science tooling, you could just make that available on the web, not as pictures but in a super interactive format, which is great.

00:00 OKKEN: Yeah and they say it’s good for data analysis, data exploration, visualization, modeling; and they also include instrument control and reporting in what they think is a good application. I want to try this for instrument control and visualization myself.

00:00 KENNEDY: Yeah, that looks really cool. I wish I had something to show so I could play with it but I just don't have that much to graph these days. I used to do a lot with science but not in the last 15 years.

00:00 OKKEN: We could do like, I don't know, plotting how much traffic our website gets or something.

00:00 KENNEDY: Yeah, actually, that would be kind of fun, like bandwidth by country or downloads over time. Who knows, we could actually play with that. That might be pretty cool.

00:00 OKKEN: They include a link in this but there's a user guide that has a gallery. They have pricing up so I think it's both something you can use as a service or yourself with the tool.

00:00 KENNEDY: That looks very nice. It will definitely give you that pro touch, if you're trying to put graphs on the internet. And using Python.

00:00 OKKEN: Yeah, especially if you're trying to stay competitive.

00:00 KENNEDY: You know what? There was a Python Language Summit back at the end of May, so almost exactly one month ago at the time of this recording. One of the topics that came up was, ‘How do we keep Python competitive?’ And this has two angles. One angle is, ‘How do we keep Python competitive?’, so you don't hear people going, ‘I'm going to rewrite everything in Go’ or something silly like that, which seems to be like a meme or something that's happening quite often. But also, how do we get people to move from Legacy Python to Modern Python? There have been a bunch of interesting little features that have been added to Python; the asyncio stuff we've talked a lot about. Little language touches, like cleaner ways to generate dictionaries from sets of dictionaries, union sort of thing, that kind of stuff. A couple of years ago they really started hitting the drum beat of, ‘You know what? The thing that actually matters the most to people is just flat out performance.’ If we could make Python 3 faster than Python 2, if we could make Python 3 use less memory than Python 2, that is going to be a solid reason for these big companies with big code bases to move to Python 3 and really change that equation. So, this was sort of a conversation about, ‘How do we keep that going?’ at the Language Summit, from what I understand. I’m not entirely clear about how that all goes together. I think this was mostly based on a presentation by Victor Stinner. He's done a ton of stuff for performance in the last couple of versions of Python. I think this style of approaching the problem of, ‘How do we get adoption of Python 3 over Python 2?’ and the decision to say, ‘Well, let's focus on performance,’ I think that's actually working. We saw this, to some degree, with the Instagram presentation we covered last time, right?

00:00 OKKEN: Yeah.

00:00 KENNEDY: So, those guys got, I think 40% less memory usage on their async tier. And they got 12% less CPU usage on the web tier. When you talk about companies like Instagram, that's a lot of servers. That's really nice.

00:00 OKKEN: Yeah. Also, in some of the feedback we've gotten, people switching some applications to asynchronous within Python and async I/O are seeing a ten times speedup, or sometimes a hundred times speedup.
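
The kind of speedup described here comes from overlapping waits on I/O rather than from faster CPU work. The following is a minimal sketch of that idea, not a benchmark from the show: `fetch` is a hypothetical stand-in that simulates I/O with `asyncio.sleep`, and the delay and item count are made-up values.

```python
import asyncio
import time


async def fetch(item_id, delay=0.05):
    """Hypothetical stand-in for an I/O-bound call (network, disk)."""
    await asyncio.sleep(delay)  # yields control while "waiting on I/O"
    return item_id * 2


async def run_sequential(ids):
    # Each call waits for the previous one to finish.
    return [await fetch(i) for i in ids]


async def run_concurrent(ids):
    # All calls wait at the same time; gather keeps the input order.
    return await asyncio.gather(*(fetch(i) for i in ids))


def timed(coro_func, ids):
    """Run an async function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_func(ids))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(10))
    _, seq_time = timed(run_sequential, ids)
    _, con_time = timed(run_concurrent, ids)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With ten simulated calls, the sequential version waits roughly ten delays in a row while the concurrent version waits roughly one. Real gains depend on how much of the workload is actually spent waiting on I/O.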

00:00 KENNEDY: Yeah, that's a really good point. It's not about the CPU, it's just about leveraging the asyncio bit, which is so much easier. So, this is kind of a summary of that conversation. I don't think the Language Summit was recorded, I could be wrong, but this is a write up of that presentation. It's kind of nice. It says basically we really need to keep Python performant to be competitive with other languages. But it's not as easy to optimize as, say, optimizing C#, Java or C because of the boundary that the C API brings. Basically, there's a lot of ways of working that you are forced to follow in Python to keep the C API working. And the C API is actually a really important part of the Python performance story, right?

00:00 OKKEN: Yes.

00:00 KENNEDY: If you're going to use NumPy, that's super fast. But NumPy is basically mostly written in C, so you can't break that because you might make the Python code go faster but you're going to lose the ability to do the C stuff. So, that's pretty interesting.

00:00 They say it's great to compare Python 3 to Python 2 and say, ‘Oh look, it's much faster by most benchmarks’ but what you really need to do is compare it against modern languages. Not languages from the year 2000. So, let's try to work on this. There was some talk about the JIT implementation. We've got PyPy, which is five times faster, but is not very compatible. Mostly because of the C API, but also some other things, I think. There's Pyjion, done by Dino Viehland and Brett Cannon at Microsoft. That's actually a really interesting thing to bring JIT compilation to proper standard CPython, not yet another fork of it. That's pretty interesting.

00:00 The final thing that someone proposed there was, ‘Is there a way to use the type hints and type annotations that appear in Python 3 to make a slight variation of Cython, which compiles to C and lets you write code that’s closer to regular Python and leverage those type hints?’ Basically, in Cython you have to say what the types are, but you would do that anyway if you have the type hints in there. So, there's a lot of interesting stuff just brewing for the future there.

00:00 OKKEN: That's kind of a really interesting idea. I like that. If you've got a whole huge data set and it's not going to change, it's going to be a fixed data type and you're declaring it with type hints anyway, having the language be able to take advantage of that, and just behind the scenes Cython-ize it or something, that would be slick. I would love that.

00:00 KENNEDY: It would actually be pretty darn cool, wouldn’t it? Yeah, we'll see. Like in C or C++, you could have inline assembler, right? You say, ‘This little bit, these five lines, this is assembly code but we need this’ or you can inline methods. It would be cool if you could say, ‘Here, within my regular Python code, this one function where this is the thing we do all the time, this one or two functions,’ you do an @Cython on it and it just goes. That would be cool.
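
The idea discussed here relies on type hints being readable by tools at runtime. The following is a rough sketch of that first step only, under stated assumptions: `typed_candidate` is a hypothetical decorator invented for illustration, it is not Cython's API, and it compiles nothing. It just checks that a function is fully annotated and records the hints a compiler could use.

```python
from typing import get_type_hints


def typed_candidate(func):
    """Hypothetical marker decorator; NOT part of Cython.

    Records the function's type hints and rejects functions that are
    not fully annotated. A real tool would go on to generate C code.
    """
    hints = get_type_hints(func)
    code = func.__code__
    arg_names = code.co_varnames[: code.co_argcount]
    missing = [name for name in arg_names if name not in hints]
    if "return" not in hints:
        missing.append("return")
    if missing:
        raise TypeError(f"{func.__name__} lacks annotations for: {missing}")
    func.__compile_hints__ = hints  # made-up attribute, illustration only
    return func


@typed_candidate
def dot(xs: list, ys: list) -> float:
    """Plain Python dot product; the hints describe its inputs and output."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


if __name__ == "__main__":
    print(dot([1.0, 2.0], [3.0, 4.0]))
    print(dot.__compile_hints__)
```

The decorated function still runs as ordinary Python. The point is only that the annotations already carry the information a Cython-style compiler would otherwise ask for separately.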

00:00 OKKEN: Yeah it would.

00:00 KENNEDY: This is the future I want to see.

00:00 OKKEN: Definitely.

00:00 KENNEDY: Alright, so that would be a quick and dirty solution to make it faster, if I could just put an @Cython on things.

00:00 OKKEN: Yeah, man. I have a hard time not laughing when we do these transitions.

00:00 KENNEDY: They're so bad. We should take one episode and just be like, ‘What's the worst possible thing we can do?’ (Laughs)

00:00 OKKEN: The next article is, “PyPI Quick and Dirty” and it's by Hynek Schlawack. I met him at PyCon, shook his hand and told him I love what he's doing, and he said, ‘Oh, you're the guy that always mispronounces my name on podcasts.’ Anyway, sorry, Hynek.

00:00 This is an awesome article. We've talked about packaging before on the podcast but this is a really good quick write up of how to package your code and get it ready, and put it up on PyPI with just a little bit of history, not too much of the background. Just, how do you do it today, this is how you do it today. It’s opinionated because he takes basically what he does for the attrs (or ‘adders’) project and talks about doing that. So, that's pretty much what it is, it's about distribution.

00:00 KENNEDY: Yeah, that's cool. I love the subtitle, ‘A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.’ It’s beautiful.

00:00 OKKEN: I know that for some people it might be a little bit frustrating that we as a community, we're not done. This is probably not the final solution for packaging. It's still being worked on. People are still coming up with ideas for how to maybe make this easier and it's pretty darn easy now.

00:00 KENNEDY: Yeah, it is not too bad. I put something up on PyPI before and I was like, ‘Really, that's it? That's actually pretty darn easy.’ So, basically, I think, the challenge here is actually creating the package, not getting it on PyPI. Once you've got the package, getting it on PyPI is actually a few CLI argument commands. You basically have to have an account set up, like a profile file that has your info in it. But other than that, you're kind of done. So, the more we can make packaging easy and obvious, the better.

00:00 OKKEN: Some of the differences between getting a package ready for sharing within just a local group at work or something, and getting it ready for PyPI, a lot of it is just getting all the metadata there that it's nice to have for distributions. One of the confusions as well, I think, is the word package because that really has two meanings. In Python, a package can be just a directory with an __init__.py file, but it also is a distribution, because PyPI is not the Python distribution index, it's the package index. So, there's a little bit of confusion there.
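
The first meaning of "package" can be shown in a few lines. This is a small sketch, assuming a made-up package name `demo_pkg` and a temporary directory: it creates a directory containing an `__init__.py` and imports it, with no PyPI or packaging metadata involved.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_import_package(root, name):
    """Create a minimal import package: a directory with an __init__.py."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("GREETING = 'hello from the package'\n")
    return pkg_dir


with tempfile.TemporaryDirectory() as root:
    make_import_package(root, "demo_pkg")  # "demo_pkg" is a made-up name
    sys.path.insert(0, root)  # make the parent directory importable
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")
        greeting = demo_pkg.GREETING
        is_package = hasattr(demo_pkg, "__path__")  # packages have __path__
    finally:
        sys.path.remove(root)

print(greeting, is_package)
```

The second meaning, the distribution package, is the archive (an sdist or a wheel) built from a project together with its metadata and uploaded to PyPI. One distribution can contain one or more import packages like the one above.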

00:00 KENNEDY: That’s for sure. Luckily consuming them is all nice and easy.

00:00 The next thing that I want to cover is basically a set of example algorithms. Especially if you're looking for a new job or you're going to do an interview, but also if you're coming from another language, I think it's helpful to study algorithms in simple forms. So, imagine you're super good at Java, and you know how to do, say like a depth-first tree traversal in Java. How do I do this in Python, right? Is it simpler? Is it harder?

00:00 So, there's this GitHub repository that's a set of, “Minimal Examples of Data Structures and Algorithms in Python” and there are many of them here. The GitHub repo is just called ‘algorithms’ for a name, but it's all Python. You look at them and it's like, ‘Here's how you do the greatest common divisor computation in Python and these are the 6 lines of Python you write,’ ‘Here's how you reverse a linked list,’ ‘Here’s how you would do a binary search’ and things like that. So, regardless of whether you are looking for a new job or trying to compare an implementation in another language to Python, to Pythonic style, there's a lot of cool stuff going on in this.
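
To give a sense of how compact these algorithms are in Python, here is a short sketch of the three mentioned above. These are independent illustrations, not the code from the repository; the `Node` class and all function names are assumptions made for the example.

```python
def gcd(a, b):
    """Greatest common divisor via Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)


def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1


class Node:
    """Singly linked list node (illustrative helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_list(head):
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    while head is not None:
        nxt = head.next   # remember the rest of the list
        head.next = prev  # point the current node backwards
        prev = head
        head = nxt
    return prev


def to_values(head):
    """Collect node values into a plain list, for easy inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


if __name__ == "__main__":
    print(gcd(48, 18))
    print(binary_search([1, 3, 5, 7, 9], 7))
    print(to_values(reverse_list(Node(1, Node(2, Node(3))))))
```

For everyday work the standard library already covers two of these (`math.gcd` and the `bisect` module), so writing them by hand is mainly useful for study or interview practice.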

00:00 OKKEN: This is actually pretty cool. When I saw this at first, I sort of dismissed it as just interview material, but there's some decent things in here, like rotating an image, doing subsets, that I wouldn't definitely know how to do coming from a different language, like C++. But yeah, this is good. I like it.

00:00 KENNEDY: It's pretty cool, right? Yeah, to me, I think, you could try to solve this yourself and then compare your solution against what's here. I feel like if I did that, I'd have a similar experience to what I had with py.CheckiO, their Python stuff. That's that game, that Python game, where you conquer islands by writing Python code, which is interesting. But then you can view other people's solutions to the steps in the games. I realized that I have a particular style that is different than other people’s style. In some ways theirs is better, in some ways mine is better, but I think you would also get the same experience here for algorithms.

00:00 OKKEN: Yeah, definitely. Also, sometimes when you just need to be able to do something for work, you don't want to come up with your own solution. I just want, ‘How do I do this in Python?’

00:00 KENNEDY: Exactly. ‘Just show me.’ Yeah, so that's cool and you know it's an Open Source project, so if you actually want to contribute back, you can look at it and go, this is good but actually you could write a more Pythonic implementation of a particular algorithm. You could contribute back to that, right?

00:00 OKKEN: Yeah, but what if you don't have time? (Laughs) This is one of those great transitions, folks.

00:00 KENNEDY: There's a lot of ways you can still contribute to Open Source if you don't have time.

00:00 OKKEN: I've talked with a lot of people about Open Source contributions. There's times in your life where you've got more time to devote to something like Open Source, and then things happen. Like a new job or a change in your job or maybe a baby or something happens where you don't have as much time. There's ways to stay involved.

00:00 There's a nice article called, “Eight Ways to Contribute to Open Source When You Have No Time”. I think people forget that when they're used to contributing code, there's other ways to contribute to make a project successful. He lists a handful of them, like bug triaging: going through the defect reports or bug reports and trying to figure them out, adding detail or asking for more detail or cleaning those up. There's a lot of things you can do if you've got a few minutes.

00:00 KENNEDY: I think that's great because one of the things that, for me, is a big red flag for Open Source projects is, if I go there and there's a ton of unanswered bugs.

00:00 OKKEN: Yeah.

00:00 KENNEDY: Not like there's a conversation, they haven’t been closed necessarily, but they're not even responded to. And even worse is pull requests. People have taken the time to spend an afternoon and write some new feature, and the people can’t even be bothered to say, ‘No, this is not good’ or ‘It's good.’ That to me seems like a real red flag on these. So, this is a way to keep these projects healthy, I think. Just jumping in and helping out with that kind of stuff.

00:00 OKKEN: Yeah. Along those same lines are mailing list supporters. If there's a mailing list around the project, be one of the people that answers some of the newbie questions. That's a huge help to people running the project. Documentation patches; I don't know an Open Source project that doesn't have documentation holes and things that could be cleaned up with their documentation.

00:00 KENNEDY: Sure. Well, there is a big tension in taking new things. For example, there might be a pull request that says, ‘I want to change the way this works’ and it might be super simple to change one thing about it, but it might have so many knock-on effects into little areas that are problematic. So, for example, you might want to change the way you start some new project, but even if the steps that happen as you run some little scaffolding thing are self-describing, if that changes, then you've got to go change all the documentation. You’ve got to go change all the samples. All that stuff is like friction that prevents people from accepting pull requests. So, if you could help reduce that friction, that would be good.

00:00 OKKEN: I didn’t even think about that. You could help the person having a pull request, you could work on their branch as well, and say, ‘Hey, we need to add documentation changes to this before it gets pulled in.’

00:00 KENNEDY: Sure.

00:00 OKKEN: Then my favorite actually, these are all great, but there's a bullet here for marketing. Talking about your project on community or social media or blogging or podcasting about your favorite Open Source project.

00:00 KENNEDY: Yeah, that’s cool.

00:00 OKKEN: That's near and dear to my heart because I've been doing that with pytest on Test & Code and on the blog, trying to promote what I think is the best testing platform on the planet. But it wasn't really viewed as that before I got started, so I don't know. I doubt I'm the only person to take credit for that, but I think I helped a little bit.

00:00 KENNEDY: Well, and you've taken it to a very extreme level by writing a whole book.

00:00 OKKEN: (Laughs) Yeah. That’s not even listed in here. You can write a book about your project.

00:00 KENNEDY: That's actually a good point. You can spread the word and education by writing blog posts, but you could also do video tutorials. You could do online courses about an Open Source project. You can write a book about it. Marketing is really actually super broad. It could be that the person who created a program is not really as good at or interested in doing that, or even maybe their time is better spent creating features and you could be spreading the word about it. There’s a lot of good ways there.

00:00 OKKEN: And then there's a second half of the article that talks about basically, ways to find more time in your life. If you really want to try to find time, there’s a couple of ways. Whether they're realistic or not, the one that amused me is if you are having trouble sleeping, why try sleeping? Just get up and work on your Open Source projects.

00:00 KENNEDY: (Laughs) That’s right. Use it as a sleep aid. One thing I think a lot of people can easily do is not watch television. If you are an average person, especially an average American, if you are looking to find more time in your life to do things like this, or work on your own projects or whatever, we spend a lot of time on TV. If you don't watch it, you find your evenings all of a sudden have some time for these kinds of things.

00:00 OKKEN: You know, I totally see that point but I also want to have some moderation there. You can cut cold turkey and have a ton of free time, yes, but when I tried to do this and realized that there was an hour a day or something that I was hanging out with my wife that if I didn't do.

00:00 KENNEDY: Yeah.

00:00 OKKEN: So, I would moderate that, and say also just pay attention to how much time you're spending. And if you want to watch a little TV at night, go for it but maybe put a limit on it. Say, you know, when one show's done I'm not going to try to find something else, I'm just going to turn it off and go do something Open Source.

00:00 KENNEDY: Absolutely. So, speaking of Open Source. The last thing I want to cover is a real Open Source success story. We talked about NumPy at the beginning. NumPy is really one of the super foundational building blocks for all the scientific data science side of Python. As we’ve seen and covered in a couple of ways, some of the massive growth, a good portion of the last three or four years of massive growth in Python has to do with data science. So, NumPy is really a core pillar of that whole area, right?

00:00 OKKEN: Yes.

00:00 KENNEDY: So, there's really good news for NumPy. They have just received a $645,000 grant for the next two years to improve NumPy.

00:00 OKKEN: That's very exciting.

00:00 KENNEDY: That is really great. PyPy recently received the $200,000 Mozilla grant and now NumPy has almost three quarters of a million dollars to make it better. This grant comes from the Moore Foundation and is going through UC Berkeley’s data science program. So, Dr. Nathaniel Smith is sort of shepherding this. Of course, NumPy was started by Travis Oliphant, of Continuum, back in 2006 and it is great to see it growing. Just another Open Source success project.

00:00 OKKEN: Definitely. That’s neat.

00:00 KENNEDY: Alright. Very good news. I don't have a whole lot more to say, other than I just want to call it out that here's another great funding coming into Python and Open Source.

00:00 Any more news for you on the book?

00:00 OKKEN: I’m very excited. I've got a little bit of a break because I've got all of the book turned in. It's at the point where it's gone out to a handful of, actually quite a few, technical reviewers who go through it and make sure I didn't make any horrible mistakes or leave out something very crucial. I've got a great team of people set up to do that. Luckily, a lot of the core contributors to Pytest have agreed to help out with that, which is amazing. I’m very humbled by that.

00:00 KENNEDY: That’s awesome.

00:00 OKKEN: It's out of my hands for the most part. I'm on the line for making changes, if anybody comes up with something. These are all pretty picky people, so I probably will have a lot of changes. But then it's off to being ready to probably ship a physical copy in September or October.

00:00 KENNEDY: That will be cool. You can actually put it on your bookshelf, then you’ll have officially done it. Alright, well, congratulations.

00:00 Not a lot of news on my end to report. I'm just hanging out here in Ireland for a short work trip.

00:00 OKKEN: That's just awesome, man. I wish I was there with you.

00:00 KENNEDY: Yeah, it’s definitely been fun.

00:00 Well, thanks, Brian, as always, for finding all these cool things to share with everyone and everyone thank you for listening.

00:00 OKKEN: Thank you.

00:00 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes and get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We’re always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
