#32: 8 ways to contribute to open source when you have no time

Published Sat, Jul 1, 2017, recorded Thu, Jun 29, 2017

Brian #1: Introducing Dash

UI library for analytical web applications

Michael #2: Keeping Python competitive

Article on LWN, interview with Victor Stinner
He sees a need to improve Python performance in order to keep it competitive with other languages.
Not as easy to optimize as other languages. For one thing, the C API blocks progress in this area.
Python 3.7 is as fast as Python 2.7 on most benchmarks, but 2.7 was released in 2010. Users are now comparing Python performance to that of Rust or Go, which had only been recently announced in 2010.
In his opinion, the Python core developers need to find a way to speed Python up by a factor of two in order for it to continue to be successful.
JITs may be part of the answer, notably Pyjion by Dino Viehland and Brett Cannon
An attendee suggested Cython, which does AoT compilation, but its types are not Pythonic. He suggested that it might be possible to use the new type hints and Cython to create something more Pythonic.

Brian #3: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI. - Hynek Schlawack

Michael #4: Minimal examples of data structures and algorithms in Python

Simple algorithmic examples in Python, including
- linked lists
- reversing linked lists
- GCD
- Queues
- Binary search
- depth first search
- many, many more

Brian #5: 8 ways to contribute to open source when you have no time

Michael #6: NumPy receives first ever funding, thanks to Moore Foundation

For the first time ever, NumPy—a core project for the Python scientific computing stack—has received grant funding.
The proposal, “Improving NumPy for Better Data Science” will receive $645,020 from the Moore Foundation over 2 years, with the funding going to UC Berkeley Institute for Data Science.
The principal investigator is Dr. Nathaniel Smith.
The NumPy project was started in 2006 by Travis Oliphant.

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:06 This is episode 32, recorded on June 29th, 2017.

00:12 I'm Michael Kennedy.

00:14 And I'm Brian Okken.

00:15 And we've got a bunch of great stuff lined up for you.

00:17 But first, I just want to say apologies for the slightly off audio on my end.

00:21 I'm not dialing in from the Python Bytes studio in Portland, Oregon.

00:26 I'm actually on the road.

00:27 So Brian and I are doing a little bit different this week.

00:29 Yeah, it's ungodly early at 6 a.m. here.

00:33 I don't know what your problem is.

00:34 It's 2 in the afternoon over here in Ireland.

00:36 What's the love, dude?

00:38 The magic of Skype.

00:39 The magic of Skype.

00:41 We live in the future.

00:43 We just don't really fully appreciate it.

00:44 All right, let's talk about web apps.

00:46 This time, you are the one bringing up a web app.

00:49 Yeah.

00:49 So this is pretty exciting.

00:51 There's a Medium article called Introducing Dash.

00:55 Dash is a reactive web app open source project from Plotly.

01:02 And it looks really exciting.

01:05 The graphics and the plots that you can do on this are kind of amazing.

01:10 And it looks like an interactive, real-time web page with interactive graphs.

01:18 And you hook up input and output and data coming in and out.

01:22 And it's really kind of hard to describe, but people should check it out because it's amazing.

01:27 Yeah, it looks really, really cool.

01:29 And a lot of it is done in Python, right?

01:33 Yeah, so there's Python and Pandas and Flask and React and JSON and all sorts of stuff like that involved to make this stuff work.

01:42 But it ends up being some fairly impressive demos with just a handful of lines of code.

01:47 Yeah, that's super cool.

01:48 So basically, if you're trying to do visualizations with some of the data science tooling,

01:55 you can just make that available on the web, not as pictures, but in a super interactive format, right?

02:01 Which is great.

02:02 Yeah, and they say it's good for data analysis, data exploration, visualization, modeling.

02:09 And they also include instrument control and reporting in what they think is a good application.

02:15 I want to try this for instrument control and visualization myself.

02:20 Oh, yeah, that sounds, that looks really, really cool.

02:22 I kind of feel like I wish I had something to show so I could play with it,

02:26 but I just don't have that much to graph these days.

02:29 I used to do a lot with, like, science, but not in the last 10 years.

02:32 Maybe we could do, like, I don't know, plotting how much traffic our website gets or something.

02:39 Yeah, actually, that would be, that would actually be kind of fun, like, you know,

02:42 bandwidth by country or, you know, downloads over time or who knows?

02:46 We could actually play with that.

02:47 That might be pretty cool.

02:48 And then they include a link in this, but there's a user guide that has a gallery.

02:53 And it looks like it's both, they have pricing up.

02:57 So I think it's both something you can use as a service or yourself with the tool.

03:03 Yeah, cool.

03:04 Yeah, it looks very, very nice.

03:05 Definitely, it will give you that pro touch if you're trying to put graphs on the internet.

03:09 Yeah.

03:09 And you're using Python.

03:10 Especially if you're trying to stay competitive.

03:12 Yeah.

03:13 You know what?

03:14 There was a Python language summit back in the end of May.

03:18 So almost exactly one month ago at the time of this recording.

03:21 And one of the topics that came up was how do we keep Python competitive?

03:28 And this has two angles, right?

03:30 There's basically one angle is how do we keep Python competitive?

03:33 So you don't hear people going, I'm going to rewrite everything in Go or something silly like that.

03:38 Or which seems to be like a meme or something that's happening quite often.

03:44 But also, how do we get people to move from legacy Python to modern Python?

03:49 And there have been a bunch of interesting little features that have been added to Python.

03:55 The asyncio stuff, we've talked a lot about.

03:57 You know, little language touches like cleaner ways to generate dictionaries from sets of dictionaries.

04:02 You know, union sort of thing.

04:04 That kind of stuff.
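
The dictionary "union" mentioned here is presumably the unpacking syntax that arrived in Python 3.5 (PEP 448). A minimal sketch, assuming two made-up settings dictionaries (the names and values are illustrative, not from the episode):

```python
# Hypothetical settings dictionaries, used only for illustration.
defaults = {"host": "localhost", "port": 8000, "debug": False}
overrides = {"port": 9000, "debug": True}

# Python 3.5+ (PEP 448): unpack several dictionaries into a new one.
# When a key appears more than once, the later dictionary wins.
merged = {**defaults, **overrides}

# The older, more verbose equivalent: copy, then update in place.
legacy = defaults.copy()
legacy.update(overrides)

print(merged)
```

Neither form modifies `defaults`, which is usually what you want when layering configuration.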

04:05 But a couple years ago, they really started hitting the drumbeat of, you know what?

04:09 The thing that actually matters the most to people is just flat out performance.

04:13 If we could make Python 3 faster than Python 2, if we could make Python 3 use less memory than Python 2,

04:20 that is going to be a solid reason for these big companies with big code bases to move to Python 3 and really change that equation.

04:28 And so this was sort of a conversation about how do we keep that going at the Language Summit, from what I understand.

04:35 It's not entirely clear how that all goes together.

04:38 I think this was mostly based on a presentation by Victor Stinner.

04:42 He's done a ton of stuff for performance in the last couple versions of Python.

04:47 I think this style of approaching the problem of like, how do we get adoption of Python 3 over Python 2?

04:53 And the decision to say, well, let's focus on performance.

04:56 I think that's actually working.

04:57 Like we saw this to some degree with the Instagram presentation we covered last time, right?

05:02 Yeah.

05:02 So those guys got, I think, 40% less memory usage on their async tier.

05:07 And they got 12% less CPU usage on their web tier.

05:12 And when you talk about companies like Instagram, that's a lot.

05:16 That's a lot of servers.

05:17 Right.

05:19 So that's really nice.

05:20 Yeah.

05:20 Well, and then also just some of the feedback we've gotten about people switching some applications to asynchronous within Python and AIO, having like 10 times speed up or 100 times speed up sometimes.

05:36 Yeah.

05:36 That's a good point.

05:37 That's a really good point.

05:38 It's not about the CPU.

05:39 It's just about like leveraging the asyncio bit, which is so much easier.
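
To make the point about waiting versus CPU concrete, here is a rough sketch using only the standard library. `fake_request` is a hypothetical stand-in for a network call, and `asyncio.run` needs Python 3.7 or later; real speedups depend entirely on how I/O-bound the workload is.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a network call; it hands control
    # back to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return f"{name} done"


async def fetch_all() -> list:
    # gather() runs the three waits concurrently and returns the
    # results in the order the awaitables were passed in.
    return await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(fetch_all())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to the longest single delay rather than the sum of all three.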

05:43 So this is kind of a summary of that conversation.

05:47 Like, I don't think the Language Summit is recorded.

05:49 I could be wrong.

05:50 But this is a write-up of that presentation.

05:52 So it's kind of nice.

05:53 It says, basically, we really need to keep Python performant to be competitive with other languages.

05:59 But it's not as easy to optimize as, say, like optimizing C# or Java or C because of the boundary that the C API brings.

06:11 Basically, there are a lot of ways of working that you're forced to follow in Python to keep the C API working.

06:19 And the C API is actually a really important part of the Python performance story, right?

06:23 Yes.

06:23 Yeah.

06:24 Yeah.

06:24 So like if you're going to use NumPy, that's super fast.

06:27 But NumPy basically is just, you know, a C, mostly written in C.

06:31 So you can't break that because you might make the Python code go faster, but you're going to lose the ability to do the C stuff.

06:38 So that's really pretty interesting.

06:40 And they say it's great to compare Python 3 to Python 2 and say, oh, look, it's much faster.

06:47 But by most benchmarks.

06:48 But what you really need to do is compare it against modern languages, you know, not languages from the year 2000.

06:54 All right.

06:55 So let's try to work on this.

06:57 There was some talk about the JIT implementations, right?

07:01 We've got PyPy, which is like five times faster, but is not very compatible, mostly because of the C API, but also some other things.

07:10 I think there's Pyjion, done by Dino Viehland and Brett Cannon at Microsoft.

07:16 And that's actually a really interesting thing to bring JIT compilation to proper standard CPython, not yet another fork of it.

07:24 So that's pretty interesting.

07:25 And the final thing that someone proposed there was like, is there a way to use the type hints and typed annotations that are appearing in Python 3 to make a slight variation of Cython, which compiles to C, that lets you write code that's closer to regular Python.

07:41 And leverage those type hints because it actually would, you know, basically in Cython, you have to say what the types are.

07:47 But you kind of would do that anyway if you have the type hints in there.

07:50 So there's a lot of interesting stuff just brewing, you know, for the future there.
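
As a small illustration of why type hints are attractive for this idea: the annotations are ordinary metadata that any tool can read back at runtime. The sketch below only inspects them. It does not compile anything, and `mean` is a hypothetical example function, not something from the talk.

```python
from typing import List, get_type_hints


def mean(values: List[float]) -> float:
    """Plain Python; the annotations do not change runtime behavior."""
    return sum(values) / len(values)


# A tool (for example, a hypothetical hint-aware compiler) could read
# the declared types back from the function object like this.
hints = get_type_hints(mean)
print(hints)
```

Whether a compiler could safely rely on such hints is a separate question, since Python itself does not enforce them.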

07:53 That's kind of a really interesting idea.

07:55 I like that.

07:56 Like, if you've got a whole, like a huge data set and it's not going to change, it's going to be a fixed data type and you're declaring it with type hints anyway.

08:07 Having the language be able to take advantage of that and just behind the scenes, just Cythonize it or something.

08:15 That would be slick.

08:16 I would love that.

08:18 It would actually be pretty darn cool, wouldn't it?

08:21 Yeah.

08:21 So, yeah, we'll see.

08:23 I mean, to me, I almost see, like, could you in C or C++, you can have, like, inline assembler, right?

08:31 You say this little bit, these five lines, this is assembly code, but, like, we need this.

08:35 Or you can, like, inline methods.

08:36 It would be cool if you could say, here within my regular Python code, this one function where this is the thing we do all the time, this one or two functions, this is, like, you know, you do an at Cython on it and it just goes.

08:48 That'd be cool.

08:49 Yeah, well.

08:49 This is the future I want to see.

08:51 Definitely.

08:51 All right.

08:52 So that'd be a quick and dirty solution to making it faster.

08:55 If I could just put an at Cython on things.

08:57 Yeah.

08:58 And, man, I was just, I have a hard time not laughing when we do these transitions.

09:03 They're so bad.

09:05 We should just, like, take one episode and just see, like, what's the worst possible thing we can do.

09:09 The next article is PyPI Quick and Dirty.

09:12 It's by Hynek.

09:14 And it's, I met him at PyCon.

09:17 I shook his hand and told him I loved what he's doing.

09:20 And he said, oh, you're the guy that always mispronounces my name on podcasts.

09:24 Anyway.

09:25 Sorry, Hynek.

09:27 This is an awesome article.

09:28 We've talked about packaging before on the podcast, but this is a really good quick write-up

09:34 of how to package your code and get it ready and put it up on PyPI.

09:40 With just a little bit of history, not too much of the background.

09:44 Just how do you do it today?

09:45 This is how you do it today.

09:46 It's opinionated because he takes basically what he does for the attrs project

09:54 and talks about doing that.

09:57 So that's pretty much what it is.

09:58 It's about distribution.

09:59 Yeah, that's cool.

10:00 I love the subtitle, A Completely Incomplete Guide to Packaging a Python Module and Sharing

10:05 It with the World on PyPI.

10:06 It's beautiful.

10:07 And I know that for some people, it might be a little bit frustrating that we as a community,

10:12 we're not done.

10:13 This is probably not the final solution for packaging.

10:16 It's still being worked on.

10:18 People are still coming up with ideas for how to maybe make this easier.

10:22 And it's pretty darn easy now.

10:25 Yeah, it is not too bad.

10:27 I've put something up on PyPI before and I was like, really, that's it?

10:30 That's actually pretty darn easy.

10:33 So basically, I think the challenge here is actually creating the package, not getting it

10:40 on PyPI.

10:41 Like once you've got the package, getting it on PyPI is actually like a few CLI argument

10:45 commands.

10:46 And you basically have to have an account and set up like a profile file that has your info

10:50 in it.

10:51 But other than that, you're kind of done.

10:53 So yeah, the more we can make packaging easy and obvious, the better.

10:57 And then some of the differences between getting a package ready for sharing within just a local

11:03 group at work or something and getting it ready for PyPI, a lot of it is just getting all the

11:10 metadata there that it's nice to have for distributions.

11:14 One of the confusions as well, I think, is the word package because that really has two

11:21 meanings.

11:22 In Python, a package can be just a directory with an __init__.py file.

11:27 But it also is a distribution because the PyPI is not the Python distribution index.

11:34 It's the package index.

11:36 So there's a little bit of confusion there.
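
The first meaning of "package" (a directory with an `__init__.py`) can be demonstrated with the standard library alone. This is a sketch under assumptions: `demo_pkg` and `greet` are hypothetical names, and the package is written to a temporary directory purely for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: a directory plus __init__.py.
# "demo_pkg" and "greet" are hypothetical names for illustration.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "def greet(name):\n    return 'Hello, ' + name\n"
)

# Make the parent directory importable, then import the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.greet("PyPI"))
```

The second meaning, the distribution you upload to PyPI, is that same directory bundled together with metadata such as name, version, and author.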

11:38 Yeah, that's for sure.

11:40 That's for sure.

11:40 Luckily, consuming them is all nice and easy.

11:43 The next thing that I want to cover is basically a set of example algorithms.

11:50 Especially if you're looking for a new job or you're going to do an interview.

11:54 But also if you're coming from another language, I think it's helpful to study algorithms in

11:58 like simple forms.

12:00 So imagine like you're super good at Java and you know how to do, say, like a depth first

12:05 tree traversal in Java.

12:08 How do I do this in Python?

12:10 Right.

12:10 Is it simpler?

12:11 Is it harder?

12:12 Whatever.

12:13 Right.
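
For the depth-first traversal question, one possible Python answer looks like the sketch below. It is not taken from the repository being discussed; the `Node` class and the sample tree are made up for illustration.

```python
from typing import Iterator, List, Optional


class Node:
    """A bare-bones binary tree node (hypothetical, for illustration)."""

    def __init__(self, value: int,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.value = value
        self.left = left
        self.right = right


def depth_first(node: Optional[Node]) -> Iterator[int]:
    # Pre-order: visit the node, then its left and right subtrees.
    if node is None:
        return
    yield node.value
    yield from depth_first(node.left)
    yield from depth_first(node.right)


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
order: List[int] = list(depth_first(tree))
print(order)
```

Writing the traversal as a generator with `yield from` is one of the places where the Python version tends to come out shorter than a Java equivalent.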

12:13 So there's this GitHub repository that's a minimal, a set of minimal examples of data structures

12:19 and algorithms in Python.

12:21 And there are many of them here.

12:23 The GitHub repo is just algorithms.

12:25 So for her name.

12:27 But it's all Python.

12:28 And you look at them and it's like, here's how you create the, how you would do a greatest

12:32 common divisor computation in Python.

12:35 And these are like the six lines of Python you write.

12:37 Here's how you reverse a linked list.

12:39 Here's how you would do a binary search and things like that.
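
To give a flavor of the three examples just mentioned, here are minimal versions of a greatest common divisor (GCD), a linked-list reversal, and a binary search. These are independent sketches written for this page, not the repository's own code, and `ListNode` and `to_list` are hypothetical helpers.

```python
from typing import List, Optional


def gcd(a: int, b: int) -> int:
    # Euclid's algorithm: replace (a, b) with (b, a % b) until b is 0.
    while b:
        a, b = b, a % b
    return a


class ListNode:
    def __init__(self, value: int,
                 next: Optional["ListNode"] = None) -> None:
        self.value = value
        self.next = next


def reverse_list(head: Optional[ListNode]) -> Optional[ListNode]:
    # Walk the list once, pointing each node back at its predecessor.
    prev = None
    node = head
    while node is not None:
        following = node.next
        node.next = prev
        prev = node
        node = following
    return prev


def to_list(head: Optional[ListNode]) -> List[int]:
    # Helper for inspecting a linked list as a regular Python list.
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


def binary_search(items: List[int], target: int) -> int:
    # items must be sorted; returns the index of target, or -1.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


print(gcd(48, 18))
print(to_list(reverse_list(ListNode(1, ListNode(2, ListNode(3))))))
print(binary_search([1, 3, 5, 7, 9], 7))
```

In everyday code the standard library already covers two of these (`math.gcd` and the `bisect` module), so hand-written versions are mostly useful for study.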

12:43 And so regardless, if you're looking for a new job, if you're trying to compare one implementation

12:48 of another language to Python, to the Pythonic style, like there's a lot of cool stuff going

12:53 on in this.

12:53 This is actually pretty cool.

12:54 When I saw this at first, I sort of dismissed it as, you know, just interview material.

13:00 But there's some decent things in here like rotating an image, doing subsets that I would

13:07 definitely know how to do coming from a different language there, like in C++.

13:12 But yeah, this is good.

13:14 I like it.
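
The two tasks mentioned here, rotating an image and generating subsets, are good examples of where Python idioms differ from a C++ approach. A possible sketch, assuming the "image" is simply a list of rows (again, not the repository's code):

```python
from itertools import combinations
from typing import List, Tuple


def rotate_clockwise(image: List[List[int]]) -> List[List[int]]:
    # Reverse the row order, then transpose: a 90-degree clockwise turn.
    return [list(row) for row in zip(*image[::-1])]


def subsets(items: List[int]) -> List[Tuple[int, ...]]:
    # Every subset, from the empty one up to the full set.
    return [combo
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


print(rotate_clockwise([[1, 2], [3, 4]]))
print(subsets([1, 2, 3]))
```

Both functions lean on built-ins (`zip` and `itertools.combinations`) instead of explicit index arithmetic.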

13:14 It's pretty cool, right?

13:15 Yeah.

13:16 To me, I think this is, you could like try to solve this yourself and then compare that

13:20 against, you compare your solution against what's here.

13:23 I feel like if I did that, I'd have similar experience to what I did with CheckiO, their

13:29 Python stuff.

13:30 So that's kind of that game, that Python game.

13:32 And you like conquer islands by writing Python code, which is interesting.

13:36 But then you can view other people's solutions to the steps in the games.

13:40 And I realized like I have a particular style that's different than other people's style.

13:44 And some ways theirs is better, some ways mine's better.

13:46 But I think you would also get the same experience here for algorithms.

13:49 Yeah, definitely.

13:49 And also sometimes when you just need to be able to do something for a work, you don't

13:55 want to come up with your own solution.

13:57 I just want, how do I do this in Python?

14:00 Exactly.

14:00 Just show me.

14:01 That's great.

14:02 Yeah.

14:03 So that's cool.

14:03 And you know, it's an open source project.

14:05 So if you actually want to contribute back, you look at it, you're like, oh, this is good.

14:09 But actually, you could write a more Pythonic implementation of a particular algorithm.

14:12 You could contribute back to that, right?

14:14 Yeah.

14:15 But what if you don't have time?

14:18 This is one of those great transitions, folks.

14:20 There's a lot of ways you could still contribute to open source if you don't have time.

14:25 And I think there's a lot of people, especially, I've talked with a lot of people about open

14:29 source contributions.

14:30 And there's times in your life where you've got more time to devote to something and then

14:36 to open source.

14:37 And then things happen like a new job or a change in your job or maybe a baby or something

14:43 happens where you don't have as much time.

14:45 And there's ways to stay involved.

14:47 There's a nice article called Eight Ways to Contribute to Open Source When You Have

14:52 No Time.

14:53 I think people forget that there is, when they're used to contributing code, there's other ways

14:58 to contribute to make a project successful.

15:00 And he lists a handful of them like bug triaging, like going through the defect reports or bug

15:09 reports and trying to figure out adding detail or asking for more detail.

15:15 cleaning those up.

15:16 That's a lot of things you can do with just if you've got a few minutes.

15:19 I think that's great because one of the things that to me is a big red flag for open source

15:24 projects is if I go there and there's a ton of unanswered bugs.

15:31 Yeah.

15:31 Not like there's a conversation that haven't been closed necessarily, but they're like not

15:35 even responded to.

15:36 And even worse is pull requests.

15:38 Like people have taken the time to like spend an afternoon and write some new feature and the

15:43 people can't even be bothered to say, no, this is not good or it's good.

15:46 Like it's, that's to me seems like a real red flag on these.

15:49 So like this is a way to keep these projects healthy.

15:52 I think you just jumping in and helping out with that kind of stuff.

15:56 Yeah.

15:56 And then there's along those same lines is mailing list support.

16:00 If there's a mailing list around the project, be one of the people that answers some of the

16:04 newbie questions.

16:05 That's huge help to people running the project.

16:09 Documentation patches.

16:10 Every, I don't know of an open source project that doesn't have documentation holes and things

16:16 that could be cleaned up with their documentation.

16:18 Sure.

16:18 Well, and there's a big tension in taking new things.

16:22 Like, so for example, if you, there might be a pull request that says, I want to change

16:26 the way this works.

16:27 And it might be like super simple to change one thing about it, but it might have like so

16:34 many knock on effects into little areas, but that are like problematic.

16:38 So for example, you might want to change the way you start some new project.

16:42 But if even like the steps are self-describing that happen as you like run some little like

16:48 scaffolding thing, if that changes, then you've got to go change all the documentation.

16:51 You've got to go change all the samples.

16:53 You've got to just like, all that stuff is like friction to prevent people from accepting

16:59 pull requests.

17:00 And so if you could help reduce that friction, that'd be good.

17:02 I didn't even think about that.

17:03 You could help the person doing the, having a pull request.

17:07 You could work on their branch as well and say, Hey, this, we need to add documentation

17:12 changes to this before it gets pulled in.

17:14 Yeah, for sure.

17:14 And then my favorite, actually, these are all great, but there's a bullet here for marketing

17:20 talking about your project on community or social media or blogging or podcasting about, about

17:27 your favorite open source project.

17:29 Yeah, that's cool.

17:30 That's near and dear to my heart because I, I've been doing that with, with pytest on

17:34 testing code and on the blog, trying to promote what I think is the best testing platform on

17:42 the planet, but it wasn't really viewed as that before I got started.

17:46 So, I don't know if I, I doubt I'm the only person to take credit for that, but I think

17:51 I helped a little bit.

17:52 So, well, and you've taken it to a very extreme level by writing a whole book.

17:55 Yeah.

17:57 Oh yeah.

17:58 It doesn't, that's not even listed in here is you could write a book about your project.

18:02 Yeah.

18:03 That's actually a good point.

18:04 Like you can spread the word and education about it by writing blog posts, but you could

18:10 also do video tutorials.

18:12 You could do online courses about an open source project.

18:15 You could write a book about it.

18:16 There's like, like marketing is like really actually super broad and it could be that the

18:20 person who's great at programming is not really as good or interested in doing that, or even

18:26 maybe just their time is better spent like creating features and you could be spreading the word

18:30 about it.

18:31 There's a lot of good ways there.

18:32 And then there's a second half of the article that talks about basically ways to find more

18:37 time in your life.

18:37 If you really want to try to find time, here's a couple ways, which whether they're realistic

18:42 or not, the one that amused me is if you're having trouble sleeping, why try sleeping?

18:49 Just get up and work on your open source projects.

18:53 That's right.

18:54 Use it as a sleep aid.

18:56 You know, one of the things I think you can easily, a lot of people can easily do is not

19:01 watch television.

19:02 If you're an average person, especially average American, if you're looking to find more ways

19:08 to, more time in your life to do things like this or work on your own projects or whatever,

19:14 we spend a lot of time on TV.

19:16 And if you don't watch it, you find your evenings all of a sudden have some time for these kinds of things.

19:20 And you know, I totally see that point.

19:22 But I also want to have some moderation there.

19:25 You can cut cold turkey and have a ton of free time.

19:28 Yes.

19:29 But when I tried to do this and realized that was also like an hour a day or something that

19:35 I was hanging out with my wife, that if I didn't do that.

19:38 Yeah.

19:38 So I would moderate that and say, also, just pay attention to how much time you're spending.

19:43 And if you want to watch a little TV at night, go for it.

19:46 But maybe put a limit on it to say, you know, when one show's done, I'm not going to try to

19:50 find something else.

19:51 I'm just going to turn it off and go do something.

19:53 Open source.

19:54 Yeah.

19:55 Absolutely.

19:57 Sounds good.

19:58 All right.

19:58 So speaking of open source, the last thing I want to cover for us is a real open source

20:02 success story.

20:03 And we talked about NumPy at the beginning.

20:06 NumPy is really one of the super foundational building blocks for all the scientific data

20:11 science side of Python.

20:12 As we've seen and covered in a couple of ways, like some of the massive growth, a good portion

20:18 of the last three or four years of massive growth in Python has to do with data science.

20:23 So NumPy is like really a core pillar of that whole area, right?

20:28 Yes.

20:29 So there's really good news for NumPy.

20:33 They have just received a $645,000 grant for the next two years to improve NumPy.

20:40 That's very exciting.

20:42 That is really great.

20:43 We had PyPy recently received the $200,000 Mozilla grant.

20:47 And now we have NumPy getting almost three quarters of a million dollars to make it better.

20:51 So this grant comes from the Moore Foundation and is going through UC Berkeley's data science

20:58 program.

20:59 So Dr. Nathaniel Smith is like sort of shepherding this.

21:04 You know, of course, NumPy was started by Travis Oliphant, of Continuum, back in 2006.

21:08 And it's great to see it growing.

21:10 So just another open source success project.

21:12 Yeah, definitely.

21:13 That's neat.

21:14 All right.

21:14 Very good news.

21:15 I don't want to, you know, I don't have a whole lot more to say other than I just want to call

21:18 it out that, you know, here's another great funding coming into Python and open source.

21:23 Any more news for you on the book?

21:25 I'm very excited that it's, I've got a little bit of a break because I've got all of the

21:30 book turned in and it's at the point where it's gone out to a handful of actually quite

21:37 a few technical reviewers, make, go through it and make sure I didn't make any horrible

21:42 mistakes or leave out something very crucial.

21:44 And I've got a great team of people set up to do that.

21:48 Luckily, the, the, actually a lot of the core contributors to pytest have agreed to help

21:54 out with that, which is amazing.

21:55 Very humbled by that.

21:57 That's awesome.

21:57 And then, yeah, then it's out of my hands for the most part.

22:01 I'm, I'm on the line for making changes.

22:04 If anybody comes up with something, these are all pretty picky people.

22:08 So I probably will have a lot of changes, but then, then it's, it's off to being ready

22:13 to probably ship a physical copy September or October.

22:17 That'd be cool.

22:17 You can actually put it on your bookshelf and then you'll have officially done it.

22:22 Yeah.

22:22 So that's awesome.

22:24 All right.

22:24 Well, congratulations.

22:25 Not, not a lot of news on my end to report.

22:28 I'm just hanging out here in Ireland for a short work trip.

22:31 That's just awesome, man.

22:33 I wish I was there with you.

22:34 Yeah.

22:34 It's been fun.

22:35 Definitely been fun.

22:36 So, all right.

22:37 Well, thanks Brian, as always for finding all these cool things to share with everyone

22:41 and everyone.

22:42 Thank you for listening.

22:43 Thank you.

22:45 Thank you for listening to Python Bytes.

22:46 Follow the show on Twitter via @pythonbytes.

22:49 That's Python Bytes as in B-Y-T-E-S.

22:52 And get the full show notes at Pythonbytes.fm.

22:55 If you have a news item you want featured, just visit Pythonbytes.fm and send it our way.

23:00 We're always on the lookout for sharing something cool.

23:02 On behalf of myself and Brian Okken, this is Michael Kennedy.

23:06 Thank you for listening and sharing this podcast with your friends and colleagues.
