Transcript #41: Python Concurrency From the Ground Up and Caring for our Community
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 41, recorded August 30th, 2017.
00:09 I'm Michael Kennedy, and yes, I'm back from vacation.
00:13 Thank you, Brian Okken, for covering and all of our guest co-hosts.
00:17 And it's time to immediately start repaying Brian for his keeping things rolling while I was gone.
00:23 He's in Germany for some work business, and we have a new guest co-host.
00:30 Welcome, Miguel Grinberg.
00:31 Hello, and thank you for having me, Michael. Happy to be here.
00:35 Yeah, I'm happy to have you here. It's going to be fun to talk about the items you've chosen for the week and what I've got as well.
00:41 So let's just kick it off first by saying thank you to Rollbar, who's bringing you this episode.
00:47 So check them out at pythonbytes.fm/rollbar and get some good monitoring for your app.
00:52 We'll talk more about that later.
00:53 But let's jump to your first item, lolviz.
00:56 I would like to call it lolviz.
00:58 But actually, this is not anything to laugh about.
01:02 So this is a Python package that generates graphical representations of very commonly used Python data structures.
01:10 So far, they support five different structures.
01:15 And one of them is a list of lists, which gives the lol name to the package.
01:21 But they also support dictionaries, lists, linked lists, and binary trees.
01:25 And what they do is basically they use Graphviz, which you need to install on your machine to generate these graphical representations.
01:35 And one of the coolest things is that it integrates with Jupyter.
01:39 So if you're doing this from a notebook in Jupyter, then it'll render these graphical representations right in your prompt.
01:48 So it's super cool, especially, I imagine, for you and for me, people who used to teach Python.
01:55 I'm having tons of ideas to use this myself.
01:59 Yeah, so am I.
02:00 I'm like, this can definitely be used to help people who are new to these data structures and new to these ideas.
02:05 And honestly, new to the concept of pointers and reference types at all.
02:10 I think it's really a great way to learn it.
02:12 The linked list one is particularly interesting to help people who are not familiar with pointers to visualize how that works.
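A rough idea of what such a tool does for a linked list can be sketched with only the standard library: walk the nodes and emit Graphviz DOT text, one box per node and one arrow per pointer. This is not lolviz's code or API. `Node` and `linked_list_to_dot` are hypothetical names made up for illustration, and the sketch assumes the list has no cycles and that values contain no double quotes.

```python
class Node:
    """A hypothetical singly linked list node (not part of lolviz)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def linked_list_to_dot(head):
    """Return Graphviz DOT text with one box per node and one arrow per pointer."""
    lines = ["digraph linked_list {", "    rankdir=LR;", "    node [shape=record];"]
    index = 0
    node = head
    while node is not None:
        lines.append(f'    n{index} [label="{node.value}"];')
        if node.next is not None:
            # The arrow is the "pointer" from this node to the next one.
            lines.append(f"    n{index} -> n{index + 1};")
        node = node.next
        index += 1
    lines.append("}")
    return "\n".join(lines)


head = Node("a", Node("b", Node("c")))
dot_text = linked_list_to_dot(head)
print(dot_text)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, which is the same external dependency mentioned above.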
02:21 One thing I need to mention is that if you try to pip install this in Python 3 today, it's going to fail.
02:28 I submitted a bug and the author told me that a fix is coming pretty soon with additional features.
02:34 So I'm glad to hear that he's responsive and that he's working on making more improvements.
02:40 Yeah, this thing came out really recently.
02:41 I don't remember how old it is, but it's all quite new, like a couple of days.
02:46 And it's already got 200 stars on GitHub.
02:49 And it looks like it's going well.
02:50 Yeah, it's going pretty well.
02:52 Yeah, I see a bright future for it.
02:54 I do as well.
02:54 So if you're out there teaching Python and you're trying to help people understand pointers, data structures,
03:00 and even if you're not teaching it, I think looking at some of the different representations of these data structures that it draws is probably helpful for people
03:09 who just haven't really thought a lot about it as well.
03:11 So definitely worth checking out.
03:13 All right.
03:14 So moving from visualizing different data structures to transforming data structures,
03:19 the thing I want to cover first this week is Odo.
03:22 So this is a user or listener recommendation, as was LOLViz.
03:27 So Odo migrates data between all these different formats.
03:32 And it knows how to read, let's say, a pandas DataFrame and then convert that to a MongoDB table or collection and save it just like that.
03:41 And it's literally one line like, Odo, here's your data frame.
03:45 Go to that database.
03:46 Or here's your Postgres.
03:48 Go to CSV or reverse.
03:49 So if you find yourself pulling in data from one source and trying to convert it or save it more or less in the same shape over to another source,
03:58 then check out this thing called Odo.
04:00 I wonder if it takes a list of lists, an LOL, and writes a CSV file.
04:07 That would be something that I could find useful tons of times.
04:11 Yeah, absolutely.
04:12 That would be really, really cool.
04:14 And it does take lists and transform them.
04:16 It only does, right.
04:17 So I'm not entirely sure how flexible it is in that regard.
04:21 But I think there's also extensions.
04:22 So you can write extensions.
04:23 So just to give you the rundown, let's see.
04:25 It'll work with pandas DataFrames, with lists, with JSON files, CSV files, Postgres.
04:32 Yeah, so you could take a CSV file, load it into Postgres or Postgres into JSON.
04:37 Or it even works with converting into MongoDB, like I said, so like Pandas to Mongo or reverse.
04:42 So not a whole lot more to it than that, but it seems really handy if that's the problem you're trying to solve.
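For the list-of-lists-to-CSV question raised above, the standard library already covers the simple case. The sketch below uses only the `csv` module, not Odo. `list_of_lists_to_csv` and the sample rows are hypothetical, made up for illustration. In Odo itself the documented call shape is `odo(source, target)`, but whether a nested list maps cleanly onto a CSV target is not confirmed in the episode.

```python
import csv
import io


def list_of_lists_to_csv(rows, header=None):
    """Serialize a list of lists to CSV text, with an optional header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data; the second name contains a comma to show quoting.
rows = [["alice", 30], ["bob, jr.", 25]]
csv_text = list_of_lists_to_csv(rows, header=["name", "age"])
print(csv_text)
```

Writing to an in-memory buffer keeps the example free of side effects; passing an open file to `csv.writer` instead would produce the file on disk.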
04:48 Yeah, very nice.
04:49 Yeah, so tell us about concurrency.
04:51 You chose this item, and it's not exactly new, but it's certainly something we haven't covered and is really amazing.
04:57 And I agree with you that this is one of the really interesting things to cover in concurrency.
05:03 Right.
05:04 So this is a PyCon talk from a couple years ago from the one and only Dave Beazley.
05:09 I mean, who else, right?
05:12 And the interesting thing, as a speaker, I find it interesting, not only the content, but also the way he presented this talk.
05:20 This is a live coding session.
05:22 The entire thing, it's Dave's terminal.
05:26 There are no slides, and he's speaking while coding and starts from an empty file, actually codes everything in the talk.
05:33 I think he just fires up Emacs and says, let's do this, right?
05:36 Right, yeah.
05:37 Just two or three terminal sessions, one with Emacs and the other two with Bash, and it's done all there.
05:44 And he goes on to cover concurrency, the different ways you can do concurrency with threads, with processes, shows all the problems with both approaches, how the global interpreter lock messes with all this and complicates things.
05:59 And then in the second portion of the talk, he goes and builds an asynchronous framework, pretty much like asyncio, a small version of it, a minimal version, using Python generators without any other additional libraries, all in Python standard code.
06:16 So it's pretty amazing, and it's only 45 minutes.
06:20 The amount of knowledge that's in those 45 minutes, it's unbelievable.
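The core trick behind a generator-based framework can be sketched in a few lines: each task is a generator, every `yield` hands control back, and a loop resumes the tasks in turn. This is not the code from the talk, only a minimal sketch in the same spirit. `countdown` and `run` are hypothetical names, and real frameworks yield to wait on sockets rather than yielding unconditionally as here.

```python
from collections import deque


def countdown(name, n, log):
    """A task: record progress, then yield so other tasks can run."""
    while n > 0:
        log.append((name, n))
        yield  # hand control back to the scheduler
        n -= 1


def run(tasks):
    """Round-robin scheduler: resume each generator until all are finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # task finished, do not reschedule it
        ready.append(task)


log = []
run([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
```

The log shows the two tasks interleaving on a single thread, with no threads or processes involved.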
06:24 Yeah, and I really love the style.
06:25 Like, well done, David.
06:26 So certainly the style of, we're going to build this up from scratch.
06:30 I'm not going to just show you a bunch of slides and talk about it, but I'm going to just show you how it's built.
06:35 It really makes it feel accessible.
06:36 You're like, well, if he could literally do it from scratch in 45 minutes.
06:39 Like, I saw everything that went into it.
06:41 It was pretty understandable.
06:42 It really is well done.
06:45 Yeah, if people are, you know, thinking about Python concurrency or generators or asyncio and all these things.
06:50 It's actually even good for networking because he builds a little server.
06:55 He doesn't even use, you know, nothing like Flask or Django or anything.
06:58 He builds a little server using the, he calls it the socket framework, just using network sockets.
07:04 Yeah, that was like part of the demo.
07:06 It was just like part of it, right?
07:08 Right.
07:08 Yeah, it's super awesome.
07:10 Yeah.
07:10 Yeah, so certainly this is, if this sounds interesting to you, be sure to watch that video.
07:14 It's on YouTube.
07:15 We'll link to it in the show notes.
07:16 And it's, you'll learn a bunch, especially conceptually what's going on in all this asyncio stuff.
07:21 Super cool.
07:22 So before we move on to the next segment though, let me tell you about the sponsor for the show,
07:26 Rollbar.
07:27 So Rollbar lets you basically just type pip install rollbar, type a few commands either in your config file or in code in your web app, and it will continuously monitor your site, your web application for any sorts of errors.
07:43 And not just tell you something happened, but capture all the details.
07:46 What was the logged in user when they ran into an error?
07:50 So who is your customer who's having this problem?
07:52 What's the call stack?
07:54 What other errors have you experienced like this?
07:56 What are the local variables passed to the function when it failed?
07:59 Like you can probably fix the error without even debugging it or running your code.
08:03 You just look and go, oh, I see what's wrong.
08:05 So I use Rollbar a lot.
08:06 Love it.
08:07 If you want to check it out, check it out at pythonbytes.fm/Rollbar.
08:11 All right.
08:12 So next up, I want to talk about some optimizations.
08:16 This concurrency stuff that you brought up is certainly a sort of a form of optimization, but this is kind of the future trying to push CPython, the main default Python implementation, further.
08:29 And this is an article on Medium by a friend of the show, Anthony Shaw, covering work that Victor Stinner has done.
08:37 And Victor Stinner, have you been following what he's doing?
08:39 He's like killing it on performance.
08:40 He is.
08:41 Yes, absolutely.
08:42 I've been to his talk this past PyCon as well.
08:46 Yeah.
08:47 So there's just so many things that he's doing that are amazing.
08:50 And he did give some good talks at PyCon as well.
08:53 So this project is called Fat Python, the next chapter in Python optimization.
08:57 So like I said, article by Anthony Shaw.
08:59 And he basically highlights this Fat Python project that was started by Victor Stinner back in 2015.
09:06 And the idea was, let's try to make it possible to apply better static optimizers for Python.
09:14 So one of the big challenges that you have with optimizing Python is it's super dynamic.
09:20 You can't necessarily just look at the code and say, well, it has this structure.
09:24 So we're going to change it around because you could go and dynamically add methods, functions, variables, whatever.
09:31 Right?
09:31 Yeah.
09:32 So that makes it a big challenge.
09:35 So there's actually three PEPs, Python enhancement proposals, that chain together to try to make things a little bit better that Victor is working on as part of this project.
09:44 So one is PEP 511, which is a proposal to add a process to optimize ASTs.
09:51 So ASTs or abstract syntax trees are basically what Python pulls up before it starts to execute your code, right?
09:59 It can pull up and...
10:01 It's basically a, right, a tree representation of the code that you write.
10:05 Right.
10:05 In a form that's easier to be interpreted.
10:08 Right.
10:08 Like an object-oriented representation of what your code's going to do, not just the text.
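To make the tree idea concrete, the standard library `ast` module can parse a line of code and expose it as objects. The snippet is only an illustration; the source string and variable names are made up.

```python
import ast

# A made-up one-line program to inspect.
source = "total = price * 2"
tree = ast.parse(source)

# The module body holds one statement: an assignment whose value is a
# binary operation (price * 2).
assign = tree.body[0]
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(ast.dump(tree))
print(node_types)
```

The exact text printed by `ast.dump` varies between Python versions, but the shape is the same: an `Assign` node containing a `BinOp` with a `Mult` operator.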
10:12 The idea is it's possible, it could be possible to have an optimizer look at that AST and say, okay, this looks like pandas code.
10:22 And you're applying this, you're doing this particular anti-pattern that is slow.
10:26 So maybe we could change things around behind the scenes.
10:29 You don't even know it to optimize or to fix that, right?
10:33 Maybe people run linters that say, you're doing this thing that's not amazing.
10:36 What if we could have an optimizer that would just make it fast?
10:39 Right.
10:40 It would just make the change for you.
10:41 Exactly.
10:42 The proposal is to basically create some kind of hook for creating these optimizers.
10:47 And this might be built into CPython.
10:49 It might even be something you pip install, like pip install the NumPy optimizer or whatever for my runtime, which is really, really cool.
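An AST optimizer of this kind can be sketched with `ast.NodeTransformer` from the standard library. The example below rewrites `len('some literal')` into the precomputed number. It does not use the hook that PEP 511 proposes; the transformer is applied by hand, and `FoldLenOfConstant` is a hypothetical name. It assumes Python 3.8 or later, where string literals parse as `ast.Constant`.

```python
import ast


class FoldLenOfConstant(ast.NodeTransformer):
    """Replace len(<string literal>) with the precomputed integer."""

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "len"
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            folded = ast.Constant(value=len(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


source = "def greeting_length():\n    return len('hello')\n"
tree = ast.parse(source)
optimized = ast.fix_missing_locations(FoldLenOfConstant().visit(tree))

# Compile and run the rewritten tree instead of the original source.
namespace = {}
exec(compile(optimized, "<optimized>", "exec"), namespace)
result = namespace["greeting_length"]()
calls_left = [n for n in ast.walk(optimized) if isinstance(n, ast.Call)]
print(result, len(calls_left))
```

The rewrite is only safe while `len` still means the builtin, which is exactly the kind of assumption that has to be checked at runtime.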
10:56 So that also brings us to PEP 509.
10:59 Like I said, this makes it really hard to optimize because everything is mutable at runtime.
11:04 And there's these things called guards that verify like the last time it's sort of processed this bit of the structure that it hasn't changed.
11:13 PEP 509 is a proposal to add a private version to dictionaries so you can implement a different type of guard that is much faster.
11:21 Then we have PEP 510, which proposes a public API to CPython to add specialized code with guards for a function.
11:29 So the idea is you put it all together.
11:31 You take this optimizer.
11:31 It generates a new high-performance version.
11:34 It replaces the code that would normally run with this optimized version.
11:40 And as long as it doesn't change, it can run that optimized code or fallback.
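The guard-and-fallback flow described here can be imitated in pure Python. The sketch below is not the FAT Python implementation or the PEP 510 API: `make_guarded` is a hypothetical helper, and since the dictionary version from PEP 509 is private to C and not visible from Python code, the guard simply compares the identity of the watched name instead.

```python
import builtins


def make_guarded(original, specialized, watched_name, namespace):
    """Call `specialized` while `watched_name` is unchanged, else `original`."""

    def lookup():
        # Globals shadow builtins, mirroring normal name resolution.
        return namespace.get(watched_name, getattr(builtins, watched_name))

    expected = lookup()

    def dispatcher(*args, **kwargs):
        if lookup() is expected:  # guard passes: assumption still holds
            return specialized(*args, **kwargs)
        return original(*args, **kwargs)  # guard fails: fall back

    return dispatcher


# The original function lives in its own namespace so `len` can be rebound.
namespace = {}
exec("def greeting_length():\n    return len('hello')\n", namespace)
original = namespace["greeting_length"]


def specialized():
    return 5  # precomputed len('hello'), valid only while len is the builtin


guarded = make_guarded(original, specialized, "len", namespace)
first = guarded()

namespace["len"] = lambda value: 42  # someone replaces len at runtime
second = guarded()
print(first, second)
```

The first call takes the fast path; after `len` is rebound the guard fails and the original code runs, so behavior stays correct.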
11:45 So taking together all three of these create this fat Python thing, which is really great.
11:50 So you can download this and run it.
11:51 You have to compile CPython, like a special branch of it, to make it work.
11:55 But the article talks about it.
11:56 But the results are really good.
11:58 So, for example, a basic function that you just call and it returns a string is 24% faster than Python 3.6.
12:04 I was skimming through the article and the PEPs, and I could not find for this implementation what are the gotchas, if there are any.
12:14 It doesn't look like there are, which is great.
12:16 But I was wondering if any code can run or...
12:20 I think it would...
12:21 I guess the gotchas would probably be like, can the optimizer deal with it?
12:26 And are you 100% sure the optimizer is not going to make a change that has a behavioral change, right?
12:32 Right.
12:32 Right.
12:32 Yeah.
12:34 That was what I was looking for.
12:36 I mean, are there any cases where it can make a mistake at this point?
12:39 Yeah.
12:39 I think this is basically cracking the system open so that these things could be plugged in.
12:44 But then you'd probably have to look at the gotchas of the optimizers.
12:47 You know what I mean?
12:47 Right.
12:48 Yeah.
12:48 So it was pretty good.
12:49 Like a 24% improvement in function call speed.
12:51 And that's 46% faster than Python 2.7.
12:54 Like, that is a big deal.
12:56 One of the big drawbacks of Python is function calls are really slow.
13:00 And so people don't necessarily break their code into as small functions as they should.
13:05 And so this might really alleviate some of that.
13:07 I think it's great.
13:08 Yeah.
13:08 Yeah.
13:09 Awesome project.
13:10 Yeah.
13:10 Keep up the good work, Victor.
13:11 All right.
13:12 So suppose I'm sitting at a coffee shop.
13:14 I'm sure it's fine if I just, you know, go log into my bank.
13:17 Right.
13:18 Things like that.
13:19 Right.
13:20 Yeah.
13:21 So, I mean, we all hear that it is insecure to go online on a coffee shop or hotel, airport,
13:29 you know, all of those.
13:30 Yeah.
13:30 Hotels and airports scare me way more, especially hotels these days.
13:33 I don't know why.
13:34 Right.
13:34 So, you know, very few people really understand, you know, what's the problem.
13:38 You may think that if you only access sites that are HTTPS, so secure sites, that you'll be fine,
13:45 and you're really not, not completely.
13:48 All the content that you transfer to this site is going to be encrypted.
13:53 But there are other things that happen before you get to a connection that are not encrypted.
13:58 Like, for example, DNS searches.
13:59 So if you're in a coffee shop getting to a lot of sites, it's very easy for someone on the same network to find out what sites you visit,
14:08 even though they cannot see what, you know, what data you transfer to them.
14:12 Right.
14:12 And if somehow they happen to be able to alter that DNS, which probably has no verification because it's unencrypted,
14:17 then they send you to their Google.com.
14:20 They can send you to some other place.
14:21 Right.
14:22 Exactly.
14:23 So, you know, it's very important that you take security when you go on a public Wi-Fi spot, you know, very seriously.
14:30 And I wanted to mention this tool that so many times I talk to people and I mention it and they don't know it.
14:36 And it's super great.
14:38 It's called sshuttle.
14:39 And typically the solution that you're told is that you need to pay for a VPN service.
14:45 And sshuttle, it proclaims itself as the poor man's VPN.
14:51 And a nice advantage it has over regular VPNs is that it doesn't require any software to be installed on the remote machine,
14:59 the secure machine that you use for your connection.
15:03 So, the way it works is basically you need to have SSH access into a secure machine.
15:08 And in my case, I have it right here in my home.
15:10 I have a little, this is a CHIP machine, one of those $9 computers.
15:15 It could be a Raspberry Pi, you know, any of those.
15:19 And the only thing you need there is SSH, right?
15:22 So, you get it by default if you install a Linux distribution.
15:25 And then from anywhere you are, you use sshuttle to create a secure encrypted tunnel into this secure machine that you have in your home.
15:34 And then everything that you do goes through the tunnel and then it's forwarded into this secure machine.
15:41 And then it happens on your secure machine, including DNS searches.
15:46 So, that's really cool.
15:47 So, if I run sshuttle and I go to like gmail.com, it will actually funnel the traffic through, say, my Raspberry Pi?
15:55 Right.
15:55 At the coffee shop, there's only going to be a connection to your Raspberry Pi.
16:00 And then the Raspberry Pi will make the connection to Gmail and then forward the results back into your connection in the coffee shop through an encrypted channel.
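The setup described above comes down to one sshuttle command. The sketch below only builds that command line in Python and prints it; nothing is executed. `build_sshuttle_command` and the host `pi@home.example.com` are hypothetical, while `-r`, `--dns`, and the `0/0` catch-all subnet are options from sshuttle's own documentation. It assumes Python 3.8 or later for `shlex.join`.

```python
import shlex


def build_sshuttle_command(remote, subnets=("0/0",), forward_dns=True):
    """Build the argument list for routing traffic through an SSH host."""
    command = ["sshuttle"]
    if forward_dns:
        command.append("--dns")  # send DNS lookups through the tunnel too
    command += ["-r", remote]  # the SSH machine you trust, e.g. at home
    command += list(subnets)  # 0/0 means route all IPv4 traffic
    return command


# Hypothetical user and host name for the machine at home.
command = build_sshuttle_command("pi@home.example.com")
print(shlex.join(command))
```

Running the printed command needs sshuttle installed locally and usually administrator rights on the laptop, while the remote side needs only SSH access and Python.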
16:08 Oh, that's really cool.
16:09 Everything you do is absolutely private.
16:11 Yeah, that's super cool.
16:13 And it's written in Python.
16:13 Yeah?
16:14 And it is written in Python.
16:15 It recently added support for Python 3.
16:18 Nice.
16:19 That's cool.
16:19 Because that used to be a problem in the past, but now it works on Python 3 as well.
16:23 All right.
16:24 Well, that's really cool.
16:24 I'm definitely going to check that out.
16:25 All right.
16:26 So, last thing is I want to talk about something that initially might surprise people about the topic is I want to talk about Node.js.
16:32 So, Node.js, while, you know, Python developers sometimes are web developers that also play with Node.js, I don't want to talk about it for that reason.
16:42 Mostly, I consider Node.js like a similar new modern ecosystem and environment similar to Python's, right?
16:50 It's very open source.
16:51 There's a lot of people excited and contributing to it.
16:54 Things like that.
16:55 But Node.js has been in the news for the wrong reason this week.
17:00 And basically, a bunch of people who are in charge of the steering committee for Node.js quit.
17:07 Like a third of the committee quit and just said, we're done with this because there's been like a huge rift in the community apparently.
17:16 And so, I kind of want to talk about this community aspect and sort of give thanks for how well things are going at Python and the Python ecosystem relative to Node.js.
17:27 So, basically, there was – I don't want to get into name calling or whatever.
17:31 You guys go read the articles.
17:32 But there was some guy who was on the committee who was making decisions that a lot of people disagreed with.
17:38 He was very much against the code of conduct for reasons that may or may not be valid.
17:43 I don't know.
17:43 But basically, they felt like he was not representing the community well.
17:48 And the way the board worked or the system to enforce the code of conduct worked was they would look at individual things.
17:56 And they would say, is this sufficiently bad, say, to like have this guy removed from being in charge?
18:02 And they said any individual thing was no big – it was not a big deal.
18:06 But it was not big enough to take that action.
18:09 But if you add up like 10 or 20 of these things, like all of a sudden, here's like a –
18:13 There's a pattern, right?
18:14 A pattern.
18:15 A pattern of behavior that says, you know, maybe this guy is not representing us as well as we want or something like that.
18:21 And so, they decided not to remove the guy.
18:24 A bunch of people said, that's it.
18:25 We tried for a long time to sort of fix this.
18:28 And we're so fed up with it that we're leaving.
18:31 We're no longer going to be on the committee.
18:33 Some of the people just said, I'm done with Node.js.
18:35 Like these are former like steering committee folks on Node.js.
18:38 Like I'm done with Node.js entirely.
18:40 And actually, one of the – maybe the biggest thing that came out of this, not the people on the board.
18:45 But they said moments after the failed leadership vote, Kat Marchán pushed the button and created Ayo.js, a new open source fork of Node.js.
18:55 And this has happened before.
18:57 There was some problem with Node.js in the community.
19:00 And they created this thing called IO.js.
19:02 This is A-Y-O.js, but phonetically, they're the same, right?
19:06 So, you had some pretty interesting insights on this, I thought.
19:11 Yeah.
19:11 One of the things that you said, like, I mean, first of all, just like give thanks that things are working better here.
19:16 And we seem to have like a better balance.
19:19 But you also pointed out that we have a single leader who ultimately decides, right?
19:24 We have Guido.
19:25 Right.
19:26 We have a single person that sets the tone for the community, right?
19:30 And I believe they don't, right?
19:33 Yeah, yeah.
19:33 They vote on things.
19:34 And many people vote.
19:35 And clearly, they're very divided, right, on their roles.
19:40 And some people put technical over community.
19:43 And clearly, some other people, it's the reverse of that.
19:47 So, yeah.
19:48 I think we're lucky that at least, you know, we have a model that we know, you know, that we follow, right?
19:56 Yeah.
19:56 And I think it's really important that, you know, Guido's open to taking feedback from all the folks involved in the community.
20:03 But in the end, he makes the decisions.
20:04 And I think, you know, credit to him.
20:06 He's made a lot of good decisions that keep people engaged.
20:10 Yeah.
20:11 Yes.
20:11 So, yeah.
20:12 Here's a Node.js item for Python Bytes.
20:15 And really, the story is just, you know, look at what's going on over here.
20:18 And let's make sure that this doesn't happen to our community as well.
20:20 Let's hope not.
20:21 Yeah.
20:21 All right.
20:23 Well, Miguel, that's the items for this week that we had picked out.
20:26 Anything else that you got going on?
20:28 You just did an amazing Kickstarter, right?
20:30 Yeah.
20:31 That was actually a little experiment.
20:33 I wanted, you know, for many years, I wanted to update my Flask Mega-Tutorial.
20:39 The first tutorial I wrote more than four years ago.
20:42 And the amount of work is so big that I always had to give preference to work, right?
20:48 Got to pay the bills.
20:50 Kids got to go to college.
20:51 Things like that, right?
20:52 Yeah.
20:52 So, I decided to give a Kickstarter a try.
20:55 And basically, I converted that task into work.
20:58 That is really cool.
20:59 So, you have this mega tutorial, this Flask tutorial that you've done.
21:03 And it's really elaborate.
21:04 And you basically ran the Kickstarter and said, look, I want to update this tutorial.
21:08 Who's in on helping me do this?
21:10 And the community really responded, right?
21:12 It responded in a huge way.
21:14 Basically, you know, the modest funding goal that I set up was met the first day.
21:21 And then from then on, I decided to start coming up with ideas to expand the tutorial, add more content to it.
21:28 It got funded in one day?
21:29 Yes.
21:30 Yeah, that's good.
21:30 The 100% funding was in the first day.
21:33 But then you got something like, I don't know, maybe 600% funding in the end.
21:39 It ended up pretty good.
21:40 And I have some serious work to do because I have three new, complete new topics to add that I will be working on as part of this rewrite.
21:51 That's really cool.
21:52 So, super excited about that.
21:53 Yeah, that's great.
21:54 Congratulations.
21:55 And how do people find more out about this?
21:56 They can go to my blog.
21:57 And the blog is blog.miguelgrinberg.com.
22:02 And they're going to find a little Kickstarter badge there.
22:05 And from there, they can go to the Kickstarter page and find out what I'm planning to do if they're interested.
22:09 Awesome.
22:10 Yeah, we'll throw the link in the show notes as well.
22:12 Thank you.
22:12 Cool.
22:13 So, I just have one piece of news as well myself.
22:15 Other than, hey, I'm back from vacation.
22:17 That's awesome.
22:18 While I was on vacation, I ended up finding a little bit of time to release my latest course, which is building RESTful APIs with Pyramid.
22:26 And so, it's like an eight-hour course digging into the whole what is REST, like how do you work with HTTP status codes and verbs and all that, and then making this all happen in Pyramid.
22:36 And it was really fun to write.
22:37 So, if that sounds cool to you, check it out at training.talkpython.fm.
22:41 I'll probably put that in the show notes as well.
22:44 All right.
22:44 Miguel, thank you so much for filling in while Brian's having his beer over in Germany.
22:50 Actually, I don't know what he's doing, but I hope he had a beer.
22:51 Yeah, thank you for having me.
22:54 Yeah, it was great.
22:54 Talk to you later.
22:56 Thank you for listening to Python Bytes.
22:57 Follow the show on Twitter via at Python Bytes.
23:00 That's Python Bytes as in B-Y-T-E-S.
23:03 And get the full show notes at pythonbytes.fm.
23:06 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
23:10 We're always on the lookout for sharing something cool.
23:13 On behalf of myself and Brian Okken, this is Michael Kennedy.
23:17 Thank you for listening and sharing this podcast with your friends and colleagues.