Transcript #23: Can you grok the GIL?
Return to episode page view on github00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:06 This is episode 23, recorded April 25th, 2017.
00:11 And I'm one of your hosts, Michael Kennedy.
00:14 And I'm Brian Okken.
00:15 And we're here to share a bunch of cool Python stuff with you.
00:18 We've got six cool items queued up and ready to go.
00:21 But before we get to that, I want to say thanks to Advanced Digital.
00:24 They have an awesome Python job.
00:26 You can check it out at python.advance.net.
00:29 And we'll talk more about that later.
00:30 Right now, I want to talk about the GIL.
00:32 Brian, what do you think?
00:32 I think it's great to talk about the GIL.
00:34 And I'm really glad that...
00:36 So this is an article called Grok the GIL, How to Write Fast and Thread Safe Python.
00:42 And we talk about the GIL as the reason why we can't do parallel computing and programming just built in in Python.
00:51 But, you know, I haven't really jumped into it a lot.
00:54 And this article is from A. Jesse Giroud Davis.
00:59 Who, by the way, is an excellent writer.
01:01 If you want to have some examples of great writing, read his stuff.
01:04 It's great.
01:05 So he has this little...
01:08 It's a very lightweight introduction to what the GIL is.
01:11 And to talk...
01:12 And I like the approach of not just the details of it.
01:16 Because most of us aren't going to go in and start hacking the CPython core.
01:21 But a little peek into the CPython core to see that it's a mutex inside an interpreter lock.
01:26 The GIL is the global interpreter lock.
01:30 I love how he pulls out little snippets from CPython.
01:33 He's got a section.
01:34 Behold, the global interpreter lock.
01:36 And it just shows you, like, the C code.
01:38 Yeah, it's just one line.
01:39 Yeah, exactly.
01:39 And you don't really need to know a lot of C to appreciate it.
01:42 But there's just, like...
01:43 There's enough to make it super concrete.
01:45 You're like, this is actually the code that runs when you, like, call the socket.
01:50 Like, and that's how the GIL gets released, for example, right?
01:53 Yeah, talking about sockets.
01:54 And it really...
01:55 He really talks about that.
01:57 That it's focused around...
01:59 The lock is around I.O.
02:02 Whenever you're waiting for I.O.
02:03 And I think there's other places, too.
02:05 But that's the main place where your code will pause and let some other thread run.
02:11 I like the...
02:13 He has a thing that says it's so simple that you can...
02:16 The effect on threads is so simple you can write it on the back of your hand.
02:20 One thread runs Python while in others sleep or await I.O.
02:24 And he actually has a picture of his hand.
02:26 I think it's his hand.
02:27 Yeah, I was wondering if that's actually his hand.
02:29 Yeah, and if he wrote it, that means he must be left-handed because it's written on the back of his right hand.
02:34 Or he had somebody else write it.
02:35 So I was always curious about this.
02:38 What are the limitations?
02:39 And how do you utilize it to have faster code?
02:43 And the gist of it is if you've got some code that's waiting I.O., like maybe pulling off a whole bunch of different...
02:51 Taking a bunch of connections or pulling off a bunch of URLs...
02:53 Downloading a bunch of URLs.
02:55 That's a great place to use multi-threading because the guild doesn't really get in your way.
03:01 There's...
03:02 In places where you really have multiple processing, where you really want your Python code to run at the same time,
03:08 then you have to jump into multi-processing.
03:11 And he actually gives an example of that.
03:13 And it's not that bad either.
03:15 So anyway, I liked the quick jump into it.
03:19 And I think I'm going to be a better Python programmer for reading this.
03:23 Yeah, this is really nice work.
03:24 Good job, Jesse.
03:25 He's a great writer.
03:26 I actually had him on Talk Python on episode 69, I think, about design patterns for programmer blogs.
03:33 And we did a whole session on blogging.
03:35 It was great.
03:36 And one of the things I like about this is he talks about cooperative multitasking,
03:40 concurrency versus parallelism, preemptive multitasking, how sometimes you still need to actually lock your Python code,
03:49 even though you might think of like, well, this is all straight Python.
03:52 It's not going to get interrupted.
03:54 But there's certain mechanisms that slightly vary between Python 2 and 3,
03:59 where if you hang on to the guild too long, it will be potentially taken from you and given to another thread.
04:05 And so that might still cause what would appear to be parallel race conditions.
04:10 So that's also worth reading about.
04:12 Yeah.
04:12 And one of the things that surprised me is, and I do realize I don't really worry about that.
04:16 I deal with multi-threading in C++.
04:18 And with C++, you have to do it fine-grained locking of data structures,
04:24 any data structure shared by multiple threads.
04:28 But I was surprised how much you can share between threads in Python because the GIL won't interrupt a bytecode.
04:36 And it'll only interrupt, yeah, between bytecodes, not in the C code.
04:43 So the, like, things like sorting a list will happen atomically, and you won't be interrupted with that,
04:50 which is, that surprised me.
04:51 I didn't know that.
04:52 It is where, ironically, incrementing a variable could be interrupted.
04:56 Right.
04:56 Because it ends up being, like, a two-step or read-modify-write operation.
05:01 Yeah, exactly.
05:01 Exactly.
05:02 And Jesse uses the disk module to look inside, which is all very good.
05:06 So that's a great article.
05:06 I think that's probably the most substantive thing we're covering.
05:09 Do you want to think about not so substantive but pretty cool?
05:13 Yes.
05:14 I've got one for you.
05:14 Let's talk about the NBA as in National Basketball Association.
05:18 The American basketball.
05:20 So there was a pretty big deal on Twitter the other day.
05:25 So Mark Cuban, he owns the Dallas Mavericks.
05:27 And he's, I don't know if he comes from tech or not.
05:31 I don't really think so.
05:33 But he definitely was an entrepreneur.
05:34 He's, you know, worth, like, he's a billionaire, basically.
05:37 But as a billionaire, owner of a NBA team, he posted out a pretty interesting thing on Twitter saying, here's the new NBA.
05:46 And it was a picture of him learning Python machine learning with, I think, iPython, an iPython notebook open.
05:54 And he's like, I need to understand the Mavericks in the NBA.
05:57 I'm on it.
05:58 Wow.
05:58 It's pretty cool, right?
05:59 It is pretty cool.
06:00 I don't know much about basketball or Mark Cuban or any of that, but it's neat that somebody that high up is wanting to learn Python and notebooks.
06:09 That was basically the main takeaway.
06:11 A bunch of people, like our friends over at Partially Derivative, invited him to be on the show.
06:15 They're like, oh, we have to hear your story.
06:17 He's like, no, no, I'm just getting started.
06:18 They have a team at the Mavericks.
06:20 I just want to understand what they're doing when they use machine learning to help make predictions and planning.
06:27 And that's kind of cool to think of how machine learning is actually driving these professional sports teams as well.
06:32 Yeah.
06:33 Very interesting.
06:34 Indeed.
06:35 Indeed.
06:35 All right.
06:36 So next up, we have somebody winning an award.
06:39 How cool is that?
06:40 Yeah.
06:40 Ian Kurdasko.
06:42 He got a, or it was announced that he will get the 2017 Community Service Award from the Python Software Foundation.
06:49 And I think that's pretty cool.
06:52 It's apparently, I didn't know that he was doing, that a lot of the stuff that he did.
06:56 I mean, I was familiar with Ian.
06:58 He was on Test and Code episode 13.
07:01 And we talked about Betamax library that he has for recording and playing back requests interactions.
07:09 He's apparently been the election administrator for the PSF since 2015, volunteering all that time, of course.
07:17 And he is active in mentoring new coders and supporting other Python developers with apparently really a focus on trying to be active in mentoring women in Python.
07:29 And I think that's just pretty awesome.
07:31 Yeah, that's really awesome.
07:32 So congratulations, Ian.
07:34 And his project that you talked about, like replaying requests, that's called Betamax?
07:39 Yes.
07:39 That's an awesome name.
07:40 Yeah.
07:40 Yeah.
07:41 And it's, I guess, the idea, of course, of there's a VCR type library in some other languages.
07:48 But he chose Betamax because, well, everybody knows Betamax was better.
07:52 That's right.
07:52 But yeah, you should listen to it.
07:55 It's a pretty interesting tool.
07:57 So that was one that the community asked me to do.
08:00 There were community members of listeners of Test and Code that said, hey, could you go find Ian and talk to him about Betamax?
08:07 That's awesome.
08:08 We love to get those recommendations for all the shows, including some stuff that we're covering here today, right?
08:13 Yeah, definitely.
08:14 Definitely.
08:14 So if you want to work with these kind of fun things, maybe you work at a company where you're doing Java and you dabble in Python or you don't really get to do all the cool things you'd like,
08:25 Advanced Digital has a cool job offer for everyone out there who might want to make a change.
08:30 I wish I was near Jersey City because this sounds fun.
08:33 It does sound fun.
08:34 So, right, they're in Jersey City just across the Hudson from Manhattan there.
08:39 Small, agile environment.
08:41 They're mostly a Python shop, but they play with other cool technologies.
08:45 They fund you guys to go to conferences, professional development.
08:48 And most importantly and coolest, I think, is they run one of the 10 largest news sites by traffic in the U.S.
08:55 And they do it with Python.
08:56 So if you want to be part of that team and you want to play with cool stuff like that,
09:00 just visit python.advance.net and check it out.
09:03 So there's a couple of things coming up, Brian, that have to do with Python versus legacy Python.
09:10 Remember, Matthias from the IPython project, Matthias Boussournier, I'm sorry if I mess up your name, but I think that's pretty close.
09:19 He was the original guy who got us talking Python versus legacy Python instead of Python 3 versus Python 2.
09:25 Oh, yeah, right.
09:26 Yeah.
09:26 So he works on IPython and Jupyter and all that stuff.
09:29 And he's back with a new blog post, which is my next item.
09:32 And it's a pretty big deal.
09:35 We just talked about Mark Cuban, the new NBA machine learning, IPython.
09:39 And so they just released IPython 6.
09:43 Okay.
09:44 So that's pretty big news.
09:45 That is big news.
09:46 Yeah.
09:47 And so people who use IPython, you know, there's a brand new version.
09:50 That's awesome.
09:51 The bigger thing is that this is the first release where IPython goes Python 3 only.
09:57 They've dropped support for Python 2.
09:59 That's great.
10:00 So, or as Matthias would say, they now support Python and not mixing in legacy Python with it.
10:05 Yeah.
10:05 And what I thought was nice is, you know, it's a pretty major project.
10:08 They did a little write-up of what was their experience of converting a mixed source code to Python 3 only.
10:15 What were the benefits and what were the drawbacks?
10:17 So let's see, a couple of things, a couple of stats that Matthias put out.
10:21 The size of the IPython code base has decreased by 1,500 lines.
10:26 That's pretty solid, right?
10:28 That's significant.
10:29 Less code means less maintenance.
10:30 Right.
10:31 They said it's not just because they're dropping Python 2, but it's a significant amount is.
10:35 And even more impressive is they added some entirely new features that required hundreds of new lines of code.
10:42 So they're really the decrease in amount of code they had to support for Python 3 or really for 2 they were able to get rid of when they went to Python 3 is actually probably more.
10:51 So that's pretty cool.
10:54 And they said one of the benefits, they think, is that contributors can spend less time worrying about, well, how does this work if we do it in Python 2?
11:01 Or, you know, this has happened to me.
11:03 You make a pull request.
11:04 You submit it.
11:05 It runs on the continuous integration.
11:07 And it works fine in Python 3.
11:09 But then it fails in Python 2 because, you know, you forgot the B in the string or whatever, right?
11:14 So they don't have to worry about that.
11:16 CI runs faster.
11:18 They said basically, in summary, we're totally happy.
11:21 We're entirely pleased with having switched to basically have the ability to write Python 3 only code.
11:26 And they're looking forward to using a lot of the improvements in Python 3, specifically async and await, which will be cool.
11:33 So an async and await REPL inside of IPython.
11:37 How cool is that?
11:37 That's neat.
11:38 Is async and await in all of the three versions or did that get introduced?
11:42 It came in 3.5.
11:43 Okay.
11:44 The async I.O. stuff was introduced in Python 3.4, I think.
11:47 And then 3.5, they're like, let's put some proper syntax on this and make it really easy.
11:52 Yeah.
11:52 I'm trying to, I'm writing a little thing that I want to have available on Python 2 also and at least 2.7.
12:02 And even if I were to just do Python 3, all of the three versions, I still can't use f-strings, which I wish I could use f-strings.
12:10 I know.
12:11 They're so new.
12:11 It's 3.6 only.
12:12 Even on my production server, it's 3.5.
12:14 So it is what it is.
12:16 That's a move in the right direction.
12:17 And I think it's great that Matias and others talked about their experience with that change.
12:22 Yeah.
12:23 That's awesome.
12:23 Yeah.
12:24 Thanks, Matias.
12:24 Excellent.
12:25 I think I'm, to use an American expression, beating a dead horse.
12:28 But we have another.
12:29 Is that dead horse called Source?
12:31 SRC?
12:31 Yes.
12:32 Yes.
12:33 The other package I was talking about me building up, it's for the book.
12:38 But it's, I wanted to make sure I was representing the community correctly and how to, how to put together a distributable package and do it correctly.
12:47 At least with best practice.
12:48 I know there's not really a correct.
12:50 But somebody pointed me to the direction of an article by, I'm going to probably get this wrong.
12:57 I think it's Enoch.
12:58 And it's called Testing and Packaging.
13:00 And it's basically, he's the guy that did the Adders project or ATTRS.
13:05 Great project.
13:07 That we've talked about a couple times.
13:08 And how there were issues, at least with one package that wasn't using the source, SRC, that the testing that was done was, there was a bug that showed up in, with installed applications that doesn't show up in non-installed.
13:27 So one of the benefits of using SRC is you can more easily make sure that you're only testing the installed package and not the non-installed.
13:37 And he also just shows that it's really just two lines of code change.
13:41 So to do the right thing is not that much work.
13:44 Right.
13:44 So basically in your setup, PY, the call to setup, you set the packages to be looking in the source directory.
13:51 And you set the package to be in the source directory, right?
13:54 Yeah.
13:54 So when you would normally say find packages, he recommends specifically saying find packages and then give it a where equals SRC.
14:02 But you can also just put SRC in the, as the first argument.
14:07 And that works also.
14:07 And then listing it in the packages dir.
14:10 And then one of the things I noticed, which I don't think people have really talked about, is the entire repository looks better.
14:18 You've got all of the package junk, like your setup and your manifest and all that stuff at the top level.
14:24 And the stuff you really care about on a day-to-day basis is separated into subdirectories.
14:30 You've got the docs in one and the tests in another, and then your source in another.
14:34 And that separation just, it pleases my organization.
14:37 It just is nice.
14:39 Yeah.
14:39 I'm coming around to this as well.
14:40 It sounds pretty solid.
14:41 Anyway.
14:41 But that's, I'll probably try to drop talking about that every episode.
14:46 But there you go.
14:47 One more article.
14:48 Well, I'm not quite done beating the Python versus legacy Python horse yet.
14:52 So I'm going to keep going on that one.
14:53 Because there's some more big news.
14:55 We've heard that IPython went to Python 3 only.
14:57 And now, same week, last week, AWS Lambda goes to Python, or ads not only, but ads.
15:05 Python 3.6 support was just 2.7.
15:08 So that's a big jump, right?
15:09 Wow, that's a big jump.
15:10 Yeah.
15:10 Yeah.
15:10 So that's pretty awesome.
15:12 And do you have much experience with Lambda?
15:14 Have you played with it?
15:15 No.
15:15 I've heard a lot about it, but I haven't played with it yet.
15:18 So Lambda is one of these things from AWS, from Amazon, that fits into this serverless architecture.
15:27 So basically, you say, here's a function.
15:29 And when something happens, run this, please.
15:33 So run it on a schedule.
15:34 Somebody changes a database.
15:36 Somebody uploads a file to S3, whatever.
15:39 And it just runs.
15:40 There's no servers that you deal with.
15:43 Obviously, there are servers.
15:43 But it just distributes your code to run when it needs to.
15:48 So I'll cross a whole bunch of servers.
15:49 So it scales basically infinitely.
15:51 As long as you have infinite money, you can infinitely scale this.
15:55 It's fine, right?
15:56 And that's pretty cool.
15:58 Yeah.
15:58 So have you tried it?
16:00 No, I have not had a real reason to do it.
16:03 I mean, I guess there's a couple of things that I could do.
16:06 Like on the websites, there's a job that runs every couple hours that will completely re-index the database
16:14 and reorganize it for super fast queries.
16:17 Like the queries on the various websites run.
16:20 And I'm going to be adding to Python bytes.
16:21 No worries.
16:22 They run like sub-millisecond, right?
16:25 In order to get that stuff, you've got to kind of pre-compute some things.
16:28 Maybe that's a perfect Lambda operation, right?
16:31 Well, I mean, especially now that they have 3.6 support, I'm intrigued enough that I might give it a shot anyway.
16:37 Just to make up some excuse to play with it.
16:41 Exactly.
16:42 We need to run this.
16:43 But I mean, if you're using other AWS stuff, like their database services, Dynamo or RDS or S3,
16:49 like here's a way to run code like really near your resources on triggers with no effort.
16:54 And one of the things I thought was pretty cool, like this announcement just came out.
16:59 And Zappa, so Zappa, if you look at their page, which I linked, it's called serverless Python web services.
17:07 That's interesting, right?
17:09 So basically you can set it up so that using the AWS architecture, you can route web requests to these Lambda functions.
17:17 But you don't really have servers or anything like that.
17:20 And people have been asking for Python 3 support.
17:25 They've been saying, no, no, no, no.
17:26 As soon as this dropped, they're like, yes, it has Python 3 support.
17:29 So that is pretty cool as well.
17:31 So you've seen things that basically are layered on top of Lambda also started to support Python 3, which is great.
17:38 Yeah, definitely.
17:38 Cool.
17:39 All right, yeah, maybe we should play with Lambda.
17:40 I don't know.
17:41 Yeah.
17:41 Very nice.
17:43 All right, well, that's it for the news, Brian.
17:44 You got anything personally you want to share with everyone?
17:46 No, I'm going to be, I guess I'll be in the Munich area the second week in May if there's anybody around that wants to have a beer or something with me.
17:57 Hit me up.
17:58 Yeah, that sounds awesome.
17:59 I'm jealous.
18:00 I'd love to go visit Germany.
18:01 Well, I'll do that at the end of the summer maybe.
18:03 We'll see.
18:04 But no news for me.
18:05 I just want to say thank you, everyone, for listening.
18:08 Oh, you know what?
18:09 Actually, one more thing.
18:09 This is not personal news, but it falls right in here.
18:12 I also saw CheckIO at CheckIO.org.
18:15 These guys have a pretty cool gamification of learning Python.
18:19 They also just went Python 3.
18:21 Oh, cool.
18:22 So just to keep on this.
18:24 Hey, Python 3 is starting to really roll.
18:26 I'd say it's really starting to roll this week.
18:27 I use CheckIO for, hopefully I won't get in trouble for this, but I've gone through a bunch of this stuff and I use them for interview questions.
18:35 Yeah, I think it's actually pretty good.
18:37 What I really like is you can solve a puzzle and then you can look at other people's solutions.
18:42 And I found after solving a bunch and looking at the solutions that I unknowingly have an implicit bias towards performance over ease of reading or simplicity or whatever.
18:53 It's interesting that it uncovered that for me.
18:56 Oh, that's interesting.
18:57 And I totally have the opposite.
18:58 I like them to be readable more than anything else.
19:01 Yeah.
19:01 Yeah.
19:01 Funny, huh?
19:02 All right.
19:03 Well, thanks, Brian.
19:04 Thank you, everyone, for listening.
19:05 And we'll catch you next week.
19:07 And thank you.
19:07 Yep.
19:08 You're welcome.
19:08 Bye.
19:09 Thank you for listening to Python Bytes.
19:11 Follow the show on Twitter via at Python Bytes.
19:14 That's Python Bytes as in B-Y-T-E-S.
19:17 And get the full show notes at pythonbytes.fm.
19:20 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
19:25 We're always on the lookout for sharing something cool.
19:27 On behalf of myself and Brian Okken, this is Michael Kennedy.
19:31 Thank you for listening and sharing this podcast with your friends and colleagues.