
#23: Can you grok the GIL?

Published Wed, Apr 26, 2017, recorded Tue, Apr 25, 2017

Sponsored by ADVANCE DIGITAL. Find your Python web job at http://python.advance.net/

Brian #1: Grok the GIL - How to write fast and thread-safe Python

  • A. Jesse Jiryu Davis teaches us about the GIL, and how to use that knowledge to decide between threads and processes for parallelism.
  • From the article:
    • The GIL's effect on the threads in your program is simple enough that you can write the principle on the back of your hand: "One thread runs Python, while N others sleep or await I/O."
  • Discusses cooperative multitasking and preemptive multitasking
  • When can a Python process be interrupted? (between bytecodes)
  • When do you need to use thread protection? (less than you think)
  • A. Jesse Jiryu Davis will be speaking at PyCon 2017, which will be held May 17-25 in Portland, Oregon. Catch his talk, Grok the GIL: Write Fast and Thread-Safe Python, on Friday, May 19.
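To make the "between bytecodes" point concrete, here is a minimal sketch. It is not code from the article, and the names unsafe_increment, increment_opnames, safe_increment, and run_threads are made up for illustration. It uses the standard dis module to list the bytecodes behind a single "counter += 1" line, then uses a threading.Lock so the read-modify-write cannot be interleaved with another thread.

```python
import dis
import threading

counter = 0
counter_lock = threading.Lock()


def unsafe_increment():
    """One source line that compiles to several bytecodes (load, add, store)."""
    global counter
    counter += 1


def increment_opnames():
    """Return the bytecode names for unsafe_increment.

    The exact names vary between Python versions, but the load and the
    store of the global are always separate instructions.
    """
    return [ins.opname for ins in dis.get_instructions(unsafe_increment)]


def safe_increment(times):
    """Increment the shared counter while holding the lock."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


def run_threads(num_threads=4, times=10000):
    """Run safe_increment in several threads and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=safe_increment, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(increment_opnames())
    print(run_threads())
```

Because a thread switch can land between the load and the store, the lock is what guarantees that no increments are lost. A single call implemented in C, such as list.sort(), runs as one bytecode and is not interrupted this way.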

Michael #2: The New NBA by Mark Cuban

  • Introduction to machine learning in Python & Jupyter notebooks
  • Mark Cuban using Python and ML to play with his NBA team
  • “We have a team at the Mavs but I need to know it to help define strategy and make decisions”

Brian #3: Ian Cordasco gets a Community Service Award from PSF

  • Ian was on Test & Code, episode 13, talking about Betamax
  • From the announcement:
    • Ian Cordasco has been the PSF’s Election Administrator since 2015, volunteering his efforts for this important role.
    • Cordasco frequently mentors newer coders and supports their Python endeavors.
    • The Python Software Foundation awarded the 2017 Q1 Community Service Award to Ian Cordasco for his contributions to PSF elections and active mentoring of women in the Python community.
    • Cordasco has a history of going out of his way to support and encourage female developers. When Carol Willing, a developer for the Jupyter project, wanted to work on the Requests library, she got in touch with Cordasco. “We worked together on the project and my first commit to the Requests library got accepted!” Cordasco later wrote a fantastic post about it on his blog.

Sponsored by ADVANCE DIGITAL

  • They have a small team of developers who work in an agile/devops environment, so you will make an impact with your work quickly
  • They are mostly a Python shop, but there is an opportunity to introduce and run other technologies at scale
  • They fund employee development and conference attendance
  • They are located in beautiful Jersey City, one stop from Manhattan on the PATH
  • They are one of the 10 largest news sites by traffic in the US
  • Apply at http://python.advance.net/

Michael #4: Release of IPython 6.0

  • by Matthias Bussonnier
  • IPython goes Python 3 only
  • Our personal experience writing Python3-only source code.
    • The size of the IPython codebase has decreased by about 1500 lines of Python code relative to the last release. Of course, that’s not solely due to the removal of Python 2 support, but a non-negligible amount is.
    • even more remarkable in light of completely new features that required adding hundreds of lines of code
    • This change eases the burden on contributors to IPython. Contributors can spend less time thinking “what about Python 2”, or rewriting a pull request because the Python 2 test suite fails.
    • At the same time, our tests now complete more quickly on continuous integration services because they need to run on fewer versions of Python.
    • From a developer point of view we are extremely pleased with having the possibility to write Python3-only code, and are looking forward to even more improvements like pathlib.
    • We hope you will enjoy this release. It will be the base for some awesome features, like async/await REPL.
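As a rough illustration of what "Python3-only" code can look like, here is a small sketch that combines pathlib, async/await (syntax added in Python 3.5), and an f-string (Python 3.6). It is not taken from the IPython codebase; the function names and file paths are hypothetical, and asyncio.run requires Python 3.7 or newer.

```python
import asyncio
from pathlib import Path


async def describe(path):
    """Pretend to do some async work, then describe the path."""
    await asyncio.sleep(0)  # hand control back to the event loop once
    return f"{path.name} has suffix {path.suffix!r}"


async def describe_all(paths):
    """Run describe() for every path concurrently and collect the results."""
    return await asyncio.gather(*(describe(p) for p in paths))


def main():
    # Hypothetical paths; nothing is read from disk.
    paths = [Path("notes") / "episode23.md", Path("setup.py")]
    return asyncio.run(describe_all(paths))


if __name__ == "__main__":
    for line in main():
        print(line)
```

None of these three features is available in Python 2.7, which is the kind of thing a project can only rely on once it drops legacy Python.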

Brian #5: Testing & Packaging

  • Hynek Schlawack describes why he was convinced to use a src directory in package distributions he works on.
  • Just use a src dir (of course it doesn’t have to be exactly “src” but that’s the convention and it should be different than your package name), it will make your life easier.
  • Without it, it is easy to think you are testing an installed package, but you’re really testing the modules before install. These can be different. Better to test as close to how your users will see the package as possible, and using a src dir helps that.
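In setup.py the change discussed in the episode amounts to passing packages=find_packages(where="src") and package_dir={"": "src"} to setup(). The sketch below is an illustration of why that helps, not code from Hynek's article: it builds two throwaway project trees with the standard library and checks whether the package can be imported when only the project root is on sys.path, which is roughly the situation when tests are run from a checkout. The package name demo_pkg_layout_sketch and the function importable_from_root are made up.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

PACKAGE_NAME = "demo_pkg_layout_sketch"  # hypothetical package name


def importable_from_root(layout):
    """Return True if the package is importable with only the project root on sys.path.

    layout is "flat" (package sits next to setup.py) or "src"
    (package sits inside a src/ subdirectory).
    """
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        parent = root_path / "src" if layout == "src" else root_path
        package_dir = parent / PACKAGE_NAME
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            return importlib.util.find_spec(PACKAGE_NAME) is not None
        finally:
            sys.path.remove(root)


if __name__ == "__main__":
    print("flat layout importable without install:", importable_from_root("flat"))
    print("src layout importable without install:", importable_from_root("src"))
```

With the flat layout the uninstalled package is picked up straight from the checkout, so tests can pass without ever touching the installed copy. With the src layout the import only succeeds after the package has actually been installed.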

Michael #6: AWS Lambda adds Python 3.6 support

  • AWS Lambda is a compute service that lets you run code without provisioning or managing servers.
  • AWS Lambda executes your code only when needed and scales automatically, from a few requests per day to thousands per second.
  • You pay only for the compute time you consume - there is no charge when your code is not running.
  • You can also build serverless applications composed of functions that are triggered by events and automatically deploy them using AWS CodePipeline and AWS CodeBuild.
  • Already good things cometh: Zappa has Python 3 support.
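For a sense of what such a function looks like, here is a minimal sketch of a Python 3.6 handler for the "somebody uploads a file to S3" trigger mentioned in the episode. The name lambda_handler is only the common convention (the handler name is configurable), and the shape of the return value is an assumption made for illustration.

```python
def lambda_handler(event, context):
    """Summarize the S3 objects named in an S3 event notification.

    event follows the S3 notification layout: a "Records" list whose
    items carry the bucket name and object key. context is unused here.
    """
    records = event.get("Records", [])
    uploaded = [
        f"{record['s3']['bucket']['name']}/{record['s3']['object']['key']}"
        for record in records
    ]
    return {"count": len(uploaded), "objects": uploaded}


if __name__ == "__main__":
    # Hypothetical event for a local dry run; no AWS access is needed.
    fake_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "report.csv"}}}
        ]
    }
    print(lambda_handler(fake_event, None))
```

The f-string is the only part that needs Python 3.6; the handler can be exercised locally by calling it with a hand-built event, as in the dry run above.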

Episode Transcript


00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:06 This is episode 23, recorded April 25th, 2017.

00:11 And I'm one of your hosts, Michael Kennedy.

00:14 And I'm Brian Okken.

00:15 And we're here to share a bunch of cool Python stuff with you.

00:18 We've got six cool items queued up and ready to go.

00:21 But before we get to that, I want to say thanks to Advance Digital.

00:24 They have an awesome Python job.

00:26 You can check it out at python.advance.net.

00:29 And we'll talk more about that later.

00:30 Right now, I want to talk about the GIL.

00:32 Brian, what do you think?

00:32 I think it's great to talk about the GIL.

00:34 And I'm really glad that...

00:36 So this is an article called Grok the GIL, How to Write Fast and Thread Safe Python.

00:42 And we talk about the GIL as the reason why we can't do parallel computing and programming just built in in Python.

00:51 But, you know, I haven't really jumped into it a lot.

00:54 And this article is from A. Jesse Jiryu Davis.

00:59 Who, by the way, is an excellent writer.

01:01 If you want to have some examples of great writing, read his stuff.

01:04 It's great.

01:05 So he has this little...

01:08 It's a very lightweight introduction to what the GIL is.

01:11 And to talk...

01:12 And I like the approach of not just the details of it.

01:16 Because most of us aren't going to go in and start hacking the CPython core.

01:21 But a little peek into the CPython core to see that it's a mutex inside an interpreter lock.

01:26 The GIL is the global interpreter lock.

01:30 I love how he pulls out little snippets from CPython.

01:33 He's got a section.

01:34 Behold, the global interpreter lock.

01:36 And it just shows you, like, the C code.

01:38 Yeah, it's just one line.

01:39 Yeah, exactly.

01:39 And you don't really need to know a lot of C to appreciate it.

01:42 But there's just, like...

01:43 There's enough to make it super concrete.

01:45 You're like, this is actually the code that runs when you, like, call the socket.

01:50 Like, and that's how the GIL gets released, for example, right?

01:53 Yeah, talking about sockets.

01:54 And it really...

01:55 He really talks about that.

01:57 That it's focused around...

01:59 The lock is around I/O.

02:02 Whenever you're waiting for I/O.

02:03 And I think there's other places, too.

02:05 But that's the main place where your code will pause and let some other thread run.

02:11 I like the...

02:13 He has a thing that says it's so simple that you can...

02:16 The effect on threads is so simple you can write it on the back of your hand.

02:20 One thread runs Python while N others sleep or await I/O.

02:24 And he actually has a picture of his hand.

02:26 I think it's his hand.

02:27 Yeah, I was wondering if that's actually his hand.

02:29 Yeah, and if he wrote it, that means he must be left-handed because it's written on the back of his right hand.

02:34 Or he had somebody else write it.

02:35 So I was always curious about this.

02:38 What are the limitations?

02:39 And how do you utilize it to have faster code?

02:43 And the gist of it is if you've got some code that's waiting on I/O, like maybe pulling off a whole bunch of different...

02:51 Taking a bunch of connections or pulling off a bunch of URLs...

02:53 Downloading a bunch of URLs.

02:55 That's a great place to use multi-threading because the GIL doesn't really get in your way.

03:01 There's...

03:02 In places where you really have multiple processing, where you really want your Python code to run at the same time,

03:08 then you have to jump into multi-processing.

03:11 And he actually gives an example of that.

03:13 And it's not that bad either.

03:15 So anyway, I liked the quick jump into it.

03:19 And I think I'm going to be a better Python programmer for reading this.

03:23 Yeah, this is really nice work.

03:24 Good job, Jesse.

03:25 He's a great writer.

03:26 I actually had him on Talk Python on episode 69, I think, about design patterns for programmer blogs.

03:33 And we did a whole session on blogging.

03:35 It was great.

03:36 And one of the things I like about this is he talks about cooperative multitasking,

03:40 concurrency versus parallelism, preemptive multitasking, how sometimes you still need to actually lock your Python code,

03:49 even though you might think of like, well, this is all straight Python.

03:52 It's not going to get interrupted.

03:54 But there's certain mechanisms that slightly vary between Python 2 and 3,

03:59 where if you hang on to the GIL too long, it will be potentially taken from you and given to another thread.

04:05 And so that might still cause what would appear to be parallel race conditions.

04:10 So that's also worth reading about.

04:12 Yeah.

04:12 And one of the things that surprised me is, and I do realize I don't really worry about that.

04:16 I deal with multi-threading in C++.

04:18 And with C++, you have to do fine-grained locking of data structures,

04:24 any data structure shared by multiple threads.

04:28 But I was surprised how much you can share between threads in Python because the GIL won't interrupt a bytecode.

04:36 And it'll only interrupt, yeah, between bytecodes, not in the C code.

04:43 So the, like, things like sorting a list will happen atomically, and you won't be interrupted with that,

04:50 which is, that surprised me.

04:51 I didn't know that.

04:52 It is where, ironically, incrementing a variable could be interrupted.

04:56 Right.

04:56 Because it ends up being, like, a two-step or read-modify-write operation.

05:01 Yeah, exactly.

05:01 Exactly.

05:02 And Jesse uses the dis module to look inside, which is all very good.

05:06 So that's a great article.

05:06 I think that's probably the most substantive thing we're covering.

05:09 Do you want to think about not so substantive but pretty cool?

05:13 Yes.

05:14 I've got one for you.

05:14 Let's talk about the NBA as in National Basketball Association.

05:18 The American basketball.

05:20 So there was a pretty big deal on Twitter the other day.

05:25 So Mark Cuban, he owns the Dallas Mavericks.

05:27 And he's, I don't know if he comes from tech or not.

05:31 I don't really think so.

05:33 But he definitely was an entrepreneur.

05:34 He's, you know, worth, like, he's a billionaire, basically.

05:37 But as a billionaire, owner of a NBA team, he posted out a pretty interesting thing on Twitter saying, here's the new NBA.

05:46 And it was a picture of him learning Python machine learning with, I think, IPython, an IPython notebook open.

05:54 And he's like, I need to understand the Mavericks in the NBA.

05:57 I'm on it.

05:58 Wow.

05:58 It's pretty cool, right?

05:59 It is pretty cool.

06:00 I don't know much about basketball or Mark Cuban or any of that, but it's neat that somebody that high up is wanting to learn Python and notebooks.

06:09 That was basically the main takeaway.

06:11 A bunch of people, like our friends over at Partially Derivative, invited him to be on the show.

06:15 They're like, oh, we have to hear your story.

06:17 He's like, no, no, I'm just getting started.

06:18 They have a team at the Mavericks.

06:20 I just want to understand what they're doing when they use machine learning to help make predictions and planning.

06:27 And that's kind of cool to think of how machine learning is actually driving these professional sports teams as well.

06:32 Yeah.

06:33 Very interesting.

06:34 Indeed.

06:35 Indeed.

06:35 All right.

06:36 So next up, we have somebody winning an award.

06:39 How cool is that?

06:40 Yeah.

06:40 Ian Cordasco.

06:42 He got a, or it was announced that he will get the 2017 Community Service Award from the Python Software Foundation.

06:49 And I think that's pretty cool.

06:52 It's apparently, I didn't know that he was doing, that a lot of the stuff that he did.

06:56 I mean, I was familiar with Ian.

06:58 He was on Test and Code episode 13.

07:01 And we talked about Betamax library that he has for recording and playing back requests interactions.

07:09 He's apparently been the election administrator for the PSF since 2015, volunteering all that time, of course.

07:17 And he is active in mentoring new coders and supporting other Python developers with apparently really a focus on trying to be active in mentoring women in Python.

07:29 And I think that's just pretty awesome.

07:31 Yeah, that's really awesome.

07:32 So congratulations, Ian.

07:34 And his project that you talked about, like replaying requests, that's called Betamax?

07:39 Yes.

07:39 That's an awesome name.

07:40 Yeah.

07:40 Yeah.

07:41 And it's, I guess, the idea, of course, of there's a VCR type library in some other languages.

07:48 But he chose Betamax because, well, everybody knows Betamax was better.

07:52 That's right.

07:52 But yeah, you should listen to it.

07:55 It's a pretty interesting tool.

07:57 So that was one that the community asked me to do.

08:00 There were community members of listeners of Test and Code that said, hey, could you go find Ian and talk to him about Betamax?

08:07 That's awesome.

08:08 We love to get those recommendations for all the shows, including some stuff that we're covering here today, right?

08:13 Yeah, definitely.

08:14 Definitely.

08:14 So if you want to work with these kind of fun things, maybe you work at a company where you're doing Java and you dabble in Python or you don't really get to do all the cool things you'd like,

08:25 Advance Digital has a cool job offer for everyone out there who might want to make a change.

08:30 I wish I was near Jersey City because this sounds fun.

08:33 It does sound fun.

08:34 So, right, they're in Jersey City just across the Hudson from Manhattan there.

08:39 Small, agile environment.

08:41 They're mostly a Python shop, but they play with other cool technologies.

08:45 They fund you guys to go to conferences, professional development.

08:48 And most importantly and coolest, I think, is they run one of the 10 largest news sites by traffic in the U.S.

08:55 And they do it with Python.

08:56 So if you want to be part of that team and you want to play with cool stuff like that,

09:00 just visit python.advance.net and check it out.

09:03 So there's a couple of things coming up, Brian, that have to do with Python versus legacy Python.

09:10 Remember, Matthias from the IPython project, Matthias Bussonnier, I'm sorry if I mess up your name, but I think that's pretty close.

09:19 He was the original guy who got us talking Python versus legacy Python instead of Python 3 versus Python 2.

09:25 Oh, yeah, right.

09:26 Yeah.

09:26 So he works on IPython and Jupyter and all that stuff.

09:29 And he's back with a new blog post, which is my next item.

09:32 And it's a pretty big deal.

09:35 We just talked about Mark Cuban, the new NBA machine learning, IPython.

09:39 And so they just released IPython 6.

09:43 Okay.

09:44 So that's pretty big news.

09:45 That is big news.

09:46 Yeah.

09:47 And so people who use IPython, you know, there's a brand new version.

09:50 That's awesome.

09:51 The bigger thing is that this is the first release where IPython goes Python 3 only.

09:57 They've dropped support for Python 2.

09:59 That's great.

10:00 So, or as Matthias would say, they now support Python and not mixing in legacy Python with it.

10:05 Yeah.

10:05 And what I thought was nice is, you know, it's a pretty major project.

10:08 They did a little write-up of what was their experience of converting a mixed source code to Python 3 only.

10:15 What were the benefits and what were the drawbacks?

10:17 So let's see, a couple of things, a couple of stats that Matthias put out.

10:21 The size of the IPython code base has decreased by 1,500 lines.

10:26 That's pretty solid, right?

10:28 That's significant.

10:29 Less code means less maintenance.

10:30 Right.

10:31 They said it's not just because they're dropping Python 2, but it's a significant amount is.

10:35 And even more impressive is they added some entirely new features that required hundreds of new lines of code.

10:42 So they're really the decrease in amount of code they had to support for Python 3 or really for 2 they were able to get rid of when they went to Python 3 is actually probably more.

10:51 So that's pretty cool.

10:54 And they said one of the benefits, they think, is that contributors can spend less time worrying about, well, how does this work if we do it in Python 2?

11:01 Or, you know, this has happened to me.

11:03 You make a pull request.

11:04 You submit it.

11:05 It runs on the continuous integration.

11:07 And it works fine in Python 3.

11:09 But then it fails in Python 2 because, you know, you forgot the B in the string or whatever, right?

11:14 So they don't have to worry about that.

11:16 CI runs faster.

11:18 They said basically, in summary, we're totally happy.

11:21 We're entirely pleased with having switched to basically have the ability to write Python 3 only code.

11:26 And they're looking forward to using a lot of the improvements in Python 3, specifically async and await, which will be cool.

11:33 So an async and await REPL inside of IPython.

11:37 How cool is that?

11:37 That's neat.

11:38 Is async and await in all of the three versions or did that get introduced?

11:42 It came in 3.5.

11:43 Okay.

11:44 The asyncio stuff was introduced in Python 3.4, I think.

11:47 And then 3.5, they're like, let's put some proper syntax on this and make it really easy.

11:52 Yeah.

11:52 I'm trying to, I'm writing a little thing that I want to have available on Python 2 also and at least 2.7.

12:02 And even if I were to just do Python 3, all of the three versions, I still can't use f-strings, which I wish I could use f-strings.

12:10 I know.

12:11 They're so new.

12:11 It's 3.6 only.

12:12 Even on my production server, it's 3.5.

12:14 So it is what it is.

12:16 That's a move in the right direction.

12:17 And I think it's great that Matthias and others talked about their experience with that change.

12:22 Yeah.

12:23 That's awesome.

12:23 Yeah.

12:24 Thanks, Matthias.

12:24 Excellent.

12:25 I think I'm, to use an American expression, beating a dead horse.

12:28 But we have another.

12:29 Is that dead horse called Source?

12:31 SRC?

12:31 Yes.

12:32 Yes.

12:33 The other package I was talking about me building up, it's for the book.

12:38 But it's, I wanted to make sure I was representing the community correctly and how to, how to put together a distributable package and do it correctly.

12:47 At least with best practice.

12:48 I know there's not really a correct.

12:50 But somebody pointed me to the direction of an article by, I'm going to probably get this wrong.

12:57 I think it's Hynek.

12:58 And it's called Testing and Packaging.

13:00 And it's basically, he's the guy that did the attrs project, or A-T-T-R-S.

13:05 Great project.

13:07 That we've talked about a couple times.

13:08 And how there were issues, at least with one package that wasn't using the source, SRC, that the testing that was done was, there was a bug that showed up in, with installed applications that doesn't show up in non-installed.

13:27 So one of the benefits of using SRC is you can more easily make sure that you're only testing the installed package and not the non-installed.

13:37 And he also just shows that it's really just two lines of code change.

13:41 So to do the right thing is not that much work.

13:44 Right.

13:44 So basically in your setup.py, the call to setup, you set the packages to be looking in the source directory.

13:51 And you set the package to be in the source directory, right?

13:54 Yeah.

13:54 So when you would normally say find_packages, he recommends specifically saying find_packages and then give it where="src".

14:02 But you can also just put "src" in the, as the first argument.

14:07 And that works also.

14:07 And then listing it in the package_dir.

14:10 And then one of the things I noticed, which I don't think people have really talked about, is the entire repository looks better.

14:18 You've got all of the package junk, like your setup and your manifest and all that stuff at the top level.

14:24 And the stuff you really care about on a day-to-day basis is separated into subdirectories.

14:30 You've got the docs in one and the tests in another, and then your source in another.

14:34 And that separation just, it pleases my organization.

14:37 It just is nice.

14:39 Yeah.

14:39 I'm coming around to this as well.

14:40 It sounds pretty solid.

14:41 Anyway.

14:41 But that's, I'll probably try to drop talking about that every episode.

14:46 But there you go.

14:47 One more article.

14:48 Well, I'm not quite done beating the Python versus legacy Python horse yet.

14:52 So I'm going to keep going on that one.

14:53 Because there's some more big news.

14:55 We've heard that IPython went to Python 3 only.

14:57 And now, same week, last week, AWS Lambda goes to Python, or adds, not only, but adds.

15:05 Python 3.6 support. It was just 2.7.

15:08 So that's a big jump, right?

15:09 Wow, that's a big jump.

15:10 Yeah.

15:10 Yeah.

15:10 So that's pretty awesome.

15:12 And do you have much experience with Lambda?

15:14 Have you played with it?

15:15 No.

15:15 I've heard a lot about it, but I haven't played with it yet.

15:18 So Lambda is one of these things from AWS, from Amazon, that fits into this serverless architecture.

15:27 So basically, you say, here's a function.

15:29 And when something happens, run this, please.

15:33 So run it on a schedule.

15:34 Somebody changes a database.

15:36 Somebody uploads a file to S3, whatever.

15:39 And it just runs.

15:40 There's no servers that you deal with.

15:43 Obviously, there are servers.

15:43 But it just distributes your code to run when it needs to.

15:48 So, across a whole bunch of servers.

15:49 So it scales basically infinitely.

15:51 As long as you have infinite money, you can infinitely scale this.

15:55 It's fine, right?

15:56 And that's pretty cool.

15:58 Yeah.

15:58 So have you tried it?

16:00 No, I have not had a real reason to do it.

16:03 I mean, I guess there's a couple of things that I could do.

16:06 Like on the websites, there's a job that runs every couple hours that will completely re-index the database

16:14 and reorganize it for super fast queries.

16:17 Like the queries on the various websites run.

16:20 And I'm going to be adding to Python Bytes.

16:21 No worries.

16:22 They run like sub-millisecond, right?

16:25 In order to get that stuff, you've got to kind of pre-compute some things.

16:28 Maybe that's a perfect Lambda operation, right?

16:31 Well, I mean, especially now that they have 3.6 support, I'm intrigued enough that I might give it a shot anyway.

16:37 Just to make up some excuse to play with it.

16:41 Exactly.

16:42 We need to run this.

16:43 But I mean, if you're using other AWS stuff, like their database services, Dynamo or RDS or S3,

16:49 like here's a way to run code like really near your resources on triggers with no effort.

16:54 And one of the things I thought was pretty cool, like this announcement just came out.

16:59 And Zappa, so Zappa, if you look at their page, which I linked, it's called serverless Python web services.

17:07 That's interesting, right?

17:09 So basically you can set it up so that using the AWS architecture, you can route web requests to these Lambda functions.

17:17 But you don't really have servers or anything like that.

17:20 And people have been asking for Python 3 support.

17:25 They've been saying, no, no, no, no.

17:26 As soon as this dropped, they're like, yes, it has Python 3 support.

17:29 So that is pretty cool as well.

17:31 So you've seen things that basically are layered on top of Lambda also started to support Python 3, which is great.

17:38 Yeah, definitely.

17:38 Cool.

17:39 All right, yeah, maybe we should play with Lambda.

17:40 I don't know.

17:41 Yeah.

17:41 Very nice.

17:43 All right, well, that's it for the news, Brian.

17:44 You got anything personally you want to share with everyone?

17:46 No, I'm going to be, I guess I'll be in the Munich area the second week in May if there's anybody around that wants to have a beer or something with me.

17:57 Hit me up.

17:58 Yeah, that sounds awesome.

17:59 I'm jealous.

18:00 I'd love to go visit Germany.

18:01 Well, I'll do that at the end of the summer maybe.

18:03 We'll see.

18:04 But no news for me.

18:05 I just want to say thank you, everyone, for listening.

18:08 Oh, you know what?

18:09 Actually, one more thing.

18:09 This is not personal news, but it falls right in here.

18:12 I also saw CheckIO at CheckIO.org.

18:15 These guys have a pretty cool gamification of learning Python.

18:19 They also just went Python 3.

18:21 Oh, cool.

18:22 So just to keep on this.

18:24 Hey, Python 3 is starting to really roll.

18:26 I'd say it's really starting to roll this week.

18:27 I use CheckIO for, hopefully I won't get in trouble for this, but I've gone through a bunch of this stuff and I use them for interview questions.

18:35 Yeah, I think it's actually pretty good.

18:37 What I really like is you can solve a puzzle and then you can look at other people's solutions.

18:42 And I found after solving a bunch and looking at the solutions that I unknowingly have an implicit bias towards performance over ease of reading or simplicity or whatever.

18:53 It's interesting that it uncovered that for me.

18:56 Oh, that's interesting.

18:57 And I totally have the opposite.

18:58 I like them to be readable more than anything else.

19:01 Yeah.

19:01 Yeah.

19:01 Funny, huh?

19:02 All right.

19:03 Well, thanks, Brian.

19:04 Thank you, everyone, for listening.

19:05 And we'll catch you next week.

19:07 And thank you.

19:07 Yep.

19:08 You're welcome.

19:08 Bye.

19:09 Thank you for listening to Python Bytes.

19:11 Follow the show on Twitter via at Python Bytes.

19:14 That's Python Bytes as in B-Y-T-E-S.

19:17 And get the full show notes at pythonbytes.fm.

19:20 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

19:25 We're always on the lookout for sharing something cool.

19:27 On behalf of myself and Brian Okken, this is Michael Kennedy.

19:31 Thank you for listening and sharing this podcast with your friends and colleagues.

