

Transcript #59: Instagram disregards Python's GC (again)

Recorded on Thursday, Jan 4, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 59, recorded January 4th, 2018. I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:12 And we got a bunch of awesome stuff lined up for you in this very first episode of 2018.

00:17 So let's say thank you and happy new year to DigitalOcean.

00:22 Yeah, thanks. And definitely happy new year. It's exciting to be back.

00:25 It's very exciting to be back. And we, you know, the Python news doesn't stop coming. I think if

00:30 anything, it's just picking up speed. I'm afraid we might scare people a little bit with some of your

00:35 picks this time, Brian.

00:36 What?

00:37 The stuff near the end. The stuff near the end. So yeah.

00:42 Okay.

00:42 Another thing that's kind of scary is turning off garbage collection. Seems like that might be bad,

00:47 right?

00:47 Right. Well, I was actually surprised and very interested when I was listening to

00:50 the Instagram talk at PyCon about turning off garbage collection. And there's an article that

00:57 they put out again. They said that they had turned it off last year and then

01:03 they were having memory problems. So they wanted to try to turn it back on a little bit,

01:08 but they still have concerns.

01:10 Yeah. So maybe we should take a moment, just a step back and say, you described the original thing.

01:15 So why did they start down this path of turning off garbage collection in the first place? What

01:19 they found was they were running many instances of the largest Django deployment in

01:25 the world. So they're running lots of servers with this. And they found that the shared memory across

01:31 multiple processes running that on a single server was completely falling apart because garbage

01:37 collection was shifting stuff around. They said, well, could we turn it off? And it turned out that

01:41 they could, but then this article you're referring to says they basically were losing

01:45 those gains again.

01:46 And we'd talked about this, I guess, a couple of times: if you turn it off, then you

01:51 eventually will run out of memory. But if you're restarting tasks every once in a while, that completely

01:57 cleans it up.

01:58 Yeah, exactly.

01:58 They were losing some of those gains, so they wanted to get some of those back.

02:02 This is really interesting, and I had to read this article about three times,

02:06 but it's called "Copy-on-write friendly Python garbage collection." And it's a pretty interesting

02:13 story, but the punchline is that they've got a new addition to Python that's going into

02:20 Python 3.7, or it's already in there, called gc.freeze. What happens is they get

02:26 their main stuff running with all the shared objects. But before they fork off a bunch of worker processes,

02:33 they call this gc.freeze and all the stuff that's in memory right now at this point doesn't get

02:39 garbage collected, but everything from now, from like this point in time on will be garbage collected,

02:45 which is pretty interesting.
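
As a rough illustration of the pattern described here, the sketch below warms up some shared state, freezes it, and only then forks. It assumes Python 3.7 or later on a platform with os.fork; the helper names warm_up_and_freeze and fork_worker and the "routes" data are hypothetical, not anything from Instagram's code.

```python
import gc
import os


def warm_up_and_freeze():
    """Build the long-lived shared state, then freeze it (parent process)."""
    gc.disable()  # avoid collections in the parent while warming up
    # Hypothetical stand-in for "all the shared objects" an app loads at startup.
    shared = {"routes": [f"/page/{i}" for i in range(1000)]}
    gc.freeze()   # move everything tracked so far into the permanent generation
    return shared


def fork_worker(handler):
    """Fork a child that collects only objects created after the freeze."""
    pid = os.fork()
    if pid == 0:
        gc.enable()  # new garbage in the child is still collected
        try:
            handler()
        finally:
            os._exit(0)
    return pid
```

A parent process would call warm_up_and_freeze() once and then fork_worker(...) for each worker. gc.get_freeze_count() reports how many objects sit in the permanent generation, and gc.unfreeze() undoes the freeze.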

02:46 Yeah, that's really, it's really interesting. So Python memory management is a little,

02:52 I think it's a little obscure. People don't talk about it very much. And I don't think there's a lot

02:57 of good write ups. You actually found a really fantastic write up on the intricate details of Python

03:03 memory management. The short version is most things are cleaned up through reference counting. So number

03:09 of things pointing at it, when that goes to zero, it goes away. But the problem with reference counting

03:13 is cycles: I have one object pointing at another, that object points back at the first, they both have a

03:19 count of one or higher forever, and they get leaked. And so there's this secondary garbage collection

03:24 phase that goes through and looks at these items, cleans them up, and so on. So this gc.freeze says,

03:32 let's take all the stuff that exists now, and just tell the garbage collector to ignore it,

03:36 don't touch it, don't mess with it, leave it alone, right? And so you get like, basically your app into

03:41 its like normal working state, and then freeze it one time. And then all the new stuff that would make

03:46 the memory grow and grow and grow over time is going to be continually GCed. But the core essence of

03:52 your app, Python runtime, and a bunch of things to get started should be kind of fixed, right?
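
The reference-counting and cycle behavior described above can be seen in a few lines on CPython. This is a minimal sketch; the Node class and the weak-reference probe are illustrative names only.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # keep the automatic cycle collector out of the way for the demo

a = Node()
b = Node()
a.other = b   # a points at b
b.other = a   # b points back at a: a reference cycle

probe = weakref.ref(a)  # lets us observe the object without keeping it alive
del a, b                # drop our names; each object is still referenced by the other

alive_after_del = probe() is not None      # reference counting alone cannot free the cycle
gc.collect()                               # the secondary, cycle-detecting collector
alive_after_collect = probe() is not None  # now the cycle has been reclaimed

gc.enable()
```

With automatic collection left on, the same cleanup happens eventually on its own; disabling it here just makes the two stages visible.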

03:56 Yeah. And I think that's a pretty cool idea, because that's a common model for applications to

04:02 get connections up and get your normal like sitting state, idle state running. And then before you get

04:10 requests in and, and spawning stuff, just at that point, you're like, well, this is all the shared

04:15 stuff. Let's just, we don't need to move this stuff around. It's always going to be there.

04:20 Anyway, it's a cool idea. And apparently it saved them. They were at linear memory growth,

04:26 and they slowed that down quite a bit.

04:28 Yeah, it looks really, really interesting. Instagram is doing amazing stuff, I think,

04:32 in the Python space and the web space. And if any of those guys are out there listening

04:37 and they want to come talk about Python and Instagram on Talk Python, they're

04:42 more than welcome to come over. It'd be fun.

04:44 And I definitely appreciate that they're very open about this to say, hey, this is what we're trying.

04:49 It's not like perfect yet, but it's better.

04:51 Yeah, it's super cool. Do you know if gc.freeze is approved or just proposed for 3.7?

04:57 So we have a link to the pull request, and it looks like it's already in.

05:02 Oh, it is merged. Yes, it is merged. So this is pretty awesome, right? We have

05:06 CPython on GitHub with a pull request merged in with its comment history. That's new,

05:12 right? That's the 2017 bit of magic that it's on GitHub.

05:16 Yeah, cool. So nice that we can actually track that. So the next thing that I want to talk about

05:21 is a little bit different. I think this will be mostly of interest for data science folks. This is

05:27 a little bit lower level maybe than it sounds, but this thing's called SpeechPy. So SpeechPy,

05:33 it's a library for speech processing and recognition. So this is a pretty interesting Python project.

05:40 You can come along and basically give it some, you know, spoken words and it can pull out

05:46 various effects and things that are sort of the essence of what you need to do speech recognition.

05:54 You don't just feed it, say, a WAV file and out pops the text of

06:00 what was said, but it gives you what you would need to feed to a machine learning system.

06:04 It basically turns the spoken words into a representation you can feed to some kind of algorithm to actually

06:10 get the text. So I think that was pretty cool. And one of the things that I wanted to bring this up

06:16 for is they have a really nice citation statement. So if you look at the GitHub repo,

06:21 like kind of near the top, it says, if you're going to use this package, please cite it as follows.

06:28 And that's interesting because there's been some talk in the scientific space, more true science,

06:35 not data science around people want to publish their software and they want to work on advancing

06:40 software. But in the academic space, you have to publish articles or, you know, the whole publish or

06:45 perish type of thing. And the way you get credit for your work is to be cited

06:51 in other articles. And so this is sort of showing a way to cite this work, which is not a paper,

06:59 but which is an open source project in the same sense that the person, the people who created it

07:04 might get the same level of academic credit for their thing being cited. So I think that's pretty

07:10 cool. Yeah. I don't get the syntax, but it must mean something. I have no idea what it is.

07:15 Okay. I thought it's kind of neat. If you're doing machine learning, you need to turn

07:20 waveforms into something you can process. This is pretty cool. And the other thing that's kind of

07:25 nice is if you look at it here, and I think it's in the documentation or the tutorial, they actually

07:31 show you how to read WAV files with SciPy, which is also maybe cool and handy at some point.

07:36 Yeah. That's actually something I need; I need to be doing some WAV file processing.

07:40 Well, SciPy apparently has it.
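
For a sense of what "turning a waveform into something you can process" looks like, here is a small dependency-free sketch that uses the standard library's wave module rather than the SciPy reader mentioned in the episode (scipy.io.wavfile.read returns the sample rate and the sample data in one call). The 440 Hz test tone, the 8000 Hz rate, and the temporary file are assumptions made so the example is self-contained.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 8000  # samples per second (assumed for this demo)

# Synthesize 0.1 s of a 440 Hz tone as 16-bit integers so there is a file to read.
tone = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE // 10)
]

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)           # mono
    out.setsampwidth(2)           # 2 bytes = 16-bit samples
    out.setframerate(SAMPLE_RATE)
    out.writeframes(struct.pack(f"<{len(tone)}h", *tone))

# Read it back: this is the "waveform into numbers" step.
with wave.open(path, "rb") as wav:
    rate = wav.getframerate()
    n_frames = wav.getnframes()
    raw = wav.readframes(n_frames)

samples = struct.unpack(f"<{n_frames}h", raw)  # tuple of signed 16-bit ints
os.remove(path)
```

The samples tuple is the kind of raw signal a feature-extraction library would then turn into a representation for a learning algorithm.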

07:42 Nice. How about the next one?

07:46 Well, next up, we've got our friends at PyBites. Is that what they're called? PyBites.

07:52 Yeah. PyBites. That's right.

07:53 They've got a new platform and I suddenly forgot the URL, but there it is. Code challenges,

08:01 but the "es" is after the dot. So codechalleng.es. Clever, though. But we've covered other things

08:11 before. I should have looked this up. There's a game one that you're going through a game and doing

08:17 code challenges and there's code katas around. This is a similar sort of thing. So you are able to

08:23 do these little code challenges. They're called Bites of Py, and

08:30 they're self-contained 20 to 60 minute code challenges. And you can write them and verify

08:36 them in the browser. And I had, I did two of them this morning and I had kind of a lot of fun with it.

08:41 It was fun.

08:42 Nice. And you verify them by writing pytest unit tests, right?

08:45 You don't write it. It has pre-written pytest code that checks your answers.

08:51 I see. So you've got to do some sort of thing and then you check it in and it runs basically the

08:55 test against your code and says thumbs up, thumbs down.

08:57 Yeah. Like for instance, on the second challenge, you have to write three different functions to

09:01 manipulate a list of names. And it has tests for all of these. I went ahead and just solved one at a

09:09 time, for instance. So I tried to solve the first one and then ran the tests and noticed that

09:13 the first one passed, and then just kept going like that. And the test output

09:19 helped me solve the rest of them.

09:22 That's really cool. And I also learned something by the transitive property through you.

09:25 You did?

09:26 I did. I learned what you learned: that min takes a key, like sort and sorted do. That way you could

09:33 compare complex objects based on an attribute of them.

09:36 I didn't know that. I had just discovered that this morning. So one of the

09:42 challenges is to find the name with the shortest first name. And I went ahead and

09:48 sorted the list by the length of the first name and then just pick the first element. Their solution

09:54 uses min instead of sorting the list. You can just find the min length, which is pretty cool.
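
The two approaches compared here look roughly like this. The list of names is made up for illustration and is not the data from the actual challenge.

```python
# Hypothetical sample data; each entry is "First Last".
names = ["Guido van Rossum", "Ada Lovelace", "Barbara Liskov", "Tim Peters"]


def first_name_length(full_name):
    """Length of the first whitespace-separated word in the name."""
    return len(full_name.split()[0])


# Approach 1: sort the whole list by first-name length, then take the first item.
shortest_by_sort = sorted(names, key=first_name_length)[0]

# Approach 2: let min do a single pass with the same key function.
shortest_by_min = min(names, key=first_name_length)
```

Both pick the same entry, and both return the earliest one when there is a tie, but min only needs one pass over the list instead of a full sort.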

10:02 Yeah. That's really awesome. That's, that's gotta be quicker than a full on sort.

10:05 One of the things I like about these sorts of quick challenges is you can probably do them

10:09 like on your lunch break or a couple of lunch breaks to do one of them. And they just take a

10:15 browser. So you could just do it on your laptop. It's pretty fun.

10:17 Yep. That's cool. You could maybe even do it on an iPad or something if you really wanted.

10:21 Yeah. Well, I don't know. I haven't tried that probably.

10:23 If it runs in the browser, I bet it would. Nice. So yeah, that's, that's really cool.

10:27 I do like that you learn these little things like, wait, min takes a key. I didn't know that.

10:32 You know, that's just, you wouldn't think you'd pick up these little things so quickly, but

10:35 you know, these little challenges are nice like that. So before we get to the next item,

10:39 I want to say thank you to DigitalOcean. They're sponsoring this episode and many,

10:43 many other episodes. They're really a big supporter of Python Bytes. So as many of you know,

10:48 many of our bits of code and stuff on the web and our MP3 files that get sent down to you all go

10:56 through DigitalOcean. So Python Bytes is basically delivered in all of its forms to you through DigitalOcean.

11:02 We have a bunch of servers there. They're super easy to work with, very quick, very reliable. You can create a

11:07 new server, a new droplet, they call it, in probably 30 seconds. And then you SSH in and you're off to the

11:13 races. So really, really nice and affordable. Check them out at do.co/python and let them

11:19 know that you heard about it on Python Bytes. So this end of the year thing, Brian, this is kind of

11:25 when, I mean, we're sort of on the other side of it, but this is when you get together with your family,

11:29 right? People, maybe you didn't even know like, wait, I have a second cousin from wire. Python's like

11:36 that, right? Yeah. Yeah. You were talking about like, what is the place where you can like do sort

11:41 of gamified code challenges, and that's CheckiO. So the reason that's relevant, I'm coming back to it,

11:46 is there's an article by the guys at CheckiO called "How big is the Python family?" So this is really nice.

11:52 And you know, some of you I'm sure are aware of it, but many people I don't really think are aware of

11:57 how varied Python is as a platform. So when you say Python, typically you mean

12:04 CPython, hopefully you mean modern Python 3.6, not legacy 2.7 Python, but we'll

12:12 let that slide for now. There's also things like Jython, and Jython will let you write Python code,

12:19 but execute it on the JVM and interact with Java objects. IronPython is the same thing for .NET.

12:26 There's also Python for .NET, which I think is a more up to date, modern variant on the same thing.

12:33 There's Cython, which is a compiled, slightly different Python. There's PyPy, which is a JIT version.

12:38 MicroPython, which is Python where your app is the operating system, Python on microchips basically.

12:44 And on Talk Python, you and I talked about Grumpy, right?

12:49 Yeah. Which is on Go.

12:51 Yeah. So Grumpy is from the YouTube guys, which is instead of using C to implement CPython,

12:57 they said, well, what if we wrote the same thing, but in Go? And that's kind of an interesting thing.

13:01 So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of

13:07 them. You know, if you bring in people onto your team and you're like, well, wait a minute,

13:11 there's actually a lot of types of Python here, check this out. Right. And also maybe a reminder to like,

13:16 give PyPy a try. Like they just had a big release for both Python 2 and Python 3 versions.

13:21 One of the things I like about this writeup that they did is it reminds you why some of these are

13:26 around. Like if you had to work with .NET, then working with IronPython or Python for .NET might be

13:34 like a better thing than just trying to do it other ways.

13:38 Yeah. And one of the advantages there might be, you know, if you're working on a .NET app, but you want to add scripting.

13:43 Yeah.

13:44 Like what are your choices? You probably don't want to give them C#. And even if you did,

13:47 like it requires full on compilation and like, you know, how do you deal with that? Right.

13:51 So this could be a really nice way to plug in like scriptability into your enterprise app,

13:56 which would be pretty cool.

13:57 And one more thing I wanted to throw in on this conversation is a lot of times I'll say Python

14:02 runtime. And I know often people say Python interpreter. This is what the Python interpreter

14:08 does. It does this and that. Well, if you look at the whole Python family, only some of them

14:13 are interpreters. Some of them are compiled execution engines, right? Like the JVM. That's actually not a

14:21 great example, but say PyPy, for example, or Cython, those two definitely are not interpreted in

14:29 the traditional sense. PyPy starts out that way, but it converts to a JIT version for the hotspots.

14:34 I often say Python runtime because I kind of feel like, you know, when you say interpreter, you've really

14:40 just got the mindset of CPython, which is the most popular, but not always. What do you say?

14:45 Say interpreter?

14:46 I don't usually say either. I just say Python.

14:49 Yeah, there you go.

14:51 Cool. So anyway, I think this is a nice write up and good to have it all in one place.
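
If you are curious which member of the family is running your own code, the standard library can tell you. A minimal sketch:

```python
import platform
import sys

# Human-readable implementation name, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

# Lowercase identifier the runtime reports about itself, e.g. "cpython" or "pypy".
internal_name = sys.implementation.name

# Language version implemented by this runtime, e.g. "3.6.4".
version = platform.python_version()

print(f"{implementation} ({internal_name}) implementing Python {version}")
```

Libraries that need to behave differently on, say, PyPy and CPython typically branch on one of these values.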

14:55 So I like the one that you have coming up next. One of the problems I often see is I want to do

15:01 some work, but I don't care if it happens right now. I just want to like start it and let it go

15:05 somewhere. I don't usually have a great answer for that.

15:07 Task processing stuff. And one of the common things people often bring up is Celery.

15:12 And to be honest, I've tried to get into Celery a couple of times, but the learning curve on it,

15:18 maybe it's just me, but I had a little bit of trouble getting into it.

15:22 I was interested when I heard an interview on Podcast.__init__ about a library called Dramatiq, or

15:28 "dramatic," I'm not sure. It's D-R-A-M-A-T-I-Q. Yeah.

15:33 I'm sure, since it's task scheduling, it has quite complicated internals.

15:40 But you just declare an actor on some code, and it's pretty easy to get started.

15:46 I thought I'd point people to it.

15:48 Yeah, it's quite cool. You basically put a decorator onto a function and then that function,

15:53 instead of running locally, you can send work to it. And that send actually kicks it off on,

15:59 in the example they had, RabbitMQ, I think. And there's a producer of the work.

16:04 And then there's another process that just hangs out and consumes anything that lands on the queue.

16:08 It's pretty cool.
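
To make the producer/consumer idea concrete without a message broker, here is a toy, standard-library-only sketch of the pattern just described. It is not Dramatiq's implementation or API: the actor decorator, the Actor class, and the in-process queue are hypothetical stand-ins, and a real setup would put RabbitMQ or Redis between separate processes.

```python
import queue
import threading

_task_queue = queue.Queue()  # stands in for the broker in this toy version


class Actor:
    """Wraps a function so work can be queued instead of run immediately."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)  # a direct call still runs locally

    def send(self, *args, **kwargs):
        _task_queue.put((self.fn, args, kwargs))  # producer side: enqueue and return


def actor(fn):
    return Actor(fn)


def worker():
    """Consumer side: hang out and run whatever lands on the queue."""
    while True:
        item = _task_queue.get()
        if item is None:          # sentinel: shut down
            _task_queue.task_done()
            break
        fn, args, kwargs = item
        try:
            fn(*args, **kwargs)
        finally:
            _task_queue.task_done()


results = []


@actor
def count_words(text):
    results.append(len(text.split()))


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

count_words.send("hello from the producer side")  # returns immediately
_task_queue.join()       # wait until the queued work has been processed
_task_queue.put(None)    # ask the worker to stop
consumer.join()
```

The point of the shape is that the caller of send never waits on the work itself; it only hands a message to the queue.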

16:09 Yeah. It defaults to RabbitMQ, I think. And there are

16:15 good defaults that work right off the bat if you don't care. And then

16:21 you can configure it to use other things if you need to. Apparently, according to the person

16:27 that developed this, I forget his name, it's used on quite significant projects. I mean,

16:34 it isn't a toy project, but it's pretty easy to get started and you can configure it to be all sorts

16:40 of fancy stuff if you need it to be. But one of the things I liked about the conversation is he

16:45 brought up that he intentionally kept the documentation fairly terse and small so

16:53 that when you're looking for something that you think you saw before, it's pretty easy to find

16:57 again. So that's cool.

16:58 Okay. Yeah. That's an interesting point. Yeah. And it looks like you can run it on top of RabbitMQ or

17:04 Redis. Take your pick. One final thing I want to point out that I thought was interesting is it's licensed

17:09 under AGPL, but it also has commercial licenses available upon request, which, you know, people are

17:17 always looking for ways to fund their open source work. And I thought that was an

17:22 interesting variation that I saw going through it.

17:24 Really? Okay. So I didn't pay attention to that. So I'm not sure what the AGPL is.

17:29 Yeah. I actually don't know either, but apparently you might want a commercial license instead.

17:33 Okay. So the last one I want to talk about is a little bit similar to what you're talking about

17:39 running async work, but it's sort of the challenge of taking advantage of async things, but not making

17:49 that a problem for people trying to consume it who don't want to think of things that way. So this

17:54 article is called "Controlling Python Async Creep," from friend of the show Cristian Medina. And he says,

18:02 basically, if you've got some library that is written in an async way, you're supposed to await

18:07 it. But anybody who's going to call that and take advantage of that, that caller has to also be async.

18:14 And then the caller of that has to be async. So maybe way, way down somewhere, you're trying to do something

18:18 async and it creates this sort of chain reaction of, well, the callers of this have to be async. Well,

18:23 the callers of those things have to be async, and so on. It can become quite a problem.

18:28 So he wrote this nice article, basically going through three examples of where you can sort of

18:34 put a stopgap and say, okay, like at this level, we're no longer worried about async, but we're still

18:39 taking advantage of it internally. So one way you can do that is you can wait for blocks of async code.

18:45 So if you've got to contact, you know, a database, two web services, read something from the file system,

18:51 and you want to do that sort of asynchronously, you could create those pieces of work, but then wait on them

18:56 as a group. And there are some built-in ways in asyncio to do that, which is really cool.

18:59 It's got some nice examples on that. Another way is you could just use a thread and then let that thread's main

19:05 bit of work be the async thing, but you don't have to deal with it. And the most interesting, I think,

19:10 is mixing async and synchronous calls. And what he does is he actually detects, by looking at the

19:18 traceback, I think, whether the caller is calling it as an async function or as a regular

19:26 function and implements async behavior or synchronous behavior accordingly. So you could write

19:33 a single library. And if somebody in Python 3.6 wants to use it in a fancy async way, it becomes

19:38 magically async. But if somebody from 2.7 calls it or something like that, an older version, or they just

19:44 don't call it in this async way. It just magically is a synchronous call and doesn't use that whole

19:48 stuff.
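
The first approach, waiting on a block of async work as a group behind a plain synchronous function, might look something like the sketch below. It assumes Python 3.7 or later for asyncio.run; the fetch coroutine and its sleep-based delays are placeholders for real database, web service, and file system calls, not code from the article.

```python
import asyncio


async def fetch(name, delay):
    """Placeholder for a real async call (database, web service, file read)."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def _gather_all():
    # Start all three pieces of work and wait on them as a group.
    return await asyncio.gather(
        fetch("database", 0.02),
        fetch("web service", 0.01),
        fetch("file system", 0.0),
    )


def load_everything():
    """Synchronous entry point: callers never see async or await."""
    return asyncio.run(_gather_all())


results = load_everything()
```

The async-ness stops at load_everything. One caveat of this sketch is that asyncio.run raises an error if it is called from code that is already running inside an event loop, which is the case the detection trick discussed above is meant to handle.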

19:48 Okay.

19:49 This is really an interesting way to make it possible to bring async into your package or your libraries

19:54 without having the consumer of your libraries have to care about the fact that it's async. But still

20:01 make it into something they could take advantage of.

20:03 Wow, that's great. I'm going to have to read this. This reminds me of the, I guess, the learning

20:08 hurdle that people go through in the C++, C and C++ world when you go from single-threaded

20:14 applications to multi-threaded applications. You have to look in all the corners.

20:17 Yeah. It's definitely a mind shift. Yeah. This is very much like that.

20:21 Okay.

20:21 But yeah, Cristian did a great job on this. And I really like his solution at the end. And actually,

20:26 he has it done in if statements. I feel like you could create a decorator that would

20:30 basically wrap that up in just like a magic, like a syncable or awaitable decorator. It's

20:36 really, really close to having some sort of decorator magic, making this even better.
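
One way such a decorator could look is sketched below. This is an assumption-laden illustration, not the article's code: the name syncable is hypothetical, and instead of inspecting the traceback it checks for a running event loop with asyncio.get_running_loop (Python 3.7+), which is a different detection technique from the one described in the episode.

```python
import asyncio
import functools


def syncable(coro_fn):
    """Let one coroutine function serve both async and synchronous callers."""

    @functools.wraps(coro_fn)
    def wrapper(*args, **kwargs):
        coro = coro_fn(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running: the caller is plain synchronous code,
            # so run the coroutine to completion and return its result.
            return asyncio.run(coro)
        # A loop is running: hand back the coroutine for the caller to await.
        return coro

    return wrapper


@syncable
async def double(x):
    await asyncio.sleep(0)  # stand-in for real async work
    return x * 2


sync_result = double(21)  # called like a regular function


async def main():
    return await double(21)  # awaited from async code


async_result = asyncio.run(main())
```

A limitation worth noting: a synchronous function that happens to be called from inside a running loop would get a coroutine back rather than a value, so this sketch only covers the two straightforward cases.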

20:39 Yeah. Okay. Cool.

20:41 All right. Well, that's all our news for the week, except for that it's not.

20:43 Well, yeah.

20:45 We have an extra one. Really quick, I just want to let people know that the PyTennessee

20:49 conference in Nashville is coming up about a month from now. So if you are in the Nashville

20:55 area or willing to travel there, February 10th and 11th, they've got their schedule out,

21:01 the tickets are on sale and things like that. And they even made a special discount code for

21:06 Python Bytes. If we said, are you going to tell us about it? Then definitely,

21:10 here's the code. So if you want to go to PyTennessee, you can use the discount code

21:15 PythonBytes, no spaces, capital P, capital B, and you get 10% off.

21:20 Cool.

21:20 Yeah. Very cool.

21:22 You have some pretty interesting news. It's not directly Python related, but it's very much

21:26 affects all of us.

21:28 Yeah.

21:28 Right. Code on servers, especially in the cloud.

21:30 I thought I'd, I don't know what to do about this, but I saw it this morning. I thought we just,

21:34 it's important enough to not ignore it. So I thought I'd drop a link.

21:38 What do you think? Like unplug all of the internet, just go hide in a corner or something like that?

21:42 It's like one of those things like having the credit services get hacked. You just,

21:47 I guess, be aware of it and pay attention. It's very much like the Experian. What was that credit

21:52 service?

21:52 Equifax maybe?

21:53 Equifax. I'm not going to say it because I don't want to say the wrong one, but the E credit agency,

21:59 I'm totally, for some reason, forgetting. I think you're right. But yeah, like basically you're told

22:04 your world is crashing down. We're sorry. And this is kind of like that. You

22:10 quote a couple of articles. Let me read what they said in the New York Times here. It said that

22:15 basically there are two problems, called Meltdown and Spectre, that could allow hackers to steal the entire

22:20 memory contents of computers, including mobile devices, personal computers, and servers running

22:25 in so-called cloud computer networks. There's no easy fix for Spectre, which could require a redesign

22:30 of the processors, according to researchers. As for Meltdown, the software patch needed to fix the

22:36 issue could slow down computers by as much as 30%. So, you know, your AWS, DigitalOcean, whatever,

22:43 server may just get 30% slower now. Wonderful.

22:46 Yeah. So, most of the places, I think Google, Amazon, and Microsoft have all said that the servers

22:54 are pretty much changed to deal with Meltdown, but Spectre's still a problem.

23:00 I don't think there's a ton of concrete details here, at least not that I ran across. It's sort of vague.

23:07 Apparently, not all the details about the exploit are out, but I'd recommend people check out Risky.biz,

23:15 which is my favorite developer security podcast. It's super, super good. And those guys are going to

23:21 definitely have an insightful conversation on this next time they're on deck.

23:26 In case we were too vague about it, it was a design flaw found in nearly all modern microprocessors that could allow

23:33 attackers to read the entire memory of a computer. Yeah. Bummer.

23:37 I hope you don't do anything on the internet. Carry on now. Okay. So, yeah. So, the last thing,

23:45 this is a more positive thing, I think, at least. I just announced my courses,

23:50 not all of them, actually, only a few of them for 2018, but I announced this new deal that I'm

23:56 having for all the Talk Python courses called the Everything Bundle. So, talkpython.fm/everything.

24:01 And it gets you what'll be probably 120 hours of Python course awesomeness, including

24:08 some new ones: Mastering PyCharm; Python 3, an Illustrated Tour; Introduction to Ansible;

24:13 and tons more coming. So, I was just finishing some of the videos for the PyCharm course right before

24:19 we chatted. So, it's almost done. Cool. So, is that going to be out this month then or soon?

24:23 That is going to be out probably next week. Okay. Cool. Yeah. Definitely soon. Definitely soon.

24:28 It's so fun to create these courses and just, you know, keep exploring the different areas and helping

24:33 people get better with them. So, lots of fun. Yeah. And you do things like working with companies if

24:38 they want to, like, get access to these for, like, everybody that works there or a handful of people.

24:43 I definitely have special programs for, like, site licenses, things like that. I've even talked to

24:48 some universities about having the courses for, like, all of their students or something like that.

24:53 That would be wild. Still talking.

24:56 You'll have to increase the price for them, I guess. Maybe.

24:59 I guess. But they're students, you know.

25:01 Cool.

25:03 All right. Cool. Well, Brian, thanks for sharing all your news.

25:05 Yeah. Thank you.

25:07 Nice to be back together after the whole holiday time off.

25:11 Yes.

25:11 All right. Catch you later.

25:12 Thank you for listening to Python Bytes.

25:15 Follow the show on Twitter via @pythonbytes.

25:18 That's Python Bytes as in B-Y-T-E-S.

25:20 And get the full show notes at pythonbytes.fm.

25:24 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

25:28 We're always on the lookout for sharing something cool.

25:31 On behalf of myself and Brian Okken, this is Michael Kennedy.

25:34 Thank you for listening and sharing this podcast with your friends and colleagues.
