Transcript #59: Instagram disregards Python's GC (again)
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds. This is episode 59, recorded January 4th, 2018. I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:12 And we got a bunch of awesome stuff lined up for you in this very first episode of 2018.
00:17 So let's say thank you and happy new year to DigitalOcean.
00:22 Yeah, thanks. And definitely happy new year. It's exciting to be back.
00:25 It's very exciting to be back. And we, you know, the Python news doesn't stop coming. I think if
00:30 anything, it's just picking up speed. I'm afraid we might scare people a little bit with some of your
00:35 picks this time, Brian.
00:36 What?
00:37 The stuff near the end. The stuff near the end. So yeah.
00:42 Okay.
00:42 Another thing that's kind of scary is turning off garbage collection. Seems like that might be bad,
00:47 right?
00:47 Right. Well, I was actually surprised and very interested when I was listening to
00:50 the Instagram talk at PyCon about turning off garbage collection. And there's a new article
00:57 they put out. They had turned it off last year, and then they were having memory problems.
01:03 So they wanted to try to turn it back on a little bit,
01:08 but they still have concerns.
01:10 Yeah. So maybe we should take a moment, just step back, and describe the original thing.
01:15 Why did they start down this path of turning off garbage collection in the first place? What
01:19 they found was they were running many instances of the largest Django deployment on Python in
01:25 the world. So they're running lots of servers with this. And they found that the shared memory across
01:31 multiple processes running on a single server was completely falling apart because garbage
01:37 collection was shifting stuff around. They said, well, could we turn it off? And it turned out that
01:41 they could, but then this article you're referring to says they were basically losing
01:45 those gains again.
01:46 And we'd talked about this, I guess, a couple of times: if you turn it off, then you'll
01:51 eventually run out of memory. But if you're restarting processes every once in a while, that completely
01:57 cleans it up.
01:58 Yeah, exactly.
01:58 They were losing some of those gains, so they wanted to get some of those back.
02:02 This is really interesting, and I had to read this article about three times,
02:06 but it's called "Copy-on-write friendly Python garbage collection." And it's a pretty interesting
02:13 story, but the punchline is that they've got a new addition to Python that's going into
02:20 Python 3.7, or it's already in there, called gc.freeze. What happens is they get
02:26 their main stuff running with all the shared objects. But before they fork off a bunch of worker processes,
02:33 they call this gc.freeze, and all the stuff that's in memory at this point doesn't get
02:39 garbage collected, but everything from this point in time on will be garbage collected,
02:45 which is pretty interesting.
02:46 Yeah, it's really interesting. So Python memory management is, I think, a little
02:52 obscure. People don't talk about it very much, and I don't think there are a lot
02:57 of good write-ups. You actually found a really fantastic write-up on the intricate details of Python
03:03 memory management. The short version is most things are cleaned up through reference counting: the number
03:09 of things pointing at an object, and when that goes to zero, it goes away. But the problem with reference counting
03:13 is cycles: I have one object pointing at another, that object points back at the first, they both have a
03:19 count of one or higher forever, and they get leaked. And so there's this secondary garbage collection
03:24 phase that goes through and looks at these items, cleans them up, and so on. So this gc.freeze says,
03:32 let's take all the stuff that exists now, and just tell the garbage collector to ignore it,
03:36 don't touch it, don't mess with it, leave it alone, right? And so you basically get your app into
03:41 its normal working state, and then freeze it one time. And then all the new stuff that would make
03:46 the memory grow and grow and grow over time is going to be continually GCed. But the core essence of
03:52 your app, Python runtime, and a bunch of things to get started should be kind of fixed, right?
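To make that concrete, here is a tiny sketch of the two mechanisms just described, reference counting plus the cyclic collector; the Node class is purely illustrative:

    import gc
    import sys

    class Node:
        def __init__(self, name):
            self.name = name
            self.other = None

    # Reference counting: most objects die as soon as the last reference goes away.
    a = Node("a")
    print(sys.getrefcount(a))      # includes the temporary reference made by the call itself

    # Build a cycle: each object keeps the other's count above zero forever.
    b = Node("b")
    a.other, b.other = b, a
    del a, b                       # now unreachable, but the counts never hit zero

    # The secondary, cyclic garbage collector is what finally finds and frees them.
    print(gc.collect(), "unreachable objects found")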
03:56 Yeah. And I think that's a pretty cool idea, because that's a common model for applications:
04:02 get connections up and get your normal idle state running. And then before you get
04:10 requests in and start spawning stuff, at that point you say, well, this is all the shared
04:15 stuff. We don't need to move it around. It's always going to be there.
04:20 Anyway, it's a cool idea. And apparently it saved them. They were at linear memory growth,
04:26 and they slowed that down quite a bit.
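And for the gc.freeze() part, here is a minimal sketch of the pre-fork pattern being described, assuming a Unix-style fork-based worker model; warm_up() and the fake request loop are hypothetical placeholders, while gc.freeze() itself is the real function that landed in Python 3.7's gc module:

    import gc
    import os

    def warm_up():
        # Hypothetical: import modules, load config, build the shared objects
        # that every worker will read but rarely modify.
        return {"config": {"debug": False}, "routes": ["/", "/health"]}

    shared_state = warm_up()

    # Python 3.7+: move everything the GC currently tracks into a "permanent
    # generation" that future collections ignore.  Since the collector never
    # touches those objects again, it won't dirty their memory pages after
    # fork(), so copy-on-write sharing between workers stays intact.
    gc.freeze()

    for _ in range(4):                 # pretend we're a simple pre-fork server (Unix only)
        if os.fork() == 0:             # child worker process
            # Objects allocated from here on are garbage collected as usual;
            # only the frozen, pre-fork objects are left alone.
            print(os.getpid(), "serving with", len(shared_state["routes"]), "routes")
            os._exit(0)

    for _ in range(4):                 # parent: reap the children so the example exits cleanly
        os.wait()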
04:28 Yeah, it looks really, really interesting. Instagram is doing amazing stuff, I think,
04:32 in the Python space and the web space. And if any of those guys are out there listening
04:37 and want to come talk about Python and Instagram on Talk Python, they're
04:42 more than welcome to come over. It'd be fun.
04:44 And I definitely appreciate that they're very open about this to say, hey, this is what we're trying.
04:49 It's not like perfect yet, but it's better.
04:51 Yeah, it's super cool. Do you know if gc.freeze is approved or just proposed for 3.7?
04:57 So we have a link to the pull request, and it looks like it's already in.
05:02 Oh, it is merged. Yes, it is merged. So this is pretty awesome, right? We have
05:06 CPython on GitHub with a pull request merged in with its comment history. That's new,
05:12 right? That's the 2017 bit of magic, that it's on GitHub.
05:16 Yeah, cool. So nice that we can actually track that. So the next thing that I want to talk about
05:21 is a little bit different. I think this will be mostly of interest for data science folks. This is
05:27 a little bit lower level maybe than it sounds, but this thing's called SpeechPy. So SpeechPy,
05:33 it's a library for speech processing and recognition. So this is a pretty interesting Python project.
05:40 You can come along and basically give it some, you know, spoken words, and it can pull out
05:46 various features and things that are sort of the essence of what you need to do speech recognition.
05:54 I think this works a little differently: you don't just feed it, say, a wave file and out pops text of
06:00 what it said, but it gives you what you would need to feed to a machine learning system.
06:04 It basically takes the spoken words into a representation you can feed to some kind of algorithm to actually
06:10 get the text. So I think that was pretty cool. And one of the things I wanted to bring this up
06:16 for is they have a really nice citation statement. So if you look at the GitHub repo,
06:21 like kind of near the top, it says, if you're going to use this package, please cite it as follows.
06:28 And that's interesting because there's been some talk in the scientific space, more true science,
06:35 not data science, around people wanting to publish their software and work on advancing
06:40 software. But in the academic space, you have to publish articles or, you know, the whole publish or
06:45 perish type of thing. And the way you get credit for your work is to be cited.
06:51 in other articles. And so this is sort of showing a way to cite this work, which is not a paper
06:59 but an open source project, in the same sense that the people who created it
07:04 might get the same level of academic credit for their thing being cited. So I think that's pretty
07:10 cool. Yeah. I don't get the syntax, but it must mean something. I have no idea what it is.
07:15 Okay. I thought it was kind of neat. If you're doing machine learning, you need to turn
07:20 waveforms into something you can process. This is pretty cool. And the other thing that's kind of
07:25 nice is if you look at it here, and I think it's in the documentation or the tutorial, they actually
07:31 show you how to process wave files with SciPy, which is also maybe cool and handy at some point.
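As a rough illustration of that workflow, here is a minimal sketch, assuming an input file called example.wav and using SpeechPy's feature and processing modules the way its README describes; exact argument names may differ between versions:

    import scipy.io.wavfile as wav
    import speechpy

    # Read a WAV file; fs is the sample rate, signal the raw samples.
    # "example.wav" is just a placeholder path for this sketch.
    fs, signal = wav.read("example.wav")
    if signal.ndim > 1:
        signal = signal.mean(axis=1)   # crude stereo-to-mono fold, if needed

    # Extract MFCC features: one row of cepstral coefficients per ~20 ms frame.
    # This matrix, not the raw audio, is what you'd hand to a learning algorithm.
    mfcc = speechpy.feature.mfcc(signal, sampling_frequency=fs,
                                 frame_length=0.020, frame_stride=0.01,
                                 num_cepstral=13)

    # Normalize each coefficient across frames, a common preprocessing step.
    mfcc_cmvn = speechpy.processing.cmvn(mfcc, variance_normalization=True)

    print("frames x coefficients:", mfcc_cmvn.shape)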
07:36 Yeah. Actually, I need to do some wave file processing.
07:40 Well, SciPy apparently has it.
07:42 Nice. How about the next one?
07:46 Well, next up, we've got our friends at PyBites. Is that what they're called? PyBites.
07:52 Yeah. PyBites. That's right.
07:53 They've got a new platform, and I suddenly forgot the URL, but there it is. Code challenges,
08:01 but the "es" is after the dot, so it's codechalleng.es. Clever though. But we've covered other things
08:11 before. I should have looked this up. There's a game one where you're going through a game and doing
08:17 code challenges, and there are code katas around. This is a similar sort of thing. So you're able to
08:23 do these little code challenges. They're called Bites of Py, and
08:30 they're self-contained 20 to 60 minute code challenges. And you can write them and verify
08:36 them in the browser. I did two of them this morning and had a lot of fun with it.
08:41 It was fun.
08:45 Nice. And you verify them by writing pytest unit tests, right?
08:45 You don't write it. It has pre-written pytest code that checks your answers.
08:51 I see. So you've got to do some sort of thing and then you check it in and it runs basically the
08:55 test against your code and says thumbs up, thumbs down.
08:57 Yeah. For instance, on the second challenge, you have to write three different functions to
09:01 manipulate a list of names, and it has tests for all of these. I went ahead and just solved one at a
09:09 time. So I solved the first one, then ran the tests and noticed that
09:13 the first one passed, and then just kept going. The test output
09:19 helped me solve the rest of them.
09:22 That's really cool. And I also learned something by the transitive property, through you.
09:25 You did?
09:26 I did. I learned what you learned: that min takes a key, like sort and sorted do. That way you could
09:33 sort some complex objects based on an attribute of them.
09:36 I didn't know that. I had just discovered it this morning. My solution for one of the
09:42 challenges was to find the name with the shortest first name. And I went ahead and
09:48 sorted the list by the length of the first name and then just picked the first element. Their solution
09:54 uses min instead of sorting the list. You can just find the one with the min length, which is pretty cool.
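A tiny sketch of that idea, with a made-up shortest_first_name function and a pytest check in the spirit of the platform's pre-written tests; the real Bite uses different names and data:

    # test_bite_example.py -- hypothetical data, not the actual exercise.
    # Run with: pytest test_bite_example.py

    NAMES = ["Julian Sequeira", "Bob Belderbos", "Al Sweigart", "Tim Peters"]

    def shortest_first_name(names):
        """Return the shortest first name from a list of 'First Last' strings."""
        first_names = [name.split()[0] for name in names]
        # min() accepts a key function, just like sorted(), so there's no need
        # to sort the whole list only to take its first element.
        return min(first_names, key=len)

    def test_shortest_first_name():
        assert shortest_first_name(NAMES) == "Al"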
10:02 Yeah. That's really awesome. That's got to be quicker than a full-on sort.
10:05 One of the things I like about these sorts of quick challenges is you can probably do them
10:09 like on your lunch break or a couple of lunch breaks to do one of them. And they just take a
10:15 browser. So you could just do it on your laptop. It's pretty fun.
10:17 Yep. That's cool. You could maybe even do it on an iPad or something if you really wanted.
10:21 Yeah. Well, I don't know. I haven't tried that, but probably.
10:23 If it runs in the browser, I bet it would. Nice. So yeah, that's, that's really cool.
10:27 I do like that you learn these little things like, wait, min takes a key. I didn't know that.
10:32 You know, that's just, you wouldn't think you'd pick up these little things so quickly, but
10:35 you know, these little challenges are nice like that. So before we get to the next item,
10:39 I want to say thank you to DigitalOcean. They're sponsoring this episode and many,
10:43 many other episodes. They're really a big supporter of Python Bytes. So as many of you know,
10:48 many of our bits of code, our stuff on the web, and the MP3 files that get sent down to you all go
10:56 through DigitalOcean. So Python Bytes is basically delivered in all of its forms to you through DigitalOcean;
11:02 we have a bunch of servers there. They're super easy to work with, very quick, very reliable. You can create a
11:07 new server, a new droplet, they call it in probably 30 seconds. And then you SSH in and you're off to the
11:13 races. So really, really nice and affordable and check them out at do.co slash Python and let them
11:19 know that you heard about it on Python Bytes. So this end of the year thing, Brian, this is kind of
11:25 when, I mean, we're sort of on the other side of it, but this is when you get together with your family,
11:29 right? People maybe you didn't even know, like, wait, I have a second cousin? From where? Python's like
11:36 that, right? Yeah. Yeah. You were talking about, what's the place where you can do sort
11:41 of gamified code challenges? And that's CheckiO. So the reason that's relevant, I'm coming back to it,
11:46 is there's an article by the guys at CheckiO called "How big is the Python family?" So this is really nice.
11:52 And you know, some of you I'm sure are aware of it, but many people I don't really think are aware of
11:57 how varied Python is as a platform. So when you say Python, typically you mean
12:04 CPython, and hopefully you mean modern Python 3.6, not legacy 2.7 Python, but we'll
12:12 let that slide for now. There are also things like Jython, and Jython will let you write Python code
12:19 but execute it on the JVM and interact with Java objects. IronPython is the same thing for .NET.
12:26 There's also Python for .NET, which I think is a more up-to-date, modern variant on the same thing.
12:33 There's Cython, which is a compiled, slightly different Python. There's PyPy, which is a JIT version.
12:38 There's MicroPython, where your app basically is the operating system; it's Python on microcontrollers.
12:44 And on Talk Python, you and I talked about Grumpy, right?
12:49 Yeah. Which is on Go.
12:51 Yeah. So Grumpy is from the YouTube guys. Instead of using C to implement CPython,
12:57 they said, well, what if we wrote the same thing, but in Go? And that's kind of an interesting thing.
13:01 So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of
13:07 them. You know, if you bring in people onto your team and you're like, well, wait a minute,
13:11 there's actually a lot of types of Python here, check this out. Right. And also maybe a reminder to like,
13:16 give PyPy a try. Like they just had a big release for both Python 2 and Python 3 versions.
13:21 One of the things I like about this writeup that they did is it reminds you why some of these are
13:26 around. Like if you had to work with .NET, then working with IronPython or Python for .NET might be
13:34 a better thing than just trying to do it other ways.
13:38 Yeah. And one of the advantages there might be, you know, if you're working on a .NET app, but you want to add scripting.
13:43 Yeah.
13:44 Like what are your choices? You probably don't want to give them C#. And even if you did,
13:47 like it requires full on compilation and like, you know, how do you deal with that? Right.
13:51 So this could be a really nice way to plug in like scriptability into your enterprise app,
13:56 which would be pretty cool.
13:57 And one more thing I wanted to throw in on this conversation is a lot of times I'll say Python
14:02 runtime. And I know often people say Python interpreter. This is what the Python interpreter
14:08 does. It does this and that. Well, if you look at the whole Python family, only some of them
14:13 are interpreters. Some of them are compiled execution engines, right? Like the JVM. That's actually not a
14:21 great example, but take PyPy, for example, or Cython; those two definitely are not interpreted in
14:29 the traditional sense. PyPy starts out that way, but it switches to JIT-compiled code for the hotspots.
14:34 I often say Python runtime because I kind of feel like, you know, when you say interpreter, you really
14:40 just get the mindset of CPython, which is the most popular, but not the only one. What do you say?
14:45 Say interpreter?
14:46 I don't usually say either. I just say Python.
14:49 Yeah, there you go.
14:51 Cool. So anyway, I think this is a nice write up and good to have it all in one place.
14:55 So I like the one that you have coming up next. One of the problems I often see is I want to do
15:01 some work, but I don't care if it happens right now. I just want to like start it and let it go
15:05 somewhere. I don't usually have a great answer for that.
15:07 Task processing stuff. And one of the common things people often bring up is Celery.
15:12 And to be honest, I've tried to get into Celery a couple of times, but the learning curve on it,
15:18 maybe it's just me, but I had a little bit of trouble getting into it.
15:22 I was interested when I heard an interview on Podcast.__init__ about a library called Dramatiq.
15:28 I'm not sure how you pronounce it. It's D-R-A-M-A-T-I-Q. Yeah.
15:33 Since it's task scheduling, I'm sure it has quite complicated internals,
15:40 but you just declare an actor on some code and it's pretty easy to get started.
15:46 I thought I'd point people to it.
15:48 Yeah, it's quite cool. You basically put a decorator onto a function, and then instead of
15:53 running it locally, you can send work to it. And that send actually kicks it off on,
15:59 in the example they had, RabbitMQ, I think. So there's a producer of the work,
16:04 and then there's another process that just hangs out and consumes anything that lands on the queue.
16:08 It's pretty cool.
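A minimal sketch of that producer/consumer pattern, assuming Dramatiq with its RabbitMQ broker and a RabbitMQ instance running locally; the count_words task mirrors the style of the project's quickstart rather than anything specific from the interview:

    # tasks.py
    import dramatiq
    import requests
    from dramatiq.brokers.rabbitmq import RabbitmqBroker

    # Point Dramatiq at the message broker (RabbitMQ on localhost is the default anyway).
    dramatiq.set_broker(RabbitmqBroker(url="amqp://127.0.0.1:5672"))

    @dramatiq.actor
    def count_words(url):
        # This body runs in a worker process, not in the caller's process.
        response = requests.get(url)
        print(url, "has", len(response.text.split()), "words")

    if __name__ == "__main__":
        # Producer side: send() returns immediately; the message waits on the
        # queue until a worker started with `dramatiq tasks` picks it up.
        count_words.send("https://example.com")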
16:09 Yeah. So you can configure it; it defaults to RabbitMQ, I think, and there are
16:15 just good defaults that work right out of the box if you don't care. And then
16:21 you can configure it to use other things if you need to. Apparently, the person
16:27 that developed this, I forget his name, says it's used on quite significant projects. I mean,
16:34 it isn't a toy project, but it's pretty easy to get started, and you can configure it to do all sorts
16:40 of fancy stuff if you need it to. But one of the things I liked about the conversation is he
16:45 brought up that he intentionally kept the documentation fairly terse and small so
16:53 that when you're looking for something that you think you saw before, it's pretty easy to find
16:57 again. So that's cool.
16:58 Okay. Yeah. That's an interesting point. Yeah. And it looks like you can run it on top of RabbitMQ or
17:04 Redis. Take your pick. One final thing I want to point out that I thought was interesting is it's licensed
17:09 under the AGPL, but it also has commercial licenses available upon request, which, you know, people are
17:17 always looking for ways to basically fund their open source work. And I thought that was an
17:22 interesting variation that I saw going through it.
17:24 Really? Okay. I didn't pay attention to that. So I'm not sure what the AGPL is.
17:29 Yeah, I actually don't know either, but apparently you might want a commercial license instead.
17:33 Okay. So the last one I want to talk about is a little bit similar to what you're talking about
17:39 running async work, but it's sort of the challenge of taking advantage of async things, but not making
17:49 that a problem for people trying to consume it who don't want to think of things that way. So this
17:54 article is called "Controlling Python Async Creep," from friend of the show, Cris Medina. And he says,
18:02 basically, if you've got some library that is written in an async way, you're supposed to await
18:07 it. But anybody who's going to call that and take advantage of it, that caller has to also be async,
18:14 and then the caller of that has to be async. So maybe way, way down somewhere, you're trying to do something
18:18 async, and it creates this sort of chain reaction of, well, the callers of this have to be async, and
18:23 the callers of those things have to be async, and so on. It can become quite a problem.
18:28 So he wrote this nice article, basically going through three examples of where you can sort of
18:34 put a stopgap and say, okay, like at this level, we're no longer worried about async, but we're still
18:39 taking advantage of it internally. So one way you can do that is you can wait for blocks of async code.
18:45 So if you've got to contact, you know, a database, two web services, and read something from the file system,
18:51 and you want to do that asynchronously, you can create those pieces of work but then wait on them
18:56 as a group. And there are some built-in ways in asyncio to do that, which is really cool.
18:59 He's got some nice examples of that. Another option is you could just use a thread and let that thread's main
19:05 bit of work be the async thing, so you don't have to deal with it. And the most interesting, I think,
19:10 is mixing async and synchronous calls. What he does is he actually detects, by looking at the
19:18 traceback, I think, whether the caller is calling it as an async function or as a regular
19:26 function, and implements an async behavior or a synchronous behavior accordingly. So you could write
19:33 a single library, and if somebody in Python 3.6 wants to use it in a fancy async way, it becomes
19:38 magically async. But if somebody from 2.7 or an older version calls it, or they just
19:44 don't call it in the async way, it's just magically a synchronous call and doesn't use that whole
19:48 stuff.
19:48 Okay.
19:49 This is really an interesting way to make it possible to bring async into your package or your libraries
19:54 without having the consumer of your libraries have to care about the fact that it's async. But still
20:01 make it into something they could take advantage of.
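For the first of those approaches, here is a minimal sketch of hiding a group of awaitables behind a plain synchronous function using asyncio.gather; the fetch coroutines are made-up stand-ins for real database or web calls, not code from the article:

    import asyncio

    async def fetch_user(user_id):
        await asyncio.sleep(0.1)       # stand-in for a database query
        return {"id": user_id, "name": "sam"}

    async def fetch_orders(user_id):
        await asyncio.sleep(0.2)       # stand-in for a web service call
        return [{"order": 1}, {"order": 2}]

    async def _load_dashboard(user_id):
        # Run both pieces of work concurrently and wait on them as a group.
        user, orders = await asyncio.gather(fetch_user(user_id),
                                            fetch_orders(user_id))
        return {"user": user, "orders": orders}

    def load_dashboard(user_id):
        """Synchronous entry point: callers never see async or await."""
        # On Python 3.7+ you could use asyncio.run() instead.
        return asyncio.get_event_loop().run_until_complete(_load_dashboard(user_id))

    print(load_dashboard(42))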
20:03 Wow, that's great. I'm going to have to read this. This reminds me of, I guess, the learning
20:08 hurdle that people go through in the C and C++ world when you go from single-threaded
20:14 applications to multi-threaded applications. You have to look in all the corners.
20:17 Yeah. It's definitely a mind shift. Yeah. This is very much like that.
20:21 Okay.
20:21 But yeah, Cris did a great job on this. And I really like his solution at the end. And actually,
20:26 he has it done with if statements. I feel like you could create a decorator that would
20:30 basically wrap that up, just like a magic, syncable or awaitable decorator. It's
20:36 really, really close to having some sort of decorator magic making this even better.
20:39 Yeah. Okay. Cool.
20:41 All right. Well, that's all our news for the week, except that it's not.
20:43 Well, yeah.
20:45 We have an extra one. Really quick, I just want to let people know that the PyTennessee
20:49 conference in Nashville is coming up almost a month from now. So if you are in the Nashville
20:55 area or willing to travel there, it's February 10th and 11th. They've got their schedule out,
21:01 the tickets are on sale, and things like that. And they even made a special discount code for
21:06 Python Bytes. We said, are you going to tell us about it? And they said, definitely, here's
21:10 the code. So if you want to go to PyTennessee, you can use the discount code
21:15 PythonBytes, no spaces, capital P, capital B, and you get 10% off.
21:20 Cool.
21:20 Yeah. Very cool.
21:22 You have some pretty interesting news. It's not directly Python related, but it very much
21:26 affects all of us.
21:28 Yeah.
21:28 Right. Code on servers, especially in the cloud.
21:30 I don't know what to do about this, but I saw it this morning, and I thought
21:34 it's important enough to not ignore. So I thought I'd drop a link.
21:38 What do you think? Like unplug all of the internet, just go hide in a corner or something like that?
21:42 It's like one of those things like having the credit services get hacked. You just,
21:47 I guess, be aware of it and pay attention. It's very much like the Experian. What was that credit
21:52 service?
21:52 Equifax maybe?
21:53 Equifax. I'm not going to say it because I don't want to say the wrong one, but the E credit agency
21:59 that I'm for some reason totally forgetting. I think you're right. But yeah, basically you're told
22:04 your world is crashing down, we're sorry. And this is kind of like that. You
22:10 quote a couple of articles; let me read what they said in the New York Times here. It said that
22:15 basically there's two problems called Meltdown and Spectre could allow hackers to steal the entire
22:20 memory contents of computers, including mobile devices, personal computers, and servers running
22:25 in so-called cloud computer networks. There's no easy fix for Spectre, which could require a redesign
22:30 of the processors, according to researchers. As for Meltdown, the software patch needed to fix the
22:36 issue could slow down computers by as much as 30%. So, you know, your AWS, DigitalOcean, whatever,
22:43 server may just get 30% slower now. Wonderful.
22:46 Yeah. So, most of the places, I think Google, Amazon, and Microsoft, have all said that their servers
22:54 are pretty much patched to deal with Meltdown, but Spectre's still a problem.
23:00 I don't think there's a ton of concrete details here, at least not that I ran across. It's sort of vague.
23:07 Apparently, not all the details about the exploit are out, but I'd recommend people check out Risky.biz,
23:15 which is my favorite developer security podcast. It's super, super good. And those guys are going to
23:21 definitely have an insightful conversation on this next time they're on deck.
23:26 In case we were too vague about it, it was a design flaw found in pretty much all modern microprocessors that allows
23:33 attackers to read the entire memory of a computer. Yeah. Bummer.
23:37 I hope you don't do anything on the internet. Carry on now. Okay. So, yeah. So, the last thing,
23:45 this is a more positive thing, I think, at least. I just announced all my courses,
23:50 well, not all of them, actually, only a few of them, for 2018, but I announced this new deal that I'm
23:56 having for all the Talk Python courses called the Everything Bundle. So, talkpython.fm slash
24:01 everything. And it gets you, what'll be probably 120 hours of Python course awesomeness, including
24:08 some new ones, Mastering PyCharm, Python 3, an Illustrated Tour, Introduction to Ansible,
24:13 and tons more coming. So, I was just finishing some of the videos for the PyCharm course right before
24:19 we chatted. So, it's almost done. Cool. So, is that going to be out this month then or soon?
24:23 That is going to be out probably next week. Okay. Cool. Yeah. Definitely soon. Definitely soon.
24:28 It's so fun to create these courses and just, you know, keep exploring the different areas and helping
24:33 people get better with them. So, lots of fun. Yeah. And you do things like working with companies if
24:38 they want to, like, get access to these for, like, everybody that works there or a handful of people.
24:43 I definitely have special programs for, like, site licenses, things like that. I've even talked to
24:48 some universities about having the courses for, like, all of their students or something like that.
24:53 That would be wild. Still talking.
24:56 You'll have to increase the price for them, I guess. Maybe.
24:59 I guess. But they're students, you know.
25:01 Cool.
25:03 All right. Cool. Well, Brian, thanks for sharing all your news.
25:05 Yeah. Thank you.
25:07 Nice to be back together after the whole holiday time off.
25:11 Yes.
25:11 All right. Catch you later.
25:12 Thank you for listening to Python Bytes.
25:15 Follow the show on Twitter via at Python Bytes.
25:18 That's Python Bytes as in B-Y-T-E-S.
25:20 And get the full show notes at pythonbytes.fm.
25:24 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
25:28 We're always on the lookout for sharing something cool.
25:31 On behalf of myself and Brian Okken, this is Michael Kennedy.
25:34 Thank you for listening and sharing this podcast with your friends and colleagues.