Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #59: Instagram disregards Python's GC (again)

Return to episode page view on github
Recorded on Thursday, Jan 4, 2018.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 59, recorded January 4th, 2018.

00:10 I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:12 And we got a bunch of awesome stuff lined up for you in this very first episode of 2018.

00:18 So, let's say thank you and happy new year to DigitalOcean.

00:22 Yeah, thanks and definitely happy new year. It's exciting to be back.

00:25 It's very exciting to be back. And we, you know, the Python news doesn't stop coming. I think if anything, it's just picking up speed. I'm afraid we might scare people a little bit with some of your picks this this time, Brian. What? The stuff near the end, the stuff near the end. So yeah.

00:42 Another thing that's kind of scary is turning off garbage collection.

00:45 Seems like that might be bad, right? Right. Well, I was actually surprised and very interested when when I was listening to the Instagram talk at PyCon about turning off garbage collection.

00:55 And there's an article that they put out again, they said that they had turned it off last year and then they wanted to sort of, they were having memory problems, so they wanted to try to turn it back on a little bit, but they still have concerns.

01:10 - Yeah, so maybe we should take a moment, just a step back and say, you described the original thing, so why did they start down this path of turning off garbage collection in the first place?

01:19 What they found was they were running many instances of their, the largest Django deployment on Python in the world.

01:26 So they're running lots of servers with us.

01:27 And they found that the shared memory across multiple processes running that on a single server was completely falling apart because garbage collection was shifting stuff around.

01:39 They said, well, could we turn it off?

01:41 And it turned out that they could, but they then, this article you're referring to says they basically were losing those gains again.

01:46 - And we'd talked about this, I guess, a couple times of if you turn it off, then you can, eventually will run out, but if you're restarting tasks every once in a while, that completely cleans it up.

01:58 - Yeah, exactly.

01:58 - They were losing some of those gains, but they wanted, so they wanted to get some of those back.

02:03 This is a really interesting, and I had to read it, read this article about three times, but it's called Copy on Write Friendly Python Garbage Collection.

02:12 And it's a pretty interesting story, but the end punchline is that they've got a new addition to Python that's gonna go into Python 3.7, or it's already in there, that is called GC freeze, which what happens is they get their main stuff running with all the shared objects, but before they like fork off a bunch of threads, they call this GC freeze and all the stuff that's in memory right now at this point doesn't get garbage collected, but everything from now, from like this point in time on, will be garbage collected, which is pretty interesting.

02:47 Yeah, that's really, it's really interesting. So Python memory management is a little, I think it's a little obscure. People don't talk about it very much. And I don't think there's a lot of good write ups. You actually found a really fantastic write up on the intricate details of Python memory management. The short version is most things are cleaned up through reference counting. So number of things pointed at it, when that goes to zero, it goes away. But the problem with reference counting is cycles. I have one object that points at another that object points back at the first, they both have a count of one or higher forever, and they get leaked. And so there's this secondary garbage collection phase that goes through and looks at these items, cleans them up, and so on. So this GC freeze says, let's take all the stuff that exists now, and just tell the garbage collector to ignore it, don't touch it, don't mess with it, leave it alone, right. And so So you get like, basically your app into its normal working state and then freeze it one time.

03:44 And then all the new stuff that would make the memory grow and grow and grow over time is going to be continually GC'd.

03:50 But the core essence of your app, Python runtime and a bunch of things to get started, should be kind of fixed, right?

03:57 Yeah, and I think that's a pretty cool idea because that's a common model for applications to get connections up and get your normal like, sitting state, idle state, state running and then before you get requests in and and spawning stuff just at that point you're like well this is all the shared stuff let's just we don't need to move this stuff around it's always going to be there anyway it's a cool idea and apparently it saved them they were at linear linear memory growth and they slowed that down quite a bit yeah it looks really really interesting Instagram is doing amazing stuff I think in the Python space and the web space. And if any of those guys are out there listening, want to come talk about Python and Instagram on Talk Python, I'm more than welcome to any other more than welcome to come over. It'd be fun. And I definitely appreciate that they're very open about this to say, Hey, this is what we're trying. It's not like perfect yet. But it's better. Yeah, it's super cool.

04:53 Do you know if GC freeze is approved or just proposed for three, seven. So we have a link to the pull request that looks like it's already in.

05:02 >> Oh, it is merged. Yes, it is merged.

05:04 This is pretty awesome. We have CPython on GitHub with a pull request merged in with its comment history.

05:11 That's new. That's the 2017 bit of magic that it's on GitHub.

05:15 >> Yeah.

05:16 >> Cool. So nice that we can actually track that.

05:19 The next thing that I want to talk about is a little bit different.

05:22 I think this will be mostly of interest for data science folks.

05:27 This is a little bit lower level maybe than it sounds, but this thing's called SpeechPy.

05:31 So Speech P-Y.

05:33 It's a library for speech processing and recognition.

05:37 So this is a pretty interesting Python project.

05:41 You can come along and basically give it some, you know, spoken words, and it can pull out various effects and things that are sort of the essence of what you need to do speech recognition.

05:54 I think this works a little, you don't just feed it like, here's a say a wave file and out pops text of what it said, but it gives you what you would need to feed to a machine learning system.

06:05 Basically takes the spoken words into a representation you can feed to some kind of algorithm to actually get the text.

06:11 So I think that was pretty cool. And one of the things that I wanted to bring this up for is they have a really nice citation statement.

06:19 So if you look at the GitHub repo, like kind of near the top, it says, If you're going to use this package, please cite it as follows.

06:28 And that's interesting because there's been some talk in the scientific space, more true science, not data science, around people want to publish their software and they want to work on advancing software, but in the academic space, you have to publish articles or, you know, the whole publish or perish type of thing.

06:47 And the way you get credit for your work is to be cited in other articles.

06:52 And so this is sort of showing a way to cite this work, which is not a paper, but which is an open source project, in the same sense that the person, the people who created it might get the same level of academic credit for their thing being cited. So I think that's pretty cool.

07:10 Yeah, I don't get the syntax.

07:12 It must mean something. I have no idea what it is.

07:15 Okay.

07:16 I thought it's kind of neat if you're doing machine learning, you need to turn waveforms into something you can process. This is pretty cool.

07:24 And the other thing that's kind of nice is if you look at it here, and I think it's in the documentation or the tutorial, they actually show you how to process wave files from SciPy, which is also maybe cool and handy at some point.

07:36 Yeah, it's actually something I need to be doing some wave file processing.

07:41 Well, SciPy apparently has it.

07:43 Nice. How about the next one?

07:46 Well, next up we've got our friends at PyBytes.

07:50 Is that what they're called?

07:51 -PyBytes. -Yeah, PyBytes.

07:53 That's right.

07:53 They've got a new platform.

07:56 And I suddenly forgot the URL, but it's...

07:59 There it is.

08:00 CodeChallenges, but the ES is after the dot.

08:03 So, CodeChallenge.es.

08:06 [laughs]

08:07 No, clever though.

08:08 But it's...

08:09 We've covered other things before.

08:11 Like, there's a...

08:13 I should have looked this up.

08:14 There's a game one that's...

08:15 They're like going through a game and doing code challenges.

08:18 and there's code katas around.

08:21 This is a similar sort of thing.

08:22 So you are able to do these little code challenges and they say, it's called bytes of Python, bytes of pi, and are their self contained 20 to 60 minute code challenges.

08:34 And you can write them and verify them in the browser.

08:38 And I had I did two of them this morning and I had kind of a lot of fun with it.

08:41 It was fun.

08:42 Thanks.

08:43 And you verify them by writing pi test unit test, right?

08:45 You don't write it.

08:46 pre-written pytest code that checks your answers.

08:51 - I see, so you've got to do some sort of thing and then you check it in and it runs basically the test against your code and says thumbs up, thumbs down?

08:57 - Yeah, like for instance, on the second challenge, you have to write three different functions to manipulate a list of names.

09:04 And it has tests for all of these.

09:06 I went ahead and just solved one of it at a time, for instance, so I tried to solve the first one and then ran the test and noticed that the first one passed and then just did that and looking at the, with the help of the test output was helped me solve the rest of them.

09:22 - That's really cool.

09:23 And I also learned something by the transitive property through you.

09:25 - You did?

09:26 - I did.

09:27 I learned what you learned in that min takes a key like sort and sorted does.

09:33 That way you could sort some complex object based on like a attribute of it.

09:36 - I didn't know that.

09:38 I had just discovered that this morning.

09:40 So my solution for one of the challenges is to try to find the name with the shortest first name.

09:47 And I went ahead and sorted the list by the length of the first name, and then just picked the first element.

09:54 Their solution uses min instead of sorting the list.

09:59 You can just find the min length, which is pretty cool.

10:02 - Yeah, that's really awesome.

10:03 That's gotta be quicker than a full-on sort.

10:05 - One of the things I like about these sorts of quick challenges is you can probably do them like on your lunch break or a couple of lunch breaks to do one of them.

10:13 And they just take a browser, so you can just do it on your laptop.

10:17 It's pretty fun.

10:18 - Yep, that's cool.

10:18 You could maybe even do it on an iPad or something if you really wanted.

10:21 - Yeah, well, I don't know.

10:22 I haven't tried that, probably.

10:23 - If it runs in the browser, I bet it would.

10:26 Nice, so yeah, that's really cool.

10:28 Do you like that you learn these little things, like wait, men takes a key?

10:31 I didn't know that.

10:32 You know, that's just, you wouldn't think you'd pick up these little things so quickly, but you know, these little challenges are nice like that.

10:38 So before we get to the next item, I want to say thank you to DigitalOcean.

10:41 They're sponsoring this episode and many, many other episodes.

10:44 They're really a big supporter of Python Bytes.

10:46 So as many of you know, many of our bits of code, our stuff on the web and our files or MP3 files that get sent down to you all go through DigitalOcean.

10:57 So Python Bytes is basically delivered in all of its forms to you through DigitalOcean. Have a bunch of servers there.

11:04 They're super easy to work with, very quick, very reliable.

11:07 You can create a new server, a new droplet, they call it, in probably 30 seconds.

11:10 And then you SSH in, and you're off to the races.

11:13 So really, really nice and affordable.

11:15 And check them out at do.co/python, and let them know that you heard about it on Python Bytes.

11:22 So this end of the year thing, Brian, this is kind of when, I mean, we're sort of on the other side of it, but this is when you get together with your family, right?

11:30 People maybe you didn't even know, like, "Wait, I have a second cousin?" From where?

11:35 Python's like that, right?

11:36 - Yeah. - Yeah.

11:37 You were talking about like what is the place where you can like do sort of gamified code challenges, and that's Check.io.

11:44 So the reason that's relevant, I'm coming back to it, is there's an article by the guys at Check.io called "How Big is the Python Family?" So this is really nice.

11:52 And you know, some of you I'm sure are aware of it, but many people I don't really think are aware of how varied Python is as a platform.

12:02 So when you say Python, typically you mean CPython.

12:05 And hopefully you mean modern Python 3.6, not legacy 2.7 Python.

12:11 But we'll let that slide for now.

12:13 There's also things like Jython.

12:16 And Jython will let you write Python code, but execute it on the JVM and interact with Java objects.

12:23 Iron Python is the same thing for .NET.

12:27 There's also Python 4.NET, which I think is a more up-to-date modern variant on the same thing.

12:33 which is compiled slightly different Python.

12:36 There's PyPy, which is a JIT version.

12:38 MicroPython, which is Python as an--

12:40 your app is an operating system in Python on microchips, basically.

12:44 And on Talk Python, you and I talked about Grumpy, right?

12:48 Yeah, which is on Go.

12:50 Yeah, so Grumpy is from the YouTube guys, which is instead of using C to implement CPython, they said, "Well, what if we wrote the same thing but in Go?" And that's kind of an interesting thing.

13:01 So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of them.

13:08 You know, if you're bringing people onto your team and you're like, "Well, wait a minute, there's actually a lot of types of Python here, check this out," right?

13:14 And also maybe a reminder to like, give PyPy a try.

13:17 Like they just had a big release for both Python 2 and Python 3 versions.

13:21 One of the things I like about this write-up that they did is it reminds you why some of these are around.

13:27 Like, if you had to work with .NET, then working with like Iron Python or Python.net might be like a better thing than just trying to do it other ways.

13:38 >> Yeah. One of the advantages there might be, if you're working on a.net app, but you want to add scripting.

13:43 >> Yeah.

13:44 >> What are your choices? You probably don't want to give them C#.

13:46 Even if you did, like it requires full on compilation and how do you deal with that?

13:52 This could be a really nice way to plug in scriptability into your enterprise app, which would be pretty cool.

13:57 And one more thing I wanted to throw in on this conversation is, a lot of times I'll say Python runtime, and I know often people say Python interpreter.

14:06 This is what the Python interpreter does. It does this and that.

14:09 Well, if you look at how the whole Python family, only some of them are interpreters.

14:15 Some of them are compiled execution engines, right? Like the JVM.

14:20 That's actually not a great example, but say PyPy, for example, or Cython.

14:24 Those two definitely are not interpreted in the traditional sense.

14:30 PyPy starts out that way, but it converts to a JIT version for the hotspots.

14:35 I often say Python runtime because I kind of feel like, you know, when you say interpreter, you really just got the mindset of CPython, which is the most popular, but not always.

14:44 What do you say?

14:45 I don't usually say either.

14:48 I just say Python.

14:50 Yeah, there you go.

14:51 Cool, so anyway, I think this is a nice write up and good to have it all in one place.

14:55 So, I like the one that you have coming up next.

14:58 One of the problems I often see is I want to do some work, but I don't care if it happens right now.

15:03 I just want to like start it and let it go somewhere.

15:05 I don't usually have a great answer for that.

15:07 Task processing stuff and one of the common things is often people bring up is celery.

15:12 And to be honest, I've tried to get into celery a couple times, but kind of the learning curve on it, maybe it's just me, me, but I had a little bit of trouble getting into it.

15:22 I was interested when I heard an interview on podcast.init about a library called dramatic or dramatic.

15:29 I'm not sure it's D-R-A-M-A-T-I-Q.

15:32 But it's a very, I'm sure since it's task scheduling, it's a quite complicated internals, I'm sure you just like declare an actor for on some code and it's pretty easy to get started.

15:46 And I thought I'd point people to it.

15:48 Yeah, it's quite cool. You basically put a decorator onto a method, and then that method, instead of running locally, you can like send work to it.

15:56 And that send work actually kicks it off on, the example they had was RabbitMQ, I think.

16:02 And that there's like a producer of the work. And then there's another process that just hangs out and consumes anything that lands on the queue. It's pretty cool.

16:09 Yeah, so that you can configure like what your defaults to RabbitMQ, I think. And there's just good defaults that work right off the, just if you don't care.

16:20 And then there's a, you can configure it to use other things if you need to.

16:24 It apparently is well, the person and during, I forget his name, that developed this, it's used on quite significant projects.

16:34 I mean, it, it isn't a toy project, but it's pretty easy to get started and you can configure it to be all sorts of fancy stuff if you need it to be.

16:42 But one of the things I liked about the conversation is he, he brought up that he intentionally kept the documentation and the fairly terse and small so that when you're looking for something that you think you saw before, it's pretty easy to find again.

16:57 So that's cool.

16:58 Okay.

16:59 Yeah.

16:59 That's an interesting point.

17:00 Yeah.

17:01 And it looks like you can run it on top of rapid MQ or Redis.

17:04 Take your pick.

17:05 One final thing I want to point out that I thought was interesting is it's licensed under a GPL, but it also has commercial licenses available upon request, which, you know, people are always looking for ways to fund, basically fund their open source work.

17:21 And I thought that was an interesting variation that I saw going through it.

17:24 Really? Okay, so I didn't pay attention to that.

17:26 So I'm not sure what the a GPL is.

17:29 Yeah, I actually don't know either.

17:30 But apparently you might want a commercial license instead.

17:34 Okay, so the last one I want to talk about is a little bit similar to what you're talking about running async work.

17:41 But it's sort of the the challenge of taking advantage of async things but not making that a problem for people trying to consume it who don't want to think of things that way. So this article is called Controlling Python Async Creep from friend of the show, Kristen Medina.

17:59 And he says basically if you've got some library that is written in an async way you're supposed to await it.

18:08 But anybody who's going to call that and take advantage of that, that caller has to also be async.

18:14 And then the caller of that has to be async.

18:16 So maybe way, way down somewhere, you're trying to do something async, and it creates this sort of chain reaction of, well, the callers of this have to be async.

18:23 Well, the caller of those things have to be async, and so on.

18:26 It can become quite a problem.

18:28 So he wrote this nice article basically going through three examples of where you can sort of put a stopgap and say, okay, at this level, we're no longer worried about async, but we're still taking advantages of it internally.

18:41 So one way you can do that is you can wait for blocks of async code.

18:45 So if you got a contact, you know, a database, two web services, read something from the file system, you want to do that sort of asynchronously, you could create those pieces of work, but then wait on them as a group.

18:56 And there's some built in ways in async.io how to do that, which is really cool.

19:00 It's got some nice examples on that.

19:02 So you could just use a thread and then let that thread's main bit of work be the async thing, but you don't have to deal with it.

19:09 And the most interesting, I think, is mixing async and synchronous calls.

19:13 And what he does is he actually detects by looking at the traceback, I think, detects whether the caller is calling it as an async function or as a regular function and implements an async behavior or a synchronous behavior the same.

19:32 So you could write a single library and if somebody in Python 3.6 wants to use it in a fancy async way, it becomes magically async.

19:40 But if somebody from 2.7 calls it or something like that, an older version, or they just don't call it in this async way, it just magically is a synchronous call, it doesn't use that whole stuff.

19:48 Okay.

19:49 This is really an interesting way to make it possible to bring async into your package or your libraries without having the consumer of your libraries have to care about the fact that it's async.

20:00 But still make it something they can take advantage of.

20:03 That's great. I'm going to have to read this. This reminds me of the, I guess, the learning hurdle that people go through in the C++, C and C++ world when you go from single threaded applications to multi-threaded applications. You have to look in all the corners.

20:17 Yeah, it's definitely a mind shift. Yeah, this is very much like that.

20:21 But yeah, Christian did a great job on this, and I really like his solution at the end.

20:25 And actually, he has it done in if statements.

20:27 I feel like you could create a decorator that would basically wrap that up and just like a magic, like, a syncable or a waitable decorator.

20:35 It's really, really close to having some sort of decorator magic making this even better.

20:39 Yeah, okay.

20:40 All right, well, that's all our news for the week, except for that it's not.

20:43 Well, yeah.

20:45 We have an extra one.

20:46 Really quick, I just want to let people know that the Pi Tennessee conference in Nashville is coming up in almost a month from now. So if you're in the Nashville area or willing to travel there, February 10th and 11th, they've got their schedule out, the tickets are on sale and things like that.

21:03 And they even made a special discount code for Python bytes.

21:06 If we, you know, said, "Are you going to tell us about it?" Then definitely, here's the code.

21:11 So if you want to go to Python C, you can use the discount code PythonBytes, no spaces, capital P, capital B, and you get 10% off.

21:20 - Cool. - Yeah, very cool.

21:22 You have some pretty interesting news.

21:23 It's not directly Python related, but it very much affects all of us.

21:28 - Yeah. - Right, codes on server, especially in the cloud.

21:30 I thought I'd, I don't know what to do about this, but I saw it this morning, I thought we just, it's important enough to not ignore it.

21:36 So I thought I'd drop a link.

21:38 What do you think? Like unplug all the internet, just go hide in a corner, something like that.

21:42 - It's like one of those things like having the credit services get hacked.

21:46 You just, I guess, be aware of it and pay attention.

21:49 - It's very much like the Experian, what was that credit service?

21:52 - Equifax, maybe?

21:53 - Equifax.

21:54 I'm not gonna say it, 'cause I don't wanna say the wrong one, but the eCredit Agency, I totally for some reason forget it.

22:00 I think you're right.

22:02 But yeah, like basically you're told, your world is crashing down.

22:06 We're sorry.

22:07 Moving on now.

22:08 And this is kind of like that.

22:09 Let me read from what you quote a couple articles.

22:11 Let me read what they said in the New York Times here.

22:15 It said, "Basically, there's two problems called meltdown and spectra.

22:18 Could allow hackers to steal the entire memory contents of computers, including mobile devices, personal computers, and servers running in so-called cloud computer networks.

22:27 There's no easy fix for spectra, which could require a redesign of the processors, according to researchers.

22:33 As for meltdown, the software patch needed to fix the issue could slow down computers by as much as 30%.

22:40 So, you know, your AWS, DigitalOcean, whatever, server may just get 30% slower now.

22:46 Wonderful.

22:47 - Yeah, so most of the places, I think Google, Amazon, and Microsoft have all said that the servers are pretty much changed to deal with meltdown, but Spectre's still a problem.

23:00 - I don't think there's a ton of concrete details here, at least not that I ran across.

23:05 It's sort of vague.

23:07 Apparently, not all the details about the exploit are out.

23:11 But I'd recommend people check out Risky.biz, which is my favorite developer security podcast.

23:18 It's super, super good.

23:19 And those guys are going to definitely have an insightful conversation on this next time they're on deck.

23:26 In case we were too vague about it, it was a design flaw found in all microprocessors that allow attackers to read the entire memory of a computer.

23:36 Yeah. So, bummer.

23:38 I hope you don't do anything on the internet. Carry on now.

23:41 Okay.

23:43 So, yeah, so the last thing, this is a more positive thing, I think of it at least.

23:48 I just announced all my courses, not all of them actually, only a few of them, for 2018, but I announced this new deal that I'm having for all the Talk Python courses called the Everything Bundle.

23:59 So, talkpython.fm/everything.

24:02 and it gets you what will be probably 120 hours of Python course awesomeness, including some new ones, Mastering PyCharm, Python 3 and Illustrated Tour, Introduction to Ansible and tons more coming.

24:14 So I just got, I was just finishing some of the videos for the PyCharm course right before we chatted, so it's almost done.

24:21 Cool. So is that going to be out this month then? Or soon?

24:24 That is going to be out probably next week.

24:26 Okay, cool.

24:27 Definitely soon, definitely soon.

24:28 It's so fun to create these courses and just, you know, keep exploring the different areas and helping people get better with them. So lots of fun.

24:35 Yeah, and you do things like working with companies if they want to like get access to these for like everybody that works there or a handful of people.

24:43 I definitely have special programs for like site licenses, things like that. I've even talked to some universities about having the courses for like all of their students or something like that. That would be wild. Still talking.

24:56 You'll have to increase the price for them, I guess.

24:58 I guess, but they're students, you know.

25:01 Cool.

25:02 All right, cool.

25:03 Well, Brian, thanks for sharing all your news.

25:06 And yeah, thank you.

25:07 Nice to be back together after the whole holiday time off.

25:11 Yes.

25:12 All right.

25:13 Catch you later.

25:14 Thank you for listening to Python Bytes.

25:15 Follow the show on Twitter via @PythonBytes.

25:18 That's Python Bytes as in B-Y-T-E-S.

25:21 And get the full show notes at PythonBytes.fm.

25:24 If you have a news item you want featured, just visit PythonBytes.fm and send it our We're always on the lookout for sharing something cool.

25:32 On behalf of myself and Brian Okken, this is Michael Kennedy.

25:35 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page