

Transcript #109: CPython byte code explorer

Recorded on Monday, Dec 10, 2018.

00:00 KENNEDY: Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds. This is Episode 109, recorded December 10th, 2018. I'm Michael Kennedy.

00:11 OKKEN: And I'm Brian Okken.

00:12 KENNEDY: And this episode is brought to you by DigitalOcean. Thank you, thank you, DigitalOcean. Tell you more about them later. Right now, Brian, how is everything going?

00:19 OKKEN: It's going really well, how about you?

00:21 KENNEDY: Oh, it's super, I'm starting to think of year in review, like what was the most amazing Python stories of the year and things like that. So looking forward to sharing those with everyone actually.

00:33 OKKEN: Yeah, that'd be great.

00:34 KENNEDY: Yeah, so you and I actually did an episode along with Dan Bader on Talk Python, which we'll drop here on this channel as well for the Year in Review in Python News, which is like this but more stuff and more depth. So that'll be a good thing for all those people traveling for the holidays, right?

00:49 OKKEN: Yeah, give 'em something to listen to.

00:51 KENNEDY: Right, they may be stuck in an ice storm in Chicago O'Hare, but they can listen to some good Python news. That's right, speaking of good Python news, what do you got to kick us off this time?

01:00 OKKEN: I have a Python descriptors article. I have Python Descriptors Are Magical Creatures.

01:06 KENNEDY: That sounds awesome.

01:07 OKKEN: Yeah, this is actually kind of neat. I approached this article thinking, yeah, I know what descriptors are and stuff, and properties. It's talking about properties, object properties in Python. But this is a really great article. So this is an article by Pablo Arias, and it talks about how you can add getters and setters and properties to objects. So instead of calling a function like get_version(), you can just have version, and you can use object.version, and you can assign to that, and that'll call the setter. And if you read from it, that'll call the getter, and you can have custom functions for that. It's one of the cool things about Python, and one that I'm glad has been highlighted, because some people forget this is around, especially if you come from a language that doesn't have this sort of thing.

01:58 KENNEDY: C, Java, that kind of stuff, right?

02:00 OKKEN: Yeah, these are pretty neat and they make it look like, like an attribute of the object, but it's actually a function that gets called. And it's a way you can actually migrate, you can start a system where it really is just data that's sitting there, and if you want to intercept it and say you know, actually when somebody assigns to this I want to do some work, or I don't want to really store this data, I want to calculate it on the fly, you can turn those into getters and setters and the calling code doesn't need to know.

02:29 KENNEDY: Yeah, I really like this because often the API makes the most sense as sort of fields, just setting the attributes like user.name, user.firstname or something like that. But what if you want validation, right? Like the name can't contain white spaces or other weird stuff. You want to strip that off or you know user names always lowercase and things like that so properties are perfect for that, right? You can validate it. You can raise an exception that says you can't have a None value here. It has to be a non-empty string, all kinds of good stuff. But the consumer doesn't care. They don't have to know.

03:01 OKKEN: Yeah, and I personally have actually used the getters and setters before, but there's a deleter also, and I don't think I use that very much. It's probably a neat thing to put in place if you're doing this anyway, in case it's invalid for somebody to try to delete an attribute. You may want to intercept that.

03:23 KENNEDY: Yeah, you're like no, you always have a name. You can't delete it.
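
The getter, setter, and deleter described above can be sketched with the built-in property decorator. This is a minimal illustration, not code from the article: the User class, the lowercase rule, and the error messages are all assumptions made for the example.

```python
class User:
    """Toy class whose `name` looks like a plain attribute to callers."""

    def __init__(self, name):
        self.name = name  # assignment goes through the setter below

    @property
    def name(self):
        # Getter: runs on every read of user.name
        return self._name

    @name.setter
    def name(self, value):
        # Setter: validate and normalize on assignment
        if not isinstance(value, str) or not value.strip():
            raise ValueError("name must be a non-empty string")
        self._name = value.strip().lower()

    @name.deleter
    def name(self):
        # Deleter: intercept `del user.name` and refuse it
        raise AttributeError("a user always has a name")


user = User("  Brian ")
print(user.name)  # brian
```

Calling code only ever writes user.name, so a class could start with a plain attribute and switch to a property later without its callers changing.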

03:27 OKKEN: Yeah, but this is a good general introduction to how to use these, so people can clean up their code a little bit and make it look a little less Java-y.

03:35 KENNEDY: Yep, I totally agree. So the next one is I want to talk about a survey. So we've talked about the JetBrains Python survey and that data science featured heavily in it, but they also did a separate data science survey for just data scientists and asked data science-y questions only. So they polled about 1600 people who are data scientists based in the US, China, Europe, and Japan, and to figure out what's the story, what's the zen, and how are people feeling in the data science space right now. And so it wasn't just for Python, it was just for data scientists. But you can imagine that there are many Python things happening in the data science world, right? So one of the key takeaways was that most people assume or currently most people use Python and then they assume that Python will remain the primary programming language at least for five years.

04:28 OKKEN: Yeah, and that's essentially forever in computer time.

04:31 KENNEDY: That's right, like if you're planning past five years, you've got either a lot of faith in where things are going or you're doing it wrong. Those actually could be the same thing. They also talked about what are the main tools people are using for like machine learning stuff, and they said Keras is the main one for professional developers, whereas if you're an amateur data scientist, you're more likely to use Azure, Microsoft Azure Machine Learning Services, rather than libraries. So you're like just make this a model. Teach it stuff. Figure that out later. Whereas the pros, in quotes, are actually doing the straight API stuff.

05:08 OKKEN: Remind me what Keras is.

05:10 KENNEDY: Keras is a machine learning framework. Yeah, so it's sort of comparable to Azure ML, but Azure ML is a service, like machine learning as a service. I would never use it though. So let's see, main programming languages. Obviously there are other languages, and if you look back just a couple of years, R was a machine learning and data science-y language that was more popular than Python was for data science. But now Python is 57%, R is only 15%. Some people say Julia is the next big language for data scientists. So they asked about Julia of these 1600 people, and the number of people using it was 0%, so that's not super compelling for Julia I guess.

05:52 OKKEN: At least amongst this statistical set.

05:56 KENNEDY: Yeah, yeah, yeah, and honestly I forgot how they found this set of people, so I'm sure they talk about it in the write up. Then finally when you talk about IDEs and editors, there were three standout main things people use. Obviously Jupyter, Jupyter Notebook, Jupyter Lab was 43%. PyCharm was 38% and RStudio was 23%. So that's pretty interesting. Alright, so if you're in the data science space, maybe this will help you keep the pulse of what's going on there.

06:27 OKKEN: I want to highlight a little tool. So like where I talked about properties just as a nice technique that people should make sure they understand, another thing I ran across is memoization. Not memorization, but memoization, with no r. This is a technique for when you've got a function, some work that you need to do that's only dependent on the input parameters, but getting your answer is computationally intensive, and you often get a lot of the same parameters coming in. Memoization is a technique that basically just saves the data: calculate it once, and if you get passed the same arguments again, just return the answer that you've already calculated.

07:22 KENNEDY: This technique can make your code incredibly fast. Like if you have some function that you're calling with a relatively bounded set of inputs and that's at all computationally expensive or goes to a service and gets an answer back. But like you said, if the input is the only thing that drives it, it's not like well what's the weather at the zip code, 'cause that could always change, but it's like what's the limit of this integral when passing in this lower bound, you know like discrete integral or something. Alright, it's always going to give you the same answer back. So you can actually go to the function. Even with the functools module built into Python, you can say I want this function, if it gets the same arguments, to not run again. Just give the answer back. Just kind of store it in memory or somewhere, right. And that only works in process.

08:04 OKKEN: Yeah. One of the things I want to highlight is a project called cache.py that saves all this stuff off to a file. This would be helpful especially if you've got like a command line tool that gets called lots of times, it isn't going to be able to store everything in memory. So being able to save it in a file might be helpful. The interface is just a decorator. Just say hey, this function you can cache the results so you throw a decorator on it, it's just cache.cache, add a decorator onto your function and it just works. And there's a whole bunch of customization you can do. You can say how long the cache is good for or where the files should be and things like that. But the default just kind of works pretty good too.

08:46 KENNEDY: Yeah, I really like this. So the thing is the built in stuff only works in memory and so once the process is done, it's done. But like you said, if this is a command line tool you're stringing together, you want it to keep that data for a certain amount of time or just always keep it so that it's like well if you pass me seven, the answer's always going to be this, right. Yeah, it's great that that'll keep it on the file system. And it uses Pickle, right?

09:11 OKKEN: I'm not sure.

09:12 KENNEDY: Yeah, let's see. Yeah, it currently uses pickle and inspect under the hood, making it not portable. So you can't like take your cache file and move it to Windows when you ran it on Linux or something I believe. 'Cause of the memory structure and different versions of Python and so on.
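
The idea of memoizing to files with pickle can be sketched in a few lines. This is not the cache.py implementation or its API: disk_cache, its directory argument, and the file naming scheme are hypothetical, chosen only to show the technique.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(directory):
    """Hypothetical decorator: memoize results as pickle files in `directory`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache on the function name plus its pickled arguments.
            raw = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(directory, hashlib.sha256(raw).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)  # cache hit: skip the work
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)  # cache miss: store for later calls
            return result
        return wrapper
    return decorator


cache_dir = tempfile.mkdtemp()  # a real tool would pick a stable location
runs = []


@disk_cache(cache_dir)
def expensive(n):
    runs.append(n)  # records each real execution
    return n ** 2


expensive(7)
expensive(7)
print(len(runs))  # 1
```

Because the results sit on disk, a second run of the same command line tool pointed at the same directory would find them again. Unpickling can execute arbitrary code, so a cache like this should only read files it wrote itself.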

09:28 OKKEN: So what, remind me what was the builtin one that works in memory?

09:31 KENNEDY: It's in functools and it's @lru_cache I believe.

09:35 OKKEN: Okay, that's cool.

09:36 KENNEDY: functools.lru_cache, yeah.

09:38 OKKEN: Yeah, I brought this up also mostly because I know a lot of people learn on the job or teach themselves to program. I'm not bragging that I have a computer science degree, but this is one of those topics that you probably don't come up with on your own. It's a clever thing and a nice useful tool for your toolbox, but it's not something that's obvious. It wasn't obvious to me until I learned about it.

10:01 KENNEDY: Yeah, same here. I think the first time I learned about this was when I started studying design patterns and stuff like that and somehow it came up and then I went oh, that's pretty clever. When you are working with code and it's slow, to me it seems like there's two things that are really really powerful that can just go oh well now it's 100 times faster, that's cool. That was like one line of code. You know, one is using the wrong kind of data structure, like if you're using a list but you really should use a set 'cause you're testing for membership on a big set, something like that, or dictionaries or whatever. The other one is this kind of caching, right? Like if you're doing something and it takes a long time, even if it's going out to the internet and calling a service, like if you think that data changes once a day, it'd be totally great to put like a one minute cache on that if you're calling it a bunch of times.
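
The data structure point can be made concrete with a membership test on a list versus a set. The sizes and repeat count below are arbitrary choices for the sketch, and the timings will vary by machine, so none are shown.

```python
import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)
needle = 99_999  # worst case for the list: the last element

# Same answer either way; only the cost of the lookup differs.
print(needle in as_list, needle in as_set)  # True True

# A list is scanned element by element; a set does a hash lookup.
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Converting the list to a set once up front is the kind of one line change being described.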

10:44 OKKEN: Yeah, and it can, like you said, it can make a massive improvement in speed up. And it's like sort of an obvious, you know like after you see it, you're like well duh, I didn't even think of that.

10:54 KENNEDY: Absolutely. So I really think this is a cool one because it takes that idea and it just makes it easy to carry it across different processes. Or different runs of the same process. Okay, so before we go on to the next one, let me just tell you all about DigitalOcean. They're doing all sorts of cool stuff. Our infrastructure runs on it, really really nice and reliable. One of the things I want to highlight this time is their work with Kubernetes, Docker, and coordinating Docker, orchestrating Docker stuff. Kubernetes is a big deal these days and so they're launching a new Kubernetes cluster over at DigitalOcean. So a really nice way to manage and deploy your container workloads in the cloud. And if you go to pythonbytes.fm/digitalocean, and you're a new user, you get $100 credit to Kubernetes all the way if you want. You can run a lot of Kubernetes for 100 bucks on the cloud. So that's pretty awesome.

11:44 OKKEN: That's yeah very cool.

11:45 KENNEDY: Yeah, so check them out, pythonbytes.fm/digitalocean. They're big supporters of the show and they keep us going strong each week, don't they?

11:51 OKKEN: Yeah, I'm very grateful for them.

11:53 KENNEDY: Yeah, the next one I want to tell you about is a really short video. Last week I covered an hour and a half video about being an expert on Python. How about we cut this down to like a four minute one? So, I think this one is really good for people who are getting into data science and they have a little bit of a challenge. If you're an expert, this is definitely not the video for you, but this is called Setting Up the Data Science Tools. And so it's part of a larger video series, but it basically shows you how to set up the Anaconda distribution, TensorFlow, Keras, Jupyter, all those things. And it actually talks about using Conda, Conda Virtual Environments, creating notebooks and switching between virtual environments. So if you've been mostly working pip or you've seen examples in pip and you want to do more Anaconda stuff, this is a great video, and especially if you want to install some of these tools and get going. And if you're kind of new, this is a great way to get going.

12:43 OKKEN: That's awesome, yeah, cool.

12:44 KENNEDY: Yeah, it's great. I was just talking to somebody who was really new to Python, is super eager to get going, but he was having a problem because he was working on a computer that he didn't have admin access to. And so when he would try to pip install something, it would try to put it in the systemwide location, which you'd need admin rights for. You shouldn't, but if you wanted it to happen, for sure you could use sudo, but he wasn't allowed to, you know, basically run as admin to do that. So I'm like oh, you just need to use a virtual environment. Then you can do whatever you want to on your machine. And he was like, oh, wonderful. So I think it might sound like old hat to folks that have been doing it for a long time, but when you're new to it, like that's not obvious. Like my Python won't install. Well, if you had a virtual environment it would. Or if you did these other steps, it would, right?

13:28 OKKEN: Right, and also for somebody like me that is used to virtual environments, it's still not obvious how to do that in Anaconda and Conda environments.

13:39 KENNEDY: Exactly, I have to look it up every time because I'm all about pip and I'm just like, wait a minute, it's a different way to activate. It's a global activate command. Where's the list? How do I know what exists at? It's different, so, I'm sure I could actually use this as well. Beginner means beginner to Anaconda and data science tools, right? Awesome, alright speaking of data science, I bet data scientists draw a lot of graphs, right?

14:01 OKKEN: Well lots of people draw a lot of graphs. Last time I tried to use Bokeh, or Bokeh, I keep saying that wrong. You don't need to email me that I'm saying it wrong. I know it's Bokeh. That's B-O-K-E-H. It is a very powerful charting tool. I believe it's not the simplest interface to figure out as a newbie. And it's not like Matplotlib is super easy either, but a lot of people know about it. But Bokeh, it's not bad, it's just if you're a beginner, maybe there's an easier way. And this is the easier way or one of the easier ways. This is a package called Chartify that simplifies a lot of the defaults and it's built on top of Bokeh. So if you've got some data and you want to throw it into a chart, this is a nice way to do it. It fills out a whole bunch of the defaults so that it starts out fairly pretty. So it's simplifying the API for the newbies to Bokeh.

14:56 KENNEDY: Oh, that's great. I do find it a little overwhelming because you can do everything, right? You can specify so much detail and sometimes I'm just like, you know I could just use like a histogram. Wouldn't that be awesome? Can we just do a histogram?

15:07 OKKEN: Yeah, and if I got a bunch of different, you know I want to be able to pick the colors fairly easily and I don't really care, but I just want it to look nice.

15:16 KENNEDY: They also have a bunch of nice examples, example notebooks and stuff that walk you through using it. So yeah, it's a great little resource. Speaking of Jupyter and examples and notebooks and stuff, I want to stick with that for the last one here and it's called the CPython Bytecode Explorer. Most people probably know this at least at some level, but I'm sure not everyone does. When you run your Python code, it loads it up and it compiles it to byte code and you're like what? Python's interpreted, it's not compiled. So it compiles your source code into byte code and those byte codes are interpreted by CPython, like a big loop that just runs. Okay, what's the next byte code, let's do that. So understanding what those byte codes are. How complex is something? Is it an atomic operation or does it take multiple steps? All of those things, you might wonder about. So this was sent to us by Anton Helm. And it's created by this guy named Jeremy Tuloup, and what it is, it's a plugin for Jupyter Lab, not Jupyter Notebooks but the more full-featured Jupyter Lab. And what it does is it lets you look at the byte code of various operations that you're working on. So if you pull that up, I think Brian, the link there, you can see there's a little animated gif that shows you what's happening. So it's creating like an a, b, and a c = a + b. And on the right as you type, it just shows you the byte code of those. So I think this is a great way to explore working with Python if you want to understand more of this low level byte code thing.
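
The same kind of listing can be produced without the extension by using the standard library dis module. This is a sketch of the a, b, c = a + b example from the description; the instruction names printed depend on the Python version, so no output is shown.

```python
import dis

source = "a = 1\nb = 2\nc = a + b\n"

# compile() turns source text into a code object that holds the byte code.
code = compile(source, "<demo>", "exec")

# Print one line per byte code instruction.
dis.dis(code)

# The same instructions as data, handy for counting or filtering.
opnames = [ins.opname for ins in dis.get_instructions(code)]
```

Each assignment shows up as a store instruction, and the addition is a separate instruction of its own, which is one way to see that c = a + b takes several steps rather than one.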

16:52 OKKEN: Yeah, this would be awesome, just in teaching, like especially if you're going to talk about like how names work in Python. This would be kind of fun to use to see how it all links to the same thing and what not.

17:07 KENNEDY: Yeah, another example that's cool, if you go to the very bottom, there's a bunch of little animated gifs here. And the very bottom one shows two operations, looping over just the numbers 0 to 9, and you can either do this with a while loop, where you create the while loop and you have i < 10, i += 1, or you could just say for i in range(0, 10). And they show it side by side, comparing the disassembled byte code of both of them, and surprise surprise, the for in loop is a lot fewer byte code operations. So it's probably faster.
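
That comparison can be reproduced with dis by counting the instructions in each version of the loop. The function names here are made up for the sketch, and the exact counts differ between Python versions, so none are shown.

```python
import dis


def with_while():
    i = 0
    while i < 10:
        i += 1


def with_for():
    for i in range(0, 10):
        pass


def count_instructions(func):
    """Number of byte code instructions in the compiled function."""
    return sum(1 for _ in dis.get_instructions(func))


while_count = count_instructions(with_while)
for_count = count_instructions(with_for)
print("while:", while_count, "for:", for_count)
```

This counts instructions in the compiled function, not instructions executed, so it is only a rough hint about speed. Timing both functions with timeit would be the way to confirm which one is actually faster.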

17:42 OKKEN: That's cool. There's even a demo that shows Python 3.6 and Python 3.7 running side by side.

17:50 KENNEDY: Yeah, in the same Jupyter lab view, you can have different versions of Python with the same code to understand how byte codes have evolved over time.

17:58 OKKEN: That's trippy, yeah.

18:00 KENNEDY: I know. So if you want to understand byte codes, this is pretty trippy here. So yeah, like you said, if you're teaching people about this kind of stuff, I think this would be an awesome resource.

18:10 OKKEN: Yeah, nice.

18:11 KENNEDY: Really good to just dig in and understand it. That's it for our six items this week, Brian. But I was wondering, how is, you know, the internet made? Is it like a factory? Is it like internet trees?

18:25 OKKEN: Yeah, I was contemplating whether or not to bring this up.

18:28 KENNEDY: It's too late now.

18:30 OKKEN: Yeah, I saw on, I'm a little addicted to twitter, somebody passed around this little video called How the Internet was Made. And we're going to put a link to it and it's hard to describe, but it's just this complete silliness of these like old time videos of how things are made and stuff. It gets shipped from here to there and gets rolled across the field of barrels and stuff, and it's bizarre. But it made me laugh so hard.

18:57 KENNEDY: It's like an old timey silent movie with subtitles. It's like a documentary on how the internet is made. So it starts out, it has lots of gears and cambers and things. Then eventually, it's put into wheelbarrows. If I understand this correct. And it starts in Austria I believe. So the internet is mined in Austria and it's put into a special internet wheelbarrow which is pretty trippy. It's like a hovercraft. It's mixed up into like a gray goo and it's shipped off all in these pipes. Anyway, it's a good joke. Go and check it out. But it's much more visual.

19:30 OKKEN: It does reference both Austria and Ireland even though, I think it's Ireland, even though the map always points to Italy.

19:40 KENNEDY: I did notice that. This is so off. Pretty awesome, so people if you need a good laugh, you know click on that link. It's silent so it's not going to upset folks at work. It's all about just the visuals. Well I think it was a good one Brian and I'm glad I forced you to put it in there.

19:57 OKKEN: So next week we've got kind of a year in review thing that you're putting in, right?

20:02 KENNEDY: Yeah, absolutely, so you and I have recorded a Talk Python Year in Review Top 10 Python Stories of the year not just of the week, and that's coming out next time. So be sure to check that out and it'll be a lot of fun.

20:14 OKKEN: Yeah, nice, okay.

20:15 KENNEDY: Alright, well, thank you for doing all of this this year with me Brian.

20:19 OKKEN: Yeah, thank you.

20:19 KENNEDY: You bet, bye.

20:20 OKKEN: Bye.

20:21 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
