Transcript #109: CPython byte code explorer
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 109, recorded December 10th, 2018. I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:11 And this episode is brought to you by DigitalOcean. Thank you, thank you, DigitalOcean. Tell you more about them later.
00:17 Right now, Brian, how is everything going?
00:19 It's going really well. How about you?
00:21 Oh, it's super. I'm starting to think of year in review, like what were the most amazing Python stories of the year and things like that.
00:29 So looking forward to sharing those with everyone, actually.
00:33 Yeah, that'd be great.
00:34 Yeah. So you and I actually did an episode along with Dan Bader on Talk Python, which we'll drop here on this channel as well for the year in review in Python news, which is like this, but more stuff in more depth.
00:46 So that'll be a good thing for, you know, all those people traveling for the holidays, right?
00:49 Yeah. Give them something to listen to.
00:51 All right. They may be stuck in an ice storm in Chicago, O'Hare, but they can listen to some good Python news.
00:57 Yeah.
00:58 That's right. Speaking of good Python news, what do you got to kick us off this time?
01:00 I have "Python Descriptors Are Magical Creatures."
01:06 That sounds awesome.
01:07 Yeah. This is actually kind of a neat approach.
01:10 I went into this article thinking, yeah, I know what descriptors are and stuff and properties.
01:15 It's talking about properties of object properties in Python.
01:18 But this is a really great article.
01:20 So this is an article by Pablo Arias.
01:23 And it talks about how you can add getters and setters and properties to objects.
01:31 So you can have like, instead of calling a function like get version, you can just have like version and, and you can use like object dot version and you can assign to that.
01:41 And that'll call the setter.
01:43 And if you read from it, that'll call the getter and you can have custom functions for that.
01:48 It's one of the cool things about Python.
01:50 And, and one that I'm glad that it's been highlighted because some people forget this is around, especially if you come from a language that doesn't have this sort of thing.
01:58 C, Java, that kind of stuff, right?
02:00 Yeah.
02:00 These are pretty neat.
02:01 And they make it look like an attribute of the object, but it's actually a function that gets called.
02:08 And it's a way you can actually migrate.
02:10 You can start a system where it really is just data that's sitting there.
02:14 And if you want to intercept it and say, you know, actually when somebody assigns to this, I want to do some work or I don't want to really store this data.
02:22 I want to calculate it on the fly.
02:24 I mean, you can turn those into getters and setters and the calling code doesn't need to know.
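The migration described here, from a plain stored attribute to one that is calculated on the fly, can be sketched roughly as follows. The Release class and its fields are hypothetical names made up for illustration; they do not come from the article.

```python
class Release:
    """Hypothetical class, used only to illustrate the migration."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    @property
    def version(self):
        # Computed on the fly instead of being stored on the instance.
        return f"{self.major}.{self.minor}"

    @version.setter
    def version(self, value):
        # Assigning to .version runs this setter and updates the stored parts.
        major, minor = value.split(".")
        self.major = int(major)
        self.minor = int(minor)


release = Release(3, 7)
print(release.version)   # the getter runs
release.version = "3.8"  # the setter runs
print(release.minor)
```

Calling code keeps writing release.version either way, which is why it does not need to know whether the value is stored or computed.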
02:28 Yeah.
02:29 I really like this because often the API makes the most sense as sort of fields, just setting the attributes, right?
02:35 Like user dot name, user dot first name or something like that.
02:39 But what if you want validation, right?
02:41 Like the name can't contain white spaces or other weird stuff.
02:44 If you want to strip that off or, you know, username is always lowercase and things like that.
02:49 So properties are perfect for that, right?
02:51 You can validate it.
02:52 You can raise an exception that says you can't have a none value here.
02:56 It has to be a non-empty string, all kinds of good stuff.
02:58 But the consumer doesn't care.
03:00 They don't have to know.
03:01 Yeah.
03:01 And I personally have used the getters and setters before, but there's a deleter also.
03:08 And I don't think I use that very much.
03:10 And it's kind of probably a neat thing to stick in place if you're doing this anyway to make sure if it's invalid for somebody to try to delete an attribute, you may want to intercept that.
03:23 Yeah.
03:23 No, you always have a name.
03:25 You can't delete it.
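A minimal sketch of the validation and deleter ideas from this exchange might look like the following. The User class and its specific rules (non-empty string, stripped, lowercased) are assumptions chosen to mirror the examples mentioned in the conversation, not code from the article.

```python
class User:
    """Hypothetical class; the validation rules are made up for illustration."""

    def __init__(self, name):
        self.name = name  # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Reject None, non-strings, and empty or whitespace-only strings.
        if not isinstance(value, str) or not value.strip():
            raise ValueError("name must be a non-empty string")
        # Normalize: strip surrounding whitespace and lowercase.
        self._name = value.strip().lower()

    @name.deleter
    def name(self):
        # Intercept `del user.name` because a user always has a name.
        raise AttributeError("name cannot be deleted")


user = User("  Brian ")
print(user.name)
```

The consumer still just reads and assigns user.name; the checks happen behind the attribute.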
03:27 Yeah.
03:28 But this is a good general introduction to how to use these.
03:31 And so people can clean up their code a little bit, make it look a little less Java-y.
03:35 Yep.
03:35 I totally agree.
03:36 So the next one is I want to talk about a survey.
03:40 So we've talked about the JetBrains Python survey and that data science featured heavily in it.
03:46 But they also did a separate data science survey for just data scientists and asked data science-y questions only.
03:53 So they polled about 1,600 people who are data scientists based in the U.S., Europe, China, and Japan.
04:00 And to figure out what's the story, what's the zen, and how are people feeling in the data science space right now.
04:08 And so it wasn't just for Python.
04:10 It was just for data scientists.
04:11 But you can imagine that there are many Python things happening in the data science world, right?
04:17 So one of the key takeaways was that most people assume, or currently most people use Python, and then they assume that Python will remain the primary programming language at least for five years.
04:27 Yeah, and that's essentially forever in computer time.
04:31 That's right.
04:31 If you're planning past five years, you've got either a lot of faith in where things are going or you're doing it wrong.
04:38 Those actually could be the same thing.
04:40 All right.
04:41 They also talked about what are the main tools people are using for machine learning stuff.
04:46 And they said Keras is the main one for professional developers.
04:50 Whereas if you're an amateur data scientist, you're more likely to use Microsoft Azure machine learning services rather than libraries.
04:59 So you're like, just make this a model.
05:01 Teach it stuff.
05:02 Figure that out later.
05:03 Whereas the pros, in quotes, are actually doing the straight API stuff.
05:08 And remind me what Keras is?
05:09 Keras is a machine learning framework.
05:11 Okay.
05:11 Yeah.
05:12 So it's sort of comparable to Azure ML.
05:14 But Azure ML is a service.
05:15 Like machine learning is a service.
05:17 I haven't ever used it, though.
05:17 Okay.
05:18 So let's see.
05:18 Main programming languages.
05:20 Obviously, there are other languages.
05:23 And if you look back just a couple years, right, R was a machine learning and data science-y language that was more popular than Python was for data science.
05:35 But now Python is 57%.
05:37 R is only 15%.
05:38 Some people say Julia is the next big language for data scientists.
05:42 So they asked about Julia of these 1,600 people.
05:45 And the number of people using it was 0%.
05:47 So that's not super compelling for Julia, I guess.
05:52 At least amongst this data, this statistical set.
05:56 Yeah, yeah, yeah.
05:57 And honestly, I forgot how they found this set of people.
06:01 So I'm sure they talk about it in the write-up.
06:03 And then finally, when you talk about IDEs and editors, there were three standout main things people used.
06:09 Obviously, Jupyter, Jupyter Notebooks, JupyterLab was 43%.
06:13 PyCharm was 38%.
06:15 And RStudio was 23%.
06:17 So that's pretty interesting.
06:19 Yeah.
06:19 Yep.
06:20 All right.
06:21 So if you're in the data science space, maybe this will help you keep your pulse on, keep the pulse of what's going on there.
06:27 I want to highlight a little tool.
06:28 So like I talked about properties just as a nice technique of people should make sure they understand how those work.
06:36 Another thing I've run across is memoization.
06:40 It's a technique.
06:46 This is a technique to, if you've got a function or something, some work that you need to do that's dependent on input data, only dependent on the input parameters.
06:56 But to get your answer, it's computationally intensive.
07:05 You have to do the work to get your answer.
07:16 If you get passed the same arguments again, just return the answer that you've already calculated.
07:26 If you have some function that you're calling with relatively bounded set of inputs and it's at all computationally expensive or it goes to a service and it gets an answer back.
07:36 Like you said, if the input is the only thing that drives it, it's not like, well, what's the weather at the zip code?
07:41 Because that could always change.
07:42 But it's like, you know, what's the limit of this integral when passing in, you know, this lower bound, you know, like discrete integral or something.
07:48 Right.
07:49 It's always going to give you the same answer back.
07:51 So you can actually go to the function.
07:53 Even with the functools built into Python, you can say, I want this function,
07:56 if it gets the same arguments, to not run again, just give the answer back.
08:00 It's kind of stored in memory or somewhere.
08:01 Right.
08:01 And that, that only works in process.
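The in-process memoization being described can be sketched with functools.lru_cache from the standard library. The slow_square function and the call_count counter are hypothetical stand-ins for an expensive, input-only computation.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Stand-in for an expensive function that depends only on its input."""
    global call_count
    call_count += 1
    return n * n


slow_square(12)
slow_square(12)  # same argument: answered from the in-memory cache
print(call_count)
print(slow_square.cache_info())
```

The cache lives inside the running process, so it starts empty again the next time the program is launched.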
08:03 Yeah.
08:04 One of the things I wanted to highlight is a project called cache.py that saves all this stuff off to a file.
08:12 This would be helpful, especially if you've got a, like a command line tool that gets called lots of times.
08:17 It isn't going to be able to store everything in memory.
08:20 So being able to save it in a file might be helpful.
08:22 The interface is just a decorator to say, hey, this function, you can cache the results.
08:28 So you throw a decorator on it called, it's just cache.cache.
08:32 Decorator onto your function.
08:33 And it just works.
08:35 And there's a whole bunch of customization you can do.
08:37 You can say how, how long the cache is good for or, and where the, where the file should be and things like that.
08:44 But the default just kind of works pretty good too.
08:46 Yeah.
08:46 I really like this.
08:47 So the thing is the built-in stuff only works in memory.
08:51 And so once the process is done, it's done.
08:54 But like you said, if this is a command line tool you're stringing together and you want it to keep that data for a certain amount of time or just always keep it so that it's like, well, if you pass me seven, the answer is always going to be this.
09:06 Right?
09:06 Yeah.
09:06 That's, yeah.
09:07 It's great that that'll keep it on the file system.
09:09 And it uses pickle, right?
09:11 I'm not sure.
09:12 Yeah.
09:12 Let's see.
09:14 Yeah.
09:14 Currently uses pickle and inspect under the hood, making it not portable.
09:18 So you can't like take your cache file and move it to Windows when you ran it on Linux or something, I believe.
09:24 Yeah.
09:25 Because of memory structure and different versions of Python and so on.
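To make the file-backed idea concrete, here is a rough sketch of a memoizing decorator that keeps results in a pickle file. This is not cache.py's implementation or API; disk_cache, the file name, and the expensive function are all hypothetical, and the sketch assumes hashable positional arguments only.

```python
import functools
import os
import pickle
import tempfile


def disk_cache(path):
    """Hypothetical decorator: memoize results in a pickle file at `path`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Load whatever earlier calls (or earlier runs) stored.
            store = {}
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    store = pickle.load(fh)
            if args not in store:
                # Cache miss: do the real work, then persist the result.
                store[args] = func(*args)
                with open(path, "wb") as fh:
                    pickle.dump(store, fh)
            return store[args]
        return wrapper
    return decorator


cache_path = os.path.join(tempfile.mkdtemp(), "demo_cache.pkl")
runs = []  # records each time the function body really executes


@disk_cache(cache_path)
def expensive(n):
    runs.append(n)
    return n ** 2


expensive(7)
expensive(7)  # second call is served from the file
print(len(runs))
```

Because the results sit in a file, a later run of the same command line tool could reuse them. Pickle files should only be loaded from sources you trust.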
09:28 So what, remind me, what was the built-in one that works in memory?
09:31 It's in functools and it's lru_cache, I believe.
09:34 Okay.
09:35 functools, lru_cache.
09:37 Yeah.
09:37 Yeah.
09:37 Yeah.
09:38 I brought this up also mostly because I know a lot of people learn on the job or teach themselves.
09:44 I'm not bragging that I have a computer science degree, but this is one of those topics that you probably don't come up with on your own.
09:52 It's a clever thing and a nice, useful tool for your toolbox, but it's not something that's obvious.
09:59 It wasn't obvious to me until I learned about it.
10:01 Yeah.
10:02 Same here.
10:02 I think the first time I learned about this was when I started studying design patterns and stuff like that.
10:07 And somehow it came up in there.
10:08 I'm like, oh, that's pretty clever.
10:09 Yeah.
10:10 Yeah.
10:10 When you are working with code and it's slow, to me, it seems like there's two things that are really, really powerful that can just go, oh, well, now it's a hundred times faster.
10:18 That's cool.
10:19 And that was like one line of code.
10:21 You know, one is using the wrong kind of data structure.
10:24 Like if you're using a list, but you really should use a set because you're testing for membership on a big set, something like that or dictionaries or whatever.
10:31 The other one is this kind of caching, right?
10:33 Like if you're doing something and it takes a long time, even if it's going out to the internet and calling a service, like if you think that data changes once a day, it'd be totally great to put like a one minute cache on that.
10:43 If you're calling it a bunch of times.
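The one-minute cache idea can be sketched as a small time-limited memoizer. Everything here is hypothetical: ttl_cache is a hand-rolled helper rather than a standard library function, and fetch_report stands in for a slow service call.

```python
import functools
import time


def ttl_cache(seconds):
    """Hypothetical decorator: reuse a result until it is `seconds` old."""
    def decorator(func):
        cached = {}  # maps args -> (timestamp, result)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cached:
                stamp, result = cached[args]
                if now - stamp < seconds:
                    return result  # still fresh, skip the slow call
            result = func(*args)
            cached[args] = (now, result)
            return result
        return wrapper
    return decorator


fetches = []  # records each real "service call"


@ttl_cache(seconds=60)
def fetch_report(day):
    fetches.append(day)  # stand-in for going out to the internet
    return f"report for {day}"


fetch_report("monday")
fetch_report("monday")  # within 60 seconds: served from the cache
print(len(fetches))
```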
10:44 Yeah.
10:45 And it can, like you said, it can, it can make a massive improvement speed up and it's like sort of an obvious of, you know, after you see it, you're like, well, yeah, duh.
10:53 I didn't even think of that.
10:53 Absolutely.
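Going back to the data structure point, the list versus set difference for membership tests can be measured with timeit. The sizes and repeat counts below are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
needle = 99_999  # worst case for the list: it is the last element

# A list scans element by element; a set does a hash lookup.
list_time = timeit.timeit(lambda: needle in items_list, number=50)
set_time = timeit.timeit(lambda: needle in items_set, number=50)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```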
10:54 So I really think, I think this is a cool one because it takes that idea and it just makes it easy to carry it across different processes or different runs of the same process.
11:03 Okay.
11:03 So before we get on the next one, let me just tell you all about DigitalOcean.
11:07 They're doing all sorts of cool stuff.
11:08 Our infrastructure runs on it.
11:10 Really, really nice and reliable.
11:12 One of the things I want to highlight this time is their work with Kubernetes, Docker and coordinating Docker, orchestrating Docker stuff with Kubernetes is a big deal these days.
11:21 And so they're launching a new Kubernetes cluster over at DigitalOcean.
11:27 So a really nice way to manage and deploy your container workloads in the cloud.
11:31 And if you go to pythonbytes.fm/DigitalOcean and you're a new user, you get $100 credit to Kubernetes all the way if you want.
11:40 You can run a lot of Kubernetes for $100 in the cloud.
11:42 So that's pretty awesome.
11:43 That's, yeah, very cool.
11:45 Yeah.
11:45 So check them out, pythonbytes.fm/DigitalOcean.
11:48 They're big supporters of the show and they keep us going strong each week, don't they?
11:51 Yeah.
11:52 I'm very grateful for them.
11:53 Yep.
11:53 The next one I want to tell you about is a really short video.
11:56 Last week, I covered an hour and a half video about being an expert on Python.
11:59 How about we cut this down to like a four minute one?
12:02 So I think this one is really good for people who are getting into data science and they have a little bit of a challenge.
12:09 If you're an expert, this is definitely not the video for you.
12:11 But this is called Setting Up the Data Science Tools.
12:14 And so it's part of a larger video series.
12:16 But it basically shows you how to set up the Anaconda distribution, TensorFlow, Keras, Jupyter, all those things.
12:24 And it actually talks about using Conda, Conda virtual environments, creating notebooks and switching between virtual environments.
12:31 So if you've been mostly working with pip or you see examples in pip and you want to do more Anaconda stuff, this is a great video.
12:37 And especially if you want to install some of these tools and get going and you're kind of new, this is a great way to get going.
12:43 That's awesome.
12:43 Yeah.
12:44 Cool.
12:44 Yeah, it's great.
12:45 I was just talking to somebody who was really new to Python and super eager to get going.
12:50 But he was having a problem because he was working on a computer that he didn't have admin access to.
12:55 And so when he would try to pip install something, it would try to put it in the system-wide thing, which you'd have to to make that happen.
13:02 You shouldn't.
13:03 But if you wanted it to happen for sure, you could do sudo.
13:05 But he wasn't allowed to basically run as admin to do that.
13:09 Right?
13:10 So I'm like, oh, you just need to use a virtual environment.
13:12 Then you can do whatever you want to your machine.
13:14 It's like, oh, wonderful.
13:16 Right?
13:16 So I think it might sound like old hat to folks that have been doing it for a long time.
13:20 But when you're new to it, that's not obvious.
13:22 Right?
13:23 Like, my Python won't install.
13:24 Well, if you had a virtual environment, it would.
13:26 Or you did these other steps, it would.
13:27 Right?
13:28 Right.
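For the non-Anaconda case, creating a virtual environment needs no admin rights because it is just a directory you own. A minimal sketch using the standard library venv module follows; the demo-env name and the temporary location are arbitrary, and this is roughly what running python -m venv demo-env does.

```python
import pathlib
import tempfile
import venv

# Any directory you can write to works; a temp dir keeps the demo tidy.
env_dir = pathlib.Path(tempfile.mkdtemp()) / "demo-env"

# with_pip=False keeps this sketch fast; pass with_pip=True for real use
# so that pip installs packages into the environment, not system-wide.
venv.create(env_dir, with_pip=False)

print((env_dir / "pyvenv.cfg").exists())
```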
13:28 And also, somebody like me that is used to virtual environments, it's still not obvious how to do that in an Anaconda environment.
13:38 Exactly.
13:39 I have to look it up every time because I'm all about pip.
13:42 And I was like, wait a minute.
13:43 It's a different way to activate.
13:44 It's like a global activate command.
13:46 Where's the list?
13:46 How do I know what exists?
13:47 Yeah, it's different.
13:48 So I'm sure I could actually use this as well.
13:51 Cool.
13:51 Beginner means beginner to Anaconda in data science tools.
13:53 Not true beginner, right?
13:55 Yeah.
13:55 Awesome.
13:55 All right.
13:56 Speaking of data science, I bet data scientists draw a lot of graphs, right?
14:01 Yeah.
14:01 Well, lots of people draw a lot of graphs.
14:03 Last time I tried to use Boca or Bokeh.
14:05 I keep saying that wrong.
14:07 You don't need to email me that I'm saying it wrong.
14:09 I know it's Bokeh.
14:11 It's B-O-K-E-H.
14:13 It is a very powerful charting tool.
14:15 I believe it's not the simplest interface to figure out as a newbie.
14:21 And it's not like Matplotlib is super easy either.
14:26 But a lot of people know about it.
14:28 But Bokeh, yeah, it's not bad.
14:30 It's just if you're a beginner, maybe there's an easier way.
14:33 And this is the easier way.
14:35 One of the easier ways is a package called Chartify that simplifies a lot of the defaults.
14:41 And it's built on top of Bokeh.
14:44 So if you've got some data and you want to throw it into a chart, this is a nice way
14:47 to do it.
14:47 It fills out a whole bunch of the defaults to where it starts out fairly pretty to start
14:52 with.
14:52 So simplifying the API for newbies into Bokeh.
14:56 Oh, that's great.
14:57 I do find it a little overwhelming because you can do everything, right?
15:00 You can specify so much detail.
15:01 I'm like, sometimes I'm just like, you know, I could just use like a histogram.
15:04 Wouldn't that?
15:05 That would be awesome.
15:05 Can we just do a histogram?
15:08 Yeah.
15:08 And if I got a bunch of different, you know, I want to be able to pick the colors fairly
15:13 easily.
15:13 And I don't really care, but I just want it to look nice.
15:16 Yeah.
15:16 They also have a bunch of nice examples, example notebooks and stuff that walk you through using
15:21 it.
15:21 So yeah, it's a great little resource.
15:23 Yeah.
15:23 Speaking of Jupyter and examples and notebooks and stuff, I want to stick with that for the
15:27 last one here.
15:28 And it's called the CPython Bytecode Explorer.
15:32 Most people probably know this at least at some level, but I'm sure not everyone does.
15:37 When you run your Python code, it loads it up and it compiles it to bytecode.
15:42 And you're like, wait, what?
15:43 Python's interpreted.
15:44 It's not compiled.
15:45 So it compiles your source code into bytecode.
15:49 And those bytecodes are interpreted on top of the CPython, like a big loop that just runs
15:55 goes, okay, what's the next bytecode?
15:56 Let's do that.
15:57 So understanding what those bytecodes are, how complex is something?
16:02 Is it an atomic operation or does it take multiple steps?
16:05 All of those things you might wonder about.
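Outside of JupyterLab, the same kind of question can be explored with the standard library dis module. The add function below is a hypothetical example; the exact opcode names and counts depend on the CPython version, so no particular listing is shown.

```python
import dis


def add():
    a = 1
    b = 2
    c = a + b
    return c


# Print the disassembled bytecode for the function.
dis.dis(add)

# Collect the opcode names to see how many steps the function takes.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(len(ops), "bytecode instructions")
```

Even the single line c = a + b turns into several instructions (loading the operands, the add itself, storing the result), which is one way to see that it is not a single atomic step.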
16:07 So this was sent to us by Anton Helm.
16:09 And it's created by this guy named Jeremy Tuloup.
16:13 And what it is, is it's a plugin for JupyterLab, not Jupyter Notebooks, but the more full-featured
16:19 JupyterLab.
16:20 And what it does is it lets you look at the bytecode of various things that you're, various operations
16:27 that you're working on.
16:28 So if you pull up that thing, Brian, the link there, you can see there's a little animated
16:33 gif that shows you what's happening.
16:35 So it's creating like an A, B, and a C equals A plus B.
16:38 And there's just on the right as you type, it just shows you the bytecode of those.
16:43 So I think this is a great way to explore working with Python.
16:48 If you want to understand more of this low-level bytecode thing.
16:52 Yeah.
16:53 This would be awesome just in teaching, especially if you're going to talk about how names work
17:00 in Python.
17:01 This would be kind of fun to use to see how it all points to the same thing and whatnot.
17:06 Yeah.
17:07 Another example that's cool, if you go to the very bottom, there's a bunch of little
17:10 animated gifs here.
17:11 And the very bottom one shows the two operations looping over just the numbers 0 to 9.
17:18 And you can either do this by a while loop, you create the while loop, and you have i less
17:23 than 10, i plus equals 1, or you can just say for i in range of 0 to 10.
17:29 And they show it side by side comparing the disassembled bytecode of both of them.
17:35 And surprise, surprise, the for-in loop is a lot fewer bytecode operations.
17:40 So it's probably faster.
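The side by side comparison can be approximated with dis as well. The two functions below are hypothetical reconstructions of the loops described, not the code from the demo, and the instruction counts differ between CPython versions.

```python
import dis


def count_with_while():
    total = 0
    i = 0
    while i < 10:
        total += i
        i += 1
    return total


def count_with_for():
    total = 0
    for i in range(0, 10):
        total += i
    return total


def opnames(func):
    """Return the list of opcode names in a function's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


print("while:", len(opnames(count_with_while)), "instructions")
print("for:  ", len(opnames(count_with_for)), "instructions")
```

A shorter bytecode listing is a hint rather than proof of speed; timing both versions with timeit is the way to confirm it on a given interpreter.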
17:41 That's cool.
17:42 Cool, right?
17:43 There's even a demo that shows that you can have Python 3.6 and Python 3.7 running
17:48 side by side.
17:49 Yeah.
17:50 In the same JupyterLab view, you can have different versions of Python with the same code to understand
17:56 how bytecodes have evolved over time.
17:58 That's trippy.
17:59 Yeah.
17:59 Neat.
18:00 I know.
18:00 So if you want to understand bytecodes, this is pretty trippy here.
18:04 Yeah.
18:05 Like you said, if you're teaching people about this kind of stuff, I think this would be
18:09 an awesome resource.
18:09 Yeah.
18:10 Nice.
18:10 Cool.
18:11 Yeah.
18:11 Really good to just dig in to understand it.
18:13 That's it for our six items this week, Brian.
18:16 But I was wondering, how is the internet made?
18:20 Is it like factory?
18:22 Is it like, are there internet trees?
18:25 Yeah.
18:25 I was contemplating whether or not to bring this up, but I...
18:29 It's too late now.
18:30 Yeah.
18:30 I saw on, I'm a little addicted to Twitter.
18:33 So somebody passed around this little video called How the Internet is Made.
18:38 And we're going to put a link to it.
18:39 And it's hard to describe, but it's just this complete silliness of these like old time
18:45 videos of how things are made and stuff.
18:48 It gets shipped from here to there and gets rolled across the field with barrels and stuff.
18:53 And it's bizarre.
18:55 But it made me laugh so hard.
18:57 It's like an old timey silent movie with subtitles.
19:01 It's like a documentary on how the internet is made.
19:04 So it starts out, has lots of gears and cambers and things.
19:07 Then eventually it's put into wheelbarrows.
19:10 If I understand this correct.
19:12 Yeah.
19:12 And it starts in Austria, I believe.
19:14 So the internet is mined in Austria.
19:16 It's put into a special internet wheelbarrow, which is pretty trippy.
19:19 It's like a hovercraft.
19:20 It's mixed up into like a gray goo and then it's shipped off along these pipes.
19:25 Anyway, it's a good joke.
19:27 People can check it out.
19:28 But it's much more visual.
19:30 It does reference both Austria and Ireland, even though I think it's Ireland, even though
19:36 the map always points to Italy.
19:38 I didn't notice that.
19:41 I'm like, this is so off.
19:43 Pretty awesome.
19:44 So people, if you need a good laugh, you know, click on that link.
19:47 It's silent.
19:47 So it's not going to upset folks at work, right?
19:50 It's all about just the visuals.
19:52 Well, I think it was a good one, Brian.
19:54 And I'm glad I forced you to put it in there.
19:56 So next week, we've got a kind of a year in review thing that you're putting in, right?
20:02 Yeah, absolutely.
20:02 So you and I had recorded a Talk Python year in review, top 10 Python stories of the year,
20:08 not just of the week.
20:09 And that's coming out next time.
20:11 So be sure to check that out.
20:13 And it'll be a lot of fun.
20:14 Yeah.
20:14 Nice.
20:15 Okay.
20:15 All right.
20:16 Well, thank you for doing all of this year with me, Brian.
20:19 Yeah.
20:19 Thank you.
20:19 You bet.
20:20 Bye.
20:20 Bye.
20:20 Thank you for listening to Python Bytes.
20:22 Follow the show on Twitter via at Python Bytes.
20:25 That's Python Bytes as in B-Y-T-E-S.
20:27 And get the full show notes at pythonbytes.fm.
20:31 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
20:35 We're always on the lookout for sharing something cool.
20:38 On behalf of myself and Brian Okken, this is Michael Kennedy.
20:41 Thank you for listening and sharing this podcast with your friends and colleagues.