« Return to show page
Transcript for Episode #95:
Unleash the py-spy!
00:00 KENNEDY: Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 95, recorded September 12th 2018. I'm Michael Kennedy.
00:09 OKKEN: And I'm Brian Okken.
00:11 KENNEDY: Hey Brian, how you doing this fine, fine Wednesday?
00:13 OKKEN: I am excellent.
00:14 KENNEDY: Nice, it's also excellent that Datadog is sponsoring the show, so before getting further tell them thank you, pythonbytes.fm/datadog. Actually a cool shirt if you go there and follow along. We'll talk more about that later. You know, I feel like summer's coming to an end, and I've been quite lazy all summer. I'm not sure I'm ready to get back into the main swing of things, but it's upon us.
00:34 OKKEN: Yeah, and do you know who else is lazy? Programmers are lazy.
00:38 KENNEDY: Productively lazy They put lazy to...
00:39 OKKEN: Productively.
00:40 KENNEDY: good use.
00:42 OKKEN: Yes.
00:42 KENNEDY: They make lazy a virtue.
00:44 OKKEN: And so this was our segue from nothing into the first item, which is dataset. And dataset is a Python package that bills itself as databases for lazy people. And this is actually something I totally want to try because it looks fun. Their premise is programmers are lazy. Oh, it says, at first, I'll just read some of the top. Although managing data in relational databases has plenty of benefits, they're rarely used in day-to-day work with small and medium-scale datasets. But why is that? It's because people are lazy and they'll throw it in JSON or CSV instead. Oh, they say, the answer is that programmers are lazy. They'll use the easy solution. And I guess I can't disagree. I've used JSON format as, essentially, a local database before. But this is kind of cool. This is, what it is, is it's built on top of Alchemy. So it's built on top of SQLAlchemy so it can work with any database or a SQL-style database and it's just really easy. It looks kind of like a NoSQL. It's kind of hard to describe, but, of course, over the air, but, it's pretty simple and worth checking out, I think.
01:57 KENNEDY: Yeah, I like it. It does automatic schema, creation, upserts. It has query helpers like distinct and stuff like that. So, if you were to say, I'm just going to use like an in-memory dictionary or other things like that, it's kind of nice that it helps with some of those things.
02:11 OKKEN: So you just said a couple terms that I don't even know what those are, so, upsert.
02:16 KENNEDY: Upsert is, I have a record and I'm going to try to save it to the database. If it does not exist in the database, an update would fail. But if it already exists, an insert would fail 'cause it would be a duplicate key. And upsert says, hey, data access layer, take this thing and save it. If it doesn't exist, put it in there as a new thing. If it does, make an update and set the values to the new one.
02:38 OKKEN: Okay, and it also deals with sparse stuff. So one of their initial examples is, let's say you've got people or something and in the first person, you give them a name and an age, and you can insert that. The second person comes along and you give them a name, an age and a gender. And then, you can search easily in it. Yeah, like you said, it deals with the schema for you already and you don't have to deal with that.
03:02 KENNEDY: Yeah, and the example that you have in the show notes here uses SQLite as the back-end database? But it uses the memory connection version. So you can just load it up with data and then query it and work with it and then when your app shuts down, it just goes away. Maybe you output stuff to a JSON file or whatever. So you don't even have to store the database necessarily.
03:22 OKKEN: Yeah, to be able to use some queries on information. That's interesting. But you can also just play with it this way and then turn it into a file-stored, file-backed database as well even with SQLite or with something else.
03:34 KENNEDY: Yeah, absolutely. It's a good find. It's quite interesting. I like it. So I have a question for you Brian. Why is NumPy, not this thing that we're going to talk about next, but NumPy itself, faster than regular Python? Do you know?
03:46 OKKEN: I think 'cause it's got stuff compiled in C.
03:49 KENNEDY: Exactly, 'cause it's written in C and it can even do parallelism and stuff. It could take advantage of cores. Like my new laptop is ridiculous. It has 12 cores.
03:57 OKKEN: Nice.
03:57 KENNEDY: That's a lot, right? So, maybe like we could actually take advantage of that with NumPy. Well, if you have NumPy code, this thing that I'm going to tell you about takes it to another level. It's called CuPy, I think is how you say it, 'cause I think it's based on CUDA, so CuPy is what I'm going to go with. And it's full name is CuPy GPU NumPy. And the idea is, it's a API compatible library with NumPy. So, all the NumPy features that it has, that NumPy has, you can call the same functions on CuPy, I called it, but, instead of running on your six, four, whatever cores you have on you machine, it runs them on the GPU cores, which is insane.
04:41 OKKEN: Oh, wow! Okay.
04:42 KENNEDY: Isn't that cool? And I looked. I looked and I did a quick little search, just like, hey, what's like a modern machine-learning or data science-type GPU you might get? So, pretty standard one might be the GeForce GTX 1080 Ti. These things are getting super expensive because of all the Bitcoin miners and stuff but, anyway, you get one of these things, and it has 3,584 cores, 3,000, and you can run your code parallel on all of those.
05:14 OKKEN: Wow, okay.
05:14 KENNEDY: So instead of, like, having like, oh my gosh, I can't believe I have 12, no, you have 3,500. And all you have to do, the only line of code you have to change is, instead of import numpy as np, which is the very common thing that people do, you would say, import cupy as np. That's it. And now you're running all these CUDA cores doing GPU-backed data science.
05:39 OKKEN: Okay, do you remember what CUDA is?
05:41 KENNEDY: I don't know what CUDA stands for. I bet it's an acronym 'cause it's all capitalized. There's a lot. There's like layers upon layers of acronyms here. I don't actually know what CUDA cores are.
05:53 OKKEN: Okay, didn't mean to put you on the spot.
05:55 KENNEDY: It's the mechanism of parallelism on the GPU, basically.
05:58 OKKEN: Okay, nice.
06:00 KENNEDY: But I don't actually know more than that. Isn't that cool?
06:02 OKKEN: I like it.
06:04 KENNEDY: Yeah, yeah, so it's really cool. It has its compatible API. I threw a little code sample in the show notes there. And if, for some reason, you're like, you know, I actually need to customize how my code runs on the GPU, which is a thing sometimes people do, you can like program against the CUDA cores and CUDA kernels and things like that. You can actually embed in your Python code, C++ code, and CuPy will actually compile that down to a CUDA binary, which is even cooler.
06:34 OKKEN: Okay, I was just curious. So I'm really not a hardware guy, so bear with me. You said you have 12 cores. Is it on a laptop that you're running?
06:41 KENNEDY: Yeah, it's the new MacBook Pro. So it's the Intel Core i9 maxed out. It's really six cores that are each hyper-threaded is how it works, but the OS sees them as 12.
06:52 OKKEN: So, are there GPUs on a normal laptop or on your laptop or is this...
06:55 KENNEDY: Yes.
06:57 OKKEN: GPU's just something that, okay.
06:58 KENNEDY: No, no, no, there's a pretty high-end one on my MacBook. It's not as high-end as this. It's not even close, but, maybe half or something, I would guess, in terms of performance? It's a pretty bad estimate 'cause I don't really know. But, yeah, you could run this on a laptop.
07:13 OKKEN: Yeah, I'm just curious if the CuPy would speed up things on, just on a laptop or something or if it would...
07:19 KENNEDY: I would think so. I mean, you got to have an algorithm that's like well-adapted to GPUs. But if you did, then I would think so, yeah.
07:27 OKKEN: Okay, well this is neat. The people that really care about it, really care about it. So this is cool.
07:31 KENNEDY: Yeah, absolutely. You can go and get GPU clusters on AWS or on DigitalOcean or things like that.
07:39 OKKEN: Yeah. Okay.
07:39 KENNEDY: And so, you could actually ship your code up there even if you don't have one. Final note on this one is, there was a PyCon 2018 presentation on this and so I'm going to link to the presentation as well if people want to watch 30 more minutes of this.
07:55 OKKEN: I think I would.
07:55 KENNEDY: Yeah, I actually do too. It looks really interesting.
07:57 OKKEN: Yeah, all right.
07:58 KENNEDY: I feel a theme coming on.
08:00 OKKEN: In Episode 84 we did touch on, somebody called in, called in? We actually don't have phone lines. Somebody contacted us and said, hey, you should cover pre-commit, and we have. We did talk about pre-commit in Episode 84, but we just sort of talked about what it was. But today, I ran across this fairly cool article called The Automated Python Workflow Using Pre-Commits. I like this kind of an article, actually, of, okay, here's these cool tools. Using pre-commit black and flake8 how do I put that in my day-to-day workflow and how does it really work? And, this is from LJ Miranda, so good job LJ. It's got a great graphic at the start with telling you that you've got changes. When you add something you go to git add and you go to staging and then when you do a commit, what happens is the pre-commit will intercept that part and it kicks off whatever pre-commit hooks you've got set up. And, if all of those pass, then it lets the commit happen. And if it doesn't, it kicks it back. And then it shows you how to deal with all of the different configuration that is available with pre-commit. I like this. It's a good starter. If you're still quite not onboard with pre-commit, this is a good article to read.
09:13 KENNEDY: Yeah, pre-commit's pretty cool and that's a Python package that you install that then manages all the rest, which I think that's great.
09:20 OKKEN: Yeah, in this article, there's a little video and I think it's a animated GIF or something, a little short demo video that runs. I don't know how they do this. This is neat. So, it shows it in there actually.
09:32 KENNEDY: Yeah, yeah, that's really cool. I like those little auto play GIFs that'll animate stuff 'cause sometimes it's like, you know, if you could just see it happening, it would be so much more easy to grok with what this little picture's trying to tell me.
09:44 OKKEN: Yeah, and I also don't mind, something like that is fine if it has like an actual video to play but, don't give me a half-hour video. A little couple-minute video at most is great.
09:53 KENNEDY: Yeah, a half-hour GIF, probably not the way to go.
09:56 OKKEN: I don't even know if you can do that.
09:59 KENNEDY: So, one way to go that is good though, is to check out Datadog. So this episode's sponsored by Datadog, as I said, and I really appreciate them supporting the podcast. Datadog is a monitoring platform that brings metrics, logs, requests, traces, all that kind of stuff, into one place across different systems and computers and all sorts of stuff. So you can use their Trace Search & Analysis, which lets you break down Python application performance using high-cardinality attributes like show me what this customer's done across my application or, show me all the behaviors for this URL and really easy to troubleshoot your app. So, start doing that with your Python apps today with a free trial and Datadog will send you a free t-shirt which has a cute little dog on it. So visit them at pythonbytes.fm/datadog to get started. So, you were talking earlier about that cool little GIF thing and I think you can do it with Camtasia. Like you can record... So I think you can do it with Camtasia. You can record, basically, a screen cast and export it as a GIF, which is already pretty cool.
11:00 OKKEN: Oh, okay.
11:00 KENNEDY: But, this next item has a really nice, little animated GIF thing going on as well because it's super good to see it in action. So have you heard about py-spy?
11:11 OKKEN: I have not.
11:11 KENNEDY: So py-spy is interesting for a couple of reasons. It's interesting because it's a cool tool that people can use in some places that they could not previously do so. It's a Python profiler. So you can hook it up to your Python application and it'll tell you where your Python app is spending its time, what functions and what it's doing and things like that. And it acts kind of like the Unix top command, which will take over your screen and it'll show you a list that's kind of updating every couple of seconds, what's happening. That's pretty cool. So I can hook up this profiler and it'll, live, show me sort of the equivalent of like a process report, like a CPU usage report. But it'll say, right now you have these various functions that have run recently and here, like, we'll put the most expensive ones on top, things like that.
11:54 OKKEN: Oh, neat!
11:54 KENNEDY: And it's cool, right? So you can watch that little graph, that little GIF thing and see it going. This is written by Ben Frederikson and it's just taken off. I think it was started in July or something like that. It's already got 2,000 GitHub stars. What's even cooler though, is it'll let you visualize your Python's app's time without restarting or modifying your code in any way. And it can attach to running processes and then start to profile them.
12:19 OKKEN: Oh, nice!
12:19 KENNEDY: So normally, profiling happens by, I run a profiler which runs my code, which does a bunch of stuff, or, maybe, reverse, I'll write some code like import CProfile, and I'll call a function, start profile, save profile, export, et cetera. Right, like, it's really invasive. So if you do it from the outside, the profiler runs your app, it can't do it in production, it makes it slow, all sorts of stuff. If you do it the other way, you're doing all sorts of writing code to change it. This, you just say, hey, there's a random Python program. I'm going to go profile that. And it'll just attach to it.
12:50 OKKEN: Nice, yeah, you can just give it a process ID.
12:52 KENNEDY: Yeah, exactly, give it a PID. And what's cool is, because of that, that means you can use it in production. I could log into my web server that's getting pounded on, not responding correctly, or whatever, and I could actually begin to profile it without wrecking my thing or slowing it down or restarting it or whatever. Any long-running process.
13:13 OKKEN: Like while the problem is happening, you can just attach to it and figure out what's wrong.
13:18 KENNEDY: That's the key thing 'cause maybe restarting it, and rerunning it, it takes like four hours to get into that weird state. You never know, right?
13:23 OKKEN: Yeah. Oh, yeah. This is cool, sweet!
13:25 KENNEDY: It's pretty trick. So it's written in Rust, actually, but it's PIP installable, so all sorts of cool things. And then, he even goes into, then goes into how does it work. So he has a section on how does py-spy work? So I'll just read this and you tell me if this sounds like a program you would have written. It's not what I would have. py-spy works by directly reading the memory of the python program using process_vm_readv system call on Linux or vm_read on OSX or ReadProcessMemory on Windows, and then it just analyzes the memory over and over. That's crazy, right? But he knows enough about Python to go, well that means X and it just, off it goes. So, there's a bunch more details on how he actually makes it work. I'll link to that section as well. It's a pretty cool profiler and I really like the attach to running processes without affecting them. That's pretty unique, I think, and so I wanted to highlight it.
14:23 OKKEN: Yeah, nice, and it can do icicle graphs, which I don't know why that would be neat but, it looks neat.
14:29 KENNEDY: Yeah, those are cool. I get, sometimes, just visually, some things are really out-of-whack and you're like, what is that big bar from? Oh, that's array.sort. Why are we calling that a thousand times? Things like that. Let's just sort it once. All right, what do you got for us next?
14:44 OKKEN: I've got SymPy which is just sort of fun. SymPy is a, it's a, well I'm just going to read the little bit here too. Symbolic computation. So like you're in math class or something. We realized early on with programming that you can, if you punch things into the calculator too fast, it just mucks things up because you have rounding and various things like that. So symbolic computation deals with the computation of mathematical objects symbolically. This means that mathematical objects are represented exactly, not approximately, and math expressions with unevaluated variables are left in symbolic form. SymPy allows you to do that with Python and it's sort of blasted cool. I've got a little example of doing an integration of the sine of X squared over negative infinity to positive infinity and it will tell you what the answer is. And, these sorts of symbolic math manipulations, for a lot of people, boy, if I had to do this by hand, I'd be in trouble. I did not do that well in math. And so, being able to do this programmatically is cool. And the introduction and the website is pretty awesome too. It has a bunch of live. It's got a engine in the back that runs it so, you can try the examples out and pop up a little window and do it interactively. So this is neat.
16:11 KENNEDY: Yeah, there's a ton of cool stuff that comes out of this. So, for example, you can say, X comma Y equals symbols X and Y, and then, after that, you can do algebraic expressions like truly algebraically. So, like expression equals X plus 2Y, not in quotes or anything, just like as if it were regular math, and then you could like add one to that expression and it'll reform the equation and stuff like that. You can ask it to do integration. Like the example you have in our show notes is to integrate sine of X squared from minus infinity to positive infinity. Instead of giving you the answer of, oh my gosh, what is that, like, 1.5 dot dot dot dot, it just says, that's square root of two pi over two. You know like, the exact answer. That is pretty awesome. You know, we just wrecked the whole math experience for so many of our listeners who are students. They're like, you know what? That calculus class, I just solved that problem.
17:05 OKKEN: Well, I would have loved this while I was taking calculus.
17:09 KENNEDY: Yeah, for sure. Yeah, you could totally check your work. Like, there's no answers in the book. Oh yeah? Really? Hold on. They're right here. That's pretty awesome.
17:19 OKKEN: So if you could take your laptop to your tests, you're set.
17:22 KENNEDY: Yeah, probably not likely.
17:24 OKKEN: Probably not.
17:24 KENNEDY: All right, so this next one that I found, Brian, it's pretty cool. So, something that I've been digging into lately behind the scenes and I'm going to be talking more and more about, probably, in the next few weeks, is async programming in Python. I've really been doing a lot with that lately and we'll have some cool stuff to share pretty soon. But, that means I'm running across all this cool async stuff. So you've heard of WSGI, W-S-G-I, which is the Web Server Gateway Interface. That's like how Pyramid, Flask, Django, all those things, work. None of them do a great job of supporting async programming because, fundamentally, this WSGI interface is synchronous. It can't be made async. So there's this other framework called ASGI, for Async Gateway Interface, I guess, that allows these frameworks to be asynchronous. So, the thing I'm talking about this week is Starlette, which is an ASGI web framework. And, I like its little subtitle, the little ASGI framework that shines.
18:25 OKKEN: It's cute.
18:27 KENNEDY: It is cute. So it's basically built, intended to build high-performance asyncio services. So, if you have anything that talks to a database, to caches, to file systems, things like that, you can call its other web services or micro services. Super easy to build. The API is basically Flask, like a Flask-ish API, and you create like a web method. You say, async def regular view method. And you go do a bunch of stuff. And it has cool support for like response types. So you can have a file response object that you just send back to the framework that's based on async, aio files, which is a asyncio file-based thing, and there's a lot of nice integration like that.
19:10 OKKEN: Okay. You're just interested in this or do you have an application that you're going to try to...
19:14 KENNEDY: No, I'm building a course on it.
19:16 OKKEN: Oh, okay.
19:17 KENNEDY: Trying to make a nice, well-rounded async concurrent programming Python course.
19:21 OKKEN: Oh, then, well yeah.
19:22 KENNEDY: So I've been building tons of little apps and stuff. So, here we go, here's one of them.
19:26 OKKEN: Cool.
19:26 KENNEDY: If you want to build an app that is way more scalable, 10 times more scalable than regular web apps on the same hardware and whatnot, it's pretty easy to do if, mostly what that web app is doing is waiting. You can just, basically the asyncio web frameworks can just adapt to that more easily 'cause they're not blocking all the wait. I also discovered a couple cool things while looking into this. One is, they say, you should install the UltraJSON package, which the pip install command was ujson, and that is a, basically, a drop and replacement for the JSON built-in that is like between 50% and three times faster. So if you're doing a lot of JSON, you can just use UltraJSON, and that's pretty awesome.
20:13 OKKEN: Yeah, okay. I'll have to check that out too. Neat.
20:16 KENNEDY: Yeah, and it's, so all you have to do is, import ujson as json, and that makes your code faster. Of course, it has to be there but, that's pretty sweet. The other thing is, you've maybe heard of Gunicorn for the traditional web frameworks? There's a uvicorn, which is based on uv loop and Gunicorn, which is also pretty awesome for these async web frameworks.
20:39 OKKEN: Well, it's cool and I get the name, but, eventually, after everybody starts using that, people will forget where that came from and it's just going to be a weird word, uvicorn.
20:51 KENNEDY: I know, it'll be uvicorn, no, it's u-v-icorn. You got to understand where the name comes from, come on. Get it together. Well that's it for our items. I do have some extra stuff to share. How about you?
21:02 OKKEN: Just one thing I wanted to point out if I remember it.
21:04 KENNEDY: Okay, cool, I'll go first and you can think. So, really big news. You and I, we had a good time at PyCon, right?
21:10 OKKEN: Oh, yeah.
21:11 KENNEDY: I can tell you when we're going to have a good time again. It's going to be, if we can go to a tutorial, it's May 1st and 2nd. If you want to go to talks, it's May 3rd, 4th and 5th. And if you want to do the sprints, it's May 6, 7, 8 and 9. So, basically, the announcement is that, the PyCon dates are out.
21:30 OKKEN: Yes, and it's not over, I don't think it's over Mother's Day this year...
21:35 KENNEDY: I hope it's not.
21:35 OKKEN: So that's good.
21:36 KENNEDY: Hope it's not. I also have a quick little follow-up. You talked about the pre-commit package. Also another listener, Matthew Layman, sent in some notes about how his team is using it. And, basically, talked about how they're using pre-commit, the Python package, so that like their flake8 and black and other things that automatically run during continuous integration, also automatically run when people do Git commits so they have fewer failing builds, which is pretty awesome, and he has a couple of nice links. So I threw that in there at the end. And then, finally, you talked about the Gang of Four patterns last week, right? So, John Tocher, I think, is right, sent us a message pointing out another talk from PyCon AU called You don't need that, which is pretty cool. And it basically talks about how if you studied the Gang of Four patterns, a lot of what they were doing was because they were using Smalltalk or Java or C++ and, in Python, here's a new way that you just, basically, don't need that pattern. So, a pretty cool talk and I'll link to the video for that.
22:40 OKKEN: Yeah, the if you translated the Gang of Four book directly to Python, it would be like a pamphlet.
22:48 KENNEDY: That's right. Nice, did you remember your item?
22:51 OKKEN: I did not, so, oh,
22:53 KENNEDY: Save it for next week?
22:54 OKKEN: Yeah, yeah. Yeah, I'll save it for next week.
22:57 KENNEDY: We'll do this again next week, right?
22:58 OKKEN: Yeah, maybe we should just do it every week.
23:00 KENNEDY: Yeah, all right, deal. We'll do it every week.
23:03 OKKEN: Okay, cool.
23:03 KENNEDY: Cool, all right. Well, thanks for doing the show this week.
23:06 OKKEN: Thank you.
23:06 KENNEDY: You bet, bye.
23:06 OKKEN: Bye.
23:08 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.