Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #95: Unleash the py-spy!

Return to episode page view on github
Recorded on Wednesday, Sep 12, 2018.

00:00 Hello and welcome to Python bites where we deliver Python news and headlines directly to your earbuds. This is episode 95 recorded September 12th 2018 I'm Michael Kennedy and I'm Brian. Okay. Hey Brian. How you doing this fine fine Wednesday. I am excellent nice It's also excellent that data dog is sponsoring a show. So for getting further tell them Thank you, I found by setup m/data dog. It's a cool shirt if you go there and follow along We'll talk more about that later You know, I've been I feel like summer is coming to an end and I'm I've been quite lazy all summer I haven't really I'm not sure I'm ready to get back into the main swing of things, but it's upon us Yeah, and you know who else is lazy programmers are lazy productively lazy. They put it into good use Yes, they make lazy a virtue And so this was our segue from nothing into the first item which is a data set and And Dataset is a Python package that builds itself as databases for lazy people.

00:58 And this is actually something I totally want to try because it looks fun.

01:03 Their premise is programmers are lazy.

01:06 Oh, it says at first, I'll just read some of the top.

01:09 Although managing data in relational databases has plenty of benefits, they're rarely used in day to day work with small and medium scale datasets.

01:18 But why is that?

01:19 it's because people are lazy and they'll throw it in JSON or CSV instead.

01:23 Oh, they say the answer is programs are lazy.

01:25 They'll use the easy solution and I guess I can't disagree.

01:29 I've used JSON format as essentially a local database before.

01:34 But this is kind of cool.

01:36 What it is, is it's built on top of Alchemy.

01:40 So it's built on top of SQLAlchemy so it can work with any database or a SQL style database.

01:46 And it's just really easy. It looks kind of like a NoSQL.

01:51 It's kind of hard to describe, of course, over the air, but it's pretty simple and worth checking out, I think.

01:57 Yeah, I like it. It does automatic schema creation, upserts, has query helpers like distinct and stuff like that.

02:04 So if you were to say, I'm just going to use like an in-memory dictionary or other things like that, it's kind of nice that it helps with some of those things.

02:11 So you just said a couple of terms that I don't even know what those are.

02:14 are so absurd absurd is I have a record and I'm going to try to save it to the database.

02:21 If it does not exist in the database and update would fail right but if it already exists an insert would fail because it would be a duplicate key and absurd says hey data access layer take this thing and save it if it doesn't exist put it in there as a new thing if it does make an update and set the values to the new one.

02:38 Okay and it's also deals with sparse stuff so one of their initial examples is let's say you've got people or something and then the first person you give them a name and an age and you can insert that.

02:49 The second person comes along and you give them a name and age and a gender.

02:53 And then you can, you know, search easily and it, yeah, like you said, it deals with the schema for you already and you don't have to deal with that.

03:02 Yeah, and the example that you have in the show notes here uses SQLite as the back-end database, but it uses the memory connection version.

03:11 So you can just load it up with data and then query and work with it.

03:13 And then when your app shuts down, it just goes away.

03:16 Maybe you output stuff to a JSON file or whatever.

03:19 So you don't even have to store the database necessarily.

03:22 Yeah, to be able to use some queries on information, that's interesting.

03:25 But you can also just play with it this way and then turn it into to a file stored file back database as well with even with SQLite or with something else.

03:34 Yeah, absolutely.

03:35 It's a good find. It's quite interesting. I like it.

03:37 Yeah.

03:38 So I have a question for you, Brian.

03:39 Okay.

03:40 Why is NumPy, not this thing that we're going to talk about next, but NumPy itself faster than regular Python? Do you know?

03:46 I think because it's got stuff compiled in C.

03:49 Exactly, because it's written in C, and it can even do parallelism and stuff.

03:52 It can take advantage of cores.

03:54 My new laptop is ridiculous. It has 12 cores.

03:56 Nice.

03:57 That's a lot, right?

03:58 So maybe we could actually take advantage of that with NumPy.

04:01 Well, if you have NumPy code, this thing that I'm going to tell you about takes it to another level. It's called KooPi, I think is how you say it, because I think it's based on CUDA. So KooPi is what I'm going to go with. And it's, its full name is KooPi GPU NumPy.

04:18 And the idea is it's a API compatible library with NumPy. So all the NumPy features that it has, that NumPy has, you can call the same functions on KooPi, I called it. But instead of running on on your, you know, six, four, whatever cores you have on your machine, it runs them on the GPU cores, which is insane.

04:41 Oh, wow.

04:42 Okay, cool.

04:43 And I looked, I looked and I did a quick little search, just like, hey, what's like a modern machine learning or data science type GPU you might get.

04:53 So pretty standard one might be the GeForce GTX 1080 Ti.

04:58 Okay, these things are getting super expensive because of all the Bitcoin miners and stuff.

05:03 But anyway, you get one of these things and it has 3584 cores.

05:12 And you can run your code parallel on all of those.

05:15 So instead of, you know, like having like, oh my gosh, I can't believe I have 12.

05:19 No, you have 3500.

05:22 And all you have to do, the only line of code you have to change is instead of import numpy NumPy as NP, which is the very common thing that people do.

05:30 You would say, "Import CUPI as NumPy." That's it.

05:34 And now you're running on these CUDA cores doing GPU-backed data science.

05:39 Okay, do you remember what CUDA is?

05:41 I don't know what CUDA stands for.

05:43 I bet it's an acronym because it's all capitalized.

05:45 Yeah.

05:46 There's a lot.

05:47 There's like layers upon layers of acronyms here.

05:50 I don't actually know what CUDA cores are.

05:53 Okay, yeah.

05:54 I didn't mean to put you on the spot.

05:55 the mechanism of parallelism on the GPU.

05:58 >> Okay. Nice.

06:00 >> But I don't actually know more than that.

06:02 >> Yeah.

06:03 >> Isn't that cool?

06:03 >> I like it.

06:04 >> Yeah. It's really cool.

06:06 It has this compatible API through a little code sample in the show notes there.

06:11 For some reason, you're like, I actually need to customize how my code runs on the GPU, which is a thing sometimes people do.

06:19 You can program against the CUDA cores and CUDA kernels and things like that.

06:23 you can actually embed in your python code C++ code and could pie will actually compile that down to a binary.

06:32 Which is even cooler ok i was just curious so i'm really not a hardware guy so bear with me.

06:38 You said you have twelve cores is on a laptop that you're running yeah it's a new macbook pro it's a intel core i nine maxed out.

06:46 It's really six cores that are each hyper-threaded is how it works, but the OS sees them as 12.

06:52 So are there GPUs on a normal laptop or on your laptop or is this GPU just something that...

06:58 No, no, there's a pretty high-end one on my MacBook.

07:02 It's not as high-end as this, not even close, but maybe half or something I would guess in terms of performance.

07:08 That's a pretty bad estimate because I don't really know, but yeah, you could run this on a laptop.

07:12 Yeah, I'm just curious if the Coupie would speed up things on just on a laptop or something or if it would I would think so I mean you got to have an algorithm that's like well adapted to GPUs, but if you did then I would think so Yeah, okay. Well, this is neat for the right the people that really care about it really care about it. So this is cool Yeah, absolutely. And I mean you can go and get like GPU clusters on AWS or on digital ocean or things like that Yeah, okay, and so you could actually ship your code up there Even if you don't have one final note on this one is there was a pycon 2018 Presentation on this and so I'm gonna link to the presentation as well people want to watch 30 more minutes of this Yeah, I actually do too it looks really interesting yeah, all right I'm feeling a theme coming on in episode 84 we did touch on somebody called in or called in We actually don't have phone lines somebody contacted us and said hey you should cover pre commit And we have, we did talk about pre-commit in episode 84, but we just sort of talked about what it was.

08:15 But today I ran across this fairly cool article called "Automate Python Workflow Using Pre-Commits." I like this kind of an article actually of, okay, here's these cool tools, using pre-commit, black and flake, how do I put that in my work day-to-day workflow?

08:33 And how does it really work?

08:34 And this is from LJ Miranda.

08:37 So good job, LJ.

08:38 it's got a great graphic at the start with telling you that you've got changes.

08:42 When you add something, you go to get add, you go to staging.

08:46 Then when you do a commit, what happens is the pre-commit will intercept that part, and it kicks off whatever pre-commit hooks you've got set up.

08:54 If all of those pass, then it lets the commit happen, and if it doesn't, it kicks it back.

09:00 Then it shows you how to deal with all of the different configuration that is available with pre-commit.

09:07 I like this. It's a good starter. If you're still quite not on board with pre-commit, this is a good article to read.

09:13 Yeah, pre-commit's pretty cool. And that's a Python package that you install that then manages all the rest, which I think that's great.

09:20 Yeah, this article, there's a little video, and I think it's an animated GIF or something, a little short demo video that runs.

09:27 I don't know how to do this. This is neat. So it shows it in action.

09:31 - Yeah, yeah, that's really cool.

09:33 I like those little auto-play GIFs that'll animate stuff, 'cause sometimes it's like, you know, if you could just see it happening, it would be so much more easy to grok what this little picture's trying to tell me.

09:44 - Yeah, and I also don't mind, something like that is fine if it has an actual video to play, but don't give me a half hour video.

09:50 A little couple minute video at most is great.

09:53 - Yeah, a half hour GIF, probably not the way to go.

09:56 (laughing)

09:57 - I don't even know if you can do that.

10:00 So the way one way to go that is good though is to check out data dog.

10:04 So this episode sponsored by data dog as I said and I really appreciate them supporting the podcast.

10:09 Data dog is a monitoring platform that brings metrics logs request traces all that kind of stuff into one place across different systems and computers and all sorts of stuff.

10:19 So you can use their trace search and analysis which lets you break down Python application performance using high cardinality attributes like show me what this customer has done across my application or show me all the behaviors for this URL and really easy to troubleshoot your app.

10:36 So start doing that with your Python apps today with a free trial and Datadog will send you a free t-shirt which has a cute little dog on it.

10:44 So visit them at pythonbytes.fm/datadog to get started.

10:48 So you were talking earlier about that cool little GIF thing and I think you can do it with Camtasia.

10:52 Like you can record.

10:53 Okay.

10:54 So I think you can do it with Camtasia.

10:55 basically a screencast and export it as a GIF which is already pretty cool.

11:00 But this next item has a really nice little animated GIF thing going on as well because it's super good to see it in action.

11:08 So have you heard about PySpy?

11:10 I have not.

11:11 So PySpy is interesting for a couple of reasons.

11:14 It's interesting because it's a cool tool that people can use in some places that they could not previously do so.

11:20 It's a Python profiler so you can hook it up to your Python application and it'll tell you where your Python app is spending its time, what functions and what it's doing and things like that.

11:30 And it acts kind of like the Unix top command, which will take over your screen and it'll show you a list that's kind of updating every couple of seconds what's happening.

11:39 That's pretty cool.

11:39 So I can hook up this profiler and it'll live show me sort of the equivalent of like a process report, like a CPU usage report, but it'll say right now you have these various functions that have run recently and here, like we'll put the most expensive ones on top, things like that.

11:54 Oh, neat.

11:55 That's cool, right?

11:56 And so you can watch that little graph, that little GIF thing and see it going.

11:59 This is written by Ben Fredrickson, and it's just taken off.

12:02 I think it was started in July or something like that.

12:05 It's already got 2000 GitHub stars.

12:07 So what's even cooler though, is it'll let you visualize your Python's apps time without restarting or modifying your code in any way.

12:15 And it can attach to running processes and then start to profile them.

12:19 Oh, nice.

12:20 Normally profiling happens by I run a profiler which runs my code, which does a bunch of stuff or maybe I reverse, I'll write some code like imports cprofile and I'll call a function startprofile, saveprofile, export, etc.

12:34 It's really invasive.

12:35 If you do it from the outside, like the profiler runs your app, you can't do it in production, it makes it slow, all sorts of stuff.

12:41 If you do it the other way, you're doing all sorts of writing code to change it.

12:45 This you just say, "Hey, there's a random Python program.

12:47 I'm going to go profile that and it'll just attach to it.

12:50 Nice.

12:51 Yeah.

12:51 You can just give it a process ID.

12:52 Yeah, exactly.

12:53 Give it a PID.

12:54 And what's cool is because of that, that means you can use it in production.

12:58 I could log into my web server.

13:00 That's getting pounded on not responding correctly or whatever.

13:04 And I could actually begin to profile it without like wrecking my thing or slowing it down or restarting it or whatever.

13:11 Or any long running process.

13:13 Like while the problem is happening, you can just attach to it and figure out what's wrong. That's the key thing because maybe restarting it rerunning it takes like four hours to get into that weird state you never know right yeah oh yeah this is cool sweet that's pretty trick so it's written in rust actually but it's pip installable so all sorts of cool things and then he even goes into Ben goes into how does it work so here's a section on how does PySpy work so I'll just read you this and tell me this sounds like a program you would have written it's not what I would have PySpy works by directly reading the memory of the Python program using process VM red V system call Linux or VM read on OS 10 or read process memory on Windows and then it just analyzes the memory over and over that's crazy right but it knows enough about Python to go well that means X and it just you know off it goes so there's a bunch of more details on how he actually makes it work I'll link to that section as well it's a pretty cool profiler and I really like the attach to running processes without affecting them.

14:20 That's pretty unique I think and so I wanted to highlight it.

14:23 Yeah, nice.

14:24 And it can do icicle graphs which I don't know why that would be neat but it looks neat.

14:29 Yeah, those are cool.

14:30 I get sometimes you know just visually some things are really out of whack and you're like what is that big bar from?

14:36 Oh, that's a radot sort.

14:38 Why are we calling that a thousand times?

14:40 Yeah.

14:41 Things like that.

14:42 Let's just sort it once.

14:43 All right, what do you got first next?

14:44 I've got SimPy, which is just sort of fun.

14:48 SimPy is a, well, I'm just gonna read it a little bit here too, symbolic computation.

14:55 So like you're in math class or something, we realized early on with programming that you can, if you punch things into the calculator too fast, it just mucks things up because you have rounding and various things like that.

15:07 So symbolic computation deals with the computation of mathematical objects symbolically.

15:12 This means that mathematical objects are represented exactly, not approximately.

15:16 And math expressions are with unevaluated variables are left in symbolic form.

15:22 And, SymPy allows you to do that with Python and it's sort of blasted cool.

15:29 I've got a little example of, doing an integration of, of the sign of X squared over negative infinity to positive infinity, and it will tell you what the The answer is, and these sorts of symbolic math manipulations, for a lot of people, boy, if I had to do this by hand, I'd be in trouble.

15:50 I did not do that well in math.

15:52 And so being able to do this programmatically is cool.

15:55 And the introduction and the website is pretty awesome too.

15:58 It has a bunch of live, it's got engine in the back that runs it.

16:03 So you can try the examples out and pop up a little window and do it interactively.

16:10 So this is neat.

16:11 - Yeah, there's a ton of cool stuff that comes out of this.

16:14 So for example, you can say, X comma Y equals symbols X and Y.

16:18 And then after that, you can do algebraic expressions, like truly algebraically.

16:23 So like expression equals X plus two Y, not in quotes or anything, just like as if it were regular math.

16:29 And then you could like add one to that expression and it'll reform the equation and stuff like that.

16:35 You can ask it to do integration.

16:37 Like the example you have in our show notes is to integrate sine of x squared from minus infinity to positive infinity.

16:43 Instead of giving you the answer of, oh my gosh, what is that like?

16:47 1.5 dot dot dot dot.

16:49 It just says that's square root of 2 pi over 2.

16:54 The exact answer.

16:55 That is pretty awesome.

16:56 We just wrecked the whole math experience for so many of our listeners who are students.

17:01 They're like, you know what?

17:02 That calculus class, I just solved that problem.

17:05 - Well, I would have loved this while I was taking calculus.

17:09 - Yeah, for sure.

17:10 Yeah, you could totally check your work.

17:11 Like, there's no answers in the book.

17:12 Oh yeah, really?

17:13 Hold on.

17:14 (laughing)

17:16 There's an answer right here.

17:18 That's pretty awesome.

17:19 - So if you can take your laptop to your tests, you're set.

17:22 - Yeah, probably not likely.

17:24 - Probably not.

17:25 - So this next one that I found, Brian, it's pretty cool.

17:28 So something that I've been digging into lately behind the scenes, and I'm gonna be talking more and more probably in the next few weeks is async programming in Python. Like I've really been doing a lot with that lately and we'll have some cool stuff to share pretty soon. But that means I'm running across all this cool async stuff. So you've heard of WSGI, which is the Web Service Gateway Interface. That's like how Pyramid, Flask, Django, all those things work. None of them do a great job of supporting async programming because fundamentally this WSGI interface is is synchronous and can't be made async.

18:03 - Yeah.

18:04 - So there's this other framework called ASGI for async gateway interface, I guess, that allows these frameworks to be asynchronous.

18:13 So the thing I'm talking about this week is Starlet, which is an ASGI web framework.

18:20 And I like its little subtitle, the little ASGI framework that shines.

18:25 (laughing)

18:26 - It's cute.

18:28 - It is cute.

18:29 So it's basically built, intended to build high performance async I/O services.

18:35 So if you have anything that talks to a database, to caches, to file systems, things like that, even calls other web services or microservices, super easy to build.

18:45 The API is basically Flask, like a Flask-ish API.

18:50 And you create a web method, you say async def regular view method, and you go do a bunch of stuff.

18:56 It has cool support for like response types.

18:59 So you can have a file response object that you just send back to the framework that's based on async AIO files, which is an async IO file based thing.

19:08 And there's a lot of nice integration like that.

19:10 Okay.

19:10 You're just interested in this or do you have an application that you're going to try to?

19:14 No, I'm building a course on it.

19:16 Oh, okay.

19:16 Trying to make a nice, well-rounded async concurrent programming Python course.

19:21 Oh, well, yeah.

19:22 So I've been building tons of little apps and stuff.

19:25 So here we go.

19:26 Here's one of them.

19:26 If you want to build an app that is way more scalable, you know, 10 times more scalable than regular web apps on the same hardware and whatnot, it's pretty easy to do if mostly what that web app is doing is waiting, right?

19:40 You can just, you know, basically the async IO web frameworks can just adapt to that more easily 'cause they're not blocking while they're waiting.

19:49 Also discovered a couple cool things while looking into this.

19:52 One is they say you should install the UltraJSON package, which the pip install command was UJSON, and that is a basically a drop-in replacement for the JSON built-in that is like between 50% and three times faster.

20:10 So if you're doing a lot of JSON, you can just use UltraJSON and that's pretty awesome.

20:13 - Yeah, okay, I have to check that out too.

20:16 - Yeah, so all you have to do is, you know, Import a ujson as json and then that like makes your code faster. Of course it has to be there, right?

20:25 But yeah, that's pretty sweet. The other thing is you've maybe heard of g unicorn for the Traditional web frameworks. There's a uveicorn which is based on uv loop and g unicorn Which is also pretty awesome for these async web frameworks Well, it's cool and I get the name but eventually if if we if everybody starts using that people forget where that came from and it's just going to be a weird word, uviacorn.

20:50 I know, uvicorn, no it's uviacorn, you got to understand where the name comes from, come on.

20:57 Get it together.

20:58 Yeah.

20:59 Well, that's it for our items.

21:00 I do have some extra stuff to share, how about you?

21:02 Just one thing I wanted to point out, if I remember it.

21:04 Okay, cool.

21:05 I'll go first, you can think.

21:07 So really big news, you and I, we had a good time at PyCon, right?

21:10 Oh yeah.

21:11 I can tell you when we're going to have a good time again.

21:15 It's going to be, if we go to a tutorial, it's May 1st and 2nd.

21:20 If you want to go to talks, it's May 3rd, 4th, and 5th.

21:24 And if you want to do the sprints, it's May 6th, 7th, 8th, and 9th.

21:27 So basically, the announcement is that the PyCon dates are out.

21:30 Yes, and it's not over.

21:32 I don't think it's over Mother's Day this year.

21:35 I hope it's not.

21:36 I hope it's not.

21:37 I also have a quick little follow up.

21:41 You talked about the pre-commit package.

21:42 another listener, Matthew Lehman, sent in some notes about how his team is using it, and basically talked about how they're using pre-commit the Python package so that like their Flake 8 and Black and other things that automatically run during continuous integration also automatically run when people do git commits.

22:03 So they have fewer failing builds, which is pretty awesome, and has a couple of nice links.

22:08 So I threw that in there at the end.

22:10 And then finally, you talked about the Gang of Four patterns last week, right?

22:13 Yeah.

22:14 So, John Tosher, I think is right, sent us a message pointing out another talk from PyCon AU called "You Don't Need That," which is pretty cool.

22:24 And it basically talks about how if you study the Gang of Four patterns, a lot of what they were doing was because they were using Smalltalk or Java or C++.

22:33 And in Python, here's a new way that you just basically don't need that pattern.

22:37 So pretty cool talk and that was, you know, I'll link to the video for that.

22:40 - Yeah, the, yeah.

22:42 (laughs)

22:43 Yeah, if you translated the Gang of Four book directly to Python, it would be like a pamphlet.

22:47 - That's right.

22:48 (laughs)

22:50 Nice, do you remember your item?

22:51 - I did not, so.

22:53 - Oh, save it for next week.

22:54 - Yeah, save it for next week.

22:56 - Yeah, we'll do this again next week, right?

22:58 - Yeah, maybe we should just do it every week.

23:00 - Yeah, all right, deal, we'll do it every week.

23:03 - Okay, cool.

23:04 - Cool, all right, well, thanks for doing the show this week.

23:05 - No, thank you.

23:06 >> Bye.

23:07 >> Thank you for listening to Python Bytes.

23:10 Follow the show on Twitter via @PythonBytes, that's Python Bytes as in B-Y-T-E-S.

23:16 And get the full show notes at PythonBytes.fm.

23:19 If you have a news item you want featured, just visit PythonBytes.fm and send it our way.

23:24 We're always on the lookout for sharing something cool.

23:26 On behalf of myself and Brian Aukin, this is Michael Kennedy.

23:30 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page