

Transcript #185: This code is snooping on you (a good thing!)

Recorded on Thursday, Jun 4, 2020.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 185, recorded June 4th, 2020. I'm Michael Kennedy. And I am Brian

00:10 Okken. And this episode is brought to you by Datadog. More on that later. Check them out at

00:15 pythonbytes.fm/Datadog. Brian, I feel like we're all working from home. Everyone's life is

00:21 scrambled. Even like my sleep schedules are scrambled. Like some crazy stuff happened and

00:25 I slept from like 6 to 9.30 and I was up for like four hours and I slept in. Like it's just,

00:29 it's weird. Don't we need more structure in our life? Nice, nice intro. Yes, more structure.

00:35 Yeah. I'm a fan of Markdown also. Believe it, trust me, it's not a tangent. Though we have a,

00:41 just a repo that we want to point people to called MyST. It's got to be called Myst, don't you think?

00:47 Oh yeah, definitely. M-Y-S-T, which is Markedly Structured Text. And what this is, is a fully

00:56 functional Markdown parser for Sphinx. It's Markdown plus a whole bunch of stuff from

01:02 reStructuredText. So MyST allows you to write Sphinx documentation entirely in Markdown.

01:09 And things that you could do in reStructuredText, but could not do in Markdown, have been put into

01:17 a new flavor of Markdown. So you can do all of your directives and all sorts of cool things,

01:23 like anything you could do in reStructuredText with Sphinx you can now do in Markdown.

01:28 It's based on CommonMark and some other tools. So they're standing on other tools that are already

01:35 doing things really well and just extending them a bit. But this is pretty powerful. One of the things

01:40 I like about this is I personally don't use a lot of Sphinx, but this also includes a standalone

01:47 parser so you can see how somebody's extended Markdown for these extra directives and even use some of

01:54 them in your own code if you want. Yeah, this looks really, really nice. Like reStructuredText is good

02:00 and all, but I don't know. If I'm going to write something like reStructuredText, my heart just wants

02:05 to write Markdown. I got to tell you. Yeah, me too. And I think one of the things that was holding a lot of

02:10 people back is some of the extra directives, the information boxes and other things like that,

02:17 that you can't necessarily do in Markdown off the shelf, but some extensions are nice. I played with

02:24 it a little bit doing some just, I didn't pull it down with Sphinx. I just pulled it down so that I

02:29 could run some Markdown through it and some of the extra directives to see what it has. So for instance,

02:35 some of the directives, like I tried like an information box, you can have structure around

02:40 putting an information box somewhere. And what you end up with is a div that has a class to it.

02:46 Oh, nice. If you're not using Sphinx, then you'll have to use your own CSS, I guess, to style it,

02:52 but it puts in enough hooks for you to be able to do that. That's really nice. I do wish you could

02:58 sort of indicate CSS styles and Markdown because, wow, that would just, that would be the end of what

03:04 you need HTML for, for many, many things. That would be nice. So last week you brought up

03:10 direnv. We were talking about how do you store your secrets? How do you activate and configure

03:16 different environments? I think I even said something about like specifying where Python was running. I

03:22 don't remember what the context was exactly, but you're like, direnv. And actually I've been meaning

03:27 to cover this. Dunderdan, LinkedIn on Twitter, don't know what his last name is. Thanks, Dan.

03:31 sent this over to us as a recommendation. And I'm like, yeah, like you brought it up. It seems

03:36 definitely cool. So let me tell you about direnv. D-I-R-E-N-V. So it's an extension that goes into

03:42 your shell. And normally what you do is you open your shell and it runs your bashrc, zshrc, whatever,

03:50 and sets up some stuff. Or if you're over on Windows, it works a little bit different, but I

03:54 think direnv is only for the POSIX type systems. Anyway, it'll set up some values that you put in

04:02 there like environment variables and whatnot. And that's just global, right? You can also set up when

04:09 you activate a virtual environment to export other values. That's pretty cool. But what it doesn't really

04:16 do is allow you to have like a hierarchy of values. So if I'm in this subdirectory over here,

04:21 I want this version of Python active or this version of where the Flask app lives. And then if I change

04:28 to another directory, I want it to automatically go, well, that means different values. And direnv

04:33 basically does that.

04:34 Yeah. So as you go into different parts of your folder system, it'll look for certain files,

04:41 .envrc. And if it finds that, it'll automatically grab all the, basically all the exports and then

04:49 jam them into whatever your shell is. And it's also cool because it's not a shell, right? It's not like,

04:54 well, here's a shell that has this cool feature. It works with bash, Zsh, tcsh,

05:00 fish, and others, right? So it's basically a hook that gets installed for, like I use

05:07 Oh My Zsh because, oh my gosh, it's awesome. And then I would just plug this into it. And as I

05:12 do stuff with Zsh, it will just apply its magic.

05:15 Yeah. And so one of the things that, one of the things you can do with this is to automatically

05:20 set a virtual environment. If you go into special directories, that's not the only thing it can do,

05:26 but that's one of the reasons why a lot of people use it.

05:29 Right. You basically, well, I guess you can't do aliases. You can't change what Python means,

05:33 but you can say where the Python path is. Yeah.

05:35 Yeah. And that's one of the things that's a limitation of this that people should be aware

05:38 of is it doesn't, the way to think of it is not as a sub-rc, right? It's not a sub-bashrc where

05:45 like it runs aliases and all sorts of stuff. The way it works is it runs a bash shell,

05:50 like a little tiny hidden bash shell. It imports that as the bashrc and it captures what the

05:56 exported variables are, throws away that shell, and then jams that into whatever active shell you

06:00 have, like Zsh or bash or fish or whatever.

06:04 Yeah. I would probably use this all the time if I wasn't somebody that used both Windows and

06:10 Mac and Linux frequently.

06:12 You know, probably, I bet somebody could come up with this thing for Windows as well. It's just

06:17 got to be like totally from scratch, different type of thing, right?

06:20 People have already pointed me to Windows versions of it, but it's one of those things of like,

06:26 you got to jump through hoops to make it work. And it's just not, for me, it's not solving a big

06:31 enough problem that I have that I need to jump through the hoops. But I agree. I agree. It is

06:36 cool, but it doesn't, it's not like life changing in that regard. I guess one more thing to point out is

06:41 it's, you don't have to like go to the directory where the .envrc file is. It looks up the

06:49 parent directories until it finds one. So you have this like hierarchy, like I'm down here in the,

06:53 you know, like views part of my website and the top level of that git repo, I have one of these

06:59 .envrc files. It would find that and like activate that for you. So that's pretty cool that it has,

07:03 it's kind of like Node.js where the node_modules live in that regard. That's pretty cool.
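
The upward search described here can be sketched in a few lines of Python. This is only an illustration of the lookup idea, not direnv's actual implementation, and the function name find_envrc is made up for the example.

```python
from pathlib import Path
from typing import Optional


def find_envrc(start: Path) -> Optional[Path]:
    """Return the nearest .envrc at or above `start`, or None.

    Hypothetical helper: it only mirrors the "walk up the parent
    directories until you find one" behavior described in the episode.
    """
    current = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / ".envrc"
        if candidate.is_file():
            return candidate
    return None
```

Called from a nested folder such as the views part of a website, a lookup like this would return the .envrc at the top of the repo, assuming no closer one exists in between.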

07:08 Yeah. That's a really nice feature. Yeah, for sure. Also nice, Datadog. So before we get to the next

07:13 thing, let me talk about them real quick. They're supporting the show. So thank you. They've been

07:17 sponsors for a long time. Please check them out and see what they're offering. It's good software and

07:21 it helps support the show. So if you're having trouble visualizing bottlenecks and latency in your app,

07:26 and you're not sure where the issues are coming from or how to solve it, you can use Datadog's

07:30 end-to-end monitoring platform with their customizable built-in dashboards to collect

07:34 metrics and visualize app performance in real time. They automatically correlate logs and traces

07:40 at the individual level of requests, allowing you to troubleshoot your apps and track requests across

07:45 tiers. Plus their service map automatically plots the flow of these requests across your application

07:51 architecture. So you can understand dependencies and proactively monitor performance of your apps.

07:55 So be the hero that got that app at your company back on track.

07:59 Get started with a free trial at pythonbytes.fm/datadog. You can get a cool shirt.

08:04 All right, Brian, what's next? Yep. Thanks, Datadog.

08:07 I had a problem. So my problem was a little application that had a database. It was a,

08:12 I was using TinyDB just for development. You could use Mongo similar. It's a document database,

08:17 threw some data into it, no problems. But there was one of the values that I decided to change to

08:24 use Python enums because I thought enums are cool. I don't use them very often. I'll give these a shot

08:30 because they seem like perfect. And then everything blew up because I can't, couldn't save it to the

08:36 database because enums are not serializable by default. So I'm like, there's got to be an easy

08:43 workaround for this. And, and I first ran around, ran into questions about, or topics about creating your

08:50 own serializer. That just didn't seem like something I wanted to do.

08:53 You could do it, but it's not so fun, right?

08:55 Yeah. Well, so I ran across an article, a little short article written by Alexander Hultnér

09:00 called Convert a Python Enum to JSON. And I didn't need it converted to JSON, but I did need it

09:07 serializable. And the trick is to just, if you're, you're doing your, when you use enums, you,

09:13 you do from enum import the capital-E Enum type, and then you have a class that derives from that.

09:19 And then you have your values. Well, if you also derive from not just Enum, but another solid,

09:25 a concrete type, like, like int or str. And in my case, I was using, I used str so that my

09:33 string values would be stored. Now it is serializable and it works just the same as it always did before.

09:39 It's just, it uses the serializer from the other type and it just works incredible. So for instance,

09:46 I'm, I'm going to put a little example in the show notes about using a color, which is red and blue.

09:52 And if you just, you derive from Enum, you can't convert it to JSON because it's not serializable.

09:58 You can either do an IntEnum, which is a built-in one, or combine str and Enum. Now it serializes

10:05 just to the string red and blue, if that's the values. And then that's what's stored in your,

10:11 like your database too. So when I'm using, it's really handy for debugging to be able to have

10:16 these, these readable values as well.
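
A minimal sketch of the red and blue color example, using only the standard library json and enum modules. The class names PlainColor and Color and the helper try_dump are made up for illustration; the episode's own example lives in the show notes and may differ.

```python
import json
from enum import Enum


class PlainColor(Enum):
    # Derives from Enum only: json.dumps does not know how to encode it.
    RED = "red"
    BLUE = "blue"


class Color(str, Enum):
    # Derives from str as well, so each member is also a real string.
    RED = "red"
    BLUE = "blue"


def try_dump(value):
    """Return the JSON text, or the error message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_dump(PlainColor.RED))        # a TypeError message
    print(try_dump(Color.RED))             # "red"
    print(try_dump({"color": Color.BLUE})) # {"color": "blue"}
    # Reading the stored string back gives the same member again.
    print(Color(json.loads('"red"')) is Color.RED)  # True
```

The stored value is the plain string, which is what makes the database contents readable when debugging.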

10:18 Yeah, this is really cool. It's a little bit like abstract base classes versus concrete classes or

10:23 something like that, right? You've like the sort of general Enum, but if you do the IntEnum,

10:27 then it has this other capability, which is cool. Or yeah. Multiple inheritance,

10:32 str comma Enum is the one you went for, right?

10:34 Yeah. So the multiple inheritance is the thing that Alexander recommended in his post. That's what

10:40 I'm using. It works just fine. But I was interested to find out that in the Python documentation for

10:46 IntEnum, IntEnum is almost just there as an example to say, we realize that it might not be

10:51 integers that you want. You might want something else, but there's an example right in the,

10:55 in the Python documentation on, on using multiple inheritance to create your own type. It doesn't

11:01 talk about serializability there, but that's one of the benefits.

11:04 Yeah. It seems like it works anyway. Awesome. How much time did it take you to figure that out?

11:08 Was it a long time?

11:09 No, I don't know. 10 minutes of Googling.

11:11 Yeah, that's pretty cool. Well, you could compute it with Python, of course, but you know,

11:15 the datetimes in Python and time spans, they're, they're pretty good actually, but they're a little

11:20 bit lacking. There's certain types of things you might want to do with them. And so there's a

11:24 couple of replacement libraries and one that Tucker Beck sent over. It's called pendulum.

11:29 That's pretty cool. Have you played with pendulum?

11:31 I haven't, but I like the name.

11:33 Yeah, I do too. It's, it's really good. I've played with arrow. So this is a little bit like

11:37 arrow, but it doesn't seem like it tries to solve exactly the same problem. It's just like,

11:41 let's make Python datetimes and timedeltas better, which is kind of the goal of both of them.

11:46 So it's more or less a drop-in replacement for standard datetime. So you can create like

11:52 timedeltas, which are pretty cool. Like I could say pendulum.duration(days=15). I have this

11:57 duration and it has more properties than the standard datetime or the timedelta. You know,

12:03 you get like total seconds or something like that, but that's, you know, that's not that helpful.

12:06 So this one has like duration.weeks, duration.hours, and so on, which is pretty cool. You can ask

12:13 for the duration in hours, like the total number of hours, not just the number of hour, you know,

12:19 like three hours and two days or whatever. But you also have this cool, like human friendly

12:24 version. So I can say duration.in_words and give it a locale and say like locale is U.S. English.

12:31 And it'll say that's two weeks and one day. Nice. You can also like, let's suppose I'm trying to do

12:36 some work with like calendars or some kind of difference. I say the time from here to there,

12:41 I want to do something for every weekday that appears. Right. So skip Saturday and Sunday.

12:46 But if it's like from Thursday to Wednesday, I need to go Thursday, Friday, Monday, Tuesday, Wednesday.

12:51 Yeah. So I could say pendulum.now(), and then I could go from that and subtract three days. So that

12:57 would be a period of three days. And that gives you what they call a period, which is a little bit

13:02 different. And then I can go to it and say, convert yourself to in weekdays. Okay. Right. Not interesting.

13:10 Then you can loop over it. You can say for each day or each time period in this period and go,

13:16 it would go, you know, over the weekdays that are involved in that time span.
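
For comparison, here is roughly what the weekday loop looks like with only the standard library datetime module. This is not pendulum's API, just a sketch of the hand-rolled version of the Thursday-to-Wednesday example; weekdays_between is a made-up helper name.

```python
from datetime import date, timedelta


def weekdays_between(start: date, end: date):
    """Yield every Monday-to-Friday date from start to end, inclusive."""
    day = start
    while day <= end:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        if day.weekday() < 5:
            yield day
        day += timedelta(days=1)


if __name__ == "__main__":
    # Thursday, June 4, 2020 through Wednesday, June 10, 2020.
    for day in weekdays_between(date(2020, 6, 4), date(2020, 6, 10)):
        print(day.isoformat())
```

The loop skips June 6 and 7, the Saturday and Sunday in that range.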

13:20 That's pretty cool.

13:21 Yeah. Cause that would not be so much fun to do yourself. Right. There's a bunch of stuff that

13:25 it does. And I don't want to go like read all the capabilities and whatever, but that gives you a

13:29 sense. Like if these are the kinds of problems you're trying to work through and you're like,

13:32 man, this is a challenge to do with, with a built-in one. Check out pendulum. Also check out arrow.

13:38 I think we've covered arrow a long time ago. If we haven't, we'll, I'll cover it at some point.

13:41 It's a good one. Yeah. And I think actually, I don't think that's a matter of which one's the

13:45 best either. It's a, it's whatever seems to speak to you and, and, and has an API that

13:50 thinks like you do. Yeah. It's good that lots of people have solved things like this.

13:54 Yep. Absolutely. All right. Well, what's this next one? I'm trying to be like a private detective

14:00 or what's going on with this? Yeah. Private detective looking into and spying on your code.

14:07 So this was sent off by a Twitter account called PyLang and this is PySnooper. The claim is never

14:14 use print for debugging again. And I have to admit, I am one to lean on the print statement every once

14:21 in a while, especially if I'm just, sometimes I don't really want to use a breakpoint because I,

14:26 I've got some code that's getting hit a lot and I really do want to see what it looks like over time.

14:31 So one of the things that people often do is throw a print statement somewhere in a line just to say,

14:37 Hey, I'm here. The other thing they do is like print out a variable name right after an assignment so

14:42 that they can see when it changes, but that's exactly.

14:45 It was this and now it's that.

14:47 Yeah. So this is exactly kind of what it does. So by default, it's just a, you can throw a decorator

14:52 onto a function and that's the easiest way to apply PySnooper to a function.

14:58 And now every time that function gets run, you get a play by play log of your function.

15:04 And what it logs is it logs the parameters that get passed to your function. It logs all the,

15:09 the output of your function, but also every line of the code of the function that gets run.

15:14 And every time a variable is changed, changes its value. And then even at the end, it tells you the

15:20 elapsed time for the function. So that's quite a bit. If that's great for you, great. But if it's

15:25 too much information, you can also isolate it with a with block and just take a section of your

15:30 function under test and just log a subset. And then if a local value, local variables are not enough

15:38 and you're changing some global variable, you can tell it to watch that as well. Anyway, it's a pretty

15:44 simple API and there's actually quite a few times. I think I'll probably reach for this.
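
A small sketch of the two usages mentioned here, the decorator and the with block. It assumes the third-party pysnooper package is installed; the fallback branch is only there so the example still runs without it, and the two functions are made up for illustration.

```python
try:
    import pysnooper  # third-party: pip install pysnooper

    snoop = pysnooper.snoop
except ImportError:
    # Assumption for this sketch only: a do-nothing stand-in that can be
    # used as a decorator and as a context manager, so the code below
    # still runs (without any logging) when PySnooper is missing.
    import contextlib

    class _NoSnoop(contextlib.ContextDecorator):
        def __enter__(self):
            return self

        def __exit__(self, *exc_info):
            return False

    def snoop(*args, **kwargs):
        return _NoSnoop()


@snoop()  # log every executed line and variable change in this function
def number_to_bits(number):
    bits = []
    while number:
        number, remainder = divmod(number, 2)
        bits.insert(0, remainder)
    return bits


def mean_of_squares(values):
    squares = [v * v for v in values]
    with snoop():  # log only this section of the function
        total = sum(squares)
        mean = total / len(squares)
    return mean


if __name__ == "__main__":
    number_to_bits(6)
    mean_of_squares([1, 2, 3])
```

With PySnooper installed, the log is written to standard error by default, so the function's own return values and printed output are left alone.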

15:48 When I first saw this, I'm like, ah, yeah, it's kind of cool. There's a lot of these replacements

15:52 where I think like, you know what, you've got PyCharm or you've got VS Code, you're better off just

15:58 setting a breakpoint. And the tooling is so much better than like, say, pdb or something

16:04 like that, right?

16:05 Yeah.

16:05 This though, this solves a problem that always frustrates me when I'm doing debugging, which

16:09 is you're going around, you've got to keep a track in your mind. Okay, this value was that,

16:14 now it's this, and then it became that. And like sort of the flow of data, like at any frozen

16:19 point, you can see really well with the visual debuggers, right? Like PyCharm or whatnot,

16:24 what the state is, you can see even what's changed, but like this number of way, this list

16:28 was empty, empty, then this was added, then this was added. And here's how it evolved over time.

16:33 People should check out the README for this because that view of it is like, there's a loop

16:38 where it shows going through the loop four times. And as like all the values and variables like build

16:43 up, so you can just like review it and see how it flows. I think it's pretty sweet actually.

16:47 Yeah. One of the other things that I forgot to mention is, is if you're like debugging a process on a

16:52 server, maybe it's a, you've got a, yeah, small service that's running and instead of standard out,

16:59 you can pipe these logs to a file and, you know, review them later.

17:03 Yeah. For definitely for a server as well, it would be nice to flip that on. And I guess with the,

17:08 with the conditional, but you could probably even in code say, do you feel like you're running into

17:14 trouble? Turn on the PySnooper for a minute and then turn it off. You know, like there's,

17:17 there's probably options there, but yeah, you definitely wouldn't want to attach a real debugger to like

17:21 production. Dude, why won't the site work? Oh, somebody's got to go back to their desk and hit F,

17:27 you know, F5 or continue or whatever. Yeah. That's not going to go well. So I have something that's a

17:33 pretty similar to follow this up with that's, you know, this is about debugging and seeing how your code

17:38 is running. Like per usual, we talk about one tool and people are like, oh yeah, but did you know about,

17:45 so we've talked about Austin and we've talked about some of the other cool debugger profilers.

17:49 And so over on PyCoders, they talked about Fil, F-I-L, which is a new memory profiler for data

18:00 scientists and well, general scientists. And you might wonder like, why do data scientists,

18:05 right? You know, biologists, why can't they just use our memory profiler? Like why is Austin not their

18:12 thing? Right. And it may or may not be like, it may answer some great questions for them. Like

18:16 obviously they do a lot of computational stuff, making that go much faster, faster to let them

18:21 ask more questions. Right. So maybe profilers in general are like things they should pay attention

18:25 to. But you know, when they talk about this, they say, look, there's a really big difference between

18:30 servers and like data pipeline or sort of imperative, just top to bottom code. We're just going to run

18:37 scripts sort of right. And that's what scientists and data scientists do a lot. So like, I just need to do this

18:43 computation and get the answer. So with servers, if you're worried about memory, remember, this is a memory

18:48 profiler. What you're worried mostly about is, you know, this has been running for three hours. Now the server's out of

18:55 memory. That's a problem, right? Like it's, it's probably an issue of a memory leak somewhere. Something is hanging on to a

19:03 reference that it shouldn't. And it like builds up over time, like cruft and it just eventually wears it down.

19:09 And it's just like bloated, you know, with too much memory. Right. So that's the server problem. And I think

19:16 that's what a lot of the tooling is built for, but data pipelines, they go and they just run top to bottom

19:21 and they don't, for the most part, don't really care about memory leaks because they're only going to run for 10

19:27 seconds. But what they need to know is if I'm using too much memory, what line of code allocated that memory?

19:33 Like I need to know what line where I'm using too much memory and how can I like maybe use a generator

19:39 instead of a function in a list or something like that. Right. So that's what the focus of this tool

19:45 is, is it's like, it's going to show you exactly what your peak memory usage is and what line of code

19:52 is responsible for it. This is actually pretty cool. It is right. At first I thought, what is this? Like,

19:56 why do they need their own thing? But as I'm looking through, I'm like, yeah, this is actually pretty cool. And

20:00 if you go to the site, you can actually see they give you this graph, like a nice visualization of

20:05 like, here are the lines of code. And then it's like more red or less red, depending on how much

20:11 memory it's allocated. Oh wow. Yeah. And then the total amount and you can like dive into like,

20:15 okay, well I need to see like this loop or the sub function that I'm calling. How much is it? So you

20:20 can like navigate through this visual, like red, pink, gray of like memory badness, I guess. I don't know.

20:26 Memory usage. Yeah. It's not bad, right? No. Yeah. And when you're staring at code,

20:29 it's not obvious where the huge array might get generated or used. Yeah. And the example they

20:34 have here, it's like, okay, well they have a function called make big array. Okay. So like

20:38 probably you might look there and there's also things like, like using numpy, like, okay, here we're

20:44 creating a bunch of stuff with numpy and you might say, well, here's the numpy thing that we're doing

20:48 that makes too much, but you could be doing like a whole bunch of, you know, numpy and pandas work.

20:53 And like one line is actually responsible, but you're probably pretty sure it has to do with pandas,

20:58 but you're not sure where exactly. Right. So you could, you know, dig into it and see, I think it's

21:02 cool. Yeah. We thought we were using arrays and suddenly we have this huge matrix that accidentally

21:07 exactly. Why is all this stuff still in here? Yeah. Yeah. Cool. Well, anyway, if you're doing data

21:12 science and you care about memory pressure, this thing seems super easy. It even has like a try it on

21:19 your own code on the website, which I don't know what that means. So that's crazy.
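
To make the "peak memory and which line allocated it" question concrete, here is a rough stand-in using the standard library's tracemalloc module. This sketch does not use Fil and does not reproduce its output; make_big_list and peak_and_top_line are hypothetical names.

```python
import tracemalloc


def make_big_list(n):
    # Hypothetical stand-in for the expensive step in a data pipeline.
    return [float(i) for i in range(n)]


def peak_and_top_line(n=200_000):
    """Return (peak_bytes, top_line_number, top_line_bytes) for one run."""
    tracemalloc.start()
    data = make_big_list(n)
    # get_traced_memory() returns (current, peak) in bytes.
    _, peak = tracemalloc.get_traced_memory()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    del data
    # Statistics grouped by source line, largest allocation first.
    top = snapshot.statistics("lineno")[0]
    return peak, top.traceback[0].lineno, top.size


if __name__ == "__main__":
    peak, lineno, size = peak_and_top_line()
    print(f"peak: {peak} bytes; largest allocator: line {lineno} ({size} bytes)")
```

In a top-to-bottom script, the line reported this way is the first candidate for something like swapping a list for a generator.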

21:22 Not uploading my code there, but it's fine. All right. Well, Brian, that's it for our main items.

21:28 You got anything? I don't. I've just been trying to get through the day lately. Yeah. I hear you.

21:33 Well, I have one really quick announcement and then an unannouncement in a sense. So I sent out a message

21:39 to a ton of people. So unannouncement is for them. So what I'm trying to do is I'm trying to create some

21:46 communities for students going through the courses to go through them together. And I'm calling these

21:50 cohorts. Right. So I set up like a beginner Python cohort and a web Python cohort and put like 20 or

21:57 30 people. I had 20 or 30 slots, let's say for people to go through over like three or four,

22:02 three months or so, where they each work a little, like they all work on the same part of the course

22:07 at the same time. And they're there to help each other. There's like private Slack channels and

22:11 other stuff around it. So that's really fun. But it turns out that after one day of having that open,

22:17 I got many hundreds of applicants for like 20 spots. So I had to stop taking applications. So

22:24 if people got those messages and like, Oh, I want to apply, but it looks like the form is down.

22:28 It's because there's like an insane number of applicants per spot. So those will come back

22:35 and people can sign up to get notified. There's a link in the show notes, but I just want to say like,

22:40 that's what I was doing, which is fun. But for those of you who didn't get a chance to apply,

22:44 cause it got closed right away. That's why.

22:46 And that's for training at talkpython.fm.

22:49 Yes, exactly. So there's like certain courses. And if you got one of the courses and you want to go

22:54 through it with a group of students all on the same schedule, this was like a free thing that I was

22:59 doing to try that out.

23:01 Yeah.

23:01 Right.

23:01 I think it's a neat idea.

23:02 Yeah. Thanks. Yeah. People seem to like it.

23:04 Yeah. Too many.

23:05 But yeah, I've got to give it a try, get it dialed in, then we can open up some more groups.

23:10 Yeah.

23:10 All right. Well, I've got it. I've got a joke I kind of like for you here.

23:13 I love this one.

23:14 Are you ready for it?

23:15 Yeah.

23:15 You want to be, why don't I be the junior dev? You can be the senior dev. So the junior dev and

23:20 senior dev are having a chat. And I feel like that you may be a little skeptical of what I've done

23:24 here. Let's just do this. All right. Why don't you hit me with a question?

23:27 Okay. So where did you get the code that does this? Where did you get the code from?

23:32 Oh, I got it from Stack Overflow.

23:33 Was it from the question part or the answer part?

23:36 Isn't that so good? It's like people say copying from Stack Overflow is bad. I think this is the

23:45 real question.

23:45 You definitely don't want to copy from the question part.

23:48 Yeah. But actually, I've never heard anybody like, you know, spell that out. You know,

23:53 you can look up stuff on Stack Overflow, but at the top with the question, don't copy that. That's

23:58 the code that somebody's saying this doesn't work. Yeah.

24:01 Exactly. Exactly.

24:03 That's funny.

24:04 Yeah. This is a good one.

24:06 It's too funny.

24:08 It's too funny. All right. Well, thanks as always. Great to chat with you and

24:13 share these things with everyone.

24:14 Thank you.

24:14 Yeah. Bye-bye.

24:15 Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S.

24:20 And get the full show notes at pythonbytes.fm. If you have a news item you want featured,

24:25 just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something

24:30 cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and

24:35 sharing this podcast with your friends and colleagues.

24:37 Thanks.
