Transcript #185: This code is snooping on you (a good thing!)
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds. This is episode 185, recorded June 4th, 2020. I'm Michael Kennedy. And I am Brian
00:10 Okken. And this episode is brought to you by Datadog. More on that later. Check them out at
00:15 pythonbytes.fm/Datadog. Brian, I feel like we're all working from home. Everyone's life is
00:21 scrambled. Even like my sleep schedules are scrambled. Like some crazy stuff happened and
00:25 I slept from like 6 to 9.30 and I was up for like four hours and I slept in. Like it's just,
00:29 it's weird. Don't we need more structure in our life? Nice, nice intro. Yes, more structure.
00:35 Yeah. I'm a fan of Markdown also. Believe it, trust me, it's not a tangent. So we have a,
00:41 just a repo that we want to point people to called MyST. It's got to be called MyST, don't you think?
00:47 Oh yeah, definitely. M-Y-S-T, which is Markedly Structured Text. And what this is, is a fully
00:56 functional Markdown parser for Sphinx. It's Markdown plus a whole bunch of stuff from reStructuredText.
01:02 So MyST allows you to write Sphinx documentation entirely in Markdown.
01:09 And things that you could do in reStructuredText but could not do in Markdown have been put into
01:17 a new flavor of Markdown. So you can do all of your directives and all sorts of cool things;
01:23 anything you could do in reStructuredText with Sphinx, you can now do in Markdown.
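For anyone who does use Sphinx, switching MyST on is typically a small change to the project's `conf.py`. This is a minimal sketch, assuming the `myst-parser` package is installed (its Sphinx extension module is `myst_parser`); `project` is a placeholder name, and further options differ between MyST versions, so check the MyST docs for yours.

```python
# conf.py: minimal Sphinx configuration that enables MyST Markdown.
# Assumes `pip install myst-parser` has been run in the docs environment.
project = "example-docs"  # hypothetical project name

extensions = [
    "myst_parser",  # lets Sphinx read .md files written in MyST syntax
]
```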
01:28 It's based on CommonMark and some other tools. So they're standing on other tools that are already
01:35 doing things really well and just extending them a bit. But this is pretty powerful. One of the things
01:40 I like about this is I personally don't use a lot of Sphinx, but this also includes a standalone
01:47 parser, so you can see how somebody's extended Markdown for these extra directives and even use some of
01:54 them in your own code if you want. Yeah, this looks really, really nice. Like, reStructuredText is good
02:00 and all, but I don't know. If I'm going to write something like reStructuredText, my heart just wants
02:05 to write Markdown. I got to tell you. Yeah, me too. And I think one of the things that was holding a lot of
02:10 people back is some of the extra directives, the information boxes and other things like that,
02:17 that you can't necessarily do in Markdown off the shelf, but some extensions are nice. I played with
02:24 it a little bit doing some just, I didn't pull it down with Sphinx. I just pulled it down so that I
02:29 could run some Markdown through it and some of the extra directives to see what it has. So for instance,
02:35 some of the directives, like I tried like an information box, you can have structure around
02:40 putting an information box somewhere. And what you end up with is a div that has a class to it.
02:46 Oh, nice. If you're not using Sphinx, then you'll have to use your own CSS, I guess, to style it,
02:52 but it puts in enough hooks for you to be able to do that. That's really nice. I do wish you could
02:58 sort of indicate CSS styles in Markdown because, wow, that would just, that would be the end of what
03:04 you need HTML for, for many, many things. That would be nice. So last week you brought up
03:10 direnv. We were talking about how do you store your secrets? How do you activate and configure
03:16 different environments? I think I even said something about like specifying where Python was running. I
03:22 don't remember what the context was exactly, but you're like, direnv. And actually I've been meaning
03:27 to cover this. Dunderdan, LinkedIn on Twitter, don't know what his last name is. Thanks, Dan.
03:31 sent this over to us as a recommendation. And I'm like, yeah, like you brought it up. It seems
03:36 definitely cool. So let me tell you about direnv. D-I-R-E-N-V. So it's an extension that goes into
03:42 your shell. And normally what you do is you open your shell and it runs your .bashrc, .zshrc, whatever,
03:50 and sets up some stuff. Or if you're over on Windows, it works a little bit different, but I
03:54 think direnv is only for the POSIX type systems. Anyway, it'll set up some values that you put in
04:02 there like environment variables and whatnot. And that's just global, right? You can also set up when
04:09 you activate a virtual environment to export other values. That's pretty cool. But what it doesn't really
04:16 do is allow you to have like a hierarchy of values. So if I'm in this subdirectory over here,
04:21 I want this version of Python active or this version of where the Flask app lives. And then if I change
04:28 to another directory, I want it to automatically go, well, that means different values. And direnv
04:33 basically does that.
04:34 Yeah. So as you go into different parts of your folder system, it'll look for certain files,
04:41 .envrc. And if it finds that, it'll automatically grab all the, basically all the exports and then
04:49 jam them into whatever your shell is. And it's also cool because it's not a shell, right? It's not like,
04:54 well, here's a shell that has this cool feature. It works with Bash, Zsh, tcsh,
05:00 fish, and others, right? So it's basically a hook that gets installed for, like I use
05:07 Oh My Zsh because, oh my gosh, it's awesome. And then I would just plug this into it. And as I
05:12 do stuff with Zsh, it will just apply its magic.
05:15 Yeah. And so one of the things that, one of the things you can do with this is to automatically
05:20 activate a virtual environment when you go into specific directories. That's not the only thing it can do,
05:26 but that's one of the reasons why a lot of people use it.
05:29 Right. You basically, well, I guess you can't do aliases. You can't change what Python means,
05:33 but you can say where the Python path is. Yeah.
05:35 Yeah. And that's one of the things that's a limitation of this that people should be aware
05:38 of. The way to think of it is not as a nested .bashrc, right? It's not a sub-.bashrc where
05:45 it runs aliases and all sorts of stuff. The way it works is it runs a bash shell,
05:50 like a little tiny hidden bash shell. It runs your .envrc in that shell, captures what the
05:56 exported variables are, throws away that shell, and then jams those into whatever active shell you
06:00 have, like Zsh or bash or fish or whatever.
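The "run a hidden shell, capture its exports, replay them in your shell" step can be sketched in Python. This is only an illustration of the idea, not direnv's code: `export_commands` is a hypothetical helper that compares the environment before and after loading an `.envrc` and produces the commands the active shell would need to run.

```python
import shlex
from typing import Dict, List


def export_commands(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Hypothetical helper: shell commands that turn `before` into `after`."""
    commands = []
    # Variables that are new, or whose value changed, get exported (safely quoted).
    for name, value in sorted(after.items()):
        if before.get(name) != value:
            commands.append(f"export {name}={shlex.quote(value)}")
    # Variables that disappeared get unset.
    for name in sorted(before.keys() - after.keys()):
        commands.append(f"unset {name}")
    return commands


before = {"PATH": "/usr/bin", "OLD_SETTING": "x"}
after = {"PATH": "/venv/bin:/usr/bin", "FLASK_APP": "app.py"}
for line in export_commands(before, after):
    print(line)
```

Because only environment variables survive this round trip, aliases and shell functions defined in the hidden shell are lost, which is the limitation described above.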
06:04 Yeah. I would probably use this all the time if I wasn't somebody that used both Windows and
06:10 Mac and Linux frequently.
06:12 You know, probably, I bet somebody could come up with this thing for Windows as well. It's just
06:17 got to be like totally from scratch, different type of thing, right?
06:20 People have already pointed me to Windows versions of it, but it's one of those things of like,
06:26 you got to jump through hoops to make it work. And it's just not, for me, it's not solving a big
06:31 enough problem that I have that I need to jump through the hoops. But I agree. I agree. It is
06:36 cool, but it doesn't, it's not like life changing in that regard. I guess one more thing to point out is
06:41 it's, you don't have to like go to the directory where the .envrc file is. It looks up the
06:49 parent directories until it finds one. So you have this like hierarchy, like I'm down here in the,
06:53 you know, like views part of my website, and at the top level of that git repo, I have one of these
06:59 .envrc files. It would find that and like activate that for you. So that's pretty cool that it has,
07:03 it's kind of like Node.js and where node_modules live in that regard. That's pretty cool.
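The parent-directory lookup is easy to picture in a few lines of Python. This is a sketch of the lookup rule described above, not direnv's implementation; `find_envrc` is a hypothetical helper, and the demo builds a throwaway directory tree that mirrors the "views folder inside a website repo" example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_envrc(start: Path, filename: str = ".envrc") -> Optional[Path]:
    """Return the nearest `filename` in `start` or one of its parents, else None."""
    start = start.resolve()
    # Check start itself, then its parent, grandparent, ... up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Demo: an .envrc at the top of a (hypothetical) repo is found from a nested folder.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    views = repo / "website" / "views"
    views.mkdir(parents=True)
    (repo / ".envrc").write_text("export FLASK_APP=app.py\n")
    found = find_envrc(views)
    expected = (repo / ".envrc").resolve()
    print(found)
```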
07:08 Yeah. That's a really nice feature. Yeah, for sure. Also nice: Datadog. So before we get to the next
07:13 thing, let me talk about them real quick. They're supporting the show. So thank you. They've been
07:17 sponsors for a long time. Please check them out and see what they're offering. It's good software and
07:21 it helps support the show. So if you're having trouble visualizing bottlenecks and latency in your app,
07:26 and you're not sure where the issues are coming from or how to solve it, you can use Datadog's
07:30 end-to-end monitoring platform with their customizable built-in dashboards to collect
07:34 metrics and visualize app performance in real time. They automatically correlate logs and traces
07:40 at the level of individual requests, allowing you to troubleshoot your apps and track requests across
07:45 tiers. Plus their service map automatically plots the flow of these requests across your application
07:51 architecture. So you can understand dependencies and proactively monitor performance of your apps.
07:55 So be the hero that got that app at your company back on track.
07:59 Get started with a free trial at pythonbytes.fm/datadog. You can get a cool shirt.
08:04 All right, Brian, what's next? Yep. Thanks, Datadog.
08:07 I had a problem. So my problem was a little application that had a database. It was a,
08:12 I was using TinyDB just for development. It's similar to Mongo; it's a document database.
08:17 I'd thrown some data into it, no problems. But there was one of the values that I decided to change to
08:24 use Python enums because I thought enums are cool. I don't use them very often. I'll give these a shot
08:30 because they seem perfect. And then everything blew up because I couldn't save it to the
08:36 database, because enums are not JSON serializable by default. So I'm like, there's got to be an easy
08:43 workaround for this. And, and I first ran around, ran into questions about, or topics about creating your
08:50 own serializer. That just didn't seem like something I wanted to do.
08:53 You could do it, but it's not so fun, right?
08:55 Yeah. Well, so I ran across an article, a little short article written by Alexander Holtner
09:00 called convert a Python enum to JSON. And I didn't need it converted to JSON, but I did need it
09:07 serializable. And the trick is, when you use enums, you
09:13 do from enum import Enum, the capital-E Enum type, and then you have a class that derives from that.
09:19 And then you have your values. Well, you can derive not just from Enum but also from another,
09:25 concrete type, like int or str. In my case, I used str so that my
09:33 string values would be stored. Now it is serializable and it works just the same as it always did before.
09:39 It just uses the serializer from the other type and it works incredibly well. So for instance,
09:46 I'm going to put a little example in the show notes about using a color, which is red and blue.
09:52 If you just derive from Enum, you can't convert it to JSON because it's not serializable.
09:58 You can either use IntEnum, which is built in, or combine str and Enum. Now it serializes
10:05 just to the strings red and blue, if those are the values. And then that's what's stored in your,
10:11 like your database too. So when I'm using, it's really handy for debugging to be able to have
10:16 these, these readable values as well.
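Here is a small version of the red/blue color example, as a sketch rather than the exact code from the show notes: a plain `Enum` member makes `json.dumps` raise `TypeError`, while the `str, Enum` combination serializes to the plain string value.

```python
import json
from enum import Enum


class PlainColor(Enum):
    RED = "red"
    BLUE = "blue"


class Color(str, Enum):
    # Mixing in str makes each member an actual str, so json can encode it.
    RED = "red"
    BLUE = "blue"


plain_error = None
try:
    json.dumps({"color": PlainColor.RED})
except TypeError as exc:
    plain_error = str(exc)  # Object of type PlainColor is not JSON serializable

encoded = json.dumps({"color": Color.RED})
print(encoded)             # {"color": "red"}
print(Color.RED == "red")  # True: members compare equal to their string values
```

On Python 3.11 and later, `enum.StrEnum` is a built-in shortcut for the `str, Enum` combination.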
10:18 Yeah, this is really cool. It's a little bit like abstract base classes versus concrete classes or
10:23 something like that, right? You've got the general Enum, but if you do IntEnum,
10:27 then it has this other capability, which is cool. Or yeah, multiple inheritance:
10:32 str comma Enum is the one you went for, right?
10:34 Yeah. So the multiple inheritance is the thing that Alexander recommended in his post. That's what
10:40 I'm using. It works just fine. But I was interested to find out that in the Python documentation for
10:46 IntEnum, IntEnum is almost just there as an example to say, we realize that it might not be
10:51 integers that you want. You might want something else, but there's an example right in the,
10:55 in the Python documentation on, on using multiple inheritance to create your own type. It doesn't
11:01 talk about serializability there, but that's one of the benefits.
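Both routes, the built-in `IntEnum` and a hand-made `str, Enum` mix-in, store plain values. A sketch of the round trip through JSON (standing in for a document database like TinyDB) shows the one thing to remember: what comes back is a plain int or str, so you rebuild the member by looking it up by value. `Priority` and `Color` are illustrative names.

```python
import json
from enum import Enum, IntEnum


class Priority(IntEnum):  # built-in int + Enum combination
    LOW = 1
    HIGH = 2


class Color(str, Enum):   # the same idea, built by hand with multiple inheritance
    RED = "red"
    BLUE = "blue"


stored = json.dumps({"priority": Priority.HIGH, "color": Color.BLUE})
print(stored)  # {"priority": 2, "color": "blue"}

# Reading back: call the enum class with the stored value to get the member again.
row = json.loads(stored)
priority = Priority(row["priority"])
color = Color(row["color"])
print(priority is Priority.HIGH, color is Color.BLUE)  # True True
```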
11:04 Yeah. It seems like it works anyway. Awesome. How much time did it take you to figure that out?
11:08 Was it a long time?
11:09 No, I don't know. 10 minutes of Googling.
11:11 Yeah, that's pretty cool. Well, you could compute it with Python, of course, but you know,
11:15 the datetimes in Python and timedeltas, they're, they're pretty good actually, but they're a little
11:20 bit lacking. There's certain types of things you might want to do with them. And so there's a
11:24 couple of replacement libraries and one that Tucker Beck sent over. It's called pendulum.
11:29 That's pretty cool. Have you played with pendulum?
11:31 I haven't, but I like the name.
11:33 Yeah, I do too. It's, it's really good. I've played with arrow. So this is a little bit like
11:37 arrow, but it doesn't seem like it tries to solve exactly the same problem. It's just like,
11:41 let's make Python date times and time deltas better, which is kind of the goal of both of them.
11:46 So it's more or less a drop-in replacement for the standard datetime. So you can create like time
11:52 deltas, which are pretty cool. Like I could say pendulum.duration(days=15). I have this
11:57 duration and it has more properties than the standard timedelta. You know,
12:03 you get like total_seconds() or something like that, but that's, you know, that's not that helpful.
12:06 So this one has like duration.weeks, duration.hours, and so on, which is pretty cool. You can ask
12:13 for the duration in hours, like the total number of hours, not just the hours component, you know,
12:19 like three hours and two days or whatever. But you also have this cool, like, human-friendly
12:24 version. So I can ask for the duration in words and give it a locale, and say like the locale is US English.
12:31 And it'll say that's two weeks and one day. Nice. You can also like, let's suppose I'm trying to do
12:36 some work with like calendars or some kind of difference. I say the time from here to there,
12:41 I want to do something for every weekday that appears. Right. So skip Saturday and Sunday.
12:46 But if it's like from Thursday to Wednesday, I need to go Thursday, Friday, Monday, Tuesday, Wednesday.
12:51 Yeah. So I could say pendulum.now(), and then I could go from that and subtract three days. So that
12:57 would be a span of three days. And that gives you what they call a period, which is a little bit
13:02 different. And then I can go to it and say, give me this in weekdays. Okay. Right. Oh, interesting.
13:10 Then you can loop over it. You can say for each day or each time period in this period and go,
13:16 it would go, you know, over the weekdays that are involved in that time span.
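To see the bookkeeping pendulum's period saves you, here is the weekday loop done by hand with only the standard library. `weekdays_between` is a hypothetical helper, not pendulum's API; the example runs from Thursday, June 4, 2020 (the recording date) to the following Wednesday.

```python
from datetime import date, timedelta
from typing import Iterator


def weekdays_between(start: date, end: date) -> Iterator[date]:
    """Yield each Monday-to-Friday date from start to end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday is 0 and Friday is 4; 5 and 6 are the weekend
            yield day
        day += timedelta(days=1)


# Thursday, June 4, 2020 through Wednesday, June 10, 2020.
span = list(weekdays_between(date(2020, 6, 4), date(2020, 6, 10)))
print([d.isoformat() for d in span])
```

That yields Thursday, Friday, Monday, Tuesday, and Wednesday, matching the example in the conversation.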
13:20 That's pretty cool.
13:21 Yeah. Cause that would not be so much fun to do yourself. Right. There's a bunch of stuff that
13:25 it does. And I don't want to go like read all the capabilities and whatever, but that gives you a
13:29 sense. Like if these are the kinds of problems you're trying to work through and you're like,
13:32 man, this is a challenge to do with, with a built-in one. Check out pendulum. Also check out arrow.
13:38 I think we've covered arrow a long time ago. If we haven't, we'll, I'll cover it at some point.
13:41 It's a good one. Yeah. And I think actually, I don't think that's a matter of which one's the
13:45 best either. It's a, it's whatever seems to speak to you and, and, and has an API that
13:50 thinks like you do. Yeah. It's good that lots of people have solved things like this.
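For comparison with the duration features mentioned earlier: the standard library's `timedelta` gives you `total_seconds()`, and anything friendlier is up to you. A rough, English-only sketch of the "two weeks and one day" rendering, with a hypothetical `duration_in_words` helper (pendulum's own output format and localization may differ):

```python
from datetime import timedelta


def duration_in_words(span: timedelta) -> str:
    """Hypothetical helper: '2 weeks 1 day' style text (assumes a non-negative span, English only)."""
    weeks, days = divmod(span.days, 7)
    hours, remainder = divmod(span.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    for amount, unit in ((weeks, "week"), (days, "day"), (hours, "hour"),
                         (minutes, "minute"), (seconds, "second")):
        if amount:
            parts.append(f"{amount} {unit}" + ("" if amount == 1 else "s"))
    return " ".join(parts) or "0 seconds"


span = timedelta(days=15)
print(span.total_seconds() / 3600)  # total hours: 360.0
print(duration_in_words(span))      # 2 weeks 1 day
```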
13:54 Yep. Absolutely. All right. Well, what's this next one? I'm trying to be like a private detective
14:00 or what's going on with this? Yeah. Private detective looking into and spying on your code.
14:07 So this was sent over by a Twitter account called PyLang and this is PySnooper. The claim is never
14:14 use print for debugging again. And I have to admit, I am one to lean on the print statement every once
14:21 in a while, especially if I'm just, sometimes I don't really want to use a breakpoint because I,
14:26 I've got some code that's getting hit a lot and I really do want to see what it looks like over time.
14:31 So one of the things that people often do is throw a print statement somewhere in a line just to say,
14:37 Hey, I'm here. The other thing they do is like print out a variable name right after an assignment so
14:42 that they can see when it changes, but that's exactly.
14:45 It was this and now it's that.
14:47 Yeah. So this is exactly kind of what it does. So by default, you can throw a decorator
14:52 onto a function, and that's the easiest way to apply PySnooper to a function.
14:58 And now every time that function gets run, you get a play-by-play log of your function.
15:04 What it logs is the parameters that get passed to your function, the
15:09 return value of your function, but also every line of the code of the function that gets run,
15:14 and every time a variable changes its value. And then even at the end, it tells you the
15:20 elapsed time for the function. So that's quite a bit. If that's great for you, great. But if it's
15:25 too much information, you can also isolate it with a with block and just take a section of your
15:30 function under test and just log a subset. And then if local variables are not enough
15:38 and you're changing some global variable, you can tell it to watch that as well. Anyway, it's a pretty
15:44 simple API, and there are actually quite a few times I think I'll probably reach for this.
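PySnooper's own entry point is the `@pysnooper.snoop()` decorator from the `pysnooper` package. To show the underlying idea without installing anything, here is a much smaller, stdlib-only sketch built on `sys.settrace`; `snoop` below is a hypothetical stand-in (no timestamps, no source lines, no elapsed time), not PySnooper's code. It records a line each time a local variable's `repr` changes.

```python
import sys
from functools import wraps


def snoop(log):
    """Hypothetical decorator: append "name = value" to `log` whenever a
    local variable of the decorated function gets a new repr."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            seen = {}

            def local_trace(frame, event, arg):
                # Locals are inspected before each line runs and once at return,
                # so every change made by the previous line is picked up.
                if event in ("line", "return"):
                    for name, value in frame.f_locals.items():
                        text = repr(value)  # a repr snapshot also catches in-place mutation
                        if seen.get(name) != text:
                            seen[name] = text
                            log.append(f"{name} = {text}")
                return local_trace

            def global_trace(frame, event, arg):
                # Only trace frames running the decorated function's own code.
                if event == "call" and frame.f_code is func.__code__:
                    return local_trace
                return None

            previous = sys.gettrace()
            sys.settrace(global_trace)
            try:
                return func(*args, **kwargs)
            finally:
                sys.settrace(previous)  # restore whatever tracer was active before
        return wrapper
    return decorator


steps = []


@snoop(steps)
def running_total(numbers):
    total = 0
    for n in numbers:
        total += n
    return total


result = running_total([1, 2, 3])
print("\n".join(steps))
```

The output reads as the history of the function: `numbers` arrives, `total` starts at 0, and then `n` and `total` alternate as the loop runs.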
15:48 When I first saw this, I'm like, ah, yeah, it's kind of cool. There's a lot of these replacements
15:52 where I think like, you know what, you've got PyCharm or you've got VS Code, you're better off just
15:58 setting a break point. And the tooling is so much better than like, say, PDB or something
16:04 like that, right?
16:05 Yeah.
16:05 This though, this solves a problem that always frustrates me when I'm doing debugging, which
16:09 is you're going around, you've got to keep track in your mind. Okay, this value was that,
16:14 now it's this, and then it became that. And the flow of data, well, at any frozen
16:19 point, you can see really well with the visual debuggers, right? Like PyCharm or whatnot,
16:24 what the state is, you can even see what's changed, but not the history: this list
16:28 was empty, then this was added, then this was added. And here's how it evolved over time.
16:33 People should check out the README for this because that view of it is like, there's a loop
16:38 where it shows going through the loop four times. And as like all the values and variables like build
16:43 up, so you can just like review it and see how it flows. I think it's pretty sweet actually.
16:47 Yeah. One of the other things that I forgot to mention is, is if you're like debugging a process on a
16:52 server, maybe it's a, you've got a, yeah, small service that's running and instead of standard out,
16:59 you can pipe these logs to a file and, you know, review them later.
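PySnooper accepts an output file as its first argument (for example `@pysnooper.snoop('/my/log/file.log')`). The "write to a file, and only switch it on when you're in trouble" idea can also be sketched generically. `trace_calls` and the `TRACE_CALLS` environment variable are hypothetical; the flag is read on every call, so something inside the process (not shown here) would flip it at runtime.

```python
import functools
import io
import os
import sys
from typing import Callable, TextIO


def trace_calls(output: TextIO = sys.stderr, env_var: str = "TRACE_CALLS"):
    """Hypothetical decorator: write call/return lines to `output`,
    but only while the environment variable `env_var` is set to "1"."""
    def decorator(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.environ.get(env_var) != "1":
                return func(*args, **kwargs)  # tracing off: just one lookup per call
            output.write(f"call {func.__name__} args={args!r} kwargs={kwargs!r}\n")
            result = func(*args, **kwargs)
            output.write(f"return {func.__name__} -> {result!r}\n")
            output.flush()
            return result
        return wrapper
    return decorator


# In a real service `output` could be open("service-trace.log", "a"); a buffer keeps the demo self-contained.
buffer = io.StringIO()


@trace_calls(buffer)
def add(a, b):
    return a + b


os.environ.pop("TRACE_CALLS", None)
add(1, 2)                         # tracing off: nothing written
os.environ["TRACE_CALLS"] = "1"
add(3, 4)                         # tracing on
os.environ.pop("TRACE_CALLS")
trace_output = buffer.getvalue()
print(trace_output)
```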
17:03 Yeah. Definitely for a server as well, it would be nice to flip that on. And I guess with the,
17:08 with the conditional, but you could probably even in code say, do you feel like you're running into
17:14 trouble? Turn on PySnooper for a minute and then turn it off. You know, like there's,
17:17 there's probably options there, but yeah, you definitely wouldn't want to attach a real debugger to like
17:21 production. Dude, why isn't the site working? Oh, somebody's got to go back to their desk and hit F,
17:27 you know, F5 or continue or whatever. Yeah. That's not going to go well. So I have something that's
17:33 pretty similar to follow this up with that's, you know, this is about debugging and seeing how your code
17:38 is running. Like per usual, we talk about one tool and people are like, oh yeah, but did you know about,
17:45 so we've talked about Austin and we've talked about some of the other cool debugger profilers.
17:49 And so over on PyCoders, they talked about Fil, F-I-L, which is a new memory profiler for data
18:00 scientists and well, general scientists. And you might wonder like, why do data scientists need their own,
18:05 right? You know, biologists, why can't they just use our memory profilers? Like why is Austin not their
18:12 thing? Right. And it may or may not be like, it may answer some great questions for them. Like
18:16 obviously they do a lot of computational stuff, and making that go much faster lets them
18:21 ask more questions. Right. So maybe profilers in general are things they should pay attention
18:25 to. But you know, when they talk about this, they say, look, there's a really big difference between
18:30 servers and like data pipeline or sort of imperative, just top to bottom code. We're just going to run
18:37 scripts sort of right. And that's what scientists and data scientists do a lot. So like, I just need to do this
18:43 computation and get the answer. So with servers, if you're worried about memory, remember, this is a memory
18:48 profiler. What you're worried mostly about is, you know, this has been running for three hours. Now the server's out of
18:55 memory. That's a problem, right? Like it's, it's probably an issue of a memory leak somewhere. Something is hanging on to a
19:03 reference that it shouldn't. And it like builds up over time, like cruft and it just eventually wears it down.
19:09 And it's just like bloated, you know, with too much memory. Right. So that's the server problem. And I think
19:16 that's what a lot of the tooling is built for, but data pipelines, they go and they just run top to bottom
19:21 and they don't, for the most part, don't really care about memory leaks because they're only going to run for 10
19:27 seconds. But what they need to know is if I'm using too much memory, what line of code allocated that memory?
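Fil is a separate package, but Python's built-in `tracemalloc` module can answer a rougher version of "which line allocated this memory?". A sketch with hypothetical helper names: note that a snapshot shows memory that is still alive when it is taken, whereas Fil, as described here, focuses on the peak.

```python
import tracemalloc


def make_big_list(n):
    return [str(i) for i in range(n)]  # many small str objects, all allocated on this line


def top_allocation_lines(func, *args, limit=3):
    """Run func(*args) under tracemalloc and return its result plus the
    (filename, lineno, bytes) of the lines holding the most memory afterwards."""
    tracemalloc.start()
    try:
        result = func(*args)  # keep the result alive so its memory shows up in the snapshot
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = snapshot.statistics("lineno")[:limit]
    return result, [(s.traceback[0].filename, s.traceback[0].lineno, s.size) for s in stats]


data, top = top_allocation_lines(make_big_list, 100_000)
for filename, lineno, size in top:
    print(f"{filename}:{lineno}: {size / 1024:.0f} KiB")
```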
19:33 Like I need to know what line is using too much memory and how can I like maybe use a generator
19:39 instead of building a whole list or something like that. Right. So that's the focus of this tool
19:45 is, is it's like, it's going to show you exactly what your peak memory usage is and what line of code
19:52 is responsible for it. This is actually pretty cool. It is right. At first I thought, what is this? Like,
19:56 why do they need their own thing? But as I'm looking through, I'm like, yeah, this is actually pretty cool. And
20:00 if you go to the site, you can actually see they give you this graph, like a nice visualization of
20:05 like, here are the lines of code. And then it's like more red or less red, depending on how much
20:11 memory it's allocated. Oh wow. Yeah. And then the total amount and you can like dive into like,
20:15 okay, well I need to see like this loop or the sub function that I'm calling. How much is it? So you
20:20 can like navigate through this visual, like red, pink, gray of like memory badness, I guess. I don't know.
20:26 Memory usage. Yeah. It's not bad, right? No. Yeah. And when you're staring at code,
20:29 it's not obvious where the huge array might get generated or used. Yeah. And the example they
20:34 have here, it's like, okay, well they have a function called make_big_array. Okay. So like
20:38 probably you might look there, and there's also things like, like using NumPy, like, okay, here we're
20:44 creating a bunch of stuff with NumPy and you might say, well, here's the NumPy thing that we're doing
20:48 that makes too much, but you could be doing like a whole bunch of, you know, NumPy and pandas work.
20:53 And like one line is actually responsible, but you're probably pretty sure it has to do with pandas,
20:58 but you're not sure where exactly. Right. So you could, you know, dig into it and see, I think it's
21:02 cool. Yeah. We thought we were using arrays and suddenly we have this huge matrix that accidentally
21:07 exactly. Why is all this stuff still in here? Yeah. Yeah. Cool. Well, anyway, if you're doing data
21:12 science and you care about memory pressure, this thing seems super easy. It even has like a try it on
21:19 your own code on the website, which I don't know what that means. So that's crazy.
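The "use a generator instead of a list" fix mentioned earlier is easy to check with the standard library's `tracemalloc`, which tracks peak traced memory. This is a stdlib sketch of the kind of measurement Fil automates per line, not Fil itself; `peak_bytes` is a hypothetical helper.

```python
import tracemalloc


def peak_bytes(func):
    """Return the peak traced memory (in bytes) while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N = 200_000
with_list = peak_bytes(lambda: sum([i * i for i in range(N)]))  # builds the whole list first
with_gen = peak_bytes(lambda: sum(i * i for i in range(N)))     # produces one value at a time
print(f"list: {with_list:,} bytes, generator: {with_gen:,} bytes")
```

Both compute the same sum; only the list version holds every intermediate value in memory at once.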
21:22 Not uploading my code there, but it's fine. All right. Well, Brian, that's it for our main items.
21:28 You got anything? I don't. I've just been trying to get through the day lately. Yeah. I hear you.
21:33 Well, I have one really quick announcement and then an unannouncement in a sense. So I sent out a message
21:39 to a ton of people. So unannouncement is for them. So what I'm trying to do is I'm trying to create some
21:46 communities for students going through the courses to go through them together. And I'm calling these
21:50 cohorts. Right. So I set up like a beginner Python cohort and a web Python cohort and put like 20 or
21:57 30 people. I had 20 or 30 slots, let's say for people to go through over like three or four,
22:02 three months or so, where they each work a little, like they all work on the same part of the course
22:07 at the same time. And they're there to help each other. There's like private Slack channels and
22:11 other stuff around it. So that's really fun. But it turns out that after one day of having that open,
22:17 I got many hundreds of applicants for like 20 spots. So I had to stop taking applications. So
22:24 if people got those messages and like, Oh, I want to apply, but it looks like the form is down.
22:28 It's because there's like an insane number of applicants per spot. So those will come back
22:35 and people can sign up to get notified. There's a link in the show notes, but I just want to say like,
22:40 that's what I was doing, which is fun. But for those of you who didn't get a chance to apply,
22:44 cause it got closed right away. That's why.
22:46 And that's for training at talkpython.fm.
22:49 Yes, exactly. So there's like certain courses. And if you got one of the courses and you want to go
22:54 through it with a group of students all on the same schedule, this was like a free thing that I was
22:59 doing to try that out.
23:01 Yeah.
23:01 Right.
23:01 I think it's a neat idea.
23:02 Yeah. Thanks. Yeah. People seem to like it.
23:04 Yeah. Too many.
23:05 But yeah, I've got to give it a try, get it dialed in, then we can open up some more groups.
23:10 Yeah.
23:10 All right. Well, I've got it. I've got a joke. I kind of like for you here.
23:13 I love this one.
23:14 Are you ready for it?
23:15 Yeah.
23:15 You want to be, why don't I be the junior dev? You can be the senior dev. So the junior dev and
23:20 senior dev are having a chat. And I feel like that you may be a little skeptical of what I've done
23:24 here. Let's just do this. All right. Why don't you hit me with a question?
23:27 Okay. So where did you get the code that does this? Where did you get the code from?
23:32 Oh, I got it from Stack Overflow.
23:33 Was it from the question part or the answer part?
23:36 Isn't that so good? It's like people say copy from Stack Overflow is bad. I think this is the
23:45 real question.
23:45 You definitely don't want to copy from the question part.
23:48 Yeah. But actually, I've never heard anybody like, you know, spell that out. You know,
23:53 you can look up stuff on Stack Overflow, but at the top with the question, don't copy that. That's
23:58 the code that somebody's saying this doesn't work. Yeah.
24:01 Exactly. Exactly.
24:03 That's funny.
24:04 Yeah. This is a good one.
24:06 It's too funny.
24:08 It's too funny. All right. Well, thanks as always. Great to chat with you and
24:13 share these things with everyone.
24:14 Thank you.
24:14 Yeah. Bye-bye.
24:15 Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S.
24:20 And get the full show notes at pythonbytes.fm. If you have a news item you want featured,
24:25 just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something
24:30 cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and
24:35 sharing this podcast with your friends and colleagues.
24:37 Thanks.