Transcript #265: Get asizeof pympler and muppy
00:00 Hey there, thanks for listening.
00:01 Before we jump into this episode, I just want to remind you that this episode is brought to you by us over at Talk Python Training, and Brian through his pytest book.
00:10 So if you want to get hands on and learn something with Python, be sure to consider our courses over at Talk Python Training. Visit them via pythonbytes.fm/courses.
00:21 And if you're looking to do testing and get better with pytest, check out Brian's book at pythonbytes.fm/pytest.
00:28 Enjoy the episode.
00:29 - Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:34 This is episode 265, recorded January 5th, 2022.
00:39 I'm Brian Okken.
00:41 - I'm Michael Kennedy.
00:42 - And I'm Matt Kramer.
00:43 - Matt, welcome to the show.
00:45 - Thanks, happy to be here.
00:47 - Yeah, welcome Matt.
00:47 - Who are you?
00:48 - Oh, so a huge fan, I've listened to every episode.
00:52 I actually, I'm one of these folks that started their career outside of software.
00:56 I've heard a similar parallel story a bunch of times in the past.
01:00 So I have my degree actually in Naval Architecture and Marine Engineering, which is design of ships and offshore structures.
01:07 In grad school, I started with MATLAB, picked up Python, thanks to a professor, and then over time, that's just grown and grown.
01:15 I spent eight years in the oil and gas industry, using Python mostly for engineering analysis, a lot of digital-type stuff, and IoT-type monitoring work. And about three months ago, I joined Anaconda as a software engineer, and I'm working on our Nucleus cloud platform as a backend software engineer.
01:33 - Very cool.
01:34 - Awesome, yeah, congrats on the new job as well.
01:36 That's a big change from oil and gas.
01:38 - A couple years.
01:39 - I mean, it is in Texas and all, but it's still on the tech side.
01:43 - Yeah, no, it's related, but obviously a different focus.
01:47 I wanted to make writing code my job rather than the thing I did to get my job done, so.
01:53 - Fantastic, I'm sure you're having a good time.
01:55 - Yeah.
01:56 - Well, Michael, we had some questions for people last week.
01:59 - We did.
02:00 I wanna make our first topic a meta topic.
02:02 And by that, I mean a topic about Python Bytes.
02:05 So you're right.
02:06 We discussed whether the format, which is sort of, I wouldn't say changed.
02:12 I would rather categorize it as drifted over time.
02:16 It's sort of drifted to adding this little thing and do that different thing.
02:20 And we just said, "Hey, everyone, do you still like this format?" It's not exactly what we started with, but it's where we are.
02:25 So we asked some questions.
02:26 The first question I asked, which I have an interesting followup to at the end here, by the way, is: is Python Bytes too long at 45 minutes?
02:34 That's roughly the time that we're going these days, probably about 45 minutes.
02:38 And so I would say, gotta do the quick math here, I would say 70, 65%, let's say 65% are like, no, it's good.
02:46 With a third of that being like, are you kidding me?
02:48 It could go way longer.
02:49 I'm not sure we wanna go way longer, but there are definitely a couple of people that think, yeah, it's getting a little bit long.
02:54 So I would say probably 12% of people said it's too long.
02:57 So I feel like it's actually kind of a decent length.
03:01 And one of the things I thought, it's like as we've changed this format, we've added things on, right?
03:07 We added the joke that we started always doing at the end.
03:09 We added our extra, extra, extra stuff.
03:11 But the original format was the six items.
03:14 You covered three, I covered three.
03:15 Now it's two, two, and we got Matt here to help out with that.
03:18 So what is the length of that?
03:20 And it turns out that that's pretty much the same length still.
03:24 So the last episodes: 39 minutes, 32 minutes, 35 minutes, 33 minutes.
03:28 That's how long our main segments run, up to the end of the main items.
03:31 So it's kind of like, for people who feel it's too long, I want to just sort of say, feel free to just delete it.
03:36 Like you hear the six items, delete it at that point.
03:38 If you don't want to hear us ramble about other things that are not pure Python, you don't want to hear us talk about the joke or tell jokes, no problem.
03:45 - Yeah, stop.
03:46 - Stop, it's at the end for a reason.
03:48 So if you're kind of like, all right, I'm kind of done, then be done, that's totally good.
03:53 We'll put the important stuff up first.
03:55 The other one was, do you like us having a third co-host like Matt or Shell or whoever it is we've had on recently?
04:03 And most people love that format or, you know.
04:06 - Or at least it's okay.
04:07 - It's okay, so that's like, I think that that's pretty good.
04:09 I do wanna read out just a couple of comments as well.
04:12 There's stuff that you always get that like you just can't balance it.
04:15 A couple of people are saying like, you just gotta drop the joke, like don't do that.
04:18 The other people are like, the joke is the best, who doesn't wanna stay for that?
04:21 So, you know, like, well, again, it's at the end.
04:24 So you can do that.
04:26 But I also just wanted to say thank you to everybody.
04:28 They wrote a ton of nice comments to you and me at the end of that Google form.
04:34 So one is, "I can't tell what counts as an extra or normal, but it's fine, I love it."
04:39 "Python Bytes is such an excellent show."
04:40 "Fun way to keep current."
04:43 "Brian is awesome." - Oh, good, I asked my daughter to submit that.
04:47 (laughing)
04:48 - She did good.
04:49 I think your third guest, having a third guest is great.
04:52 Like I said, drop the jokes, keep the jokes for sure, ideal.
04:55 So anyway, there's a bunch of nice comments.
04:58 I think the other thing that I would like to just speak to real quick and get your thoughts on, and maybe you as well, Matt, 'cause you've been on the receiving end of this a lot, is us having the live audience, right?
05:09 I think having a live audience is really interesting.
05:12 I also wanna just acknowledge, we knew that that would be a slight drift of format, right?
05:18 So if you're listening in the car and there's a live audience comment, it's kind of like, well, but I'm not listening to it live.
05:23 That's kind of different.
05:25 But I think it's really valuable.
05:27 One time we had four, maybe four, Python core developers commenting on the stuff we were covering.
05:33 Like that's a huge value to have people coming and sort of feeding that in.
05:37 So for me, personally, I feel like it's, yeah, it's a little bit of a blend of formats, but I think having the feedback from the audience, especially when people are involved in what we're talking about, I think that's worth it.
05:48 Brian, what do you think?
05:48 - Well, we try not to let it interrupt the flow too much, but there's some great stuff.
05:54 Like if somebody, if we say something that's just wrong, somebody will correct us and that's nice.
06:01 The other thing is sometimes somebody has a great question on a topic that like we should have talked about, but we didn't.
06:10 - We didn't, right.
06:11 We don't know everything.
06:12 We certainly don't.
06:13 So I do want to add one more thing.
06:17 There was a comment like, hey, we as hosts should let the guest speak.
06:22 We should be better interviewers.
06:23 I'm like, this is not an interview format.
06:24 You know, like Talk Python is a great interview format.
06:27 That's where the guest is featured.
06:29 Test & Code is a great interview format where the guest is featured.
06:32 This is sort of just three people chatting.
06:34 It's not really an interview format.
06:36 So, and we always tell the guests to interrupt us and they just, they don't much.
06:41 So yeah.
06:42 Yeah.
06:42 So Matt, what do you think of this live audience aspect?
06:45 Like, do you feel like that distracts, or is it good?
06:46 Well, yeah, first of all, I'm glad that people generally like having a guest.
06:52 Otherwise, this would have been very awkward.
06:54 But no, I do like it.
06:56 Where'd Matt go? Oh, he must have disconnected.
06:58 Occasionally, there is a little bit of a disruption.
07:03 But I think in general, it's been great.
07:05 I've definitely been listening at times when a bunch of people are chiming in.
07:10 Because there's always, as you know, you mentioned a GUI library, and then there's about 12 other options that you may not have covered.
07:16 - Exactly.
07:17 - Instead of waiting 12 weeks, you could just get them right out.
07:20 So I think that's great.
07:20 And I'm generally an audio listener.
07:23 I listen when I'm walking my dogs, but I love having the video because when I'm interested in something, I can go hop to it right away and see what you're showing, which I really like.
07:33 - Yeah, awesome, thank you.
07:36 Two other things that came to mind.
07:37 Someone said, "It would be great if there's a way where we could submit ideas and stuff like that for guests and whatnot." - Oh, yeah.
07:46 Right here at the top in our menu, it says submit.
07:48 So please reach out to us on Twitter, send us an email, do submit it there.
07:53 The other one was if we could have time links, like if you're listening and at a certain time something interesting is mentioned, it'd be cool if you could link to that time.
08:05 If you look in your podcast player, it has chapters and each chapter has both a link and a time.
08:10 So like the thing that Brian's gonna talk about next, interpreters, if you want to hear about that during that section in your podcast player, you can click the chapter title and it will literally navigate you to there.
08:22 So it's already built in.
08:24 Just make sure you can see it in your device.
08:26 Yeah.
08:27 All right.
08:27 I think that's it for that one, but yeah, thank you for everybody who had comments and took the time.
08:33 Really appreciate it.
08:34 Yeah.
08:34 And just the comment, if you, if you want to be a guest, just email on that form and you might be able to do it.
08:40 That's right.
08:41 That's right.
08:42 Yeah.
08:42 Great to have you here.
08:43 - Actually, I didn't want to talk about interpreters.
08:46 - No, that's me.
08:47 - Oh, wait, you're right.
08:49 Well, you're talking about it now because I've changed.
08:51 No, let's talk about attrs.
08:53 Sorry, I saw the wrong screen.
08:54 You should go for it.
08:56 - Apparently we're not professional here, but no, it's okay.
08:59 I wanted to talk about attrs.
09:02 We haven't really talked about it much for a while, for lots of reasons, but attrs is a great library.
09:09 And attrs just came out with release 21.3.0, which is why we're talking about it now.
09:16 There are some changes, including some documentation changes.
09:21 And there's an article I wanted to cover.
09:24 So one of the things you'll see right off the bat, if you look at the overview page of the attrs site, is that it's highlighting the define decorator.
09:35 If you've used attrs from years ago, this is a little different.
09:41 A different API was added in one of the previous releases,
09:50 and now that's the preferred way.
09:55 So this is what we're calling modern attrs.
09:58 But along with this, I wanted to talk about an article that Hynek wrote about attrs.
10:06 And it's a little bit of a history, and I really love this discussion.
10:09 So, and I'll try to quickly go through the history.
10:14 Early on, we didn't have data classes, obviously. We could handcraft classes, but there were problems with that.
10:21 And there was a library called Characteristic, which I didn't know about.
10:25 This was before I started looking into things.
10:29 And then Glyph and Hynek, in 2015, were discussing ways to change it.
10:35 And that began the old, original attrs interface.
10:40 And there were things like attr.s and attr.ib, which came partly out of the fact that the old Characteristic way of declaring attributes was a lot of typing.
10:51 So they wanted to do something a little shorter.
10:54 And then it kind of took off.
10:57 attrs was pretty popular for a long time, especially fueled by a 2016 article by Glyph called "The One Python Library Everyone Needs", which was great. This is kind of how I learned about it.
11:09 And then there was a different kind of API that we were used to for attrs, and it was good.
11:17 And everything was great.
11:17 And then in 2017, Guido and Hynek and Eric Smith talked, at PyCon 2017, about how to make something like that in the standard library.
11:31 And out of that came PEP 557 and data classes, and data classes showed up in Python 3.7.
11:40 And then a dark period happened, which was people were like, why do we need attrs anymore if we have data classes?
11:48 Well, that's one of the things I like about this article.
11:52 And then there's an attached article that is called, why not?
11:56 Why not data classes instead of attrs?
12:00 And it's important to realize that data classes have always been a limited subset of attrs.
12:10 attrs is a superset of functionality.
12:13 And there's a lot of stuff missing in data classes, like equality customization and validators.
12:20 Validators and converters are very important if you're using a lot of these.
12:24 And then also people were like, well, data classes are kind of a nicer interface, right?
12:31 Well, not anymore.
12:33 The @define decorator is really nice.
12:37 This is a really easy interface now to work with.
12:39 So anyway.
12:40 - And it has typing.
12:41 - And it has typing.
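To make the validators and converters point concrete, here is a minimal standard-library sketch of what you end up writing by hand with a plain data class. The `Person` class, its fields, and the 0 to 200 age range are made up for illustration. attrs lets you attach a `validator=` or `converter=` to a field instead, which this sketch deliberately does not use.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical example class; not taken from the attrs docs.
    name: str
    age: int

    def __post_init__(self):
        # "Converter" step: coerce the raw input (for example "42") to an int.
        self.age = int(self.age)
        # "Validator" step: reject values outside an assumed sane range.
        if not 0 <= self.age <= 200:
            raise ValueError(f"age must be between 0 and 200, got {self.age}")


person = Person("Ada", "42")
print(person)
```

With a data class, every class needs its own `__post_init__` like this; the argument in the discussion is that attrs makes the same checks declarative, per field.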
12:43 And I'm glad you wrote this because I kind of was one of those people of like, am I doing something wrong if I'm using data classes?
12:53 Why should I look at attrs?
12:55 And there's a whole bunch of reasons.
12:57 One of the things that I really like is that attrs has slots.
13:02 The slots are on by default.
13:04 So you kind of define your class once instead of letting it keep growing.
13:09 Whereas the default Python way, and in data classes, is to allow classes to grow at runtime and gain more attributes, but that's not really how a lot of people use classes.
13:19 So if you came from another language where you have to define the class once and not at runtime, attrs might be a closer fit for you.
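The difference between a class that can grow at runtime and a slotted one can be shown with the standard library alone. This is only a sketch of the behavior being described: the class names are invented, and the slotted class is written by hand rather than generated by attrs.

```python
from dataclasses import dataclass


@dataclass
class OpenPoint:
    # Regular class: instances carry a __dict__, so new attributes can be added.
    x: int
    y: int


class SlottedPoint:
    # __slots__ fixes the attribute set when the class is defined.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


open_point = OpenPoint(1, 2)
open_point.z = 3  # allowed: the instance grows at runtime

slotted_point = SlottedPoint(1, 2)
try:
    slotted_point.z = 3  # not allowed: "z" was never declared
    grew = True
except AttributeError:
    grew = False

print(open_point.z, grew)
```

A typo such as `point.zz = 3` silently creates a new attribute on the open class but fails immediately on the slotted one, which is the "define the class once" behavior mentioned above.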
13:28 - I like it.
13:29 And it's, whether you say @define or @dataclass, pretty similar.
13:32 - Yeah, attrs is really cool.
13:35 I personally haven't used it, but I've always wanted to try it.
13:38 We're using FastAPI and Pydantic, so I've really come to like that library, but attrs is something that looks really full-featured and nice, definitely something I wanna pick up.
13:47 - Yeah, it's cool.
13:48 And Pydantic also seems very inspired by data classes, which,
13:54 as I suspected and am now learning, were actually inspired by attrs, and they kind of leapfrog each other in the same trend, which is interesting. - Yep. - So yeah, cool. Good one, Brian. Matt, I thought Brian was going to talk about this, but you can talk about it. - This would be me, yeah. So this one's not strictly Python related, but I think it's very relevant to Python. I mentioned earlier that I came from a non-CS background, and I've just been going down the rabbit hole for about 10 years now, trying to understand everything and really connect the dots: these very flexible objects that you're working with every day, how do they actually get implemented?
14:33 And so the first thing I did: you may have heard of this guy, Anthony Shaw, I think he's been mentioned once or twice. He wrote a great book, shout out, CPython Internals.
14:42 Really like that book. >> Anthony's out in the audience, he even says happy new year.
14:47 So this book is great if you want to learn how CPython is implemented.
14:50 But because I don't have a traditional CS background, I've always wanted, you know, I felt like I wanted to get a little bit more to the fundamentals.
14:57 And I don't remember where I found out about this book, but Crafting Interpreters, I got the paperback here too, I highly recommend it.
15:04 It's an implementation of a language from start to finish.
15:09 Every line of code is in the book.
15:10 It's a dynamic interpreted language, much like Python.
15:15 But I really like how the book is structured.
15:17 So it was written over, I think, five years in the open.
15:22 I think the paperback may have just come out last year.
15:25 But you walk through every step from tokenization, scanning, building a syntax tree, and all the way through the end.
15:32 But what I really like about it is you actually develop two separate interpreters for the same language.
15:38 So the first one is written in Java.
15:40 It's a direct evaluation of the abstract syntax tree.
15:45 So that was really how I got a lot of these bits in my head about what an abstract syntax tree is, how you start from there, and how you represent these types. But the second part is where I think it becomes really relevant for Python, because the second part is written in C.
15:59 It's a bytecode virtual machine with garbage collection. So it's not exactly the same as Python, but if you want to dig down into how you would actually implement this with the types that you have available in C, but get something flexible, much like Python, I really recommend this. So again, it's not directly Python, but there are some good side notes in here where he compares different implementations between different languages like Python, JavaScript, Ruby, etc. But I really like this book. I devoured it during my time between jobs and, yeah, I keep telling everyone about it. So I thought it would be good for the community to hear.
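For a feel of what "a bytecode virtual machine" means, here is a toy stack machine in a few lines of Python. It is only a sketch: the three opcodes are invented for this example, and it is neither the book's C implementation nor CPython's instruction set.

```python
# Hypothetical opcodes for a toy stack machine (not the book's or CPython's).
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"


def run(program):
    """Execute a list of (opcode, operand) pairs on a value stack."""
    stack = []
    for opcode, operand in program:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Bytecode a compiler might emit for the expression (1 + 2) * 4.
program = [(PUSH, 1), (PUSH, 2), (ADD, None), (PUSH, 4), (MUL, None)]
result = run(program)
print(result)  # 12
```

A tree-walking interpreter would instead recurse over the syntax tree for `(1 + 2) * 4` directly; the bytecode approach flattens that tree into a linear program first.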
16:37 - Yeah, I didn't study this stuff in college either.
16:40 I mostly studied math and things like that.
16:43 And so understanding how virtual machines work and all that, which is just how code executes,
16:48 I think it's really important.
16:49 You know, it's not the kind of thing that you actually need to know in order to get anything done.
16:55 But sometimes, if you ask the program to work a certain way and it doesn't work as you expected, understanding those internals helps your intuition. You go, oh, it's because it's really doing this, and everything's all scattered out on the heap, and I thought numbers would be fast.
17:11 Why are numbers so slow?
17:12 Okay, I understand now.
17:13 - Yeah, I mean, it answered a lot of questions for me, like how does a hash map work, right?
17:20 That's a dictionary in Python.
17:21 What is a stack?
17:22 Why would you use it?
17:23 What is the, when you do a disassemble and you see bytecode, what does that actually mean, right?
17:29 I really, really enjoyed it.
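You can look at both stages in CPython itself with the standard library's `ast` and `dis` modules. This is a small sketch; the expression and variable names are made up, and the exact tree dump and opcode names vary between Python versions, so no output is shown for them.

```python
import ast
import dis

source = "total = price * 2 + 1"

# The parser turns source text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# The compiler turns that tree into bytecode for CPython's virtual machine.
code = compile(tree, "<example>", "exec")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# Running the bytecode gives the same answer as evaluating the expression.
namespace = {"price": 10}
exec(code, namespace)
print(namespace["total"])
```

Calling `dis.dis(code)` prints the familiar disassembly listing, which is the "do a disassemble and see bytecode" step mentioned above.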
17:30 And the book is open source.
17:33 He's got a really great build system.
17:35 If you're interested in writing a book, it's very cool.
17:37 How lines of code get added, and things like that, is all embedded in there.
17:41 And he's got tests written for every part where you add a new bit to the code, there's tests written and there's ways where he uses macros and things to block them out.
17:51 It's pretty interesting.
17:52 - Nice, testing books.
17:55 - That's pretty excellent.
17:56 Yeah, so Matt, now being at Anaconda, like that world, the Python world over in the data science stack and especially around there has so much of like, here's a bunch of C and here's a bunch of Python and they kind of go together.
18:09 Does this give you a deeper understanding of what's happening?
18:12 - Yeah, for sure.
18:13 I think CPython internals gave me a really good understanding a bit more about the C API and why that's important.
18:20 I'm sure you know and the listeners may know like the binary compatibility is really important between the two and dealing with locking and the global interpreter lock and everything like that.
18:32 So it's definitely given me a better conceptual view of how these things are working.
18:37 As you mentioned, you don't need to know it necessarily on a day-to-day basis, but I've just found that it's given me a much better mental model.
18:43 - Having an intuition is valuable.
18:46 Yeah, quick audience feedback.
18:49 Sam out in the live audience says, "I started reading this book over Christmas day "and it's an absolute joy." So yeah, very cool.
18:55 One more vote of confidence for you there.
18:58 Cool, Brian, are we ready for my next one?
19:01 - Yes, definitely.
19:03 - A little Yamale.
19:04 - Yeah, I'm hungry.
19:06 - So this one is cool.
19:08 It's called Yamale or Yamale.
19:11 I'm not 100% sure, but it was suggested by Andrew Simon.
19:14 Thank you, Andrew, for sending this in.
19:16 And the idea of this is we work with YAML files.
19:21 That's often used for configuration and whatnot.
19:25 But if you want to verify your YAML, right?
19:28 It's just text.
19:29 Maybe you wanna have some YAML that has a number for a value, or you wanna have a string, or maybe you wanna have true false, or you wanna have some nested thing, right?
19:40 Like you could say, I'm gonna have a person in my YAML, and then that person has to have fields or values set on it like a name and an age.
19:48 With this library, you can actually create a schema that describes the shape and types of these, much like data classes, and then you can use Yamale to say, given a YAML file, does it validate?
20:02 Think kind of like Pydantic is for JSON.
20:05 This is for YAML, except it doesn't actually parse the results out.
20:08 It just tells you whether or not it's correct.
20:10 Isn't that cool?
20:11 - I think it looks neat.
20:12 Yeah.
20:13 - Yeah, so it's pretty easy to work with.
20:16 Obviously requires modern Python.
20:18 It has a CLI version, right?
20:21 So you can just say yamale, give it a schema, give it a file, and it'll go through and check it.
20:26 It has a strict and a non-strict mode.
20:29 It also has an API.
20:30 So to use it, you just call yamale.validate with a schema and data, either in code or on the CLI.
20:36 And in terms of schemas, like I said, it looks like data classes.
20:40 You just have a file like name colon str, age colon int, and then you can even add additional limitations, like the max integer value has to be 200 or less, which is pretty cool.
20:51 Then also, like I said, you can have more complex structures.
20:54 So for example, they have what they call a person, but then the person here, actually, you can nest them.
21:00 So you could have like part of your YAML could have a person in it, and then your person schema could validate that person.
21:06 So very much like Pydantic, but for YAML files, like here you can see, scroll down, there's an example of, I think it's called recursion is how they refer to it.
21:15 But you can have like nested versions of these things and so on.
21:18 So if you're working with YAML and you wanna validate it through unit tests or some data ingestion pipeline or whatever, or you just wanna make sure you're loading the files correctly, then you might as well hit it with some Yamale.
21:33 - One of the things I like about stuff like this is that, with things like YAML files, sometimes people just sort of edit them in the Git repo without making sure they work first. Having a CI stage that makes sure the YAML is valid is pretty nice, so that you know it before it blows up somewhere else with some weird error message.
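The idea of checking data against a schema, including a nested person, can be sketched in plain Python. This assumes the YAML has already been loaded into dictionaries, and the schema format and `validate` helper here are hypothetical: they are not Yamale's schema syntax or its API, only an illustration of the concept.

```python
# Hypothetical schema format: field name -> (expected type, optional max value).
PERSON_SCHEMA = {
    "name": (str, None),
    "age": (int, 200),
}


def validate(schema, data, path=""):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    for key, rule in schema.items():
        where = f"{path}{key}"
        if key not in data:
            errors.append(f"{where}: missing")
            continue
        value = data[key]
        if isinstance(rule, dict):
            # A nested schema validates a nested mapping.
            if isinstance(value, dict):
                errors.extend(validate(rule, value, path=f"{where}."))
            else:
                errors.append(f"{where}: expected a mapping")
            continue
        expected_type, maximum = rule
        if not isinstance(value, expected_type):
            errors.append(f"{where}: expected {expected_type.__name__}")
        elif maximum is not None and value > maximum:
            errors.append(f"{where}: must be at most {maximum}")
    return errors


# A config schema that nests the person schema, mirroring the nested person idea.
CONFIG_SCHEMA = {"owner": PERSON_SCHEMA}

good = {"owner": {"name": "Ada", "age": 36}}
bad = {"owner": {"name": "Ada", "age": 250}}

print(validate(CONFIG_SCHEMA, good))
print(validate(CONFIG_SCHEMA, bad))
```

Like the tool described above, this only reports whether the data is valid; it does not turn the data into objects.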
21:58 >> Yeah, exactly.
22:00 >> Yeah, this is really cool.
22:01 Validation of these types of input files, especially YAML files is really tough, I've found just because it's indentation based and whitespace is not a bad thing, obviously, but for YAML it's tough.
22:12 I can't tell you how many hours I've banged my head against the wall in a past life, trying to get Ansible scripts to run and things like that.
22:20 This is really neat.
22:21 And anytime I see something like this, I just wish that there was one way to describe those types somewhere, like preferably in Python, just because I like that more, but this is really cool.
22:32 Yeah, I wouldn't be surprised if there's some kind of Pydantic mapping to YAML instead of to JSON, and you can just kind of run it through there. But yeah, I think this is more of a challenge than it is, say, for JSON, because with JSON, there's a validity to the file, regardless of what the schema is, where with YAML, less so, right?
22:50 Like, well, if you didn't indent that, well, it just, that means it belongs somewhere else, I guess, you know, it's a little more free form.
22:57 So I guess that's why it's popular, but also nice to have this validation.
23:00 So yeah, thank you to Andrew for sending that in.
23:04 - Yeah, so next I wanted to talk about Pympler, which is a great name.
23:10 And I honestly can't remember where I saw this.
23:12 I think it was a post or something by Bob Belderbos, or something he wrote on PyBites, I'm not sure.
23:20 Anyway, so I'll give him credit.
23:22 Maybe it was somebody else.
23:23 So if it was somebody else, I apologize.
23:25 But anyway, what is Pympler?
23:27 Pympler is a little tiny library, which has a few tools in it.
23:30 It does a few things:
23:38 it measures, monitors, and analyzes the memory behavior of Python objects.
23:43 But it's the memory size thing that was interesting to me.
23:48 So, for instance, it has three tools built into it: asizeof, Muppy (which is a great name), and ClassTracker.
24:01 So asizeof provides a basic size information for one or a set of objects.
24:07 And Muppy is a monitoring-- I didn't play with this.
24:11 I didn't play with the ClassTracker either.
24:13 ClassTracker provides offline analysis of lifetimes of Python objects.
24:17 Maybe if you've got a memory leak, you can see there are hundreds of thousands of this type, and I thought I only had three of them.
24:26 >> Yeah. One of the things that I really liked with asizeof is, we already have sys.getsizeof in Python, but that just tells you the size of the object itself, not of the things it refers to.
24:45 So asizeof will tell you not just the size of the object itself; it goes recursively and looks at the size of all of its contents.
24:55 So right.
24:56 And if people haven't looked at this, you know, they should check out Anthony's book, right?
24:59 But if you've got a list and say the list has 100 items in it and you say, what is the size of the list?
25:05 The list will be roughly 900 bytes because it's 100 8-byte pointers plus a little bit of overhead.
25:13 - Those pointers could point at megabytes of memory.
25:15 You could have 100 megabytes of stuff loaded in your list, and sys.getsizeof would still say, no, that's 900 bytes, not 800 megabytes or whatever, right?
25:23 So if you actually care about real whole memory size, you gotta use something like asizeof.
25:28 It's cool that this is built in.
25:29 I had to write this myself and it was not as fun.
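The gap between the shallow and the recursive size is easy to see with a hand-rolled helper. This is a simplified stand-in written for illustration: `deep_size` is a hypothetical name, it only follows a few built-in container types, and it is not how Pympler's asizeof is implemented.

```python
import sys


def deep_size(obj, seen=None):
    """Roughly total the size of obj plus everything it contains.

    A simplified stand-in for illustration; it only follows common
    containers and is not how Pympler's asizeof is implemented.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # count shared objects once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


# 100 distinct strings of about 1 KB each.
items = ["x" * 1000 + str(i) for i in range(100)]

shallow = sys.getsizeof(items)  # the list object: pointers plus overhead
deep = deep_size(items)         # the list plus the strings it points at

print(shallow, deep)
```

On a 64-bit CPython the first number stays in the neighborhood of the 900 bytes mentioned above, while the second is over 100,000, because only the second one follows the pointers.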
25:32 - Yeah, this is awesome.
25:33 I also, I hit this sometime in grad school, I remember.
25:38 I was at a deadline or something, and I hit the same thing about the number of bytes in a list being so small, and ended up writing something hacky to try to do the same thing. But to have it so nice and available is great.
25:51 And the name is awesome.
25:53 I love silly names.
25:54 - Yeah, for sure.
25:57 - I was confused by one of the examples. The example we're showing on the screen is just a list of a few items: some of them are text, some are integers, and some are lists or tuples of integers, and you're able to go down and get the size of everything.
26:14 But then you can also get more detailed.
26:17 You can call asized with a detail number.
26:22 I'd have to look at the API to figure out what all this means.
26:27 But the example shows each element, not just the total but each element, what the size of the different components are, which is kind of cool.
26:34 But it lists like a flat size and I'm like, what's the flat thing?
26:38 So I had to look that up: flatsize returns the flat size of a Python object in bytes, determined as the basic size.
26:47 So in these examples, the tuple itself, flat, is 32 bytes, but the tuple and its contents are 64.
26:57 - I see.
26:58 So flat is like sys.getsizeof, and size is the asizeof bit.
27:03 - I think that's what it is, but yeah, not sure.
27:07 but that's what I'm thinking.
27:08 - Yeah, so for people who are listening, they don't see this, you should check out the docs page, right, like a usage example, because if you have a list containing a bunch of stuff, you can just say, basically, print this out and it shows line by line, this part of the list was this much and then it pointed at these things, each of those things is this big and it has constituents and so on.
27:28 My theory is that detail equals one means recurse one level down, but don't keep traversing to show the size of numbers and stuff.
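The flat versus total distinction, reported one level down, can be imitated with the standard library. This is a sketch only: `total_size` and `report` are hypothetical helpers, they do not reproduce Pympler's output format or its exact numbers, and the byte counts depend on the Python version and platform.

```python
import sys


def total_size(obj):
    """Size of obj plus its items, following lists and tuples only (a sketch).

    Shared objects are counted every time they appear, which a real tool avoids.
    """
    size = sys.getsizeof(obj)
    if isinstance(obj, (list, tuple)):
        size += sum(total_size(item) for item in obj)
    return size


def report(container):
    """One row per element, one level down: (repr, flat size, total size)."""
    return [(repr(item), sys.getsizeof(item), total_size(item)) for item in container]


data = [42, "text", (1, 2, 3), [4, 5]]
rows = report(data)
for text, flat, total in rows:
    print(f"{text:<12} flat={flat:<4} total={total}")
```

For the integer and the string the two columns match, since they contain nothing else; for the tuple and the inner list the total is larger than the flat size, which is the pattern in the example being discussed.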
27:35 - Yeah, probably.
27:36 - Yeah, cool.
27:36 This is great. Yeah. All right. Okay, so I'm going to talk about hvPlot, and hvPlot .interactive specifically. So this is something I actually wasn't very aware of until I joined Anaconda, but one of my colleagues, Philipp Rudiger, who I know was on Talk Python at one point, is the developer working on this. And basically, when you're working in the PyData ecosystem, there's pandas and xarray and Dask.
28:06 There's all these different data frame type interfaces, and there's a lot of plotting interfaces.
28:11 And there's a project called HoloViews, or hvPlot, which is a consistent plotting API that you can use.
28:19 And the really cool part about this is you can swap the back end.
28:23 So for example, pandas by default will use .plot, and it'll make a Matplotlib plot.
28:28 But if you want to use something more interactive, like Bokeh or HoloViews, you can just change the backend and you can use the same commands to do that.
28:37 So that's cool.
28:38 >> That's cool and you set it on the data frame.
28:40 >> Yeah, exactly.
28:42 So what you do is you import hvplot.pandas, and then on the data frame, if you change the backend, you just do dataframe.plot, and there's a bunch of rational defaults built in for how it would show the different columns in your data frame versus the index.
28:59 >> I like that because you could swap out the plots by writing one line, even if you've got hundreds of lines of plotting and stuff, right?
29:05 It just picks it up.
29:06 Exactly, yeah.
29:07 And the common workflow for a data scientist is you're reading in a lot of input data, right?
29:13 Then you want to transform that data.
29:15 So method chaining is a common pattern, where you want to do things like filter, select a time, maybe drop a column, all kinds of things, right?
29:26 At the end, you either want to show that data or write it somewhere or plot it, which is very common.
29:31 Now this interactive part, Philipp demoed this, or he gave a talk at PyData Global about two months ago I think.
29:39 It kind of extends on that, and this blew my mind when I saw it.
29:43 So if you have a data frame like thing and you put .interactive after it, then you can put your method chaining after that.
29:50 So this is an example where you say I want to select a discrete time, and then I want to plot it.
29:57 And this particular example doesn't have a kernel running in the back end, so it's not gonna switch, but if you were running this in an actual live notebook, it would be changing the time on this chart.
30:09 And again, this is built to work with a lot of the big data type APIs that match the pandas API.
30:16 - Nice.
30:16 So for people listening, if you say .interactive and then you give the parameter that's meant to be interactive, that just puts one of those IPython widget things into your notebook right there, right?
30:27 That's cool. - Yeah.
30:28 So a related library is called Panel, which is, it is for building dashboards directly from your notebooks.
30:37 So you can, if you had a Jupyter notebook, you could say panel serve and pass in the notebook file, and it'll make a dashboard.
30:45 That's the thing I wanna show in a second here.
30:48 But the way the interactive works is really neat.
30:51 So wherever you would put a number, you can put one of these widgets.
30:55 And so you can have time selectors, you can have things like sliders, and you can have input boxes, and things like that.
31:03 And all you do is you would change the place where you put your input number, and put one of those widgets in.
31:08 And then it sort of, I actually don't know how it works exactly under the hood, but from what I understand, you put this interactive in, and then it's capturing all the different methods that you're adding onto it.
31:18 And anytime one of those widgets changes, it will change everything from that point on.
31:23 And so the demo here was from another Panel contributor, Marc Skov Madsen, and I'm just going to play this and try to explain it.
31:31 So we have a data pipeline on the right where we've chained methods together.
31:35 And what he's done here is he's just placed a widget as a parameter to these different methods on your data frame. And then this is actually a panel dashboard that's been served up in the browser. And you can see this is all generated from the little bit of code on the right. So if you want to do interactive data analysis or exploratory data analysis, you can really do this very easily with this interactive function.
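The capture-and-replay idea described above can be sketched in plain Python. This is a toy under stated assumptions, not how hvPlot's .interactive is implemented (Matt notes he doesn't know the internals): Widget, Series, and Interactive are hypothetical classes invented for illustration, and this version simply replays the whole chain on each evaluation.

```python
class Widget:
    """Hypothetical stand-in for a notebook widget: just holds a value."""

    def __init__(self, value):
        self.value = value


class Series:
    """Tiny hypothetical data container with chainable methods."""

    def __init__(self, values):
        self.values = list(values)

    def above(self, threshold):
        return Series(v for v in self.values if v > threshold)

    def head(self, n):
        return Series(self.values[:n])


class Interactive:
    """Records method calls instead of running them, then replays on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only reached for names not defined on Interactive itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            # Return a new pipeline with this call appended to the recording.
            return Interactive(self._source, self._steps + ((name, args, kwargs),))

        return record

    def evaluate(self):
        """Replay the recorded chain, reading each widget's current value."""
        result = self._source
        for name, args, kwargs in self._steps:
            args = [a.value if isinstance(a, Widget) else a for a in args]
            kwargs = {k: v.value if isinstance(v, Widget) else v
                      for k, v in kwargs.items()}
            result = getattr(result, name)(*args, **kwargs)
        return result


threshold = Widget(2)
pipeline = Interactive(Series([1, 5, 3, 8, 2])).above(threshold).head(2)
print(pipeline.evaluate().values)  # [5, 3]

threshold.value = 4  # "moving the slider"
print(pipeline.evaluate().values)  # [5, 8]
```

The widget sits exactly where a plain number would go, which matches the pattern described in the demo: the chained code stays the same and only the argument becomes adjustable.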
31:59 When I saw this, I hit myself in the head because normally, my pattern here was I had a cell at the top with a whole bunch of constants defined.
32:08 I would manually go through and change the time, start time from this time to this time, or change this parameter to this, and run it again and over and over.
32:16 >> You got to remember to run all the cells that are affected by it.
32:19 >> Exactly. The fact that you can do this interactively while you're working.
32:24 I could see how this would just, you don't break your flow while you're trying to work.
32:30 The method chaining itself I really like too, because you can comment out each stage of that as you're going and debugging what you're working on.
32:38 This is really neat. I put a link in the show notes to the actual talk as well as this gist that Marc Skov Madsen put on GitHub.
32:48 It blew my mind. It would have made my life a lot easier had I known about this earlier.
32:54 >>Yeah, and one of the important things I think about plotting and interactive stuff is even if your end result isn't a panel or an interactive thing, sometimes getting to see the plot, seeing the data in a visual form, helps you understand what you need to do with it.
33:14 >>Yeah, exactly.
33:15 I mean, I did a lot of work in the past with time series data.
33:18 And time series data, especially if this was sensor data, you had a lot of dropouts, you might have spikes, and you're always looking at it and trying to make some judgment about your filter parameters and being able to have that feedback loop between changing some of those and seeing what the result is, is a huge game changer.
33:38 >> You can hand it off to someone else who's not writing the code and say, "Here, you play with it.
33:42 Give it to a scientist or somebody." >> That's exactly right.
33:46 That's what Panel's all about. The biggest challenge that I always had, and many data scientists have, is you do all your analysis in a notebook, but then you got to show your manager or you got to show your teammates.
33:57 And going through that trajectory can be very challenging.
34:02 These new tools are amazing to do that.
34:05 But that's how I turned myself into a software engineer because that's what I wanted to do.
34:09 But I went down the rabbit hole and learned Flask and Dash and how to deploy web apps and all this stuff.
34:16 Well, I'm glad you did.
34:18 Yeah, maybe I wouldn't be here if I hadn't done that. But yeah, this is really cool. And I definitely recommend people look at this. There was also another talk, sorry, this is an extra, but there was another talk at PyData Global, hosted by James Bednar, who's our head of consulting, but he leads PyViz, which is a community for visualization tools. And it was a comparison of four different dashboarding apps.
34:42 So it's Panel, Dash, Voilà, and Streamlit.
34:47 They just had main contributors from the four libraries talking about the benefits and pros and cons of all of them.
34:53 So anyone who wants to go look at those, I definitely recommend that too.
34:56 >> That sounds amazing. All those libraries are great.
34:59 >> Nice. Thanks. Speaking of those extra parts of the podcast that make the podcast longer, we should do some extras.
35:06 >> We should do some extras.
35:09 Got any?
35:10 I don't have anything extra.
35:11 Matt, how about you?
35:12 Yeah, two things.
35:14 So first, if you can show my screen, last year Anaconda hired the Pyston developers.
35:20 Pyston is a faster fork of CPython.
35:24 I think it was at Instagram first, I can't recall.
35:27 But anyway, right before the holidays, they released pre-compiled packages for many of, a couple hundred of the most popular Python packages.
35:36 So if you're interested in trying Pyston, I put a link to their blog post in here.
35:41 They're using Conda right now.
35:43 They were able to leverage a lot of the Conda Forge recipes for building these.
35:46 This is that binary compatibility challenge that we talked about earlier.
35:50 So I know the team's looking for feedback on that.
35:54 If you want to try that, feel free to go there.
35:56 And it mentions in the blog that they're working on pip.
35:58 That's a little harder too, just because of how, you know, the build stages for all the packages aren't centralized with pip.
36:05 So it's a little more challenging for them to do that.
36:07 And then just the last thing is, you know, I don't want to be too much of a salesman here, but we are hiring.
36:16 It's an amazing place to work and I definitely recommend anyone to go check it out if they're interested.
36:21 Fantastic. Yeah, and you put a link in the show notes if people want to.
36:25 Yeah, it's anaconda.com/careers.
36:27 And we're doing a lot of cool stuff and growing.
36:30 So if anyone's looking for work in, in data science or just software and building out some of the things we're doing to try to help the open source community and bridge that gap, it's spelled it wrong, bridge that gap between the enterprise and open source and data science in particular.
36:45 Yeah.
36:46 Yeah.
36:46 It definitely seems like a fun place to work.
36:48 So cool.
36:48 People looking for a change or for a fun Python job.
36:52 Yeah.
36:52 And people do reach out to, yeah, cool.
36:55 People do reach out to Brian and me, saying, hey, I really want to get a Python job, and I'm doing other stuff, but how do I get a Python job?
37:02 Help us out.
37:02 So we don't know, but we can recommend places like Anaconda for sure.
37:07 Yeah.
37:07 It looks like there's about 40 jobs right now and, so pick it out.
37:10 Fantastic.
37:11 Oh, wow.
37:11 That's awesome.
37:12 All right.
37:12 Well, would it surprise you if I had some extra things?
37:16 It would surprise me if you didn't.
37:18 All right.
37:20 First of all, I want to say congratulations to Will McGugan.
37:23 We have gone the entire show without mentioning Rich or Textual.
37:28 Can you imagine?
37:29 Almost.
37:30 But no.
37:31 Only because I knew you were going to talk about this, otherwise I would have thrown it in.
37:35 Yeah, so Will, last year, a while ago, I don't know the exact number of months back, but he's planning to take a year off of work and just focus on Rich and Textual.
37:45 It was getting so much traction.
37:47 He's like, I'm just going to live off my savings and a small amount of money from the GitHub sponsorships and really see what I can do trying that.
37:55 - Well, it turns out he has plans to build some really cool stuff and has actually, based around Rich and Textual in particular, and he has raised a first round of funding and started a company called textualize.io.
38:12 How cool is that?
38:13 - Well, we don't know because we don't know what it's gonna do.
38:16 - All you do is if you go there, it's like a command prompt.
38:18 You just enter your email address.
38:20 I guess you hit enter, something happens.
38:22 Let's find out what happens.
38:23 Yes, I'm confirmed.
38:25 Basically you just get notified about when Textualize comes out of stealth mode, but congrats to Will.
38:29 That's fantastic.
38:30 Another one, we've spoken about tenacity.
38:32 Remember that, Brian?
38:33 - Yeah.
38:34 - So tenacity is cool.
38:35 You can say, here's a function that may run into trouble.
38:37 If you just put @tenacity.retry on it and it crashes, it'll just try it again until it succeeds.
38:44 That's probably a bad idea in production.
38:45 So you might want to put something like stop after this or do a little delay between them or do both.
38:51 I was having a race condition.
38:53 We're trying to track when people are attempting to hack Talk Python, the training site, the Python Bytes site and all that.
39:00 And it turns out when they're trying to attack your site, they're not even nice about it.
39:04 They hit you with a botnet of all sorts of stuff.
39:06 And like lots of stuff happens at once and there was this race condition that was causing trouble.
39:10 So I put retry, a tenacity.retry, boom, solved it perfectly.
39:15 So I just wanted to say, I finally got a chance to use this to solve some problems, which was pretty cool.
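The retry-with-limits idea described above can be sketched with only the standard library. This is an illustration of the pattern, not tenacity's implementation; retry, max_attempts, delay_seconds, and record_attack_attempt are hypothetical names. In tenacity itself, the equivalent limits are passed to its decorator, for example stop=stop_after_attempt(3) and wait=wait_fixed(2).

```python
import functools
import time


def retry(max_attempts=3, delay_seconds=0.0, exceptions=(Exception,)):
    """Retry the wrapped function, giving up after max_attempts tries."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_seconds)  # short pause before retrying

        return wrapper

    return decorator


calls = {"count": 0}


@retry(max_attempts=3, delay_seconds=0.01)
def record_attack_attempt():
    """Hypothetical flaky operation: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated race condition")
    return "recorded"


print(record_attack_attempt())  # recorded
print(calls["count"])           # 3
```

Capping the attempts and pausing between them is what keeps an unconditional retry from spinning forever when the failure is not transient.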
39:19 - That's really cool.
39:20 The other one that's similar to this, which I've used, and I don't know if you've used it, Brian, but it's called pytest Flaky.
39:26 - Yeah. - And it's awesome because I was working with this time series data historian.
39:31 I had a bunch of integration tests in my last job, but you know, network stuff, it would drop out occasionally.
39:36 And so you can do very similar type things and wrap your test in an @flaky decorator and do similar type stuff and, you know, give it three tries or something before you make it fail.
39:49 - Yeah, exactly. That's cool.
39:50 That's what I think mine does three tries and it's like randomly a couple of second delay or something.
39:55 Remember that part, Brian, where we talked about, it's really cool if people are in the audience while we talk about stuff and then get a little feedback.
40:00 So Will McGugan says, "Hey, thanks guys.
40:02 "Can't wait to tell you about it." Yeah, congrats, Will, that's awesome.
40:04 Glad to see you out there.
40:05 All right, a couple of other things.
40:07 Did you know that GitHub has a whole new project experience?
40:11 That's pretty awesome.
40:12 Have you seen this?
40:13 - I haven't.
40:14 I haven't seen this.
40:15 - So you know how there's like this Kanban board, Kanban board, where you have like columns you can move your issues between them.
40:21 So just last week, they came out with this thing called a beta projects where it still can be that, or it can be like an Excel sort of view where you have little dropdown combo boxes.
40:32 Like I wanna move this one in this column by going through that mode or as a board, or you can categorize based on some specification, like show me all the stuff that's in progress and then give me that as an Excel sheet and all these different views you have for automation.
40:47 and then there's APIs and all sorts of neat stuff in there.
40:51 So if you've been using GitHub projects to do stuff, you can check this out.
40:55 It looks like you could move a lot more work towards that on the project management side of software than you used to.
41:01 - This is really neat, yeah.
41:02 In my previous job, I was using Azure DevOps.
41:06 I was always wondering when some of those features might move to GitHub.
41:08 I don't know if that's what happened here, but being able to have this type of project management in there for these types of things, it's really, really great.
41:17 - Yeah, super cool.
41:18 - Yeah, one of the things I love about stuff like this is, I mean, yes, a lot of companies do their project management, or projects, in GitHub or places like that, but also open source projects often have the same needs for project management as private commercial projects, so.
41:41 - Yeah.
41:42 - I personally, I only have a few open source small projects that are kind of personal and no one would probably want to use them.
41:49 But even just keeping notes about to do's and future stuff and it would be really nice.
41:56 Just for future you if nothing else, right?
41:58 Yeah.
41:59 Awesome.
42:00 Okay, so this is cool.
42:01 Now the last, yeah, this last thing I want to talk about is Markdown.
42:03 So Roger Turrell turned me on to this.
42:08 is this new Markdown editor, that's cross platform, yes, cross platform called Typora.
42:15 And we all spend so much time in Markdown that, just wow, this thing is incredible.
42:21 It's not super expensive and it looks like a standard Markdown editor.
42:25 So you write Markdown and it gives you a WYSIWYG, you know, what you see is what you get, style of programming, which is not totally unexpected, right?
42:34 But what is super cool is the way in which you interact with it.
42:38 and actually I am going to show you real quick.
42:40 So you can see it and then you can tell people like, what do you think about this?
42:45 Here, I think that's it, back.
42:47 - Waiting. - There, okay, yeah.
42:49 So here's a Markdown file for my course, just the practices and whatever.
42:53 You can say, you know what, I would like to view that in code style, right?
42:57 Well, that's kind of cool.
42:58 We want to edit this, you click here and it becomes.
43:01 - Ooh, comes Markdown.
43:02 - Becomes Markdown, but this is a boring file.
43:04 So let's see about, it has a whole file system that navigates like through your other markdown stuff, hierarchically, so like here, chapter eight's a good one.
43:13 So we go over to chapter eight on this, and now you can see some more stuff.
43:16 Like you can go to set these headings and whatnot, but if you go to images, like you can set a caption, and then you could even change the image, like right here, if it were a PNG, it's not, but so I'll put it back as JPEG, and then it comes back.
43:28 You can come down and write a code fence, use the right symbol, and you can say def A, right, whatever, And then you pick a language.
43:37 Isn't that, isn't that dope?
43:39 Oh, this is so good.
43:41 So if, if you end up writing a lot of Markdown and if you need to get back, you just, go back and switch back to raw Markdown and then go back to this fancy style.
43:49 I think this is really a cool way to work on Markdown.
43:53 I'm actually working on a book with Roger and, it's got tons of Markdown and it's been a real joy to actually use this thing on it.
44:01 So yeah.
44:02 Does it have vi mode?
44:03 Probably not.
44:05 I don't know about that, but it has themes.
44:07 I can do like a night mode or I can do like a newspaper mode or take your pick.
44:13 It's pretty cool.
44:15 >> The weirdo grad student in me is upset that this isn't LaTeX.
44:19 >> It has built-in LaTeX.
44:21 >> Oh, that's not a lot of stuff.
44:23 >> Yeah, you can do inline LaTeX and there's a bunch of settings you can set for the LaTeX.
44:28 It's got a whole math section in there.
44:31 >> Oh, that's sweet. Okay.
44:32 >> Yeah, let's see.
44:33 - Am I the only person that went all the way through college pronouncing it latex?
44:37 - I did too, but I just learned that the cool way of saying latex.
44:41 - It's latex, yeah.
44:42 It's French, no, I don't know.
44:45 But no, yeah, it has support for like chemistry settings, like inline LaTeX and math and all sorts of good stuff.
44:51 So yeah, I'm telling you, this thing's pretty slick.
44:54 All right, well, I gotta do my screen share back so you all can see the joke, because the joke is very good and we're gonna cover it.
45:02 - Where's the joke? - But it's at the end.
45:03 It's at the end, so if people don't wanna listen to the joke, they don't have to.
45:06 Brian, I blew it.
45:08 - You did? - I blew it, I blew it.
45:10 Before we move off the Markdown thing though, Anthony Shaw says, "Editorial for iPhone and iPad "is really nice too." Cool.
45:16 So, but let's do the joke.
45:19 So I blew it because I was saving this all year.
45:22 I saw this like last March, and I'm like, this is gonna be so good for Christmas.
45:27 And then we kind of like had already recorded the episode, we're not gonna do it, we'll just take a break over.
45:32 So we didn't have a chance to do it.
45:33 So people are going to have to go back just a little tiny bit for this one.
45:38 Are you ready?
45:39 - Yes.
45:39 - Matt, you ready?
45:41 - Yeah.
45:41 - So this goes, this sort of a data, database developer type thing here.
45:45 And it's on a, I don't know why it's on a printout.
45:50 But anyway, it's called SQL clause as in SQL clause.
45:54 So it's, he's making a database, he's sorting it twice, select star from contacts where behavior equals nice.
46:01 SQL clause is coming to town.
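For readers who can't see the printout, the query in the joke can be run with Python's built-in sqlite3 module. This is a playful sketch: the contacts table layout and the names in it are made up for illustration.

```python
import sqlite3

# "He's making a database": an in-memory table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, behavior TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Grace", "nice"), ("Linus", "naughty"), ("Ada", "nice")],
)

# "select star from contacts where behavior equals nice", sorted once in SQL...
rows = conn.execute(
    "SELECT * FROM contacts WHERE behavior = 'nice' ORDER BY name"
).fetchall()

# ...and "sorting it twice": sorted again in Python.
nice_list = sorted(rows)
print(nice_list)  # [('Ada', 'nice'), ('Grace', 'nice')]
conn.close()
```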
46:04 - Nice.
46:05 - It would have been so good for Christmas, but we can't keep it another year.
46:09 I gotta get it out for television.
46:10 - You gotta sing it.
46:11 ♪ SQL clause is coming to town ♪ - Yep, exactly.
46:16 - Okay, I wanna share a joke that I don't have a picture for.
46:20 - All right, do it.
46:21 - But my daughter made this up last week.
46:23 I think she made it up, but it's just been cracking me up for, and I've been telling it to everybody.
46:28 So it's a short one.
46:30 Imagine you walk into a room and there's a line of people all lined up on one side, that's it.
46:36 That's the punchline.
46:38 - I love it.
46:39 Nice.
46:41 We had my cookie candle last time.
46:46 My kid always eats cookies.
46:49 - We've got a dad joke of the day channel in our slack at work and it makes me oof every time.
46:55 - Nice.
46:58 - Nice, okay.
46:59 - All right, nice to see everybody.
47:02 Thanks Matt for joining the show.
47:03 - Thank you for having me.
47:04 - Good to see you Michael again as always.
47:06 - Yeah, good to see you.
47:07 Thank you, thank you.
47:08 Thanks for listening to Python Bytes.
47:10 Follow the show on Twitter via @pythonbytes.
47:13 That's Python Bytes as in B-Y-T-E-S.
47:16 Get the full show notes over at pythonbytes.fm.
47:19 If you have a news item we should cover, just visit pythonbytes.fm and click submit in the nav bar.
47:24 We're always on the lookout for sharing something cool.
47:26 If you wanna join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live.
47:34 That's usually happening at noon Pacific on Wednesdays over at YouTube.
47:38 On behalf of myself and Brian Okken, this is Michael Kennedy.
47:42 Thank you for listening and sharing this podcast with your friends and colleagues.