Transcript #157: Oh hai Pandas, hold my hand?
Return to episode page view on github00:00 Hello and welcome to Python bites where we deliver Python news and headlines directly to your earbuds. This is a episode 157 recorded November 14 2019.
00:10 I'm Brian Okken and I'm Michael Kennedy and this episode is brought to you by Digital Ocean.
00:15 So Michael we're gonna cover a topic that we've covered a little bit before.
00:19 I think we covered cerebrus right or Cerberus. Cerberus yeah we covered Cerberus which is like a validation layer for unstructured data.
00:29 So this is as built as part of the Eve framework by Nicola who runs both of those projects.
00:35 And it's really nice, right? So I get like some JSON post back to my rest. It's a rest framework, I get a JSON post for some data. I have some models to find it can tell you whether they're a fit or not. It can tell you what's required. I do think the way you set it up is a little bit of out of band. So Colin Sullivan shot us a note after that said, hey, that's really cool. You you should also talk about Pydantic.
01:00 Had you heard of Pydantic?
01:01 - I think so, but yeah, tell me more.
01:03 And it's got a great name.
01:04 - Yeah, it has a great, it definitely has a, yeah, yeah, it's got a super name.
01:08 I didn't believe I've heard of it, but I didn't do anything with it.
01:10 So on Colin's suggestion, I checked it out, and yeah, this is a sweet simple framework that solves some really nice problems.
01:18 And a lot of times with these frameworks, I'm like, yeah, I would love to use this, but at the same time, like, it's not that helpful.
01:24 And so I'm not sure I'm actually gonna use it.
01:26 I could just put a little test in my class to make sure this file, like this thing parses an int or this name is here or whatever.
01:33 But this one like might convince me to do it because yeah, this is super, super cool.
01:37 All right, let me tell you what it is.
01:38 So it's data validations and settings managements for using Python type annotations.
01:43 And it's the type annotations that make me really extra happy.
01:46 - Oh, really?
01:47 Okay. - Yeah.
01:48 So you know how we've got data classes and you can have like annotated values there and you get a little validation and whatnot.
01:55 But this is super cool.
01:57 So I can just take a created class and say it has things like an ID, which is an integer, a name, which equals a default string, a date time, which has a default of none, things like that, right?
02:09 So you basically, either you have type annotations or the thing has a default value, which implies the type, okay?
02:17 And this probably represents some data that's exchanged over rest or something like that, right?
02:22 Some sort of dictionary.
02:23 So if I get a dictionary back, And then what I can do is I can just star star unpack that dictionary into the object, the class that I've defined.
02:32 So basically keyword arguments, id equals whatever the value is, name equals whatever the name in the dictionary is, and so on.
02:38 And it will validate all that using some really simple rules that you follow along.
02:43 So we've got a class and it has an id, which is an integer, but has no default value, it has no none.
02:49 That means the id has to obviously be an integer, but it's also required.
02:53 If it's not there, an error will be raised.
02:55 The name is a string, so it has to be a string, but because it has a default value, it's optional to pass it.
03:02 That's cool, right?
03:03 The date time, which is a date time field, is not required because it has none as a value, but if nothing's passed, it's just going to be none, so it's an optional date time, that's pretty cool.
03:15 So some of the reasons that I think this is cool and they call out in their web page is that it works automatically with all the IDEs that you already have.
03:23 Right.
03:23 There's no like special, Oh yeah, there's a YAML file that tells me what this schema looks like for this, or there's a JSON schema that comes back and like, no, it's standard Python with type annotations.
03:35 So your IDE knows already what all those things are and you don't have to like backfill that.
03:41 Right.
03:41 So the, the validation also works for just working with the classes.
03:44 That's pretty cool.
03:45 Right?
03:45 Yeah.
03:45 That's cool.
03:46 Yeah.
03:46 It's supposed to be faster than all the other libraries they tested.
03:49 and they have a link to the ones that they did.
03:50 It also supports really rich recursive validation.
03:55 So if you've got like a list or a tuple and maybe like stuff is inside there, right?
04:03 Or something like that, right?
04:04 You've got some nested types.
04:06 So it'll actually recursively traverse the stuff that you're looking for, right?
04:11 So it doesn't just test the top level things, it tests like the entire object graph.
04:15 And by default, the way it works is you derive from the Pydantic base model, which is cool, but you can also use a decorator on a data class, which we talked about.
04:24 It's very similar 'cause the type annotations, and it'll actually put parsing and validation on there for you.
04:29 - Oh, that's neat.
04:30 - Yeah, so if you really wanna use data classes, you can make them better with Pydantic as well.
04:33 - Okay. - Yeah, simple, right?
04:35 - Yeah, so where do you put it in your, so do you get it like when you get data in, you validate the data with Pydantic then?
04:41 - That's the thing is you don't even validate it.
04:42 That's what's so sweet about it.
04:43 There's not even a validation step.
04:45 You have the class, and in the notes, I'm putting a user class, so I'll maybe reference that.
04:50 And then you get some external data, which is a dictionary.
04:53 And then you just create the user object, say, user of star star dictionary, and it'll unpack it keyword-wise.
04:59 The validation happens in the fact that you don't have a dunder init on your class, and it derives from the Pythonic base model, so it uses its dunder init, which theoretically does the validation.
05:09 >> Okay, so you can't not validate then.
05:11 >> Yes, exactly.
05:12 Yeah, you just basically, I tried to create the class, and either that worked really well or not so much.
05:16 - Oh, that's actually pretty cool.
05:18 - And you get like a JSON response of all the things that were wrong with the validation as part of the exception, I believe.
05:22 So you can actually go, no, there's actually three things wrong here, not just, well, the first thing it hit made it crash.
05:28 - So this is obviously useful for like REST APIs and stuff like that or using, grabbing external data.
05:33 But there's a lot of times where we're passing dictionaries around between components and it'd be good to have some, if there's less trusted components, to have some sort of validation.
05:42 So this is pretty cool.
05:43 - Yeah, even web forms that get posted back, a lot of times those come back in Pyramid or Flask as dictionaries, right?
05:50 If you wanted to map that to a class, right, you could get validation.
05:53 There's a lot of places, yeah.
05:54 - Yeah, cool. - Even settings files, right?
05:55 - Yeah, yeah, there's a lot of people that just throw stuff that gets adjacent or something that gets thrown in a file, and it's user editable also.
06:04 So you have to validate it because who knows what somebody edited it to, so.
06:10 - Yeah, absolutely.
06:11 All right, what you got next for us?
06:12 I am doing, hopefully doing a favor, adding work to Ned Batchelder.
06:17 So he posted on Twitter recently that there is changes afoot in coverage.py.
06:23 So coverage is, hopefully everybody knows, coverage is great for using to tell you how much of your code base, your test suites are covering.
06:33 I mean, that's how it's usually used.
06:34 You could potentially do anything to try to measure coverage, but usually it's around a test suite or something.
06:40 Anyway, so the change is they've added measurement contexts.
06:44 So allowing you, while it's collecting data for coverage, it collects what was the context of what it was doing while it was covering certain bits of code.
06:55 Now that seems a little, the obvious use model of that, there's lots of use models.
07:00 The obvious use model is which test covered which line of code.
07:05 And to have that, and that's a lot of data.
07:07 So he's changed the way the data for coverage is being stored and it's pretty cool.
07:13 So I'm gonna jump to the conclusion.
07:15 There's this cool feature, the context feature is very cool.
07:18 I wanna talk about that.
07:19 But first of all, it is a little bit of a break in the coverage, use of coverage.
07:24 I think the reason is just because there's a, the way the data is stored, there's a little local database stored.
07:30 So there's another dependency that isn't an external dependency.
07:35 It's a built-in dependency, but it's something that some versions of Python don't always have, I guess.
07:43 So for that reason, he's asking everybody, please try out the beta one, coverage 5.0 beta one, and try it out and let them know if there's any issues.
07:52 - Right, so basically the idea is go try it, see if what you're doing before still works.
07:56 If not, let them know real quick before it becomes permanent, right?
07:59 - Right, exactly.
08:01 And I really want this to become permanent because measurement context is so cool.
08:05 I tried it out this morning.
08:07 I'm gonna put in show notes, I wasn't really clear on how to download, how to install a beta version of something.
08:13 So you just do the, like for this, it's pip install coverage double equal 5.0 B1.
08:20 - Okay.
08:21 - So we'll put that in the show notes.
08:22 It's not too bad to install it.
08:24 And then also I didn't put this in the show notes, but one of the other tricks I found out is if you wanna know what versions are available to pip install, you can just do the coverage equal equal and then don't list a version and you'll get an error message that says, I don't know what you're talking about, but here's all of the versions that are available.
08:42 - That's pretty awesome, I didn't know that.
08:43 - Yeah, that's pretty cool.
08:44 So I traded out a few lines of code to, or a few lines of command line stuff to run coverage on a little dummy file.
08:51 And sure enough, if I generate the HTML report, on the right-hand side of the window, or the screen, I've got little drop-downs on every line of code to tell me which test covered which line of code.
09:03 - I like that a lot, that's cool.
09:05 Yeah, that's great. - Very neat.
09:06 - Yeah, super nice.
09:07 I look forward to it.
09:08 - Okay, I don't know why I think this is funny.
09:10 My brain's just not working, man.
09:11 Will you do the ad read?
09:12 - Got it.
09:13 Now, this episode is brought to us by DigitalOcean, and I just want to tell you about something brand new that's gone from beta to general availability, memory-optimized droplets.
09:25 Droplets are DigitalOcean's words for virtual machines, right?
09:28 Goes to cloud, cloud's full of rain, rain droplets, that sort of thing.
09:32 And if you have some sort of workload that requires a lot of memory, well, then these things are like super optimized that.
09:38 So it has eight gigs of RAM for each dedicated virtual CPU.
09:43 You can get them with two or many, many more, right?
09:47 Multi-core systems.
09:48 So basically you can go all the way from 16 gigs to 256 gigs of RAM, which is like a ridiculous amount of RAM.
09:58 One thing you do to make your app run faster is to make sure like it never touches the disk, right?
10:02 So Vic could just cache everything, that would be great.
10:05 So they're really good for things like high performance SQL or NoSQL databases, large memory caches and indices, indexes, things like that.
10:14 And just lots of big data and stuff running like with large runtime requirements.
10:18 So you need between 16 to 256 gigs of RAM and you want to just pay mostly for the memory, right?
10:26 The pricing's optimized around that use case, then check them out at pythonbytes.fm/digitalocean.
10:32 They're a big supporter of the show.
10:34 Speaking of cool stuff, the PSF and the Python Software Foundation Packaging Working Group, actually, that group of the PSF, they're looking to hire some folks.
10:45 They're looking for, I think, three developers and maybe a project manager.
10:50 I can't remember exactly all the details, but quite a few number of people to make pip better.
10:55 Like you just said, if you said, pip install coverage double equals, it'll help you, right?
11:00 So this is supposed to be a much better setup.
11:04 So the idea is that one of the things that could be improved in pip is its dependency resolver.
11:12 So it's, this package depends on this thing, but other package also maybe depends on that, but a different version, or I don't know how often it's happened to you, but I've had the order in which I list stuff in the requirements causing issues because one requires, I don't know, doc opt of this version, the other one requires doc opt of another version, and how can you possibly install them both at the same time, right?
11:35 Weird stuff like that.
11:36 - Poetry has noticed this problem, and it has a solution to it, but it's around poetry, and it'd be really cool if that sort of dependency resolution was built in to pip.
11:46 That would be great.
11:47 - Yeah, the underlying idea is to make distributing and installing Python software just more reliable and easier.
11:54 So funding has been allocated to two contractors, a senior developer and an intermediate developer, that's what it is, to work on developing, testing, and building this feature, the test infrastructure, code review, bug triage, all that kind of stuff.
12:09 And this is a non-trivial offering.
12:12 So I believe the senior developer will end up getting $116,000 out of this based on the time they're estimating and the rate.
12:21 And then the either senior developer or the contractors, I can't remember, get 103,000 each.
12:27 This is quite significant.
12:28 - Yeah, not too shabby.
12:29 - Yeah, that's like a, not just a, "Hey, I need somebody to work on this for a couple weeks." That's like a legit thing.
12:34 So if you'd like to contribute to Python, work on PIP, things like that, just go check out this link.
12:39 It shows you how to apply.
12:41 - Very cool.
12:41 - Yeah, so when I work on pandas, Brian, I kind of feel a little bit lost.
12:46 There's all these operations, and I don't use pandas enough to kind of actually know what I should be doing.
12:51 Often it's in the context of Jupyter Notebooks where the autocomplete slightly less good than PyCharm or VS Code.
12:57 I could always use some help when I'm working on pandas.
12:59 How about you?
13:00 - Yeah, I could.
13:01 I know people that, there's a lot of people that work in it all the time, but I usually just jump in for some particular use.
13:07 And I know I don't know the best way to do things.
13:10 There's a thing called the Dove Panda.
13:13 I think I'm saying that right, D-O-V Panda.
13:16 And this was submitted by Dean Langstrom, Langsom, sorry.
13:20 I think that it's his project.
13:22 But essentially it's a overlay on, I'm just gonna read his thing.
13:27 It says directions, so DovePanda has directions and are hints and tips for using pandas in an analysis environment.
13:36 DovePanda is an overlay for working with pandas.
13:39 And so the idea is like if you have this installed also, you're working in a Jupyter notebook and you start typing stuff, you start doing pandas operations, It looks at what you did and provides hints and it pops up in little windows in your notebook to give you hints on, I think you're doing this, but there's a better way to do it, or giving you tips.
14:01 - So it's like Clippy for pandas in Jupyter.
14:05 - Yeah, but it's definitely, it's sort of, but instead of having just one Clippy that pops down, they're in your notebook so you don't have to deal with them right away, but you can go back and improve your use of pandas within the notebook.
14:20 It's pretty cool.
14:21 - Yeah, it actually looks really helpful.
14:23 So the example they have, they've got a bunch of pictures on the GitHub repo you can check out.
14:28 But like for example, there's one where someone's calling pd.concat and taking two data frames and specifying the axis equals one.
14:36 And then the little panda pops up and says, "All data frames have the same columns," which hints for concat on axis zero.
14:45 You specified axis one, which may result in unwanted behavior and it'll show you the code.
14:49 Or after concatenation, you're gonna have duplicate column names, pay attention, and things like that.
14:55 It's got a bunch of great little tricks.
14:57 And then, you know how you had mentioned Kevin Markham from dataskool.io and his tips?
15:03 You can type dovepanda.tip and it'll pull up a Kevin Markham tweet.
15:07 - That's pretty cool.
15:08 - Like inside your notebook, it'll pull up like some random tip.
15:11 - Yeah, that's pretty cool.
15:12 - Yeah. - Circle there.
15:13 - And if you, like, you can use it, apparently you can use it, not even just in notebooks.
15:18 So there's a command line mode where you can set the output to be, you know, there's no inline output to go to, so you can tell it to print the output to just standard out or to a display or to somewhere else.
15:30 That's nice.
15:31 So if you're using, you wanna have these sort of tips, but you're not using notebooks, you can still get them.
15:36 So. - Yeah.
15:37 Very cool.
15:38 This next one is really simple, but I think some folks will find it super useful.
15:43 You know, maybe you've picked up that project from someone else at work, and they're not following all the best Python practices.
15:51 You see a bunch of import stars all over the place.
15:54 And you're like, man, didn't somebody tell these people that import star is not worth it, right?
16:00 That there's all these potential drawbacks.
16:02 So enter remove star.
16:05 Remove star is a command line app you can run or command you can run.
16:10 And you point it at either a module, a file, a directory, something like that.
16:15 it will go through and by default it'll just find the issues where import star is done and then it will look at the actual files and say well you said import star but you're actually just you know like from collections import star maybe you're actually just using named collections and counter or something like that yeah maybe that's it or tools anyway you're just using one or two things and it'll say you know what you could replace that line with from collections import name tuple.
16:44 Right?
16:45 And it could just adjust that or you could actually give it a command to say, no, just change all my files.
16:49 Fix it.
16:50 Yeah.
16:51 This is very cool.
16:52 Yeah, it's great.
16:53 So it's not that it just says import star is bad.
16:56 It actually figures out what of that star is being used and what you should actually write and then we'll write it for you.
17:01 Yeah.
17:02 So my normal operation when I see something like this is just to comment out the import statement and see what breaks.
17:10 that's not the best way to do things.
17:12 So this is way better.
17:13 I like it.
17:14 - Yeah, yeah.
17:15 It reminds me a little bit of Flint, F-L-Y-N-T, which will take all your strings and rewrite them as f-strings.
17:21 This will take all your import stars and rewrite them as proper specific imports.
17:25 - OMG, I totally forgot about Flint.
17:28 We've got a whole bunch of code that we wrote for 3.5 that still has all the old stuff in it.
17:35 So yeah, I gotta use that.
17:36 - Well, it's about to get a whole lot better.
17:38 Hit it with Flint.
17:38 It's so good.
17:39 - Yeah, definitely.
17:40 - Awesome, all right, well, that's it.
17:42 Remove stars, not a whole lot to it.
17:43 It's just a great little command line tool you can use to make your Python code better.
17:47 - Yeah, so the last thing I wanna talk about today, actually, oddly enough, we didn't plan this, is another, it came from Brian Rutledge too.
17:56 So the PSF thing that we talked about, the hiring developers came from him too.
18:01 So we've got two stories from Brian.
18:03 So thanks, Brian, for helping us out.
18:05 - Yeah, absolutely, thanks, Brian.
18:06 Double thanks.
18:07 Also, one of the things that Brian's working on is a pytest plugin called pytest Quarantine.
18:13 This is so cool.
18:14 Hopefully, all your tests pass, but let's say you've got a, you just implemented, you got really fantastic, you got into testing and you started it right in a bunch of tests and you put it on a code base and you got a bunch of failures.
18:28 You know you're going to fix them, but you're not going to fix them right away.
18:31 So what do you do?
18:32 And the idea with pytest Quarantine is it saves a list.
18:35 So you run it once and you tell it to save a list of all the failing tests.
18:39 And it saves it somewhere and you can throw it in Git or something, store it.
18:44 And then you run it again with that test or that list, and it automatically marks all of the tests that have failed in the past as X fails.
18:55 Now this is something you can do manually to say, I know this is gonna fail, just run it as an X fail instead of, it separates it from a failure.
19:04 know there's arguments of whether that's a good or bad but it's very useful so that you can still use your suite to find new failures while you're working on the old ones. Anyway this is a nice little extra tool I think it's super cool. I also wanted to bring this up because he sent me this really nice email. So apparently I met Brian a couple times at PyCon in Cleveland and he said he was a started out as a complete pytest newbie and bought my book, started working through it, loved pytest, and then helped his company to adopt pytest and then wrote this plugin and he wrote it at work and convinced his company to be able to release it as open source. So that's super cool. Yeah that's really great. Yeah good work Brian. This sounds like super useful. You know you've got to make some huge change if it breaks 50 tests. You can't start solving all 50 at once right? You got to like chop your way out of them. So yeah. Chip away at it. Yeah exactly.
19:59 quarantine them and then just you know take them one at a time. So yeah I like it I mean there are ways in which you can deal with this like in PyCharm you could say run only this test or run certain ones but you know like it doesn't help you on continuous integration or something like that right so yeah I think this is great. And one of the things I wanted to bring up also is I've dealt with this in the past on a temporary basis of course where you've got for some reason a breaking change that fails some things you're working through them and we have occasionally if there's like a known failure that the fix is scheduled right it's a we know about it we're gonna fix it but it's not going to be fixed for like three weeks but you can add X fail to the test itself but one of the issues with that is you're to add the X fail mark you edit the test file so one of the benefits of this is you're not actually editing the test file you're editing a different file that marks those so that's kind of right you don't want those changes to show up and get saying well, we made all these changes to these tests, but actually, no, we're just trying to fix something else and get them out of our way.
21:01 Yeah, I like it.
21:02 All right, well, that's it for all of our main items.
21:03 Brian, you got anything extra you wanna throw out there?
21:05 - I do not.
21:07 How about you?
21:08 - I've got some pretty cool news.
21:09 So I recently decided to go through the effort of figuring out how much energy all of our services and servers use, right?
21:18 So for like delivering Python bytes and Talk Python and Talk Python Training courses and all that stuff.
21:24 And I figured out how much that was and went out and bought renewable energy credits to offset all the carbon from all of our infrastructure.
21:32 - Wow, that's neat.
21:33 - Yeah, yeah, yeah.
21:34 So I'm gonna keep doing that going forward.
21:36 So not a huge, huge amount, but it's, you know, I think a good signal for all the other companies out there as well to say, look, if this podcast or these podcasts can be carbon neutral for their server infrastructure, why can't we, right?
21:50 - Yeah. - Yeah, cool.
21:51 So anyway, small, but hopefully can trigger some good change. Are you ready for a joke? I am so ready for a joke. I need it this week. Well it's it's more science than it is programming but I think our audience will generally generally like it. So I'm gonna tell the joke and then explain the joke because I'm not sure everyone will know but I think a lot of us will get it. And jokes are so much more funny if you explain them all. I know absolutely they are. So imagine a time not too long ago Dr. Heisenberg from Quantum Mechanics fame. He's driving down the highway and he gets pulled over for speeding. The policeman comes over, the officer says, "Excuse me sir, do you know how fast you were going?" Heisenberg pauses for a moment and then answers, "No, but I do know where I am." - I love that. That's so funny. - Yeah, so the Heisenberg uncertainty principle basically says that the position and velocity of an object cannot both be measured exactly at the same time, not not even theoretically.
22:52 You can know one or the other, but not both.
22:54 So yeah, he knows where he is.
22:55 - Oh yeah, funny.
22:57 - Pretty good.
22:57 All right, well, thanks for being here.
22:59 - Super cool, thanks. - Good to be back together after taking off and hiding in Florida for a while.
23:03 Now we're back on the usual track.
23:04 - Yeah. - Yeah.
23:05 All right, have a good one.
23:06 - You too, bye. - Bye.
23:07 - Thank you for listening to Python Bytes.
23:09 Follow the show on Twitter @pythonbytes.
23:11 That's Python Bytes as in B-Y-T-E-S.
23:14 And get the full show notes at pythonbytes.fm.
23:17 If you have a news item you want featured, visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool.
23:24 This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.