Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #356: Ripping from PyPI

Return to episode page view on github
Recorded on Tuesday, Oct 10, 2023.

00:00 - Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 356, recorded October 10th, 2023.

00:10 I'm Michael Kennedy.

00:11 - And I'm Brian Okken.

00:12 - And this episode is brought to you by us.

00:14 Courses over at DocPython Training, the complete pytest course from Brian, Patreon supporters, and find us over on Fosstodon.

00:21 That's probably the best way to chat with us these days.

00:23 And also be part of the live show, pythonbytes.fm/live, usually Tuesdays at 11 Pacific time.

00:29 And you can also see all the old versions there as well.

00:32 So Brian, what are you going to start us with today?

00:35 >> Well, I thought I would start with a, it's going to have a theme of getting information from Fosstodon, people letting us know.

00:44 So I saw today a post by the PsychoPG group, so Psycho Postgres, and the post says, "It feels weird, but it's time to start considering "PsychoPG 2, the present, and PsychoPG 3, the future.

01:02 We've entered the time where PsychoPG 3 is the present and 2 is the respectable past.

01:08 Updated the feature page and a few other resources on the website to reflect this." So I thought I would check it out.

01:15 - Awesome. That's big news.

01:17 You know, Postgres is clearly the biggest database that people use in the Python space, according to the survey we talked about recently.

01:23 So yeah, this is relevant, right?

01:25 - Yeah, and this is a great library.

01:27 I've been using the version two for a long time, and I guess kind of like haven't been paying attention to the three, but it's been out for a while.

01:34 So the three page talks about, basically talks about the, it's a new implementation of the most used, reliable and feature rich Postgres adapter for Python.

01:47 And there is some differences though.

01:49 So the, apparently most of the two was written in C, but three is written in a combination.

01:57 So a lot of it's in Python and some of the speed ups are in C.

02:01 And so the big announcement really is that there are no, so we're gonna link to both the three page and the features page.

02:09 And the features page has a nice comparison of two versus three.

02:14 So really the recommendation is they're still going to maintain two, but you should maybe think about, mostly think about three for new projects.

02:24 So if you already have an existing project that's running two, go ahead and leave it running with that.

02:29 I couldn't find where they're doing it, when or if or where they're doing announced an end of life.

02:35 So I don't think that's even planned at this point for end of life for two, eventually probably.

02:40 But so two has been around since 2006, three started in first release in 2021, but they've got a whole bunch of cool things in here.

02:49 So some of the things I thought were pretty cool was we've got native asyncio support.

02:56 That looks pretty nice.

02:58 Native support for more Python types, such as enums and Postgres types.

03:03 So Postgres, one of the cool Postgres types was a multi-range so that's supported now.

03:08 That's pretty nice.

03:09 And then I don't know what parameter bindings are actually, but apparently two has a client-side parameter bindings and three defaults to server-side parameter bindings, but you can still do client-side if you want to.

03:24 So lots of fun stuff in here.

03:27 Advanced connection pool, static typing support.

03:32 Yeah, so I say why not try three and only go back to two if you really need to.

03:39 - Yeah, this is super exciting.

03:41 Those are a lot of great features.

03:42 I think the async support was previously done through a separate library, and now it's just part of it.

03:49 That's pretty cool.

03:50 >> Yeah.

03:51 >> The what is the present, what is the past, what is the future?

03:54 We'll come back to that in our extras.

03:56 We'll have some fun there.

03:57 >> Okay.

03:57 >> Then also, our audience is awesome.

04:01 My feed letter says, "Hey, I migrated PyPI from PsychoPG 2 to 3 in June.

04:07 It was not too hard, but took some time to do safely." So, hey, when you pip install things or PyPI lookup things, it's already using this.

04:15 Awesome.

04:16 >> Yeah.

04:16 >> Yes.

04:17 >> That's cool. Nice.

04:18 >> Indeed, indeed.

04:20 All right, shall we convert some data?

04:22 That's what I got next.

04:23 Sure.

04:24 Let's talk about it.

04:25 So we all know about Pydantic, right?

04:27 And Pydantic is cool.

04:28 So Pydantic lets us create classes that derive from some Pydantic base type, base model.

04:35 And it will do conversions and parsing of JSON and so on.

04:39 But maybe, maybe you know about data classes, right?

04:42 These are built in.

04:43 They're not anything separate.

04:45 They're part of Python itself.

04:46 And so being able to use them is pretty cool.

04:49 and Raymond Peck commented on one of our YouTube videos and said, "Hey, I use this thing called Dacite." Hopefully that's the right pronunciation, Dacite. And the idea is it allows you to create Python data classes in a similar way. So simple creation of data classes from dictionaries, APIs and other things. And you just create a data classes as you normally would, data just sort of classes with, you know, global or static level field type, right?

05:21 Like class user name colon string age colon int and so on. They just put the data class wrapper on it, right? And it gives it a bunch of nice things like hashability, comparability, etc. Right? Constructors. But if you've got a dictionary that has sub objects that themselves should be data classes and maybe a list of other things, there should be data classes, actually turning that JSON into one of these more complicated versions is a hassle, right?

05:49 You also might need to do data conversion as well, right, to make sure that the types match.

05:54 So that's what you can use this for, and it's pretty neat.

05:57 You can just say from dict, say site.fromdict, and you say what data class it's going to go to and what the dictionary is, and like a little factory method, out pops one of these objects.

06:07 Cool, right?

06:08 - Yeah, that's pretty cool.

06:09 - Yeah, and let's see, features include nested structures like I described, basic type checking, optional fields, union, so if you say it can be an int or a string, it'll check that it's one of those two, but not a float or something, forward references, collections, and interestingly, custom type hooks.

06:27 So for example, you can say anytime that you're gonna take a string, actually call string.lower.strip on it if it's not null, right, if it's not none, things like that, right?

06:37 you can actually get into the parsing side a little bit if you need to.

06:41 So that's pretty neat.

06:42 What else have we got here?

06:44 Let me scroll down.

06:45 Docs, of course, about all the things I said.

06:48 I think that's probably it.

06:50 I guess one thing it's worth pointing out, it says right in the docs, they say, "It's important to mention that it's not a data validation library." Now, when I first read that, I'm like, "But there's all these types." Is it supposed to find out when I say, the thing takes a string and an int and a bool for its three fields, like it will actually do that.

07:09 It does that, but you can't say like the string must be this regular expression and this many characters and the int has to be positive and like those types of things it doesn't do, right?

07:19 So it's just dictionary to object, possibly complex data class object, but with type validation and that's about it.

07:27 But still, I think that's super, super useful.

07:29 >> So kind of partway validator things.

07:32 >> Yeah. More like a proper parser without the validation itself.

07:37 >> Yeah.

07:37 >> Yeah. Indeed. So anyway, that's what I got for people.

07:40 It was news to me, so check it out.

07:42 >> I think that's pretty neat.

07:44 So I am going to continue on with a topic of, I guess, the ever rustification of Python projects.

07:55 This one is one that we use every day, PIP.

07:59 So this was this came to us from Owen, Owen Lamont, I think, Owen Lamont.

08:05 Thanks, Owen.

08:06 So I said, hey, you guys might be interested in this.

08:08 It's a under the project group.

08:12 Prefix dev is rip rip for for a rust pip written in rust.

08:20 And I was ready to go try it.

08:22 It's not ready to try yet, but it's still pretty exciting.

08:24 So the kind of the headline fast bare bones pip implementation in Rust.

08:30 And it's not just an installer, though. So it has, it's got, what does it have so far?

08:36 It's got, you can download and aggressively cache PyPI metadata, resolve PyPI packages using a project called Resolvo, which is a kind of a Rust thing. And then still on the planned list is actually installing the files.

08:53 (laughs)

08:55 I'm just chuckling because, yeah, I just jumped the gun.

09:00 But this is new, so it's fine that it's published early.

09:04 So first commits look like about two weeks ago.

09:07 So I'm pretty excited about this.

09:10 I think it'd be fun to try it out and look at possibly different resolvers and how they handled it versus normal pip.

09:19 So kind of neat.

09:20 - Yeah, when I saw this too, I was pretty excited.

09:24 So thanks, Owen, for sending this in.

09:25 Yeah, it's cool.

09:26 And Mike says, "Let her rip." I love it.

09:29 - Yeah, so it looks like maybe we should've swapped these last two topics, but I don't know.

09:37 Let's go with your next topic.

09:39 - Well, I do think that this one would've been pretty good for you to cover, but too bad, I'm already on it, so here we go.

09:46 I guess the only stronger tie-in to me is it's in response to a Talk Python episode.

09:50 So this one comes to us from Marwin, and thank you Marwin for sending it in and writing this article called, "How Not to Footgun Yourself When Writing Tests, "A Showcase of Flaky Tests." And it says, "I was writing this article "after listening to talk Python "with Gregory Kampfhammer and Owen Perry "talking about flaky tests." So that was the subject of that.

10:11 Basically talked about all of their experience here, which is cool, like a definition, and really a lot of examples of flaky tests.

10:20 I mean, you know, Brian, did you get to check any of these out?

10:22 I haven't looked at this yet.

10:23 No.

10:24 Well, we'll do it live.

10:26 Okay.

10:26 Okay.

10:26 So the first one is, really about concurrency and said, well, look, I've got a bunch of tests, maybe I could speed them up by using threading and run a bunch of them.

10:35 Oh yeah.

10:36 That'd be fun.

10:37 However, there's a real simple example of like, Hey, I've got an account and I can transfer money from one account to the other.

10:44 So first account that withdraw this amount and then second account deposit that out, right.

10:49 And how could that go wrong?

10:51 So do a bunch of those and a, Hey, if we want to make those faster, let's run them in some threads, right.

10:57 Rather than using, say, one of the PI test plugins, like more properly, right.

10:59 This is more to highlight, like what might go wrong, you know?

11:02 and it turns out that we have the GIL and I think, I think Marlon's right.

11:07 I think people do think that, the GIL will just kind of save you from concurrency.

11:12 Right.

11:13 Because only one thing can run at a time.

11:15 So how are you going to have a problem?

11:16 Well, well, it's one thing.

11:18 It's one kind of bytecode at a time.

11:21 - Exactly, one Python bytecode.

11:23 But here's the thing, if your program ever enters into a temporarily invalid state, ever, you may need some kind of concurrency locks or something.

11:35 And I think, my reading of Python stuff, I don't see this very often.

11:39 And I think actually a lot of people should be doing more locks, honestly.

11:44 So even in this example, I withdraw some money.

11:46 And now for just a moment, program is in a temporarily, temporarily invalid state until it's deposited into the other account, right?

11:53 Yeah.

11:53 So that's this moment.

11:54 Like if the GIL says, okay, you ran enough, we're going to switch to the other one, then somebody tries to, the other one reads that state.

12:02 That's going to be trouble.

12:03 Right?

12:04 So they were talking about, well, how do you actually, you know, how do you actually check this?

12:08 And here's something I actually didn't even know.

12:10 There's look, you can actually make that switching back and forth more aggressive.

12:16 You can control that switching that the, the GIL does on the, how much I thought on one thread it'll do before it switches to another by getting the switch interval.

12:25 And here they set it to one 10th of a millisecond.

12:28 Oh wow.

12:29 And then they do a bunch of work and then they put it back.

12:31 And that's pretty interesting.

12:32 Did you know you could do that?

12:33 I didn't, this is pretty cool.

12:35 To, I know this, this might be worth covering the article right here.

12:39 Just that, you know?

12:40 Yeah.

12:40 Good for, yeah.

12:41 For testing these race conditions.

12:43 Yes, exactly.

12:44 Like make it, make it worse.

12:46 And also running on more cores, potentially.

12:48 I don't know, probably that doesn't too much matter.

12:50 Okay, so to avoid boilerplate, you can reach out to the pytest Repeat plugin.

12:54 Weren't you just talking about this?

12:56 I know you're doing some stuff with it.

12:57 - Yeah, I'm one of the maintainers on it now.

12:59 There's my picture.

13:00 - Yeah, I feel like you, yeah, I feel like you had actually just mentioned it.

13:05 Maybe it was the Git article or something, but anyway, recently I thought you were just talking about this.

13:09 - Yeah, last week, so.

13:10 - Yeah, exactly.

13:11 Also worth pointing out, a similar and more straightforward plugin, possibly for this job is pytest Flake Finder, which is meant to find flaky tests.

13:20 >> Oh, yeah. Let's just hang out for here.

13:25 One of the differences they're saying is that you can repeat your test multiple times with repeat or Flake Finder, you can repeat your whole suite.

13:33 That's one of the things I need to change for repeat, because you can do the same thing with repeat.

13:38 You can run the whole suite.

13:39 It's just hidden in two lines of the readme, and it needs to be more bolded that you you can change the scope and repeat the whole thing.

13:47 >> Nice. Randomness, for example, algorithms that are non-deterministic, like heuristic ones, so that's pretty interesting.

13:57 So they do, what is this, like a distance algorithm or something that's heuristic.

14:02 So they say like NP-close, which they're testing on NP-close, whereas NumPy, like are these vectors close?

14:11 says basically fix this by actually, you know, computing the tolerance and they use a little statistics like probably more statistics than I know, but let's say three standard deviations away, something like that.

14:23 It's interesting. Obviously floating point arithmetic is always trouble, loss of precision is always trouble.

14:29 But one they talk about here that's interesting is using integers, like integers in Python are arbitrarily large, which I think probably complicates C interoperability every now and then, but otherwise it's like a good thing generally. However, if you're doing NumPy, NumPy has C backing for a lot of its types, right? Like int32 and so on. So you could end up with, if you specify a particular data type in there, when you create your array, right? Data type is NP and 32, then you do have to care about the 2.14 billion limit, right?

15:10 Yeah.

15:11 I mean, you probably know that all the time from C, right?

15:13 You got to worry about variable sizes and signed, unsigned shorts and whatever.

15:19 Yeah.

15:20 And be careful about the order of operations.

15:21 So you don't overflow in the middle of an operator, a set of operations.

15:24 Yeah.

15:25 Let's see.

15:26 There's some interesting things about buzzing your data, like sending a bunch of crazy data or even using hypothesis to try to find edge cases, timeouts for external systems, be like super explicit about those.

15:37 So there's just a bunch of cool examples.

15:40 And you're like, this is a really properly long article here.

15:43 So I think it really highlights a lot of good examples.

15:46 Follow up to that podcast episode, but just good for testing as well.

15:49 - Yeah, I can't wait to read that more closely and listen to that episode.

15:53 I have to admit, I haven't listened to it yet.

15:55 - Yeah, it's a good one.

15:56 Blaze out in the audience wonders if we have to reinvent these corner cases for Rust?

16:01 I imagine we probably do, Blaise. Good point.

16:04 Yeah, possibly.

16:05 How extra are you feeling, Brian?

16:07 I'm feeling pretty extra.

16:09 Actually, myself, not too much.

16:11 I've just, I've been I've been actually doing a lot of personal projects.

16:15 So I haven't been doing a lot of work projects to announce.

16:19 However, those are wrapping up.

16:21 The personal stuff's wrapping up. So I hope to get more Python people and Python test episodes out soon and more course chapters coming.

16:28 So everything in due time.

16:30 Nice.

16:30 How about you?

16:31 I have some extras as well.

16:33 First, I just got back from Pi Bay last night.

16:36 So that was a lot of fun.

16:38 Pi Bay is always a good time.

16:40 If I can zoom out and get the video to play even.

16:42 So really cool environment and saw, you know, nice to meet a lot of people there.

16:46 So for those of you I met, great to meet you.

16:49 Also, I just want to give a shout out to Sparkmail.

16:52 I just started using Sparkmail to try to kind of unify some stuff.

16:55 What a cool, what a cool app for MacOS email.

16:59 So people, if they're, if you're like fed up, was using different web front ends for different things.

17:05 And it was like, ah, they're all a little bit different.

17:07 One has E for archive.

17:09 One has A for archive, but like Proton Mail, like the A for archive only periodically works.

17:14 Sometimes it works and then you're like, why is it so frustrating?

17:17 Like maybe I could just use one thing and all in that was really fun.

17:21 So also I think a big part of the development team is in Ukraine.

17:24 And so happy to be supporting those folks as well.

17:28 Nice.

17:28 So somewhere it says like made, from, you know, made, made from like hello from Ukraine or something like that, which is cool.

17:36 However, one of the challenges is one of my personal email domains is actually backed by Proton Mail.

17:41 I think I talked about that before, but Proton Mail has end to end encryption.

17:46 And so you can't talk to it with a third party email client, right?

17:49 Okay.

17:49 Because it can't decrypt it.

17:51 It doesn't use IMAP, at least not directly.

17:54 So if you use ProtonMail and you want to have something that is not, you know, there's a proper, like a standard email client, you can install this thing called Proton, what's it called?

18:05 Bridge?

18:06 ProtonMailBridge is its name.

18:07 And what it is it runs locally on your computer.

18:10 It does all the end-to-end encryption and then puts it like, it has a password protected but not end-to-end encrypted IMAP thing that just runs on localhost.

18:20 So you just attach to localhost for your IMAP and you have ProtonMail plugged into, you know, their example is our Outlook, which just made me get a little wheezy just thinking about it.

18:29 But you know, it also works on SparkMail and other nice things.

18:33 So I had been using SuperHuman, which was really nice, but that's only Gmail, which is such a hassle.

18:39 So this works for anything, which makes me super happy.

18:42 - Yeah, I don't think I'm using--

18:42 - What do you do for email?

18:43 - I just use the web clients, but mostly it's FastMail for email.

18:48 - Yeah, nice.

18:49 That's what I had been doing for 10 years, but I just kind of like, there were just too many and they were, I don't know, weird.

18:54 And I'm like, let me try this.

18:55 I really like it.

18:56 - I think I will check it out.

18:58 One of the things you brought up Outlook, I thought it was, I have to use Outlook for work and it still drives me crazy that Control + F is not find, it's forward.

19:09 - Oh my gosh, yes.

19:11 - It's terrible.

19:12 - Yeah, this thing is nice.

19:13 Like it has sort of digital wellbeing stuff where it will only show you, you can have it time out.

19:18 So it brings you to this thing like, "Hey, check your email just like two or three times a day.

19:22 Show me on this little list here that'll just show, like say, people that are important to me, but nobody else." You can block senders.

19:29 It's pretty sweet.

19:30 - Nice, cool.

19:31 - Yeah, yeah, yeah.

19:32 Okay, next, where I hinted at before, is I ran across this YouTube channel called Dust.

19:41 Man, are they making amazing science fiction.

19:44 Have you seen this?

19:45 - Just, you shared it with me last week.

19:47 It was pretty cool.

19:48 Yeah, it's just this independent channel and they are posting like new, if you like short sci-fi, like 10, 20 minutes sci-fi stories, the production quality is just off the charts.

20:00 So I recommend to people actually interested in this, FTL, Faster Than Light, which is about faster than light travel.

20:06 And it's pretty neat, like the graphics and stuff is, it's surprisingly good for what it is.

20:13 So people can check that out.

20:14 And also one called, called Oceanus, which is like about this, sort of underwater world and yeah, it's like this one's 30 minutes.

20:24 It's long.

20:25 But anyway, if people want, you know, short form science fiction, this is pretty awesome.

20:30 I'll link to it in the show notes.

20:31 Oh, that's pretty cool.

20:32 Yeah.

20:33 Blaze out there says FTL is a great short.

20:35 I totally agree.

20:36 It's a very, very well done.

20:37 Yeah.

20:38 And it's not always the Hollywood, like, of course the good person has to, you know, the has to triumph at the end of course, it's just a matter of how.

20:46 Yeah, you never know.

20:47 Some of these are pretty open-ended, as you might expect a 10-minute show to be.

20:51 - Well, you know, I think there's some half-hour shows on TV that really are only like 15 minutes if you take the commercials out.

20:57 - I know.

20:58 A lot of the comments on, if you look at like the FTL one, for example, the comments are like, "This is a better show than Hollywood studios make "with millions of dollars and large teams.

21:08 "Like, how are you all doing this?" So, anyway, I thought people might appreciate this us given our audiences probably a little bit techie.

21:14 Yeah.

21:14 Cool.

21:15 Right.

21:15 If you, everyone did this, like rewrite your, your software, like some old junkie thing, wrote it in some old code, you're going to rewrite the new awesomeness, frequently.

21:25 Yes.

21:25 There is an amazing, so this is the joke.

21:29 So there's an amazing video, music video, which is a parody on American pie.

21:34 And for those of you who are not familiar with American pie, it's one, it's a really great song.

21:38 Oh, you see what it's eight.

21:39 No, I'm not seeing it.

21:40 (laughing)

21:42 It's eight and a half minutes long.

21:44 And so this guy, Dylan Beatty, he's really talented.

21:48 And he redid one that basically is like a journey through all the follies of his different perspectives through his programming career.

21:57 And it starts out in like assembly, and then it goes, I don't know what it's the next one.

22:01 Is it VB6 or something?

22:03 And then, oh, it's just, it's an amazing, amazing thing.

22:06 But eight and a half minutes, I'm not gonna play it.

22:08 So I'm just going to say, go watch the video.

22:10 - Yeah.

22:11 - I'm sure it will connect with you.

22:12 What do you think, Brian?

22:13 - It's very good.

22:14 And then check out his channel 'cause there's a bunch of great nerdy videos on his channel.

22:18 So it's good.

22:19 - Yeah, if we scroll down here, what will we find in the recommended?

22:22 You give Rest a bad name.

22:24 - That's funny.

22:25 That was a good one, yeah.

22:26 - The bug in the JavaScript, I think we featured before, but it's like starting to think I might need a drink because the bug is in the JavaScript.

22:36 - That's pretty good, yeah.

22:37 >>Fun.

22:38 >>Yeah, fun.

22:39 Anyway, so this is an entry point into quite a bit of time of programmer fun ideas.

22:46 >>Okay, so that's a programmer one.

22:48 I've got like a dad joke, science joke that I wanted to share because I just ran across it recently and I just thought it was so funny.

22:57 So it's just a comment.

22:59 There are more hydrogen atoms in a single molecule of water than there are stars in the entire solar system.

23:06 And I talked to several people about it and they just looked at me blankly and said, "That can't be true." I'm like, "Sure, there's two hydrogen atoms in a molecule of water and there's one star in our solar system." >> That's awesome.

23:22 And those two hydrogen atoms.

23:23 Did the hydrogen atoms come from stars?

23:25 I don't know, were they just the stars?

23:27 Anything larger than that should have come from stars.

23:29 Yeah, that's awesome.

23:30 I love it.

23:31 It does make you think, because like if you think galaxy, universe, whatever, right?

23:36 - Yeah.

23:36 - But solar system, I mean, solar, singular.

23:39 - Yeah, and I had, it was funny, some of the comments were like trying to calculate the volume of the water and how many atoms might be there.

23:48 No, it's not atoms, it's a single molecule of water, not a glass of it, so pretty funny.

23:55 - Yeah, that's how I first started, like, well, how large is the glass?

23:57 How many, okay, how many molars is that, and how many, oh wait, that's not what it says at all, that's irrelevant.

24:02 - Yeah.

24:03 - I love it.

24:04 - Anyway.

24:05 Cool.

24:05 Wow.

24:06 Once again, great chatting with you weekly.

24:08 Yep.

24:09 You as well.

24:09 And thanks to everyone for listening.

24:11 See y'all later.

Back to show page