Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #356: Ripping from PyPI

Return to episode page view on github
Recorded on Tuesday, Oct 10, 2023.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 356, recorded October 10th, 2023. I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And this episode is brought to you by us. Courses over at Talk Python Training,

00:16 the complete pytest course from Brian, Patreon supporters, and find us over on Fostadon.

00:20 That's probably the best way to chat with us these days. And also be part of the live show,

00:25 pythonbytes.fm/live, usually Tuesdays at 11 Pacific time. And you can also see all the old

00:31 versions there as well. So Brian, what are you going to start us with today?

00:35 Well, I thought I would start with a, it's kind of going to have a theme of getting information from

00:41 from Fostadon, people letting us know. So I saw today a post by the PsychoPG group. So

00:52 PsychoPostgres and the post says it feels weird, but it's time to stop considering PsychoPG2,

00:59 the present and PsychoPG3, the future. We've entered the time where PsychoPG3 is the present

01:06 and two is the respectable past. Update, updated the feature page and a few other resources on the

01:11 website to reflect this. So I thought I would check it out. So Awesome. That's big news. You know, Postgres is clearly the biggest database that people use in

01:21 the Python space, according to the survey we talked about recently. So yeah, this is relevant, right?

01:25 Yeah. And this is a great library. I've been using the version two for a long time. And I guess

01:30 kind of like haven't been paying attention to the three, but it's been out for a while.

01:34 So the three page talks about, basically talks about the, it's a new implementation of the most used,

01:42 reliable and feature rich Postgres adapter for Python. And there is some differences though. So

01:49 the apparently most of two was written in C, but three is written in a combination. So a lot of it's

01:58 in Python and some of the speed ups are in C. And so the big announcement really is that there are no,

02:05 so we're going to link to both the three page and the features page. And the features page has a nice

02:11 comparison of the, of two versus three. So the really, the recommendation is they're still going to

02:18 maintain two, but you should maybe think about mostly think about three for new projects. So if you

02:24 already have an existing project that's running to go ahead and leave it running with that, I couldn't

02:30 find where they're doing it when, or if, or where they're doing announced an end of life. So I don't

02:36 think that's even planned at this point for end of life for two eventually probably, but so two has been

02:41 around since 2006, three started in first release in 2021, but they've got a whole bunch of cool things

02:48 in here. So some of the things I thought were pretty cool was we've got a native asyncio support.

02:56 That looks pretty nice. Native support for more Python types, such as enums and Postgres types.

03:03 So Postgres, one of the cool Postgres types was a multi-range. So that's supported now. That's pretty

03:08 nice. And then I don't know what parameter bindings are actually, but apparently two has a client side

03:16 parameter bindings and three defaults to server side parameter bindings. But you can still do client

03:23 side if you want to. So lots of fun stuff in here. Advanced connection pool, static typing support.

03:31 Yeah. So I say, why not try three and only go back to two if you really need to.

03:39 Yeah. This is super exciting. Those are a lot of great features. I think the async support was previously done

03:44 through a separate library and now it's just part of it. That's pretty cool.

03:50 The what is the present? What is the past? What is the future? We'll come back to that in our extras.

03:56 We'll have some fun there.

03:57 Okay. And then also our audience is awesome. Mike says, Hey, I migrated PyPI from psycopg two to three in June. It was not too hard, but took some time to do safely. So, Hey, when you pip install things or IPI look up things, it's already using this. Awesome.

04:16 Yeah. Oh, that's cool. Nice. Indeed. Indeed. All right. Shall we convert some data? That's what I got next.

04:23 Sure. Let's talk about it. So we all know about Pydantic, right? And Pydantic is cool. So Pydantic lets us like create classes that derive from some Pydantic base type, base model, and it will do conversions and parsing of JSON and so on. But maybe, maybe you know about data classes, right? These are built in. They're not anything separate.

04:44 They're part of Python itself. And so being able to use them is pretty cool. And Raymond Peck commented on one of our YouTube videos and said, Hey, I use this thing called Daysite. Hopefully that's the right pronunciation. Daysite. And the idea is it allows you to create Python data classes in a similar way. All right. So simple creation of data classes from dictionaries.

05:07 So APIs and other things. And you just create a data classes as you normally would. Data classes, just sort of classes with, you know, global or static, static level field type, right? Like class user name, colon string, age, colon, and so on.

05:24 So if you just put the data class wrapper on it, right? And it gives it a bunch of nice things like hash ability, comparability, et cetera, right? Constructors.

05:32 But if you've got a dictionary that has sub objects that themselves should be data classes and maybe a list of other things that should be data classes, actually turning that JSON into one of these more complicated versions is a hassle, right?

05:48 You also might need to do data conversion, as well, right, to make sure that the types match. So that's what you can use this for. And it's pretty neat. You can just say from dict, a site dot from dict, and you say what data class it's going to go to, and what the dictionary is and like a little factory method out pops one of these objects. Cool, right?

06:07 Yeah, that's pretty cool. Yeah. And let's see, features include nested structures, like I described, basic type checking, optional fields, union. So if you say it can be an enter a string, it'll check that it's one of those two, but not a float or something, forward references, collections, and interestingly, custom type hooks. So for example, you can say anytime that you're going to take a string, actually call string a lot dot lower dot strip on it, if it's not null, right, if it's not none, things like that, right?

06:37 So you can actually get into the parsing side a little bit if you need to. So that's pretty neat. What else have we got here? Let me scroll down and talks, of course, about all the things I said. Yeah, I think that's, that's probably it. I guess one thing it's worth pointing out, it says, right in the docs, they say, it's important to mention that it's not a data validation library. Now, when I first read that, I'm like, but there's all these types, right? Is it like, is it supposed to find out when I say the thing takes a string and an int and a bool for it?

07:07 It's three fields, like it will actually do that. It does that, but you can't say like the string must be this regular expression and this many characters and the int has to be positive and like those types of things it doesn't do, right? So it's just dictionary to object, possibly complex data class object, but with type validation and that's about it. But still, I think that's super, super useful.

07:29 So kind of partway validator things.

07:31 Yeah. Yeah. More like a proper parser without the validation itself. Yeah. Yeah. Indeed. So anyway, that's what I got for people. It's, it was news to me. So check it out.

07:42 I think that's pretty neat. So I am going to continue on with a topic of guests, the ever rustification of, of Python projects. And this one is one that we use every day pip. So this was, this came to us from Owen, Owen Lamont, I think.

08:04 Owen Lamont. Thanks, Owen. So I said, said, Hey, you guys might be interested in this. It's a, a pro under the project group.

08:12 And I was ready to go try it. It's not ready to try yet, but it's still pretty exciting. So the kind of the headline fast bareboins pip implementation and rust, and it's not just an installer though. So it has, it's got, what does it have so far? It's got, you can download an aggressively cache,

08:40 a pipe PI PI metadata, resolve PI packages using a pack project called resolvo, which is a kind of a rust thing.

08:47 And then still on the planned list is actually installing the files.

08:53 So doesn't, I I'm just chuckling because yeah, I just jumped the gun, but this is new.

09:01 So it's, it's fine that it's published early.

09:04 So first commits look like about two weeks ago. So I'm pretty excited about, about this. I think it'd be fun to try it out.

09:12 And, and look at different, possibly different resolvers and how they handled it versus normal pip. So kind of neat.

09:20 Yeah. When I saw this too, I was pretty excited. So thanks Owen for sending this in. Yeah, it's, it's cool. And Mike says, let her rip. I love it.

09:30 Yeah. So it looks like maybe we should swap the, these last two topics, but I don't know. Let's go with the, with your next topic.

09:39 Well, I do think that this one would have been pretty good for you to cover, but too bad. I'm already on it. So here we go.

09:45 I guess the only like stronger tie to me is it's in response to a talk Python episode. So this one comes to us from Marwin and thank you Marwin for sending it in and writing this article called how not to foot gun yourself when writing tests, a showcase of flaky tests.

10:00 And it says, I was writing an article. I was writing this article after listening to talk Python with Gregory Kempfhammer and Owen Perry talking about flaky tests. So that was the subject to that basically talked about all of their experience here, which is cool.

10:15 like a definition and really a lot of examples of flaky tests. I thought, I mean, you know, Brian, did you get to check any of these out?

10:22 I haven't looked at this yet. No.

10:24 Well, we'll do it live. Okay.

10:26 Okay.

10:26 So the first one is, really about concurrency and said, well, look, I've got a bunch of tests. Maybe I could speed them up by using threading and run a bunch of them.

10:35 Oh yeah.

10:36 That'd be fun. However, there's a real simple example of like, Hey, I've got an account and I can transfer money from one account to the other. So first account that withdraw this amount and then second account deposit that amount. Right. And how could that go wrong?

10:51 So do a bunch of those in a, Hey, if we want to make those faster, let's run them in some threads, right? Rather than using say one of the pie test plugins, like more properly. Right. This is more to highlight, like what might go wrong.

11:01 You know? and it turns out that we have the GIL and I think, I think Marlon's right. I think people do think that, the GIL will just kind of save you from concurrency. Right. Because only one thing can run at a time. So how are you going to have a problem? Well, well, it's one thing.

11:18 It's one kind of byte code at a time.

11:21 Exactly. One Python byte code. But here's the thing. If your program ever enters into a temporarily invalid state ever, you may, you may need, some kind of concurrency locks or something. And I think my reading of Python stuff, I don't see this very often. And I think actually a lot of people should be doing more locks, honestly.

11:43 So even in this example, I withdraw some money and now for just a moment, the program is in a temporarily, temporarily invalid state until it's deposited into the other account. Right. So that's this moment. Like if the GIL says, okay, you read enough, we're going to switch to the other one. Then somebody tries to, the other one reads that state. That's going to be trouble. Right. So they were talking about, well, how do you actually, you know, how do you actually check this? And here's something I actually didn't even know.

12:10 There's look, you can actually make that switching back and forth more aggressive. You can control that switching that the, the GIL does on the, how much Python on one thread it'll do before it switches to another by getting the switch interval. And here they set it to one 10th of a millisecond.

12:28 Oh, wow.

12:29 And then they do a bunch of work and then they put it back. And that's pretty interesting. Did you know you could do that?

12:33 I didn't. This is pretty cool to, I know this might be worth covering the article right here. Just that.

12:39 Yeah. Good for testing these race conditions.

12:43 Yes, exactly. Like make it, make it worse. And also running on more cores potentially. I don't know. Probably that doesn't too much matter.

12:50 Okay. So to avoid boilerplate, you can reach out to the high test repeat plugin. Weren't you just talking about this? I know you're doing some stuff with it.

12:57 Yeah. I'm one of the maintainers on it now. There's my picture.

13:00 Yeah. I feel like you, yeah, I feel like you had actually just mentioned it. Maybe it was the get, the get article or something. But anyway, recently I thought you were just talking about this.

13:09 Yeah, that's sweet.

13:10 Yeah, exactly. Also worth pointing out a similar and more straightforward plugin possibly for this job is pytest FlakeFinder, which is meant to find flaky tests.

13:20 Oh, yeah.

13:21 Okay. So let's just hang out for here. One of the differences they're saying is that you can repeat your test multiple times with repeat or FlakeFinder, you can repeat your whole suite.

13:33 That's one of the things I need to change for repeat because you can do the same thing with repeat. You can run the whole suite.

13:39 It's just kind of hidden in two lines of the readme and it needs to be more bolded that you can change the scope and repeat the whole thing.

13:47 Oh, nice. Nice. All right. Randomness. For example, algorithms that are non-deterministic, like heuristic ones. So that's pretty interesting. So they do, what is this, like a distance algorithm or something that's heuristic.

14:01 So they say like, you know, NP close, which they're testing on NP close, whereas NumPy, like are these vectors close, says basically fix this by actually, you know, computing the tolerance and they use a little statistics.

14:17 Yeah. Probably more statistics than I know, but let's say three standard deviations away or something like that. It's interesting. Obviously floating point arithmetic is always trouble. Lost precision is always trouble.

14:29 But one they talk about here that's interesting is using integers, like integers in Python are arbitrarily large, which I think probably complicates C interoperability every now and then. But otherwise is like a good thing generally.

14:46 However, if you're doing NumPy, NumPy has C backing for a lot of its types, right?

14:52 Like int32 and so on.

14:55 So you could end up with, if you specify a particular data type in there,

15:01 when you create your array, right?

15:03 Data type is np int32.

15:04 Then you do have to care about the 2.14 billion limit, right?

15:10 Yeah.

15:10 I mean, you probably know that all the time from C, right?

15:13 You've got to worry about variable sizes and unsigned shorts and whatever.

15:18 Yeah.

15:19 And be careful about the order of operations so you don't overflow in the middle of a set of operations.

15:24 Yeah.

15:24 Let's see.

15:25 There are some interesting things about fuzzing your data, like sending a bunch of crazy data

15:30 or even using hypothesis to try to find edge cases.

15:33 Timeouts for external systems.

15:35 Be like super explicit about those.

15:37 So there's just, you know, there are a bunch of cool examples.

15:40 And you're like, this is a really properly long article here.

15:42 Nice.

15:43 So I think it really highlights a lot of good examples.

15:46 Follow up to that podcast episode, but just good for testing as well.

15:49 Yeah.

15:50 I can't wait to read that more in closely and listen to that episode.

15:53 I have to admit, I haven't listened to it yet.

15:55 Yeah.

15:55 It's a good one.

15:56 Blaze out in the audience wonders if we have to reinvent these corner cases for Rust.

16:01 I imagine we probably do, Blaze.

16:03 Good point.

16:04 Yeah.

16:04 Possibly.

16:05 How extra are you feeling, Brian?

16:07 I'm feeling pretty extra.

16:08 Actually, myself, not too much.

16:11 I've just, I've been, I've been actually doing a lot of personal projects.

16:15 So I haven't been doing a lot of work projects to announce.

16:19 However, those are wrapping up.

16:21 The personal stuff's wrapping up.

16:22 So I hope to get more Python people and Python test episodes out soon and more course chapters

16:28 coming.

16:28 So everything in due time.

16:30 Nice.

16:30 How about you?

16:31 I have some extras as well.

16:33 First, I just got back from Pi Bay last night.

16:36 So that was a lot of fun.

16:38 Pi Bay is always a good time if I zoom out and get the video to play even.

16:42 So really cool environment.

16:43 And it's all, you know, nice to meet a lot of people there.

16:46 So for those of you I met, great to meet you.

16:49 Also, I just want to give a shout out to Sparkmail.

16:52 I just started using Sparkmail to try to kind of like unify some stuff.

16:55 What a cool, what a cool app for macOS email.

16:59 So people, if you're like fed up, was using different web front-ins for different things.

17:05 And it was like, ah, they're all a little bit different.

17:07 One has E for archive.

17:09 One has A for archive.

17:10 But like ProtonMail, like the A for archive only periodically works.

17:14 Sometimes it works.

17:16 And then you're like, why is it so frustrating?

17:17 I'm like, maybe I could just use one thing.

17:19 And all in that was really fun.

17:21 So also, I think a big part of the development team is in Ukraine.

17:24 So happy to be supporting those folks as well.

17:28 Nice.

17:28 Somewhere it says like made from, you know, made from like hello from Ukraine or something

17:35 like that, which is cool.

17:36 However, one of the challenges is one of my personal email domains is actually backed by ProtonMail.

17:41 I think I talked about that before.

17:42 But ProtonMail has end-to-end encryption.

17:45 And so you can't talk to it with a third-party email client, right?

17:49 Okay.

17:49 Because it can't decrypt it.

17:51 It doesn't use IMAP.

17:52 At least not directly.

17:54 So if you use ProtonMail and you want to have something that is not, you know, there's a

17:59 proper, like a standard email client.

18:01 You can install this thing called Proton, what's it called?

18:05 It's Bridge.

18:05 ProtonMail Bridge is its name.

18:07 And what it is it runs locally on your computer.

18:10 It does all the end-to-end encryption and then puts it like it has a password protected but

18:16 not end-to-end encrypted IMAP thing that just runs on localhost.

18:19 So you just attach to localhost for your IMAP.

18:22 And then you have ProtonMail plugged into, you know, their examples are Outlook, which

18:26 just made me get a little wheezy just thinking about it.

18:29 But, you know, it also works on SparkMail and other nice things.

18:33 So I had been using SuperHuman, which was really nice, but that's only Gmail, which is such

18:38 a hassle.

18:38 So this works for anything, which makes me super happy.

18:41 Yeah, I don't think I'm using...

18:42 What do you do for email?

18:43 I just use the web clients, but mostly it's FastMail.

18:47 Yeah, nice.

18:49 That's what I had been doing for 10 years.

18:51 But I just kind of like, there were just too many and they were, I don't know, weird.

18:54 And I'm like, let me try this.

18:55 I really like it.

18:56 I think I will check it out.

18:57 One of the things that you brought up Outlook, I thought it was...

19:00 I have to use Outlook for work and it still drives me crazy that Control-F is not find.

19:08 It's forward.

19:09 Oh my gosh.

19:10 Yes.

19:11 It's terrible.

19:12 Yeah, this thing is nice.

19:13 It has sort of digital well-being stuff where it will only show you...

19:17 You can have it timeout.

19:18 So it brings you to this thing like, hey, check your email.

19:21 It's like two or three times a day.

19:22 Show me on this little list here that'll just show, say, people that are important to me,

19:26 but nobody else.

19:27 You can block senders.

19:28 It's pretty sweet.

19:30 Nice.

19:30 Cool.

19:31 Yeah.

19:31 Yeah.

19:32 Okay.

19:34 Next, one I hinted at before is I ran across this YouTube channel called Dust.

19:40 Okay.

19:41 Man, are they making amazing science fiction.

19:44 Have you seen this?

19:45 Just, you shared it with me last week.

19:47 It was pretty cool.

19:48 Yeah.

19:48 It's just this independent channel.

19:50 And they are posting like new, if you like short sci-fi, like 10, 20 minutes sci-fi stories.

19:57 The production quality is just off the chart.

20:00 So I recommend to people actually interested in this, FTL, Faster Than Light, which is about

20:05 faster than light travel.

20:06 And it's pretty neat.

20:07 Like the graphics and stuff is, it's surprisingly good for what it is.

20:13 So people can check that out.

20:14 And also one called Oceanus, which is like about this sort of underwater world.

20:22 And yeah, it's like this one's 30 minutes.

20:24 It's long.

20:25 But anyway, if people want, you know, short form science fiction, this is pretty awesome.

20:30 I'll link to it in the show notes.

20:31 Oh, that's pretty cool.

20:31 Yeah.

20:32 Blaze out there says FTL is a great short.

20:35 I totally agree.

20:36 It's very, very well done.

20:37 Neat.

20:38 Yeah.

20:38 And it's not always the Hollywood, like, of course the good person has to, you know, the

20:42 hero has to triumph at the end.

20:44 Of course, it's just a matter of how.

20:46 Yeah.

20:46 You never know.

20:47 Some of these are pretty open-ended as you might expect a 10 minute show to be.

20:50 Well, you know, I think, you know, there's some half hour shows on TV.

20:55 That really are only like 15 minutes if you take the commercials.

20:57 I know.

20:58 A lot of the comments on, if you look at like the FTL one, for example, the comments are

21:02 like, this is a better show than Hollywood studios make with millions of dollars and large

21:08 teams.

21:08 Like, how are you all doing this?

21:09 So anyway, I thought people might appreciate this given our audience is probably a little

21:13 bit techie.

21:14 Yeah.

21:14 Cool.

21:15 Right.

21:15 Have you ever wanted to just like rewrite your software, like some old junkie thing,

21:20 wrote it in some old code and you're going to rewrite the new awesomeness?

21:24 frequently.

21:25 Yes.

21:25 There's an amazing, so this is the joke.

21:28 So there's an amazing video, a music video, which is a parody on American pie.

21:34 And for those of you who are not familiar with American pie, it's one, it's a really great

21:38 song.

21:38 Oh, you're just singing.

21:38 But it's eight.

21:39 No, I'm not singing.

21:40 Okay.

21:42 It's eight and a half minutes long.

21:44 And so this guy, Dylan Beattie, he's really talented and he redid one that basically is

21:50 like a journey through all the follies of his different perspectives through his programming

21:56 career.

21:56 And it starts out in like assembly.

21:59 And then it goes, I don't know what it's the next one.

22:01 Is it VB6 or something?

22:03 And then, oh, it's just, it's an amazing, amazing thing.

22:06 But eight and a half minutes, I'm not going to play it.

22:08 So I'm just going to say, go watch the video.

22:10 Yeah.

22:10 It's, I'm sure it will connect with you.

22:12 What do you think, Ray?

22:13 It's very good.

22:14 And then check out his channel because there's a bunch of great nerdy videos on his channel.

22:18 So it's good.

22:19 Yeah.

22:19 If we scroll down here, what will we find in the recommended?

22:21 You give rest a bad name.

22:24 That's funny.

22:25 That was a good one.

22:25 Yeah.

22:26 The bug in the JavaScript, I think we featured before, but it's like starting to think I might

22:32 need a drink because the bug is in the JavaScript.

22:35 That's pretty good.

22:36 Yeah.

22:37 Fun.

22:38 Yeah.

22:38 Fun.

22:39 Anyway, so this is an entry point into quite a bit of time of programmer fun ideas.

22:46 Okay.

22:46 So that, and that's a programmer one.

22:48 I've had like a dad joke, science joke that I wanted to share because I just, I ran across

22:53 it recently and I just thought it was so funny.

22:56 So it's just a comment.

22:59 There are more hydrogen atoms in a single molecule of water than there are stars in the entire solar

23:05 system.

23:05 And I talked to several people about it and they just looked at me blankly and said, that

23:10 can't be true.

23:10 I'm like, sure.

23:11 There's two hydrogen atoms in a molecule of water and there's one star in our solar system.

23:18 That's awesome.

23:20 And those two hydrogen atoms.

23:23 Do the hydrogen atoms come from stars?

23:25 I don't know.

23:26 Were they just the stars?

23:27 Anything larger than that should have come from stars.

23:29 Yeah.

23:29 That's awesome.

23:30 I love it.

23:31 It doesn't make you think, cause like if you think galaxy universe, whatever, right?

23:35 Yeah.

23:36 But solar system, I mean, solar singular.

23:39 Yeah.

23:39 And I had, it was funny that some of the comments were like, like trying to calculate the volume

23:45 of the water and how many atoms might be there.

23:48 And I'm like, no, it's not an atom.

23:49 It's single molecule of water.

23:51 I'm not, not a glass of it.

23:54 So pretty fun.

23:55 Yeah.

23:55 That's how I first started.

23:56 Like, well, how large is the glass?

23:57 How many?

23:58 Okay.

23:58 How many more is that?

23:59 And how many?

23:59 No, wait.

24:00 That's not what it says at all.

24:01 That's irrelevant.

24:02 Yeah.

24:03 I love it.

24:03 Anyway.

24:04 All right.

24:04 Well, cool.

24:05 Well, once again, great chatting with you weekly.

24:08 Yep.

24:08 You as well.

24:09 And thanks to everyone for listening.

24:11 See y'all later.

Back to show page