#325: It's called a merge conflict

Published Tue, Feb 28, 2023, recorded Tue, Feb 28, 2023

Watch the live stream replay

About the show

Sponsored by Microsoft for Startups Founders Hub.

Connect with the hosts

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Michael #1: Python Parquet and Arrow: Using PyArrow With Pandas

Parquet is an efficient, compressed, column-oriented storage format for arrays and tables of data.
Arrow tables are less wrangle-able than Pandas DataFrames, but way faster and lower memory
Questions answered
- Can we use Pandas DataFrames and Arrow tables together, and if so, how is this done? (It turns out the answer is yes, and it’s quite simple, as we’ll see).
- In what ways are Arrow tables “better” than Pandas DataFrames? In other words, for which tasks are Arrow tables better suited? Conversely, what tasks are possible or easy in Pandas that are difficult or impossible in Arrow?
- As an on-disk format, how does Parquet compare to popular alternatives such as feather, orc, CSV, etc.?

Brian #2: FastAPI-Filter

Arthur Rio
Add query string filters to your api endpoints and show them in the swagger UI.
The supported backends are SQLAlchemy and MongoEngine.
FastAPI-Filter documentation
The philosophy of fastapi_filter is to be very declarative. You define the fields you want to be able to filter on as well as the type of operator, then tie your filter to a specific model.
default filters: neq, gt, gte, in, isnull, lt, lte, not/ne, not_in, nin, like/ilike
The swagger support is actually quite cool.

Michael #3: 12 Python Decorators to Take Your Code to the Next Level

Decorators are awesome
This is mostly home-grown decorators, but some standard ones too
Notable ones:
- @wraps
- @lru_cache
- @repeat
- @timeit
- @retry ← no, please use tenacity
- @countcall
- @rate_limited
- @dataclass
- @register
- @property
- @singledispatch

Brian #4: PyHamcrest

Contributed by Txels
PyHamcrest is a framework for writing matcher objects, allowing you to declaratively define “match” rules.
PyHamcrest tutorial
Having a tool that allows you to pick out precisely the aspect under test and describe the values it should have, to a controlled level of precision, helps greatly in writing tests that are “just right.”
From Brian: I’ve been reluctant to try matcher style assertion helper libraries, as, with pytest, assert works just fine. However, I can see cases where PyHamcrest assertions could help test readability, and that’s always a win.
Examples:
- equality: assert_that(theBiscuit, equal_to(myBiscuit))
- exceptions: assert_that(calling(parse, bad_data), raises(ValueError))
- async: assert_that(await resolved(future), future_raising(ValueError))
- boolean: assert_that(theBiscuit.isCooked())
There are predefined matchers for
- objects, numbers, text, logical checks, sequences, dictionaries

Extras

Brian:

pytest tips and tricks - recent post, and discussion on upcoming Talk Python episode
sharing pytest fixtures - placeholder page where I’ll share slides and code after my talk.

Michael:

Joke: A group of developers is called …

Episode Transcript



00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 325, recorded February 28th, the last day of February in 2023.

00:12 I am Brian Okken. And I'm Michael Kennedy. And before we jump in, I want to thank everybody

00:17 that shows up for the live stream. If you haven't shown up for the live stream before,

00:21 it's a lot of fun. People can stop and ask questions and chat and everything, and

00:26 it's a good way to say hi. And we enjoy having you here or watch it afterwards if this is a bad

00:32 time for you. Also want to thank Microsoft for Startups Founders Hub for sponsoring this episode.

00:38 They've been an excellent sponsor of the show. And they've also agreed to have us be able to play with

00:44 the sponsor spots and do some AI reading. So this one's going to be a fun one, this one. So I'm

00:49 excited about it. I am too. It's going to be fun. So why don't you kick us off with our first

00:54 topic today? All right. Let's jump right in. You like solid code. So how about some codesolid.com?

01:02 Has nothing to do with solid code, but it's still interesting and it does have to do with code.

01:05 This one is something called Parquet and Arrow. Have you heard of Apache Arrow or the Parquet file

01:13 format, Brian? I don't. I've heard of Arrow, but I don't think I've heard of Parquet.

01:17 So when people do a lot of data science, you'll see them do things like open up Jupyter Notebooks and

01:23 import Pandas. And then from Pandas, they'll say load CSV. Well, if you could think of a whole bunch

01:29 of different file formats and how fast and efficient they might be stored on disk and read, how do you

01:35 think CSVs might turn out? Pretty slow, pretty large, and so on. And Arrow through PyArrow has some really

01:44 interesting in-memory structures that are a little more efficient than Pandas, as well as it has access to this Parquet format.

01:52 So does Pandas through an add-on, but you'll see that it's still faster using PyArrow. So basically, that's what this article that I found is about. It highlights how these things compare, and it basically asks the questions like, can we use Pandas data frames and Arrow tables together? Like if I have a Pandas data frame, but I want to then

02:01 switch it into PyArrow for better performance at some point for some analysis, can I do that? Or if I start with PyArrow, could I then turn it into a data frame and hand it off to Seaborn or some other thing that expects a Pandas data frame?

02:23 Answer is yes. Short version there. Are they better? In which ways are they better? Which way are they worse? And then the bulk of the analysis here is like, yeah, we could save our data, read and write our data from a bunch of different file formats. Parquet, but also things like Feather, ORC, CSV, and others, even Excel. What should we maybe consider using?

02:49 Okay. So installing it is just pip install pyarrow, super easy, same type of story. If you want to use it with Pandas, so I've got some Pandas data frame and I want to convert it over, that's super easy. You go to PyArrow and you say pyarrow.Table.from_pandas, give it a Pandas data frame, and then boom, you've got it in PyArrow format.

03:17 Okay.

03:17 One of the things that's interesting is that Pandas has a real nice wrangling, exploration style for data.

03:26 So I can go and I can just show the data frame, and it'll tell me there are 14 columns, and in this example, 6,433 rows, and it'll list off the headers and then the column data.

03:38 If I do the same thing in PyArrow, I just get, it's kind of human readable. You just get like a dump of junk, basically. It's not real great.

03:48 So that aspect, certainly using Pandas, is nice for this kind of exploration.

03:53 Another thing about PyArrow is the data is immutable. So you can't say, oh, every time that this thing appears, actually replace it with this canonical version.

04:01 You know, if you get like a Y, lowercase yes, and capital yes, you want to make them all just lowercase yes, or just the Y.

04:08 Like, you got to make a copy instead of change it in place. So that's one of the reasons you might stick with Pandas, which is pretty interesting.

04:16 But you can do a lot of really interesting parsing and performance stuff that you would do with, like you would do with Pandas.

04:24 But if your goal is performance, and performance measured in different ways, how much memory does it take up in computer RAM?

04:32 How much disk space type of memory does it take up? How fast is it to read and write from those?

04:37 It's pretty much always better to go with PyArrow, you know?

04:41 So for example, if I take those same sets of data, those two sets of data from, I think this is the New York City taxi data, some subset of that really common data set.

04:50 It's like digit grouping. It's a little over three megs of memory for the data frame.

04:57 And it's just under a hundred, sorry, three megs. Yeah, I don't know if I said three megs of data for Pandas, whereas it's just under one meg for PyArrow.

05:07 So that's three times smaller, which is pretty interesting there. Yeah?

05:10 Yeah.

05:11 The other one is if you do like mathy things on it, like if you got a whole, you got tables of numbers, you're really likely to talk about things like the max or the mean or the average and so on.

05:24 Now, if you do that to Pandas and you do it to PyArrow, you'll see it's about eight times faster to do math with PyArrow than it is to do it with Pandas.

05:34 That's pretty cool, right?

05:35 Yeah.

05:36 The syntax is a little grosser, but yeah.

05:39 The syntax is a little grosser. I will show you a way to get to this in a moment that is less gross, I believe.

05:45 Okay.

05:46 And then Alvaro out there does say, if you want fast data frames, polars plus parquet is the way to go.

05:53 Okay.

05:54 He's reading, skating to where the puck is going to be, indeed.

05:58 And Kim says, presumably the immutability plays a large part in the performance.

06:03 I suppose so.

06:04 Yeah.

06:06 And then also some feedback of real-time analytics here.

06:09 Alvaro says, I got a broken script from a colleague.

06:11 I rewrote it in Pandas and it took about two hours to process.

06:15 In polars, it took three minutes.

06:17 So that's a non-trivial sort of bonus there.

06:21 All right.

06:22 Let me go over the file formats and I'll just really quickly, I think we've talked about polars,

06:27 but I'll just reintroduce it really quick.

06:29 So if we go and look at the different file formats, we could use parquet.

06:34 So we could say to_parquet with PyArrow and you get it out.

06:38 And these numbers are all kind of like insane.

06:41 Four milliseconds versus reading it with two milliseconds.

06:44 If you use fastparquet, which is the thing that allows data frames to do it, it's 14 milliseconds,

06:50 which is a little over three times slower, but it's still really, really fast, right?

06:55 There's feather, which is the fastest of all the file formats with a two millisecond save time,

07:00 which is blazing.

07:01 There's ORC.

07:02 I have no idea what ORC is.

07:03 It's a little bit faster.

07:04 Or if you want to show that you're taking lots of time and doing lots of processing,

07:10 doing lots of data science-y things, you could always do Excel, which takes about a second almost.

07:14 I mean, on a larger data set, it might take lots longer, right?

07:17 You're like, oh, I'm busy.

07:18 I can't work.

07:19 I'm getting a coffee because I'm saving.

07:21 Well, I mean, there's some people that really have to export it to Excel so that other people can make mistakes later.

07:27 Yes, exactly.

07:28 Because life is better when it's all go-tos.

07:30 Yeah.

07:31 But no, you're right.

07:33 If the goal is to deliver an Excel file, then obviously.

07:36 But this is more like considering what's a good intermediate just storage format.

07:39 And then CSV is actually not that slow.

07:42 It's still slower, but it's only 30 milliseconds.

07:45 But the other part that's worth thinking about, remember, this is only 6,400 rows.

07:50 The Parquet format is 191K.

07:52 The Pandas one is almost 100K more, which is interesting.

07:56 The Feather is almost half a meg.

07:58 ORC is three-quarters of a meg.

08:00 Excel is half a meg.

08:01 CSV is a meg, right?

08:02 So a meg, it's almost a five times file size increase.

08:06 So if you're storing tons of data and it's 5 gigs versus 50 gigs, you know, you maybe want to think about storing it in a different format.

08:14 Plus you read and write it faster, right?

08:16 So these are all pretty interesting.

08:18 And Polars is the lightning-fast data frame library built in Rust

08:24 and Python, and it's built on top of PyArrow.

08:27 I had a whole, you know, built on top of Apache Arrow.

08:31 I had a whole Talk Python episode on it.

08:33 I'm pretty sure I'd talked about Polars before on here as well.

08:36 But it's got like a really cool sort of fluent programming style.

08:41 And under the covers, it's using PyArrow as well.

08:43 So pretty neat.

08:45 Yeah.

08:45 So if you're really looking to say like, I just want to go all in on this, as Alvaro pointed out, I think it was Alvaro, that Polars is, yeah, that Polars is pretty cool.

08:54 Okay.

08:55 Neat.

08:55 And Henry out there, real-time feedback is, Pandas is fully supporting PyArrow for all data types in the upcoming 1.5 and 2.0 releases.

09:03 There was just a blog post on it on the DataPythonista blog.

09:08 It's not clear if they're switching to it.

09:11 I believe it's NumPy at the moment as the core, but it could be supporting it, which is awesome.

09:17 Yeah.

09:17 Thanks, Henry, for that live update there.

09:19 Yeah.

09:19 Well, then also, he said, but it did say basically starting to get native PyArrow speed with Pandas by just selecting the backend in the new Pandas version.

09:28 So cool.

09:29 Awesome.

09:30 Yeah, yeah.

09:30 Very, very cool.

09:31 So lots of options here.

09:33 But I think a takeaway that's kind of worth paying attention to here is choosing maybe Parquet as a file format, regardless of whether you're using Pandas or PyArrow or whatever, right?

09:44 Because I think the default is read and write CSV.

09:46 And if your CSV files are ginormous, that might be something you want to not do.

09:51 Yeah.

09:51 Okay.

09:51 All right.

09:52 Over to you.

09:52 Well, you asked if I'd ever heard of Parquet.

09:55 And before we get to the next topic, I was thinking, like, is it Butter or is it Parkay?

10:02 It was a whole thing from when we were kids.

10:04 That's right.

10:06 That's margarine.

10:07 Yum.

10:08 Parkay.

10:09 Had a little tub that talked.

10:11 It was neat.

10:11 Oh, that's right.

10:13 It did.

10:13 It had a little mouth.

10:14 Yeah.

10:15 I want to talk about FastAPI a bit.

10:18 So this topic, FastAPI-Filter, comes to us from Arthur Rio.

10:22 And Arthur, actually, it's his library.

10:25 FastAPI-Filter.

10:27 And this is pretty cool.

10:31 So I'm going to pop over to the documentation quickly.

10:34 But what it is, it's query string filters for your API endpoints so that you can show them in Swagger and use them and stuff for cool things.

10:43 So I'll pop over to the documentation.

10:48 So it says query string filters.

10:51 It supports the SQLAlchemy and MongoEngine backends.

10:54 So that's nice.

10:56 But let's say, well, we'll get to what the filters look like later.

10:59 But in the Swagger interface, this is pretty neat.

11:02 So let's say you're grabbing the users and you want to filter them by, like, the name.

11:07 You can do a query in the name or the age less than or age greater than or equal.

11:13 These are pretty nice.

11:16 So it says the philosophy of FastAPI-Filter is to be very declarative.

11:21 You define fields that you want to be able to filter on as well as the type of operator and then tie your filters to a specific model.

11:29 It's pretty easy to set up.

11:30 The syntax is pretty, well, we'll let you look at it.

11:33 But it's not that bad to set up the filters.

11:36 Yeah, a lot of pydantic models, as you might expect, it being FastAPI.

11:40 Yeah.

11:41 So you plug in these filters.

11:43 But then you get things like the built-in ones, which are, like, not equal, greater than, greater than or equal, in, those sorts of things.

11:51 But you can do some pretty complex query strings then.

11:54 Like, oh, there's some good examples down here.

11:57 So, like, the users, but order by descending name or order by ascending ID.

12:04 There's, like, plus and minus for ascending.

12:06 And you can have order by.

12:07 And you can filter by, like, the name, custom orders.

12:11 And actually putting some filters right in your API string is kind of an interesting idea.

12:18 I don't know if it's a good idea or a bad idea, but it's interesting.

12:20 Yeah, this is a real interesting philosophy of how do I access the data in my database as an API.

12:29 Yeah.

12:30 And I would say there's sort of two really common ways.

12:35 And then there's a lot of abuse of what APIs look like and what you should do.

12:39 You know, just remote procedure calls and all sorts of randomness.

12:42 But the philosophy is I've got data in a database and I want to expose it over an API.

12:46 Do I go and write a bunch of different functions in FastAPI in this example where I decide, here's a way where you can find the recent users.

12:57 And you can then possibly take some kind of parameter about a sort or maybe how recent you want the users to be.

13:04 But you're writing the code that decides, here's the database query.

13:08 And it's generally focused on recent users.

13:11 Right.

13:12 That's one way to do APIs.

13:14 The other is I kind of want to take my database and just make it queryable over the Internet.

13:19 Right.

13:20 And this is with the right restrictions.

13:22 It's not necessarily a security vulnerability, but it's just pushing all of the thinking about what the API is to the client side.

13:29 Right.

13:29 So if I'm doing Vue.js, it's like, well, we'll wrap this onto our database.

13:33 And you ask it any question you can imagine as if you had a direct query line to the database.

13:38 Right.

13:38 So that's why you would do maybe the age greater than or you could do some of those filters where you say, give me all the users where the created date is less than such and such.

13:48 Yeah.

13:49 Or greater than such.

13:49 You know, that would basically be like the new users.

13:51 Right.

13:51 But it's up to the client to kind of know the data schema and talk to it.

13:55 And this, you know, this is that latter style.

13:57 If you like that.

13:58 Awesome.

13:59 You know, you can expose a relational database over SQLAlchemy or MongoDB through MongoEngine.

14:04 And it looks pretty cool.

14:05 My thoughts on where I probably, I mean, I'm not using this in production, but my thoughts on where I might use this, even disregarding, like, one of Brandon's concerns.

14:15 Brandon Brainer says exposing my API field names makes me nervous.

14:20 But there's a there's a part of your, oops, part of your development where you're not quite sure what queries you want.

14:28 So custom writing them.

14:30 Maybe you're not ready to do that or it'll be like a lot of back and forth.

14:34 So a great I think a great place to be for this would be when you're working with you've got your front end and your back end code, your API code, and you're you're trying to figure out what sort of searches you want.

14:46 And you can use something like this to have it be right in the actual API query.

14:52 And then once you figure out like all the stuff you need, then you could go back if you want to and hard code a different API endpoint with similar stuff.

15:02 Maybe.

15:02 I don't know.

15:02 Yeah.

15:03 Yeah.

15:03 And not everything's built the same.

15:05 Right.

15:05 Kim out there points out that many of the APIs that he uses or builds are for in-house use only.

15:11 Yeah.

15:11 Right.

15:12 And so it's just like instead of coming up very, very focused API endpoints, it's like we'll kind of just leave it open.

15:18 And people can use this service to access the data in a somewhat safe way, like a restricted way.

15:23 Yeah.

15:25 So it's what are you building?

15:26 Like, are you putting it just on the open Internet or are you putting it, you know, inside?

15:30 That's very true.

15:31 Yeah.

15:32 Like I've got a bunch of projects I'm working on that are internal and like I who cares if somebody knows what my data names are and stuff.

15:39 Right.

15:40 Well, and what is in it?

15:41 Are you storing social security numbers and addresses or are you still storing voltage levels for RF devices?

15:48 Exactly.

15:49 Oh, no.

15:49 The voltage levels have leaked.

15:51 Oh, no.

15:51 Right.

15:52 I mean, that flexibility might be awesome.

15:54 Yeah.

15:55 I mean, the end, like it's secretive.

15:58 We don't want it to get out in the public, but it's not like something that internal users are going to do anything with.

16:04 So, yeah.

16:05 Yeah.

16:05 Yeah, exactly.

16:06 Cool.

16:07 Well, yeah, that's really, really a nice one.

16:09 So, Brian, sponsor this week?

16:12 Yeah.

16:13 Microsoft for Startups Founders Hub.

16:15 But if you remember last week, we did an ad where we asked an AI to come up with the ad text for us.

16:25 In like an official sort of official sounding way.

16:27 Yeah.

16:27 Yep.

16:28 So, this week, you pushed it through the filter and said to try to come up with the wording in a hipster voice.

16:37 Right?

16:38 So, here we go.

16:39 Tell us about it.

16:39 With a hipster style.

16:40 I'll try.

16:41 Yo, Python Bytes fam.

16:43 This segment is brought to you by the sickest program out there for startup founders, Microsoft for Startups Founders Hub.

16:49 If you're a boss at running a startup, you're going to want to listen up because this is the deal of a lifetime.

16:56 Microsoft for Startups Founders Hub is your ticket to scaling efficiently and preserving your runway, all while keeping your cool factor intact.

17:04 With over six figures worth of benefits, the program is serious next level.

17:09 You'll get 150K in Azure credits, the richest cloud credit offering on the market, access to the OpenAI APIs, and the new Azure OpenAI service, where you can infuse some serious generative AI into your apps.

17:25 And a one-on-one technical advisor from the Microsoft squad who will help you with your technical stack and architectural plans.

17:33 This program is open to all, whether you're just getting started or already killing it.

17:39 And the best part, there's no funding requirement.

17:41 All it takes is five minutes to apply and you'll be reaping the benefits in no time.

17:45 Check it out and sign up for Microsoft for Startups Founders Hub at pythonbytes.fm/foundershub2022.

17:52 Peace out and keep listening.

17:54 It's insane the power of these AIs these days.

17:58 And, you know, if you want to get access to OpenAI and Azure and GitHub and all those things, well, a lot of people seem to be liking that program.

18:05 So it's cool.

18:06 They're supporting us.

18:07 Yeah.

18:08 Also cool that they're letting us play with the ad.

18:10 That's neat.

18:10 Yes.

18:10 With their own tools, indeed.

18:12 Okay.

18:12 What I got next, Brian, is stuff to take your code to the next level, brah.

18:18 Twelve.

18:19 But this sounds pretty interesting.

18:21 Twelve Python decorators to take your code to the next level.

18:24 Nice.

18:24 Decorators are awesome.

18:25 And they're kind of like a little bit of magic Python dust you can sprinkle onto a method and make things happen, right?

18:32 Now, about half of these are homegrown.

18:34 Half of those I'd recommend.

18:36 And then a bunch of them are also, the other half is maybe the built-in ones that come from various places.

18:41 So I'll just go through the list of 12 and you tell me what you think.

18:43 The first one that they started off with in this article doesn't thrill me.

18:47 It says, hey, I can wrap this function with this thing called logger and it'll tell me when it starts and stops.

18:51 Like, yeah, no, no thanks.

18:52 That doesn't seem interesting.

18:54 But the next one, especially if you're already focused on decorators and psyched about that, is the functools wraps.

19:01 Yeah.

19:02 Right?

19:02 Wraps is definitely, you've got to use it.

19:05 Yeah, it's basically required.

19:06 If you create a decorator, and they show you how to do that on the screen here, and you try to interact with the function that is decorated, well, you're going to get funky results.

19:14 Like, what is the function's name?

19:16 Well, it's the name of the decorator, not the actual thing.

19:18 What are its arguments?

19:19 It's star args, star star kwargs.

19:21 What is its documentation?

19:22 Whatever the name, the documentation of the decorator is and all that.

19:25 So with wrapper or with wraps, you can wrap it around and it'll actually kind of pass through that information, which is pretty cool.

19:32 So if you're going to do decorators, wraps, that's kind of a meta decorator here.

19:36 Yeah.
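
To make the point about `functools.wraps` concrete, here is a minimal sketch (not the article's code; `logged_plain`, `logged_wrapped`, and the `add_*` functions are made-up names) that compares a decorator written without `wraps` to one written with it:

```python
import functools


def logged_plain(func):
    # No functools.wraps: callers see the wrapper's metadata, not the original's.
    def wrapper(*args, **kwargs):
        """Wrapper docstring."""
        return func(*args, **kwargs)
    return wrapper


def logged_wrapped(func):
    # functools.wraps copies __name__, __doc__, and friends onto the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        """Wrapper docstring."""
        return func(*args, **kwargs)
    return wrapper


@logged_plain
def add_plain(a, b):
    """Add two numbers."""
    return a + b


@logged_wrapped
def add_wrapped(a, b):
    """Add two numbers."""
    return a + b


print(add_plain.__name__)    # wrapper
print(add_wrapped.__name__)  # add_wrapped
print(add_wrapped.__doc__)   # Add two numbers.
```

Both versions still compute the same result; only the metadata that tools like debuggers and documentation generators rely on differs.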

19:36 Another one I think is really cool.

19:38 Not for all use cases, not really great on the web because of the scale-out, cross-process story that often happens in deployment.

19:45 But if you're doing data science-y things or a bunch of repetitive processing, the LRU cache is like magic.

19:52 Unless you are really memory constrained or something.

19:54 Yeah.

19:55 Love LRU cache.

19:56 Yeah.

19:56 You just put it on a function and you say at LRU cache and you can even give it a max size.

20:01 And the idea is, as long as a fixed input gives you the same output every time,

20:07 then you can put the LRU cache on it.

20:10 The second time you call it with the same arguments, it just goes, you know what?

20:12 I know that answer.

20:13 Here you go.

20:14 And it's an incredibly easy way to speed up stuff that takes like numbers and like well-known things that are not objects, but can be tested.

20:21 Like, yeah, these are the same values.

20:22 And if you don't care about the max size, you can just use the decorator cache now.

20:26 You don't need to have the LRU part.

20:28 Oh, nice.

20:29 Great addition.
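
As a small illustration of the caching behavior described here, the sketch below uses the standard library's `functools.lru_cache` and `functools.cache` (the latter needs Python 3.9 or newer). The functions `slow_square` and `fib` and the `call_count` counter are invented for the example:

```python
import functools

call_count = 0  # counts how often the function body actually runs


@functools.lru_cache(maxsize=128)
def slow_square(n):
    """Pretend this is expensive; the result depends only on n."""
    global call_count
    call_count += 1
    return n * n


slow_square(12)
slow_square(12)     # served from the cache, the body does not run again
print(call_count)   # 1
print(slow_square.cache_info())


@functools.cache    # unbounded cache, no max size to think about
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))      # 832040
```

Arguments have to be hashable, which is why this works best with numbers, strings, and tuples rather than mutable objects.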

20:30 Next up, we have at repeat.

20:32 Suppose for some reason I want to call a function multiple times.

20:36 Like if I want to try to say, what if I call this a bunch of times just for, say, load testing or I want to just, you know, kind of during development.

20:45 I can't see this being used in any realistic way.

20:47 But you can just say this is one that they built.

20:49 You just wrap it and say repeat this n number of times.

20:52 That might be useful.

20:53 Yeah.

20:54 Time it.

20:55 So time it is one that you could create that I think is pretty nice.

20:59 Like this is one of the homegrown ones that I do think is good is a lot of times you want to know how long a function takes.

21:04 And one thing you could do is you could grab the time at the start.

21:07 Here it uses perf_counter, which is pretty excellent.

21:09 And then at the end, grab the time, print it out.

21:12 But then you're messing with your code, right?

21:14 It'd be a lot easier to just go, you know what?

21:16 I just want to wrap a decorator over some function and have it print out stuff.

21:20 Just usually during development or debugging or something, not in production.

21:23 But you're like, well, how long did this take?

21:25 So just yesterday I was fiddling with a function.

21:28 I'm like, if I change it this way, will it get any faster?

21:31 It's a little more complicated, but maybe there's a big benefit.

21:34 And I put something like this on there and like, yeah, it didn't make any difference.

21:37 So we'll keep the simple bit of code in place.

21:40 Yeah.

21:40 And if it's like super fast, you can also do things like loop it, like add a loop thing there

21:46 so that it runs like 100 times and then do the division or something.

21:50 That's a really good point.

21:51 And these are composable, right?

21:54 Decorators are composable.

21:55 So you could say at time it, at repeat 1000, call your function.

21:59 Oh, yeah.

21:59 Yeah.

22:00 Right?

22:00 I mean, all of a sudden repeat's starting to sound useful.
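
The composition idea can be sketched roughly like this. These are home-grown decorators written for illustration, not the article's exact code; `repeat`, `timeit`, and `build_list` are assumed names, and the timing uses `time.perf_counter` from the standard library:

```python
import functools
import time


def repeat(times):
    """Call the decorated function `times` times and return the last result."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator


def timeit(func):
    """Print how long one call of the decorated function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper


@timeit            # outermost: times everything underneath it
@repeat(1000)      # innermost: runs the function body 1000 times
def build_list():
    return [i * i for i in range(100)]


result = build_list()  # prints the total time for all 1000 runs
```

Order matters: with `@timeit` on top, the printed time covers all 1000 runs, so dividing by 1000 gives a per-call estimate. Swapping the two would print 1000 separate timings instead.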

22:03 They have a retry one for retrying a bunch of times.

22:06 No.

22:07 Tenacity.

22:09 Don't do that.

22:11 There's some that are really, really fantastic with many options.

22:14 Don't bother rewriting some of those because you've got things like tenacity that has exponential

22:19 back off, limiting the number of retries, customizing different behaviors and plans based on exceptions.

22:25 So grab something like tenacity.

22:27 But the idea of understanding the retries is kind of cool.

22:30 Thanks for reminding us about tenacity.

22:32 I forgot about that.

22:33 Yeah.

22:33 That's a good one, right?

22:34 Count call.

22:35 If you're doing debugging or performance stuff, you're just like, why does it seem like this

22:40 is getting called like five times?

22:41 It should be called once.

22:42 This is weird.

22:43 And so you could actually, they introduced this count call decorator that just every time

22:48 a function is called, it's now been called this many times, which sounds silly, but are

22:52 you trying to track down like an N plus one database problem or other weird things like

22:56 that?

22:56 If you don't really know why something bizarre is happening a ton of times, this could be

23:00 kind of helpful.

23:01 Yeah.
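
A call counter along these lines takes only a few lines. This is a sketch under assumed names (`countcall`, `load_user`), not the article's implementation; it stores the count as an attribute on the wrapper so it can be inspected afterwards:

```python
import functools


def countcall(func):
    """Report how many times the decorated function has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"{func.__name__} has been called {wrapper.calls} time(s)")
        return func(*args, **kwargs)
    wrapper.calls = 0  # counter lives on the wrapper function itself
    return wrapper


@countcall
def load_user(user_id):
    # Stand-in for a database query you suspect is running too often.
    return {"id": user_id}


for uid in (1, 2, 3):
    load_user(uid)

print(load_user.calls)  # 3
```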

23:02 Rate limited.

23:03 This one sounds cool as well.

23:05 Like I only want you to call this function so often per second and you can decide what

23:11 to do.

23:11 In this case, it says we're going to time.sleep.

23:14 I'm not so sure that makes a lot of sense, but if it was asynchronous,

23:16 you could await asyncio.sleep and it would cause no overhead on the system.

23:20 It wouldn't clog anything up.

23:21 It would just make the caller wait.

23:23 So there's some interesting variations there as well.
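
The asynchronous variation mentioned here could look something like the sketch below. It is an assumption-laden illustration, not the article's code: `rate_limited`, `ping`, and `main` are hypothetical names, and it assumes calls are awaited one after another (concurrent callers would need a lock around the bookkeeping):

```python
import asyncio
import functools
import time


def rate_limited(min_interval):
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""
    def decorator(func):
        last_call = None  # monotonic timestamp of the previous call, if any

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    # Yields to the event loop instead of blocking the thread.
                    await asyncio.sleep(wait)
            last_call = time.monotonic()
            return await func(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited(0.05)
async def ping(n):
    return n


async def main():
    start = time.monotonic()
    results = [await ping(i) for i in range(3)]
    return results, time.monotonic() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.3f}s")
```

While one caller is waiting in `asyncio.sleep`, other tasks on the same event loop keep running, which is the difference from the blocking `time.sleep` version.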

23:26 Keep scrolling.

23:27 And then some more built-in ones.

23:29 Data classes.

23:30 If you want to have a data class, just at data class, the class.

23:33 Brian, do you use data classes much?

23:35 Yes, quite a bit.

23:36 Nice.

23:37 I like my classes to be VC funded.

23:39 So I use Pydantic more often.

23:41 Let's see last week.

23:44 Congratulations to the Samuel team there.

23:47 But honestly, I typically use Pydantic a little bit more because I'm often going to use it with

23:52 FastAPI or Beanie or something over the wire.

23:55 But I really like the idea of data classes too.

23:58 All right.

23:59 A couple more.

24:00 Register.

24:00 Let me know if you know about this one.

24:02 I heard about it a little while ago, but I haven't ever had a chance to use it.

24:05 But the atexit module in Python, it has a way to say when my program is shutting down,

24:13 even if the user, like, Control-C's out of it, I need to make sure that I delete, say,

24:17 some file I created or call an API and tell it real quick.

24:21 Like, you know what?

24:22 We're gone.

24:22 Or I don't know.

24:23 Something like that.

24:24 Right.

24:24 You just need.

24:25 There's something you got to do on your way out.

24:26 Even if it's a force exit.

24:28 Yeah.

24:29 You can go.

24:30 I have.

24:30 Sorry to interrupt.

24:32 I have used this.

24:32 No, I do.

24:33 Yeah.

24:34 When did you use it?

24:34 What do you use it for?

24:35 Similar sort of thing.

24:36 I've got like some thing in the background that I want to make sure that we.

24:40 There's a little bit of cleanup that's done before it goes away.

24:44 But I just wanted to correct this.

24:46 This says from atexit import register and then decorate with register.

24:52 I think it looks better if you just import atexit and do the decorator as atexit.register

24:58 because it's better documentation.

24:59 I totally agree.

25:00 I totally agree.

25:01 There's a couple of things in this article where the code is a little bit.

25:06 No, it was the other article that I did that was a little bit that I talked about.

25:10 It was a little bit weird.

25:11 But I agree.

25:11 Keeping the namespace tells you like, well, what the heck are you registering for?

25:15 Right.

25:15 Yeah.

25:15 I think namespaces are a good idea.

25:17 I definitely use them.

25:18 But anyway, so you can just put this decorator on a function.

25:21 And when you exit, they show an example of some loop going, just while True.

25:24 And they Ctrl-C out of it.

25:25 It says, hey, we're cleaning up here.

25:27 Now, bye.

25:28 Yeah.

25:28 Which is, that's a pretty nice way to handle it instead of trying to catch all the use cases

25:32 with exceptions and try finallys and so on.
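A minimal sketch of the atexit pattern discussed here, written with the namespaced @atexit.register form. The resources list and the clean_up name are hypothetical placeholders for whatever a real program needs to release. Note that atexit handlers run on normal interpreter shutdown, including an unhandled Ctrl-C (KeyboardInterrupt), but not when the process is killed by a signal Python does not handle or exits through os._exit().

```python
import atexit

# Hypothetical stand-in for a temp file or connection the program must release.
resources = ["temp-file.lock"]


@atexit.register  # keeping the namespace shows what we are registering with
def clean_up():
    """Runs when the interpreter shuts down normally."""
    resources.clear()
    print("Cleaning up here. Now, bye.")
```

Because atexit.register returns the function unchanged, clean_up can still be called directly, which makes the handler easy to test.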

25:35 All right.

25:35 Property.

25:36 Give your fields behaviors and validation.

25:40 Getters, setters, and so on.

25:42 Love it.
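As a small illustration of a field with validation, here is a sketch using @property with a setter. The Temperature class and its limit check are made up for this example and do not come from the article.

```python
class Temperature:
    """Hypothetical example class: a field with validation via @property."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        """Getter: read the stored value."""
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        """Setter: reject values below absolute zero."""
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)

    @property
    def fahrenheit(self):
        """Read-only computed value derived from celsius."""
        return self._celsius * 9 / 5 + 32
```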

25:42 And singledispatch, I believe we've spoken about before, where you can give, basically,

25:47 you do argument overloads for functions.

25:51 So you can say, here's a function.

25:52 And here's the one that takes an integer.

25:54 And here's the one that takes a list.

25:55 And these are separate functions and separate implementations.

25:57 And you do that with that @singledispatch decorator.
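A short sketch of what that looks like with functools.singledispatch: one base function plus separate implementations chosen by the type of the first argument. The describe function and its messages are invented for illustration.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    """Fallback used when no more specific implementation is registered."""
    return f"something else: {value!r}"


@describe.register
def _(value: int):
    # Chosen when the first argument is an int.
    return f"an integer: {value}"


@describe.register
def _(value: list):
    # Chosen when the first argument is a list.
    return f"a list of {len(value)} items"
```

Dispatch only looks at the first argument, which is why the decorator is called single dispatch.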

26:00 You know, I actually always forget about this.

26:03 But I do too.

26:04 I kind of am glad I forget about it.

26:06 Because I think I would use it too much.

26:09 Maybe.

26:10 I used to love function overloading.

26:14 When I was doing C, C++, C# type stuff, I would really count on it.

26:18 And I thought I would miss it in Python.

26:20 And I haven't.

26:21 Well, I noticed that some people that convert to Python from C will just assume that it has

26:27 function overloading.

26:28 And it just doesn't work.

26:30 That's known as function erasure.

26:32 Function erasure.

26:33 The last one wins, right?

26:34 Yeah.
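To make the "last one wins" point concrete, here is a tiny sketch. The area function is hypothetical; the second def simply rebinds the name, so the one-argument version is gone.

```python
def area(side):
    """First definition: area of a square."""
    return side * side


def area(width, height):
    """Second definition rebinds the name `area`; the first one is discarded."""
    return width * height


print(area(2, 3))  # uses the second definition

try:
    area(2)  # the one-argument version no longer exists
except TypeError as error:
    print(f"TypeError: {error}")
```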

26:34 We talked about that last time.

26:35 Oh, no.

26:37 We talked about that when we talked on Talk Python, which maybe we'll mention at the end.

26:40 But yeah, last time we talked.

26:42 Yeah.

26:44 Anyway, those are the 12 that they put in the article.

26:46 Most of them are really great.

26:49 Some of them point you at things like tenacity, which is also really good.

26:53 So that's what I got.

26:54 Nice.

26:54 Well, I would like to talk about testing too a bit.

26:58 Let's talk about PyHamcrest.

27:00 So this topic is contributed by Txels on the socials.

27:08 So thanks, Txels.

27:09 So PyHamcrest, and the thought was, like, Brian talks about testing a lot.

27:15 So why haven't you covered this?

27:16 So what PyHamcrest is, is a declarative matcher-object, rule-matching thing that helps you

27:25 with asserts and stuff like that.

27:27 Have you used this?

27:27 I have not.

27:28 My first thought it was like some kind of menu item on a holiday dinner.

27:33 But no.

27:34 I literally only heard about this because you put it in the show notes.

27:38 So this is news to me.

27:39 The idea is instead of, like, all the asserts.

27:42 So you've got a whole bunch of assert things, like assert_that and equal_to,

27:46 and a bunch of hamcrest things that you can import.

27:48 So you can do things like, instead of saying assert the_biscuit == my_biscuit, you can

27:55 say assert_that(the_biscuit, equal_to(my_biscuit)).

27:58 So at first, so I've always thought asserts are fine.

28:02 Like, I get this for unittest.

28:04 But for pytest, do we need it?

28:06 Because you could just use assert in pytest.

28:09 Yeah.

28:09 However, I'm kind of easing up on that argument because I can see a lot of places where just

28:15 really, if you can make your assertions more readable in some contexts, then why not?

28:20 Sure.

28:21 And I don't know about this one.

28:23 But if it's got things like go through a list and assert everything as equal in the list,

28:28 right?

28:28 Yeah.

28:29 Or higher order things where it would be kind of complex to implement the test that is the

28:35 thing you want to assert.

28:36 Like, these three fields are equal of these three things, right?

28:39 Then it becomes a little less obvious.

28:40 And if this has a really nice story, it looks like it does.

28:45 Yep.

28:45 There's a whole bunch of matchers within it.

28:47 Like, for objects, it's like equal_to and has_length.

28:50 has_property.

28:51 has_properties is interesting.

28:54 So you could, like, assert on duck typing.

28:56 Hopefully, it has these values or something.

29:00 Numbers: close_to, greater_than, less_than.

29:03 Of course, these asserts are fine with this.

29:05 But the logical stuff, the logical and sequences is, I think, where I probably might use it.

29:11 Things like all_of or any_of or anything.

29:13 Or that's neat.

29:15 Like, all of these things are true.

29:17 And you can combine this with or.

29:20 Like, all of these or all of those or something.

29:23 Sequences: contains_inanyorder.

29:28 That's kind of interesting.

29:30 Yeah, nice.

29:31 has_items.

29:32 is_in.

29:32 Again, these are things that are testable in Python raw.

29:37 Like, just raw tests.

29:38 Not too bad.

29:40 But if it's more readable, sure, why not?

29:43 So there's some that are shown.

29:47 Like, especially with raising error.

29:48 Like, exceptions.

29:49 Oh, where did I get it?

29:51 Oh, the tutorial has a bunch of cool stuff in it.

29:53 Things like assert_that calling translate with_args curse words

29:59 raises a LanguageError.

30:00 Well, that's kind of neat.

30:01 Very naughty.

30:04 assert_that broken_function raises Exception.

30:06 Okay.

30:07 I mean, in pytest, you've got the raises thing with pytest.raises.

30:12 But it is.

30:13 Some people have a hard.

30:15 Like, it's not obvious.

30:16 And this maybe.

30:17 Maybe this looks better.

30:19 The.

30:20 This is kind of neat.

30:21 It use.

30:22 You can use assertion exceptions with async methods.

30:26 So it has a resolved item.

30:28 So you can say assert_that(await resolved(future), future_raising(ValueError)) or something.

30:35 Yeah.

30:36 Nice.

30:36 That's cool.

30:37 So.

30:37 Yeah.

30:38 So a lot of predefined matchers.

30:40 And I guess it has some syntactic sugar things like is_.

30:46 So just if it sounds better to have an is in there, you can add it.

30:50 So assert_that the biscuit is_ equal_to.

30:52 Doesn't do anything.

30:54 But it, like, sounds better.

30:56 So why not?

30:57 I guess.

30:58 If you wanted that to read like English, like, insert a no-op verb.

31:02 Yeah.

31:03 But.

31:04 But I guess I do want to highlight this because why not?

31:09 I mean, I'm.

31:10 Since I'm writing a lot of test code, I'm used to all the different ways you can check different equivalents of values or comparisons.

31:16 So I don't know how much I would use this.

31:20 But for.

31:20 I've seen a lot of people struggle with how to write an assertion.

31:24 And so having some help with the library, why not?

31:27 So this is pretty neat.

31:28 Yeah.

31:28 This totally resonates with me.

31:30 I like it.

31:30 So.

31:31 Well, that's our six items.

31:34 Six.

31:34 Four items.

31:35 Do you have any extras for us this week?

31:40 I do have a few extras.

31:42 Let me throw them in here.

31:44 First of all, it's a few weeks old.

31:45 I didn't remember to put it up here.

31:48 But Python 3.11.2 is out as well as 3.10.10.

31:54 And the alpha 5 of 3.12.

31:56 We're getting kind of close to beta, it feels like, for 3.12, which will be exciting because then we'll get real visibility into what's probably going to be happening for the next version of Python.

32:05 That's cool.

32:06 Yeah.

32:07 I'm testing for 3.12 already with our CI builds.

32:11 Nice.

32:13 For example, with 3.11.2, there were 192 commits since 3.11.1.

32:19 194, rather.

32:20 So that's pretty non-trivial right there.

32:22 And they link over to somewhere that looks, I don't know, what am I supposed to learn from that?

32:27 Here's the changes from 3.11 to 3.12.

32:29 So I always go to downloads, full list of downloads.

32:31 Scroll down to the particular version here and go to release notes.

32:36 And there you go.

32:37 That's probably what they should be linking to.

32:38 And here's all the things.

32:39 There's some that are in here that are things that you might actually care about.

32:43 Like, for example, fixed race condition while iterating over thread states in threading.local.

32:49 You might not want that in your code.

32:50 And various other things.

32:52 Yeah.

32:53 Look at all these changes here.

32:55 This is a lot.

32:55 Yeah.

32:56 Nice.

32:56 Go team.

32:58 Yeah.

32:59 Go team.

32:59 You might think, oh, it's just a dot plus one, plus 0.0.1 sort of thing to it.

33:06 But no, it's got some interesting changes.

33:09 As well as I haven't looked at what's happening in others, but maybe some of those are important enough to pull backwards, those fixes.

33:15 Also, more recent, as in eight days ago, we've got Django 4.2 beta.

33:22 Beta one.

33:23 And, you know, typically the philosophy is once it hits beta, the API should be stable, the features should be stable.

33:30 It's just about fixing bugs.

33:31 Doesn't always work out that way, but that's generally the idea.

33:33 So, basically, here's your concrete look at Django 4.2.

33:37 Yeah.

33:38 Right?

33:39 And 4.2 looks exciting.

33:40 Yeah, absolutely.

33:42 So, you know, they've got some release notes and various things about what's going on.

33:46 You can go check that out.

33:47 So, they got Psycopg 3.

33:50 So, Postgres support.

33:52 It now supports psycopg version 3.1.8 or higher.

33:56 You can update your code to use that as a back end.

33:59 I'm still using 2, so I better, I didn't know there was a 3.

34:02 It's the, no, careful, Brian.

34:04 psycopg2 is likely to be deprecated and removed at some point in the future.

34:08 Okay.

34:10 Comments on columns and tables.

34:12 So, that's kind of neat in the database model.

34:15 So, the ORM gets some love there.

34:16 No comment on that.

34:18 Yeah, no comment.

34:19 Very good.

34:20 Some stuff about the so-called BREACH attack.

34:22 I have no idea, but it seems to have to do with gzip.

34:24 So, check that out.

34:25 Another one that's interesting is in-memory file storage and custom file stores.

34:30 This is for making testing potentially faster.

34:32 So, if you're going to write some files as part of a behavior, you can say, just write them

34:37 in memory.

34:37 Don't have to clean them up.

34:38 And they write really fast.

34:39 Yeah.

34:40 It phenomenally speeds up testing.

34:42 It's good.

34:42 Yeah, I bet.
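As a hedged sketch of how the in-memory file storage might be switched on for a test run, here is a settings fragment using the STORAGES setting that Django 4.2 introduces. Putting it in a separate test settings module is an assumption of this example, not something stated on the show.

```python
# Fragment for a hypothetical test-only settings module (e.g. settings_test.py).
# Requires Django 4.2+, which adds both the STORAGES setting and InMemoryStorage.
STORAGES = {
    # Files written through the default storage stay in memory, so tests
    # leave nothing on disk to clean up.
    "default": {
        "BACKEND": "django.core.files.storage.InMemoryStorage",
    },
    # Overriding STORAGES replaces the whole dict, so keep the stock static files entry.
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```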

34:43 All right.

34:44 So, there's that.

34:45 And then also, I want to give a shout out.

34:48 I'll put it like this.

34:49 I want to give a shout out to an app real quick that people might find useful by way of a journey.

34:54 So, rewriting the Talk Python apps in Flutter, which all the APIs are Python, but we're having

35:00 apps on macOS, Windows, Linux, iOS, and Android.

35:04 That's really hard to do with Python.

35:06 So, Flutter is what we're using.

35:07 And it's going along really well.

35:09 Here's a little screenshot for you, Brian, to show you what we've got so far.

35:12 Isn't that cool?

35:13 Yeah.

35:14 Yes.

35:14 And another, like, here's the little app and stuff.

35:17 So, I think I'm really happy with how it's coming together.

35:18 I think it's going to be a better mobile app experience for, and an existing desktop experience

35:23 for, like, offline mode with the Talk Python courses.

35:25 Oh, cool.

35:26 Yeah.

35:27 So, that'll be really neat.

35:28 The thing I want to tell you about is something I just applied to it.

35:31 This thing called ImageOptim.

35:33 And what you can do is you can just take the top level of your project.

35:36 So, I did this for, say, the Talk Python training website.

35:39 I did this for the mobile app.

35:40 Just take the very top level project folder and just throw it on this app.

35:44 And it'll go find all the images, all the vector graphics, and everything, and minimize the heck out of them.

35:50 So, for example, when I did that on the mobile app, it went from 10 megs of image assets to 8 megs of image assets.

35:56 It's lossless.

35:57 Like, no one will know the difference other than me that I've done it.

36:00 And it dropped 20% of the file size, which is not the end of the world.

36:04 But given how much work it is, it's not too bad.

36:07 Well, the lossless part is the important bit.

36:09 So, that's pretty exciting.

36:11 Yeah, exactly.

36:12 So, it'll do things like if it's a PNG and it sees you're using a smaller color palette than what it's actually holding.

36:18 It's like, oh, we can rewrite that in a way that doesn't make it actually look different, but takes up less storage.

36:23 Basically, it's a wrapper over things like MozJPEG, pngquant, Pngcrush, Google Zopfli.

36:31 I don't know how to say these things.

36:33 But there are a bunch of lossless image manipulation tools, and it just applies those to all of them in a super easy way.

36:40 And this thing's open source itself.

36:41 Cool.

36:41 So, yeah.

36:42 Anyway, if people have websites out there, you know, consider just, like, take your website, throw it on here, and it'll tell you. You know, make sure it's all checked into Git.

36:51 Do this.

36:52 See what it says.

36:52 It gives you a little report at the bottom.

36:54 Like, you saved either 10K or you saved 5 megs, depending.

36:58 You can decide whether to keep the changes.

36:59 Yeah.

37:00 Cool.

37:00 Yep.

37:01 All right.

37:01 That's all my extras.

37:02 How about you?

37:03 I just have a couple.

37:04 Yesterday, I talked with you on Python Bytes.

37:08 No, on Talk Python about pytest tips and tricks.

37:11 And I just wanted to point out that the post is available for people to read if they want to go look through it.

37:17 And if you have comments, please, or questions, let me know, of course.

37:21 Also, in March, I think I've brought this up before, but I'll be speaking at PyCascades.

37:27 There's a picture of me without hair.

37:31 And I did stick up a blog post on pythontest.com, just a placeholder so that I can link the slides and code afterwards.

37:42 So that's up.

37:44 Yeah, awesome.

37:44 And that's it.

37:46 Yeah, that's going to be a really cool talk.

37:48 I think a lot of people are interested in how you share fixtures and build them for the team or cross project.

37:53 As well, it was really great to have you on Talk Python. We talked about a bunch of cool pytest things.

37:59 And that'll be out in a few weeks for people, if they don't want to watch the YouTube version.

38:01 And then we'll let people know when that's available.

38:05 But yeah, absolutely.

38:06 But hopefully they're all subscribed to Talk Python already anyway.

38:09 Of course, I'm sure they are.

38:10 Yeah, they are.

38:11 How about a joke?

38:13 Are we ready?

38:14 Yes, let's do a joke.

38:15 Let's do it.

38:16 So this one, this is a quick and easy one.

38:18 And for people listening, no pictures even.

38:20 This one comes from nixCraft on Twitter.

38:24 And it says, developers, let us describe you as a group, right?

38:28 Things, groups of things sometimes have weird names, right?

38:32 Like a group of wolves is called a pack.

38:35 A group of crows is called a murder.

38:37 We think we should call a group of developers, Brian.

38:40 That's hilarious.

38:42 A group of developers is called a merge conflict.

38:44 Isn't that good?

38:47 Yeah, it is.

38:48 The comments are pretty good.

38:49 If you scroll down here, some of them are silly.

38:54 Some are just like, yep.

38:56 Yeah.

38:58 Anyway, they're pretty good.

39:00 But yeah, a group of developers is called a merge conflict.

39:04 And so true it is.

39:05 You can even have a merge conflict with yourself.

39:07 Be a group of one.

39:08 How about a group of tech CEOs with social media accounts?

39:15 Be a lawsuit.

39:17 That's right.

39:18 An SEC investigation.

39:21 That's right.

39:22 Well, fun as always.

39:25 Thank you.

39:26 Yeah.

39:26 Thanks, everybody for showing up, as always.

39:29 And we'll see everybody next week.
