#294: Specializing Adaptive Interpreters in Full Color

Published Tue, Jul 26, 2022, recorded Wed, Jul 13, 2022

Watch the live stream:

Watch the live stream replay

About the show

Sponsored by Microsoft for Startups Founders Hub.

Michael #1: Specialist: Python 3.11 perf highlighter

via Alex Waygood
Visualize CPython 3.11's specializing, adaptive interpreter. 🔥
PEP 659 – Specializing Adaptive Interpreter
Specialist uses fine-grained location information to create visual representations of exactly where and how CPython 3.11's new specializing, adaptive interpreter optimizes your code.
Dark, rich colors indicate code with many quickened instructions (and, therefore, high specialization potential), while light, pale colors indicate code with relatively few specialization opportunities.
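
As a rough illustration of what the colors respond to, here is a sketch of the temperature-conversion example walked through in the episode. The function names are placeholders chosen for this sketch, and the comments about which operations get specialized follow the behavior described for CPython 3.11; other versions may behave differently.

```python
import math


def f_to_c(f: float) -> float:
    """Original form: mixes float and int operands."""
    x = f - 32        # float - int: described as not specialized in 3.11
    return x * 5 / 9  # float * int, followed by a division


def f_to_c_float(f: float) -> float:
    """Rewritten form: float-only arithmetic, with the constants grouped."""
    x = f - 32.0        # float - float can use a specialized instruction
    return x * (5 / 9)  # CPython folds 5 / 9 into a single float constant


# Both forms agree up to floating-point rounding.
for temp in (-40.0, 32.0, 98.6, 212.0):
    assert math.isclose(f_to_c(temp), f_to_c_float(temp), abs_tol=1e-9)
```

The rewrite only pays off in hot code paths, so profiling first is still the sensible order of operations.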

Brian #2: tomli “A lil’ TOML parser”

Fully compatible with TOML spec 1.0.0
This is the library that tomllib from Python 3.11 is based on, so great to use for Python 3.7-3.10 applications.
- We discussed Python 3.11 and PEP 680 on episode 273
Real Python has a great introduction for TOML in Python: Python and TOML: New Best Friends
- TOML as a config format, key-value pairs, data types
- using both tomli and tomllib
- Loading TOML documents into Python
- And writing and updating TOML docs programmatically, which, although cool, I think the bulk of users can kinda skip over. But the first 3 sections are an excellent reference.
- Tables are cool, with [name] and [name.subsection] syntax, as well as arrays of tables with [[name]] syntax. I didn’t know how to do that before this article.
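
A minimal sketch of the "load it and print it" approach, assuming a made-up config with a `[user]` table, a `[user.player]` sub-table, and a `[[server]]` array of tables (none of these names come from the article). The import falls back to tomli on Python versions that lack tomllib.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for older versions

# Made-up configuration, purely for illustration.
CONFIG = """
[user]
name = "brian"

[user.player]
level = 3

[[server]]
host = "alpha"

[[server]]
host = "beta"
"""

config = tomllib.loads(CONFIG)
print(config)  # tables become nested dicts, arrays of tables become lists of dicts

player_level = config["user"]["player"]["level"]
server_hosts = [server["host"] for server in config["server"]]
```

When reading from a file instead of a string, `tomllib.load()` expects the file to be opened in binary mode (`"rb"`).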

Michael #3: Pydantic V2 Plan

via Douglas Nichols and John Thagen
A very detailed plan
Goal to have all this done by the end of October, definitely by the end of the year.
Samuel Colvin is taking a sabbatical to work on this (sound familiar?)
Some details highlighted by John:
- Moving the core logic to Rust to drastically increase performance (17x) https://pydantic-docs.helpmanual.io/blog/pydantic-v2/#performance
- Strict mode (something I've wanted for a long time): https://pydantic-docs.helpmanual.io/blog/pydantic-v2/#strict-mode
- Cleaning up required vs nullable: https://pydantic-docs.helpmanual.io/blog/pydantic-v2/#required-vs-nullable-cleanup
- Naming cleanup: https://pydantic-docs.helpmanual.io/blog/pydantic-v2/#model-namespace-cleanup
This is a huge change, but overall it looks very promising for the community. It will likely require refactors by downstream users, so pinning pydantic using Poetry/pip-tools etc like always is a good idea.
Many things have Pydantic at the core, so this matters, including:
- FastAPI
- Beanie
- SQLModel
- Pydastic
- …
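
To make the lax versus strict distinction concrete, here is a toy validator written in plain Python. It is not Pydantic's API and does not reproduce its exact coercion rules; `validate_int` and its `strict` flag are hypothetical names used only to show the idea of refusing to coerce a string when strictness is requested.

```python
def validate_int(value, strict=False):
    """Toy validator: lax mode coerces numeric strings, strict mode refuses."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        raise TypeError(f"expected int, got {type(value).__name__}")
    try:
        return int(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"cannot interpret {value!r} as an integer") from exc


print(validate_int("123"))  # lax mode accepts the string form
try:
    validate_int("123", strict=True)
except TypeError as err:
    print(f"strict mode rejected it: {err}")
```

Lax behavior suits inputs such as query strings, which always arrive as text; strict behavior suits internal boundaries where a string in an integer field signals a bug.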

Brian #4: pikepdf

a Python library for reading and writing PDF files
Based on QPDF, which is written in C++.
Features:
- Supports password protected PDFs
- Creates linearized ("fast web view") PDFs
- Integrates with Jupyter and IPython notebooks for rapid development
Some cool uses
- copy pages from one PDF into another
- split and merge PDFs
- extract content
- replace content, such as replacing images, without altering the rest of the file.
Documentation mentions that if you only want to write PDFs, consider other libs, such as reportlab.

Extras

Brian:

pytest-check
- I’ve set up 2fa for my account, so now I have no excuse for not looking into feature requests and merge requests for pytest-check, other than like all the other things I’m doing.
- I don’t have data for the top 3,500 for the last 6 months, but there is a list of the top 5,000 for last 30 days.
- pytest-check is #1677 in the last 30 days.
- pytest is #72 on the same list.
- pydantic is #117
- There are 57 pytest plugins that show up in the top 3,500 python packages. (packages that start with “pytest-”)
- pytest-check is #20 of those. I guess it’s time to do another top plugins episode of Test & Code.

Joke:

Error, OK, I’ll check the logs

Episode Transcript

00:00 I am pulling off a very, very cool trick.

00:03 I just want to point out before we get started.

00:04 Okay.

00:05 On the Talk Python channel, I'm doing a podcast with Anthony Shaw and Shane from Microsoft

00:10 about Azure and Python and some CLI stuff they built and FastAPI.

00:14 And at the exact same time, I'm doing this one here.

00:17 They're both streaming live.

00:18 I don't know how that's happening.

00:21 The other one was recorded two months ago, and we couldn't release it because some of the things weren't finished yet.

00:27 So I hit go on that.

00:29 The real one, if you're bouncing around, the real one is here.

00:32 Okay.

00:33 So a joint is here.

00:34 Anyway, with that, you ready to start a podcast?

00:36 Yeah, definitely.

00:37 Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:42 This is episode 294, recorded July 12th, 2022.

00:47 I'm Michael Kennedy.

00:48 And I am Brian Okken.

00:49 It's just us this weekend, or this today.

00:51 It's just us.

00:52 Yeah.

00:52 Yeah, it's, I don't know.

00:55 Dean out of the audience asks, is this a daily podcast show now?

00:59 I'm a little bit torn about it.

01:01 I feel like we almost could do a daily show.

01:03 But then I think what it might take to do a daily show, knowing how much work a weekly show is.

01:08 No, it's not a daily podcast.

01:10 No.

01:10 It might be fun to do sometime.

01:12 Just do like a full week or something.

01:14 Right.

01:15 Exactly.

01:15 Just a super, there's so much news.

01:18 We're doing it every day for the week.

01:20 Cool.

01:20 But just like the same topics, like six days in a row.

01:23 Just do them over.

01:23 Yeah.

01:25 Exactly.

01:27 Exactly.

01:28 All right.

01:29 Am I up first this week?

01:30 You are.

01:31 Yes.

01:31 Right on.

01:32 Well, let me tell you about something special.

01:36 Specialist.

01:37 Okay.

01:38 Just last week, I believe it was, I interviewed Alex Waygood, who did the write-up for the Python

01:44 Language Summit.

01:44 And as part of the topics we were discussing, you know, the Python Language Summit and Python

01:49 this year is focusing a lot on performance and what's called the Shannon Plan.

01:54 So this is Mark Shannon's plan to make Python five times faster over five releases.

01:59 It's got a ton of support at Microsoft.

02:01 Guido van Rossum is there working on it, but they've hired like five or six other people who

02:06 are full-time working on making Python faster now.

02:08 So awesome.

02:09 Awesome.

02:10 Thank you for that.

02:11 However, one of the things that made Python 3.11 fast is some of the early work they did.

02:18 And it comes down to PEP 659, a specializing adaptive interpreter.

02:24 So let me tell you about this feature, this performance improvement first, and then we'll

02:30 see what Specialist is about because it's about understanding and visualizing this behavior.

02:34 Okay.

02:35 So one of the things that is a problem with Python, because it's dynamic and its types can change

02:42 and what can be passed could vary.

02:44 I mean, you could have type hints, but you can violate the type hints all day long and

02:48 it's fine.

02:48 So what the interpreter has to do is say, well, we're going to do all of our operations super

02:54 general.

02:54 So if I have a function and it's called add and it takes X and Y and it returns X plus Y

02:59 seems easy, but is that string addition?

03:02 Is that numerical addition?

03:05 Is that some custom operator overloading with a dunder add or whatever it is in some type?

03:12 If it fails in one way, you kind of got to reverse it.

03:14 Like there's all this unknown, right?

03:15 Yeah.

03:16 What if you knew, what if you knew those were integers and not classes or not strings?

03:22 You could run different code.

03:24 You wouldn't have to first figure out what they are.

03:26 Are they compatible?

03:27 Do you do the add in low level CPython internals?

03:32 Or do you go to like some Python class and do it?

03:34 Right.

03:34 You could be much more focused.

03:36 Yeah.

03:37 Additionally, if it was adding for a list, you could say, well, if I know they're lists,

03:41 what we just do is go list dot extend and we give it the other list, right?

03:45 We don't hunt around and figure out all this other stuff.

03:48 So that's the general idea of the specializing interpreter is it goes through and it says, look,

03:54 we don't know for sure what could be passed here.

03:57 But if it looks like over and over we're running the same code and it's always the same types,

04:03 is there a way we could specialize those types, right?

04:07 Is there a way that we could put specific code for adding numbers or specific code for combining lists?

04:14 And this is called adaptive and speculative specialization.

04:18 Okay.

04:19 Okay.

04:21 And my favorite part of it, when it's performed, it's called the quickening.

04:25 Quickening is the process of replacing slow instructions with faster variants.

04:30 So kind of like I said, it has some advantages over immutable byte code.

04:35 It can be changed at runtime.

04:36 Like you see, we're always adding integers.

04:39 It can use super instructions that span lines or take multiple operands.

04:43 And it does not need to handle tracing as it can fall back to the original byte code for that.

04:48 Okay.

04:49 So there's a whole bunch of stuff going on here.

04:51 Like the example they give is you might want to specialize LOAD_ATTR.

04:56 So LOAD_ATTR is a way to say, give me the value that this thing contains.

05:00 But what is the thing?

05:01 One of the things you might do is you might realize it's an instance of a class.

05:05 And then you would call LOAD_ATTR_INSTANCE_VALUE.

05:09 Okay.

05:09 You might realize it's a module and you might call LOAD_ATTR_MODULE or LOAD_ATTR_SLOT or so on, right?

05:14 But if you knew, you don't have to go through first the abstract step and then figure out which of these it is.

05:20 You just do the thing that it is.

05:21 Okay.

05:22 So that's the idea of this PEP.

05:24 This is one of the things that's making Python 3.11 faster.

05:27 Awesome.

05:28 So to the main topic.

05:30 Okay.

05:30 And I'll just, just as a note, I'm saying, okay, as if I understand what you just said,

05:35 but most of it just went.

05:37 It's all right.

05:38 I think, well, let's, let's look at pictures.

05:39 Okay.

05:40 All right.

05:40 So this thing by Brandt Bucher is called Specialist.

05:46 And it's about visualizing this specializing adaptive interpreter.

05:50 Okay.

05:50 Good.

05:51 Okay.

05:52 So it says Specialist uses fine-grained location information to create visual representations

05:57 of exactly where and how CPython 3.11's new specializing adaptive interpreter optimizes

06:04 your code.

06:04 And it's not just interesting.

06:06 It has actionable information.

06:08 So for example, see here, and if you've got to pull up this, the website, if you're just

06:14 listening, if you see in that website, you'll see some color.

06:18 You'll see green, less green, yellow, orange, and all the way to red.

06:23 So there's two aspects.

06:24 There's sort of a darkness as well as a color.

06:27 So the most, like where Python could take advantage of this feature, you see green

06:33 where it can't, you see red.

06:34 And imagine a spectrum.

06:37 It goes like green, yellow, orange, red.

06:40 So it's, it's not on or off.

06:42 It's how much could it specialize?

06:43 Okay.

06:44 Okay.

06:44 So what you see here, for example, is it's able to take some numbers and an integer and

06:52 a string, and then use the fact that it knows what those are to make certain things like

06:57 appending to output and doing some character operations on it.

07:02 Yeah.

07:02 Right.

07:03 It was able to replace that with a different runtime behavior because of this quickening.

07:07 All right.

07:07 So let's skip down here.

07:08 I gave you a bit of the background.

07:10 So it says, let's look at this example.

07:11 We have F to C, which converts Fahrenheit to Celsius.

07:15 And what it does is, okay, we're going to take an F and it has type hints that say float,

07:20 float.

07:20 Okay.

07:21 So, but those don't matter.

07:22 So it says, we're going to take an F and subtract 32 from it.

07:26 And then we're going to do simple math.

07:28 We're going to take that result, that range, that, that, size of temperature there based

07:33 on zero and then multiply it by five and divide it by nine.

07:36 And we all learned this in chemistry class or somewhere, where we talked about converting different

07:41 measurements.

07:42 Yeah, of course.

07:43 Yeah.

07:44 Right.

07:44 So these are straightforward, but there's actually problems in here that make it slower and prohibit

07:50 Python from quickening it as much as it can be quickened.

07:54 Okay.

07:55 So if we take this code, it just runs, F to C and C to F, and it gives it some test

07:59 values and says, just do it and tell us what happened.

08:02 We can run Specialist on it.

08:03 And it says, okay, this X here is, the green areas indicate regions of code that were

08:08 successfully specialized where red areas are unsuccessful.

08:13 Like it tried and it failed.

08:14 So it says one of the problems is start out the X equals F minus 32.

08:20 It says, well, we can quicken operations on numerical types that are the same, but for now

08:26 there's not a float int and float variant of this.

08:29 It's got to be float float.

08:30 Oh, right.

08:31 So it says, right.

08:32 You, you could have gotten a faster operation there, but because the types didn't match,

08:36 you won't.

08:37 But then what it did get out is an X and that was great.

08:39 an X, which is a float and it's going to do some stuff and it could sort of make it

08:43 better, but it said, look, here's some multiplication again by an integer and a float.

08:47 So that's not quickened.

08:48 And this division is apparently never quickened.

08:51 So what can we do?

08:52 Well, with that information to say, well, what's the problem with subtracting 32?

08:57 Well, it wasn't a float.

08:58 What if I said 32.0?

08:59 Oh, yes.

09:00 All right.

09:01 That gets replaced by faster code.

09:02 Oh, nice.

09:03 Right.

09:03 Yeah.

09:04 So that's pretty nice.

09:05 And if you want to return, it was adding like X plus 32 for the other direction.

09:09 And now it's 32.0.

09:10 That's faster.

09:11 Okay.

09:11 Well, what else?

09:12 What if we, now you can see when we did that part of the, conversion X times five divided

09:19 by nine, if we put a 5.0, that gets faster still, but the divide is never quickened.

09:24 Okay.

09:24 Well, what if we put the divide in parentheses?

09:26 It doesn't really matter if it's (X times five) divided by nine or X times (five divided by

09:32 nine), right?

09:32 It's, these are mathematically equivalent, but they're not equivalent to Python because

09:36 that, that operation results in, it leverages constant folding, right?

09:42 Five divided by nine is pre-computed in Python to be a float.

09:46 Okay.

09:46 Right.

09:47 At parse time, right?

09:48 That's just how it works with constants.

09:49 If it says it can do math with constants ahead of time, it does it.

09:51 So that becomes a float.

09:53 And then float times float is now quickened.

09:54 Right.

09:55 Isn't this cool the way you can apply this and actually make your code faster, not just

09:59 go, Oh, it's interesting.

10:00 It must be quickening it there, but it's actionable.

10:02 It is really pretty cool.

10:04 And I'd really like to see this incorporated into an editor or something to say you, your

10:09 code will be faster if you just add a point zero here or something like that.

10:13 And it's going to become a float anyway.

10:15 It doesn't matter.

10:16 It just, why would you write 32.0 when you just meant 32 seems more precise to say 32.

10:22 Cause I'm used to doing that to thinking if it's okay.

10:26 Well, me personally, I, if I know it's going to be a float math, I usually do point zero,

10:30 but maybe, maybe that's not a normal thing.

10:32 You're such a C programmer.

10:34 All right.

10:39 Well, I think this is really cool.

10:40 This is a specialist.

10:41 And, you know, I don't know if I have any code that does math at that fine-grained a level

10:47 that I really care, but maybe, you know, if you're in charge of a library where you've

10:51 got a tight loop or you do a lot of math science stuff where it matters, this can be really

10:55 useful.

10:56 And what's cool is it's not like, and switch to Rust or switch to C or switch to Cython

11:02 and it'll take effect.

11:03 Like, no, this, this is like straight Python code.

11:06 This is just, how do I take most advantage of what is already happening for performance

11:10 boosts in 3.11 that we haven't had before?

11:13 I think, and I think it's going to be just one more workflow step.

11:16 So you've got, you profile your code, your whole thing is a little bit slower than you'd

11:22 like it to be.

11:22 You throw a profiler on it.

11:24 You see the bottleneck areas that you could improve.

11:27 And you think, should I like rewrite some of this in Rust or C or, you know, what should

11:32 I do?

11:32 Well, first off, let's try doing this, like throw, throw, throw this at it and, and, and

11:39 have the optimizer from 3.11 help you out.

11:42 And, and yeah, so I think this, I can definitely see that this is going to be part of people's

11:49 workflow, but yeah.

11:50 I agree that you want to profile it.

11:52 Yes, exactly.

11:54 Cause while it's fun to do this.

11:56 Yeah.

11:57 Only focus where it's going to matter.

11:58 Don't, don't optimize a bunch of stuff that doesn't.

12:01 So Brian out in the audience says, different Brian, is there a plan to do lossless type

12:06 conversion or maybe Flake8 can make this kind of suggestion?

12:09 Yeah, exactly.

12:10 That'd be fair.

12:12 Yeah.

12:12 I'm not really sure if you don't want to write the code where you get different outputs probably.

12:18 Right.

12:18 But everything that was happening here, you were, you ended up with the same outcome anyway.

12:23 It's just like, well, do I do the division first or the multiplication?

12:27 Or do I start with an int that results after some addition or subtraction with a float?

12:31 Or do I just make them all floats?

12:33 Right.

12:33 I feel like it's, in most cases, it shouldn't be changing the outcome.

12:38 So.

12:38 Okay.

12:39 Yeah.

12:40 Yeah.

12:40 Cool.

12:41 Anyway, that's, that's what I got for the first one.

12:43 How about you?

12:44 Well, we're kind of sticking with a 3.11 theme so far.

12:48 Well, we can use TOML now, but in 3.11, we are going to have a TOML parser that'll be

12:56 part of Python 3.11 with PEP 680.

13:00 And we covered that in episode 273.

13:02 But I, I, one of the things we didn't mention was that, was that tomllib is, is,

13:11 and I think we did mention it, it's based on tomli, but tomli you can use right now.

13:15 So a lot of projects are switching to use tomli as a, as their TOML parser, to, to

13:23 read, read, like pyproject.toml or, or read their own, config file.

13:28 And, and so I just wanted to highlight it.

13:32 tomli is the, "a lil' TOML parser."

13:36 it's a cute little thing on the project.

13:39 It's cute.

13:39 But, but I was reminded of it because, Real, the Real Python people, put

13:46 out, actually looks like Geir.

13:49 Sorry, I'm not going to try to pronounce that name.

13:51 Real Python wrote an article called Python and TOML: New Best Friends.

13:58 And I really love it's a, it's a very comprehensive article.

14:02 but I really love at least the first three parts of it, using TOML as a config format,

14:07 getting to know key value pairs and, load TOML with Python, because this is kind

14:13 of what you're going to do with it.

14:15 You're going to write config files for something.

14:17 And I just kind of, it's, this is a great introduction to TOML for Python.

14:21 And that's kind of what we care about.

14:23 Right.

14:23 So, it goes through like just getting, getting used to what TOML looks like, what a

14:30 config file looks like talking about how all the keys, even if you, it's like key value

14:35 stuff.

14:35 And even if you, you put a number there or something, it's going to be a string.

14:39 All the keys get converted to strings, even if they don't look like them.

14:42 and they are, they're, they're UTF-8.

14:46 So you can use, Unicode in there as well, which is kind of neat.

14:50 put your emojis in there.

14:52 Yeah.

14:53 Well, can you, is, are emojis UTF-8?

14:56 I think mostly, many of them are interesting.

15:00 That'd be fun to put, put emojis in your, I don't know.

15:05 What mode we were in.

15:06 Are we running in cow mode or lizard mode?

15:08 I'll do lizard.

15:09 Yeah.

15:09 Okay.

15:09 Well, if you're running in lizard mode.

15:11 Okay.

15:11 I got to try that to see.

15:14 I should have done that before.

15:16 Oh my gosh.

15:16 I think almost it's both horrible and amazing to imagine writing like config files to like

15:21 put it and put it in lizard mode.

15:23 Do it.

15:23 Yeah.

15:24 one of the things that I didn't before reading this article, one of the things I

15:28 didn't know you could do in TOML, because I just sort of cursorily use it with

15:32 pyproject.toml and that's about it, but you can do, so, it talks about, normally, how

15:40 to read stuff.

15:40 But one of the things is, Oh, what was I going to talk about?

15:44 Arrays, and you can do arrays of things which are neat and tables and arrays of tables,

15:50 which is like, so you have arrays of tables, which are these double-bracket things.

15:56 And, and then you can do dot stuff.

15:59 So if you have like, was it user and user dot player, these will show up as,

16:06 as like, you know, sub dictionary key things.

16:10 and so one of the things that I, and I played with it this morning and, it really,

16:15 I should have had a something to show, but the thing I like to do is to just read it.

16:20 just like, this article talks about reading it, just read the TOML file into Python

16:26 and print it.

16:27 and then you can, and it'll print out as a dictionary and then you can create whatever

16:33 format you want for your TOML file.

16:34 And then you can just see what it's going to look like.

16:36 And then you know how to access it.

16:39 That's one of the best ways to do that.

16:40 That's awesome.

16:41 Yeah.

16:42 What an interesting format.

16:43 That's pretty, that's pretty in depth.

16:45 And, a blast from last week past Ashley.

16:49 Hey, Ashley says UTF-8 can encode any Unicode character emoji, your heart emoji.

16:54 Heart out of it.

16:55 Very.

16:56 Oh yeah.

16:57 You could do like, you know, is it in heart mode?

16:59 Heart equals true.

17:00 Heart equals false.

17:01 oh, for optimize optimizer, you could do a flame emoji equals true.

17:06 exactly.

17:07 So I love it.

17:08 Yeah.

17:09 I think, look, we have not leveraged the configuration as emoji sufficiently.

17:13 Oh yeah.

17:14 I think, I think pytest should rewrite all of its configs as emoji items.

17:18 Just do a PR.

17:19 I'm sure we'll take it.

17:20 Yeah.

17:23 It'd be good.

17:23 All right.

17:24 Yeah.

17:24 All right.

17:25 Let me tell you about our sponsor for this week before we move on.

17:27 So this week is brought to you by Microsoft Founders Hub.

17:31 In fact, they are supporting a whole bunch of upcoming episodes.

17:34 So thank you a whole bunch to Microsoft for startups here.

17:37 Starting a business is hard. By some estimates,

17:39 over 90% of startups go out of business within their first year.

17:43 With that in mind, Microsoft for startups set out to understand what startups need to be

17:47 successful and to create a digital platform to help overcome those challenges.

17:51 Microsoft for startups Founders Hub.

17:53 Their hub provides all founders at any stage with a bunch of free resources to help solve

17:58 challenges.

17:59 And you get technology benefits, but also really importantly, access to experts, guidance and

18:04 skilled resources, mentorship and networking connections and a bunch more.

18:08 So unlike a bunch of other similar projects in the industry, Microsoft for Startups Founders

18:14 Hub does not require startups to be investor backed or third party validated to participate.

18:19 It's free to apply.

18:21 And if you apply, get in.

18:23 Then it's you're in.

18:24 It's open to all.

18:25 So what do you get if you join or apply and then get accepted?

18:28 So you can speed up your development with access to GitHub, Microsoft cloud, the ability to unlock

18:32 credits over time, as in it gets over a hundred thousand dollars worth of credits over time over

18:38 the first year.

18:38 If you meet a bunch of milestones, which is fantastic.

18:41 Help your startup innovate.

18:42 Founders Hub is partnering with companies like OpenAI, the global leader in AI research and

18:47 development to provide benefits and discounts too.

18:50 Yeah.

18:50 Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know.

18:54 You'll have access to the mentorship network, giving you all access to a pool of hundreds of mentors across a range of disciplines.

19:01 Areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points.

19:07 To me, that's actually the biggest value is the networking and mentor side.

19:12 So you'll be able to book a one-on-one meeting with these mentors, many of whom are founders themselves.

19:17 Make your idea a reality today with the critical support you'll get from Microsoft for Startups Founders Hub.

19:22 Join the program at pythonbytes.fm/foundershub.

19:25 Link will be in your player's show notes.

19:27 Nice.

19:27 Yeah.

19:28 Cool.

19:28 Indeed.

19:29 All right.

19:30 I guess I'm up next with this order we got.

19:32 And oh my goodness, Samuel Colvin, take a bow.

19:35 Because he put out a plan for what's happening with Pydantic version two.

19:42 But the reason I say take a bow is this is one detailed plan that is really, really thought through, thought out, backed up with a bunch of GitHub discussions and so on.

19:52 Wow.

19:53 So the idea is Pydantic started out as an interesting idea.

19:57 And surprise, surprise, a bunch of people glommed onto it probably more than it was originally envisioned to be so.

20:04 So, for example, SQLModel from Sebastián Ramírez is like, Pydantic models are now our ORM to the database with all the interesting stuff that ORMs have.

20:14 And Roman Right said, guess what?

20:16 We could do that for MongoDB as well.

20:18 Same with the Pydastic thing we recently spoke about.

20:21 And then Sebastián Ramírez is like, also, like, hey, FastAPI, this can be both our data exchange as well as our documentation.

20:28 I was like, oh my goodness, what's going on here?

20:31 So there's a bunch of stuff on the insides that could be better, let's say, or maybe time to rethink this.

20:39 So in this plan, it talks about what they'll add, what they'll remove, what will change, some of the ideas for how long it will take, and so on.

20:46 Yeah.

20:47 Here's a pretty significant thing.

20:49 I'm currently taking a kind of sabbatical after leaving my last job to work on this, which goes until October.

20:55 So that's a big commitment to I'm going to help make Pydantic better.

21:00 Oh, wow.

21:01 It sounds familiar.

21:02 It sounds a bit like Rich and Textual and those types of things as well.

21:07 But this is a big, big commitment from Samuel, and he's really doing a ton of work.

21:11 It says, people seem to care about my project.

21:14 It's downloaded 26 million times a month.

21:17 Wow.

21:18 Which is insane.

21:19 Yeah, it's awesome.

21:20 That's kind of incredible.

21:22 It is.

21:22 And so it says, here's the basic roadmap.

21:24 Implement a few features in what's now called the Pydantic core.

21:28 We just had Ashley, who, as we saw, is out in the audience.

21:31 Hey, Ashley.

21:32 Who gave a bit of a shout out to this feature.

21:34 And also, I do want to also credit a couple other people because Douglas Nichols and John

21:39 Thagen also let me know that this was big news coming.

21:42 So thank you all for that.

21:43 The Pydantic core is being rewritten in Rust, which doesn't mean you have to know or do anything.

21:49 It just means you have to pip install something.

21:51 You get a binary compiled thing that runs a lot faster.

21:54 Okay, so more on that in a second.

21:56 First, they're working to get 1.10 out and basically merge every open PR that makes sense

22:01 and close every PR that doesn't make sense.

22:04 And then profusely apologize to why your PR that you've spent a long time making was closed

22:09 without merging.

22:10 Some other bookkeeping things.

22:12 Start tearing the Pydantic code apart and see how many existing tests can still be made

22:17 to pass and then eventually release Pydantic V2.

22:19 The goal is to have this done by October, probably by the end of the year for sure.

22:22 A couple of things worth paying attention to.

22:24 There are a bunch of breaking changes in here.

22:26 A lot of things are being cleaned up, reorganized, renamed, some removed.

22:31 Like from_orm, people might be using that with SQLAlchemy.

22:34 That's being removed, for example, and so on.

22:37 So there's, if you depend heavily on Pydantic, especially if you build a project like Beanie

22:43 that depends heavily on Pydantic, you're going to need to look at this because some of the

22:46 stuff won't work anymore.

22:47 But let's highlight a couple of things here.

22:49 Performance.

22:50 This one is really important because this is the data exchange level for FastAPI.

22:57 This is the database transformation level.

23:00 When I do a query from the database, what comes back comes back in some raw form and then is

23:03 turned into a Pydantic model.

23:05 And those are computationally expensive things that happen often.

23:09 And in general, Pydantic version 2 is about 17 times 1,700% faster than V1 when validating

23:17 models in a standard scenario.

23:19 It says between 4 to 50 times faster than Pydantic 1.

23:22 Hmm.

23:23 Wow.

23:24 That's cool, right?

23:24 Yeah.

23:25 That alone should make your ears perk up and go, excuse me, my ORM just got 17 times faster.

23:30 Wait a minute.

23:31 I'm liking this.

23:32 I know that this is not the only thing that happens at ORM level, but the ones that, the

23:37 ones I called out that depend heavily on it, like that's in the transformation path.

23:41 So this is important.

23:42 Yeah.

23:43 This is actually, I'm super impressed.

23:46 I have not, I normally don't even see this sort of advanced planning in commercial projects.

23:51 Yes.

23:52 Oh yeah.

23:53 You could do a whole business startup that doesn't have the amount of thought that went

23:57 into like what's happening in the next version of Pydantic.

24:00 It's ridiculous.

24:00 Yeah.

24:01 It's incredible.

24:02 I mean, I was serious when I said take a bow.

24:04 It really lays out, opens a discussion about certain things and so on.

24:09 So like another one is strict mode.

24:11 I think I even saw a comment in the chat about it.

24:14 So one of the things I actually like about Pydantic, but under certain circumstances,

24:19 I can see why you would not want it is if you have something you say is an integer field

24:23 and then you pass 123, the number, great.

24:26 But if you also pass "123", the string, Pydantic will magically parse that for you.

24:31 Like this happens all the time on the internet.

24:32 Like a query string has a number, but query strings are always strings.

24:35 There's no way to have anything but strings.

24:37 Yeah.

24:38 So you got to convert them.

24:39 Right.

24:39 So this automatically does that.

24:41 But if you don't want that to happen, you say, you gave me a string.

24:44 It's invalid.

24:45 You can turn on strict mode, which is off by default, I believe.

24:48 There's also a bunch of plain.

24:49 Go ahead.

24:49 So strict mode does the conversion or strict mode?

24:53 Strict mode won't do the conversion.

24:54 It says, you said it's an int.

24:56 You gave me a string.

24:57 Nope.

24:57 Rather than, could it be an integer?

25:00 Let's try that first.

25:01 You know what I mean?

25:02 Yeah.

25:03 Maybe one of the things you do is, in the ORM level, one of those things, you might put it in strict mode so it doesn't do as much work trying to convert stuff.

25:11 I don't know if that actually would matter.
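
The difference between the default (lax) behavior and strict mode described above can be sketched in plain Python. This is only an illustration of the idea, not Pydantic's implementation: the helper names lax_int and strict_int are hypothetical, and only urllib.parse.parse_qs is a real standard-library call.

```python
from urllib.parse import parse_qs


def lax_int(value):
    """Lax behavior: accept ints, and strings that parse cleanly as ints."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value)  # raises ValueError for strings like "abc"
    raise TypeError(f"cannot convert {type(value).__name__} to int")


def strict_int(value):
    """Strict behavior: accept only real ints, never convert."""
    if isinstance(value, int):
        return value
    raise TypeError(f"expected int, got {type(value).__name__}")


# Query-string values always arrive as strings, even when they look numeric.
params = parse_qs("page=123")
raw_page = params["page"][0]

# Lax conversion turns the string into a number; strict_int(raw_page)
# would raise TypeError instead.
page = lax_int(raw_page)
```

In a web handler the lax path is usually what you want for query strings, while a stricter check may suit data that should already be typed.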

25:12 But formalizes a bunch of conversions.

25:15 It has built-in JSON support and different things.

25:18 Another big thing is this Pydantic core will be able to be used outside of Pydantic classes now.

25:27 So you can get a significant performance improvement for stuff like adding validation to data classes or validating arguments and query strings or a TypedDict or a function argument or whatever.

25:39 Yeah.

25:40 Let's see.

25:41 Next up.

25:42 And let's see.

25:44 This one.

25:45 Strict mode.

25:46 We talked about strict mode.

25:47 Another one is required versus nullable.

25:50 There's a little bit of ambiguity of, you know, if you said something's a string, that means it's required and it can't be none.

25:56 If you say it's a string pipe None or it's an optional string or something like that, then basically the behaviors were a little bit different.

26:06 So originally, I think this is when typing was pretty new.

26:09 It said Pydantic previously had a confused idea of required versus nullable.

26:14 This mostly resulted from Sam's misgivings about marking a field as optional but requiring a value to be provided to it but allowing it to be set to none or something along those lines.

26:25 Anyway, there's minor changes around that.
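
The required-versus-nullable distinction can be illustrated with standard-library dataclasses, which already separate "must be provided" from "may be None". This is a sketch under the assumption that the plan moves in the same direction (an Optional field without a default is still required); the class and field names are hypothetical, and dataclasses do not validate types the way Pydantic does.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class Example:
    name: str                    # required, and None is not intended
    nickname: Optional[str]      # required, but None is an allowed value
    note: Optional[str] = None   # not required; defaults to None


# A field is required when it has neither a default nor a default factory.
required = [
    f.name
    for f in fields(Example)
    if f.default is MISSING and f.default_factory is MISSING
]
```

Here Example(name="a", nickname=None) is accepted, while Example(name="a") fails because nickname was never provided, even though its type allows None.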

26:28 Let's see.

26:29 Final one that I want to cover is namespace stuff.

26:32 And this is like a whole bunch of things are now getting renamed.

26:36 So for example, if you implemented or overrode validate_json, it's now model_validate_json.

26:43 If you had is_instance, it's now model_is_instance.

26:45 Okay.

26:46 There's a bunch of these changes all over the place.

26:48 Yeah.

26:48 That look like they're going to cause breaking changes.

26:51 They're easy to fix.

26:51 Just change the name.

26:52 But, you know, it's not nothing.

26:54 Also, parse_file.

26:56 I still love his candor here.

26:59 parse_file.

27:00 This was a mistake.

27:01 It should have never been in Pydantic.

27:02 We're removing it.

27:03 Okay.

27:04 parse_raw.

27:04 Partially replaced by this other thing.

27:07 Anything else it did was a mistake.

27:09 from_orm, this has been moved somewhere else.

27:11 Schema and so on.

27:13 So you just, like, there's a lot of stuff that people are using here.

27:15 So just have a look.

27:16 Try it out.

27:17 Don't just go, oh, then version 2 is out.

27:19 Is this going to work?

27:19 Like, this is going to have some significant changes.
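
One way to "have a look" before upgrading is to scan your own code for the names mentioned above. The sketch below is a hypothetical helper, not a tool from the Pydantic project, and it only knows the renames and the removal named in this episode; the plan was still open for discussion, so the final names may differ.

```python
import re

# Old name -> new name, as mentioned in the episode (assumed, may change).
RENAMED = {
    "validate_json": "model_validate_json",
    "is_instance": "model_is_instance",
}
# Methods described as going away entirely.
REMOVED = {"parse_file"}


def find_old_calls(source):
    """Return (line number, old name, advice) for each affected method call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Match ".name(" so that model_validate_json is not mistaken
        # for validate_json.
        for name in re.findall(r"\.(\w+)\(", line):
            if name in RENAMED:
                findings.append((lineno, name, f"rename to {RENAMED[name]}"))
            elif name in REMOVED:
                findings.append((lineno, name, "removed in the plan"))
    return findings


sample = (
    "user = User.validate_json(data)\n"
    "cfg = Config.parse_file(path)\n"
    "ok = User.model_validate_json(data)\n"
)
report = find_old_calls(sample)
```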

27:21 And another reason why it's really awesome that he goes through so much detail is because

27:27 there's going to be stuff that breaks.

27:29 So it's a breaking interface change.

27:32 And so, yeah, it's cool that it's this detailed.

27:36 And a couple things to notice.

27:38 Let's see.

27:40 Somebody else in the chat mentioned.

27:42 Richard mentioned.

27:44 And he has emojis in the headers.

27:46 Yeah, there's emojis in the headers.

27:48 And I got to say, like, the navigation in the table of contents, very cool.

27:54 It goes to, like, light gray for areas you've already seen.

27:59 And then.

27:59 Oh, that's interesting.

28:00 It's a cool thing.

28:01 Yeah, it's quite cool.

28:03 I think it went on and on.

28:05 But two real quick things.

28:06 One, there'll be no pure Python implementation of the core.

28:10 It's always Rust.

28:11 But they list out the platforms where it'll be compiled to, including WebAssembly.

28:15 Oh, nice.

28:16 They previously had some Cython in what was supposed to be pure Python Pydantic.

28:22 And so now a kind of bonus is the Pydantic model, the Pydantic package, becomes a pure Python package, whereas previously it wasn't.

28:30 So they've taken, like, all of that behavior and put it under this core thing that ships as a Rust binary.

28:35 And now instead of doing some Cython middle ground, it's pure Python again.

28:40 So that's interesting refactoring, I think.

28:42 Yeah.

28:43 Yeah.

28:43 And finally, documentation.

28:45 When you get a validation error, it gives you a link to the documentation in the JSON error message.

28:52 That's pretty cool.

28:53 Yeah.

28:54 That's nice.

28:55 All right.

28:56 Yeah.

28:56 Anyway, that's quite a plan, isn't it, Brian?

28:58 Yeah.

28:58 Quite a plan.

28:59 All right.

29:00 Well, I'm excited for it.

29:01 Okay.

29:03 Well, next topic is a little more lighthearted.

29:07 It's about fish.

29:09 Pike, to be specific.

29:12 No, it's about PDFs.

29:15 So it's just a cool project I noticed.

29:19 pikepdf.

29:20 It's a Python library for reading and writing PDF files.

29:24 What's the big deal?

29:25 We've had these before.

29:26 But this is, it's based on QPDF, which is a C++-based library.

29:33 And it's presently still being maintained.

29:38 So it's kind of pretty fast.

29:41 Well, actually, I'm assuming it's fast if it's C++ in the background.

29:44 Yeah.

29:44 But it's also pretty just nice and elegant to do things.

29:51 And the documentation has this nice fish, which is good.

29:56 I always like cool diagrams, cool logos.

29:59 But some of the neat things that you can do with it.

30:03 So it's recommending that you not use it if you're just writing PDF files.

30:08 That there's other things that you can use.

30:12 What was it?

30:12 Like ReportLab to write PDFs.

30:14 But if you're having to read or modify PDFs, then this is where it shines.

30:19 You can do things like copy pages from one PDF to another.

30:22 Split and merge PDFs.

30:24 Extract content out of PDFs.

30:27 Like if you're using it for data stuff.

30:30 You get a report in PDF and you're trying to pull the information out.

30:34 You can use it for that.

30:37 Or images.

30:37 You can pull all the images out of a PDF file.

30:39 Or this is kind of cool.

30:41 You can replace images in a PDF file and generate a new one without changing anything else about the file.

30:47 It's kind of neat.

30:48 So just kind of a neat, if people are working with reading or modifying PDF files, maybe check this one out.
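
As a rough illustration of the merge use case, here is a minimal sketch following the pattern shown in the pikepdf documentation (Pdf.new, Pdf.open, pages.extend, save). The function name merge_pdfs and the file paths are hypothetical, and pikepdf is a third-party package that has to be installed separately.

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDFs into one new file."""
    # Imported here so this module loads even where pikepdf is not installed.
    import pikepdf  # third-party: pip install pikepdf

    merged = pikepdf.Pdf.new()
    for path in input_paths:
        # The source stays open until the merged file has been saved.
        src = pikepdf.Pdf.open(path)
        merged.pages.extend(src.pages)
    merged.save(output_path)


# Hypothetical usage:
# merge_pdfs(["report-part1.pdf", "report-part2.pdf"], "report.pdf")
```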

30:57 Yeah, this looks great.

30:57 The fact that it's in C++, I'm guessing it's probably standalone.

31:01 I remember I've done some PDF things before and it felt like I had to install some OS level thing that it shelled out to.

31:08 So this is cool.

31:09 Nice, on the README, it has a comparison of some of the different PDF libraries that you could use.

31:19 And some of the reasons why you might want this one, like it supports more versions.

31:24 I didn't realize that one of these libraries I've heard of before, pdfrw, doesn't support the newer versions.

31:31 So bummer.

31:33 And then also password-protected files, it supports that.

31:39 Except for, but not public key ones, but just normal passwords.

31:42 Straight passwords, yeah.

31:44 Yeah.

31:44 That's great.

31:44 So it's kind of neat.

31:45 Also like the measure of actively maintained, the commit activity per year over the year or something like that.

31:52 Oh, right.

31:53 That's kind of interesting.

31:54 Yeah, it's an interesting metric.

31:55 It seems good.

31:56 I haven't really thought about it lately, but.

31:58 Yeah.

31:59 Nice.

31:59 All right, yeah, this is a great one.

32:01 Well, so that's it for our main items.

32:04 Yeah, what else you got?

32:06 Any extras?

32:07 Well, last week, we talked about the critical packages.

32:13 Critical packages.

32:14 Or at some recent.

32:16 Yeah, last week, we talked about critical packages.

32:20 Either yesterday or last week, depending on how you consume this material.

32:23 Exactly.

32:23 Yeah.

32:24 So I was surprised to find out that pytest-check, the plugin I wrote, was one of those.

32:31 I'm like, really?

32:32 Because it's like the top 1%.

32:34 So if anybody's curious, I wanted to just highlight that a little bit.

32:39 So pytest-check is a plugin that allows multiple failures per test.

32:44 And one of the best ways, it's a secondary way that one of the contributors added, is you

32:49 can use it as a context manager.

32:51 You can say, like, with check, and then do an assert.

32:54 And then you're going to have multiple of those within a.

32:56 I like the one-liner even.

32:57 That's pretty nice.

32:58 Yeah.

32:59 And this is totally, like, Black will totally reformat this if you ran it through Black.

33:03 But it's nice.

33:04 You'd have to block it out.
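
The idea of "multiple failures per test" through a context manager can be sketched with the standard library alone. This is not pytest-check's implementation, just a minimal illustration of the technique; the class SoftCheck and the function run_checks are hypothetical names.

```python
class SoftCheck:
    """Collects assertion failures instead of stopping at the first one."""

    def __init__(self):
        self.failures = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None and issubclass(exc_type, AssertionError):
            self.failures.append(str(exc))
            return True   # swallow the AssertionError so later checks still run
        return False      # let any other exception propagate


def run_checks(a, b):
    check = SoftCheck()
    # The one-liner style mentioned in the episode; a formatter such as
    # Black would move each assert onto its own indented line.
    with check: assert a == 1, "a should be 1"
    with check: assert b == 2, "b should be 2"
    with check: assert a < b, "a should be less than b"
    return check.failures
```

Calling run_checks(5, 2) reports both the first and the third failure, where plain asserts would stop at the first.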

33:06 Anyway, I was like, how could it be?

33:09 Well, I'm curious what on the list it was.

33:13 So there's a place called, what, HugoVK.

33:18 Has a top PyPI packages list.

33:22 And it's updated.

33:23 I think it's just updated once a month or something.

33:25 But you can do the top 5.

33:27 You can do the top 5,000.

33:29 Yeah, it's the top 5,000 or 1,000 or 100.

33:33 And so I was curious about where on the list I was.

33:38 I'm number 1,677.

33:41 So kind of far down the list.

33:43 But hey, we're just talking.

33:45 It's still in the top third of the top 1%.

33:47 That's pretty awesome.

33:48 pytest is number 72.

33:51 That was pretty neat.

33:52 And Pydantic, which we covered, was, I just checked, 117.

33:57 But there are 57 pytest plugins that show up in the top 3,500.

34:03 So that's pretty neat.

34:04 Wow.

34:04 Pretty neat.

34:05 That is pretty neat.

34:05 That's all I got for extras.

34:07 All right.

34:08 Well, I have zero extras.

34:09 So mine are finished as well.

34:11 How about a joke?

34:12 Yeah.

34:12 Great.

34:13 All right.

34:14 I told you we're coming back to it.

34:15 So this one comes from Netta.

34:17 Netta Code Girl at Netta.mk.

34:21 And let me just pull this one up here.

34:24 Right.

34:24 So this one is, there's this colleague here.

34:28 Can I make this?

34:29 There we go.

34:29 Make it a little bigger.

34:30 There's the two women who are developers, Netta and her unnamed friend who always has gotten

34:37 in trouble with the elevator last time, basically.

34:39 And there's this sort of weird manager looking guy that comes in and says, I tested your

34:44 chat bot, but some of its replies are really messed up.

34:47 Well, that's what testing is all about.

34:50 I'll go through the logs later, says one of the girls.

34:52 No, no, no.

34:53 No, no, no.

34:54 No, no.

34:54 No need.

34:55 Right.

34:56 Check out the faces.

34:57 She's like, excuse me.

34:59 I'm not even sure I want to open the logs now.

35:02 Yeah.

35:02 Don't look at the logs.

35:04 That's what testing's for.

35:06 I'll go through the logs later.

35:08 Well, yeah, she's got some good ones in her list there.

35:13 So love it.

35:14 Yeah.

35:15 I like the art too.

35:16 Nice art.

35:17 I do too.

35:18 It is.

35:19 So also nice was our podcast.

35:21 Thanks for being here.

35:22 Thank you.

35:22 Yeah.

35:23 You bet.

35:23 See you next week.
