

Transcript #174: Happy developers use Python 3

Recorded on Wednesday, Mar 18, 2020.

00:00 Hello, and welcome to Python Bytes, where we deliver news and headlines directly to your earbuds. This is episode 174, recorded March 18, 2020. I'm Brian Okken.

00:11 And I'm Michael Kennedy.

00:12 And this week, this episode is brought to you by Talk Python courses and the pytest book.

00:18 Yeah, yay, it's brought to you by us.

00:19 Yeah, us.

00:21 More about that later, huh?

00:22 Yeah. So we're doing something a little different. We're recording in two different locations because of... actually, we always record in two different locations, but the locations are sometimes not the location, especially the location you're in.

00:35 Yeah, I often, yeah, record somewhere else. But today I'm at home because a lot of people are at home working remotely in home offices now because of, I don't even know how to pronounce it. I read it. COVID-19. Yeah, it is an insane time on so many levels, but I would say certainly there's a lot of tech people out there who may be working from home for the first time.

01:00 You know, I know there's a lot of large companies that feel like you need to go to be in the office and you need to do the work.

01:07 And yet a lot of the tools that we use as developers are very suited to the situation that many of us around the world find ourselves in working from home, working asynchronously and whatnot.

01:20 Right.

01:20 GitHub, Slack, email, Zoom, whatever it is.

01:25 It's interesting to see the rest of the world scale up to kind of what we've been doing for a long time.

01:30 - We were lucky that our office was recently moved in July, and during the move, we tried to set everybody up to be able to remote work because some people had longer commutes than before.

01:44 And it happened to be, I mean, it's just fortunate that we set that up before this happened.

01:49 And I'm also very fortunate that I'm a software worker.

01:52 There's a lot of people that, I mean, our work can continue for the most part with little interruption, but it's a harder environment.

02:01 But a lot of people that are not technical workers can't do that.

02:05 Yeah.

02:06 It's such a bummer.

02:06 You know, my daughter, she just got a new job and she was supposed to start, actually, she was supposed to start today and they sent her a message, you know what?

02:14 We're closed, closed indefinitely, and there's no reason for you to come and get trained to work here, because who knows what it's going to look like in a month or two. I mean, that's the reality for a lot of people. It's rough. One of the reasons why we started talking about this this morning is just to reach out to everybody and say, yeah, hope everybody's doing okay, and let us know some stories if you want to share. Yeah, maybe some interesting tech angles, right? Like problems you run into or things you found that really worked or whatever. But yeah, everyone out there, be safe. It's not always fun, but just find a place to hole up and just wait this thing out and be safe. Yeah, that's a good idea for some extra things, like related to that.

02:57 I'll add one of these, on our add-ons at the end, I'll add one to that.

03:02 All right, super. Well, I want to start out by talking about community.

03:06 I was partly thinking about this because of the coronavirus stuff and a lot of people possibly have maybe two extra hours in the day, because they're not commuting, maybe.

03:17 I'm sorry if you have a two-hour commute or an hour commute on each end, but you might have some extra time.

03:23 One of the things you might want to share and spend some time doing is beefing up documentation on open-source projects.

03:29 There actually was a great article called "Documentation as a Way to Build Community" by Melissa Mendoza, I think Mendoza, sorry, Melissa. It talks about how educational materials can have a huge impact and effectively bring people into a community, and how beefing up the documentation story on open source projects can actually help bring more people to use it and help. I mean, it seems obvious, but it isn't really, and people aren't doing it.

04:00 There's a lot of projects that lack in really good documentation, and there's a lot of reasons for that.

04:05 Talking about the reasons, I think it's interesting.

04:08 Decentralized development and a lot of projects start with just somebody scratching their own itch, and they don't need documentation for that.

04:17 But it grows into other people getting involved.

04:20 For a lot of people, it's more glamorous to add new features or fix a nasty bug, and when it comes to adding more documentation, nobody really knows how to do that.

04:30 I think it's important and spending some more focus.

04:33 One of the directions this article gives (it was targeting a specific project, but I think it really applies to more than just this one) is organizing the documentation into four different areas: tutorials, how-tos, reference guides, and explanations.

04:51 These four areas and subsections of those can be targeted towards different people, targeted towards beginners or advanced people, or somebody just looking something up.

05:01 One of the great things about that is it makes it easier for somebody to jump in and say, "Oh, there's one little piece of things, how to do something, I can contribute to that.

05:11 I might not know why it works, but I can contribute to how-tos and some tutorials." Whereas maybe some of the more expert people in the project can do some of the explanations of how things are working.

05:22 Also, a lot of teams shift, or some projects have the new people come in and say, "Hey, you want to help out?

05:29 Why don't you write documentation?" I think that's a great thing.

05:32 But then you've got documentation that's just filled with content from beginners, and might not have much from some of the experienced people.

05:41 I think there's some good information here, and I think focusing on documentation might be a good thing.

05:46 >> I like the article. I like the idea of it, right, that you can build a community.

05:51 Certainly you can contribute to these projects quite easily in this way.

05:56 Breaking it up into these categories is really clever because then you can definitely, just sit down and think, oh, I'm gonna write some docs for this thing.

06:03 Well, that's pretty wide open, right?

06:05 But I'm gonna write a short tutorial, which I had to learn because I had to use this thing and now I know how to do that.

06:10 Why don't I generalize it and make a tutorial?

06:12 That seems like a really easy way to get yourself on the contributor list, beef up your resume, say I contributed to this project, et cetera, I think it's good.

06:20 - One of the things I'd like to reach out to people, some of the beginner stuff, a great thing to do is while you're learning a project, isn't writing new content, but while you're reading documentation on a project, if there's typos, if there's just grammar errors, it may have been written by somebody that isn't native English, so you can help out by just fixing some of those things.

06:42 And then also, while you're going through things, If you stumble on something and it's difficult to follow the instructions, it might be that the instructions need to be modified.

06:51 And why not just do like a pull request of modifying those instructions to be the way it really works.

06:57 And I think that'd be cool.

06:58 - Yeah, that'd be great.

06:59 You know, another area that might be interesting is to write tests.

07:02 - Yeah, definitely.

07:03 - A lot of projects lack tests or they're just marginally tested and you're like, well, okay, I'm gonna create this tutorial and I wanna make sure the things I'm saying work.

07:13 So let me add some tests to verify what I believe to be true, to be true and go ahead and commit that back to the project.

07:18 - Yeah, and modifying tests, if the tests are not readable, they should be.

07:22 And maybe you can make them more readable.

07:23 - Yeah, I guess I kind of started thinking about that because documentation and like tests feel a little bit like a form of documentation.

07:30 - Yeah, definitely.

07:32 - Yeah.

07:33 Well, cool.

07:34 Well, I'm pretty passionate about fast websites.

07:37 As you probably know, I talk about trying to make websites fast all the time.

07:41 Our website's pretty fast.

07:42 Speed is important. Slow websites strongly push people away.

07:46 - They do.

07:47 I think it was Amazon or somebody did a study saying like every, you know, 100 milliseconds latency of perceived latency to the user.

07:56 And you know, it has a very tangible, like whole number percentage drop in actual sales.

08:02 - Yikes.

08:03 - Yeah, sales are not the most important thing necessarily.

08:05 Maybe if you're Amazon, they are, but it's just gives you a sense of like, well, 100 milliseconds, you can barely perceive that as a person.

08:11 And yet as those things add up, right, it starts to really make a difference in behavior.

08:16 So I want to talk about this article, sort of riff on some topics covered in the article, more or less, called "The Django Speed Handbook: Making a Django App Faster" by Shibel Mansour.

08:28 Now, the title has Django, and some of the examples are really about Django, but this actually applies to most websites and Python websites and whatnot.

08:38 So if you do Flask, I think that'll still be super, super relevant.

08:42 The first thing though that I want to point out is actually a Django thing.

08:46 And it does appear at least in Pyramid as well.

08:49 So in Django, there's a thing called the Django Debug Toolbar.

08:53 And it lets you explore the different requests, see how long they're taking.

08:58 You can even get in there and look at the ORM calls and what's happening.

09:02 So that's pretty awesome.

09:04 Like Pyramid has this as well.

09:05 You can actually see the SQLAlchemy calls going to the database and the timing and how many database queries there even are on a given page.

09:13 It's pretty ridiculous to be able to use that to analyze what you're--

09:17 it's almost like you've attached a little debugger or profiler all the time, and it's just right there.

09:22 That's cool.

09:23 Do you have to turn it off, then?

09:24 Well, when you go into production, you don't include it in the setting, like the run settings for production, obviously, right?

09:30 That would be bad.

09:31 But some of those settings, even in the debug mode, you have to turn them on.

09:35 I'm not sure about the Django one, but the Pyramid one, you definitely--

09:38 like the profiler's not on by default because that'll slow it down a little bit.

09:42 But you can click a box and then go do the request again.

09:45 All right, so that's a real quick and easy way just to see what your app is up to.

09:49 Then one of the things you really wanna pay attention to, and this is gonna be a bit of a theme on today's show, is talking to databases.

09:56 So when you're working with an ORM or just talking to the database, specifically here, the Django ORM, but this is super relevant for SQLAlchemy as well, you want to be really careful of the so-called N+1 problem, which happens when you navigate relationships.

10:14 So for example, if I have, let's say a category, I'm going to show a category of books and the category has a books relationship, or maybe there's some other thing like that, I get all the categories back and I want to tell you how many books are in each one or something.

10:30 Like as you go through the things that come back, you end up doing one query for each property that you access on each instance of that object.

10:39 So if you do a query that returns 20 things, you might end up talking to the database 21 times.

10:44 It's a common problem in ORMs, but it also has an easy fix, which is why that debug toolbar is cool, 'cause you could turn it up and say, well, turn it on and say, oh look, why are there 24 queries on this page, right?

10:55 I feel like I did one, like, well, sort of.

10:57 So you can use select_related and prefetch_related, and it'll basically join or pre-query those related objects together in one massive query.

11:10 So you don't actually go back to the database N plus one times.
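
A minimal sketch of the N+1 pattern described above, using the standard library's sqlite3 rather than an ORM so that it runs anywhere. The category and book tables and the run helper are assumptions made for illustration, not code from the article. In Django, select_related gets the same effect with a join, and prefetch_related with one extra batched query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       category_id INTEGER REFERENCES category(id));
""")
conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                 [(i, f"category-{i}") for i in range(1, 21)])
conn.executemany("INSERT INTO book (title, category_id) VALUES (?, ?)",
                 [(f"book-{i}", i % 20 + 1) for i in range(100)])

query_count = 0


def run(sql, params=()):
    """Execute one query and count it as one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1: one query for the 20 categories, then one more query per category.
query_count = 0
naive = {}
for cat_id, name in run("SELECT id, name FROM category"):
    rows = run("SELECT COUNT(*) FROM book WHERE category_id = ?", (cat_id,))
    naive[name] = rows[0][0]
naive_queries = query_count

# Joined: the database produces the same answer in a single query.
query_count = 0
joined = dict(run("""
    SELECT c.name, COUNT(b.id)
    FROM category AS c LEFT JOIN book AS b ON b.category_id = c.id
    GROUP BY c.id, c.name
"""))
joined_queries = query_count

print(naive_queries, joined_queries)
```

With 20 categories the loop issues 21 queries, which matches the "returns 20 things, talks to the database 21 times" arithmetic in the discussion.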

11:13 - Okay, nice.

11:13 - Yeah, and that's a big deal.

11:14 And, you know, SQLAlchemy has joinedload and subqueryload, with which you can basically accomplish the same thing.

11:20 So he's got a cool example of not a huge database, but using these two properties in the Django ORM, going 24 times faster.

11:28 - Oh wow, yeah.

11:29 - Right, I mean it's basically not changing the code at all except saying, you know I'm gonna use this related property so just query that as part of the query instead of like doing 20, you know, however many queries you're going back for.

11:40 Really, really nice.

11:41 Related to that is indexes.

11:44 So if you're not thinking about and using indexes, you should be.

11:48 I mean that's like easily a thousand times faster to do a query against a lot of data with an index versus without.

11:55 And then if you've got these joins, it's even better, you know.

11:57 So, super important, but do be aware that indexes make writes slower.

12:03 So, most websites don't write data like crazy, although some APIs do.

12:10 So, it's usually not as big of a problem, but just be aware that writes are slow, slower with indexes, but queries are much, much faster.
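
One way to see what an index changes is to ask the database for its query plan before and after creating one. The sketch below does this with sqlite3 from the standard library. The book table and the index name idx_book_category are made up for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER)"
)
conn.executemany("INSERT INTO book (title, category_id) VALUES (?, ?)",
                 [(f"book-{i}", i % 50) for i in range(5000)])


def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


lookup = "SELECT title FROM book WHERE category_id = 7"

plan_without_index = plan(lookup)   # a full scan of the table
conn.execute("CREATE INDEX idx_book_category ON book (category_id)")
plan_with_index = plan(lookup)      # a search that uses the new index

print(plan_without_index)
print(plan_with_index)
```

The write-side cost mentioned above comes from the same structure: every insert or update on book now has to maintain idx_book_category as well.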

12:18 Another thing they talk about, which is really helpful, is using pagination, where instead of saying, here's 1,000 items, you say here's 50, and you can ask for the next 50 and the next 50 and so on.

12:29 That's super easy to do with Django ORM or SQLAlchemy or anything like that.

12:32 So that's a really good one.

12:34 >> So does that often line up with like if you're showing, like if your page only shows 50 things, only fetch 50 things then?

12:40 >> Yeah, exactly. It's super easy to put in the query string like page equals five.

12:45 Then you just do a skip and a limit, or whatever the ORM you're using has for the skip-and-take type of thing, right?

12:55 So it's super easy, you can compute it yourself, but it makes a big difference, right?
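
The "compute it yourself" arithmetic can be sketched as follows, again with plain sqlite3. The item table, the page size of 50, and the page_bounds and fetch_page helpers are assumptions for illustration. ORMs expose the same idea through their own APIs, for example queryset slicing in Django or limit and offset in SQLAlchemy.

```python
import sqlite3

PAGE_SIZE = 50


def page_bounds(page, page_size=PAGE_SIZE):
    """Translate a 1-based page number into (limit, offset)."""
    page = max(page, 1)  # treat page=0 or negative values as the first page
    return page_size, (page - 1) * page_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)",
                 [(f"item-{i}",) for i in range(1, 1001)])


def fetch_page(page):
    """Fetch only the rows that belong on the requested page."""
    limit, offset = page_bounds(page)
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


page_five = fetch_page(5)  # what a request with ?page=5 would load
print(page_five[0], page_five[-1])
```

A stable ORDER BY matters here: without it, the database is free to return rows in a different order from one page request to the next.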

12:59 Also, if you have long-running tasks, long-running things to do, either make them background tasks, like in other processes or Celery or something, or, if the person making the call has to wait on it, be sure to use async, right?

13:16 So you're not blocking up everything.
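
As a rough illustration of the async half of that advice, the sketch below uses asyncio from the standard library. The slow_lookup coroutine is a stand-in for a slow database or network call, and the names and delays are invented for the example.

```python
import asyncio


async def slow_lookup(name, delay):
    """Stand-in for a slow network or database call."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    return f"{name} done"


async def handle_request():
    # Both waits overlap, so the total time is roughly the longest delay,
    # and the event loop stays free to serve other requests meanwhile.
    return await asyncio.gather(
        slow_lookup("inventory", 0.2),
        slow_lookup("pricing", 0.1),
    )


results = asyncio.run(handle_request())
print(results)
```

Work that the caller does not need to wait for at all is a better fit for the background-task route mentioned above.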

13:18 Another super easy way to make things fast, and many of these things we're doing at Pythonbytes.fm and the other websites, is to turn on gzip.

13:28 So you can just go to like NGINX or whatever your web server is and say gzip the response.

13:35 He's got a really simple example here where the response size of the page and the CSS and whatnot is nine times smaller by just adding the gzip middleware to Django.

13:46 I wouldn't actually add it to Django if this was me.

13:48 I would add it to Nginx, 'cause that's the outer shell web server.

13:53 Just let it do it.

13:54 And you don't have to, you're probably not talking directly to the server running Django.

13:58 But anyway, somewhere along the way, gzip your content, 'cause that'll be big.
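
To get a feel for why this helps, here is a small sketch using the gzip module from the standard library on some made-up, repetitive HTML. The markup is an assumption for illustration, and real pages will not shrink by the same factor. In production the compression would normally be done by the web server or by middleware rather than by hand like this.

```python
import gzip

# Repetitive markup, which is typical of HTML and CSS, compresses well.
html = ("<li class='episode'>Python Bytes episode</li>\n" * 500).encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)  # what the browser does on receipt

ratio = len(html) / len(compressed)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.0f}x smaller)")
```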

14:03 Similarly, minify your static files, and bundle them, and cache them, and all of those good things, right?

14:11 There's some cool libraries that he talked about in there.

14:14 I think it was called Whitespace.

14:16 I'm pretty sure it's called Whitespace.

14:17 that they're using in Django to minify and bundle the files.

14:22 So we don't use Whitespace, and we don't use Django.

14:25 We use WebAssets and CSSmin and JSmin, which are three awesome Python libraries to bundle that.

14:31 So if you go and look at Python Bytes or Talk Python or any of those sites, you can see that there's like a packed CSS and a packed JavaScript that has probably 20 CSS files that's smushed into one with those things and minified and whatnot.

14:46 So that's pretty cool.

14:47 There's two ways to measure page performance.

14:49 One is like how fast is the server responding, right?

14:53 But that's not the most important thing to the user.

14:56 The most important thing is how does it feel to them.

14:58 So Google has this thing called PageSpeed, which they're even using for measuring your SEO ranking.

15:04 So put your website into there.

15:05 I have a link for Talk Python Training's ranking.

15:09 I spent three days straight getting it from like 40 out of 100 to 99 or 100 out of 100.

15:17 But it was quite the journey.

15:19 So that took a while.

15:21 You can both measure it for mobile and desktop.

15:24 And it has slightly different rankings.

15:25 Also, shrink your images with ImageOptim, which works for macOS and Linux.

15:32 It doesn't work on Windows.

15:33 But there's some really great options there.

15:35 And it'll basically do completely lossless compression of your images.

15:39 So they might be like 40 or 50% smaller.

15:42 And visually, you literally couldn't distinguish them.

15:45 - Interesting, yeah.

15:46 - Yeah.

15:46 And last recommendation is lazy load your images.

15:51 This is not something I've really explored, but apparently Google Chrome images now support a lazy attribute.

15:58 Oh, nice.

15:59 Yeah.

16:00 And then for things that don't support it, there's a lazy load JavaScript library.

16:03 Basically, for your images, you say: as it scrolls into view, download it.

16:07 But if it's off the page and you never scroll, it'll never load it.

16:10 That's great.

16:11 Yeah, pretty clever.

16:12 So this is just some of the things covered in that article.

16:15 So if you're out there and you're like, I need to get my site to go faster, it cannot be three seconds per page load.

16:21 That's ridiculous.

16:22 Like start looking through some of these things.

16:23 It'll really help, especially if you're using Django.

16:25 But even if you're using some other Python framework, I think it'll still be quite relevant.

16:29 Yeah, most of these are relevant to any web stuff.

16:32 Yeah, yeah, they're super, super general.

16:34 Like some of the libraries they talk about plug into Django.

16:36 So it's kind of a little extra boost if you're doing Django.

16:38 But yeah, this is relevant to everyone.

16:40 Yeah.

16:41 All right, what do you got next?

16:42 This came in as a listener suggestion from the author of the library.

16:45 So this is like JIT podcasting, right?

16:47 Yeah, it just came in this morning and I love it. It's from Konrad Hałas, I think. It's called D-A-C-I-T-E, dacite, maybe; I don't know. But it's cool. It simplifies the creation of data classes from dictionaries. So when I first heard it, I'm thinking, okay, well, I'm using data classes like all the time now because I really like them. There's a lot of cool aspects of them.

17:12 You can have default values.

17:14 I really like that I can easily exclude some of the fields.

17:18 You can take them out of the comparison.

17:21 So some objects can be equal, even if they're not completely equal sort of thing.

17:26 And I love that aspect.

17:28 And there's a whole bunch of other cool stuff about them.

17:30 So I'm using it more and more.
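
The two data class features mentioned here, default values and leaving a field out of comparisons, look roughly like this. The Episode class and its fields are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    number: int
    title: str
    published: bool = False                         # plain default value
    notes: str = field(default="", compare=False)   # ignored by == and !=


a = Episode(174, "Happy developers use Python 3", notes="draft")
b = Episode(174, "Happy developers use Python 3", notes="final")

same = a == b  # notes differ, but notes is excluded from the comparison
print(same)
```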

17:32 But the data all around us, that we get from databases and whatever, it often gets converted to dictionaries and not to data classes.

17:40 So this is a little library that basically has one function, called from_dict, that converts dictionaries to data classes.

17:48 And my first reaction was, I can already do that.

17:51 If you do the star star or the double splat.

17:55 - Dictionary to keyword argument type of thing.

17:58 - Yeah, I mean, you can do that for simple data classes and simple dictionaries, that works just fine.

18:05 But I looked into this more, and this from_dict from dacite, it allows you to do nested structures.

18:11 So you can have a data class with another data class as a field, and lists or tuples of data classes as some of the types.

18:20 You can do unions in there, collections, nested structures.

18:24 It even has this thing called type hooks, which allows you to have a custom converter for certain types of data that come in.

18:33 So his example is like, for all the strings, lowercase them or something like that.

18:38 But you can definitely have that for certain types.

18:41 It's pretty neat.

18:42 Oh, that's cool.

18:42 Or if you've got some kind of string that's a date time, you parse it out of an ISO string or whatever.

18:48 Yeah, that's a good example, actually.

18:50 That's cool.

18:50 So one of the things that messes you up on my example of just taking a dictionary and expanding it as arguments to a data class constructor is that it doesn't really work if all the names don't match up.

19:04 but this one allows you to have, if your data class only has a few fields, but your dictionary has like tons of stuff in there, by default it just ignores the stuff that doesn't match up.

19:15 And so if you've got like a name and an ID, and there's names and IDs coming from the dictionary, but there's also like a whole bunch of other things like a URL and stuff like that, it just ignores that.

19:27 That's the default, but you can also turn on strict mode that says, no, I expect it to match up directly and I want a warning.

19:33 And then there's a whole bunch of exceptions that get raised if something goes wrong in the conversion.
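
To make the limitation of plain double-star unpacking concrete, here is a sketch that uses only the standard library. The User class, the record dictionary, and the from_flat_dict helper are hypothetical. The helper imitates just the "ignore extra keys" default for flat dictionaries. It is not how dacite is implemented, and it handles none of the nested structures, unions, type hooks, or strict mode described above.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str


record = {"id": 1, "name": "brian", "url": "https://example.com/brian"}

# Plain unpacking only works when the keys match the fields exactly.
try:
    User(**record)
    unpack_error = None
except TypeError as exc:
    unpack_error = exc  # complains about the unexpected 'url' argument


def from_flat_dict(cls, data):
    """Hypothetical helper: keep only the keys that are fields of cls."""
    names = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in data.items() if key in names})


user = from_flat_dict(User, record)
print(user)
```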

19:37 And I'm just excited to use this 'cause it's a really cool tool to convert data to data classes.

19:44 It's nice.

19:45 - Yeah, this looks super nice.

19:46 It's one of those things that seems to automate like the crummy part of programming, right?

19:51 Like I'm getting this data submitted to me from an API or from somebody calling my API and who knows what they're sending me.

19:59 But, like, as long as this thing lines up right, I tell it these fields are not optional, or this type has to be such and such.

20:06 If that works, then we're good.

20:07 Otherwise, tell them 400, that didn't work, or the file couldn't be loaded, or whatever it is.

20:11 And there's definitely-- so Konrad made a point in the documentation to say that it is not a schema validation library.

20:18 That's not the intent of it.

20:20 It is really just intended for the conversion.

20:23 So especially with external APIs, I think combining this with a schema validation is a good idea.

20:31 But you could definitely go from schema validation to this and have data classes in the end.

20:36 It'd be great.

20:36 Yeah, it's a cool project.

20:38 And I love how it leverages the brand new Python stuff, the data classes.

20:42 Anyway, we should plug ourselves as sponsors.

20:44 Yeah.

20:47 Well, we should definitely let people know about what we're doing, right?

20:50 So you've got this book on testing or something?

20:53 I actually kind of love that I had some feedback early on when the book came out.

20:58 Python Testing with pytest is the book that I'm talking about.

21:01 And it did come out in 2017, the end of 2017.

21:04 And I got some really great feedback from people saying they really loved following the book on this podcast.

21:10 And I apologize for the lawnmower in the background, if it goes through.

21:14 I wanted to point out that I had a couple of people ask me, it came out in 2017, is it still valid?

21:21 And I want to take the time to say yes, it is.

21:24 The intent of the book was never to be a thorough, complete inventory of everything you can do with pytest.

21:30 It was a quick, what are the 80% of pytest that you're going to use all the time?

21:35 And that is the core of pytest and how to think about it.

21:39 There is new goodies that have been added since 2017, and it's good to check those out.

21:44 But you could run with what's in this book and still be very productive.

21:47 Nice.

21:48 It's definitely made me more productive and better with pytest.

21:51 So it's great.

21:52 Thank you.

21:53 Yeah, you bet.

21:54 I also want to tell people about the courses that we have over at Talk Python Training.

21:57 We've got a bunch of new ones we've been releasing.

21:59 I do try to let you know when the new ones are out, but we've got like 120 hours of Python content over there on a bunch of projects that you can do.

22:08 The 100 Days of Code courses all have like projects for every single day for 100 days.

22:13 And yeah, so just check them out.

22:15 We're gonna release a couple new courses coming soon and I'll be sure to let you know.

22:19 But yeah, support us by checking out our work, right?

22:22 - Yeah, I wanna tell people one of the things I love about the Talk Python courses is there's a lot of content there and I'm a busy person and sometimes it's overwhelming to me to look at a course to say it's like 12 hours of content on a course or something like that, six hours or something even.

22:38 And however, the way that you've got it set up with bookmarks into separate videos and different topics, it's the outline of the courses are so incredible that if you really need to just jump to the right place to learn something, you can do that.

22:54 And even though you can just watch them in series and just watch the whole thing, you can do that, of course.

22:59 But being able to jump around and go back and use it as a reference is a great thing.

23:05 So thanks.

23:05 - Yeah, thanks.

23:06 Yeah, we definitely work hard on making that a possibility.

23:08 So I appreciate that.

23:10 Now, do you know what the Python clock reads right now?

23:13 - Oh, I haven't checked.

23:14 What does it read?

23:16 - It reads zero, zero, zero, zero, zero, zero.

23:20 The Python clock bell has tolled for the folks who have to convert.

23:28 This next thing I wanna share with everyone comes from LinkedIn and Barry Warsaw.

23:32 Barry's been part of Python for a very long time, doing a lot of cool stuff there.

23:37 And he was on the team that helped LinkedIn move from legacy Python to modern Python.

23:43 - Okay.

23:44 - Yeah, so it's called How We Retired Python 2 and Improved Developer Happiness.

23:49 So a couple years ago, 2018, LinkedIn started working on this multi-quarter effort to transition to Python 3.

23:58 So maybe some of the lessons from here will help people out there for whom they haven't actually migrated all the way to Python 3.

24:06 That'd be good, right?

24:07 So basically they said, they did an inventory and they found they have 550 code repositories they had to migrate.

24:16 That's a lot of different projects.

24:18 And some of them depend on the others.

24:21 So they said, look, Python is not the thing powering our main web app.

24:27 I think it's Java.

24:28 I'm not a hundred percent sure.

24:29 But anyway, it's, it's not their main thing.

24:31 And so there's a bunch of like independent microservices and tools and data science projects that are all using this.

24:38 So their first pass at getting all those different things migrated was to say, we're going to have a bilingual philosophy for Python, meaning it'll run on two and three at the same time.

24:53 Okay.

24:54 And then once you get it there, the main problem that you could run into is I depend on a library, like this is standard legacy Python.

25:01 I depend on a library that requires Python 2.

25:05 Therefore, everything that I use, that I build that depends on that library must also be Python 2, right?

25:11 Yeah.

25:12 The bilingual thing that they did, this was to prevent that blockade. So anyone who wants to build new stuff on Python 3 could still use the libraries and do so. That was the plan. They actually had a whole team that oversaw this effort across projects, across thousands of engineers, called the Horizontal Initiatives Program. So that was to address that across all these different projects.

25:38 And then in phase one, first quarter 2019, they went and they found the most important repositories, the ones that were, if you put them into a dependency graph at the bottom, and they said, "We're going to port those to Python 3 first," because they're blocking everything else.

25:55 And then they kind of finished it off in the second half of 2019.

25:59 So they basically said, "All right, now we got the foundation done.

26:02 We can start upgrading the libraries that depend on all these lower-level bits." And then, you know, they said, looking back, you'll like this part, Brian.

26:09 They said our primary indicator for knowing that the migration was done, that we were all right, was that our builds passed and our tests ran and everything was okay.

26:19 And then eventually they went through and said, all right, we're going to turn off the ability to run Python 2 type of tests in continuous integration.

26:27 Now let's see what keeps working.

26:28 Oh, yeah.

26:29 Okay.

26:30 Yeah.

26:31 So one of the things you can imagine is important is having tests, right?

26:33 Because if you don't have tests, CI/CD doesn't tell you a lot.

26:36 It just does the CD part.

26:39 For better or worse.

26:41 Yeah, so they said, look, here's some guidelines for people, other organizations who are on similar paths, but earlier. They said plan early and engage your organization's Python experts.

26:53 Find and leverage champions in the affected teams and help them promote the benefits of Python 3 to everyone.

27:00 Adopt this bilingual approach so people can at least begin if they want to go to Python 3.

27:07 Invest in tests and code coverage, because these will be your best metrics of success.

27:14 And then finally, ensure your data models explicitly deal with this, what used to be one thing, bytes and strings in Python 2 and now is of course two totally separate things.
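
The bytes-versus-strings split looks like this in Python 3. The to_text helper is a hypothetical example of dealing with the distinction explicitly at a boundary, not something taken from the LinkedIn write-up.

```python
raw = "café".encode("utf-8")   # bytes, e.g. as read from a socket or a file
text = raw.decode("utf-8")     # str, human-readable text

# Python 3 refuses to mix the two types silently.
try:
    "price: " + raw
    mixed_error = None
except TypeError as exc:
    mixed_error = exc


def to_text(value, encoding="utf-8"):
    """Hypothetical boundary helper: decode bytes once, pass str through."""
    return value.decode(encoding) if isinstance(value, bytes) else value


print(len(raw), len(text))  # the byte count and the character count differ
```

Decoding once at the edge of the system, then using str everywhere inside it, is one common way to keep the two from mixing.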

27:25 They said that was really the biggest challenge that they ran into: making that distinction correctly. Yeah, those are a hurdle. Are you guys all upgraded? Yeah, it was a library that we were using that didn't support Python 3 yet. The reasoning was the library talks to a DLL that has, you know, C++ strings or C strings, and old Python strings converted just fine, but they don't now. Unicode fancy ones, yeah?

27:52 Not so easy. Yeah. Cool, so to wrap this up, they said the benefits they got from this whole process are that they no longer have to worry about supporting Python 2, and they've seen their support loads decrease, and decrease in a good way, in that you don't have to support the old crummy stuff.

28:08 You can depend on the latest open source libraries.

28:11 A lot of libraries these days only work with Python 3.

28:14 And they opportunistically and enthusiastically adopted type hinting and mypy to improve overall quality, which is pretty cool.

28:23 - Yeah, that is good.

28:25 - Yeah, I'm looking forward to this next one you got.

28:27 - This actually ties in nicely. You brought up the Django speedups, and I probably should have talked about this right afterwards, but anyway, here we go. There was an article, and I'm not saying I agree or disagree because I don't know enough about it, but the article was called "The Troublesome Active Record Pattern". I guess in, you know, like Ruby and stuff, they talk about active record more, I think. But in the Python world, it's the object relational mappers, ORMs, like the Django ORM, or SQLAlchemy is also an ORM. And those are essentially the same as active record. I think that's the same pattern, right? Well, certainly the Django ORM follows that pattern. SQLAlchemy, it has a lot of similarities, but its design pattern is technically called a unit of work.

29:15 Okay.

29:16 The main variation is that in Django or things like that, you go to the object and you call save.

29:23 Whereas so that happens on the individual objects.

29:26 Whereas in SQLAlchemy, you make a bunch of changes and then there's this unit of work thing and you call save and it submits all the changes in one giant batch.
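The two styles can be sketched with only the standard library's sqlite3 module. This is a hand-rolled illustration, not Django or SQLAlchemy code: the Book and UnitOfWork classes, the table layout, and the method names are all assumptions made for the example. (In SQLAlchemy itself, the batch is flushed through the session's commit method.)

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two books."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO books (id, title) VALUES (?, ?)",
        [(1, "old one"), (2, "old two")],
    )
    conn.commit()
    return conn


class Book:
    """Active-record style: each object writes itself to the database."""

    def __init__(self, conn: sqlite3.Connection, book_id: int, title: str) -> None:
        self.conn = conn
        self.id = book_id
        self.title = title

    def save(self) -> None:
        # One statement and one commit per object.
        with self.conn:
            self.conn.execute(
                "UPDATE books SET title = ? WHERE id = ?", (self.title, self.id)
            )


class UnitOfWork:
    """Unit-of-work style: collect changes, then write them in one transaction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.pending = []

    def change_title(self, book_id: int, title: str) -> None:
        self.pending.append((title, book_id))

    def commit(self) -> None:
        # All pending changes succeed or fail together.
        with self.conn:
            self.conn.executemany(
                "UPDATE books SET title = ? WHERE id = ?", self.pending
            )
        self.pending.clear()


conn = make_db()

Book(conn, 1, "saved individually").save()

uow = UnitOfWork(conn)
uow.change_title(1, "batched one")
uow.change_title(2, "batched two")
uow.commit()

titles = [row[0] for row in conn.execute("SELECT title FROM books ORDER BY id")]
print(titles)
```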

29:36 But here's the interesting thing is like this whole article is like the troublesome active record pattern.

29:42 My reading of it really was the troublesome ORM pattern.

29:47 And so for the most part, it's kind of an immaterial distinction, although technically, design pattern wise, they're not exactly the same.

29:55 - Okay, okay, well, yeah.

29:57 So the idea being like you just brought it up that the object, when you're referencing a bunch of objects and you have object save and things like that, there's a whole bunch of issues with that.

30:08 One of the issues is if you want to query things about the data, not necessarily all the data, but things like if you've got a bunch of books, for example, and you just want to count the number of books, well, you might have to just retrieve them all.

30:23 Or if you want to count all of the software testing books written by Oregon authors, you'd have to just ask me, or you'd have to grab all of them, grab all the data, and then do the search in Python, looking for stuff

30:38 in a for loop or something.
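A small sketch of that difference, using the standard library's sqlite3 module in place of an ORM. The books table, its columns, and the sample rows are invented for the example. Both functions return the same number, but the first pulls every row into Python while the second lets the database do the counting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, title TEXT, topic TEXT, author_state TEXT)"
)
conn.executemany(
    "INSERT INTO books (title, topic, author_state) VALUES (?, ?, ?)",
    [
        ("Book A", "testing", "Oregon"),
        ("Book B", "testing", "Ohio"),
        ("Book C", "web", "Oregon"),
        ("Book D", "testing", "Oregon"),
    ],
)
conn.commit()


def count_in_python(conn: sqlite3.Connection) -> int:
    """Fetch every column of every row, then filter in a Python loop."""
    rows = conn.execute("SELECT * FROM books").fetchall()
    # Columns are id, title, topic, author_state.
    return sum(1 for row in rows if row[2] == "testing" and row[3] == "Oregon")


def count_in_database(conn: sqlite3.Connection) -> int:
    """Ask the database for the count; only one number comes back."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM books WHERE topic = ? AND author_state = ?",
        ("testing", "Oregon"),
    ).fetchone()
    return count


print(count_in_python(conn), count_in_database(conn))
```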

30:40 The other problem was around transactions, because if I have a book item and then change something about it, and then save it back in, there's nothing stopping some other process.

30:52 You know, the read modify write doesn't work that well if you've got multiple readers and writers.

30:58 And I was looking this up, SQLAlchemy has sessions, or you said there's a unit of work thing.

31:04 Don't know if those are atomic.

31:05 - Yeah, yeah, they're the same, yeah.

31:06 Okay, Django has an atomic setting, but I don't know if that's on by default, or if you have to specifically say to work with transactions.

31:16 I did notice in some of the Django documentation that does say that transactions slow things down.

31:21 So you don't want to do transactions if you're just reading, for instance.

31:25 But then the author of the article, Cal Paterson, mentions that REST APIs often have the same problems, and some microservice architectures have a similar sort of issue.

31:37 It's just around REST APIs instead of the object model.

31:41 You're reading tons of data when you don't need to.

31:44 He brought up some solutions: at the least, you can just use SQL directly, or use some properties that do queries that are more like SQL.

31:55 Doing transactions helps too.

31:57 But basically he was recommending avoiding the active record style access patterns. Around the REST APIs, he brought up that GraphQL and RPC-style APIs are some solutions to the same problem in REST APIs.

32:11 As somebody that's moving towards learning more about web development and working with ORMs, I really did want to bring this up and find out what you thought of all of this.

32:20 - Sure, it's interesting.

32:22 There are a lot of good valid points that Cal's making here.

32:26 I feel like the focus should almost be, instead of "the troublesome active record pattern," "you're using your ORM wrong, learn how to use it right."

32:35 So let me give you some examples.

32:37 So one of the challenges here that we see is, if you're going to create a record and you want to get it back, you have to get it back by the primary key.

32:46 Maybe, if you're doing exactly just the straight ORM record pattern, but you can just do a query and do like a "give me the first" or "one item" or something like that.

32:55 There's a part where he's looping over stuff saying, here we're looping back to just get the ISBN off these things, right?

33:00 You're pulling all the properties, like you're doing basically a select star from table, ultimately, and a serialization of that result, just to get like the ISBN.

33:10 Well, in SQLAlchemy, I don't know the Django ORM well enough, but in SQLAlchemy, you can say only return these columns.

33:18 I want just the ID and the title or the, I just want the ID and the ISBN, don't return the other results, right?

33:24 So that's an option.

33:26 And plus one thing we already discussed, right?

33:28 You just use the sub query or the filter select or whatever it is for Django.

33:32 And you can avoid those, right? So as you kind of go through these, you're like, okay, well, most of the time, these problems are actually solved with some aspect of a proper ORM. Now, the transaction one is really, I think, super interesting, because it sort of often gets to the heart of this debate about ORMs. And you're saying, well, okay, here's this active record thing where it's not really leveraging transactions. We know transactions are good. And so this is bad, because it doesn't do anything.

34:02 But in practice, it's not so clean as that.

34:05 So for example, suppose I'm working on a web app and I have a grid, like a grid that maybe could be loaded off of a REST endpoint, right?

34:14 And I've got this grid and I can type in it.

34:16 And there's a button that says save.

34:18 There's no way that it makes sense to do a transaction around that, right?

34:23 I'm not going to transactionally begin loading the grid and wait for me to press save, right?

34:27 That's going to lock up the database for every user.

34:30 Yeah.

34:30 Any scenario like that, like REST endpoints, right?

34:33 If I've got a phone and I've got my mobile app and it hits the REST endpoint, pulls down the data, and I type on it and I hit save, you can't do that transactionally.

34:41 Like it just, you would lock up the site like right away, right?

34:45 So it doesn't make any sense.

34:47 So there are just other patterns, like optimistic concurrency is a super common pattern in ORMs that would work with active record or SQLAlchemy's unit of work beautifully.

34:56 And the idea is I'm going to make some kind of version in that record.

35:01 And when I pull it back, it's going to come with the version that I got.

35:04 And when you hit save, you say, update this record where the version is the version I have.

35:10 So if someone else has updated it, it increments that version.

35:13 And it says, no, no, there's no record.

35:14 You can't update this.

35:15 Right.

35:17 So you basically say, ah, it looks like someone changed this behind you, like your grid and their grid, they hit save before you, so you've got to deal with syncing this back up.
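The version check described above can be sketched with the standard library's sqlite3 module. This is a hand-rolled illustration under assumed names (the books table, the version column, and the load and save helpers are all made up for the example), not how any particular ORM implements it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO books (id, title, version) VALUES (1, 'Draft', 1)")
conn.commit()


def load(conn: sqlite3.Connection, book_id: int):
    """Read the row along with the version the caller is about to edit."""
    return conn.execute(
        "SELECT title, version FROM books WHERE id = ?", (book_id,)
    ).fetchone()


def save(conn: sqlite3.Connection, book_id: int, new_title: str, seen_version: int) -> bool:
    """Update only if the row still has the version we loaded.

    Returns True when the update was applied, False when someone else
    saved first and bumped the version.
    """
    with conn:
        cursor = conn.execute(
            "UPDATE books SET title = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_title, book_id, seen_version),
        )
    return cursor.rowcount == 1


# Two users load the same row into their grids; no lock is held while they type.
_, first_user_version = load(conn, 1)
_, second_user_version = load(conn, 1)

first_saved = save(conn, 1, "First user's edit", first_user_version)
second_saved = save(conn, 1, "Second user's edit", second_user_version)

print(first_saved, second_saved, load(conn, 1))
```

The second save matches no row because the version has moved on, so the caller can tell the user that the record changed underneath them instead of silently overwriting it.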

35:25 Right.

35:25 So there's a lot of times where it's, it would feel great to like have a transaction, but that transaction actually can't be used anyway.

35:32 And ORMs have like nice built-in ways where you can easily slot in like optimistic concurrency and stuff.

35:38 So that's my thought.

35:39 I think this is an interesting article.

35:41 It's definitely interesting to think about all the points brought up, but I often think that the tools have like clever, non-obvious ways to solve most of these problems.

35:49 - Yeah, and I guess to be a little bit on Cal's side here, that the tools have clever, non-obvious ways to deal with them, maybe that's an issue.

36:00 That all of our beginning tutorials on how to use Django or how to use SQLAlchemy or how to use other ORMs are just ignoring that stuff because it's more advanced.

36:10 But people often just read the beginning tutorial and then go do a startup or something.

36:16 - Yeah, sure, and then you end up with your page loading like in six seconds and you don't know why.

36:21 - Yeah.

36:22 - Which is not great.

36:22 Maybe we could teach people the right way to do it from the beginning.

36:25 I do wish that some of these patterns were more built in.

36:29 Like, I wish optimistic concurrency was there by default in the ORMs.

36:32 And you've kind of got to like roll that yourself and whatnot.

36:36 So anyway, it's a really interesting article to think about.

36:38 And I think it dovetails nicely with my sort of performance one as well.

36:41 Because it's, they're kind of two sides of the same coin a bit there.

36:45 Yeah. Okay.

36:46 All right. Well, I have the second side to your coin that is the Dacity.

36:50 -Dacity? Whatever that one was called. -Yeah.

36:53 So this is a cool thing by Steve Brazier called "Types at the Edge of Python".

36:59 The edges of Python.

37:01 And so Steve apparently creates a bunch of APIs.

37:04 And I think, yeah, he was using FastAPI at the time when he was talking about all these ideas.

37:10 But it's kind of generally valid for all of them.

37:13 He says, look, when I create a new API these days, I start with three things. I start with Pydantic, mypy, and some kind of error tracking like Rollbar or Sentry or something like that.

37:23 Okay.

37:24 That's pretty interesting, right? So Pydantic is a data translation and validation library, much like Dacity.

37:31 Right?

37:33 Yeah.

37:33 They're not the same, but they kind of play in the same realm. They transform JSON with validation and type checking over there. And then there's mypy, which looks like you can use Pydantic to help specify some of the types on your classes, and then use mypy to verify that you're not missing some kind of check.

37:53 So he says, look, the most common error you're going to run into as a Python developer in general is AttributeError: 'NoneType' object has no attribute 'x', where x is whatever you're trying to do, right?

38:04 Yeah.

38:04 I mean, that just means you got None instead of a value, and you're trying to continue to work with that class in some way.

38:11 It's a void dereference in C.

38:13 Yes, exactly.

38:14 So wouldn't it be nice if it said None is not an allowed value for this, or you have None and you can no longer operate on it, or something like that.

38:23 So Pydantic will actually give you those types of errors. It'll convert things like attribute errors and mismatch type errors to explain what was wrong. Right. So that's pretty awesome.

38:34 And so you can use Pydantic to actually specify what your understanding of the interface like if you're calling an API, the stuff that you expect to get back.

38:42 I think this is going to be a date.

38:43 I think this is an optional string and whatnot.

38:46 It says, then when you launch your code into production, your assumptions are tested against reality.

38:51 That's pretty cool.

38:52 And it says, if you're lucky, they turn out to be correct.

38:55 But if not, you're going to run into some of these NoneType errors, and Pydantic can help with that.
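To illustrate the idea of checking assumptions at the edge without depending on Pydantic, here is a standard-library stand-in. It is not Pydantic's API: the ApiUser class, the EdgeValidationError exception, and the parse_user helper are hypothetical names, and the checks are written by hand where Pydantic would derive them from the annotations.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class EdgeValidationError(ValueError):
    """Raised when incoming data does not match what we assumed."""


@dataclass
class ApiUser:
    name: str
    nickname: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail here, with a clear message, rather than later with an
        # AttributeError somewhere deep in the program.
        if not isinstance(self.name, str):
            raise EdgeValidationError(
                f"name: expected str, got {type(self.name).__name__}"
            )
        if self.nickname is not None and not isinstance(self.nickname, str):
            raise EdgeValidationError(
                f"nickname: expected str or None, got {type(self.nickname).__name__}"
            )


def parse_user(payload: Dict[str, Any]) -> ApiUser:
    """Turn a decoded JSON payload into a checked object."""
    return ApiUser(name=payload.get("name"), nickname=payload.get("nickname"))


good = parse_user({"name": "Ada"})

try:
    parse_user({"nickname": "ada"})  # the API left out a field we assumed was there
except EdgeValidationError as exc:
    problem = str(exc)
    print(problem)
```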

39:00 But then you can also, once you put in the typing into your code, then mypy will go on helping.

39:06 So for example, if you're taking an argument that says, first, you think it's a string.

39:11 So you say colon str for its type, then you go work with it.

39:14 And that means it cannot be None, right?

39:16 Like nullability is explicitly set in the type in Python, in the type space.

39:22 So if you find out that it could be None, then you're going to go and say, this is a typing.Optional of string, right?

39:29 Like that's what it's got to be.

39:30 If it could be None or a string, you'd find that out and then specify that in Pydantic.

39:34 And then if you run mypy against it and you start working with an optional string and you don't check for it to be None first,

39:41 mypy will actually give you an error saying that you're not checking for None, basically.

39:46 So it'll even tell you about the missed if statements or other conditional code to verify that, no, it's not the optional None, it's actually the value.
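A short sketch of what that looks like in code. The function names are made up for the example; the comments describe what a type checker such as mypy is expected to report, and the try block shows the runtime failure the annotations are meant to prevent:

```python
from typing import Optional


def shout(first: str) -> str:
    # Annotated as str, so a type checker assumes None never arrives here.
    return first.upper()


def shout_maybe(first: Optional[str]) -> str:
    # Without this check, mypy flags the .upper() call below because
    # the value might be None.
    if first is None:
        return ""
    return first.upper()


print(shout_maybe("brian"))
print(repr(shout_maybe(None)))

# At runtime, annotations are not enforced. Passing None anyway produces the
# familiar error (mypy would reject this call, hence the ignore comment).
try:
    shout(None)  # type: ignore[arg-type]
except AttributeError as exc:
    message = str(exc)
    print(message)
```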

39:54 Okay. That's pretty cool, right? Yeah, that one's tripped me up before. Yeah, for sure.

39:58 I mean, normally it's just not present, and it's not because Python is a dynamic language. C++ would have the same problem, right? If you take a pointer and you just start to work with it in C or C++, the compiler's not going to say, "You didn't check that for, you know, equal to null first." It just doesn't do that, right?

40:17 So this is a really awesome addition for safety in your code.

40:21 So he was talking about how FastAPI automatically integrates with Pydantic out of the box, which is pretty cool.

40:26 And then also, at the end, he has a kata, a mini kata, that works you through these ideas.

40:33 So a kata is like a practice to play with these typing ideas.

40:36 - Yeah, and a nice picture of how these all fit in.

40:40 - Yeah, yeah, yeah, there's some cool diagrams.

40:42 So anyway, if you're building APIs and you're taking data, especially from sources where they might give you junk when you expected something valuable, or you're not really sure, you're like, "The docs say this, but I remember getting something different some other time," this is a really cool way to formalize that and then have your code automatically check it.

40:58 - Yeah, this is cool. I like it.

40:59 - Yeah, awesome.

41:00 - That's all of our six items.

41:02 Do you have any extra little things to share?

41:05 Well, I kind of went overboard on the extras this week, but I'll keep them all quick because there's a bunch of cool stuff out there that people send in.

41:12 First, Jack McHugh did a really cool thing.

41:15 So Jack McHugh created a blog post or a page on a site called Python Bytes Awesome Package List.

41:23 Have you seen this?

41:24 Yeah.

41:25 And he like listened to 171 episodes in 174 days or something like that of Python Bytes.

41:34 I mean, this is awesome because as I flip through this, there's a couple of things I've forgotten.

41:38 I'm like, oh, that's cool.

41:39 Oh, we must have talked about that, but I don't even remember.

41:42 It's got beautiful pictures.

41:43 It's, I mean, it's kind of an awesome list, but it's for a podcast, so that is super cool, Jack.

41:49 Thank you, thank you.

41:50 I'll be sure to link to it at the end.

41:52 And I hope you keep adding to it.

41:54 That would be great, but no pressure.

41:57 I wanna talk about VB.net for a second.

41:59 That's kind of weird, right?

42:00 - Why?

42:01 - Yeah, because I kind of appreciated VB back in the early days when it was like a drag and drop VB6 and whatnot.

42:09 And then Microsoft came out with a thing called Visual Basic.net and it was complete crap.

42:13 Didn't like it, but here's what's interesting: they have just announced that they are no longer evolving it. They'll keep that thing running, but they will no longer work on it.

42:23 And I just thought it was interesting.

42:24 Like here's a fairly major language, not super top five or something, but it's kind of a major language that's declared dead.

42:32 And I just thought it was kind of interesting to point out, man, languages, they can go dead.

42:38 It's weird.

42:39 - Yeah, I think this one should have been shot a long time ago, but you know.

42:44 - It's also worth thinking about this, I agree, by the way, it should have never existed, but anyway, that's a different story.

42:50 It's also an interesting take on, here's a language controlled by a single company, and they can just decide they don't like it anymore.

42:57 Right? - Yeah.

42:58 It really can't happen to Python, because there's not a single person or organization that goes, "Ah, we're done." Yeah.

43:04 Well, that's actually one of the fears I have for, I mean, even Java.

43:09 Java is not controlled by one company, but it kind of is sort of.

43:12 Yeah.

43:13 Yeah.

43:14 Well, and there's also that Supreme Court case or the legal case of like, are you allowed to copy the Java API?

43:22 I don't think that's resolved yet.

43:23 I can't remember.

43:24 It's still working its way through the courts.

43:25 I want to reiterate, people that actually have a job in Visual Basic or love it, I'm not dissing you.

43:31 I just had a personally bad experience with Visual Basic and didn't enjoy it.

43:36 I had a good experience with Visual Basic 5, but that was in like 1993 or something.

43:42 Okay, so also we talked about COVID-19, all the crazy stuff going on.

43:48 As tragic as much of it is, there's some really interesting data science that can be done and some dashboards that can be built and whatnot.

43:55 So someone on Twitter, let me pull up their name, just pointed to a whole bunch of COVID-19 datasets.

44:02 Beekeep, I'm gonna call that Beekeep.

44:04 I'll put that on Twitter, so check that out.

44:07 Like the Johns Hopkins CSSE dataset and some other dashboards and some things on Kaggle.

44:13 So if you're in data science, you wanna explore it, here's some datasets that are probably interesting.

44:17 Then finally, I'm working on a new course, adding a CMS to your data-driven web app.

44:21 That'll be a lot of fun.

44:22 I'll talk more about that later.

44:23 But I'm just super excited to be creating more courses as we kind of talked about earlier.

44:28 - Yeah, one of the things we talked about is people working from home and getting around technical problems with that.

44:34 That happened to me just this morning.

44:36 So this morning I tried to hook up, I realized that I had an external keyboard that's working fine-ish.

44:43 I wanted to use like a real mouse, so I plugged in an external mouse with a little click wheel thing on it and realized that on Apple, the click wheel behavior just goes the wrong direction for scrolling, and it confused me.

45:00 And you can reverse it, but I didn't want my trackpad to be reversed.

45:04 My trackpad's fine.

45:05 So they're tied together for some reason, weird.

45:08 So Dave Forjak, sorry Dave, he suggested I use something called Scroll Reverser, which is a little tiny app that allows you to untie those and have trackpad scrolling and mouse scrolling be different.

45:26 And thank you, Dave.

45:27 - That's awesome.

45:28 That's super cool.

45:29 I guess my work from home thing that I've been playing with is with Zoom, you can have virtual backgrounds.

45:36 You don't even have to have a green screen.

45:38 You can have like alternate backgrounds just by uploading an image and it'll put you in, you know, an office space instead of a messy bedroom or whatever it is.

45:47 - Oh, nice.

45:48 Yeah, so you can block out the kids behind you and stuff like that.

45:51 - Yeah, exactly.

45:52 You don't have to see the kids being crazy home from school and whatnot.

45:55 Anyway, yeah, a lot of stuff we're learning around those types of things.

45:58 And I think the joke that I chose for us this week is gonna be perfect for the topic of documentation as building community that you brought up.

46:08 - Okay, cool.

46:09 - This is before that person gets inspired from listening to you and actually makes things better.

46:14 All right, so let me set the stage here.

46:16 There's three people.

46:19 Two of them clearly more senior, and a very excited new person sitting at a laptop, like beaming with enthusiasm, ready to get going on the whole project.

46:29 And one of the senior people says to the other, "And this is Jim, our new developer." The other one says, "Great, does he already know something about our system?" The new person turns around, "I read the whole documentation." Blank looks between the senior people, "No." (laughing)

46:48 - Yeah, yeah.

46:50 - It's good, right?

46:51 - Yeah, definitely.

46:52 I started a job once in my career where I had read the documentation 'cause it was an internal job transfer.

46:57 I read the documentation before getting there, and the people there didn't know they had documentation.

47:03 So it was so out of date, nobody currently there knew it.

47:08 - It may be a little out of date if they don't even know it exists.

47:11 (laughing)

47:13 - Yeah.

47:14 - All right, well, awesome.

47:15 - Cool, well thanks a lot.

47:17 - You bet, great to be here with you as always.

47:19 See you later.

47:20 - Bye.

47:21 - Thank you for listening to Python Bytes.

47:22 Follow the show on Twitter @pythonbytes.

47:24 That's Python Bytes as in B-Y-T-E-S.

47:27 And get the full show notes at pythonbytes.fm.

47:30 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

47:35 We're always on the lookout for sharing something cool.

47:37 This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
