Transcript #174: Happy developers use Python 3
00:00 Hello and welcome to Python Bytes, where we deliver news and headlines directly to your earbuds.
00:05 This is episode 174, recorded March 18th, 2020. I'm Brian Okken.
00:11 And I'm Michael Kennedy.
00:12 And this week, this episode is brought to you by Talk Python Courses and the pytest book.
00:18 Yeah, yeah, it's brought to you by us.
00:19 Yeah, us.
00:21 More about that later, huh?
00:21 Yeah. So we're doing something a little different. We're recording in two different locations
00:26 because of... Actually, we always record in two different locations, but...
00:31 But the locations are sometimes not the location, especially the location you're in.
00:35 Yeah, I often record somewhere else. But today I'm at home because a lot of people are at home
00:41 working remotely in home offices now because of... I don't even know how to pronounce it.
00:47 I read it. COVID-19.
00:49 Yeah. It is an insane time on so many levels. But I would say certainly...
00:56 There's a lot of tech people out there who may be working from home for the first time.
01:00 You know, I know there's a lot of large companies that feel like you need to go to be in the office
01:06 and you need to do the work. And yet a lot of the tools that we use as developers are very suited
01:13 to the situation that many of us around the world find ourselves in.
01:16 Working from home, working asynchronously and whatnot, right?
01:21 GitHub, Slack, email, Zoom, whatever it is.
01:25 It's interesting to see the rest of the world scale up to kind of, you know, what we've been doing for a long time.
01:30 We were lucky that our office was recently moved in July.
01:35 And during the move, we tried to set everybody up to be able to remote work
01:40 because some people had longer commutes than before.
01:45 And it happened to be...
01:46 I mean, it's just fortunate that we set that up before this happened.
01:49 And I'm also very fortunate that I'm a software worker.
01:52 There's a lot of people that...
01:53 I mean, our work can continue for the most part with little interruption.
01:58 But it's a harder environment.
02:01 But a lot of people that are not technical workers can't do that.
02:05 Yeah.
02:05 It's such a bummer.
02:06 You know, my daughter, she just got a new job.
02:09 And she was supposed to start...
02:11 Actually, she was supposed to start today.
02:12 And they sent her a message, you know what, we're closed.
02:15 We're closed indefinitely.
02:16 And there's no reason for you to come and get trained to work here because who knows what it's going to look like in a month or two.
02:24 I mean, that's the reality for a lot of people.
02:26 It's rough.
02:26 One of the reasons why we started talking about this morning is just to say,
02:30 you know, to reach out to everybody and say, yeah, I hope everybody's doing okay.
02:34 And yeah, let us know some stories if you want to share.
02:39 Yeah, maybe some interesting tech angles, right?
02:41 Like problems you run into or things you found that really worked or whatever.
02:46 But yeah, everyone out there, be safe.
02:47 It's not always fun, but just find a place to hole up and just wait this thing out and be safe.
02:54 Yeah, that's a good idea for some extra things like related to that.
02:57 I'll add one of these.
02:58 On our add-ons at the end, I'll add one to that.
03:02 All right.
03:03 Super.
03:03 Well, I want to start out by talking about community.
03:06 I was partly thinking about this because of the coronavirus stuff.
03:10 And a lot of people possibly have maybe two extra hours in the day because they're not commuting.
03:16 Maybe.
03:17 Or, you know, I'm sorry if you have a two-hour commute or an hour commute on each end.
03:21 But you might have some extra time.
03:23 So one of the things you might want to share and spend some time doing is beefing up documentation on open source projects.
03:29 This actually was a great article called Documentation as a Way to Build Community by Melissa Mendoza.
03:37 I think Mendoza.
03:39 Sorry, Melissa.
03:40 But it talks about how educational materials can have a huge impact and effectively bring people into a community.
03:47 And beefing up the documentation story on open source projects can actually help bring more people to use it and help.
03:55 I mean, it seems obvious, but it isn't really.
03:58 And people aren't doing it.
04:00 There's a lot of projects that lack in really good documentation.
04:03 And there's a lot of reasons for that.
04:05 And talking about the reasons I think is interesting.
04:08 Decentralized development and a lot of projects start with just somebody scratching their own itch.
04:14 And they don't need documentation for that.
04:17 But it grows into other people getting involved.
04:20 And for a lot of people, it's more glamorous to add new features or fix a nasty bug than to add more documentation.
04:28 Nobody really kind of knows how to do that.
04:30 I think it's important and spending some more focus.
04:33 One of the directions of this article, and it was targeting a specific project,
04:38 but I think it really can apply to more than just this one,
04:41 is splitting up the documentation, organizing it into four different areas.
04:47 Tutorials, how-tos, reference guides, and explanations.
04:52 These four areas and subsections of those can be targeted towards different people, targeted towards beginners or advanced people or somebody just looking something up.
05:01 One of the great things about that is it makes it easier for somebody to jump in and say,
05:05 Oh, there's like one little piece of things, how to do something.
05:10 I can contribute to that.
05:11 I might not know why it works, but I can contribute to how to and some tutorials.
05:16 Whereas maybe some of the more expert people in the project can do some of the explanations of how things are working.
05:21 And also a lot of teams kind of shift or some projects have the new people come in and say,
05:28 Hey, you want to help out?
05:29 Want to write new documentation?
05:30 And I think that's a great thing.
05:32 But then you've got documentation that's just filled with content from beginners that might not be,
05:38 you know, from some of the experienced people.
05:40 And so I think there's some good information here.
05:43 And I think focusing on documentation might be a good thing.
05:46 I like the article.
05:47 I like the idea of it, right?
05:49 That you can build a community.
05:51 Certainly you can contribute to these projects quite easily in this way.
05:55 Breaking it up into these categories is really clever because then you can definitely just sit down and think,
06:02 wow, I'm going to write some docs for this thing.
06:03 Well, that's pretty wide open, right?
06:05 But I'm going to write a short tutorial, which I had to learn because I had to use this thing.
06:09 And now I know how to do that.
06:10 Why don't I generalize it and make a tutorial?
06:12 That seems like a real easy way to get yourself on the contributor list.
06:15 Beef up your resume.
06:18 Say I contributed to this project, etc.
06:19 I think it's good.
06:20 One of the things I'd like to reach out to people, some of the beginner stuff, a great thing to do is while you're learning a project, isn't writing new content.
06:29 But while you're reading documentation on a project, if there's typos, if there's just grammar errors, it may have been written by somebody that isn't native English.
06:38 So you can help out by just fixing some of those things.
06:41 And then also while you're going through things, if you stumble on something and it's difficult to follow the instructions, it might be that the instructions need to be modified.
06:50 And why not just do like a pull request of modifying those instructions to be the way it really works?
06:56 And I think that'd be cool.
06:58 Yeah, that'd be great.
06:59 Another area that might be interesting is to write tests.
07:02 Yeah, definitely.
07:03 A lot of projects lack tests or they're just marginally tested.
07:07 And you're like, well, okay, I'm going to create this tutorial and I want to make sure the things I'm saying work.
07:12 So let me add some tests to verify what I believe to be true and go ahead and commit that back to the project.
07:18 Yeah.
07:18 And modifying tests.
07:20 If the tests are not readable, they should be.
07:22 And maybe you can make them more readable.
07:23 Yeah, I guess I kind of started thinking about that because documentation and tests feel a little bit like a form of documentation.
07:30 Yeah, definitely.
07:31 Yeah.
07:32 Well, cool.
07:33 Well, I'm pretty passionate about fast websites.
07:36 As you probably know, I talk about trying to make websites fast all the time.
07:40 Our website's pretty fast.
07:42 Speed is important.
07:43 Slow websites push people away.
07:46 They do.
07:46 I think it was Amazon or somebody did a study saying like every, you know, 100 milliseconds latency of perceived latency to the user.
07:56 And, you know, it has a very tangible, like whole number percentage drop in actual sales.
08:02 Yikes.
08:02 Yeah.
08:03 Sales are not the most important thing necessarily.
08:04 Maybe if you're Amazon, they are.
08:06 But it just gives you a sense of like, well, 100 milliseconds.
08:09 You can barely perceive that as a person.
08:11 And yet as those things add up, right, it starts to really make a difference in behavior.
08:15 Yeah.
08:16 So I want to talk about this article, sort of riff on some topics covered in the article, more or less, called The Django Speed Handbook: Making a Django App Faster, by Shibel Mansour.
08:27 Now, the title has Django and some of the examples are really about Django, but this actually applies to most websites and Python websites and whatnot.
08:37 So if you do Flask, I think this will still be super relevant.
08:42 The first thing, though, that I want to point out is actually a Django thing.
08:46 And it does appear at least in Pyramid as well.
08:48 So there's this, in Django, there's a thing called the Django Debug Toolbar.
08:53 And it lets you explore the different requests, see how long they're taking.
08:57 You can even get in there and look at the ORM calls and what's happening.
09:02 So that's pretty awesome.
09:03 Like Pyramid has this as well.
09:05 You can actually see the SQLAlchemy calls going to the database and the timing and how many database queries there even are on a given page.
09:13 It's pretty ridiculous to be able to use that to analyze what your app is doing.
09:17 It's almost like you've attached a little debugger profiler all the time and it's just right there.
09:22 That's cool.
09:22 Do you have to turn it off then?
09:24 Well, when you go into production, you don't include it in the setting, like the run settings for production, obviously, right?
09:30 Got it.
09:30 That would be bad.
09:31 But some of those settings, even in the debug mode, you have to turn them on.
09:35 I'm not sure about the Django one, but the Pyramid one, you definitely, like the profiler is not on by default because that'll slow it down a little bit.
09:42 But you can click a box and then go do the request again.
09:44 Okay.
09:44 All right.
09:45 So that's a real quick and easy way just to see what your app is up to.
09:48 Then one of the things you really want to pay attention to, and this is going to be a bit of a theme on today's show, is talking to databases.
09:55 So when you're working with an ORM or just talking to the database, specifically here, the Django ORM, but this is super relevant for like SQLAlchemy as well, is you want to be really careful of the so-called N plus one problem, which happens when you navigate relationships.
10:14 So for example, if I have, let's say a category, I'm going to show a category of books and the category has a books relationship, or maybe there's some other thing like that.
10:27 I get all the categories back and I want to tell you how many books are in each one or something.
10:30 Like as you go through the things that come back, you end up doing one query for each property that you access on each instance of that object.
10:39 So if you do a query that returns 20 things, you might end up talking to the database 21 times.
10:44 It's a common problem in ORMs, but it also has an easy fix, which is why that debug toolbar is cool, because you could turn it up and say, well, turn it on and say, oh, look, why are there 24 queries on this page?
10:55 Right?
10:55 I feel like I did one, like, well, sort of.
10:57 So you can use select_related and prefetch_related, and it'll basically join or pre-query those related objects together in one massive query.
11:10 So you don't actually go back to the database N plus one times.
11:12 Okay, nice.
11:13 Yeah, and that's a big deal.
11:14 And, you know, SQLAlchemy has joinedload and subqueryload, with which you can basically accomplish the same thing.
11:19 So he's got a cool example of not a huge database, but doing, making, using these two properties in the Django ORM going 24 times faster.
11:28 Oh, wow.
11:29 Yeah.
11:29 Right?
11:29 I mean, it's basically not changing the code at all, except saying, you know, I'm going to use this related property.
11:35 So just query that as part of the query instead of, like, doing, you know, however many queries you're going back for.
11:40 Really, really nice.
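The N plus one problem described above can be sketched without a Django project. This is a minimal illustration using the standard library's sqlite3 module rather than the Django ORM; the category and book tables and the run helper are hypothetical, invented only to count round trips to the database.

```python
import sqlite3

# Hypothetical schema: 20 categories, 100 books spread evenly across them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
""")
conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                 [(i, f"category-{i}") for i in range(1, 21)])
conn.executemany("INSERT INTO book (title, category_id) VALUES (?, ?)",
                 [(f"book-{i}", (i % 20) + 1) for i in range(100)])

queries = {"count": 0}

def run(sql, params=()):
    """Run one query and count it as one round trip to the database."""
    queries["count"] += 1
    return conn.execute(sql, params).fetchall()

# N+1: one query for the categories, then one more query per category.
queries["count"] = 0
naive = {}
for cat_id, name in run("SELECT id, name FROM category"):
    rows = run("SELECT COUNT(*) FROM book WHERE category_id = ?", (cat_id,))
    naive[name] = rows[0][0]
naive_queries = queries["count"]

# Joined: the related rows are fetched as part of a single query.
queries["count"] = 0
joined = dict(run("""
    SELECT c.name, COUNT(b.id)
    FROM category AS c LEFT JOIN book AS b ON b.category_id = c.id
    GROUP BY c.id, c.name
"""))
joined_queries = queries["count"]

print(naive_queries, joined_queries)
```

Both approaches produce the same counts per category; only the number of round trips differs. With hypothetical Django models Category and Book (where Book has a ForeignKey to Category), the corresponding calls would be along the lines of Book.objects.select_related("category") or Category.objects.prefetch_related("book_set").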
11:41 Related to that is indexes.
11:43 So if you're not thinking about and using indexes, you should be.
11:48 I mean, that's, like, easily a thousand times faster to do a query against a lot of data with an index versus without.
11:54 And then if you've got these joins, it's even better, you know?
11:57 So super important, but do be aware that indexes make writes slower.
12:03 Most websites don't write data like crazy, although some APIs do.
12:10 So it's usually not as big of a problem, but just be aware that writes are slower with indexes, but queries are much, much faster.
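To see what an index changes, here is a small sketch using sqlite3 from the standard library (not Django or SQLAlchemy). The book table and the index name idx_book_category are made up for the example. It asks SQLite for its query plan before and after creating the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER)"
)
conn.executemany("INSERT INTO book (title, category_id) VALUES (?, ?)",
                 [(f"book-{i}", i % 50) for i in range(5000)])

def plan(sql):
    """Return SQLite's query plan description as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM book WHERE category_id = 7"

plan_before = plan(query)  # without an index, SQLite scans the whole table
conn.execute("CREATE INDEX idx_book_category ON book (category_id)")
plan_after = plan(query)   # with the index, SQLite searches via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as illustrative. The trade-off mentioned above still applies: every insert or update now has to maintain the index as well.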
12:17 Another thing they talk about, which is really helpful, is using pagination, where instead of saying, here's a thousand items, here's 50, and you can ask for the next 50 and the next 50 and so on.
12:28 That's super easy to do with Django ORM or SQLAlchemy or anything like that.
12:32 So that's a really good one.
12:34 So does that often line up with, like, if you're showing, like, if your page only shows 50 things, only fetch 50 things then?
12:40 Yeah, yeah, exactly.
12:41 And it's super easy to put in the query string, like, page equals five, right?
12:46 Okay.
12:46 And then you just do a skip and a limit or whatever the ORM you're using has, like, for the skip and take type of thing, right?
12:55 So it's super easy.
12:56 You can compute it yourself, but it makes a big difference, right?
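Computing the skip and limit from a page number is only a couple of lines. A minimal sketch, assuming 1-based page numbers and a page size of 50; page_window is a hypothetical helper name, and a plain list stands in for the query results.

```python
from typing import Tuple

def page_window(page: int, page_size: int = 50) -> Tuple[int, int]:
    """Translate a 1-based page number into (offset, limit)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size, page_size

# A plain list standing in for a thousand rows in the database.
items = list(range(1000))

# For a query string like ?page=5
offset, limit = page_window(page=5)
page_items = items[offset:offset + limit]

print(offset, limit, page_items[0], page_items[-1])
```

The same offset and limit can be handed to the ORM, for example by slicing a Django queryset with [offset:offset + limit] or by calling .offset(offset).limit(limit) on a SQLAlchemy query, so that only one page of rows is fetched.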
12:59 Also, if you have long-running tasks, long-running things to do, make them either background tasks in, like, other processes or Celery or something, or just use...
13:11 If the person making the call has to wait on it, be sure to use async, right?
13:16 So you're not blocking up everything.
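As a rough illustration of not blocking while waiting, here is a sketch using asyncio from the standard library. The slow_lookup coroutine is hypothetical and uses asyncio.sleep in place of a real database or network call.

```python
import asyncio

async def slow_lookup(name: str, delay: float) -> str:
    """Pretend to wait on a slow external service."""
    await asyncio.sleep(delay)  # while this waits, other tasks can run
    return f"{name} done"

async def handle_requests():
    # Both lookups wait concurrently instead of one after the other.
    return await asyncio.gather(
        slow_lookup("report", 0.05),
        slow_lookup("email", 0.05),
    )

results = asyncio.run(handle_requests())
print(results)
```

This only helps for work that spends its time waiting. For work that is heavy in its own right, handing it off to a background worker is the other option mentioned above.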
13:18 Yeah.
13:18 Another super easy way to make things fast, and many of these things we're doing at pythonbytes.fm and the other websites, is to turn on gzip.
13:27 So you can just go to, like, Nginx or whatever your web server is and say gzip the response.
13:34 He's got a really simple example here where the response size of the page and the CSS and whatnot is nine times smaller by just adding the gzip middleware to Django.
13:44 Oh, wow.
13:45 I wouldn't actually add it to Django if this was me.
13:48 I would add it to Nginx because that's the outer shell web server.
13:52 Just let it do it, and you don't have to...
13:55 You're probably not talking directly to the server running Django.
13:58 But anyway, somewhere along the way, gzip your content because that'll be big.
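To get a feel for why gzip pays off on text responses, here is a small sketch with Python's gzip module. The HTML is an invented, very repetitive sample, so the ratio it prints is not representative of any real page. In practice the web server or framework middleware does this step for you.

```python
import gzip

# Invented sample markup; real pages are less repetitive than this.
html = ("<li class='episode'>Python Bytes episode</li>\n" * 500).encode("utf-8")

compressed = gzip.compress(html)
ratio = len(html) / len(compressed)

print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.0f}x smaller)")

# Compression is lossless: decompressing gives back the original bytes.
restored = gzip.decompress(compressed)
```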
14:02 Similarly, minify your static files and bundle them and cache them and all of those good things, right?
14:10 Okay.
14:10 There's some cool libraries that he talked about in there.
14:14 I think it was called Whitespace.
14:15 I'm pretty sure it's called Whitespace that they're using in Django to minify and bundle the files.
14:22 So we don't use Whitespace.
14:23 We don't use Django.
14:24 We use WebAssets and CSSmin and JSmin, which are three awesome Python libraries to bundle that.
14:31 So if you go and look at Python Bytes or Talk Python or any of those sites, you can see that there's a packed CSS and a packed JavaScript that has probably 20 CSS files that's smushed into one with those things and minified and whatnot.
14:45 Okay.
14:46 So that's pretty cool.
14:47 There's two ways to measure page performance.
14:49 One is like how fast is the server responding, right?
14:52 But that's not the most important thing to the user.
14:56 The most important thing is how does it feel to them.
14:57 So Google has this thing called PageSpeed, which they're even using for measuring like your SEO ranking.
15:04 So put your website into there.
15:05 I have a link for Talk Python Training's ranking.
15:09 I spent three days straight getting it from like 40 out of 100 to 99 or 100 out of 100.
15:16 But it was quite the journey.
15:19 So that took a while.
15:20 You can both measure it for mobile and desktop.
15:23 And it has slightly different rankings.
15:25 Also, shrink your images with ImageOptim, which works for macOS and Linux.
15:32 It doesn't work on Windows, but there's some really great options there.
15:35 And it'll basically do completely lossless compression of your images.
15:39 So they might be like 40 or 50% smaller.
15:41 And visually, you could not, you literally couldn't distinguish them.
15:44 Interesting.
15:45 Yeah.
15:45 Yeah.
15:46 And then last recommendation is lazy load your images.
15:50 This is not something I've really explored, but apparently Google Chrome images now support a lazy attribute.
15:58 Oh, nice.
15:58 Yeah.
15:59 And then for things that don't support it, there's a lazy load JavaScript library.
16:02 Basically, your images.
16:03 You say, here's, as it scrolls into view, it'll download them.
16:07 But if it's off the page and you never scroll, then it'll never load it.
16:10 That's great.
16:11 Yeah.
16:11 Pretty clever.
16:11 So this is just some of the things covered in that article.
16:14 So if you're out there and you're like, I need to get my site to go faster, it cannot be three seconds per page load.
16:20 That's ridiculous.
16:21 Like, start looking through some of these things.
16:23 It'll really help, especially if you're using Django.
16:25 But even if you're using some other Python framework, I think it'll still be quite relevant.
16:29 Yeah.
16:30 Most of these are relevant to any web stuff.
16:32 Yeah.
16:32 Yeah.
16:32 They're super, super general.
16:34 Like, some of the libraries they talk about plug into Django.
16:36 So it's kind of a little extra boost if you're doing Django.
16:38 But yeah, this is relevant to everyone.
16:39 Yeah.
16:40 All right.
16:40 What do you got next?
16:41 Well, this actually came in as a listener suggestion from the author of the library.
16:45 So this is like JIT podcasting, right?
16:47 Yeah.
16:48 It just came in this morning and I love it.
16:50 It's from Konrad Hałas, I think.
16:52 It's called D-A-C-I-T-E.
16:56 Maybe DeCite?
16:56 DeCite?
16:57 DeCite?
16:58 I don't know.
16:59 But it's cool.
17:00 It simplifies the creation of data classes from dictionaries.
17:03 So when I first heard it, I'm thinking, okay, well, I love DIC.
17:07 I'm using data classes like all the time now because I really like them.
17:10 There's a lot of cool aspects of them.
17:12 You can have default values.
17:13 I really like that I can easily exclude some of the fields.
17:18 You can take them out of the comparison.
17:20 So some objects can be equal even if they're not completely equal sort of thing.
17:25 And I love that aspect.
17:28 And there's a whole bunch of other cool stuff about them.
17:30 So I'm using them more and more.
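The two data class features mentioned here, default values and leaving a field out of comparisons, look roughly like this. The Episode class and its fields are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    number: int
    title: str = "untitled"  # default value
    # compare=False keeps this field out of the generated __eq__.
    download_count: int = field(default=0, compare=False)

a = Episode(174, "Happy developers use Python 3", download_count=10)
b = Episode(174, "Happy developers use Python 3", download_count=99)

print(a == b)            # equal even though download_count differs
print(Episode(1).title)  # falls back to the default
```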
17:31 But the data all around us that we get from databases and whatever, it often gets converted to dictionaries and not to data classes.
17:40 So this is a little library that basically has one function called from_dict
17:45 that converts dictionaries to data classes.
17:48 And my first reaction was I can already do that.
17:51 If you do the star star or the double splat.
17:55 Dictionary to keyword argument type of thing.
17:58 Yeah.
17:59 I mean, you can do that for simple data classes and simple dictionaries.
18:03 That works just fine.
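For reference, the double-splat approach looks like this with a flat data class. The User class and the dictionaries are hypothetical. It works when the keys match the fields exactly and raises a TypeError as soon as the dictionary carries a key the data class does not know about.

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

# Keys match the fields exactly, so ** expansion works.
row = {"id": 1, "name": "Brian"}
user = User(**row)

# An extra key becomes an unexpected keyword argument.
noisy = {"id": 2, "name": "Michael", "url": "https://example.com"}
try:
    User(**noisy)
    extra_key_ok = True
except TypeError:
    extra_key_ok = False

print(user, extra_key_ok)
```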
18:05 But I looked into this more.
18:06 And this from_dict from dacite, it allows you to do nested structures.
18:11 So you can have a data class with another data class field and arrays of lists or tuples of data classes.
18:18 And as for the types, you can do unions in there, collections, nested structures.
18:24 It even has this thing called type hooks, which allows you to have a custom converter for certain types of data that come in.
18:32 So his example is like if for all the strings, lowercase them or something like that.
18:38 But you can definitely have that for certain types.
18:41 It's pretty neat.
18:41 Oh, that's cool.
18:42 Or if you've got, like, some kind of string that's a date time, you know, parse it out of an ISO string or whatever.
18:48 Yeah, that's a good example, actually.
18:49 That's cool.
18:50 So one of the things that messes you up on my example of just taking a dictionary and expanding it as arguments to a data class constructor is that it doesn't really work if all the names don't match up.
19:04 But this one allows you to have if your data class only has a few fields, but your dictionary has like tons of stuff in there.
19:12 By default, it just ignores the stuff that doesn't match up.
19:15 And so if you've got like a name and an ID and there's names and IDs coming from the dictionary, but there's also like a whole bunch of other things like URL and stuff like that.
19:25 It just ignores that.
19:26 That's the default.
19:27 But you can also turn on strict mode that says, no, I expect it to match up directly and I want a warning.
19:33 And then there's a whole bunch of exceptions that get raised if something goes wrong in the conversion.
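To make the idea concrete, here is a hand-rolled sketch of converting a nested dictionary into nested data classes while ignoring keys that do not match a field. This is not dacite's implementation, and it skips unions, type hooks, strict mode, and the library's exceptions; simple_from_dict, Host, and Show are names invented for the example. With the library itself, the call would be along the lines of dacite.from_dict(data_class=Show, data=raw).

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, get_args, get_origin, get_type_hints

def simple_from_dict(cls: type, data: dict) -> Any:
    """Build dataclass `cls` from `data`, recursing into nested dataclasses."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # leave it to the field's default, if it has one
        kwargs[f.name] = _convert(hints[f.name], data[f.name])
    # Keys in `data` that are not fields were never copied, so they are ignored.
    return cls(**kwargs)

def _convert(tp: Any, value: Any) -> Any:
    """Convert one value according to its annotated type."""
    if is_dataclass(tp) and isinstance(value, dict):
        return simple_from_dict(tp, value)
    args = get_args(tp)
    if get_origin(tp) is list and args:
        return [_convert(args[0], item) for item in value]
    return value

@dataclass
class Host:
    name: str

@dataclass
class Show:
    title: str
    hosts: List[Host]
    episodes: int = 0

raw = {
    "title": "Python Bytes",
    "hosts": [{"name": "Michael", "twitter": "ignored"}, {"name": "Brian"}],
    "url": "https://pythonbytes.fm",  # not a field on Show, so it is skipped
}
show = simple_from_dict(Show, raw)
print(show)
```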
19:37 And I'm just excited to use this because it's a really cool tool to convert data to data classes.
19:44 It's nice.
19:44 Yeah, this looks super nice.
19:46 It's one of those things that seems to automate like the crummy part of programming, right?
19:51 Like I'm getting this data submitted to me from an API or from somebody calling my API and who knows what they're sending me.
19:59 But here's, like, as long as this thing lines up, right?
20:02 Right.
20:02 I tell it these fields are not optional or this type has to be such and such.
20:05 If that works, then we're good.
20:07 Otherwise, you know, tell them 400 that didn't work or the file couldn't be loaded or whatever it is.
20:11 And definitely, so Konrad made a point in the documentation to say that it is not a schema validation library.
20:18 That's not the intent of it.
20:20 It is really just intended for the conversion.
20:23 So especially with external APIs, I think combining this with a schema validation is a good idea.
20:31 But you could definitely go from schema validation to this and have data classes in the end.
20:36 It'd be great.
20:36 Yeah, it's a cool project.
20:38 And I love how it leverages the brand new Python stuff, the data classes.
20:41 Anyway, we should plug ourselves.
20:44 Yeah.
20:44 As sponsors.
20:45 So.
20:46 Well, we should definitely let people know about what we're doing, right?
20:49 So you've got this book on testing or something?
20:53 I actually kind of love that I had some feedback early on when the book came out.
20:57 Python Testing with pytest is the book that I'm talking about.
21:01 And it did come out in 2017, the end of 2017.
21:04 And I got some really great feedback from people saying they really loved following the book on this podcast.
21:09 And I apologize for the lawnmower in the background.
21:12 If it goes through, I wanted to point out that I had a couple of people ask me,
21:17 it came out in 2017.
21:19 Is it still valid?
21:20 And I want to take the time to say, yes, it is.
21:23 The intent of the book was never to be a thorough, complete inventory of everything you can do with pytest.
21:29 It was a quick, what are the like 80% of pytest that you're going to use all the time?
21:35 And that is the core of pytest and how to think about it.
21:38 There are new goodies that have been added since 2017.
21:42 And it's good to check those out.
21:44 But you could run with what's in this book and still be very productive.
21:47 Nice.
21:48 It's definitely made me more productive and better with pytest.
21:51 So it's great.
21:52 Thank you.
21:52 Yeah, you bet.
21:53 And I also want to tell people about the courses that we have over at Talk Python Training.
21:57 We've got a bunch of new ones we've been releasing.
21:59 I do try to let you know when the new ones are out.
22:01 But we've got like 120 hours of Python content over there on a bunch of projects that you can do.
22:07 The 100 days of code courses all have like projects for every single day for 100 days.
22:12 And yeah, so just check them out.
22:15 We're going to release a couple new courses coming soon.
22:17 And I'll be sure to let you know.
22:19 But yeah, support us by checking out our work, right?
22:22 Yeah.
22:22 I want to tell people one of the things I love about the Talk Python courses is there's a lot of content there.
22:28 And I'm a busy person.
22:29 And sometimes it's overwhelming to me to look at a course to say it's like 12 hours of content on a course or something like that.
22:36 Six hours or something even.
22:38 And however, the way that you've got it set up with a whole like bookmarks into separate videos and different topics, it's the outline of the courses are so incredible that if you really need to just jump to the right place to learn something, you can do that.
22:53 And even though you can just watch them in series and just watch the whole thing, you can do that, of course.
22:59 But being able to jump around and go back and use it as a reference is a great thing.
23:04 So thanks.
23:05 Yeah, thanks.
23:06 Yeah, we definitely work hard on making that a possibility.
23:08 So I appreciate that.
23:10 Now, do you know what the Python clock reads right now?
23:13 Oh, I haven't checked.
23:14 What does it read?
23:15 It reads zero, zero, zero, zero, zero, zero.
23:20 The Python clock bell has tolled for the folks who have to convert.
23:27 This next thing I want to share with everyone comes from LinkedIn and Barry Warsaw.
23:32 Barry's been part of Python for a very long time doing a lot of cool stuff there.
23:36 And he was on the team that helped LinkedIn move from legacy Python to modern Python.
23:43 Okay.
23:43 Yeah.
23:44 So it's called How We Retired Python 2 and Improved Developer Happiness.
23:48 So a couple years ago, 2018, LinkedIn started working on this multi-quarter effort to transition to Python 3.
23:58 So maybe some of the lessons from here will help people out there for whom they haven't actually migrated all the way to Python 3.
24:06 That'd be good, right?
24:06 Yeah.
24:07 So basically, they said they did an inventory and they found they have 550 code repositories they had to migrate.
24:16 That's a lot of different projects.
24:18 Yeah.
24:19 And some of them depend on the others.
24:21 So they said, look, Python is not the thing powering our main web app.
24:26 I think it's Java.
24:28 I'm not 100% sure.
24:29 But anyway, it's not their main thing.
24:31 Instead, there's a bunch of like independent microservices and tools and data science projects that are all using this.
24:38 So their first pass at getting all those different things migrated was to say, we're going to have a bilingual philosophy for Python, meaning it'll run on 2 and 3 at the same time.
24:52 Okay.
24:53 And then once you get it there, the main problem that you could run into is I depend on a library.
24:58 Like this is standard legacy Python.
25:01 So I depend on a library that requires Python 2.
25:05 Therefore, everything that I use that I build that depends on that library must also be Python 2, right?
25:11 Yeah.
25:11 So this bilingual thing that they did, this was to prevent that blockade, right?
25:17 So anyone who wants to build new stuff on Python 3 could still use the libraries and do so.
25:22 That was the plan.
25:23 They actually had a whole team that oversaw this effort like across projects or across like thousands of engineers called the Horizontal Initiatives Program.
25:34 So that was to like kind of across all these different projects address that.
25:38 And then in phase one, first quarter of 2019, they went and they found the most important repositories, the ones that were, if you put them into a dependency graph at the bottom, and they said, we're going to port those to Python 3 first because they're blocking everything else.
25:55 And then they kind of finished it off in the second half of 2019.
25:58 So they basically said, all right, now we got the foundation done.
26:01 We can start upgrading the libraries that depend on all these lower level bits.
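The "port the bottom of the dependency graph first" ordering is a topological sort. A minimal sketch using graphlib from the standard library (Python 3.9 and later); the repository names are invented, and nothing here reflects LinkedIn's actual tooling.

```python
from graphlib import TopologicalSorter

# Hypothetical repositories mapped to the repositories they depend on.
depends_on = {
    "core-utils": set(),
    "http-client": {"core-utils"},
    "metrics-lib": {"core-utils"},
    "billing-service": {"http-client", "metrics-lib"},
}

# static_order() yields each repository only after all of its dependencies.
port_order = list(TopologicalSorter(depends_on).static_order())
print(port_order)
```

Repositories with no dependencies come out first, which matches porting the foundation before the libraries that build on it.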
26:05 And then, you know, they said, looking back, you'll like this part, Brian.
26:09 They said, our primary indicator for knowing that the migration was done, that we were all right, was that our builds passed and our tests ran and everything was okay.
26:19 And then eventually they went through and said, all right, we're going to turn off the ability to run Python 2 type of tests in continuous integration.
26:26 Now let's see what keeps working.
26:28 Oh, yeah.
26:29 Okay.
26:29 Yeah.
26:29 So one of the things you can imagine is important is having tests, right?
26:32 Because if you don't have tests, CI, CD doesn't tell you a lot.
26:36 It just does the CD part.
26:38 For better or worse.
26:40 Yeah.
26:42 So they said, look, here's some guidelines for people, other organizations who are on similar paths, but earlier, they said, plan early and engage your organization's Python experts.
26:52 Find and leverage champions in the affected teams and help them promote the benefits of Python 3 to everyone.
26:59 Adopt this bilingual approach so people can at least begin, if they want to, go to Python 3.
27:06 Invest in tests and test coverage, code coverage, because these will be your best metrics of success.
27:13 And then finally, ensure your data models explicitly deal with this, what used to be one thing, bytes and strings in Python 2.
27:23 And now is, of course, two totally separate things.
27:25 Like, they said that was really the biggest challenge that they ran into: making that distinction correctly.
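One common way to handle the bytes versus strings split in Python 3 is to convert at the boundaries, so the rest of the code only ever sees text. A small sketch under that assumption; to_text and to_bytes are hypothetical helper names, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes coming in from outside; pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def to_bytes(value, encoding="utf-8"):
    """Encode str going out to a socket, file, or C library."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value

payload = "café".encode("utf-8")  # bytes, as they might arrive from outside
name = to_text(payload)           # str inside the program

# The two types are distinct in Python 3, and their lengths can differ.
print(type(payload).__name__, len(payload))
print(type(name).__name__, len(name))
```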
27:30 Yeah.
27:31 Those are hurdles.
27:32 Are you guys all upgraded?
27:33 Yeah.
27:33 It was a library that we were using that didn't support Python 3 yet.
27:37 The reasoning was the library talks to a DLL that has, you know, C++ strings or C strings.
27:46 And old Python strings converted just fine, but they don't now.
27:50 Unicode fancy ones.
27:52 Yeah.
27:52 Not so easy.
27:53 Yeah.
27:53 Yeah.
27:55 Cool.
27:56 So to wrap this up, they said the benefits they have from this whole process is they no longer have to worry about supporting Python 2.
28:02 And they've seen their support loads decrease and decrease in a good way.
28:06 Not, you don't have to support the old crummy stuff.
28:08 You can depend on the latest open source libraries, right?
28:12 A lot of libraries these days only work with Python 3.
28:14 And they opportunistically and enthusiastically adopted type hinting and mypy to improve overall quality, which is pretty cool.
28:22 Yeah.
28:23 That is good.
28:24 Yeah.
28:25 I'm looking forward to this next one you got.
28:26 This actually ties nicely because you brought up the Django speedups.
28:31 And I probably should have talked about this right afterwards.
28:33 But anyway, here we go.
28:34 There was an article that I'm not saying I agree or disagree because I don't know enough about it.
28:40 But the article was called the troublesome active record pattern.
28:44 And I guess in, you know, like Ruby and stuff, we talk about that.
28:49 They talk about active record more, I think.
28:52 But in Python world, it's the object relational mappers, ORMs, like the Django ORM or SQLAlchemy is also an ORM.
29:01 And those are essentially the same as active record.
29:04 That's, I think, that's the same pattern, right?
29:06 Well, certainly the Django ORM follows that pattern.
29:09 SQLAlchemy, it has a lot of similarities, but its design pattern is technically called a unit of work.
29:15 Okay.
29:16 The main variation is that in Django or things like that, you go to the object and you call save.
29:22 So that happens on the individual objects.
29:26 Whereas in SQLAlchemy, you make a bunch of changes and then there's this unit of work thing and you call save and it submits all the changes in one giant batch.
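To make the difference concrete, here is a simplified sketch using the standard library's `sqlite3`. The classes `ActiveRecordBook` and `UnitOfWork` are hypothetical stand-ins, not Django or SQLAlchemy code. In SQLAlchemy itself, the batching call is `session.commit()` rather than a method named save.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO books (id, title) VALUES (?, ?)",
                 [(1, "old title 1"), (2, "old title 2")])
conn.commit()


class ActiveRecordBook:
    """Active-record style: each object knows how to save itself."""

    def __init__(self, book_id, title):
        self.id = book_id
        self.title = title

    def save(self):
        # One write and one commit per object.
        conn.execute("UPDATE books SET title = ? WHERE id = ?",
                     (self.title, self.id))
        conn.commit()


class UnitOfWork:
    """Unit-of-work style: collect changes, then flush them together."""

    def __init__(self):
        self.pending = []

    def change_title(self, book_id, title):
        self.pending.append((title, book_id))

    def commit(self):
        # All pending writes go out in a single transaction.
        with conn:
            conn.executemany("UPDATE books SET title = ? WHERE id = ?",
                             self.pending)
        self.pending.clear()


# Active record: save() is called on each object.
book = ActiveRecordBook(1, "new title 1")
book.save()

# Unit of work: nothing is written until commit().
uow = UnitOfWork()
uow.change_title(1, "newer title 1")
uow.change_title(2, "newer title 2")
before_commit = conn.execute("SELECT title FROM books ORDER BY id").fetchall()
uow.commit()
after_commit = conn.execute("SELECT title FROM books ORDER BY id").fetchall()
```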
29:34 But basically, but here's the interesting thing is like this whole article is like the troublesome active record pattern.
29:41 But my reading of it really was the troublesome ORM pattern.
29:46 And so, for the most part, it's kind of an immaterial distinction.
29:51 Although, technically, design pattern-wise, they're not exactly the same.
29:55 Okay.
29:55 Okay.
29:56 Well, yeah.
29:57 So, the idea being like you just brought it up that the object, when you're referencing a bunch of objects and you have object save and things like that, there's a whole bunch of issues with that.
30:08 One of the issues is if you want to query things about the data, not necessarily all the data, but things like if you've got a bunch of books, for example, and you just want to count the number of books, well, you might have to just retrieve them all.
30:23 Or if you want to count all of the software testing books written by Oregon authors, you'd have to just ask me.
30:29 Or you'd have to grab all of them and grab all the data and then search on, in Python, look for stuff in a for loop or something.
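A small sketch of the counting problem, written against `sqlite3` so it runs without an ORM. The table layout and sample rows are invented for illustration. The first approach drags every row into Python, the second asks the database for one number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, title TEXT, topic TEXT, author_state TEXT)"
)
conn.executemany(
    "INSERT INTO books (title, topic, author_state) VALUES (?, ?, ?)",
    [
        ("Book A", "software testing", "Oregon"),
        ("Book B", "software testing", "Ohio"),
        ("Book C", "web development", "Oregon"),
    ],
)

# Wasteful: pull every row and every column, then filter and count in Python.
all_rows = conn.execute("SELECT * FROM books").fetchall()
count_in_python = 0
for _id, _title, topic, author_state in all_rows:
    if topic == "software testing" and author_state == "Oregon":
        count_in_python += 1

# Cheaper: let the database filter and count, returning a single number.
(count_in_db,) = conn.execute(
    "SELECT COUNT(*) FROM books WHERE topic = ? AND author_state = ?",
    ("software testing", "Oregon"),
).fetchone()
```

ORMs generally expose the second form too (Django querysets have a `count()` method, for example), so the loop is a usage choice rather than something the pattern forces.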
30:40 The other problem was around transactions, because if I have a book item and then change something about it and then save it back in, there's nothing stopping some other process.
30:52 You know, the read, modify, write doesn't work that well if you've got multiple readers and writers.
30:58 And I was looking this up.
30:59 SQLAlchemy has sessions, or you said there's a unit of work thing.
31:03 Don't know if those are atomic.
31:05 Yeah, yeah, they're the same, yeah.
31:06 Okay, Django has an atomic setting, but I don't know if that's by default or if it always, or if you have to specifically, say, work with transactions.
31:16 I did notice in some of the Django documentation that it does say that transactions slow things down.
31:21 So you don't want to do transactions if you're just reading, for instance.
31:25 And then the author of the article, Cal Paterson, mentions that REST APIs often have the same problems,
31:32 and some microservice architectures have a similar sort of issue.
31:37 It's just around REST APIs instead of the object model.
31:40 You're reading tons of data when you don't need to.
31:43 He brought up some solutions: you can just directly use SQL, or use some properties that do queries that are more like SQL.
31:55 Doing transactions helps too.
31:56 But basically, he was recommending avoiding the active record style access patterns around the REST APIs.
32:03 He brought up that GraphQL and RPC style APIs are some solutions to the same problem in REST APIs.
32:11 As somebody that's moving towards learning more about web development and working with ORMs, I really did want to bring this up and find out what you thought of all of this.
32:20 Sure.
32:20 It's interesting.
32:21 There are a lot of good, valid points that Cal's making here.
32:26 I feel like, instead of "the troublesome active record pattern", the focus should almost be: you're using your ORM wrong, learn how to use it right.
32:35 So let me give you some examples.
32:37 So one of the challenges here that we see is, if you're going to create a record and you want to get it back, you have to get it back by the primary key.
32:45 Maybe if you're doing exactly on just the straight ORM record pattern, but you can just do a query and do it like a give me the first or one item or something like that.
32:54 There's a part where he's looping over stuff saying, here we're looping back to just get the ISBN off these things, right?
33:00 You're pulling all the properties, like you're doing basically a select star from table just to ultimately, and a serialization of that result just to get like the ISBN.
33:10 Well, in SQLAlchemy, I don't know Django ORM well enough, but SQLAlchemy, you can say only return these columns.
33:18 I want just the ID and the title or the, I just want the ID and the ISBN.
33:22 Don't return the other results, right?
33:24 So that's an option.
33:26 The N plus one thing we already discussed, right?
33:28 You just use the subquery load, or the prefetch or select related or whatever it is for Django, and you can avoid those, right?
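For readers who have not met the N plus one problem, here is a sketch in plain `sqlite3`, with made-up tables and rows. It records each SQL statement so the two approaches can be compared. ORMs wrap the second approach in options such as Django's `select_related` and `prefetch_related`, or SQLAlchemy's eager loading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Author One'), (2, 'Author Two');
    INSERT INTO books VALUES (1, 'Book A', 1), (2, 'Book B', 2), (3, 'Book C', 1);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement sent

# N+1: one query for the books, then one more query per book for its author.
n_plus_one = []
books = conn.execute("SELECT title, author_id FROM books ORDER BY id").fetchall()
for title, author_id in books:
    (name,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    n_plus_one.append((title, name))
n_plus_one_queries = len(statements)

statements.clear()

# Joined: the same result fetched with a single statement.
joined = conn.execute(
    "SELECT b.title, a.name FROM books b "
    "JOIN authors a ON a.id = b.author_id ORDER BY b.id"
).fetchall()
joined_queries = len(statements)
```

The per-book lookups grow with the number of rows, while the joined version stays at one statement no matter how many books there are.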
33:33 So like, as you kind of go through these, you're like, okay, well, most of the time these problems are actually solved with some aspect of like a proper ORM.
33:43 Now, the transaction one is really, I think, super interesting because it sort of often gets to the heart of this debate about ORMs.
33:50 And you're saying, well, okay, here's this active record thing where it's not really leveraging transactions.
33:57 We know transactions are good.
33:59 And so this is bad because it doesn't do it.
34:02 But in practice, it's not so clean as that.
34:05 So for example, suppose I'm working on a web app and I have a grid, like a grid that was maybe could be loaded off of a REST endpoint, bring that into it, right?
34:14 And I've got this grid and I can type in it and there's a button that says save.
34:18 There's no way that it makes sense to do a transaction around that, right?
34:23 I'm not going to transactionally begin loading the grid and wait for me to press save, right?
34:28 That's going to lock up the database for every user.
34:30 Yeah.
34:30 Any scenario like that, like REST endpoint, right?
34:33 If I've got a phone and I've got my mobile app and it hits the REST endpoint, pulls it down to the data and I hit a type on it and I hit save, right?
34:40 You can't do that transactionally.
34:41 Like it just, you would lock up the site like right away, right?
34:45 So it doesn't make any sense.
34:47 So there are other patterns. Optimistic concurrency is a super common pattern in ORMs that would work with Active Record or SQLAlchemy, you know, work beautifully.
34:56 And the idea is I'm going to make some kind of version in that record.
35:00 And when I pull it back, it's going to come with the version that I got.
35:04 And when you hit save, you say, update this record where the version is the version I have.
35:10 So if someone else has updated it, that increments the version, and the database says, no, no, there's no such record.
35:14 You can't update this.
35:15 All right.
35:17 So you basically say, ah, it looks like someone changed this behind you, like your grid and their grid.
35:22 They hit save before you.
35:23 So you got to deal with like syncing this back up, right?
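The version check described here can be sketched with `sqlite3` alone. The `books` table, the `load` and `save` helpers, and the two users are all hypothetical. The point is that no transaction stays open while someone is editing the grid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, title TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO books VALUES (1, 'Original title', 1)")
conn.commit()


def load(book_id):
    """Read the row along with the version it had at read time."""
    return conn.execute(
        "SELECT title, version FROM books WHERE id = ?", (book_id,)
    ).fetchone()


def save(book_id, new_title, version_seen):
    """Update only if nobody else has bumped the version since we read it."""
    with conn:
        cursor = conn.execute(
            "UPDATE books SET title = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_title, book_id, version_seen),
        )
    return cursor.rowcount == 1  # False means someone else saved first


# Two users load the same grid row; no transaction is held open meanwhile.
_, version_for_alice = load(1)
_, version_for_bob = load(1)

alice_saved = save(1, "Alice's title", version_for_alice)  # first save wins
bob_saved = save(1, "Bob's title", version_for_bob)        # stale version
final_title, final_version = load(1)
```

When `save` returns `False`, the application decides what to do next, for instance reloading the row and asking the second user to reapply the edit.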
35:25 So there's a lot of times where it's, it would feel great to like have a transaction, but that transaction actually can't be used anyway.
35:32 And ORMs have like nice built-in ways where you can easily slot in like optimistic concurrency and stuff.
35:38 So that's my thought.
35:39 I think this is an interesting article.
35:41 It's definitely interesting to think about all the points brought up, but I often think that the tools have like clever, non-obvious ways to solve most of these problems.
35:49 Yeah.
35:50 Yeah.
35:50 And I guess I'll be a little bit on Cal's side here: the tools have clever, non-obvious ways to deal with them.
35:58 Maybe that's, that's an issue that all of our beginning tutorials on how to use Django or how to use SQLAlchemy or how to use other ORMs are just ignoring that stuff because it's more advanced.
36:10 But people often just read the beginning tutorial and then go do a startup or something.
36:16 Yeah, sure.
36:17 And then you end up with your page loading like in six seconds and you don't know why.
36:20 Yeah.
36:21 Which is not great.
36:22 Maybe we could teach people the right way to do it from the beginning.
36:25 I do wish that some of these patterns were more built in.
36:28 Like I wish optimistic concurrency was there by default in the ORMs and you've kind of got to like roll that yourself and whatnot.
36:35 So anyway, it's a really interesting article to think about.
36:38 And I think it dovetails nicely with my sort of performance one as well because it's, they're kind of two sides of the same coin a bit there.
36:45 Yeah.
36:46 Okay.
36:46 All right.
36:47 Well, I have the second side to your coin.
36:48 That is the Dacity.
36:49 Dacity.
36:50 This is it?
36:51 Whatever that one was called.
36:52 Dacity.
36:53 Yeah.
36:53 So this is a cool thing by Steve Brazier called Types at the Edge of Python.
36:59 The Edges of Python.
37:00 And so Steve apparently creates a bunch of APIs.
37:04 And I think, yeah, he was using FastAPI at the time when he was talking about all these ideas.
37:10 But it's kind of generally valid for all of them.
37:12 He says, look, when I create a new API these days,
37:16 I start with three things.
37:18 I start with Pydantic, mypy, and some kind of error tracking like Rollbar or Sentry or something like that.
37:23 Okay.
37:24 That's pretty interesting, right?
37:24 So Pydantic is a data translation and validation library, much like dacite.
37:31 Right?
37:32 Yeah.
37:33 They're not the same, but they kind of play in the same realm.
37:35 They transform JSON with validation and type checking over there.
37:40 And then there's mypy. It looks like you can use Pydantic to help specify some of the types on your classes.
37:48 And then use mypy to verify that you're not missing some kind of check.
37:53 So he says, look, the most common error you're going to run into as a Python developer in general is AttributeError:
37:58 'NoneType' object has no attribute X.
38:01 Where X is whatever you're trying to do, right?
38:03 Yeah.
38:04 I mean, that just means you got none instead of a value, and you're trying to continue to work with that class in some way.
38:11 It's a null dereference in C.
38:13 Yes, exactly.
38:14 So wouldn't it be nice if it said none is not an allowed value for this, or you have none and you can no longer operate on it or something like that?
38:23 So Pydantic will actually give you those types of errors.
38:26 It'll convert things like attribute errors and mismatch type errors to explain what was wrong, right?
38:33 So that's pretty awesome.
38:34 And so you can use Pydantic to actually specify what your understanding of the interface, like if you're calling an API, the stuff that you expect to get back.
38:42 Okay, I think this is going to be a date.
38:43 I think this is an optional string and whatnot.
38:46 He says, then when you launch the code into production, your assumptions are tested against reality.
38:51 That's pretty cool.
38:53 And it says, if you're lucky, they turn out to be correct.
38:55 But if not, you're going to run into some of these none type errors and Pydantic can help with that.
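To show the idea of checking assumptions at the edge without depending on Pydantic being installed, here is a hand-written stand-in using only the standard library. The `Book` class and `parse_book` function are hypothetical and are not Pydantic's API. Pydantic derives checks of this kind from the type annotations rather than requiring them to be written out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    title: str
    subtitle: Optional[str]  # the API is allowed to send null here


def parse_book(payload: dict) -> Book:
    """Check our assumptions about the payload before other code sees it."""
    title = payload.get("title")
    if not isinstance(title, str):
        raise ValueError(f"title: expected a string, got {title!r}")
    subtitle = payload.get("subtitle")
    if subtitle is not None and not isinstance(subtitle, str):
        raise ValueError(
            f"subtitle: expected a string or None, got {subtitle!r}"
        )
    return Book(title=title, subtitle=subtitle)


good = parse_book({"title": "Some Book", "subtitle": None})

# A payload that breaks our assumptions fails here, with a message naming
# the field, instead of failing later with an AttributeError on None.
try:
    parse_book({"title": None})
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```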
39:00 But then you can also, once you put in the typing into your code, then mypy will go on helping.
39:06 So for example, if you're taking an argument that says, you know, first, you think it's a string.
39:12 So you say colon str for its type, then you go work with it.
39:14 And that means it cannot be none, right?
39:16 Like, nullability is explicitly set in the type space in Python.
39:22 So if you find out that it could be none, then you're going to go and say, this is a typing dot optional of string, right?
39:29 Like that's what it's got to be.
39:31 If it could be none or a string, you'd find that out and specify that in Pydantic.
39:34 And then if you run mypy against it and you start working with an optional string, and you don't check for it to be none first,
39:41 Mypy will actually give you an error saying that you're not checking for none, basically.
39:46 So it'll even tell you like the missed if statements or other conditional code to like verify that like, no, it's not the optional none.
39:54 It's actually the value.
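A short sketch of the annotations being described. The function names `shout` and `shout_optional` are made up, and the exact wording of mypy's report is not reproduced here. The first function declares that `None` is not accepted, the second declares that it is and checks for it:

```python
from typing import Optional


def shout(first: str) -> str:
    """`first: str` means None is not an accepted value."""
    return first.upper()


def shout_optional(first: Optional[str]) -> str:
    """`Optional[str]` means the value may be None, so it is checked first."""
    # Without this check, `return first.upper()` is the kind of line mypy
    # reports, since None has no `upper` attribute.
    if first is None:
        return ""
    return first.upper()


# At runtime, the unchecked access is what produces the familiar error.
value = None
try:
    value.upper()
    runtime_error = ""
except AttributeError as exc:
    runtime_error = str(exc)
```

The annotations have no effect at runtime on their own. The benefit comes from running a checker such as mypy over the code before it ships.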
39:55 Okay.
39:55 That's pretty cool, right?
39:56 And then if you will.
39:57 Tripped me up before, yeah.
39:58 Yeah, for sure.
39:59 I mean, normally it's just not present.
40:01 And it's not because Python is a dynamic language.
40:04 Like C++ would have the same problem, right?
40:06 If you take a pointer and you just start to work with it in C, C++, the compiler is not going to say, you didn't check that for, you know, equal to null first.
40:15 It just doesn't do that, right?
40:17 Yeah.
40:17 So this is a really awesome like addition for like safety in your code.
40:21 So he was talking about how FastAPI automatically integrates with Pydantic out of the box, which is pretty cool.
40:26 And then also at the end, he has a kata, a mini kata that like works you through these ideas.
40:33 So a kata is like a practice to like play with these typing ideas.
40:37 Yeah.
40:37 And a nice picture of how these all fit in.
40:40 Yeah.
40:40 Yeah.
40:41 Yeah.
40:41 There's some cool diagrams.
40:42 So anyway, if you're building APIs and you're taking data, especially from sources where they might give you junk when you expected something valuable or you're not really sure, you're like the docs say this, but I remember getting something different some other time.
40:54 This is a really cool way to formalize that and then have your code automatically check it.
40:58 Yeah.
40:58 This is cool.
40:59 I like it.
41:00 Yeah.
41:00 Awesome.
41:00 That's all of our six items.
41:02 Do you have any extra little things to share?
41:05 Well, I kind of went overboard on the extras this week, but I'll keep them all quick because there's a bunch of cool stuff out there that people send in.
41:12 First, Jack McKew did a really cool thing.
41:15 So Jack McKew created a blog post or a page on a site called Python Bytes Awesome Package List.
41:23 Have you seen this?
41:24 Yeah.
41:24 And he like listened to 174, 171 episodes in 174 days or something like that of Python Bytes.
41:34 I mean, this is awesome because as I flip through this, there's a couple of things I've forgotten.
41:38 I'm like, oh, that's cool.
41:39 Oh, we must have talked about that, but I don't even remember.
41:41 It's got beautiful pictures.
41:43 It's, I mean, it's kind of an awesome list, but it's for a podcast.
41:47 So that is super cool, Jack.
41:49 Thank you.
41:50 Thank you.
41:50 I'll be sure to link to it at the end.
41:52 And I hope you keep adding to it.
41:54 That would be great, but no pressure.
41:57 Yeah.
41:57 I want to talk about VB.net for a second.
41:59 That's kind of weird, right?
42:00 Yeah.
42:01 Because I kind of appreciated VB back in the early days when it was like a drag and drop VB6 and whatnot.
42:08 And then Microsoft came up with a thing called Visual Basic.net and it was complete crap.
42:12 Didn't like it.
42:13 But here's what's interesting: they have just announced that they are no longer developing it.
42:20 They'll keep that thing running, but they will no longer work on it.
42:23 And I just thought it was interesting.
42:24 Like here's a fairly major language, not super top five or something, but it's kind of a major language that's like declared dead.
42:32 And I just thought it was kind of interesting to point out like, man, languages, they can go dead.
42:37 It's weird.
42:39 Yeah.
42:40 I think this one should have been shot a long time ago, but you know.
42:44 It's also worth thinking about this.
42:46 I agree, by the way.
42:47 It should have never existed.
42:48 But anyway, that's a different story.
42:50 It's also an interesting take on like, here's a language controlled by a single company and they can just decide they don't like it anymore.
42:57 Right?
42:57 Like this wouldn't really happen to Python because there's not a single person or organization that goes, ah, we're done.
43:03 Yeah.
43:04 Well, that's actually one of the fears I have for, I mean, even Java.
43:08 Java is not controlled by one company, but it kind of is sort of.
43:12 Yeah.
43:13 Yeah.
43:13 Well, and there's also that, that Supreme Court case or the legal case of like, are you allowed to copy the Java API?
43:21 I don't think that's resolved yet.
43:23 I can't remember.
43:23 It's still working its way through the courts.
43:25 I want to reiterate.
43:26 I don't.
43:27 I don't think that's just a good idea.
43:29 I don't think that's a good idea.
44:07 like the Johns Hopkins CSSE data set and some other dashboards and some things on Kaggle.
44:13 So if you're in data science, you want to explore it.
44:15 Here's some data sets that are probably interesting.
44:17 And finally, I'm working on a new course, adding a CMS to your data-driven web app.
44:21 That'll be a lot of fun.
44:22 And I'll talk more about that later.
44:24 But I'm just super excited to be creating more courses as we kind of talked about earlier.
44:28 Yeah.
44:29 One of the things we talked about is people working from home and getting around technical problems with that.
44:34 That happened to me just this morning.
44:36 So this morning I tried to hook up.
44:38 I realized that I had an external keyboard that's working fine-ish.
44:43 I wanted to use like a real mouse.
44:46 So I plugged in an external mouse with a little click wheel thing on it and realized that on Apple,
44:53 the click wheel behavior just goes the wrong direction for scrolling.
44:58 And it confused me.
45:00 And you can reverse it, but I didn't want my trackpad to be reversed.
45:04 The trackpad's fine.
45:05 So they're tied together for some reason.
45:07 Weird.
45:08 So Dave Forjack, sorry Dave, he suggested I use something called the scroll reverser.
45:17 That is a little tiny app that allows you to untie those and have trackpad scrolling and mouse scrolling be different.
45:25 And thank you, Dave.
45:26 That's awesome.
45:27 That's super cool.
45:28 I guess my work from home thing that I've been playing with is with Zoom, you can have virtual backgrounds.
45:37 You don't even have to have a green screen.
45:38 You can have like alternate backgrounds just by uploading an image and it'll put you in, you know, an office space instead of a messy bedroom or whatever it is.
45:47 Oh, nice.
45:48 Yeah.
45:48 So you can block out the kids behind you and stuff like that.
45:51 Yeah, exactly.
45:52 You don't have to see the kids being crazy home from school and whatnot.
45:55 Anyway.
45:56 So you can do a lot of stuff we're learning around those types of things.
45:58 And I think the joke that I chose for us this week is going to be perfect for the documentation-as-building-community item that you brought up.
46:08 Okay.
46:09 This is before that person gets inspired from listening to you and actually makes things better.
46:13 All right.
46:14 So let me, let me set the stage here.
46:16 There are three people, two of them clearly more senior, and a very excited new person sitting at a laptop, like beaming with enthusiasm, ready to get going on the whole project.
46:28 And one of the senior people says to the other, and this is Jim, our new developer.
46:33 The other one says, great.
46:35 Does he already know something about our system?
46:37 The new person turns around.
46:39 I read the whole documentation.
46:41 Blink looks between the senior people.
46:44 No.
46:45 Yeah.
46:46 Yeah.
46:47 That's good, right?
46:48 Yeah, definitely.
46:49 I started a job once in my career where I had read the documentation because it was an internal job transfer.
46:57 I read the documentation before getting there, and the people there didn't know they had documentation.
47:03 So it was so out of date.
47:06 Nobody currently knew it existed.
47:08 Yeah.
47:09 It may be a little out of date if they don't even know it exists.
47:11 Yeah.
47:12 All right.
47:13 Well, awesome.
47:14 Cool.
47:15 Well, thanks a lot.
47:16 You bet.
47:17 Great to be here with you as always.
47:19 See you later.
47:20 Bye.
47:21 Thanks for listening to Python Bytes.
47:22 Follow the show on Twitter at Python Bytes.
47:24 That's Python Bytes as in B-Y-T-E-S.
47:27 And get the full show notes at Pythonbytes.fm.
47:30 If you have a news item you want featured, just visit Pythonbytes.fm and send it our way,
47:36 We're always on the lookout for sharing something cool.
47:38 This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.