

Transcript #306: Some Fun pytesting Tools

Recorded on Tuesday, Oct 18, 2022.

00:00 - Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 306, recorded October 18th, 2022.

00:10 I'm Michael Kennedy.

00:12 - And I'm Brian Okken.

00:14 - Very exciting to have a whole bunch of things to share this week.

00:17 Also want to say thank you to Microsoft for Startups for sponsoring yet another episode of this one.

00:23 Brian, we've had a very long dry summer here in Oregon, and I was afraid that we would have terrible fires and it'd be all smoky and all sorts of badness.

00:33 And there've been plenty of fires in the West, but not really around here for us this summer.

00:37 We kind of dodged the bullet until like today.

00:40 - It's a little smoky today.

00:42 - We smoke, go inside.

00:44 - Yeah.

00:45 - I thought we dodged it, sadly, no.

00:47 - So I think it's affecting my voice a little bit.

00:49 So apologies for that.

00:51 - We'll put that filter on you and we'll make you sound like someone else and you'll be fine.

00:55 - Yeah.

00:57 - Yeah, it's also affecting me, so who knows?

01:00 But anyway, we'll make our way through.

01:03 We will fight through the fire to bring you the Python news.

01:06 Hopefully they get that actually put out soon.

01:09 - Like the post office.

01:10 - Yeah, let's kick it off.

01:12 What's your first thing?

01:13 - So I've got, let's put it up.

01:15 So I've got, add to stream.

01:18 I've got awesome pytest speedup.

01:20 So this is awesome.

01:23 Yeah, so actually, some people may have noticed that Test & Code is not really going on lately.

01:29 One of the things that makes it easier for me is when I see cool testing-related articles, I don't have a decision anymore.

01:35 I can just say, "Hey, it's going to go here." Now, Test & Code will eventually pick up something again, but I'm not sure when.

01:42 For now, if I find something cool like this article, I'll bring it up here.

01:46 >> I think I make you show up every week.

01:49 Talk about fun stuff anyway.

01:51 >> This is a GitHub repo, and we're kind of seeing more of this: instead of blogging, people just write a README as a repo.

02:03 I know this is such a weird trend. I totally get it. And it's good, but it's also weird.

02:08 But it's kind of neat that people can update it. So they can just keep it up. And you can see people get a PR to your blog posts. That's not normally how it goes.

02:16 Yeah, not sure. But it's probably harder to throw Google Analytics at it, right?

02:23 - Oh yeah, we'll see like whether you should do that or not.

02:26 - So anyway, so this comes to us from Nejc Zupan, cool name by the way.

02:33 And he also has, we'll include a link in the show notes to a talk he gave at the Plone Conference in Namur, 2022.

02:43 So just recently.

02:45 Anyway, so he goes through best practices to speed up your pytest suite.

02:52 And he just kind of lists them all at the top here, which is nice.

02:56 Hardware first.

02:59 Well, first of all, when he goes into the discussion, he talks about measuring first.

03:02 So before you start speeding anything up, you should measure because you want to know if your changes had any effect.

03:10 And if it's making support a little bit weirder, then you don't want to make the change if it's only marginal.

03:16 So I like that he's talking about that, of like each step of the way here, measure to make sure it makes a difference.

03:23 - Right.

03:24 - So first off, and I'm glad he brought this up, is check your hardware.

03:29 Make sure you've got the fast hardware if you have it.

03:33 So one of the, and I've noticed this before as well, is, so here we go, measure first.

03:39 But some CI systems allow you to have self-hosted runners, and it's something to consider.

03:46 Whether your CI is in the cloud or you've got virtual, like a server with some virtual machines around to be able to run your test runners, they're not going to be as fast as physical hardware if you've got some hardware lying around that you can use.

04:03 So that's something to consider, to throw hardware at it.

04:07 And then test collection time.

04:08 Some of the problems with the speed of pytest is, if you run it from the top-level directory of a project and you've got tons of documentation, tons of source code, it's going to look everywhere.

04:21 So don't let it look in those places.

04:23 So there's ways to turn that off.

04:25 So with norecursedirs and giving it the directory.

04:29 I also wanted to point out he didn't talk about this in the article, but I want to point out that something to use is, oh, it went away.

04:37 testpaths. So use testpaths to say specifically.

04:41 So norecursedirs says essentially avoid these directories.

04:45 But testpaths pretty much says this is where the tests are.

04:49 Look here. So those are good.

04:51 Nice. Yeah. I've done that before on some projects, like on the Talk Python Training website where it's got a ton of text files and things laying around.

05:00 And I've done certain things like that so that, you know, pytest and PyCharm and other different things don't look in those places where like there's no code, but there's a ton of stuff here and you're going to go hunting through it. Yeah.

05:13 So like I really sped up the startup time for Pyramid scanning for files that have route definitions in them, right?

05:21 For URL endpoints, because it would look through everything.

05:24 Apparently it doesn't matter, at least looking for files through directories with tons of stuff.

05:28 And, and this is like, it makes a big difference if you have a large project.

05:31 For sure.

05:32 Yeah.

05:32 It's significant.

05:34 so it was something to think about, and documentation too.

05:38 You don't, unless you're really testing your documentation, you don't need to look there.

05:41 So, hardware fast, make collection fast.

05:45 This one is something I haven't used before, but I'll play with it.

05:49 PYTHONDONTWRITEBYTECODE, an environment variable.

05:53 I guess he comments that it might not make a big difference for you, but it might.

05:59 So, you know, I don't know.

06:02 So Python writes the byte code normally, and maybe it'd be faster if you didn't do that during tests, might as well try.

06:10 There's a way to disable pytest plugins, yeah, built-in pytest plugins.

06:18 You can say no, like no:nose, or no:doctest if you're not running those.

06:23 I haven't noticed that it speeds it up a lot, but it's, again, it's something to try.

06:28 And then a subset of tests.

06:31 So this is especially important if you're in a TDD style.

06:36 And one of the things that I think some people forget is your test, if you've got your tests organized well, you should be able to run a subset anyway, 'cause you've got like the feature you're working on is in a sub-directory of everything else.

06:49 And just run those when you're working on that feature and then you don't run the whole suite.

06:54 There's a discussion, and this goes along with the unit tests mostly, but disable networking, unless you're intending to have your code using network connections, you can disable that for a set of tests or the whole suite.

07:11 And then also disk access, trying to limit that.

07:14 And he includes a couple ways to ensure those.

07:19 And then a really good discussion, a fairly chunky discussion on database access and optimization to databases, including discussion around rollback.

07:31 And there was something else that I hadn't seen before.

07:35 Let me see if I can remember.

07:37 - Yeah, there's some interesting things.

07:38 I think, I know you've spoken about it in your pytest course, about using fixtures for setup of those common type things, right?

07:47 >> One of the things I'm not familiar with is truncate.

07:50 Have you used the database truncate before?

07:52 >> No.

07:53 >> Apparently, that allows you to set the whole database up, but delete all the stuff out of it, like to empty the tables.

08:02 If a big chunk of the work of setting up data is setting, getting all the tables correct, then truncate might be a good way to clean them all out and then refill them if you need to.
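
A minimal sketch of the truncate idea described above, assuming SQLite from the standard library rather than whatever database the article targets. The table names are hypothetical, and because SQLite has no TRUNCATE statement, an unqualified DELETE stands in for it here. The schema is built once, then the tables are emptied between runs instead of being recreated.

```python
import sqlite3


def create_schema(conn):
    """Create the (hypothetical) tables once; this is the expensive setup step."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, total REAL NOT NULL)"
    )
    conn.commit()


def empty_all_tables(conn):
    """Remove every row but keep the tables, then return the table names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    tables = sorted(row[0] for row in rows)
    for table in tables:
        # SQLite has no TRUNCATE; DELETE without a WHERE clause is the stand-in.
        conn.execute(f'DELETE FROM "{table}"')
    conn.commit()
    return tables


def row_count(conn, table):
    """Count the rows currently stored in one table."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 9.99)")

emptied = empty_all_tables(conn)
print(emptied, row_count(conn, "users"), row_count(conn, "orders"))
```

In a test suite, something like empty_all_tables would typically be called from a fixture's teardown, so each test starts with the same empty tables.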

08:15 But also, yeah, like you said, paying attention to fixtures, it's really good.

08:21 And then the last thing he brings up is just run them in parallel.

08:24 By default, pytest runs single, each test one at a time.

08:28 And if you've got a code base that you're testing that can allow it, like you're not testing a hardware resource or something, so you can allow parallel, go ahead and turn that on: use pytest-xdist or something else and run them in parallel.

08:43 So a really good list and I'm glad he put it together.

08:47 Also very entertaining talk.

08:48 So give it, give this talk a look.

08:51 - Yeah, absolutely.

08:52 Brandon out in the audience says, people at work have been trying to convince me that tests should live next to the file they are testing rather than in a test directory.

09:01 You know, I created a test directory that mirrors my app folder structure with my tests in there.

09:06 Any opinions?

09:07 I don't like that, but--

09:10 - Neither do I, honestly.

09:12 - If you like it, I guess, okay.

09:13 I've heard that before, but I haven't heard people in Python recommending that very often.

09:18 - Yeah, for me, I feel, I understand why, like, okay, here's the code, here's the test.

09:24 Maybe the test can be exactly isolated to what is only in that file, but sometime, you know, like, as soon as you start to blend together, like, okay, well, this thing works with that class to achieve its job, but it, you know, you kind of, it kind of starts to blur together and like, well, what if those are in the wrong places?

09:44 Well, now it's like half here and I don't know, it just, it leads to like lots of, I don't know, it's like trying to go to your IDE and say, I have these seven methods, please write the test for it.

09:55 And it says test function one, test function two, test function three.

09:58 You're like, no, no, no, that is not really what you're after, but I feel it kind of leads, towards that like, well, here's the file, let's test all the things in this file.

10:07 And it, which is not necessarily the way I would think about testing.

10:10 - Well, also, are you really test?

10:12 I mean, it kind of lends itself to starting to test the implementation instead of testing the behavior.

10:18 - Yes, exactly.

10:19 - Because you might have, if you've got a file that has no test associated with it, somebody might say, well, where is the test for that?

10:25 And you're like, well, that file is just an implementation detail.

10:28 It's not something we need to test because you can't access directly from the API.

10:33 So right, it's completely covered by these two other tests.

10:37 And it will, by the way, there are other folders.

10:38 Go find them.

10:39 Also the stuff you're speaking about here by like making collection fast and such.

10:46 Also it's a little bit tricky.

10:48 potentially sharing fixtures might be a little more tricky that way.

10:51 I don't know.

10:52 My, my vote is, is to not mix it all together.

10:55 Plus, do you want to ship your test code with your product?

10:58 Maybe you do, but often you don't.

10:59 it's harder, harder if they're all woven together.

11:03 That's true.

11:03 Yeah.

11:04 Yeah, so anyway.

11:06 That's the same thing.

11:08 Also, Henry Schreiner out there kind of says, I don't like distributing tests in wheels.

11:12 Only SDists, so I like a test folder as well.

11:14 Yeah, I'm with you.

11:16 I think, Brandon, the vote here is test folder.

11:20 But, you know, that's just us.

11:22 Yeah.

11:23 Awesome.

11:25 All right, well, this is, yeah, this is a good find.

11:26 You want to hear my first one?

11:28 This is a bit of a journey.

11:30 It's a bit of a journey.

11:31 So let's start here.

11:34 So I have a perfectly fine laptop that I can take places if I need to for work, take it to the coffee shop to work.

11:42 If I'm going on like a two week vacation, it's definitely coming with me, right?

11:47 It's even if my intent is to completely disconnect, I still have to answer super urgent emails.

11:53 If the website goes down, any of the many websites I seem to be babysitting these days, like I've got to work on it.

12:00 Like there could be urgent stuff, right?

12:02 So I just, I take it with me.

12:04 But I'm on this mission to do that less, right?

12:07 'Cause I have a 16 inch MacBook Pro, it's pretty heavy, it's pretty expensive.

12:11 I don't necessarily wanna like take it camping with me.

12:14 But what if, what if something goes wrong, Brian?

12:16 What if I gotta fix it?

12:17 Do I really wanna drive the four hours back because I got a message that like, you know, the website's down and everyone's upset, can't do their courses or they can't get the podcast.

12:25 No, I don't want that.

12:26 So I would probably take the stupid thing and try to not get it wet.

12:30 So I'm on this mission to not do that.

12:32 So I just wanted to share a couple of tools and, you know, people, if they've got thoughts, I guess probably the YouTube stream chat for this would be the best or on Twitter, they could let me know.

12:41 But I think I found like the right combination of tools that will let me just take my iPad and still do all the DevOps-y life that I got to lead.

12:51 So that it's not good for answering emails.

12:53 You know, I have like minor RSI issues and I can't type on an iPad, not even a little like keyboard that comes with it.

12:59 Like I've got my proper Microsoft ergonomic sculpt and you can plug that into an iPad.

13:06 But once you start taking that, you know, like, well, you might as well just take the computer.

13:09 So, two tools I want to give a shout out to. Prompt by Panic.

13:14 Panic is a Portland company, so shout out to the local team.

13:17 Is it at the disco or exactly?

13:21 They don't really freak out that much of the disco panic there.

13:25 But Prompt is an SSH client for iOS, in particular for iPad.

13:29 But you could, I mean, if you wanted to go extreme, you could do this on your phone.

13:32 How far are you going camping or where are you going?

13:36 This lets you basically import your SSH keys and do full-on SSH like you would in your iTerm2 or terminal.

13:46 >> Turns your iPad into a dumb terminal.

13:48 >> Yeah, and it does. You can easily log into, you know, the Python Bytes server over SSH and do all the things that you need to do.

13:57 So, you know, if you got to get into the server, you got to like, okay, well, I really have to just go restart the stupid thing or change a connection string because who knows what, right?

14:05 You could do it, it seems to work pretty well. The only complaint that I have for it is it doesn't have Nerd Fonts.

14:15 So my Oh My Posh, dude, this is serious business.

14:18 Don't laugh.

14:18 My Nerd Fonts, like I can't do pls.

14:21 I can't do Oh My Posh and get like the cool shell prompt with all the information.

14:28 No, it's all just boxes.

14:30 It's rough.

14:30 No, it's fine.

14:31 It would be nice, but it does have cool things like if you need to press control shift that or you know, it has like a special way to pull up the all those kinds of keys.

14:41 So you press control and then some other type of thing or You know, it has up arrow, down arrow, it has like if you want to cycle through your history.

14:47 It's got a lot of cool features like that where you can kind of integrate that.

14:52 So it works, I think it's going to work.

14:53 I think this is the one half of the DevOps story.

14:56 The other part is, oh my goodness, what if it's a code problem?

15:02 Do I really want to try to edit code over this prompt thing through the iPad on, you know, in like Emacs or what am I, no, I don't want to do that.

15:12 So the other half is GitHub.

15:15 In particular, the VS Code integration into GitHub.

15:20 So if you remember, like here I have pulled up on the screen, just with any public repo or your private ones.

15:27 This is my Jinja partials thing for basically integrating HTMX with Flask.

15:32 But you can press the dot.

15:34 If you press dot, it turns that whole thing into a Cloud-hosted VS Code session.

15:41 That's awesome, right? Even has auto complete. So if I hit like dot there, you can see it on my auto complete.

15:46 That's pretty cool.

15:47 That's pretty cool. But how do you press dot when you're on a web page?

15:51 And on an iPad, there is no dot.

15:54 Because you can't pull up the keyboard. The only way to pull up the keyboard is to go to an input section.

15:59 And once you're in input, well, it just types out. It doesn't do that.

16:02 Why? So here's the other piece. All right, here's the other piece.

16:05 So you go over here and you change github.com/mikeckennedy/jinja_partials to github.dev/whatever.

16:12 Boom, done.

16:13 So if you got to edit your code, you just go change the .com to .dev and you have an editor, you can check it back in, like in my setup, if I commit to the production branch, it kicks off a continuous deployment, which will like automatically restart the server and reinstall like the things that might need if it has a new dependency or something.

16:32 I could literally just come over here, make some changes, do a PR over to the production branch, or push, somehow merge over to the production branch, and it's done. It's good to go. Isn't that awesome?

16:42 >> Just edit live.

16:44 Just edit your server live.

16:46 >> No. I saw somewhere somebody was complaining about the prompt saying it's really hard for me to edit my code on the server.

16:53 I'm like, why would you?

16:55 No, it should be hard. You don't do that.

16:57 >> Don't do that. Yeah.

16:59 So I went to try this, but I have to do the two-factor authentication to get into my account.

17:05 Yeah, yeah, yeah, you got to do that. Brandon also says, "Hey, I'll buy you a keyboard case." I absolutely hear you, and I would love... you have no idea how jealous I am of people that can go and type on their laptops and type on these small things. Like, with RSI, I would be destroyed in like an hour or two if I did it. It's not a matter of do I want to get the keyboard or not. I just can't. So anyway, it's not that bad to be me, but I'm not typing on small square keyboards.

17:33 It just doesn't work.

17:34 It's just something I can't do.

17:35 - Okay, so-- - All right.

17:36 - Just no. - Run, run.

17:38 (both laughing)

17:40 Exactly, no, I just, because when I was 30, my hands got messed up, and they just, they almost recovered, but not 100%, right?

17:48 - I know you got more going on than I do, though.

17:49 So I just got back from four days off, and I took the iPad.

17:54 And I had to answer a few emails, but for me, these short emails, the little keyboard, the cover thing, it works fine.

18:04 Even though those are expensive.

18:06 When you add, oh, I want an iPad, but I also want the keyboard thing, and I want the pencil, suddenly it's like almost twice as much.

18:15 - It is, it is, absolutely.

18:17 And just, for people who have been paying attention for the last two hours, Apple just released new iPads with M2s, so people can go check that out if they wanna spend money.

18:26 I'm happy with mine, I'm gonna keep it.

18:27 All right, before we move on to the next thing, Brian.

18:31 - Okay.

18:32 - Let me tell you about our sponsor this week.

18:33 So as has been the case usual, thank you so much.

18:37 Microsoft for Startups Founders Hub is sponsoring this episode.

18:41 We all know that starting a business is hard.

18:44 By a lot of estimates, over 90% of startups go out of business in just the first year.

18:48 There's a lot of reasons for that.

18:50 Is it that you don't have the money to buy the resources?

18:52 Can you not scale fast enough?

18:54 Often it's like, you have the wrong strategy or do you not have the right connections to get the right publicity or you have no experience in marketing.

19:03 Lots, lots of problems, lots of challenges.

19:07 And as software developers, we're often not trained in those necessary areas like marketing, for example.

19:13 But even if you know that, like there's others, right?

19:16 So having access to a network of founders, like you get in a lot of accelerators, like Y Combinator, would be awesome.

19:22 So that's what Microsoft created with their founders hub.

19:26 So they give you free resources to a whole bunch of cloud things, Azure, GitHub, others, as well as very importantly, access to a mentor network where you can book one-on-one calls with people who have experience in these particular areas.

19:43 Often many of them are founders themselves and they've created startups and sold them and they're in this mentorship network.

19:50 So if you wanna talk to somebody about idea validation, fundraising, management and coaching, sales and marketing, all those things, you can book one-on-one meetings with these people to help get you going and make connections.

20:03 So if you need some free GitHub and Microsoft Cloud resources, if you need access to mentors and you wanna get your startup going, now make your idea a reality today with the support from Microsoft for Startups Founders Hub.

20:15 It's free to join, it doesn't have to be venture-backed, doesn't have to be third party validated.

20:21 You just apply for free at pythonbytes.fm/foundershub2022.

20:26 The link is in your show notes.

20:27 Thanks a bunch to Microsoft for sponsoring our show.

20:32 What's next, Brian?

20:33 - Well, that article that I already read about the speeding up pytest, it had a whole bunch of cool tools in it.

20:38 So I wanted to go through some of the tools that were in the article that I thought were neat.

20:43 One of them for profiling and timing was a thing called Hyperfine.

20:48 And this is not, I don't think it's a Python thing, but like for Mac, you have to brew install it.

20:54 But one of the things it does is you can give it, you give it like two things and it runs both of them and it can run it multiple times and then give you statistics comparing them.

21:08 So it's a really good comparison tool to, you know, like if you're testing your test suite to see how long it runs.

21:15 may as well run it a couple times and see.

21:17 - For people who didn't see yet the example from that first article you covered, a lot of those were CLI flags, right?

21:26 Like -p no:nose for disabling the plugin and so on, so you could have two commands on the command line where you basically change the command line arguments to determine those kind of things, right?

21:40 - Yeah, exactly, so run it a couple times and run the test suite a couple of times each and just see if I had these no flags or this other flag or with the environmental variable.

21:54 Actually, I don't know how you could do that in there.

21:56 You can set environmental variables in command line maybe.

21:58 - Yeah, I'm sure that you can somehow.

22:00 - Yeah. (laughs)

22:01 - Inline an export statement or something, who knows?

22:03 - At the very least, you can run the same command twice.

22:06 You can run it, set the environmental variable, and then run it again to see if it makes a difference.
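
As a rough stand-in for the comparison described here, the following sketch times the same command several times with and without an environment variable, using only the standard library. It is not hyperfine and does not reproduce its statistics; the function names and the placeholder command are assumptions for illustration. Passing env to subprocess.run is one way to set the variable for a single run without exporting it in the shell.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(command, runs=5, extra_env=None):
    """Run `command` `runs` times and return the wall-clock duration of each run."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)  # e.g. {"PYTHONDONTWRITEBYTECODE": "1"}
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits non-zero (for example, failing tests).
        subprocess.run(
            command,
            env=env,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of durations to a few comparison statistics."""
    return {
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Placeholder command; swap in something like [sys.executable, "-m", "pytest"].
    command = [sys.executable, "-c", "pass"]
    baseline = summarize(time_command(command))
    no_bytecode = summarize(
        time_command(command, extra_env={"PYTHONDONTWRITEBYTECODE": "1"})
    )
    print("baseline   ", baseline)
    print("no bytecode", no_bytecode)
```

Comparing the two summaries over several runs gives a first impression of whether the flag matters for a given suite; a dedicated tool would add warm-up runs and outlier handling.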

22:11 - Yeah.

22:12 That was neat.

22:13 I don't know why I've got the API referencing.

22:16 Oh, the thing I wanted to talk about was duration.

22:20 So let me find that.

22:22 I think I lost it.

22:23 So we did talk about duration.

22:26 Durations.

22:27 Oh, well.

22:28 Oh, here it is.

22:29 So durations, if you give it a number, like --durations=10, pytest will give you like the 10 slowest tests and tell you how slow they are.

22:37 But if you don't give it anything, it just does all of it.

22:41 But the other thing that's been fairly recent, it wasn't there when I started using pytest, is --durations-min.

22:48 So when you give it durations with blank or zero, it times everything, but that might be overwhelming.

22:58 So you can give it a minimum duration in seconds to only include, only time the tests that are all over a second or something like that.

23:07 - Right, right.

23:07 If it's really, if it's 25 milliseconds, like just I don't want to see it.

23:11 >> Yeah, I'm not going to spend time trying to speed that up.
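
The effect of combining the two options (for instance pytest --durations=0 --durations-min=1.0) can be illustrated with a small selection function. This is a sketch of the selection logic only, not pytest's implementation, and the test names and timings are made up.

```python
def slowest_tests(durations, count=0, min_seconds=0.0):
    """Return (test id, seconds) pairs, slowest first.

    count=0 keeps every test, mirroring a durations value of zero;
    min_seconds drops anything faster than the threshold.
    """
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    kept = [(name, seconds) for name, seconds in ranked if seconds >= min_seconds]
    return kept if count == 0 else kept[:count]


# Hypothetical timings, in seconds.
example_durations = {
    "test_import_catalog": 2.5,
    "test_parse_header": 0.025,
    "test_full_sync": 1.2,
}

for name, seconds in slowest_tests(example_durations, min_seconds=1.0):
    print(f"{seconds:.2f}s {name}")
```

With a one-second minimum, the 25-millisecond test from the example never shows up in the report.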

23:15 Another cool thing brought up was pyinstrument, which is a very pretty way to look at the times that you're spending on different things.

23:26 It's not just for testing, you could use it for other stuff.

23:28 But apparently, in the user guide, there is specifically how to profile your tests with pytest using pyinstrument.

23:36 That's a cool bit of documentation.

23:39 This doesn't actually look obvious, so maybe I'm looking at this wrong, but I'm glad they wrote this up.

23:47 >> Yeah.

23:48 >> It's cool.

23:49 >> Basically profiling your, oh, interesting.

23:52 You do it as a fixture.

23:54 >> Yeah.

23:55 >> So you create the profiler, you start the profiler, then you yield nothing, which triggers the test to run, and then you stop the profiler and do the output.
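
The start, yield, stop shape described here can be sketched without any third-party packages. The example below swaps in a plain standard-library timer for pyinstrument and wraps the generator with contextlib.contextmanager instead of pytest's fixture decorator, so it is an illustration of the pattern rather than the code from the pyinstrument guide. The names timings and timed are hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical store for results: label -> elapsed seconds.
timings = {}


def timed(label):
    """Generator with the same shape as a yield-style fixture."""
    start = time.perf_counter()  # "create and start the profiler"
    try:
        yield  # control passes to the body (in pytest: the test runs here)
    finally:
        # "stop the profiler and do the output"
        timings[label] = time.perf_counter() - start


# In pytest, a generator like this would be decorated with @pytest.fixture and
# could take its label from request.node.name; contextmanager gives the same
# setup/teardown behaviour here without needing pytest installed.
timed_block = contextmanager(timed)

with timed_block("sum_example"):
    total = sum(range(100_000))

print(timings)
```

Everything before the yield runs ahead of the wrapped code, and everything after it runs once that code has finished, which is what lets a profiler bracket each test.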

24:04 That's really cool.

24:05 >> Yeah, pretty cool way to do that.

24:06 So profiling each test.

24:08 >> Yeah. It's a bit mind bending on the coroutines.

24:12 >> So it's cool they're using it as a fixture, because the fixture is set up by default with function scope, so it'll go around every function.

24:22 But if you set it up with module scope, you could just find the slow test modules in your system, which might be an easier way to speed things up.

24:30 We're looking. Anyway, I was thrilled that my little pytest-skip-slow plugin that I developed was part of it.

24:39 I didn't even come up with the idea for the code; that came out of the pytest documentation. It wasn't a plugin yet, but I developed this plugin while writing the second edition of the book, and it showed up in his article, which is cool.

24:53 More interesting is pytest-socket, which is a plugin that just turns off Python socket calls.

25:03 And then it raises a particular exception.

25:08 So like if you just install it, it doesn't turn things off.

25:12 You have to pass in --disable-socket to your test suite and then it turns off accessing the external world.

25:19 So this is a kind of a cool way to easily find out which tests are failing because your network is not connected.

25:26 So go figure out if you really want to.

25:28 - If you want to say definitely don't talk to the network or don't talk to the database, turn off the network and see what happens.

25:34 - Yeah, and then you can, I mean, but even if you did want part of your test suite to access the network, you could test it to make sure that there aren't other parts of your test suite that are accessing it when they shouldn't.

25:43 So it'd be a cool debugging tool.

25:46 And then file system stuff too, there's pyfakefs, a fake file system that you can use to mock the file system.

25:52 So even things that you wanna write, you don't actually have to have the files left around.

25:56 You can leave them around just long enough to test them so you can use this.

25:59 - That's perfect.

26:00 - And then the last thing I thought was cool was, there's a thing called BlueRacer that you can attach to GitHub CI to check merges.

26:13 So if somebody merges something, you can check to see if they've terribly slowed down your test suite.

26:18 So it kind of reports that.

26:21 I don't think it fails on slower tests, but it just sort of reports what's going on.

26:28 So yeah, it gives you a little report of like the nice what happened on the branch and if the test suite slowed down.

26:35 So yeah, nice to know.

26:37 >> Yeah, that's a cool project.

26:38 BlueRacer. Nice.

26:39 Okay. It's automatic, which is lovely.

26:42 >> Yeah.

26:44 >> So nice. All right.

26:46 Well, I've got one more item for us as well, Brian.

26:49 >> Yeah.

26:49 >> So we talked a little bit about, you talked about pyupgrade.

26:53 The last show, I think it was.

26:55 >> Yeah.

26:56 >> We talked about some of these other ones.

26:58 So I wanna talk about, I'm gonna give a shout out to Refurb, very active project, last updated two days ago, 1,600 stars.

27:06 And the idea is basically you can point this at your code and it'll just say, here are the things that are making it seem like the old way of doing things.

27:16 You should try doing it the newer way.

27:18 So for example, here's something it's asking if the file name is in a list, right?

27:25 one of the ways you can see if filename equals X or filename equals Y or filename equals Z, you would say if filename in X comma Y or comma Z, right? And that's a more concise and often considered more Pythonic way. But do you need a whole list allocated just to ask that question?

27:42 What about a tuple? And here we have a with open filename as F then contents F dot read. And we and then we have the split lines and so on.

27:51 And so, well, if you're using pathlib, just say path.readText, you don't need the context manager, you don't need two lines, just do it all in one.

28:00 And so on this simple little bit of code here, they just run refurb against your, this example Python file, and it'll say use tuple xyz instead of list xyz for that in case.

28:13 And then what I really like about it is it finds like exactly the pattern that you're doing.

28:17 So it says you're using with open something as F, then value equals F.read, use value equals path of X.readText one line.

28:29 It gives you pretty, it doesn't say you should use path read text.

28:32 It gives you in the syntax of, here's the multiple lines you did, do this instead. Nice, right?
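
The two rewrites mentioned so far can be shown side by side. This is a sketch with made-up file names and helper names, not Refurb's own example file; each "before" function is paired with the form the tool is described as suggesting.

```python
import tempfile
from pathlib import Path


def is_config_before(filename):
    # Before: a list is allocated just to test membership.
    return filename in ["setup.cfg", "tox.ini", "pytest.ini"]


def is_config_after(filename):
    # After: a tuple literal does the same job.
    return filename in ("setup.cfg", "tox.ini", "pytest.ini")


def read_lines_before(filename):
    # Before: context manager plus a separate read call.
    with open(filename) as f:
        contents = f.read()
    return contents.splitlines()


def read_lines_after(filename):
    # After: pathlib reads the whole file in one expression.
    return Path(filename).read_text().splitlines()


with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "notes.txt"
    sample.write_text("first line\nsecond line\n")
    before = read_lines_before(sample)
    after = read_lines_after(sample)

print(before == after, is_config_before("tox.ini") == is_config_after("tox.ini"))
```

Both versions of each pair return the same result; the difference is only in how much code and allocation it takes to get there.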

28:39 >> I don't think I've ever used read_text.

28:41 So I learned something new.

28:43 >> I hadn't either, but you know what I do now.

28:45 It also says you can replace x.startswith(y) or x.startswith(z) with x.startswith((y, z)), as a tuple, and that'll actually test.

28:56 >> One or the other?

28:57 >> Yeah, one or the other.

28:58 >> Okay.

28:59 >> It says instead of printing with an empty string, there's no reason to allocate an empty string, just call print() with nothing; that has the same effect.
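
A short sketch of these two suggestions, with hypothetical prefixes and helper names. str.startswith accepts a tuple of prefixes and returns True if any of them matches, and print() with no arguments writes the same single newline as print("").

```python
import io
from contextlib import redirect_stdout


def is_test_name_before(name):
    # Before: two separate startswith calls joined with "or".
    return name.startswith("test_") or name.startswith("check_")


def is_test_name_after(name):
    # After: one call with a tuple of prefixes; matches if any prefix fits.
    return name.startswith(("test_", "check_"))


def captured_output(func):
    """Run func and return whatever it printed to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


with_empty_string = captured_output(lambda: print(""))
with_no_arguments = captured_output(lambda: print())

print(is_test_name_after("check_login"), with_empty_string == with_no_arguments)
```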

29:06 There's a whole bunch of things like that that are really nice here.

29:09 You can ask it to explain, you're like, "Dude, what's going on here?

29:14 You told me to do one, two, three.

29:16 What's the motivation?"

29:19 You'll get a help text.

29:20 Here's the bad version, here's the good version, here's why you might consider that.

29:23 For example, given a string, don't cast it again to a string, just use it.

29:29 Maybe more important is you can ignore errors.

29:32 You can ignore, just do a --ignore and a number.

29:36 There's one which I'll show you in a second, which I've started adopting that for when I use it.

29:41 Or you can put a # noqa and put a particular warning to be disabled, or you can just say no.

29:48 Just leave this line alone.

29:49 Like I just don't wanna hear it.

29:50 Don't tell me.

29:51 So you can say #noqa, then it'll catch like all of them.

29:55 - Okay.

29:56 - Okay.

29:57 So I ran this on the Python Bytes website, and we got this.

30:00 It says, there's a part where it like builds up a list and then takes some things out, trying to create a unique list.

30:07 I think this might be for like showing some of the testimonials.

30:11 It says, give me a list of all a bunch of testimonials and then randomly pick some out of it.

30:16 And then it'll delete the one it randomly picked and then pick another so it doesn't get duplication.

30:20 There's other things like that as well, also in the search.

30:23 And so I write del X bracket Y to get rid of the element or whatever it's called, item.

30:29 And they say, you know what, on a dictionary, you should just use X dot pop of Y.

30:33 I think the del is kind of not obvious entirely what's going on.

30:36 Sometimes it means free memory.

30:37 Sometimes it means take the thing out of the list, right?

30:40 So they're like, okay, do this.

30:41 And I got the square-bracket "in" warning: use the parentheses, the tuple version, instead.

30:47 And then also I had a list and I wanted to make a separate shallow copy of it.

30:53 So I said list of that thing.

30:55 And it said, you can just do list.copy or thing.copy and it'll create the same thing, but it's a little more discoverable what the intention is.

31:02 Probably also more efficient.

31:03 Probably do it all at once instead of loop over it.

31:05 Who knows?
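
The two suggestions just described can be sketched like this. The data is invented for illustration; only dict.pop and list.copy themselves are standard Python.

```python
# Hypothetical data standing in for the testimonials lookup.
testimonials = {"a": "Great show", "b": "Love it"}
picked = "a"

# Flagged form: del testimonials[picked]
# Suggested form: pop removes the key and also hands back the value.
removed = testimonials.pop(picked)

# Flagged form: clone = list(original)
# Suggested form: .copy() makes the same shallow copy and states the intent.
original = [1, 2, 3]
clone = original.copy()
clone.append(4)  # the original list is left untouched
```

Like del, pop with a single argument raises KeyError when the key is missing, so the behavior on bad keys stays the same.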

31:06 Anyway, this is what I got running against our stuff, things like this.

31:09 And you know what, I fixed it all.

31:11 - Cool.

31:12 - Except there's this one part where it's got a whole bunch of different tests to transform a string.

31:18 And it's like line after line of dot replace, dot replace, dot replace, dot replace, dot replace.

31:22 One of those lines is to replace tabs with spaces.

31:26 Then eventually it finds all the spaces, turns them into single dashes and condenses them and whatnot.

31:31 And it says, oh, you should change x.replace("\t", " "), so tab with a space, replace that with x.expandtabs(1).

31:42 I'm like, no.

31:44 Maybe if it was just a single line where the only call was to replace the tabs, but there's like seven replaces and they all make sense: replace tabs, replace lowercase with that, like all these other things.

32:00 And if you just turn one of them into expandtabs, like, where did this come from in the sequence of replacements?

32:06 Like why would you do this one thing?

32:08 >> Yeah.

32:08 >> So I just put a no QA on that one and fixed it up.

32:12 But I found it to be pretty helpful in offering some nice recommendations.

32:16 People can check it out. You can just run it in an entire directory.

32:19 You don't have to run it on one file.

32:20 Just say, refurb ./ and go.

32:23 >> Cool. Yeah. We should run several of these and then just do them in a loop and see if it ever settles down.

32:30 >> Exactly. If you just keep taking its advice, does it upset the other one?

32:35 Yeah, like if you pyupgrade and then refurb and then black and some others, and yeah, autopep8.

32:41 See, the goal of this one is to modernize Python codebases. If we had Python 2 code I suspect it would go bonkers, but we don't, so it's okay.

32:51 But one of the cool things: you mentioned you weren't going to do the expandtabs, but I didn't know about expandtabs, so.

32:58 The tools like this also just teach you stuff that you may not have known about our language.

33:04 >> Yeah, like that read text versus a context manager and all sorts of stuff.

33:06 Yeah. So the expand tabs, where was it? It was over here.

33:10 The expand tabs of one, that means replace the tab with one space.

33:14 So if you wanted four spaces for every tab, you would just say expand tabs four.

33:17 >> Which is probably correct, right?

33:19 >> Yeah, of course. Of course it is.

33:22 >> Of course it is.
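
For anyone curious how str.expandtabs behaves in practice, here is a short sketch with made-up strings. With a tab size of 1 it matches replacing each tab with one space. With larger sizes it pads to the next tab stop, so the result is not always a fixed number of spaces per tab.

```python
text = "a\tb\tc"

# With tabsize 1, every column is a tab stop, so each tab becomes one space.
replaced = text.replace("\t", " ")
expanded_one = text.expandtabs(1)

# With tabsize 4, the tab pads out to the next multiple of 4 columns.
# "a" occupies column 0, so the tab adds 3 spaces, not 4.
short = "a\tb"
expanded_four = short.expandtabs(4)
four_spaces = short.replace("\t", "    ")
```

So expandtabs(4) and replacing with four literal spaces only agree when each tab happens to start on a tab stop.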

33:23 >> All right. Well, that's it for all of our items.

33:25 You got anything else you want to throw out there?

33:28 - I don't, how about you?

33:31 - I do actually, all right, so let's see, I had a few things, I'll go through them quick.

33:35 So another sequence of things that I think's pretty interesting, this is not really the main thing, but it's kind of starting the motivation.

33:43 So we have over on all of our sites, on Python Bytes, on Talk Python and Talk Python Training, we have the ability to do search.

33:52 So for example, over on Talk Python Training, I can say ngrok api postman, and the results you got previously were like this ugly list that you'd have to kind of make sense of.

34:00 It was really not something I was too proud of.

34:03 But I'm like, I'm not inspired to figure out a different UI, but I got inspired last week and said, okay, I'm gonna come up with this kind of like hierarchical view showing like, okay, if I search for say, ngrok API postman, I wanna see all the stuff that matches that out of the 240 hours of spoken word, basically, right?

34:22 On the site and all the descriptions and titles and so on.

34:25 And so, like for example, this Twilio course I talked about used all those things and actually has one lecture where exactly it talks about all three of those things and then others where they're in there but like one video talks about ngrok then another one talks about an API or you know, it's not really focused, right?

34:44 And here just in this course, like it doesn't even exist in a single chapter but across 100 days of web and Python, like all those words are said.

34:52 All right, so I came up with this search engine and well, the search engine existed, but it wasn't running, you know, it wasn't basically hosted in a way that I was real happy with.

35:01 So what I did is I took some of our advice from 2017.

35:06 I said, you know what, I'm going to create a systemd service that just runs as part of Linux when I turn it on, that is gonna do all the indexing and a lot of the pre-processing, so that page can be super fast.

35:19 So for example, like the response time for this page is effectively instant.

35:23 It's like 30, 40 milliseconds, right?

35:25 Even though it's doing tons of searching.

35:27 So I'm going to run this Python script, series of scripts in the lab as a systemd service, which is excellent.

35:36 So we talked about how you can do that.

35:38 If you look, here's an example.

35:40 Basically, you just create a systemd .service file and you say like python, space, your file with the arguments and you can set it.

35:49 It'll just auto start and be managed by systemctl, which is awesome.

35:54 So that's all neat.

35:56 The other thing, the main thing I really want to give some advice about, though, is that these daemons, what they look like is: while True, chill out for a while, do your thing.

36:08 Wait for an event, do your thing.

36:10 Look for a file, do your thing, then look for some more.

36:14 You're just going over and over in this loop like running, but often it's not busy, it's waiting for something.

36:19 In the search thing, it's like waiting for an hour or something, and then it'll rebuild the search.

36:24 But it could just as well be waiting for a file to appear in some kind of upload folder and then like start processing that, I don't know.

36:32 Right, so they almost always have this pattern of like while true, either wait for an event and then do it or chill for a while and then do the thing.
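
The loop shape being described might look roughly like the sketch below. The names run_daemon, do_work, and max_iterations are illustrative; max_iterations is an assumption added here only so the loop can be stopped in a demo, since a real daemon would run until the service is stopped.

```python
import time


def run_daemon(do_work, interval_seconds, max_iterations=None):
    """Do the work, chill for a while, repeat."""
    iterations = 0
    # max_iterations=None means loop forever, like "while True".
    while max_iterations is None or iterations < max_iterations:
        do_work()
        iterations += 1
        time.sleep(interval_seconds)
    return iterations
```

An event-driven variant would block on a queue or a file check instead of calling time.sleep, but the outer loop stays the same.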

36:43 So my recommendation, my thought here is if you combine this with multiprocessing, you can often get much, much lower overhead on your server.

36:54 Right? So check this out.

36:57 So here's an example of the search thing on Talk Python search out of Glances.

37:02 Notice it's using 78 megabytes of RAM.

37:06 This is in the show notes, of course.

37:08 This is it just running there in the background.

37:10 Before I started using multiprocessing, it was using like 300 megs of RAM constantly on the server because it would wait for an hour and then it would load up the entire 240 hours of text and stuff and process it and do database calls and then generate like a search result, a search set of keyword maps, and then it would refresh those again.

37:33 But normally, it's just resting.

37:36 It puts that stuff back in the database.

37:37 But if you let it actually do the work, it will basically not unload those modules and unload all that other stuff that happened in there.

37:47 So if you take the function that says, just do the one thing in the loop, and you just call that with multiprocessing, it goes from 350 megs to 70 megs, no other work.

37:58 'Cause that little thing fires up, it does all the work, and then it shuts back down, and it doesn't get all that extra stuff loaded into your process.

38:04 - Okay. - A little, cool, right?

38:06 - It is cool.
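
A minimal sketch of that idea, assuming a hypothetical rebuild_search_index job. The actual search code from the show is not given in the transcript, so every name below is a stand-in. The heavy work runs in a short-lived child process, and whatever it loaded goes away when that process exits.

```python
import multiprocessing
import time


def rebuild_search_index():
    # Hypothetical stand-in for the memory-heavy job: load a lot of data,
    # build keyword maps, and store the results somewhere durable.
    words = [f"word-{i}" for i in range(50_000)]
    keyword_map = {word: len(word) for word in words}
    # A real job would write keyword_map to a database here.


def run_once_in_child(job):
    # Run one pass in a separate process so the parent stays small.
    worker = multiprocessing.Process(target=job)
    worker.start()
    worker.join()
    return worker.exitcode


def serve_forever(job, interval_seconds=3600):
    # The long-lived daemon loop only sleeps and spawns workers.
    while True:
        run_once_in_child(job)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Demo: run a single pass instead of looping forever.
    print("child exit code:", run_once_in_child(rebuild_search_index))
```

The parent never imports or builds the large data structures itself, which is why its resident memory can stay low between runs.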

38:07 You could, I mean, for special cases like ours, I mean, for yours, you could just kick it off yourself, right, or have it be part of your published thing when you publish the show notes.

38:18 - Yeah, exactly.

38:20 I mean, I could base it on some of that.

38:23 Like, yeah, it could.

38:25 It gets complicated because it's hard to tell when that happens.

38:29 There's like a bunch, as you can see, like in this example, there's like eight worker processes.

38:35 All right, so which one should be in charge of knowing that?

38:39 I don't know.

38:40 So it's easy to just have that thing running and just, you know, the search will be up to date as it's going, but please don't overwhelm the server by loading the entire thing and hanging on to it forever.

38:50 >> Exactly.

38:51 >> Yeah, so anyway, I thought that was a fun story to share.

38:55 Let's do this one next.

38:57 We talked about JetBrains Fleet, think PyCharm.

39:01 PyCharm's, like, little cousin that is very much like VS Code, I guess, but has like PyCharm heritage.

39:10 So this thing is now out of private beta, it's now into public beta.

39:15 So it has like Google Docs type collaboration, it has, but it has like PyCharm source code, refactoring and deep understanding that seems pretty excellent.

39:26 So people can check that out, it looks pretty neat.

39:29 I've done a little bit of playing with it, but not too much yet.

39:32 But if you're a VS Code type of person, like this might speak to you more than PyCharm.

39:37 So that's out.

39:38 Speaking of PyCharm, I'm gonna be on a webcast with Paul Everitt on Thursday.

39:45 We're talking about Django and PyCharm tips reloaded.

39:48 So just kind of a bunch of cool things you can do if you're working in a Django project in PyCharm.

39:54 You wanna be awesome and quick and efficient.

39:56 Okay, last one.

39:58 How about this?

39:58 This is Mark.

40:01 Go ahead.

40:02 - This blows me away and it's interesting.

40:04 - This is interesting.

40:05 So we all have got to be familiar with the GDPR.

40:09 I did weeks worth of work reworking the various websites to be officially compliant with GDPR.

40:16 You know, like we weren't doing any creepy stuff to like, oh, now we got to start stop our tracking or anything like that.

40:23 But like, there's certain things about you need to record the opt-in explicitly and be able to associate a record like that kind of stuff, right?

40:29 So some of us did a bunch of work to make our code GDPR compliant, others not so much.

40:37 But the news here is that Denmark has ruled that Google Analytics is illegal.

40:44 I mean, like, okay.

40:45 And illegal in the sense that the Google Analytics violates the GDPR and basically can't be used.

40:54 I believe France and two other countries whose name I'm forgetting have also cited that as well.

41:03 And yeah, more or less, a significant number of European countries are deciding that Google Analytics just can't be used if you're gonna be following the GDPR, which I think most companies, in the West at least, need to follow.

41:23 - Yeah, so I'm glad.

41:25 I mean, my early days of web stuff, I was using Google Analytics.

41:31 Of course, a lot of people do.

41:33 And it's free, they give you all this information free.

41:37 Why not?

41:38 Why are they giving, oh, it's not--

41:39 - Wait a minute.

41:40 - Wait a second.

41:41 They're using you and your website to help collect data on everybody that uses your website.

41:46 - Yeah, it seems like such a good trade-off.

41:48 But yeah, I mean, you're basically giving every single action on your website, giving that information about your users, every one of their actions over to Google, which seems like a little, I could see why that would be looked down upon from a GDPR perspective, no doubt.

42:05 By the way, also on that, if you look over on Pythonbytes.fm, the pay, let's see, does it say anything?

42:13 How many blockers have we got or how many creepy things do we have to worry about over here?

42:19 Zero, like we don't use Google Analytics, we don't use, yeah, that's just global stats.

42:26 But yeah, we don't use Google Analytics or any other form of client-side analytics whatsoever.

42:31 So I'm pretty happy about that actually.

42:34 But check out the video by Steve Gibson.

42:37 It's an excerpt of a different podcast, but I think it's worth covering.

42:41 It's pretty interesting.

42:42 - Yeah, it's something to watch at least.

42:45 - Yeah, yeah.

42:46 Ikevu points out in the audience, how can you enforce something like that?

42:49 That is Google Analytics being not allowed.

42:52 It's embedded in so many sites everywhere.

42:54 Sometimes you don't even manage it.

42:56 You just enter an analytics ID.

42:58 Yeah, it's honestly a serious problem.

43:02 Like for example, on our Python Bytes website, if you go to one of the newer episodes, they all have a nice little picture.

43:11 That picture is from the YouTube thumbnail.

43:14 Like it literally pulls it straight from YouTube.

43:17 The first thing I tried to do, Brian, was I said, well, here's the image that YouTube uses for the poster on the video.

43:25 I'll just put a little image where the source is YouTube.com/video poster or whatever the heck the URL is.

43:31 >> Yeah.

43:31 >> Even for that, Google started putting tracking cookies on all of our visitors.

43:36 Come on Google, it's just an image.

43:38 No.

43:39 >> Tracking cookies.

43:40 >> Yeah, or cookies.

43:42 What I had to end up doing is the website on the server side, looks at the URL, downloads the images, puts it in MongoDB, and when a visitor comes, we serve it directly out of MongoDB with no cookies.
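
The server-side approach described here could be sketched roughly as follows. This is an assumption-heavy illustration: ThumbnailCache, fetch_bytes, and the in-memory dict are hypothetical, and the dict only stands in for the MongoDB storage mentioned in the episode.

```python
from urllib.request import urlopen


def fetch_bytes(url):
    # Download the image on the server, so visitors never contact the source.
    with urlopen(url) as response:
        return response.read()


class ThumbnailCache:
    def __init__(self, fetch=fetch_bytes, store=None):
        self._fetch = fetch
        # A plain dict stands in for a database collection in this sketch.
        self._store = {} if store is None else store

    def get(self, video_id, source_url):
        # Fetch once, then serve every later request from our own storage.
        if video_id not in self._store:
            self._store[video_id] = self._fetch(source_url)
        return self._store[video_id]
```

Because the browser only ever requests the image from the site itself, no third-party cookies come along with it.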

43:55 Like it is not trivial to avoid getting that kind of stuff in there because even when you try not to, it shows up a lot of times like Ikovu mentioned.

44:05 >> Yeah.

44:06 >> The way it gets enforced, somebody says, here's a big website, they're violating the GDPR.

44:12 We're going to recommend, I'm going to report them basically is what happens I think.

44:16 >> Yeah. But I think for small fish like me or something, It's just if a country says don't do that, maybe I won't because they might have good reasons.

44:28 >> Yeah. If you're a business, you got to worry a lot more.

44:31 I don't think any individual will ever get in trouble for that.

44:35 But it's also, I mean, think about how much you're exposing everybody's information.

44:41 You can't know before you go to a website whether that's going to happen.

44:45 It's already happened once you get there.

44:46 So I guess see our previous conversation about ad blockers, NextDNS, do we hate creators?

44:52 No.

44:53 Do we hate this kind of stuff?

44:54 Yes.

44:55 Yeah.

44:55 Anyway.

44:56 Also, information is interesting.

44:57 So but just pay attention to what you have because you don't need Google Analytics to just find out which pages are viewed most.

45:05 Absolutely.

45:06 You can use other ways.

45:07 Yep.

45:09 All right.

45:09 Well, that's a bunch of extras, but there they are.

45:12 That's so serious though.

45:13 Do we have something funny?

45:15 We do.

45:16 Okay, something I got some this is very much.

45:18 I picked this one for you, Brian.

45:20 - Okay.

45:21 - So this has to do with testing.

45:22 Tell me what's in this picture here.

45:23 Describe for our listeners.

45:25 - I love this picture.

45:26 This is great.

45:29 So it says all unit test passing and it is a completely shattered sink.

45:35 The only thing left of the sink is the faucet is still attached to some porcelain.

45:41 You can turn it on and it goes down the drain.

45:43 Actually, so you've already, you've even got integration tests passing.

45:46 - Yeah, you do.

45:47 Yeah, it's pretty, not a hundred percent coverage, but yeah.

45:51 Right.

45:52 Not a hundred percent coverage of the sink.

45:54 Yeah.

45:55 There's this sink and it's completely smashed.

45:58 There's just like just a little tiny chunk fragment of it left, but it's got the drain and the faucet is still pouring into it.

46:04 Unit test pass.

46:05 I love it.

46:05 Yeah.

46:06 You might even cut yourself if you tried to wash your hands in this, but, but funny, you might, you might.

46:13 Well, that's good.

46:15 Fun as always.

46:16 Thanks for being here.

46:17 - Thank you.

46:18 - Yeah.

46:19 See you later.

46:19 Thank you everyone for listening.
