Transcript #203: Scripting a masterpiece for Python web automation
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:03 your earbuds. This is episode 203, recorded October 7th, 2020. I'm Michael Kennedy.
00:10 And I am Brian Okken.
00:11 And this episode is brought to you by Datadog. Thank you, Datadog, for supporting us.
00:16 pythonbytes.fm/datadog. There's a lot of cool stuff out there. We'll tell you more about it
00:20 later. Brian, can you believe we're like well into the 200s? Well, by three.
00:25 Yeah, we're getting a good start already. Yeah.
00:27 A month almost. Yeah, I guess a month because that's zero based, which is pretty awesome.
00:31 Now, speaking of things that are awesome, DigitalOcean was a sponsor of the show for a while.
00:37 But before they were sponsors, we actually just used them for, you know, hosting our infrastructure.
00:43 And we still do. So when you download the MP3, your podcast player talks to something,
00:49 it's talking to our services on DigitalOcean and so on. And over there, we just have a set of virtual
00:55 machines, some database servers, some other things. And they manage themselves as kind of a cluster.
01:00 And by manage themselves, I mean, I manage them.
01:03 I mean, they mostly take care of themselves, but I do have to log in and take care of them.
01:09 But there are different ways of hosting your apps that don't require you to actually log in and
01:16 configure servers and make sure they're all good and so on. Often that's called platform as a service.
01:20 We also have Kubernetes clusters and things like that where you just say, here's a definition of my code.
01:25 Please make it go on the internet. Right. So what I want to talk about is DigitalOcean just launched a new
01:30 app platform that is a platform as a service. And like I said, I'm a fan of DigitalOcean because they're simple
01:36 and straightforward and affordable and easy to use, but really high quality.
01:40 So I think that it's worth pointing out this new platform that they just launched.
01:44 You're comfortable with doing your own, what, droplet or whatever it is?
01:48 Yeah, exactly.
01:49 I'm not. So I'm kind of looking forward to trying something like this.
01:53 And I've got a ton of different apps and they have interconnections with each other that they have to care about.
01:59 And like, there's a lot of stuff where, you know, at some point it makes sense to go down that path with various things that all work together.
02:06 But if I just got an app and I wanted to get on the internet, you know, often you don't want to deal with or worry about those things or, you know, forget to apply an OS patch or, you know, how many, how many times?
02:17 I mean, large-scale, VC-funded, professional web apps say, we're going to be experiencing downtime for the next 30 minutes or for four hours.
02:27 I'm just like, what could you possibly be doing that takes four hours?
02:31 I just, it's like boggles my mind that you're not able to do it better than four hours of downtime.
02:36 And so platforms like this mean zero downtime deployment and things like that.
02:41 So really, really neat.
02:42 So they've announced this new app platform.
02:43 I want to point out, this is not an ad.
02:45 This is just something I think is cool.
02:46 So I'm sharing with you.
02:48 Yeah.
02:48 So, yeah.
02:48 So they, they came up with this new app platform that, you know, you say it's pretty modern.
02:54 It's like, how do you get your code into it?
02:56 You point it at your GitHub repository.
02:58 You don't like log into it and do a Git thing.
03:01 You just say, I'm going to give you access to my source code and it will automatically deploy from that.
03:06 That would be one nice way to get it over there and get it set up.
03:09 But you also might want continuous deployment.
03:11 So if I push, like, how do you get a new version with zero downtime deployments and all that?
03:17 Well, you just push to a particular branch that you decide upon and it automatically notices that and does a redeploy.
03:23 That's pretty sweet.
03:24 Like, so I have that for like Talk Python training.
03:27 If I push to a production branch, it'll automatically do the checkout, ensure the requirements are built, recreate it.
03:34 I had to write that.
03:34 This just happened.
03:35 This is just part of it, right?
03:37 That's pretty neat.
03:38 Yeah.
03:38 Yeah.
03:38 I don't want to do that myself.
03:39 I didn't either, but it was better than logging in all the time.
03:42 So this is built on top of DigitalOcean Kubernetes, which is interesting because a lot of platform as a service type of things are just opaque.
03:51 They're like, well, you can give us access to your code and we'll make it run.
03:54 Magic.
03:55 But really all this is, is they'll orchestrate running your code on top of their Kubernetes clusters, which means you can like define Dockerfiles in your repository that are going to be part of the app that runs in Kubernetes.
04:09 You can use some of the tools actually to talk to the underlying infrastructure.
04:14 So it's not a closed environment.
04:16 You can actually kind of get down to the infrastructure layer a little bit more.
04:19 So all these things are pretty neat.
04:21 It has automatic handling of traffic spikes. For simple, simple, simple apps,
04:26 for static apps, it's free, up to a limit of three.
04:29 For three of them, right?
04:30 For real apps, I guess, apps that run code like Python.
04:34 You can pay five bucks for like a simple version, like on a shared server.
04:38 Or you can pay 12 bucks for a more pro version that has more features, CDN, SSL, all those kinds of things.
04:47 And then if you want to scale it up, you can pay tons, right?
04:50 You can pay like $150 to run it on a huge server or a bunch of different small servers.
04:55 And there's a whole scaling thing that you can do, but there's a pretty decent offering.
04:58 It's still not as cheap as running on your own.
05:00 But just like you said, a lot of people don't want to run it on their own and that's not their expertise.
05:05 And why should they be doing that, right?
05:06 Yeah.
05:08 Like if you were to offer to do all of my server stuff for me, I would totally buy you dinner once a month.
05:15 Yeah.
05:16 And that's kind of the price, right?
05:17 But this would be like a cheap dinner, like a Muchas Gracias type of, you know, enchiladas and a Coke, not a filet mignon.
05:24 Yeah.
05:25 Maybe just like a $5 gift card to Starbucks.
05:28 Yeah.
05:28 There you go.
05:28 I could totally get two scones.
05:29 Anyway, if you were thinking about running your, you know, I talked to so many people, students of the courses and stuff.
05:34 And they're like, I got my app, but now I got to put it online.
05:38 Like what a pain.
05:38 Like I can't get Nginx configured, right?
05:40 Or this other thing or so on.
05:42 This is another solid option now that has a nice, you know, push to a branch, deploy, run your stuff, zero downtime.
05:51 You know, it's probably most comparable to Heroku, I would say, in the Python ecosystem.
05:55 Yeah.
05:56 Yeah.
05:56 All right.
05:57 Well, people can check this out.
05:58 I think it's, I think it's a cool offering.
05:59 I will not be personally using it because there's a bunch of little gotchas like, you know, it would be better if, right?
06:06 For example, I don't want to use their hosted Postgres database.
06:10 I want to run a MongoDB server, which is fine.
06:12 It's no problem.
06:13 You can do that there.
06:13 But you can't, like what I do on the MongoDB server is in order to talk to it, you have to be within a whitelist of
06:20 known IP addresses that the servers, the web servers and API servers have.
06:25 All right.
06:25 So there's like 10 IPs in the world that can talk to that server and no others.
06:29 The thing is with these Kubernetes clusters, when you push redeploy, it will regenerate it and rehost it potentially somewhere else.
06:36 And the IP address keeps changing.
06:38 So you can't do things like have a custom database server that has, you know, firewall limited, restricted, like VPN type of stuff.
06:45 Those types of things don't exist.
06:47 Most people probably don't care.
06:48 I care, so I'm not doing it.
06:50 You can't do Mongo with this thing?
06:52 You can do Mongo, but you would have to have the MongoDB database port listen on the open internet rather than be restricted to just a few IP addresses.
07:03 Maybe they've figured this out and it's buried in the fine print.
07:07 It's something that like there's a whole conversation about like, here's the things we're going to add.
07:11 Here's the things that it doesn't currently do.
07:13 Here's some workarounds, et cetera, et cetera.
07:15 So anyway, there's a whole conversation.
07:17 You can check it out.
07:18 But if you do things like use their hosted database, which would make sense in a PaaS type of story, you don't have these problems, right?
07:25 They automatically wire that stuff up.
07:26 Because when you want to break the rules, you get in trouble.
07:29 So you're a fan of Shakespeare.
07:32 Is that right?
07:32 Head down to Medford.
07:34 I've never been.
07:35 Ashland, sorry.
07:36 It's Ashland down there.
07:37 Ashland.
07:38 There's a whole like Shakespeare week.
07:40 Yeah.
07:40 Is Ashland still there with the fires and all?
07:43 God, I hope so.
07:44 Yeah.
07:45 No, I've always wanted to, but people that don't live in Oregon have no idea what we're talking about.
07:49 But there's a small town in Southern Oregon that does a lot of Shakespeare plays.
07:55 And that sort of transition was because I want to talk about Playwright.
07:59 So Microsoft put out an announcement announcing Playwright for Python.
08:03 I was trying to look into this.
08:05 I guess I haven't quite got whether or not Playwright was a thing before Playwright for Python or not.
08:13 But in any case, it's a Microsoft thing and it's a way to drive and test your web application easily.
08:22 So it's an end-to-end testing solution.
08:25 It's open source and whatnot.
08:27 But in their announcement, it's a pretty cool announcement.
08:30 It gives examples and everything.
08:31 So I'm going to read their pitch.
08:33 The pitch for it is, with the Playwright API, you can author end-to-end tests that run on all modern web browsers.
08:41 Playwright delivers automation that is faster, more reliable, and more capable than existing testing solutions.
08:47 And I'm guessing by existing testing solutions, that's a nice way for them to say we are better than Selenium.
08:54 Yeah, that's what I was thinking as well.
08:56 So there's already a pytest plugin.
08:57 And it runs on Python.
08:59 We've said that we like animated GIFs of how it works.
09:05 And on their announcement page, there's a little animation.
09:08 And I was actually pretty impressed with that little bit.
09:11 So you can drive it even from a command line or an interactive shell.
09:16 You can do some playing with it, which is nice.
09:21 So a few of the benefits.
09:22 Apparently, it's timeout-free automation.
09:24 So Playwright automatically waits for the user interface to be ready before you act on it again.
09:31 I know there's some workarounds and there's some wrappers on top of Selenium that do that also.
09:36 But this is built into the system.
09:39 It's intended to stay modern with emulation of mobile viewports, geolocation, web permissions.
09:45 You can automate scenarios across multiple pages.
09:48 I don't really test websites that much, but I didn't know that that was difficult before.
09:53 So apparently that's easier now.
09:55 Cross-platform, of course, or cross-browser, of course, because you got to test against different things.
10:02 They use a Chromium driver for Chrome and Edge emulation, WebKit driver for Safari, and a Firefox driver.
10:10 And supposedly the Safari rendering driver even works on Windows and Linux.
10:15 So you don't actually have to have an Apple computer to do that.
10:19 So pytest compatible and Django compatible.
10:22 I'm sure it's compatible with lots of other stuff too.
10:24 But the examples on the announcement show pytest examples and Django examples, which is cool.
10:30 They even mentioned that, of course, you can run this from your continuous integration server, including GitHub Actions and others.
10:38 You must be happy to see that it's pytest, like natively pytest friendly, like with fixtures and whatnot.
10:45 I love that obviously we're to the point now where if you have a new testing tool, you may as well, in the announcement, tell people whether or not you can run it with pytest.
10:54 Because people are going to ask.
10:55 That's a good state to be in in the Python world, I think.
10:58 So for example, the simple hello-world sort of test is just to go and make sure that you get like a header text on a page.
11:05 So it says, define a function which takes a page, with type annotations, by the way, double props for that.
11:12 So page, and that's already a fixture from the framework in pytest.
11:16 So it automatically passes that over, already set up.
11:18 All you do is say it takes a page, then page.goto the URL, then assert page.innerText of h1 equal equal, you know, the text you're looking for.
11:27 There's also more like that you could do.
11:29 It's like Beautiful Soup-like stuff.
11:31 But there's more of the kind of drive it.
11:33 Yeah, go ahead.
11:34 That's two lines of code for a test to make sure that something's on a web page.
11:38 That's pretty cool.
11:38 Yeah, that is pretty slick.
11:39 And the fixture bit is neat.
11:41 You can also go and like do a test to log in.
11:44 So get a new page.
11:46 Go to the URL.
11:47 Do page.fill.
11:49 Give it a CSS selector for the username field.
11:53 Like, the input field.
11:54 Give it a CSS selector for the password and say fill with that.
11:57 And then click where the text of a button equals login.
12:00 You don't have to do the CSS stuff or anything.
12:02 Just find me a button or a thing or a URL that has the text login and click that.
12:07 And it's off.
12:07 And so like one of the examples here is it does that first.
12:10 And then it logs in.
12:11 Then it creates a session that remembers that it's logged in for the rest of the testing.
12:15 So that's like one of the setup phases, which is pretty cool.
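The login steps walked through above could be sketched as follows. This is an illustration under assumptions, not the code from the announcement: the helper name `log_in`, the login URL parameter, and both CSS selectors are hypothetical, and the calls use the current snake_case Playwright for Python API (`goto`, `fill`, `click`).

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for the type hint.
    from playwright.sync_api import Page


def log_in(page: "Page", login_url: str, username: str, password: str) -> None:
    """Fill in a login form and submit it (selectors are hypothetical)."""
    page.goto(login_url)
    # CSS selectors pick out the two input fields.
    page.fill("input[name='username']", username)
    page.fill("input[name='password']", password)
    # A text selector clicks an element whose text matches "Login",
    # so no CSS is needed for the button itself.
    page.click("text=Login")
```

A helper like this would typically be called once from a setup fixture so that later tests start from a logged-in session.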
12:17 Yeah.
12:17 Yeah.
12:18 Let me throw out one other thing.
12:19 You talked about Chromium as one of the drivers, right?
12:22 So a lot of times when you're doing Selenium, I don't know about this, but it looks the same.
12:25 You have to install Chromium.
12:27 And then there's like a little hidden one.
12:29 You can also do the Firefox browser for Selenium.
12:33 But I was talking to Attila from Scrapinghub on Talk Python.
12:39 And he pointed out that Scrapinghub makes a headless browser specifically designed to be a headless browser called Splash.
12:48 So their headline is: the headless browser designed specifically for web scraping. Turn JavaScript-heavy web pages into data.
12:56 So I don't know how much better that is, but it's interesting to think that you can swap out these browsers.
13:02 And here's a cool example as well.
13:03 Something that maybe people don't know about.
13:05 Yeah, I listened to that episode and thanks for reminding me.
13:08 I was like, I got to check that out.
13:09 Yeah, I do too, but I haven't checked it out.
13:11 But it definitely looks neat.
13:12 So this though, I like it.
13:15 I mean, it looks at least as neat as Selenium.
13:18 I don't know.
13:18 Maybe it's even better.
13:19 So pretty cool.
13:20 Yeah.
13:21 Also cool, Datadog.
13:22 They're actually sponsoring the show.
13:25 Unlike DigitalOcean where I just found something that I like from someone who happened to be a sponsor.
13:29 But Datadog are sponsoring the show, not making them any less cool.
13:32 So let me ask you a question.
13:34 Do you have an app in production that's slower than you like?
13:37 Is its performance maybe all over the place?
13:39 Sometimes fast, sometimes slow.
13:40 Here's the important question.
13:42 Do you know why?
13:43 With Datadog, you will.
13:44 You can troubleshoot your app's performance with Datadog's end-to-end tracing.
13:48 Get detailed flame graphs.
13:50 Identify bottlenecks and latency in that finicky app of yours.
13:53 Be the hero that got your app back on track at your company.
13:57 Get started with a free trial.
13:58 And I believe they send you a t-shirt, a little cool t-shirt still, over at pythonbytes.fm/Datadog.
14:04 So, Brian, something we haven't spoken about nearly enough is asyncio and async and await.
14:09 Should we touch on that a little?
14:10 Sure.
14:11 Okay.
14:14 Yeah, we've talked about it some.
14:16 Some.
14:17 I believe some, maybe.
14:18 So, one of the things that asyncio is for, I mean, if you look at the name, it's around...
14:27 Waiting on IO.
14:28 Waiting on external things like network calls, API calls, and so on, right?
14:32 Oh, I thought it was just trying to be cool.
14:34 Like all the .io.
14:36 It could be that.
14:37 Or it could just be like the Italian pronunciation.
14:39 Async, yo.
14:40 Async, yo.
14:42 No, it's beautiful.
14:43 So, when I think of files, I think of IO.
14:46 Like, if somebody said, what is IO?
14:48 I would think file IO.
14:49 That's the first thing I would say.
14:50 And yet, Python doesn't have built-in support for asynchronously working with file IO.
14:56 That's bizarre, right?
14:58 Yeah.
14:58 It is.
14:59 I believe there's an external package.
15:01 I think I saw it somewhere on like awesome-asyncio or some list like that, that somebody
15:06 had built something along those lines.
15:08 But, there's a cool article called Asynchronously Opening and Closing Files in Asyncio by Chris
15:15 Wellons.
15:15 Nice.
15:16 So, he wrote this and said, look, asyncio has great support for networking, subprocess, interprocess
15:21 communication stuff, but no file operations like opening, reading, writing, and closing files.
15:25 And if you're talking to something that might take a long time, I mean, I don't know about
15:29 you, but I've got a pretty raging SSD on both my computers.
15:32 So, maybe I don't need this.
15:33 Unless maybe you're logged in through a corporate VPN and you've
15:40 mapped a network share over to your drive and then you try to read from that, all of a sudden
15:44 your file IO might get super slow, right?
15:46 Well, even on SSDs, file IO is slower than memory reads.
15:50 Yeah, it's much slower.
15:51 So, there's certainly situations where this could be extreme like the network one, but
15:56 you're right, even normal file IO can be slow if you're really looking to squeeze out the
16:00 most concurrency.
16:01 So, basically, he wrote a little article working through it and it's ridiculously short, actually,
16:07 on how you can do this, right?
16:09 So, basically, he says, look, if I use open, open file in Python, I would, as a decent Pythonic
16:16 bit of code, typically I would write with open thing as file IO object.
16:21 Right?
16:21 File stream.
16:22 Yes.
16:22 Let's build that for something we're going to call aopen, which is an asynchronous one.
16:26 And it's kind of bizarre and weird that Python has this, but it does and I think it's neat.
16:30 It has async with blocks for when you do async things that have to be asynchronously managed
16:36 within context managers.
16:38 So, he said, let's write this so it implements the async with style, which is really simple.
16:45 You basically implement a couple of methods.
16:46 Instead of dunder enter, dunder exit, you do dunder aenter, dunder aexit, and so on.
16:51 Okay.
16:52 And then he says, okay, well, what we're going to do is we're going to define a function
16:55 that just opens a file.
16:56 Super easy.
16:57 But then we're going to run it in an asyncio event loop by saying run_in_executor.
17:04 And what that means is asyncio will create a thread pool where it's going to run over
17:11 on a background thread and then it just runs that and lets you await it.
17:15 And that's basically it.
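The approach described here can be sketched with the standard library alone. This is a minimal illustration of the idea rather than the article's actual code: the class name `aopen` follows the name used in the discussion, `read_text` is a hypothetical helper added for the example, and passing `None` to `run_in_executor` is assumed to be acceptable because it selects the event loop's default thread pool.

```python
import asyncio


class aopen:
    """Async context manager that opens and closes a file on a worker thread."""

    def __init__(self, path, mode="r", **kwargs):
        self._path = path
        self._mode = mode
        self._kwargs = kwargs
        self._file = None

    async def __aenter__(self):
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor, so the blocking
        # open() call runs on a background thread while the loop keeps going.
        self._file = await loop.run_in_executor(
            None, lambda: open(self._path, self._mode, **self._kwargs)
        )
        return self._file

    async def __aexit__(self, exc_type, exc, tb):
        loop = asyncio.get_running_loop()
        # Closing can block too (it may flush buffers), so it is offloaded as well.
        await loop.run_in_executor(None, self._file.close)
        return False  # do not swallow exceptions raised inside the block


async def read_text(path):
    """Hypothetical helper: read a whole text file without blocking the loop."""
    async with aopen(path, encoding="utf-8") as f:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, f.read)
```

Something like `asyncio.run(read_text("notes.txt"))` would exercise it. The same pattern, wrapping a blocking call in `run_in_executor` and awaiting the result, applies to other slow operations that lack native async support.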
17:17 Wow.
17:17 Isn't that neat?
17:18 That's not much code.
17:18 No.
17:19 It's like the opening bit is one, two, three.
17:21 It's six lines of code, including the function name, which has to be there.
17:25 So five lines of code you're actually writing.
17:27 Yeah.
17:27 And one of the things I like about this is not because I really want to do async file
17:32 stuff.
17:33 It's because it's a neat, neat little example that I can get my head around so that if I
17:38 have some other process or other slow thing that I want to make asyncified, this might be
17:44 an example of how to do that.
17:46 Yeah, absolutely.
17:46 So I think this is super instructive and interesting.
17:50 I'll also throw out that there is an aiofiles package.
17:54 I think it's files plural.
17:57 Maybe it's file?
17:57 No, file singular.
17:59 aiofile, which you can pip install and then just use it instead of, like, following the tutorial.
18:05 But I think the value here is like, well, what else doesn't have async support and what could
18:11 I just kick over to a thread but then integrate into asyncio event loops?
18:15 Yeah, it's nice.
18:16 Indeed.
18:17 You know what else is nice?
18:18 Excel.
18:19 Like so many people who can't do any programming or any scripting or anything, they can just
18:24 go to Excel and like drag and drop a little, you know, formula and paste it over and then
18:29 they're good to go.
18:29 Yeah.
18:30 Except?
18:31 Except what?
18:32 Except it's 2020.
18:35 That's the problem.
18:35 Yeah.
18:36 So this is only tangentially related to Python.
18:39 Mostly it's: people, start using databases and Python, stop using Excel so much.
18:45 This article, we had a lot of people actually say, did you guys see this?
18:51 Yeah.
18:51 So yeah, lots of people brought this up to us.
18:54 I've got an article that I picked.
18:56 There's a bunch of articles also, but I picked a BBC.com article because it didn't have very
19:01 many ads.
19:02 So the BBC article is titled "Excel: Why using Microsoft's tool caused Covid-19 results to be lost."
19:09 Wow.
19:10 So there's apparently, if you haven't heard about this, apparently there were 16,000 coronavirus
19:17 cases that went unreported in England.
19:19 The good news is, well, sort of good:
19:21 it only took like a few days for somebody to notice this, but there were a
19:27 few days where there was some stuff not getting tracked right.
19:30 And policy was like, Hey, things are getting better.
19:33 We're trending down.
19:34 This is amazing.
19:35 Yeah.
19:35 Except.
19:35 No.
19:36 Yeah.
19:37 So I just didn't read it.
19:39 So apparently what you had was several commercial testing firms filling out CSV files
19:45 and sending them to, I forget the, the name of the place, something, some health organization
19:52 in England that was pulling all this stuff together.
19:56 And they were pulling it together by putting it all in an Excel, XLS template that could be
20:03 then uploaded to a central system and made available to NHS test and trace team, as well as other
20:09 government computer dashboards.
20:10 But the use of the XLS template made it so that there was a limit of 65,000 rows.
20:18 Actually, that just gives me nightmares to think of a 65,000 row Excel spreadsheet, but apparently
20:25 that's the limit.
20:25 Nobody quite noticed that they'd hit it.
20:27 It didn't say anything about failing.
20:30 And some people said, well, you should have used XLSX because that
20:36 increases the limit by 16 times, but still, Excel for this?
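The numbers behind that remark can be checked with a little arithmetic. A legacy XLS worksheet tops out at 65,536 rows (2 to the 16th) and an XLSX worksheet at 1,048,576 rows (2 to the 20th), which is where the factor of 16 comes from. The helper `rows_dropped` below is a hypothetical name used only for illustration; it shows how many rows would fall off the end if data were pushed into a sheet with a hard row limit and nothing raised an error.

```python
XLS_MAX_ROWS = 2 ** 16    # 65,536 rows per worksheet in the legacy .xls format
XLSX_MAX_ROWS = 2 ** 20   # 1,048,576 rows per worksheet in .xlsx


def rows_dropped(total_rows: int, max_rows: int = XLS_MAX_ROWS) -> int:
    """Return how many rows would not fit in a sheet with the given row limit."""
    return max(0, total_rows - max_rows)


# The newer format holds 16 times as many rows as the old one.
ratio = XLSX_MAX_ROWS // XLS_MAX_ROWS

# A hypothetical import of 70,000 rows overflows .xls but fits easily in .xlsx.
lost_in_xls = rows_dropped(70_000)
lost_in_xlsx = rows_dropped(70_000, XLSX_MAX_ROWS)
```

A guard like this, or simply loading the CSV files into a database, makes the overflow visible instead of silent.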
20:41 Of course, I was thinking, why are you doing this in Excel?
20:43 And in this article, they had a quote from Professor Jon Croft, Crow, sorry, Crowcroft from the
20:50 University of Cambridge.
20:51 He says, Excel was always meant for people mucking around with a bunch of data for their small
20:57 company to see what it looked like.
20:58 And then when you need something more serious, you build something bespoke that works.
21:02 There's dozens of other things that you could do, but you wouldn't use XLS.
21:07 Nobody would start with that.
21:08 Exactly.
21:10 Exactly.
21:11 Apparently people did though.
21:14 And so people should be using Python.
21:16 Yeah.
21:16 That's not good.
21:17 That is not good.
21:18 So I think there's a really interesting trend of moving towards things like pandas to answer
21:25 these questions.
21:25 Right.
21:26 Yeah.
21:26 I don't think that's the answer for everybody.
21:29 Right.
21:29 Like, oh, well, Excel is kind of clumsy for you.
21:32 So here's what you should do is you should learn a whole bunch of programming.
21:36 Right.
21:36 I mean, here's a random story that I would, one of the more frustrating things from my
21:41 corporate days is when I was doing training, we would have to write proposals to send off
21:46 to clients and like, here's what we're going to cover.
21:48 Here's what I'm going to teach.
21:48 Here's your goals.
21:49 And here's the timeline and so on.
21:51 And I would send that off as a Word document and work with one of the salespeople I worked
21:55 with.
21:56 And they'd send it off to a client, and the client had some change to the Word doc, like
22:00 a .docx, and she said, oh, Michael, I need you to replace this word with that word.
22:05 And so she sent me the document back and asked me to replace that word with that word.
22:09 I'm like, do you not know about command R or control R?
22:13 Like, or whatever the replace hotkey is.
22:15 And why would you ever send me a file and just say, I need this word find-and-replaced
22:20 with that one, but I need you to do it for me?
22:22 I was just like, so anyway, I'm thinking of that person using Excel.
22:26 Like you would, I would never suggest that that person learn it.
22:29 That said, a lot of Excel power users, I think would do really well to adopt Jupyter
22:34 or JupyterLab and pandas and stuff.
22:36 And actually Chris Moffitt, who does Practical Business Python, just did a webcast with us
22:41 over, we talked about it before, but the recording's up now.
22:44 You can check that out and that'll give you some concrete tips to avoid the Excel if possible.
22:48 Oh, nice.
22:49 Good resource.
22:50 And that link's in our show notes.
22:52 Yeah.
22:52 Would you be a fan of getting documents sent to you and asked to do a find and replace on
22:57 a word?
22:57 I've totally had that happen.
22:59 Yeah.
22:59 I'm saying.
23:00 Like I sent you the doc.
23:01 You could just, I mean, maybe send it back to me and say, hey, I made some updates and
23:07 here's my updates if you need to store the version.
23:10 Yeah, exactly.
23:10 Yeah.
23:10 Just make sure I did it right.
23:11 Maybe.
23:12 But I mean, it was pretty straightforward.
23:14 Anyway, let's move on.
23:17 I'm sure everyone out there has a story like that of you wouldn't believe what I had to
23:22 do in my corporate job.
23:24 So this next one comes to us from a listener, Preston Daniel, who's given us lots of cool
23:30 feedback and ideas.
23:32 And this one is called locust.io.
23:35 This is actually a pretty good pairing with Playwright.
23:38 Okay.
23:38 So Playwright is about validating that what is on the web page makes sense.
23:44 I can go log in and press the button and then I go to this page and this text is here.
23:48 Something like that, right?
23:49 As a continuous integration.
23:50 So Locust is about, okay, you know that works.
23:53 What if 10 people do it at the same time?
23:56 What if 100 people do it at the same time on our current infrastructure?
23:59 Yeah.
23:59 You hear about things like the whole healthcare debacle where they spent hundreds of millions
24:05 of dollars on code for these projects and like a few people logged in and it just
24:11 failed.
24:12 And you just wonder, like, could you have just tried it?
24:15 Just maybe just seeing like if we call that API 10 times a second, will it actually take
24:22 it?
24:22 Right.
24:22 And so tools like this are exactly what you want.
24:25 It's really cool for just simulating, accessing a bunch of different sites.
24:29 I was just thinking one good use for this may have been, sorry to interrupt.
24:33 Maybe the schools could have done this before they had everybody log in so that everybody,
24:39 all the kids on their laptops or their tablets wouldn't have said on day one, I don't know
24:44 what's going on.
24:45 It won't let me in.
24:45 Yeah.
24:46 The page won't load.
24:47 It just, it keeps giving me the numbers.
24:49 500.
24:49 Is this a math class?
24:50 Anyway.
24:52 Yeah, exactly.
24:53 So you should test your code.
24:54 And so I've used these before, these types of tools.
24:56 And often it's like, okay, what you're going to do is open a web browser and you're going
25:00 to go to the site and it'll record like the URLs and you can like use some weird like selection
25:06 syntax.
25:07 I guess weird clumsy GUI.
25:08 Maybe it stores it as XML, but you have like a UI on top of it.
25:12 It's all crummy.
25:13 And they probably charge you a ridiculous amount of money for this.
25:16 So here's the thing with Locust.
25:17 It basically looks like you're writing like unit test code.
25:21 So if you look at the, there's an example in the show notes, just check that out.
25:25 So what you do is you define a user and then you give the user some tasks or some behaviors.
25:31 Oh, this is the one that I was thinking.
25:32 I'm sorry.
25:32 I confused this with your Playwright.
25:34 So for example, with the user, like you would say something like self.client.post to log
25:40 in and you just give it a dictionary.
25:41 Username is this.
25:43 Password is that.
25:44 Boom.
25:44 That's it.
25:45 And that will actually go over there and submit the login form with that data, which is pretty
25:51 awesome.
25:52 And then you give it tasks.
25:53 And these are kind of like tests, like go to the index page, do a get on slash and do
25:57 a get on the JavaScript.
25:58 Go to the about page and do a get on slash about or, you know, go click this button or
26:03 go make this thing happen.
26:04 And then once you have this, then you can turn that into like a bunch of distributed parallel
26:10 requests to see if you get any 500 errors, timeout errors, like what the average latency is for
26:16 10 users, 100 users, a thousand users at a time.
26:19 You can run it on distributed machines.
26:22 So you can have it simulate millions of users if you want to run it on like 20 cloud VMs
26:28 or something like that and turn it loose on your website.
26:30 What do you think?
26:31 I think this is cool.
26:32 And you're saying that there's a game website that's using this?
26:36 There is.
26:37 In the notes that they say when they talk about the features, they say, look, you can define
26:40 user behavior in code.
26:41 Just plain Python code, which is neat.
26:44 It's scalable so you can run it, like I said.
26:46 And then it's battle tested.
26:48 Because Locust has been used to simulate millions of simultaneous users on Battlelog, the web app for
26:56 Battlefield games.
26:57 And so they could say, you really could say, Locust is Battlefield battle tested.
27:02 Nice.
27:03 I don't know if anybody's seen the trailer for the Battlefield games.
27:06 I've not been paying attention to it for ever, but for many, many years at least.
27:09 Wow, these games have come a long ways.
27:11 Like if you watch the trailer for the latest one, that's crazy, crazy stuff.
27:15 But it's kind of also beside the point.
27:17 I think this way of saying like, this is what a website user does.
27:20 They log in and then they go to this page and I might also visit this page.
27:23 And you set up things like, not just, I want to have.
27:26 So when you answer questions like, how many users can we support?
27:29 Typical users are not like pathological.
27:32 They don't go to like your account page and hold down Command R or Control R and just refresh
27:36 it as hard as they can, right?
27:38 They'll go there and they'll spend like three or four seconds, five seconds.
27:41 And then they'll go to another thing.
27:42 They'll spend 10 seconds there.
27:43 Then they'll go off and they'll click this button, right?
27:45 They'll have normal human behavior.
27:47 So one of the things you set up in this class you define that represents a user on your
27:51 site is a wait time.
27:52 So say the wait time is between five and 15 seconds.
27:56 And then you ask, can it take a million users?
27:58 It doesn't just do a million concurrent requests.
28:00 It has like a million of these things randomly waiting between five to 15 seconds as they're
28:04 kind of like interacting randomly with your site.
28:07 Oh, cool.
28:08 So you could sort of scale this then.
28:11 You could start with something like some long wait times and then make sure that it can
28:16 handle like a thousand users or something and then gradually make it shorter so that it's
28:22 hitting on your server harder.
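
To put rough numbers on that idea, here is a back-of-the-envelope sketch of how wait time translates into load. It assumes each user repeats "one request, then wait" and that the wait is uniformly random between the two bounds; the function name and the example wait ranges are made up for illustration.

```python
def steady_state_rps(users, min_wait, max_wait, response_time=0.0):
    """Approximate requests per second for users looping: request, then wait.

    Assumes waits are uniform between min_wait and max_wait (seconds), so
    each user issues one request every (mean wait + response time) seconds.
    """
    mean_wait = (min_wait + max_wait) / 2
    return users / (mean_wait + response_time)


# Same 1,000 users, progressively shorter waits: the load climbs.
for low, high in [(5, 15), (2, 6), (0.5, 1.5)]:
    rps = steady_state_rps(1000, low, high)
    print(f"wait {low}-{high}s: about {rps:.0f} requests/second")
```

Under the same assumptions, a million users waiting 5 to 15 seconds works out to roughly 100,000 requests per second, not a million simultaneous requests.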
28:23 Yeah, exactly.
28:24 I think this is really neat.
28:25 So I don't know that I would necessarily be using it right now, but if I create something
28:29 new, especially something I'm sure is going to get a lot of traffic, then I would definitely
28:34 use this.
28:35 It looks really neat.
28:36 It's free and open source.
28:37 Like it's written in Python.
28:38 Like why the heck not?
28:40 The only reason I wouldn't use it now is I've already had like some really big spike
28:44 events.
28:44 I'm like, okay, well, it's, you know, everything's running at like 2%, 5% CPU.
28:48 It's like, it's fine.
28:49 I don't know.
28:50 You can totally see.
28:51 I mean, there's a huge use case for this is that like people that have the, they're rolling
28:55 out a new app or even if they're an existing company rolling out something new and everything
29:00 looks fine on their server, even when they're testing with like two or three consecutive
29:05 tests or something.
29:05 But are we ready to roll it out?
29:08 We don't know how many people are going to hit it.
29:09 So they can sort of gauge that.
29:12 The one that I always have in mind when I think about this is you've got some app that's been
29:16 out there and it's kind of getting some traction and your company's getting some traction in
29:20 it.
29:20 And the company decides we're going to run a Super Bowl ad or we're going to spend, we're
29:25 going to launch some huge marketing campaign on Black Friday.
29:28 That's like way, way out of bounds of what we normally do.
29:33 The last thing, I mean, you only get one shot for your app to work when that Super Bowl ad
29:38 runs or on that Black Friday event.
29:39 If it just goes down for that little bit of time, it's not like, well, we got it up.
29:43 It's fine.
29:44 Now it's, you've lost that moment and that million dollar spend or whatever the heck it turns
29:48 out to be.
29:48 So it's like those moments where the spike is unknown, but also the time which you get
29:54 to deal with it is short.
29:54 Yeah.
29:55 Or things like, yeah, I'm pretty sure that the healthcare marketplace website's ready.
30:01 It's fine.
30:02 Yeah.
30:02 Sure.
30:02 Mr. President, this is going to be fine.
30:04 It won't, like, blemish your record for all of history.
30:07 All right.
30:08 Speaking of things that I'm sure are going to be fine.
30:10 Hacktoberfest was such a, it's a good idea in theory, potentially.
30:14 We're like in, in middle October or deep into October already.
30:18 I don't know how your repos did, but I got a lot of attention.
30:21 Did you?
30:21 Yeah.
30:22 No, mine.
30:23 Yes.
30:23 Mine didn't so much.
30:24 I'll tell you about that, but go ahead and tell, tell people where we're going with this.
30:27 Okay.
30:27 So Hacktoberfest, hopefully you know about it, but if you don't, it's an interesting idea
30:32 sponsored by DigitalOcean and other sponsors.
30:34 Again, DigitalOcean not sponsoring this episode.
30:37 Overall, it's a good idea.
30:38 So the idea is to encourage people to contribute to open source by bribing them with a t-shirt
30:43 and other swag.
30:44 That works for geeks.
30:45 We love our t-shirts.
30:46 Like, how else are you going to be like wearing your clothes?
30:49 What do you put in your closet?
30:50 Yeah.
30:50 Maybe, maybe you can buy a t-shirt with a half an hour of work, but we're going to like have
30:54 you work for like hours and just get one t-shirt.
30:58 Anyway, there's always been some spam with this, people abusing it, but I think it was
31:03 not as prevalent as this year.
31:05 But what happened this year, and I'm going to link to a video by Anthony Sottile titled
31:12 What's Wrong with Hacktoberfest?
31:13 He introduces what Hacktoberfest is, some of the problems, and he recommended some solutions.
31:19 We're not going to cover those today.
31:21 But apparently there was a YouTuber this year.
31:23 I think it was in India that did a video on how to get a free t-shirt by doing like, it's
31:31 basically how to get free swag with not much work.
31:34 And he did this video to show you how to submit a pull request to a project and only do
31:41 something like update the README to say "an awesome project" or change "it's" to "it is" or
31:47 something like that.
31:48 And then do a pull request saying document or improve docs and do that for four different
31:54 repos.
31:55 And there you got a t-shirt.
31:56 Yeah.
31:56 I met many of these people.
31:59 It turned into a big problem.
32:00 So I was actually really thrilled with how fast DigitalOcean and whoever's working on
32:08 Hacktoberfest fixed it, or at least hopefully, I'm sure people are still trying to do this.
32:13 So I'm sure there's a lot of spam going on, but they changed the rules.
32:16 So as of the third, they updated the rules to try to reduce the spam.
32:21 One of the big things is maintainers can opt in by adding a Hacktoberfest topic to their
32:28 repo.
32:28 So a whole bunch of stale old repos won't get hit, hopefully.
32:32 And then also you can mark any PR that's dumb as invalid and it invalidates stuff.
32:38 And actually the full rules are, let's see, we're going to have it in the show notes.
32:43 It's a little pseudo code.
32:45 So if you submit a PR in the month of October and the PR is labeled as hacktoberfest-accepted
32:53 by the maintainer or you submitted it to a repo with the hacktoberfest topic and the pull request
33:01 was merged or it was approved.
33:03 So you can't just submit it and get your t-shirt.
33:06 It has to be like some maintainer has to say, yeah, this is good or I approve it or whatever.
33:11 It's not automatic anymore.
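
The rule as read out above can be written as a small predicate. This is only a sketch of the logic as described in the episode, not the official implementation; the PullRequest class, its field names, and the exact label and topic strings are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PullRequest:
    # Hypothetical record of the facts the rule depends on.
    created: date
    labels: set = field(default_factory=set)
    repo_topics: set = field(default_factory=set)
    merged: bool = False
    approved: bool = False


def counts_for_hacktoberfest(pr, year=2020):
    """Return True if the PR would count under the rule described above."""
    # A maintainer marking the PR as invalid rules it out entirely.
    if "invalid" in pr.labels:
        return False
    in_october = pr.created.year == year and pr.created.month == 10
    # Path 1: a maintainer explicitly labels the PR as accepted.
    accepted_by_label = "hacktoberfest-accepted" in pr.labels
    # Path 2: the repo opted in via its topic AND the PR was merged or approved.
    accepted_by_topic = "hacktoberfest" in pr.repo_topics and (
        pr.merged or pr.approved
    )
    return in_october and (accepted_by_label or accepted_by_topic)


# A typo-fix PR opened against a repo that never opted in does not count.
drive_by = PullRequest(created=date(2020, 10, 5))
print(counts_for_hacktoberfest(drive_by))
```

The key change is that neither path is satisfied by the submitter alone: both require some action from a maintainer.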
33:13 And also if you are a maintainer and you're, and you've dealt with all the spam, sorry about
33:18 that.
33:18 But also I'd like to, I'd like to encourage more people to do Hacktoberfest because it's
33:23 a cool thing.
33:24 I didn't want to bring it up before because I didn't want to encourage spam, but I think
33:28 these changes will help.
33:30 And if you're a maintainer, please be sure to do those notifications by November 1st because
33:35 that's the deadline.
33:36 Yeah.
33:37 Interesting.
33:37 I had no idea what was going on until I saw Anthony Sottile's post or Twitter message.
33:44 You know, somebody came over to some of the, I have 222 repositories, most of which are public
33:50 between the courses and various other things.
33:53 So there's a bunch of opportunity to go in and make changes, right?
33:57 So somebody came along to the beginner, the Python for Absolute Beginners course and said,
34:02 I would like to add a few little tips for some beginners to make this slightly better.
34:06 You know, we can't change anything because it needs to match what's in the video.
34:09 But if you had a little section that had like some tips and they were meaningful, sure, I
34:13 guess that's okay.
34:14 And then the next day I woke up and it was like 10 PRs, not necessarily all from this person,
34:18 but from a bunch of different people with weird things like change the README from this,
34:23 you know, "check out our latest course" to "check out the latest course."
34:27 And just changing like the word "our" to "the," and I'm like, what is going on?
34:31 Then I saw Anthony's thing and I'm like, okay, close, close, close, close, close, close, close.
34:35 Just straight out.
34:36 Like, I don't even want to talk to these people.
34:38 This is super annoying.
34:39 And they weren't just making changes to the README.
34:41 They would go in and they would make changes to like XML configuration documents.
34:45 I'm like, you can't change that.
34:47 That's, that's machine.
34:49 That's read by the machine, right?
34:50 That's going to break something if I accept this.
34:52 Not only is it like annoying that I got to deal with it, but if I were to accept that,
34:55 I'm pretty sure it would break, I think maybe it was like formatting, like putting a node,
34:59 closing node bit, like on, on a line above or like putting a space.
35:03 I mean, I don't think it actually broke it, but it was really weird stuff.
35:06 And I didn't understand it was coming from Hacktoberfest.
35:08 I was being hacked by the Hacktoberfesters.
35:13 Yeah.
35:13 But it has stopped since they made these changes, which is great.
35:16 Oh, it has stopped?
35:16 So most of that stuff was in the first few days.
35:18 Yeah.
35:19 I haven't seen the last couple of days.
35:20 I didn't realize that's probably because the rules changed.
35:22 I just went through and like, just denied everything that I saw coming in.
35:25 Yeah.
35:25 I wonder if they forced the takedown of that video or maybe it's gone.
35:30 Yeah.
35:30 Yeah.
35:30 Who knows?
35:31 Who knows?
35:32 Well, I know that that's it for all of our main topics.
35:34 Got anything else you want to throw out real quick before we wrap it up with a joke?
35:38 I don't.
35:38 I could totally use a joke.
35:40 But do you have any extra things?
35:41 I do.
35:42 There's a really cool conference.
35:43 It's, I believe, theoretically was supposed to be this year in Vancouver, BC, which is an
35:50 absolutely wonderful town to visit, called PyCascades.
35:53 Cycles between Vancouver, Seattle, and Portland.
35:56 Well, this year it's taken a diversion to cycle to the internet because 2020, although it's
36:02 in 2021, like still planning now.
36:04 So PyCascades 2021 will take place Saturday, February 20th from the world.
36:10 I don't know if they're having any local stuff going on, but anyway, it's basically a virtual
36:16 conference and the call for proposals is open.
36:18 So if you'd like to give a presentation there, you can do that by November 10th.
36:23 Submit proposals.
36:25 So that would be cool.
36:26 You know, I think talking at get-togethers like this, meetups, the smaller, not full-blown
36:33 PyCon, but PyCascades and other types of events are a really good way to sort of raise
36:37 your profile and stretch your comfort zone as a developer.
36:40 So I encourage people to do it.
36:41 Also, Patricia.
36:42 I spoke at the 2020 version.
36:45 That was just before the world fell apart.
36:48 That's right.
36:48 I was there.
36:49 My daughter and I watched from the back.
36:51 It was great.
36:52 Next thing, other thing, Patricio Reins, who is a researcher at the Barcelona Supercomputing
36:58 Center, which by the way, they have this virtual tour he sent me.
37:01 Oh my God, it is so awesome.
37:02 They have like a pop song for it.
37:04 It is held inside, literally the supercomputer is inside an old cathedral.
37:11 So like where all the arches are and where the sermons would have been given, that's where
37:17 the supercomputer is.
37:18 That's pretty awesome.
37:18 Can we put that link in the show notes too?
37:20 Yeah.
37:21 Yeah.
37:21 Yeah.
37:21 I'll put it in there.
37:21 But that's not why he sent it to me.
37:23 He just said, hey, I happen to work here and I use Jupyter a lot.
37:26 You spoke about blackcellmagic and then another Black formatter plugin for Jupyter
37:34 Notebooks.
37:34 So he said, you should also check out nb_black, nb underscore black, which works in Jupyter
37:40 and JupyterLab.
37:41 And there's another one that only works in JupyterLab called the JupyterLab code formatter.
37:45 So just like always, we mentioned one thing that we kind of discover and then listeners are
37:51 like, that's great.
37:52 And, and, and, and here's a bunch of other stuff.
37:54 So thank you for that, Patricio.
37:55 Yeah.
37:55 Nice.
37:56 But I love that.
37:57 I like the multiple tool thing.
37:58 That's fine.
37:59 Yeah, indeed.
37:59 All right.
38:00 Let's do a joke.
38:01 I've chosen some very clear ones that actually have a visual component.
38:04 As you know, I don't know why I do that, but that's what I've done.
38:07 So why don't you, I'll let you do the first one.
38:10 I'll do the second one.
38:11 So for people who don't know, this is a classical programmer painting.
38:16 And the idea is this is a legitimate real painting from some museum.
38:21 Typically they're hundreds of years old, but there's, instead of having, you know, like flowers
38:29 in the, the tide pools or whatever, some random thing that the artist named it, it's renamed
38:36 with a programming title.
38:38 Okay.
38:39 Yeah.
38:40 So why don't you quickly describe your picture and then tell us the title.
38:44 Okay.
38:45 So, the picture is, it's a white, kind of a white gray background.
38:50 I think it's snow or something.
38:52 There's some horses running.
38:54 There's a white out blizzard almost.
38:54 Yeah.
38:55 It's horrible.
38:55 Yeah.
38:55 And there's some horses running, two horses running, pulling a, what, like a sled or something?
39:01 I don't know.
39:02 And there's somebody laying on the sled.
39:03 All right.
39:04 What's the title?
39:04 Delivering a feature in the time of a code freeze.
39:07 This is by Anthony Petrowski, oil on wood, 1883.
39:13 Yeah.
39:13 It's beautiful.
39:14 All right.
39:15 So the one that I got here, it's these three guys, they look highly skeptical, almost like
39:23 they're on some kind of mission, sneaking out of like really tall grass on a boat in some
39:28 kind of swamp.
39:29 You can see them like really slowly sort of approaching.
39:32 And the title is Red Hat Enterprise Linux, sys admins entering the Docker convention floor.
39:38 Oil on canvas, 1882.
39:40 Isn't that a great one?
39:42 Like, look at their face.
39:43 Yeah.
39:44 People got to check this out.
39:45 Click on the link in your podcast player and see it.
39:47 They're like angry pirates in a canoe.
39:50 Yeah.
39:50 It's sort of a piratey feel to it.
39:52 Like they're like, oh, what are we doing here?
39:53 We're breaking in.
39:54 It's such a weird world.
39:55 This Docker Kubernetes.
39:56 I love this thing of like programmer quotes on old paintings.
40:02 It's a, it's funny.
40:03 Yeah.
40:03 If there's ever some sort of like artwork exhibition at a PyCon, this is happening.
40:09 Oh, we could probably do it virtually somehow.
40:12 Try to do it at a virtual conference.
40:14 Yes.
40:14 I think we could.
40:15 Yeah.
40:16 Yep.
40:16 All right.
40:16 Well, thanks for being here as always.
40:18 And thank you everyone out there.
40:19 Thank you.
40:19 Thanks for listening.
40:19 Yep.
40:20 Bye-bye.
40:20 Bye.
40:21 Thank you for listening to Python Bytes.
40:22 Follow the show on Twitter via @pythonbytes.
40:25 That's Python Bytes as in B-Y-T-E-S.
40:28 And get the full show notes at pythonbytes.fm.
40:31 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
40:35 We're always on the lookout for sharing something cool.
40:38 On behalf of myself and Brian Okken, this is Michael Kennedy.
40:41 Thank you for listening and sharing this podcast with your friends and colleagues.