Transcript #262: So many bots up in your documentation
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:04 This is episode 262, recorded December 8th, 2021.
00:09 Oh my gosh, it's almost winter.
00:11 I'm Michael Kennedy.
00:12 And I'm Brian Okken.
00:13 And I'm Leah Cole.
00:14 Yay.
00:15 Yay. So great to have you here. Thanks for being here on the show.
00:19 Yeah, happy to be here.
00:20 You and I got a chance to discuss Airflow over on Talk Python a couple months ago, something like that.
00:26 Yeah.
00:27 Yeah, but now we'll probably do a little more Airflow over here for people who aren't familiar with that, but also just whatever you're interested in.
00:34 So great to have you here. Why don't you tell people a quick bit about yourself before we jump into the topics?
00:39 Sure. So I'm Leah and I am a developer relations engineer in Google Cloud.
00:45 And specifically, I work on Cloud Composer, which is our hosted managed product of the popular Apache Airflow project, which we'll talk about a little bit later.
00:56 And in addition to writing samples and content for that, I also work with a group of fellow engineers and we maintain all Python samples for Google Cloud.
01:07 And make sure that they stay tested up to date and are healthy and are getting reviewed for new samples.
01:13 And that's a lot of fun. That kind of fell into my lap and has been a good time.
01:17 Yeah, that's fantastic.
01:18 Yeah.
01:18 I remember Python being one of the original two supported languages on Google Cloud, right? It had sort of a special place.
01:25 Yeah. Now it's one of seven, I think.
01:28 Yeah. Cool. Well, that sounds like such a fun job. I've always imagined dev relations type of jobs to be super fun. Maybe slightly less fun in COVID because the travel and the conferences and, you know, all those kinds of things are part of it.
01:40 But still a fun job, right?
01:41 Still a good time. Every day is a little bit different. You kind of never know what's going to happen. And that's part of what I like about it.
01:47 Yeah. Awesome. Cool. Brian, I don't even know what you're going to cover. So I don't know what's going to happen. Why don't you let us know?
01:53 You don't know what I'm going to cover?
01:54 Well, I'm not looking at my docs yet.
01:56 Oh, okay. Sorry. Fighting a cold. I am super excited. pytest 7, release candidate one, is out.
02:05 Oh, that's excellent. That's big news.
02:07 It is. The last release for pytest was, or six, they've done other dot releases, but the 6.2 or 6.0 came out. 6.2, I don't know. I lost track.
02:18 We use 6.2.4 for our GCP samples.
02:21 Oh, you do?
02:23 We do.
02:23 Well, I think it was, I wrote this down. The 6.2.0 was released in December 2020. So it's been, we're ready for a new one. So 7.0 is out, the release candidate at least. And so because it's a release candidate, to install it, you have to do pip install pytest==7.0.0rc1. We've got that in the show notes.
02:47 It's also on the release announcement page for pytest. But I wanted to go through some of the cool features that I'm really excited about. There's a lot of great things in there.
03:01 But there's some little improvements with the approx thing. So one of the things that pytest has is approx. So with floating point numbers, if you're comparing them, you should never do equal. But you can do equal approx with pytest. And it's really pretty cool.
03:16 I didn't know that because any science you're doing is so, like double equals is the kiss of death for floating point math comparison.
03:24 Yeah. Well, for pytest.approx, the docs now reference the NumPy comparisons, which is nice because NumPy has some really cool features around that.
03:35 But pytest handles it out of the box. And now also within mappings, dicts, and sequences, it handles decimal types, which is nice.
03:47 Decimal types, of course, are very useful when working with money and other things that need to be exact decimals.
03:53 One of the things that's really cool is the sequences are compared better. So if you have like a list of numbers and you compare against an approximate list of other numbers, I didn't know you could do this.
04:06 It will tell you which index was wrong and by how much.
04:12 And actually not by how much, but what the expected was. And that's pretty neat.
04:17 So those are the little minor features. Most of these are kind of minor, but major for somebody, right?
04:23 So one of the things I like is that some people have mentioned fixtures or sometimes when people use a lot of fixtures, they don't know where the fixtures are.
04:31 Well, there's a couple of flags, --fixtures-per-test and --fixtures. Both those flags are helpful to find out what fixtures you have available.
04:40 And now by default, they print the location of the path with the fixture name.
04:47 And you can also do a verbose option that prints out the full doc string, which is pretty handy.
04:52 One of the things that I'm really excited about is a pythonpath setting that's been added.
04:57 And that was a feature I added to the project, which is fun.
05:01 Nice. It's cool to see the contributions you're making coming back out.
05:04 Yeah, it's cool. And then there's a bunch of other features that I contributed to by just saying,
05:10 this is a little weird. Can we fix this? And somebody else volunteered to fix it.
05:14 So it's nice.
05:15 That's the best kind of contribution.
05:16 Yeah.
05:19 One of the improvements in the docs, which is kind of fun, is there's an auto-generated list of...
05:26 So I've got the changelog going on here.
05:27 And I got to come back to this.
05:30 There's an auto-generated list of plugins.
05:32 And there's 963 right now.
05:35 We'll refresh it.
05:35 No, still 963.
05:37 But that's a lot.
05:38 When I first started writing the beta or the second edition of the pytest book,
05:45 I noticed this and I wrote it down, but the number keeps changing.
05:48 So I took out the number.
05:49 I'm like, it's a lot.
05:51 There's a lot of cool plugins.
05:53 One of the things that if you'll notice when you go to the changelog, it starts with breaking changes and then deprecations.
06:00 And I think this is around because people, when they upgrade, they want to know if it's going to break their code or not.
06:06 I have tested a bunch of stuff and upgraded from 6 to 7.
06:09 And I haven't noticed a lot.
06:11 There was like a 6.1 to 6.2.
06:15 I can't remember what the...
06:17 There was one break a while ago in the 6.x series that messed up some plugin authors.
06:22 But I haven't noticed any problems.
06:23 So please try these out.
06:25 I wish they would do the features first and then not the breaking changes.
06:29 Because that's...
06:29 I suspect it's the people working deep in the guts, like the plugin authors that hit these deprecations
06:36 and not just people doing assert this equals that type of work.
06:39 Yep.
06:39 Right.
06:40 One of the things that I didn't list, but I think a lot of people are excited about,
06:45 it's the objects within pytest that people are using.
06:48 More of them are type hinted now so that you can do type hints with objects.
06:53 Oh, that's nice.
06:54 Yeah.
06:55 That's really nice.
06:55 So fun.
06:57 Leah, do you use pytest?
06:59 Some of these changes exciting?
07:00 We do.
07:01 We use pytest on our Python samples.
07:03 And so I actually, the one that was most exciting to me was the fixtures.
07:07 Figuring out where fixtures are is definitely something that comes into play for me,
07:12 especially when we're maintaining something that was written a while ago
07:16 by someone who might not be working on that code anymore.
07:19 Yeah.
07:20 Yeah.
07:20 Nice.
07:21 Yeah.
07:21 This is great.
07:22 I love the pip installable RC1.
07:24 That's great.
07:25 And before we move on, let's take a step back.
07:28 Roman Wright, author of Beanie.
07:29 Hey, Roman.
07:29 Out there in the audience says, Hey, I'm a big fan of Google Cloud.
07:32 Oh, thank you.
07:33 For sure.
07:34 Well, I've got some fun stuff to talk about next year.
07:37 I want to talk about this thing that David Smith, former guest co-host here on Python Bytes, sent over and said,
07:45 this looks cool.
07:46 Sam Lau and Philip Guo released this thing called Pandas Tutor.
07:51 This is cool.
07:52 Yeah.
07:53 Previously, Philip had built Python Tutor at pythontutor.com.
07:57 Now there's pandastutor.com.
08:00 And it's all about just helping you understand what the code does.
08:05 So it basically says, look, there's this code here.
08:08 Like imagine you've got a list of dogs that have a breed, a type, a longevity,
08:13 a type is like a herding dog or a toy dog.
08:15 It goes in a purse.
08:16 Longevity, size, weight, and so on.
08:19 And you've got that as a data frame.
08:20 If you wrote dogs where size == "medium", then sort_values on type, and then groupby
08:27 type, and then show the median.
08:29 Well, what is that actually doing?
08:30 Like, how do I understand that?
08:32 Right.
08:32 As somebody learning Pandas, imagine I don't really have a database background.
08:36 And so I'm not sort of trying to map that over to like, okay, there's the where clause, there's
08:40 the order by clause.
08:41 And, you know, like that kind of business.
08:42 Right.
08:43 So what is happening when I write that code, either because I'm coming across it for the
08:48 first time or, which happens to me a lot, I wrote it two years ago and understood it perfectly
08:53 then.
08:54 I have no idea what it does now.
08:55 You want to know what it does?
08:56 Yeah.
08:57 That happens way too often.
08:58 Right.
08:58 So what you do is you can go and run this code over in Pandas Tutor and you say visualize
09:05 and it says running code, please wait.
09:08 And so what they do is they put a CSV bit of text in here.
09:10 Like a triple-quoted string and then use pandas read_csv and then just do that one line.
09:14 So that's a nice way to kind of get data in there.
09:16 And the way to think about this is steps.
09:18 It shows you what is the first step and what is the second step and so on.
09:21 So when you go there, you'll see that it has the code that we were talking about.
09:26 But then right now the effective where clause, the filter is regular font and the rest is gray.
09:32 It's like fade into the background.
09:33 And so you can actually see what the starting data frame was and the ending data frame and
09:39 then how it got in there.
09:40 And you can use the mouse over like, so what they're saying is the size is medium.
09:44 So if you hover over like a large or a small dog, there's just no arrow.
09:47 But if you hover over medium, it shows you where in the result that that thing landed.
09:52 Isn't that cool?
09:53 That's wild.
09:54 Isn't that wild?
09:55 And so then you can see size has all the values on the left and then the size is grouped on
10:00 the right and it shows medium, medium, medium, medium, because that's all that's in there.
10:03 Now, when I first looked at this, I'm like, there's a bunch of stuff on the screen.
10:06 What's going on?
10:07 I noticed the arrows, but then what it took me a minute to realize is there's multiple
10:11 steps.
10:11 So the next thing, if you scroll down, shows the same code at the top, but now the sort
10:16 values type is highlighted, right?
10:18 That's the next part of what looks like one expression in pandas.
10:21 And so now it highlights the column that it's sorting on.
10:24 And you can actually see the arrows pointing to how they were reordered in the result because
10:29 you're sorting by type.
10:30 So it's non-sporting, non-sporting, non-sporting, non-sporting, and then sporting, sporting, and
10:34 working, working, and so on.
10:35 So that was step two.
10:36 And we have a group by, this one's interesting.
10:39 It doesn't have arrows, it has colors.
10:41 So the group by type, again, non-sporting, sporting, and so on, you end up with these
10:46 groups.
10:46 Like here's a blue, a blue box of all the non-sporting dogs, the bulldog, the poodle, the French bulldog
10:52 is so cute.
10:52 Then you've got the golden retriever and the Labrador and the boxer, right?
10:56 So these are grouped into the colors.
10:58 And then finally you do the median and it shows how those groups reduce down to statistics.
11:03 The longevity of a non-sporting dog is less than a sporting dog, apparently, but they're
11:08 also lighter.
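As a rough sketch, the four steps walked through above (filter, sort, group, median) look like this as one chained pandas expression. The dog data is made up for illustration and is not the dataset shown on pandastutor.com; the column names follow the ones mentioned in the episode.

```python
import io

import pandas as pd

# Made-up sample data, loaded from a triple-quoted CSV string.
CSV_TEXT = """breed,type,longevity,size,weight
Bulldog,non-sporting,6.3,medium,48.0
Poodle,non-sporting,12.0,medium,55.0
Boxer,working,8.8,medium,65.0
Labrador Retriever,sporting,12.0,medium,67.5
Golden Retriever,sporting,12.0,medium,60.0
Chihuahua,toy,16.5,small,5.5
Great Dane,working,6.9,large,140.0
"""

dogs = pd.read_csv(io.StringIO(CSV_TEXT))

result = (
    dogs[dogs["size"] == "medium"]             # step 1: keep medium dogs (the "where")
    .sort_values("type")                       # step 2: sort rows (the "order by")
    .groupby("type")[["longevity", "weight"]]  # step 3: group rows by type
    .median()                                  # step 4: reduce each group to its median
)
print(result)
```

Selecting the numeric columns before calling median keeps the text columns (breed, size) out of the aggregation.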
11:08 So anyway, what do you all think?
11:10 Oh my gosh.
11:11 I love this.
11:12 This is nice, right?
11:13 I'm a very visual learner, so I really appreciate this.
11:17 And especially if you're working with data that you kind of aren't sure what it does and
11:22 or the code, like that's pretty incredible.
11:24 I'm filing this away.
11:25 It's going to go in my team's group chat pretty much as soon as we're done recording.
11:28 In fact, yeah, it's awesome.
11:30 I think it's really good.
11:32 You know, there's so many people who are presented a notebook or presented some kind
11:35 of result and they're like, I need to understand what that means so I can keep following.
11:39 And I think, you know, throw it into here or something like this would be really helpful.
11:42 Well, and a lot of people that have spent a lot of time with databases might, it might
11:47 be obvious what these things do.
11:48 But for people that don't spend a lot of time with SQL, it's not obvious.
11:54 And so this is really nice.
11:56 Yeah, definitely.
11:57 Or if you're like trying to take some example that you have with their example data
12:02 and trying to translate it to your own data, that's something that customers do all the
12:07 time for us.
12:07 It's something I do a lot too.
12:09 Just seeing how it behaves with your stuff.
12:12 Oh, man.
12:13 You didn't write it, but you want to use it.
12:15 So how much applies.
12:16 Yeah.
12:16 Exactly.
12:16 Yeah.
12:17 Yeah.
12:17 So this is quite cool.
12:19 Dean out in the live stream.
12:21 Hey, Dean.
12:21 Says Pandas Tutor looks awesome.
12:25 And Robert Robertson also loving it.
12:27 It's nice.
12:28 So very cool.
12:29 Indeed.
12:29 All right.
12:30 Over to you, Leah.
12:31 All right.
12:32 So yeah, my first thing today is Apache Airflow.
12:34 So Airflow is a project that is part of the Apache Software Foundation.
12:39 It's a workflow orchestration tool that originated at Airbnb, I want to say in like 2014.
12:45 And then pretty shortly after became part of the ASF.
12:50 And it became a top level Apache project in, I want to say early 2019.
12:55 It's been a little while now, which is very exciting.
12:58 So you can use it to author these workflows as directed acyclic graphs or DAGs of tasks,
13:05 which is pretty cool.
13:06 And it's most commonly used with workflows that are like pretty static, not super frequently
13:12 changing or slowly changing just so that you can see how the workflow goes over time and
13:18 that allows you some clarity and continuity in your workflows.
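To make the "directed acyclic graph of tasks" idea concrete, here is a minimal sketch using only Python's standard library. It is not Airflow's API; it only shows how declaring which task depends on which yields a valid run order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# A topological sort gives an order in which every task runs
# only after everything it depends on has finished.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

An orchestrator adds scheduling, retries, history, and a UI on top of this basic ordering idea.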
13:22 I've always sort of wondered what the role of these workflow type systems were until I realized,
13:28 you know, if you're going to build a full end to end type of workflow without a framework,
13:32 there's a lot of coordination.
13:34 And what if this fails?
13:35 Where do you restart?
13:36 What do you do?
13:36 And then the analogy for me is kind of like Flask or some web framework, like all I got to do is
13:41 write this little thing and everything else will come together to make sure these four
13:45 lines of my Python code run.
13:47 They run reliably.
13:48 If they fail, it gets dealt with.
13:49 Right.
13:50 It allows people to not have to understand the whole system and just go, I need you to load
13:54 up this file and put it into that database.
13:56 Can you write that code?
13:57 And that's all you got to know to be part of some complex thing.
14:00 Right.
14:00 Yeah.
14:00 Yeah.
14:00 It's I mean, it's not the most glamorous thing, but it is extremely useful.
14:06 I mean, I did a summer internship when I was doing my bachelor's where I wrote a cron job
14:13 that ingested some data every night.
14:15 And the only way I knew if it failed was if I looked in the target folder where it was
14:19 supposed to end up.
14:20 And if the data wasn't there.
14:22 No files.
14:23 Whoops.
14:24 That sucked.
14:25 I'm sure a lot of people have dealt with that.
14:27 And this is actually like a really common Airflow workflow, which is the extract, transform,
14:33 and load, the ETL workflow, which is where you have data somewhere that you want to get.
14:38 You want to do something to it or maybe not.
14:40 Maybe you just want to extract and load it.
14:42 And you want to put that result somewhere else, either locally or in the cloud for all of that.
14:48 And Airflow lets you do all of that.
14:50 And you can see the history of these jobs.
14:53 There's a UI where you can see, did it fail?
14:55 It has a helpful error message.
14:57 If it failed, it's not just, oh, gosh, the data is not there.
15:00 What do I do?
15:01 Yeah.
15:01 You've got a really cool UI where it shows all the parts of the workflow running and whether
15:06 or not they finished successfully and stuff, right?
15:07 Yeah.
15:08 And it got a makeover fairly recently.
15:10 So it's had a lot of improvements.
15:12 Yeah, that's super cool.
15:14 Another thing maybe you could talk about really quick is the connectors.
15:17 I don't remember exactly the right terminology.
15:20 There's a name for them.
15:22 Tell us, tell people about that.
15:23 That's also good to know.
15:24 So these connectors that you're thinking of, I mean, we can use the word connector to describe
15:28 what it does.
15:28 There are these things called operators in Airflow.
15:31 And an operator executes a single task.
15:34 And so that might be executing a bash script or executing a Python script.
15:39 But we also have these connectors that are grouped by providers, which might be your cloud provider
15:45 or other software providers that allow you to execute code there.
15:49 So for example, we have a ton of GCP operators.
15:53 One example might allow you to create a Dataproc cluster, then run a job on that Dataproc
16:00 cluster and maybe tear it down when you're done.
16:02 And there are providers that have operators for all the major clouds and more.
16:10 You can do it.
16:10 There's one that sends a Slack message when it's done.
16:13 So if you can dream it, it might be there.
16:16 And if not, you can make it there.
16:18 That's awesome.
16:19 What's GCP?
16:21 GCP is Google Cloud Platform or Google Cloud.
16:24 GCP might be a dated acronym.
16:26 Sorry.
16:27 Don't know.
16:28 Yeah.
16:29 Yeah.
16:30 So one of the advantages, I think, of that that's really cool is you don't necessarily
16:33 have to know all those APIs.
16:35 Like if I was going to connect Slack to GCP to like Azure Blob Storage to like some hosted
16:41 database, I don't have to learn all those things.
16:43 I can just sort of click it together.
16:44 Yeah.
16:45 You just have to.
16:46 There's a small amount of setup you have to do for auth, which is understandable.
16:50 You can't just like publicly go to your Azure Blob thing to grab your data.
16:54 But once you set up that connection, then your operators can talk to those things.
16:59 And so you can run or host Airflow yourself.
17:02 And there are a few different ways to do that.
17:05 And then Amazon and Google both have managed hosted providers.
17:10 And there's a company, Astronomer, that also does managed hosted ones.
17:14 And so if you're in an Amazon or a Google, the advantage there is that the connections
17:19 with those operators might be a little bit simpler from the auth and networking perspective.
17:24 But other than that, you can still, like if you're running in Cloud Composer, which
17:29 is Google's Airflow, you can still be using the Amazon or the Microsoft operators to pull
17:34 data from over there.
17:35 That's really common.
17:37 And you see it all the time, and bring it over, do some stuff in Google Cloud, and either put
17:42 it back in the other cloud or leave it in Google Cloud.
17:44 That's totally normal.
17:46 And people are doing that all the time.
17:47 Right on.
17:48 Yeah.
17:48 Cool.
17:49 Cool.
17:49 I think this is neat.
17:50 And people for whom that would make sense.
17:53 You're like trying to do these sort of running in the background schedule jobs.
17:57 Or there's triggers as well.
17:58 Like a file has been uploaded or landed here.
18:00 Yeah.
18:01 Let's talk about that.
18:02 So that's actually I had written down this one example, but I'll adapt it slightly since
18:05 you mentioned triggers.
18:07 So that's another common type of operator.
18:09 These sensors where you wait for a certain condition to be true.
18:12 And they're used in data analytics workflows all the time.
18:16 So like one example workflow might be waiting for a particular file to appear in a cloud
18:22 storage or an S3 bucket.
18:23 So you'd use one of those sensors to wait for that to happen.
18:27 And then you want to do something to that data.
18:30 So let's say you then create a Dataproc cluster that is going to run a PySpark job on that cluster.
18:38 And then you can store the results in BigQuery at the end and then delete the cluster and like
18:43 send a Slack message when the job is done.
18:46 That's a very common ETL thing, including that sensor.
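The "wait for a certain condition to be true" idea behind a sensor can be sketched as a simple polling loop. This is a standard-library illustration, not Airflow's sensor API; wait_for and its parameters are hypothetical names, and the iterator stands in for "has the file landed in the bucket yet?".

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             poke_interval: float = 1.0,
             timeout: float = 10.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Stand-in condition: the "file" shows up on the third check.
arrived = iter([False, False, True])
file_is_there = wait_for(lambda: next(arrived), poke_interval=0.01)
print(file_is_there)
```

Once the wait succeeds, the downstream steps (create the cluster, run the job, store results, send the message) can run in order.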
18:50 Yeah, that sounds pretty nice.
18:51 Definitely seems interesting and quite useful.
18:54 Yeah.
18:54 Brian, thoughts before we move on?
18:57 I have a question.
18:58 If you wanted to get started with something like this, I was trying to look for tutorials
19:03 and getting started and stuff like that.
19:05 Does it make sense or is it too confusing if somebody, you said you could run it on your
19:10 own machine.
19:10 Does that make sense to try it that way or should you try it with a?
19:15 Okay.
19:16 You totally can do it on your own machine.
19:19 And there's this really wonderful environment that can be found in the Airflow repository that's
19:25 called Breeze.
19:26 And it's a Dockerized version of it.
19:29 It shouldn't be run in production.
19:30 But if you're looking to try it out or if you're looking to contribute to Airflow, we highly
19:36 recommend that everyone check out the Breeze environment.
19:38 Right now, I have the community page pulled up where you can join the dev list and the Slack
19:44 if you have questions.
19:45 But if you were to go to the GitHub repo, you would see Breeze right on that first page.
19:50 Okay, cool.
19:50 Thanks.
19:51 Yeah.
19:51 Great question.
19:52 Thank you.
19:53 Yeah.
19:53 Very good one.
19:54 All right, Brian.
19:54 Are you going to give us a tutorial on Airflow or what we got going next?
19:58 Yeah.
19:58 So I was looking through the tutorials on Airflow and I noticed that right away one of the examples
20:03 used dedent.
20:05 How about that for a connection?
20:08 Nice connection.
20:09 Totally well planned.
20:10 Very cool.
20:11 dedent was suggested.
20:14 It's a textwrap tool.
20:15 It's suggested by Michel Rogers-Vallée.
20:18 It's a small utility, but it's super useful.
20:21 And I kind of forget that it's, I mean, I use it all the time, but I forget to mention it
20:26 to people.
20:26 But it comes up a lot.
20:29 And the idea around dedent is you've got something.
20:33 Oh, I think I lost my dedent thing.
20:36 See if I can find it.
20:39 There it is.
20:39 The idea is you've got a multi-line string.
20:42 Like here we've got Hello World and some multiple lines and there's different spacing.
20:46 But I'm, as you notice, I want to define it within a test, within a test function or within
20:52 some other function.
20:53 And that's, so there's this extra like space at the beginning.
20:58 That's, that's in the string.
20:59 It's in the multi-line string.
21:01 And we don't want that.
21:02 We don't, we want it to be just, just no, like nothing at the beginning or the same amount
21:09 chopped off.
21:09 So one of the options that people have used before is to just define the multi-line string
21:14 out of the function.
21:15 You just do it out of the function.
21:17 Then it's against, then it's just against the left side of your editor or whatever on column
21:22 zero.
21:23 And you don't have to worry about it, but it does bother some people that you've got this,
21:27 this variable defined outside of your function when you're just using it within one function.
21:31 So dedent is the answer.
21:33 So what dedent does is it just takes a multi-line string and strips off all the common white space
21:38 at the beginning.
21:39 That's it.
21:39 But it's a, it's super useful.
21:42 They've got a little example that we're showing here, but I, I think this is a, not a great example.
21:48 So I wrote a new example.
21:50 Oops, fell asleep.
21:52 And so the idea really is I've got a function that either, you know, prints stuff
21:57 or has some output and I want to be able to compare that string and I want my comparison
22:02 to be in the function.
22:04 So I use dedent to just write it right in my function and then I don't have
22:09 the spaces.
22:10 And then, yeah, anyway, so this is a pytest example of how you could test an output
22:15 string.
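A minimal sketch of that pattern with textwrap.dedent from the standard library. This is not the example shown on screen; greeting is a hypothetical stand-in for the function under test.

```python
import textwrap


def greeting() -> str:
    # Stand-in for a function whose output we want to check.
    return "Hello\n  World\nGoodbye\n"


def test_greeting():
    # The expected text is indented to match the function body;
    # dedent strips the common leading whitespace from every line.
    expected = textwrap.dedent("""\
        Hello
          World
        Goodbye
        """)
    assert greeting() == expected
```

The backslash after the opening quotes avoids a blank first line, and the relative indentation of "World" survives because only the whitespace common to all lines is removed.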
22:16 So anyway, this really sounds like a classic example of there's a problem, like the open
22:21 source, this really bothered me.
22:23 And so I wrote something to fix it and it's, it's wonderful.
22:26 Like the time honored open source reason to make something.
22:30 But I also want to remind people that dedent is not the only thing in textwrap, and
22:35 textwrap has a whole bunch of other cool tools.
22:36 So it's, it's not huge.
22:38 It's just about a five-minute read to peruse what's in textwrap so that next time you need
22:43 to manipulate some text, it's useful.
22:45 Nice.
22:46 Yeah.
22:46 Maybe wrapping.
22:47 Yeah.
22:48 Like wrapping.
22:49 Well, it does things like, like if you've got a huge string and you want to be able to
22:53 like, one of the things is to, shorten it.
22:55 So if you, if you've got a huge string, but you really only have like eight characters to
22:59 show something like ellipsize it.
23:01 Yeah.
23:01 It does that for you.
23:03 So that's nice.
23:04 That's there too.
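The shortening described here is textwrap.shorten in the standard library. A quick sketch, with an arbitrary width and the episode title as sample text:

```python
import textwrap

title = "So many bots up in your documentation"

# Collapse whitespace, then cut at a word boundary so that the
# result, including the placeholder, fits within `width` characters.
short = textwrap.shorten(title, width=20, placeholder="...")
print(short)
assert len(short) <= 20
```

The default placeholder is " [...]", so passing placeholder="..." gives the more familiar ellipsis.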
23:05 That's good.
23:05 Cause I've written that code.
23:06 It wasn't fun.
23:07 It didn't feel useful either.
23:09 I'm like, okay, great.
23:09 It works.
23:10 But here we go.
23:11 Some audience feedback, Anthony out there.
23:14 Hey, Anthony says it's really useful.
23:15 I've still used it many times.
23:17 Nice.
23:18 Cool.
23:18 Mm hmm.
23:19 All right.
23:20 This next one comes to us from Dan Bader.
23:23 You might know him from Real Python and other things.
23:25 He and I were chatting and he said, Hey, have you heard about pip-audit from Trail of Bits?
23:30 And I was sure that I had, and I thought we had talked about it, but then I realized, no,
23:35 I don't believe we have.
23:36 So I must've just heard about it somewhere else.
23:38 And then we haven't covered it before.
23:39 So the idea is we've heard about a lot of issues with supply chain vulnerabilities, things
23:46 getting into pip, but also Ruby gems and NPM and so on.
23:50 Sometimes that's somebody trying to be evil and putting in some typo squatting thing, or,
23:55 you know, worse than that would be if the GitHub account of a maintainer got hacked and somebody
24:01 published a package with like to the real package.
24:05 Right.
24:05 So however things might get into your dependencies, if something is going on bad there, it's better
24:10 to know than to not know.
24:12 So this pip-audit is all about that.
24:14 It audits Python environments as in virtual environments and dependency trees for known
24:19 vulnerabilities.
24:19 So that's one of the things that's interesting is when you pip install things, you might be
24:25 very good about saying, Oh, I pip installed flask and I pip installed pandas.
24:30 So those are going into my requirements file or my pyproject.toml.
24:33 But did you remember to pin their versions so that things like GitHub will say your version
24:39 is wrong?
24:39 Because if it just sees flask and the recent version doesn't have a problem, it's not going
24:43 to tell you.
24:43 But the one you have installed may. Also, the transitive closure of the dependencies.
24:48 So Flask depends on itsdangerous, which depends on, I don't know.
24:52 But if there's something down that chain that has a problem, you may have not put that in
24:57 your requirements file and you may not be tracking it.
24:59 Like I might be paying careful attention to flask.
25:02 I might not care anything about itsdangerous, but that's where the problem is, right?
25:06 Yeah.
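As a small illustration of the pinning point, here is a sketch that flags requirement lines without an exact == pin. It is a hypothetical helper, not part of pip-audit or pip, and the parsing is deliberately simplistic (for example, it ignores URLs and environment markers). The sample requirements are made up.

```python
def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with ==."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            flagged.append(line)
    return flagged


sample = """\
flask==2.0.1
pandas
itsdangerous>=2.0  # a transitive dependency of Flask
"""
print(unpinned(sample))
```

A check like this only covers what is written in the file; packages pulled in transitively still need a tool that inspects the resolved environment.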
25:06 So this tool from Trail of Bits, which is a security company, basically solves that problem.
25:11 And it lets you just type pip-audit.
25:14 And for me, it's -r requirements.txt or whatever.
25:19 And from what I can tell, what it does is it will go create its own virtual environment
25:24 where it one by one installs each package, looks at the things that come out of that process
25:31 and then scans those.
25:32 So it's not just looking at, oh, you say you have flask and that's 2.0.1.
25:35 Great.
25:36 You're good to go.
25:38 It actually installs it because who knows what the setup.py process is doing and all
25:43 those kinds of things.
25:43 And then it scans that and it gives you a report.
25:46 So for like Talk Python training the site, we have, I don't know, 30 dependencies or something.
25:51 And it sat there and it took, I don't know, it probably took two minutes to go through and
25:56 it said, everything's good to go.
25:57 So that was good to hear, but it's pretty neat, really easy to use.
26:01 It's like an external tool, like black or something.
26:04 So it's a very good candidate for pipx.
26:06 And then it's just globally available to point at any environment.
26:09 What do you all think?
26:10 Oh, this is so cool.
26:11 I heard about it because one of my colleagues, Dustin Ingram, I think has been involved with
26:16 it or either it's his Twitter that I found out about it from, but he also has a really
26:19 good talk from PyCon this past year about the supply chain vulnerabilities.
26:25 That's worth checking out if you're wanting to get an idea of why this is important.
26:30 Yeah.
26:31 Yeah.
26:31 We've highlighted a few examples over the years, but it's definitely something you want to pay
26:37 attention to.
26:37 And that's cool that Dustin was talking about it.
26:39 He works, I think he's still working with the PyPA and works on, you know, the PyPI.org
26:46 and all those kinds of things.
26:47 So very cool warehouse.
26:48 Brian, what do you think?
26:50 I think this is cool.
26:51 I'm going to start using it right away.
26:52 This is nice.
26:53 Yeah.
26:53 I already used it once as well and everything seems good.
26:56 So here, look, I even called a flask as an example.
26:58 Say here on this particular version, there was this security vulnerability from 2019 and
27:05 same with, I guess, Jinja and all those were good.
27:07 But yeah, it just, it gives you a nice description of what went wrong.
27:10 And like in this case, it's a denial of service attack and whatnot.
27:15 So I definitely recommend people pin versions definitely in your requirements.
27:20 But what do you all think of including hashes?
27:23 I think that's something Dustin talked about in his talk.
27:26 And at the time I was like, oh, that sounds like a good idea.
27:29 And it's not something I've started doing yet.
27:32 Yeah, exactly.
27:33 That's exactly what I think.
27:34 It sounds like a good idea and I'm not doing it yet.
27:37 So anyway.
27:38 But that sounds like it's a me problem more than anything else.
27:42 Also, it seems like a good idea.
27:44 You know, I might be missing a step.
27:47 It feels like the challenge you're going to run into there, what you're preventing against
27:53 is a man in the middle attack.
27:54 Somebody can intercept what's happening with PyPI.org and sneak in some kind of broken,
28:01 hacked version.
28:03 I don't know.
28:03 I don't necessarily trust what goes into PyPI.org, but I trust PyPI.org.
28:08 So I'm not super, it's not my biggest worry.
28:12 There's like 10 other worries that make me have a hard time sleeping at night about running
28:16 stuff on the internet that precedes that.
28:18 So I haven't worried about it, but maybe I should.
28:20 It's in the queue of things to worry about.
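A minimal sketch of the idea behind including hashes alongside pinned versions: the installer compares the digest of what it downloaded against a digest recorded ahead of time, and refuses anything that differs. The function name `matches_pinned_hash` and the sample bytes are assumptions for illustration, not pip's actual implementation.

```python
import hashlib


def matches_pinned_hash(artifact: bytes, pinned: str) -> bool:
    """Return True if the artifact's digest equals a pinned 'sha256:<hex>' value."""
    algorithm, _, expected_hex = pinned.partition(":")
    if algorithm != "sha256" or not expected_hex:
        raise ValueError(f"unsupported pin format: {pinned!r}")
    actual_hex = hashlib.sha256(artifact).hexdigest()
    return actual_hex == expected_hex.lower()


# Illustrative data only: a stand-in for a downloaded package file.
original = b"pretend this is a wheel file"
pin = "sha256:" + hashlib.sha256(original).hexdigest()

print(matches_pinned_hash(original, pin))            # the untouched file
print(matches_pinned_hash(original + b"!", pin))     # a modified file
```

Any change to the downloaded bytes changes the digest, so a swapped or altered file no longer matches the pin.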
28:23 Well, for instance, with this audit, you can pin your stuff and then have it be, check it
28:31 every once in a while and install everything and check things.
28:34 I don't see why it couldn't be a CI step.
28:37 I was actually just going to say that pip-audit, I need to bring it to my samples maintaining
28:42 group to talk about who wants to implement it and how soon we're going to do it.
28:46 And whose pager rings when it finds a problem.
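One way the CI step could look, sketched under assumptions: pip-audit is installed in the job's environment, it is pointed at a requirements file with `-r`, and it exits non-zero when it reports known vulnerabilities. The helper names `build_audit_command` and `audit_passes` are hypothetical.

```python
import subprocess
import sys


def build_audit_command(requirements_path="requirements.txt"):
    """Build a pip-audit invocation for one requirements file."""
    return [sys.executable, "-m", "pip_audit", "-r", requirements_path]


def audit_passes(requirements_path="requirements.txt", runner=subprocess.run):
    """Run the audit and report whether it exited cleanly.

    Assumes pip-audit is installed and exits non-zero on findings.
    The runner is injectable so the logic can be exercised without pip-audit.
    """
    completed = runner(build_audit_command(requirements_path))
    return completed.returncode == 0
```

In a CI script this might end with `sys.exit(0 if audit_passes() else 1)`, so a finding fails the build and whoever owns the repo gets notified.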
28:48 Yes.
28:49 Yeah.
28:50 Nice.
28:51 Look at that, pagers from back in the day.
28:52 All right.
28:52 Well, that's all I got for that one.
28:54 We're off to Leah.
28:57 I'm so glad you mentioned pinning requirements because that is actually, that's a great segue
29:01 for managing samples for GCP.
29:04 So what I have open right now for Google Cloud is an example documentation page.
29:09 I picked Cloud Composer because it's what I work on.
29:12 And I want to give an example of where this code lives that I'm talking about that I work
29:17 with this group to maintain.
29:19 So this is a page that's about using a particular Airflow operator.
29:24 And if you were to scroll on it, you will see these code samples and they are all stored
29:30 in GitHub and then embedded in our docs.
29:33 So you can click view on GitHub on any one of them and it will take you to the linked repository.
29:38 You can look at the history, look at everything in context.
29:42 So we have thousands of samples for all of the Google Cloud products just for Python, but
29:50 we have them in other languages too.
29:51 And they're located across hundreds of repos.
29:54 This happens to be one repo that has samples for multiple products, but we have other repos
30:00 where things are stored too.
30:02 So to ensure that there's consistency and that my group of engineers, my colleagues and I
30:09 actually have time to do our work and function as humans outside of work too, we use a lot
30:15 of automation.
30:15 So we use a lot of bots to do things like keep our dependencies up to date, check for license
30:21 headers, auto-assign PRs for reviewing, syncing repositories with centralized configurations,
30:27 and even more, which is pretty great.
30:31 And this is actually where the pinning requirements comes in.
30:34 We very strongly believe in pinning requirements because it makes the samples easier to maintain
30:40 and test against.
30:41 And it's easier to go back to the product and say, hey, you just pushed a release candidate
30:46 for your product and it broke your samples.
30:49 It wasn't supposed to.
30:51 What gives?
30:52 Rather than finding out mysteriously when getting a customer issue.
30:57 So then to keep it up to date, we use a bot.
31:00 And these are some pull requests recently opened by the bot of some dependencies.
31:05 They get double-checked to make sure everything looks good by human and merged.
31:09 It's pretty great.
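A rough sketch of the kind of check that supports a pinning policy: scan a requirements file and report any line without an exact `==` pin. This is not the bot described above; `find_unpinned` and the sample package versions are assumptions for illustration.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        requirement = line.split(";", 1)[0].strip()  # ignore environment markers
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


# Illustrative file contents only.
sample = """\
apache-airflow==2.2.2
requests>=2.0  # a range, not a pin
pytest
"""

print(find_unpinned(sample))
```

A dependency bot then handles the other half of the policy: it proposes a pull request whenever a pinned version falls behind, and a human reviews and merges it.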
31:10 And then we actually have a team of engineers in DevRel that works on making GitHub bots that
31:18 we use.
31:18 And that is totally open source.
31:20 So you can see some of the ones that we use.
31:22 Like we have our license header one.
31:24 The sync repo settings allows us to have a single source of truth for our configuration for all
31:30 of our Python repos.
31:31 And then it makes sure it gets synced across all of them.
31:35 It's pretty great.
31:36 I really don't know how I would function without all of my bot friends.
31:40 This is super cool.
31:42 I can just imagine how much work it is to keep all of those different things in sync.
31:47 And I have worked recently on projects where I'm like, okay, I got to integrate this library.
31:52 I'm going to go to the documentation.
31:53 And I try to use the one or two functions that the whole thing does.
31:58 And it's like, nope, that parameter doesn't exist.
32:00 Or you're missing some parameter.
32:02 You're like, come on, at least just keep the signature, right?
32:05 You know?
32:05 And of course, it's something like star args, star star kwargs.
32:09 It's not like, oh, I can just look in my IDE and see, oh, yeah.
32:11 It says it takes like, use security, use SSL, yes or no?
32:15 Like, no, it's unknown without the documentation, basically.
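To make the `*args, **kwargs` complaint concrete, here is a small sketch using `inspect.signature` from the standard library. Both functions are hypothetical; the point is only that a tool reading the signature can list the options of the first but learns nothing about the second.

```python
import inspect


def connect_explicit(host, *, use_ssl=True, timeout=30):
    """Hypothetical function whose options are visible in its signature."""
    return (host, use_ssl, timeout)


def connect_opaque(*args, **kwargs):
    """Hypothetical wrapper that hides the same options behind *args/**kwargs."""
    return connect_explicit(*args, **kwargs)


explicit_params = list(inspect.signature(connect_explicit).parameters)
opaque_params = list(inspect.signature(connect_opaque).parameters)

print(explicit_params)
print(opaque_params)
```

An editor's autocomplete relies on the same information, which is why accurate documentation matters so much for the second style.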
32:18 Yeah.
32:18 This is awesome.
32:19 Thank you.
32:20 I think so, too.
32:21 I'm very grateful to it.
32:22 And yeah, for our dependency bot, we do use an external one.
32:26 I think GitHub is the one that does Dependabot.
32:28 We, in particular, use WhiteSource Renovate bot.
32:31 It's what we were using when I started.
32:33 And that works very well, too.
32:35 And they're very nice and responsive to issues.
32:38 Oh, that's fantastic.
32:38 Yeah.
32:39 Dependabot was fairly new.
32:41 And then it was bought quite recently by GitHub.
32:43 So I can imagine you were all doing something before then, yeah?
32:47 Probably.
32:47 But I know I have friends who use that, too.
32:50 And they're great.
32:50 Using a dependency bot, I would say, if you need a starter bot for any of them,
32:54 the dependency bot is a great place to start.
32:57 Yeah, that's fantastic.
32:59 I recently switched to pip-tools and pip-compile to generate my requirements with pinned versions and stuff.
33:04 Nice.
33:05 But before that, I was all about Dependabot telling me if something new was out and seeking that.
33:11 pip-tools rocks.
33:11 Yeah.
33:12 pip-tools rocks.
33:13 I love pip-tools.
33:14 Yeah, it definitely does.
33:15 Brian, there's a lot of cool automation here.
33:17 What do you think?
33:18 I'm excited about looking through all these.
33:20 I love looking at bots because the whole idea about a bot is like the Unix philosophy of do one thing and do it well.
33:28 Yes.
33:28 Yeah.
33:29 I love that.
33:30 Have something else do it and not you do it.
33:32 Oh, yeah.
33:33 All of our bots are based on like, oh, gosh, we're doing this one thing over and over and we're not doing it well because we're doing it manually.
33:42 So how can we like use automation to make sure we're doing it consistently and to save a lot of time?
33:47 Like one of the things you've got in here that's shown right now is label sync.
33:51 So one of the interesting things about different groups' workflows is to have different labels that mean different things.
33:59 But when you open a new repo, it doesn't have all those labels.
34:03 So being able to sync those labels across an organization.
34:06 Like needs triage, good first contribution, all those kind of things, right?
34:12 Yeah.
34:12 As I said, we have hundreds of repos just for Python and we use things like we have labels that say what API something belongs to.
34:22 And that helps with the auto assign bot to make sure that issues and PRs get routed to the right team.
34:28 Otherwise, you're having a human do all that triage, which is fine, but doesn't scale super well in our use case.
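The core of syncing labels can be sketched as a comparison between one canonical label set and whatever a repository currently has. This is not the code of the bot discussed here; `plan_label_sync`, the label names, and the colors are assumptions, and a real bot would follow up by calling the hosting service's API.

```python
def plan_label_sync(canonical, existing):
    """Return (labels_to_create, labels_to_update) for one repository.

    Both arguments map label name -> hex color.
    """
    to_create = {
        name: color for name, color in canonical.items() if name not in existing
    }
    to_update = {
        name: color
        for name, color in canonical.items()
        if name in existing and existing[name] != color
    }
    return to_create, to_update


# Hypothetical organization-wide labels and a freshly created repo.
canonical_labels = {
    "needs triage": "d93f0b",
    "good first contribution": "7057ff",
    "api: composer": "1d76db",
}
new_repo_labels = {"good first contribution": "ffffff"}

to_create, to_update = plan_label_sync(canonical_labels, new_repo_labels)
print(to_create)
print(to_update)
```

Keeping the plan separate from the API calls makes the logic easy to test and lets the same canonical set be applied to hundreds of repos.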
34:35 Yeah.
34:36 And adding a label is really easy to an issue or something.
34:41 So having a bot that looks at label changes and just does an action based on that is a brilliant use of time.
34:48 Yep.
34:48 Highly recommend.
34:49 Yeah.
34:50 Fantastic.
34:51 This is great.
34:52 And you have an install link next to all of them.
34:54 Does that mean I just click that and install it into one of my repos?
34:56 I believe that is the intent.
34:58 And if it doesn't work, you should open an issue on this repo because my colleagues are very responsive.
35:03 Fantastic.
35:04 Yeah.
35:04 And we just need bots to generate bots.
35:06 Honestly, if my colleagues told me they were working on that in this repo, I wouldn't be surprised, but I don't know.
35:12 The meta bot.
35:14 Yeah.
35:15 Fantastic.
35:16 All right.
35:17 Well, how about some extras?
35:19 Brian, you got anything extra you want to
35:21 share while we're here before we call it a show?
35:23 No, just I'm fighting a cold.
35:26 Hopefully that'll all be over.
35:27 Yeah.
35:28 Well, maybe some sort of audit thing.
35:30 We'll check your health status.
35:31 We can run that against you.
35:32 Leah, anything else you want to share with us?
35:35 Oh, I mean, on Twitter earlier, we were talking about HTTP status codes and it reminded me of still my forever reference for HTTP status codes is HTTP.cat.
35:45 Yes, HTTP.cat is fantastic.
35:48 It's so good.
35:49 It is so good.
35:50 Let me share a few non-funny things and then we'll mix that in with our joke.
35:54 Please do.
35:55 Fantastic.
35:56 All right.
35:56 All right.
35:56 The first one has to do with, speaking of GitHub, another cool GitHub thing.
36:02 You know you could press a dot and that would do certain things.
36:05 This only works if you're signed in.
36:07 But now there's a command palette.
36:10 This idea of command palettes is becoming popular in UIs.
36:12 We've got it in VS Code.
36:14 We've got it in, like, Superhuman, the email client.
36:17 And often you get them by pressing command K or control K.
36:20 And now you have that for GitHub.
36:22 So if I were on a repo where I could do stuff to it, I could hit command K and then it will say, what do you want to do?
36:28 Search or jump to.
36:30 I could go to pages, issues.
36:32 I could look for, let's see, look for the app.
36:36 If I just type app, it'll search for those.
36:38 I could search for all sorts of things here.
36:40 And boom, it takes me and shows me all the apps.
36:42 Isn't that cool?
36:43 That's so cool.
36:45 Command palette.
36:45 Yeah, that's now a thing.
36:47 That's beautiful.
36:48 And you could just, I mean, no mouse.
36:50 I'm here.
36:51 I'm in this repo, the top level.
36:53 Command K, down arrow, two times, hit enter.
36:55 I'm on the issues.
36:56 Oh my gosh.
36:57 Love to see it.
36:58 Yeah, so that's a good one.
36:59 The other one, the other extra is Python 3.10.1 is out.
37:05 Released December 6th.
37:06 So as in two days ago.
37:07 Wow.
37:08 It's got a fun little snake with a hat on.
37:11 Love it.
37:12 That's really about 3.10.
37:13 So let me describe.
37:14 I can cover the entire release for you.
37:16 So Python 3.10.1 is the newest major release of the Python programming language.
37:20 It contains many features and optimizations.
37:22 So now you all know what's in it.
37:24 It's very vague.
37:26 It's very vague.
37:29 Apparently it has 300 commits of changes and fixes.
37:32 One thing I would, I wanted to know, are there security updates?
37:36 Yes or no?
37:36 Should I install this if I'm curious?
37:37 Should I install this now before tomorrow?
37:42 Because someone's going to start poking around.
37:45 I would love if it would say that.
37:46 There's a great thing about the major features, but that's just 3.10, not the point release.
37:51 Well, we'll have to check it out.
37:52 Anyway, still good.
37:53 Yeah.
37:54 We've been having fun making all of our GCP samples, making sure they're 3.10 compatible, which we're getting there.
38:01 It's all waiting for certain dependencies to be ready.
38:03 But a lot of fun.
38:05 Very exciting to see.
38:06 Yeah, that's awesome.
38:07 Well, you can look at the changelog.
38:09 So if you look at the 3.10 changelog, you can see 3.10.1 and stuff.
38:12 I can.
38:13 Yeah, go up a little bit.
38:15 Full changelog there, maybe?
38:16 Yeah.
38:16 Yeah, that's true.
38:18 I can go to the changelog there and check that out.
38:19 But having a security thing would be an idea.
38:22 Yeah, just like.
38:23 A TLDR.
38:24 Yeah, exactly.
38:25 Exactly.
38:26 Cool.
38:26 Well, and also a lot of people didn't want to try 3.10 until we got one patch release.
38:31 So now we have one patch release.
38:32 So there's no excuse.
38:33 So now it's safe.
38:35 I have been running it for a day in production and it seems okay.
38:38 Put it on one site to see if it would hang in there.
38:41 It seems fine.
38:41 So we're all good.
38:42 Yeah.
38:42 All right.
38:43 The samples that are using it are doing fine.
38:45 They've had passing periodic builds for a while.
38:47 Yeah, fantastic.
38:48 All right.
38:48 Are you ready for some cat jokes later?
38:50 Sorry.
38:50 Yes.
38:51 I mean, we started our conversation off today talking about cats.
38:55 It's true.
38:56 Before we hit record.
38:57 So I feel like we should round that out.
38:59 Definitely.
39:00 So first of all, httpstatuses.com is a fantastic place to go learn about the real meaning or
39:07 the official meaning, let's say, of status codes.
39:09 So for example, there's 100 continue.
39:11 And if you want details, you click on that and it actually pulls this all up.
39:14 Even shows you like the enum in Python.
39:17 Oh my gosh.
39:18 I love that.
39:19 Isn't that cool?
39:20 Yeah.
39:20 It gives you the meaning like 100 continue.
39:22 The initial part of a request has been received and has not
39:26 yet been rejected by the server.
39:27 The server intends to send a final response eventually.
39:31 And so there's other ones like 200.
39:33 Okay.
39:33 201 created.
39:34 Let's see.
39:35 What else should I point out?
39:36 304 cache, not modified.
39:39 400 bad request.
39:40 Bad request.
39:41 404 not found.
39:43 403 forbidden.
39:43 500 internal server error.
39:46 Yeah.
39:47 418.
39:48 I'm a teapot.
39:49 Yeah.
39:49 And 502 bad gateway.
39:50 Okay.
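The Python enum mentioned here is `http.HTTPStatus` in the standard library. A short sketch of looking up the official reason phrase for the codes just listed follows; the helper name `describe` is an assumption for illustration.

```python
from http import HTTPStatus


def describe(code):
    """Format a status code with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (100, 200, 201, 304, 400, 403, 404, 500, 502):
    print(describe(code))
```

Because `HTTPStatus` members are integers, comparisons such as `HTTPStatus.NOT_FOUND == 404` hold, so the enum can replace bare numbers in application code.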
39:51 So let's do yours first, Leah.
39:53 Please.
39:53 I put it.
39:54 I put out this joke and you said, this is good, but oh my goodness.
39:58 But yeah.
39:59 So when I was doing my computer science degree, a friend shared with me, HTTP.cat, when we
40:07 were learning about HTTP status codes.
40:09 And if you go there, you will find one cat per HTTP status code representing what is going
40:16 on.
40:16 And I'm not going to lie to you in my professional career.
40:19 I still use it as a reference because it's my favorite one.
40:22 And you can even, if you go to like HTTP.cat slash 200, it returns a JPEG of a cat that's
40:29 like, okay.
40:29 Yeah, exactly.
40:30 And you can do that for all of the status codes.
40:32 201, the cat has walked through some wet cement and that's 201 created for footprints.
40:40 Let's see what else we got in here.
40:41 Some good ones.
40:42 404 not modified.
40:44 304, sorry.
40:46 The 404, the cat is hiding under some wrapping.
40:49 Not found.
40:50 Fantastic.
40:50 Yeah.
40:51 I love this.
40:51 I had not heard about this and it's glorious.
40:54 Thank you.
40:54 Well, wait, what's it?
40:55 Does there, is there a 418?
40:56 There is.
40:57 Of course there is.
40:57 I'm a teapot.
40:59 A kitten and a teapot.
41:00 A kitten and a teapot.
41:00 Literally inside of a teapot.
41:02 All right.
41:02 So I saw this joke by Breen, who is John Breen, and thought, that's really funny.
41:10 What he did is he put his own personal take on what status code means.
41:15 And I thought they were hilarious, but I thought, you know, let me take a shot at this as well.
41:19 A little more Python focused.
41:20 So I'll link to my tweet.
41:22 I put this set of colloquial meanings of the HTTP status codes.
41:27 All right.
41:27 You all ready for this?
41:28 Yeah.
41:28 Do it.
41:29 So 200 is, what's up?
41:31 All right.
41:32 All good.
41:33 201.
41:35 Hello, creator.
41:36 304.
41:37 Not modified or cached.
41:39 It's same old, same old.
41:40 403.
41:41 Permission denied.
41:42 It's get off my lawn, kids.
41:43 That was my favorite.
41:47 404 is just, there's no message.
41:49 It's just not there.
41:49 It's not that that's the message, it's just blank.
41:52 500 is we're bad at APIs.
41:54 I love that.
41:55 Server error.
41:55 400 is you're bad at APIs.
41:57 Yes.
41:58 Yeah.
41:58 The real cardinal sin of APIs is 200, but in the body, there's a JSON that says error
42:05 and a reason.
42:05 Oof.
42:06 200, but with error text, we're really bad at APIs.
42:09 Yeah.
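A small sketch of the alternative to "200, but with error text": let the status code itself carry the failure, and keep the JSON body for detail. The helper `error_response` is hypothetical and framework-agnostic; it only builds the pair a web framework would send.

```python
import json
from http import HTTPStatus


def error_response(status, reason):
    """Build a (status_code, json_body) pair whose code matches the failure."""
    if status < 400:
        raise ValueError("error responses should use a 4xx or 5xx status code")
    body = json.dumps({"error": HTTPStatus(status).phrase, "reason": reason})
    return int(status), body


code, body = error_response(HTTPStatus.BAD_REQUEST, "missing 'name' field")
print(code, body)
```

Clients can then branch on the status code alone, without parsing the body to find out whether the call actually worked.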
42:10 502, we're bad at deployment or DevOps because part of the infrastructure can't get to the
42:16 other part.
42:16 And Brian's favorite, 418, is it already April again?
42:19 Yeah.
42:19 Because the reason is that was actually put into the spec as an April Fool's joke and they
42:26 left it.
42:26 I'm a teapot.
42:27 I love that they left it.
42:29 I do too.
42:30 I do too.
42:31 It's like import this.
42:33 Just stuff that's fun that should just always be there.
42:36 Yeah.
42:37 Yeah.
42:37 What's the harm?
42:38 What's the harm?
42:39 Just leave it there.
42:40 Anthony, in the live stream, has some feedback for you, Leah.
42:44 These status codes using cats.
42:46 Well, I never.
42:47 I mean, where there's internet, there is cats, no?
42:50 Oh, of course.
42:52 Why we created the internet in the first place is for cats.
42:55 Exactly.
42:56 All right.
42:57 Well, I think that's it for our show.
42:59 Brian, thanks for being here as always.
43:01 Thank you.
43:01 Leah, thanks for joining us.
43:02 Thank you for having me.
43:02 Thanks for listening, everyone.
43:03 Yeah.
43:04 You bet.
43:05 See you all later.