Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #262: So many bots up in your documentation

Recorded on Wednesday, Dec 8, 2021.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 262 recorded December 8th, 2021. Oh my gosh, it's almost winter. I'm Michael Kennedy.

00:12 And I'm Brian Okken.

00:13 And I'm Leah Cole.

00:14 Yay.

00:15 Yay. So great to have you here. Thanks for being here on the show.

00:19 Yeah. Happy to be here.

00:20 You and I got a chance to discuss Airflow over on Talk Python a couple months ago, something like that.

00:26 Yeah.

00:27 - Yeah, but now we'll probably do a little more Airflow over here for people who are unfamiliar with that, but also just whatever you're interested in.

00:34 So great to have you here.

00:36 Why don't you tell people a quick bit about yourself before we jump into the topics?

00:39 - Sure, so I'm Leah, and I am a developer relations engineer in Google Cloud.

00:45 And specifically, I work on Cloud Composer, which is our hosted managed product of the popular Apache Airflow project, which we'll talk about a little bit later.

00:57 And in addition to writing samples and content for that, I also work with a group of fellow engineers and we maintain all Python samples for Google Cloud and make sure that they stay tested, up to date, and are healthy and are getting reviewed for new samples.

01:14 And that's a lot of fun.

01:15 That kind of fell into my lap and has been a good time.

01:17 - That's fantastic.

01:18 I remember Python being one of the original two supported languages on Google Cloud, right?

01:23 It had sort of a special place.

01:25 - Yeah, now it's one of seven, I think.

01:28 - Yeah, cool.

01:29 Well, that sounds like such a fun job.

01:30 I've always imagined dev relations type of jobs to be super fun, maybe slightly less fun in COVID 'cause the travel and the conferences and all those kinds of things are a part of it, but still a fun job, right?

01:41 - Still a good time.

01:42 Every day is a little bit different.

01:44 You kind of never know what's gonna happen and that's part of what I like about it.

01:47 - Yeah, awesome.

01:48 Oh, cool.

01:49 Brian, I don't even know what you're gonna cover, so I don't know what's gonna happen.

01:52 Why don't you let us know?

01:53 - You don't know what I'm gonna cover?

01:55 - I, well, I'm not looking at my docs yet.

01:56 - Oh, okay.

01:57 (laughing)

01:59 Sorry, fighting a cold.

02:01 I am super excited, pytest 7, release candidate one, is out.

02:05 So--

02:06 - Oh, that's excellent.

02:07 That's big news.

02:08 - It is.

02:09 The last release for pytest was, or six, they've done other dot releases, but the six two, or six zero came out, or six two, I don't know, I lost track.

02:19 - We use 6.2.4 for our GCP samples, so.

02:22 - Oh, you do?

02:23 - We do.

02:24 - So the, this, I think it was, I wrote this down.

02:27 6.2.0 was released in December 2020.

02:30 So it's been, we're ready for a new one.

02:33 So 7.0 is out, the release candidate at least.

02:36 And so, because it's a release candidate, to install it you have to do pip install pytest==7.0.0rc1.

02:46 We've got that in the show notes.

02:48 It's also on the release announcement page for pytest.

02:53 But I wanted to go through some of the cool features that I'm really excited about.

02:57 There's a lot of great things in there.

02:59 There's some little improvements with the approx thing.

03:05 So one of the things that pytest has is an approx.

03:08 So you can say floating point numbers, if you're comparing them, you should never do equal, but you can do equal approx with pytest and it's really pretty cool.

03:16 - That's cool, I didn't know that because any science you're doing is so, like double equals is the kiss of death for floating point math comparison.

03:24 - Yeah, well, for pytest approx, the docs now reference the NumPy comparisons, which is nice because NumPy has some really cool features around that.

03:36 But pytest out of the box does.

03:38 And now, also within mappings and dicts and sequences, it handles Decimal types, which is nice.

03:47 Decimal types, of course, are very useful when working with money and other things that need to be exact decimals.

03:54 One of the things that's really cool is the sequences are compared better.

03:59 So if you have like a list of numbers and you compare against an approximate list of another numbers, I didn't know you could do this, it will tell you which index was wrong and by how much. - Oh, nice.

04:13 - And actually not by how much, but what the expected was, and that's pretty neat.
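
A minimal sketch of the approx comparisons described here, assuming pytest is installed. The numbers are made up for illustration, and only long-standing approx behavior is used, not anything specific to 7.0.

```python
from decimal import Decimal

import pytest

# Plain == fails for floats that are mathematically equal.
assert 0.1 + 0.2 != 0.3

# approx compares within a tolerance instead of exactly.
assert 0.1 + 0.2 == pytest.approx(0.3)

# It also compares element by element on sequences and on dict values.
measured = [0.1 + 0.2, 1.0 / 3.0]
assert measured == pytest.approx([0.3, 0.3333333333])

readings = {"a": 0.1 + 0.2, "b": 2.0}
assert readings == pytest.approx({"a": 0.3, "b": 2.0})

# Decimal values are handled too.
assert Decimal("1.000001") == pytest.approx(Decimal("1.0"))
```

When one of these comparisons fails inside a test, the assertion report is where the improved per-index output mentioned above would show up.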

04:17 So those are the little minor features.

04:20 Most of these are kind of minor, but it made your first somebody, right?

04:24 So one of the things I like is some people have mentioned fixtures or sometimes when people use a lot of fixtures, they don't know where the fixtures are.

04:32 Well, there's a couple of flags, --fixtures-per-test and --fixtures.

04:35 There's, both those flags are helpful to find out what fixtures you have available.

04:40 And now by default, they print the path of the fixture's location along with the fixture name.

04:47 And you can also do a verbose option that prints out the full doc string, which is pretty handy.

04:52 One of the things that I'm really excited about is pythonpath, which has been added.

04:58 And that was a feature I added to the project, which is fun.

05:01 - Nice.

05:02 It's cool to see the contributions you're making coming back out.

05:05 - Yeah, it's cool.

05:06 And then there's a bunch of other features that I contributed to by just saying, "This is a little weird.

05:11 "Can we fix this?" And somebody else volunteered to fix it.

05:14 So it's nice.

05:15 - That's the best kind of contribution.

05:17 (laughing)

05:18 - Yeah.

05:19 One of the improvements in the docs, which is kind of fun, is there's an auto-generated list of, so I've got the changelog going on here.

05:28 And I gotta come back to this.

05:30 There's an auto-generated list of plugins and there's 963 right now.

05:35 We'll refresh it.

05:36 Nope, still 963.

05:37 But that's a lot.

05:38 When I first started writing the beta or the second edition of the pytest book, I noticed this and I wrote it down, but the number keeps changing.

05:49 So I took out the number.

05:50 I'm like, it's a lot.

05:52 There's a lot of cool plugins.

05:54 One of the things that if you'll notice when you go to the change log, it starts with breaking changes and then deprecations.

06:01 And I know, I think this is around because people, when they upgrade, they wanna know if it's gonna break their code or not.

06:06 I have tested a bunch of stuff and upgraded from six to seven and I haven't noticed a lot.

06:11 There was like a 6.1 to 6.2.

06:15 I can't remember what the, there was one break a while ago in the 6.x that messed up some plugin authors, but I haven't noticed any problems.

06:23 So please try these out.

06:25 I wish they would do the features first and then not the breaking changes, 'cause that's-- - I suspect it's the people working deep in the guts, like the plugin authors, that hit these deprecations and not just people doing assert this equals that type of work. - Yeah, yep, right.

06:41 One of the things that I didn't list, but I think a lot of people are excited about, There's more, the objects within pytest that people are using, more of them are type hinted now so that you can do type hints with objects.

06:54 - Oh, that's nice.

06:54 - Yeah. - Yeah, that's really nice.

06:56 - So, fun.

06:57 - Leah, do you use pytest?

06:59 Some of these changes exciting?

07:00 - We do, we use pytest on our Python samples.

07:04 And so I, actually the one that was most exciting to me was the fixtures, figuring out where fixtures are is definitely something that comes into play for me, especially when we're maintaining something that was written a while ago by someone who might not be working on that code anymore.

07:20 - Yeah. - Yeah.

07:21 - Nice. - Yeah.

07:22 - That's cool. - This is great.

07:22 I love the pip installable RC1, that's great.

07:25 And before we move on, let's take a step back.

07:28 Roman Right, author of Beanie.

07:29 Hey, Roman, out there in the audience says, "Hey, I'm a big fan of Google Cloud." - Oh, thank you. - For sure.

07:34 Well, I've got some fun stuff to talk about next here.

07:37 I want to talk about this thing that David Smith, former guest co-host here on Python Bytes sent over and said, "This looks cool." Sam Lau and Philip Guo released this thing called Pandas Tutor.

07:52 - This is cool.

07:53 - Yeah, previously, Philip had built Python Tutor at pythontutor.com.

07:58 Now there's pandastutor.com.

08:00 And it's all about just helping you understand what the code does.

08:05 So it basically says, look, there's this code here.

08:09 Like imagine you've got a list of dogs that have a breed, a type, a longevity, a type is like a herding dog or a toy dog, it goes in a purse.

08:17 Longevity, size, weight, and so on.

08:19 And you've got that as a data frame.

08:21 If you wrote dogs where size == 'medium', then sort_values on type, then groupby on type, and then show the median, well, what is that actually doing?

08:30 Like, how do I understand that, right?

08:32 As somebody learning pandas, imagine I don't really have a database background, and so I'm not sort of trying to map that over to like, okay, there's the where clause, there's the order by clause, and you know, like that kind of business, right?

08:43 So what is happening when I write that code, either because I'm coming across it for the first time, or, which happens to me a lot, I wrote it two years ago and understood it perfectly then.

08:54 I have no idea what it does now, right?

08:55 You wanna know what it does. - Oh, same.

08:56 - Yeah, that happens way too often, right?

08:59 So what you do is you can go and run this code over in Pandas Tutor, and you say visualize, and it says running a code, please wait.

09:08 And so what they do is they put a CSV bit of text in here, it's like a triple-quoted string, and then use pandas read_csv and then just do that one line.

09:14 So that's a nice way to kind of get data in there.

09:16 And the way to think about this is steps.

09:19 It shows you what is the first step and what is the second step and so on.

09:21 So when you go there, you'll see that it has the code that we were talking about, but then right now the effective where clause, the filter is regular font and the rest is gray.

09:32 It's like fade into the background.

09:33 And so you can actually see what the starting data frame was and the ending data frame, and then how it got in there.

09:40 And you can use the mouse to hover over rows, so what they're saying is the size is medium.

09:44 So if you hover over like a large or a small dog, there's just no arrow.

09:47 But if you hover over medium, it shows you where in the result that that thing landed.

09:52 Isn't that cool?

09:53 - That's wild.

09:55 - Isn't that wild?

09:55 And so then you can see size has all the values on the left and then the size is grouped on the right and it shows medium, medium, medium, medium, because that's all that's in there.

10:03 Now, when I first looked at this, I'm like, there's a bunch of stuff on the screen, what's going on?

10:07 I noticed the arrows, but then what took me a minute to realize is there's multiple steps.

10:12 So the next thing, if you scroll down, shows the same code at the top, but now the sort_values on type is highlighted.

10:18 That's the next part of what looks like one expression in Pandas.

10:21 And so now it highlights the column that it's sorting on, and you can actually see the arrows pointing to how they were reordered in the result, because you're sorting by type, so it's non-sporting, non-sporting, non-sporting, non-sporting, and then sporting, sporting, and working, working, and so on.

10:35 So that was step two.

10:37 And we have a group by this one's interesting.

10:39 Instead of arrows, it has colors.

10:41 So the group by type again, non-sporting, sporting, so on.

10:45 You end up with these groups, like here's a blue, a blue box of all the non-sporting dogs, the bulldog, the poodle, the French bulldog is so cute.

10:53 Then you've got the golden retriever and the Labrador and the boxer, right?

10:56 So these are grouped into the colors.

10:58 And then finally you do the median and it shows how those groups reduce down to statistics, Like the longevity of a non-sporting dog is less than a sporting dog, apparently, but they're also lighter.
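
A minimal sketch of the four steps just walked through (filter, sort, group, median), assuming pandas is installed. The rows and numbers are invented for illustration and are not the Pandas Tutor dataset. The numeric_only=True argument is an addition so the median only considers the numeric columns.

```python
import io

import pandas as pd

# Illustrative rows only; a CSV in a triple-quoted string, as described above.
CSV = """breed,type,longevity,size,weight
Bulldog,non-sporting,6.3,medium,50
Poodle,non-sporting,12.0,medium,55
Golden Retriever,sporting,12.0,medium,60
Labrador Retriever,sporting,12.0,medium,67
Boxer,working,8.8,medium,70
Chihuahua,toy,16.5,small,5
Great Dane,working,6.9,large,140
"""

dogs = pd.read_csv(io.StringIO(CSV))

# Step 1: filter, the "where clause".
medium = dogs[dogs["size"] == "medium"]

# Step 2: sort the remaining rows by type.
by_type = medium.sort_values("type")

# Step 3: group rows that share a type.
groups = by_type.groupby("type")

# Step 4: reduce each group to the median of its numeric columns.
result = groups.median(numeric_only=True)
print(result)
```

Splitting the one-line chain into named intermediate steps like this mirrors what the visualization does: each step has an input frame and an output frame you can inspect.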

11:08 So anyway, what do you all think?

11:10 Oh my gosh.

11:11 I love this.

11:12 This is nice, right?

11:13 I'm a very visual learner, so I really appreciate this.

11:17 And especially if you're working with data that you kind of aren't sure what it does and or the code, like that's pretty incredible.

11:24 I'm filing this away.

11:25 It's going to go to my team's group chat pretty much as soon as we're done recording.

11:28 In fact, yeah, that's awesome.

11:30 I think it's really good.

11:32 There's so many people who are presented a notebook or presented some kind of result, and they're like, I need to understand what that means so I can keep following it.

11:39 And I think, throw it into here, something like this would be really helpful.

11:43 - Well, and a lot of people that have spent a lot of time with databases, it might be obvious what these things do.

11:49 But for people that don't spend a lot of time with SQL, it's not obvious.

11:54 And so this is really nice.

11:57 - Yeah, definitely.

11:58 Or if you're like trying to take some example that you have with their example data and trying to translate it to your own data.

12:04 That's something that customers do all the time for us.

12:08 It's something I do a lot too.

12:10 Just seeing how it behaves with your stuff.

12:12 Oh man.

12:13 - You didn't write it, but you wanna use it.

12:15 So how much applies.

12:16 - Exactly.

12:17 - Yeah, so this is quite cool.

12:20 Dean out in the live stream.

12:21 Hey Dean, says Pandas Tutor looks awesome.

12:25 And Robert Robertson also loving it.

12:27 It's nice.

12:28 So very cool.

12:29 Indeed. All right. Over to you, Leah.

12:31 All right. So yeah, my first thing today is Apache Airflow.

12:35 So Airflow is a project that is part of the Apache Software Foundation.

12:39 It's a workflow orchestration tool that originated at Airbnb, I want to say in like 2014, and then pretty shortly after became part of the ASF and it became a top level Apache project in, I want to say early 2019.

12:56 It's been a little while now, which is very exciting.

12:58 So you can use it to author these workflows as directed acyclic graphs or DAGs of tasks, which is pretty cool.

13:06 And it's most commonly used with workflows that are pretty static, not super frequently changing or slowly changing, just so that you can see how the workflow goes over time, and it allows you some clarity and continuity in your workflows.
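
To make "a DAG of tasks" concrete, here is a rough sketch using only the standard library's graphlib (Python 3.9+). This is not Airflow's API. The task names are hypothetical, and a real orchestrator would also schedule, retry, and record the status of each task.

```python
from graphlib import TopologicalSorter

# Each key is a task; its set holds the tasks that must finish first.
# Task names are hypothetical.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_in_order(deps):
    """Visit tasks in dependency order and return that order."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        # A real system would execute the task here and track success or failure.
        order.append(task)
    return order


execution_order = run_in_order(dependencies)
print(execution_order)
```

"Acyclic" matters because a cycle would leave no valid order to run things in; graphlib raises CycleError in that case.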

13:22 - I've always sort of wondered what the role of these workflow type systems was, until I realized, you know, if you're going to build a full end-to-end type of workflow without a framework, there's a lot of coordination.

13:34 And what if this fails?

13:35 Where do you restart?

13:36 What do you do?

13:36 And then the analogy for me is kind of like Flask or some web framework: all I've got to do is write this little thing and everything else will come together to make sure these four lines of my Python code run, and they run reliably.

13:48 If they fail, it gets dealt with, right.

13:50 It allows people to not have to understand the whole system and just go, I need you to load up this file and put it into that database.

13:56 Can you write that code?

13:57 And that's all you got to know to be part of some complex thing, right?

14:00 Yeah, it's, I mean, it's not the most glamorous thing, but it is extremely useful.

14:06 I mean, I did a summer internship when I was doing my bachelor's, where I wrote a cron job that ingested some data every night.

14:15 And the only way I knew if it failed was if I looked in the target folder where it's supposed to end up and if the data wasn't there.

14:22 No files, whoops.

14:24 That sucked. I'm sure a lot of people have dealt with that.

14:27 And this is actually a really common Airflow workflow, which is the extract, transform, and load, the ETL workflow, which is where you have data somewhere that you want to get, you want to do something to it, or maybe not, maybe you just want to extract and load it, and you want to put that result somewhere else, either locally or in the cloud for all of that.

14:48 And Airflow lets you do all of that.

14:50 And you can see the history of these jobs.

14:52 There's a UI where you can see, did it fail?

14:55 It has a full error message if it failed.

14:58 It's not just, oh gosh, the data is not there.

15:00 What do I do?

15:01 >> Yeah. You got a really cool UI where it shows all the parts of the workflow running and whether or not they finished successfully and stuff, right?

15:08 >> Yeah. It got a makeover fairly recently.

15:10 So it's a lot of improvements.

15:13 >> Yeah, that's super cool. Another thing maybe you could talk about really quick is the connectors.

15:18 I don't remember exactly the right terminology.

15:20 There's a name for them.

15:21 tell us, tell people about that. That's also good to know.

15:24 So these connectors that you're thinking of, I mean, we can use the word connector to describe what it does.

15:28 So there are these things called operators in Airflow, and an operator executes a single task.

15:34 And so that might be executing a Bash script or executing a Python script.

15:39 But we also have these connectors that are grouped by providers, which might be your cloud provider or other software providers that allow you to execute code there.

15:49 So for example, we have a ton of GCP operators.

15:53 One example might allow you to create a Dataproc cluster or then like run a job on that Dataproc cluster and maybe tear it down when you're done.

16:03 And there are providers that have operators for all the major clouds and more.

16:10 You can do, there's one that like sends a Slack message when it's done.

16:13 So it's, if you can dream it, it might be there and if not, you can make it there.

16:19 That's awesome. What's GCP?

16:21 GCP is Google Cloud Platform.

16:23 Or Google Cloud.

16:24 GCP might be a dated acronym. Sorry.

16:27 Don't know.

16:28 Yeah.

16:29 Yeah. So one of the advantages I think of that, that's really cool is you don't necessarily have to know all those APIs.

16:35 Like if I was going to connect Slack to GCP, to like Azure Blob Storage, to like some hosted database, I don't have to learn all those things.

16:43 I can just sort of click it together.

16:44 Yeah. There's a small amount of setup you have to do for auth, which is understandable. You can't just publicly go to your Azure blob thing to grab your data. But once you set up that connection, then your operators can talk to those things. And you can run or host Airflow yourself, and there are a few different ways to do that. And then Amazon and Google both have managed hosted providers.

17:11 And there's a company, Astronomer, that also does managed hosted ones. And so if you're in Amazon or Google, the advantage there is that the connections with those operators might be a little bit simpler from the auth and networking perspective. But other than that, if you're running in Cloud Composer, which is Google's Airflow, you can still be using the Amazon or the Microsoft operators to pull data from over there. That's really common. You see it all the time: bring it in, do some stuff in Google Cloud, and either put it back in the other cloud or leave it in Google Cloud.

17:44 That's totally normal and people are doing that all the time.

17:47 - Right on. - Yeah.

17:48 - Cool, cool.

17:49 I think this is neat for people for whom that would make sense, where you're trying to do these sort of running in the background, - Yeah. - scheduled jobs, or there's triggers as well.

17:58 Like a file has been uploaded or landed here.

18:00 - Yeah, let's talk about that.

18:02 So that's actually, I had written down this one example, but I'll adapt it slightly since you mentioned triggers.

18:07 So that's another common type of operator, these sensors where you wait for a certain condition to be true, and they're used in data analytics workflows all the time.

18:16 So one example workflow might be waiting for a particular file to appear in a cloud storage or an S3 bucket.

18:24 So you'd use one of those sensors to wait for that to happen.

18:28 And then you want to do something to that data.

18:30 So let's say you then create a Dataproc cluster that is going to run a PySpark job on that cluster.

18:38 And then you can store the results in BigQuery at the end and then delete the cluster and like send a Slack message when the job is done.

18:46 That's a very common ETL thing, including that sensor.
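
The sensor idea (wait until a condition is true, then run the rest) can be sketched in plain Python. This is only an illustration of the pattern, not Airflow code: the function names, the polling parameters, and the local file standing in for a bucket object are all assumptions.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=5.0, poll_interval=0.1):
    """Poll until the file exists. Return True if found, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if Path(path).exists():
            return True
        time.sleep(poll_interval)
    # One last check in case the file appeared during the final sleep.
    return Path(path).exists()


def run_pipeline(path, steps, timeout=5.0):
    """Wait for the input file, then run each step in order and collect results."""
    if not wait_for_file(path, timeout=timeout):
        return ["timed out waiting for " + str(path)]
    return [step(path) for step in steps]


# Hypothetical stand-ins for create cluster, run job, store results, notify.
example_steps = [
    lambda p: "cluster created",
    lambda p: "job ran on " + Path(p).name,
    lambda p: "results stored",
    lambda p: "message sent",
]
```

Calling run_pipeline("incoming/data.csv", example_steps) would block until that file shows up or the timeout passes, which is the behavior a sensor gives you at the front of a workflow.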

18:50 - Yeah, that sounds pretty nice.

18:52 Definitely seems interesting and quite useful.

18:54 - Yeah, it's a lot of fun.

18:55 - Brian, thoughts before we move on?

18:58 - I have a question.

18:59 If you wanted to get started with something like this, I was trying to look for tutorials and getting started and stuff like that.

19:05 Is it, does it make sense or is it too confusing if somebody, you said you could run it on your own machine.

19:10 Does that make sense to try it that way or should you try it with a, okay.

19:16 - You totally can do it on your own machine and there's this really wonderful environment that can be found in the Airflow repository that's called Breeze and it's a Dockerized version of it.

19:29 It shouldn't be run in production but if you're looking to try it out or if you're looking to contribute to Airflow, we highly recommend that everyone check out the Breeze environment.

19:39 - Right now I have the community page pulled up where you can join the dev list in the Slack if you have questions, but if you were to go to the GitHub repo, you would see Breeze right on that first page.

19:50 - Okay, cool, thanks.

19:51 - Yeah, great question, thank you.

19:53 - Yeah, very good one.

19:54 All right, Brian.

19:55 Are you gonna give us a tutorial on Airflow or what we got going next?

19:58 - Yeah, so I was looking through the tutorials in Airflow and I noticed that right away one of the examples used dedent.

20:05 So that was a nice--

20:07 - How about that for a connection?

20:08 - Nice connection.

20:09 - Totally well planned, very cool.

20:11 - dedent, it's a textwrap tool, was suggested by Michael Rogers-Fallet.

20:18 It's a small utility, but it's super useful.

20:22 And I kind of forget that it's, I mean, I use it all the time, but I forget to mention it to people, but it comes up a lot.

20:29 And the idea around dedent is you've got something, oh, I think I lost my dedent thing.

20:37 Let's see if I can find it. There it is.

20:40 The idea is you've got a multi-line string, like here we've got "Hello World." And some multiple lines, and there's different spacing.

20:46 But as you notice, I want to define it within a test, within a test function or within some other function.

20:53 And so there's this extra, like, space at the beginning.

20:58 That's in the string.

20:59 It's in the multi-line string, and we don't want that.

21:02 We don't, we want it to be just, just no, like nothing at the beginning or the same amount chopped off.

21:09 So one of the options that people have used before is to just define the multi-line string outside of the function.

21:16 You just do it outside of the function, then it's just against the left side of your editor or whatever, on column zero, and you don't have to worry about it.

21:24 But it does bother some people that you've got this, this variable defined outside of your function when you're just using it within one function.

21:31 So dedent is the answer.

21:33 So what dedent does is it just takes a multi-line string and strips off all the common white space at the beginning.

21:39 That's it.

21:40 But it's super useful.

21:43 They've got a little example that we're showing here, but I think this is not a great example.

21:49 So I wrote a new example.

21:50 Oops, fell asleep.

21:51 And so the idea really is I've got a function that either prints stuff or has some output, and I wanna be able to compare that string, and I want my comparison to be in the function.

22:04 So I use dedent to just write it right in my function and then I don't have the spaces.

22:10 And then, yeah, anyway.

22:11 So this is a pytest example of how you could test an output string.
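
A small sketch of that pattern, not the exact example shown on screen. The function under test, render_report, is hypothetical; the point is that the expected text can be indented to match the test body and dedent removes the common leading whitespace.

```python
from textwrap import dedent


def render_report():
    """Hypothetical function under test: returns a small multi-line report."""
    return "Hello World\n  indented line\nlast line\n"


def test_render_report():
    # The expected text lines up with the surrounding code.
    # dedent strips the whitespace common to every line, so the
    # relative indentation of "indented line" is preserved.
    expected = dedent("""\
        Hello World
          indented line
        last line
        """)
    assert render_report() == expected
```

The backslash right after the opening quotes avoids a blank first line, which is a common companion trick when the string starts on the next line.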

22:16 So anyway.

22:17 - This really sounds like a classic example of there's a problem, like the open source, this really bothered me.

22:23 And so I wrote something to fix it.

22:25 And it's wonderful.

22:26 Like the time honored open source reason to make something.

22:30 But I also want to remind people that dedent is not the only thing in textwrap.

22:34 And textwrap has a whole bunch of other cool tools.

22:36 So it's not huge, it's just a five-minute read to peruse what's in textwrap so that next time you need to manipulate some text, it's useful.

22:46 Nice. Maybe wrapping.

22:48 Yeah, like wrapping.

22:49 It does things like if you've got a huge string and you want to be able to...

22:53 Like one of the things is to shorten it.

22:55 So if you've got a huge string, but you really only have like eight characters to show something.

23:00 - Like ellipsize it.

23:01 - Yeah, it does that for you.

23:03 So that's there too.
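
For reference, a short sketch of the shortening and wrapping helpers in the standard library's textwrap module. The sample strings are arbitrary.

```python
import textwrap

# shorten collapses whitespace and truncates on word boundaries,
# appending a placeholder when text had to be dropped.
fits = textwrap.shorten("Hello  world!", width=12)
truncated = textwrap.shorten("Hello  world!", width=11)
custom = textwrap.shorten("Hello world", width=10, placeholder="...")

print(fits)       # Hello world!
print(truncated)  # Hello [...]
print(custom)     # Hello...

# wrap splits long text into lines no wider than the given width.
lines = textwrap.wrap("The quick brown fox", width=10)
print(lines)      # ['The quick', 'brown fox']
```

Note that shorten cuts at whole words rather than at an exact character count, so the result can be shorter than the width you ask for.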

23:05 - That's good 'cause I've written that code.

23:06 It wasn't fun.

23:07 (laughing)

23:08 It didn't feel useful either.

23:09 I'm like, okay, great, it works.

23:10 But here we go.

23:11 Some audience feedback, Anthony out there.

23:14 Hey, Anthony says, "It's really useful.

23:16 "Used it many times." - Nice. - Quite cool.

23:19 All right, this next one comes to us from Dan Bader.

23:23 You might know him from RealPython and other things.

23:26 He and I were chatting and he said, "Hey, have you heard about pip audit from Trail of Bits?" And I was sure that I had, and I thought we had talked about it, but then I realized, "No, I don't believe we have." So I must've just heard about it somewhere else, and we haven't covered it before.

23:39 So the idea is we've heard about a lot of issues with supply chain vulnerabilities, things getting into PyPI, but also RubyGems and NPM and so on.

23:50 Sometimes that's somebody trying to be evil and putting in some typo squatting thing, or worse than that would be if the GitHub account of a maintainer got hacked and somebody published something malicious to the real package, right?

24:06 So however things might get into your dependencies, if something is going on bad there, it's better to know than to not know.

24:12 So this pip audit is all about that.

24:14 It audits Python environments as in virtual environments and dependency trees for known vulnerabilities.

24:20 So that's one of the things that's interesting is when you pip install things, you might be very good about saying, "Oh, I pip installed Flask and I pip installed Pandas.

24:30 So those are going into my requirements file or my pyproject.toml, but did you remember to pin their versions?" So that things like GitHub will say, "Your version is wrong." 'Cause if it just sees Flask and the recent version doesn't have a problem, it's not gonna tell you, but the one you have installed may. Also, there's the transitive closure of the dependencies.

24:49 So Flask depends on itsdangerous, which depends on, I don't know.

24:53 But if there's something down that chain that has a problem, you may have not put that in your requirements file and you may not be tracking it.

24:59 Like I might be paying careful attention to Flask, I might not care anything about itsdangerous, but that's where the problem is, right?
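
As a rough illustration of the pinning point, here is a deliberately simple check for requirement lines that lack an exact == pin. It is not what pip-audit does and it ignores many valid requirement forms (URLs, extras, environment markers); the sample file contents and version numbers are made up.

```python
def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with ==."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as -r or --hash.
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


sample = """\
flask==2.0.1
pandas
itsdangerous>=2.0  # pulled in by Flask
"""

print(unpinned_requirements(sample))
```

A check like this only covers what is written in the file. Transitive dependencies that were never listed are exactly the gap described above.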

25:06 - Yeah.

25:07 - So this tool from Trail of Bits, which is a security company, basically solves that problem.

25:11 And it lets you just type pip-audit.

25:15 And for me, it's a -r requirements.txt or whatever.

25:19 And from what I can tell, what it does is it will go create its own virtual environment where it one by one installs each package, looks at the things that come out of that process and then scans those.

25:32 So it's not just looking at, oh, you say you have Flask and that's 2.0.1, great, you're good to go.

25:38 It actually installs it because who knows what the setup.py process is doing and all those kinds of things.

25:43 And then it scans that and it gives you a report.

25:46 So for like Talk Python Training site, we have, I don't know, 30 dependencies or something.

25:52 and it sat there and it took, I don't know, probably took two minutes to go through and it said, "Everything's good to go." So that was good to hear, but it's pretty neat.

26:00 Really easy to use.

26:01 It's like an external tool, like Black or something.

26:04 So it's a very good candidate for pipx, and then it's just globally available to point at any environment.

26:09 What do you all think?

26:10 - Oh, this is so cool.

26:12 I heard about it 'cause one of my colleagues, Dustin Ingram, I think has been involved with it, or either it's his Twitter that I found out about it from, but he also has a really good talk from PyCon this past year about the supply chain vulnerabilities that's worth checking out if you're wanting to get an idea of why this is important.

26:31 - Yeah, yeah, we've highlighted a few examples over the years, but it's definitely something you wanna pay attention to.

26:37 And that's cool that Dustin was talking about it.

26:39 He works, I think he's still working with the PyPA and works on the pypi.org and all those kinds of things.

26:46 So very cool warehouse.

26:49 Brian, what do you think?

26:50 - I think this is cool, I'm gonna start using it right away.

26:52 This is nice.

26:53 - Yeah, I already used it once as well and everything seems good.

26:56 So here, look, they even called out Flask as an example.

26:58 It says here, on this particular version, there was this security vulnerability from 2019, and same with, I guess, Jinja, and all those were good.

27:07 But yeah, it gives you a nice description of what went wrong and like in this case, it's a denial of service attack and whatnot.

27:15 - So I definitely recommend people pin versions, definitely in your requirements.

27:20 But what do you all think of including hashes?

27:24 - I think that's something Dustin talked about in his talk.

27:26 And at the time I was like, oh, that sounds like a good idea.

27:30 And it's not something I've started doing yet.

27:32 - Yeah.

27:33 - Exactly, that's exactly what I think.

27:35 It sounds like a good idea and I'm not doing it yet.

27:37 So anyway.

27:38 - But that sounds like it's a me problem.

27:41 More than anything else.

27:43 - I also, it seems like a good idea.

27:45 You know, I might be missing a step.

27:47 It feels like the challenge you're going to run into there, where what you're protecting against is a man-in-the-middle attack, somebody can intercept what's happening with PyPI.org and sneak in some kind of broken hacked version.

28:03 I don't know.

28:04 I don't necessarily trust what goes into pypi.org, but I trust pypi.org.

28:08 So I'm not super, it's not my biggest worry.

28:12 There's like 10 other worries that make me have a hard time sleeping at night about running stuff on the internet that precede that.

28:19 So I haven't worried about it, but maybe I should.
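
For readers who have not tried hashes yet, here is a minimal sketch of the idea behind them: record the SHA-256 digest of a package file once, then refuse anything whose bytes no longer match. The function name `matches_sha256` and the sample bytes are made up for illustration, and this is not how pip implements its hash checking, only the underlying comparison.

```python
import hashlib
import hmac


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data equals the recorded digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())


# Stand-in for the bytes of a downloaded wheel or sdist (hypothetical data).
artifact = b"pretend these are the bytes of a wheel"
recorded = hashlib.sha256(artifact).hexdigest()

print(matches_sha256(artifact, recorded))                # unchanged file
print(matches_sha256(artifact + b"tampered", recorded))  # modified file
```

In practice the recorded digests live next to the pinned versions in the requirements file, so a swapped or corrupted download fails the install instead of running.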

28:22 - It's in the queue of things to worry about.

28:24 - Well, for instance, with this audit, you can pin your stuff and then have it checked every once in a while: install everything and check things.

28:35 - I don't see why it couldn't be a CI step.

28:37 - I was actually just gonna say that pip-audit, I need to bring it to my samples maintaining group to talk about who wants to implement it and how soon we're gonna do it.

28:46 - And whose pager rings when it finds a problem.

28:49 - Yes.

28:49 - Yeah, I'm listening to that, pagers from back in the day.

28:52 All right, well, that's all I got for that one.

28:55 We're off to Leah.

28:58 - I'm so glad you mentioned pinning requirements because that is actually, that's a great segue for managing samples for GCP.

29:04 So what I have open right now for Google Cloud is an example documentation page.

29:09 I picked Cloud Composer because it's what I work on.

29:12 And I wanna give an example of where this code lives that I'm talking about that I work with this group to maintain.

29:19 So this is a page that's about using a particular Airflow operator.

29:24 And if you were to scroll on it, you will see these code samples and they are all stored in GitHub and then embedded in our docs.

29:34 So you can click view on GitHub on any one of them and it will take you to the linked repository.

29:39 You can look at the history, look at everything in context.

29:43 So we have thousands of samples for all of the Google Cloud products, just for Python.

29:50 But we have them in other languages, too.

29:52 And they're located across hundreds of repos.

29:54 This happens to be one repo that has samples for multiple products.

29:59 But we have other repos where things are stored, too.

30:03 So to ensure that there's consistency and that my group of engineers, my colleagues, and I actually have time to do our work and function as humans outside of work too, we use a lot of automation.

30:17 So we use a lot of bots to do things like keep our dependencies up to date, check for license headers, auto assign PRs for reviewing, syncing repositories with centralized configurations, and even more, which is pretty great. And this is actually where the pinning requirements comes in. We very strongly believe in pinning requirements because it makes the samples easier to maintain and test against.

30:41 And it's easier to go back to the product and say, "Hey, you just pushed a release candidate for your product and it broke your samples." It wasn't supposed to. What gives?

30:52 Rather than finding out mysteriously when getting a customer issue.
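
As a rough illustration of the pinning policy described above, the sketch below flags requirement lines that are not pinned to an exact version with `==`. The helper `unpinned_requirements` and the sample requirements are hypothetical; this is not one of the bots mentioned in the episode, just a small check a repo could run on its own.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and option lines
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Hypothetical requirements file contents.
sample = """\
flask==2.0.2
jinja2>=3.0
requests
"""

print(unpinned_requirements(sample))
```

A check like this only says whether versions are pinned; keeping the pins current is the job of the dependency bot discussed next.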

30:57 So then to keep it up to date, we use a bot, and these are some pull requests recently opened by the bot for some dependencies. They get double checked by a human to make sure everything looks good, and then merged.

31:10 It's pretty great. And then we actually have a team of engineers in DevRel that works on making GitHub bots that we use, and that is totally open source.

31:20 You can see some of the ones that we use.

31:22 We have our license header one.

31:24 The sync repo settings allows us to have a single source of truth for our configuration for all of our Python repos.

31:31 And then it makes sure it gets synced across all of them.

31:35 It's pretty great. I really don't know how I would function without all of my bot friends.

31:40 This is super cool. I can just imagine how much work it is to keep all of those different things in sync. And I have worked recently on projects where I'm like, okay, I got to integrate this library, I'm going to go to the documentation, and I try to use the one or two functions that the whole thing does. And it's like, nope, that parameter doesn't exist, or you're missing some. You're like, come on, at least just keep the signature, right?

32:05 You know, it's, and of course it's something like *args, **kwargs.

32:09 It's not like, oh, I can just look in my IDE and see, oh yeah, it says it takes like, use security, use SSL, yes or no.

32:15 Like, no, it's unknown without the documentation, basically.
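
To make the signature complaint concrete, here is a small sketch comparing what tooling can learn from a catch-all signature versus an explicit one. Both functions and their parameter names (`host`, `use_ssl`, `timeout`) are invented for this example and do not come from any library discussed in the episode.

```python
import inspect


def connect_opaque(*args, **kwargs):
    """Accepts anything; the real parameters exist only in the docs."""
    return kwargs


def connect_explicit(host: str, *, use_ssl: bool = True, timeout: float = 10.0):
    """Names every option, so editors and inspect can show them."""
    return {"host": host, "use_ssl": use_ssl, "timeout": timeout}


# An editor's autocomplete sees roughly what inspect.signature reports.
print(inspect.signature(connect_opaque))
print(inspect.signature(connect_explicit))
```

With the catch-all version, a typo such as `use_sll=True` is accepted silently at the call site, which is why stale documentation hurts so much more there.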

32:18 Yeah.

32:19 This is awesome.

32:20 Thank you.

32:20 I think so too.

32:21 I'm very grateful to it.

32:22 And yeah, for our dependency bot, we do use an external one.

32:26 I know, I think GitHub is the one that does Dependabot.

32:28 We in particular use WhiteSource Renovate bot.

32:31 It's what we were using when I started and that works very well too and they're very nice and responsive to issues.

32:38 - Oh, that's fantastic.

32:39 Yeah, Dependabot was fairly new and then it was bought quite recently by GitHub so I can imagine you were all doing something before then.

32:47 - Probably, but I know I have friends who use that too and they're great.

32:50 Using a dependency bot, I would say, if you need a starter bot for any of them, the dependency bot is a great place to start.

32:58 - Yeah, that's fantastic.

32:59 I recently switched to pip-tools and pip-compile to generate my requirements with pinned versions and stuff.

33:04 - Nice.

33:05 - But before that, I was all about Dependabot telling me if something new was out and syncing that.

33:11 - Nice.

33:12 - Yeah.

33:13 - pip-tools rocks, I love pip-tools.

33:14 - Yeah, it definitely does.

33:15 Brian, there's a lot of cool automation here.

33:17 What do you think?

33:18 - I'm excited about looking through all these.

33:20 I love looking at bots, 'cause the whole idea of a bot is like the Unix philosophy of do one thing and do it well.

33:28 - Yes. - And I love that.

33:30 - And have something else do it and not you do it.

33:32 - Oh yeah, all of our bots are based on like, oh gosh, we're doing this one thing over and over and we're not doing it well because we're doing it manually.

33:42 So how can we like use automation to make sure we're doing it consistently and just save a lot of time.

33:48 - Like one of the things you've got in here that's shown right now is label sync.

33:51 So one of the interesting things about different groups' workflows is that they have different labels that mean different things.

33:59 But when you open a new repo, it doesn't have all those labels.

34:03 So being able to sync those labels across an organization.

34:06 - Like needs triage, good first contribution, all those kinds of things, right?

34:12 - Yeah, as I said, we have hundreds of repos just for Python and we use things like we have labels that say what API something belongs to.

34:23 And that helps with the auto assign bot to make sure that issues and PRs get routed to the right team.

34:29 Otherwise you're having a human do all that triage, which is fine, but doesn't scale super well in our use case.

34:36 - Yeah.

34:37 - And adding a label to an issue or something is really easy.

34:41 So having a bot that looks at label changes and just does an action based on that is a brilliant use of time.
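
The label-driven routing described here can be sketched in a few lines. Everything in this example is hypothetical: the label names, the team names, and the `reviewers_for` helper are placeholders, not the configuration or code of the actual auto-assign bot.

```python
# Hypothetical label-to-team table; real repos would define their own.
LABEL_OWNERS = {
    "api: composer": "composer-team",
    "api: storage": "storage-team",
}
DEFAULT_OWNER = "triage-rotation"


def reviewers_for(labels: list[str]) -> list[str]:
    """Pick owning teams from an issue's labels, falling back to triage."""
    owners = sorted({LABEL_OWNERS[name] for name in labels if name in LABEL_OWNERS})
    return owners or [DEFAULT_OWNER]


print(reviewers_for(["api: composer", "needs triage"]))
print(reviewers_for(["good first issue"]))
```

The point of the design is that a human only has to add a label; the lookup from label to owner stays in one table that can be synced across repos.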

34:48 - Yep.

34:49 Highly recommend.

34:50 - Yeah, fantastic.

34:51 This is great.

34:52 - And you have an install link next to all of them.

34:54 Does that mean I just click that and install it into one of my repos?

34:57 - I believe that is the intent.

34:59 And if it doesn't work, you should open an issue on this repo because my colleagues are very responsive.

35:04 - Fantastic.

35:04 - Yeah. - And we just need bots to generate bots.

35:07 - Honestly, if my colleagues told me they were working on that in this repo, I wouldn't be surprised, but I don't know.

35:12 (laughing)

35:14 - The meta bot.

35:15 - Yeah.

35:15 - Fantastic.

35:16 All right, well, how about some extras?

35:20 Brian, you got anything extra you want to share while we're here before we call it a show?

35:24 - No, just I'm fighting a cold and hopefully that'll all be over.

35:27 - Feel better soon.

35:29 - Yeah, maybe some sort of audit thing.

35:30 We'll check your health status.

35:31 We can run that against you.

35:33 Leah, anything else you want to share with us?

35:35 - Oh, I mean, on Twitter earlier, we were talking about HTTP status codes and it reminded me of still my forever reference for HTTP status codes is http.cat.

35:46 - Yes, http.cat is fantastic.

35:48 - It's so good.

35:50 - It is so good.

35:51 - Let's share a few non-funny things and then we'll mix that in with our joke.

35:54 - Please do.

35:56 - Fantastic, all right.

35:57 The first one has to do with, speaking of GitHub, another cool GitHub thing.

36:02 You know you could press a dot and that would do certain things.

36:05 This only works if you're signed in.

36:07 But now there's a command palette.

36:10 This idea of command palettes is becoming popular in UIs.

36:12 We've got it in VS Code.

36:14 We've got it in like Superhuman, the email.

36:17 And often you get them by pressing Command + K or Control + K, and now you have that for GitHub.

36:22 So if I were on a repo where I could do stuff to it, I could hit Command + K, and then it will say, what do you wanna do?

36:29 Search or jump to, I could go to pages, issues, I could look for, let's see, look for the app.

36:36 If I just type app, it'll search for those.

36:38 I could search for all sorts of things here and boom, it takes me and shows me all the apps.

36:43 Isn't that cool?

36:44 - That's so cool.

36:45 - Command palette, yeah, that's now a thing.

36:47 - That's beautiful.

36:49 - And you could just, I mean, no mouse.

36:50 I'm here, I'm in this repo, the top level, Command + K, down arrow two times, Enter, I'm on the issues.

36:56 - Oh my gosh, love to see it.

36:58 - Yeah, so that's a good one.

36:59 The other one, the other extra, is Python 3.10.1 is out, released December 6th, so as in two days ago.

37:08 - Wow.

37:09 - It's got a fun little snake with a hat on.

37:11 - Love it.

37:12 - That's really about 3.10.

37:13 So let me describe, I can cover the entire release for you.

37:16 So Python 3.10.1 is the newest major release of the Python programming language.

37:20 It contains many features and optimizations.

37:23 So now you all know what's in it.

37:25 (laughing)

37:26 It's very vague.

37:27 - It's really vague.

37:27 - It's very vague.

37:28 (laughing)

37:29 Apparently it has 300 commits of changes and fixes.

37:32 One thing I would, I wanted to know, are there security updates?

37:36 Yes or no?

37:37 Like, should I, like, should I install this if I'm curious or should I install this now before tomorrow?

37:42 because someone's gonna start poking around.

37:45 I would love if it would say that.

37:47 There's a great thing about the major features, but that's just 3.10, not the point release.

37:51 So anyway, still good.

37:54 - Yeah, we've been having fun making all of our GCP samples, making sure they're 3.10 compatible, which we're getting there.

38:01 It's all waiting for certain dependencies to be ready.

38:04 But a lot of fun, very exciting to see.

38:06 - Yeah, that's awesome.

38:07 - Well, you can look at the change log.

38:09 So if you look at the 3.10 change log, you can see 3.10.1 and stuff.

38:13 - I can.

38:14 - Okay, go up a little bit.

38:15 - Full change log there maybe?

38:16 - Yeah.

38:17 - Yeah, that's true.

38:18 I can go to the change log there and check that out.

38:20 - But having a security thing would be a good idea.

38:23 - Yeah, just like--

38:23 - A TLDR.

38:24 - Yeah, exactly, exactly, cool.

38:26 - Well, and also a lot of people didn't wanna try 3.10 until we got one patch release.

38:31 So now we have one patch release, so there's no excuse.

38:33 - So now it's quote safe.

38:35 I have been running it for a day in production and it seems okay.

38:38 I put it on one site to see if it would hang in there.

38:41 It seems fine.

38:42 So we're all good.

38:42 - Yeah.

38:43 All right, the samples that are using it are doing fine.

38:45 They've had passing periodic builds for a while.

38:47 - Yeah, fantastic.

38:48 All right, are you ready for some--

38:49 - No, I'm ready. - Cat jokes later?

38:50 - Sorry.

38:51 - Yes, I mean, we started our conversation off today talking about cats.

38:55 - It's true.

38:56 - Before we hit record.

38:57 So I feel like we should round that out, yeah?

38:59 - Definitely.

39:00 - So first of all, httpstatuses.com is a fantastic place to go learn about the real meaning or the official meaning, let's say, of status codes.

39:09 So for example, there's 100 continue, and if you want details, you click on that, and it actually pulls this all up.

39:14 Even shows you like the enum in Python, if you wanna use that. - Oh my gosh, I love that.

39:19 - Isn't that cool? - Yeah.

39:20 - It gives you the meaning like 100 continue, the initial part of a request has been received and has not yet been rejected by the server.

39:28 The server intends to send a final response eventually.

39:31 And so there's other ones like 200 okay, 201 created.

39:35 Let's see, what else should I point out?

39:36 304, cache not modified, 400, bad request.

39:41 - Bad request.

39:42 - 404, not found, 403, forbidden, 500, internal server error.

39:47 Yeah, 418, I'm a teapot.

39:49 And 502, bad gateway.
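
The Python enum mentioned a moment ago is `http.HTTPStatus` from the standard library. As a small sketch, this loop looks up the codes just listed and prints each one's numeric value, enum name, and standard phrase. It assumes Python 3.9 or newer, since that is when the 418 member was added.

```python
from http import HTTPStatus

# The status codes read out in the episode.
codes = (100, 200, 201, 304, 400, 403, 404, 418, 500, 502)

for code in codes:
    status = HTTPStatus(code)  # look up the enum member by numeric value
    print(status.value, status.name, "-", status.phrase)
```

Because the members are integer enums, a comparison such as `response_code == HTTPStatus.NOT_FOUND` works directly against a plain integer status code.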

39:51 Okay, so let's do yours first, Leah.

39:53 - Please.

39:54 - I'd put out this joke and you said, "This is good, but oh my goodness, cats." - Yeah, so when I was doing my computer science degree, a friend shared with me HTTP.cat when we were learning about HTTP status codes.

40:09 And if you go there, you will find one cat per HTTP status code representing what is going on.

40:16 And I'm not going to lie to you, in my professional career, I still use it as a reference because it's my favorite one.

40:22 And you can even, if you go to like HTTP.cat/200, it returns a JPEG of a cat that's like, okay.

40:29 Yeah, exactly.

40:30 And you can do that for all of the status codes.

40:32 - 201, the cat has walked through some wet cement and that's 201 Created, for the footprints.

40:40 Let's see what else we got in here.

40:42 Some good ones.

40:43 404, not modify, 304, sorry.

40:46 The 404, the cat is hiding under some wrapping, not found.

40:50 Fantastic.

40:51 Yeah, I love this.

40:52 I had not heard about this and it's glorious.

40:54 - Well, wait, is there a 418?

40:57 - There is. - Of course there is.

40:58 - I mean, teapot.

40:59 - A kitten in a teapot.

41:01 - Literally inside of a teapot.

41:02 All right, so I saw this joke by Breen, who is John Breen, and thought, that's really funny.

41:10 What he did is he put his own personal take on what the status codes mean, and I thought they were hilarious, but I thought, you know, let me take a shot at this as well, a little more Python focused.

41:20 So I, I'll link to my tweet, I put out this set of colloquial meanings of the HTTP status codes.

41:27 All right, you all ready for this?

41:28 - Yeah.

41:29 - Do it.

41:30 - So, 200 is, what's up?

41:32 All right. All good.

41:34 201, hello creator.

41:36 304, not modified or cached.

41:39 Same old, same old.

41:41 403, permission denied.

41:42 Get off my lawn, kids.

41:44 - That was my favorite.

41:46 - 404 is just, there's no message.

41:49 It's just not there.

41:50 Not that that's the message, but it's just blank.

41:52 500 is, we're bad at APIs.

41:54 - A little bit. - Server error.

41:55 400 is, you're bad at APIs.

41:57 - Yes.

41:58 - The real cardinal sin of APIs is 200 but in the body there's a JSON that says error and a reason.

42:06 - Oof.

42:07 - 200 but with error text.

42:09 We're really bad at APIs.
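
For anyone stuck consuming an API that does this, a defensive check might look like the sketch below. The helper name `is_hidden_error` and the assumption that failures appear under a top-level `"error"` key are illustrative only; real APIs that do this vary in how they shape the body.

```python
import json


def is_hidden_error(status_code: int, body: str) -> bool:
    """Detect a 200 response whose JSON body actually reports an error."""
    if status_code != 200:
        return False  # an honest error status is not "hidden"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON, so nothing to inspect
    # Assumption: the API signals failure with a top-level "error" key.
    return isinstance(payload, dict) and "error" in payload


print(is_hidden_error(200, '{"error": "quota exceeded", "reason": "limit"}'))
print(is_hidden_error(200, '{"result": [1, 2, 3]}'))
print(is_hidden_error(500, '{"error": "boom"}'))
```

The better fix is on the server side: return a 4xx or 5xx status so that clients and monitoring see the failure without parsing the body.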

42:10 - Yeah.

42:11 - 502, we're bad at deployment or DevOps because part of the infrastructure can't get to the other part.

42:17 And Brian's favorite, 418.

42:18 Is it already April again?

42:20 - Yeah.

42:21 - Because the reason is that was actually put into the spec as an April Fool's joke and they left it.

42:27 I'm a teapot.

42:28 - I love that they left it.

42:29 - I do too.

42:30 - I do too.

42:31 - It's like import this, just stuff that's fun that should just always be there.

42:37 - Yeah. - Yeah.

42:37 - What's the harm?

42:39 What's the harm?

42:39 Just leave it there.

42:41 Anthony in the live stream has some feedback for you, Leah.

42:44 "HTTP status codes using cats." Well, I never.

42:48 - I mean, where there's internet, there is cats, no?

42:51 - Oh, of course.

42:52 - Why we created the internet in the first place is for cats.

42:55 - Exactly.

42:56 All right, well, I think that's it for our show.

42:59 Brian, thanks for being here as always.

43:01 Leah, thanks for joining us.

43:02 - Thank you for having me.

43:02 Thanks for listening, everyone.

43:04 - Yeah, you bet.

43:05 See y'all later. - Bye.
