#142: There's a bandit in the Python space

Published Tue, Aug 6, 2019, recorded Wed, Jul 31, 2019

Special guest: Brett Thomas

Sponsored by Datadog: pythonbytes.fm/datadog

Brian #1: Writing sustainable Python scripts

Vincent Bernat
Turning a quick Python script into a maintainable bit of software.
Topics covered:
- Documentation as a docstring helps future users/maintainers know what problem you are solving.
- CLI arguments with defaults instead of hardcoded values help extend the usability of the script.
- Logging. Including debug logging (and how to turn them on with CLI arguments), and system logging for unattended scripts.
- Tests. Simple doctests, and pytest tests utilizing parametrize to have one test and many test cases.
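
A minimal sketch of how those pieces can sit together in one small script. It is not taken from the article; the script's purpose, the logger name, and the `--precision` and `--debug` options are all hypothetical and chosen only for illustration.

```python
"""Convert byte counts into human-readable sizes.

Hypothetical example script: it exists only to show a docstring,
CLI arguments with defaults, debug logging, and a doctest together.
"""
import argparse
import logging

logger = logging.getLogger("humansize")  # hypothetical logger name


def human_size(num_bytes, precision=1):
    """Format a byte count for people.

    >>> human_size(512)
    '512 B'
    >>> human_size(2048)
    '2.0 KiB'
    """
    if num_bytes < 1024:
        return f"{num_bytes} B"
    return f"{num_bytes / 1024:.{precision}f} KiB"


def parse_args(argv=None):
    """Defaults live here instead of being hardcoded in the logic."""
    parser = argparse.ArgumentParser(description="Print human-readable sizes.")
    parser.add_argument("sizes", nargs="*", type=int, help="byte counts to convert")
    parser.add_argument("--precision", type=int, default=1, help="decimal places")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The --debug flag decides how chatty the logging is.
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.WARNING)
    logger.debug("parsed arguments: %s", args)
    lines = [human_size(size, args.precision) for size in args.sizes]
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    main()
```

Saved as a file, the doctests in `human_size` can be checked with `python -m doctest` followed by the file name.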

Brett #2: Static Analysis and Bandit

Michael #3: jupyter-black

Black formatter for Jupyter Notebook
One of the big gripes I have about these online editors is their formatting (often entirely absent)
Then the extension provides
- a toolbar button
- a keyboard shortcut for reformatting the current code-cell (default: Ctrl-B)
- a keyboard shortcut for reformatting whole code-cells (default: Ctrl-Shift-B)

Brian #4: Report Generation workflow with papermill, jupyter, rclone, nbconvert, …

Chris Moffitt articles
Automated Report Generation with Papermill: Part 1
Automated Report Generation with Papermill: Part 2
Jupyter Notebooks used to create a report with pandas and matplotlib
nbconvert to create an html report
Papermill to parametrize the process with different data, and execute the notebook
Copy the reports to shared cloud folders using Rclone.
Set up a process to automate everything.
Hook it up to cron to run regularly
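
A rough sketch of the command chain those steps describe. It only assembles the commands and prints them; it does not run anything. The notebook name, the `month` parameter, and the `shared:reports` rclone remote are hypothetical placeholders, not names from the articles.

```python
import shlex


def build_report_commands(month, notebook="sales_report.ipynb", remote="shared:reports"):
    """Return the commands for one report run, without executing them."""
    executed = f"sales_report_{month}.ipynb"
    rendered = f"sales_report_{month}.html"
    return [
        # Run the notebook with a parameter injected by papermill.
        ["papermill", notebook, executed, "-p", "month", month],
        # Render the executed notebook as an HTML report.
        ["jupyter", "nbconvert", "--to", "html", executed],
        # Copy the rendered report to a remote already configured in rclone.
        ["rclone", "copy", rendered, remote],
    ]


for command in build_report_commands("2019-07"):
    print(shlex.join(command))
```

Each list could be handed to `subprocess.run(command, check=True)` once the tools are installed, and the script that does so is what a crontab entry would call on a schedule.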

Brett #5: Rant on time deltas

datetime.timedelta(months=1) # Boom, too bad.
Use: https://dateutil.readthedocs.io/en/stable/
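
A standard-library sketch of the problem and of what month arithmetic has to do. The `add_months` helper is hypothetical and written only for illustration; with dateutil installed, `some_date + relativedelta(months=1)` (from `dateutil.relativedelta`) covers the same ground.

```python
import calendar
from datetime import date, timedelta

# timedelta has no notion of months, so this raises TypeError.
try:
    timedelta(months=1)
except TypeError as exc:
    print(f"timedelta refuses months: {exc}")


def add_months(start, months):
    """Move a date forward by whole calendar months.

    The day is clamped to the last day of the target month,
    so January 31 plus one month lands on the last day of February.
    """
    month_index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


# "A month and a day out", as in a subscription expiry check.
start = date(2019, 7, 31)
print(add_months(start, 1) + timedelta(days=1))
```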

Michael #6: How — and why — you should use Python Generators

by Radu Raicea
Generator functions allow you to declare a function that behaves like an iterator.
They allow programmers to make an iterator in a fast, easy, and clean way.
They only compute it when you ask for it. This is known as lazy evaluation.
If you’re not using generators, you’re missing a powerful feature
Often they result in simpler code than with lists and standard functions
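
A small sketch of the difference, not taken from the article. The function names and the in-memory "file" are made up for illustration.

```python
import io
from itertools import islice


def read_records_list(lines):
    """Eager version: processes every line and builds the whole list."""
    records = []
    for line in lines:
        records.append(line.strip().upper())
    return records


def read_records(lines):
    """Generator version: processes one line each time a value is requested."""
    for line in lines:
        yield line.strip().upper()


# An in-memory stand-in for a large file.
source = io.StringIO("alpha\nbeta\ngamma\ndelta\n")

# Only two lines are read from the source, because only two are requested.
first_two = list(islice(read_records(source), 2))
remaining = source.read()

print(first_two)
print(repr(remaining))
```

The rest of the input is still unread after the two records are taken, which is the lazy evaluation the notes refer to.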

Extras

Brian:

PyPI now supports uploading via API token
- also on Test PyPI

Michael:

Chocolatey package manager on Windows via Prayson Daniel
GvR’s Next PEG article

Jokes

A good programmer is someone who always looks both ways before crossing a one-way street.

(reminds me of another joke: Adulthood is like looking both ways before crossing the street, then getting hit by an airplane)

Little Bobby Tables

Episode Transcript


00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 142, recorded July 31st. I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And I'm Brett Thomas.

00:12 And yes, we have a special guest this time, third co-host, Brett Thomas. Brett, welcome to the show.

00:17 Thank you very much.

00:17 Yeah, it's great to have you here. I also want to say thank you to Datadog for sponsoring the show.

00:22 Check them out at pythonbytes.fm/datadog. More on that later.

00:26 Brett, you want to just quickly tell everyone a little bit about yourself before we get into the topics?

00:29 Yeah, sure. I'm the chief technology officer of a company called FasterThanLight.dev.

00:33 And we do static code analysis tooling for, you know, the SaaS model to help you analyze your code.

00:39 And I'll talk a little bit about that later.

00:40 All right. Awesome. Sounds good. Well, happy to have you here.

00:42 Brian, you want to kick us off?

00:44 Sure.

00:44 It's great to have sustainability. And like, it's almost like Earth Day for code.

00:48 And like, you always want to just have that in mind, right?

00:51 Earth Day for code?

00:52 Sustainability. Come on. You got to roll in the air.

00:55 Yeah, I'm really worried about what climate change is going to do to my code, but...

00:59 I ran across this article called Writing Sustainable Python Scripts by Vincent Bernat.

01:05 And most of my time, I'm not writing... I mean, I don't really think about it too much for little scripts, but...

01:10 Or little helper utilities. We've got lots of them around work.

01:13 But there is an issue that I think this is a reasonable thing to talk about is if it's only going to be a short-lived script, yeah, we don't really care about it too much.

01:21 But if it's going to be a long-lived script, then we do need to care about it.

01:27 And there are things you can do to make it a little bit more maintainable.

01:31 And I like the things he put down.

01:33 And then the most obvious one, which some people forget, is throwing a docstring at the top of the file to let people in the future know what problem you're trying to solve and kind of describe what it is.

01:45 Instead of doing like hard-coded stuff, you can easily add some command line arguments with...

01:51 Defaults can be sort of hard-coded defaults, but having some way to make the script useful.

01:57 And then he goes into adding logging.

02:00 And I think it's kind of neat.

02:02 He includes how to do debug logging and hook that into the command line argument system, which is kind of a cool trick.

02:09 And then also for unattended scripts, being able to log to system logging.

02:15 And then finally, finishing it off with adding some simple tests to make sure that your code does what you think it does.

02:22 It's just a nice little article.

02:23 Yeah, it's easy to forget about maintaining these little scripts because like they're kind of throwaway, but actually then they're throwaway until they're not.

02:31 Go ahead.

02:32 And the more throwaway you think it's going to be, the more it's going to be the longest-lived part of your system, of course.

02:36 Of course.

02:38 Yeah.

02:38 Yeah, I have a bunch of these, and I find that I even forget that they exist.

02:43 Like, I'll be doing something super painful.

02:45 And I'm like, wow, this is really not fun.

02:47 Like, I've got to like rename all these files based on like certain stuff out of the database or something.

02:51 Like, man, I should automate this.

02:53 Like, wait a minute.

02:54 I think I did automate this.

02:55 And I'll go back and look.

02:56 I'm like, yeah, like I can just run this thing in the command line.

02:59 And like 100 files are properly renamed.

03:01 Why did I spend the last five minutes?

03:03 Why do my hands hurt now?

03:04 You know, like, and so the way I got around solving that problem, often that involves like setting up a virtual environment, activating it, then running that script because it has dependencies.

03:14 Like I mentioned the database or whatever, the models and so on.

03:17 So what I'll do is I'll create an alias in my shell.

03:21 And then I just run the alias.

03:22 And so if I go and just like, if I forget, I'll go look at my aliases.

03:25 I only got like 50 or something.

03:26 Like, it's got to be one of these, which there it is.

03:29 And then I run it again.

03:30 And that's like, that's my system until my computer gets formatted.

03:34 Then I have to start from scratch.

03:35 That's the great thing.

03:36 I actually went through and audited my, like, .profile.

03:38 I don't know, six months ago.

03:39 And I swear I've hauled around the same .profile for, I think, two decades.

03:43 Right.

03:43 You know, it's like there's aliases to do things on the systems that I haven't had access to for 20 years.

03:49 I'm like, what?

03:49 Okay, maybe, maybe it's time to clean it up.

03:51 A little pruning.

03:51 Yeah.

03:52 But no, this is a great, a great article, Brian, to like remind people just to do these little simple things.

03:58 I don't know.

03:58 Maybe there's some threshold, right?

03:59 You just like play around, you do it once.

04:01 But like if you use it a third time, then you should go back and like refactor and clean up a little.

04:05 I don't know.

04:06 Sometimes logging doesn't really make sense.

04:07 And you can just, and sometimes testing.

04:11 I mean, I'm shocked that I'm the person to say this, but sometimes manually testing stuff is fine.

04:17 If you're going to notice when it breaks, it works.

04:19 Yeah, it definitely depends.

04:20 Well, what do we got next?

04:21 Well, so the first thing I would like to talk about is a static code analysis using Bandit.

04:27 And for anyone who's not aware of what static code analysis is, static code analysis is basically running a computer program on your computer program, right?

04:36 I've actually recently heard somebody analogize it.

04:38 It's like spell check for a computer program.

04:40 The reason it's called static code analysis, it's that separate from what we usually do in testing, which is dynamic, right?

04:46 We run the program.

04:47 We give it various inputs.

04:48 We see if it does what we think it should.

04:50 But static code analysis is about the idea of examining either the source code or the object code and saying, okay, we're going to look for patterns that look troublesome, right?

05:00 And so, for example, one thing static code analysis might help you look for might be SQL injection attacks, right?

05:07 Where you've got unbound SQL variables, which is, you know, just an absolutely perennial security problem that's, you know, always in the OWASP top three.

05:14 Select star from quote plus input name.

05:17 Exactly, right?

05:18 I mean, and there's actually all kinds of great database performance reasons you want to bind your variables anyhow.

05:22 But if you don't know, definitely when you're doing SQL statements, you should be using placeholder variables and binding them instead of actually interpolating strings, especially if those strings come from some random person on the internet, as in the, you know, the famous xkcd Little Bobby Tables cartoon.
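
A minimal sketch of the difference between interpolating and binding, using the standard library's `sqlite3` module. The table, the data, and the hostile input string are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])

# Input from "some random person on the internet".
user_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's meaning.
unsafe_sql = "SELECT name FROM students WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder binds the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), "rows leaked by the interpolated query")
print(len(safe_rows), "rows returned by the bound query")
```

The string-built form is the kind of pattern a static analyzer can flag without running the code; Bandit's documentation lists a hardcoded SQL expressions check (B608) that targets it.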

05:38 And so anyway, so Bandit is an open source tool that you can just grab.

05:43 And in fact, I believe you can just pip install it and it will, you run it on your code and it tells you things that are problematic.

05:49 Now, of course, it doesn't know what your code's doing.

05:51 Sometimes, you know, one of the rubs on static code analysis tools is that they tend to false positive a lot because they don't understand the context, right?

05:59 They say, okay, well, this kind of pattern tends to be kind of dangerous, but they don't know that the way that you're using it is absolutely fine.

06:06 You know, another just kind of hard example that I got out of it one time is Flask.

06:11 You know, if you're making a Flask app and you turn debugging on when you create the Flask object, which seems like a reasonable thing to do, it actually enables a debug console.

06:20 If you know the right place to go to on your web app, that allows you to execute arbitrary Python code unauthenticated.

06:27 You probably don't want to do that on the internet.

06:29 That is super bad.

06:30 And Django has the same problem.

06:32 And like there's tools that search for that.

06:34 It's a big deal.

06:35 You don't want to publish.

06:35 Yeah, exactly.

06:36 I mean, and that's just one of literally dozens or scores of things that Bandit can help you find, you know, those kinds of errors in your code.

06:43 Does Bandit do like Pythonic code?

06:45 Let's say this numeric for should be just like an enumerate loop or something like that.

06:52 Yeah, there's a number of different things that you can do that, you know, that kind of like range from, oh, hey, this is not a best practice or, you know, there's not this, you know, doesn't go with, you know, the correct coding style, for example.

07:02 And certainly if you are super into that, I think that can be a great resource for you.

07:08 You know, my personal focus on it, I think, tends to be more from the security side of things.

07:12 Somebody who has been running sensitive web apps on the Internet now for 20 years, that really is just in my DNA.

07:18 My prior position, actually, I was responsible for keeping several hundred million credit cards safe at a PCI DSS level one service provider called Vindicia.

07:27 That will make you a little bit paranoid.

07:30 Yeah, I mean, I don't know if you see the gray hairs there, but yeah, I sleep a lot better now that I don't have that weight on my shoulders, I have to admit.

07:39 I can imagine that. One of the most stressful bits of code I wrote was the credit card processing system for this company where the individual purchases would be like three or four thousand dollars.

07:48 And I'm like, yeah, better not mess it up.

07:50 That's what you're talking about is a whole nother level.

07:52 So static code analysis and Bandit can help like find those types of problems as well there.

07:57 Absolutely.

07:58 I'm a fan of these things.

07:59 You know, it's the problem is you kind of got it.

08:01 And sometimes at least on large projects, you got to start using them early enough.

08:05 Yeah.

08:06 Because otherwise.

08:07 Absolutely.

08:08 You've just got you've got a huge thing to go through.

08:10 And actually, I got to admit, my company, what we're doing is my new company, Faster Than Light.

08:14 We're working on packaging this stuff up as a software as a service and definitely helping you manage that.

08:21 Like, oh, hey, you know, I've already looked at this and it's not a problem kind of thing for large projects is one of the things that we're really trying to work on fixing for people.

08:28 Right.

08:29 If you can put the little code comment or whatever that says, please suppress this warning here because we reviewed it.

08:34 Actually, a customer I was talking to, you know, it's like they've got an Android app they're shipping.

08:38 You know, well, the Android scanner and this is obviously not Python, but the Android scanner is like, whoa, you've got an API key, you know, in your code.

08:45 Ah, right.

08:46 You know, and it's like, well, yeah, but it's an API key that has been carefully restricted so that it can only make one read only call.

08:52 And so, like, it's okay, right?

08:55 You know, and it's, you know, it's kind of like having, you know, kind of like that extra nervous person who occasionally freaks out and you're like, no, no, no, no, really.

09:02 It's okay.

09:03 It's not, it's, this isn't a problem.

09:04 So.

09:05 Cool.

09:05 Yeah.

09:05 All right.

09:05 So Bandit, pretty awesome.

09:06 I know it does a lot of good stuff for Python.

09:08 And they actually list out all the stuff they check on their site, right?

09:11 Yeah.

09:11 There's definitely a, you know, big document you can get with all their tests.

09:14 So.

09:14 Super.

09:14 All right.

09:15 So the next one I want to talk about is black.

09:18 And Brian, you're a fan of black, right?

09:19 Yep.

09:20 Brett, do you use black?

09:20 Do you know this code formatter?

09:21 No, I'm not familiar with it myself.

09:23 I'd love to hear.

09:23 It basically takes what, like, Flake8 does and some of these other linting tools, a little bit like you were talking about.

09:30 Instead of saying, this file is too long, you should change it.

09:34 This variable name is unused or this indentation is not right or whatever.

09:39 Instead of just giving you a bunch of warnings, it just rewrites your code to conform to its standard.

09:44 And as long as you are willing to live with a standard, a lot of people put it as, like, a Git pre-commit hook or something like that.

09:51 And then just the whole team is just straight up on this type.

09:55 So it's really, really popular these days over the last year or so.

09:58 However, one of the things that's super annoying is there's a lot of places where you write code where you cannot apply these kind of tooling.

10:06 And a lot of it is in places like Jupyter Notebooks or online editors.

10:11 And you're like, well, you can type your code in here.

10:13 But it's like, well, but I can't format my code in here.

10:15 And I'm doing space a lot to line up stuff.

10:18 And it's making me crazy.

10:19 Like that kind of stuff, right?

10:20 So if you use Jupyter, there's a thing that came out called Jupyter-Black.

10:25 Jupyter-Black.

10:26 And it's a super simple Jupyter Notebook plugin that gives you a hotkey to apply black formatting to your Jupyter Notebooks online.

10:34 Does that work with the Flask debug console?

10:37 No, I don't think so.

10:38 I don't think so.

10:39 But yeah.

10:39 So I think this is super helpful for the data scientists who are out there writing code.

10:45 Or maybe even if you're a teacher and you're getting other people's code, you're like, I can't look at this.

10:49 What is this?

10:50 Like these freshmen.

10:51 Control B.

10:53 Okay.

10:54 Freshman-itis is gone.

10:55 I can read this.

10:56 It's properly formatted like a professional.

10:58 Now let's review it.

10:59 Things like that.

11:00 I just think it really brings a cool tool to a new place.

11:03 And I'm sure it would be really welcome.

11:04 You answered my question that I had right away is, does it format just the current cell or the whole thing?

11:10 And yeah, there's two different keyboard options.

11:13 Control B and Control-Shift-B that do both of those.

11:17 Yeah.

11:17 Control-Shift-B is probably the one you want.

11:19 But there's also a little toolbar button if you are not a hotkey person.

11:23 So yeah, it's super simple.

11:24 It just plugs in like a standard Jupyter notebook extension, which I don't really do a ton with.

11:29 But it sounds really easy to install it.

11:31 And then the only other requirement is that you have Black installed on the system or the virtual environment.

11:36 Because it has to like basically shell out to Black and figure out what's happening.

11:40 All right.

11:40 Before we get on to the next topic, though, let me just quickly tell you about Datadog.

11:45 So this episode, like many of ours, is sponsored by Datadog.

11:50 They're a cloud monitoring platform built by engineers for engineers, like all of us, right?

11:55 And so what it does is it auto instruments Django, Flask, Postgres, like MongoDB, AsyncIO, all these different things.

12:04 And will allow you to trace your requests across servers, across processes, and bring you basically a holistic view of like what is the request doing.

12:14 Because it's great to like profile your Python code, but there's a whole lot of other stuff happening.

12:18 That's maybe where most of the stuff is happening, right?

12:21 In the database or in the framework or whatever.

12:23 And so this brings it all together.

12:24 And it integrates with over 350 technologies.

12:27 Hadoop, Redis, all the good stuff.

12:29 So check them out.

12:30 They've got a free trial, pythonbytes.fm/Datadog.

12:34 And you also get a sweet Datadog t-shirt.

12:36 So like that alone makes it worth it, I think.

12:38 All right, Brian, what's this next one you got here?

12:40 Well, I'm glad that we checked ahead of time and make sure that we've had two Jupyter articles.

12:44 Yeah, right next to each other.

12:45 That's perfect.

12:46 Yeah.

12:47 This is involving Papermill.

12:49 And I think I'm pretty sure we've talked about Papermill before, at least briefly.

12:52 We covered Papermill live at PyCon.

12:55 Oh, yeah, right.

12:56 Yeah.

12:56 So I included this because it's a two-part article series that talks about the entire workflow,

13:01 which that's where it seemed it looked pretty interesting to me.

13:04 So Chris Moffitt wrote part one and part two of Automated Report Generation with Papermill.

13:10 So it's taken Jupyter notebooks that use Pandas and Matplotlib to create a report.

13:16 And then using nbconvert to take that and create an HTML report.

13:21 And then go through and use Papermill to parameterize the input of this entire process and to set up execute blocks.

13:30 And then he completed the process, talked about the rest of the workflow,

13:34 about using a new tool that I've never heard of before, which is called Rclone,

13:38 to clone different cloud directory services and keep the same directory on lots of different cloud services.

13:45 And then how to, if you're in a Linux box using Cron to set up a regular process for this whole thing.

13:53 I mean, the example is a simple thing like a monthly sales report that you want to have just go out.

13:59 Somebody can pop in the data in a spreadsheet or something like that, but then all the reporting and the data analysis and everything can happen afterwards.

14:08 And just going through the, from the top to the bottom, the whole workflow, I thought was a real nice touch.

14:13 I love this because it takes the boring stuff that you don't want to do.

14:18 And it just hands it over to the computers in like a beautiful way, taking some of the really new and nice tools,

14:23 Papermill, Jupyter, and so on.

14:25 And it just automates it all.

14:27 So instead of like every Friday, you're like, oh, there's that two hours of like copying data from system to system for the report.

14:34 It's like, it just shows up in the email, right?

14:36 It's just on the internet or whatever, right?

14:39 This is really cool.

14:40 And just to summarize Papermill, basically it turns Jupyter Notebooks into like functions or command line style applications.

14:47 They can be called.

14:48 You provide data to it, inputs, it runs, and then output comes out.

14:52 So you have the general analysis report.

14:54 You feed like, hey, it's from July 1st to July 31st.

14:58 Drop the files here.

15:00 Go.

15:00 Yep.

15:00 It's nice.

15:01 Yeah, very cool.

15:02 Brett, you got any things like this that you guys got to do at Faster Than Light?

15:04 We are still new enough.

15:06 I've got to admit, I was literally was just thinking to myself this morning about how I really need to start writing some nightly reports that tell me what all everybody was doing in the system yesterday.

15:14 So we actually still haven't completely fully launched.

15:18 So thankfully, there isn't too much yet that I need to know about that's, you know, it's like, oh, yeah, how many tests did I run yesterday?

15:24 You know, is kind of what I would be getting here.

15:27 But yeah, that's definitely something that I'll be looking at as we start to do our release.

15:33 Yeah, you're in that beautiful place where the molasses of real life day-to-day business operations hasn't hit you yet.

15:40 You can go quick and build things.

15:41 Absolutely.

15:42 And I got to say, it is so strange having run four nines plus environment for, you know, a decade and a half to all of a sudden be like, oh, yeah, you know, production's down.

15:52 Nobody noticed, right?

15:53 You know, because you haven't actually done anything yet.

15:55 So it's a different world.

15:56 It's a different world.

15:57 Cool.

15:57 All right.

15:58 Well, you got the next item.

15:59 Tell us about it.

15:59 Actually, it's a little bit of a rant for me because it was something that was just kind of surprising to me given how much is in the Python standard library. You know, one of the things that's really great about Python is this: hey, how do I do this thing?

16:13 And it's like, oh, wow, it's in the standard library and you just call a function and it works.

16:18 And that is that quite a while ago, I ported an application from a Postgres database to an Oracle database, which I know is a ridiculously stupid thing to do, but a customer was paying us a lot of money to do it.

16:29 And I discovered Postgres's interval types, which, you know, is just, you know, when you're doing a timestamp and you go, okay, I want to add a week to this or a day or whatever.

16:39 There's an SQL type that's called an interval where you can just have arbitrary amounts of time that you can add to a timestamp.

16:45 Well, so Postgres lets you do anything completely arbitrary.

16:48 Well, it turned out, Oracle didn't.

16:49 And it took me a little while to kind of understand why.

16:52 And I actually ended up having to write my own interval parser.

16:54 So I really had to, like, understand how all this stuff works.

16:56 And it turns out that all date intervals really, at the end of the day, boil down to a number of seconds or a number of months.

17:04 It's one of those two things.

17:05 Because if you think about it, a week is a number of days and a day is a number of hours and an hour is a number of minutes and a minute is a number of seconds.

17:14 You know, if you've been a developer for any length of time, you probably know off the top of your head that there are 86,400 seconds in a day because it just comes up all the time and you remember it.

17:24 But the other two is months and years.

17:27 And a month is not a constant number of seconds.

17:29 It's anywhere from 28 to 31 days.

17:32 And depending upon how many that is, that actually varies by what year it is.

17:36 And it's actually really kind of difficult to tease all that apart.

17:39 And so it actually turns out if you use the datetime library that comes with Python and you use the timedelta object that comes with it and you try to set an offset of months or years, it just says, sorry, can't do that.

17:52 Right.

17:52 You can't do, I can't tell you what a month from now is.

17:55 So that was just really surprising to me and really kind of frustrating because actually the reason why I needed this is actually I was setting up our subscriptions service.

18:04 Although I will say, of course, I'm not actually hanging on to the credit cards now, but I wanted to be able to test it.

18:09 Right.

18:09 And so I want to say, okay, if we, you know, if you're, if we'll start your subscription, let's go a month and a day out and see if your subscription is still active because it shouldn't be.

18:17 And said, oh, wow.

18:18 Okay.

18:18 You can't do that.

18:19 So there is another package out there that you can just get off of pip.

18:22 It's the dateutil package.

18:24 And it has a timedelta replacement called relativedelta that just supports months and years and works very similar to the timedelta thing.

18:32 So if you've got the problem of, oh, hey, I want to know what it is a month from now or 10 years from now, that'll let you calculate those timestamps.

18:40 Because, of course, the problem is that parsing that stuff, like when you write the library, when you're advancing months, I mean, it's got to be text based.

18:47 Right.

18:47 You got to go, okay, I'm going to turn this thing into a date string and then I'm going to increase the number of months and see if I've overflowed the number of years, increase the number of years, and then turn it back into a timestamp.

18:59 So it's a much slower process from a calculation perspective.

19:03 And I suspect that's probably why the original Python library doesn't just support it out of the box.

19:08 Yeah.

19:08 But that is frustrating, right?

19:09 Because a totally reasonable thing.

19:10 And actually, the hardest of all things to compute is how many months from now is it?

19:15 Close to that.

19:15 It's how many years, right?

19:16 Yeah.

19:17 It can be just.

19:17 And of course, actually, we're running up, we're going to run up against the Unix 2037 thing right now.

19:22 Which, by the way, I saw someone point out, I hadn't actually thought about this.

19:25 There are places, businesses that are trying to do things like generate a certificate that expires in 20 years.

19:30 Like, you can actually run into that problem in your code now if you're still running on, if you're not running on 64-bit native code.

19:37 Which is actually, as I understand, a bit of a problem still in Linux.

19:40 The Linux kernel is not doing a good job of handling all of that.

19:43 That's going to be an increasing problem as we get closer to that barrier.

19:47 Sounds like a great opportunity for consultants.

19:49 Oh, yeah.

19:50 I'm sure.

19:50 I mean, it would not surprise me if I, in my career, as a year 2037 consultant, right, as one of the last people who still knows how to program those old systems.

20:00 Yeah, this is a good recommendation.

20:01 I like the dateutil library.

20:03 I love the parsing for it.

20:06 I like parsing date times.

20:08 It's annoying in Python.

20:10 Parse, ST, I can't remember even.

20:13 Because I stopped using it.

20:14 Because I just import parse from dateutil, and I'm good, right?

20:17 Yes.

20:18 It seems to be able to guess the format that you're going for really, really well.

20:22 I was doing a bunch of coding recently with DynamoDB, and it was just like the timestamps you get back out of the Boto3 library is just like one character off of what you can natively parse.

20:33 It was just like, why does this have to be so painful?

20:36 Did dateutil take it?

20:38 Yeah, no, a date I was getting out of Boto3.

20:39 I don't remember the exact details, but the date I was getting out of Boto3 had like one.

20:43 It was like, so I literally am doing like a text substitution on a particular character to turn it into something else so that it'll then parse it.

20:51 Cool.

20:51 I do that on XML to get rid of the namespaces all the time.

20:53 All right.

20:55 So, Brian, before we've spoken about understanding the language and some of the core language features, and I just want to come back to this topic a little bit and focus on Python generators.

21:06 So, there was a cool article recommended to us by one of the listeners.

21:10 It's not super new, but we haven't covered it, so I think it's totally relevant.

21:14 It's an article by Radu Raicea.

21:16 Hopefully, I'm getting that roughly right.

21:18 And it's called How and Why You Should Use Python Generators.

21:21 So, basically, it talks about what are generators, how you should use them.

21:27 And I wanted to cover this because I feel like there's a lot of people that come from other languages, and it's both a blessing and a curse of Python that people can come from C or Java or JavaScript or other languages and just go,

21:40 Oh, this is simple.

21:41 I learned it in a weekend.

21:42 Let me write my code now.

21:43 Right?

21:44 And they're doing numerical for loops, and they're doing, you know, like tons of stuff that is not really Pythonic, right?

21:49 Right.

21:49 And they've got 27 Stack Overflow tabs open for, like, oh, how do I open a file again?

21:54 Right.

21:55 Yes, exactly.

21:55 And so, a lot of languages don't have this idea of generators or coroutines, which are just amazing, right?

22:02 Like, you've got a function.

22:03 It's going to process some huge amount of data.

22:05 Maybe it needs to read a 10-gigabyte file and parse it line by line.

22:09 Well, if you write that as a generator, if you only pull 10 lines from it, it only reads 10 lines.

22:15 Or even if you've got to go through all of it, it only loads one line into memory at a time.

22:19 And often, the implementation of the generator using the yield keyword is actually simpler, shorter, cleaner than if you were to try to build it up into a list and then return that list and all those kinds of things.

22:28 And code that doesn't exist doesn't have bugs in it.

22:31 That's a good point, yes.

22:32 Code that doesn't exist does definitely not have bugs in it.

22:35 So, this article is good if these generator ideas are new to you.

22:39 It talks about the lazy evaluation, which is really important to understand, and gives you a couple of simple examples.

22:45 It's not super deep.

22:46 So, if you're new, read through it.

22:47 If you really know it pretty well, you probably won't gain a whole lot from it.

22:51 But it's something you could shoot over to your coworkers.

22:52 You're like, why did you write this code?

22:54 Please don't do that again.

22:55 Use this.

22:55 Yeah, this is good.

22:57 Even myself, an experienced Python person, there's certain times where I'm like, why didn't I think of using a generator earlier?

23:04 Yeah, absolutely.

23:04 I mean, you don't always know that that's really the best path.

23:09 You just start writing the code.

23:10 You're like, I'm going to have a list.

23:11 I'm going to put stuff in the list.

23:12 I'm going to do this.

23:13 I'll get a dictionary, whatever.

23:14 Like, wait a minute.

23:14 Actually, I didn't need any of that.

23:16 I could do it way better, right?

23:17 So, you kind of got to have it in mind.

23:18 I would love it if tools like PyCharm and VS Code had a button to, like, refactor to generator, right?

23:24 Convert this list and return the list into, like, a generator.

23:27 It probably doesn't exist because, you know, like, the way you process the results has, you know, some kind of effect, right?

23:33 You can't, like, go through a generator twice, but you can go through a list twice or things like that.
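
A minimal sketch of that difference, assuming a made-up `squares` generator: the list can be summed twice, while the generator is empty on the second pass. That is one reason an automatic "refactor to generator" could silently change behavior.

```python
def squares(limit):
    """Yield square numbers lazily."""
    for n in range(limit):
        yield n * n


as_list = [n * n for n in range(4)]
as_generator = squares(4)

# A list can be walked as many times as you like.
list_first_pass = sum(as_list)
list_second_pass = sum(as_list)

# A generator can be walked once; after that it yields nothing.
gen_first_pass = sum(as_generator)
gen_second_pass = sum(as_generator)
```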

23:38 But it still would be really cool if you could kind of, like, automate that a little bit.

23:41 Yeah, I was thinking more along the lines of when I have my custom data structure that is essentially a container structure

23:47 and I forget to add __iter__ and __next__ to it so that it can be used, generators can be used with it.
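
One possible way to do that, sketched with a hypothetical `Playlist` container: if `__iter__` is itself written as a generator, Python supplies the `__next__` behavior, so the class works in for loops, comprehensions, and generator expressions without a hand-written `__next__`.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __iter__(self):
        # Writing __iter__ as a generator means no separate __next__ is needed.
        for track in self._tracks:
            yield track


playlist = Playlist(["intro", "news", "outro"])

# The container now plugs into comprehensions and generator expressions.
upper = [track.upper() for track in playlist]
long_names = list(name for name in playlist if len(name) > 4)
```

Each call to `__iter__` returns a fresh generator, so unlike a bare generator the container can be looped over more than once.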

23:53 Yeah, exactly.

23:54 You can fit it into that whole pipeline.

23:55 Cool.

23:56 Well, that's it for our main items this week.

23:58 You got anything else you want to quickly give a shout out to, Brian?

24:01 Well, I just saw that we've got a link to PyPI now supporting API tokens.

24:07 There's been a lot of recent changes to the PyPI interface to make it more secure.

24:13 And this is just one of the latest.

24:16 And I think it's a good way.

24:18 They're doing well about making sure that these changes are supported on the test server as well so that you can test out the changes first.

24:26 Yeah, that's pretty good.

24:28 So you don't have to just do it on the immutable write only or write once real version.

24:33 Yeah, yeah.

24:34 Super cool.

24:35 So I think this is like evidence, you know, there was a big push and that funding, that grant from Mozilla to basically modernize PyPI, right?

24:43 And the work that the PyPI team did to move that along.

24:46 It's like now you can start seeing these new features coming in because previously it was like no one wants to touch that.

24:52 There's no way we're adding new features.

24:53 Like we're just trying to keep it from breaking.

24:55 Now it can grow.

24:56 It's cool.

24:56 Yeah, it's nice.

24:57 I've got a couple I want to give a quick shout out to.

24:59 Last week we covered the possibility of this exploration of moving to PEG parsers as opposed to the original sort of one-off version of the parser that Guido van Rossum had written for Python 30 years ago.

25:14 And so now he's written another article talking about building a PEG parser and moving towards it and so on.

25:20 So if that was interesting to you, you can check out that follow-up that he wrote.

25:24 And then finally, we've talked a lot about Homebrew and obviously people know about apt and other package managers on Linux.

25:31 But I don't think we've really talked that much about Windows, right?

25:35 You don't have Homebrew on Windows or many other things like that.

25:38 So Preston Daniel sent us over a quick message that, hey, if you guys get a chance, you should give a shout out to the Chocolatey package manager on Windows.

25:47 Are you familiar with this, Brian?

25:48 No, I've never used it.

25:49 No, I've never used it.

25:50 Actually, I do all of my Python or a lot of my Python development actually under Windows, but I'm using WSL, so I don't actually do anything natively in Windows.

25:59 Yeah, yeah.

26:00 So in a sense, you have the Windows UI, but it's kind of Linux-y in some of the tooling.

26:04 WSL was just kind of magic.

26:06 I mean, it definitely is in that the thing that's amazing here is not how well the bear dances, but that it dances at all.

26:12 But yeah, I have a Windows gaming laptop I bought years ago.

26:16 And now that I'm in a startup, of course, I'm using whatever hardware I had lying around in order to not spend any money, right?

26:21 And so yeah, that's my development laptop is my old gaming laptop.

26:25 And so yeah, it's all WSL.

26:27 And I'm simulating AWS services on it so that I can develop offline and stuff.

26:32 It boggles me that you can actually do all of this stuff at all.

26:36 Yeah, that's super cool.

26:37 That's super cool.

26:38 Yeah, so Chocolatey is like Homebrew, basically, but for Windows.

26:41 So you can say choco install such and such.

26:44 And I link to Python, right?

26:46 So actually, on Chocolatey, you can now install Python 3.7.4, which is kind of impressive, right?

26:52 That came out like a couple weeks ago.

26:54 But if you actually look at the versions, you can install Python 3.8 beta 3, which came out yesterday, right?

27:00 Like it's right on top.

27:02 And they even do like limited virus scanning and like validation a little bit.

27:06 So yeah, it's a pretty cool little system.

27:08 And people should definitely check it out if they're doing work natively on Windows.

27:12 Yeah, nice.

27:13 Brett, anything else you want to throw out there while you're here?

27:14 That's probably all the great Python ideas that I have at the moment.

27:19 All right, super.

27:20 How about a joke?

27:20 Yeah, how about a joke?

27:21 I actually have a couple of jokes for you.

27:23 A programming joke and then just an adulting joke following up on that.

27:27 So it's more of, I guess, more of an assessment.

27:29 A good programmer is someone who always looks both ways before crossing a one-way street.

27:34 Does that connect with you, Brian, as a tester?

27:38 Yes, definitely.

27:39 Having just gotten back from wandering around in London and always looking the wrong way when crossing a one-way street, that resonates with me.

27:46 That's awesome.

27:47 Yeah, I'm always paranoid when I'm in London or in Australia.

27:50 Like, I'm like double checking both directions.

27:52 They're like, you just got to look that way.

27:54 I'm like, no, that's what you say now.

27:56 Then there's going to be the time I look the wrong way and one of those big red buses is going to crush me.

27:59 And I'm just, you know, so I'm just like a paranoid squirrel trying to cross the street in the UK.

28:05 It's great.

28:05 Yeah.

28:06 So then related to that, not quite programming, but adulthood is like looking both ways before crossing the street and then getting hit by an airplane.

28:14 I want to throw one more in there, though.

28:17 Oh, go ahead, Brian.

28:17 No, I think that's good.

28:19 That's funny.

28:19 It's a little too real to be funny, though.

28:21 Like, we aren't laughing like, yeah, that hurt.

28:24 All right.

28:25 So, Brett, you started this one.

28:27 So I'm going to throw it out here for everyone.

28:29 Little Bobby Tables.

28:30 Brian, do you know about Little Bobby Tables?

28:32 Well, I remember it, but I probably couldn't explain it.

28:35 Yeah.

28:35 So it's XKCD.

28:36 And here, I'll just read it.

28:38 We'll just leave it out there for folks.

28:39 It's a mom answering the phone and says, hi, this is your son's school.

28:44 We're having some computer trouble.

28:46 Oh, dear.

28:47 Did he break something?

28:47 In a way.

28:48 Did you really name your son Robert, quote, parentheses, semicolon, drop table students, quote, semicolon, dash, dash?

28:55 Oh, yes.

28:57 Little Bobby Tables, we call him.

28:58 Well, we've lost years of student records.

29:00 I hope you're happy.

29:01 And I hope you've learned to sanitize your database input.
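
A hedged sketch of how the school could have avoided this, using parameterized queries rather than hand-written sanitizing (a different technique from the one the comic names). It assumes an in-memory SQLite database and a made-up `students` table; the `?` placeholder hands the name to the driver as data, so it is never interpreted as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The ? placeholder passes the value as data, never as SQL text.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

# The table still exists and holds the name exactly as typed.
rows = conn.execute("SELECT name FROM students").fetchall()
conn.close()
```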

29:04 The thing I really love about this idea is that she has saddled this child with this terrible name for this one opportunity, right?

29:14 I mean, when he gets to be 24, it's never going to work again, right?

29:17 I mean, I guess unless he goes on to be a teacher.

29:19 Then he can just cause havoc wherever he goes for the rest of his life.

29:22 It sounds crazy, but there's this pen tester, penetration tester, who has a Tesla.

29:27 And in the app, you can change the name of your Tesla.

29:30 He changed it to a JavaScript injection string.

29:35 And it went off when his car had to go get some service.

29:38 Yeah, I believe it.

29:40 I just have to say, as a general aside about anything, as somebody who ran, you know, a PCI DSS compliant thing for a long time, get pen tested.

29:47 Like, every time I got pen tested, those people came up with something creative and I learned something.

29:52 You know, like that's like, take the first dollar that you have to spend on security and hire a pen tester.

29:57 Yeah, that sounds like good advice for sure.

29:59 All right.

29:59 Well, that looks like it.

30:00 That's it for us, you guys.

30:01 Brian, thanks as always.

30:03 And Brett, thank you for coming this time.

30:04 My pleasure.

30:05 Yeah, thank you.

30:05 Yep.

30:06 Bye, everyone.