Transcript #417: Bugs hide from the light
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds and mine. And this is episode 417, recorded January 21st, 2025. And I'm Brian Okken.
00:12 And I am Michael Kennedy.
00:13 And we're excited about this show today. And ain't nothing going to bring us down.
00:20 But before we get started, I want to thank everybody that has supported us through
00:26 Talk Python Training, through pythontest.com, the courses, through buying my book, our Patreon supporters,
00:33 of course, you rock. And of course, many of the sponsors that have sponsored us in the past.
00:39 And we love them too. But we also love people that support us directly.
00:43 If you'd like to send us topics, please do so through the contact form on our website.
00:49 But also you can send them to us on Bluesky or on Mastodon. And those links are in the show notes.
00:56 And if you are listening to this, thank you. And also share it with a friend. And if you'd like to
01:02 join us live sometime, check out pythonbytes.fm/live to see when the next episode is going to
01:08 be filmed and recorded. And you can join us and comment while we're going live. And thank you
01:16 also to people that subscribe to the email newsletter. If you go to pythonbytes.fm, you could subscribe there
01:24 as well. And get the list of all the topics directly in your inbox so you don't have to go look those up.
01:30 Yeah, we're evolving the format of that a little bit, trying to make a little deeper analysis,
01:34 but also skimmable. And yeah, it's a huge resource. I think it's great.
01:39 Yeah.
01:39 People listen as well, but it's also nice to just have that written down in one place.
01:43 And we cover lots of great topics every week. And what is our first topic this week, Michael?
01:48 The first topic will be the LLM catcher. The name, not terribly descriptive of what it actually does,
01:57 but here's the deal. I'm sure everyone has done this at this point. I know I've done it recently as I was
02:03 yelling at the Boto3 API because there ain't nothing as frustrating as a little bit of
02:10 Boto3: auto-generated, no comments, no documentation, no idea what parameters go in it. Anyway, you might
02:17 take those errors and pass them over to an LLM and go, please, dear chat, Copilot, Anthropic,
02:25 whatever. What is going on here? What am I missing? Right? And it's super helpful. But this
02:30 project is a little bit different. It's like a gateway to those types of things. So here's what
02:34 you get. If there is a crash, obviously you have stack traces or tracebacks, depending on the language
02:40 you use and how you say it. They describe it here as the unsung villains of debugging. Why wrestle with the
02:49 wall of cryptic error messages when you could let LLM Catcher do the heavy lifting? So here's the thing.
02:54 You basically, I'll go down here somewhere. What you can do is in your try/except blocks,
02:58 you can say, you know, given an exception, diagnoser.diagnose, passing the exception,
03:03 and it will pass those details over to various LLMs and say, help me understand this and print out a
03:11 message that will show me how to fix it, not just a traceback. Okay. So I don't know if I can find it.
03:17 Yeah, I think it's pretty dope. I would not use it in production, though you could, you could,
03:22 if you want your logs to have messages about here's actually what happened, it's your debugging
03:26 sidekick. So what you do is you can run Ollama locally, and that's the default, or if you give
03:32 it your OpenAI API key, it can pass it over to whatever level of model you possibly have. You know,
03:40 it'd be awesome if you could have o1-mini or something like that diagnose it over at ChatGPT.
03:45 So there's different ones that it'll work with. But basically, when it gets an exception,
03:49 it says, hey, I'm working on this thing with FastAPI, and I get this exception, help me figure out
03:55 what's going on. So the Ollama one is a local, free, just-running-on-your-machine version. OpenAI, well,
04:01 we all know about ChatGPT, right? So you can put it as a decorator on a function, you can manually
04:07 do it in a try/except block, or you can even register a global exception handler. So anytime
04:12 an exception happens that's uncaught in your system, it'll diagnose it. It has both a sync and an async
04:19 API, and you can set it up through environment variables. So, you know, it shows you like how to
04:24 pull down the Qwen 2.5 Coder model for Ollama, which is pretty excellent. And just off it goes,
04:31 look at that. So you've got your diagnoser.catch on a risky function, or in your try/except block,
04:36 you just say, you know, diagnose or async diagnose, because you know, it's going to run for a while,
04:41 it's going to make an API call either locally or out to ChatGPT. So you don't want to necessarily
04:46 block your system. So you just make a little async await, boom, off it goes. Yeah, there you go.
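The three usage patterns described here (a manual call in a try/except block, a decorator on a risky function, and an awaitable variant) can be sketched without any LLM at all. The following is a minimal, hypothetical stand-in, not LLM Catcher's actual API: the class name SketchDiagnoser, its ask_model parameter, and the stub reply are all assumptions made for illustration, and the method names merely echo the ones mentioned in the episode.

```python
import asyncio
import functools
import traceback


class SketchDiagnoser:
    """Hypothetical stand-in for an LLM-backed diagnoser (not LLM Catcher's real API)."""

    def __init__(self, ask_model=None):
        # ask_model is any callable that takes a prompt string and returns text.
        # A real setup would call a local Ollama model or the OpenAI API here;
        # the default below is a stub that just echoes the prompt.
        self.ask_model = ask_model or (lambda prompt: "stub diagnosis for:\n" + prompt)

    def diagnose(self, exc: BaseException) -> str:
        """Format the traceback and hand it to the model callable."""
        details = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return self.ask_model("Explain this error and suggest a fix:\n" + details)

    async def async_diagnose(self, exc: BaseException) -> str:
        """Run the possibly slow model call in a worker thread so the event loop is not blocked."""
        return await asyncio.to_thread(self.diagnose, exc)

    def catch(self, func):
        """Decorator: print a diagnosis for any exception, then re-raise it."""

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                print(self.diagnose(exc))
                raise

        return wrapper


diagnoser = SketchDiagnoser()

# Manual use inside a try/except block.
try:
    1 / 0
except ZeroDivisionError as exc:
    manual_report = diagnoser.diagnose(exc)


# Decorator use on a risky function.
@diagnoser.catch
def risky_function():
    return {}["missing"]
```

The async variant exists for the reason given in the episode: the model call can take a while, so pushing it off the event loop keeps the rest of the application responsive.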
04:51 That's pretty much it. You can get formatted or unformatted information back. So if you need plain
04:57 text to go in some kind of JSON field, you can do that. Or you can get it with
05:01 proper formatting to make it more readable. What do you think, Brian?
05:04 I'm going to withhold judgment on this until I give it a shot at some point.
05:09 But yeah, you can even specify the temperature, aka the creativity you want the model to apply to your
05:15 analysis. That's funny. Yeah, it's an OpenAI thing.
05:18 I'd kind of like it to, like, on any exception, just upload my entire code base and rewrite my code to
05:26 fix the error. Exactly. Don't diagnose it, just fix it. Why am I even in the way here?
05:31 Yeah, so look at it. Yeah, you can even do the full-on o1 model of ChatGPT, which is like the really,
05:38 really high one. Is that the $200 one?
05:40 That's the $20 one, but you only get to call it 50 times a week. So not too many errors. If you get the $200 one, you can call it all day long.
05:48 Yeah, I'd like people to get the $200 one, put this in your CI and do it over all versions of Python so that we just fill up all of the data.
05:57 And then we'll get an announcement of, oh, the entire West Coast is blacked out because we broke the power grid.
06:04 Yeah.
06:05 But anyway, I think it's interesting, right? Just plug that in.
06:09 Yeah, it looks like it might be kind of fun.
06:12 Yeah, it does look kind of fun. And this was recommended by Pat. So thanks, Pat, for sending that in. Pat Decker.
06:17 Oh, and Pat's here. Thanks, Pat.
06:19 Yeah.
06:20 Yeah.
06:20 Over here.
06:23 Well, I kind of want to talk about bad packages a little bit.
06:28 Like no Christmas presents for them or what's going on?
06:31 Yeah, no wrapping paper. Actually, we are talking about wrapping. So I want to talk about the Python Package Index and malicious stuff. So let's scroll down here.
06:44 There was a security and safety engineer first year in review. This was from Mike Fiedler, and he talked about a lot of stuff.
06:55 But one of the things he talked about was quarantining. And this came out in August. But I'm just catching up.
07:01 So it's like if they catch COVID or what's going on.
07:04 No, it's, you know, it's like bad packages. So if somebody says, you know, there's malware in a package that shouldn't be there,
07:14 what do we do with it? And they used to have the option to investigate it and then yank it.
07:19 But that just sort of makes the whole thing go away.
07:21 But there's a new process. And they just recently, at the end of December, wrote about it.
07:27 And it's called Project Quarantine. And we're linking to an article that really talks about it.
07:33 So if you're worried about malicious packages and you're curious about what PyPI is up to, go ahead and check this out.
07:39 I'm not going to go through the whole thing. However, it is kind of interesting.
07:43 So the idea is, if we jump back down to future improvements and automation, hopefully we'd have some sort of automated way.
07:52 Like let's say a couple people report that a package has malware in it.
07:57 Administrators of PyPI can go ahead and somehow have some litmus test, or something, to say rather quickly,
08:07 let's get this under control.
08:09 And the quarantine doesn't delete the whole thing.
08:12 There's a simple API that an admin can go in and say, hey, we're going to quarantine this project.
08:20 And the package goes into quarantine.
08:22 And at that point, a bunch of stuff happens.
08:26 It's not installable, but the owner can still see it.
08:30 And the owner can, I don't know if they can make changes.
08:34 But, yeah, it's not modifiable while it's in quarantine, but they can see what's going on.
08:39 Administrators can look at it and determine whether or not there really is malware there.
08:45 And it's possible that, you know, we might have some bad actors reporting packages.
08:51 So we don't want people to report stuff that's fine and have things removed just because they're angry about it or something.
08:59 But that hasn't happened yet.
09:02 So this has been in place for a little while.
09:05 And looking at the statistics, let's see, since August,
09:11 when they put this in place,
09:12 there have been 140 reported packages that have gone into quarantine, and only one of them exited quarantine.
09:22 And why was it? There was obfuscated code in there.
09:28 And that's a violation of the PyPI acceptable use policy.
09:31 Project owner was contacted.
09:33 They fixed it because they just, I guess, weren't aware that you can't do that.
09:37 That's really interesting.
09:38 I didn't know that was a policy.
09:40 Yeah.
09:40 Well, I mean, I know there are companies out there that would say, we would like to obfuscate our code, but we still want to make it available.
09:48 But we don't do it through PyPI.
09:51 I guess don't do it through PyPI.
09:52 Okay.
09:52 Yeah.
09:53 I don't want to pip install obfuscated code.
09:55 So I understand that that's primarily a shield that malware hides behind.
09:59 Right.
10:00 Like, yeah, they'll have a Base64-encoded string of something or other, then they'll decode it and execute it.
10:05 Right.
10:05 And that's it.
10:06 Yeah.
10:07 So, yeah, they've created some outreach templates.
10:12 So the full process, if you're confused, or if you get notified by an administrator that one of your packages is in quarantine, they'll probably point you to this anyway.
10:23 But, you know, check this out.
10:25 I'm glad that they're working on this and making the environment easier for PyPI admins to deal with, but also just safer for everybody to use.
10:34 So it's good.
10:35 Yeah.
10:35 Excellent.
10:35 Well, you know, I'm sure you're aware of this, Brian.
10:38 Testing.
10:39 Testing makes your code safer to use.
10:41 Yeah.
10:41 And I have fully embraced the async lifestyle these days.
10:45 You know, I talked about rewriting Talk Python in Quart, the async version of Flask.
10:51 And I blogged about that and brought it up on the show, I'm pretty sure.
10:54 But how are you going to call APIs?
10:56 I'm working on some projects right now that are like all about calling APIs.
10:59 And I'm like, oh, my gosh, so many APIs.
11:01 This thing calls that, which, you know, so on.
11:04 If you can do that asynchronously, that'd be awesome.
11:06 And I would say probably the best kind of "I'm a fan of requests, but I want async" story these days is HTTPX.
11:14 Yeah.
11:15 Which has got some basically very, very similar, not identical, but very, very similar behaviors and API patterns as requests, but also has an async variant.
11:24 Create the async client and then await all your calls, which is great.
11:27 So you might want to test that, right?
11:29 Even asynchronously as you're writing code.
11:31 That's async.
11:32 So I want to introduce people to RESPX, like response X.
11:37 Probably is the way you pronounce that.
11:39 I'm not talking about HTTPX or response X.
11:42 I don't know.
11:42 Whatever.
11:44 HTTPX.
11:44 And what it does is it lets you mock out HTTPX requests.
11:49 Super, super easy.
11:51 However you like.
11:52 So for example, if I want to make a call where I say httpx.get, and I want to make sure that if that URL comes in, it's going to return some particular value, like a 204.
12:03 You just say respx. and then the same function call with the values.
12:06 And then you just say .mock and you set the values or the behaviors that you want it to do.
12:10 And off it goes.
12:11 That's pretty cool.
12:12 Yeah.
12:13 And it also comes as a pytest plugin, if you want to roll that way.
12:16 So then you just say respx_mock.whatever and just call the functions.
12:20 And then all the examples here, like first line, mock it.
12:23 Second line, call it.
12:24 But, you know, probably you're testing some function that then internally is using HTTPX through like a with, async with block.
12:31 And, right, like there's a lot of layers going down there that you might need to work with.
12:35 And so, right, that would be a more realistic example.
12:38 You call the mock and then you call your code and then something happens, right?
12:42 So that's pretty cool.
12:43 You can even use mark.
12:45 You can probably make sense of this pytest.mark statement here for me.
12:48 What are we doing?
12:49 Well, what do you mean?
12:52 Okay.
12:52 So you got pytest marking it with RESPX.
12:56 So the project has defined a custom mark and it's passing in the base URL of foo.bar.
13:03 And then within it.
13:04 Yeah, you don't have to, I guess, say the base URL, right?
13:07 Right.
13:07 Because you're just passing it in.
13:10 You can, because it's really not that bad.
13:13 It's not that hard to pass a variable to fixtures through markers.
13:18 So that's what's going on.
13:19 I see.
13:19 So you kind of prepare it with your mark here.
13:21 Okay.
13:21 Yeah.
13:21 Awesome.
13:22 Yeah.
13:22 And then the fixture's passed in.
13:24 Okay, cool.
13:24 And that's pretty much it.
13:26 There's not a whole lot to say about it.
13:28 But if you need to mock out HTTPX, instead of using generic mock stuff, you can use this
13:32 library that basically has exactly the same API as HTTPX.
13:36 Pretty cool.
13:37 Sometimes I forget that not everybody has completely internalized the entire content of my book.
13:42 Well, we can work on that.
13:44 We can work on it.
13:45 Yeah.
13:46 I learned something new.
13:49 Oh, really?
13:50 You know what?
13:51 If you, I think if this is your next topic, I had no idea about this either.
13:55 So I'm about to learn something new.
13:57 Okay.
13:57 Well, so this is actually something that Rodrigo also learned something new because he marked
14:04 it as a TIL for today I learned.
14:06 And I kind of love people posting the TILs, but also I'm, I'm personally, personally somebody
14:13 that I don't think you need to prefix things with TIL for today I learned.
14:17 If you just have a small blog post, go ahead and post it.
14:20 I like small posts.
14:21 Anyway.
14:22 So unpacking keyword args, or kwargs.
14:26 I usually just say keyword args.
14:29 Do you have, do you say quarks?
14:30 I'm K-W-Args.
14:32 K-W-Args.
14:33 Okay.
14:33 But I know people say quarks, but I don't know.
14:35 It sounds like I'm speaking Klingon or something.
14:37 I don't do it.
14:38 Yeah.
14:38 It makes me think of Deep Space Nine with Quark.
14:42 But unpacking keyword args with custom objects.
14:46 So let's say you got, so there's a couple of things.
14:49 And we're talking about the star or the double star or the splat splat or double splat.
14:56 I don't really want to say it.
14:57 So let's say you've got a dictionary and you want to pass that, the contents of the dictionary
15:04 as arguments to a function or something.
15:07 That's how we often use it, is doing a star star with a dictionary and it unpacks it into
15:13 keyword args for a function call, which is cool.
15:16 Or you can just do it.
15:18 Here's an example of merging two dictionaries with this.
15:22 I don't do it.
15:23 I don't usually do this much, but cool.
15:25 You can do that.
15:25 There's a newer syntax where we use the pipe on dictionaries as well.
15:29 And that's the same thing.
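A minimal sketch of the dictionary uses just mentioned: unpacking into keyword arguments, merging with double star, and merging with the pipe operator. The connect function and the setting names are made up for illustration.

```python
def connect(host, port, timeout=10):
    """Toy function that just reports the arguments it received."""
    return f"{host}:{port} (timeout={timeout})"


settings = {"host": "localhost", "port": 8080}

# ** unpacks the dictionary into keyword arguments:
# the same as connect(host="localhost", port=8080).
target = connect(**settings)

defaults = {"port": 80, "timeout": 10}
overrides = {"port": 8080}

# Merging with double star: later entries win on duplicate keys.
merged_with_splat = {**defaults, **overrides}

# Merging with the pipe operator (Python 3.9+): same result.
merged_with_pipe = defaults | overrides
```

Both merges build a new dictionary and leave defaults and overrides untouched.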
15:30 There's like three or four ways to do this these days.
15:33 Yeah, because with Python, there should be one obvious way to do it.
15:38 And if there's not, there's four.
15:39 Unless it's strings, then there's six.
15:41 Okay.
15:44 So there's a lot of times where doing this star star unpacking is so cool and convenient.
15:51 But if you have custom objects, not dictionaries, if you have your own objects, what do you want?
15:58 How do you deal with that?
15:59 Can you do that?
16:00 Yes, you can.
16:01 And that's what this little TIL is about.
16:03 All you have to do is add a keys method to your object or your class.
16:09 And the keys method needs to return an iterable.
16:14 And in this case, just a list is an iterable, for instance.
16:16 And then in the example, he's got a Harry Potter class.
16:20 Its keys method returns first, middle and last.
16:22 And then a __getitem__ that presumably takes a key and returns something.
16:30 And that's all you need.
16:32 And then you can do this double splat thing.
16:35 And it works.
16:36 Oh, that's awesome.
16:37 And also the example is good to remind everybody that when you're doing the __getitem__, go ahead and do an else clause with a KeyError.
16:48 So if people pass in the wrong thing, they get the appropriate exception.
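Here is a reconstruction of the idea rather than the TIL's exact code: a class with a keys method and a __getitem__ can be unpacked with double star. The returned name values and the full_name helper are assumptions for illustration.

```python
class HarryPotter:
    def keys(self):
        # Any iterable of key names works; a list is the simplest.
        return ["first", "middle", "last"]

    def __getitem__(self, key):
        if key == "first":
            return "Harry"
        elif key == "middle":
            return "James"
        elif key == "last":
            return "Potter"
        else:
            # Wrong keys get the exception callers expect from a mapping.
            raise KeyError(key)


def full_name(first, middle, last):
    return f"{first} {middle} {last}"


# ** calls keys() and then looks up each key with __getitem__.
name = full_name(**HarryPotter())

# The same protocol works when building a dictionary.
as_dict = {**HarryPotter()}
```

Nothing here inherits from dict; the two methods are all that double-star unpacking needs.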
16:52 So anyway, thanks.
16:53 Yeah, I love it.
16:55 Very, very cool.
16:55 All right.
16:57 Four items, right?
16:58 That's it, I guess.
17:00 You feel pretty extra.
17:02 I can see.
17:02 I do feel pretty extra.
17:04 I got more extras than I had normal things.
17:06 So let's jump in.
17:07 So let's do it.
17:08 Over on pythontest.com.
17:11 Oh, a couple of things.
17:13 I'll just kind of go backwards.
17:14 First off, I finally fixed it.
17:16 I had X up or Twitter and I don't do Twitter anymore.
17:20 So I replaced it with a Bluesky icon.
17:23 And also my contact form has Bluesky now.
17:26 So I fixed those things.
17:29 Also, I had like incorrect podcast stuff up.
17:33 So I fixed my podcast data.
17:34 Test & Code and Python Bytes and stuff.
17:36 Of course.
17:37 Anyway, that's not what I really want to talk about.
17:39 What I want to talk about is the top pytest plugins.
17:42 I've been researching a lot of the stuff in here for Test & Code season two.
17:49 And I'm relying on this data.
17:54 I'm relying on the top PyPI packages.
17:56 And this is an excellent resource.
17:59 And it uses BigQuery.
18:01 And there was just a new article from the person that created this, Hugo.
18:08 He wrote an article about what's going on with this.
18:11 "A surprising thing about PyPI's BigQuery data."
18:15 And it's interesting and also kind of exciting news.
18:18 So the interesting thing is he's using the free version of Google Cloud, or BigQuery, stuff.
18:29 Whatever, you need a Google account.
18:31 You get a few BigQuery queries.
18:34 And if you do it too much, they kick you out.
18:38 And so at first he started with 4,000 projects.
18:41 Then he bumped up to 5,000 projects.
18:43 And then 8,000 projects.
18:45 But there's more than that.
18:46 So he's like, well, I wonder how much I can do.
18:48 And so this is a little test that he went through.
18:51 I'm going to jump down to the punchline.
18:54 And the punchline is that you can.
18:58 He went up to a million packages.
19:01 And there aren't a million packages.
19:04 But it returned 531,000 packages.
19:08 And it was the same bytes processed as even just doing one for 30 days.
19:15 So it doesn't matter.
19:18 It turned out it didn't really matter how many packages you query.
19:22 What mattered was the date spread.
19:26 So if he did like five days, it was way cheaper than 15 days, which is way cheaper than 30 days.
19:33 And it's relatively linear.
19:35 So it looks like what he's going to do is change it so that we get like a ton of package data.
19:43 Like as much as we can get.
19:44 531,000.
19:46 But he's probably going to report that in smaller chunks too.
19:51 Because a lot of people are.
19:52 Daily or something, yeah.
19:53 But a lot of people aren't going to want to see 531,000.
19:57 The top 8,000 is probably sufficient.
19:59 You really got to zoom in to see them all at once.
20:00 So I'm excited because when I'm using the 8,000 data set and the top pytest packages,
20:11 there are currently 133 in the top 8,000.
20:15 And I'd like to have a bigger list.
20:17 So if I've got the top 10,000 or 20,000, I could probably get a bigger list of pytest packages.
20:23 Anyway.
20:24 So that's it.
20:27 It's just an interesting thing if you're doing BigQuery data.
20:29 It's the date that is the big effector of the price.
20:34 Right.
20:34 Because it probably counts the number of downloads for each day or per download individually.
20:40 Whereas if there's only 500,000 packages, there's that.
20:43 But there's way more downloads than there are packages.
20:46 The other thing that I think is going to change is that it costs more to filter on just pip packages.
20:56 And now we're getting a lot of uv, people using uv to download stuff from PyPI.
21:03 And so he wants to include that too.
21:06 So I think he's going to change it so that the data is from everything instead of just pip.
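To make the cost discussion concrete, here is a rough sketch of how such a query could be assembled. It is not the code behind the top PyPI packages list. The helper name and its parameters are hypothetical; the table and column names are assumed to match the public PyPI downloads dataset on BigQuery and should be checked against its schema before use. The function only builds a SQL string and runs nothing.

```python
def build_top_packages_query(days: int = 30, limit: int = 8000, pip_only: bool = False) -> str:
    """Build a BigQuery SQL string for the most-downloaded PyPI projects (sketch only)."""
    if days < 1 or limit < 1:
        raise ValueError("days and limit must be positive")
    # Optional installer filter; per the episode, filtering on pip costs more.
    installer_filter = "\n  AND details.installer.name = 'pip'" if pip_only else ""
    return (
        "SELECT file.project AS project, COUNT(*) AS download_count\n"
        "FROM `bigquery-public-data.pypi.file_downloads`\n"
        # The date window is what the episode identifies as the cost driver.
        f"WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)"
        f"{installer_filter}\n"
        "GROUP BY project\n"
        "ORDER BY download_count DESC\n"
        # LIMIT trims the result; per the episode it did not change bytes processed.
        f"LIMIT {limit}"
    )


five_day_query = build_top_packages_query(days=5, limit=10)
```

Under these assumptions, shrinking days is the lever that matters for cost, while raising limit from 8,000 to 531,000 would only change how many rows come back.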
21:14 That makes sense.
21:15 Yeah, it makes sense.
21:15 Yeah, it definitely does.
21:16 Yeah.
21:17 Anyway.
21:17 Awesome.
21:17 Cool.
21:18 Do you have any extras?
21:19 I do.
21:19 Not too many, but let's do it.
21:21 So, oh, and uv-secure, the project.
21:29 I was speculating.
21:30 I was speculating about what API it was using.
21:33 He wrote in to say it uses the PyPI JSON API at present to query for package vulnerabilities.
21:41 Same thing that pip-audit does.
21:42 It does the work asynchronously to try to make it a little faster, but it's just the simple API there.
21:47 So that's what that is.
21:48 Not something like Snyk or some other more advanced threat modeling setup.
21:53 And yeah, that's it.
21:53 That's all I got for my extras.
21:55 All right.
21:55 Cool.
21:56 How about a joke?
21:56 Oh, I've got a joke.
21:57 People, if they like puns and stuff, this is going to be good.
22:00 It's at angle bracket slash angle bracket code puns or codepuns.com.
22:06 So we've all written bad code and I know that sometimes testing will shake out the bugs, Brian,
22:10 but do you know why programmers prefer dark mode?
22:13 I think this is not totally wrong.
22:15 I think we should switch it.
22:17 I think it's a foul fallacy here.
22:19 Why, I guess, I'll read it as it is.
22:22 Why do programmers prefer dark mode?
22:23 Because light attracts bugs.
22:25 I guess if you're talking moths, but if you're talking cockroaches, it's the other way around.
22:29 But here's the thing.
22:30 That's a great joke, but you can click more puns and they just keep going.
22:34 My love for coding is like a recursive function.
22:37 This is not a very good one.
22:38 That's fine.
22:38 Why did the for loop stop running?
22:41 It took a break, semicolon.
22:42 How do you comfort a JavaScript bug?
22:46 You console it.
22:47 I see.
22:48 There's a lot of good stuff here.
22:49 Because console.log is how you debug that thing instead of bug.
22:53 Oh, okay.
22:53 Okay.
22:53 It's the print debugging equivalent of JavaScript.
22:57 I'm not a JavaScripter.
22:58 Okay.
22:58 Well, you certainly can't console your JavaScript bugs when you create them.
23:01 All right.
23:02 Why do you not want a function as a customer?
23:05 Because they return a lot of items.
23:07 Come on.
23:07 Anyway, you can go to codepuns.com and click through until you can't take it anymore.
23:14 Yeah.
23:15 That's probably why you want to see customer because they only return one item.
23:19 Oh, that's true, right?
23:20 Yeah.
23:21 Well, good stuff.
23:24 Yeah.
23:25 Good stuff.
23:25 All right.
23:27 Well, thanks again.
23:27 Thanks for showing up for Python Bytes.
23:30 And thanks, everybody, for listening.
23:32 Yeah, you bet.
23:33 Thanks for meeting you.
23:34 Bye, everyone.