Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #391: A weak episode

Return to episode page view on github
Recorded on Tuesday, Jul 9, 2024.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 391 recorded July 9, 2024. And I am Brian Ocke.

00:11 And I'm Michael Kennedy.

00:13 This episode is sponsored by Code Comments, an original podcast from Red Hat.

00:17 Listen to their segment later in the show. You can connect with us on Mastodon, of course.

00:24 The links are in the show notes. We're all on Fosstodon, but you can get to us from any Mastodon.

00:30 You can also join us if you're listening to it later. You can join us live by going to

00:36 pythonbytes.fm/live, be a part of the audience, or if you just want to see the show later,

00:42 it's all there. It's usually 10 a.m. Pacific time on Tuesdays, but sometimes it changes.

00:50 But if you go to that live thing, it'll tell you when the next one is.

00:53 And finally, I'd really like to thank a lot of the people that have subscribed to the mailing

00:58 list. If you read the newsletter, if you go to our pythonbytes.fm, there's a newsletter link.

01:04 And we'll send you all of the links of the stuff we cover every week, and we'll just send it to

01:09 your inbox. So even if you miss an episode, you won't miss any of the goodness. So speaking of

01:14 goodness, what do you got for us, Michael? >> Well, you know how we all love PyPI and the

01:21 ability to go and just pip install a thing and make our apps be so much more, right?

01:27 Programming in Python becomes like Lego block clicking together and not algorithm class.

01:31 >> Yeah. >> You know what I mean?

01:33 Which is amazing. It's amazing. However, there are issues that you run into if you use third-party

01:39 packages, not the least of which is you now probably should have a virtual environment.

01:43 You definitely require some pip install commands, just stuff all along those lines, right? Just the

01:49 management of setup before you can even run your app. Plus then any potential changes if people

01:55 don't pin their versions, then you're at the whims of what potentially can happen there.

02:00 So what some people do is something called vendoring or vendorizing dependencies. So for

02:06 example, if I depend on some functionality from requests, I don't know if that's a super good

02:11 example, but let's say it is. I could just download the source code of requests, stick it into my code

02:17 and use it. It's probably not a great example because it itself has a bunch of dependencies,

02:21 but stuff that's kind of like pure Python, no other dependencies. You could have it as a

02:25 third-party package, or you could just stick the code of that somewhere to your app and refer to

02:30 it as a relative import within your app, right? >> Yeah.

02:33 >> Then people just pip install your package, pip x install your package, or even just get a script

02:38 and run it or a set of scripts. So there's this project from M. Williamson called Python

02:45 Vendorize that I want to talk about. So it'll vendorize packages from PyPI as I just described,

02:51 right? So it allows you pure Python dependencies to be vendorized. That is, it'll copy that into

02:57 your project, best used for small pure Python dependencies to avoid version conflicts with

03:02 other packages, require different versions and so on. So what you do is you set up a

03:08 vendorize.toml file, and in there you basically set up what is your module, what subsection of

03:17 your module do you want it to go into, what submodule, and then what PyPI packages you want.

03:21 So example here is like a hello package. So they create a dunder, just not an underscore vendor

03:27 folder, and then they say the packages are six. And once you run it, it'll create that underscore

03:33 vendor folder, and it'll put the six distinfo, the dunder initpy, the six.py, all the stuff it needs

03:41 to basically have that there. So then in your app, you can say from dot underscore vendor import six,

03:46 rather than having an external dependency. What do you think?

03:49 >> Kind of neat. How do you keep up? Yeah, I've got questions. Like, how do you keep up with

03:54 updates and things like that? >> Well, I believe that you just run command line, right, just run Python vendorize in the directory, and I don't know if it'll

04:06 re-download it, but it will create it. So worst case, delete the folder, run it again.

04:10 >> Yeah. >> Yeah, and then that'll update it.

04:12 So the whole point of this is like you want less change than normal. You want to like kind of

04:16 freeze it in a place. And things like six that don't really change.

04:19 >> Or have gotten kicked out of the standard library.

04:23 >> Yeah, exactly. A lot of stuff that's really super stable. And it's pretty small, right?

04:29 Because if you do something that's got a bunch of dependencies, you've then got to start doing

04:33 their dependencies, and then it gets really wonky, right?

04:36 >> Yeah. >> For small things.

04:37 >> Yeah, actually, this is pretty cool. Yeah, neat.

04:41 >> Well, that's what I got for you. The first one. I got a bunch of other stuff, as you can see.

04:44 >> No, I like it. There's projects that I've open sourced that it really wasn't intended for

04:52 somebody to actually use as a dependency. It's like some example code that happens to be pip

04:57 installable. But somebody would probably take it and just copy it and run from there. And using

05:04 something like Vendoris would work. >> Yeah, for sure.

05:07 >> Cool. I would like to talk about something not as strong as this, but weaker.

05:12 >> Weak. So weak. >> Some weak references.

05:17 So this is an article from Martin Hines, a guide to Python's weak references using

05:24 weak ref modules. So weak ref is a built-in standard library module. And I actually have

05:31 never played with it. And I kind of knew that Python must have weak references, but I just

05:36 didn't really explore it before now. And this is a great introduction to just talk about what they

05:41 are. So I mean, the term weak reference might be new to somebody that's like, I don't know,

05:49 might be new to you. It's a term that we talk about in like C++ and other things a lot because

05:55 of the using, and I use it a lot in C++, but using strong references and weak references. Python

06:02 also has strong references and weak references. A strong reference is just sort of a copy of

06:07 something. But a weak reference is a way to point to something else, but not muck up the garbage

06:14 collection. So this is a great article. It starts out talking about garbage collection and how

06:25 weak references are used with garbage, how garbage collection and weak references and

06:30 strong references affect that. So why do we care? Well, it is used in things like the logging

06:37 module, for instance. So you have named, this is a cool example because you have named logging

06:43 modules. Oh, where was it? This is an example. Anyway, some named modules that you named a

06:52 logging module, a named logger, sorry. It's been a rough weekend. And then like if you wanted

07:00 another one of the same name, it might be there, it might not. So it's a caching sort of thing that

07:04 how logging uses it. But there's also ways to use it as a, like for trees, if you're building a data

07:10 structure where you might want bidirectional links between objects. One of the objects, it shouldn't

07:18 be really hard links in both directions. So one of those links should be a weak reference, like

07:24 the link between a parent and a child and a tree structure would be good like that. Or other things

07:31 like he talks about, he talked about using an observer, building an observer pattern from the

07:37 design pattern book, using weak references. Just some really cool stuff. I don't know,

07:42 I don't build a lot of data structures. Like there's enough data structures in Python already.

07:47 But if, especially if you're in a, in a CS class or you have some special needs for data structure,

07:54 weak references are built in and they might help you a lot. So.

07:58 Yeah. And they're pretty interesting. The only chance, only time I've really played with them is

08:01 for the Python memory course that I created at Talk Python to like understand, because you want

08:06 to look at stuff and see, we did this, it's alive. We did this now it's garbage collected or now it's

08:10 reference count deleted. But if you have a planner to it, then obviously it's never going away.

08:17 So weak references allow you to ask questions like that. Like, I think you can do interesting

08:21 stuff with caching too. For example, like if you've got a cache and you've handed out an instance of

08:27 the object and it's still alive and people are still using it, the parts of the app are still

08:30 using it. You can have a weak reference to it. And if someone else asks for you, like you can

08:35 upgrade the weak reference to a strong reference, right. And hand that out again without recreating

08:39 it. But if no one's using it, it'll get cleaned up because a weak reference won't keep it around.

08:43 So it's like sort of a self-managing cache type of structure. It could be fun to make too.

08:48 Yeah.

08:48 But that said, I was thinking just like you, I don't usually make data structures these days.

08:53 Python's pretty much got something for you.

08:55 Right. But you know, people are building, well, there's some third-party library data

09:00 structures I use too, and they probably use weak references and I just haven't

09:03 poked into there to find out.

09:04 Yeah, exactly. Let someone else do cool stuff with it for us.

09:08 But I love the idea of the, like the logging module that uses named items,

09:15 doing something like a cache named item thing.

09:17 Yeah, very cool.

09:19 Do you know what else is cool?

09:20 Code comments from Red Hat.

09:22 Yeah.

09:23 This episode is brought to you by Code Comments, an original podcast from Red Hat.

09:27 You know, when you're working on a project and you leave behind a small comment in the code,

09:31 maybe you're hoping to help others learn what isn't clear at first.

09:35 Sometimes that code comment tells a story of a challenging journey to the current state of the

09:40 project. Code Comments, the podcast, features technologists who've been through tough tech

09:45 transitions, and they share how their teams survived that journey. The host, Jamie Parker,

09:50 is a Red Hatter and an experienced engineer. In each episode, Jamie recounts the stories

09:56 of technologists from across the industry who've been on a journey implementing new technologies.

10:01 I recently listened to an episode about DevOps from the folks at Worldwide Technology.

10:06 The hardest challenge turned out to be getting buy-in on the new tech stack,

10:10 rather than using that tech stack directly. It's a message that we can all relate to,

10:14 and I'm sure you can take some hard-won lessons back to your own team. Give Code Comments a

10:19 listen. Search for Code Comments in your podcast player or just use our link, pythonbytes.fm/code-comments.

10:26 The link is in your podcast player's show notes. Thank you to Code Comments for supporting the show.

10:30 This one is "Make Time Speak" from Preyson. Preyson's been on the show before, a friend of the show,

10:36 and also former co-guest, co-host. And the idea is, it's a little bit of a human-friendly way to

10:44 refer to time. You might know about things like, I think it's Arrow, that has a humanized thing that

10:52 says, you know, "five minutes from now" or "in ten minutes" or "just now," those kinds of things. But

10:57 the way this one works is it talks in sort of colloquial way of saying the time. So you create a

11:04 clock object and you give it a language to use, like English, German, Swahili, I think, all those

11:13 things, Dutch, and then you can ask it, you know, "what is 11 colon 15?" It'll say "quarter past 11"

11:20 or a bunch of different times. "What is 7 29?" And well, it says that in Swahili, which I can't,

11:26 I can't get that. I'm not going to get that right. But it'll convert time into spoken expressions in

11:30 multiple languages, super easy to use, pure Python, so you could vendorize it, I guess,

11:35 and so on. Even has plugins. So super easy to use if people want to check that out and play with it.

11:42 This is pretty fun. Yeah, I like it. Very simple. But if you've got a use case like that, you like,

11:46 you have a date time and you wanted to say it in a more human version, well, here you go.

11:51 All right, nice. Yeah. I am going to cover a topic that I get asked all the time. So

11:57 I talk about testing a lot and machine learning and AI is kind of a big thing now. So I get

12:04 questions like, "how do I test machine learning projects?" And I got an answer, I have no idea.

12:09 So I'm excited that somebody made an attempt at this. Here is a article called, "How Should You

12:16 Test Your Machine Learning Project? A Beginner's Guide" by Francois Porcher. So, and it's published

12:21 in the Towards Data Science blog. Anyway, kind of a cool intro, talked about, you know, some of the

12:29 simple stuff. I mean, there is like, how do you test machine learning? It's complex, but there are,

12:36 there's a lot of pieces that are pretty straightforward to test. So cool introduction,

12:41 had a project. This article also includes a repository that you can play with directly,

12:47 which is nice. You just follow along with the code. So this is doing, what is it doing? It's

12:54 essentials of testing with a machine learning pipeline, focusing on fine tuning BERT for text

13:01 classification on an IMDB dataset. So that's just what it's, he's using. He's using pytest and pytest

13:08 Cov, which are awesome things to start with. And so it kind of goes through some of the easy stuff

13:14 right away is, is starting with some of the simple things like, has a clean text function. So a

13:22 function within the source that takes a string and makes it all lowercase and strips it, but it might

13:29 do other things too. But these are, these are great examples. You've in a lot of machine learning

13:34 stuff, you've got a lot of little helper functions along the way. You may as well go test those and

13:39 it'll get you in the habit of writing tests too. In this, in this case, it's just giving, giving

13:43 some examples of, of some, some random text input and what the clean output should look like. And

13:49 these are your expectations of like, if I pop, pop this data into this function, what should the

13:53 output look like? So this is a great way to get started. I personally would have put this in a

13:57 parameterized, but I guess we're trying to teach people slowly. These are really three test cases.

14:03 They could be three test functions, but it works. And so I'm, I'm referring to a test function that

14:11 does a test for capital letter stripping and removing extra spaces and what, how, how it

14:19 should handle the empty string. And this is actually a good point. One of the things they

14:23 test with interviews a lot is the edge cases for testing. So like what, what test examples are

14:31 like derivative small cases that you wouldn't possibly think about. And it's important to

14:36 test those too. Like what does an empty string get cleaned as or good, a good thing. Like if I

14:42 already had the word spaces in lowercase, how would that end up showing up in the output? Things like

14:48 that. So good start. And then jumps up to higher level things. He talks about a larger chunk of

14:57 the script. So he's got a tokenized text, tokenized text function that uses a lot of sub pieces,

15:03 uses the tokenizer with certain input. And how you test that? Well, this is a great example of

15:10 just figuring out really some examples, some example input and what, how you would expect it

15:17 to be tokenized on the output. Looking at the, the length and the shape of the result. And then,

15:23 you know, making sure that, that not all values are, I don't know what this would be. Oh, he's

15:29 making sure that all values are torched by torch tensors. I don't even really know what that means,

15:34 but you know, thinking about what the output should be. If, even if you don't know the

15:39 specifics, you can have some way to describe how it should sort of look. And these are good

15:44 enough tests or they possibly are good tests to have anyway. So I think this is a good,

15:51 good starting point to, to, to start a discussion on your team for how to, how to add testing to a

15:57 machine learning project. Yeah. It's interesting. I really would have no idea how to test machine

16:02 learning. It seems like black box type stuff. So yeah, this is a lot more to work with than I would

16:07 have come up with. I think. Yeah. Just getting started, taking a chunk out of it and then where

16:11 to go from there. So after you kind of have a sense of some of the easy stuff, some of the

16:16 middle level stuff of testing examples and shapes and whatnot, what's left? Well, that's where a

16:23 quick introduction to how code coverage works and looking at what other, what, what the rest of your

16:28 code is doing and that maybe you want to add tests to, or maybe those are the things that you

16:34 manually test or something. So anyway. Yeah. Excellent. Sounds good. Well, those are our

16:40 items. Do you have any extras for us this week? I am out of extras. Clean out. Well,

16:46 don't worry. I'll make up for it for you. Okay. So wonderful news from Authy, you know, Authy,

16:52 the 2FA password thing that you can get for multi-factor authentication. Super nice because,

16:59 you know, so many of the devices are locked to, or some of the apps are locked to one platform,

17:05 like Google, a Google authenticator, you lose your phone or you have to reformat or something.

17:10 Sorry. Good luck now. You know, there's no syncing, things like that. But with Authy,

17:16 you have an account. It syncs it across your different devices. One device can authenticate

17:21 another. If you want to add a new one, it's really nice. Except now Authy is urging users to stay

17:28 alert after 33 million phone numbers were leaked. How? Well, there's an authenticated API endpoint,

17:38 but apparently it would return an error that would indicate whether the phone number that you passed

17:44 in to try to authenticate with was valid or invalid. Like, sorry, that phone number doesn't

17:48 exist. Or sorry, wrong password. Something like that, I think is the deal. And so somebody just

17:53 hammered it with, you know, every phone number combination they could think of and recorded the

17:59 results when it said that phone number exists. And we know that Authy has it. And we know that

18:04 you have 2FA and all of these things. And so from what I could tell, no real information about people

18:09 was stolen. But given that they know you have 2FA and they know that you have, that this is your

18:15 phone number, they can start sending you all sorts of spoof things, social engineering type things,

18:21 right? Yeah. Wow. And Authy recently canceled their desktop apps. You know, Authy being

18:27 Twilio, the parent company, canceled their desktop apps. It just seems like it's really in a kind of

18:32 a state of disrepair and lack of love and a lack of confidence in Michael at this point. So I went

18:39 through the super fun experience of resetting about 30 different 2FA logins. And boy, oh boy,

18:46 I learned some things, Brian. I've learned that some companies make it super easy to reset.

18:52 Because my thought was, look, if this is, you know, what else potentially has happened, I'm going to

18:57 revoke all of my 2FA logins and set new secret keys that will generate new passwords. So even if

19:04 they were able to get a hold of everything in my account, that stuff doesn't work anymore

19:08 effectively, right? That was my plan. And it took like six hours or something, five hours.

19:13 You go to different places and you'll see some of them will let you, some have an awesome button,

19:18 reset 2FA. Here's a QR code you scan. Boom, you're good to go. Others say, your Google

19:25 authenticator is, is enabled. Like what? I don't have a Google authenticator. There's like 50 apps

19:31 that are 2FA apps, T-Mobile and like 10 of the other ones say, use your Google authenticator

19:38 here. Like, no, it is not. It's like, use your internet Explorer six here. Like, no, there are

19:43 other browsers. Please don't just say, use your Google authenticator. Right. But you can just go,

19:49 yep, this is my Google authenticator. It's called something else and it doesn't come from Google,

19:53 but sure enough, I'm going to set this up. Right. And like Christopher out in the,

19:58 in the audience here, that is my next recommendation is, well, if not Authy,

20:04 what? Because Google authenticator is garbage. Like I said, you, if you, your phone gets messed

20:09 up, you've lost all logins forever. There's not a sync, at least last time I use it, there's no

20:13 way to sync it or export it or any of that stuff. Right. That's bad news. So Bitwarden, Bitwarden's

20:18 awesome. This is a premium feature. So you have to have the premium version of Bitwarden, which is

20:23 $10 a year or 80 cents a month or something like, yeah, fine. That's that seems fair. But Bitwarden's

20:28 cool. It's open source, multi-platform. You just scan, scan stuff or enter the code and that they

20:35 give you for the 2FA and off it goes. And because it has a browser plugin, you can just click on

20:40 your name. When it says type in your 2FA code, you don't have to go pull it up. You just click

20:43 the button and boom, it auto fills it, which is great. I don't put it in my one password because

20:48 I'm just not ready to say my 2FA logins and my passwords are all stored behind one single

20:54 platform because then your 2FA is kind of toast if somebody breaks into that. So Bitwarden for

20:59 2FA, one password for logins for me at the moment. What do you think? Well, I'm using,

21:04 maybe I shouldn't tell people, but yeah, I'm using Authy. So are they, they're still supporting it

21:10 on like, aren't they? They're not supporting it on, they used to have a desktop app. They don't

21:16 have that anymore. They have a iPad and an iOS and Android app. Since you have an Apple Silicon

21:22 one, you can run the iPad version on your Mac, just like a desktop app. So it's kind of feels

21:30 the same except for it doesn't have like, like the keyboards behave weirdly and stuff, right?

21:33 Because it doesn't expect you to have a keyboard. Maybe you're using it a lot more than I am,

21:37 but it doesn't bother me to run it on my phone. Yeah. Well, I mean, it doesn't bother me either,

21:44 but I've got, there's like a bunch of different apps that I have. For example, the credit card

21:48 front end system for talk Python courses. It has a remember me button. It never remembers me.

21:55 Never. It has a 2FA. It never remembers the 2FA. So even if I say, remember me 20 minutes later,

22:02 I'm putting in the password and the 2FA. And then 20 minutes later, I'm putting the password in the

22:06 gym. Like, so there's like a few places like that that just constantly ask for the 2FA.

22:12 Digitalization is a little bit like that. Like if you're every single time you're putting in the

22:16 2FA, there's not like a trust this device sort of thing. And so I ended up, I'm probably in,

22:21 used to be in Authy now in Bitward, and I'm probably in that like five times a day, at least

22:25 every day. So anyway, so one final thing before I move off of this, after all of this, they said,

22:31 I don't know how this helps. It doesn't seem like it should help, but somehow they said,

22:35 as part of our recommendation to users, it's very important that you upgrade to the latest

22:40 version of Authy. Why? That because the endpoint and some, I don't understand, but anyway, it says

22:46 you must. And then if you go and look at the upgrade, all it says for the, it says you must

22:50 get version 26.1.0. What does it say here? Bug fixes. Not, this is an important security update

22:56 and you need to update because we're trying to protect your privacy. They're hiding behind bug

23:01 fixes and it's disgraceful. Right? This is bad. So all these things taken together, I'm like,

23:05 you know what? It may be safe. It may be not, but I'm out. Like this is not where my, my important

23:11 things live. So. Okay. Yeah. And OFAC also out there says, Hey, if you're okay with recommending

23:16 paid services, which I am, one password is what I migrated away to from Authy. Yeah. One password

23:21 is awesome. But like I said, I have my logins at one password, so I put my 2FA in Bitward.

23:25 All right. Whew. That was a long extra. That should have just been a thing, right? Extra.

23:29 Remember a while ago I did this article, unsolicited advice from Mozilla and Firefox,

23:32 like your AI, your, your good citizen AI project probably won't save Mozilla. It probably needs

23:41 more than that. And it won't really probably help Firefox either. So let's do some things that help

23:44 them because I'm in principle a fan of them. Well, I said, one of the things that my main

23:49 recommendation was like a privacy focused Google docs. Well, they didn't do it, but

23:53 Proton, the Proton mail people just came up with a, like a Google docs equivalent, but with

23:59 end to end encryption and no AI training and you know, all the kinds of things you would like about

24:05 your data without the negatives. So if you have Proton, there's there's now collaborative docs

24:11 with it, which is kind of cool. And it looks pretty, I think it looks pretty nice. So just

24:14 want to give that a shout out. Does, do you know, I probably don't want to ask, does our Google docs

24:21 open to scanning for AI? Do you know? I believe that the, the freeware versions are, the free

24:29 versions, but the business ones, maybe not, you know, I think your business workspace stuff is

24:34 not open to that, but like your, your personal Gmail is open to like scanning for ads and stuff.

24:39 Whereas the business one isn't. That's so that's the price you pay. Last thing is the code in a

24:45 castle thing I'm doing in October 5th. The early bird discount closes tomorrow. If you listen to

24:51 this right when it comes out. So if you're interested, please check it out. It'd be super

24:55 awesome to spend a week hanging out in Tuscany, doing all sorts of things together. And yeah,

24:59 that's, that's, it's for my extras, Brian. That's a lot of extras, Tony. I mean, Michael. Yeah.

25:04 All right. All right. Shall we have a joke? Yeah. This joke I called, I lied. And it's a,

25:11 it's like a cartoon. It's a woman behind a esoteric programming sort of thing. She's got a

25:18 gun. It says, I lied. I don't have Netflix. Take off your shoes. We're going to learn rust. I just

25:24 thought about like this, all this like rust energy. Like we're converting that to the rest.

25:29 We're rewriting that and rust. It's just like, you're doing rust. That's what the world's doing.

25:33 Sit down. So she invited someone over to like, watch Netflix. Like I lied. I don't have Netflix.

25:39 Take off your shoes. We're going to learn rust. I thought I would catch the zeitgeist. Well,

25:44 right. It's bizarre, man. It's amazing. Okay. I'll, I'll, I'll maybe I'll put that as the,

25:52 the chapter art for this one. Cause the picture, the eyes are amazing. The desperation.

25:56 That's pretty good. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.

25:58 I've resist, I've resisted rust so far. I mean, I'm, I'm, I'm happy that things are getting faster

26:05 and whatnot, but I haven't learned it yet. We'll see. Yeah. Same. All right. Well, cool. A nice

26:11 episode. Thanks for joining on today. You bet. Fun as always. Bye everyone.

Back to show page