

Transcript #240: This is GitHub, your pilot speaking...

Recorded on Friday, Jul 2, 2021.

00:00 - Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 240, recorded July 1st, 2021, how time does fly.

00:11 I'm Michael Kennedy.

00:11 - And I'm Brian Okken.

00:13 - And I'm Chris Moffitt.

00:14 - Hey, Chris, welcome to the show.

00:16 - Thank you, great to be here.

00:18 - Yeah, it's great to have you here.

00:19 We've had you talking about the missteps of Excel and how the Python data tools chain can make that better over on Talk Python a few times.

00:28 This is your first time on Python Bytes, right?

00:30 - It is, yes.

00:31 - Yeah, exciting to have you here.

00:33 Definitely, definitely.

00:34 But maybe you wanna go ahead and kick us off?

00:36 - I wanna talk about subclassing today. Hynek wrote an article called "Subclassing in Python," and dealing with classes is just everywhere in Python.

00:46 Even if you're not using classes, Python itself has all sorts of classes and objects that you're using all the time, whether you know it or not.

00:55 But when you start getting into larger design, there is a question around composition versus inheritance and stuff.

01:05 So I really like this article that Hynek put together because I think people should think about the ramifications more.

01:13 So the general gist is he prefers composition over inheritance, and I do too.

01:20 But then it goes through, if you have to do inheritance, and sometimes you do in Python.

01:27 For instance, the greatest example I know of is when you're having exception hierarchies.

01:34 It's really easy to build up exception hierarchies in Python.

01:40 There's nothing there except for the class definitions and their inheritance and that's it.

01:45 >> The easiest class you've ever created: class, exception name, pass.

01:49 >> Yeah. But it's useful to do that.
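
As a rough illustration of the exception hierarchies mentioned here, a minimal sketch might look like the following. The class names and the `place_order` function are made up for this example and do not come from the article.

```python
# Hypothetical names for illustration only.
class BakeryError(Exception):
    """Base class for every error this made-up application raises."""


class OrderError(BakeryError):
    """Something was wrong with an order."""


class OutOfStockError(OrderError):
    """The requested item is not available."""


def place_order(item: str, stock: dict) -> str:
    # Raise the most specific error; callers can catch at any level.
    if stock.get(item, 0) < 1:
        raise OutOfStockError(item)
    return f"ordered {item}"


try:
    place_order("cake", {"cake": 0})
except BakeryError as err:
    # Catching the base class also catches every subclass.
    caught = type(err).__name__
```

The classes carry no behavior of their own; the inheritance chain alone lets a caller choose how broadly to catch.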

01:52 But then if you want to go further, there are other design patterns and stuff, especially from the C++ world, where people might be thinking, "Well, I want to do something similar in Python." This is actually a really great article and discussion about it.

02:09 It's pretty long. I don't want to summarize it too much, but I'll jump into the three types.

02:13 He talks about three types of subclassing that often happen.

02:17 Subclassing for code sharing, and the short answer is, people are trying to follow the DRY principle and share code.

02:26 It just ends up being a bad idea, essentially.

02:30 There's a bunch of references for it.

02:32 I think if you don't believe me or him, read this article and read a bunch.

02:38 He's got a whole bunch of linked articles too that discuss it, but I agree.

02:43 The second type is abstract data types or interfaces.

02:49 In a lot of languages, they're called interfaces.

02:52 This is a neat use of it and there's a bunch of things.

02:57 I thought, "Okay, yeah, you definitely will use inheritance and composition for data type stuff." But in his discussion, he talks about some of the cool things that Python has that allow you to have these hierarchies without actually doing subclassing.

03:20 There are some cool features of Python, like the protocol syntax that came in recently with typing.Protocol.

03:28 >> Protocol is like formal duck typing, which is an odd thing to combine, but yes.

03:33 >> Yeah. But it's really cool how it's put together in Python, so I like that.
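
To make the "formal duck typing" remark concrete, here is a small sketch using `typing.Protocol`. The `Quacker`, `Duck`, and `Robot` names are invented for illustration; only `typing.Protocol` itself is the real standard-library feature being discussed.

```python
from typing import Protocol


class Quacker(Protocol):
    """Anything with a matching quack() method satisfies this protocol."""

    def quack(self) -> str: ...


class Duck:
    # Note: Duck does not inherit from Quacker.
    def quack(self) -> str:
        return "Quack!"


class Robot:
    def quack(self) -> str:
        return "BEEP-QUACK"


def make_noise(thing: Quacker) -> str:
    # A static type checker such as mypy accepts Duck and Robot here
    # because their structure matches; no subclassing is involved.
    return thing.quack()


noises = [make_noise(Duck()), make_noise(Robot())]
```

At runtime this is ordinary duck typing; the protocol only gives a type checker something to verify against.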

03:39 Then lastly is specialization, and that's where the exception hierarchies come in.

03:44 But also, he's got a great discussion about structuring data classes that have common elements.

03:50 I think that's an interesting discussion too.

03:54 I think I already said this, the summary, it's really hard to summarize this article other than, it's good to think about your design, especially if you're going to try to bring subclassing into it.

04:05 So let's do that.

04:06 >> Yeah. Awesome. I haven't had a chance to dive into this article, but I do want to read it and explore it.

04:11 It touches on a couple of things, like it touches on namespaces and modules, which I think is pretty interesting.

04:17 So many people coming from C++, C#, Java, et cetera, like all these really strongly OOP, especially C# and Java, where everything has to be a class.

04:26 You'll see people creating classes just for things like static variables and so on, right?

04:31 Or static functions.

04:32 If you just have a bunch of static functions, you know what works really well for that?

04:35 A module that has functions in it, right?

04:37 It's the same thing: you import the module and then say module.function_name, which is the same as from module import Class and then Class.static_function_name, right?

04:45 Like it's just a layer that doesn't really need to be there.

04:48 So the article touches on that, which I think is neat.

04:51 Like sometimes you just don't need those.

04:53 And then also the composition over inheritance.

04:57 I think composition over inheritance is a really important thing to think about.

05:00 'Cause so often people say, well, you can't use OOP because it's horrible in all these ways and you end up with like a robotic duck that can't quack.

05:07 I don't know, like you end up with these weird situations if you like derive too many things then you put a weird specialty on the end.

05:13 You're like a duck is an animal, but it has wings.

05:17 But wait, now it's a robot.

05:18 Now, why does it eat water?

05:19 It's like what happened to it?

05:20 But the composition allows for you to keep things much more tight and small in the inheritance stack, but still put them together in meaningful ways.

05:28 Anyway, I want to see more about this. This looks great.
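
One way to picture composition over inheritance, using the duck example from this discussion, is sketched below. Everything here (`Duck`, `quack`, `beep`) is a hypothetical name chosen for the illustration, and this is only one of several ways to compose behavior.

```python
from dataclasses import dataclass
from typing import Callable


def quack() -> str:
    return "Quack!"


def beep() -> str:
    return "Beep."


@dataclass
class Duck:
    name: str
    make_sound: Callable[[], str]  # behavior is passed in, not inherited

    def speak(self) -> str:
        return f"{self.name}: {self.make_sound()}"


# The same class covers both cases; no RobotDuck subclass is needed.
real_duck = Duck("Donald", quack)
robot_duck = Duck("RoboDuck", beep)
```

The inheritance stack stays flat (just `Duck` and `object`), while the varying part is supplied as a separate piece.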

05:31 >> I'm coming from a standpoint of, I'm a C++ person as well, and I've done both extremes.

05:37 I've gone way down the inheritance hierarchy thing, and had like seven deep in the hierarchy, maybe not seven, but like five deep, and it gets to be a nightmare.

05:47 Then I went to the other direction and didn't do any inheritance at all in a design, but there's problems there too.

05:55 Thinking about it and doing it smartly, you just need to.

06:00 >> It's often like salt.

06:02 I could see it's really good when you have some.

06:04 You try to go like, salt's great, I'm going to have that for dinner.

06:06 I'm like, no, you shouldn't do that.

06:08 (laughing)

06:09 - I think the other thing that's really important about this is depending on how long you've been working in Python, sometimes you kind of get stuck in a rut and you're always doing the same thing and the language has evolved and grown over time.

06:22 And so I think articles like this kind of force you to take a step back and see if you're using all the new features in a way that maybe aren't idiomatic.

06:32 - Also quick comment out there from the live stream.

06:33 Paul says, first time watching the live stream.

06:36 Hey Paul, weird seeing everyone when they say the intro.

06:38 Indeed, it is kind of weird, but I wanna highlight this one to say hi to Paul.

06:42 Thanks for being here.

06:43 But also, if you're listening and you're like, hey, I'd kinda like to see what's on the screen while you're all talking about this, just follow us on YouTube.

06:49 There's a live stream link right on pythonbytes.fm, so it's easy to sign up for that.

06:54 And also, Sam out there in live stream following up on this says, "Currently maintaining a library with a deep templated class and hierarchies.

07:00 It's very hard to keep track of it all." Yeah, I hear you there, that's for sure.

07:05 All right, let's switch over to the next one.

07:08 Now I tried, I tried to resist this Brian, I promise I did, but I've ended up with an extra extra with seven more extras here all about it and it just had to become a main item because otherwise we'd be here for hours.

07:22 It's not the idea of the show.

07:23 So we've got an extra extra here all about it, nine extras, let's pull them up.

07:28 Action number one, we've talked about Pyodide.

07:31 I had a whole Talk Python episode on Pyodide, which is an interesting thing.

07:35 It is this project by Mozilla where you take Python and you run it in the browser and then you take many of the data science packages like NumPy and Matplotlib and stuff and compile them into the browser.

07:47 And then you basically have client side Python data science, which is really interesting.

07:52 This project is being spun out as its own topic, as its own project, it's no longer under Mozilla.

07:58 Usually that doesn't sound good to me.

07:59 It kind of sounds like it's been orphaned.

08:01 I have no idea what the status of Pyodide is.

08:04 people can check that out, but it's no longer under Mozilla.

08:07 It's its own separate thing, as they say.

08:09 So it's cruising out there.

08:11 And also, it didn't get compiled to JavaScript.

08:13 It got compiled to WebAssembly, which is interesting 'cause that's faster.

08:17 All right, that's number one.

08:18 Number two, I just, as in a couple hours ago, released a brand new course, Python-Powered Chat Apps with Twilio and SendGrid.

08:25 So the idea is if you wanna have some kind of chatbot, but a lot of that conversation has to involve your database and your data and verifying things.

08:34 Like the app that we built here is a tech savvy bakery where you can order cakes by sending it a WhatsApp message.

08:41 And then it'll say, "Hey, you want a cake?

08:43 Well, here's the menu." And it actually gets the menu from our Flask app.

08:47 And then they pick something off the menu.

08:48 And then once they pick all the details, they said like this, "Okay, great." We send it back to our website and figure out how much that's gonna cost.

08:56 They order it, it goes back, and once they accept it, they get like a customized pretty email. It goes back to the backend, the bakers bake it, and it sends them another message to let them know.

09:06 So if you want to build that kind of workflow with Twilio and SendGrid, check it out. This course is super fun.

09:12 It's six hours and it's a hundred percent free.

09:14 So, people can check that out.

09:16 I think if you're trying to build that kind of thing,

09:18 that'll be a lot of fun.

09:19 So links in the show notes there.

09:20 Oh, I had something.

09:21 Yes.

09:22 If I can't afford free, can I get a discount code?

09:24 Yeah.

09:25 I will give anyone listening 50% off that.

09:28 So I have this really cool tweet and Twitter is broken from what I can tell for everything that's not the homepage.

09:35 So something went wrong, but let me describe it.

09:38 So when you look at it in the show notes, you'll be able to see there's a really cool tweet from Nick Moll, who was a guest on the show last week.

09:45 Oh, you got it.

09:47 How can you get this to work?

09:48 You've got some sort of magic.

09:49 All right.

09:50 Well, so thanks for putting on your screen.

09:51 So here we have Will McGugan showing an animation of basically this really cool, like collapsible sidebar and like scrolling within sub-windows inside of Textual.

10:06 We talked about Textual as well.

10:08 It's just such a cool graphic that says like, "Wow, you can build some pretty amazing applications there." What do you think?

10:14 >> Will's just knocking it out of the park with this.

10:16 It's fun to watch him go so fast.

10:19 >> Absolutely. Well done there, Will.

10:21 I'll switch back to mine for a moment.

10:22 Okay, Ars Technica works on my computer.

10:25 So remember we did an episode and I titled it something like FLoC No or something like that.

10:30 (laughing)

10:31 So FLoC, Federated Learning of Cohorts, is something that Google was trying to do so that they can replace third-party cookies.

10:40 Why?

10:41 Because people are running ad blockers or like I am right now, a VPN that at the network level blocks all the ad tracking and third party cookies.

10:50 So they're just basically not working very well anymore.

10:53 So they're gonna cancel third-party cookies from Chrome, which basically means they're canceling them for the internet.

10:59 But because they're Google and they're based on ad revenue, primarily, they can't just go, "We canceled tracking, hooray, we're all winning on privacy," right?

11:09 It has to be replaced with some other form of tracking, which they call this federated learning of cohorts.

11:15 But the federated learning of cohorts has all these almost more negative consequences.

11:20 And I don't wanna go too much into that because we went into quite a lot of detail.

11:24 But for example, you can say, I would like to target lesbians who just got divorced.

11:29 You run an ad on that, people show up on your site, they sign up, you have an email, and now guess what?

11:35 Not only do you know what their email is, you know that they're in this group and maybe this is the very first time you've ever met them, right?

11:40 So really weird, creepy stuff that you could like pull out with this.

11:45 Anyway, the big news is Google delays the rollout till 2023 because you know what?

11:49 People don't like it.

11:50 They're not super keen about it.

11:52 So there's a whole bunch of people who are against that.

11:57 - You're saying they're just delaying it, not stopping it?

12:01 - Yes, for now.

12:03 This is a great Ars Technica article that people should check out.

12:06 Let me read the first sentence or two.

12:08 Google's plan to upend web advertising and user tracking by dropping third-party cookie support in Chrome has been delayed.

12:14 Most browsers block third-party tracking cookies now, as do VPNs, like I mentioned.

12:19 But Google, the world's largest advertising company, wasn't going to follow suit without protecting its business model first.

12:24 But there's a lot of challenges with this.

12:27 A lot of people have come out against it.

12:29 And yeah, it's not gonna work out super well.

12:33 So they decided to delay it.

12:35 That's what they said.

12:36 Stage two starts mid-2023.

12:39 - Google says it's received substantial feedback.

12:43 - Including from us.

12:44 And other companies out there are like, We kind of want to keep tracking too, but we're not really excited about this.

12:52 So we're just going to not say anything like Apple, Opera, Mozilla, Microsoft.

12:56 Yeah, they're like, "Ah, we're not so sure about this." Anyway, yeah, they've received substantial feedback.

13:01 So hooray, I think for now.

13:03 One thing that we don't talk very often about in Python is what if you want to ship your code to somebody and it has sensitive algorithms in it, right?

13:12 It's not that common, but you could get py2exe or py2app to bundle up your code and give it to somebody.

13:17 For example, Dropbox does something to this effect, right?

13:20 But your Python code running up in your little menu bar, there's other ones as well.

13:23 But you might want to encrypt how that works or protect how that works.

13:27 So people can't just open the PY files and look around.

13:30 So there's this thing called Source Defender.

13:32 I'll be clear, this is a paid commercial product.

13:35 I have no affiliation, but they sent me a message, "Hey, we're doing this thing, what do you think?" It looks kind of interesting.

13:40 I think it's gonna be a pretty limited set of people who actually care about this.

13:44 Like if you're running on Docker, you're running on the server, you probably don't care, maybe you do, but probably not.

13:49 But if you'd like to be able to encrypt your source code, so it's much harder to see, and then ship that to somebody, you can use this thing as part of their paid service.

13:56 So that's kind of cool.

13:58 People can check that out.

13:59 Let's see, oh, there's a plate noise, I don't want that.

14:01 So I was recently interviewed on a day in a life in a work from home Pythonista, which is a cool series being done by the folks in the Philippines.

14:09 If you want a tour of the behind the scenes studio and all the work from home stuff, people can check that out.

14:13 Python 3.9.6 was just released.

14:16 We can check out the change log and see what's happening there.

14:19 There's a security fix in the HTTP client for what I think is like a denial of service.

14:25 It sounds like it avoids an infinite loop sort of thing.

14:29 So that might matter to people.

14:31 Probably not, but maybe it does.

14:33 Then a bunch of changes that are happening here, including platform specific ones.

14:38 So if you're running Python 3.9, and why wouldn't you be? Update that.

14:42 >> Because you're running 3.10.

14:43 >> Yes, that's right. You're already ahead of the world.

14:45 You're in the future.

14:46 So also we had Calvin on from Six Feet Up a while ago and we talked about the conference that he was putting together.

14:55 Well, the videos from that conference are out as a YouTube playlist.

14:58 So people can check that out.

14:59 I don't remember how many videos there are.

15:01 Let's click on it and see.

15:02 There are 61 videos, including one on the Python memory deep dive talk that I gave.

15:08 So if people wanna check that out, they can.

15:10 Let's see.

15:11 Oh, this one.

15:12 Check this out, Brian.

15:13 Have you seen this?

15:13 Did you know you can pip install Python bytes?

15:15 Yeah.

15:16 [laughs]

15:17 You can literally pip install Python bytes because of...

15:21 Scott Stoltzman created this for us as a joke.

15:25 He was listening to one of our episodes.

15:26 I can't remember what we talked about.

15:28 This was episode 239, but we must have talked about packaging and pip and things like that.

15:33 So he created a package called Python bytes.

15:35 And what it does is basically you give it a number like 240 and it would download this version as an MP3 file and put it right next to whatever the working directory is.

15:45 If you want to install Python Bytes and then PythonBytes.downloadEpisode instead of using a podcast player, we're all for that, you can check that out.

15:54 Yeah, that's it. That's extra, extra, extra, extra, well, many, many extras.

15:58 >> The Python Bytes package was for fun, but it also is really small.

16:04 One of the things I like about it is it's just a really cool example of like with Python, you got something that downloads MP3 files off of a feed somewhere.

16:15 It's that easy.

16:16 It's just, that's pretty cool.

16:17 - Yeah, that's fantastic.

16:18 Absolutely. - Absolutely.

16:19 - All right, let's see.

16:21 A couple of things from the live stream.

16:23 Sam says, "Things have happened with Mozilla "the last two years that really shook "my confidence with them." I am still a big fan of Firefox and I support their mission, but yeah, I wanna see them succeed.

16:33 Let's see.

16:34 Another one from the live stream.

16:36 Antonio said, "Hey guys, have you mentioned Kivy before?" Hey, GUIs and Kivy, there you go.

16:40 I watched a video about it this week.

16:42 It's a GUI that's compatible with many things, including the mobile devices.

16:45 My feeling is that Kivy is more about building almost game-like interactions, whereas with a lot of GUIs people want, like, here's a text box.

16:56 I type in the text box, here's a button I drop in.

16:59 But yeah, pretty cool.

17:00 Well, let's see.

17:01 Kim Fenwick says, "As an aside, shipping a Docker image won't obfuscate the Python.

17:06 The image can be taken apart and files like that."

17:08 That's true.

17:09 They absolutely can.

17:10 I was just thinking like, you're probably just running on like a container service, but yeah, if you're shipping it to someone, it's the same. Nick Harvey on the live stream says, "Could just send the .pyc files with no .py."

17:21 It's not foolproof, but it does require more work.

17:24 You're right.

17:24 You'd basically be down to like dis.dis and like reading the bytecode.

17:28 Yeah, for sure.
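
For a sense of what "reading the bytecode" involves, the standard-library `dis` module can disassemble a function, as in the sketch below. The `check_license` function and its "secret" key are invented for this illustration.

```python
import dis
import io


def check_license(key: str) -> bool:
    # Hypothetical license check, used only to have something to disassemble.
    return key == "secret"


# Capture the disassembly as text instead of printing it.
buffer = io.StringIO()
dis.dis(check_license, file=buffer)
listing = buffer.getvalue()
```

Even without the source file, the listing still shows the constant being compared against, which is why shipping only compiled files raises the effort but does not hide the logic.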

17:29 let's see.

17:31 - Final one, Rayhan says, if it ends up running code on your machine, you can read it.

17:36 It's about putting enough barriers that people won't bother.

17:38 Yeah, that's definitely true.

17:39 I mean, you think of C++ and things like that being completely opaque and yet people take that apart all the time.

17:44 But there is also a difference from, I'm literally shipping you the source files here, to, you know, 'cause then you could go in like, oh, here's where the license check is.

17:53 Let's just, you know, command slash comment that out.

17:55 All right, now we're ready to run.

17:57 - Yeah.

17:58 - Right, you wanna make it a little bit of a challenge at least, I suspect.

18:00 Anyway, thanks for all the feedback out there, everyone.

18:02 that's the everything extra, extra nine times.

18:06 All right, Chris, what's your first one here?

18:08 All right. So the first one is from Andreas Kanz, I think is how you pronounce it.

18:15 And it's a library called klib, I believe.

18:18 I wasn't sure if it's pronounced K-lib or klib, but I think it's klib.

18:21 And it's for automated cleaning of pandas data frames.

18:26 I guess I should even say it's a little bit more than just cleaning, it's automated analysis.

18:30 And I'll be the first to say I'm a little skeptical about some things that try and automate the process, but I was playing around with it.

18:39 And there's some pretty cool things that it does.

18:42 The documentation, probably the best way to learn about it is the Towards Data Science article that he wrote, which gives a pretty nice overview of what it does.

18:54 It has some, as I mentioned, some pretty nice cleaning features, as well as analysis features.

19:01 So I was going to kind of go through a couple of the, describe a couple of things.

19:07 The first one that I thought was really interesting is this function called data_cleaning, and you can control what it does.

19:18 So it can clean the column names, it can convert data types, it can drop missing.

19:23 So one of the things about pandas is that it's not really aggressive about the data types that it uses.

19:32 So when you read in data, it will just kind of assign it maybe to a float or an object.

19:40 But if you want, you can get in there.

19:42 And if it's a column that, let's say, only has values less than 100, and you convert it to a smaller integer type, it saves memory.

19:52 If you save enough memory, then you can actually speed up your code.

19:55 And so this goes behind the scenes and takes your data frame and converts each column essentially to the smallest NumPy type that can store it.

20:05 And then, you know, I took a random dataset and sure enough, it did reduce the memory footprint quite a bit, which I thought was pretty interesting 'cause it's one of those things that is very tedious to do on your own by hand.
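
The memory saving described here can be reproduced by hand with plain pandas. The sketch below uses `pd.to_numeric` with its `downcast` option on a made-up data frame; it illustrates the idea only and is not a claim about how klib implements it.

```python
import pandas as pd

# Made-up data: small integers and a few floats.
df = pd.DataFrame({"age": [23, 47, 99, 5], "price": [1.5, 2.25, 3.0, 4.75]})
before = df.memory_usage(deep=True).sum()

small = df.copy()
# Ask pandas for the smallest numeric type that still holds every value.
small["age"] = pd.to_numeric(small["age"], downcast="integer")
small["price"] = pd.to_numeric(small["price"], downcast="float")
after = small.memory_usage(deep=True).sum()
```

On this data the integer column shrinks to `int8` and the float column to `float32`, so `after` is smaller than `before`.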

20:18 - Does it do, like if you have the same string, does it just create a pointer to one copy instead of having that many times, stuff like that?

20:24 - It can do that by converting it to a category type.

20:28 That's essentially what pandas is doing when you create a category: it does that kind of string-to-list conversion.

20:36 And it's pretty effective.

20:38 And yeah, I've used the category piece before, but I haven't actually gone in and tried to shorten up the numeric columns, which is really useful.
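
The category conversion mentioned in this exchange looks roughly like this in plain pandas. The city names are made-up sample data.

```python
import pandas as pd

# 3,000 rows but only two distinct strings.
cities = pd.Series(["Portland", "Minneapolis", "Portland"] * 1000)
as_category = cities.astype("category")

object_bytes = cities.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# Each distinct string is stored once; the rows hold small integer codes.
unique_labels = list(as_category.cat.categories)
codes_dtype = as_category.cat.codes.dtype
```

With so few distinct values, the codes fit in one byte each, which is where the saving comes from.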

20:49 The other thing-- - Can you just convert them to all the integers and then it'll just be shorter so you don't have to worry about the size?

20:54 I'm just teasing.

20:55 - Yeah, yeah.

20:56 - You probably knew that.

20:57 - Yeah, yeah.

20:57 No, no.

20:59 But I mean, it can even do like int16s or int32s or--

21:04 - Oh yeah, interesting.

21:05 Like it'll shrink to the size that'll like, oh, these are all under 256, so we'll go to like one byte.

21:10 - Exactly, exactly.

21:12 You know, and I haven't looked at the code to see, you know, how it actually figures it out, but I had a fairly large data frame and it was pretty quick.

21:19 The other one that was interesting is clean_column_names.

21:22 So I think there are some other libraries out there that will like strip spaces or special characters from column names.

21:29 But what this one will actually do is, if you have a column name that has, let's say, camel case, it'll convert it to all underscores, or it will just essentially normalize all of your column names, which you could have a debate about whether you wanna do that.

21:47 But when you have a data frame that has a lot of columns and you're just looking at it the first time, that can really be helpful.
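
A hand-rolled version of this kind of column-name normalization might look like the sketch below. The `to_snake_case` helper and its exact rules are assumptions made for illustration; klib's own rules may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: normalize one column name to snake_case."""
    name = name.strip()
    # Put an underscore between a lowercase letter or digit and an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


columns = ["CustomerName", " Order Date ", "totalPrice($)"]
cleaned = [to_snake_case(c) for c in columns]
```

With a pandas data frame, the same helper could be applied through `df.rename(columns=to_snake_case)`.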

21:54 And then the other function that it does that works pretty well is for cleaning duplicate data or empty data.

22:04 So if you have a lot of columns that have no values in it or just maybe 90% of the values are empty, you can set thresholds and just clean that all out.
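
The threshold idea can be sketched with plain pandas as follows. The data frame and the `max_missing` cutoff are made up, and this shows the general technique rather than klib's actual implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5, 6, 7, 8, 9],
    "city": ["A", "B", "B", "C", "D", "E", "F", "G", "H", "I"],
    "notes": [None] * 9 + ["call back"],  # 90% missing
    "unused": [None] * 10,                # entirely empty
})

max_missing = 0.8  # hypothetical cutoff: drop columns more than 80% empty
missing_share = df.isna().mean()  # fraction of missing values per column

# Keep the well-populated columns, then drop exact duplicate rows.
kept = df.loc[:, missing_share <= max_missing].drop_duplicates()
```

Here the two sparse columns are removed and the repeated row for id 2 collapses to one.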

22:15 So I was playing around with it and I was pretty impressed and I kind of wanted to call it out because the documentation right now is mostly around the Jupyter notebooks that he has.

22:29 So I think it would be nice if we could get some more docs in there and some more examples.

22:36 But overall, I was really impressed with the library and I think people should kind of take a look at it and see if it's something they wanna use for some of their own processes.

22:47 - Yeah, some of them sound interesting, even if you don't have to trust it, right?

22:52 Like shrinking to the smallest data type, for example, or normalizing column names, those don't seem as risky as, you know, clean it up, find the wrong data.

23:03 - Exactly, and then I forgot to mention, it also has some nice correlation plots.

23:08 And some of these things you can already do with Seaborn or Matplotlib, but I found that it gives you a little more control and it's just a little bit easier to do it.

23:18 There are certainly other tools out there that do this as well.

23:22 Oh, and then the categorical data plots, I thought was a nice summary of the data and gives you some nice graphs and it helps you understand where you've got some missing values.

23:35 But yeah.

23:36 - Yeah, visualizing the missing data is a really interesting feature.

23:39 - Yeah, and there is another library for pandas data frames called missingno that does this and does it well.

23:44 But I think this is a unique combination, especially some of the data, the memory saving features that it has are pretty neat.

23:54 >> The cleaning features though, there's a lot of parameters to it.

23:58 It looks like you have a lot of control.

24:01 Again, this is open source, so it isn't that magical.

24:05 You can just look at the source and see what it's doing.

24:08 >> Exactly. Yeah. That was one of the things I was looking at: data_cleaning, I think, is the top level, and you can just run that wide open and it'll do everything.

24:17 And it actually prints out a pretty nice summary of what it does, but you can also go in there and specify parameters, like you said, to control it so that maybe it doesn't rename the columns or drop some of the missing data.

24:31 The other thing that I tried to play with that seemed really interesting is this pool_duplicate_subsets.

24:37 And essentially what it tries to do, and I had a little bit of trouble with this 'cause I think I threw too much data at it, but maybe if you have 10 columns of data, it says, well, you know what, four or five of them are very heavily correlated, so we're gonna drop them and just give you the four or five that are actually most useful.

24:59 And so I think that's some interesting tools to use when you get some data that maybe you haven't worked with before.

25:05 - Yeah. - Yeah, very nice.

25:08 What a good find.

25:08 And Brian, you got the next one?

25:10 - Sure, yeah, just a second.

25:13 I wanted to remind people to look at functools every once in a while, because in my experience functools is an interesting library that's built in.

25:28 It grows with you.

25:30 If you're new to Python and you look at it, it's going to be confusing.

25:34 It's like all intermediate stuff in there.

25:37 But as you learn and experience more Python programming, come back to it every once in a while because there's stuff in there that you'll use that you didn't think about before.

25:49 I'm going to go through a few things.

25:51 Actually, I wanted to call out, there was an article by Martin Heinz that I read that reminded me to go through and look at this.

25:59 So I want to shout out to him. Thanks.

26:02 We've talked about some of this stuff before.

26:04 We talked about function overloading and using singledispatch as one of the ways you can do function overloading in Python, which is cool, and that's part of functools.

26:15 Hopefully, people are familiar with wraps.

26:19 Wraps is a way to create decorators that act like the thing that you decorated.

26:25 If you're writing decorators, make sure you check out wraps.
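
A small sketch of what `functools.wraps` does for a decorator is shown below. The `logged` decorator and the `add` function are hypothetical examples.

```python
import functools


def logged(func):
    """Hypothetical decorator that counts how often func is called."""

    @functools.wraps(func)  # copy func's name, docstring, etc. onto wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@logged
def add(a, b):
    """Return a + b."""
    return a + b


result = add(1, 2)
```

Without `wraps`, `add.__name__` would report `"wrapper"` and the docstring would be lost, which confuses debuggers and documentation tools.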

26:29 Then caching as well, I'm sure we've talked about lru_cache.

26:33 >> I'm sure we have, yeah.

26:35 >> Yeah. That's in functools, the caching.

26:39 New in 3.9, there's just a simple cache.

26:44 You don't have to say LRU cache.

26:46 It's just a convenience wrapper around LRU cache, but there's no max size.

26:52 You don't want to use that in cases where you actually need to throw items away.

26:57 But caching is super cool. Check that out.

27:01 >> When I first saw the LRU cache, I'm like, "Whoa, I've got to go figure out what this LRU is," rather than just, like, caching the response.

27:08 I guess the other question you might have is, well, what if you pass two arguments or different sets of arguments?

27:14 How do those work? So either way, it's not 100 percent totally obvious what's going to happen.

27:19 Yeah, it's very cool.
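
On the question of multiple arguments: each distinct combination of arguments gets its own cache entry, and the arguments must be hashable. The sketch below shows this with a made-up `area` function, plus the unbounded `functools.cache` added in Python 3.9.

```python
import functools

calls = []  # records every time the real function body runs


@functools.lru_cache(maxsize=2)  # keep only the 2 most recently used results
def area(width, height):
    calls.append((width, height))
    return width * height


area(2, 3)
area(2, 3)  # same arguments: served from the cache
area(3, 2)  # different arguments: a separate cache entry
info = area.cache_info()


@functools.cache  # Python 3.9+: same as lru_cache(maxsize=None)
def slow_square(n):
    return n * n
```

`cache_info()` reports hits, misses, and the current size, which is a handy way to confirm the cache behaves as expected.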

27:20 >> Yeah. So there's a bunch of caching stuff in there like the LRU cache, but then you can also cache a property.

27:26 Actually, the property one I hadn't used before, but I was playing with it this morning and it's really cool.

27:33 For instance, if you've got a data class or any class that has a bunch of stuff, and you have an expensive read on one of those because you have to calculate the value, you can throw cached_property on it and it looks pretty cool.

27:54 One of the neat things about it is, it only reads it once and then it caches the value of the property.

28:01 If you need it to refresh, you call del on it, which is weird but cool also.

28:08 But it's odd to call delete on something that you want to still be there, and it'll just reread it next time.

28:14 That's how that works.

28:15 >> That is weird. That's definitely weird.
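
The read-once and delete-to-refresh behavior can be seen in a short sketch with `functools.cached_property` (Python 3.8+). The `Report` class and its `days` property are hypothetical.

```python
import functools


class Report:
    def __init__(self, seconds):
        self.seconds = seconds
        self.computations = 0  # counts how often the property body runs

    @functools.cached_property
    def days(self):
        self.computations += 1
        return self.seconds / 86_400


report = Report(172_800)
first = report.days   # computed
second = report.days  # served from the cache, not recomputed
del report.days       # drops the cached value, not the property itself
third = report.days   # computed again
```

The `del` only removes the stored value from the instance, so the next access runs the method again.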

28:18 >> total_ordering, I didn't realize was there.

28:21 If you have some data type that you want to be able to compare, you can use total_ordering, define equal and one other operator, and then all of the comparison operators show up.
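
As a minimal sketch of `functools.total_ordering`, the hypothetical `Version` class below defines only `__eq__` and `__lt__` and lets the decorator supply the rest.

```python
import functools


@functools.total_ordering
class Version:
    """Hypothetical two-part version number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


older, newer = Version(3, 9), Version(3, 10)
```

With just those two methods written, `<=`, `>`, and `>=` work as well.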

28:35 You can use that. Then the last one I wanted to highlight is partial and partialmethod, which are neat in that, let's say you've got a function that takes a whole bunch of arguments, but you want to pre-fill some of those in and create a new function that has some of the arguments pre-filled in.

28:57 You can do that with this, and it's pretty neat.

29:00 >> Yeah. Okay.

29:02 Interesting. I see you partially supply some of the arguments but not all of them.
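
A brief sketch of pre-filling arguments with `functools.partial` and `functools.partialmethod` follows. The `send_message` function and the `Light` class are invented for this example.

```python
import functools


def send_message(channel, sender, text):
    return f"[{channel}] {sender}: {text}"


# Pre-fill the first two arguments; only text is left to supply.
whatsapp_from_bakery = functools.partial(send_message, "whatsapp", "bakery")
msg = whatsapp_from_bakery("Your cake is ready")


class Light:
    def __init__(self):
        self.state = "off"

    def set_state(self, state):
        self.state = state

    # partialmethod does the same thing for methods defined on a class.
    turn_on = functools.partialmethod(set_state, "on")
    turn_off = functools.partialmethod(set_state, "off")


light = Light()
light.turn_on()
```

The partial objects behave like ordinary callables, so they can be passed anywhere a function is expected.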

29:05 >> Yeah. Just a shout out to this, that these are intermediate or advanced topics, but as you learn more Python, come back to this every once in a while and you might find it useful.

29:22 >> Yeah. Indeed. How did I miss this cached property thing?

29:27 Like, surely I would have paid attention to that, because so often these properties are computed things, but they, you know, often don't change.

29:33 You get something back from the database, it has time stored in seconds, you want to know how many days it's been since something happened, so you might have a days property, right?

29:41 But that's probably not good.

29:41 So having that cache is cool if you're sure it's not going to change.

29:45 But I'm like, how did I miss it?

29:46 It's new in 3.8.

29:47 So it's not super old.

29:49 And like Chris said, one of the reasons to revisit a lot of these things and pay attention to the news on Python is because the language changes like this.

29:59 >> Yeah, for sure. Kim out there in live stream says, "Also worth looking at itertools from time to time." >> Definitely.

30:05 >> Great.

30:06 >> Indeed. It's in the same level of complexity, but for collections.

30:10 It's like that. You wouldn't first go there, but eventually you're like, "Oh yeah, this is what I wanted. I just didn't know it." Speaking of things you didn't know, let me scare you all a little, or make you all delighted. I don't know.

30:19 You tell me how you take to this.

30:20 So let me set the stage.

30:22 GitHub has a little bit of source code.

30:25 Much of it actually public, right?

30:26 Like it's public repos and whatnot.

30:29 So it can be analyzed and talked about and shared or used to train an artificial intelligence, which is pretty crazy.

30:36 And if you look at the artificial intelligence around text, there's the GPT-3 stuff, which is like scary, good text-based AI.

30:43 Well, they decided, what if, you know, our parent company also makes this editor?

30:48 What if we did an AI based on understanding the source code from GitHub, like all the source code from GitHub and put it into VS Code and then it did stuff.

30:59 Have you all seen this?

30:59 It's called GitHub Copilot.

31:01 - Yeah.

31:02 - Yeah. - Yeah.

31:03 - I haven't tried it yet.

31:04 - I was gonna put the link in there and you beat me to it.

31:06 - Oh yeah, I was on top of it.

31:08 So if you go over here, it actually works for TypeScript, Go, Ruby, Python, a couple other languages.

31:15 It says it works for many languages, but it's best on those, of course.

31:18 But if you just look like at their homepage, the copilot.github.com, they've got this little animation and it says, I'm gonna write a function that says parse expenses and it takes some kind of text.

31:29 And you put a doc string, literally a doc string in Python.

31:32 It says parse the list of expenses and return the list of tuples, date, value, currency, ignore lines starting with hash, parse using date time.

31:40 Here's some examples, tab.

31:42 And then it writes the code that does that.

31:46 And let's see, what is it gonna do?

31:48 It's in the middle of the animation. It creates a list of expenses, and it goes through each line on split.

31:54 It says if the line starts with hash, this is all Python code, continue on your loop.

31:58 Otherwise, date, value, currency equals the split line, and then it knows how to parse the date, convert the value to a float, and then store the currency as a string.

32:08 And it's not just that sometimes it'll do this, you can actually get alternate implementations by tabbing through its recommended solution, which is pretty crazy.

32:17 So this is powered by OpenAI's, it's called Codex or something like that.

32:24 I don't see it right here right now.

32:25 Anyway, I'll probably run across it in a second.

32:27 That's what it's powered by.

32:28 It says things like, "You're the pilot.

32:31 So with GitHub Copilot, you're always in charge.

32:33 You can cycle through alternative suggestions and choose which to accept or reject and then manually edit the suggested code." Oh yeah, and it learns from you.

32:42 So I don't know, this is wild.

32:45 This is pretty wild stuff here.

32:47 What do you think?

32:48 - I think it's really impressive.

32:51 I mean, it will be interesting to see what it's like when you use it in real life.

32:56 And I think that there could certainly be limitations, but I don't know about you, but whenever I'm programming, there's always these things where I just need to go and look at the documentation or look at Stack Overflow to refresh my memory.

33:09 - Like I gotta connect to SQLAlchemy and I totally forgot how to do those three steps for that connection string sequence, right?

33:15 - Exactly, yeah. And I saw on Twitter where someone was throwing a little shade at that example you were walking through, because they said, well, why are you storing the currency as a float? It should be a decimal, because if you store currency as a float, you're gonna have all the rounding issues. - Well, that's how Superman makes all his money. Or the evil villain in Superman, what was it, one of the... Yes. Yeah, Richard Pryor, in one of the original Superman movies.

33:41 Yeah.

33:42 Yeah.
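
To make the float-versus-decimal concern concrete, here is a hedged sketch of a `parse_expenses` function along the lines of the doc string described earlier, but using `Decimal` for the value. This is not Copilot's generated code; the input format (`YYYY-MM-DD value currency`, one expense per line) and the sample data are assumptions for illustration.

```python
from datetime import datetime
from decimal import Decimal


def parse_expenses(expenses_string):
    """Return a list of (date, value, currency) tuples.

    Assumed format: one 'YYYY-MM-DD value currency' entry per line.
    Lines starting with # are ignored.
    """
    expenses = []
    for line in expenses_string.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        date, value, currency = line.split()
        # Decimal keeps the amount exactly as written; float would not
        expenses.append((datetime.strptime(date, "%Y-%m-%d"), Decimal(value), currency))
    return expenses


sample = """# made-up sample data
2016-01-02 -34.01 USD
2016-01-03 2.59 DKK
"""
parsed = parse_expenses(sample)
total = sum(value for _, value, _ in parsed)

# The classic rounding issue: exact with Decimal, not with float
float_is_exact = (0.1 + 0.2 == 0.3)
decimal_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Summing amounts in different currencies is only done here to show that the arithmetic stays exact.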

33:43 And it's not just based on the doc string.

33:46 Like the example I first spoke about was you write a complex doc string and then say do that thing, but you can do it based just on a function name.

33:55 You can just type a meaningful function name.

33:57 What was the example they used?

34:00 I can't remember.

34:01 But yeah, so you basically just write a doc string, a comment, a function name, or even some code to give it more context, and then off it goes.

34:10 So yeah, pretty neat.

34:11 Codex, that's the name of the AI system behind it.

34:14 So basically this is a plugin for VS Code, but a really nice one.

34:19 So here's some examples we'll all be familiar with.

34:21 So fetch tweets.

34:22 And the example here is you literally write def fetch_tweets_from_user tab.

34:30 And then what it auto completes with is, oh, you're gonna need to pass the username in, and then here's how you authorize with Tweepy, set up the API credentials, and then here's the code you write.

34:39 Oh yeah, and here's your return.

34:41 Or I wanna do a scatter plot, and you write import matplotlib.pyplot as plt, draw scatter plot, tab, and then boom, there it is.

34:49 Or memoization, I wanted to point this one out 'cause of what you're covering, Brian.

34:53 It says, oh, here's how you memoize a function, which is to, if it's passed a set of arguments, it's always gonna return the same answer, so just give that answer.

35:01 Like, remember, these arguments equal this return value once it's run. And it shows how to create a complex decorator that is gonna have a function that remembers the values using caching.

35:11 It could just go @functools.cache.

35:15 You know what I mean?

35:16 So there's things like that that it's missing, right?

35:18 'Cause you could achieve the exact same outcome with the functools.cache decorator, right?

35:23 Instead of trying to write a bunch of code that re-implements that.
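
For comparison, a sketch of both approaches side by side. The hand-rolled `memoize` decorator is an illustrative reconstruction, not the code Copilot produced, and `functools.cache` requires Python 3.9 or later (`functools.lru_cache(maxsize=None)` is the equivalent on older versions).

```python
from functools import cache, wraps


def memoize(func):
    """Hand-rolled memoization: remember results keyed by positional arguments."""
    remembered = {}

    @wraps(func)
    def wrapper(*args):
        if args not in remembered:
            remembered[args] = func(*args)
        return remembered[args]

    return wrapper


@memoize
def fib_manual(n):
    return n if n < 2 else fib_manual(n - 1) + fib_manual(n - 2)


@cache  # the standard library version of the same idea
def fib_cached(n):
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


manual_result = fib_manual(30)
cached_result = fib_cached(30)
```

Both give the same answer; the standard library version also handles keyword arguments and exposes `cache_info()` and `cache_clear()`.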

35:26 But anyway, pretty wild thing.

35:28 I don't know really how to feel about that.

35:29 I've been thinking about this today.

35:30 It's kind of freaking me out, but it's also kind of cool.

35:33 >> Yeah, I wanted to point out a comment that people have been pointing out with relation to this is the, I wish we could just specify what we wanted the computer to do, and it just does it, and we already have that, it's called code.

35:50 >> Yeah. People often say things like, I remember hearing this 20 years ago.

35:58 This low-code thing where you create these little boxes that do stuff and you drag and drop between them. We're not going to need programmers anymore.

36:04 We're all just going to become drag-and-droppers.

36:06 And then, like, you programmers won't be needed.

36:08 The business people will just drag and drop their way to the future.

36:11 And that never ever happened.

36:13 Right.

36:14 Because people got to put them in production.

36:16 They've got to debug them.

36:17 They've got to scale them.

36:18 And so on.

36:19 Yeah.

36:20 Yeah.

36:20 I think the same thing here, like sure.

36:22 It wrote it once, but you can't have a write-only experience for your code.

36:26 You have to understand your code and be able to evolve your code and work with it.

36:30 This might power you into a solution faster, but I don't think it escapes the need of people doing meaningful software work.

36:37 >> The person that pointed out, and several people pointed out the example of using money, of floats and money, that does highlight one of the problems with something like this though, that everybody needs to be careful of is, the code that's generated, now you were already carefully thinking about it when you were creating it, but if something else creates it, you've got to scrutinize that to make sure that's really doing the right thing.

37:03 You're code reviewing some AI code while you're coding your own stuff.

37:09 It's just a different part of your brain.

37:11 You got to make sure that you're really paying attention.

37:13 >> Yeah. Even I was looking at that Matplotlib example, and I would even argue that's not really the way you should do a scatterplot in Matplotlib because you should use the object-oriented interface in Matplotlib.

37:26 The code will work, but I wouldn't advocate that you use that code.

37:30 So to your point, I think it will be interesting to see if it does learn on your own coding style.

37:38 So does it start to recognize those things that you're always, like you said, connecting to a database or fetching a file or doing a certain pandas function?

37:47 Will it start to learn that?

37:50 >> I thought I read something about it adapting to you and learning from what you're doing, but I have no idea what that actually means.

37:57 >> Yeah, hopefully it's paying attention.

37:58 So if it generates something and you change it to the different method, and everybody else is doing that also, maybe they'll stop suggesting the old one and start suggesting the new one.

38:09 >> Yeah. Chris, your point about having to, maybe it's you Brian, sorry.

38:13 Whoever said about you've got to criticize this and you didn't write it, so you basically have to study it and then understand or understand it and study it to make sure it's doing the right thing.

38:22 I, a couple of years ago, I don't know, a while ago I was river floating and broke my hand on some rocks, broke my finger in a bunch of places, and like my fingers were completely wrapped up all the way to the very tips.

38:35 There was no like, oh, little pecking typing while my hand healed.

38:38 It was like, nope, no one handed, really slow.

38:41 So to keep things going, I used voice to text to try to like at least keep email flowing for a month or something, you know?

38:49 And what I found was I could write pretty decent emails.

38:52 It's hard to like stop and think in whole sentences the way the little tools like it to work, but you can get it to work pretty well.

38:58 But the mistakes it makes are phonetically correct but not actually what you mean, like "their" and "there," or something that sounds like what you said but is not what you meant to say. That is incredibly hard, much harder to understand and edit than you would think.

39:14 And so things like this, like, well, I wanted it to do that and I hit tab and okay, it's doing, I feel like there's going to be a lot of blind spots.

39:20 Yeah.

39:21 Well, it did what it says it did, and I typed the thing, and it seems right.

39:24 And like, how do you really, really know?

39:26 It just seems like, in the same type of situation, it's going to be harder than normal code to check because you didn't have to think through it to create it.

39:34 You know?

39:34 Yeah.

39:35 A couple of comments from the live stream.

39:36 Rayhan: "Don't give them ideas."

39:39 So, Dr. Falken, gosh, I worry if you put in "Let's play thermonuclear war" as a doc string.

39:49 And Nick says, I can't help but think of Microsoft Tay, which Microsoft Tay was this really cool bot that was super good at adapting to stuff and they put it on Twitter, but people decided to be mean to it instead of teach it.

40:01 I think in Japanese Twitter, it became a very kind and intelligent bot, but on English Twitter, it got turned into a racist, horrible creature right away and they actually had to cancel the project, so yeah.

40:16 And then Arthur says, "Next April Fool's Day prank, everyone start writing terrible code that influences AI." And this is why English Tay went down the tubes.

40:27 (both laughing)

40:30 Let's see.

40:31 And then Sam, "For goodness sakes, don't train it on GitHub code.

40:34 It'll arbitrarily turn on debug mode." Yeah, perhaps.

40:38 Yeah, Kim thinks this is both very impressive and vaguely unsettling.

40:42 And that captures what I was thinking.

40:45 - Rayhan, will it go and talk to the marketing people for me?

40:48 (laughing)

40:49 - I'm good with people, that's what I do.

40:52 - Yeah, okay.

40:52 (laughing)

40:54 Another thing that's not mentioned here explicitly, but I think is interesting is, this code is coming from GitHub, yeah?

41:01 When I go and I'm saying like, I'm working on super secret commercial project for large organization that has lots of people trying to scrutinize it, and I hit memoize tab, it's gonna write some amazing code.

41:15 Oh, by the way, was that GPL?

41:16 Where did that code come from?

41:18 Right, like what's the license of the code that was on GitHub?

41:21 Did I just now all of a sudden grab something that turned, you know, like if I was doing this on Windows and I hit tab, is Windows now open source?

41:29 I don't know.

41:29 - That's a really interesting point.

41:31 And you would think if it was a small startup, someone will probably sue them, but you know, this is Microsoft now.

41:37 - Yeah, exactly.

41:38 Yeah, yeah, yeah.

41:40 Anyway, so I agree with Kim.

41:42 This is both very impressive.

41:43 If this is the start, like where will it go?

41:45 It'd be very amazing, but it's also vaguely unsettling at the same time.

41:49 And I don't know how I feel about it, other than I wish it was in PyCharm so I could play with it more often.

41:54 (laughing)

41:55 All right, Chris, you got the last one?

41:57 - I do.

41:59 So this is another library called Kats, and it's a time series analysis library, and it's made by the same, well, it's from Facebook.

42:10 And a lot of people may have heard of Prophet, which is a library for time series forecasting.

42:16 And one of the things that's interesting to me about Prophet and Kats is I think time series forecasting is something that's really common in the business world.

42:27 I mean, you think about trying to forecast sales or maybe inventory movements or stock prices, a whole bunch of different use cases for it.

42:37 And I think in general, most organizations don't have a group of PhDs that are really sophisticated in their analysis.

42:46 So people use Excel and kind of come up with their own approaches.

42:51 And that's why I thought Prophet was interesting.

42:53 And I think this is interesting because it does come from Facebook and you have to assume that they've got a lot of smart people that are doing a lot of forecasting.

43:02 And they've taken some of the things that Prophet was good at and added some additional tools.

43:09 So before I go too much into Kats, one thing I wanted to mention is I did write an article about Prophet, but this gentleman, Peter Cotton, wrote an article about Prophet, essentially questioning how good it was.

43:29 And this is a really long, really well thought out article, and some of the math and some of the concepts are way over my head, but I do encourage people, if you're looking at time series forecasting, to take a look at this.

43:42 But what Kats does is instead of just doing forecasting with Prophet, it has a couple of different models that you can use.

43:51 You can also do some more just basic time series analysis with it to detect seasonality patterns and change points and other trends.

44:00 There's also, if you want to incorporate this in some of your other machine learning algorithms to pull out features from your time series data, you can do that with this library as well.

44:11 And there's a whole bunch of other libraries or utilities to build like ensemble models and other approaches for time series forecasting.

44:21 This is another one where it is relatively new.

44:25 So there's not a whole lot of documentation, but it's a whole bunch of different Python notebooks, Jupyter notebooks, I mean.

44:33 And like one of the things I think is interesting is from a forecasting perspective, you can use Prophet, but use the same API and use SARIMA, I think, SARIMA, and Holt-Winters, as well as some other ensemble models.

44:47 You can backtest, you can tune your hyperparameters.

44:51 And then you can also, it's got several of these other algorithms for change point detection.

44:58 And a lot of this, like I said, is I'm not an expert on the math, but I am interested in how you figure out how to take these tools and apply them to those real world business problems.

45:10 And so I think it's really great when we have some of these libraries out there that are developed by really smart people that do understand the state of the art, that can maybe make it a little simpler for others to apply to their own unique challenges.

45:23 - Yeah, this looks really nice to bundle these all together.

45:25 What's a type of problem you might answer with this?

45:29 - I think one example could be, help me figure out what my blog or my website traffic is gonna look like six months from now.

45:39 So I need to figure out, do I need to resize my servers or upgrade my disk space or--

45:48 - What's my AWS bandwidth bill gonna be?

45:50 - Exactly.

45:51 The other one, I think, is that it's probably used a lot in inventory.

45:57 So trying to figure out, okay, what do I think sales is gonna look like?

46:00 What do I need to reorder so that I actually have enough product so that we don't stock out?

46:07 I think those are some pretty common use cases.

46:11 A lot of the examples here are the airline flight data.

46:15 So anything that you have that's over a period of time, typically kind of on a daily basis over multiple years, you can then start to forecast out what those future numbers would look like.

46:27 - Then you have this magic prediction power for the executives.

46:31 - Exactly.

46:32 And I think what's interesting about it is, most times when people do prediction in Excel, you put the numbers in there and kind of do your linear line.

46:43 But these tend to give you more error bars, so you can give a range.

46:48 So I think a prediction like this is much more valuable when you say it could be between, you know, 100 and 110 versus it's going to be 101.5.

46:58 And when you do that, it conveys a lot more precision than is really there.

47:03 Yeah, that makes a lot of sense.

47:04 Comment from the live stream, Sam Morley says, "When I was experimenting with time series data, I managed to get better results with a fairly simple, naive ARIMA model than I did using Prophet."

47:14 And I think that's exactly what this article--

47:18 I don't know if he's read this article, but the article from Dr. Cotton, that's essentially what he says: some of the simpler models did outperform Prophet.

47:29 Yeah, interesting.

47:30 Cool, cool.

47:31 All right, Brian, is that it for all of our items?

47:33 It is.

47:34 Got any extra stuff you want to throw out there?

47:36 Oh, I just had a quick one.

47:38 Somebody on Twitter last week asked, "Why did I write a second edition of the book?" And so I thought, well, that's a reasonable question.

47:46 So at pytestbook.com, you can go and I've added a "Why a second edition" section.

47:51 So you can go read that.

47:53 New built-in fixtures, new flags, EdgeScope features, f-strings, types, all sorts of good things are available that weren't available then, right?

48:02 >> Yeah. There's all sorts of reasons.

48:05 >> Always good to see Pathlib there.

48:07 I love Pathlib. It makes my life so much easier when dealing with files.

48:11 >> I finally made the move.

48:13 I've put down os.path, and I'm now all about the Pathlib. Loving it.

48:17 >> That's good.

48:18 >> Yeah. Chris, anything else you want to throw out there?

48:20 - I was going to throw out one other. I was doing some research on working with units of measure, and there's a library called unyt, U-N-Y-T, that allows you to do things like convert kilometers to miles. But it works with NumPy. It works with all the scientific stack.

48:45 And that was, I hadn't heard of that one.

48:48 I thought it was kind of interesting and wanted to put that out there.

48:51 And next time you need to actually do something with units and convert back and forth, might want to consider that.

48:57 And then the other one-

48:58 - It looks like there's a lot of like physics and chemistry type things, like the mass of the earth, the radius of the earth as constants, probably pi and E and all those things.

49:07 - Yes, exactly.

49:09 And I think it's, when you start getting into it, there's probably a temptation just to code it all yourself.

49:14 just put those constants in there.

49:16 But when it starts to get more complicated, I think something like this could be really useful.

49:21 And then there's another approach called Pint, which also works, and there we go, which also works with units and it has a little bit different approach.

49:31 And so I think it's good to take a look at both of them.

49:33 And if you have a need, then you can decide which API is gonna work best for your unique situation.

49:39 - Yeah, that's cool.

49:40 I haven't looked at unyt, but I love Pint.

49:41 And I think the name is so good.

49:43 - Because my wife will ask me like, how many ounces are in a pound?

49:48 Or how many pints are in a liter?

49:51 I'll be like, or even in, I don't know, a quart or vice versa.

49:54 I'm like, I have no idea.

49:55 I just, these are such messed up volume measures.

49:58 And so it's like, here's the thing that takes the thing you don't really know about and allows you to convert it to the others in a safe way.

50:04 It's good.

50:05 - Exactly.

50:05 - I got one more quick extra to throw out as well, for you, Chris.

50:08 I forgot to mention that you were the author of the Move from Excel to Python with Pandas course over at Talk Python Training, which is a really popular course.

50:16 Basically it's an intro to pandas course, disguised as solving problems you might with Excel, right?

50:21 - Exactly, yeah, yeah. No, thanks.

50:23 And I've had a lot of good feedback from folks, so hopefully it's interesting to the listeners that haven't had a chance to check it out.

50:31 - I might have to buy that for my boss.

50:33 (both laughing)

50:35 - Maybe you can get that discount code.

50:37 - Yeah, yeah, get the discount code.

50:38 All right, you ready for some jokes?

50:40 My Twitter came back, so I can show the Twitter joke now.

50:42 - Yeah.

50:43 So Dean, who is often, but I don't see him this day, on the live stream, sent a joke over and said, do you know how they say async in Italian?

50:54 Asyncio or asyncio, which I thought was a pretty good one.

50:58 Asyncio, asyncio, I love Italian.

51:00 All right, you guys got another one out there?

51:03 I saw one in the notes, another joke, but I didn't see who put it there.

51:06 - I've got one.

51:08 So does anyone know why cryptocurrency engineers aren't allowed to vote? - No, I don't know. - Because they're minors. - That's a good dad joke. - Yeah, it is. Absolutely. Well, on that high note, let's call it a show. What do you say? - Yep. - Right. Thanks, as always, Chris. Thanks for joining us this time. - Thank you very much. Really appreciate it. - Yeah. Bye everyone. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at PythonBytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy.

51:54 Thank you for listening and sharing this podcast with your friends and colleagues.
