Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript for Episode #240:
This is GitHub, your pilot speaking...

Recorded on Friday, Jul 2, 2021.

00:00 Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 240, recorded July 1, 2021. How time does fly. I'm Michael Kennedy. And I'm Brian Okken. And I'm Chris Moffitt. Hey, Chris, welcome to the show. Thank you, great to be here. Yeah, it's great to have you here. We've had you talking about the missteps of Excel, and how the Python data toolchain can make that better, over on Talk Python a few times. But this is your first time on Python Bytes, right? It is, yes. Yeah, exciting to have you here. Definitely, definitely. But maybe you want to go and kick us off.

00:36 We're talking about subclassing today. Hynek wrote an article called Subclassing in Python, and, you know, dealing with classes is just everywhere in Python. Even if you're not using classes yourself, Python itself has all sorts of classes and objects that you're using all the time, whether you know it or not. But when you start getting into the larger design, there is a question around composition versus inheritance and stuff. So I really like this article that he put together, because I think people should think about the ramifications more. The general gist is he prefers composition over inheritance, and I do too. But then he goes through what to do if you have to use inheritance, and sometimes you do in Python. For instance, the greatest example I know of is when you're building exception hierarchies. It's really easy to build up exception hierarchies in Python. There's just, like, nothing there except for the class definitions and their inheritance. And that's the easiest class you've ever created.

01:47 Exception name, pass.
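
A minimal sketch of the kind of exception hierarchy described above. The class names (AppError, ConfigError, MissingSettingError) and the load_setting helper are hypothetical and do not come from the episode or the article:

```python
class AppError(Exception):
    """Hypothetical base class for every error one application raises."""


class ConfigError(AppError):
    """Hypothetical error for configuration problems."""


class MissingSettingError(ConfigError):
    """Hypothetical error for a setting that was never provided."""


def load_setting(settings, key):
    # Translate the low-level KeyError into the application's own hierarchy.
    try:
        return settings[key]
    except KeyError:
        raise MissingSettingError(key) from None


try:
    load_setting({}, "database_url")
except AppError as exc:
    # Catching the base class also catches every subclass.
    caught = type(exc).__name__
```

Each class body is only a docstring (or pass), so the hierarchy itself carries all the information, and callers can choose how broadly to catch.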

01:49 Yeah. But it's useful to do that. But then if you want to go further, there are other design patterns and stuff, especially from the C and C++ world, where people might be thinking, well, I want to do something similar in Python. And so this is actually a really great article. The discussion is pretty long, and I don't want to summarize it too much, but I'll jump into the three types. He talks about three types of subclassing that often happen. The first is subclassing for code sharing. The short answer is that people are trying to follow the DRY principle and share code, and it ends up being a bad idea, essentially. There's a bunch of references for it. If you don't believe me, or him, read this article; he's got a whole bunch of linked articles that discuss it. But I kind of agree. The second type is inheritance as abstract data types, or interfaces; in a lot of languages they're called interfaces. And this is kind of a neat use of it. I thought, okay, yeah, you definitely will use inheritance and composition for data type stuff. But in his discussion, he talks about some of the cool things Python has that allow you to have these sorts of hierarchies without actually doing subclassing. So there are some cool features of Python, like the Protocol syntax that came in recently, typing.Protocol.

03:29 Protocols are like formal duck typing, which is an odd thing to combine. But yes.
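
A minimal sketch of the "formal duck typing" idea using typing.Protocol. The names Quacker, Duck, RobotDuck, and make_noise are hypothetical illustrations, not taken from the article:

```python
from typing import Protocol


class Quacker(Protocol):
    """Anything with a matching quack() method satisfies this protocol."""

    def quack(self) -> str: ...


class Duck:
    # Note: no inheritance from Quacker is needed.
    def quack(self) -> str:
        return "Quack!"


class RobotDuck:
    def quack(self) -> str:
        return "Beep-quack."


def make_noise(thing: Quacker) -> str:
    # A static type checker accepts any object whose shape matches Quacker.
    return thing.quack()


noises = [make_noise(Duck()), make_noise(RobotDuck())]
```

The check is structural: neither class subclasses Quacker, yet both are accepted by a type checker because they have the right method.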

03:33 Yeah. But it's kind of really cool how it's put together in Python. So, like that. And then lastly is specialization, and that's where exception hierarchies come in. But he's also got a great discussion about structuring data classes that have common elements, and I think that's an interesting discussion too. And I think I already said this, but it's really hard to summarize this article other than: it's good to think about your design, especially if you're going to try to bring subclassing into it. So,

04:06 Yeah, do that. Awesome. I haven't had a chance to dive into this article, but I do want to read it and explore it. You know, it touches on a couple of things, like namespaces and modules, which I think is pretty interesting. So many people come from C++, C#, Java, etc., all these really strongly OOP languages, especially C# and Java, where everything has to be a class. You'll see people creating classes just for things like static variables or static functions. If you just have a bunch of static functions, you know what works really well for that? A module that has functions in it, right? That's the same thing. You import the module and say module dot function name, which is the same as from module import Class, then Class dot static function name, right? It's just a layer that doesn't really need to be there. So the article touches on that, which I think is neat. Sometimes you just don't need those. And then there's also composition over inheritance, which I think is a really important thing to think about. So often people say, well, you can't use OOP because it's horrible in all these ways, and you end up with, like, a robotic duck that can't quack, right? You end up with these weird situations if you derive too many things and put a weird specialty on the end. Like, a duck is an animal, and it has wings. But wait, now it's a robot. Now, why does it eat water? You know, it's like, what happened to it, right? But composition allows you to keep things much more tight and small in the inheritance stack, but still put them together in meaningful ways. So anyway, yeah, I want to see more about this. This looks great.
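
A small sketch of the "class as a bag of static functions" point. GeometryUtils and circle_area are hypothetical names; in a real project the plain function would live in its own module file (say, a hypothetical geometry.py) and be called as geometry.circle_area(...):

```python
import math


# Style often carried over from Java or C#: a class used only as a namespace.
class GeometryUtils:
    @staticmethod
    def circle_area(radius: float) -> float:
        return math.pi * radius ** 2


# Plain-function alternative: the module itself is already the namespace,
# so the extra class layer adds nothing.
def circle_area(radius: float) -> float:
    return math.pi * radius ** 2


same_result = GeometryUtils.circle_area(2.0) == circle_area(2.0)
```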

05:30 And I'm coming from the standpoint of being a C++ person as well, and I've done both extremes. I've gone way down the inheritance hierarchy thing and had, like, seven deep in the hierarchy, maybe not seven, but like five deep, and it gets to be a nightmare. So then I went the other direction and didn't do any inheritance at all in a design. But there are problems there too. So yeah, thinking about it and doing it smartly is what you need to do. Yeah, so it's often like salt. It's really good when you have some, but you don't go, like, salt's great, I'm gonna have that for dinner. Like, no.
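
A minimal sketch of composition in the spirit of the robotic duck example. All class names here (Quack, Beep, Wings, Wheels, Duck) are hypothetical; the idea is to assemble behavior from small parts instead of stacking five or seven levels of inheritance:

```python
from dataclasses import dataclass


class Quack:
    def speak(self) -> str:
        return "Quack!"


class Beep:
    def speak(self) -> str:
        return "Beep!"


class Wings:
    def move(self) -> str:
        return "flying"


class Wheels:
    def move(self) -> str:
        return "rolling"


@dataclass
class Duck:
    """A duck built from parts rather than from a deep class hierarchy."""

    voice: object
    locomotion: object

    def describe(self) -> str:
        # Delegate to the composed parts.
        return f"{self.voice.speak()} ({self.locomotion.move()})"


real_duck = Duck(voice=Quack(), locomotion=Wings())
robot_duck = Duck(voice=Beep(), locomotion=Wheels())
descriptions = [real_duck.describe(), robot_duck.describe()]
```

Swapping a part changes the behavior without adding a new subclass, which is the trade-off the discussion is pointing at.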

06:09 I think the other thing that's really important about this is, depending on how long you've been working in Python, sometimes you kind of get stuck in a rut, and you're always doing the same thing. And the language has evolved and grown over time. So I think articles like this kind of force you to take a step back and see if you're using all the new features, or doing things in a way that maybe isn't idiomatic.

06:33 Also, a quick comment out there from the live stream. Paul says: first time watching the live stream, weird seeing everyone when they say the intro. Hey, Paul. Indeed, it is kind of weird. But I want to highlight this one to say hi to Paul, thanks for being here. But also, if you're listening and you're like, hey, I'd kind of like to see what's on the screen while you're all talking about this, you can follow us on YouTube. There's a live stream, and it's right on pythonbytes.fm, so it's easy to sign up. Also, Sam out there on the live stream, following up on this, says: currently maintaining a library with deep template and class hierarchies, it's very hard to keep track of it all. Yeah, I hear you on that. That's for sure. All right, let's switch over to the next one. Now, I tried to resist this, Brian, I promise I did. But I ended up with an extra, extra, with seven more extras here, and it just had to become a main item, because otherwise we'd be here for hours, which is not the idea of the show. Yeah, we've got an extra, extra, hear all about it, at nine extras. Let's pull them up. Extra number one. We've talked about Pyodide. I had an old Talk Python episode on Pyodide, which is an interesting thing. It is this project by Mozilla where you take Python and run it in the browser. And then you take many of the data science packages, like NumPy and matplotlib and stuff, and compile them to run in the browser. And then you basically have client-side Python data science, which is really interesting. This project is being spun out as its own project and is no longer under Mozilla. Usually that doesn't sound good to me; it kind of sounds like it's been orphaned. I have no idea what the status of Pyodide is; people can check that out. But it's no longer under Mozilla. It's its own separate thing. As they say, now it's cruising out there. And also, it didn't get compiled to JavaScript, it got compiled to WebAssembly, which is interesting, because that's faster, right?
That's number one. Number two: just a couple of hours ago, I released a brand new course, Python-powered chat apps with Twilio and SendGrid. So the idea is, if you want to have some kind of chatbot, but a lot of that conversation has to involve your database and your data and verification and things like that. The app that we build here is for a tech-savvy bakery, where you can order cakes by sending in a WhatsApp message. And then it'll say, hey, you want a cake? Well, here's the menu, and it actually gets the menu from our Flask app. And then they pick something off the menu. And once they pick all the details, they say, I'd like this. Okay, great, we send it back to our website and figure out how much that's going to cost. The order goes back, and once they've accepted, they get a customized, pretty email. It goes back to the back end, the bakers bake it, and it sends them another message to let them know. So if you want to build that kind of workflow with Twilio and SendGrid, check it out. This course is super fun. It's six hours, and it's 100% free. So people can check that out if you're trying to build that kind of thing. That'll be a lot of fun. Links in the show notes there. Yes.

09:22 If I can't afford free, can I get a discount code?

09:25 I will give anyone listening 50% off that. So I have this really cool tweet, and Twitter's broken, from what I can tell, for everything that's not the homepage. So something went wrong, but let me describe it, so when you look at it in the show notes, you'll be able to see it. There's a really cool tweet from Nick Ma, who was the guest on the show last week. Oh, you got it. How did you get this to work? You've got some sort of magic. All right, well, thanks for putting it on your screen. So here we have Will McGugan showing an animation of, basically, a really cool collapsible sidebar and scrolling within sub-windows inside of Textual, right? We talked about Textual as well. And it's just such a cool graphic that says, like, wow, you can build some pretty amazing applications there. What do you think?

10:14 Will's just hitting it out of the park with this. It's fun to watch him go so fast. Absolutely.

10:19 Well done there, Will. I'll switch back to mine for a moment. Okay. Our second, that works on my computer. So remember, we did an episode and I titled it something like FLoC No, or something. So FLoC, Federated Learning of Cohorts, is something that Google was trying to do so that they can replace third-party cookies. Why? Because people are running ad blockers or, like I am right now, a VPN that at the network level blocks all the ad tracking and third-party cookies. So they're just basically not working very well anymore. So they're going to cancel third-party cookies in Chrome, which means they're canceling them for the internet. But because they're Google, and they're based on ad revenue, primarily, they can't just go, we cancelled tracking, hooray, we're all winning on privacy, right? It has to be replaced with some other form of tracking, which they call Federated Learning of Cohorts. But Federated Learning of Cohorts has all these almost more negative consequences. And I don't want to go too much into that, because we went into quite a lot of detail. But for example, you can say, I would like to target, you know, lesbians who just got divorced. You run an ad on that, people show up on your site, they sign up, you have an email. And now guess what, not only do you know what their email is, you know that they're in this group. And maybe this is the very first time you've ever met them, right? So really weird, creepy stuff that you could pull out with this. Anyway, the big news is, Google delays the rollout until 2023. Because you know what? People don't like it. They're not super keen about it. So there's a whole bunch of people who are against that. Oh, you're saying they're just delaying, not stopping it? Yes.
For now. Like, this is a great Ars Technica article that people should check out. Let me read the first sentence or two: Google's plan to upend web advertising and user tracking by dropping third-party cookie support in Chrome has been delayed. Most browsers block third-party tracking cookies now, as do VPNs, like I mentioned, but Google, the world's largest advertising company, wasn't going to follow suit without protecting its business model first. But there are a lot of challenges with this. A lot of people have come out against it. And yeah, it's not gonna work out super well. So they decided to delay it. That's what they said. Stage two starts mid-2023.

12:39 Google says it's received substantial feedback

12:43 from us. And other companies out there are like, we kind of want to keep tracking too, but we're not really excited about this, so we're just gonna not say anything. Like Apple, Opera, Mozilla, Microsoft. Right? What do they say about this, anyway? Yeah, they've received substantial feedback. So hooray, I think, for now. One thing that we don't talk about very often in Python is: what if you want to ship your code to somebody, and it has sensitive algorithms in it, right? It's not that common, but you could use py2exe or py2app to bundle up your code and give it to somebody. For example, Dropbox does something to this effect, right? You've got your Python code running up in your little menu bar. And there are other ones as well. But you might want to encrypt how that works, or protect how that works, so people can't just open the .py files and look around. So there's this thing called SOURCEdefender. To be clear, this is a paid commercial product, and I have no affiliation. But they sent me a message: hey, we're doing this thing, what do you think? It looks kind of interesting. I think it's gonna be a pretty limited set of people who actually care about this. Like, if you're running on Docker, or you're running on a server, you probably don't care. Maybe you do, but probably not. But if you'd like to be able to encrypt your source code, so it's much harder to see, and then ship that to somebody, you can use this thing as part of their paid service. That's kind of cool. You can check that out. But see, oh, it doesn't play noise. I want that. So I was recently interviewed on A Day in the Life of a Work-From-Home Pythonista, which is a cool series being done by the folks in the Philippines. If you want a tour of the behind-the-scenes studio and all the work-from-home stuff, people can check that out. Python 3.9.6 was just released; we can check out the changelog and see what's happening there.
There's a security fix in the HTTP client for what I think is, like, a denial of service; it sounds like it avoids an infinite loop sort of thing. So that might matter to people. Probably not, but maybe it does. Then there's a bunch of changes happening here, including platform-specific ones. So if you're running Python 3.9, and why wouldn't you be, update that.

14:42 Because you're on 3.10?

14:43 Yes, that's right. You're already ahead in the world. You're in the future. So also, we had Calvin on from Six Feet Up a while ago, and we talked about the conference that he was putting together. Well, the videos from that conference are out as a YouTube playlist, so people can check that out. I don't remember how many videos there are. Let's click on it and see: there are 61 videos, including the Python memory deep dive talk that I gave. So if people want to check it out, they can. Let's see, oh, this one. Check this out, Brian, have you seen this? Did you know you can pip install Python Bytes? Yeah. You can literally pip install Python Bytes, because Scott Stoltzman created this for us as a joke. He was listening to one of our episodes, I can't remember what we talked about; this was Episode 239. But we must have talked about packaging and pip and things like that. So he created a package called Python Bytes. And what it does is, basically, you give it a number, like 240, and it would download this episode as an MP3 file and put it right next to, you know, whatever the working directory is. If you want to pip install Python Bytes and then Python Bytes dot download episode, instead of using a podcast player, we're all for that. Check that out. Yeah, that's it. That's extra, extra, extra, extra... well, many, many extras.

15:58 Yeah, so the Python Bytes package was just sort of for fun. But it also is really small. And one of the things I like about it is it's just a really cool example of how, with Python, you can get something that downloads MP3 files off of a feed somewhere. It's that easy. That's pretty cool.

16:17 Yeah, that's fantastic. Absolutely. All right, let's see a couple of things from the live stream. Sam says: things have happened with Mozilla in the last two years that really shook my confidence in them. I am still a big fan of Firefox, and I support their mission. But yeah, I want to see them succeed. Let's see, another one from the live stream. Antonio said: hey, guys, have you mentioned Kivy before? GUIs and Kivy, there you go. I watched a video about it this week. It's a GUI framework that's compatible with many things, including mobile devices. My feeling is that Kivy is more about building almost game-like interactions, whereas with a lot of GUIs, people want, like, here's a text box, I type in the text box, here's a button I drop in, you know. But yeah, pretty cool. Let's see, giveaway does, as an aside: shipping a Docker image won't obfuscate the Python; the image can be taken apart and files read like that. That's true. They absolutely can. I was just thinking, like, you'd probably just run it on like a Danish service. But yeah, if you're shipping it to someone, it's the same. Nick Harvey on the live stream says: could just send the .pyc files with no .py. It's not foolproof, but it does require more work. You're right. Basically you'd be down to, like, disassembling it and reading the bytecode. Yeah, for sure. Let's see what a one ran says: if it ends up running code on your machine, you can read it. It's about putting up barriers so that people won't bother. Yeah, that's definitely true. I mean, you think of C++ and things like that being completely opaque, and yet people take that apart all the time. But there is also a difference from, I'm literally shipping you the source files here. Yeah, because then you could go and, like, oh, here's where the license check is. Let's just, you know, Command-slash, comment that out. All right, now we're ready to run. Yeah. Right.
You want to make it a little bit of a challenge, at least, I suspect. Anyway, thanks for all the feedback out there, everyone. That's everything, extra, extra, nine times. Right, Chris, what's your first one here?

18:08 All right. So the first one is from Andreas Kanz, I think is how you pronounce it. And it's a library called klib. I wasn't sure if it's K-lib or klib, but I think it's klib. And it's for automated cleaning of pandas DataFrames. I guess I should say it's a little bit more than just cleaning; it's automated analysis. And, you know, I'll be the first to say I'm a little skeptical about some things that try to automate the process, but I was playing around with it, and there are some pretty cool things that it does. As for the documentation, probably the best way to learn about it is the Towards Data Science article that he wrote, which gives a pretty nice overview of what it does. It has, as I mentioned, some pretty nice cleaning features, as well as analysis features. So I was going to describe a couple of things. The first one that I thought was really interesting is this function called data cleaning, and you can control what it does. It can clean the column names, it can convert data types, it can drop missing values. So one of the things about pandas is that it's not really aggressive about the size of the data types that it uses. When you read in data, it will just kind of assign it to, maybe, a float or, you know, an object. But if you want, you can get in there, and if it's a column that, let's say, only has values less than 100, and you convert it to a small integer, it saves memory. If you save enough memory, then you can actually speed up your code. And so this goes behind the scenes and takes your data frame and converts it, essentially, to the smallest NumPy type that can store the values. And then, you know, I took a random data set, and sure enough, it did reduce the memory footprint quite a bit, which I thought was pretty interesting, because it's one of those things that is very tedious to do on your own by hand.

20:19 Like, if you have the same string, it could just create a pointer to one copy, instead of having that many times, stuff like that?

20:24 It can do that by converting to a category type. That's essentially what pandas is doing when you create a category: it does that kind of string-to-list conversion. And it's, you know, pretty effective. And yeah, I've used the category piece before, but I hadn't actually gone in and tried to, you know, shorten up the numeric columns, which is really useful. The other thing, convert them to all the

20:51 integers, and then it'll just be shorter. So you don't have to worry about the size. I'm just using. Yeah.

20:57 No, but I mean, it can even do, like, int16s or int32s. Oh, yeah, interesting.

21:04 Sized so that, if these are all under 256, it'll go to, like, one byte.
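
A standard-library sketch of the "pick the smallest integer type that fits" idea from this exchange. It is not klib's implementation (the speakers had not read that code either); smallest_typecode and the sample ages list are hypothetical. pandas offers a comparable option through pandas.to_numeric with its downcast parameter.

```python
from array import array

# Candidate array typecodes, smallest first, with the value range each holds.
CANDIDATES = [
    ("B", 0, 255),                 # unsigned 8-bit, 1 byte per value
    ("h", -32_768, 32_767),        # signed 16-bit, 2 bytes per value
    ("q", -2**63, 2**63 - 1),      # signed 64-bit, 8 bytes per value
]


def smallest_typecode(values):
    """Return the first typecode whose range covers every value."""
    low, high = min(values), max(values)
    for code, type_min, type_max in CANDIDATES:
        if type_min <= low and high <= type_max:
            return code
    raise OverflowError("values do not fit in a 64-bit integer")


ages = [34, 51, 27, 99, 8]                      # every value is under 256
wide = array("q", ages)                         # default-style 64-bit storage
narrow = array(smallest_typecode(ages), ages)   # downcast storage

bytes_before = wide.itemsize * len(wide)
bytes_after = narrow.itemsize * len(narrow)
```

With five values the saving is tiny, but the same 8-to-1 ratio applies to a column with millions of rows.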

21:10 Exactly, exactly. You know, I haven't looked at the code to see how it actually figures it out. But I had a fairly large data frame, and it was pretty quick. The other one that was interesting is the clean column names. So I think there are some other libraries out there that will, like, strip spaces or special characters from column names. But what this one will actually do is, if you have a column name that is, let's say, camelCase, it'll convert it to all underscores, or it will just essentially normalize all of your column names. Which, you know, you could have a debate about whether you want to do that. But when you have a data frame that has a lot of columns, and you're just looking at it for the first time, that can really be helpful. And then the other function that works pretty well is for cleaning duplicate data or empty data. So if you have a lot of columns that have no values in them, or, you know, maybe 90% of the values are empty, you can set thresholds and just clean that all out. So I was playing around with it, and I was pretty impressed. And I kind of wanted to call it out, because the documentation right now is mostly around the Jupyter notebooks that he has. So I think, you know, it would be nice if we could get some more docs in there and some more examples. But overall, I was really impressed with the library. And I think people should take a look at it and see if it's something they want to use for some of their own processes.
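
A rough standard-library sketch of what normalizing column names can look like. This is an assumption about the general technique, not klib's actual rules; normalize_column_name and the sample column names are hypothetical:

```python
import re


def normalize_column_name(name: str) -> str:
    """Convert a column name to lowercase snake_case."""
    name = name.strip()
    # Insert an underscore at each lowercase-or-digit to uppercase boundary,
    # so camelCase becomes camel_Case.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


columns = ["customerName", " Order Total ($)", "ZIP-code"]
normalized = [normalize_column_name(column) for column in columns]
```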

22:47 Yeah, some of them sound interesting even if you don't have to trust it, right? Like shrinking to the smallest data type, for example, or normalizing column names; those don't seem as risky as, you know, clean it up.

23:00 Exactly, the wrong data. Exactly. And then I forgot to mention, it also has some nice correlation plots. Some of these things you can already do with seaborn or matplotlib, but I found that it gives you a little more control, and it's just a little bit easier to do. There are certainly other tools out there that do this as well. Oh, and then the categorical data plots, I thought, were a nice summary of the data; they give you some nice graphs and help you understand where you've got some missing values. But

23:36 visualizing the messy data is a really interesting feature.

23:39 Yeah. And there is another pandas library called missingno that does this and does it well, but I think this is a unique combination, especially some of the memory-saving features it has, which are pretty neat. So the cleaning features, though, have a lot of parameters, so it looks like you have a lot of control. And again, this is open source, so it isn't that magical; you can just look at the source and see what it's doing. Exactly, yeah. And that was one of the things I was looking at: data cleaning, I think, is kind of the top level, and you can just run that wide open and it'll do everything, and then it actually prints out a pretty nice summary of what it did. But you can also go in there and specify parameters, like you said, to control it, so that maybe it doesn't rename the columns or drop some of the missing data. The other thing that I tried to play with that seemed really interesting is this pool duplicate subsets. Essentially what it tries to do, and I had a little bit of trouble with this, because I think I threw too much data at it, is, if you have 10 columns of data, it says, well, you know what, four or five of them are very heavily correlated, so we're gonna drop them and just give you the four or five that are actually most useful. And so I think those are some interesting tools to use when you get some data that maybe you haven't worked with before.

23:39 Yeah, yeah, very nice. What a good find. Brian, you got the next one?

23:39 Sure. Yeah, just a second. So I wanted to remind people to look at functools every once in a while. I've experienced that functools is kind of an interesting library, a library that's built in, in that it kind of grows with you. So if you're new to Python and you look at it, it's gonna be confusing. It's, like, all intermediate stuff in there. But as you learn and experience more Python programming, come back to it every once in a while, because there's stuff in there that you'll use that you didn't think about before. So I'm going to go through a few things. And actually, I wanted to call out that there was an article by Martin Heinz that I read that kind of reminded me to go through and look at this. So I want to shout out to him. Thanks. We've talked about some of this stuff before. We talked about function overloading, and using singledispatch is one of the ways you can do function overloading in Python, which is cool. And that's part of functools. And hopefully people are familiar with wraps. wraps is a way to create decorators that act like the thing that you decorated. So if you're writing decorators, make sure you check out wraps. And then caching as well. I'm sure we've talked about lru_cache. I'm sure we have. Yeah, yeah, so that's in functools, the caching. And new in 3.9, there's just a simple cache; you don't have to say lru_cache. It's just a convenience wrapper around lru_cache, but there's no max size, so you don't want to use that for things where you actually want to throw items away. But caching is super cool. Check that out.
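
A short sketch of the first two functools items mentioned here, singledispatch and wraps. The functions describe, shout, and greet are hypothetical examples:

```python
from functools import singledispatch, wraps


@singledispatch
def describe(value):
    # Fallback used when no more specific type is registered.
    return f"object: {value!r}"


@describe.register
def _(value: int):
    return f"integer: {value}"


@describe.register
def _(value: list):
    return f"list of {len(value)} items"


def shout(func):
    """Decorator that upper-cases the result of func."""

    @wraps(func)  # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout
def greet(name):
    """Return a greeting."""
    return f"hello, {name}"
```

Without @wraps, greet.__name__ would report "wrapper" and the docstring would be lost, which is the "act like the thing you decorated" point.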

23:39 And when I first saw the LRU cache, I'm like, whoa, I've got to go figure out what this LRU is, rather than it being, like, just cache the response. I guess the other question you might have is, well, what if you pass two variables, or two arguments, or different sets of arguments? How are those handled? So either way, it's kind of not 100% totally obvious what's going to happen. Yeah, it's very cool.
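
A small sketch of how the functools caches treat different arguments, which is the question raised above. The function names slow_add and square are hypothetical; functools.cache requires Python 3.9 or later:

```python
from functools import cache, lru_cache

calls = []


@cache  # unbounded: results are never evicted
def slow_add(a, b):
    calls.append((a, b))  # record each real computation
    return a + b


slow_add(1, 2)
slow_add(1, 2)  # same arguments: served from the cache
slow_add(2, 1)  # different arguments: computed and cached separately


@lru_cache(maxsize=2)  # bounded: the least recently used entry is evicted
def square(n):
    return n * n


square(1)
square(2)
square(3)  # the cache holds only two entries, so the entry for 1 is evicted
info = square.cache_info()
```

Each distinct combination of arguments is its own cache entry, and "LRU" only matters once maxsize forces something out.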

23:39 Yeah, so there's a bunch of caching stuff in there, like lru_cache, but you can also cache a property. And actually, the property one I hadn't used before, but I was playing with it this morning, and it's really cool. So, for instance, if you've got a data class or any class that has a bunch of stuff, and you have an expensive read on one of those things because you have to calculate the value, you can throw a cached_property on it. It looks pretty cool. One of the neat things about it is that you only read it once, and then it caches the value of the property. And if you need it to refresh, you call delete on it, which is kind of weird, but kind of cool also. It's odd to call delete on something that you want to still be there, but it'll just reread it next time. So that's how that works.

23:39 That is weird.
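
A minimal sketch of cached_property and the "delete to refresh" behavior just described. The Report class and its fields are hypothetical:

```python
from functools import cached_property


class Report:
    def __init__(self, rows):
        self.rows = rows
        self.computations = 0  # counts how often total is really computed

    @cached_property
    def total(self):
        self.computations += 1
        return sum(self.rows)


report = Report([1, 2, 3])
first = report.total     # computed on first access
second = report.total    # served from the cache

report.rows.append(4)
stale = report.total     # still the cached value from before the append

del report.total         # drop the cached value
fresh = report.total     # recomputed on the next access
```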

23:39 Total ordering, I didn't realize that was there. So if you have some data type that you want to be able to compare, you can use total_ordering: define equals and one other operator, and then all of the comparison operators show up. You can use that. And then the last one I wanted to highlight is partial and partialmethod. These are kind of neat in that, let's say you've got a function that takes a whole bunch of arguments, but you want to pre-fill some of those and create a new function that has some of the arguments pre-filled in. You can do that with this. Pretty neat.
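
A brief sketch of total_ordering: only __eq__ and __lt__ are written, and the decorator supplies the remaining comparisons. The Version class is a hypothetical example:

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


older, newer = Version(3, 9), Version(3, 10)
# __gt__, __le__ and __ge__ were never written; total_ordering derives them.
checks = [older < newer, newer > older, older <= newer, newer >= older]
```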

23:39 Yeah. Okay, interesting. So you partially supply some of the arguments, but not all of them.
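
A short sketch of partial and partialmethod, pre-filling some arguments as described. The power function and the Cell class are illustrative names only:

```python
from functools import partial, partialmethod


def power(base, exponent):
    return base ** exponent


# New callables with the exponent argument already supplied.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)
results = [square(4), cube(2)]


class Cell:
    def __init__(self):
        self.alive = False

    def set_state(self, state):
        self.alive = state

    # Methods with the state argument pre-filled.
    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)


cell = Cell()
cell.set_alive()
```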

23:39 Yeah, so, um, yeah, just a shout-out to this. Yeah, these are intermediate or advanced topics, but they're there. So as you learn more Python, come back to this every once in a while, and you might just find it useful.

23:39 Yep. Indeed. I was like, how did I miss this cached property thing? Like, surely I would have paid attention to that, because so often there are these properties that are, like, computed things, but they, you know, often don't change. Like, you get something back from the database that has time stored in seconds, and you want to know how many days it is, or something like that. You might have a days property, right? But that's probably not going to change, so having that cached is cool, if you're sure it's not going to change. But I'm like, how did I miss it? It's new in 3.8. Yeah, it's not super old. Yeah.

23:39 And like Chris said, one of the reasons to revisit a lot of these things and pay attention to the news of Python is because the language changes like this. So yeah.

23:39 Sure game out there in the live stream says it's also worth looking at itertools from time to time. Definitely. It's in the same level of complexity. And for collections, it's kind of like that: you wouldn't first go there, but eventually you're like, oh yeah, this is what I wanted, I just didn't know it. Speaking of things you didn't know, let me scare you a little, or make you all delighted. I don't know, you tell me how you take to this. So let me set the stage. GitHub has a little bit of source code, much of it actually public, right? Like, it's public repos and whatnot. So it can be analyzed and talked about and shared, or used to train an artificial intelligence, which is pretty crazy. And if you look at the artificial intelligence around text, there's the GPT-3 stuff, which is, like, scary good text-based AI. Well, they decided: what if, you know, our parent company also makes this editor, what if we did an AI based on understanding the source code from GitHub, like all the source code from GitHub, and put it into VS Code? And then it did stuff. Have you all seen this? It's called GitHub Copilot. Yeah. Yeah,

23:39 I haven't done it. I was gonna put the link in there, and you beat me to it.

23:39 Oh, yeah, I was on top of it. So if you go over here, it actually works for TypeScript, Go, Ruby, Python, and a couple other languages. It says it works for many languages, but it's best on those, of course. But if you just look at their homepage, they've got this little animation, and it says, I'm going to write a function called parse_expenses, and it takes some kind of text, and you put a docstring, literally a docstring in Python, that says: parse the list of expenses and return the list of tuples (date, value, currency), ignore lines starting with hash, parse the date using datetime, here are some examples. Tab. And then it writes the code that does that. And let's see, what is it going to do? It's in the middle of the animation. It creates a list of expenses, it goes through each line of the split text, and it says if the line starts with hash (this is all Python code), continue on your loop. Otherwise, date, value, currency equals the line split, and then it knows how to parse the date, convert the value to a float, and then store the currency as a string. And it's not just that. You can actually get alternate implementations by tabbing through its recommended solutions, which is pretty crazy. This is powered by OpenAI's, it's called Codex, or something like that. I don't see it right here right now. Anyway, I'll probably run across it in a second. That's what it's powered by. And it says things like, you're the pilot. So with GitHub Copilot, you're always in charge: you can cycle through alternative suggestions, choose which to accept or reject, and then manually edit the suggested code. Oh, yeah, and it learns from you. So I don't know, this is wild, y'all. This is pretty wild stuff here. What do you think?
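For readers following along without the animation, here is a rough reconstruction of the function as it is described on air. It is not the actual Copilot output: the space-separated input layout, the `%Y-%m-%d` date format, and the blank-line handling are assumptions made for this sketch.

```python
from datetime import datetime


def parse_expenses(expenses_string):
    """Parse the list of expenses and return a list of (date, value, currency) tuples.

    Lines starting with '#' are ignored. The date is parsed using datetime.
    """
    expenses = []
    for line in expenses_string.splitlines():
        line = line.strip()
        # Skip comments; skipping blank lines is an addition for robustness.
        if not line or line.startswith("#"):
            continue
        date, value, currency = line.split()
        expenses.append(
            (datetime.strptime(date, "%Y-%m-%d"), float(value), currency)
        )
    return expenses


sample = "# July expenses\n2021-07-01 12.50 USD\n2021-07-02 -3.00 EUR"
for expense in parse_expenses(sample):
    print(expense)
```

The value is converted with `float` here because that is what the episode describes; whether that is a good choice for money comes up later in the conversation.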

23:39 I think it's really impressive. I mean, it will be interesting to see what it's like when you use it in real life. And I think that there could certainly be limitations. But I don't know about you, but whenever I'm programming, there are always these things where I just need to go and look at the documentation or look at Stack Overflow to write, like,

23:39 I gotta connect to SQLAlchemy, and I totally forgot how to do those three steps for that connection string sequence, right?

23:39 Exactly. Yeah. And I saw on Twitter where someone was throwing a little shade at that example you were walking through, because they said, well, why are you storing the currency as a float? It should be a Decimal, because if you store currency as a float, you're going to have all the rounding issues. So, well, that's how

23:39 Superman makes all this money, or the evil villain in Superman. Was it one of those? Yes,

23:39 yeah. Richard Pryor in one of the original Superman movies. Yeah.

23:39 Yeah. And it's not just based on the docstring. The example I first spoke about was, you write a complex docstring and then say, do that thing. But you can do it based just on a function name; you can just type a meaningful function name. What was the example they used? I don't remember. But yeah, so you basically just write a docstring, a comment, a function name, or even some code to give it more context, and then off it goes. So yeah, Codex, that's the name of the AI system behind it. So basically, this is a plugin for VS Code, but a really nice one. So here are some examples we'll all be familiar with, like fetch tweets. The example here is you literally write def fetch_tweets_from_user, tab. And what it autocompletes with is, oh, you're going to need to pass the username in, and then here's how you authorize with Tweepy, set up the API credentials, and then here's the code you write, and here's your return. Or, I want to do a scatter plot, and you write import matplotlib.pyplot as plt, draw scatter plot, tab, then boom, there it is. For memoization, I wanted to point this one out because of what you were covering, Brian. It says, oh, here's how you memoize a function, which is: if it's passed a set of arguments, it's always going to return the same value, so just remember, these arguments equal this return value, once it's run. And it shows how to create a complex decorator that is going to have a function that remembers the values using caching. It could just go @functools.cache, you know? So there are things like that that it's missing, right? Because you could achieve the exact same outcome with the functools cache decorators, instead of trying to write a bunch of code that reimplements that. But anyway, pretty wild thing. I don't know really how to feel about it. I've been thinking about this today. It's kind of freaking me out, but it's also kind of cool.
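To make the memoization point concrete, here is a small sketch contrasting a hand-written caching decorator (in the spirit of what is described, not the actual Copilot suggestion) with the standard library decorator. The `memoize`, `slow_square`, and `fib` names are illustrative; `functools.cache` requires Python 3.9 or later.

```python
import functools


def memoize(func):
    """Hand-rolled memoization: remember results keyed by positional arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)
        return results[args]

    return wrapper


@memoize
def slow_square(n):
    return n * n


# The standard library version. On Python older than 3.9,
# functools.lru_cache(maxsize=None) behaves the same way.
@functools.cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(slow_square(12))  # 144
print(fib(50))  # 12586269025
```

Both approaches assume the function is pure and its arguments are hashable; the standard library version also handles keyword arguments and exposes cache statistics.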
Yeah, I

23:39 wanted to point out a comment that people have been making in relation to this, which is the, you know, "I wish we could just specify what we wanted the computer to do, and it just does it." And we already have that. It's called code. So

23:39 yeah, I don't think this, you know, people often say things like, I remember hearing this 20 years ago: oh, this low-code thing, where you create these little boxes that do stuff and you drag and drop between them, we're not going to need programmers anymore, we're just going to become drag-and-droppers. And then, like, you programmers won't be needed, the business people will just drag and drop their way to the future. And that never, ever happened, right? Because people have got to put them in production, they've got to debug them, they've got to scale them, and so on. Yeah, I think it's the same thing here. Like, sure, this wrote it once. But you can't have a write-only experience for your code. You have to understand your code and be able to evolve your code and work with it. This might power you into a solution faster, but I don't think it escapes the need for people doing meaningful software work,

23:39 the person that pointed out, and several people pointed it out, the example of using floats for money. That does highlight one of the problems with something like this, though, that everybody needs to be careful of: the code that's generated. You were already carefully thinking about it when you were creating it. But if something else creates it, you've got to scrutinize it to make sure it's really doing the right thing. And so you're code reviewing some AI code while you're coding your own stuff. It's just a different part of your brain, and you've got to make sure that you're really paying attention.
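The floats-for-money concern raised above can be seen in a few lines. This is a generic illustration using the standard library `decimal` module, not code from the Copilot demo.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# Decimal built from strings keeps exact decimal values.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True
```

Constructing `Decimal` from a string matters here; building it from a float would carry the float's representation error along.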

23:39 Yeah, yeah. And I was even looking at that matplotlib example, and I would argue that's not really the way you should do a scatter plot in matplotlib, because you should use the object-oriented interface in matplotlib. I mean, the code will work, but I wouldn't advocate that you use that code. And so to your point, I think it will be interesting to see if it does learn from your own coding style. So does it start to recognize those things that you're always doing, you know, like you said, connecting to a database or fetching a file or doing a certain pandas function? Will it start to learn that, and then I thought I read something

23:39 about it adapting to you and it learning from what you're doing, but I have no idea what that actually means.

23:39 Yeah, hopefully it's paying attention. So if it generates something, and you change it to a different method, and everybody else is doing that also, maybe they'll stop suggesting the old one and start suggesting them

23:39 all. You know, Chris, your point, or maybe it was you, Brian, sorry, whoever said it, about how you've got to criticize this when you didn't write it: you basically have to study it, and then understand it well enough to make sure it's doing the right thing. You know, a couple years ago, I don't know, a while ago, I was river floating and broke my hand on some rocks, broke my finger in a bunch of places, and my fingers were completely wrapped up all the way to the very tips. There was no, like, oh, a little pecking and typing while my hand healed. It was like, nope, one-handed, really slow. So to keep things going, I used voice-to-text to try to at least keep email flowing for a month or something, you know. And what I found was, I could write pretty decent emails. It's hard to stop and think in whole sentences, the way those little tools like to work, but you can get it to work pretty well. But the mistakes it makes that are phonetically correct but not what you mean, like "they're" and "their", or something that sounds like what you said but isn't what you meant to say, are incredibly hard to catch. It's much harder to understand and edit than you would think. And so with things like this, like, well, I wanted it to do that, and I hit tab, okay, let's do it, I feel like there are going to be a lot of blind spots. Yeah, well, it did what it says, didn't it? I typed the thing and it seems right. And how do you really, really know? It just seems like the same type of situation: it's going to be harder than normal code to check, because you didn't have to think through it to create it. You know, a couple of comments from the live stream. Rayhaan: don't give them ideas. Dr. Falken, Josh, where are you? And "let's play thermonuclear war" as a docstring.
Hey, Nick says, I can't help but think of Microsoft Tay, which was this really cool bot from Microsoft that was super good at adapting to stuff, and they put it on Twitter, but people decided to be mean to it instead of teaching it. I think on, like, Japanese Twitter, it became a very kind and intelligent bot, but on English Twitter, it got turned into a racist, horrible creature. Yeah, like right away, and they actually had to cancel the project. So yeah. And then Arthur says: next April Fool's Day prank, everyone start writing terrible code that influences the AI. And this is why English Tay went down the tubes. Let's see. And then Sam: for goodness sake, don't train it on GitHub code. Arbitrarily turn on debug mode for apps. Kim thinks this is both very impressive and vaguely unsettling, and that captures what I was thinking. Reagan: will it go and talk to the marketing people for me?

23:39 I'm good with people. That's what I do. Yeah. Okay.

23:39 Another thing that's not mentioned here explicitly, but I think is interesting, is that this code is coming from GitHub. Yeah. Say I'm working on a super secret commercial project for a large organization that has lots of people trying to scrutinize it, and I hit memoize, tab. It's going to write some amazing code. Oh, by the way, was that GPL? Where did that come from? Right? Like, what's the license of the code that was on GitHub, where I just now, all of a sudden, grabbed something that turned, you know, like, if I was doing this on Windows, and I hit tab, is Windows now open source?

23:39 I don't know. That's a really interesting point. And you would think if it was a small startup, someone will probably sue them. But now this is Microsoft now. Yeah, so Exactly. Yeah,

23:39 yeah. Anyway, I agree with Kim, this is both very impressive, and if this is the start, like, where will it go? It'll be very amazing. But it's also vaguely unsettling at the same time. And I don't know how I feel about it, other than I wish it was in PyCharm, so I could play with it more often. Alright, Chris, you

23:39 got the last one. I do. So this is another library, called Kats, and it's a time series analysis library. And it's made by, well, it's from Facebook, and a lot of people may have heard of Prophet, which is a library for time series forecasting. And one of the things that's interesting to me about Prophet and Kats is, I think time series forecasting is something that's really common in the business world. I mean, you think about trying to forecast sales, or maybe inventory movements, or stock prices, a whole bunch of different use cases for it. And I think in general, most organizations don't have a group of PhDs that are really sophisticated in their analysis. So people use Excel and kind of come up with their own approaches. And that's why I thought Prophet was interesting. And I think this is interesting because it does come from Facebook, and you have to assume that they've got a lot of smart people that are doing a lot of forecasting. And they've taken some of the things that Prophet was good at and added some additional tools. So before I go too much into Kats, one thing I wanted to mention is I did write an article about Prophet. But this gentleman, Peter Cotton, wrote an article about Prophet essentially questioning how good it was. And this is a really long, really well thought out article, and some of the math and some of the concepts are way over my head. But I do encourage people, if you're looking at time series forecasting, take a look at this. But what Kats does is, instead of just doing forecasting with Prophet, it has a couple different models that you can use. You can also do some more basic time series analysis with it to detect seasonality patterns and change points and other trends. And if you want to incorporate this in some of your other machine learning algorithms, to pull out features from your time series data, you can do that with this library as well.
There's a whole bunch of other utilities to build, like, ensemble models and other approaches for time series forecasting. This is another one where it is relatively new, so there's not a whole lot of documentation, but there's a whole bunch of different Python notebooks, Jupyter notebooks, I mean. And one of the things I think is interesting is, from a forecasting perspective, you can use Prophet, but use the same API and use ARIMA, I think SARIMA, and Holt-Winters as well. Some other ensemble models, too. You can backtest, you can tune your hyperparameters. And then it's also got several of these other algorithms for change point detection. And you know, a lot of this, like I said, I'm not an expert on the math. But I am interested in how you figure out how to take these tools and apply them to those real-world business problems. And so I think it's really great when we have some of these libraries out there that are developed by really smart people that do understand the state of the art, so they can maybe make it a little simpler for others to apply to their own unique challenges. Yeah, this

23:39 looks really nice, to bundle these all together. What's a type of problem you might answer with this?

23:39 I think one example could be, help me figure out what my blog or my website traffic is going to look like six months from now. So I need to figure out, do I need to, you know, resize my servers or upgrade my disk space? Or what's my AWS bandwidth bill gonna

23:39 be?

23:39 Exactly. You know, the other one is that I think it's probably used a lot in inventory. So trying to figure out, okay, what do I think sales are going to look like? What do I need to reorder so that I actually have enough product, so that we don't stock out? Those are some pretty common use cases. A lot of the examples here are the airline flight data. So anything that you have that's over a period of time, typically kind of on a daily basis over multiple years, you can then start to forecast out what those future numbers would look like. Then you have this magic prediction power for the executives. Exactly. And I think what's interesting about most of these is, most times when people do prediction in Excel, it's kind of, put the numbers in there and do your linear line. But these tend to give you error bars, so you can give a range. So I think a prediction like this is much more valuable when you say it could be between, you know, 100 and 110, versus it's going to be 101.5. When you do that, it conveys a lot more precision than is really there.

23:39 Yeah, that makes a lot of sense. Coming from the livestream. And more length. When I was experimenting with time series data, I managed to get better results with a fairly simple naive ARIMA model than I did using Prophet.

23:39 And I think that's exactly what this article says. I don't know if he's read this article, but the article from Dr. Cotton, that's essentially what he says: some of the more simple models did outperform Prophet.

23:39 Yeah. Interesting. Oh, cool. All right. Brian, is that it for our items? It is. Got any extra stuff you want to throw out there?

23:39 Oh, I just had a quick one. Somebody on Twitter last week asked, why did I write a second edition of the book? And that's a reasonable question. So at pytestbook.com, you can go, and I've added a "Why a second edition" section. You can read that,

23:39 new built-in fixtures, new flags, package scope, features, f-strings, types, all sorts of good things are available that weren't available then, right?

23:39 Yep. Yeah, there are all sorts of reasons. So, always good to see pathlib there. I love pathlib. That makes my life so much easier when dealing with files.

23:39 I finally made the move. I put down os.path, and I'm now all about the pathlib, loving it. That's good. Yeah, Chris, anything else you want to throw out there?

23:39 I was gonna throw out one other. I was doing some research on working with units of measure, and there's a library called unyt, U-N-Y-T, that allows you to do things like convert kilometers to miles. But it works with NumPy, it works with all the scientific stack. I hadn't heard of that one, I thought it was kind of interesting, and I wanted to put that out there. Next time you need to actually do something with units and convert back and forth, you might want to consider that. And then that looks like

23:39 it is a lot of like physics and chemistry type things like the mass of the Earth, the radius of the earth as constants, probably pi and E and all those things.

23:39 Yes, exactly. And I think it's when you start getting into it, there's probably temptation, just to code it all yourself, just put those constants in there. But when it starts to get more complicated, I think something like this could be really useful. And then there's another approach called pint, which also works. And there we go, which also works with units, and it has a little bit different approach. And so I think it's good to take a look at both of them. And if you have a need, then you can decide which API is going to work best for your unique situation.

23:39 I haven't looked at unyt, but I love Pint. And I think the name is so good. Because when my wife asks me, like, how many ounces are in a pound, or how many pints are in a liter, or even in, I don't know, a quart, or vice versa, I'm like, I have no idea. These are such messed up volume measures. And so it's like, here's the thing: it takes the thing you don't really know about and allows you to convert it to the others in a safe way. It's good. Exactly. I have one more quick extra to throw out as well, for you, Chris. I forgot to mention that you are the author of the Move from Excel to Python with Pandas course over at Talk Python Training, which is a really popular course. Basically, it's an intro to pandas course disguised as solving problems you might solve with Excel, right?

23:39 Exactly. Yeah. Yeah, no, thanks. I've had a lot of good feedback from folks. So hopefully it's interesting to the listeners that haven't had a chance to check it out.

23:39 I might have to buy that for my book. Maybe you can get that discount code.

23:39 Yeah, the discount code. Alright, you ready for some jokes? My Twitter came back, so I can show the Twitter joke now. All right. So Dean, who is often on the live stream, but I don't see him today, sent a joke over and said: do you know how they say async in Italian? It's "async-io", or asyncio, which I thought was a pretty good one. I think you'll love Italian. Alright, you guys got another one out there? I saw one in the notes, another joke, but I want to put it there.

23:39 I've got one. So does anyone know why cryptocurrency engineers aren't allowed to vote? No, I don't know. Because they're minors.

23:39 That's a good dad joke. Yeah, it is. Absolutely. Well, on that high note, let's call the show. What do you say? Yep. Brian, thanks as always. Chris, thanks for joining us this time. Thank you very much. Really appreciate it. Yeah. Bye bye. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
