Transcript for Episode #191:
Live from the Manning Python Conference
00:00 Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 191, recorded July 14, 2020. I'm Michael Kennedy. And I'm Brian Okken. And welcome, special guest Ines. Hi! It's great to have you here. So I want to kick this off with a cool IoT thing. Now, IoT and Python have a pretty special place, because when I think about Python, I don't think of it as something that competes with assembly language and really, really low-level programming for small devices. But amazing people put together MicroPython, which is a reimplementation of Python that runs on little tiny devices, and we're talking like $5 microchip-type devices, right? Have you all played with these? No, I haven't, but I feel like I've been seeing a bit of this from my brother. He's pretty amazing. He's a bit younger than me, he's an event technician, and he recently taught himself programming and everything just so he can build stuff on these tiny Raspberry Pis and similar boards, and he's doing super advanced stuff. It's been really interesting to see him learn to program. And he's incredibly good; he has amazing instincts about programming, even though he's never done it before. So I've been kind of watching this from afar, and it made me really want to build stuff. Yeah, I've done CircuitPython on some of the Adafruit stuff. Exactly. So I always want to build these things. I'm like, what could I think of that I could build with these cool little devices? It's just, in my world, I don't have it. Maybe if I had a farm, I could automate, you know, watering or monitoring the crops, or if I had a factory. But I just don't live in a world that allows me to automate these things. Do you have pets?
Maybe you can build something for pets. We generally don't have pets, but we are fostering kittens for the summer. So, a little device onto one of the kittens, potentially.
01:54 GPS tracker? Yeah. So in general, you have to get these little devices, right? You've got boards like the Circuit Playground Express, which is that little circular thing. It's got ten LEDs and a bunch of buttons, and other really advanced things like motion sensors, temperature, and so on. Probably the earliest one of these that was a big hit was the BBC micro:bit. I think every seventh grader in the UK, some grade around that scale, got one of these, and it really made a difference in kids seeing themselves as programmers. And interestingly, especially women were more likely to see programming as something they might be interested in within the group that went through that experience. So I think there's real value to working with these little devices. But getting a hold of them can be a challenge, right? You've got to physically get this device, which means you have that idea of, I want to do this thing, and then I have to order it from Adafruit or somewhere else and then wait for it to come. My experience has been, I'll go there and I'm like, oh, this is really cool, I want one of these. Oh, wait, no, it's sold out right now, you can order it again in a month, right? So getting one is a challenge. And also, if you're working with a group, say you want to teach a high school class or a college class or something like that, and you want everyone to have access to these: the fact that it maybe costs $50 wasn't a big deal, but if it's $50 times a hundred kids, then all of a sudden, well, maybe not. So I want to talk about this thing called Device Simulator Express. This is a plugin, or extension, or whatever the things are called, I think it's "extensions" that VS Code calls them, that makes VS Code do more stuff. And it's an open source, free device simulator.
So what you can do is you just go to the Visual Studio Code extensions panel, and you type "device", probably, is sufficient, but "Device Simulator Express", and it'll let you install this extra thing inside of VS Code that is really quite legit. So it gives you a simulated Circuit Playground Express, a simulated BBC micro:bit, and, the most impressive to me, the CLUE from Adafruit, which actually has a screen that you can put graphics on. So it's a really cool way to get these little IoT devices, with CircuitPython, Adafruit's fork of MicroPython, on there. Did you guys see the picture? Look how cool that is. Yeah, you can really see: you can write Python in one tab and then just have the visualization in the other. That's pretty cool. Yeah, exactly. And it's very similar to, say, what you might do with Xcode and iPhones, where you have an emulator that looks quite a bit like the device, or what you would do on the Android equivalent. I actually think this is a little bit better than the device, because it's actually larger, right? The devices are really small, but here it could be a huge thing on your 4K monitor instead of a little tiny device. So you can simulate the Circuit Playground Express, the BBC micro:bit, and the CLUE in here, and you just say "new project", and it'll actually write the boilerplate code for the main.py or code.py
05:00 or whatever it's called that the various boards are going to run. And like you said, Ines, on one half it's got the code, and on the other half it has the device that you can interact with. I was thinking of a couple of cases where this would be great. Like you were saying, trying to get a hold of one, but also, you might not even know if the concept you have is really going to work for the device you're thinking of. So this would be a good way to try out whether the thing you want to build in your house, or whatever, would actually work for this device. The other thing was, yes, you brought up education, and that's big. I was thinking about a couple of conferences where they tried to do the display, and sometimes they try to have a camera or something. Sometimes it works, and sometimes it doesn't. This way, you could just do a tutorial, or a teaching scenario, and everybody could see it, because it's just going to be displayed on your monitor; your standard screen sharing would totally work here. And that's a good point as well. And it doesn't have to be all or nothing. What's actually really interesting is that this thing isn't just an emulator: you can do debugging. You could set a breakpoint and step through it running on the simulated device, or, if you had a real device plugged in, you can run it on there as well, and then do debugging and breakpoints and stuff on the actual device. I always admire people who actually use the proper debugging features. I know VS Code has so much of this, and I'm always like, I should use this more. But I'm like, okay, print.
06:26 Yeah, there are some really cool libraries that will actually do that. I can't remember what it's called, but Brian, I recently covered one that would actually print out a little bit of your code and the variables as they change over time. It was like the height of the print-debugging world; it was really, really cool. I wish I could remember. Do you remember, Brian? No. We actually covered a couple of them. I know, that's the problem, we cover thousands of things in here. So another thing that's interesting: okay, you see the device, some of them have buttons and they have lights, and you can imagine maybe you could press the button. But they also have things like temperature and barometer-type sensors, or motion sensing, or even detecting if you shake it. This thing has little ways to simulate all that stuff. So you can have a temperature slider that tells it, hey, the temperature is actually this, and your temperature sensor reading changes, and so on. So all the things the devices sense are available here. That's cool. Yeah. So I actually had the team over on Talk Python not long ago, so people can check that out over at talkpython.fm. And yeah, I'm also really excited about what you've got coming up next. Well, speaking of debugging, we didn't really talk about testing yet. Anyway, I'm really excited about talking about testing. Yeah. So I was thinking that I hardly ever use a debugger for my source code, but I use the debugger all the time when I'm debugging my tests. I don't know, there's just something different about it. But I've been running a lot of tests and debugging a lot of tests lately, because the pytest 6 release candidate is out. Now, by the time this episode airs, I don't know if the final release will be out, or just the release candidate still.
But you can install it, we'll have instructions in the show notes, but essentially you just have to say pip install pytest==6.0.0rc1 and you'll get it. So there's a whole bunch of stuff that I'm really excited about. There's a lot of configuration that you used to be able to put in lots of places: in your pytest.ini, or your setup.cfg, or tox.ini, or something. pytest 6 will support pyproject.toml now, so if you jumped on the TOML bandwagon, you can stick your pytest configuration in there, too. There are a lot of people excited about the type annotations. So 6.0 is going to support type annotations, which actually was a lot of work. There was a volunteer that went through and added type annotations to a bunch of it, especially the user-facing API. And why this is important is, if you're type checking, if you're running mypy or something over your project's source and everything, why not include your tests? But if pytest doesn't support types, that doesn't really help you much. So it will now, and that's a really cool addition. What this is, is basically the API of pytest itself is now annotated with types. Yes, and a lot of the internal code as well. So they actually went through and did a lot there; if you look at the conversation chain, it went on for months, a several-month project. Now, what does that mean for compatibility? Does that make pytest Python 3.6-and-above only? I think the modern versions of pytest already are, though I'm not sure about the exact version. Right, so that opened the door to this, because otherwise, I mean, it would be a pain; we'd be releasing a completely new version with Python 2 backwards compatibility, and that's, like,
09:52 yeah, well, you wouldn't do that, right? I mean, I think that's the message it sends; it's like, not great. I totally agree, totally. There is a pinned version
10:00 of pytest, I don't remember which one it is, that still supports 2.7 if you're on it, but no new features are going in there. The thing I'm really excited about is a little flag they've added called --no-header. Most people won't need this. When you run pytest, it prints out some stuff like the version of Python, the version of pytest, all the plugins you're using, a bunch of information about it. All this stuff is really important for logging: if you're capturing the output to save somewhere, to do a bug report or something, that information is great to help other people understand it. What I don't like about that is that it's not helpful if you're writing tutorials, or if you're writing code to put on a slide or something. All that extra stuff just takes up space, and it distracts. Yeah, I've had students say, I ran pytest in PyCharm, and it has some kind of output just stating where it is and what it's doing, and they're like, this didn't work for me. I'm like, well, that was just informational output from the tool; you're not actually supposed to try to run that part, you know what I mean? I mean, I see why they show it, but at the same time, for a tutorial those details don't matter. Yeah, so I'm excited about that. To trim it down, there was a plugin called pytest-tldr, too long, didn't read, but it didn't trim exactly the parts I wanted, so I had my own tool that would do this. But now I've got this flag, which is great. Also, with a lot of the configuration, there is a chance for human error if you type something wrong, if you type a variable name wrong. And so I really like this new flag called --strict-config, which will throw an error if the pytest section of your configuration has something it doesn't recognize, which probably just means you've misspelled some variable or something.
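Putting those pieces together, here's a sketch of what this could look like in pyproject.toml: pytest 6 reads the `[tool.pytest.ini_options]` table, and the two flags can be baked into addopts so you don't have to retype them. The specific addopts and testpaths values here are just illustrative choices, not something from the show:

```toml
# pyproject.toml -- pytest 6+ reads this table natively
[tool.pytest.ini_options]
minversion = "6.0"
# trim the header for tutorials/slides, and error on unknown config keys
addopts = "--no-header --strict-config"
testpaths = ["tests"]
```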
And then, I can't remember the version, but I think it was in pytest 5, they added some code highlighting stuff. Yeah, that's super cool. I discovered that just the other day: I somehow updated all my dependencies in some environment, and suddenly pytest output was colored, and I was like, whoa, this is amazing. Yeah, the syntax highlighting, I love it. But there are times when you don't want that, I guess. Sure. Yeah, so there's a new flag to turn it off. And then a little tiny detail that I really like: the diff comparisons in pytest are wonderful, but apparently they didn't do recursive comparisons of dataclasses and attrs classes. Now they do, so that's neat. There's a whole bunch of new features, there are fixes, and I ran through some of the features I really liked. There are also deprecations; it's a large list of breaking changes and deprecations, and that's why they went to a new major number, pytest 6. But I went through the whole list, and I didn't see anything that was like, oh, that's going to stop me, I'm going to have to change something. Okay, that's good to know. I mean, if you say there was nothing there that breaks what you're using, I feel confident that maybe there's nothing in my code either. And I knew that somebody was going to ask: is my pytest book still valid? Yes, it is. I'm going through it right now; I haven't gone through the whole thing yet to make sure. The thing that's not compatible is not the book, the book's fine. It's that I have a plugin that is now broken. So pytest-check still works, but, and this is a corner case, if you depend on the xfail feature of pytest-check, that doesn't work right now. So I have to fix that. So you would say xfail fails, temporarily.
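To make the recursive dataclass comparison concrete, here's a plain-stdlib sketch of the kind of nested structure pytest 6's assertion diff can now walk field by field. The classes are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Address:
    city: str
    zip_code: str

@dataclass
class Person:
    name: str
    address: Address

a = Person("Ada", Address("London", "N1"))
b = Person("Ada", Address("London", "N7"))

# Dataclasses compare field by field, recursively. These two objects
# differ only deep inside .address.zip_code -- exactly the spot a
# recursive diff can point at when an assert like this fails.
assert a != b
assert a == Person("Ada", Address("London", "N1"))
```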
13:32 Yeah, it actually marks everything as a pass if you mark it xfail. Well, that's like xfail-ception. Yeah.
13:41 It's really bad. Anyway, I'll have to get back to that. Yeah, this is really exciting that pytest 6 is coming, super cool. I know that there were some waves, some uncertainty, in the ecosystem, so it sounds like that got ironed out; things are going strong, new versions coming out. I even saw that Guido had retweeted the announcement and said, yay, type annotations coming to pytest. Of course, he's been all about type annotations these days. We'll come back to that later in the show, actually. Yeah. So Ines, I know you work a lot with text, but are you frustrated with it? What's the story with this item here?
14:17 I thought I'd present something from my space, obviously. Awesome. Yeah, there's this new framework that I came across, and it's called TextAttack. It's a framework for adversarial attacks and data augmentation for natural language processing. So what are adversarial attacks? You might have actually seen a lot of examples of them. For instance, an image classifier that predicts a cat or some other label even though you show it complete noise, and you somehow trick the model. Or you might have seen people at protests wearing funny shirts or masks to trick facial recognition technology, really to trick the model into not recognizing them. Or the famous example of Google Translate
15:00 suddenly hallucinating these crazy Bible texts if you just put in some complete gibberish, like just "ga ga ga", and it would go like, "the Lord has spoken to the people," stuff like that.
15:12 That's amazing. I'll include a link to an article by a researcher who explains why this happens and shows the example; it's really fascinating. But I think it all comes down to the fundamental problem of, how do you understand a model that you train? What does it even mean to understand your model? And how does it behave in situations when it suddenly gets to see something it doesn't expect at all, like gibberish? What does it do? And the thing with neural network models is you can't just look at the weights; they're not linear like that. You can't just look at what your model is, you have to actually run it. So the TextAttack library lets you actually try out different types of attacks from the academic literature, and different types of inputs that you can give a model, to see whether it produces something that you're not happy with, or that's really weird, and exposes some problems in your model. Because normally, what's the goal? The goal is, well, you do that, and then you find out, oh damn, if I suddenly feed it this complete nonsense, or if I feed it Spanish text, it goes completely in the wrong direction and suddenly predicts stuff that's not there. And if you deployed that model into a context where it's actually used, that would be pretty terrible, and there are much worse things that could happen. So you can also create more robust training data by replacing words with synonyms, or swapping out characters, and just seeing how the model does. So I thought that was very cool. And in general, I think adversarial attacks are a pretty interesting topic. Yeah, it's super interesting. So the idea is, basically, you've trained up a model on some text, and for what you've given it, it's probably working.
But if you give it something it wasn't expecting, you want to try that to make sure it doesn't go insane. Yeah, exactly. And it can expose very unexpected things, like the Bible texts, for example, which sound really bizarre when you first hear about it. But one explanation would be: this especially happens in low-resource languages, where we don't have much text, and especially not much text translated into other languages. But there's one type of text that has a lot of translations available, and that's the Bible. So there are parallel corpora, where you have one text with one line in English and one line in Somali, for example, and people train their models on that. But one thing that's also very specific about Bible text is that it uses some really unusual words that only ever occur in the Bible. So what the model might be learning is: if I come across a super unexpected word that's really, really rare, that must be Bible. And also, the objective is that you want your model to output a reasonable sentence. So the model is like, well, okay, if that's the rare word, then the next word needs to be something that matches, and then you have this bizarre sentence from the Bible, even though you typed in gibberish. Funny. Yeah. So it looks like they actually have a bunch of trained models already, at the TextAttack Model Zoo, as they call it, I guess. Yeah.
18:17 Yeah. And so you can just take these and run it against data like the movie reviews from Rotten Tomatoes or IMDb, or news datasets, or Yelp, and just give it that kind of data and see how it comes out. Right, exactly. Yeah, I think that's cool. And then you can also generate your own data, or load in your data and generate data that maybe produces a better model, or covers things that your model previously couldn't handle at all. So that's the data augmentation part. Yeah, that's all very important. And I think it's also very important to understand the models that we train, and really try them out and think about, what do they do, and how are they going to behave in a real-world scenario that we care about? Because we're making decisions on this data, on these models. Yeah, I guess as soon as a human is convinced that the model works, they start making decisions on it, right? And that could go bad if the situation or the type of data changes, and especially if the model is bad. Like I'm always saying, people are always scared of these dystopian futures where we have AI that knows everything about us and can predict anything and works. But the real dystopia is if we have models that kind of don't work and are really shit, but people believe that they work. It's not even about whether they work, it's about whether people believe it, and that's where it gets really bad. And that's way more likely.
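The character-level side of that augmentation idea can be sketched in a few lines of plain Python. To be clear, this is a toy illustration of the concept, not TextAttack's actual API; the function name and strategy are made up:

```python
import random

def swap_adjacent_chars(text: str, seed: int = 0) -> str:
    """Toy perturbation: swap one adjacent character pair in each longer word.

    Real augmentation frameworks offer many richer transformations
    (synonym replacement, embeddings-based swaps, etc.); this just shows
    the character-swap flavor of it.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    out = []
    for word in text.split():
        if len(word) > 3:
            # pick an interior position and swap it with its neighbor
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)

# short words are left untouched; longer words get one typo-like swap
# (which characters get swapped depends on the seed)
assert swap_adjacent_chars("is ok") == "is ok"
assert swap_adjacent_chars("wonderful") != "wonderful"
```

Feeding the perturbed sentences back through a sentiment model, and checking whether its predictions flip, is the essence of the robustness test described above.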
19:44 Yes. It's a more difficult thing to test and figure out, though. What does it mean for a model to be bad? How do you tell if it's bad? Models can be working with some datasets and produce gibberish with
20:00 others. Or, I guess, in this case, the reverse: not produce gibberish when you're passing gibberish in.
20:07 Yeah, actually, I just realized it ties in very well with the pytest point earlier. Machine learning is quite special in a way, in that it's code plus data. You can test a function, and you're like, yeah, that comes in, that's what I expect out, easy, write a test for it. I mean, it's not that easy, testing is hard, but fundamentally it's somewhat deterministic, right? And even if it's not, there's something you can test around it. It's much harder for a model. Yeah, for sure. All right, before we get to the next item, I just want to let you know this episode is brought to you by us over at Talk Python Training. We have a bunch of courses; you can check them out. And we're actually featured in the Humble Bundle, the Python Humble Bundle, that's running right now. So if you go to talkpython.fm/humble2020, you get $1,400 worth of Python training, tools, and whatnot for 25 bucks. So that's a pretty decent deal. And Brian, you mentioned your book before, tell people about your book real quick. Yes, Python Testing with pytest is a book I wrote, and it's still very valid, even though it was written a few years ago. The intent was to cover the 80% of pytest that you will always need to know, for any version of pytest. And I've had a lot of feedback from people saying a weekend of skimming this makes it so that they understand how to test. It's a weekend worthwhile. Yeah, absolutely. And Ines, do you want to talk a little bit about Explosion, just to let people know? Yeah. So some of you who are listening to this might know me from my work on spaCy, which is an open source library for NLP in Python that I'm one of the core developers of. Yeah, that's all free, open source. And we're actually just working on the nightly version, the pre-release, of spaCy 3, which is going to have a lot of exciting features.
I might also mention a few more things later on. And, yeah, maybe that's already going to be out by the time this podcast officially comes out, maybe not. I don't want to overpromise, but you can definitely try that out. And we also recently released a new version of our annotation tool, Prodigy, which comes with a lot of new features for annotating relations, audio, and video. And the idea here is, well, once you get serious about training your models, you usually want to create your own datasets for your very specific problems. Often the first idea you have might not be the best one; it's a continuous process, you want to develop your data. And Prodigy was really designed as a developer tool that lets you create your own datasets, with a web app and a Python back end you can script. That's our commercial tool; that's how we make money. And it's very cool to see a growing community around it. So that's what we're doing. We have some cool stuff planned for the future, so stay tuned. Yeah, people should check it out. Actually, you and I talked on Talk Python 202 about building a software business and entrepreneurship. You had a bunch of great advice, so people might want to check that out as well. Do you actually know these episode numbers by heart? Some of them I know, but for that one, I used the search. I remember you were on there, and I just put together that I know two people from Explosion. So that's interesting. Yeah.
23:11 Yeah, he was on your podcast recently, which I feel really bad about, because I need to listen to it. I wanted to listen to it because he advertised that it would tell the true story behind his mustache, which I really wanted to know. But then I was like, I'll need to listen to this on the weekend, and I forgot. So yeah, if he's listening, I'm sorry. I will definitely listen, I need to know this. Excellent.
23:33 He did a great job with FastAPI, too. All right, speaking of people that have been on all the podcasts, there's also Brett Cannon, and he recently wrote an interesting article called "What is the core of the Python programming language?" And he's legitimately asking, as a core developer, what is, maybe not the lowest level, but the essence, I guess, is maybe the way to think about it. Well, I only just got the "core Python" pun. It did not occur to me when I first read the article. I feel really bad.
English is not my first language. But still, it's not about that.
Anyway, sorry. When I first read it, I was thinking, okay, we're going to talk about what the lowest level is, and the core is probably C and ceval.c and so on. But really, the thing is, Brett has been thinking a lot about WebAssembly and what that means for Python in the broad sense. He and I talked about it on Talk Python; I think at the very last PyCon event, we did a live conversation there about that. And it's important because there are a few areas where Python is not the first choice, maybe not the second choice, sometimes not even the tenth choice, of what you might use to program some very important things, like maybe mobile, or the front end part of the web. So there are a few really important parts of technology where Python doesn't really play.
It was PyCon India, I think, and I was like, oh, this is kind of fun. It was just so fun to watch them live-code a compiler. Yeah.
So that did get me thinking. I do think one question we should ask ourselves is, do we really need Python to do all the things in the browser? Is this really a benefit that actually makes a difference? And there are a lot of things people use Python for that just wouldn't work in that way, and that's also, I think, part of what makes Python so popular in the first place. Like, for instance, all the interactive computing environments. That's why people want to use Python for data science, in IPython, Jupyter notebooks, that sort of stuff. That's why Python as a dynamic language makes so much sense to people, and that's what made it popular. And large-scale processing, like a lot of the types of stuff we're working on. Sure, there's stuff you can run in the browser, but it's never going to be viable to run large-scale information extraction in the browser, because you want to run that on a machine for a few hours. But I think there are a lot of opportunities in the machine learning space for privacy-preserving technologies. From what I understand, Mozilla is working on some features built into the browser where you can have models predicting things without data being sent to someone's server, and that's an interesting idea, right? Yeah.
I just really need to get with the times; help me out. Okay, so pathlib. Yes, I know.
So, no offense to os.path, but I really love pathlib a lot. But I've got to tell you, the documentation for pathlib doesn't cut it as an introduction: you can find what you're looking for, but only if you already know what you're looking for. And Chris May agrees. So Chris May wrote a post called "Getting Started With pathlib". He's got a little PDF field guide that you can download, and a bit of a blog post introducing it. I downloaded it; it's like nine or ten pages, and it's actually a really good introduction to pathlib. So I really like it. The big thing with os.path versus pathlib is that pathlib creates Path objects. So there's a class that represents a path, and you have methods on it. That makes it different from dealing with os.path, where it's just strings, so you're manipulating strings that represent paths. The object is different; I like it. Actually, I switched just for the ability to build up paths with the slash operator. Yeah, it's really interesting how they've overridden the division operator. Yeah, but I think it's a good example of where this makes sense: it's a reasonable use case, it looks good, it's defensible. There are cases where you're like, oh, do you really have to overload these operators? But here, they're fine; I think it's valid. Then there are things like, how do you find parts of a path? When you have to parse paths, that's where pathlib really shines for me. So if you want to find the parent of something, or the second-level parent, there are ways to do that in pathlib. In os.path, you're stuck with trying to split things and stuff. I mean, there are operations to do it, but it's clunky. It's very good to have these methods for relative paths, and all these properties like parent.
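A minimal side-by-side of what's being described here: strings and function calls with os.path versus Path objects, the slash operator, and the parsing properties with pathlib. The path names are just examples:

```python
import os.path
from pathlib import Path

# os.path: paths are plain strings, joined with function calls
config_str = os.path.join("project", "settings", "config.toml")

# pathlib: paths are objects, built up with the / operator
config = Path("project") / "settings" / "config.toml"
assert str(config) == config_str  # same path underneath, on any OS

# parsing is just attribute access, no string splitting
assert config.name == "config.toml"
assert config.suffix == ".toml"
assert config.parent.name == "settings"          # first-level parent
assert config.parents[1] == Path("project")      # second-level parent

# resolve() turns a relative path into an absolute one
here = Path(".").resolve()
assert here.is_absolute()
assert here == Path.cwd()

# globbing is a method on the directory itself
python_files = list(Path(".").glob("*.py"))
```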
And then one of the things that took me a while to figure out: I was used to trying to find the absolute path of something, and in pathlib, finding the absolute path is the resolve method. So you say
resolve, and it finds the absolute path for you. You can find the current working directory, you can go up and down folders, you can use globs, you can find parts of path names, and stuff like that. It's just a really comfortable thing. So I think you should give it a whirl. It's not like it's going to change your life a lot, but the next time you're programming and you're like, okay, I've got to have a base directory and some other directory, reach for pathlib instead of os.path. Yeah. And I guess it has been there since Python 3.4. Yeah. What I like as well is a lot of the integrations, like automatic checks for whether a path exists, stuff like that. For me, as a library author, you're writing stuff for users, and you want to give them feedback. For instance, in a library like Click, or Typer, which is the modern type-hint version of a CLI interface and was also built by my colleague Sebastián, you can just say, hey, this argument is a path, and what you get back from the command line is a Path object, and it will check that the path exists via pathlib. So it has a whole bunch of magic there. Yeah, that's super cool. Or you can say it can't be a directory, and then when you write your CLI and a user passes in an invalid path, you don't even have to do any error handling: before it even runs your code, it will say, nope, that argument is bad. So that's pretty cool. That's awesome. And you don't have to care about Unix versus Mac or PC or
23:33 Windows. I mean, no offense to Windows, but handling paths on Windows is the classic story. Also, as a library author: we support all operating systems, but Windows just does things a bit differently, and you cannot assume that a slash means a slash. Yeah, for sure. All right, well, the final item is yours. Yes, and it's definitely interesting. If you're working on the machine learning and data science side of things, it might not be enough to just back up your algorithms and your code, right? Yeah, machine learning is code and data. So this is something we discovered a while ago and that we're now using internally. As I mentioned before, we're working on version 3 of spaCy, and one of the big features is going to be a completely new, optimized way of training your custom models and managing the whole end-to-end workflow, from preprocessing to training to packaging, and also making experiments more reproducible. You want to train a cool model and then send it over to your colleague, and your colleague should be able to run the same thing and get the same results. Sounds really basic, but it's pretty hard in machine learning in general. So that stuff will also integrate with a tool called DVC, which is short for Data Version Control, and which we've started using internally for our models. DVC is basically an open-source tool for version control, specifically for machine learning and for data. You can check your code into a Git repo as you're working on it, but you can't just check your datasets and model weights in as artifacts. So it's normally very difficult to keep track of changes to your files; most people end up with a directory of files somewhere, and it can be very frustrating. You can really think of DVC as Git for data.
And the command-line usage is actually pretty similar. You type git init, and dvc init to initialize it, and then you can do dvc add to start tracking your assets. So I think if you're familiar with Git, as abstract as it can be at times, you'll also find it easy to get into DVC. It basically lets you track any assets, like datasets, models, whatever, by adding meta files to your repository. You always have the checksum in there, and you always have these checkpoints of the asset, even though you're not actually checking the file itself into your repo. That means you can always go back, fetch whatever it was from your cache, and rerun your experiments. It also builds this really cool dependency graph, so you can have these complex pipelines with different steps, and then you only have to rerun one step if some of its inputs have changed. In machine learning, you often have a pipeline like: you download your data, then you preprocess it, then you convert it to something, then you train, then you run an evaluation step. Everything depends on each other, which can make things really hard, and you usually have to run everything completely clean from scratch, because if something changes, your whole results change. If you set up your pipelines with DVC, it can actually decide whether something needs to be rerun, or know what needs to be rerun to reproduce exactly what you're trying to do. So that's pretty cool. Yeah, that could save you a ton of time and money if you're doing it in the cloud. Yes, exactly. And you can share it with other people. I think it definitely solves a problem that's real. And yeah, the people behind
23:33 DVC, they've also recently released a new tool that I haven't personally checked out yet, but it looks very interesting. It's called CML, which is short for Continuous Machine Learning, and that's really more on the CI side, which is logically the next step, right? You manage everything in your repo, and then you obviously want to run automated tests and continuous integration. The preview looked really cool. It showed a GitHub Action where you can submit a PR with some changes to your code and your data, and then you have a bot commenting on it, showing accuracy results and a little graph of how things changed. It's really like those code-coverage bots you've probably seen, where you change some lines and it tells you, oh, coverage has gone up or down. So that's what it looks like. I think, yeah, we're definitely excited about it; it solves a problem, and it's already been solving a problem for us. And yeah, how does it store the large files? I know it has this cache. Is that a thing that you host? Does it have a hosted thing that's kind of like GitHub? I'm not sure, but you can probably connect it to some cloud storage. Normally you have it locally, but it also has a thing where you can actually download files via the tool. Depending on where you're fetching from, if it's a Google Storage bucket or an S3 bucket or something, it can actually tell whether the file has changed and whether it needs to be downloaded. For example, internally, what we're doing is mounting a Google Cloud Storage bucket, or however they call it, locally, so it's kind of a drive you have access to, and then you can just type gs:// and then the path and really work with it like a local file system. And that's pretty nice.
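The pipeline idea described earlier is typically declared in a dvc.yaml file. Here's a minimal sketch under the usual DVC conventions; the stage names, scripts, and file paths are all made up for illustration:

```yaml
# dvc.yaml -- each stage declares its command, dependencies, and outputs.
# `dvc repro` re-runs only the stages whose dependencies have changed.
stages:
  preprocess:
    cmd: python preprocess.py data/raw.csv data/clean.csv
    deps:
      - preprocess.py
      - data/raw.csv
    outs:
      - data/clean.csv
  train:
    cmd: python train.py data/clean.csv model.bin
    deps:
      - train.py
      - data/clean.csv
    outs:
      - model.bin
  evaluate:
    cmd: python evaluate.py model.bin metrics.json
    deps:
      - evaluate.py
      - model.bin
    metrics:
      - metrics.json
```

Because `train` depends on `data/clean.csv`, editing only `evaluate.py` means `dvc repro` skips preprocessing and training entirely, which is where the time and cloud-cost savings come from.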
So you can work with private assets, because the thing is, a lot of toy examples assume that, oh, you just download a public dataset, then you train your model, then you upload it somewhere. But that's not realistic, because most of the time the data you have can't just go in the cloud publicly. I don't even know exactly how it works in detail, but it can basically tell, from the headers or something, whether the file you're downloading has changed and whether there's something new. Yeah. With normal version control, one of the reasons we use it is to find what's different. Can you do diffs on data? I don't know, maybe. I mean, I'm not sure. I think it's more oriented around the results that you get, because diffing large datasets, diffing weights, you kind of can't. That's really where you hit the other problem: you need to run the model to find out what it does, and then you're diffing accuracies rather than weights. Okay. I don't know if it does actual diffing of the datasets, but often the thing that changes is really the model: you have your data, then you change things about your code, and something changes, and you want to keep track of what it is or how it manifests. Yeah, it's really cool to see them working on this. And also, spaCy 3 will hopefully have a pretty neat integration where, if you want, it's not mandatory, but if you say, hey, that's cool, that's how I want to manage my assets, you can just run that in a spaCy project, and then it automatically tracks everything, and you can check it into Git and share it and know that other people can download it. So yeah, I'm pretty excited about that.
23:33 Yeah, everything you can do to make it a little easier to work with spaCy and make it reproducible. Yeah. The thing is, these things are hard. I'm not a fan of these one-click, everything-just-magically-works tools. It looks nice, and it makes a nice demo, but once you actually get down to the real work, things need to be a bit modular, things need to be customizable. Otherwise you're always hitting edge cases, and you end up with leaky abstractions. So yeah, I think things should be easy to use, but you can't just magically cover everything by providing one button. That's just not going to work. Yeah, because when it doesn't work, it's not good anymore. Yeah, exactly.
23:33 Yeah. All right, well, those are our six items that we go in depth into, but at the end we always throw out a couple of really quick things that maybe we didn't have time to fit into the main section. And I want to talk about two things that are pretty exciting. One is, if you care about podcasts, there's a catalog of a whole bunch of them. I don't know how many podcasts there are; probably over a million these days. One of our listeners, Anton Ziana, wrote a cool Python package that lets you search and query the iTunes directory. It's basically a Python API into the iTunes podcasting directory. Some people think you've got to be part of the Apple ecosystem to care about iTunes, but really, that's just the biggest directory, a kind of Yahoo-circa-1995-style listing of podcasts. So if you care about digging in and researching podcasts, check that out. That's pretty cool. And then I'm also such a big fan of f-strings. How about you two? Yes, yes, f-strings, right. Yeah, I'm finally working in Python 3 only. I remember, I think last time I was on the podcast, I was basically
23:33 I was saying how all these modern things are so nice, and I wished I could use them more, but we were still supporting Python 2. But now everything is 3.6-plus. Yes. And I've talked previously about a tool called flynt, f-l-y-n-t, which you can run against an old codebase to convert all the various Python 2 and 3 styles of string formatting magically into f-strings. I think that was actually the episode you were on. Right, like, I wish I could run this. Yeah. And I ran that against like 20,000 lines of Python, found just a couple of errors, reported them, and they got fixed. So that's nice. But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing a string and think, oh yeah, I want to put data here. So I've got to go back to the front of the string, not necessarily the front of the line, because maybe the string is being passed to a function; I go back to the first quote, put the f, go back forward, and then start typing the thing I actually wanted, right? So it's like you're halfway through and you want it to become an f-string. Well, PyCharm is coming with a new feature where if you start writing a regular string and pretend it's an f-string, it'll automatically upgrade it to an f-string. This is how we do it, yes, without leaving your spot. You just type a curly brace and a variable, and it's like, oh, okay, that means it's an f-string, and the f appears at the front. Yes. Nice. So that is pretty awesome. Anyway, those are my two quick items. Ines, I'm also excited about the one you've got here. Yes, yeah, it's something coming in 3.9, which is PEP 585. When you use type annotations, you can now use the built-in types like list and dict as generic types. So that means no more from typing import List with a capital L.
23:33 So you just write it literally. When I first saw it, I thought, that looks strange, but yes, I'm so excited. Yeah, it will probably be years until I can just use it all across my code bases.
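A quick sketch of what that looks like in practice on Python 3.9+; the function and variable names here are just for illustration:

```python
# PEP 585 (Python 3.9+): built-in collections work directly as generic types,
# so no `from typing import List, Dict` is needed.
def tag_counts(tags: list[str]) -> dict[str, int]:
    """Count how often each tag appears."""
    counts: dict[str, int] = {}
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
    return counts

print(tag_counts(["ner", "parser", "ner"]))  # {'ner': 2, 'parser': 1}
```

On older Pythons the lowercase form fails at runtime with a TypeError, which is exactly why it can take a while before you can use it across a codebase.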
23:33 Yeah, but that's in 3.9. Yeah. I'm already using 3.9, and I didn't know you could do this. Yeah. And Guido is one of the people on the PEP making this happen. Like I said, he's really into typing. Oh, that's great. This is really cool, because it was super annoying to need an extra import just because you want to use type annotations on a collection. Now you don't have to. There's actually a bunch of the collections stuff, iterators and whatnot, in the collections module, and a bunch of stuff in there is really nice. And they're compatible: lowercase list of str is the same as capital-L List of str, I believe. All right, Brian, what have you got? Oh, I just wanted to drop a link in the show notes. Test & Code 120 is where I interviewed Sebastián Ramírez, from Explosion also, talking about FastAPI and Typer, because I'm kind of in love with both of those. They're really cool. Yeah, absolutely. All right, well, that's cool. I'm definitely going to check that out. And you can find out why he has the cool mustache.
23:33 That's right. All right, so we always end the show with a joke, and I thought we could do two jokes today. Ines, do you want to talk about this first one? Oh yeah. I mean, I'm not even sure it's a joke per se; it's more of a humorous situation, I guess. And it ties in well: it's Sebastián again. He had this very viral tweet the other day, where he posted about an experience. I'll just read it out, because I think it needs to stand on its own. He writes: I saw a job post the other day. It required 4+ years of experience in FastAPI. I couldn't apply as I only have 1.5+ years of experience, since I created that thing. And then he says: maybe it's time to reevaluate that years of experience equals skill level. And this resonated with people so much, I was actually surprised. Everyone was like, oh yeah, HR. Apparently there's this huge issue, obviously, that most job ads are not written by the people who actually work with the technologies. Yeah, this is awesome. And this tweet actually just got covered on DTNS, the Daily Tech News Show, alongside another posting that said you needed eight years of Kubernetes experience for a job. But of course, Kubernetes has only been around for four years. Yeah. When you say this went viral: it had 46,000 retweets and 174,000 likes. That's got some traction. I feel like this might be a problem. And yeah, I was surprised that so many people were like, yeah, that's a big deal. And I mean, it is true: when it comes to tech hiring, it sort of seems to be broken. It's a bit different in my case, I guess, but I don't qualify for most roles using the tech that I write, and in some cases that's justified, because I'm not a data scientist.
Just because I write developer tools for data scientists doesn't mean I can do the job. But in other cases, I'm like, this is kind of a ridiculous amount of arbitrary
23:33 stuff you're asking for in this job ad. Maybe it's needed, maybe not, but it centers around a piece of software that I happen to have written, and I do not qualify for your job at all. That's insane. The last time I wrote a job description, I intentionally left off the college degree requirement, because for all the other requirements I was listing, either people would have them from college plus experience, or they'd have them just from experience, so I was fine with that. By the time it actually went live, somebody in HR had added the college requirement back. You just can't get away with less than that, I guess. Yeah, but master's degree in spaCy preferred.
23:33 I guess another problem is: well, look, if HR writes these job ads with these bullshit requirements, then who applies? It's either people who are like, yeah, whatever, or people who are full of it. And that's the sort of culture you're fostering. It might not even be the fault of the engineers behind the job description, but who applies to that? Yeah, you make people lie about their FastAPI experience. People just apply to anything. Yeah, 5, 10 years of experience in everything? Great, that's what we're looking for, you're hired. And then you wonder, why is company culture so terrible? Hmm.
23:33 I actually did have somebody apply to a job and say they have multiple years of experience in any new language coming up.
23:33 All right, it looks like we're just about out of time, so let me give you one more joke. Brian, I'll describe this picture and then read what it says. There's a poorly drawn horse, a zebra-horse, I think, that's white on the back end and black on the front end, and the caption says: I defragged my zebra. I don't even know if people defrag drives anymore, so this is only going to resonate with the folks who have been around for a while. I came across this great video on YouTube where you can actually watch a live defrag session, like on Windows 95, and it takes a few hours. You can bring back that nostalgia and just put it on your TV, like the aquarium video people would put on their TVs. All right. Follow the show on Twitter via @pythonbytes, that's Python Bytes as in b-y-t-e-s, and get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.