

Transcript #191: Live from the Manning Python Conference

Recorded on Tuesday, Jul 14, 2020.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 191, recorded July 14th, 2020.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And welcome, special guest, Ines.

00:12 Hi.

00:13 Yeah, it's great to have you here.

00:14 So I want to kick this off with a cool IoT thing.

00:19 Now, IoT and Python, they've got a pretty special place.

00:23 'cause when I think about Python, I think of it as not being something that sort of competes with assembly language and really, really low-level type of programming for small devices, but, you know, amazing people put together MicroPython, which is a reimplementation of Python that runs on little tiny devices.

00:42 And we're talking like $5 microchip-type devices, right?

00:46 Have either of you all played with these?

00:47 - No. - No.

00:48 No, I haven't, but I've been seeing a bit of this from my brother. He's pretty amazing. He's a bit younger than me.

00:54 He's an event technician, and he recently taught himself programming and everything just so he can build stuff on these tiny Raspberry Pis.

01:02 And, like, I don't know, he's doing super advanced stuff.

01:04 It's been really interesting to see him learn to program.

01:07 And he's also incredibly good.

01:09 He has amazing instincts about programming, even though he's never done it before.

01:12 But, like, so I've been kind of watching this from afar, and it made me really want to build stuff.

01:16 So I'm very curious.

01:17 Yeah, I've done CircuitPython on some of the Adafruit stuff.

01:22 Exactly. So I always just want to build these things.

01:25 I'm like, what could I think of that I could build with these cool little devices?

01:29 I just, in my world, I don't have it.

01:31 Maybe if I had a farm, I could like automate, you know, like watering or monitoring the crops, or if I had a factory, but I just don't live in a world that allows me to automate these things.

01:41 Do you have pets? Maybe you can build something for pets.

01:44 We generally don't have pets, but we are fostering kittens for the summer.

01:52 So I could put a little device onto one of the kittens, potentially.

01:52 GPS tracker.

01:55 Yeah.

01:56 So in general, you have to get these little devices, right?

01:59 For example, at the US PyCon,

02:01 we got the Circuit Playground Express, which is that little circular thing.

02:05 It's got ten LEDs and a bunch of buttons, and other really advanced things like motion sensors and a temperature sensor and so on.

02:13 Probably the earliest one of these that was a big hit was the BBC micro:bit, where I think every seventh grader in the UK got one.

02:20 Some grade around that scale got one of these, and it really made a difference in kids seeing themselves as programmers.

02:27 And interestingly, women especially were more likely to see programming as something they might be interested in, in the group that went through that experience.

02:37 So I think there's real value in working with these little devices, but getting ahold of them can be a challenge, right?

02:43 You've got to physically get the device. That means you have the idea of "I want to do this thing," and then you have to order it from Adafruit or somewhere else and wait for it to come.

02:51 And my experience has been I'll go there and I'm like, oh, this is really cool. I want one of these. Oh, wait, no, it's sold out right now. You can order it again in a month.

02:58 Right? So getting one is a challenge. And also, if you're working with a group, say you want to teach a high school class or a college class and you want everyone to have access to these, well, maybe $50 wasn't a big deal for one, but $50 times 20 or 100 kids, then all of a sudden, well, maybe not.

03:20 So I want to talk about this thing called Device Simulator Express.

03:24 So this is a plugin, or extension, or whatever VS Code calls the things that make it do more stuff. I think it calls them extensions.

03:33 And it's a free, open source device simulator. So what you do is go to the Visual Studio Code extensions panel and type "device", which is probably sufficient, or "Device Simulator Express".

03:45 And it'll let you install this extra thing inside of VS Code that is really quite legit. It gives you a simulated Circuit Playground Express, a simulated BBC micro:bit, and, most impressive to me, the CLUE from Adafruit, which actually has a screen that you can put graphics on.

04:05 So it's a really, really cool way to get these little IoT devices with CircuitPython, Adafruit's fork of MicroPython, on there.

04:17 What do you guys think? See that picture? Look how cool that is.

04:19 Yeah, so you can write Python in one tab and then just have the visualization in the other.

04:25 That's pretty cool, yeah.

04:26 Yeah, exactly. And it's very similar to, say, what you might do with Xcode and iPhones, where you have an emulator that looks quite a bit like it, or what you would do on the Android equivalent.

04:37 I actually think this is a little bit better than the device because it's actually larger.

04:41 Right? Like the device is really small, but here it could be a huge thing on your 4K monitor with a little CLUE device. So you can simulate the Circuit Playground Express, the BBC micro:bit, and the CLUE in here. We just say new project, and it'll actually write the boilerplate code for the main.py or code.py, or whatever it's called, that the particular device is going to run.
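As a rough sketch, the kind of code.py you'd run there might look like this, assuming the adafruit_circuitplayground helper library that the real boards use (the colors and threshold here are just made up):

    from adafruit_circuitplayground import cp

    while True:
        if cp.button_a:                      # button A, physical or simulated
            cp.pixels.fill((0, 255, 0))      # light all ten NeoPixels green
        elif cp.shake(shake_threshold=20):   # the simulator can fake a shake
            cp.pixels.fill((255, 0, 0))      # flash red on shake
        else:
            cp.pixels.fill((0, 0, 0))        # otherwise, lights off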

05:03 And like you said, Ines, on one half it's got the code and on the other half it has the device that you can interact with.

05:07 I was thinking of a couple of cases where this would be great. Like you were saying, it's about trying to get a hold of one, but also, you might not even know if the concept you have is really going to work for the device you're thinking of. So this would be a good way to try it out, to see whether the thing you want to try in your house or wherever would actually work on this device.

05:28 The other thing was, you brought up education, and that it's big.

05:33 I was thinking about a couple of conferences where they tried to demo the display and sometimes had to set up, like, a camera or something.

05:40 Sometimes it works and sometimes it doesn't.

05:43 This way you could just do a tutorial, or teach a class, and everybody could see it, because it's just going to be displayed on your monitor.

05:51 Right, your standard screen sharing would totally work here. That's a good point as well.

05:55 And it doesn't have to be all or nothing.

05:57 Actually, what's really interesting is this thing isn't just an emulator, but you can do debugging.

06:01 You can set a breakpoint and step through it running on the simulated device, or, if you have a real device plugged in, you can run it on there as well.

06:10 And then do debugging and breakpoints and stuff on the actual device.

06:13 So that's like you tested here.

06:14 I always admire people who actually use like the proper debugging features.

06:18 I know VS Code has like so much of this, and I'm always like, I should use this more, but I'm like, okay, print.

06:23 Print, print.

06:25 Yeah, there are some really cool libraries that will actually do that.

06:29 I can't remember what it's called, but Brian and I recently covered one that would actually print out a little bit of your code and the variables as they change over time.

06:36 It was like the height of the print debugging world.

06:39 It was really, really cool.

06:40 I wish I could remember.

06:41 Do you remember, Brian?

06:42 - No, we actually covered a couple of them.

06:44 - I know, I know.

06:46 That's the problem.

06:47 We cover thousands of things in here.

06:49 So another thing that's interesting is like, okay, so you see the device.

06:52 Some of them have buttons and they have lights, and you can imagine maybe you could touch the button, but they also have things like temperature and gyroscope-type sensors, like when you're moving it, or motion sensing, or even shaking it. This thing has little ways to simulate all that stuff.

07:07 So you can like have a temperature slider that freaks it out and says, hey, the temperature's actually this on your temperature sensor, and so on.

07:13 So all the sensors that the devices have are available to simulate here.

07:16 - Oh, that's cool. - Yeah.

07:17 So I actually had the team over on Talk Python not long ago, so people can check that over at talkpython.fm.

07:24 And yeah, I'm also really excited about what you got coming here next, Brian.

07:28 What is that?

07:29 - Yeah, well, speaking of, I guess debugging versus test, we didn't really talk about testing.

07:33 Anyway, I'm really excited about--

07:34 - We should have talked about testing.

07:35 - Yeah, so I was thinking that I hardly ever use a debugger on my source code, but I use a debugger all the time when I'm debugging my tests.

07:48 I don't know, it's just something different about it.

07:50 But I've been running a lot of tests and debugging a lot of tests lately, because the pytest 6 release candidate is out.

07:57 Now, by the time this episode airs, I don't know if the final release will be out or if it will still just be the release candidate, but you can install it.

08:08 We'll have instructions in the show notes, but essentially you just have to pin 6.0.0rc1 and you'll get it.
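Something along these lines should do it:

    python -m pip install "pytest==6.0.0rc1"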

08:15 So there's a whole bunch of stuff that I'm really excited about.

08:18 There's a lot of configuration that you used to put in lots of places: your pytest.ini or your setup.cfg or tox.ini or something. pytest 6 will support pyproject.toml now.

08:30 So if you jumped on the toml bandwagon, you can stick your pytest configuration in there too.
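As a quick sketch, the new pyproject.toml section looks something like this (the particular options here are just examples):

    # pyproject.toml
    [tool.pytest.ini_options]
    addopts = "-ra"
    testpaths = ["tests"]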

08:35 There's a lot of people excited about the type annotations.

08:38 So 6.0 is going to support type annotations.

08:41 So it actually was a lot of work.

08:43 There was a volunteer that went through and added type annotations to a bunch of it, especially the user-facing API.

08:50 And why this is important: if you're type checking, running mypy or something over your source, your whole project, why not include your tests?

09:02 But if pytest doesn't support types, it doesn't really help you much.

09:06 So it will now, so that's really, really cool addition.
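For instance, here's a minimal sketch of what that enables: with pytest's API annotated, mypy can check test code that touches things like the tmp_path fixture.

    from pathlib import Path

    import pytest

    def test_roundtrip(tmp_path: Path) -> None:  # tmp_path is typed as Path
        target = tmp_path / "value.txt"
        target.write_text("3.14")
        assert float(target.read_text()) == pytest.approx(3.14)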

09:10 So what this is, basically, is that the API of pytest itself is now annotated with types?

09:15 Yes, and well, a lot of the internal code as well.

09:18 So they actually went through and did a lot. There was a lot of work.

09:21 And if you look at the conversation chain, it went on for months; it was a several-month project.

09:27 Wow. What does that mean for compatibility?

09:30 Does that make pytest like 3.6 only and above?

09:33 I think the modern versions of pytest really already are 3.6 and above.

09:37 I'm not sure about that.

09:39 Right. So then the door was opened, is that because otherwise it would...

09:42 I mean, it would be in vain to release a completely new version with Python 2 backwards compatibility.

09:50 Like, that's...

09:52 – Yeah, why not do that? – You wouldn't do that, right?

09:53 I mean, it's...

09:54 I think the message it sends is not great.

09:57 I totally agree.

09:58 There is a pinned version of pytest, I don't remember which one it is, that still supports 2.7 if you're on it, but no new features are going in there.

10:08 The thing I'm really excited about is a little flag they've added called --no-header.

10:15 Now, most people won't use this every day. When you run pytest, it prints out some stuff like the version of Python, the version of pytest, all the plugins you're using, a bunch of information about it. All this stuff is really important for logging, if you're capturing the output to save somewhere, or for a bug report or something.

10:34 That information is great to help other people understand it. What I don't like about it is that it's not helpful if you're writing tutorials, or if you're writing code to put on a slide or something. All that extra stuff just takes up space and it distracts.

Yeah, like I've had students say, "I ran pytest in PyCharm and it has some kind of output just stating where it is and what it's doing, and this didn't work for me." And I'm like, well, that was just output from the tool; you're not actually supposed to try to run that part, you know what I mean? I see why they show it, but at the same time, the ability to say these details don't matter is great.

11:13 Yeah, so I'm excited about that, to trim it down. There was a plugin called TLDR (too long, didn't read), but it didn't take off as much of the header as I wanted, so I had my own tool that would do this. But now I've got this flag, which is great.
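A quick sketch of trimming the output for a slide or tutorial (the test file name here is hypothetical):

    pytest --no-header -q test_demo.py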

11:29 So with a lot of the configuration, there is a chance for human error if you type something wrong, if you type a setting name wrong. So I really like this new flag called --strict-config, which will throw an error if the pytest section of your configuration has something that it doesn't recognize.

11:48 And it probably is just you misspelled some variable or something.
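A hedged sketch of the kind of mistake it catches; without the flag, the misspelled option below is silently ignored:

    # pytest.ini, with "addopts" misspelled
    [pytest]
    adopts = -ra

    # running with the new flag turns that into an error
    pytest --strict-config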

11:52 - Yeah, that's good to know. - And then...

11:54 I can't remember the version, but I think it was in pytest 5.

11:57 They added some code highlighting stuff that...

12:00 Yeah, that's super cool. I discovered that just the other day.

12:02 I, like, just somehow updated all my dependencies in some environment, and suddenly pytest output was colored.

12:08 And I was like, whoa, this is amazing. Yeah.

12:10 Yeah, the syntax highlighting, I love it.

12:12 But there's times where you don't want that, I guess.

12:15 So there's a new flag to turn it off.

12:18 And then a little tiny detail that I really like is the diff comparisons on pytest are wonderful.

12:24 But apparently they didn't do recursive comparisons of data classes and attrs classes.

12:29 But now they do. So that's neat.
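For example, with hypothetical nested data classes like these, a failing assert now shows which inner field differs:

    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    @dataclass
    class Box:
        corner: Point

    def test_nested_compare():
        # pytest 6 recurses into the nested dataclass in the failure diff
        assert Box(Point(1, 2)) == Box(Point(1, 3))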

12:32 There's a whole bunch of new features, fixes.

12:35 I ran through some of the features I really liked.

12:37 There are deprecations, and it's a large list of breaking changes and deprecations.

12:42 That's why they went to a new number, pytest 6.

12:44 But I went through the whole list and I didn't see anything that was like, "Oh, that's going to stop me. I'm going to have to change something." Okay, that's good to know.

12:52 I mean, if you say, "Oh, there was nothing that we're using," I feel confident that maybe there's nothing in my code either.

12:58 And I knew that somebody was going to ask, "Is my pytest book still valid?" Yes, it is. I'm going through it right now.

13:05 I haven't gone through the whole thing yet to make sure.

13:07 The thing that's not compatible is not the book.

13:10 The book's fine.

13:12 I have a plugin that now is broken.

13:14 So pytest-check mostly still works, but, and this is a real corner case, if you depend on pytest-check and the xfail feature of it together, it doesn't work right now.

13:28 So I'll have to fix that.

13:29 So you would say xfail fails, temporarily.

13:31 Yeah.

13:33 It actually marks everything as a pass. So if you mark something xfail, it passes. Wow. Yeah, it's really bad. Anyway, I'll have to get back to that.

Yeah, this is really exciting that pytest 6 is out. Super cool. I know that there were some waves, some uncertainty, in the ecosystem, so it sounds like that got ironed out: things are going strong, new versions coming out. I even saw that Guido had retweeted the announcement and said, yay, type annotations coming to pytest. Of course, he's been all about type annotations these days.

14:05 We'll come back to that later in the show, actually.

14:07 So, Ines, I know you work a lot with text, but are you frustrated with it?

14:10 What's the story of this name here?

14:12 Oh, my pick of the day.

14:14 Yeah, TextAttack.

14:16 - What does TextAttack do? - Yeah, I thought I'd present something from my space, obviously. - Yeah, awesome.

14:21 Yeah, there's this new framework that I came across, and it's called TextAttack, yeah?

14:25 And it's a framework for adversarial attacks and data augmentation for natural language processing.

14:31 So what are adversarial attacks?

14:37 You might have actually seen a lot of examples of these.

14:37 For instance, an image classifier that predicts a cat or some other image, even though you show it complete noise and you somehow trick the model.

14:46 Or you might have seen people at protests wearing like funny shirts or masks to trick facial recognition technology.

14:53 So really to trick the model into, you know, not recognizing them.

14:57 Or the famous example of Google Translate suddenly hallucinating these crazy Bible texts.

15:03 If you just put in some complete gibberish, like just "ga ga ga ga" and then it would go like, "The Lord has spoken to the people," stuff like that.

15:11 That's amazing.

15:13 I'll include a link to an article by a researcher who explains why this happened and shows the example. It's pretty fascinating. But I think it all comes down to the fundamental problem of, like, how do you understand a model that you train?

15:29 And what does it mean to understand your model?

15:32 And how does it behave in situations when it suddenly gets to see something that it doesn't expect at all, like, ga, ga, ga, what does it do?

15:39 And the thing with neural network models is you can't just look at the weights.

15:43 They're not linear.

15:44 They're like, you can't just look at what your model is.

15:47 You have to actually run it.

15:48 And so that library, TextAttack, lets you actually try out different types of attacks from the academic literature, different types of inputs that you can give a model, to see whether it produces something that you're not happy with, or that's really weird, and it exposes some problems in your model.

16:07 And it also lets you then, because normally what's the goal?

16:10 The goal is, well, you do that and then you find out, oh, damn, if I suddenly feed it this complete nonsense, or if I feed it Spanish text, it goes completely in the wrong direction and suddenly predicts stuff that's not there.

16:22 And if you deployed that model into like a context where it's actually used, that would be pretty terrible.

16:28 And there are much worse things that can be happening.

16:30 So you can also create more robust training data by like replacing words with synonyms.

16:36 You can swap out characters and just see how the model does.
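As a rough sketch of the augmentation side, assuming the WordNetAugmenter that ships with the library:

    from textattack.augmentation import WordNetAugmenter

    augmenter = WordNetAugmenter()
    # returns a list of synonym-swapped variants of the input sentence
    print(augmenter.augment("The movie was surprisingly entertaining"))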

16:41 So I thought that was very cool.

16:42 And yeah, in general, I think adversarial attacks, it's a pretty interesting topic and yeah.

16:48 - Yeah, it's super interesting.

16:49 So the idea is basically you've trained up a model on some text and for what you've given it, it's probably working, but if you give it something you weren't expecting, you want to try that to make sure that it doesn't go insane at least.

17:01 - Yeah, exactly.

17:02 And it can expose very unexpected things like the Bible text, for example.

17:06 That sounds really bizarre when you first hear it, but one explanation would be that, well, this happens especially in low-resource languages, where we don't have much text, and especially not much text translated into other languages,

17:18 but there's one type of text that has a lot of translations available and that's the Bible.

17:24 And there are parallel corpora, where you have one text, one line in English, one line in Somali, for example.

17:30 And then people train their models on that.

17:32 But one thing that is also very specific about Bible text is that it has some words that really only occur in the Bible.

17:40 It uses some really weird words.

17:42 So what your model might be learning is, if I come across a super unexpected word that's really, really rare, that must be Bible.

17:50 And also the objective is you want your model to output a reasonable sentence.

17:54 So the model's like, well, okay, if that's the rare word, then the next word needs to be something that matches, and then you have this bizarre sentence from the Bible, even though you typed in "ga ga ga." And it happens.

18:05 - Yeah, how funny.

18:06 - Yeah.

18:07 - Yeah, so it looks like they actually have a bunch of trained models already, at the TextAttack Model Zoo, they call it, I guess.

18:15 - Yeah, everything's called a model zoo.

18:17 - Yeah, cute.

18:19 And so you can just take these and run it against it, like the movie reviews from Rotten Tomatoes or IMDb or the news set or Yelp, and just give it that kind of data and see how it comes out, right?

18:32 - Exactly, yeah.

18:33 I think that's pretty cool.
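A hedged sketch of what that looks like with their command line tool; the recipe and model names here follow the project's examples:

    # run the TextFooler attack recipe against a pretrained IMDb model
    textattack attack --recipe textfooler --model bert-base-uncased-imdb --num-examples 20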

18:33 And yeah, then you can also generate your own data, or load in your data and generate data that maybe produces a better model, or covers things that your model previously couldn't handle at all.

18:46 So that's the data augmentation part.

18:48 Yeah, that's all very important.

18:49 And I think it's also very important to understand the models that we train and really try them out and think about like, what do they do and how are they going to behave in like a real world scenario that we care about?

19:00 Because yeah, the consequences--

19:00 - As soon as you're making decisions on this data, right?

19:03 On these models.

19:04 - Yeah.

19:05 - I guess as soon as a human is convinced that the model works and they start making decisions on it, right, that could go bad if the situation changes the type of data. And especially if the model is bad. Like I'm always saying, people are always scared of these dystopian futures where we have AI that can, I don't know, know anything about us and predict anything, and it works. But the real dystopia is if we have models that kind of don't work and are really shit, but people believe that they work. That's much worse. It's not even about whether they work; it's about whether people believe it. And then, you know, that's where it gets really bad. And yeah, that's way more likely.

19:43 - Sorry, Brian. - Yeah, yes.

19:45 It's a more difficult world to test this sort of stuff, to figure out what does it mean for a model to be bad?

19:52 How do you tell if it's bad?

19:53 And models can be working with some datasets and producing gibberish with...

20:00 Or, yeah, I guess in this case, the reverse.

20:04 Not produce gibberish if you pass in gibberish.

20:07 Yeah, actually, I just realized it ties in very well with the pytest point earlier. Machine learning is quite special in that it's code plus data.

20:15 Code you can test. You can have a function and you're like, this comes in, that's what I expect out. Easy, write a test for it. You know, it's not that easy, testing is hard, but fundamentally, yeah.

20:25 It's somewhat deterministic, I think.

20:28 Yeah. And even if it's not, there's like something you can, you know, test around it.

20:32 And it's much harder with the model.

20:33 Yeah. Yeah, for sure.

20:36 All right, before we get to the next item, I just want to let you know this episode is brought to you by us.

20:42 Over at Talk Python Training, we have a bunch of courses.

20:44 You can check them out.

20:45 And we're actually featured in the Python Humble Bundle that's running right now.

20:49 So if you go to talkpython.fm/humble2020, you can get $1,400 worth of Python training tools and whatnot for 25 bucks.

20:59 So that's a pretty decent deal.

21:01 And, Brian, you mentioned your book before.

21:03 Tell people about your book real quick.

21:04 Yeah, so Python Testing with pytest.

21:06 is a book I wrote, and it's still very valid, even though it was written a few years ago.

21:11 The intent was to cover the 80% of pytest that you will always need to know, for any version of pytest.

21:18 And I've had a lot of feedback from people saying a weekend of skimming this makes it so that they understand how to test.

21:25 It's a weekend worthwhile.

21:26 Yeah, absolutely. And Ines, you want to talk a little bit about Explosion, just so that people know?

21:30 Yeah, so, some of you who are listening to this might know me from my work on spaCy, which is an open source library for NLP in Python, and which I'm one of the co-developers of. And yeah, that's all free, open source. We're actually just working on the nightly version, the pre-release, of spaCy 3, which is going to have a lot of exciting features, and I might mention a few more things about it later on. Maybe it's already going to be out by the time this podcast officially comes out, maybe not; I don't want to overpromise, but you can definitely try that out.

We also recently released a new version of our annotation tool, Prodigy, which comes with a lot of new features for annotating relations, audio, video. And the idea here is, well, once you get serious about training your own models, you usually want to create your own datasets for your very specific problems, datasets that solve your problems.

22:22 And often the first idea you have might not be the best one. It's a continuous process.

22:26 you want to develop your data.

22:27 And Prodigy was really designed as a developer tool that lets you create your own datasets, with a web app and a Python backend you can script.

22:36 That's our commercial tool, that's how we make money.

22:38 And it's very cool to see a growing community around this.

22:42 So yeah, that's what we're doing.

22:43 We have some cool stuff planned for the future.

22:45 So stay tuned.

22:47 - Yeah, people should check it out.

22:48 Actually, you and I talked on Talk Python 202 about building a software business and entrepreneurship.

22:53 You had a bunch of great advice.

22:54 So people might want to check that out as well.

22:56 Do you actually know these episode numbers by heart or did you look that up before?

22:59 Some of them I know, but that one I used the search.

23:02 I remember you were on there.

23:03 I remember what it was about, but not the number.

23:05 I just put together that I know two people from Explosion.

23:08 So that's interesting.

23:09 Yeah, and Sebastian.

23:10 Absolutely.

23:11 Sebastian.

23:12 Yeah, he was on your podcast recently, which I feel really bad about.

23:15 I wanted to listen to it because he advertised it with, like, it will tell the true story behind his moustache, which I really wanted to know.

23:23 And then I was like, I'll listen to this on the weekend, and I forgot. So yeah, if he's listening, I'm sorry; I definitely need to know this, so I will listen.

23:30 Excellent.

23:31 So don't spoil it.

23:32 He does great work on FastAPI. All right, speaking of people who have been on all the podcasts as well: Brett Cannon. He recently wrote an interesting article called "What is the core of the Python programming language?" And he's legitimately asking, as a core developer, not about the lowest level exactly, but what is the essence, I guess, is maybe the way to think about it.

23:56 Oh, wow.

23:57 I only just got the core, core pun.

23:59 Like it did not occur to me when I first read the article.

24:01 I'm really, I feel really embarrassed now.

24:03 To be fair, English is not my first language, but still, it's not about that.

24:07 Anyway, sorry for interrupting.

24:09 Yeah, when I first read it, I was thinking, okay, we're going to talk about what the lowest level is, and yeah, okay, it's probably C: ceval.h, ceval.c, and so on.

24:19 But really the thing is, Brett has been thinking a lot about WebAssembly and what that means for Python in the broad sense.

24:27 He and I talked about it on Talk Python, I think at the very last PyCon event we did a live conversation there about that.

24:35 And it's important because there are a few areas where Python is not the first choice, maybe not the second choice, sometimes not even the tenth choice of what you might use to program. Some very important areas, like maybe mobile, or the web, the front-end part of the web, importantly. So there are a few really important parts of technology where Python doesn't have much reach, but all of those areas support WebAssembly these days, right? And if you have something in C, you can compile it to WebAssembly. So there's some thought about, well, what could we potentially do to make a WebAssembly runtime for Python, so that Python magically, almost instantly, gets access to what was just the JavaScript front-end framework space, and to what is mobile, iOS and Android, since all of those things allow you to directly run JavaScript as part of your app.

25:34 So how would we make that happen?

25:36 So it's pretty important, right?

25:38 If we could solve that problem, Python is already so popular and its growth is so incredible. Like, what if we could say, oh yeah, and now it's an important language on mobile, and it's an important front-end language for the web?

25:49 Like, that would just take it to the next level, or maybe a couple levels up if you do them both.

25:53 And WebAssembly seems to be one of the keys to kind of bridge that gap.

25:57 Right? So, Brett talks in this article about how, for so long, CPython has just been what we think of when we think of Python.

26:05 Sometimes people use PyPy, P-Y-P-Y, as a partially JIT compiled version.

26:12 a sometimes faster version of Python, but not always, because of the way it interacts with C libraries that you might be using through packages and so on.

26:21 And really, a lot of Python's dynamic nature makes it hard to run outside of an interpreter.

26:27 Whereas, to be clear, WebAssembly is compiled, right? So if you're going to put Python over there, maybe that's going to require it to be compiled.

26:34 So this is a really interesting thing to go through and read and think about with Brett.

26:38 He talks about things like, well, how much of the Python language would you have to implement and still consider it to be valid Python?

26:45 Like we talked about MicroPython, and usually when people look at it, they don't go, "That's not Python.

26:50 That's fake." Right?

26:52 No, like, it's Python, but it's not as much Python, right?

26:53 You don't have all the same APIs on MicroPython as you do on regular Python.

26:59 So questions like, do you still need a REPL?

27:02 Could you live without locals?

27:03 Right?

27:04 The ability to ask what the local variables are and so on.

27:07 So he said he didn't really have a bunch of great answers.

27:11 It's more of a philosophical thing: we need to solve this.

27:14 But I do want to share some of my thoughts on this.

27:16 And I feel like maybe what we could do is we could come up with like a standard Python language definition that is a subset of full Python, right?

27:27 Here's the essence like, okay, we have to be able to create classes, we have to be able to create functions, you have to define strings, probably you want type annotations.

27:34 But do you need eval?

27:36 Maybe, maybe not, right?

27:38 So like that, if you could have a subset of the language that was smaller, as well as the standard library, 'cause do you really need to like parse CSS hex colors?

27:49 Everywhere?

27:50 Probably not.

27:51 It's a very underused part of the library, but it's in there, right?

27:55 So if we could narrow it down, maybe it would be easier to think about how does it go to WebAssembly?

27:59 How does it go to like some kind of JavaScript runtime or something like that?

28:03 And if it sounds crazy, you know, the .NET people did this.

28:05 They have a .NET Standard class library specification.

28:09 They got it running on WebAssembly.

28:11 So there's an example of it out there and something that's kind of sort of similar.

28:15 Right, so I think this would just open stuff up if you could get Python in these places.

28:21 What do you guys think?

28:22 - Initially, I was never so sold on WebAssembly, and especially WebAssembly and Python, until I watched Dave Beazley live-code a compiler at PyCon India, I think it was.

28:33 And I was like, "Oh, this is kind of fun." I mean, it was also just fun to watch Dave Beazley live-code a compiler.

28:40 - Yeah, for sure. - Classic.

28:42 But so that did get me thinking.

28:44 I do think one question I think we should ask ourselves is, well, do we really need Python to do all of the things in the browser?

28:53 Like, does this really have a benefit that actually makes a difference? That's A.

29:00 And B, there are a lot of things people use Python for that just wouldn't work in that way.

29:03 And that's also, I think, part of what made Python so popular in the first place.

29:07 Like, for instance, you know, all the interactive computing environments.

29:10 That's why people want to use Python for data science.

29:13 It is, you know, IPython, Jupyter Notebooks, that sort of stuff.

29:17 That's why, you know, Python as a dynamic language made so much sense to people.

29:21 And that's what made it popular.

29:22 And large-scale processing, like a lot of the type of stuff we're working on.

29:26 It's like, yeah, there's stuff that you can run in the browser, but it's never going to be viable to run large-scale information extraction in the browser because you want to run that on a machine for like a few hours.

29:37 But I think there are a lot of opportunities also in the machine learning space for privacy preserving technologies that already exist. I think from what I understand, Mozilla is working on some features built into the browser where you can have models predicting things without it being sent to someone's server. And I think that's obviously very powerful.

29:55 That's an interesting idea, right? Yeah. Because you could have a little bit of machine learning, but you don't have to give up the data privacy aspect of it. That's pretty cool.

30:04 Yeah. So I think for that, there's a lot of potential here for running Python in a browser.

30:07 Yeah.

30:08 Well, we've started getting used to saying that what Python is, is the CPython implementation.

30:14 And we've got to remember CPython is the reference implementation of the language spec. And I guess what we're kind of getting at is, maybe we need to split it up and have a core language spec and an extended one or something, I don't know.

30:30 Where would you divide the line?

30:32 Because we've seen, like you said, things like CircuitPython and other things, and we've actually talked about several smaller languages based on Python that just try to be the same syntax.

30:43 But at which point is it, when is it not Python anymore?

30:48 And there's at least some of the stuff. Like I could totally see having a distribution of Python that doesn't have a REPL still count.

30:56 I could totally see not having IDLE, for instance. If something doesn't ship with IDLE, is it still Python?

31:03 I think so.

31:04 And because of IDLE, you then need the Tk stuff in there.

31:09 There's a lot of stuff that maybe would be in that category. Like, could you live without locals?

31:15 Most of the time, probably.

31:16 I actually think, since the web and mobile are such a big part of our lives, and will be for a while, this might be a decent dividing line: whether or not it's for WebAssembly, maybe we should put the division at whatever we need to implement a WebAssembly version of Python.

31:35 And anything above that line is an extended version of Python.

31:40 Yeah.

31:41 Yeah, that's a good point.

31:43 All right, I don't want to go too long in this section because I want to make sure we get the others.

31:47 But I do want to leave you with just some thoughts.

31:48 What if shipping Python was just shipping a single binary and a thing that ran it?

31:54 You could do that with WebAssembly.

31:55 Maybe two WebAssembly pieces: the runtime plus the code.

31:58 What if all the browsers had capability to plug in alternate runtimes through WebAssembly?

32:05 So right now you have a JavaScript engine, but what if, say, Firefox and Edge and whatnot came up with a way to say, here's a WebAssembly API to plug in alternate runtimes: Python, Ruby, .NET, Java, you name it, and then shipped with the latest version of each of those runtimes?

32:24 So you just don't have to down...

32:25 Like, the big problem now is you can do it, but you still got to download like 10 megs per page, which is not a good idea.

32:32 So anyway, I think there's a ton of interesting things that open up if this were possible.

32:37 So I'm glad Brett's still on this, and hopefully he keeps thinking about it.

32:41 Brian, I still need to learn pathlib.

32:43 - You got any ideas on how I can do that? - Really?

32:44 Really? You're not using pathlib?

32:46 I'm such a...

32:48 I'm just stuck in the os.path world.

32:51 I just really need to get with the times. Help me out.

Okay, so, no offense to os.path, but I really love pathlib a lot. But I've got to tell you, the documentation for pathlib doesn't cut it as an introduction. You can find what you're looking for, but only if you already know what you're looking for. So I agree with Chris May. Chris May wrote a post called Getting Started With pathlib.

33:22 I guess he's got a little PDF field guide that you can download, and he has a little bit of a blog post introducing it.

33:30 I downloaded it; it's like nine or ten pages, and it's actually a really good introduction to pathlib, so I really like it. The big thing with os.path versus pathlib is that pathlib creates path objects. So there's a class that represents a path, and you have methods on it, and that makes things different when you're dealing with paths.

33:49 With os.path, it's just strings.

33:52 So it's manipulating strings that represent paths.

33:55 So the object's different. I like it.

34:03 Actually, I switched just for the ability to build up paths with just the slash operator.

34:03 Yeah, it's really interesting how they've overridden division.

34:06 Yeah.

34:06 But I think it's a good example of where this makes sense.

34:09 It's a reasonable use case. It looks good. It's defensible.

34:12 There are other cases where you're like, "Oh, did you really have to overload these operators?" But here, they're fine. I think it's very valid.

34:19 Yeah, and things like how do you find parts of a path?

34:23 When you have to parse paths, that's where pathlib really shines for me.

34:27 So if you want to find the parent of something, or the second-level parent, there are ways to do that in pathlib, while in os.path you're stuck with trying to split things and stuff, and it's gross.

34:39 I mean, there are operations to do it, but it's very good to have this relative, I don't know, just all these operators like parent.

34:47 And then one of the things that it took me a while to figure out was I was used to trying to find the absolute path of something.

34:54 And in pathlib, finding the absolute path is the resolve method.

34:58 So you say resolve and it finds the absolute path for you.

35:02 You can find the current working directory, you can go up and down folders, you can use globs, you can find parts of path names and stuff.

35:09 And it's just a really comfortable thing.

35:12 So I think you should give it a whirl.

35:14 And it's not like it's going to change your life a lot.

35:18 But the next time you come up with...

35:20 The next time you're programming, you're like, "Okay, I got to figure out...

35:23 I got to have a base directory and some other directory." I'll reach for pathlib instead of os.path.

35:28 Yeah, I guess it has been there since 3.4, so I should get with the times.

35:32 Yeah, so I mean, before, I could see the objection of, like, "Oh, you have to backport it." And also, what I like as well is a lot of integrations that can automatically perform checks, like whether the path exists, stuff like that.

35:44 Or for me as a library author, you're writing stuff for users and you want to give them feedback.

35:49 And for instance, in a library like Click, or Typer, which is the modern type-hint-based CLI library, which was also built by my colleague Sebastián, you can just say, "Hey, this argument is a path.

36:01 What you get back from the command line is a path.

36:03 It will check that the path exists via pathlib." So it does like, you know, a whole bunch of magic there.

36:10 Yeah, that is super cool.

36:11 Yeah. Or you can say it can't be a directory.

36:14 And then when you write your CLI and a user passes in an invalid path, you don't even have to do any error handling.

36:19 It will automatically, before it even runs your code, say, nope, that argument is bad.

36:24 So that's pretty cool.
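Here's a minimal sketch of that with Typer; the argument settings mirror Click's path checks, and the script itself is hypothetical:

    from pathlib import Path

    import typer

    def main(config: Path = typer.Argument(..., exists=True, dir_okay=False)):
        # Typer rejects missing paths and directories before this runs
        print(config.read_text())

    if __name__ == "__main__":
        typer.run(main)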

36:25 That's awesome.

36:25 And you don't have to care about Unix versus Mac or PC or something.

36:30 Yeah, I mean, Windows... no offense to Windows, but handling paths on Windows is always the classic story as a library author, where you're probably supporting all operating systems, but, well, Windows just does it a bit differently, and you cannot assume that a slash means a slash.

36:47 - Yeah, for sure.

36:49 All right, well, the final item is yours, Ines, and it's definitely interesting.

36:53 So if you're working in the machine learning, data science side of things, it might not be enough to just back up your algorithms and your code, right?

37:01 - Yeah, you also have, yeah, machine learning is code and data.

37:05 So yeah, so this is something we discovered a while ago and that we're now using internally.

37:10 So we currently, as I mentioned before, we're working on version three of spaCy.

37:13 And one of the big features is going to be a completely new optimized way for training your custom models, managing the whole end-to-end workflows from pre-processing to training to packaging, and also making the experiments more reproducible.

37:27 You want to train a model and then send it over to your colleague, and your colleague should be able to run the same thing and get the same results.

37:34 Sounds really basic, but it's pretty hard in general in machine learning.

37:37 So, all of this spaCy stuff will also integrate with a tool called DVC, which is short for Data Version Control, and which we've started using internally for our models.

37:47 And DVC is basically an open source tool for version control, specifically for machine learning and for data.

37:54 So, you know, you can check your code into a Git repo as you're working on it, but you can't just check your datasets and models and artifacts into Git, or your model weights. So normally it's very, very difficult to keep track of changes in your files. Most people just end up with this directory of files somewhere, and it can be very frustrating.

So you could think of DVC as Git for data. And the command line usage is actually pretty similar: you type git init and dvc init to initialize it, and then you can do dvc add to start tracking your assets. So I think, yeah, if you're familiar with Git, as abstract as it can be at times, you will also find it easy to get into DVC.
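A rough sketch of that workflow (the file name is made up):

    git init
    dvc init
    dvc add data/train.csv            # writes data/train.csv.dvc, a small metafile
    git add data/train.csv.dvc .gitignore
    git commit -m "track training data with DVC"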

38:36 And it basically lets you track any assets like datasets, models, whatever, by adding meta files to your repository.

38:44 So you always have the checksum in there, and you always have these checkpoints of the asset, even though you're not actually checking that file into your repo.

38:52 And that means you can always go back, fetch whatever it was from your cache and rerun your experiments.

38:59 And it also builds this really cool dependency graph.

39:02 So you can really have these complex pipelines with different steps.

39:06 And then you only have to rerun one step if some of the inputs to it have changed.

39:12 So, you know, in machine learning, you'd often have a pipeline, like you start, you download your data, then you pre-process it, then you convert it to something, then you train, then you run an evaluation step and everything sort of depends on each other.

39:26 And that can make things like really hard.

39:28 And you never know, so you usually have to run everything, make clean, from scratch, because yeah, if something changes, your whole results change.

39:36 So if you set up your pipelines with DVC, it can actually decide whether something needs to be rerun, or it can also know what needs to be rerun to reproduce exactly what you're trying to do.

39:47 So that's pretty cool.
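A hedged sketch of such a pipeline using DVC's stage commands, with made-up script and file names:

    dvc run -n preprocess -d raw.csv -o clean.csv python preprocess.py
    dvc run -n train -d clean.csv -o model.pkl python train.py
    dvc repro    # reruns only the stages whose inputs actually changed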

39:48 Yeah, that could save you a ton of time and money if you're doing it in the cloud.

39:51 Yes, exactly. Yeah. And, you know, you can share it with other people. I think it definitely solves a problem that's very real. And yeah, the people making DVC have also recently released a new tool that I have not personally checked out yet, but it looks very interesting. It's called CML, which is short for Continuous Machine Learning. And that's really more on the CI side, which is logically the next step, right? You manage everything in your repo, and then you obviously want to run automated tests and continuous integration. And it just looked really cool.

40:21 Like it showed kind of a GitHub action where you can submit a PR with like some changes to your code and your data.

40:28 And then you have the bot commenting on it and it shows like accuracy results and a little graph and how stuff changes.

40:34 So it's really like these code coverage bots that you've probably seen, where you change some lines and then it tells you, oh, coverage has gone up or down, and shows you the new view of your code.

40:46 So that's what it looks like.

40:47 So I think, yeah, I'm really excited about this.

40:49 - It definitely, it solves a problem.

40:50 It's already been solving a problem for us and yeah.

40:53 - How does it store the large files?

40:54 I know it has this cache.

40:55 Is that a thing that you host?

40:56 Does it have a hosted thing that's kind of like GitHub?

40:59 Or where's the--

41:00 - I'm not sure; you can probably connect it to some cloud, but normally you have that locally.

41:04 It also has a cool thing where you can actually download files via the tool.

41:07 And then depending on where you're fetching it from, if it's a Google storage bucket or S3 bucket or something, you can actually also tell if the file has changed and whether it needs to be redownloaded.

41:17 And so, for example, internally what we're doing is we're mounting a Google Cloud Storage bucket, or whatever they call it, locally, so it's kind of a drive you have access to locally, and then you can just type gs:// and then the path, and really work with it like a local file system.

41:36 And that's pretty nice.

41:38 So you can, you know, you can work with private assets, because the thing is, a lot of toy examples assume that, oh, you just download a public data set, and then you train your model, and then you upload it somewhere.

41:47 But that's not very realistic, because most of the time, the data you have can't just go in the cloud publicly.

41:53 So, but yeah, I think I don't even know exactly how it works in detail, but it can basically fetch, I think, from the headers or something, it can tell whether the file you're downloading has changed and whether there's something new.

42:05 With a normal version control, one of the reasons we use it is to try to find what's different.

42:09 Do you do diffs on data?

42:12 I don't know, maybe.

42:14 I mean, I'm not sure if there's...

42:16 I think the main diff is more like around the results that you get, because I mean, diffing large data sets, diffing weights, you kind of can't.

42:25 That's really where we have the other problem, where you need to run the model to find out what it does, and then you're diffing accuracies rather than weights.

42:34 I don't know if it does actual diffing of the data sets, but often the thing that changes is really the models.

42:38 Like you have your raw data, and then you change things about your code.

42:43 Yeah, and something changes and you want to keep track of what it is or how it manifests.

42:48 Yeah, it's really cool to see them working on this.

42:50 Yeah, and also, in spaCy 3 we'll hopefully have a pretty neat integration where, you know, if you want, it's not mandatory, but if you say, "Hey, that's cool.

42:59 That's how I want to manage my assets." You can just run that in your spaCy project, and then it just automatically tracks everything. And, you know, you can take that into Git and share it and other people can download it. So that's, yeah, I'm pretty excited about that.

43:13 It works pretty well so far.

43:15 Everything you can do to make it a little easier to work with spaCy and just make it reproducible.

43:20 Yeah, and it's just that things are hard.

43:22 I'm not a fan of these, "Oh, one click, everything just magically works." It looks nice and it's a nice demo, but once you actually get down to the real work, things need to be a bit modular, things need to be customizable.

43:33 Otherwise, you're always hitting edge cases or you have these leaky abstractions.

43:37 So, yeah, I think things should be easy to use, but you can't just magically cover everything by just providing one button.

43:45 That's just not going to work.

43:47 Yeah, because when it doesn't work, it's not good anymore.

43:49 Yeah, exactly.

43:50 Yeah. Alright, well, that's our six items that we go in depth into.

43:55 But at the end, we always just throw out a couple of really quick things that maybe we didn't have time to fit into the main section.

44:01 And I want to talk about two things that are pretty exciting.

44:05 One is, if you care about podcasts as a catalog of a whole bunch of things, I don't know how many podcasts there are.

44:13 There's probably over a million podcasts these days.

44:15 One of our listeners, Anton Zyanov, wrote a cool Python package that will let you search the iTunes directory and query it. It's basically a Python API into the iTunes podcast directory.

44:29 You know, some people think that you've got to be part of the Apple ecosystem to care about iTunes, but really, that's just the biggest directory, a kind of Yahoo-circa-1995-style listing of podcasts. So if you care about digging in and researching podcasts, check that out. That's pretty cool.

And then I'm also such a big fan of f-strings. How about you?

Yes. Yes. F-strings. I'm finally working in Python 3 only. I remember, I think last time I was on the podcast, I was basically saying how all these modern things are so nice, and I wish I could use them more, but we were still supporting Python 2. But now, no, everything I write is 3.6 plus.

Yes. And I've talked previously about a tool called flynt, which lets you run against an old code base and convert all the various Python 2 and 3 styles of formatting magically into f-strings.

I think that was actually the episode I was on. I was like, I wish I could run this, right?

Yeah. And I ran that against like 20,000 lines of Python.

45:31 I found like just a couple errors, reported them, they got fixed.

45:34 So that's nice.
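To give a feel for it, this is the kind of rewrite flynt performs (a sketch, not its literal output):

    name, count = "Brian", 3
    old = "%s found %d errors" % (name, count)        # old percent formatting
    mid = "{} found {} errors".format(name, count)    # str.format style
    new = f"{name} found {count} errors"              # what flynt converts to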

45:35 But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing the string and I'm like, oh yeah, I want to put data here.

45:42 So I've got to go back to the front of the string, not necessarily back to the front of the line, but maybe back to where the string is being passed to a function.

45:49 So I go back to the first quote, put the f, go back forward, and then start typing out the thing I actually wanted, right?

45:55 Or maybe I'll f-string something when, really, I'm not going to put data in it, right?

45:59 So it's like you're halfway through and you want it to become an F string.

46:02 Well, PyCharm is coming with a new feature where, if you start writing a regular string and pretend like it's an f-string, it'll automatically upgrade it to an f-string.

46:11 - Yes, thank you. - Halfway through.

46:13 Yes, without leaving.

46:14 So you just say curly variable.

46:15 It's like, oh, okay, that means it's an f-string, and the f appears at the front.

46:18 Yes. - Oh, nice.

46:19 - So that is pretty awesome.

46:21 Anyway, those are my two quick items.

46:23 Ines, I'm also excited about the one you got here.

46:24 - Yeah. - This is awesome.

46:25 - Yeah, I had one which is something coming to 3.9 or in 3.9, which is PEP 585.

46:32 And you can use, when you use type annotations, you can now use the built-in types like list and dict as generic types.

46:40 So that means no more "from typing import List" with a capital L.

46:46 - Yes.

46:47 (laughing)

46:48 - Yes.

46:48 So you just literally, I mean, when I first saw it, I'm like, that looks strange, but like, yes.

46:53 I'm so excited about this.

46:55 It'd probably be years until I can just like use it all across my codebases because...

46:58 True, true.

46:59 Yeah, but like, yay.

47:01 That's in 3.9?

47:02 Yeah.

47:03 Yeah, it's in 3.9.

47:04 I'm already using 3.9 and I didn't know this.

47:06 You can do this.

47:07 Yeah.

47:08 Yeah, and Guido is one of the guys on the PEP making this happen.

47:12 Like I said, he's really into typing.

47:14 Oh, that's great.

47:15 So this is really cool because it was super annoying to say, "Oh, you have this new import just because you want to use type annotations on a collection." Right?

47:21 Now you don't have to.

47:22 There's actually a bunch of the collection stuff and iterators and whatnot, like the collections module, like that, a bunch of stuff in there.

47:30 - That's pretty neat. - It's really nice.

47:32 And they're compatible; like, lowercase list[str] is the same as capital List[str], I believe.
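A quick sketch of what 3.9 allows:

    # no "from typing import List, Dict" needed on Python 3.9+
    def count_words(lines: list[str]) -> dict[str, int]:
        counts: dict[str, int] = {}
        for word in " ".join(lines).split():
            counts[word] = counts.get(word, 0) + 1
        return counts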

47:38 All right, Brian, what you got?

47:40 - Oh, I just wanted to-- I'll drop a link in the show notes.

47:42 Test & Code 120 is where I interviewed Sebastián Ramírez, from Explosion also, talking about FastAPI and Typer, because I'm kind of in love with both of those. They're really cool.

47:55 Yeah, absolutely.

47:56 All right. Well, that's a cool one. Definitely going to check that out.

47:59 And you can find out why he has the cool mustache.

48:02 [laughter]

48:04 That's right.

48:05 All right. So we always end the show with a joke.

48:07 And I thought we could do two jokes today.

48:10 So I think, Enos, do you want to talk about this first one?

48:13 Oh, yeah. I mean, I'm not even sure it counts as a joke per se, but like...

48:16 It's more of a humorous situation, I guess, right?

48:19 Yeah, it ties in. Well, it's Sebastián again; he had this very viral tweet the other day where he posted about an experience. I'll just read it out, because I think it needs to stand on its own. He writes: "I saw a job post the other day. It required 4+ years of experience in FastAPI. I couldn't apply as I only have 1.5+ years of experience, since I created that thing." And then he says, "Maybe it's time to reevaluate that years of experience equals skill level."

This resonated with people so much; I was actually surprised. Everyone was like, oh yeah, HR. Apparently this is a huge issue, obviously, that most job ads are not written by the people who actually work with the technologies.

49:07 Actually, yeah, this is awesome. And this tweet actually just got covered on DTNS, the Daily Tech News Show.

49:15 Alongside another posting that said you needed eight years of Kubernetes experience for another job.

49:21 But of course, Kubernetes has only been around for four years.

49:24 Yeah, when you say this went viral, it had 46,000 retweets and 174,000 likes.

49:29 That's got some traction. I feel like this might be a problem.

49:33 Yeah, I was surprised that so many people were like, "Yeah, that's a big deal." It's like, I mean, it is true, like, kind of tech hiring sort of seems to be broken.

49:42 And it's also, it's like, it's a bit different in my case, I guess, but like, I don't qualify for most roles using the tech that I write.

49:50 And in some cases, that's justified, because I'm not a data scientist, just because I write developer tools for data scientists doesn't mean I can do the job.

49:56 But in other cases, I'm like, there's kind of a ridiculous amount of arbitrary stuff you're asking for in this job ad, maybe that's needed, maybe not.

50:03 but it centers around a piece of software that I happen to have written, and I do not qualify for your job at all.

50:11 That's insane.

50:12 The last time I wrote a job description, I intentionally left off the college degree requirement because all of the other requirements I was listing in there, either they had it from college plus experience, or they had it just from experience, so I was fine with that.

50:28 By the time it actually went live, somebody in HR had added a college degree requirement to it.

50:33 I just couldn't get away with not listing that, I guess.

50:36 Yeah, but that's the problem.

50:37 Master's degree in space is preferred.

50:39 In space is preferred.

50:40 Yeah, but I guess another problem is like, well, look, if you ask, if HR writes these job ads with these bullshit requirements, then, well, who applies?

50:49 Like, it's either people who are like, yeah, whatever, or people who are full of shit.

50:52 And then that's the sort of culture you're fostering.

50:54 And it might not even be the engineer's fault who wrote a very honest job description, But like, yep, who applies to that?

51:01 You're going to make me lie about my FastAPI experience.

51:04 Yeah, people just apply to anything.

51:05 Like, "Yep, I have 10 years experience in everything. Great." And they're like, "Perfect. That's what we're looking for. You're hired." And then you wonder, "Why is our company culture so terrible?" Well, I actually did have somebody apply to a job and say they have multiple years of experience in any new language coming up.

51:23 Nice.

51:27 All right, guys, well, it looks like we're just about out of time.

51:29 Let me give you one more joke for it.

51:32 Brian, will you describe this picture and then I'll read what it says?

51:35 There's a poorly drawn horse. I think it's a zebra.

51:39 A horse that has white on the back end and black on the front end.

51:43 And the text says, "I defragged my zebra." I don't even know if people defrag drives anymore.

51:48 So this is only going to resonate with the folks that have been around for a while.

51:51 I came across this great video on YouTube where you can actually watch, like, a live defrag session.

I don't know, Windows 95. It takes, like, a few hours, and you can kind of bring back that nostalgia: just put it on your TV and sit there. It's like the aquarium video you would put on your TV.

Like, before we take off: follow the show on Twitter via @pythonbytes. That's Python Bytes, as in B-Y-T-E-S. Get the full show notes at pythonbytes.fm, and if you have a news item you want featured, just visit pythonbytes.fm and send it our way.

52:23 We're always on the lookout for sharing something cool.

52:26 On behalf of myself and Brian Okken, this is Michael Kennedy.

52:29 Thank you for listening and sharing this podcast with your friends and colleagues.
