#191: Live from the Manning Python Conference

Published Wed, Jul 22, 2020, recorded Tue, Jul 14, 2020

Special guest: Ines Montani

Michael #1: VS Code Device Simulator

Want to experiment with MicroPython?
Teaching a course with little IoT devices?
- Circuit Playground Express
- BBC micro:bit
- Adafruit CLUE with a screen
Get a free VS Code extension that adds a high-fidelity simulator
Easily create the starter code (main.py)
Interact with all the sensors (buttons, motion sensors, acceleration detection, device shake detection, etc.)
Deploy and debug on a real device when ready
Had the team over on Talk Python.

Brian #2: pytest 6.0.0rc1

New features
- You can put configuration in pyproject.toml
- Inline type annotations for most of the user-facing API and internal code.
- New flags
  - --no-header
  - --no-summary
  - --strict-config : error on unknown config key
  - --code-highlight : turn on/off code highlighting in terminal
- Recursive comparison for dataclass and attrs
Tons of fixes
Improved documentation
There’s a list of breaking changes and deprecations, but nothing in the list seems like a big deal to me.
Plugin authors, including myself, should go test this.
- Already found one problem. pytest-check: stop on fail works fine, but failing tests marked with xfail show up as xpass. Gonna have to look into that. And might have to recruit Anthony to help out again.
To try it: pip install pytest==6.0.0rc1
I’m currently running through the pytest book to make sure it all still works with pytest 6. So far, so good.
- The one hiccup I’ve found so far, TinyDB had a breaking change with 4.0, so you need to pip install tinydb==3.15.2 to get the tasks project to run right. I should have pinned that in the original setup.py.
- However, all of the pytest stuff is still valid.
Guido just tweeted: “Yay type annotations in pytest!”

Ines #3: TextAttack

Python framework for adversarial attacks and data augmentation for natural language processing
What are adversarial attacks? You might have seen examples like these:
- image classifier predicting a cat even if the image is complete noise
- people at protests wearing shirts and masks with certain patterns to trick facial recognition
- Google Translate hallucinating bible texts if you feed it nonsense or repetitive syllables
What does it mean to "understand" a model?
- How does it behave in different situations, with unexpected data?
- We can't just inspect the weights – that's not how neural networks work
- To understand a model, we need to run it and find behaviours we don't like
TextAttack lets you run various different “attacks” from the current academic literature
It also lets you create more robust training data using data augmentation, for example, replacing words with synonyms, swapping characters, etc.
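
The two augmentation ideas mentioned above, synonym replacement and character swapping, can be sketched in plain Python. This is a toy illustration only and does not use TextAttack's own API; the function names and the tiny synonym table are assumptions made for the example.

```python
import random


def replace_synonyms(sentence, synonyms):
    """Swap each word that has an entry in `synonyms` for its replacement."""
    return " ".join(synonyms.get(word, word) for word in sentence.split())


def swap_adjacent_chars(word, rng):
    """Swap one randomly chosen pair of neighbouring characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


rng = random.Random(0)  # seeded so repeated runs give the same variants
original = "the movie was great"
print(replace_synonyms(original, {"great": "excellent"}))
print(" ".join(swap_adjacent_chars(w, rng) for w in original.split()))
```

Each variant keeps the original label, so a sentiment model trained on both the original and the perturbed sentences sees more of the noise it may meet in practice.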

Michael #4: What is the core of the Python programming language?

By Brett Cannon, core developer
Brett and I have discussed Python implementations for WebAssembly before
The goal is to get Python into the browser, and since both iOS and Android support running JavaScript as part of an app, it would also get Python onto mobile.
We have lived with CPython for so long that I suspect most of us simply think that "Python == CPython".
PyPy tries to be so compatible that they will implement implementation details of CPython.
Basically most implementations of Python strive to pass CPython's test suite and to be as compatible with CPython as possible.
Python’s dynamic nature makes it hard to implement outside of an interpreter
That has led Brett to contemplate the question of what exactly is Python?
How much would one have to implement to compile Python directly to WebAssembly and still be considered a Python implementation?
Does Python need a REPL?
Could you live without locals()?
How much compatibility is necessary to be useful? The answer dictates how hard it is to implement Python and how compatible it would be with preexisting software.
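
To make the locals() question concrete, here is a small sketch of what the built-in does. It illustrates the language feature only and is not taken from Brett's post; the function name is made up.

```python
def area(width, height):
    result = width * height
    # locals() exposes the function's local variables by name at run time.
    return locals()


snapshot = area(2, 3)
print(snapshot)
```

An implementation that wants to support this has to keep local variable names available at run time, which is the kind of cost a compiler targeting WebAssembly might prefer to avoid.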
[Brett] has no answers
- It might make sense to develop a compiler that translates Python code directly to WebAssembly and sacrifice some compatibility for performance.
- It might make sense to develop an interpreter that targets WebAssembly's design but maintains a lot of compatibility with preexisting code.
- It might make sense to simply support RustPython in their WebAssembly endeavours.
- Maybe Pyodide will get us there.
Michael’s thoughts:
How about a Python standard language spec? And a spec for a “standard” standard library? It’s possible: .NET did it.
What would we build if we could build it with WebAssembly?
Interesting options open up, say with Node.js-like capabilities and front-end frameworks
This could be MUCH bigger if we got browser makers to support alternative runtimes through WebAssembly

Brian #5: Getting started with Pathlib

Chris May
Blog post: Stop working so hard on paths. Get started with pathlib!
PDF “field guide”: Getting started with Pathlib
Really great introduction to Pathlib
Some of the info
- This file as a path object: Path(__file__)
- Parent directory: Path(__file__).parent
- Absolute path: Path(__file__).parent.resolve()
- Two levels up: Path(__file__).resolve(strict=True).parents[1] (see the PDF for an explanation)
- Current working dir: Path.cwd()
- Path building with /
- Working with files and folders
- Using glob
- Finding parts of paths and file names.
Any time spent learning Pathlib is worth it.
If I can do it in Pathlib, I do. It makes my code more readable.
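
The expressions listed above can be tried out in one short script. This is a sketch under assumptions: it builds a throwaway directory layout (project/src/main.py, names invented for the example) instead of using __file__, so it runs anywhere.

```python
import tempfile
from pathlib import Path


def path_facts(script):
    """Collect the pathlib expressions from the list above for one file."""
    here = script.resolve(strict=True)  # absolute path; raises if the file is missing
    return {
        "parent": here.parent,                     # directory containing the file
        "two_up": here.parents[1],                 # parent of the parent
        "config": here.parent / "settings.toml",   # path building with /
        "python_files": sorted(p.name for p in here.parent.glob("*.py")),
        "stem": here.stem,                         # file name without the suffix
        "suffix": here.suffix,
    }


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "project" / "src" / "main.py"
    script.parent.mkdir(parents=True)
    script.write_text("print('hello')\n")
    facts = path_facts(script)
    print(facts["two_up"].name, facts["python_files"])
```

In a real script, Path(__file__) would take the place of the temporary file.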

Ines #6: Data Version Control (DVC)

We're currently working on v3.0 of spaCy and one of the big features is going to be a completely new way to train your custom models, manage end-to-end training workflows and make your experiments reproducible
It will also integrate with a tool called DVC (short for Data Version Control), which we've started using internally
DVC is an open-source tool for version control, specifically for machine learning and data
Machine learning = code + data. You can check your code into a Git repo, but you can't really check in your datasets and model weights. So it's very difficult to keep track of changes.
You can think of DVC as “Git for data” and the command line usage is actually pretty similar – for example, you run dvc init to initialize a repo and dvc add to start tracking assets
DVC lets you track any assets by adding meta files to your repository. So everything, including your data, is versioned, and you can always go back to the commit with the best accuracy
It also builds a dependency graph based on the inputs and outputs of each step, so you only have to re-run a step if things changed
- for example, you might have a preprocessing step that converts your data and then a step that trains your model. If the data hasn't changed, you don't have to re-run the preprocessing step.
They recently released a new tool called CML (short for Continuous Machine Learning), which we haven't tried yet.
- CI for Machine Learning
- Previews look pretty cool: you can submit a PR with some changes and a GitHub action will run your experiment and auto-comment on the PR with the results, changes in accuracy and some graphs (similar to tools like Code Coverage etc.)
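
The "only re-run a step if things changed" idea can be sketched with content hashes. This is a toy model of the concept, not DVC's implementation or API; the function names and the in-memory cache are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path):
    """Content hash of a file; any change to the bytes changes the digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(name, inputs, action, cache):
    """Run `action` only when the input digests differ from the cached ones."""
    current = {str(p): digest(p) for p in inputs}
    if cache.get(name) == current:
        return False  # inputs unchanged, skip the step
    action()
    cache[name] = current
    return True


def preprocess():
    """Stand-in for a real preprocessing step."""


cache = {}
results = []
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.txt"
    raw.write_text("first version")
    results.append(run_if_changed("preprocess", [raw], preprocess, cache))
    results.append(run_if_changed("preprocess", [raw], preprocess, cache))
    raw.write_text("second version")
    results.append(run_if_changed("preprocess", [raw], preprocess, cache))

print(results)
```

The step runs on the first call, is skipped while the data is unchanged, and runs again once the file content differs.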

Extra

Michael:

Podcast Python Search API package, by Anton Zhiyanov
Mid-string f-string upgrades coming to PyCharm. And Flynt! via Colin Martin

Ines:

Built-in generic types in 3.9 (PEP 585): you can now write list[str] !
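
A short sketch of what that looks like in practice, assuming Python 3.9 or newer; the function is made up for illustration. On earlier versions the same hints needed List and Dict imported from typing.

```python
def word_lengths(words: list[str]) -> dict[str, int]:
    """Map each word to its length, using built-in generics in the hints."""
    return {word: len(word) for word in words}


print(word_lengths(["spam", "eggs"]))
```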

Brian:

https://testandcode.com/120: FastAPI & Typer - Sebastián Ramírez

Jokes

FastAPI Job Experience

Sebastián Ramírez - @tiangolo

I saw a job post the other day.
It required 4+ years of experience in FastAPI.
I couldn't apply as I only have 1.5+ years of experience since I created that thing.
Maybe it's time to re-evaluate that "years of experience = skill level".

Defragged Zebra

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 191, recorded July 14th, 2020.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And welcome, special guest Ines.

00:12 Hi.

00:13 It's great to have you here.

00:14 So I want to kick this off with a cool IoT thing.

00:19 Now, IoT and Python, they've got a pretty special place.

00:23 Because when I think about Python, I think of it as not being something that sort of competes with assembly language and really, really low level type of programming for small devices.

00:33 But, you know, amazing people put together MicroPython, which is a reimplementation of Python that runs on little tiny devices.

00:42 And we're talking like $5 microchip type devices, right?

00:46 Have either of you all played with these?

00:47 No.

00:48 No, I haven't.

00:49 But I've been seeing a bit of this from my brother.

00:51 So he's pretty amazing.

00:53 Like, he's a bit younger than me.

00:54 He's an event technician.

00:55 And he recently taught himself programming and everything just so he can build stuff on these, like, tiny Raspberry Pis.

01:02 And, like, I don't know.

01:03 He's doing super advanced stuff.

01:04 It's been really interesting to see him learn to program.

01:07 And he's also, he's incredibly good.

01:09 He has, like, amazing instincts about programming, even though he's never done it before.

01:12 But, like, so I've been kind of watching this from afar.

01:14 And it made me really want to build stuff.

01:16 So I'm very curious.

01:17 Yeah, I've done the CircuitPython on some of the Adafruit stuff.

01:22 Exactly.

01:22 So I always just want to build these things.

01:25 I'm like, what could I think of that I could build with these cool little devices?

01:28 I just, in my world, I don't have it.

01:31 Maybe if I had a farm, I could, like, automate, you know, like, watering or monitoring the crops.

01:36 Or if I had a factory.

01:38 But I just don't live in a world that allows me to automate these things.

01:41 Do you have pets?

01:42 Maybe you can build something for pets.

01:44 We generally don't have pets.

01:46 But we are fostering kittens for the summer.

01:48 So I could put a little device onto one of the kittens, potentially.

01:52 GPS tracker.

01:55 Yeah.

01:56 So in general, you have to get these little devices, right?

01:59 You've got the US PyCon.

02:01 We got the Circuit Playground Express, which is that little circular thing.

02:05 It's got some 10 LEDs and a bunch of buttons and other really advanced things like motion sensors and temperature and so on.

02:13 Probably the earliest one of these that was a big hit was the BBC micro:bit, where I think every seventh grader in the UK got it.

02:20 Some grade around that scale got one of these.

02:23 And it really made a difference in kids seeing themselves as a programmer.

02:27 And interestingly, especially women were more likely to see programming as something they might be interested in in that group where they went through that experience.

02:37 So I think there's a real value to work with these little devices.

02:39 But getting a hold of them can be a challenge, right?

02:43 You've got to physically get this device.

02:44 That means you have that idea of I want to do this thing and then I have to order it from Adafruit or somewhere else and then wait for it to come.

02:51 And my experience has been I'll go there and I'm like, oh, this is really cool.

02:54 I want one of these.

02:55 Oh, wait, no, it's sold out right now.

02:56 You can order it again in a month.

02:58 Right.

02:58 So getting one is a challenge.

02:59 And also, if you're working in a group of, say, like you want to teach a high school class or a college class or something like that, and you want everyone to have access to these.

03:10 Well, then all of a sudden, the fact that maybe it costs $50 wasn't a big deal.

03:15 But if it's $50 times 20 or 100 kids, then all of a sudden, well, maybe not.

03:20 So I want to talk about this thing called Device Simulator Express.

03:24 So this is a plugin or extension, I think extensions is what VS Code calls the things that make VS Code do more stuff.

03:33 And it's an open source, free device simulator.

03:37 So what you can do is you just go to the Visual Studio Code extensions thing and you type "device", which is probably sufficient, or "Device Simulator Express".

03:44 And it'll let you install this extra thing inside of VS Code that is really quite legit.

03:51 So it gives you a simulated Circuit Playground Express, a simulated BBC micro:bit.

03:58 And the most impressive to me is the CLUE from Adafruit, which actually has a screen that you can put graphics on.

04:06 So really, really cool way to get these little IoT devices with Circuit Playground, CircuitPython.

04:13 So Adafruit's fork of MicroPython on there.

04:17 What do you guys think?

04:17 See that picture?

04:18 Look how cool that is.

04:19 Yeah, so you can write Python in one tab and then just have the visualization in the other.

04:25 That's pretty cool.

04:26 Yeah.

04:26 Yeah, exactly.

04:27 And it's very similar to, say, what you might do with Xcode and iPhones, where you have an emulator that looks quite a bit like it or what you would do on the Android equivalent.

04:37 I actually think this is a little bit better than the device because it's actually larger, right?

04:41 Like the devices are really small, but here it's like, you know, it could be like a huge thing on your 4K monitor with a little CLUE device.

04:48 So you can simulate Circuit Playground Express, BBC micro:bit, and the CLUE in here.

04:52 And we just say new project, and it'll actually write the boilerplate code for the main.py or code.py or whatever it's called that the various devices are going to run.

05:02 And like you said, Ines, on one half, it's got the code, and the other half, it has the device that you can interact with.

05:07 I was thinking that a couple of cases that would be great is, like you were saying, trying to get a hold of it.

05:13 But you might not even know if the concept that you're going to use is really going to work for the device you're thinking of.

05:19 So this would be a good way to try it out, to try out whether the thing you're thinking of trying for your house or whatever would actually work for this device.

05:27 The other thing was, yes, you brought up education and that it's big.

05:33 I was thinking about a couple of conferences where they tried to do the display and try to have a camera or something.

05:40 Yes.

05:40 Sometimes it works and sometimes it doesn't.

05:43 This way you could just do a tutorial or in a teaching scenario and everybody could see it because it's just going to be displayed on your monitor.

05:51 Right.

05:52 Your standard screen sharing would totally work here.

05:53 That's a good point as well.

05:55 And it doesn't have to be all or nothing.

05:56 Actually, what's really interesting is this thing isn't just an emulator, but you can do debugging.

06:01 You can set like a breakpoint and like step through it running on the simulated device or you can actually run it.

06:07 If you had a real device plugged in, you can run it on there as well and then do debugging and breakpoints and stuff on the actual device.

06:13 So it's like you tested here.

06:14 I always admire people who actually use like the proper debugging features.

06:18 I know VS Code has like so much of this and I'm always like I should use this more, but I'm like, okay, print.

06:23 Print, print.

06:25 Yeah.

06:26 There's some really cool libraries that will actually do that.

06:29 I can't remember what it's called, but Brian and I recently covered one that would actually like print out a little bit of your code and the variables as they change over time.

06:36 It was like the height of the print debugging world.

06:39 It was really, really cool.

06:40 I wish I could remember.

06:41 Do you remember, Brian?

06:42 No, we actually covered a couple of them.

06:44 I know.

06:45 That's a problem.

06:47 We cover thousands of things in here.

06:48 So another thing that's interesting is like, okay, so you see the device.

06:52 Some of them have buttons and they have lights and you can imagine maybe you could touch the button, but they also have things like temperature,

06:57 gyro meter type things or like you moving it or motion sensing or even like if you shake it,

07:02 this thing has little ways to simulate all that stuff.

07:07 So you can like have a temperature slider that freaks it out and says, hey, the temperature is actually this on your temperature sensor and so on.

07:13 So all the stuff that the devices simulate are available here.

07:16 Oh, that's cool.

07:16 Yeah.

07:16 So I actually had the team over on Talk Python not long ago.

07:20 So people can check that over at talkpython.fm.

07:23 And yeah, I'm also really excited about what you got coming here next, Brian.

07:27 What is that?

07:28 Yeah.

07:28 Well, speaking of, I guess, debugging versus test.

07:31 We didn't really talk about testing.

07:33 Anyway, I'm really excited.

07:34 We should have talked about testing.

07:35 Yeah.

07:36 So I was just, I was thinking it.

07:38 I was thinking that, that I hardly ever use a debugger for my source code, but I use a debugger all the time when I'm debugging my tests.

07:47 I don't know.

07:48 It's just something different about it.

07:50 But I've been running a lot of tests and debugging a lot of tests lately because the pytest 6 release candidate is out.

07:57 Now, by the time this episode airs, I don't know if the final version will be released or if it will still be just the release candidate.

08:05 But you can install it, we'll have instructions in the show notes, but essentially you just have to say 6.0.0rc1 and you'll get it.

08:15 So there's a whole bunch of stuff that I'm really excited about.

08:18 There's a lot of configuration that you used to be able to put in lots of places, in your pytest.ini or your setup.cfg or tox.ini or something.

08:27 pytest 6 will support pyproject.toml now.

08:30 So if you jumped on the TOML bandwagon, you can stick your pytest configuration in there too.

08:35 There's a lot of people excited about the type annotations.

08:38 So the 6.0 is going to support type annotations.

08:41 So it actually was a lot of work.

08:43 There was a volunteer that went through and added type annotations to a bunch of it, especially the user facing API.

08:49 And why this is important is if you're type checking, you're running mypy or something over your source and everything, your project, why not include your tests?

09:02 But if pytest doesn't support types, it doesn't really help you much.

09:06 So it will now.

09:08 So that's really, really cool addition.

09:09 So this is basically, the API of pytest itself is now annotated with types?

09:15 Yes.

09:15 And well, a lot of the internal code as well.

09:18 So they actually went through and did a lot.

09:20 There was a lot of work.

09:21 And if you look at the conversation chain, it went on for months, it was a several-month project.

09:27 Wow.

09:28 What does that mean for compatibility?

09:30 Does that make pytest like 3.6 only and above?

09:33 I think the modern versions of pytest really already are 3.6 and above.

09:37 I'm not sure about that.

09:39 Right.

09:39 So then the door was open to use that because otherwise it would cut.

09:42 I mean, it would be a weird move to like release a completely new version with Python 2 backwards compatibility.

09:50 Like that's like, you wouldn't do that.

09:53 Right.

09:53 I mean, it's, it's, I think, well, I think the message it sends, it's like not great.

09:57 I totally agree.

09:58 Totally agree.

09:58 There is a pinned version of pytest.

10:01 I don't remember which one it is.

10:02 That still supports 2.7 if you're on it, but no new features are going in there.

10:08 The thing I'm really excited about is a little flag they've added called --no-header.

10:13 So don't use this.

10:15 Most people don't use this.

10:17 When you run pytest, it prints out some stuff like the version of Python, the version of pytest, all the plugins you're using, a bunch of information about it.

10:25 All this stuff is really important for logging.

10:28 If you're capturing the output to save somewhere or do a bug report or something, that information is great to help other people understand it.

10:36 What I don't like about that is that it, it's not helpful if you're writing tutorials or if you're writing code to put on a slide or something.

10:46 All that extra stuff just takes up space and it distracts.

10:49 Yeah.

10:49 Like I've had students say, like, I ran it, I think pytest in PyCharm and it has like some kind of output just stating where it is and what it's doing.

10:58 They're like, this didn't work for me.

11:00 I'm like, well, that was just random output from the tool.

11:02 You're not actually supposed to try to run that part.

11:04 You know what I mean?

11:05 But it's, it's, I mean, I saw why they saw that.

11:07 But at the same time, like the ability to just say like, these details don't matter in the longterm.

11:11 Yeah.

11:12 Yeah.

11:13 So I'm, I'm excited about that to trim it down.

11:16 There was a plugin called TLDR.

11:18 Too long.

11:19 Didn't read.

11:20 But it actually didn't take as much of the header off as I wanted.

11:23 So I had my own tool that would do this, but now I've got this, which is great.

11:29 So with a lot of the configuration, there is a chance for human error if you type something wrong, like a variable name.

11:36 And so I really like this new flag called --strict-config, which will throw an error

11:43 if the pytest section of your configuration has something that it doesn't recognize.

11:48 And it probably is just, you've misspelled some variable or something.

11:52 Yeah, that's good to know.

11:53 And then not too, I can't remember the version, but it was, I think it was in pytest 5.

11:58 They added some code highlighting stuff that.

12:00 Yeah, that's super cool.

12:01 I discovered that just the other day.

12:02 I like just somehow updated all my dependencies in some environment and suddenly pytest output was colored.

12:07 And I was like, whoa, this is amazing.

12:09 Yeah.

12:10 Yeah.

12:10 The syntax highlighting.

12:11 I love it.

12:12 Nice.

12:12 But there's times where you don't want that, I guess.

12:15 Oh yeah, sure.

12:15 Yeah.

12:15 So there's a new flag to turn it off.

12:18 And then a little tiny detail that I really like is the diff comparisons on pytest are wonderful,

12:24 but apparently they didn't do recursive comparisons of dataclasses and attrs classes, but now they do.

12:31 So that's neat.

12:31 There's a whole bunch of new features, there's fixes.

12:34 I ran through some of the features I really liked.

12:37 There are deprecations and it's a large list of breaking changes and deprecations.

12:42 That's why they went to a new number, pytest 6.

12:45 But I went through the whole list and I didn't see anything that was like, oh, that's going to stop me.

12:50 I'm going to have to change something.

12:51 Okay.

12:51 That's good to know.

12:52 Like, I mean, if you say, oh, there was nothing that like we're using, I feel confident that maybe there's nothing in my code either.

12:58 And I knew that somebody was going to ask, is my pytest book still valid?

13:02 Yes, it is.

13:04 I'm going through it right now.

13:05 I haven't gone through the whole thing yet to make sure.

13:07 The side that is not compatible is not the book.

13:10 The book's fine.

13:10 It's, I have a plugin that now is broken.

13:14 So pytest-check still works.

13:17 But if you depend on xfail, pytest, this is a, wow, this is a corner case.

13:22 But if you depend on pytest-check and the xfail feature of it, it doesn't work right now.

13:28 So I'll have to fix that.

13:29 So you would say xfail fails temporarily?

13:31 Yeah.

13:32 It actually marks everything as a pass.

13:35 So if you mark xfail.

13:36 Oh, wow.

13:36 That's like xfail-ception.

13:38 Yeah.

13:39 Yeah.

13:40 It's really bad.

13:42 Anyway, I'll have to get back to that.

13:44 Yeah.

13:44 This is really exciting that pytest 6 is out.

13:46 Super cool.

13:47 I know that there were some waves, some uncertainty in the ecosystem.

13:51 So it sounds like that got ironed out.

13:53 Things are going strong.

13:54 New versions coming out.

13:54 I even saw that Guido had tweeted the announcement, retweeted the announcement and said,

14:00 yay, type annotations coming in pytest.

14:02 Of course, he's been all about type annotations these days.

14:05 We'll come back to that later in the show, actually.

14:07 So Ines, I know you work a lot with text, but are you frustrated with it?

14:10 What's the story of this name here?

14:11 Oh, my point of the day.

14:14 Yeah.

14:16 Text attack.

14:17 Text attack.

14:17 What does text attack?

14:17 What else about it?

14:17 I thought I'd present something for my space, obviously.

14:20 Yeah.

14:21 Awesome.

14:21 Yeah.

14:21 There's this new framework that I came across and it's called TextAttack.

14:24 Yay.

14:25 And it's a framework for adversarial attacks and data augmentation for natural language processing.

14:31 So what are adversarial attacks?

14:33 You've probably, you might've actually seen a lot of examples of it.

14:37 For instance, an image classifier that predicts a cat or some other image, even though you show it complete noise and you somehow trick the model.

14:45 Or you might've seen people at protests wearing like funny shirts or masks to trick facial recognition technology.

14:52 So really to trick the model into, to like, you know, not recognize them.

14:57 Or the famous example of Google Translate suddenly hallucinating these crazy Bible texts.

15:03 If you just put in some complete gibberish, like just gah, gah, gah, gah.

15:07 And then it would go like, the Lord has spoken to like the people, stuff like that.

15:11 That's amazing.

15:13 I include a link to an article by a researcher who explains like why this happened and shows the example.

15:20 But it's, it's pretty fascinating, but I think it all comes down to like the fundamental problem of like, what, how do you understand a model that you train?

15:29 And what does it, you know, what does it mean to understand your model?

15:32 And how does it behave in situations when it suddenly gets to see something that it doesn't expect at all?

15:37 Like gah, gah, gah, what does it do?

15:38 And the thing with neural network models is you can't just look at the weights.

15:42 They're not linear.

15:43 They're like, you know, you can't just look at what your model is.

15:47 You have to actually run it.

15:48 And so that library, TextAttack, lets you actually try out different types of attacks from the academic literature and different types of inputs that you can give a model to see whether it produces something that you're like not happy with.

16:03 Or that's like really weird and exposes some problems in your model.

16:07 And it also lets you then, because normally what's the goal?

16:10 The goal is, well, you do that and then you find out, oh damn, like if I suddenly feed it this complete nonsense or if I feed it Spanish text, it like goes completely in the wrong direction and suddenly predicts stuff that's not there.

16:22 And if you, you know, if you deployed that model into like a context where it's actually used, that would be pretty terrible.

16:28 And, you know, there are much worse things that can be happening.

16:30 So you can also create more robust training data by like replacing, replacing words with synonyms.

16:36 You can swap out characters and just, you know, see how the model does.

16:41 So I thought that was very cool.

16:42 And yeah, I thought in general, I think adversarial attacks, it's a pretty interesting topic.

16:46 And yeah.

16:47 Yeah, it's super interesting.

16:49 So the idea is basically you've trained up a model on some text and for what you've given it, it's probably working.

16:54 But if you give it something you weren't expecting, you want to try that to make sure that it doesn't go insane.

17:00 Yeah, exactly.

17:02 And it can do, it can expose very unexpected things like the Bible text, for example.

17:05 That sounds really bizarre when you like first hear it.

17:08 But one explanation for that would be that, well, especially it happens in low resource languages where, you know,

17:13 we don't have much text and especially not much text translated into other languages.

17:18 But there's one type of text that has a lot of translations available and that's the Bible.

17:23 And so they're parallel corpora where you have one text, one line in English, one line in Somali, for example.

17:30 And then people train their models on that.

17:32 But one thing that also is very specific about Bible text is that some Bible text has some words that like really only occur in a Bible text.

17:40 But it uses some really weird words.

17:42 So what your model might be learning is if I come across a super unexpected word that's really, really rare, that must be Bible.

17:49 And also, also the objective is you want your model to output a reasonable sentence.

17:53 So the model's like, well, okay, you know, if that's the rare word, then the next word needs to be something that matches.

17:59 And then you have like this bizarre sentence from the Bible, even though you typed in ga ga ga.

18:03 And that happens.

18:05 Yeah, how funny.

18:06 Yeah.

18:07 So it looks like they have actually a bunch of trained models already at the TextAttack Model Zoo, they call it, I guess.

18:15 Yeah.

18:16 Everything's called the model zoo.

18:17 Yeah.

18:18 And so you can just take these and run it against it, like the movie reviews from Rotten Tomatoes or IMDb or the news set or Yelp.

18:28 And just give it that kind of data and see how it comes out, right?

18:32 Exactly.

18:32 Yeah.

18:32 I think that's pretty cool.

18:33 And yeah, and then you can actually, you can also generate your own data or load in your data and generate data that maybe, you know, produces a better model or like covers things that your model previously couldn't handle at all.

18:45 So that's the data augmentation part.

18:47 Yeah, that's all very important.

18:48 And I think it's also very important to understand the models that we train and, you know, really try them out and think about like, what do they do and how are they going to behave in like a real world scenario that we care about?

18:59 Because, yeah, the consequences.

19:00 Right.

19:01 Because as soon as you're making decisions on this data, right?

19:03 Yes, of course.

19:03 On these models.

19:04 Yeah.

19:04 I guess as soon as a human is convinced that the model works and they start making decisions on it, right, that could go bad if the situation changes or the type of data.

19:15 And especially if the model is bad, like I'm always saying, like, well, people are always scared of these dystopian futures where like we have AI that can, I don't know, know anything about us and predict anything and works.

19:26 But the real dystopia is if we have models that kind of don't work and are really shit, but people believe that they work.

19:34 That's much more.

19:35 It's not even about whether they work.

19:37 It's about whether people believe it.

19:38 And then, you know, that's where it gets really bad.

19:40 And yeah.

19:41 Yeah.

19:42 And that's way more likely.

19:43 Yeah.

19:44 Yes.

19:45 It's a more difficult world to test this sort of stuff to figure out.

19:50 What does it mean for a model to be bad?

19:52 How do you tell if it's bad?

19:53 And models can be both working with some data sets and produce gibberish with or, yeah, I guess in this case, the reverse, not produce gibberish if you pass in gibberish.

20:07 Yeah.

20:07 Actually, yeah.

20:08 I just realized it ties in very well with the pytest point earlier, and just, like, yep.

20:12 Machine learning is quite special in a way that it's code plus data.

20:15 Code you can test: you can have a function and you're like, yeah, that comes in.

20:18 That's what I expect out.

20:20 Easy.

20:20 Write a test for it.

20:21 You know, it's not that easy.

20:23 Testing is hard, but like fundamentally, yeah.

20:25 It's somewhat deterministic.

20:27 Yeah.

20:28 Right.

20:28 And even if it's not, there's like something you can, you know, test around it and it's much harder with the model.
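
To make the contrast concrete, here is a minimal sketch. Both `slugify` and `toy_sentiment` are hypothetical stand-ins invented for this illustration and do not come from any library mentioned in the episode. The plain function can be checked against one exact expected value, while the "model" can only be scored against labelled examples and compared with a threshold that someone has to choose.

```python
def slugify(title: str) -> str:
    """Deterministic code: the same input always gives the same output."""
    return "-".join(title.lower().split())


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a trained model (a keyword rule, for illustration)."""
    return "pos" if "great" in text.lower() else "neg"


# Testing code: assert one exact answer.
assert slugify("Live  from the Manning Conference") == "live-from-the-manning-conference"

# Testing a model: measure behavior over labelled data instead.
labelled = [
    ("A great movie", "pos"),
    ("Great cast, GREAT script", "pos"),
    ("Dull and slow", "neg"),
    ("Not great", "neg"),  # the toy model gets this one wrong
]
correct = sum(toy_sentiment(text) == label for text, label in labelled)
accuracy = correct / len(labelled)
print(accuracy)  # 0.75
```

Whether 0.75 counts as passing is a judgment call about the data, which is the part that ordinary unit tests do not capture.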

20:34 Yeah.

20:34 Yeah, for sure.

20:35 All right.

20:36 Before we get to the next item, just want to let you know this episode is brought to you all by us.

20:41 Over at Talk Python Training, we have a bunch of courses.

20:44 You can check them out.

20:45 And we're actually featured in the Python Humble Bundle that's running right now.

20:49 So if you go to talkpython.fm/humble2020, you can get $1,400 worth of Python training tools and whatnot for 25 bucks.

20:59 So that's a pretty decent deal.

21:01 And Brian, you mentioned your book before.

21:03 Tell people about your book real quick.

21:04 Yeah.

21:04 So Python Testing with pytest is a book I wrote and it's still very valid, even though it was written a few years ago.

21:11 The intent was the 80% of pytest that you will always need to know for any version of pytest.

21:17 And I've had a lot of feedback from people saying a weekend of skimming this makes it so that they understand how to test.

21:25 It's a weekend worthwhile.

21:26 Yeah, absolutely.

21:27 And Ines, you want to talk a little bit about Explosion just to let people know?

21:30 Yeah.

21:30 So, I mean, some of you who are listening to this might know me from my work on spaCy, which is an open source library for NLP in Python, which I'm one of the core developers of.

21:40 And yeah, that's all free open source.

21:42 And we're actually just working on the nightly version or the pre-release of spaCy 3, which is going to have a lot of exciting features.

21:51 I might also mention a few more things later on.

21:54 And yeah, so maybe that's already going to be out by the time this podcast officially comes out.

22:00 Maybe not.

22:01 I don't want to overpromise.

22:02 But yeah, you can definitely try that out.

22:04 And we also recently released a new version of our annotation tool, Prodigy, which comes with a lot of new features for annotating relations, audio, video.

22:13 And the idea here is, well, once you get serious about training your own models, you usually want to create your own data sets for your very specific problems.

22:22 But often the first idea you have might not be the best one.

22:24 It's a continuous process.

22:25 You want to develop your data.

22:27 And Prodigy was really designed as a developer tool that lets you create your own data sets with a web app and a Python backend

22:35 that you can script.

22:36 That's our commercial tool.

22:37 That's how we make money.

22:38 And it's very cool to see a growing community around this.

22:42 So yeah, that's what we're doing.

22:43 We have some more cool stuff planned for the future.

22:45 So stay tuned.

22:46 Yeah, people should check it out.

22:48 Actually, you and I talked on Talk Python 202 about building a software business and entrepreneurship.

22:53 You had a bunch of great advice.

22:54 So people might want to check that out as well.

22:55 Do you actually know these episode numbers by heart?

22:58 Or did you look that up before?

22:59 Some of them I know, but that one I used the search.

23:01 Okay.

23:02 I remember you were on there.

23:03 I remember what it was about, but not the number.

23:05 I just put together that I know two people from Explosion.

23:08 So that's interesting.

23:09 Yeah, it's Sebastián.

23:10 Sebastián.

23:11 Yeah, he was on your podcast recently, which I feel really bad about.

23:15 I wanted to listen to this because he advertised it with like, it will tell the true story

23:20 behind his mustache, which I really wanted to know.

23:22 But then I was like, I'll need to listen to this on the weekend.

23:25 And I forgot.

23:25 So yeah, if he's listening, I'm sorry.

23:27 I will definitely, I need to know this.

23:29 So I will listen.

23:29 Excellent.

23:30 So don't spoil it.

23:31 He does great work on FastAPI.

23:34 All right.

23:35 Speaking of people that have been on all the podcasts as well: Brett Cannon. He recently

23:39 wrote an interesting article called, What is the Core of the Python Programming Language?

23:45 And he's legitimately asking as a core developer, what is not the maybe lowest level, but what

23:52 is the essence, I guess, is maybe the way to think about it.

23:55 Oh, wow.

23:56 I only just got the core, core pun.

23:58 Like it did not occur to me when I first read the article.

24:01 I'm really, I feel really embarrassed now.

24:03 To be fair, English is not my first language, but still, it's not about that.

24:06 Anyway, sorry for interrupting.

24:09 When I first read it, I was thinking like, okay, we're going to talk about what is the

24:12 lowest level.

24:14 And yeah, okay, it's probably C and ceval.h, ceval.c and so on.

24:18 But really the thing is, Brett has been thinking a lot about WebAssembly.

24:23 And what does that mean for Python in the broad sense?

24:26 He and I talked about it on Talk Python.

24:28 I think at the very last PyCon event, we did a live conversation there about that.

24:34 And it's important because there's a few areas where Python is not the first choice, maybe

24:42 not the second choice, sometimes not even the 10th choice of what you might use to program

24:47 some very important things like maybe mobile, maybe the web, the front end part of the web,

24:54 importantly, I mean.

24:55 So there's a few really important parts of technology where Python doesn't have much reach, but all

25:02 of those areas support WebAssembly these days, right?

25:05 And if you have something in C, you can compile it to WebAssembly.

25:09 So there's some thought about like, well, what could we do potentially to make a WebAssembly

25:16 runtime for Python so that Python magically almost instantly gets access to what was just JavaScript

25:24 front end frameworks space and what is mobile, iOS and Android and all those things allow you

25:32 to directly run JavaScript as part of your app?

25:34 So how would we make that happen?

25:36 So it's pretty important, right?

25:38 If we could solve that problem, like Python is already so popular and its growth is so incredible.

25:42 Like what if we could say, oh, yeah, and now it's an important language on mobile and it's

25:47 an important front end language framework like that would just take it to the next level or

25:51 maybe a couple levels up if you do them both.

25:53 And WebAssembly seems to be one of the keys to kind of bridge that gap, right?

25:57 So Brett talks about in this article how for so long we've just had CPython is what we

26:04 think of when we have Python.

26:05 Sometimes people use PyPy, P-Y-P-Y, as a partially JIT compiled version, sometimes faster version

26:14 of Python, but not always because the way it interacts with C, libraries that you might be

26:19 using through packages and so on.

26:22 And really, it's a lot of Python's dynamic nature makes it hard to do outside of an interpreter

26:26 where, to be clear, WebAssembly is a compiled language, right?

26:31 So if you're going to put it over there, maybe it's going to require it to be compiled.

26:34 So this is a really interesting thing to go through and read and think about with Brett.

26:38 He talks about things like, well, how much of the Python language would you have to implement

26:42 and still consider it to be valid Python?

26:45 Like we talked about MicroPython, and usually people don't look at that

26:49 and go, that's not Python.

26:50 That's fake, right?

26:51 No, like it's Python, but it's not as much Python, right?

26:53 You don't have all the same APIs on MicroPython as you do on regular Python.

26:58 So questions like, do you still need a REPL?

27:01 Could you live without locals, right?

27:04 The ability to ask what the local variables are and so on.

27:06 So he said he didn't really have a great answer.

27:11 It's more of a philosophical, like we need to solve this.

27:13 But I do want to share some of my thoughts on this.

27:16 And I feel like maybe what we could do is we could come up with like a standard Python

27:22 language definition that is a subset of full Python, right?

27:27 Here's the essence.

27:28 Like, okay, we have to be able to create classes.

27:30 We have to be able to create functions.

27:31 You have to define strings.

27:32 Probably you want type annotations.

27:33 But do you need eval?

27:35 Maybe, maybe not.

27:37 Right?
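
For readers who have not used them, the two dynamic features in question look like this in standard CPython. The function and variable names are made up for illustration. Both features require the runtime to keep names and a compiler around while the program runs, which is part of why they are candidates for being left out of a smaller subset.

```python
def describe(radius):
    area = 3.14159 * radius ** 2
    # locals() reports the names bound in the current scope as a dict.
    return dict(locals())


snapshot = describe(2)
print(sorted(snapshot))  # ['area', 'radius']

# eval() compiles and runs an expression that only exists as a string at run time.
result = eval("radius * 2", {}, {"radius": 21})
print(result)  # 42
```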

27:38 So like that, if you could have a subset of the language that was smaller, as well as the standard library, because do you really need to like parse CSS hex colors?

27:49 Everywhere?

27:50 Probably not.

27:51 It's a very underused part of the library, but it's in there.

27:54 Right?

27:55 So if we could narrow it down, maybe it would be easier to think about how does it go to WebAssembly?

27:59 How does it go to like some kind of JavaScript runtime or something like that?

28:03 And if it sounds crazy, you know, the .NET people did this.

28:05 They have a .NET standard class library language.

28:09 They got it running on WebAssembly.

28:10 So it's, there's an example of it out there and something that's kind of sort of similar.

28:15 Right?

28:16 So I think this would just open stuff up if you could get Python in these places.

28:21 What do you guys think?

28:21 Initially, I was never so sold on WebAssembly, and especially WebAssembly and Python, until I watched Dave Beazley live code a compiler at PyCon India, I think it was.

28:33 And I was like, oh, this is kind of, this is kind of fun.

28:35 And I mean, it was just also fun to watch Dave Beazley live code a compiler.

28:40 Yeah, for sure.

28:40 Classic.

28:42 But so that did get me thinking.

28:45 I do think one question I think we should ask ourselves is like, well, do we really, do we really need Python to do all of the things in the browser?

28:53 Like, is this really, does this really have a benefit that like actually makes a difference?

28:58 And B, there are a lot of things people use Python for that just wouldn't work in that way.

29:03 And that's also, I think, part of what makes Python so popular in the first place.

29:07 Like, for instance, you know, all the interactive computing environments.

29:10 That's why people want to use Python for data science.

29:14 Yeah, IPython, Jupyter Notebooks, that sort of stuff.

29:17 That's why, you know, Python as a dynamic language made so much sense to people.

29:21 And that's what made it popular.

29:23 And large scale processing, like a lot of the type of stuff we're working on.

29:26 It's like, yeah, there's stuff that you can run in the browser, but it's never going to be viable to run large scale information extraction in the browser because you want to run that on a machine for like a few hours.

29:37 But I think there are a lot of opportunities also in the machine learning space for privacy preserving technologies that already exist.

29:43 I think from what I understand, Mozilla is working on some features built into the browser, where, you know, you can have models predicting things without it being sent to someone's server.

29:53 And I think that's obviously very powerful.

29:55 That's an interesting idea.

29:57 Right.

29:57 Yeah.

29:58 Because if you could have a little bit of machine learning.

30:00 Yeah.

30:01 But you don't have to give up the data privacy aspect of it.

30:03 That's pretty cool.

30:04 Yeah.

30:04 So I think for that, there's a lot of potential here for running Python in a browser.

30:07 Yeah.

30:08 Well, we've gotten used to saying that what Python is, is whatever the CPython implementation is.

30:13 And we got to remember CPython is the reference implementation for the language spec.

30:19 And I think, I guess we're kind of getting at maybe we need to split it up and have a, like a core language spec and an extended one or something.

30:30 I don't know.

30:30 Where would you draw the line?

30:32 Because we've seen, like you said, we've seen things like CircuitPython and, and other things.

30:36 And we've actually talked about several smaller languages based on Python that just try to be the same syntax.

30:43 But at which point is it, when is it not Python anymore?

30:48 And there's at least some of the stuff.

30:50 Like I could totally see having a distribution of Python that doesn't have a REPL still count.

30:56 I could totally see not having IDLE, for instance.

31:00 If something doesn't ship with IDLE, is it still Python?

31:02 I think so.

31:04 And because of IDLE, you need Tkinter, or you need Tk stuff in there.

31:09 And there's a lot of stuff that maybe I would be fine without. Like, you know, could you live without locals?

31:14 Most of the time, probably.

31:16 I actually think this would be it, since the web and mobile are such a big part of our lives.

31:23 And it will be for a while.

31:24 This might be a decent dividing line to say whether or not it's for WebAssembly or not.

31:29 Maybe we should split the division at whatever we need to implement a WebAssembly version of Python.

31:35 And anything above that line is an extended version of Python or something.

31:41 Yeah.

31:41 Yeah, that's a good point.

31:42 All right.

31:43 I don't want to go too long on this section because I want to make sure we get to the others.

31:47 But I do want to leave you with just some thoughts.

31:48 What if shipping Python was just shipping a single binary and a thing that ran it?

31:53 You could do that with WebAssembly.

31:55 Maybe two WebAssemblies, the runtime plus the code.

31:58 What if all the browsers had capability to plug in alternate runtimes through WebAssembly?

32:05 So right now you have a JavaScript engine.

32:06 But what if, like, say, Firefox and Edge and whatnot came up with a way to say, here's a WebAssembly API to plug in alternate runtimes, Python, Ruby, .NET, Java, you name it, and then shipped with the latest version of each of those runtimes.

32:24 So you just don't have to download.

32:25 Like, the big problem now is you can do it, but you've still got to download, like, 10 megs per page, which is not a good idea.

32:32 So anyway, I think there's a ton of interesting things that open up if this were possible.

32:37 So I'm glad Brett's still on this, and hopefully he keeps thinking about it.

32:40 Brian, I still need to learn pathlib.

32:43 Really?

32:43 You got any ideas on how I do that?

32:44 Really?

32:44 You're not using pathlib?

32:46 I'm just stuck in the os.path world.

32:51 I just really need to get with the times.

32:53 Help me out here.

32:54 Okay.

32:54 So pathlib is where...

32:55 I mean, I know the value.

32:56 Yeah, you're like some kind of animal, like os.path.

32:58 So, no offense to os.path.

33:04 But, you know.

33:05 No, I really love pathlib a lot.

33:07 But there is...

33:09 I got to tell you that the documentation for pathlib doesn't cut it as an introduction.

33:13 You can find what you're looking for, but only if you know what you're looking for.

33:17 But I agree with Chris May.

33:19 So Chris May wrote a post called Getting Started with Pathlib.

33:23 I guess it's kind of...

33:24 He's got a little PDF field guide that you can download, but he has a little bit of a blog

33:28 post introducing it.

33:30 But I downloaded it.

33:31 It's like nine or ten pages.

33:32 And it's actually a really good introduction to pathlib.

33:36 So I really like it.

33:37 The big thing with os.path versus pathlib is pathlib creates Path objects.

33:42 So there's a class that represents a path that you have methods on.

33:45 And it makes it different for when you're dealing with this.

33:49 With os.path, it's just strings.

33:51 So it's manipulating strings that represent paths.

33:54 So the object's different.

33:56 I like it.

33:57 Actually, I switched just for the ability to build up paths with just the slash operator.

34:03 Yeah, it's really interesting how they've overridden division.

34:06 But I think it's a good example of where this makes sense.

34:09 It's a reasonable use case.

34:10 It looks good.

34:11 It's defensible.

34:12 There are other cases where you're like, oh, did you really have to overload these operators?

34:16 But they're fine.

34:17 I think that's very valid.

34:19 Yeah.

34:19 And things like how do you find parts of a path?

34:23 When you have to parse paths, that's where pathlib really shines for me.

34:27 So if you want to find the parent of something or the second-level parent,

34:31 there's ways to do that in pathlib.

34:34 And in os.path, you're stuck with trying to split things and stuff.

34:38 And it's gross.

34:39 I mean, there are operations to do it.

34:41 But it's very good to have this relative, I don't know, just all these operators, like parent.
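
A small side-by-side sketch of the difference being described. The directory names are hypothetical, and `posixpath` (the module that `os.path` resolves to on Unix-like systems) and `PurePosixPath` are used so the result is the same on every operating system.

```python
import posixpath
from pathlib import PurePosixPath

raw = "/srv/project/settings/app.toml"  # hypothetical path, never touched on disk

# os.path style: the path is a string, and helper functions take it apart.
grandparent_str = posixpath.dirname(posixpath.dirname(raw))
joined_str = posixpath.join(grandparent_str, "data", "train.csv")

# pathlib style: the path is an object with attributes, and / joins segments.
path = PurePosixPath(raw)
grandparent = path.parent.parent
joined = grandparent / "data" / "train.csv"

print(grandparent_str)  # /srv/project
print(joined)           # /srv/project/data/train.csv
```

`path.parents[1]` is another way to reach the same second-level parent.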

34:47 And then one of the things that it took me a while to figure out was I was used to trying to find the absolute path of something.

34:55 And in pathlib, finding the absolute path is the resolve method.

34:58 So you say resolve and it finds the absolute path for you.

35:02 You can find the current working directory.

35:04 You can go up and down folders.

35:05 You can use globs.

35:07 You can find parts of path names and stuff.
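
The operations listed here can be tried in a few lines. This is a sketch that works in a throwaway temporary directory; the file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    (root / "notes" / "todo.txt").write_text("learn pathlib\n")
    (root / "notes" / "done.txt").write_text("")

    # resolve() returns an absolute path with ".." segments collapsed.
    relative = root / "notes" / ".." / "notes" / "todo.txt"
    absolute = relative.resolve()

    # glob() matches file names inside a directory.
    found = sorted(p.name for p in (root / "notes").glob("*.txt"))

    # Parts of the file name are available as attributes.
    pieces = (absolute.name, absolute.stem, absolute.suffix)

print(Path.cwd().is_absolute())  # True: the current working directory
print(found)   # ['done.txt', 'todo.txt']
print(pieces)  # ('todo.txt', 'todo', '.txt')
```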

35:10 And it's just a really comfortable thing.

35:12 So I think you should give it a whirl.

35:14 And it's not like it's going to change your life a lot.

35:18 But the next time you're programming and you're like, okay, I got to have a base directory and some other directory.

35:25 Well, I'll reach for pathlib instead of os.path.

35:28 Yeah.

35:29 I guess it has been there since 3.4, so I should give it the time.

35:32 Yeah.

35:32 So, I mean, now, before I could see the objection of, like, oh, you have to backport it.

35:36 And also, I think what I like as well is a lot of integrations that, like, you know, automatically can perform checks whether the path exists, stuff like that.

35:44 Or for me as a library author, you know, you're writing stuff for users and you want to give them feedback.

35:49 And, for instance, in a library like Click or Typer, which is the modern, type-hint-based CLI library, which was also built by my colleague, Sebastián, you can just say, hey, this argument is a path.

36:01 What you get back from the command line is a path.

36:03 It will check that a path exists via pathlib.

36:06 So it does, like, you know, a whole bunch of magic there.

36:10 Yeah.

36:10 That is super cool.

36:11 Yeah.

36:12 Or you can say it can't be a directory.

36:14 And then you write your CLI, user passes in an invalid path, and you don't even have to do any error handling.

36:19 It will automatically, before it even runs your code, say, nope, that argument is bad.
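
To show the idea without extra dependencies, here is a sketch using only the standard library's `argparse`. It is not how Typer or Click implement it. Click exposes this kind of check declaratively through `click.Path(exists=True, dir_okay=False)`, and Typer builds on Click. The `existing_file` helper and the `count-lines` program name are hypothetical.

```python
import argparse
import tempfile
from pathlib import Path


def existing_file(raw: str) -> Path:
    """Convert a command-line string to a Path, rejecting bad input early."""
    path = Path(raw)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"{raw!r} does not exist")
    if path.is_dir():
        raise argparse.ArgumentTypeError(f"{raw!r} is a directory, expected a file")
    return path


parser = argparse.ArgumentParser(prog="count-lines")
parser.add_argument("source", type=existing_file)

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "reviews.txt"
    sample.write_text("good\nbad\n")

    # Validation runs during parsing, before any application code sees the value.
    args = parser.parse_args([str(sample)])
    line_count = len(args.source.read_text().splitlines())

print(line_count)  # 2
```

With an invalid path, `parse_args` prints a usage error and exits before the rest of the program runs, which is the behavior described above.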

36:24 So that's pretty cool.

36:25 That's awesome.

36:25 And you don't have to care about Unix versus Mac or PC or something like that.

36:30 Yeah.

36:30 I mean, Windows.

36:31 I mean, no offense to Windows, but handling paths on Windows is always the classic story.

36:37 Also, as a library author, where you just, well, we're supporting all operating systems.

36:41 But, like, well, Windows just does it a bit differently.

36:43 And you cannot assume that a slash means a slash.

36:47 Yeah, for sure.

36:48 All right.

36:49 Well, the final item is yours, Ines.

36:51 And it's definitely interesting.

36:53 So if you're working in the machine learning data science side of things, it might not be enough to just back up your algorithms and your code, right?

37:01 Yeah.

37:02 You also have, yeah, machine learning is code and data.

37:04 So, yeah.

37:05 So this is something we discovered a while ago and that we're now using internally.

37:10 So currently, as I mentioned before, we're working on version three of spaCy.

37:13 And one of the big features is going to be a completely new optimized way for training your custom models, managing the whole end-to-end workflows from pre-processing to training to packaging and also making the experiments more reproducible.

37:27 You want to train a cool model and then send it over to your colleague and your colleague should be able to run the same thing and get the same results.

37:34 Sounds really basic, but it's pretty hard in general in machine learning.

37:37 So our spaCy stuff will also integrate with a tool called DVC, which is short for data version control, which we've started using internally for our models.

37:47 And DVC is basically an open source tool for version control, specifically for machine learning and for data.

37:54 So, you know, you can check your code into a Git repo as you're working on it, but you can't just check your data sets, models, artifacts, or model weights into Git.

38:03 So it's normally very, very difficult to keep track of changes to your files.

38:08 You kind of, most people just end up with this directory of files somewhere and it can be very frustrating.

38:13 And so you can really, you can think of DVC as Git for data and the command line usage is actually pretty similar.

38:18 So you type git init and dvc init to initialize it.

38:22 And then you can do dvc add to start tracking your assets and add them.

38:27 So I think, yeah, if you're familiar with Git, as abstract as it can be at times, you will also find it easy to get into DVC.

38:35 And it basically lets you track any assets like data sets, models, whatever, by adding meta files to your repository.

38:44 So you always have like the checksum in there and you always have these checkpoints of the asset, even though you're not actually checking that file into your repo.

38:52 And that means you can always go back, fetch whatever it was from your cache and rerun your experiments.
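
The mechanism described here (a small meta file with a checksum in the repo, the large file itself in a cache) can be sketched in plain Python. This illustrates the idea only: it is not DVC's code, and the meta-file name, JSON layout, and hash choice below are assumptions made for the example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(asset: Path, cache_dir: Path) -> Path:
    """Store the asset in a content-addressed cache and write a small meta file."""
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(asset, cache_dir / digest)
    meta = asset.with_name(asset.name + ".meta.json")  # small enough to commit to Git
    meta.write_text(json.dumps({"path": asset.name, "sha256": digest}))
    return meta


def restore(meta: Path, cache_dir: Path) -> bytes:
    """Fetch the exact version of the asset that the meta file points at."""
    digest = json.loads(meta.read_text())["sha256"]
    return (cache_dir / digest).read_bytes()


workdir = Path(tempfile.mkdtemp())
dataset = workdir / "train.csv"
dataset.write_bytes(b"text,label\ngreat movie,pos\n")
meta_file = track(dataset, workdir / "cache")

dataset.write_bytes(b"overwritten by accident")
original = restore(meta_file, workdir / "cache")
print(original == b"text,label\ngreat movie,pos\n")  # True
```

Because only the meta file goes into version control, checking out an old commit tells you which cached version of the data belongs with that code.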

38:59 And it also builds this really cool dependency graph.

39:02 So you can really have these complex pipelines with different steps.

39:06 And then you only have to rerun one step if some of the inputs to it have changed.

39:12 So, you know, in machine learning, you'd often have pipeline, like you start, you download your data, then you pre-process it.

39:19 Then you convert it to something, then you train, then you run an evaluation step.

39:24 And everything sort of depends on each other.

39:26 And that can make things like really hard.

39:28 And you never know, you usually have to run everything, you know, clean from scratch.

39:33 Because, yeah, if something changes, your whole results change.

39:36 So if you set up your pipelines with DVC, it can actually decide whether something needs to be rerun.

39:42 Or it can also know what needs to be rerun to reproduce exactly what you're trying to do.
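
A toy version of "only rerun a step if its inputs changed" might look like the following. The `Stage` class and the two lambda steps are hypothetical and far simpler than a real pipeline tool. The point is just that each step remembers a fingerprint of its last input and skips work when the fingerprint matches.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Stage:
    """One pipeline step that caches its output per input fingerprint."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.last_input = None
        self.output = None
        self.runs = 0

    def run(self, data: bytes) -> bytes:
        key = fingerprint(data)
        if key != self.last_input:  # input changed, so do the work again
            self.output = self.func(data)
            self.last_input = key
            self.runs += 1
        return self.output


def run_pipeline(stages, raw: bytes) -> bytes:
    data = raw
    for stage in stages:
        data = stage.run(data)
    return data


preprocess = Stage("preprocess", lambda d: d.lower())
train = Stage("train", lambda d: b"model trained on " + d)
pipeline = [preprocess, train]

run_pipeline(pipeline, b"Some RAW data")
run_pipeline(pipeline, b"Some RAW data")   # nothing changed: both steps are skipped
first = (preprocess.runs, train.runs)

run_pipeline(pipeline, b"SOME raw DATA")   # raw data changed, preprocessed output did not
second = (preprocess.runs, train.runs)

print(first, second)  # (1, 1) (2, 1)
```

In the last call the preprocessing step reruns, but its output is byte-for-byte the same as before, so the expensive training step is skipped.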

39:47 So that's pretty cool.

39:48 Yeah, that could save you a ton of time and money if you're doing it in the cloud.

39:51 Yes, exactly.

39:52 Yeah.

39:53 And, you know, you can share it with other people.

39:55 It's like, it's, I think it definitely solves a problem that's very real.

39:58 And, yeah, the people making DVC, they've also recently released a new tool that I have not personally checked out yet.

40:04 But it looks very interesting.

40:05 It's called CML, which is short for Continuous Machine Learning.

40:08 And that's really more of the CI, which kind of is logically the next step, right?

40:12 You manage everything in your repo.

40:14 And then you obviously want to run automated tests and continuous integration.

40:18 So the preview looked really cool.

40:21 Like it showed kind of a GitHub action where you can submit a PR with like some changes to your code and your data.

40:28 And then you have the bot commenting on it and it shows like accuracy results and a little graph and how stuff changes.

40:34 So it's really like these code coverage bots that you've probably seen where like you change some lines and then it tells you, oh, coverage has gone up or down and, you know, the new view of your code.

40:45 So that's what it looks like.

40:47 So I think, yeah, I'm really excited about this.

40:49 And definitely it solves a problem.

40:50 It's already been solving a problem for us.

40:52 And yeah.

40:52 How does it store the large files?

40:54 I know it has this cache.

40:55 Is that a thing that you host?

40:56 Does it have a hosted thing that's kind of like GitHub?

40:59 I'm not sure.

41:01 You can probably connect it to some cloud, but like normally you have that locally.

41:03 It also has a cool thing where you can actually download files via the tool.

41:07 And then depending on where you're fetching it from, if it's a Google storage bucket or S3 bucket or something, you can actually also tell if the file has changed and whether it needs to be redownloaded.

41:17 And so, for example, internally, what we're doing is we're mounting a Google storage, Google cloud storage bucket or however they call it locally as like, you know, so it's like kind of a drive you have access to locally.

41:30 And then you can just sort of type GS, blah, blah, blah, blah, and then the path and really work with it like a local file system.

41:36 And that's pretty nice.

41:38 So you can, you know, you can have, you can work with private assets because the thing is a lot of toy examples assume that, oh, you just download a public data set and then you train your model and then you upload it somewhere.

41:47 But that's not very realistic because most of the time the data you have can't just go in the cloud publicly.

41:52 So, yeah.

41:54 But yeah, I don't even know exactly how it works in detail, but it can basically tell, I think from the headers or something, whether the file you're downloading has changed and whether there's something new.

42:04 Yeah.

42:05 Yeah.

42:05 With a normal version control, one of the reasons we use it is to try to find what's different.

42:09 Can you do, do you do diffs on data or?

42:12 I don't know.

42:13 Maybe.

42:14 I mean, I'm not sure if there's, I think the main diff is more like around the results that you get because diff, I mean, diffing large data set, diffing weights, you kind of can't.

42:25 That's really where we are.

42:26 The other problem where like you need to run the model to find out what it does and then you're diffing accuracies rather than weights.

42:33 Okay.

42:34 I don't know if it does like actual diffing of the data sets, but often the thing that changes is really the models.

42:38 Like you have the, you know, you have your whole data and then you change things about your code.

42:44 Yeah.

42:44 And something changes and it's, you want to keep track of what it is or how it manifests.

42:48 Yeah.

42:49 It's really cool to see them working on this.

42:50 Yeah.

42:51 So, and also we'll be in, in spaCy 3.

42:53 We'll hopefully have a pretty neat integration where, you know, if you want, it's not like mandatory, but if you say, Hey, that's cool.

42:59 That's how I want to manage my assets.

43:00 You can just run that in your, in a spaCy project and then it just automatically tracks everything.

43:06 And it, you know, you can check that into Git and share it and other, other people can download it.

43:11 So that's, yeah, I'm pretty excited about that.

43:13 It works pretty well so far.

43:14 Yeah.

43:15 Everything you can do to make it a little easier to work with spaCy and just make it reproducible.

43:19 Yeah.

43:20 And it's just, the things are hard.

43:21 Like there is, I'm not a fan of these all one click, everything just magically works.

43:25 Like it looks, it looks nice and it's a nice demo, but like once you actually get down to like the real work, like things need to be a bit modular.

43:32 Things need to be customizable.

43:33 Otherwise you're always hitting edge cases or you have these leaky abstractions.

43:37 So yeah.

43:39 Yeah.

43:39 I think things should be easy to use, but you can't just magically cover everything by just providing one button.

43:45 That's just not going to work.

43:47 Yeah.

43:47 Cause when it doesn't work, it's not good anymore.

43:49 Yeah, exactly.

43:49 All right.

43:50 Yeah.

43:51 All right.

43:52 Well, that's our six items that we go in depth into, but at the end, we always just throw out a couple of really quick things that maybe we didn't have time to fit into the main section.

44:01 And I want to talk about two things that are pretty exciting.

44:05 One is if you care about podcasts as a catalog of a whole bunch of things, I don't know how many podcasts there are.

44:13 There's probably over a million podcasts these days.

44:15 One of our listeners, Anton Zhiyanov, wrote a cool Python package that will let you search the iTunes directory and query it.

44:24 And it's basically a Python API into iTunes podcasting directory.

44:29 You know, some people think that you've got to be part of the Apple ecosystem to care about iTunes, but really that's just the biggest like directory kind of Yahoo circa 1995 style of listing of podcasts.

44:42 So if you care about digging in and researching podcasts, check that out.

44:45 That's pretty cool.

44:46 And then, yeah.

44:48 And then I've also, I'm such a big fan of f-strings.

44:51 How about you too?

44:51 Yes.

44:52 Yes.

44:52 F, yes, right?

44:54 Yeah.

44:54 I'm finally, I'm finally working in like Python three only.

44:57 I remember, I think last time I was on the podcast, I was basically, I was saying how like, oh, all these modern things, they're so nice.

45:03 I wish I could use them more, but we're still supporting Python two, but like, no, everything I write now, 3.6.

45:09 Yes.

45:09 And I've talked previously about a tool called flynt, F-L-Y-N-T, which lets you run against an old code base and convert all the various Python two and three styles of formatting magically into f-strings.

45:22 I think that was actually really nice.

45:24 The episode I was.

45:25 Yeah.

45:25 You might've been right.

45:26 Like, I wish I could run this.

45:28 Right.

45:28 Yeah.

45:28 And yeah, I ran that against like 20,000 lines of Python.

45:31 I found just a couple of errors and reported them.

45:33 They got fixed.

45:34 So that's nice.
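
For reference, these are the older formatting styles next to the f-string form that a tool like flynt rewrites them into. The variable names are made up for the example.

```python
name = "Ines"
episode = 191

old_percent = "Guest %s on episode %d" % (name, episode)      # printf style
old_format = "Guest {} on episode {}".format(name, episode)   # str.format style
modern = f"Guest {name} on episode {episode}"                 # f-string, Python 3.6+

print(modern)  # Guest Ines on episode 191
print(old_percent == old_format == modern)  # True
```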

45:35 But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing the string and I'm like, oh yeah, I want to put data here.

45:42 So I got to go back to the front of the string, not necessarily back to the front of the line, but maybe back to like the string is being passed to a function.

45:49 So I go back to the first quote, put the F, go back forward and then start typing out the thing I actually wanted.

45:55 Right.

45:55 Or maybe I'll f-string something.

45:57 And when I really, I, oh, I'm not going to put data.

45:59 Right.

45:59 So it's like you're halfway through and you want it to become an f-string.

46:02 Well, PyCharm is coming with a new feature where if you start writing a regular string and pretend like it's an f-string, it'll automatically upgrade it to an f-string.

46:11 Yes.

46:11 Halfway through.

46:12 Yes.

46:13 Without leaving.

46:14 So you just say curly variable.

46:15 It's like, oh, okay.

46:16 That means that's an f-string and the f appears at the front.

46:18 Yes.

46:19 Nice.

46:19 So that is pretty awesome.

46:20 Anyway, those are my two quick items.

46:22 Ines, I'm also excited about the one you got here.

46:24 Yeah.

46:25 This is awesome.

46:25 Yeah.

46:25 I had one, which is something coming to 3.9 or in 3.9, which is PEP 585.

46:31 And you can use, when you use type annotations, you can now use the built in types like list and dict as generic types.

46:40 So that means no more from typing import List with a capital L.

46:45 Yes.

46:46 Yes.

46:48 So you just literally, I mean, when I first saw it, I'm like, that looks strange.

46:52 But like, yes, I'm so excited about this.

46:55 It'll probably be years until I can just use it all across my code bases because.

46:58 True.

46:59 Yeah.

46:59 But like, yay.

47:00 That's in 3.9?

47:01 Yeah.

47:02 Yeah.

47:03 That's in 3.9.

47:03 I'm already using 3.9 and I didn't know that.

47:06 You can do this.

47:06 Yeah.

47:07 Yeah.

47:07 And Guido is one of the guys on the PEP making this happen.

47:12 Like I said, he's really into typing.

47:13 Oh, that's great.

47:15 So this is really cool because it was super annoying to say, oh, you have this new import

47:18 just because you want to use type annotations on a collection.

47:20 Right?

47:21 Now you don't have to.

47:22 And there's actually a bunch of the collection stuff and iterators and whatnot.

47:25 Like the, you know, the collections module, like that, a bunch of stuff in there is really

47:31 nice.

47:32 And they're compatible, like lowercase list[str] is the same as capital List[str], I believe.
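
[Editor's sketch, not from the recording: a minimal example of what PEP 585 allows, assuming Python 3.9 or later. The function names are hypothetical and chosen only for illustration. The point is that list, dict, and the collections.abc types can be subscripted directly, with no import from typing.]

```python
from collections.abc import Iterable


def count_words(lines: Iterable[str]) -> dict[str, int]:
    """Count how often each whitespace-separated word appears."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def longest(words: list[str]) -> str:
    """Return the longest word (the first one wins on ties)."""
    return max(words, key=len)


word_counts = count_words(["spam eggs", "spam"])
longest_word = longest(["pep", "generic", "list"])
```

[Static type checkers treat list[str] and typing.List[str] as equivalent annotations. On Python 3.8 and earlier, list[str] in a function signature raises a TypeError unless annotation evaluation is deferred with from __future__ import annotations.]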

47:38 All right, Brian, what you got?

47:39 Oh, I just wanted to, I'll drop a link in the show notes.

47:42 Test & Code 120 is where I interviewed Sebastián Ramírez, also from Explosion.

47:48 And talking about FastAPI and Typer because I'm kind of in love with both of those.

47:54 They're really cool.

47:55 Yeah.

47:55 Absolutely.

47:56 All right.

47:57 Well, that's a cool one.

47:58 Definitely going to check that out.

47:59 And you can find out why he has the cool mustache.

48:02 That's right.

48:04 All right.

48:05 So we always end the show with a joke and I thought we could do two jokes today.

48:10 So I think, Ines, do you want to talk about this first one?

48:13 Oh, yeah.

48:13 I mean, I'm not even sure it counts as a joke per se, but like it's more of a humorous

48:17 situation, I guess.

48:19 Yeah.

48:19 It ties in.

48:21 Well, it's Sebastian again.

48:24 Like he had this very viral tweet the other day where he posted about some experience.

48:28 I can just read it out because I think it needs to kind of stand on its own.

48:32 So he writes:

48:33 I saw a job post the other day.

48:35 It required four plus years of experience in FastAPI.

48:38 I couldn't apply as I only have 1.5 plus years of experience since I created that thing.

48:44 And then he says, maybe it's time to reevaluate that years of experience equals skill level.

48:50 And this was like, it resonated with people so much.

48:53 I was actually surprised to see like everyone was like, oh, yeah, HR.

48:56 Like, apparently this seems to be this huge issue: most job ads are

49:02 not written by the people who actually work with the technologies.

49:07 Yeah.

49:07 Actually.

49:08 Yeah, this is awesome.

49:09 And this tweet actually just got covered on DTNS, the daily news tech show, daily tech news show.

49:14 I guess it is.

49:15 Alongside another posting that said you needed eight years of Kubernetes experience for another job.

49:21 But of course, Kubernetes has only been around for four years.

49:24 Yeah.

49:24 When you say this went viral, it had 46,000 retweets and 174,000 likes.

49:29 That's like, that's got some traction.

49:31 I feel like this might be a problem.

49:33 Yeah.

49:33 I was surprised that like so many people are like, yeah, that's a big deal.

49:37 And it's like, and I mean, it is true.

49:39 Like kind of tech hiring sort of seems to be broken.

49:42 And it's also, it's like, it's a bit different in my case, I guess.

49:45 But like, I don't qualify for most roles using the tech that I write.

49:50 And in some cases that's justified because I'm not a data scientist just because I write developer

49:54 tools for data scientists doesn't mean I can do the job.

49:56 But in other cases, I'm like, there's kind of a ridiculous amount of arbitrary stuff you're

50:01 asking for in this job ad.

50:02 Maybe that's needed.

50:02 Maybe not, but like it centers around like a piece of software that I happen to have

50:07 written and I do not qualify for your job ad at all.

50:10 Like the last time I wrote a job description, I intentionally left off the college degree

50:18 requirement because all of the other requirements I was listing in there, either they had it from

50:23 college plus experience or they had it just from experience.

50:26 So I was fine with that.

50:27 By the time it actually went live, somebody in HR had added a college degree requirement

50:33 to it.

50:33 I just couldn't get away with not listing that, I guess.

50:36 Yeah.

50:37 Master's degree in spaCy is preferred.

50:39 In spaCy preferred.

50:40 Yeah.

50:41 But I guess another problem there is, it's like, well, look, if you ask, if HR writes these

50:44 job ads with these bullshit requirements, then well, who applies?

50:49 Like it's either people who are like, yeah, whatever, or people who are full of shit.

50:52 And then that's the sort of culture you're fostering.

50:54 And it might not even be the engineer's fault who wrote a very honest job description, but

50:59 like, yep.

50:59 Who applies to that?

51:00 Like, yeah.

51:01 You're going to make me lie about my FastAPI experience.

51:04 Yeah.

51:04 People just apply to anything.

51:05 I'm like, yep, I have 10 years experience in everything.

51:07 Great.

51:08 And they're like, perfect.

51:09 That's what we're looking for.

51:10 You're hired.

51:10 And then you wonder like, why is our company culture so terrible?

51:13 Hmm.

51:14 Well, I actually did have somebody apply to a job and say they have multiple years of experience

51:21 in any new language coming up.

51:23 Nice.

51:27 All right, guys.

51:28 Well, it looks like we're just about out of time.

51:29 Let me give you one more joke for it.

51:32 Brian, will you describe this picture and then I'll read what it says?

51:35 There's a poorly drawn horse, I think.

51:39 Zebra, horse that has white on the back end and black on the front end.

51:43 And the text says, I defragged my zebra.

51:46 I don't even know if people defrag drives anymore.

51:48 So this is only going to resonate with the folks that have been around for a while.

51:51 I saw that there was this great video I came across on YouTube where you can actually watch

51:55 like a live defrag session.

51:56 Like, I don't know, Windows 95.

51:57 And it's like, I don't know, it takes a few hours.

52:00 And, you know, you can kind of bring back that nostalgia and just put it on your TV and

52:03 just sit there and you're like, yeah.

52:05 Oh, it's like the aquarium you would put on your TV.

52:08 Yeah.

52:09 Like, but for tech.

52:10 Follow the show on Twitter via @pythonbytes.

52:13 That's Python Bytes as in B-Y-T-E-S.

52:16 And get the full show notes at pythonbytes.fm.

52:19 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

52:23 We're always on the lookout for sharing something cool.

52:26 On behalf of myself and Brian Okken, this is Michael Kennedy.

52:29 Thank you for listening and sharing this podcast with your friends and colleagues.
