WEBVTT

00:00:00.001 --> 00:00:04.880
Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:00:04.880 --> 00:00:09.180
This is episode 191, recorded July 14th, 2020.

00:00:09.180 --> 00:00:10.180
I'm Michael Kennedy.

00:00:10.180 --> 00:00:11.060
And I'm Brian Okken.

00:00:11.060 --> 00:00:12.680
And welcome, special guest Ines.

00:00:12.680 --> 00:00:13.080
Hi.

00:00:13.080 --> 00:00:14.300
It's great to have you here.

00:00:14.300 --> 00:00:19.600
So I want to kick this off with a cool IoT thing.

00:00:19.600 --> 00:00:23.800
Now, IoT and Python, they've got a pretty special place.

00:00:23.800 --> 00:00:33.540
Because when I think about Python, I think of it as not being something that sort of competes with assembly language and really, really low level type of programming for small devices.

00:00:33.540 --> 00:00:42.660
But, you know, amazing people put together MicroPython, which is a reimplementation of Python that runs on little tiny devices.

00:00:42.660 --> 00:00:46.440
And we're talking like $5 microchip type devices, right?

00:00:46.440 --> 00:00:47.760
Have either of you all played with these?

00:00:47.760 --> 00:00:48.020
No.

00:00:48.020 --> 00:00:49.520
No, I haven't.

00:00:49.520 --> 00:00:51.800
But I've been seeing a bit of this from my brother.

00:00:51.800 --> 00:00:53.100
So he's pretty amazing.

00:00:53.280 --> 00:00:54.560
Like, he's a bit younger than me.

00:00:54.560 --> 00:00:55.420
He's an event technician.

00:00:55.420 --> 00:01:02.300
And he recently taught himself programming and everything just so he can build stuff on these, like, tiny Raspberry Pis.

00:01:02.300 --> 00:01:03.400
And, like, I don't know.

00:01:03.400 --> 00:01:04.660
He's doing super advanced stuff.

00:01:04.660 --> 00:01:07.420
It's been really interesting to see him learn to program.

00:01:07.420 --> 00:01:09.160
And he's also, he's incredibly good.

00:01:09.160 --> 00:01:12.100
He has, like, amazing instincts about programming, even though he's never done it before.

00:01:12.100 --> 00:01:14.280
But, like, so I've been kind of watching this from afar.

00:01:14.280 --> 00:01:16.440
And it made me really want to build stuff.

00:01:16.440 --> 00:01:17.820
So I'm very curious.

00:01:17.820 --> 00:01:22.080
Yeah, I've done the CircuitPython on some of the Adafruit stuff.

00:01:22.080 --> 00:01:22.440
Exactly.

00:01:22.760 --> 00:01:25.240
So I always just want to build these things.

00:01:25.240 --> 00:01:28.920
I'm like, what could I think of that I could build with these cool little devices?

00:01:28.920 --> 00:01:31.020
I just, in my world, I don't have it.

00:01:31.020 --> 00:01:36.900
Maybe if I had a farm, I could, like, automate, you know, like, watering or monitoring the crops.

00:01:36.900 --> 00:01:38.240
Or if I had a factory.

00:01:38.240 --> 00:01:41.940
But I just don't live in a world that allows me to automate these things.

00:01:41.940 --> 00:01:42.580
Do you have pets?

00:01:42.580 --> 00:01:44.500
Maybe you can build something for pets.

00:01:44.640 --> 00:01:46.100
We generally don't have pets.

00:01:46.100 --> 00:01:48.740
But we are fostering kittens for the summer.

00:01:48.740 --> 00:01:52.720
So I could put a little device onto one of the kittens, potentially.

00:01:52.720 --> 00:01:55.800
GPS tracker.

00:01:55.800 --> 00:01:56.480
Yeah.

00:01:56.560 --> 00:01:59.540
So in general, you have to get these little devices, right?

00:01:59.540 --> 00:02:01.540
You've got the US PyCon.

00:02:01.720 --> 00:02:05.040
We got the Circuit Playground Express, which is that little circular thing.

00:02:05.040 --> 00:02:12.840
It's got some 10 LEDs and a bunch of buttons and other really advanced things like motion sensors and temperature and so on.

00:02:13.480 --> 00:02:20.820
Probably the earliest one of these that was a big hit was the BBC micro:bit, where I think every seventh grader in the UK got it.

00:02:20.820 --> 00:02:23.920
Some grade around that scale got one of these.

00:02:23.920 --> 00:02:27.480
And it really made a difference in kids seeing themselves as a programmer.

00:02:27.480 --> 00:02:37.120
And interestingly, especially women were more likely to see programming as something they might be interested in in that group where they went through that experience.

00:02:37.120 --> 00:02:39.500
So I think there's a real value to work with these little devices.

00:02:39.500 --> 00:02:42.680
But getting a hold of them can be a challenge, right?

00:02:43.100 --> 00:02:44.780
You've got to physically get this device.

00:02:44.780 --> 00:02:51.280
That means you have that idea of I want to do this thing and then I have to order it from Adafruit or somewhere else and then wait for it to come.

00:02:51.280 --> 00:02:54.760
And my experience has been I'll go there and I'm like, oh, this is really cool.

00:02:54.760 --> 00:02:55.400
I want one of these.

00:02:55.400 --> 00:02:56.840
Oh, wait, no, it's sold out right now.

00:02:56.840 --> 00:02:58.160
You can order it again in a month.

00:02:58.160 --> 00:02:58.820
Right.

00:02:58.820 --> 00:02:59.800
So getting them is a challenge.

00:02:59.800 --> 00:03:10.040
And also, if you're working in a group of, say, like you want to teach a high school class or a college class or something like that, and you want everyone to have access to these.

00:03:10.440 --> 00:03:15.160
Well, then all of a sudden, the fact that maybe it costs $50 wasn't a big deal.

00:03:15.160 --> 00:03:20.660
But if it's $50 times 20 or 100 kids, then all of a sudden, well, maybe not.

00:03:20.940 --> 00:03:24.160
So I want to talk about this thing called Device Simulator Express.

00:03:24.160 --> 00:03:33.060
So this is a plugin, or extension, I think extensions is what VS Code calls them, that makes VS Code do more stuff.

00:03:33.060 --> 00:03:37.100
And it's an open source, free device simulator.

00:03:37.100 --> 00:03:44.740
So what you can do is you just go to the Visual Studio Code extensions thing and you type device probably is sufficient, but device simulator express.

00:03:44.940 --> 00:03:51.720
And it'll let you install this extra thing inside of VS Code that is really quite legit.

00:03:51.720 --> 00:03:58.940
So it gives you a simulated Circuit Playground Express, a simulated BBC micro:bit.

00:03:58.940 --> 00:04:05.560
And the most impressive to me is the CLUE from Adafruit, which actually has a screen that you can put graphics on.

00:04:06.020 --> 00:04:13.680
So really, really cool way to get these little IoT devices with Circuit Playground, CircuitPython.

00:04:13.680 --> 00:04:17.100
So Adafruit's fork of MicroPython on there.

00:04:17.100 --> 00:04:17.980
What do you guys think?

00:04:17.980 --> 00:04:18.500
See that picture?

00:04:18.500 --> 00:04:19.400
Look how cool that is.

00:04:19.400 --> 00:04:25.620
Yeah, so you can write Python in one tab and then just have the visualization in the other.

00:04:25.620 --> 00:04:26.320
That's pretty cool.

00:04:26.320 --> 00:04:26.520
Yeah.

00:04:26.520 --> 00:04:27.100
Yeah, exactly.

00:04:27.100 --> 00:04:37.300
And it's very similar to, say, what you might do with Xcode and iPhones, where you have an emulator that looks quite a bit like it or what you would do on the Android equivalent.

00:04:37.300 --> 00:04:41.540
I actually think this is a little bit better than the device because it's actually larger, right?

00:04:41.540 --> 00:04:48.020
Like the devices are really small, but here it could be, like, a huge thing on your 4K monitor with a little CLUE device.

00:04:48.100 --> 00:04:52.440
So you can simulate Circuit Playground Express, BBC micro:bit, and the CLUE in here.

00:04:52.440 --> 00:05:02.880
And you just say new project, and it'll actually write the boilerplate code for the main.py or code.py or whatever it's called that the particular device is going to run.

00:05:02.880 --> 00:05:07.980
And like you said, Ines, on one half, it's got the code, and the other half, it has the device that you can interact with.

00:05:07.980 --> 00:05:13.600
I was thinking that a couple of cases that would be great is, like you were saying, trying to get a hold of it.

00:05:13.700 --> 00:05:19.520
But you might not even know if the concept that you're going to use is really going to work for the device you're thinking of.

00:05:19.520 --> 00:05:27.900
So this would be a good way to try it out, to try out whether the thing you're thinking of trying for your house or whatever would actually work for this device.

00:05:27.900 --> 00:05:33.040
The other thing was, yes, you brought up education and that it's big.

00:05:33.040 --> 00:05:40.320
I was thinking about a couple of conferences where they tried to do the display and try to have a camera or something.

00:05:40.320 --> 00:05:40.840
Yes.

00:05:40.840 --> 00:05:43.220
Sometimes it works and sometimes it doesn't.

00:05:43.220 --> 00:05:51.560
This way you could just do a tutorial or in a teaching scenario and everybody could see it because it's just going to be displayed on your monitor.

00:05:51.560 --> 00:05:52.120
Right.

00:05:52.120 --> 00:05:53.940
Your standard screen sharing would totally work here.

00:05:53.940 --> 00:05:54.980
That's a good point as well.

00:05:55.100 --> 00:05:56.800
And it doesn't have to be all or nothing.

00:05:56.800 --> 00:06:01.120
Actually, what's really interesting is this thing isn't just an emulator, but you can do debugging.

00:06:01.120 --> 00:06:07.260
You can set like a breakpoint and like step through it running on the device simulated or you can actually run it.

00:06:07.260 --> 00:06:13.600
If you had a real device plugged in, you can run it on there as well and then do debugging and breakpoints and stuff on the actual device.

00:06:13.600 --> 00:06:14.940
So it's like you tested here.

00:06:14.940 --> 00:06:18.480
I always admire people who actually use like the proper debugging features.

00:06:18.640 --> 00:06:23.980
I know VS Code has like so much of this and I'm always like I should use this more, but I'm like, okay, print.

00:06:23.980 --> 00:06:25.840
Print, print.

00:06:25.840 --> 00:06:26.960
Yeah.

00:06:26.960 --> 00:06:29.420
There's some really cool libraries that will actually do that.

00:06:29.420 --> 00:06:36.800
I can't remember what it's called, but Brian and I recently covered one that would actually like print out a little bit of your code and the variables as they change over time.

00:06:36.800 --> 00:06:39.800
It was like the height of the print debugging world.

00:06:39.800 --> 00:06:40.740
It was really, really cool.

00:06:40.740 --> 00:06:41.540
I wish I could remember.

00:06:41.540 --> 00:06:42.300
Do you remember, Brian?

00:06:42.300 --> 00:06:44.140
No, we actually covered a couple of them.

00:06:44.140 --> 00:06:44.900
I know.

00:06:44.900 --> 00:06:45.460
I know.

00:06:45.460 --> 00:06:47.040
That's a problem.

00:06:47.040 --> 00:06:48.500
We cover thousands of things in here.

00:06:48.500 --> 00:06:52.340
So another thing that's interesting is like, okay, so you see the device.

00:06:52.340 --> 00:06:57.680
Some of them have buttons and they have lights and you can imagine maybe you could touch the button, but they also have things like temperature,

00:06:57.680 --> 00:07:02.980
gyrometer-type things or like you moving it or motion sensing or even like if you shake it,

00:07:02.980 --> 00:07:07.100
this thing has little ways to simulate all that stuff.

00:07:07.100 --> 00:07:13.280
So you can like have a temperature slider that freaks it out and says, hey, the temperature is actually this on your temperature sensor and so on.

00:07:13.280 --> 00:07:16.180
So all the stuff that the devices simulate are available here.

00:07:16.180 --> 00:07:16.700
Oh, that's cool.

00:07:16.820 --> 00:07:16.960
Yeah.

00:07:16.960 --> 00:07:20.360
So I actually had the team over on Talk Python not long ago.

00:07:20.360 --> 00:07:23.680
So people can check that over at talkpython.fm.

00:07:23.680 --> 00:07:27.680
And yeah, I'm also really excited about what you got coming here next, Brian.

00:07:27.680 --> 00:07:28.480
What is that?

00:07:28.480 --> 00:07:28.720
Yeah.

00:07:28.720 --> 00:07:31.880
Well, speaking of, I guess, debugging versus test.

00:07:31.880 --> 00:07:33.460
We didn't really talk about testing.

00:07:33.460 --> 00:07:34.840
Anyway, I'm really excited.

00:07:34.840 --> 00:07:35.800
We should have talked about testing.

00:07:35.920 --> 00:07:36.240
Yeah.

00:07:36.240 --> 00:07:38.760
So I was just, I was thinking it.

00:07:38.760 --> 00:07:47.340
I was thinking that, that I hardly ever use a debugger for my source code, but I use a debugger all the time when I'm debugging my tests.

00:07:47.580 --> 00:07:48.700
I don't know.

00:07:48.700 --> 00:07:50.500
It's just something different about it.

00:07:50.500 --> 00:07:57.380
But I've been running a lot of tests and debugging a lot of tests lately because the release candidate for pytest 6 is out.

00:07:57.380 --> 00:08:05.840
Now, by the time this episode airs, I don't know if the final version will be released or if it'll still be just the release candidate.

00:08:05.960 --> 00:08:15.740
But you can install it, we'll have instructions in the show notes, but essentially you just have to ask for version 6.0.0rc1 and you'll get it.

00:08:15.740 --> 00:08:18.240
So there's a whole bunch of stuff that I'm really excited about.

00:08:18.240 --> 00:08:27.060
There's a lot of configuration that you used to be able to put in lots of places: in your pytest.ini or your setup.cfg or tox.ini or something.

00:08:27.060 --> 00:08:30.360
pytest 6 will support pyproject.toml now.

00:08:30.360 --> 00:08:35.460
So if you jumped on the TOML bandwagon, you can stick your pytest configuration in there too.

00:08:35.660 --> 00:08:38.200
There's a lot of people excited about the type annotations.

00:08:38.200 --> 00:08:41.300
So the 6.0 is going to support type annotations.

00:08:41.300 --> 00:08:43.220
So it actually was a lot of work.

00:08:43.220 --> 00:08:49.780
There was a volunteer that went through and added type annotations to a bunch of it, especially the user facing API.

00:08:49.780 --> 00:09:02.440
And why this is important is if you're type checking, you're running mypy or something over your source and everything, your project, why not include your tests?

00:09:02.440 --> 00:09:06.580
But if pytest doesn't support types, it doesn't really help you much.

00:09:06.580 --> 00:09:08.080
So it will now.

00:09:08.080 --> 00:09:09.960
So that's really, really cool addition.

00:09:09.960 --> 00:09:15.340
So basically, the API of pytest itself is now annotated with types?
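
NOTE
A minimal sketch of what a type-annotated test can look like, so that a checker such as mypy can cover test code as well as source. The helper write_greeting and its test are hypothetical names for illustration; tmp_path is pytest's built-in temporary-directory fixture, which hands the test a pathlib.Path.
```python
from pathlib import Path
#
def write_greeting(directory: Path, name: str) -> Path:
    """Hypothetical helper under test: write a greeting file and return its path."""
    target = directory / "greeting.txt"
    target.write_text(f"Hello, {name}!")
    return target
#
def test_write_greeting(tmp_path: Path) -> None:
    # pytest injects tmp_path; the annotation tells the type checker it is a Path.
    result = write_greeting(tmp_path, "Ines")
    assert result.read_text() == "Hello, Ines!"
```
The annotations do not change how pytest runs the test; they only give the type checker something to verify.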

00:09:15.340 --> 00:09:15.780
Yes.

00:09:15.780 --> 00:09:18.020
And well, a lot of the internal code as well.

00:09:18.180 --> 00:09:20.220
So they actually went through and did a lot.

00:09:20.220 --> 00:09:21.200
There was a lot of work.

00:09:21.200 --> 00:09:27.640
And if you look at the conversation chain, it went on for, it was a several-month project.

00:09:27.640 --> 00:09:28.600
Wow.

00:09:28.600 --> 00:09:30.260
What does that mean for compatibility?

00:09:30.260 --> 00:09:33.380
Does that make pytest like 3.6 only and above?

00:09:33.480 --> 00:09:37.640
I think the modern versions of pytest really already are 3.6 and above.

00:09:37.640 --> 00:09:39.520
I'm not sure about that.

00:09:39.520 --> 00:09:39.720
Right.

00:09:39.720 --> 00:09:42.900
So then the door was open to use that because otherwise it would cut.

00:09:42.900 --> 00:09:49.620
I mean, it would be a weird move to like release a completely new version with Python 2 backwards compatibility.

00:09:50.720 --> 00:09:53.660
Like that's like, you wouldn't do that.

00:09:53.660 --> 00:09:53.880
Right.

00:09:53.880 --> 00:09:57.580
I mean, it's, it's, I think, well, I think the message it sends, it's like not great.

00:09:57.580 --> 00:09:58.380
I totally agree.

00:09:58.380 --> 00:09:58.800
Totally agree.

00:09:58.800 --> 00:10:01.300
There is a pinned version of pytest.

00:10:01.300 --> 00:10:02.540
I don't remember which one it is.

00:10:02.540 --> 00:10:08.640
That still supports 2.7 if you're on it, but no new features are going in there.

00:10:08.640 --> 00:10:13.940
The thing I'm really excited about is a little flag they've added called --no-header.

00:10:13.940 --> 00:10:15.920
So don't use this.

00:10:15.920 --> 00:10:17.260
Most people don't use this.

00:10:17.520 --> 00:10:25.600
When you run pytest, it prints out some stuff like the version of Python, the version of pytest, all the plugins you're using, a bunch of information about it.

00:10:25.600 --> 00:10:28.800
All this stuff is really important for logging.

00:10:28.800 --> 00:10:36.820
If you're capturing the output to save somewhere or do a bug report or something, that information is great to help other people understand it.

00:10:36.820 --> 00:10:46.280
What I don't like about that is that it, it's not helpful if you're writing tutorials or if you're writing code to put on a slide or something.

00:10:46.380 --> 00:10:49.440
All that extra stuff just takes up space and it distracts.

00:10:49.440 --> 00:10:49.640
Yeah.

00:10:49.640 --> 00:10:58.900
Like I've had students say, like, I ran it, I think pytest in PyCharm and it has like some kind of output just stating where it is and what it's doing.

00:10:58.900 --> 00:11:00.140
They're like, this didn't work for me.

00:11:00.140 --> 00:11:02.980
I'm like, well, that was just random output from the tool.

00:11:02.980 --> 00:11:04.780
You're not actually supposed to try to run that part.

00:11:04.780 --> 00:11:05.320
You know what I mean?

00:11:05.320 --> 00:11:07.320
But it's, it's, I mean, I saw why they saw that.

00:11:07.400 --> 00:11:11.720
But at the same time, like the ability to just say like, these details don't matter in the longterm.

00:11:11.720 --> 00:11:12.500
Yeah.

00:11:12.500 --> 00:11:13.620
Yeah.

00:11:13.620 --> 00:11:16.160
So I'm, I'm excited about that to trim it down.

00:11:16.160 --> 00:11:18.020
There was a plugin called pytest-tldr.

00:11:18.020 --> 00:11:19.060
Too long.

00:11:19.060 --> 00:11:19.780
Didn't read.

00:11:20.240 --> 00:11:23.600
But it actually didn't take as much of the header off as I wanted.

00:11:23.600 --> 00:11:28.500
So I had my own tool that would do this, but now I've got this, which is great.

00:11:29.000 --> 00:11:36.240
So a lot of the configuration, there is a chance for human error if you type something wrong and you type a variable name wrong.

00:11:36.240 --> 00:11:43.340
And so I really like a new flag called --strict-config, which will throw an error.

00:11:43.340 --> 00:11:48.520
That is, if the pytest section of your configuration has something that it doesn't recognize.

00:11:48.520 --> 00:11:52.420
And it probably is just, you've misspelled some variable or something.

00:11:52.420 --> 00:11:53.940
Yeah, that's good to know.

00:11:53.940 --> 00:11:57.920
And then not too, I can't remember the version, but it was, I think it was in pytest 5.

00:11:58.020 --> 00:12:00.320
They added some code highlighting stuff that.

00:12:00.320 --> 00:12:01.160
Yeah, that's super cool.

00:12:01.160 --> 00:12:02.640
I discovered that just the other day.

00:12:02.640 --> 00:12:07.800
I like just somehow updated all my dependencies in some environment and suddenly pytest output was colored.

00:12:07.800 --> 00:12:09.960
And I was like, whoa, this is amazing.

00:12:09.960 --> 00:12:10.220
Yeah.

00:12:10.220 --> 00:12:10.600
Yeah.

00:12:10.600 --> 00:12:11.480
The syntax highlighting.

00:12:11.480 --> 00:12:12.180
I love it.

00:12:12.180 --> 00:12:12.480
Nice.

00:12:12.480 --> 00:12:15.080
But there's times where you don't want that, I guess.

00:12:15.080 --> 00:12:15.400
Oh yeah, sure.

00:12:15.400 --> 00:12:15.720
Yeah.

00:12:15.720 --> 00:12:18.020
So there's a new flag to turn it off.

00:12:18.020 --> 00:12:24.580
And then a little tiny detail that I really like is the diff comparisons on pytest are wonderful,

00:12:24.840 --> 00:12:31.020
but apparently they didn't do recursive comparisons of dataclasses and attrs classes, but now they do.

00:12:31.020 --> 00:12:31.860
So that's neat.
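
NOTE
A small sketch of the kind of comparison being described, using nested dataclasses. Address, Person and demo_nested_mismatch are hypothetical names. The two objects differ only in a field of the nested Address, which is the case where a recursive diff helps; the exact failure report depends on the pytest version and is not reproduced here.
```python
from dataclasses import dataclass
#
@dataclass
class Address:
    city: str
    postcode: str
#
@dataclass
class Person:
    name: str
    address: Address
#
def demo_nested_mismatch() -> None:
    expected = Person("Ada", Address("London", "N1"))
    actual = Person("Ada", Address("London", "N2"))
    # Fails on purpose: the only difference is inside the nested Address.
    assert actual == expected
```
The function is not named with a test_ prefix because it fails deliberately; rename it to test_nested_mismatch and run it under pytest to see how the mismatch is reported.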

00:12:31.860 --> 00:12:34.760
There's a whole bunch of new features, there's fixes.

00:12:34.760 --> 00:12:37.340
I ran through some of the features I really liked.

00:12:37.340 --> 00:12:42.800
There are deprecations and it's a large list of breaking changes and deprecations.

00:12:42.800 --> 00:12:44.820
That's why they went to a new number, pytest 6.

00:12:45.180 --> 00:12:50.020
But I went through the whole list and I didn't see anything that was like, oh, that's going to stop me.

00:12:50.020 --> 00:12:51.200
I'm going to have to change something.

00:12:51.200 --> 00:12:51.440
Okay.

00:12:51.440 --> 00:12:52.420
That's good to know.

00:12:52.420 --> 00:12:58.280
Like, I mean, if you say, oh, there was nothing that like we're using, I feel confident that maybe there's nothing in my code either.

00:12:58.280 --> 00:13:02.600
And I knew that somebody was going to ask, is my pytest book still valid?

00:13:02.600 --> 00:13:04.020
Yes, it is.

00:13:04.020 --> 00:13:05.180
I'm going through it right now.

00:13:05.260 --> 00:13:07.100
I haven't gone through the whole thing yet to make sure.

00:13:07.100 --> 00:13:10.220
The side that is not compatible is not the book.

00:13:10.220 --> 00:13:10.960
The book's fine.

00:13:10.960 --> 00:13:14.580
It's, I have a plugin that now is broken.

00:13:14.580 --> 00:13:17.300
So pytest-check still works.

00:13:17.300 --> 00:13:22.160
But if you depend on xfail, pytest, this is, wow, this is a corner case.

00:13:22.160 --> 00:13:28.060
But if you depend on pytest-check and the xfail feature of it, it doesn't work right now.

00:13:28.060 --> 00:13:29.100
So I'll have to fix that.

00:13:29.100 --> 00:13:31.240
So you would say xfail fails temporarily?

00:13:31.240 --> 00:13:32.820
Yeah.

00:13:32.820 --> 00:13:35.120
It actually marks everything as a pass.

00:13:35.120 --> 00:13:36.560
So if you mark xfail.

00:13:36.560 --> 00:13:36.960
Oh, wow.

00:13:36.960 --> 00:13:38.500
That's like xfail-ception.

00:13:38.500 --> 00:13:39.000
Yeah.

00:13:39.000 --> 00:13:40.460
Yeah.

00:13:40.460 --> 00:13:42.780
It's really bad.

00:13:42.780 --> 00:13:44.480
Anyway, I'll have to get back to that.

00:13:44.480 --> 00:13:44.700
Yeah.

00:13:44.700 --> 00:13:46.700
This is really exciting that pytest 6 is out.

00:13:46.700 --> 00:13:47.220
Super cool.

00:13:47.220 --> 00:13:51.300
I know that there were some waves, some uncertainty in the ecosystem.

00:13:51.300 --> 00:13:53.160
So it sounds like that got ironed out.

00:13:53.160 --> 00:13:54.020
Things are going strong.

00:13:54.020 --> 00:13:54.960
New versions coming out.

00:13:54.960 --> 00:14:00.720
I even saw that Guido had tweeted the announcement, retweeted the announcement and said,

00:14:00.720 --> 00:14:02.640
yay, type annotations coming in pytest.

00:14:02.700 --> 00:14:05.320
Of course, he's been all about type annotations these days.

00:14:05.320 --> 00:14:07.260
We'll come back to that later in the show, actually.

00:14:07.260 --> 00:14:10.680
So Ines, I know you work a lot with text, but are you frustrated with it?

00:14:10.680 --> 00:14:11.960
What's the story of this name here?

00:14:11.960 --> 00:14:14.260
Oh, my point of the day.

00:14:14.260 --> 00:14:15.520
Yeah.

00:14:16.140 --> 00:14:17.140
TextAttack.

00:14:17.140 --> 00:14:17.200
TextAttack.

00:14:17.200 --> 00:14:17.220
What is TextAttack?

00:14:17.220 --> 00:14:17.800
What else about it?

00:14:17.800 --> 00:14:20.600
I thought I'd present something for my space, obviously.

00:14:20.600 --> 00:14:21.060
Yeah.

00:14:21.060 --> 00:14:21.500
Awesome.

00:14:21.500 --> 00:14:21.800
Yeah.

00:14:21.800 --> 00:14:24.920
There's this new framework that I came across and it's called TextAttack.

00:14:24.920 --> 00:14:25.180
Yay.

00:14:25.180 --> 00:14:31.160
And it's a framework for adversarial attacks and data augmentation for natural language processing.

00:14:31.160 --> 00:14:33.820
So what are adversarial attacks?

00:14:33.820 --> 00:14:37.220
You've probably, you might've actually seen a lot of examples of it.

00:14:37.320 --> 00:14:45.880
For instance, an image classifier that predicts a cat or some other image, even though you show it complete noise and you somehow trick the model.

00:14:45.880 --> 00:14:52.440
Or you might've seen people at protests wearing like funny shirts or masks to trick facial recognition technology.

00:14:52.440 --> 00:14:57.100
So really to trick the model into, to like, you know, not recognize them.

00:14:57.100 --> 00:15:03.140
Or the famous example of Google Translate suddenly hallucinating these crazy Bible texts.

00:15:03.140 --> 00:15:07.220
If you just put in some complete gibberish, like just gah, gah, gah, gah.

00:15:07.220 --> 00:15:11.400
And then it would go like, the Lord has spoken to like the people, stuff like that.

00:15:11.400 --> 00:15:13.480
That's amazing.

00:15:13.480 --> 00:15:20.620
I include a link to an article by a researcher who explains like why this happened and shows the example.

00:15:20.620 --> 00:15:29.180
But it's, it's pretty fascinating, but I think it all comes down to like the fundamental problem of like, what, how do you understand a model that you train?

00:15:29.180 --> 00:15:32.440
And what does it, you know, what does it mean to understand your model?

00:15:32.440 --> 00:15:37.140
And how does it behave in situations when it suddenly gets to see something that it doesn't expect at all?

00:15:37.140 --> 00:15:38.780
Like gah, gah, gah, what does it do?

00:15:38.780 --> 00:15:42.980
And the thing with neural network models is you can't just look at the weights.

00:15:42.980 --> 00:15:43.800
They're not linear.

00:15:43.800 --> 00:15:47.200
They're like, you know, you can't just look at what your model is.

00:15:47.200 --> 00:15:48.540
You have to actually run it.

00:15:48.540 --> 00:16:03.560
And so the library, TextAttack, lets you actually try out different types of attacks from the academic literature and different types of inputs that you can give a model to see whether it produces something that you're not happy with.

00:16:03.620 --> 00:16:07.320
Or that's like really weird and exposes some problems in your model.

00:16:07.840 --> 00:16:10.760
And it also lets you then, because normally what's the goal?

00:16:10.760 --> 00:16:22.160
The goal is, well, you do that and then you find out, oh damn, like if I suddenly feed it this complete nonsense or if I feed it Spanish text, it like goes completely in the wrong direction and suddenly predicts stuff that's not there.

00:16:22.220 --> 00:16:27.980
And if you, you know, if you deployed that model into like a context where it's actually used, that would be pretty terrible.

00:16:28.320 --> 00:16:30.400
And, you know, there are much worse things that can be happening.

00:16:30.400 --> 00:16:36.560
So you can also create more robust training data by, like, replacing words with synonyms.

00:16:36.560 --> 00:16:41.140
You can swap out characters and just, you know, see how the model does.
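
NOTE
A rough sketch of the character-swapping idea in plain Python. This is not TextAttack's API; swap_adjacent_chars and augment are hypothetical helpers that only illustrate how perturbed copies of a sentence could be generated and then fed to a model to see how it copes.
```python
import random
#
def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Return a copy of text with one random pair of neighbouring characters swapped."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
#
def augment(text: str, n_variants: int, seed: int = 0) -> list[str]:
    """Build n_variants perturbed copies of text; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [swap_adjacent_chars(text, rng) for _ in range(n_variants)]
#
if __name__ == "__main__":
    for variant in augment("this movie was great", 3):
        print(variant)
```
Each variant keeps the same characters as the original sentence with one neighbouring pair transposed, a typo-like change that a robust classifier should ideally tolerate.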

00:16:41.140 --> 00:16:42.260
So I thought that was very cool.

00:16:42.260 --> 00:16:46.640
And yeah, I thought in general, I think adversarial attacks, it's a pretty interesting topic.

00:16:46.640 --> 00:16:47.540
And yeah.

00:16:47.540 --> 00:16:49.000
Yeah, it's super interesting.

00:16:49.000 --> 00:16:54.280
So the idea is basically you've trained up a model on some text and for what you've given it, it's probably working.

00:16:54.280 --> 00:17:00.780
But if you give it something you weren't expecting, you want to try that to make sure that it doesn't go insane.

00:17:00.780 --> 00:17:02.080
Yeah, exactly.

00:17:02.080 --> 00:17:05.880
And it can do, it can expose very unexpected things like the Bible text, for example.

00:17:05.880 --> 00:17:08.700
That sounds really bizarre when you like first hear it.

00:17:08.700 --> 00:17:13.980
But one explanation for that would be that, well, especially it happens in low resource languages where, you know,

00:17:13.980 --> 00:17:18.200
we don't have much text and especially not much text translated into other languages.

00:17:18.200 --> 00:17:23.480
But there's one type of text that has a lot of translations available and that's the Bible.

00:17:23.480 --> 00:17:30.160
And so they're parallel corpora where you have one text, one line in English, one line in Somali, for example.

00:17:30.160 --> 00:17:32.320
And then people train their models on that.

00:17:32.320 --> 00:17:40.140
But one thing that also is very specific about Bible text is that some Bible text has some words that like really only occur in a Bible text.

00:17:40.140 --> 00:17:42.460
But it uses some really weird words.

00:17:42.460 --> 00:17:49.380
So what your model might be learning is if I come across a super unexpected word that's really, really rare, that must be Bible.

00:17:49.780 --> 00:17:53.820
And also, also the objective is you want your model to output a reasonable sentence.

00:17:53.820 --> 00:17:59.500
So the model's like, well, okay, you know, if that's the rare word, then the next word needs to be something that matches.

00:17:59.500 --> 00:18:03.960
And then you have like this bizarre sentence from the Bible, even though you typed in ga ga ga.

00:18:03.960 --> 00:18:05.440
And that happens.

00:18:05.440 --> 00:18:06.040
Yeah, how funny.

00:18:06.360 --> 00:18:06.480
Yeah.

00:18:06.480 --> 00:18:07.600
Yeah.

00:18:07.600 --> 00:18:15.240
So it looks like they have actually a bunch of trained models already at the TextAttack Model Zoo, they call it, I guess.

00:18:15.240 --> 00:18:15.980
Yeah.

00:18:15.980 --> 00:18:16.200
Yeah.

00:18:16.200 --> 00:18:17.200
Everything's called the model zoo.

00:18:17.200 --> 00:18:18.540
Yeah.

00:18:18.540 --> 00:18:28.640
And so you can just take these and run it against it, like the movie reviews from Rotten Tomatoes or IMDb or the news set or Yelp.

00:18:28.880 --> 00:18:32.040
And just give it that kind of data and see how it comes out, right?

00:18:32.040 --> 00:18:32.380
Exactly.

00:18:32.380 --> 00:18:32.620
Yeah.

00:18:32.620 --> 00:18:33.800
I think that's pretty cool.

00:18:33.800 --> 00:18:45.980
And yeah, and then you can actually, you can also generate your own data or load in your data and generate data that maybe, you know, produces a better model or like covers things that your model previously couldn't handle at all.

00:18:45.980 --> 00:18:47.660
So that's the data augmentation part.

00:18:47.660 --> 00:18:48.900
Yeah, that's all very important.

00:18:48.900 --> 00:18:59.340
And I think it's also very important to understand the models that we train and, you know, really try them out and think about like, what do they do and how are they going to behave in like a real world scenario that we care about?

00:18:59.340 --> 00:19:00.980
Because, yeah, the consequences.

00:19:00.980 --> 00:19:01.080
Right.

00:19:01.080 --> 00:19:03.540
Because as soon as you're making decisions on this data, right?

00:19:03.540 --> 00:19:03.560
Yes, of course.

00:19:03.560 --> 00:19:04.240
On these models.

00:19:04.240 --> 00:19:04.620
Yeah.

00:19:04.620 --> 00:19:15.440
I guess as soon as a human is convinced that the model works and they start making decisions on it, right, that could go bad if the situation changes or the type of data.

00:19:15.440 --> 00:19:26.720
And especially if the model is bad, like I'm always saying, like, well, people are always scared of these dystopian futures where like we have AI that can, I don't know, know anything about us and predict anything and works.

00:19:26.720 --> 00:19:34.180
But the real dystopia is if we have models that kind of don't work and are really shit, but people believe that they work.

00:19:34.180 --> 00:19:35.300
That's much more.

00:19:35.300 --> 00:19:37.140
It's not even about whether they work.

00:19:37.140 --> 00:19:38.540
It's about whether people believe it.

00:19:38.540 --> 00:19:40.500
And then, you know, that's where it gets really bad.

00:19:40.500 --> 00:19:41.420
And yeah.

00:19:41.420 --> 00:19:41.700
Yeah.

00:19:41.700 --> 00:19:42.200
Yeah.

00:19:42.200 --> 00:19:43.360
And that's way more likely.

00:19:43.360 --> 00:19:44.200
Yeah.

00:19:44.200 --> 00:19:45.080
Yes.

00:19:45.080 --> 00:19:50.200
It's a more difficult world to test this sort of stuff to figure out.

00:19:50.200 --> 00:19:52.180
What does it mean for a model to be bad?

00:19:52.180 --> 00:19:53.340
How do you tell if it's bad?

00:19:53.340 --> 00:20:06.580
And models can be both working with some data sets and produce gibberish with or, yeah, I guess in this case, the reverse, not produce gibberish if you pass in gibberish.

00:20:07.100 --> 00:20:07.460
Yeah.

00:20:07.460 --> 00:20:08.380
Actually, yeah.

00:20:08.380 --> 00:20:12.480
I just realized it ties in very well with the pytest point earlier and just like, yep.

00:20:12.480 --> 00:20:15.600
Machine learning is quite special in a way that it's code plus data.

00:20:15.600 --> 00:20:18.960
Code, you can test, you can have a function and you're like, yay, that comes in.

00:20:18.960 --> 00:20:20.380
That's what I expect out.

00:20:20.380 --> 00:20:20.780
Easy.

00:20:20.780 --> 00:20:21.800
Write a test for it.

00:20:21.800 --> 00:20:23.140
You know, it's not that easy.

00:20:23.140 --> 00:20:25.840
Testing is hard, but like fundamentally, yeah.

00:20:25.900 --> 00:20:27.440
It's somewhat deterministic.

00:20:27.440 --> 00:20:28.520
Yeah.

00:20:28.520 --> 00:20:28.840
Right.

00:20:28.840 --> 00:20:33.980
And even if it's not, there's like something you can, you know, test around it and it's much harder with the model.
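
NOTE
A minimal sketch of the "code you can test" half of that point, assuming a made-up helper called normalize_review (it is not from any library mentioned in this episode). A plain function maps one input to one output, so a pytest-style test can pin it down with a single assert. A trained model has no such fixed answer to compare against.
```python
def normalize_review(text: str) -> str:
    """Lowercase a review and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(text.lower().split())
def test_normalize_review() -> None:
    # Deterministic code: the same input always yields the same output.
    assert normalize_review("  Great   MOVIE ") == "great movie"
test_normalize_review()
```
pytest would collect test_normalize_review on its own. The direct call only lets the snippet run as a plain script.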

00:20:34.100 --> 00:20:34.260
Yeah.

00:20:34.260 --> 00:20:35.820
Yeah, for sure.

00:20:35.820 --> 00:20:36.400
All right.

00:20:36.400 --> 00:20:41.760
Before we get to the next item, just want to let you know this episode is brought to you all by us.

00:20:41.760 --> 00:20:44.600
Over at Talk Python Training, we have a bunch of courses.

00:20:44.600 --> 00:20:45.740
You can check them out.

00:20:45.740 --> 00:20:49.740
And we're actually featured in the Humble Bundle that's running the Python Humble Bundle right now.

00:20:49.740 --> 00:20:59.280
So if you go to talkpython.fm/humble2020, you can get $1,400 worth of Python training tools and whatnot for 25 bucks.

00:20:59.280 --> 00:21:00.840
So that's a pretty decent deal.

00:21:01.580 --> 00:21:03.560
And Brian, you mentioned your book before.

00:21:03.560 --> 00:21:04.720
Tell people about your book real quick.

00:21:04.720 --> 00:21:04.980
Yeah.

00:21:04.980 --> 00:21:11.080
So Python Testing with pytest is a book I wrote and it's still very valid, even though it was written a few years ago.

00:21:11.080 --> 00:21:17.600
The intent was the 80% of pytest that you will always need to know for any version of pytest.

00:21:17.600 --> 00:21:25.040
And I've had a lot of feedback from people saying a weekend of skimming this makes it so that they understand how to test.

00:21:25.040 --> 00:21:26.500
It's a weekend worthwhile.

00:21:26.500 --> 00:21:27.180
Yeah, absolutely.

00:21:27.180 --> 00:21:30.560
And Ines, you want to talk a little bit about Explosion just to let people know?

00:21:30.700 --> 00:21:30.880
Yeah.

00:21:30.880 --> 00:21:40.380
So, I mean, some of you who are listening to this might know me from my work on spaCy, which is an open source library for NLP in Python, which I'm one of the core developers of.

00:21:40.380 --> 00:21:42.760
And yeah, that's all free open source.

00:21:42.760 --> 00:21:51.140
And we're actually just working on the nightly version or the pre-release of spaCy 3, which is going to have a lot of exciting features.

00:21:51.140 --> 00:21:54.260
I might also mention a few more things later on.

00:21:54.940 --> 00:22:00.660
And yeah, so maybe that's already going to be out by the time this podcast officially comes out.

00:22:00.660 --> 00:22:01.120
Maybe not.

00:22:01.120 --> 00:22:02.380
I don't want to overpromise.

00:22:02.380 --> 00:22:04.560
But yeah, you can definitely try that out.

00:22:04.700 --> 00:22:13.340
And we also recently released a new version of our annotation tool, Prodigy, which comes with a lot of new features for annotating relations, audio, video.

00:22:13.340 --> 00:22:22.180
And the idea here is, well, once you get serious about training your own models, you usually want to create your own data sets for your very specific problems that solve your problems.

00:22:22.240 --> 00:22:24.840
But often the first idea you have might not be the best one.

00:22:24.840 --> 00:22:25.940
It's a continuous process.

00:22:25.940 --> 00:22:27.100
You want to develop your data.

00:22:27.100 --> 00:22:35.220
And Prodigy was really designed as a developer tool that lets you create your own data sets with a web app, a Python backend.

00:22:35.220 --> 00:22:36.040
You can script.

00:22:36.040 --> 00:22:37.200
That's our commercial tool.

00:22:37.200 --> 00:22:38.020
That's how we make money.

00:22:38.020 --> 00:22:42.180
And it's very cool to see a growing community around this.

00:22:42.180 --> 00:22:43.200
So yeah, that's what we're doing.

00:22:43.200 --> 00:22:45.480
We have some more cool stuff planned for the future.

00:22:45.480 --> 00:22:46.380
So stay tuned.

00:22:46.380 --> 00:22:48.020
Yeah, people should check it out.

00:22:48.020 --> 00:22:53.300
Actually, you and I talked on Talk Python 202 about building a software business and entrepreneurship.

00:22:53.300 --> 00:22:54.400
You had a bunch of great advice.

00:22:54.400 --> 00:22:55.920
So people might want to check that out as well.

00:22:55.920 --> 00:22:58.400
Do you actually know these episode numbers by heart?

00:22:58.400 --> 00:22:59.400
Or did you look that up before?

00:22:59.400 --> 00:23:01.960
Some of them I know, but that one I used the search.

00:23:01.960 --> 00:23:02.700
Okay.

00:23:02.700 --> 00:23:03.900
I remember you were on there.

00:23:03.900 --> 00:23:05.700
I remember what it was about, but not the number.

00:23:05.700 --> 00:23:08.400
I just put together that I know two people from Explosion.

00:23:08.400 --> 00:23:09.520
So that's interesting.

00:23:09.520 --> 00:23:10.720
Yeah, it's Sebastian.

00:23:10.720 --> 00:23:11.880
Sebastian.

00:23:11.880 --> 00:23:15.800
Yeah, he was on your podcast recently, which I feel really bad.

00:23:15.800 --> 00:23:20.380
I wanted to listen to this because he advertised it with like, it will tell the true story

00:23:20.380 --> 00:23:22.880
behind his mustache, which I really wanted to know.

00:23:22.880 --> 00:23:25.360
But then I was like, I'll need to listen to this on the weekend.

00:23:25.360 --> 00:23:25.900
And I forgot.

00:23:25.900 --> 00:23:27.540
So yeah, if he's listening, I'm sorry.

00:23:27.540 --> 00:23:29.260
I will definitely, I need to know this.

00:23:29.260 --> 00:23:29.900
So I will listen.

00:23:29.900 --> 00:23:30.660
Excellent.

00:23:30.660 --> 00:23:31.960
So don't spoil it.

00:23:31.960 --> 00:23:34.740
He does great work on FastAPI.

00:23:34.740 --> 00:23:35.300
All right.

00:23:35.300 --> 00:23:39.760
Speaking of people that have been on all the podcasts as well, Brett Cannon, he recently

00:23:39.760 --> 00:23:45.080
wrote an interesting article called, What is the Core of the Python Programming Language?

00:23:45.080 --> 00:23:52.780
And he's legitimately asking as a core developer, what is not the maybe lowest level, but what

00:23:52.780 --> 00:23:55.840
is the essence, I guess, is maybe the way to think about it.

00:23:55.840 --> 00:23:56.140
Oh, wow.

00:23:56.140 --> 00:23:58.800
I only just got the core, core pun.

00:23:58.800 --> 00:24:01.360
Like it did not occur to me when I first read the article.

00:24:01.360 --> 00:24:03.360
I'm really, I feel really embarrassed now.

00:24:03.360 --> 00:24:06.940
To be fair, English is not my first language, but still, it's not about that.

00:24:06.940 --> 00:24:09.460
Anyway, sorry for interrupting.

00:24:09.460 --> 00:24:12.880
When I first read it, I was thinking like, okay, we're going to talk about what is the

00:24:12.880 --> 00:24:14.200
lowest level.

00:24:14.200 --> 00:24:18.440
And yeah, okay, it's probably C and ceval.h, ceval.c and so on.

00:24:18.440 --> 00:24:22.700
But really the thing is, Brett has been thinking a lot about WebAssembly.

00:24:23.240 --> 00:24:26.800
And what does that mean for Python in the broad sense?

00:24:26.800 --> 00:24:28.800
He and I talked about it on Talk Python.

00:24:28.800 --> 00:24:34.140
I think at the very last PyCon event, we did a live conversation there about that.

00:24:34.940 --> 00:24:42.320
And it's important because there's a few areas where Python is not the first choice, maybe

00:24:42.320 --> 00:24:47.540
not the second choice, sometimes not even the 10th choice of what you might use to program

00:24:47.540 --> 00:24:54.540
some very important things like maybe mobile, maybe the web, the front end part of the web,

00:24:54.540 --> 00:24:55.600
importantly, I mean.

00:24:55.600 --> 00:25:02.260
So there's a few really important parts of technology where Python doesn't have much reach, but all

00:25:02.260 --> 00:25:05.780
of those areas support WebAssembly these days, right?

00:25:05.780 --> 00:25:09.240
And if you have something in C, you can compile it to WebAssembly.

00:25:09.240 --> 00:25:16.420
So there's some thought about like, well, what could we do potentially to make a WebAssembly

00:25:16.420 --> 00:25:24.380
runtime for Python so that Python magically almost instantly gets access to what was just JavaScript

00:25:24.380 --> 00:25:32.100
front end frameworks space and what is mobile, iOS and Android and all those things allow you

00:25:32.100 --> 00:25:34.200
to directly run JavaScript as part of your app?

00:25:34.200 --> 00:25:36.200
So how would we make that happen?

00:25:36.200 --> 00:25:38.540
So it's pretty important, right?

00:25:38.540 --> 00:25:42.800
If we could solve that problem, like Python is already so popular and its growth is so incredible.

00:25:42.800 --> 00:25:47.020
Like what if we could say, oh, yeah, and now it's an important language on mobile and it's

00:25:47.020 --> 00:25:51.520
an important front end language framework like that would just take it to the next level or

00:25:51.520 --> 00:25:53.240
maybe a couple levels up if you do them both.

00:25:53.240 --> 00:25:57.940
And WebAssembly seems to be one of the keys to kind of bridge that gap, right?

00:25:57.940 --> 00:26:04.560
So Brett talks about in this article how for so long we've just had CPython is what we

00:26:04.560 --> 00:26:05.600
think of when we have Python.

00:26:05.600 --> 00:26:14.260
Sometimes people use PyPy, P-Y-P-Y, as a partially JIT compiled version, sometimes faster version

00:26:14.260 --> 00:26:19.880
of Python, but not always because the way it interacts with C, libraries that you might be

00:26:19.880 --> 00:26:21.620
using through packages and so on.

00:26:22.040 --> 00:26:26.940
And really, it's a lot of Python's dynamic nature makes it hard to do outside of an interpreter

00:26:26.940 --> 00:26:31.280
where, to be clear, WebAssembly is a compiled language, right?

00:26:31.280 --> 00:26:34.160
So if you're going to put it over there, maybe it's going to require it to be compiled.

00:26:34.160 --> 00:26:38.060
So this is a really interesting thing to go through and read and think about with Brett.

00:26:38.060 --> 00:26:42.480
He talks about things like, well, how much of the Python language would you have to implement

00:26:42.480 --> 00:26:45.160
and still consider it to be valid Python?

00:26:45.160 --> 00:26:49.040
Like we talked about MicroPython and usually don't people look at, they don't look at that

00:26:49.040 --> 00:26:50.000
and go, that's not Python.

00:26:50.160 --> 00:26:51.000
That's fake, right?

00:26:51.000 --> 00:26:53.660
No, like it's Python, but it's not as much Python, right?

00:26:53.660 --> 00:26:58.640
You don't have the same, all the APIs on MicroPython as you do on regular Python.

00:26:58.640 --> 00:27:01.700
So questions like, do you still need a REPL?

00:27:01.700 --> 00:27:04.160
Could you live without locals, right?

00:27:04.160 --> 00:27:06.960
The ability to ask what the local variables are and so on.

00:27:06.960 --> 00:27:11.060
So he said he didn't really have a great bunch of, a great answer.

00:27:11.060 --> 00:27:13.940
It's more of a philosophical, like we need to solve this.

00:27:13.940 --> 00:27:16.480
But I do want to share some of my thoughts on this.

00:27:16.480 --> 00:27:22.640
And I feel like maybe what we could do is we could come up with like a standard Python

00:27:22.640 --> 00:27:27.060
language definition that is a subset of full Python, right?

00:27:27.060 --> 00:27:28.200
Here's the essence.

00:27:28.200 --> 00:27:30.060
Like, okay, we have to be able to create classes.

00:27:30.060 --> 00:27:31.180
We have to be able to create functions.

00:27:31.180 --> 00:27:32.360
You have to define strings.

00:27:32.360 --> 00:27:33.820
Probably you want type annotations.

00:27:33.820 --> 00:27:35.840
But do you need eval?

00:27:35.840 --> 00:27:37.600
Maybe, maybe not.

00:27:37.600 --> 00:27:38.120
Right?
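
NOTE
For listeners who have not used them, here is a small sketch of the two dynamic features in question, locals() and eval(). The function and variable names are made up for illustration. Both need the runtime to keep names and a compiler around while the program runs, which is part of why they are hard to support in an ahead-of-time compiled target.
```python
def describe(width: int, height: int) -> dict:
    area = width * height
    # locals() reports the names currently bound in this function's scope.
    return dict(locals())
snapshot = describe(2, 3)
print(sorted(snapshot))
# eval() compiles and runs an expression supplied as a string at runtime.
print(eval("width * height", {}, snapshot))
```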

00:27:38.220 --> 00:27:49.020
So like that, if you could have a subset of the language that was smaller, as well as the standard library, because do you really need to like parse CSS hex colors?

00:27:49.020 --> 00:27:50.200
Everywhere?

00:27:50.200 --> 00:27:51.000
Probably not.

00:27:51.000 --> 00:27:54.380
It's a very underused part of the library, but it's in there.

00:27:54.380 --> 00:27:55.020
Right?

00:27:55.020 --> 00:27:59.020
So if we could narrow it down, maybe it would be easier to think about how does it go to WebAssembly?

00:27:59.160 --> 00:28:03.180
How does it go to like some kind of JavaScript runtime or something like that?

00:28:03.180 --> 00:28:05.960
And if it sounds crazy, you know, the .NET people did this.

00:28:05.960 --> 00:28:09.180
They have a .NET standard class library language.

00:28:09.180 --> 00:28:10.980
They got it running on WebAssembly.

00:28:10.980 --> 00:28:15.380
So it's, there's an example of it out there and something that's kind of sort of similar.

00:28:15.380 --> 00:28:15.960
Right?

00:28:16.060 --> 00:28:21.420
So I think this would just open stuff up if you could get Python in these places.

00:28:21.420 --> 00:28:21.980
What do you guys think?

00:28:21.980 --> 00:28:33.340
Initially, I was never so sold on WebAssembly until, and especially WebAssembly and Python until I watched Dave Beazley live code a compiler at PyCon India, I think it was.

00:28:33.340 --> 00:28:35.420
And I was like, oh, this is kind of, this is kind of fun.

00:28:35.420 --> 00:28:40.120
And I mean, it was just also fun to watch Dave Beazley live code a compiler.

00:28:40.120 --> 00:28:40.920
Yeah, for sure.

00:28:40.920 --> 00:28:42.280
Classic.

00:28:42.280 --> 00:28:44.660
But so that did get me thinking.

00:28:45.180 --> 00:28:53.660
I do think one question I think we should ask ourselves is like, well, do we really, do we really need Python to do all of the things in the browser?

00:28:53.660 --> 00:28:58.040
Like, is this really, does this really have a benefit that like actually makes a difference?

00:28:58.040 --> 00:29:03.900
A, B, there are a lot of things people use Python for that just wouldn't work in that way.

00:29:03.900 --> 00:29:07.440
And that's also, I think, part of what makes Python so popular in the first place.

00:29:07.440 --> 00:29:10.520
Like, for instance, you know, all the interactive computing environments.

00:29:10.520 --> 00:29:14.000
That's why people want to use Python for data science.

00:29:14.300 --> 00:29:17.240
Yeah, IPython, Jupyter Notebooks, that sort of stuff.

00:29:17.240 --> 00:29:21.200
That's why, you know, Python as a dynamic language made so much sense to people.

00:29:21.200 --> 00:29:22.500
And that's what made it popular.

00:29:23.240 --> 00:29:26.520
And large scale processing, like a lot of the type of stuff we're working on.

00:29:26.520 --> 00:29:37.480
It's like, yeah, there's stuff that you can run in the browser, but it's never going to be viable to run large scale information extraction in the browser because you want to run that on a machine for like a few hours.

00:29:37.820 --> 00:29:43.800
But I think there are a lot of opportunities also in the machine learning space for privacy preserving technologies that already exist.

00:29:43.800 --> 00:29:53.920
I think from what I understand, Mozilla is working on some features built into the browser, where, you know, you can have models predicting things without it being sent to someone's server.

00:29:53.920 --> 00:29:55.600
And I think that's obviously very powerful.

00:29:55.600 --> 00:29:57.100
That's an interesting idea.

00:29:57.100 --> 00:29:57.580
Right.

00:29:57.580 --> 00:29:57.920
Yeah.

00:29:57.920 --> 00:29:58.200
Yeah.

00:29:58.200 --> 00:30:00.740
Because if you could have a little bit of machine learning.

00:30:00.740 --> 00:30:01.120
Yeah.

00:30:01.120 --> 00:30:03.860
But you don't have to give up the data privacy aspect of it.

00:30:03.860 --> 00:30:04.440
That's pretty cool.

00:30:04.440 --> 00:30:04.680
Yeah.

00:30:04.680 --> 00:30:07.860
So I think for that, there's a lot of potential here for running Python in a browser.

00:30:07.860 --> 00:30:08.160
Yeah.

00:30:08.160 --> 00:30:13.740
Well, we start getting used to saying what is Python is what is the CPython implementation.

00:30:13.740 --> 00:30:19.980
And we got to remember CPython is the reference implementation for the language spec.

00:30:19.980 --> 00:30:30.200
And I think, I guess we're kind of getting at maybe we need to split it up and have a, like a core language spec and an extended one or something.

00:30:30.200 --> 00:30:30.680
I don't know.

00:30:30.680 --> 00:30:32.580
The, where would you divide the line?

00:30:32.580 --> 00:30:36.820
Because we've seen, like you said, we've seen things like CircuitPython and, and other things.

00:30:36.820 --> 00:30:43.500
And we've actually talked about several smaller languages based on Python that just try to be the same syntax.

00:30:43.500 --> 00:30:48.080
But at which point is it, when is it not Python anymore?

00:30:48.080 --> 00:30:50.180
And there's at least some of the stuff.

00:30:50.180 --> 00:30:56.480
Like I could totally see having a distribution of Python that doesn't have a REPL still count.

00:30:56.480 --> 00:31:00.400
I could totally see not having IDLE, for instance.

00:31:00.580 --> 00:31:02.980
If something doesn't ship with IDLE, is it still Python?

00:31:02.980 --> 00:31:04.200
I think so.

00:31:04.200 --> 00:31:09.580
And because of IDLE, then you need Tkinter and, or you need Tk stuff in there.

00:31:09.580 --> 00:31:14.800
And there's a lot of stuff that maybe I would be in like, you know, could you live without locals?

00:31:14.800 --> 00:31:16.580
Most of the time, probably.

00:31:16.580 --> 00:31:23.180
I actually think this would be since the web and since mobile is so, such a big part of our lives.

00:31:23.180 --> 00:31:24.180
And it will be for a while.

00:31:24.180 --> 00:31:29.620
This might be a decent dividing line to say whether or not it's for WebAssembly or not.

00:31:29.620 --> 00:31:35.480
Maybe we should split the division at whatever we need to implement a WebAssembly version of Python.

00:31:35.480 --> 00:31:41.260
And anything above that line is an extended version of Python or something.

00:31:41.340 --> 00:31:41.460
Yeah.

00:31:41.460 --> 00:31:42.780
Yeah, that's a good point.

00:31:42.780 --> 00:31:43.700
All right.

00:31:43.700 --> 00:31:47.160
I don't want to go too long at this section because I want to make sure we get the others.

00:31:47.160 --> 00:31:48.660
But I do want to leave you with just some thoughts.

00:31:48.660 --> 00:31:53.980
What if shipping Python was just shipping a single binary and a thing that ran it?

00:31:53.980 --> 00:31:55.540
You could do that with WebAssembly.

00:31:55.980 --> 00:31:58.640
Maybe two WebAssemblies, the runtime plus the code.

00:31:58.640 --> 00:32:05.300
What if all the browsers had capability to plug in alternate runtimes through WebAssembly?

00:32:05.300 --> 00:32:06.840
So right now you have a JavaScript engine.

00:32:06.840 --> 00:32:24.060
But what if, like, say, Firefox and Edge and whatnot came up with a way to say, here's a WebAssembly API to plug in alternate runtimes, Python, Ruby, .NET, Java, you name it, and then shipped with the latest version of each of those runtimes.

00:32:24.060 --> 00:32:25.280
So you just don't have to download.

00:32:25.280 --> 00:32:32.140
Like, the big problem now is you can do it, but you've still got to download, like, 10 megs per page, which is not a good idea.

00:32:32.140 --> 00:32:37.680
So anyway, I think there's a ton of interesting things that open up if this were possible.

00:32:37.680 --> 00:32:40.820
So I'm glad Brett's still on this, and hopefully he keeps thinking about it.

00:32:40.820 --> 00:32:43.440
Brian, I still need to learn pathlib.

00:32:43.440 --> 00:32:43.940
Really?

00:32:43.940 --> 00:32:44.860
You got any ideas on how I do that?

00:32:44.860 --> 00:32:44.980
Really?

00:32:44.980 --> 00:32:46.440
You're not using pathlib?

00:32:46.440 --> 00:32:51.500
I'm just stuck in the os.path world.

00:32:51.500 --> 00:32:53.160
I just really need to get with the times.

00:32:53.160 --> 00:32:54.040
Help me out here.

00:32:54.040 --> 00:32:54.280
Okay.

00:32:54.280 --> 00:32:55.260
So pathlib is where...

00:32:55.260 --> 00:32:56.260
I mean, I know the value.

00:32:56.260 --> 00:32:58.920
Yeah, you're like some kind of animal, like os.path.

00:32:58.920 --> 00:33:04.400
So I mean no offense to os.path.

00:33:04.400 --> 00:33:05.320
But, you know.

00:33:05.320 --> 00:33:07.360
No, I really love pathlib a lot.

00:33:07.840 --> 00:33:09.060
But there is...

00:33:09.060 --> 00:33:13.860
I got to tell you that the documentation for pathlib doesn't cut it as an introduction.

00:33:13.860 --> 00:33:17.440
You can find what you're looking for, but only if you know what you're looking for.

00:33:17.440 --> 00:33:19.120
But I agree with Chris May.

00:33:19.120 --> 00:33:22.420
So Chris May wrote a post called Getting Started with Pathlib.

00:33:23.060 --> 00:33:24.280
I guess it's kind of...

00:33:24.280 --> 00:33:28.980
He's got a little PDF field guide that you can download, but he has a little bit of a blog

00:33:28.980 --> 00:33:30.300
post introducing it.

00:33:30.300 --> 00:33:31.740
But I downloaded it.

00:33:31.740 --> 00:33:32.860
It's like nine or ten pages.

00:33:32.860 --> 00:33:36.140
And it's actually a really good introduction to pathlib.

00:33:36.140 --> 00:33:37.120
So I really like it.

00:33:37.340 --> 00:33:42.400
The big thing with os.path versus pathlib is pathlib creates path objects.

00:33:42.400 --> 00:33:45.920
So there's a class that represents a path that you have methods on.

00:33:45.920 --> 00:33:49.400
And it makes it different for when you're dealing with this.

00:33:49.400 --> 00:33:51.960
With os.path, it's just strings.

00:33:51.960 --> 00:33:54.880
So it's manipulating strings that represent paths.

00:33:54.880 --> 00:33:56.380
So the object's different.

00:33:56.380 --> 00:33:57.100
I like it.

00:33:57.100 --> 00:34:03.580
Actually, I switched just for the ability to build up paths with just the slash operator.
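
NOTE
A short side-by-side of the two styles being compared, with made-up directory and file names. os.path.join returns a plain string, while pathlib builds a Path object and overloads the / operator to join segments.
```python
import os.path
from pathlib import Path
# os.path works on plain strings.
joined = os.path.join("project", "data", "reviews.csv")
# pathlib builds a path object; the / operator appends path segments.
built = Path("project") / "data" / "reviews.csv"
print(joined, type(joined).__name__)
print(built, type(built).__name__)
```
Both spell the same location, and str(built) gives back a string for APIs that still want one.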

00:34:03.580 --> 00:34:06.040
Yeah, it's really interesting how they've overridden division.

00:34:06.340 --> 00:34:09.600
But I think it's a good example of where this makes sense.

00:34:09.600 --> 00:34:10.980
It's a reasonable use case.

00:34:10.980 --> 00:34:11.820
It looks good.

00:34:11.820 --> 00:34:12.580
It's defensible.

00:34:12.580 --> 00:34:16.320
There are other cases where you're like, oh, did you really have to overload these operators?

00:34:16.320 --> 00:34:17.900
But they're fine.

00:34:17.900 --> 00:34:19.060
I think that's very valid.

00:34:19.060 --> 00:34:19.440
Yeah.

00:34:19.440 --> 00:34:23.460
And things like how do you find parts of a path?

00:34:23.460 --> 00:34:27.320
When you have to parse paths, that's where pathlib really shines for me.

00:34:27.320 --> 00:34:31.940
So if you want to find the parent of something or the parent of the second level parent,

00:34:31.940 --> 00:34:34.080
there's ways to do that in pathlib.

00:34:34.380 --> 00:34:38.460
And in os.path, you're stuck with trying to split things and stuff.

00:34:38.460 --> 00:34:39.760
And it's gross.

00:34:39.760 --> 00:34:41.780
I mean, there are operations to do it.

00:34:41.780 --> 00:34:47.740
But it's very good to have this relative, I don't know, just all these operators, like parent.

00:34:47.740 --> 00:34:54.460
And then one of the things that it took me a while to figure out was I was used to trying to find the absolute path of something.

00:34:55.000 --> 00:34:58.800
And in pathlib, finding the absolute path is the resolve method.

00:34:58.800 --> 00:35:02.280
So you say resolve and it finds the absolute path for you.

00:35:02.280 --> 00:35:04.340
You can find the current working directory.

00:35:04.340 --> 00:35:05.820
You can go up and down folders.

00:35:05.820 --> 00:35:07.020
You can use globs.

00:35:07.020 --> 00:35:09.600
You can find parts of path names and stuff.
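
NOTE
The operations just listed, sketched with a hypothetical src/app/main.py layout. None of these files has to exist for the snippet to run. Worth knowing: resolve() makes the path absolute and also collapses ".." segments and follows symlinks.
```python
from pathlib import Path
here = Path.cwd()  # current working directory
script = here / "src" / "app" / "main.py"  # hypothetical file, need not exist
print(script.parent)  # the containing folder, .../src/app
print(script.parents[1])  # one level further up, .../src
print(script.name, script.stem, script.suffix)  # parts of the file name
resolved = Path("src/../README.md").resolve()  # absolute, ".." collapsed
print(resolved)
matches = sorted(here.glob("*.py"))  # Python files directly in the cwd
print(matches)
```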

00:35:10.020 --> 00:35:12.200
And it's just a really comfortable thing.

00:35:12.200 --> 00:35:14.360
So I think you should give it a whirl.

00:35:14.360 --> 00:35:17.560
And it's not like it's going to change your life a lot.

00:35:18.200 --> 00:35:25.560
But the next time you come up with, when the next time you're programming, you're like, okay, I got to figure out, I got to have a base directory and some other directory.

00:35:25.560 --> 00:35:28.520
Well, I'll reach for pathlib instead of os.path.

00:35:28.520 --> 00:35:29.260
Yeah.

00:35:29.460 --> 00:35:32.160
I guess it has been there since 3.4, so I should get with the times.

00:35:32.160 --> 00:35:32.460
Yeah.

00:35:32.460 --> 00:35:36.100
So, I mean, now, before I could see the objection of, like, oh, you have to backport it.

00:35:36.100 --> 00:35:44.160
And also, I think what I like as well is a lot of integrations that, like, you know, automatically can perform checks where the path exists, stuff like that.

00:35:44.160 --> 00:35:49.000
Or for me as a library author, you know, you're writing stuff for users and you want to give them feedback.

00:35:49.000 --> 00:36:01.600
And, for instance, in a library like Click or Typer, which is the modern type hint version CLI interface, which was also built by my colleague, Sebastian, you can just say, hey, this argument is a path.

00:36:01.600 --> 00:36:03.740
What you get back from the command line is a path.

00:36:03.740 --> 00:36:06.820
It will check that a path exists via pathlib.

00:36:06.820 --> 00:36:10.120
So it does, like, you know, a whole bunch of magic there.

00:36:10.120 --> 00:36:10.440
Yeah.

00:36:10.440 --> 00:36:11.640
That is super cool.

00:36:11.640 --> 00:36:12.000
Yeah.

00:36:12.000 --> 00:36:14.000
Or you can say it can't be a directory.

00:36:14.000 --> 00:36:19.700
And then you write your CLI, user passes in an invalid path, and you don't even have to do any error handling.

00:36:19.700 --> 00:36:24.200
It will automatically, before it even runs your code, say, nope, that argument is bad.
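
NOTE
Typer and Click do this kind of check for you. As a rough standard-library stand-in (argparse, not Typer's or Click's actual API), the same idea can be sketched with a converter function. The names existing_file and build_parser are made up for this example.
```python
import argparse
from pathlib import Path
def existing_file(value: str) -> Path:
    """Convert a CLI string to a Path, rejecting missing paths and directories."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{value!r} is not an existing file")
    return path
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # argparse calls existing_file before any of your own code sees the value.
    parser.add_argument("source", type=existing_file)
    return parser
```
With this in place, build_parser().parse_args(["missing.txt"]) prints a usage error and exits, and a valid argument arrives as a Path object.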

00:36:24.200 --> 00:36:25.160
So that's pretty cool.

00:36:25.160 --> 00:36:25.660
That's awesome.

00:36:25.660 --> 00:36:30.620
And you don't have to care about Unix versus Mac or PC or something like that.

00:36:30.620 --> 00:36:30.720
Yeah.

00:36:30.720 --> 00:36:31.920
I mean, Windows.

00:36:31.920 --> 00:36:37.620
I mean, no offense to Windows, but handling paths on Windows is always the classic story.

00:36:37.620 --> 00:36:41.700
Also, as a library author, where you just, well, we're supporting all operating systems.

00:36:41.700 --> 00:36:43.760
But, like, well, Windows just does it a bit differently.

00:36:43.760 --> 00:36:47.000
And you cannot assume that a slash means a slash.

00:36:47.000 --> 00:36:48.660
Yeah, for sure.

00:36:48.660 --> 00:36:49.240
All right.

00:36:49.240 --> 00:36:51.640
Well, the final item is yours, Ines.

00:36:51.640 --> 00:36:53.560
And it's definitely interesting.

00:36:53.560 --> 00:37:01.740
So if you're working in the machine learning data science side of things, it might not be enough to just back up your algorithms and your code, right?

00:37:01.740 --> 00:37:02.000
Yeah.

00:37:02.000 --> 00:37:04.740
You also have, yeah, machine learning is code and data.

00:37:04.740 --> 00:37:05.480
So, yeah.

00:37:05.480 --> 00:37:09.760
So this is something we discovered a while ago and that we're now using internally.

00:37:10.080 --> 00:37:13.620
So we currently, as I mentioned before, we're working on version three of spaCy.

00:37:13.620 --> 00:37:27.800
And one of the big features is going to be a completely new optimized way for training your custom models, managing the whole end-to-end workflows from pre-processing to training to packaging and also making the experiments more reproducible.

00:37:27.800 --> 00:37:33.900
You want to train a cool model and then send it over to your colleague and your colleague should be able to run the same thing and get the same results.

00:37:34.160 --> 00:37:37.540
Sounds really basic, but it's pretty hard in general in machine learning.

00:37:37.540 --> 00:37:47.080
So our spaCy stuff will also integrate with a tool called DVC, which is short for data version control, which we've started using internally for our models.

00:37:47.080 --> 00:37:53.620
And DVC is basically an open source tool for version control, specifically for machine learning and for data.

00:37:54.200 --> 00:38:03.800
So, you know, you can't really, you can check your code into a Git repo as you're working on it, but you can't just check your data sets and models and artifacts into Git or your model weights.

00:38:03.800 --> 00:38:08.060
Like that's, so it's very, very difficult normally to keep track of changes and your files.

00:38:08.060 --> 00:38:13.260
You kind of, most people just end up with this directory of files somewhere and it can be very frustrating.

00:38:13.600 --> 00:38:18.860
And so you can really, you can think of DVC as Git for data and the command line usage is actually pretty similar.

00:38:18.860 --> 00:38:22.900
So like you type git init and dvc init to initialize it.

00:38:22.900 --> 00:38:27.180
And then you can do dvc add to start tracking your assets and add them.

00:38:27.180 --> 00:38:35.680
So it's like, I think if, yeah, if you're familiar with Git, as abstract as it can be at times, you will also kind of find it easy to get into DVC.

00:38:35.940 --> 00:38:44.200
And it basically lets you track any assets like data sets, models, whatever, by adding meta files to your repository.

00:38:44.200 --> 00:38:52.280
So you always have like the checksum in there and you always have these checkpoints of the asset, even though you're not actually checking that file into your repo.
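
NOTE
A toy illustration of the idea of a checksum meta file. This is not DVC's real file format or code, and the ".meta.json" naming and the write_pointer helper are made up. The large asset stays out of the repo, and only a small text file recording its checksum would be committed.
```python
import hashlib
import json
import tempfile
from pathlib import Path
def write_pointer(asset: Path) -> Path:
    """Write a small JSON file recording the asset's name and SHA-256 checksum."""
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()
    pointer = asset.with_name(asset.name + ".meta.json")  # hypothetical naming scheme
    pointer.write_text(json.dumps({"path": asset.name, "sha256": digest}))
    return pointer
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "reviews.csv"  # stand-in for a large data set
    data.write_text("text,label\ngreat movie,pos\n")
    pointer = write_pointer(data)
    meta = json.loads(pointer.read_text())
print(meta["path"], len(meta["sha256"]))
```
If the data file changes, its checksum changes, so the committed meta file records which version of the data a given commit refers to.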

00:38:52.280 --> 00:38:59.540
And that means you can always go back, fetch whatever it was from your cache and rerun your experiments.

00:38:59.540 --> 00:39:02.060
And it also builds this really cool dependency graph.

00:39:02.060 --> 00:39:06.440
So you can really have these complex pipelines with different steps.

00:39:06.440 --> 00:39:12.280
And then you only have to rerun one step if some of the inputs to it have changed.

00:39:12.280 --> 00:39:19.160
So, you know, in machine learning, you'd often have a pipeline, like you start, you download your data, then you pre-process it.

00:39:19.160 --> 00:39:24.380
Then you convert it to something, then you train, then you run an evaluation step.

00:39:24.380 --> 00:39:26.900
And everything sort of depends on each other.

00:39:26.900 --> 00:39:28.400
And that can make things like really hard.

00:39:28.400 --> 00:39:33.220
And you never know, you usually have to run everything, you know, clean from scratch.

00:39:33.220 --> 00:39:36.040
Because, yeah, if something changes, your whole results change.

00:39:36.040 --> 00:39:42.380
So if you set up your pipelines with DVC, it can actually decide whether something needs to be rerun.

00:39:42.380 --> 00:39:47.240
Or it can also know what needs to be rerun to reproduce exactly what you're trying to do.
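
The rerun decision described here can be sketched roughly as follows. This is an illustration of the general technique, not how DVC implements it; `Stage`, `stale_stages`, and the artifact names are hypothetical, and the stages are assumed to be listed in dependency order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    deps: tuple  # names of the artifacts this stage reads
    outs: tuple  # names of the artifacts this stage writes


def stale_stages(stages, current, recorded):
    """Return the names of stages to rerun, in pipeline order.

    A stage is stale if one of its inputs has a different checksum than at
    the last run, or if one of its inputs is written by a stale stage.
    """
    dirty = {
        name for name, digest in current.items()
        if recorded.get(name) != digest
    }
    rerun = []
    for stage in stages:  # assumed to be listed in dependency order
        if any(dep in dirty for dep in stage.deps):
            rerun.append(stage.name)
            dirty.update(stage.outs)  # downstream consumers must rerun too
    return rerun


PIPELINE = [
    Stage("preprocess", ("raw.csv",), ("clean.csv",)),
    Stage("convert", ("clean.csv",), ("corpus.bin",)),
    Stage("train", ("corpus.bin", "config.cfg"), ("model",)),
    Stage("evaluate", ("model", "test.bin"), ("metrics.json",)),
]
```

With this layout, a change to `config.cfg` marks only `train` and `evaluate` as stale, so the download, preprocessing, and conversion steps are skipped.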

00:39:47.240 --> 00:39:48.240
So that's pretty cool.

00:39:48.240 --> 00:39:51.660
Yeah, that could save you a ton of time and money if you're doing it in the cloud.

00:39:51.660 --> 00:39:52.640
Yes, exactly.

00:39:52.640 --> 00:39:53.000
Yeah.

00:39:53.100 --> 00:39:55.240
And, you know, you can share it with other people.

00:39:55.240 --> 00:39:58.740
It's like, it's, I think it definitely solves a problem that's very real.

00:39:58.740 --> 00:40:04.600
And, yeah, the people making DVC, they've also recently released a new tool that I have not personally checked out yet.

00:40:04.600 --> 00:40:05.440
But it looks very interesting.

00:40:05.440 --> 00:40:08.800
It's called CML, which is short for Continuous Machine Learning.

00:40:08.800 --> 00:40:12.860
And that's really more of the CI, which kind of is logically the next step, right?

00:40:12.860 --> 00:40:14.540
You manage everything in your repo.

00:40:14.540 --> 00:40:18.940
And then you obviously want to run automated tests and continuous integration.

00:40:18.940 --> 00:40:21.260
So the preview looked really cool.

00:40:21.260 --> 00:40:27.800
Like it showed kind of a GitHub action where you can submit a PR with like some changes to your code and your data.

00:40:28.040 --> 00:40:34.640
And then you have the bot commenting on it and it shows like accuracy results and a little graph and how stuff changes.

00:40:34.640 --> 00:40:45.980
So it's really like these code coverage bots that you've probably seen where like you change some lines and then it tells you, oh, coverage has gone up or down and, you know, the new view of your code.

00:40:45.980 --> 00:40:47.380
So that's what it looks like.

00:40:47.380 --> 00:40:49.140
So I think, yeah, I'm really excited about this.

00:40:49.140 --> 00:40:50.500
And definitely it solves a problem.

00:40:50.500 --> 00:40:52.320
It's already been solving a problem for us.

00:40:52.320 --> 00:40:52.860
And yeah.

00:40:52.860 --> 00:40:54.420
How does it store the large files?

00:40:54.420 --> 00:40:55.460
I know it has this cache.

00:40:55.460 --> 00:40:56.680
Is that a thing that you host?

00:40:56.680 --> 00:40:59.380
Does it have a hosted thing that's kind of like GitHub?

00:40:59.380 --> 00:41:01.200
I'm not sure if you could.

00:41:01.200 --> 00:41:03.640
You can probably connect it to some cloud, but like normally you have that locally.

00:41:03.640 --> 00:41:07.580
It also has a cool thing where you can actually download files via the tool.

00:41:07.580 --> 00:41:17.640
And then depending on where you're fetching it from, if it's a Google storage bucket or S3 bucket or something, you can actually also tell if the file has changed and whether it needs to be redownloaded.

00:41:17.640 --> 00:41:30.300
And so, for example, internally, what we're doing is we're mounting a Google storage, Google cloud storage bucket or however they call it locally as like, you know, so it's like kind of a drive you have access to locally.

00:41:30.300 --> 00:41:36.500
And then you can just sort of type GS, blah, blah, blah, blah, and then the path and really work with it like a local file system.

00:41:36.500 --> 00:41:38.040
And that's pretty nice.

00:41:38.040 --> 00:41:47.740
So you can, you know, you can have, you can work with private assets because the thing is a lot of toy examples assume that, oh, you just download a public data set and then you train your model and then you upload it somewhere.

00:41:47.740 --> 00:41:52.960
But that's not very realistic because most of the time the data you have can't just go in the cloud publicly.

00:41:52.960 --> 00:41:54.380
So, yeah.

00:41:54.380 --> 00:42:04.920
But yeah, I think I don't even know exactly how it works in detail, but like it can basically tell, I think from the headers or something, whether the file you're downloading has changed and whether there's something new.
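
Since the details of how DVC does this are left open here, the following is only a generic sketch of header-based change detection over HTTP, under the assumption that the server sends an `ETag` or `Last-Modified` header. All function names are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def fingerprint(headers: Mapping[str, str]) -> Optional[str]:
    """Pick a cheap change indicator from response headers, if there is one."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("etag") or lowered.get("last-modified")


def needs_download(previous: Optional[str], headers: Mapping[str, str]) -> bool:
    """Redownload when there is no usable fingerprint or it no longer matches."""
    current = fingerprint(headers)
    return previous is None or current is None or current != previous


def fetch_headers(url: str) -> dict:
    """Issue a HEAD request so only the headers travel over the network."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())
```

Storing the fingerprint from the previous download and comparing it against a fresh `fetch_headers(url)` call avoids pulling a large file again when nothing has changed.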

00:42:04.920 --> 00:42:05.200
Yeah.

00:42:05.200 --> 00:42:05.520
Yeah.

00:42:05.520 --> 00:42:09.660
With a normal version control, one of the reasons we use it is to try to find what's different.

00:42:09.660 --> 00:42:12.180
Can you do, do you do diffs on data or?

00:42:12.180 --> 00:42:13.260
I don't know.

00:42:13.260 --> 00:42:13.940
Maybe.

00:42:14.160 --> 00:42:25.240
I mean, I'm not sure if there's, I think the main diff is more like around the results that you get because diff, I mean, diffing large data set, diffing weights, you kind of can't.

00:42:25.240 --> 00:42:26.800
That's really where we are.

00:42:26.800 --> 00:42:33.740
The other problem where like you need to run the model to find out what it does and then you're diffing accuracies rather than weights.
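
Diffing accuracies rather than weights could look something like this small sketch. The metric names and both helper functions are made up for illustration; this is not the output format of DVC or CML.

```python
def diff_metrics(before: dict, after: dict) -> dict:
    """Return the change per metric for metrics present in both runs."""
    shared = sorted(before.keys() & after.keys())
    return {name: round(after[name] - before[name], 4) for name in shared}


def format_report(deltas: dict) -> str:
    """Render the deltas as plain text, one metric per line, with a sign."""
    return "\n".join(f"{name}: {delta:+.4f}" for name, delta in deltas.items())


baseline = {"accuracy": 0.90, "f1": 0.85}
candidate = {"accuracy": 0.92, "f1": 0.84}
print(format_report(diff_metrics(baseline, candidate)))
```

A report like this is the kind of thing a CI bot can post on a pull request, much like a coverage bot reports coverage going up or down.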

00:42:33.740 --> 00:42:34.080
Okay.

00:42:34.080 --> 00:42:38.680
I don't know if it does like actual diffing of the data sets, but often the thing that changes is really the models.

00:42:38.680 --> 00:42:43.460
Like you have the, you know, you have your whole data and then you change things about your code.

00:42:44.020 --> 00:42:44.160
Yeah.

00:42:44.160 --> 00:42:48.720
And something changes and it's, you want to keep track of what it is or how it manifests.

00:42:48.720 --> 00:42:49.220
Yeah.

00:42:49.220 --> 00:42:50.980
It's really cool to see them working on this.

00:42:50.980 --> 00:42:51.260
Yeah.

00:42:51.260 --> 00:42:53.480
So, and also we'll be in, in spaCy 3.

00:42:53.480 --> 00:42:59.180
We'll hopefully have a pretty neat integration where, you know, if you want, it's not like mandatory, but if you say, Hey, that's cool.

00:42:59.180 --> 00:43:00.760
That's how I want to manage my assets.

00:43:00.760 --> 00:43:06.040
You can just run that in your, in a spaCy project and then it just automatically tracks everything.

00:43:06.040 --> 00:43:11.260
And it, you know, you can check that into Git and share it and other, other people can download it.

00:43:11.420 --> 00:43:13.460
So that's, yeah, I'm pretty excited about that.

00:43:13.460 --> 00:43:14.580
It works pretty well so far.

00:43:14.580 --> 00:43:15.040
Yeah.

00:43:15.040 --> 00:43:19.760
Everything you can do to make it a little easier to work with spaCy and just make it reproducible.

00:43:19.760 --> 00:43:20.420
Yeah.

00:43:20.420 --> 00:43:21.560
And it's just, the things are hard.

00:43:21.560 --> 00:43:25.680
Like there is, I'm not a fan of these all one click, everything just magically works.

00:43:25.680 --> 00:43:32.020
Like it looks, it looks nice and it's a nice demo, but like once you actually get down to like the real work, like things need to be a bit modular.

00:43:32.020 --> 00:43:33.600
Things need to be customizable.

00:43:33.600 --> 00:43:37.500
Otherwise you're always hitting edge cases or you have these leaky abstractions.

00:43:37.500 --> 00:43:39.620
So yeah.

00:43:39.620 --> 00:43:39.940
Yeah.

00:43:39.940 --> 00:43:45.760
I think things should be easy to use, but you can't just magically cover everything by just providing one button.

00:43:45.760 --> 00:43:47.060
That's just not going to work.

00:43:47.060 --> 00:43:47.360
Yeah.

00:43:47.560 --> 00:43:49.300
Cause when it doesn't work, it's not good anymore.

00:43:49.300 --> 00:43:49.840
Yeah, exactly.

00:43:49.840 --> 00:43:50.520
All right.

00:43:50.520 --> 00:43:51.360
Yeah.

00:43:51.360 --> 00:43:52.100
All right.

00:43:52.100 --> 00:44:01.380
Well, that's our six items that we go in depth into, but at the end, we always just throw out a couple of really quick things that maybe we didn't have time to fit into the main section.

00:44:01.380 --> 00:44:05.040
And I want to talk about two things that are pretty exciting.

00:44:05.040 --> 00:44:13.200
One is if you care about podcasts as a catalog of a whole bunch of things, I don't know how many podcasts there are.

00:44:13.200 --> 00:44:15.220
There's probably over a million podcasts these days.

00:44:15.220 --> 00:44:24.340
One of our listeners, Anton Ziyanov wrote a cool Python package that will let you search the iTunes directory and query it.

00:44:24.340 --> 00:44:29.080
And it's basically a Python API into iTunes podcasting directory.
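
The package itself is not named here, so this sketch does not use it. It queries Apple's public iTunes Search endpoint directly with the standard library instead; the endpoint, the `term`, `media`, and `limit` parameters, and the `collectionName` and `feedUrl` fields are assumptions about that service, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://itunes.apple.com/search"


def build_search_url(term: str, limit: int = 5) -> str:
    """Build a podcast search URL for the given term."""
    query = urllib.parse.urlencode(
        {"term": term, "media": "podcast", "limit": limit}
    )
    return f"{SEARCH_ENDPOINT}?{query}"


def search_podcasts(term: str, limit: int = 5) -> list:
    """Return (title, feed URL) pairs; this call needs network access."""
    with urllib.request.urlopen(build_search_url(term, limit)) as response:
        payload = json.load(response)
    return [
        (item.get("collectionName"), item.get("feedUrl"))
        for item in payload.get("results", [])
    ]
```

Calling `search_podcasts("python bytes")` would return whatever the directory lists for that term, which is the kind of lookup a dedicated package wraps in a friendlier interface.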

00:44:29.920 --> 00:44:42.760
You know, some people think that you've got to be part of the Apple ecosystem to care about iTunes, but really that's just the biggest like directory kind of Yahoo circa 1995 style of listing of podcasts.

00:44:42.760 --> 00:44:45.460
So if you care about digging in and researching podcasts, check that out.

00:44:45.460 --> 00:44:46.060
That's pretty cool.

00:44:46.060 --> 00:44:48.120
And then, yeah.

00:44:48.120 --> 00:44:51.200
And then I've also, I'm such a big fan of f-strings.

00:44:51.200 --> 00:44:51.700
How about you too?

00:44:51.700 --> 00:44:52.120
Yes.

00:44:52.120 --> 00:44:52.540
Yes.

00:44:52.540 --> 00:44:54.460
F, yes, right?

00:44:54.460 --> 00:44:54.680
Yeah.

00:44:54.680 --> 00:44:57.400
I'm finally, I'm finally working in like Python three only.

00:44:57.400 --> 00:45:03.220
I remember, I think last time I was on the podcast, I was basically, I was saying how like, oh, all these modern things, they're so nice.

00:45:03.220 --> 00:45:09.220
I wish I could use them more, but we're still supporting Python two, but like, no, everything I write now, 3.6.

00:45:09.220 --> 00:45:09.560
Yes.

00:45:09.780 --> 00:45:22.800
And I've talked previously about a tool called Flynt, F-L-Y-N-T, which lets you run against an old code base and convert all the various Python two and three styles of formatting magically into f-strings.
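
For reference, these are the older formatting styles next to the f-string they are equivalent to. The variable names are illustrative, and the rewrite shown is done by hand here, not produced by running the tool.

```python
name, count = "Ines", 3

# Older styles that a converter would target
percent_style = "%s has %d items" % (name, count)
format_style = "{} has {} items".format(name, count)

# The f-string equivalent (Python 3.6+)
fstring_style = f"{name} has {count} items"

# All three produce the same text
assert percent_style == format_style == fstring_style
```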

00:45:22.800 --> 00:45:24.180
I think that was actually really nice.

00:45:24.180 --> 00:45:25.020
The episode I was on.

00:45:25.020 --> 00:45:25.840
Yeah.

00:45:25.840 --> 00:45:26.900
You might've been right.

00:45:26.900 --> 00:45:28.160
Like, I wish I could run this.

00:45:28.160 --> 00:45:28.360
Right.

00:45:28.360 --> 00:45:28.640
Yeah.

00:45:28.840 --> 00:45:31.380
And yeah, I ran that against like 20,000 lines of Python.

00:45:31.380 --> 00:45:33.820
I found like just a couple errors reported them.

00:45:33.820 --> 00:45:34.420
They got fixed.

00:45:34.420 --> 00:45:35.200
So that's nice.

00:45:35.200 --> 00:45:42.460
But the thing that's bugged me endlessly about f-strings is I'll be halfway through writing the string and I'm like, oh yeah, I want to put data here.

00:45:42.460 --> 00:45:49.180
So I got to go back to the front of the string, not necessarily back to the front of the line, but maybe back to like the string is being passed to a function.

00:45:49.180 --> 00:45:55.540
So I go back to the first quote, put the F, go back forward and then start typing out the thing I actually wanted.

00:45:55.540 --> 00:45:55.780
Right.

00:45:55.780 --> 00:45:57.320
Or maybe I'll F string something.

00:45:57.320 --> 00:45:59.460
And then I realize, oh, I'm not going to put data.

00:45:59.460 --> 00:45:59.620
Right.

00:45:59.620 --> 00:46:02.220
So it's like you're halfway through and you want it to become an f-string.

00:46:02.220 --> 00:46:11.260
Well, PyCharm is coming with a new feature where if you start writing a regular string and pretend like it's an f-string, it'll automatically upgrade to f-strings.

00:46:11.260 --> 00:46:11.980
Yes.

00:46:11.980 --> 00:46:12.720
Halfway through.

00:46:12.720 --> 00:46:13.520
Yes.

00:46:13.520 --> 00:46:14.080
Without leaving.

00:46:14.080 --> 00:46:15.560
So you just say curly variable.

00:46:15.560 --> 00:46:16.180
It's like, oh, okay.

00:46:16.180 --> 00:46:18.240
That means that's an f-string and the f appears at the front.
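
The difference that the prefix makes is easy to see in a two-line sketch. The variable name is illustrative.

```python
language = "Python"

plain = "Hello {language}"       # no prefix: the braces stay literal text
formatted = f"Hello {language}"  # f prefix: the expression is evaluated

print(plain)      # Hello {language}
print(formatted)  # Hello Python
```

Forgetting the prefix is not an error, which is why it is easy to miss until the output shows the braces.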

00:46:18.240 --> 00:46:19.040
Yes.

00:46:19.040 --> 00:46:19.560
Nice.

00:46:19.560 --> 00:46:20.880
So that is pretty awesome.

00:46:20.880 --> 00:46:22.860
Anyway, those are my two quick items.

00:46:22.860 --> 00:46:24.860
Ines, I'm also excited about the one you got here.

00:46:24.860 --> 00:46:25.080
Yeah.

00:46:25.080 --> 00:46:25.540
This is awesome.

00:46:25.540 --> 00:46:25.940
Yeah.

00:46:25.940 --> 00:46:31.800
I had one, which is something coming to 3.9 or in 3.9, which is PEP 585.

00:46:31.800 --> 00:46:40.660
And you can use, when you use type annotations, you can now use the built-in types like list and dict as generic types.

00:46:40.660 --> 00:46:45.760
So that means no more from typing import list with a capital L.

00:46:45.760 --> 00:46:46.820
Yes.

00:46:46.820 --> 00:46:47.880
Yes.

00:46:48.300 --> 00:46:52.260
So you just literally, I mean, when I first saw it, I'm like, that looks strange.

00:46:52.260 --> 00:46:55.020
But like, yes, I'm so excited about this.

00:46:55.020 --> 00:46:58.560
It probably, it'd be years until I can just like use it all across my code bases because.

00:46:58.560 --> 00:46:59.100
True.

00:46:59.100 --> 00:46:59.420
Yeah.

00:46:59.420 --> 00:47:00.760
But like, yay.

00:47:00.760 --> 00:47:01.860
That's in 3.9?

00:47:01.860 --> 00:47:02.820
Yeah.

00:47:02.820 --> 00:47:03.180
Yeah.

00:47:03.180 --> 00:47:03.720
That's in 3.9.

00:47:03.720 --> 00:47:06.180
I'm already using 3.9 and I didn't know that.

00:47:06.180 --> 00:47:06.920
You can do this.

00:47:06.920 --> 00:47:07.300
Yeah.

00:47:07.300 --> 00:47:07.780
Yeah.

00:47:07.860 --> 00:47:12.420
And Guido is one of the guys on the PEP making this happen.

00:47:12.420 --> 00:47:13.780
Like I said, he's really into typing.

00:47:13.780 --> 00:47:15.220
Oh, that's great.

00:47:15.220 --> 00:47:18.400
So this is really cool because it was super annoying to say, oh, you have this new import

00:47:18.400 --> 00:47:20.740
just because you want to use type annotations on a collection.

00:47:20.740 --> 00:47:21.540
Right?

00:47:21.540 --> 00:47:22.600
Now you don't have to.

00:47:22.600 --> 00:47:25.740
And there's actually a bunch of the collection stuff and iterators and whatnot.

00:47:25.740 --> 00:47:31.700
Like the, you know, the collections module, like that, a bunch of stuff in there is really

00:47:31.700 --> 00:47:32.040
nice.

00:47:32.040 --> 00:47:38.400
And they're compatible, like lowercase list of str is the same as capital list of str, I believe.
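
A short sketch of what PEP 585 allows on Python 3.9 and later. The function is made up for illustration; the two checks at the end compare the lowercase and capitalized spellings through `typing.get_origin` and `typing.get_args`.

```python
from typing import Dict, List, get_args, get_origin


def word_lengths(words: list[str]) -> dict[str, int]:
    """Built-in collections used directly as generic annotations (3.9+)."""
    return {word: len(word) for word in words}


# Both spellings report the same origin type and the same type arguments.
assert get_origin(list[str]) is get_origin(List[str]) is list
assert get_args(dict[str, int]) == get_args(Dict[str, int]) == (str, int)
```

Type checkers treat the two spellings as the same type, so a code base can migrate gradually once it no longer supports versions older than 3.9.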

00:47:38.400 --> 00:47:39.780
All right, Brian, what you got?

00:47:39.780 --> 00:47:42.680
Oh, I just wanted to, I'll drop a link in the show notes.

00:47:42.680 --> 00:47:48.900
Test & Code 120 is where I interviewed Sebastian Ramirez, also from Explosion.

00:47:48.900 --> 00:47:54.320
And talking about FastAPI and Typer because I'm kind of in love with both of those.

00:47:54.320 --> 00:47:55.060
They're really cool.

00:47:55.060 --> 00:47:55.360
Yeah.

00:47:55.740 --> 00:47:56.120
Absolutely.

00:47:56.120 --> 00:47:57.240
All right.

00:47:57.240 --> 00:47:58.340
Well, that's a cool one.

00:47:58.340 --> 00:47:59.620
Definitely going to check that out.

00:47:59.620 --> 00:48:02.060
And you can find out why he has the cool mustache.

00:48:02.060 --> 00:48:04.920
That's right.

00:48:04.920 --> 00:48:05.760
All right.

00:48:05.760 --> 00:48:10.460
So we always end the show with a joke and I thought we could do two jokes today.

00:48:10.460 --> 00:48:13.420
So I think, Ines, do you want to talk about this first one?

00:48:13.420 --> 00:48:13.960
Oh, yeah.

00:48:13.960 --> 00:48:17.800
I mean, I'm not even sure it counts as a joke per se, but like it's more of a humorous

00:48:17.800 --> 00:48:19.320
situation, I guess.

00:48:19.320 --> 00:48:19.840
Yeah.

00:48:19.840 --> 00:48:21.780
It ties in.

00:48:21.780 --> 00:48:24.020
Well, it's Sebastian again.

00:48:24.180 --> 00:48:28.820
Like he had this very viral tweet the other day where he posted about some experience.

00:48:28.820 --> 00:48:32.980
I can just read it out because I think it needs to kind of stand on its own.

00:48:32.980 --> 00:48:33.660
So he writes:

00:48:33.840 --> 00:48:35.600
I saw a job post the other day.

00:48:35.600 --> 00:48:38.800
It required four plus years of experience in FastAPI.

00:48:38.800 --> 00:48:44.100
I couldn't apply as I only have 1.5 plus years of experience since I created that thing.

00:48:44.100 --> 00:48:49.120
And then he says, maybe it's time to reevaluate that years of experience equals skill level.

00:48:50.000 --> 00:48:53.380
And this was like, it resonated with people so much.

00:48:53.380 --> 00:48:56.660
I was actually surprised to see like everyone was like, oh, yeah, HR.

00:48:56.660 --> 00:49:02.300
Like apparently this seems to be this huge issue, obviously, that like, well, most job ads are

00:49:02.300 --> 00:49:07.040
not written by the people who actually work with the technologies and where you have.

00:49:07.040 --> 00:49:07.220
Yeah.

00:49:07.220 --> 00:49:08.160
Actually.

00:49:08.240 --> 00:49:09.120
Yeah, this is awesome.

00:49:09.120 --> 00:49:14.440
And this tweet actually just got covered on DTNS, the daily news tech show, daily tech news show.

00:49:14.440 --> 00:49:15.120
I guess it is.

00:49:15.120 --> 00:49:21.640
Alongside another posting that said you needed eight years of Kubernetes experience for another job.

00:49:21.640 --> 00:49:24.180
But of course, Kubernetes has only been around for four years.

00:49:24.180 --> 00:49:24.740
Yeah.

00:49:24.740 --> 00:49:29.940
When you say this went viral, it had 46,000 retweets and 174,000 likes.

00:49:29.940 --> 00:49:31.660
That's like, that's got some traction.

00:49:31.660 --> 00:49:33.180
I feel like this might be a problem.

00:49:33.360 --> 00:49:33.560
Yeah.

00:49:33.560 --> 00:49:33.900
Yeah.

00:49:33.900 --> 00:49:37.200
I was surprised that like so many people are like, yeah, that's a big deal.

00:49:37.200 --> 00:49:39.280
And it's like, and I mean, it is true.

00:49:39.280 --> 00:49:42.260
Like kind of tech hiring sort of seems to be broken.

00:49:42.260 --> 00:49:45.060
And it's also, it's like, it's a bit different in my case, I guess.

00:49:45.060 --> 00:49:50.120
But like, I don't qualify for most roles using the tech that I write.

00:49:50.120 --> 00:49:54.200
And in some cases that's justified because I'm not a data scientist just because I write developer

00:49:54.200 --> 00:49:56.620
tools for data scientists doesn't mean I can do the job.

00:49:56.620 --> 00:50:01.000
But in other cases, I'm like, there's kind of a ridiculous amount of arbitrary stuff you're

00:50:01.000 --> 00:50:02.000
asking for in this job ad.

00:50:02.000 --> 00:50:02.840
Maybe that's needed.

00:50:02.980 --> 00:50:07.400
Maybe not, but like it centers around like a piece of software that I happen to have

00:50:07.400 --> 00:50:10.820
written and I do not qualify for your job ad at all.

00:50:10.820 --> 00:50:18.140
Like the last time I wrote a job description, I intentionally left off the college degree

00:50:18.140 --> 00:50:23.420
requirement because all of the other requirements I was listing in there, either they had it from

00:50:23.420 --> 00:50:26.660
college plus experience or they had it just from experience.

00:50:26.660 --> 00:50:27.580
So I was fine with that.

00:50:27.780 --> 00:50:33.400
By the time it actually went live, somebody in HR had added a college degree requirement

00:50:33.400 --> 00:50:33.880
to it.

00:50:33.880 --> 00:50:36.840
I just couldn't get away with not listing that, I guess.

00:50:36.840 --> 00:50:37.400
Yeah.

00:50:37.400 --> 00:50:39.660
Master's degree in spaCy is preferred.

00:50:39.660 --> 00:50:40.660
In spaCy preferred.

00:50:40.660 --> 00:50:41.000
Yeah.

00:50:41.000 --> 00:50:44.260
But I guess another problem there is, it's like, well, look, if you ask, if HR writes these

00:50:44.260 --> 00:50:49.060
job ads with these bullshit requirements, then well, who applies?

00:50:49.300 --> 00:50:52.660
Like it's either people who are like, yeah, whatever, or people who are full of shit.

00:50:52.660 --> 00:50:54.860
And then that's the sort of culture you're fostering.

00:50:54.860 --> 00:50:59.080
And it might not even be the engineer's fault who wrote a very honest job description, but

00:50:59.080 --> 00:50:59.500
like, yep.

00:50:59.500 --> 00:51:00.640
Who applies to that?

00:51:00.640 --> 00:51:01.800
Like, yeah.

00:51:01.800 --> 00:51:04.040
You're going to make me lie about my FastAPI experience.

00:51:04.040 --> 00:51:04.380
Yeah.

00:51:04.380 --> 00:51:05.980
People just apply to anything.

00:51:05.980 --> 00:51:07.880
I'm like, yep, I have 10 years experience in everything.

00:51:07.880 --> 00:51:08.420
Great.

00:51:08.420 --> 00:51:09.220
And they're like, perfect.

00:51:09.220 --> 00:51:10.140
That's what we're looking for.

00:51:10.140 --> 00:51:10.660
You're hired.

00:51:10.800 --> 00:51:13.900
And then you wonder like, why is our company culture so terrible?

00:51:13.900 --> 00:51:14.440
Hmm.

00:51:14.440 --> 00:51:21.460
Well, I actually did have somebody apply to a job and say they have multiple years of experience

00:51:21.460 --> 00:51:23.300
in any new language coming up.

00:51:23.300 --> 00:51:27.080
Nice.

00:51:27.080 --> 00:51:28.140
All right, guys.

00:51:28.140 --> 00:51:29.520
Well, it looks like we're just about out of time.

00:51:29.520 --> 00:51:32.060
Let me give you one more joke for it.

00:51:32.060 --> 00:51:35.320
Brian, will you describe this picture and then I'll read what it says?

00:51:35.320 --> 00:51:38.480
There's a poorly drawn horse, I think.

00:51:39.020 --> 00:51:43.880
Zebra, horse that has white on the back end and black on the front end.

00:51:43.880 --> 00:51:46.020
And the text says, I defragged my zebra.

00:51:46.020 --> 00:51:48.740
I don't even know if people defrag drives anymore.

00:51:48.740 --> 00:51:51.740
So this is only going to resonate with the folks that have been around for a while.

00:51:51.740 --> 00:51:55.000
I saw that there was this great video I came across on YouTube where you can actually watch

00:51:55.000 --> 00:51:56.660
like a live defrag session.

00:51:56.660 --> 00:51:57.920
Like, I don't know, Windows 95.

00:51:57.920 --> 00:52:00.080
And it's like, I don't know, it takes a few hours.

00:52:00.080 --> 00:52:03.940
And, you know, you can kind of bring back that nostalgia and just put it on your TV and

00:52:03.940 --> 00:52:05.120
just sit there and you're like, yeah.

00:52:05.120 --> 00:52:08.640
Oh, it's like the aquarium you would put on your TV.

00:52:08.800 --> 00:52:09.040
Yeah.

00:52:09.040 --> 00:52:10.300
Like, but for tech.

00:52:10.300 --> 00:52:13.200
Follow the show on Twitter via at Python Bytes.

00:52:13.200 --> 00:52:16.060
That's Python Bytes as in B-Y-T-E-S.

00:52:16.060 --> 00:52:19.300
And get the full show notes at pythonbytes.fm.

00:52:19.300 --> 00:52:23.500
If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

00:52:23.500 --> 00:52:26.200
We're always on the lookout for sharing something cool.

00:52:26.200 --> 00:52:29.300
On behalf of myself and Brian Okken, this is Michael Kennedy.

00:52:29.300 --> 00:52:32.740
Thank you for listening and sharing this podcast with your friends and colleagues.

