#139: f"Yes!" for the f-strings

Published Thu, Jul 18, 2019, recorded Thu, Jul 11, 2019

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Special guest: Ines Montani

Brian #1: Simplify Your Python Developer Environment

Contributed by Nils de Bruin
“Three tools (pyenv, pipx, pipenv) make for smooth, isolated, reproducible Python developer and production environments.”
The tools:
- pyenv - install and manage multiple Python versions and flavors
- pipx - install a Python application with its own virtual environment for use globally
- pipenv - manage virtual environments and dependencies on a per-project basis
Brian note: I’m not sold on any of these yet, but honestly haven’t given them a fair shake either, but also didn’t really know how to try them all out. This is a really good write up to get started.
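
To make the "its own virtual environment" idea concrete, here is a minimal sketch using only the standard library's venv module. It is not how pipx is implemented; the helper name make_isolated_env and the app name are hypothetical, and pip is left out of the environment so the sketch runs offline.

```python
import tempfile
import venv
from pathlib import Path


def make_isolated_env(parent: Path, app_name: str) -> Path:
    """Create a dedicated virtual environment for one application (hypothetical helper)."""
    env_dir = parent / app_name
    # with_pip=False keeps the sketch fast and offline; a real installer
    # would need pip inside the environment to install the application.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_isolated_env(Path(tmp), "pyjokes")
    # Every venv records its base interpreter in pyvenv.cfg.
    has_cfg = (env_dir / "pyvenv.cfg").is_file()
    print(env_dir.name, has_cfg)
```

Each tool gets its own directory like this one, so upgrading one tool's dependencies cannot break another's.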

Ines #2: New fast.ai course: A Code-First Introduction to Natural Language Processing

fast.ai is a really popular, free course for deep learning by Rachel Thomas and Jeremy Howard
Also comes with a Python library and lots of notebooks
Some influential research developed alongside the course, e.g. ULMFiT (popular algorithm for NLP tasks like text classification)
New course on Natural Language Processing:
- Practical introduction to NLP covering both modern neural network approaches and traditional techniques
- Highlights:
  - NLP background: topic modeling and linear models
  - Rule-based approaches and real-world problem solving
  - Focus on ethics – videos on bias and disinformation
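
On the rule-based point: in the episode Ines describes putting a single regular expression on top of a statistical model that keeps getting your own company name wrong. A rough sketch of that idea follows; the company name, the pattern, and the stand-in model function are all made up for illustration and do not come from the course or from spaCy.

```python
import re

# Hypothetical company name that the statistical model keeps mislabeling.
COMPANY_PATTERN = re.compile(r"\bExample\s?Corp\b", re.IGNORECASE)


def fake_model_entities(text):
    """Stand-in for a statistical NER model; it deliberately finds nothing."""
    return []


def find_organizations(text):
    """Combine model predictions with one hand-written rule."""
    entities = list(fake_model_entities(text))
    for match in COMPANY_PATTERN.finditer(text):
        span = (match.start(), match.end(), "ORG")
        if span not in entities:
            entities.append(span)
    return entities


text = "We signed a contract with ExampleCorp last week."
orgs = find_organizations(text)
print([text[start:end] for start, end, _ in orgs])
```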

Michael #3: Cloning the human voice

In 5 minutes, with Python
via Brenden
Clone a voice in 5 seconds to generate arbitrary speech in real-time
An implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time.
Watch the video: https://www.youtube.com/watch?v=-O_hYhToKoA
Also: Fake voices 'help cyber-crooks steal cash'

Brian #4: Ab(using) pyproject.toml and stuffing pytest.ini and mypy.ini content into it

Contributed by Andrew Spittlemeister
My first reaction is horror, but this is kinda my thought process with this one
- toml is not ini (but they look close)
- neither pytest nor mypy support storing configuration in pyproject.toml
- they both do support using setup.cfg (but flit and poetry projects don’t use that file, or try not to)
- they both support passing in the config file as a command line argument
- you can be careful and write a pyproject.toml file that is both toml and ini compliant
- drat, this is a reasonable idea, if not a little wacky
- no guarantee that it will keep working
one thing to note: use quotes for stuff you normally wouldn’t need to in an ini file.

Example ini:

    [pytest]
    addopts = -ra -v

if stuffed in pyproject.toml

    [pytest]
    addopts = "-ra -v"

to run:

    > mypy --config-file pyproject.toml module_name
    > pytest -c pyproject.toml
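
A quick way to check the "both toml and ini compliant" claim is to feed the same text to an INI parser and a TOML parser. The sketch below uses only the standard library and assumes Python 3.11+ for tomllib (TOML parsing was not in the standard library when this episode was recorded); on older interpreters the TOML half is skipped. It says nothing about how pytest or mypy read the file themselves.

```python
import configparser

try:
    import tomllib  # in the standard library since Python 3.11
except ModuleNotFoundError:
    tomllib = None  # older interpreter: the TOML checks are skipped

QUOTED = '[pytest]\naddopts = "-ra -v"\n'
UNQUOTED = "[pytest]\naddopts = -ra -v\n"


def parses_as_ini(text):
    """Return True if configparser accepts the text."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error:
        return False
    return True


def parses_as_toml(text):
    """Return True/False for TOML validity, or None if tomllib is unavailable."""
    if tomllib is None:
        return None
    try:
        tomllib.loads(text)
    except tomllib.TOMLDecodeError:
        return False
    return True


for label, text in (("quoted", QUOTED), ("unquoted", UNQUOTED)):
    print(label, parses_as_ini(text), parses_as_toml(text))

# configparser keeps the quotes as part of the value.
ini = configparser.ConfigParser()
ini.read_string(QUOTED)
ini_value = ini["pytest"]["addopts"]
print(repr(ini_value))
```

The unquoted form is fine as INI but is rejected as TOML, which is why the quotes are needed. Note that configparser keeps the quotes in the value; whether a given tool strips them is up to that tool.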

Ines #5: *Polyaxon*

A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications.
We talked to lots of research groups and everyone works with just their GPU on desktop. Super slow – you need to wait for results, schedule next job etc.
Polyaxon is a free open source library built on Kubernetes. Really easy to set up, especially on Google Kubernetes Engine.
Especially good for hyper-parameter search, where you might not need GPU experiments if you can run lots of experiments in parallel
Release v0.5 just came out today. Big improvements:
- Plugins system
- Local runs, for much easier debugging
- New workflow engine for chaining things together and running experiments with lots of steps
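
To illustrate what "run lots of experiments in parallel" means for a hyper-parameter search, here is a toy sketch in plain Python. It does not use Polyaxon or its API; the objective function and the parameter grid are invented, and a thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(params):
    """Toy stand-in for a training run; returns (params, score)."""
    learning_rate, batch_size = params
    # Made-up objective that peaks at learning_rate=0.01, batch_size=32.
    score = 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 100
    return params, score


# Every combination of the two hyper-parameters is one experiment.
grid = list(product([0.001, 0.01, 0.1], [16, 32, 64]))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_experiment, grid))

best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, round(best_score, 3))
```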

Michael #6: Flynt for f-strings

A tool to automatically convert old string literal formatting to f-strings
F-Strings: Not only are they more readable, more concise, and less prone to error than other ways of formatting, but they are also faster!
Converted over 500 lines / expressions in Talk Python Training and Python Bytes.
Get started with a pipx install: pipx install flynt
Then point it at
- A file: flynt somefile.py
- A directory (recursively): flynt ./
Converts code like this: print("Greetings {}, you have found {:,} items!".format(name, count))
To code like this: print(f"Greetings {name}, you have found {count:,} items!")
Beware of the digit grouping bug.
Good project for jumping in and contributing to open source
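
A small check that the before and after forms above produce the same text. The values for name and count are made up for the example.

```python
name = "Ines"
count = 1234567

# Old style: positional placeholders filled in by str.format().
old_style = "Greetings {}, you have found {:,} items!".format(name, count)

# f-string: the same format spec (:,) goes after the variable name.
new_style = f"Greetings {name}, you have found {count:,} items!"

print(new_style)
```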

Extras:

Thanks to André Jaenisch for pointing out the existence of ReDoS attacks and a good video explaining them.

Michael:

Python httptoolkit
Python Magic’s name via David Martínez
Flying Fractals (video and code)
Python 3.7.4 is out

Ines:

Explosion (?)
spaCy IRL 2019
- our very first conference held on July 6 in Berlin
- many amazing speakers from research, applied NLP and the community
- all talks were recorded and will be up on our YouTube channel very soon
FastAPI core developer Sebastián Ramírez is joining our team
- FastAPI was presented by Brian in episode 123 of this podcast
- we’re big fans and have been switching all our APIs over to FastAPI
- we’ll keep supporting the project and will definitely give Sebastián enough time to keep working on it

Joke:

A programmer walks into a bar and orders 1.38 root beers. The bartender informs her it's a root beer float. She says 'Make it a double!'
What do you call a developer without a side project?
- Well rested.

Episode Transcript


00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 139, recorded July 11th, 2019.

00:09 I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:11 And I want to welcome Ines Montani to the show.

00:14 Ines, it's great to have you here.

00:16 Special guest, going to help us cover the news of the week.

00:18 Thanks for being here.

00:19 Yeah, thanks for having me.

00:20 I'm really excited.

00:20 It's going to be really fun to have you participate in this.

00:23 Also, thank you to DigitalOcean for sponsoring this episode.

00:25 Check them out at pythonbytes.fm/digitalocean.

00:28 More on that later.

00:30 Brian, we've talked a lot about like, how do you install Python?

00:32 How do you manage Python?

00:34 How do you upgrade your stuff?

00:35 There's just so many ways.

00:37 And then we've got things like pipenv, flit, poetry.

00:40 It goes on and on.

00:41 And it still goes on, right?

00:43 This was a contribution from Nils de Bruin, who sent us this.

00:46 There was an article called Simplify Your Python Developer Environment.

00:50 And it talked about using pyenv, pipx, and pipenv together.

00:56 And my first reaction was like, we've already covered all of these.

01:00 However, I have tried pipx.

01:02 Actually, I did the joke, the pyjokes.

01:05 I did that with pipx.

01:06 And I've tried pipenv once before.

01:09 It doesn't really do anything for me that I really need.

01:13 And the pyenv, I've tried it and it didn't work for me.

01:16 So actually, all these things, I kind of wanted to give them another shot anyway.

01:21 So I went ahead and read this article.

01:23 And it actually is pretty nice.

01:26 It's a nice pros and cons of all the tools and how to set them up.

01:30 And I think for somebody that wants to try these out again, this is a good article to read to try to get back into it.

01:37 So if people don't remember, pyenv is used to install and manage multiple Python versions and flavors on a computer.

01:45 And then pipx is something that allows you to create, take a Python application and have it bundled with its own virtual environment and use it globally on your system without having to activate the environment.

01:57 And then pipenv is for when you're working on a project or application, it's a way to manage virtual environments and dependencies on a per project basis.

02:07 These are really interesting because I feel like often they kind of blend like in a blur together, right?

02:14 You're like, well, I know there's all these env things I could use and whatnot.

02:17 And it's like, well, when should I use what and what one is relevant for the situation?

02:21 And what do you think, Ines?

02:22 It actually took me a second to remember the one that I've used.

02:26 So I'm totally in the market for stuff like that.

02:27 But actually, I hadn't heard of pipx at all.

02:30 But I think, yeah, pyenv I definitely use.

02:32 And I think it's quite important, at least for the work I'm doing, because I need to run stuff in all kinds of Pythons.

02:38 And I mean, as a library developer, we need to build stuff for like Python 2.

02:42 We're supporting Python 2.7 and 3.5.

02:45 Just so folks know, like you're deeply involved with spaCy and some tools built on top of that, the natural language processing, which will come evident as we go through some of your topics and more.

02:54 But, you know, maybe not everyone knows your background.

02:56 But I feel like pyenv is most relevant for people building libraries.

03:02 And pipenv is really relevant for people building applications.

03:07 I don't know.

03:08 What do you think, Brian?

03:08 Both of them are important for people like us that have to do both.

03:11 I have a question for Ines.

03:14 The pyenv.

03:16 So one of my concerns was, can I use it to install, to have multiple Pythons and still be able to run them all from one project, like with the tox build to be able to access all of them?

03:28 So I haven't tried it all like within the exact same projects.

03:31 I think you might need different virtual environments for that.

03:34 At least that's how I use it.

03:35 So I use venv to create my virtual environments.

03:38 And then they are created with whichever Python version I've configured locally.

03:44 So I do like pyenv local 2.7.

03:46 And then the virtual environment will have 2.7 in it.

03:49 Okay.

03:49 Well, I'll give it a shot.

03:50 Yeah, yeah.

03:51 Really nice.

03:52 I'm a big fan of pipx.

03:53 I think pipx is super cool.

03:55 So pipx is great if you have a thing that you just want as a utility on the command line in your terminal.

04:00 That happens to be Python-based.

04:02 So instead of brew install something or npm install something, pipx does that.

04:07 So, for example, I have like cookiecutter.

04:09 I have the HTTP library.

04:12 I have glances.

04:12 I have pyjokes, of course, because we run this podcast.

04:15 And some other stuff that I'll even talk about later, like ptpython.

04:19 Is that a bit like PEX files or is that something else?

04:21 It's a little bit.

04:23 What it does is you say I want to have, let's say I want the utility ptpython, which is like an Emacs-enabled REPL, basically.

04:31 Okay.

04:31 So I want to have that.

04:32 And I want to be able to just type it.

04:33 It's not tied to any project.

04:34 I just want to have it as a command on my computer that I can use.

04:37 So what you can do is you can pipx install ptpython.

04:41 And then it just automatically puts it in a location, modifies your path so that it has its own virtual environment.

04:48 It upgrades itself and its dependencies separately.

04:51 But anytime I want, I can just type ptpython and go crazy or glances or pyjoke or whatever.

04:55 Okay.

04:56 I think I get it.

04:57 Okay.

04:57 That actually sounds quite cool.

04:58 Yeah.

04:58 It's really nice.

04:59 You can just say, what are the updates from my Python libraries that I use as applications or little utilities?

05:04 It's pretty cool.

05:04 I like that one as well.

05:05 Yeah.

05:05 I wonder if we can ship our annotation tool Prodigy like that because it's very, very command line heavy.

05:10 And, you know, it's usually kind of a separate thing.

05:13 All it does is build upon pip.

05:15 So if you could pip install the thing and it has an entry point, then you can pipx install it as well.

05:21 Ah, okay.

05:22 Yeah.

05:22 Cool.

05:22 Yeah.

05:22 Pretty cool.

05:23 All right, Brian, are you switching to this?

05:25 Are you going to use pyenv, pipx, and pipenv?

05:28 Is this your new plan?

05:29 I definitely want to try pyenv because I want to try new Python versions.

05:33 And my old process was to just download the regular install and install it.

05:39 And then my path is all weird.

05:42 And, yeah, it's a mess.

05:43 Yeah, classic.

05:44 Cool.

05:44 Yeah, this is nice.

05:45 Definitely for people looking for a different workflow.

05:47 It's something they can check out for sure.

05:49 All right, Ines, what's this next one that you got for us?

05:51 This week was actually, or like the past few weeks were actually super exciting also in my field.

05:56 So there was a new release of a fast.ai course for natural language processing.

06:02 So fast.ai is a very popular free online course for deep learning by Rachel Thomas and Jeremy Howard.

06:07 And it also comes with a Python library.

06:09 It comes with lots of notebooks, really active communities.

06:12 So if you want to get into like the modern machine learning stuff, that's like probably the go-to course that I would also recommend to you.

06:19 And they've also produced some very influential research developed alongside the library and the course.

06:26 So, for example, ULMFiT, which was a very popular algorithm for text classification.

06:31 And yeah, the new thing is they've just released a course on natural language processing.

06:34 And it's a very practical introduction.

06:36 And what I thought was really, really interesting and really cool about it is that it, of course,

06:40 covers like the modern neural network approaches and all the like very hip stuff.

06:44 But it also focuses on traditional techniques.

06:47 So just the whole background of like, okay, what did people do before deep learning, topic modeling, linear models,

06:54 just really all the basics and even like rule-based approaches like regular expressions.

06:59 Like, you know, some people might look at this and be like, what, regex?

07:01 That's like, you know, I did that 20 years ago.

07:03 But in fact, like, you know, in real life and in like real life practical applications, that's like super important.

07:09 And you can do a lot with that, that, you know, really gets the job done.

07:12 So I thought that was really cool.

07:13 And of course, another thing, the course has a really strong focus on ethics as well.

07:18 So there are videos on bias and disinformation and basically just the topic of like, okay, think about the impact that the work you'll be doing has.

07:26 And I thought that was incredibly refreshing.

07:27 And of course, disclaimer, like I haven't actually watched all the videos yet.

07:31 I don't even think that's like physically possible to do all of that since it was released.

07:36 It's too new and it's too long.

07:37 Yeah.

07:38 Yeah.

07:38 It's like, it's a lot of material, but yeah, I really like the work they're doing.

07:42 And it's like, yeah, it's a very, very significant release.

07:44 And it's all free.

07:45 Yeah.

07:45 It's super cool.

07:46 Yeah.

07:46 It looks like it's free.

07:47 Like I was able to go pull up the videos without even having an account there.

07:50 It just sort of takes you through it.

07:51 More like an online video book.

07:53 You just make your way through, right?

07:54 Yeah.

07:55 And lots of like notebooks.

07:56 So you can open the same notebooks.

07:57 You can play through the examples and of course also use their library to, you know, really work through the things efficiently.

08:04 Sure.

08:04 Yeah.

08:04 That's super cool.

08:05 Does it cover any of the libraries that you all work on?

08:08 Like spaCy or anything?

08:09 spaCy is a much more like high level toolkit and framework.

08:12 So it's like, you know, this is really the basics of the technology.

08:16 So while the fast.ai library, I think it does use spaCy for tokenization.

08:19 But spaCy is really, you know, once you're building applications and you have some problems and you want to construct your pipelines and really ship something into production, that's when you would be using spaCy.

08:28 But actually, in fact, spaCy is not really the best solution if you really want to learn the underlying algorithms and implementations.

08:35 Because we're actually super opinionated.

08:37 You get one implementation and, you know, you kind of take that or you plug in your own.

08:41 I see.

08:42 This is like learning the algorithms and the foundations that maybe spaCy uses so you understand it better.

08:46 Yeah, exactly.

08:46 And also, you know, giving you some, yeah, the background.

08:49 And yeah, even the rule-based ideas, which, yeah, I still think it's so great.

08:54 I scroll through it.

08:55 I'm like, great, regular expressions.

08:56 That's like really what people should think about.

08:59 Yeah, I mean, that's a start, right?

09:00 Like sometimes you just want to pull data out of text and there you go.

09:03 Yeah, and especially then some people really, you know, then they start throwing like a neural network model at it when actually the best solution would have been to just write one regular expression.

09:11 Like, I don't know, you work at a company, you have a statistical model that recognizes organization names.

09:17 And then, you know, your manager comes to you and is like, well, it's all great, but like it often gets our own company name wrong.

09:22 Can you fix that?

09:23 That's really embarrassing.

09:24 And, you know, you can spend like hours trying to tune your model and like update it and fine tune it with more examples.

09:30 Or you can just add one regular expression or one rule on top that says, okay, whenever I come across this string, don't get it wrong.

09:38 And that will likely take you like five minutes and it's much more effective in the real world.

09:42 So this is the practical applied natural language processing, right?

09:46 Yeah.

09:46 Speaking of language, this next item that I want to talk about, it just scares me.

09:51 So let me tell you quick about it.

09:52 I'll get your two opinions.

09:54 The idea is that we can clone the human voice by giving it a sample and using some sort of neural network type thing.

10:02 I'm not sure exactly.

10:03 So this was sent in by Brendan.

10:05 Thank you for sending that over.

10:06 And, you know, in just a couple of minutes, you can load up somebody's voice, use some Python, a pre-trained machine learning model, and then type in text.

10:16 It will speak whatever that text is in the voice of that person.

10:20 It's just, I don't know.

10:23 I feel like public discourse is in serious jeopardy here.

10:26 What do you all think?

10:27 This is pretty interesting.

10:28 And I was a little frightened, actually, watching this video of how easy it was to copy somebody's voice.

10:35 I mean, I don't know what the code behind it's probably not easy, but it's a little creepy.

10:39 My first thought was I could use it for if I have solo episodes, I could be my own co-host.

10:44 But that'd be cool.

10:46 That'd be pretty funny.

10:47 Yeah.

10:47 No, but I think I would say, I think it's still quite compute intensive, like, to do that, right?

10:52 I think you still, you know, if you really want to have good results and really want to do it right, but it's true that this, you know, this really, this is a really good, like, example of, wow, that's possible.

11:01 And that's the type of stuff that's been possible for quite a while, especially across, like, you know, video image and, you know, also voice, audio.

11:09 Right.

11:09 I mean, we've heard of the deepfake stuff for videos and whatnot, and that's kind of scary.

11:13 But this is, it's almost like this could be worse, right?

11:16 I can imagine somebody just putting a little static, a little muffling filter on it and saying, oh, here's a hot mic take behind the scenes.

11:24 Somebody said something they weren't supposed to during a presidential debate or some kind of public figure they're trying to discredit and just make them say stuff.

11:33 Now, I don't know what you all think.

11:34 To me, it still sounds a little bit off.

11:36 Like, it doesn't sound exactly like the person, but, you know, it's pretty close.

11:42 But I think you would still use it.

11:43 Yeah.

11:43 And also, I mean, I do think this type of technology will definitely lead to a situation where we will all just take things we hear in, like, recordings or videos much less seriously.

11:52 Like, you know, I do think it will develop into a culture where we don't necessarily trust an audio recording because, you know, it could have been produced by something like this.

12:00 But I do think just for everyday life scam, like, I think it's very timely because, yeah, there are all these news articles about all these, like, financial scams and companies using deepfake audio.

12:11 Or even before, like, using spoofed emails that were quite effective.

12:15 And actually reading that, like, you know, the trick there, I'm like, I can totally see how, like, an accounting department falls for that and thinks, oh, it's their boss in a hurry.

12:22 And then imagine that with a voice of, like, you know, CEO.

12:26 So scary.

12:27 Exactly.

12:28 Yeah.

12:28 So I'm going to link to the video.

12:29 You all can watch it.

12:30 I'm also linking to the software that did this.

12:33 Apparently, it uses something called transfer learning from speaker verification to multi-speaker text-to-speech synthesis.

12:41 And that even has an acronym, SV2TTS, of course.

12:44 And so this was based on someone's thesis.

12:47 And you can watch the video and get a good sense.

12:49 But, yeah, you just imagine, like, I call, I somehow get the number of the CEO and I call them up and I record that call.

12:57 And then I take their, like, I only need five seconds or so of their voice.

13:00 And then I take that and I use some text to, like, real-time generate.

13:04 I'll call up the accounting department and say, hey, this is so-and-so.

13:07 We've got a super emergency.

13:08 Really important client.

13:09 We forgot to pay them.

13:10 We owe them $10,000.

13:12 And you just, like, type and then replay what you type live back to them.

13:17 Yeah, and then you overlay that, yeah, with some background music, background noise and just, like, oh, I'm, like, in a taxi right now.

13:24 Exactly.

13:25 Like, please get this done ASAP.

13:26 Otherwise, we're in trouble.

13:27 Bye.

13:27 Sometimes what usually doesn't work for email.

13:29 But if it's literally a voice that's interacting with you that sounds like the boss, well, it might work.

13:34 Hopefully we didn't give anyone ideas.

13:36 I guess you guys are better criminals than me.

13:38 I was just thinking, like, a different version of Ferris Bueller's Day Off.

13:42 You could just use this to call in and excuse yourself from school.

13:45 Oh, my gosh, you're right.

13:47 This would be beautiful when I was in, like, middle school or high school.

13:50 Oh, my goodness.

13:51 Yeah.

13:51 No, Michael's not feeling well.

13:54 Is he going to be okay?

13:54 He may be out tomorrow, but he'll be back pretty soon.

13:57 All right, then.

13:59 Yeah, that's really horrible.

14:01 Yeah, I mean, pretty soon you have, like, kids recording their parents' voices.

14:04 Yes.

14:04 I don't know if that works.

14:06 These are all bad.

14:07 This is a good example of why, yes, a focus on ethics, you know, when you're learning these technologies is, like, incredibly important.

14:12 Because, you know, we have that technology and every developer should think about, okay, what's the impact of having this and using this?

14:19 And, okay, we can release it to really also, you know, make everyone aware that this exists.

14:24 And, you know, we're talking about this right now, but still.

14:26 Yeah.

14:27 I guess final thought on this one, Ines.

14:28 What do you think the chances of some sort of fingerprinting or, like, system that can determine that this was faked, right?

14:37 Like, not a human, but if I could take this and feed it to, say, another ML model that knows, like, the little glitches that show up in the system, like, will we be able to verify stuff or not in the future?

14:48 Are we just lost?

14:49 I think so.

14:49 I think there have been, like, some experiments where they tried that.

14:52 And, I mean, another approach would be, okay, you can always encode things in the model that, like, only show up under very, very certain circumstances.

14:59 So, that's how, you know, you can watermark that model.

15:01 You can release that.

15:03 And then if you say a very, very, very specific sequence or if you type a nonsense sequence in there, it will always produce something nonsense but very differently.

15:12 And then you're like, ah, that's the system that was used.

15:15 And, you know, that's how we can.

15:17 It won't work on the output, but, like, at least, you know, we can at least have some way of kind of finding out what type of system.

15:23 Yeah, that's part of the ethics part, right?

15:25 Like, that you embed these little watermarks rather than just, like, put it out there.

15:29 I don't know.

15:30 It's pretty scary to me, but I think as a society, we'll come around to figure out what to do about it.

15:35 Cool.

15:35 Well, not scary is DigitalOcean.

15:37 I just want to tell you quickly about them.

15:39 And they're supporting the show.

15:40 So, thanks to DigitalOcean.

15:41 All of our software runs on top of DigitalOcean infrastructure.

15:45 You get the MP3s, it delivers either streams or downloads out of there, things like that.

15:49 So, they're really, really great.

15:51 You can get started for as little as $5 per month for a server.

15:54 And they got a bunch of cool services, managed databases, load balancers, and whatnot.

15:59 And it's, you know, it's not like EC2, which is so complicated.

16:02 It could run Netflix.

16:03 It's like the simple thing that you just need to build your app and get it going.

16:06 So, check them out at pythonbytes.fm/digitalocean.

16:10 And you get a $50 credit for new users there.

16:12 And definitely highly recommend it.

16:14 Brian, what's this next one that you're working on here?

16:16 Okay.

16:17 Well, another contributed by a listener, this one from Andrew.

16:20 He contributed a little snippet that was on a Reddit stream.

16:26 And it was, I'm going to just guess, just describe it as abusing the pyproject.toml file by putting ini file stuff in it.

16:35 So, the example that he gave was you can have the pytest ini file and the mypy ini file are two ini files for tools.

16:44 And you can, they're in ini file format.

16:46 And TOML files kind of look like ini files, but they're not.

16:51 They're different.

16:51 And you can break ini with TOML syntax and you can break TOML with ini syntax.

16:57 However, you can write them such that they are, if you're careful, you can write them such that they comply with both.

17:04 And I went ahead and tried this out.

17:07 I was able to try putting pytest ini, like, options within the TOML file.

17:13 And, you know, both pytest and mypy do not support doing this.

17:17 But they do support passing in a path of where their config file is.

17:21 And if you pass in the pyproject.toml file, you can get it to work.

17:27 And if all you're trying to do is reduce the number of files in your project, yeah, this kind of works.

17:32 Why would you want to do that?

17:34 Just to try to reduce the number of files in your top level directory.

17:37 Okay.

17:37 I don't know.

17:38 I should, like, use all of these, like, files more.

17:41 And I feel like, you know, I love this idea of imagine if there was one config file.

17:44 Like, really, you know, only one place where you put everything.

17:47 Also, your dependencies.

17:49 Everything just goes in one file.

17:51 And then you have that.

17:52 But for some reason, it's never actually worked out that way in practice.

17:55 You can put it in the setup.cfg.

17:57 That's a possibility.

17:59 But that's ini file syntax also.

18:01 I have to admit, like, our projects don't even have a setup.cfg.

18:04 We have a setup.py.

18:06 And then we have the requirements.txt.

18:07 Okay.

18:08 I have an admission as well.

18:10 Like, I know we're talking about pyproject.toml, but I'm still just using requirements.txt as well for some of my projects.

18:15 Because, you know what?

18:17 The workflow works.

18:18 I've got, like, external systems, like PyUp that are out there, like, automatically doing PRs for changes.

18:24 I mean, it's just, like, it's super cool.

18:26 But at the same time, I already got a flow working.

18:28 And I just, you know.

18:29 Yeah.

18:30 No, I can relate.

18:31 And so a user once, I think, contributed a pyproject.toml to spaCy.

18:35 And, you know, I really appreciated that because I'm like, great.

18:38 But it's still, I don't think, yeah, we can't really ditch requirements.txt yet.

18:43 And so now we also have that.

18:45 And now if we update, like, a dependency, I have to manually edit that in three places.

18:50 Yeah.

18:50 Yeah, yeah.

18:51 That's how it goes.

18:53 I'm on the bandwagon.

18:54 I'm switching.

18:56 I'm using Flit now.

18:57 So I'm using pyproject.toml and Flit.

19:00 Okay.

19:01 And that works cross-platform, cross-Python?

19:04 I don't know.

19:04 It works for me.

19:06 Well, that's always good.

19:07 Yeah.

19:07 I'm using it for mostly 3.6 and above, 3.6, 3.7, 3.8.

19:12 My personal projects, I think it's sufficient.

19:15 And the ones I'm supporting for other people, I think it's fine for an individual project

19:19 owner to say, I'm not supporting 2.7.

19:21 So.

19:22 No, of course.

19:22 Like, I would never, you know, and I also would never go and just, like, whine about, like,

19:25 oh, there's, of course, there's, like, an edge case for 2.7 on Windows.

19:29 And there's something there.

19:30 Like, I understand it's not, like, as easy if, like, you know, there was, like, the Python

19:34 that could just, like, magically fix everything.

19:36 Yeah.

19:36 You know, I appreciate it.

19:37 There's a lot of work that went into all of this.

19:39 And it's like, yeah, something's a bit tricky.

19:41 One of the interesting things was coverage.py.

19:44 I use that a lot.

19:45 And it got, there was a request to put pyproject.toml support on coverage.

19:52 The reason why it isn't there isn't because of any sort of, like, not that it would be

19:56 cool, but the TOML parsing is not part of the standard library.

20:01 And coverage has the strict policy that the only dependencies that it has are standard library

20:07 dependencies.

20:08 That's a reasonable desire also.

20:11 Well, yeah.

20:11 Yeah.

20:12 Yeah.

20:12 That's cool.

20:12 You just get coverage.py and just run the file or whatever, right?

20:15 Yeah.

20:15 But maybe we should get TOML support added to the standard library and then it wouldn't be

20:19 an issue.

20:19 There you go.

20:20 Yeah.

20:21 That's a whole different discussion.

20:22 I know that's quite a heated debate about what should be on the standard library these

20:26 days.

20:26 And the trend is less, not more.

20:27 I think if you take the poll.

20:29 Yeah.

20:30 Yeah.

20:31 Yeah.

20:32 So some of the tools that you build are absolutely about making machine learning easier and do that

20:38 across teams.

20:39 So I know that you're really turned on to that space and pay a lot of attention.

20:42 So this Polyaxon one that you found for our next item must be pretty interesting.

20:46 Yeah.

20:47 So basically, it's actually quite funny because, you know, I've obviously like, you know, thought

20:50 about what I was going to talk about and had like something else planned.

20:53 And then really today, earlier today, that release came out version 0.5 of Polyaxon.

20:58 So I was like, okay, great.

20:59 This is perfect.

21:00 It's like as if like, you know, they'd waited for my podcast recordings.

21:06 Yeah.

21:06 So basically, we've been using Polyaxon internally.

21:08 And essentially, it's a tool for experiment management.

21:11 So, you know, if you work in machine learning and you train your models, you have to like

21:14 run tons of experiments.

21:16 And you have to, you know, you run an experiment, train a model, look at the results, then you

21:20 stop it.

21:21 Then you tweak some other knobs, then you try again.

21:24 And you keep doing that until you have a good result.

21:26 And one thing we always do when we travel and like visit universities and research labs,

21:31 we usually always ask them like, hey, how do you run your experiments?

21:35 And usually they're like, well, yeah, you know, we got this GPU and it sits on my desk.

21:39 And then I start an experiment and then I sit around and then I wait.

21:43 And, you know, that's at like the top, some of the top like labs and like, you know, people

21:47 where you'd think like, oh, they must have everything taken care of, tons of money.

21:51 It's like, no, they're sitting there with their little GPU on their desk.

21:53 And that's how it's done.

21:56 And basically, Polyaxon basically, you know, helps you solve this.

21:59 So it's like super, you know, it's built on Kubernetes.

22:01 It's like very easy to set up.

22:03 And especially, you know, if you're already set up with cloud computing and yeah, and you

22:07 can also do stuff like hyperparameter search where like, you know, every hyperparameter

22:11 is a tiny knob and you have like tons of them.

22:13 And then you want to find the one combination that gives you better accuracy.

22:17 And so you can run lots of experiments, see them, you know, in their little graphs and like

22:22 try things out.

22:23 So it's been a very great tool.

22:25 It's all open source, which is, you know, very much in our spirit.

22:28 And yeah, they just released 0.5 and which comes with a plugin system, which is also great.

22:35 It's very much in our spirit.

22:36 That's also how we like to do things with spaCy.

22:39 You can run it locally.

22:40 And like, you know, it comes with some new features for chaining stuff together.

22:44 If you have your experiments have lots of steps.

22:46 So yeah, it's a great tool.

22:47 If you're working in the field.

22:49 It looks super cool.

22:50 Yeah, I can definitely recommend it.

22:51 Yeah.

22:51 So it's got a platform as a service offering.

22:53 So you just, you know, kick it off and have it go.

22:55 But also like on-premise enterprise option too.

22:59 Yeah, that's cool.

22:59 I think that's the focus.

23:00 It's like, you know, you run it and then it gives you like a little UI and you just like,

23:04 you know, you set it up yourself on your service and then it manages that.

23:08 Yeah, this looks really great.

23:09 Runs on Google's Kubernetes engine, among other things.

23:12 Probably pretty much any Kubernetes cluster, I guess.

23:16 I haven't tried it.

23:17 Yeah, I think you might have to do a bit more like setup.

23:20 If, you know, you're bringing your own, but like it should be quite straightforward.

23:24 Google actually makes this quite straightforward.

23:26 And yeah, and another nice thing here is with the hyperparameter search.

23:30 Like if you, yeah, most machine learning stuff is done on GPU, but not everyone has GPUs.

23:35 They're very expensive.

23:36 And we actually say it's not always the best choice necessarily, because if you just want

23:41 to run lots of experiments, you can run them all in parallel.

23:43 You can run like thousands in parallel.

23:46 And then, you know, if you have a tool like Polyaxon that can help you do that.

23:50 So, you know, you don't have to kick them all off manually.

23:52 It's actually going to be much cheaper and much more efficient.

23:56 And you don't need a fancy GPU.

23:58 You can just run it on CPU.
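
The idea of running many small experiments side by side on CPU can be sketched locally as follows. This is only a toy stand-in, not Polyaxon's API: `run_trial`, the scoring formula, and the grid values are all hypothetical, and a thread pool takes the place of the cluster that would schedule each trial as its own job.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_trial(params):
    """Hypothetical stand-in for training a model; returns (params, score)."""
    learning_rate, dropout = params
    # Toy score that peaks at learning_rate=0.1, dropout=0.2 (made-up numbers).
    score = -((learning_rate - 0.1) ** 2) - ((dropout - 0.2) ** 2)
    return params, score


# Every hyperparameter is a "knob"; the grid is every combination of settings.
learning_rates = [0.001, 0.01, 0.1]
dropouts = [0.1, 0.2, 0.5]
grid = list(product(learning_rates, dropouts))

# Run all trials concurrently instead of kicking them off one by one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_trial, grid))

# Keep the combination with the highest score.
best_params, best_score = max(results, key=lambda item: item[1])
print(best_params)
```

Because the trials are independent, nothing in the search itself needs a GPU; the work just has to be spread over enough workers.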

23:59 Right.

24:00 Yeah.

24:00 The GPUs are great, but they're much harder to come by.

24:03 So that's cool.

24:04 Yeah.

24:04 It also has something about you can run it on your laptop as well.

24:07 Yeah.

24:07 It's a little data science of the box thing they talk about there at the end.

24:10 So, yeah, super cool.

24:11 A nice one.

24:12 All right.

24:12 This last one I want to talk about here.

24:14 Actually, the way I got it onto my system is I used pipx.

24:18 I pipx installed

24:20 this thing called flynt.

24:21 So we've heard about linting and we've heard about f-strings.

24:26 And I'm guessing some combination thereof is where the name of the thing called flynt came

24:31 from.

24:32 It's quite new.

24:32 It's not super popular yet, but it works really, really well.

24:36 So the idea is I've got some code.

24:38 Maybe it's old code.

24:41 Maybe I just haven't bothered to write everything using f-strings.

24:44 And I would like to modernize it in its string processing.

24:47 So this tool, what you can do is you can point it at a single Python file, or you can just point

24:53 it at a directory, like a top level directory to just go to every, you know, traverse the

24:57 whole directory tree and find all the Python files and then rewrite all the string operations

25:03 to be f-strings.

25:04 Nice.

25:05 And it does a pretty good job.

25:06 It'll do the percent, you know, Python 2 style formats, as well as the, you know, dot format

25:13 style.

25:14 And it'll just replace all those with f-strings.
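
To make the three styles concrete, here is a minimal sketch of the same message written with percent formatting, with `.format`, and as an f-string. The variable names and values are made up for illustration and are not from the episode; the f-string line is the kind of result the tool is described as producing.

```python
# Illustrative values only.
name = "Ines"
count = 3

# Python 2 style percent formatting.
percent_style = "Hello %s, you have %d new items" % (name, count)

# str.format style.
format_style = "Hello {}, you have {} new items".format(name, count)

# f-string style (Python 3.6+).
f_string = f"Hello {name}, you have {count} new items"

# All three spellings build the same text; only the syntax differs.
assert percent_style == format_style == f_string
print(f_string)
```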

25:16 There's a couple of things it doesn't do.

25:18 If it's like multi-line, really long stuff, it won't replace those.

25:22 And when I first tried it, it actually was making a mistake on digit grouping format.

25:28 So if you have curly braces colon comma, because you want, you know, thousands, millions grouping

25:34 and so on, that like just went insane and broke my code.

25:37 But I submitted a bug over to the guys working on it, and they fixed it.

25:41 I believe a new release is already out.

25:43 So that shouldn't be there, but just, you know, run it on something you have under source

25:47 control and just look at the lines that have changed before you do the commit or, you know,

25:52 run your tests, something like that.
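
For reference, the digit grouping case mentioned here looks like this in plain Python. The numbers are invented for the example; the point is that the `:,` format spec has to be carried over unchanged when a `.format` call is rewritten as an f-string.

```python
# Illustrative numbers only.
downloads = 1234567
revenue = 1234567.891

# Digit grouping with str.format: the spec after the colon asks for commas.
with_format = "{:,}".format(downloads)

# The equivalent f-string keeps the same spec after the expression.
with_fstring = f"{downloads:,}"

# Grouping can be combined with a precision, e.g. two decimal places.
with_precision = f"{revenue:,.2f}"

print(with_format, with_fstring, with_precision)
```

Checking a diff for lines like these is a quick way to confirm a conversion kept the formatting intact.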

25:53 Yeah.

25:53 Well, you're such a good open source user.

25:55 Like, you know, found a bug, you know, reported it probably with like a nice, you know, description

26:01 fixed.

26:01 I'm like, I ran this on all of Talk Python Training and all of Python Bytes and some of it broke.

26:06 So what I found out is it's exactly this.

26:08 If it has digit grouping, it broke.

26:10 And so then they fixed it, but it was no big deal.

26:12 I think I converted about 500 to 700 string formats over to f-strings.

26:18 And it just, it's cleaner, shorter, nicer.

26:21 The thing with f-strings is I always, I don't know if you all use it, I'll ask you in a sec,

26:26 but like, I'm always like, okay, I'm going to write the string.

26:28 I'd say quote, type, type, type.

26:30 Oh, I want to put something in here.

26:31 Curly brace.

26:33 I wish I would have done the F back, back, back, back, back, back, back, put the F and then

26:37 back, back, back, back, back.

26:38 And then type the thing.

26:39 I'm like, well, that was more work than just dot format.

26:40 'Cause the IDE will auto-complete it, like dot f and then boom.

26:43 Like, so a lot of times I ended up using the format anyway, but I still prefer to have the

26:48 f-strings and read them.

26:49 So this way I can write it however I want and then just hit it with this before I do a check-in.

26:53 I think that we should ask VS Code and PyCharm to detect when we put a curly brace in a string

26:58 and automatically add the F.

27:00 Yes.

27:00 Just like a hotkey.

27:01 I was going to say the exact same thing.

27:03 So it's not.

27:04 That's awesome.

27:05 You all are in the same boat.

27:07 So yeah, this is, this is really cool.

27:08 I definitely think my code is nicer.

27:10 I originally created, like, when I created the Python Bytes website and I created Talk Python

27:14 Training.

27:15 This was when the latest version of Python on Ubuntu was 3.5.

27:20 So we didn't have f-strings.

27:22 And I actually took the server down once on accident because I used an f-string in a

27:25 little utility file that was like in the same directory and like the scanning path, looking

27:30 for the routes, found that, couldn't parse it and the website couldn't start.

27:34 I'm like, why is it down?

27:35 What have I done?

27:35 I didn't even change it.

27:36 So anyway, I'm really happy now that I could just take all that code that I used to like

27:41 leave in the format style and just flynt space, you know, just run on this directory.

27:45 Boom, it's done.

27:45 So yeah, it's really nice.

27:47 I can't wait to just like intuitively just use like f-strings and all that stuff.

27:50 It's still kind of, I don't know, just ingrained in like my brain.

27:53 Like even, I know, I go to conferences and I see people use all the new syntax and I'm

27:56 like, yeah, oh, that's so nice.

27:58 But it's just like, you know, in my day to day work, you know, even if we don't support

28:03 2.7, we support 3.5.

28:05 And they're just like a lot of these.

28:07 Yeah, exactly.

28:08 You know, there is this thing called, I can't remember what it was called, Brian.

28:12 We covered it where it lets you add f-string support to Python 2.

28:15 It may also work for Python 3.5.

28:18 You can definitely retroactively add f-strings to the format.

28:22 It's some weird way.

28:23 But then you need like another runtime dependency, which.

28:26 Yeah.

28:26 It's not worth it.

28:27 It does some weird thing.

28:28 That's how it works.

28:29 Yeah.

28:29 That's a bit unattractive.

28:30 It like rewrites like the file loader with a certain weird encoder.

28:35 It's like, it's pretty sketchy.

28:36 Yeah.

28:36 Okay.

28:37 Now I wouldn't want to ship that in like our libraries.

28:39 Oh, come on.

28:40 Why not?

28:40 Just run your tests.

28:42 Make sure they pass.

28:43 That's good.

28:43 Yeah.

28:44 Good.

28:44 But it's cool that something like this is out so that when you decide like we're no

28:47 longer supporting 3.5, you just hit it with this and like, you know, a quick

28:51 scan through the files and it's, it's f-stringified.

28:53 That's going to be so, so nice.

28:54 Like I can't wait.

28:55 I mean, all that and all the type hints and like, you know, once we can drop all of the,

29:00 the older versions, like it'll be so, so nice.

29:03 So yeah, I'm really, I won't even mind like rewriting all of the code.

29:06 Like I think with our team, we're just going to sit down and be like, yay, let's do this.

29:09 We're going to do it.

29:10 Yes.

29:11 Here we come.

29:11 Yeah.

29:11 It's going to be so satisfying.

29:12 Like years later.

29:14 Yeah.

29:14 Well, it's the curse of success, right?

29:16 You have so many people using your libraries that you just got to keep it sort of a little

29:20 bit backwards.

29:20 Yeah.

29:21 Sure.

29:21 And like some people are still stuck on legacy code.

29:23 Like, I mean, I'm not, it's not, you know, some people like look down on like companies

29:26 that are still on like Python 2, but it's like, you know, it's not like many of them like really enjoy using all this legacy software and legacy like stuff.

29:35 They're just like, it just exists.

29:37 And we might as well keep supporting it if we can.

29:40 Yeah.

29:40 What's the Python 2 story for you all?

29:42 You're still supporting it for now?

29:44 Yeah.

29:44 And I think we will for a while.

29:45 After January?

29:46 Yeah.

29:47 We probably will.

29:47 Like there will just naturally be a point where like we cannot upgrade any of our dependencies.

29:52 Like, I don't know, NumPy for example.

29:54 Okay.

29:54 If there ever, you know, is a good reason why we want to use a newer version of that.

29:59 We just can't.

30:00 And if everyone else drops it, we just have to be like, okay, that's it.

30:03 We can't.

30:04 And it's also not like the old versions are going away.

30:06 Like if we make sure we don't have any major bugs, like you can still use an old version

30:12 of spaCy and like, we're not going to take that away from you.

30:14 Right.

30:14 Just pin the version and you'll be good.

30:16 Yeah.

30:16 Very cool.

30:17 Very cool.

30:17 All right.

30:18 Well, that's it for all of our main items.

30:19 Brian, you want to, I know we've got a few little extra things just thrown here at the

30:22 end.

30:23 Do you want to kick it off?

30:23 Yeah.

30:23 So we had an email from Andre Janisch.

30:26 I think that's how you say his name.

30:27 Saying that we were, in one of our episodes, we talked about regular expressions taking down,

30:33 and I even forgot, took down something.

30:35 It was something major.

30:36 Yeah.

30:36 I can't remember, but some major cloud provider went down because of it.

30:40 Yeah.

30:40 And how that could happen.

30:41 So there's an interesting video talking about regular expression denial of service attacks

30:46 and how it happens.

30:48 It was just an interesting video.

30:49 We'll have a link to it in the show notes if anybody wants to watch.
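
As a rough illustration of how a regular expression can cause a denial of service, here is a sketch of catastrophic backtracking using a textbook pattern. It is not taken from the linked video or from the outage mentioned; the pattern and input sizes are chosen only to keep the demo quick.

```python
import re
import time

# Nested quantifiers: the engine can split a run of "a" characters between the
# inner and outer "+" in exponentially many ways before giving up.
EVIL = re.compile(r"^(a+)+$")


def time_failed_match(n):
    """Match n 'a' characters followed by a 'b', which can never succeed."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = EVIL.match(text)
    return result, time.perf_counter() - start


# Each extra character roughly doubles the work, so keep n small here.
for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    print(n, result, f"{seconds:.4f}s")
```

An attacker who can choose the input only needs a slightly longer string to keep a worker busy for a very long time.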

30:52 Okay.

30:52 Yeah.

30:53 Yeah.

30:53 Super cool.

30:54 So I got a couple I want to throw out there.

30:55 One is if you're doing any work where you're working with like microservices or you want to

31:02 have some kind of application that's talking to some API endpoint, you want to debug it.

31:07 There's a new thing called HTTP Toolkit, and it has like special Python support.

31:13 So this is like a proxy you can run on your computer and say, start recording, and it'll

31:19 start recording all the requests that you're making.

31:22 So it integrates with urllib.request, urllib2, requests, pip, Python 2 and 3, boto, all those

31:29 things, and specifically catches traffic from those.

31:33 And it does interesting stuff by changing the Python path and environment variables.

31:39 And then all of these libraries that I talked about apparently respect certain proxy settings

31:44 and things you can set.

31:45 So you don't have to change your app at all.

31:47 You just start a terminal with HTTP Toolkit and then run your code, and it can record it.
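
The general mechanism of proxy settings coming from the environment can be seen with the standard library alone. This sketch only shows that `urllib.request` reads the `http_proxy` and `https_proxy` variables; the address below is a hypothetical placeholder, no request is sent, and this is not a description of how HTTP Toolkit itself is implemented.

```python
import os
import urllib.request

# Hypothetical local proxy address, chosen only for illustration.
PROXY_URL = "http://127.0.0.1:8000"

# A wrapper that starts your program could set these before your code runs.
os.environ["http_proxy"] = PROXY_URL
os.environ["https_proxy"] = PROXY_URL

# urllib.request discovers proxies from the environment without code changes.
proxies = urllib.request.getproxies()
print(proxies.get("https"))
```

Since the application code never mentions the proxy, the same script behaves normally when those variables are not set.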

31:52 That's cool.

31:52 Yeah.

31:53 So if you're like, why is this crashing?

31:55 You know, I run my request thing, and then I get some kind of crash.

31:59 And how am I supposed to, you know, you don't have like the developer tools in your web browser

32:03 to like look at the headers and whatnot.

32:05 So you can do it with this.

32:07 It's pretty cool.

32:07 It's free, but there's also a paid version.

32:09 Just heads up.

32:10 Also, there's a nice little link.

32:13 Last time, time before, Brian, we were talking about magic, python-magic.

32:16 You're like, well, that's a pretty strong, strong name to be magic.

32:22 And all it does is detect file types.

32:23 Right?

32:24 Do you remember that thing?

32:25 David Martinez said, well, the reason it's called magic is there's basically these magic

32:32 number signatures that appear at the beginning of files.

32:36 And that's like teaches you about the syntax.

32:39 So for example, if you had a SQLite file, it would start with 53, 51, 4C, 69, et cetera.

32:44 If you see those numbers up the front, that means SQLite, right?
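
Those numbers are hexadecimal byte values: 53, 51, 4C, 69 spell "SQLi", the start of the header string "SQLite format 3". A small sketch of checking a file this way, using a throwaway database created only for the demo (the file name and helper function are hypothetical, and this is not how python-magic is implemented):

```python
import os
import sqlite3
import tempfile

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Return True if the file starts with the SQLite magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # Create a small real database so there is a header to inspect.
    db_path = os.path.join(tmp, "example.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    conn.close()

    with open(db_path, "rb") as f:
        first_four = f.read(4)
    is_sqlite = looks_like_sqlite(db_path)

# Show the first four bytes the way they were read out on the show.
hex_bytes = " ".join(f"{b:02X}" for b in first_four)
print(hex_bytes, is_sqlite)
```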

32:47 So wait, did you actually know this by heart?

32:49 Or did you write this down?

32:50 No, no.

32:51 I don't know.

32:53 Oh, of course.

32:54 Yeah, I know that one.

32:55 Sure.

32:55 No, I just, I pulled it up.

32:56 Let's see.

32:58 Really quickly, Python 3.7.4 is out.

33:01 So I brew upgraded my Python 3.7, which is how I'm getting it on my machine.

33:06 And it's already on Homebrew as well.

33:08 So that was like, I don't know, six, seven hours delay there.

33:11 It was really nice to see that come out real quick.

33:12 Or if you have pyenv, you can do pyenv.

33:15 I don't even know the command, but like install something and then...

33:18 Right away have it.

33:19 That's awesome.

33:20 Yeah.

33:20 And then the last one, I want to point out this thing called flying.

33:24 I just call it flying fractals.

33:26 This person put together this project using pywonderland.

33:31 And it automates some other libraries that are based on C.

33:35 So they made some really cool videos of like flying through three-dimensional, like animated

33:42 Mandelbrot sets and other kinds of stuff.

33:45 When I just watched it, I thought just, wow, this is super cool computational stuff over

33:50 here.

33:50 I've done a bunch of work in like complex dynamics and like trying to visualize that.

33:54 And like this kind of blew me away.

33:56 So if you care at all about that stuff, I think you'll just enjoy like a minute of that video.

33:59 Yeah.

34:00 Yeah.

34:00 So yeah, actually, you might have actually noticed that my voice still isn't like 100%

34:04 and it's a bit rough.

34:06 And that's because last weekend we had our very, very first conference here in Berlin called

34:11 spaCy IRL, like, you know, spaCy in real life.

34:14 So our community came together and I'm still like absolutely blown away.

34:18 Like there was, you know, the vibe was amazing.

34:20 We had like 200 people there and a lovely like old theater, 12 really amazing talks by people

34:26 from research, industry, community.

34:28 So it was really a lot of fun and all talks were recorded and we're currently uploading

34:33 them to our YouTube channel.

34:34 So probably by the time this airs, they might've already been released.

34:37 And yeah, it'd be a bunch of, yeah, if you're interested in natural language processing, spaCy,

34:42 those, the talks were really, really great.

34:43 And yeah, I hope you, you enjoy watching them.

34:46 That's awesome.

34:46 I'm so glad you're putting them on YouTube and wow, congratulations.

34:50 You must be just blown away at how awesome it is to put on a conference about your own stuff,

34:55 right?

34:55 And so many people came and the energy and like you could tell from your voice.

34:58 And it's great.

34:59 I know that was like, yeah.

35:00 Wow.

35:01 And it was also, it's actually quite refreshing to organize your own conference.

35:04 Like, you know, we were like, okay, let's do all the things that we think a conference

35:08 should do and like try them out.

35:09 And it actually worked quite well.

35:11 Like, you know, only one track, for example, stuff like that.

35:13 No questions from the audience, much more social time, some, you know, really healthy food,

35:19 stuff like that.

35:20 So it was, it was, it was great.

35:21 Like my voice suffered a bit, but you know, we're giving a lot back to the community as

35:25 in the videos, photos, stuff.

35:27 So yeah.

35:28 Sounds great.

35:29 Awesome.

35:29 Congratulations.

35:30 A couple of episodes ago, Brian actually talked about FastAPI, not to be confused with

35:35 fast.ai, which I talked about earlier.

35:37 So yeah, it's a great, very modern, cool Python library for fast APIs.

35:42 And we're like, yeah, we've been big fans.

35:44 We've started switching all our APIs over.

35:46 The exciting news here is that their core developer, Sebastián Ramírez, is actually going to join our

35:52 team here in Berlin.

35:53 This means like a lot of cool development for us, but also we obviously, since we love the

35:58 FastAPI project, we'll keep supporting that and we'll definitely give him enough time to

36:03 keep working on it.

36:03 That's great news.

36:04 We found him through the project.

36:05 We saw, oh, he's doing like some consulting work.

36:07 We're like, hey, we'd love to work with him.

36:09 And one thing came to another and now, yeah, he'll be part of our Explosion team, which is

36:13 still growing, by the way.

36:14 So we've been very lucky that we were able to work with more people and expand our team.

36:19 Yeah.

36:19 That's so cool that, yeah, that your business is growing.

36:22 And I guess it's worth pointing out that back on Talk Python To Me, we talked about Explosion

36:28 AI.

36:28 I interviewed you about building a software business.

36:31 So back on episode 202.

36:32 So, you know, this is just like more, more evidence that that's all good advice.

36:36 Thanks.

36:36 Yeah, super cool.

36:37 Well, great, great news.

36:39 I guess it's probably time for a joke here too.

36:42 A Py joke, if you will, maybe.

36:44 So I think this, I'll do the first one.

36:46 This one, I think came from pyjokes.

36:48 We'll see.

36:48 We're starting to run that well dry.

36:50 So people send in your jokes, please.

36:52 But a programmer walks into the bar and orders 1.38 root beers.

36:58 The bartender informs her that it's a root beer float.

37:02 She says, no, make it a double.

37:04 All right.

37:05 It's pretty bad type system.

37:06 Maybe it doesn't work so well in Python.

37:08 Like we don't care so much about types, but you know, still.

37:10 So it's all right.

37:11 I like it.

37:11 All right.

37:11 We got one more up here.

37:12 Who put this one in?

37:13 Brian?

37:13 Yeah.

37:14 So just last night I was researching for this podcast, writing notes for the other podcast,

37:19 and working on an open source project.

37:22 And I came up with this.

37:24 What do you call a developer without a side project?

37:27 What's that?

37:27 Well rested.

37:28 Yeah, that's true.

37:28 That's totally true.

37:29 It's almost a bit sad in that sense.

37:31 Yeah.

37:35 I know it's too real.

37:36 Too real.

37:37 I personally shouldn't be complaining.

37:40 That's right.

37:42 But you know, there are some comments we could make here about like, you know, the culture

37:45 and like what's expected of developers these days and how that's maybe not ideal.

37:49 You know, sleeping enough, stuff like that.

37:51 Oh, definitely.

37:52 People should be.

37:53 And I was, it's mostly a self-reflection.

37:56 No, I mean, it is a good joke.

37:57 I'm not, I wasn't criticizing your joke.

37:59 It's a totally fine joke.

38:00 I'm just saying it's also, it's very real.

38:02 You know how jokes can be like too real and then you're like, oh.

38:05 Yeah, I'm really uncomfortable now.

38:09 Exactly.

38:11 All right.

38:12 Well, I think that's a good place to leave it.

38:13 Brian, thank you as always.

38:14 Thank you.

38:15 It was great to have you here.

38:16 Thanks for coming.

38:16 Yeah, thanks.

38:17 Bye.

38:17 Thank you for listening to Python Bytes.

38:19 Follow the show on Twitter via @pythonbytes.

38:22 That's Python Bytes as in B-Y-T-E-S.

38:25 And get the full show notes at pythonbytes.fm.

38:28 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

38:32 We're always on the lookout for sharing something cool.

38:35 On behalf of myself and Brian Okken, this is Michael Kennedy.

38:38 Thank you for listening and sharing this podcast with your friends and colleagues.
