#197: Structured concurrency in Python

Published Sat, Sep 5, 2020, recorded Thu, Aug 27, 2020

Sponsored by us! Support our work through:

Our courses at Talk Python Training
Test & Code Podcast

Michael #1: Structured concurrency in Python with AnyIO

AnyIO is a Python library providing structured concurrency primitives on top of asyncio.
Structured concurrency is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by using a structured approach to concurrent programming. The core concept is the encapsulation of concurrent threads of execution (here encompassing kernel and userland threads and processes) by way of control flow constructs that have clear entry and exit points and that ensure all spawned threads have completed before exit. — Wikipedia
The best overview is Notes on structured concurrency by Nathaniel Smith (or his video if you prefer).
Python has three well-known concurrency libraries built around the async/await syntax: asyncio, Curio, and Trio. (WHERE IS unsync?!?! 🙂 )
Since it's the default, the overwhelming majority of async applications and libraries are written with asyncio.
The second and third are attempts to improve on asyncio, by David Beazley and Nathaniel Smith respectively.
The AnyIO library by Alex Grönholm describes itself as “an asynchronous compatibility API that allows applications and libraries written against it to run unmodified on asyncio, curio and trio.”

Example:

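    import anyio

    async def task(n):
        await anyio.sleep(n)

    async def main():
        try:
            # The task group waits for every spawned task before the block exits.
            async with anyio.create_task_group() as tg:
                # AnyIO 2.x API; AnyIO 3+ uses tg.start_soon(task, 1) without await.
                await tg.spawn(task, 1)
                await tg.spawn(task, 2)
        finally:
            # e.g. release locks
            print('cleanup')

    anyio.run(main)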

AnyIO also provides other primitives to replace the native asyncio ones if you want to benefit from structured concurrency's cancellation semantics:
Synchronisation primitives (locks, events, conditions)
Streams (similar to queues)
Timeouts (e.g. [move_on_after](https://anyio.readthedocs.io/en/latest/api.html#timeouts-and-cancellation), [fail_after](https://anyio.readthedocs.io/en/latest/api.html#timeouts-and-cancellation))
... and more
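
To make the point about locks concrete, here is a minimal sketch of the "temporarily invalid state" problem discussed in the episode: a transfer debits one account, awaits, then credits another. It uses asyncio.Lock from the standard library as a stand-in for AnyIO's equivalent primitive, and the account names, amounts, and function names are illustrative assumptions, not part of AnyIO or the article.

```python
import asyncio

# Hypothetical shared state: the two balances should always sum to 100.
accounts = {"a": 100, "b": 0}


async def transfer(lock, amount):
    # Hold the lock for the whole debit/credit pair so no other task
    # can observe the balances between the two steps.
    async with lock:
        accounts["a"] -= amount
        await asyncio.sleep(0)  # yield to the event loop mid-transfer
        accounts["b"] += amount


async def audit(lock, totals):
    # Record the combined balance a few times while transfers are running.
    for _ in range(5):
        async with lock:
            totals.append(accounts["a"] + accounts["b"])
        await asyncio.sleep(0)


async def main():
    lock = asyncio.Lock()
    totals = []
    await asyncio.gather(
        transfer(lock, 30),
        audit(lock, totals),
        transfer(lock, 20),
    )
    return totals


totals = asyncio.run(main())
print(totals)
```

Without the lock, the audit task could run during the `await` inside `transfer` and see money that has left one account but not yet arrived in the other.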

Brian #2: The Consortium for Python Data API Standards

One unintended consequence of the advances in multiple frameworks for data science, machine learning, deep learning and numerical computing is fragmentation and differences in common function signatures.
The Consortium for Python Data API Standards aims to tackle this fragmentation by developing API standards for arrays (a.k.a. tensors) and dataframes.
They intend to work with library maintainers and the community and have a review process.

One example of the problem, “mean”. Five different interfaces over 8 frameworks:

    numpy:         mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    dask.array:    mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    cupy:          mean(a, axis=None, dtype=None, out=None, keepdims=False)
    jax.numpy:     mean(a, axis=None, dtype=None, out=None, keepdims=False)
    mxnet.np:      mean(a, axis=None, dtype=None, out=None, keepdims=False)
    sparse:        s.mean(axis=None, keepdims=False, dtype=None, out=None)
    torch:         mean(input, dim, keepdim=False, out=None)
    tensorflow:    reduce_mean(input_tensor, axis=None, keepdims=None, name=None,
                               reduction_indices=None, keep_dims=None)

They are going to start with the array API
Then dataframes
Also, it’s happening fast, hoping to gain traction in the next few months.
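
The keyword mismatch in the signatures above (axis vs. dim) is the sort of thing a thin adapter layer can smooth over, an idea that also comes up in the episode. The sketch below is only an illustration under stated assumptions: the two backend functions are hypothetical pure-Python stand-ins that mimic the NumPy-style and torch-style keyword names, not the real libraries, and the `mean` wrapper is not part of any proposed standard.

```python
from statistics import fmean


def _reduce(rows, axis):
    # Shared toy implementation for a 2-D list of floats.
    if axis is None:
        return fmean(x for row in rows for x in row)
    if axis == 0:
        return [fmean(col) for col in zip(*rows)]
    if axis == 1:
        return [fmean(row) for row in rows]
    raise ValueError(f"unsupported axis: {axis!r}")


def numpy_like_mean(a, axis=None):
    # Hypothetical stand-in using the NumPy-style keyword name.
    return _reduce(a, axis)


def torch_like_mean(input, dim=None):
    # Hypothetical stand-in using the torch-style keyword name.
    return _reduce(input, dim)


# The adapter only needs to know what each backend calls the axis argument.
AXIS_KEYWORD = {numpy_like_mean: "axis", torch_like_mean: "dim"}


def mean(backend, data, axis=None):
    # One call signature for the caller, translated per backend.
    return backend(data, **{AXIS_KEYWORD[backend]: axis})


data = [[1.0, 2.0], [3.0, 4.0]]
print(mean(numpy_like_mean, data, axis=0))
print(mean(torch_like_mean, data, axis=1))
```

A shared standard would make a translation table like this unnecessary for the common cases.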

Michael #3: Ask for Forgiveness or Look Before You Leap?

via PyCoders
Think C++ style vs Python style of error handling
Or any exception-first/only language vs. some hybrid thing
If you “look before you leap”, you first check if everything is set correctly, then you perform an action.

Example:

    from pathlib import Path
    if Path("/path/to/file").exists():
        ...

With “ask for forgiveness,” you don’t check anything. You perform whatever action you want, but you wrap it in a try/except block.

    def read_file():
        try:
            with open("path/to/file.txt", "r") as input_file:
                return input_file.read()
        except IOError:
            # Handle the error or just ignore it
            return None

In their example, which tests for an attribute on a subclass with no errors occurring, “Look before you leap” is around 30% slower (155/118≈1.314).
But if there are errors: The tables have turned. “Ask for forgiveness” is now over four times as slow as “Look before you leap” (562/135≈4.163). That’s because this time, our code throws an exception. And handling exceptions is expensive.
If you expect your code to fail often, then “Look before you leap” might be much faster.
Michael’s counter example: gist.github.com/mikeckennedy/00828db1d49d2cd2dac8fa0295e54c23
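
For readers who want to try the comparison themselves, here is a minimal sketch in the shape of the benchmark described above: an attribute that exists on a derived class but not on its base, accessed both ways. The class and function names are illustrative assumptions rather than the article's exact code, and the timings will vary by machine and Python version.

```python
import timeit


class BaseClass:
    pass


class Derived(BaseClass):
    hello = "world"


def lbyl(obj):
    # Look before you leap: check first, then access.
    if hasattr(obj, "hello"):
        return obj.hello
    return None


def eafp(obj):
    # Ask for forgiveness: access first, handle the failure.
    try:
        return obj.hello
    except AttributeError:
        return None


def compare(obj, number=100_000):
    # Time both styles against the same object.
    return {
        "lbyl": timeit.timeit(lambda: lbyl(obj), number=number),
        "eafp": timeit.timeit(lambda: eafp(obj), number=number),
    }


if __name__ == "__main__":
    print("attribute present:", compare(Derived()))
    print("attribute missing:", compare(BaseClass()))
```

Passing `Derived()` exercises the no-error path and `BaseClass()` exercises the path where every access raises, which are the two extremes discussed in the article.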

Brian #4: myrepos

“You have a lot of version control repositories. Sometimes you want to update them all at once. Or push out all your local changes. You use special command lines in some repositories to implement specific workflows. Myrepos provides a mr command, which is a tool to manage all your version control repositories.”
Run mr register for all repos under a shared directory.
Then be able to do common operations on a subtree of repos, like mr status, mr update, mr diff, or really anything.
See also: Maintaining Multiple Python Projects With myrepos - Adam Johnson

Michael #5: A deep dive into the official Docker image for Python

by Itamar Turner-Trauring, via PyCoders
Wait, there’s an official Docker image for Python?
The base image is Debian GNU/Linux 10, the current stable release of the Debian distribution, also known as Buster because Debian names all their releases after characters from Toy Story
Next, environment variables are added: ENV PATH /usr/local/bin:$PATH
Next, the locale is set: ENV LANG C.UTF-8
There’s also an environment variable that tells you the current Python version: ENV PYTHON_VERSION 3.8.5
In order to run, Python needs some additional packages (the dreaded certificates, etc)
Next, a compiler toolchain is installed, Python source code is downloaded, Python is compiled, and then the unneeded Debian packages are uninstalled. Interestingly, the packages needed to compile Python (gcc and so on) are removed once they are no longer needed.
Next, /usr/local/bin/python3 gets an alias /usr/local/bin/python, so you can call it either way
The Dockerfile also makes sure a newer pip is included.

Finally, the Dockerfile specifies the entrypoint: CMD ["python3"]. This means docker run launches into the REPL:

    $ docker run -it python:3.8-slim-buster
    Python 3.8.5 (default, Aug  4 2020, 16:24:08)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

Brian #6: “Only in a Pandemic” section nannernest: Optimal Peanut Butter and Banana Sandwiches

Ethan Rosenthal
Computer vision, deep learning, machine learning, and Python come together to make sandwiches.
Just a really fun read about problems called “nesting” or “packing” and how to apply it to banana slices and bread.

Extras:

Brian:

Patreon link

Michael:

Sign up for the free Excel to Python webcast on Sept 29.
Check out the early access version of the memory course.

Joke

via Eduardo Orochena

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 197, recorded August 26th, 2020. Brian, can you believe it's

00:12 the end of August? Even if I can't say it, it still is true?

00:15 No, I can't. I don't know where August went. I just don't.

00:18 I thought this whole pandemic thing would make the summer seem long and slow. It seems like it

00:23 just went faster.

00:23 Yeah, I've got like a Lego kit that I was planning on doing like the first week of summer vacation,

00:28 and it's still sitting here. So yeah, for sure. Yeah, there's a lot of things I want to get done

00:33 before the sun goes away and rain starts for six months straight. That's a Pacific Northwest problem,

00:38 but it's our problem. All right. Now this episode is brought to you by us as well. We'll tell you

00:42 more about the things that we're doing that we think you will appreciate later. Right now,

00:46 I want to talk about something that I think we might've covered before, but I don't know if we've

00:50 ever satisfactorily covered it. Maybe this time we'll get a little closer and that's AsyncIO.

00:55 Oh yeah, I think that's a new topic.

00:56 It's a totally new topic. Covered only less than GUIs. No. So there's a new, how should I put it,

01:05 a new compatibility-like layer library that allows you to work a little bit better with AsyncIO

01:14 and some of the other Async libraries that are not directly immediately the same as or built right on top of AsyncIO.

01:23 Curio from David Beazley and Trio from Nathaniel Smith. So there's an article that talks about this. I'm going to mention as part of this conversation.

01:32 And then say, Hey, Python has three well-known concurrency libraries built around Async and await syntax. AsyncIO, Curio, and Trio.

01:40 True. But where's Unsync, people? Unsync is the best of all four of those. I don't know where Unsync is.

01:46 Anyway, Unsync is not part of this conversation, but Unsync plays a role a little bit like this thing I'm going to mention today is AnyIO.

01:56 And it's a pretty clever name because the idea is that it provides structured concurrency primitives built on top of AsyncIO.

02:03 Okay.

02:04 Right? So one of the challenges with AsyncIO is you can kick off a bunch of tasks and then not wait for them.

02:10 And your program can exit or you can do other things.

02:12 And maybe you've seen runtime warnings like task such and such was never awaited.

02:17 You're like, Hmm, I wonder what that means.

02:19 Well, that probably means your program exited while it was halfway done or something like that. Right?

02:24 Or your thing returned a value before it waited for it to finish. Right?

02:28 And at the low level, something that's a little bit frustrating or annoying that you've got to deal with is that you've got to make sure that all the stuff you started on the Async event loop,

02:36 that you wait for that event loop to finish before your program completely shuts down or completely carries on.

02:43 And so that's basically the idea of this library.

02:46 It's a compatibility layer across those three types, those three different well-known concurrency libraries that provides this structured concurrency.

02:55 So you look at Wikipedia, they say structured concurrency is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by using a structured approach to concurrent programming.

03:08 The core concept is encapsulations of threads of execution by way of control flow constructs that have a clear entry and exit points.

03:17 In Python, this mostly manifests itself through this library as async with blocks or async context managers.

03:28 So you're like, I'm going to do some async work.

03:30 So let's create a with block, do all the work in there.

03:32 And then by the way, when you leave the with block, it's going to have made sure all the tasks that were started and the tasks started by those tasks and so on all finished.

03:41 Oh, that's nice.

03:42 Yeah, that's pretty cool.

03:44 So the way it works is you basically go anyio.create_task_group and then from the task group, you can spawn other subtasks and it will keep track of those.

03:55 If there's an exception, I believe it will cancel the other undone ones, the unfinished ones and so on.

04:00 So it's about saying we're just going to go through this thing and it's all going to run here and like it enters at the top and it exits at the bottom of the with block.

04:08 Okay.

04:09 That's pretty cool, right?

04:10 Yeah.

04:10 So I think that that's pretty neat.

04:12 Also has other primitives.

04:13 So that's like a real simple example.

04:15 Other example or other things it does include synchronization primitives, locks.

04:20 So if you create a reentrant lock in Python, often called a critical section and things like C++ and whatnot, it's never, ever going to help you.

04:30 Well, maybe that's a little bit strong.

04:32 It's likely not going to help you because those mechanisms come from the operating system process level.

04:39 And what they do is they make sure two threads don't run at the same time.

04:42 Well, with asyncio, it's all a bunch of stuff that's being broken apart on a single thread, right?

04:49 It's all on the one, wherever the event loop.run is, run to complete or whatever, like wherever that's happening, that's the thread.

04:56 So like the thread locks don't matter.

04:57 It's all the same thread.

04:59 Like you're not going to block anything.

05:00 So having primitives that will kind of function like threads to protect data while stuff is happening, while it's in temporarily invalid states, that's pretty cool for asyncio.

05:10 Okay.

05:10 So you need it or you don't need it?

05:12 You probably need it.

05:13 I think people often don't really think too much about these invalid states that programs get into.

05:19 And you think, well, asyncio, it's going to be fine.

05:21 And a lot of times what you're doing with asyncio is kind of standalone.

05:26 Like I'm going to kick off this thing.

05:28 And when it comes back, I'm going to take the data and do something.

05:30 But if you're modifying shared data structures, you could still end up in some kind of event loop race condition.

05:36 It's not as bad as like true threading because you're not going to, it's, I don't believe it's like a plus equals, right?

05:43 Of something that actually might be multiple steps at the lower level runtime.

05:47 I don't think that it would get broken up to that fine grained.

05:50 But if you say like debit this account, this amount of money, or await, debit this account, this amount of money, await, put that amount into the other one.

05:59 And some other one is like reading in some kind of loop, like that level of higher order, like temporarily invalid state.

06:05 That could be a problem for asyncio and you want some kind of lock.

06:09 So this comes with that, it comes with streams, which are similar to queues, timeouts through things like move on after or fail after a certain amount of time and so on.

06:18 So it's pretty cool little library.

06:19 Yeah, that's nice.

06:20 Nice.

06:20 My vote still for unsync is the best of the four, even though it was unmentioned.

06:24 Isn't unsync built on those also?

06:28 It's a compatibility layer that takes asyncio, threading, and multiprocessing and turns them all into things that you can await.

06:35 Oh, yeah.

06:36 Yeah.

06:36 So don't you think there should be like a standard, like a, they should get together like some consortium and have a standard about this?

06:42 Yeah.

06:43 Well, they probably should, but we're still in the early stages of figuring out what the right API is.

06:48 That's right.

06:50 That's why they haven't done it.

06:51 There's something else that has, that could use some standards and that's in a lot of data science libraries.

06:58 There's an announcement that there's a new consortium for Python data API standards.

07:03 So there is one happening and it's happening actually quite fast.

07:06 They're getting started right away and there's activities to the announcements right away.

07:13 Then in September, I believe they're going to kick off some work on data frames or on, no, starting with arrays and then move on to data frames.

07:23 And so, okay, I'm getting ahead of myself.

07:25 Their little blurb says, one of the unintended consequences of the advances in multiple frameworks for data science, machine learning, deep learning, and numerical computing is that there is fragmentation.

07:37 And in using the tools and then there are differences in common function signatures.

07:43 They have one example that shows what the, generally a mean function to get the average or mean, people are going to like flame me for calling average mean, but as a commoner, I kind of think of those the same thing.

07:58 But anyway, they show eight different, frameworks then, and some of them are common with other ones.

08:04 And so there's five different interfaces for over the eight frameworks for just the mean function for an array.

08:09 Yeah.

08:10 And what's crazy is like, they all are basically the same.

08:12 They're so, so similar, but they're not the same, not code wise the same, but they might as well be.

08:18 Yeah.

08:18 And so one of the issues is there's people are using more than one framework for different parts of their, maybe different parts of their data flow.

08:27 And sometimes you can kind of forget which one you're using and having a lot of these things common actually would just make life easier, I think.

08:37 So I think, I don't know how far they'll get with this, but I think it's a really, so they're not trying to make all of these, these frameworks look exactly the same, but with, commonalities in arrays and data frames.

08:49 Or, and they note that arrays are also called tensors.

08:53 So those are, trying to make some of those common is, I think a really good idea for some of the easy, simple stuff.

09:01 why not?

09:02 It seems like a great idea.

09:03 It seems like a huge challenge though.

09:05 Like who's going to give, whose function is going to be the one that's like, yeah, we're dropping this part of our API to make it look like everyone else's.

09:12 Right.

09:12 And that's why I think that they've, they've went through a lot of thought on how to go about with this process and try to convince people.

09:19 So they're working with, they're trying to kind of be in between the framework authors and maintainers and the community and try to do some, some review process for different APIs, put a proposal out, have feedback from both from, from the different projects and from the community to have, have more of a, you know, more input to try to make it.

09:43 It isn't just like one set of people saying, Hey, I think this should be this way.

09:47 Yeah, no, it's, it's a good idea.

09:49 It would be great if a lot of these applications or these frameworks may be renamed.

09:53 If it's the same function, if it's like, for instance, mean in this example, if it's spelled exactly the same, maybe it should be the same API.

10:01 And if you want a special version of it, maybe have a, have a underscore with an extra, you know, some reason why it's different.

10:08 you can have extra different functions.

10:10 Yeah.

10:11 It seems like you could find some pretty good common ground here.

10:13 It's a good idea.

10:14 And if they make it happen, you know, it'd just be easier to mix and match frameworks and use the best or different situations.

10:21 Cause I can certainly see you're like, Oh, I'm working with pandas here.

10:24 It would be great if I could do this on CUDA cores with CuPy, but I don't really know that it's close, but it's not the same.

10:31 So I'm just going to keep stroking along here as opposed to change the import statement.

10:35 Now it runs there.

10:36 Yep.

10:36 I don't know if it's ever really going to be like, you can just swap out a different framework, but for some of the common stuff, it'd really be great.

10:43 And that's why one of the reasons why we're bringing it up is so that people can get on board and start being part of this review process if they care about it.

10:50 Yeah.

10:50 It also seems like there might be some room for like adaptive layers, like from CuPy import pandas layer or something like that, where it basically, you talk to the, in terms of say a pandas API and it converts it to its internal.

11:04 It's like, Oh, these, these arguments are switched in order or this keyword is named differently or whatever.

11:09 And there's even things like differences.

11:11 And even if the API looks the same or it's very similar, the default might be like in some cases, the default might be none versus false or versus no value or things.

11:22 I don't know what no value means, but anyway.

11:25 Yep.

11:26 Cool.

11:27 That's a good one.

11:28 Now, also good is the things that we're working on.

11:31 Brian, you want to tell folks about our Patreon?

11:34 Actually, we've kind of silently announced it a while ago, but we've got 47 patrons now and it's set up for a monthly contribution and we really appreciate people helping out because there are some expenses with the show.

11:48 So that's really cool.

11:50 We'd love to see that grow.

11:51 I don't, we'd also like to hear from people about how we'd like to come up with some special thank you benefits for patrons.

11:57 And so I'd like to have ideas come from the community.

12:00 If you can come up with some ideas, we will think about it.

12:04 Yeah.

12:04 So, and I'm trying to figure out how to get to it.

12:06 So on our Python bytes.

12:08 If you're on any episode page, it's there on the right.

12:11 Okay.

12:11 If you go to an episode page.

12:13 Got it.

12:13 Yep.

12:13 Then it says on the right, I believe somewhere it says sponsors on, off the double check.

12:19 I believe it does.

12:20 Okay.

12:20 We'll double check.

12:21 It can for sure.

12:23 If it doesn't already.

12:24 And also, I want to just tell folks about a couple of things going on over at Talk Python

12:29 training.

12:30 We're doing a webcast on helping people move from using Excel for all their data analysis

12:35 to pandas, basically moving from Excel to the Python data science stack, which has all sorts

12:41 of cool benefits and really neat things you can do there.

12:43 So Chris Moffitt is going to come on and writing a course with us and he's going to do a webcast,

12:48 which I announced it like, well, yeah.

12:51 It's 15 hours ago and already has like 600 people signed up for it.

12:54 So it's free.

12:55 People can just come sign up.

12:56 It happens late September, September 29th.

12:59 I'll put the link at the extra section of the show notes so people can find it there.

13:03 And also the Python memory management course is out for early access.

13:07 A bunch of people are signing up and enjoying it.

13:08 So if you want to get to it soon, get to it early, people can check that out as well.

13:13 Very exciting.

13:14 So this next one I want to talk about has to do with manners.

13:17 What kind of developer are you?

13:19 Are you a polite developer?

13:21 You're talking to the framework.

13:22 Are you always checking in with it to see how it feels, what you're allowed to do?

13:26 Are you kind of a rebel?

13:27 You're just going to do what you like.

13:29 But every now and then you get smacked down by the framework with an exception.

13:33 I don't want to describe how a developer I am because I don't want the explicit tag on this episode.

13:40 So there's an article that talks about something I think is pretty fun and interesting to consider.

13:46 And it talks about the two types of error handling patterns or mechanisms that you might use when you're writing code.

13:55 And Python naturally leans towards one.

13:57 But there might be times you don't want to use it.

14:00 And that is it's the two patterns are it's easier to ask for forgiveness than permission.

14:05 That's one.

14:07 And the other one is look before you leap or please may I.

14:10 All right.

14:12 And with the look before you leap, it's a lot of checks, like something you might do in C code.

14:18 So you would say, I'm going to create a file.

14:21 Oh, does the folder exist?

14:23 If the folder doesn't exist, I'm going to need to create the folder.

14:27 And then I can put the file there.

14:29 Do I have permission to write the file?

14:30 Yes.

14:31 Okay.

14:31 Then I'll go ahead and write the file.

14:32 Right.

14:33 You're always checking if I can do this, if this is in the right state and so on.

14:37 That's the look before you leap style.

14:39 The ask for forgiveness style is just try with open this thing.

14:45 Oh, that didn't work.

14:46 Catch exception.

14:48 Right.

14:48 Except some IO error or something like that.

14:51 So there's reasons you might want to use both.

14:54 Python leans or nudges you towards the ask for forgiveness.

14:58 Try except version.

15:00 The reason is, let's say you're opening a file and it's a JSON file.

15:04 You might check first.

15:05 Does the file exist?

15:07 Yes.

15:07 Do I have permission to read it?

15:08 Yes.

15:09 Okay.

15:09 Open the file.

15:10 Well, guess what?

15:11 What if the file's malformed and you try to feed it over to like JSON load and you give

15:16 it the file pointer?

15:17 It's not going to say, sorry, it's malformed.

15:20 It's going to raise an exception.

15:21 It's not going to return it like a value, like malformed, constant, weird thing.

15:25 It's just going to throw an exception and say, you know, invalid thing on line seven or

15:29 whatever.

15:29 Right.

15:30 And so what that means is even if you wanted to do the look before you leap, you probably can't

15:35 test everything and you're going to end up in a situation where you're still going to

15:39 have to have the try except block anyway.

15:41 So maybe you should just always do that.

15:44 Right.

15:45 Maybe you should just go, well, we're going to have to have exception handling anyway.

15:48 That's just, we're going to do exception handling as much as possible and not do these tests.

15:52 So that's the, this article over here.

15:55 It's on switowski.com.

15:58 Oh yeah.

16:00 It's on Sebastian.

16:01 Witowski.

16:02 So yeah, it's his, I didn't realize that it was his article.

16:06 So it's, his article.

16:09 Anyway, he talks about like, what is the relative performance of these things and tries to talk

16:15 about it from a, well, sure.

16:18 It's cool to think of how it looks in code, but is one faster or one slower than the other?

16:22 Okay.

16:22 And this actually came up on talk Python as well.

16:25 And so I said, look, if we're going to come up with an example, let's have a class and a

16:30 base class.

16:31 And let's have the base class define an attribute.

16:34 And sometimes let's try to access the attribute.

16:36 And when you don't have the base class, it'll, or when you only have the base class, it'll

16:41 crash, right?

16:41 Cause it's in the derived class.

16:42 So let's say we have two ways to test.

16:45 We could either ask, does it have the attribute and then try to access it?

16:49 Or we could just try to access it.

16:52 And it says, well, look, if it, if it works all the time and you're not actually getting

16:56 errors and you're doing this, it's 30% slower to do the look before you leap.

17:00 Cause you're doing an extra test and basically the try except block is more or less free.

17:05 Like it doesn't cost anything if there's not actually an error, but if you turn it around

17:11 and you say, no, it's not there.

17:14 All of a sudden it turns out the try except block is four times slower.

17:20 That's a lot slower.

17:21 Oh really?

17:22 Because the raising of the exception, figuring out this call stack, all that kind of stuff

17:27 is expensive.

17:28 So instead of just going, does it have the attribute?

17:30 You're going, well, let's do the whole call stack thing, every error, right?

17:34 And create an object and throw it and all that kind of stuff.

17:37 So it's a lot slower when there are errors.

17:39 And anyway, it's a, an interesting thing to consider if you care about performance and things

17:46 like parsing integers or parsing data that might sometimes fail, might not, you know, sometimes

17:51 it doesn't fail.

17:51 Yeah.

17:52 Okay.

17:52 Devil's advocate here.

17:54 His example doesn't have any activity in the ask for forgiveness.

17:59 If it isn't there.

18:01 That's the way I saw when I first read it as well.

18:02 There's two sections.

18:04 There's like one part where he says, let's do it with the attribute on the derived class

18:08 and let's do it again a second time by taking away the attribute and seeing what it's like.

18:13 Right.

18:13 But I mean, the code that if it, if it isn't exist, it just doesn't do anything.

18:17 Right.

18:17 Whereas in reality, you're still going to have to do something.

18:20 Yeah.

18:20 You got to do something.

18:21 Notify the user.

18:21 It's wrong.

18:22 Yeah.

18:22 Whatever.

18:22 Yeah.

18:23 Okay.

18:23 Yeah.

18:23 For sure.

18:23 That's a good point.

18:24 Like it's just basically a try except pass.

18:26 Yeah.

18:27 So what do you think about this?

18:28 So what I think is you're going to have to write the try except anyway, almost all the

18:35 time.

18:35 And you don't want both.

18:38 Like that doesn't seem good.

18:39 That seems like just extra complexity.

18:42 So when it makes sense, just go with ask for forgiveness.

18:46 Just embrace exceptions.

18:47 Right.

18:48 Remember you have a finally block that often can like get rid of a test as well.

18:52 You have multiple types of error except clauses are based on error type.

18:57 I think people should do a lot with that.

19:00 That said, if your goal is to like parse specific data, right?

19:04 Like I'm going to read this number I got off of the internet by web scraping and there's

19:09 a million records here.

19:10 I'm going to parse it.

19:11 If you want to do that a lot, a lot faster, that might make a lot of sense.

19:15 I actually have a gist example that I put up trying to compare the speed of these things

19:20 in a mixed case.

19:22 So like the cases we're looking at here are kind of strange because it's like, well,

19:26 there's, it's all errors or it's zero errors.

19:29 Right.

19:29 And then it doesn't really do anything, which are both weird.

19:31 So I have this one where it comes up with like a million records strings.

19:35 And most of the time they're number, they're legitimate numbers, like 4.2 as a string.

19:40 And then you can parse it.

19:42 And what I found was if you have more than 4% errors, I think it was four, like 4.5% or

19:48 something errors, erroneous data, it's slower to use exceptions.

19:53 Okay.

19:54 The cutoff is 4% errors.

19:55 And I think if you have more than 4% errors, then the exceptions become more expensive.

19:58 That's right.

19:59 Anyway, it's something that people can run and get real numbers out of and play with it in

20:02 a slightly more concrete way.

20:04 But I don't know.

20:05 What do you think?

20:06 I think you start out by focusing on the code and making it easy and clear to understand

20:12 and then worry about this stuff.

20:13 Yeah.

20:13 So I don't actually put either.

20:15 I don't usually do the asking or the checking stuff.

20:18 And that is one of the things that's good about bringing this up is that is more common in

20:23 Python code is to not check stuff, just to, you know, to just go ahead and do it.

20:28 And then I write a lot of tests.

20:30 So I write a lot of tests around things.

20:32 Yeah.

20:33 And so either case checking for things or like, for instance, if it is, if it is input,

20:39 if I've got user input, I'm checking for things.

20:41 Yeah.

20:41 I'm going to do it checks ahead of time because I want, because the behavior of what happens

20:45 when it isn't there or when there's a problem, it isn't really a problem.

20:50 It needs to be designed into the system as to what behavior to do when something unexpected

20:55 happens.

20:55 But the, in normal code, like, well, what happens if there's not an attribute?

21:00 Well, you shouldn't be in that situation, right?

21:03 You shouldn't be in that situation.

21:04 And I usually push it up higher.

21:05 I don't have try/except blocks all over the place.

21:08 I have them around APIs that might not be trustworthy or around external systems or something.

21:16 I don't put try/except blocks around code that I'm calling on my own code.

21:19 Things like that.

21:20 Yeah.

21:21 I'm with you on that.

21:22 That makes a lot of sense.

21:23 The one time that I'll do the test, the look before you leap style, is if I think I can

21:27 fix it, right?

21:28 Does this directory not exist?

21:30 I'm going to write a file to it.

21:31 Well, I'm just going to make the directory.

21:32 Then I'm going to write to it, you know?

21:35 Those kinds of tests can get you out of trouble.
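
A small sketch of the "fixable" case just described: the directory may not exist, so it gets created before the file is written. The function name and the paths are hypothetical.

```python
import tempfile
from pathlib import Path


def write_report(directory, filename, text):
    """Create the target directory if it is missing, then write the file."""
    target_dir = Path(directory)
    # A missing directory is something we can repair, so repair it.
    # exist_ok=True means an already existing directory is not an error.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp) / "reports" / "2020", "summary.txt", "ok\n")
        print(path.read_text(encoding="utf-8"), end="")
```

Any other failure, such as a permissions problem, still surfaces as an exception, which matches the point that error handling is needed anyway.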

21:37 But if you're just going to say this didn't work, chances are, you know, you still need

21:41 the error handling and exception format anyway.

21:44 Yeah.

21:44 And you're probably going to throw an exception.

21:45 So, anyway.

21:46 Cool.

21:47 So, you probably should get your code right, test it, and then just stick it in GitHub.

21:52 Get in your repository.

21:54 Make sure it's all up to date, right?

21:56 Oh, I was wondering how you were going to do that transition.

21:58 So, yeah.

21:59 That's good.

21:59 I was following a discussion on Twitter, and I think, actually, I think Anthony Shaw may

22:05 have started it, but I can't remember.

22:06 But dealing with different, if you've got a lot of repositories, just sometimes you have

22:11 a lot of maintenance to do or a little, you know, some common things you're doing for

22:16 a whole bunch of repos.

22:17 And there's lots of different reasons why that might be the case or related tools or

22:24 maybe just your work.

22:25 You've got a lot of repos.

22:26 But there's a project that came up in this discussion that I hadn't really played with

22:31 before, and it's a project called myrepos.

22:33 And on the site, it says you've got a lot of version control repositories.

22:38 Sometimes you want to update them all at once or push out all your local changes.

22:43 You may use special command lines in some repos to implement specific workflows.

22:48 Well, the myrepos project provides an mr command, which is a tool to manage all your version

22:54 control repositories.

22:56 And the way it works is it's on directory structures.

22:59 So it's a, and I usually have all of my repos that I'm working with under a common, like,

23:05 projects directory or something so that I know where to look.

23:08 And so I'm already set up for something like this might work.

23:12 And you go into, into one of your repos and you type, if you have this installed, you type

23:17 mr register.

23:18 And it registers this under, registers that repo for common commands.

23:24 And then whether you're in a parent directory or one of the specific directories and type

23:30 a command, like for instance, if you say mr status, it'll do status on all of the repos

23:36 that you care about or update or diff or something like that.

23:40 And then you can build up even more complex commands yourself to do more complicated things.

23:47 But I would, I mean, I'm probably going to use it right away just for just checking the

23:51 status or doing pulls or updates or something like that on, on lots of repos.
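
To make the idea concrete, here is a rough Python sketch of "run one command across every repo under a directory". This is not how myrepos is implemented and it does not use mr at all. It assumes git is installed, and the "projects" directory name is hypothetical.

```python
import subprocess
from pathlib import Path


def find_repos(root):
    """Return directories under root that contain a .git entry, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.parent for p in root.rglob(".git"))


def run_in_each(repos, command):
    """Run the same command in every repo; collect (repo, exit code, output)."""
    results = []
    for repo in repos:
        done = subprocess.run(command, cwd=repo, capture_output=True, text=True)
        results.append((repo, done.returncode, done.stdout))
    return results


if __name__ == "__main__":
    # "projects" is a hypothetical directory holding all checkouts.
    repos = find_repos(Path.home() / "projects")
    for repo, code, output in run_in_each(repos, ["git", "status", "--short"]):
        print(f"== {repo} (exit {code})")
        print(output, end="")
```

The difference from the tool discussed here is that this sketch discovers repos by scanning the disk, while mr works from the repos you have registered.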

23:56 So this looks neat.

23:57 Yeah, it looks neat.

23:57 I like the idea a lot.

23:58 So basically I'm the same as you.

24:01 I've got a directory, maybe a couple of levels, but all of my GitHub repos go in there, right?

24:08 I grouped them by like personal stuff or work stuff.

24:10 But other than that, they're just all next to each other.

24:13 And this would just let you say, go do a git pull on all of them.

24:17 That's great.

24:17 Yeah.

24:17 Or like, for instance, at work, I've got often like three or four different related repos

24:22 that if I switch to another project that I'm working on, I need to go through and make sure

24:27 I'm not sure what branch I'm using or if everything's up to date.

24:31 So being able to just go through all, like even two or three, being able to go and update

24:37 them all at once or just even check the status of all, it'll save time.

24:41 And then for the show, somebody that I interviewed for Test & Code,

24:47 Adam Johnson, wrote an article called Maintaining Multiple Python Projects With myrepos.

24:53 And we'll link to his article in the show notes.

24:55 Yeah.

24:56 Perfect.

24:56 I like this idea enough that I wrote something like that already.

24:59 You did.

24:59 Well, what I wrote is something that will, it'll go and actually synchronize my GitHub account

25:05 with a folder structure on my computer.

25:09 So I'll go and just say like repo sync or whatever I called it.

25:15 And it'll use the GitHub API to go and figure out all the repos that I've cloned or created

25:21 and the different organizations like the Talk Python organization versus my personal one.

25:26 And then it'll create folders based on the organization or where I forked it from and then

25:29 clone it.

25:30 And if it's already there, it'll update it within.

25:32 It'll like basically pull all this down.

25:34 Oh, that's cool.

25:34 I need that.

25:35 It was a lot of work.

25:37 This seems like it's pre-built and pretty close.

25:39 So it looks pretty nice.

25:41 The one thing it doesn't look like it does is go to GitHub and say, oh,

25:44 what other repos have you created that you maybe don't have here?

25:47 Maybe you want that.

25:48 Maybe you don't.

25:49 If you've like forked Windows source code and it's like 50 gigs, you don't want this tool

25:53 that I'm talking about.

25:54 But if you have reasonable size things like I forked Linux.

25:58 Okay, great.

25:58 That's going to take a while.

25:59 But normally it would be I think it would be pretty neat.

26:03 Yeah.

26:03 Another thing that's neat around managing these types of things is Docker.

26:06 And did you know that Python has an official Docker image?

26:10 I did not.

26:11 I didn't either.

26:11 Well, I recently heard that, but it's fairly new news to me that there is an official Docker

26:17 Python image.

26:18 So theoretically, if you want to work with some kind of Linux Docker machine that uses Python,

26:24 you can go and docker run, or create, the Python one.

26:29 Right.

26:29 So it's not super surprising.

26:32 It's just called Python.

26:34 Right.

26:35 But it's yeah, it's just called Python.

26:37 That's it.

26:37 I believe so.

26:38 Pretty straightforward working with it.

26:40 But I'm going to talk about like basically looking through that Docker, that official Docker

26:47 image.

26:48 So Itamar Turner-Trauring, who was on Talk Python not long ago, talking about Fil.

26:53 And we also talked about Fil on Python Bytes, the data science focused memory tool.

26:57 He wrote an article called a deep dive into the official Docker image for Python.

27:02 So basically it's like, well, if there's an official Docker image for Python, what is it?

27:09 How do you set it up?

27:10 Because understanding how it's set up is basically how do you take a machine that has no Python

27:15 whatsoever and configure it in a Python way?

27:18 So this is using Debian.

27:20 That's just what it's based on.

27:22 And it's using the Buster version because apparently Debian names all their releases after characters

27:28 from Toy Story.

27:29 I didn't know that.

27:30 But yep.

27:31 Buster.

27:31 Buster is the current one.

27:34 So it's going to create a Docker image.

27:36 You create the Docker file.

27:37 You say this Docker image is based on some other foundational one.

27:42 So Debian Buster.

27:43 And then it sets up /usr/local/bin in the PATH environment variable.

27:50 Because that is the first thing in the path.

27:52 Because that's where it's going to put Python.

27:54 It sets the locale explicitly: the ENV LANG is set to UTF-8.

28:01 There's some debate about whether this is actually necessary because current Python also defaults

28:05 to UTF-8.

28:06 But, you know, here it is.

28:08 And then it also sets an environment variable PYTHON_VERSION to whatever the Python

28:13 version is.

28:14 Right now it's 3.8.5.

28:15 But whatever it is, that's kind of cool.

28:17 So you can ask, hey, what version is in this system without actually touching Python?
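
As a sketch, a script could compare that variable with what the interpreter itself reports. The PYTHON_VERSION name comes from the discussion above; outside the official image it will usually be unset, so the function returns None for it. The function name is made up for this example.

```python
import os
import platform


def describe_python(environ=os.environ):
    """Return (PYTHON_VERSION from the environment or None, running version)."""
    return environ.get("PYTHON_VERSION"), platform.python_version()


if __name__ == "__main__":
    declared, running = describe_python()
    print(f"PYTHON_VERSION={declared!r}, interpreter reports {running}")
```

Tools that are not Python at all, such as a shell script, can read the same environment variable, which is the convenience being pointed out.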

28:22 That's cool.

28:23 And then it has to do a few things like register the CA certificates.

28:27 Like, I've had people sending me messages that are taking courses.

28:32 And they're trying to run the code from, you know, something that talks to requests,

28:36 where it's an SSL certificate endpoint, an HTTPS endpoint.

28:41 And they'll say, this thing says the certificate is invalid.

28:45 I'm like, the certificate's not invalid.

28:47 What's going on here?

28:48 Right?

28:48 And almost always, something about the way that Python got set up on their machine didn't

28:53 run the create certificate command.

28:55 So there's like this step where Python will go download all the major certificate authorities

29:01 and like trust them in the system.

29:02 So that happens next.
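
If you hit the "certificate is invalid" situation, one way to see what your Python is working with is the standard library's ssl module. This is only a diagnostic sketch, not the step the Docker image runs, and the function name is made up. Note that certificates found through a capath directory are loaded on demand, so a count of 0 does not by itself prove anything is broken.

```python
import ssl


def ca_summary():
    """Report where this Python looks for CA certificates and how many are loaded."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    stats = context.cert_store_stats()
    return {
        "cafile": paths.cafile,          # bundle file in use, or None
        "capath": paths.capath,          # directory of certificates, or None
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in ca_summary().items():
        print(f"{key}: {value}")
```

If both cafile and capath come back as None, that is a hint the certificate setup step never ran for that installation.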

29:03 And then it actually will set up things like GCC and whatnot.

29:07 So it can compile it.

29:09 It's interesting.

29:10 Downloads the source code, compiles it.

29:13 But then what's interesting is it uninstalls the compiler tools.

29:17 It's like, okay, we're going to download Python and we're going to compile it.

29:21 But you didn't explicitly ask for GCC.

29:23 We just needed it.

29:24 So those are gone.

29:25 Right?

29:26 Cleans up the PYC files and all those kinds of things.

29:29 And then it gives an alias to say that Python 3 is the same as Python.

29:34 Like the command, you could do it without the 3.

29:36 Another thing that we've gone on about that's annoying is like, I created a virtual environment.

29:41 Oh, it has the wrong version of pip.

29:43 Is my pip out of date?

29:44 Your pip's probably out of date.

29:45 Everyone's pip is out of date.

29:46 Unless you're in like a rare, like two-week window where Python has been released at the same time

29:52 as the modern pip has been released.

29:54 So guess what?

29:56 They upgrade pip to the new version, which is cool.

29:59 And then finally it sets the entry point of the Docker container, which is the default command to do if you just say Docker run this image,

30:08 like docker run python:3.8-slim-buster.

30:12 If you just say that by itself, what program is going to run?

30:16 Because the way it works is it basically starts Linux and then runs one program.

30:19 And when that program exits, the Docker container goes away.

30:22 And so it sets that to be the Python 3 command.

30:25 So basically, if you Docker run the Python Docker image, you're going to get just the REPL.

30:32 Interesting.

30:33 Yeah, you can always run it with different entry points like Bash and then go in and like do stuff to it

30:37 or run it with uWSGI or Nginx or whatever.

30:40 But if you don't, you're just going to get Python 3 REPL.

30:44 Anyway, that's the way the official Python Docker image configures itself from a bare Debian buster

30:52 over to Python 3.

30:53 Neat.

30:54 Yeah, neat.

30:54 I thought it might be worth just thinking about like, what are all the steps?

30:57 And you know, how does that happen on your computer?

31:00 No, that's good.

31:01 Because yeah, I have been curious about that.

31:04 I was going to throw Python on a Docker image.

31:07 What does that get me?

31:08 Yeah, exactly.

31:09 And that's what it is.

31:10 Oh, you could also apt install python3-dev.

31:18 Yeah, that might be cheating.

31:19 All right.

31:19 What's this final one?

31:20 Oh, so it was recommended by, we covered some craziness that Anthony did an episode or two

31:27 ago.

31:27 And somebody commented that maybe we need an "only in a pandemic" section.

31:32 Yeah, that sounds fun.

31:33 So I selected Nanner Most.

31:36 No, sorry.

31:37 Nannernest.

31:38 It's an optimal peanut butter and banana sandwich placement.

31:42 So this is kind of an awesome article by Ethan Rosenthal.

31:46 Talks about during the pandemic, he's been sort of having trouble doing anything.

31:52 And so he really liked peanut butter and banana sandwiches when he was just still even.

31:58 He picked this habit up from his grandfather, I think.

32:01 Anyway, this is using Python and computer vision and deep learning and machine learning and a

32:07 whole bunch of cool libraries to come up with the best packing algorithm for a particular

32:13 banana and the particular bread that you have.

32:16 So you take a picture that includes both the bread and the bananas or the banana you have.

32:22 And it will come up with the optimal slicing and placement of the banana for your banana

32:28 sandwich.

32:28 Wow, this is like a banana maximization optimization problem.

32:33 So if you want, you got to see the pictures to get this.

32:36 So like, if you're going to cut your banana into slices, and obviously the radius of the

32:42 banana slice varies at where you cut it in the banana, right?

32:45 Is it near the top?

32:45 Is it in the middle?

32:46 It's going to result in different size slices.

32:48 And where do you place the banana circles on your bread to have maximum surface

32:56 area of bananas relative to what's left of the bread, right?

32:59 Something like that?

32:59 Yes.

33:00 And he's trying to maximize, make it so that you have almost all of the bites of the sandwich

33:05 have an equal ratio of banana, peanut butter, and bread.

33:08 Oh yeah, okay.

33:09 It's all about the flavor.

33:10 I didn't understand like the real motivation, but yeah, you want to have an equal layer, right?

33:16 So you don't want that spot where you just get bread.

33:19 You actually learn quite a bit about all these different processes and there's quite a bit

33:23 of math here talking about coming up with arcs for, you have to estimate the banana shape as

33:31 part of an ellipse and using the radius of that to determine banana slices and estimates for,

33:40 because you're looking at a banana sideways, you have to estimate what the shape of the banana

33:45 circle will be and it's not really a circle, it's more of an ellipse also.

33:49 Yeah, there's a lot going on here.

33:51 Some advanced stuff to deliver your bananas perfectly.

33:56 I love it.

33:57 Actually, this is really interesting.

33:58 This is cool.

33:59 And it's, I mean, it's a silly application, but it's also a neat example.

34:03 Yeah, actually, and this would be, I think, a cool thing for, to talk about difficult problems

34:09 and packing for like a teaching, like in a school setting.

34:14 I think this would be a great example to talk about some of these different complex problems.

34:18 Yeah, totally.

34:19 Well, that's it for our main items.

34:21 For the extras, I just wanted to say I'll put the links for the Excel to Python webcast and

34:26 the memory measurement course down there and we'll put the Patreon link as well.

34:30 Let's see if you have anything else you want to share.

34:32 No, that's good.

34:32 Yeah, cool.

34:33 How about sharing a joke?

34:34 A joke would be great.

34:35 So I'm going to describe the situation and you can be the interviewer slash boss who has the

34:41 caption, okay?

34:42 Okay.

34:42 So the first, there's two scenarios.

34:45 The title is Job Requirements.

34:47 This comes to us from Eduardo Orochena.

34:49 Thanks for that.

34:50 And the first scenario is the job interview where you're getting hired.

34:56 And then there's the reality, which is later, which is the actual on the job day to day.

35:02 So on the job interview, I come in, I'm an applicant here and Brian, the boss says.

35:06 Invert a binary tree on this whiteboard.

35:09 Or some other random data structure, like quick sort this, but using some other weird thing,

35:15 right?

35:15 Yeah.

35:15 Something that is kind of really computer science-y, way out there, probably not going

35:20 to do, but kind of maybe makes sense, right?

35:22 All right.

35:23 Now I'm at the job and I've got like my computer.

35:25 I have a huge purple buy button on my website that I'm working on.

35:29 And the boss says, make the button bigger.

35:31 Yep.

35:33 That's the job.

35:33 Yeah.

35:37 Very nice.

35:38 Good, good.

35:39 All right.

35:40 Well, I love the jokes and all the tech we're covering.

35:42 Thanks, Brian.

35:42 Yeah.

35:43 Thank you.

35:43 Yeah.

35:44 Bye.

35:44 Thank you for listening to Python Bytes.

35:46 Follow the show on Twitter via @pythonbytes.

35:48 That's Python Bytes as in B-Y-T-E-S.

35:51 And get the full show notes at pythonbytes.fm.

35:54 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

35:59 We're always on the lookout for sharing something cool.

36:01 On behalf of myself and Brian Okken, this is Michael Kennedy.

36:04 Thank you for listening and sharing this podcast with your friends and colleagues.
