

Transcript #73: This podcast comes in any color you want, as long as it's black

Recorded on Wednesday, Apr 4, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 73, recorded April 4th, 2018. I'm Michael Kennedy, and this episode is brought to you by Datadog. Check them out at pythonbytes.fm/datadog. I'll tell you more about them later. Right now, I want to relive my math education, Brian, and my set theory and all these sorts of fun number theory things, right?

00:27 So there's an article called, I think I truncated it, Set Theory and Python Tips and Tricks. Actually, that's one of the classes I loved. I loved the discrete math class I had in college, early in my computer science days, where we got to talk and learn about set theory. And then you use it a lot; it's a useful tool. But there are a lot of people that come to Python who haven't taken that or aren't really sure what set theory is. It's really not that complicated. And this is a good introduction article on really what set theory is, or some of the set theory concepts, and then how to do that in Python, including things like checking to see if something's in a set, and unions and intersections and differences and things. And it's just a lot of fun.

01:14 Yeah, I think this is awesome. And this is one of those things where the right data structure can just make all the difference in terms of performance and simple code, right? Like if you try to model this with lists, which you totally could do, then you're basically implementing it all yourself: is this set contained within that set? What is the intersection? The actual performance of checking if something's in there, making sure you don't get duplicates, all that stuff, it just falls apart. But if you use sets, well, then it flies, right? And anytime you need something distinct, right, I want a distinct number of IDs or usernames or emails or whatever, and where there might be duplication, set.

01:52 It's all about the set.

01:53 - Right, and yeah, that's the thing to take away if you're not familiar with sets: a set is a container that only has unique elements in it.

02:03 So if you add, like if you're adding words, you're going through an article and you add words to a set, if the word "the" is already in there, it's only gonna be added once.

02:12 - Right.

02:13 - If you had a list that contained like every word in order of a play or something, you threw it into a set, that would just give you all the words used distinctly.

02:21 Yeah.

02:22 - It's a simple data structure built into Python and it's a good thing to know how to use.

02:26 - Yeah, I love it.

02:27 And it's super, super fast for containment, a little bit like dictionaries in that sense.

02:30 So, nice one, I'm glad you found that.
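The set operations mentioned in this segment (uniqueness, membership, union, intersection, difference) map onto Python's built-in set type roughly as follows. This is a minimal sketch; the sample words and numbers are made up for illustration and do not come from the article.

```python
# Words from a short passage, with "the" repeated three times.
words = "the quick brown fox jumps over the lazy dog the end".split()

# Building a set collapses duplicates, leaving each distinct word once.
unique_words = set(words)
print(len(words), len(unique_words))

# Membership tests use hashing, much like dictionary key lookups.
print("fox" in unique_words)
print("cat" in unique_words)

a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                 # elements in either set
intersection = a & b          # elements in both sets
difference = a - b            # elements in a but not in b
symmetric = a ^ b             # elements in exactly one of the sets
is_subset = {3, 4} <= a       # True when every element of {3, 4} is in a

print(sorted(union), sorted(intersection), sorted(difference), sorted(symmetric), is_subset)
```

The operators also have method spellings (`a.union(b)`, `a.intersection(b)`, `a.difference(b)`), which accept any iterable rather than only sets.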

02:32 - The syntax for how to do math, set theory math is not obvious.

02:38 It's not complicated, but it's something that's good to review so that you know what those are. Yeah, very, very nice. So the next thing that I want to talk about, in fact, you're going to notice a little bit of a trend this week, Brian. I'm on some kind of rant on async programming this week. So all three of my items have to do with async programming in one way or another. Generally, each one of them is about improving what already exists. Oh great. Yeah, so the first one is called Trio, T-R-I-O, and it says async programming for humans and snake people.

03:11 (laughing)

03:12 So it's interesting, right?

03:14 Like in Python 3.4 we got asyncio, in 3.5 we got async and await, the keywords that build upon the asyncio foundation, which is really coroutines and functions and stuff like that.

03:27 And so this guy who created this is like, this thing already exists, but the API for it's really crummy.

03:33 And you'll hear this as like a theme over and over with slightly different takes.

03:36 But so like, why does this exist, right?

03:38 So it says basically the Trio project's goal is to produce a production-quality, async/await-native I/O library for Python.

03:47 And like all the other stuff, it allows you to do sort of I/O-blocking stuff in parallel in really nice ways.

03:54 But the API on how you do that is quite a bit nicer.

03:59 And it supports advanced concepts like canceling a task that started while you're still waiting for it.

04:05 right? Things like that. Like if you're doing a web service call in a database transaction and the database fails, you want to roll back or cancel the web call or vice versa, something like that.

04:17 So it really tries to distinguish itself by being like really focused on this usability thing. And it's built like entirely from the ground up. And what's really interesting is they have this like these are our sources for inspiration.

04:29 a link like a GitHub issue.

04:32 Maybe it's in the wiki, I can't remember.

04:33 But a big long part where they talk about all the places.

04:36 There's a lot of Erlang, there's C#, there's Go that makes a big appearance there.

04:41 But in particular it's based on David Beazley's Curio, which is kind of a similar project as well.

04:46 Interesting, right?

04:47 - Yeah, and I like the, like the API's actually pretty nice.

04:50 Things like start_soon.

04:52 - Yeah, exactly.

04:53 I love the things you can do.

04:54 So like asyncio stuff is really fairly complicated.

04:58 It's like three, four lines of code.

05:00 You always have to do just a transition from synchronous mode into the asynchronous world.

05:06 You've always got to create this async loop and then begin running an async method to do it.

05:10 But here you just say trio.run and you give it a function.

05:13 Boom, done.

05:14 If you want to like pause but not block the rest of the program, you can instead of saying time.sleep, which will totally block that thread, you can say trio.sleep.

05:23 And that will basically release the main thread to go do more work as if you're doing IO.

05:28 - Nice. - That's nice.

05:29 But like you said, the main part is, you can say, create an async with block, which already is like mind bending, and create this thing called a nursery.

05:38 And then you go to the nursery and say, start this task soon, start that task soon.

05:42 And then the with block will wait and block at the very end until all of the things you started within it are done.

05:48 - Yeah, and the nursery thing confused me at first until I remembered that we're doing child tasks.

05:54 So you get your child's children from a nursery, nice.

05:59 - That's right, you put them in the nursery, they can grow up, and when they're done, then you're out of the block. But this adaptation of the async with block is really, really interesting, which I believe requires Python 3.6.

06:09 So they have a bunch of tools for like inspecting and debugging your programs, and like the async flow, how all this stuff is working, which is I think a really nice addition.

06:19 One of the problems that you run into with this is if you've got like say an async Postgres data access layer, it's probably built on asyncio, not on Trio.

06:29 So you can't, even though they are effectively the same, they're not compatible, right?

06:34 So there's this other project called trio-asyncio that puts a layer, a compatibility layer, so anything that supports asyncio can run on Trio.

06:42 - Oh, cool.

06:43 - Yeah, so this is a really cool project.

06:45 I'm super impressed with this.

06:46 - I'd love to check that out too, but just a slight correction: it says 3.5 and above, and also PyPy.

06:53 - Yes, that's true, but I think the async with block, I'm not sure that structure itself is supported in Python 3.5.

07:01 I don't think so.

07:02 I know more things became async in 3.6, like async generator expressions and list comprehensions came in 3.6.

07:10 So there's like, it might be a slightly different context there.

07:14 >> And whatever, f-strings came in 3.6, so nothing exists to me before 3.6.

07:19 >> 3.5, you're dead to me.

07:20 [LAUGH]

07:21 >> Awesome.

07:22 So I've been hearing a lot on the social medias about this thing called Black.

07:27 - Yeah, so I thought we'd better cover it even though it's been around for a little while, not like a long time, but you're right, it has had a lot of social media attention.

07:37 So Black is the uncompromising Python code formatter, but I thought it was just sort of an amusing take on something like, so we have linters and everything, but they just tell you what is wrong or what doesn't comply with PEP 8 standards or other standards.

07:55 And this one just goes and changes your code for you and doesn't tell you anything.

07:59 Well, I'm not sure if it tells you or not.

08:01 - Is it like a black box?

08:02 - I'm not sure why the name black, but it is amusing that you can also, now that after you run it on your code, your code is blackened code, which actually makes me just hungry because I really like blackened salmon.

08:15 - Yeah, a good little sauce on it, yeah.

08:17 - It is an interesting take to say, If you're gonna say our code needs to follow certain standards, if this one works for you, just put that in your tool chain and it'll just automatically format everything and you don't need to argue about it anymore.

08:30 - Make it part of like a GitHub, a Git check-in hook or part of your automated build that just formats it and checks it back in.

08:38 - The GitHub repo has some amusing stuff in it also.

08:41 So poke around in there because like for instance in the tests, there's a comments file and it has examples of comments that should be removed.

08:50 And it's sort of funny stuff like some comment about why this function doesn't work and it's still in production anyway.

08:58 Things like that that should just be removed.

09:00 It's funny.

09:01 It's a good read.

09:02 - Yeah, I think this is pretty interesting.

09:03 I haven't done anything with it, but it's definitely worth checking out, especially if you have a team of people and you want to try to make it continuously format stuff the same, right?

09:11 - I don't really have an opinion on it other than it's interesting.

09:15 - Sounds pretty good.

09:16 All right, before we get to the next one, let me tell you about Datadog.

09:18 So if you run any sort of distributed app, understanding how a request flows from one part through the whole thing, what the performance of those are, what the bottlenecks around those are, can be really tricky.

09:31 So you can plug in Datadog, in just a few minutes, you'll be able to investigate those bottlenecks and explore dashboards that show you where you're spending your time in the app.

09:40 And you get to visualize your Python performance, super easy and nice, get a free trial and a free Datadog t-shirt with a cute little dog on it.

09:49 So just check them out at pythonbytes.fm/datadog.

09:53 We gotta get our shirts, right Brian?

09:55 - Yeah, definitely.

09:56 - Definitely.

09:56 I'm looking at PyCon.

09:57 I'm gonna try to get a shirt from them at PyCon.

09:59 - I'm gonna try to get one before that.

10:01 - Nice.

10:02 Back onto my rant on async stuff.

10:05 So there's this thing called Gain.

10:07 And the point of Gain is you can give it a base URL, a set of regular expressions for the types of links in there to traverse and follow, and then just tell it to go, and it will basically spider an entire site.

10:26 Think Google Web Spider, but yours.

10:29 Yeah, and it's all based on asyncio, uvloop, and aiohttp, which is pretty cool.

10:35 So all you gotta do is you define a class that has CSS selectors and what you wanna do, like save the data to a file or a database or whatever.

10:43 How many concurrent connections you want it to go spider on with its async aspect, where to start, the URL, things to match, like anything on the page or anything that's under the catalog section or whatever, and you say go.

10:56 And then you wait a little bit and all sorts of stuff has been downloaded, processed, and saved.

11:01 - Right, and it's very terse.

11:03 I mean, you don't really have to put that much code in place to get this done.

11:06 - No, it's like 10, 15 lines of code and you've completely analyzed somebody's entire website structure.

11:12 Pretty cool.

11:13 - Yeah.

11:14 - And 'cause it's based on asyncio and aiohttp, it should totally fly.

11:18 - Neat.

11:18 - Yeah, very neat.

11:19 So not a whole lot to do on that one, but if you're doing like screen scraping, web analysis of more than just one page, this is pretty cool 'cause you can sort of just set up patterns and say, go forth and do that.

11:31 - I was thinking it'd be fun to do something like that, to hook it up with a website you're running, to just even attach it to a post project of checking to make sure the links work, using requests or something.

11:43 - Yeah, that's a good point.

11:44 Like link validation to make sure every link on the page works correctly, things like that, right?

11:49 - Yeah, and set up a notifier or something like that to let you know if something's broken.
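The core idea behind this kind of spider, a base URL plus regular expressions that decide which links to follow, can be sketched with the standard library alone. This is not Gain's API; the class name, function name, page content, and URL are all hypothetical, and no network access is involved.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to_follow(base_url, html, patterns):
    """Return absolute links found in html that match any regex in patterns."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(base_url, link) for link in collector.links]
    return [url for url in absolute if any(re.search(p, url) for p in patterns)]


# Hypothetical page content, for illustration only.
page = (
    '<a href="/catalog/1">One</a> '
    '<a href="/about">About</a> '
    '<a href="/catalog/2">Two</a>'
)
followed = links_to_follow("https://example.com/", page, [r"/catalog/\d+"])
print(followed)
```

A link checker like the one suggested above would then request each returned URL and report any that fail.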

11:53 - All right, so what is this next one you got with these decorators, single dispatch?

11:57 - Yeah, actually, so this is an article called Generic Functions in Python with Single Dispatch.

12:04 And I didn't know this was a thing.

12:06 Apparently it was added in Python 3.4, this decorator, the singledispatch decorator.

12:12 And we'll talk about it and read it, but you kind of need to see the code.

12:17 You can decorate a function with singledispatch and that makes that function the default function.

12:24 Then you can use a decorator to register other functions to be the non-default.

12:30 >>Oh, this is interesting.

12:32 - Yeah, so that you can have one function name that calls different functions based on the type of the first argument in the parameter list.

12:41 - So it's basically like declarative function overloading polymorphism based on the type, which we don't have in Python.

12:50 - Right, which apparently we do have in Python, I just didn't know about it.

12:54 - It just requires decorators, but it's built in.

12:56 Built in decorators.

12:57 - Yeah.

12:58 - Yeah, this is wild.

12:59 So you've got like one function and you say that's singledispatch, and then you have other functions that just have docstrings, but you would basically wrap them with: if this one takes a list, set, or tuple, call this version; if it takes a dictionary, call this other version.

13:11 Yeah, this is interesting. It's a little non-obvious, but it's interesting.

13:15 I took the example out of the article and trimmed it down.

13:19 So those docstrings are just to make our code example and our notes small.

13:23 But yeah, it has an example of like building your own fprintf function that can print differently.

13:31 Like the default is just to print the string representation.

13:35 But for instance, lists and sets and dictionaries can be printed differently, with one element on each line.

13:42 I'm sure there's other reasons, but I know I've run across times where I wished Python had function overloading and it doesn't.

13:50 - I've implemented function overloading with an if isinstance of this type, else if isinstance of that type, you know, which is not really great, but it's what you've got.

14:01 But apparently you've got this as well, which is pretty awesome.
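The pattern discussed in this segment looks roughly like the sketch below, using `functools.singledispatch` from the standard library. The function name `describe` and its formatting rules are hypothetical stand-ins, not the article's exact example; it returns strings instead of printing so the results are easy to check.

```python
from functools import singledispatch


@singledispatch
def describe(data):
    """Default implementation: fall back to the string representation."""
    return str(data)


# Stacked register calls route several types to the same implementation.
@describe.register(list)
@describe.register(set)
@describe.register(tuple)
def _(data):
    """Sequences and sets: one element per line."""
    return "\n".join(str(item) for item in data)


@describe.register(dict)
def _(data):
    """Dictionaries: one 'key: value' pair per line."""
    return "\n".join(f"{key}: {value}" for key, value in data.items())


print(describe(42))
print(describe(["a", "b"]))
print(describe({"x": 1, "y": 2}))
```

Dispatch happens on the type of the first argument only, which is why this replaces a chain of `isinstance` checks rather than full multi-argument overloading.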

14:04 All right, you ready for another rant?

14:05 Another thing on async?

14:07 Final one.

14:08 - We haven't talked about async for a while.

14:09 - No, let's talk about that.

14:10 So there's this thing by a guy named Alex Sherman, and he wrote a library called unsync, and an article called Unsynchronizing Async and Await in Python 3.6.

14:22 So he says he's a big fan of async and await, but there are two major problems with it.

14:31 First of all, it's difficult to do fire and forget async stuff, right?

14:34 You can't just go to an async function and call it and let it run.

14:37 You have to do this weird sort of series of creating the async loop, blocking on the async loop.

14:43 So you create a loop, and you ensure the future by giving it a function, a loop function to call.

14:49 But it's really not obvious to just run a basic asynchronous thing from a synchronous task, okay?
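The boilerplate being described, creating a loop, wrapping the coroutine in a future, and blocking on the loop, looks roughly like this with the standard asyncio API. The coroutine `fetch_answer` is a made-up stand-in for real I/O.

```python
import asyncio


async def fetch_answer():
    # Stand-in for real I/O such as a network call.
    await asyncio.sleep(0.1)
    return 42


# Calling fetch_answer() from synchronous code only creates a coroutine
# object; an event loop has to drive it to completion.
loop = asyncio.new_event_loop()
try:
    future = asyncio.ensure_future(fetch_answer(), loop=loop)
    answer = loop.run_until_complete(future)
finally:
    loop.close()

print(answer)
```

Python 3.7, released after this episode was recorded, added `asyncio.run`, which wraps these steps for the common case.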

14:56 So if you look at the article right at the top, it links sort of that code there.

15:00 You also can't block.

15:02 You can't say, "Well, I've gotten this thing back from an async thing.

15:05 I just want to stop here and just wait until its answer comes back." It'll throw an exception, right?

15:09 So this is all kind of weird.

15:11 This guy says, "Hey, what can we do about this?" So he kind of solves it in a sense.

15:16 He says C# had this idea of async and await and these tasks that run almost identically to what Python has.

15:23 The way they fixed it was by creating this ambient thread pool that will capture and run them.

15:29 Basically the asyncio loop is like this thing behind the scenes you never see, and internally Python or C# would find it, just put it in the default one.

15:37 And he said Python didn't take this approach, and his hunch is the maintainers didn't want to add an ambient thread pool to the language, which makes sense.

15:44 He says, "I, however, am not a Python maintainer, and I did add an ambient thread to mine." And here's how it works.

15:53 So you just take any async function and you put an @unsync decorator.

15:58 We also have a big decorator theme going on here.

16:00 Put an @unsync decorator on it and then you just call it.

16:03 And it sounds real simple, like so what it does is it will basically wrap it up and do all that asyncio initialization stuff for you.

16:11 And then you can wait on the result or not wait on the result, however you like.

16:15 That alone is pretty cool.
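The general idea of an "ambient" event loop that synchronous code can hand work to can be sketched with the standard library. This is an illustration of the technique, not unsync's actual implementation; the helper name `fire` and the coroutine `slow_double` are hypothetical.

```python
import asyncio
import threading

# One event loop running forever on a daemon thread. This is a rough
# stand-in for the "ambient" loop idea, not how unsync is written.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def fire(coro):
    """Schedule a coroutine on the background loop and return at once.

    The returned concurrent.futures.Future can be ignored (fire and
    forget) or blocked on with .result().
    """
    return asyncio.run_coroutine_threadsafe(coro, _loop)


async def slow_double(x):
    await asyncio.sleep(0.1)
    return x * 2


future = fire(slow_double(21))    # returns immediately, work runs in the background
print(future.result(timeout=5))   # blocks this thread until the answer arrives
```

Because the loop lives on its own thread, the caller never has to create a loop or call `run_until_complete`, and blocking on the result does not raise.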

16:17 But then if you put that on a regular function, not an async one, it'll cause it to run on a thread pool thread, on a ThreadPoolExecutor.

16:24 If you flip a bit and say, @unsync decorator, but it's CPU bound, it'll actually run it on the ProcessPoolExecutor in a separate process.

16:33 So you can get around the GIL.

16:35 Oh, interesting.

16:36 And it's all just one decorator.
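The two executors mentioned here come from the standard library's `concurrent.futures` module. The sketch below shows the same function run on each; it does not use unsync itself, and the function names and limits are made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run_in_pool(executor_cls, func, args_list):
    """Run func over args_list on the given executor type and collect results."""
    with executor_cls(max_workers=2) as pool:
        return list(pool.map(func, args_list))


if __name__ == "__main__":
    limits = [1000, 2000]
    # Threads share one interpreter, so the GIL serializes this pure-Python work.
    print(run_in_pool(ThreadPoolExecutor, count_primes, limits))
    # Separate processes each have their own interpreter and their own GIL.
    print(run_in_pool(ProcessPoolExecutor, count_primes, limits))
```

Threads suit work that mostly waits on I/O, while processes suit pure-Python computation, at the cost of pickling arguments and results between processes.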

16:39 And it'll sort of manage those dependencies and work out how each thing should run.

16:43 It's really pretty slick in how it detects the different ways in which asynchrony can be manifested in Python.

16:49 To be fair, my first thought is that this might be, if I'm writing an asynchronous library, that synchronizing my asynchronous library, for instance, might be helpful during just a functional test, for example.

17:05 Yeah, you want to wait for it to go, yeah.

17:08 Definitely. So I think there's a lot of interesting use cases for this, and it definitely provides a lot of flexibility. It doesn't have a huge number of GitHub stars. I think it's pretty new, but you know, people can think about it, and maybe there's even some tie-ins, like maybe somehow Trio and its trio-asyncio could plug together with this. I don't know, but there's a lot of interesting news around the asyncio space, or the async/await space. Very amusing code example, with things like return "I hate event loops" and naming his event loop "annoying event loop". Yeah, he's got some great naming. And then his async function that he calls by putting the unsync decorator on it, its return value is "I like decorators". Yeah, it's pretty light-hearted. It's nice, but it's a cool project. People can check it out and see if it works for them. All right, well thanks. Yeah, you got anything else to share with us, Brian? I'm out of news for the week. No, I'm out as well. All right, how about that? Well, thanks for finding all these things and sharing them, and thanks everyone for listening.

18:09 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes.

18:14 That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm.

18:20 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

18:24 We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
