Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #73: This podcast comes in any color you want, as long as it's black

Return to episode page view on github
Recorded on Wednesday, Apr 4, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 73, recorded April 4th, 2018.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And this episode is brought to you by Datadog.

00:13 Check them out at pythonbytes.fm/datadog.

00:16 I'll tell you more about them later.

00:17 Right now, I want to relive my math education, Brian, and my set theory and all these sorts of fun number theory things.

00:27 Right.

00:28 So there's an article called, I think I truncated it, set theory and Python tips and tricks.

00:35 Actually, that's one of the classes I loved.

00:38 I loved the discrete math class I had in college that was early in my computer science days to talk and learn about set theory.

00:45 And then you use it a lot.

00:47 It's a useful tool.

00:49 But there's a lot of people that come to Python that haven't taken that or really not sure what set theory is.

00:55 It's really not that complicated.

00:56 And this is a good introduction article on really what set theory is or some of the set theory concepts.

01:02 And then how to do that in Python, including things like checking to see if something's in a set and unions and intersections and differences and things.

01:13 And it's just a lot of fun.

01:14 Yeah, I think this is awesome.

01:15 And this is one of those things where the right data structure can just make all the difference in terms of performance and simple code.

01:22 Right.

01:22 Like if you try to model this with lists, which you totally could do, then you're writing basically implementing.

01:29 Is this set contained within that set?

01:31 What is the intersection?

01:32 The actual performance of checking if something's in there, making sure you don't get duplicates, all that stuff.

01:38 It just falls apart.

01:39 But if you use sets, well, then it flies, right?

01:42 And anytime you need something distinct, all right, I want a distinct number of IDs or usernames or emails or whatever where there might be duplication, set.

01:52 It's all about the set.

01:52 Right.

01:53 And yeah, that's the thing to take away from if you're not familiar with sets.

01:57 Sets are a container that only have unique elements in it.

02:03 So if you're adding words, you're going through an article and you add words to a set.

02:08 If the word the is already in there, it's only going to be added once.

02:12 Right.

02:12 If you had a list that contained every word in order of a play or something, you threw it into a set, that would just give you all the words used distinctly.

02:21 Yeah.

02:21 It's simple data structure built into Python, and it's a good thing to know how to use.

02:26 Yeah, I love it.

02:27 And it's super, super fast for containment, a little bit like dictionaries in that sense.

02:30 So nice one.

02:31 I'm glad you found that.

02:32 The syntax for how to do math, set theory math, is not obvious.

02:38 It's not complicated, but it's something that's good to review so that you know what those are.

02:43 Yeah, very, very nice.

02:44 So the next thing that I want to talk about, in fact, you're going to notice a little bit of a trend this week, Brian.

02:51 I'm on some kind of rant on async programming this week.

02:54 So all three of my items have to do with async programming in one way or another.

03:00 Generally, each one of them and improving what already exists.

03:03 Oh, great.

03:03 Yeah.

03:03 So the first one is called Trio, T-R-I-O, and it says async programming for humans and snake people.

03:11 So it's interesting, right?

03:14 Like in Python 3 and 3.4, we got asyncio.

03:17 In 3.5, we got async and await, the keywords that built upon the asyncio foundation, which is really coroutines and functions and stuff like that.

03:27 And so this guy who created this is like, this thing already exists, but the API for it is really crummy.

03:33 And you'll hear this is like a theme over and over with slightly different takes.

03:36 But so like, why does this exist?

03:38 Right.

03:38 So it says basically the Trio project's goal is to produce production quality, async and await native IO libraries for Python.

03:47 And like all the other stuff, it allows you to do sort of IO block stuff in parallel in really nice ways.

03:55 But the API on how you do that is quite a bit nicer.

03:59 And it supports advanced concepts like canceling a task that started while you're still waiting for it.

04:05 Right.

04:06 Things like that.

04:07 Like if you're doing a web service call and a database transaction and the database fails, you want to roll back or cancel the web call or vice versa, something like that.

04:17 So it really tries to distinguish itself by being like really focused on this usability thing.

04:22 And it's built like entirely from the ground up.

04:24 And what's really interesting is they have this like, these are our sources for inspiration.

04:28 A link like a GitHub issue.

04:32 Maybe it's in the wiki.

04:33 I can't remember.

04:33 But a big long part where they talk about all the places.

04:36 There's a lot of Erlang.

04:37 There's C#.

04:38 There's Go that makes a big appearance there.

04:41 But in particular, it's based on David Beasley's Curio, which is kind of a similar project as well.

04:46 Interesting, right?

04:47 Yeah.

04:47 And I like the API is actually pretty nice.

04:50 Things like start soon.

04:52 Yeah, exactly.

04:52 I love the things you can do.

04:54 So like asyncio stuff is really fairly complicated.

04:58 It's like three, four lines of code.

05:00 You always have to do just a transition from synchronous mode into the asynchronous world.

05:05 You've always got to create this async loop and then begin running an async method to do it.

05:10 But here you just say trio.run and you give it a function.

05:12 Boom, done.

05:14 If you want to like pause but not block the rest of the program, you can instead of saying time.sleep, which will totally block that thread, you can say trio.sleep.

05:23 And that will basically release the main thread to go do more work as if you're doing IO.

05:27 Nice.

05:28 That's nice.

05:29 But like you said, the main part is you can create an async with block, which already is like mind bending, and create this thing called a nursery.

05:38 And then you go to the nursery and say start this task soon.

05:40 Start that task soon.

05:42 And then the with block will wait and block at the very end until all of the things you started within it are done.

05:48 Yeah.

05:48 And, you know, the nursery thing confused me at first until I remembered that we're doing child tasks.

05:53 So you get your child's children from a nursery.

05:58 Nice.

05:59 That's right.

05:59 You put them in the nursery.

06:00 They can grow up.

06:01 And when they're done, then you're out of it.

06:03 But this like adaptation of the async with block is really, really interesting, which I believe requires Python 3.6.

06:09 So they have a bunch of tools for like inspecting and debugging your programs and like the async flow, how this stuff is working, which is, I think, a really nice addition.

06:18 One of the problems that you run into with this is if you've got like, say, an async Postgres data access layer, it's probably built on asyncio, not on Trio.

06:28 So you can't, even though they are effectively the same, they're not compatible.

06:33 Right.

06:33 So there's this other project called Trio Async IO that puts a layer, a compatibility layer.

06:40 So anything that supports asyncio can run on Trio.

06:42 Oh, cool.

06:43 Yeah.

06:43 So this is a really cool project.

06:45 I'm super impressed with this.

06:46 I'll have to check that out, too.

06:47 But just a slight correction.

06:49 It says 3.5 and above and also PyPy.

06:53 Yes, that's true.

06:54 But I think the async with block.

06:56 I'm not sure.

06:57 Oh, okay.

06:58 That structure itself is supported in Python 3.5.

07:01 I don't think so.

07:02 I know more things became async in 3.6, like async generator expressions and list comprehensions

07:09 came in 3.6.

07:10 So there's like, it might be a slightly different context there.

07:14 And whatever, f-strings came in 3.6.

07:16 So nothing exists to me before 3.6.

07:18 3.5, you're dead to me.

07:20 Awesome.

07:22 So I've been hearing a lot on the social medias about this thing called Black.

07:27 Yeah.

07:27 So I thought we'd better cover it, even though it's been around for a little while, not like

07:32 a long time.

07:33 But you're right.

07:34 It has had a lot of social media attention.

07:36 So Black is the uncompromising Python code for matter.

07:42 But I thought it was just sort of an amusing take on something like, so we have linters and

07:47 everything, but they just tell you what is wrong or what doesn't comply with PEP 8 standards

07:54 or other standards.

07:55 And this one just goes and changes your code for you and doesn't tell you anything.

07:59 Well, I'm not sure if it tells you or not.

08:01 Is it like a black box?

08:02 I'm not sure why the name black, but it is amusing that you can also, now that after you

08:08 run it on your code, your code is blackened code, which actually makes me just hungry because

08:13 I really like blackened salmon.

08:14 Yeah, a good little sauce on it, yeah.

08:16 It is an interesting take to say, if you're going to say our code needs to follow certain

08:22 standards, if this one works for you, just put that in your tool chain and it'll just

08:27 automatically format everything and you don't need to argue about it anymore.

08:30 Make it part of like a GitHub, a git check-in hook or part of your automated build that just

08:36 formats it and checks it back in.

08:38 The GitHub repo has some amusing stuff in it also.

08:41 So poke around in there because like, for instance, in the tests, there's a comments file and it

08:47 has examples of comments that should be removed.

08:49 And, you know, it's sort of funny stuff like some comment about why this function doesn't

08:55 work and it's still in production anyway.

08:57 Things like that that should just be removed.

09:00 It's funny.

09:00 It's a good read.

09:01 Yeah, I think this is pretty interesting.

09:03 I haven't done anything with it, but it's definitely worth checking out, especially if you have a

09:07 team of people and you want to try to make it continuously format stuff the same, right?

09:11 I don't really have an opinion on it other than it's interesting.

09:15 Sounds pretty good.

09:15 All right, before we get to the next one, let me tell you about Datadog.

09:18 So if you run any sort of distributed app, understanding how a request flows from one

09:24 part through the whole thing, what the performance of those are, what the bottlenecks around those

09:29 are, can be really tricky.

09:30 So you can plug in Datadog.

09:32 In just a few minutes, you'll be able to investigate those bottlenecks and explore dashboards that

09:38 show you where you're spending your time in the app.

09:40 And you get to visualize your Python performance.

09:43 Super easy and nice.

09:44 Get a free trial and a free Datadog t-shirt with a cute little dog on it.

09:49 So just check them out at pythonbytes.fm/Datadog.

09:53 We've got to get our shirts, right, Brian?

09:54 Yeah, definitely.

09:55 Definitely.

09:56 I'm looking at PyCon.

09:57 I'm going to try to get a shirt from him at PyCon.

09:59 I'm going to try to get one before that.

10:01 Nice.

10:01 Back onto my rant on async stuff.

10:04 So there's this thing called gain.

10:07 And the point of gain is you can give it a base URL, a set of regular expressions of types

10:15 of links in there to traverse and follow, and then just tell it to go.

10:22 And it will basically spider an entire site.

10:25 Think Google Web Spider, but yours.

10:29 Yeah.

10:30 And it's all based on asyncio, UV loop, and AIoHCP, which is pretty cool.

10:35 So all you've got to do is you define a class that has a CSS selectors and what you want

10:40 to do, like save the data to a file or a database or whatever.

10:42 How many concurrent connections you want it to go spider on with its async aspect.

10:48 Where to start.

10:49 The URL things to match, like anything on the page or anything that's under the catalog section

10:55 or whatever.

10:55 And you say go.

10:56 And then you wait a little bit and all sorts of stuff has been downloaded, processed, and

11:01 saved.

11:01 Right.

11:01 And it's very terse.

11:02 I mean, you don't really have to put that much code in place to get this done.

11:05 No, it's like 10, 15 lines of code.

11:08 And you've like completely analyzed somebody's entire website structure.

11:12 Pretty cool.

11:13 Yeah.

11:13 And because it's based on asyncio and AIoHCP, it should totally fly.

11:17 Neat.

11:18 Yeah.

11:18 Very neat.

11:18 So not a whole lot to do on that one.

11:20 But if you're doing like screen scraping, web analysis of more than just one page, this

11:26 is pretty cool because you can sort of just set up patterns and say, go forth and do that.

11:31 I was thinking it'd be fun to do something like that to hook up with a website you run

11:36 in to just even attach it to a post project of checking to make sure the link using a request

11:42 or something to grab.

11:43 Yeah.

11:44 That's a good point.

11:44 Like link validation to make sure every link on the page works correctly.

11:48 Things like that.

11:48 Right.

11:48 Yeah.

11:49 And set up a notifier or something like that to let you know if something's broken.

11:53 All right.

11:53 So what is this next one you got with these decorators, single dispatch?

11:57 Yeah, actually.

11:58 So this is an article called generic functions in Python with single dispatch.

12:03 And I didn't know this was a thing.

12:06 Apparently in Python 3.4, it was added this decorator, a single dispatch decorator.

12:12 And we'll talk about it and read it.

12:14 But you kind of need to see the code.

12:16 You can decorate a function with single dispatch.

12:20 And that makes that function the default function.

12:24 Then you can use a decorator to register other functions to be the non-default.

12:30 Oh, this is interesting.

12:31 Yeah.

12:32 So that you can have one function name that does calls different functions based on the

12:38 type of the first element in the parameter list.

12:41 So it's basically like declarative function overloading polymorphism based on the type, which we don't

12:49 have in Python.

12:49 Right.

12:50 Which apparently we do have in Python.

12:52 I just didn't know about it.

12:53 It just requires decorators.

12:55 But it's built in.

12:56 Built in decorators.

12:57 Yeah.

12:57 Yeah.

12:58 This is wild.

12:58 So you've got like one function and you say single dispatch.

13:01 And then you have other functions that just have doc strings.

13:03 But you would basically wrap them with this one takes a list setter tuple called this version.

13:09 If it takes a dictionary called this other version.

13:11 Yeah.

13:12 This is interesting.

13:13 It's a little non-obvious, but it's interesting.

13:15 I took the example out of the article and trimmed it down.

13:19 So those doc strings are just to make our code example and our notes small.

13:23 But yeah, it has an example of like building your own fprintf function that can print differently.

13:31 Like the default is just to print the string representation.

13:34 But for instance, lists and sets and dictionaries can be printed differently and having elements

13:41 on each line.

13:41 I'm sure there's other reasons.

13:43 But I know I've run across times where I wished Python had function overloading and doesn't.

13:49 I've implemented function overloading with if is instance of this type.

13:54 Else if instance of this type, you know, which is not really great, but it's what you got.

14:00 Right.

14:01 But apparently you've got this as well, which is pretty awesome.

14:03 Yeah.

14:04 All right.

14:04 You ready for another rant?

14:05 Another thing on async?

14:06 Final one.

14:07 We haven't talked about async for a while.

14:09 No.

14:09 Let's talk about that.

14:10 So there's this thing by a guy named Alex.

14:13 Sherman.

14:14 And he wrote a library called unsync, async, unsync, called unsynchronizing async and await

14:21 in Python 3.6.

14:22 So he says, I'm a big fan of async and await, but there's two major problems with this.

14:30 First of all, it's difficult to do fire and forget async stuff, right?

14:34 You can't just go to an async function and call it and let it run.

14:37 You have to do this weird sort of series of creating the async loop, blocking on the async

14:43 loop.

14:43 So you create a loop and you ensure the future by giving it a function and a loop function

14:49 to call.

14:49 But it's really not obvious to just run a basic asynchronous thing from a synchronous task.

14:56 Okay.

14:56 Okay.

14:56 So if you look at the article, like right at the top, it links sort of that, that code

15:00 there.

15:00 You also can't block.

15:02 You can't say, well, I've gotten this thing back from an async thing.

15:05 I just want to stop here and just wait until its answer comes back.

15:07 It'll throw an exception.

15:08 Right?

15:09 So this is all kind of weird.

15:10 This guy says, Hey, well, what can we do about this?

15:13 So he kind of solves it in a sense says, you know, C# had this idea of async and away

15:19 in these tasks that run almost identical to what Python has.

15:22 The way they fixed it was by creating this ambient thread pool that will capture and run.

15:28 Basically the asyncio loop is like this thing behind the scenes you never see.

15:32 And internally Python or C# would like find it, just put it in like the default one.

15:37 And they said, Python didn't take this approach and his hunch is the maintainers didn't want

15:41 to add an ambient thread pool to their language, which makes sense.

15:44 He says, I, however, am not a Python maintainer.

15:48 And I did add an ambient thread to mine and here's how it works.

15:52 So you just take any async function and you put an at unsync decorator.

15:57 We also have a big decorator theme going on here.

15:59 Put an at unsync decorator on it and then you just call it.

16:02 And it sounds real simple.

16:04 So what it does is it will basically wrap it up and do all that asyncio initialization

16:10 stuff for you.

16:11 And then you can wait on the result or not wait on the result, however you like.

16:14 That alone is pretty cool.

16:17 But then if you put that on a regular function, not an async one, it'll cause it to run on a

16:22 thread pool thread on thread pool executor.

16:24 If you flip a bit and say at unsync your decorator, but it's CPU bound, it'll actually run it

16:30 on the process pool executor in a separate process.

16:33 So you can get around the GIL.

16:34 Oh, interesting.

16:36 And it's all just one decorator and it'll like traverse.

16:40 It'll like sort of manage those dependencies as async.

16:42 How does it run?

16:43 Where's the, it's really pretty slick on how it like detects the different ways in which

16:47 asynchrony can be like manifest in Python.

16:50 To be fair, I know my first thought is like that this might be, if I'm writing an asynchronous

16:56 library that synchronizing my asynchronous library, for instance, might be helpful during like

17:04 just a functional test, for example.

17:06 Yeah.

17:06 You want to wait for it to go.

17:08 Yeah, definitely.

17:09 So I think there's a lot of interesting use cases for this and it definitely provides a lot

17:13 of flexibility.

17:14 It's not, it doesn't have a huge number of GitHub stars.

17:17 I think it's pretty new, but you know, people can think about it.

17:20 And maybe there's even some tie-ins, like maybe somehow Trio and it's Trio async.io could

17:26 plug together with this.

17:27 I don't know, but a lot of interesting news around the async.io space or async await space.

17:32 Very amusing code example of things like a return.

17:36 I hate event loops and naming his event loop, annoying event loop.

17:42 Yeah.

17:43 He's got some great naming.

17:44 And then his async function that he calls by putting the unsync decorator, it's return value

17:49 is I like decorators.

17:50 Yeah.

17:51 It's pretty lighthearted.

17:53 It's nice, but it's a cool project.

17:55 People can check it out and see if it works for them.

17:57 All right.

17:57 Well, thanks.

17:58 Yeah.

17:58 You got anything else to share with Brian?

17:59 I'm out of news for the week.

18:01 No, I'm out as well.

18:03 All right.

18:04 How about that?

18:04 Well, thanks for finding all these things and sharing them.

18:07 And thanks everyone for listening.

18:09 Thank you for listening to Python Bytes.

18:11 Follow the show on Twitter via at Python Bytes.

18:14 That's Python Bytes as in B-Y-T-E-S.

18:17 And get the full show notes at pythonbytes.fm.

18:20 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

18:24 We're always on the lookout for sharing something cool.

18:27 On behalf of myself and Brian Okken, this is Michael Kennedy.

18:30 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page