Transcript #246: Love your crashes, use Rich to beautify tracebacks
00:00 Hey there, thanks for listening. Before we jump into this episode, I just want to remind you
00:03 that this episode is brought to you by us over at Talk Python Training and Brian through his pytest
00:09 book. So if you want to get hands-on and learn something with Python, be sure to consider our
00:14 courses over at Talk Python Training. Visit them via pythonbytes.fm/courses. And if you're
00:21 looking to do testing and get better with pytest, check out Brian's book at pythonbytes.fm slash
00:27 pytest. Enjoy the episode. Welcome to Python Bytes, where we deliver Python news and headlines
00:31 directly to your earbuds. This is episode 246, recorded August 11th, 2021. I'm Michael Kennedy.
00:38 And I'm Brian Okken.
00:39 And I'm David Smith.
00:40 Hey, David Smith. Welcome. So good to have you here.
00:43 It's good to be here.
00:44 Yeah, you've been a suggester of topics, I believe. You've sent in some ideas and thoughts for us. And
00:50 well, we're going to get a good dose of that today for sure.
00:53 Quite honestly, if I'd known that you were going to open this up, I probably would have
00:56 hoarded some of those, because it was a little bit of a scramble. I'd be like,
00:59 oh yeah, I already gave them that tip. So yeah, I had to dig a little bit.
01:03 Yeah, you've already shared all your favorites. Well, your loss is our gain, because you've made it easier for us in the past. So thanks for sharing
01:10 those things. And yeah, thanks for being here. It's going to be great to have you.
01:14 Definitely.
01:14 Yeah, I want you to give the quick elevator pitch on you. What should people know about you?
01:19 Well, I'm a recent tech convert, I'll say. Over the last 10 years, I've been working in
01:24 the manufacturing space, either in quality engineering or manufacturing engineering. And
01:28 over the last couple of years, been using Python a lot more heavily. I used to do a lot of VBA
01:34 and Excel, which was painful. And I got a suggestion from one of our equipment suppliers
01:39 to say, hey, use Python. It's really, really nice. I kind of resisted doing it because I didn't
01:43 want to learn something new. It seemed intimidating because it's a programming language. I'm not a programmer,
01:47 but I finally caved when it came to trying to automate plotting, which is pretty painful in
01:54 Excel. And yeah, once I started on it and had something useful working in a couple hours,
01:59 I was hooked. And then I started looking for more and more resources, found your show and got more
02:04 and more into it from there. I started digging into the web and it's just been a, I'd say an upward
02:08 spiral from there. And about two and a half weeks ago, I started my first,
02:13 I guess, official tech role, in a similar kind of domain, for an automotive supplier. I'm doing
02:19 engineering work. So it's been really exciting to be able to use Python full time as part of my
02:25 job, because, you know, the bits of time I got to use Python before were always the parts I liked
02:30 the most. So I'm happy to be doing it, you know, on purpose.
02:33 Awesome. Yeah, me too. I wish I could do it full time.
02:36 I remember my first full time software development job. I was like, I can't believe they're paying me to
02:42 do this. I better figure this stuff out before they fire me. I can't believe I'm doing this. It
02:46 was so great. Yeah.
02:47 So good. All right. Well, congratulations and happy to have you here.
02:50 Brian, I feel like we should document this.
02:52 Definitely should document it and test our docs too. So one of the things I'd like to try, did I just
02:58 try to edit? There we go. Something that came up recently was Vincent Warmerdam. I think we've had
03:05 him on the show.
03:05 Mm hmm.
03:06 Yeah. A couple episodes ago. Yeah.
03:08 Yeah. So Vincent announced that he's got a library called mktestdocs. And I kind of love this. The idea is, it's a bunch of utilities that you can use to help test your documentation. It doesn't do it right out of the box. You have to create your own test files to do this. But the first example that he shows in his README is
03:38 that you've got a Markdown file with some Python code blocks in it. And you can make a test that goes through, reads the Markdown, grabs the Python code, and runs it. And if there are any problems with it, if there are any exceptions, it fails the test. This is just brilliant. There are examples in here for doing it with docstrings and even class docstrings. And Vincent, who does calmcode,
04:08 did a little calmcode video on how to use this. Yeah. And you're putting that in the show notes for people to check out, right? Yep. There's a link to the tutorial with the video. The use case that he was talking about at first was that maybe you're using MkDocs for documentation, so you've got a bunch of Markdown, but my use case is going to be blogs. So.
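The core idea is small enough to sketch with the standard library alone. This is not mktestdocs' own API; it is a minimal illustration, assuming fenced blocks tagged python, that pulls those blocks out of a Markdown string and runs them in order, so any exception, including a failed assert, surfaces as a failure. The helper names grab_python_blocks and check_markdown are made up for this example.

```python
from __future__ import annotations

import re

# Build the fence marker programmatically so this snippet can itself live inside Markdown.
FENCE = "`" * 3

# Capture the body of every fenced block whose info string is "python".
PYTHON_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def grab_python_blocks(markdown: str) -> list[str]:
    """Return the source code of each fenced python block, in document order."""
    return PYTHON_BLOCK.findall(markdown)


def check_markdown(markdown: str) -> None:
    """Run every python block; any exception (including AssertionError) propagates."""
    namespace: dict = {}  # shared, so later blocks can use names defined in earlier ones
    for source in grab_python_blocks(markdown):
        exec(source, namespace)


page = "\n".join([
    "# Adding numbers",
    FENCE + "python",
    "total = 1 + 1",
    "assert total == 2",
    FENCE,
])
check_markdown(page)  # passes silently; change the assert to "== 3" and it raises
```

Sharing one namespace across blocks is a design choice: it lets a page build up an example step by step, while running each block in a fresh namespace would instead catch snippets that quietly depend on earlier ones.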
04:34 Yeah. I think that's a huge use case, actually. Yeah. I've got Python code in my blog source code, and it's Markdown files. That's one thing on my to-do list: try this to make sure that the blog content is accurate.
04:48 That is super cool. You know, one more thing that you might find interesting. I think this is
04:54 a more true software engineering type of solution, but another sort of WYSIWYG, as-you-work style
05:01 of solution is PyCharm. If you have a Markdown file and you have Python code in there, it'll
05:07 highlight the errors and actually show you if like symbols are missing and stuff. So if you had the
05:12 markdown associated with the sample code and then you like do stuff with your little examples, it
05:17 may actually show you the errors live as well.
05:20 Oh, that's cool.
05:21 Yeah. I mean, that's not like a CI sort of keep it fixed, but that's a as you type kind of thing.
05:25 Yeah. And the other comment he had is, I normally don't put asserts checking that things are valid
05:32 in documentation, but the comment in the README is that if you put asserts in there, they'll get
05:39 checked also. So you've got like unit tests built into your documentation.
05:42 Super cool. David, what do you think?
05:44 It's interesting. I'm just trying to figure out.
05:47 Are you doing a parameterized test and looking at your inputs versus outputs for the
05:52 code that's in the documentation or how do you actually know it's testing correctly?
05:56 Oh, right.
05:57 Is it a valid Python or?
05:59 So the little code snippet we've got, that we're showing on the screen in the chat, but
06:04 also there's a link in the show notes to the README.
06:09 The parameterization it uses, like in this example, I'm saying go look in my docs folder.
06:17 And everything it finds in there that's a Markdown file shows up as one parameterization of the test.
06:26 So this test will run once per file.
06:30 So if I've got three Markdown files in there, the test will run three times.
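Here is roughly what "one test per Markdown file" looks like with pytest.mark.parametrize. It is a sketch, assuming pytest is installed and the pages live in a hypothetical docs/ folder; the block runner inside the test is a simple stand-in, not mktestdocs' own check function.

```python
import pathlib
import re

import pytest

FENCE = "`" * 3
PYTHON_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

# Hypothetical docs folder: every .md file found under it becomes its own test case.
DOC_FILES = sorted(pathlib.Path("docs").glob("**/*.md"))


@pytest.mark.parametrize("fpath", DOC_FILES, ids=str)
def test_markdown_file(fpath: pathlib.Path) -> None:
    """Fail if any python block in this one file raises."""
    namespace: dict = {}
    for source in PYTHON_BLOCK.findall(fpath.read_text(encoding="utf-8")):
        exec(source, namespace)
```

With three Markdown files under docs/, pytest collects three test IDs, one per path, so a broken page is named directly in the test report.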
06:36 This is the most comprehensive and yet extremely short test I've seen.
06:39 In a really long time.
06:41 It's three lines and it will like basically work, traverse a tree of markdown file hierarchy type thing.
06:48 Oh, I do tons of really tiny tests.
06:50 So yeah.
06:51 Yeah.
06:52 Nice.
06:53 All right.
06:54 Avaro.
06:54 Welcome to live stream.
06:55 Happy to have you here.
06:56 Let's see, let's move on to the next one.
06:58 I think, speaking of our listeners giving us ideas and helping us out here,
07:04 I want to talk about something that I've been hanging onto for a little while since March,
07:09 but I finally decided it's time to talk about it.
07:12 And that is creating queues: out-of-process, sort of asynchronous queue processing.
07:19 So if I've got, say a web app or an API, or even if I'm testing a bunch of the hardware
07:25 and I want to kick off a bunch of jobs, eventually I don't want to, you know, necessarily block on
07:30 all of them.
07:31 I might want to push them down so other things can work on them.
07:34 you know, if I'm going to send a bunch of emails, if you've ever tried to send a thousand
07:38 emails in order, synchronously, it turns out that times out your web request.
07:42 Don't do that.
07:43 So a better idea would be to push them to a queue and have some sort of background process
07:46 go, oh, there are new emails to send.
07:48 Let me jam those on down the line.
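The push-now, process-later pattern described here can be sketched in-process with the standard library's queue.Queue and a worker thread. This is only an analogy: send_email is a hypothetical stand-in for the slow call, and a Redis-backed queue such as qr3 moves the queue itself out of the process so that other processes, or other servers, can consume it.

```python
import queue
import threading


def send_email(address: str) -> str:
    """Hypothetical stand-in for a slow network call."""
    return f"sent to {address}"


def worker(jobs: queue.Queue, results: list) -> None:
    """Pull jobs until the None sentinel arrives."""
    while True:
        address = jobs.get()
        try:
            if address is None:  # sentinel: no more work
                return
            results.append(send_email(address))
        finally:
            jobs.task_done()  # tells jobs.join() this item is finished


jobs: queue.Queue = queue.Queue()
results: list = []
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()

# The "web request" only enqueues and returns; the worker does the slow part.
for address in ["a@example.com", "b@example.com"]:
    jobs.put(address)
jobs.put(None)
jobs.join()  # waiting here only so the example can inspect results
```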
07:49 So Scott Hacker sent over a pointer to this library, a small but cool little one
07:56 called qr3, and qr3 is a queue for Redis.
08:02 And the 3 means Python 3.
08:03 Because there used to be a qr without the 3
08:05 that's not Python 3 compatible.
08:07 So here's a re-imagining of that for Python 3, or just a compatibility port
08:13 that got moved over.
08:14 So it's pretty cool.
08:16 Let's check it out.
08:17 The API, or rather the usage, is quite simple, as you could imagine.
08:22 So all you've got to do is, well, it's built upon redis-py.
08:26 You've got to have Redis installed.
08:27 That could be, you know, wherever; it could even be Redis as a service on some of these cloud platforms,
08:31 run it in Docker, run it locally.
08:33 You have redis-py.
08:34 And then you just go over and you create a queue.
08:37 So you just say queue and you give it a name and then some server connect info, like a location,
08:43 authentication and whatnot.
08:44 And then all you've got to do is you push items to it.
08:47 They could be just really simple things like a bunch of email addresses you're going to send,
08:51 but it could also be really complicated.
08:53 Like for example, it could be, say Pydantic models that store all the data that you need
08:59 to process that request.
09:00 So that's pretty cool.
09:01 The default way of getting data over to it is through cPickle, and cPickle
09:08 is better than pickle, but still has issues and other restrictions.
09:12 Some of the restrictions are, you can't put in certain types of objects.
09:15 Like it wouldn't make sense to serialize a database connection that has an open socket or
09:20 a thread or some weird thing like that.
09:22 Right.
09:22 But most of the sort of message, here's the data you need to process that you would send
09:26 over, all that stuff will work.
09:28 And you can also create your own serializer on a per-queue basis, which is kind
09:34 of cool.
09:35 So if you said, I want to only work with Pydantic models, you could put the sort of from dictionary
09:40 to dictionary transformation with the validation and all that kind of stuff.
09:44 I personally would not use cPickle, because one of the things you can run into is if you upgrade
09:49 your version of Python on one server, but not the other, because you're in the process of going
09:54 from one to the other, and something has a different structure in memory and gets put
09:58 over there.
09:59 The other ones can't read it, or like, there's always these challenges of needing exact binary
10:03 matches.
10:03 I don't know
10:04 that I would do that.
10:04 I'd probably serialize it as JSON or something and deserialize it back.
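To make the pickle-versus-JSON trade-off concrete, here is a small JSON serializer exposing the same dumps/loads pair the pickle module has. Whether qr3 accepts exactly this shape for its per-queue serializer is an assumption; check its README. (In Python 3 there is no separate cPickle module; plain pickle uses the C implementation automatically.)

```python
import json
import pickle


class JSONSerializer:
    """Hypothetical per-queue serializer with a pickle-style dumps/loads interface."""

    @staticmethod
    def dumps(obj) -> str:
        # Text output that any consumer, in any Python version or language, can parse.
        return json.dumps(obj)

    @staticmethod
    def loads(data):
        # json.loads accepts str or bytes, so raw bytes read back from Redis also work.
        return json.loads(data)


job = {"to": "a@example.com", "template": "welcome", "retries": 0}

as_pickle = pickle.dumps(job)        # Python-only binary format
as_json = JSONSerializer.dumps(job)  # plain text
```

The catch is that JSON only round-trips basic types (dicts, lists, strings, numbers, booleans, None), so richer objects, such as the Pydantic models mentioned earlier, need an explicit to-dict and from-dict step on each side.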
10:08 But anyway, it's pretty cool.
10:09 What do you guys think?
10:10 This looks nice.
10:10 I actually haven't used queues in Python before, but it's on my to-do list,
10:15 because, I mean, designing complex systems by breaking them up into different processes
10:20 with queues back and forth is a cool way to do it.
10:23 Yeah.
10:23 I'm kind of inspired by this.
10:25 I kind of want to do more stuff with queues as well.
10:27 David?
10:28 Oh, it seems like a really clean, simple way to use queues.
10:30 I'm with Brian.
10:31 I haven't really used it in a Python context before, but the examples you gave are
10:35 perfect.
10:36 You know, emails take a long time.
10:38 So you don't want to be tying up your main application.
10:41 You need to dump this off into a background task.
10:42 And this looks really, really simple to use.
10:45 So, you know, it seems like it'd be worth a try for sure.
10:49 Yeah, for sure.
10:50 Other things are like, you need to generate a report that takes 30 seconds, you know, kick
10:54 off the generation and then see if it's in the database and just do some sort of like Ajax
10:58 poll until it's there or whatever.
11:00 It has some more features.
11:02 So it has a queue, which is first in, first out, as you can imagine.
11:05 It has what I'd call a capped collection.
11:08 I feel like it should be a capped queue, because it's implemented behind the scenes as a capped
11:13 collection.
11:13 They also say bounded queue is another AKA.
11:16 So the idea is if you're doing like analytics and logging and you're trying to eventually
11:21 process that and save it to the database, but you want to say, you know what, we really don't
11:24 want this queue to get more than a hundred thousand items at a time because we should be writing
11:29 this to the database.
11:29 And if something goes wrong, it can completely wreck the server.
11:32 So you use these capped queues where you're like, I'm going to start throwing away
11:35 old stuff if we don't get to it in time.
11:35 There's a DQ, which to me sounds like getting stuff out of
11:40 a queue, but oh no, it's a double ended queue.
11:42 A double ended queue.
11:43 It should be a, yeah.
11:45 Anyway, the idea is you can basically put stuff onto the front or the back,
11:50 and you can pop stuff off the front and the back.
11:52 So you could, for example, put low-priority items on the back, or if something's really important,
11:59 you could kick it right to the front of the queue.
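The capped and double-ended behaviors map onto Python's in-process collections.deque, which makes for a quick way to see the semantics. This is not qr3's API, just the same ideas without Redis; the maxlen value and the job names are made up.

```python
from collections import deque

# Capped/bounded: once maxlen items are held, appending drops the oldest.
recent_events = deque(maxlen=3)
for event in ["e1", "e2", "e3", "e4", "e5"]:
    recent_events.append(event)
# recent_events now holds only e3, e4, e5

# Double-ended: add or remove at either end.
jobs = deque(["normal-1", "normal-2"])
jobs.append("low-priority")  # back of the line
jobs.appendleft("urgent")    # jump straight to the front
next_job = jobs.popleft()    # "urgent" comes out first
dropped = jobs.pop()         # take from the back instead
```

Using only append and pop on the same deque gives last-in, first-out behavior, which is the stack variant.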
12:01 You can also do a stack.
12:03 And finally, you can also do a priority queue, which is pretty close to what I described,
12:08 but you can't jump ahead of the things that have a similar priority, right?
12:12 Like if there's super urgent and then low, you can put a new super urgent thing at the
12:17 back of the super urgent ones, but it would still appear before all the low ones, things like that.
12:21 So, this is all pretty neat.
12:23 What I really like about this is obviously Python has queues built in, right?
12:27 Like that's just a data type.
12:28 A list itself could basically be a queue.
12:30 You can pop stuff off the front and shazam, you have a queue.
12:32 But this is out of process, right?
12:35 This means if you have to scale out for your worker processes in any sort of API,
12:40 or you want it to be able to be durable across app restarts, things like that.
12:44 And if you think, oh, I'm not going to scale out, I don't have multiple servers:
12:48 well, almost every Python web app and web API runs with multiple worker processes at a minimum.
12:52 So yeah, you're scaling out.
12:54 Anyway, I think this is pretty useful.
12:55 And if you're all about Redis, this is cool.
12:57 Redis seems nice.
12:58 I'm kind of inspired to do something like this with MongoDB, but I'm also busy.
13:01 So probably, probably not right away.
13:03 And John Sheehan out there in the live stream is telling me that he learned a few years
13:07 ago that deque is pronounced "deck."
13:09 So yeah, double ended.
13:11 Yeah.
13:11 All right.
13:12 So deck.
13:13 Thanks.
13:13 And then Teddy on the live stream says, I'm not so familiar with queues, but how would it work if your queue processes execute Python code? Wouldn't it end up being processed sequentially because of the Python GIL?
13:27 Yeah.
13:28 So are you ending up with serial processing because of this?
13:33 I think it depends on just how you create the workers, right?
13:36 So there are two ends that you build.
13:37 One end puts stuff in the queue.
13:39 Then you literally build the end that goes to the queue and says, give me the next item.
13:43 And that's stored in Redis, which obviously can support multiple clients.
13:47 So if you just scaled out the consumers of the queue messages, the things running the jobs, then you would escape the GIL, right?
13:55 Because you would have multiple processes.
13:57 You can have multiple things feeding the queue as well.
14:01 Yes.
14:01 Multiple web requests or something.
14:03 Yeah, absolutely.
14:04 Absolutely.
14:04 All right, David, what you got for us?
14:07 All right.
14:08 Well, are either of you pandas users?
14:12 I'm a pandas admirer and I use it a little bit, but I always feel like when I
14:17 come to pandas, I know there's way more I should be doing with this.
14:19 And this is so cool, but I don't use it as much as I should.
14:22 Well, and I use pandas pretty, pretty heavily in my previous job to do a lot of analysis, especially on the one dimensional data sets.
14:30 And, you know, it always happened.
14:33 When I first started using pandas, I was doing a lot of really bad things like iterrows and that type of thing.
14:37 And the more you kind of learn about it, the better you get at doing set-based operations.
14:41 But even in the last couple of months, you'd think I'd have everything down.
14:46 But the API is huge.
14:47 And I always have these aha moments when I learn about something like transform.
14:52 And, you know, once I realized what you could do with transform, it simplified so many things that I was doing.
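For anyone who hasn't hit the transform moment yet: groupby(...).transform returns a result aligned to the original rows, so a per-group statistic can be used directly in a row-wise calculation instead of looping with iterrows or merging an aggregate back in. The column names and values here are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "line": ["A", "A", "B", "B"],
    "cycle_time": [10.0, 14.0, 30.0, 34.0],
})

# transform("mean") gives each row its own group's mean, same index as df.
df["line_mean"] = df.groupby("line")["cycle_time"].transform("mean")
df["deviation"] = df["cycle_time"] - df["line_mean"]
```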
14:56 And the first item I have is an article called "25 Pandas Functions You Didn't Know Existed."
15:02 And I don't normally like these articles because they almost feel a little bit clickbaity.
15:06 But this one actually had a handful of aha moments for me.
15:09 So I thought I would go ahead and share it.
15:11 So I have them listed in the show notes, kind of the aha moments for me.
15:14 But, ah, between is a really nice one.
15:18 I think you'd consider it a method on the data frame or a series, and it basically allows you to simplify logic instead of trying to say greater than or equal to blank and less than or equal to blank.
15:28 You can just say between values, very similar to the BETWEEN operation that you would do in a SQL query.
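A minimal example of Series.between next to the two-comparison version it replaces; both ends are inclusive by default. The readings are made up.

```python
import pandas as pd

torque = pd.Series([5, 10, 15, 20, 25])

in_spec = torque[torque.between(10, 20)]              # inclusive on both ends by default
same_thing = torque[(torque >= 10) & (torque <= 20)]  # the longhand version
```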
15:34 Ah, Styler, I had no idea it existed.
15:37 Ah, you can actually apply styles to the tables coming out of pandas.
15:42 Ah, I do a lot to try to make my notebooks really, really pretty so that I can convert them to HTML or another format and share them with the business.
15:51 The business doesn't typically like notebooks, but I'm trying, because I can't stand the intermediate step of copying to PowerPoint.
15:57 But this would definitely help.
16:00 You can do gradients.
16:01 There are a bunch of different functions behind that. Options is another one I kind of played with a little bit.
16:09 Ah, but there's one in here that I wanted to try before the show.
16:12 I hadn't had a chance.
16:12 You can change the graphing back end on pandas from Matplotlib to something else.
16:16 So at some point I'm going to try changing it to Plotly, because that's my preferred plotting library for most things.
16:22 convert_dtypes is really nice.
16:23 If you know you have categorical types in a set of information, you can dramatically reduce how much memory is taken.
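A quick way to see the memory effect described here. One hedge: convert_dtypes itself moves columns to pandas' nullable dtypes (for example string or Int64) rather than to category; the large saving for a column of repeated labels comes from astype("category"), which is what this sketch measures. The data is synthetic.

```python
import pandas as pd

# Synthetic inspection results: only three distinct labels, repeated many times.
status = pd.Series(["pass", "fail", "rework"] * 10_000)

as_strings = status.memory_usage(deep=True)                      # one string per row
as_category = status.astype("category").memory_usage(deep=True)  # small integer codes + 3 labels
```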
16:30 So, mask was a nice one.
16:33 It basically allows you to quickly convert, it's somewhere down here,
16:37 certain particular values, or values that meet a criterion, to another value.
16:43 I was doing this oftentimes in multiple stages.
16:47 This would clean up that code significantly.
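A small mask example: it replaces values where the condition is True, with NaN unless you pass a replacement, which folds a multi-step cleanup into one line. The sentinel values here are assumptions for illustration.

```python
import pandas as pd

readings = pd.Series([4.9, -1.0, 5.1, 999.0, 5.0])

# Assume negatives and 999.0 are "bad read" sentinels: blank them out in one step.
cleaned = readings.mask((readings < 0) | (readings == 999.0))

# Or replace matching values with something specific instead of NaN.
clamped = readings.mask(readings < 0, 0.0)
```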
16:49 nsmallest and nlargest also would have been very helpful.
16:53 Essentially, it's similar to a max or a min, but instead of just pulling a single value, you can pull, in this case, five.
17:01 And clip, at_time.
17:04 So like if I want to see the five largest revenue producing customers in my data frame, I could just quick do that.
17:11 Yeah. And there are ways, like with anything else in pandas.
17:14 You could use a couple other methods to get that done, too.
17:17 But it's just so much cleaner to do diamonds.nlargest(5, "price").
17:22 It's just very clean and fast instead of having multiple lines to do a transformation and then a transformation and then another change.
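The top-N pattern from the revenue example, with made-up data; nlargest and nsmallest take the count and the column to rank by.

```python
import pandas as pd

sales = pd.DataFrame({
    "customer": ["a", "b", "c", "d", "e", "f"],
    "revenue": [120, 900, 450, 300, 980, 60],
})

top3 = sales.nlargest(3, "revenue")      # highest revenue first
bottom2 = sales.nsmallest(2, "revenue")  # lowest revenue first
# Longhand equivalent: sales.sort_values("revenue", ascending=False).head(3)
```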
17:28 So I wanted to suggest this article.
17:30 Like I said, I've been doing pandas for a couple of years and I still have these aha moments, and this article...
17:36 Well, some of them maybe weren't quite aha moments for me,
17:39 but they may be aha moments for someone else, because everybody probably knows 20%, and maybe a slightly different 20%, of the pandas API.
17:46 Yeah, this is really neat.
17:48 I love these types of things that I mean, it's super easy to just scan through and decide whether or not it's it's really helpful to you.
17:54 The one for me, the pandas one that had the biggest like, oh, my goodness, was web scraping and like pulling HTML tables and turning those into data frames.
18:04 So, like, obviously, I can go.
18:06 Yeah, you go with requests and Beautiful Soup and do something.
18:10 But then you still end up with just a table of HTML.
18:13 But with pandas, you can say read_html and then just give me table three as a data frame.
18:18 Like, it's ridiculous, right?
18:19 Mm hmm.
18:20 Now, pandas has some really nice I/O tools, too, around CSVs, Parquet, the most common data format types and even some of the lesser common ones.
18:30 It's a really nice library overall.
18:32 But yeah, like I said, there are always some aha moments.
18:35 And it's nice to have an article that highlights several aha moments for me.
18:39 Yeah, super cool.
18:40 So go ahead, Brian.
18:42 The one that jumps right out at me was number one.
18:45 I didn't know that you could just use ExcelWriter with pandas.
18:49 That's pretty cool.
18:50 And I think there's another wrapper around ExcelWriter that kind of simplifies converting a data frame to Excel.
18:58 But I think ExcelWriter lets you do some more intricate things with Excel.
19:02 Yeah.
19:03 That's pretty cool.
19:04 Yeah, that's super cool.
19:05 All right.
19:05 Before we move on really quick from the live stream, I liked when you ask if anyone uses pandas and likes it.
19:12 Dean Langsam just said, yes.
19:14 All caps, beautiful.
19:15 But then he also pointed out this project that he built, which gives you live tips while you work with pandas in notebooks, called dovpanda.
19:23 So I literally am just checking this out now.
19:26 But as you work with it, you can see here it gives you little tips like, oh, by the way, did you know you can concatenate like this?
19:32 If you specify axis=1, you get, you know, such and such. It gives you little tips and tricks as you work with it, so people can check that out.
19:39 Yeah.
19:40 This is great for aha moments.
19:42 Exactly.
19:44 Thanks, Dean.
19:44 Brian, I do love some FastAPI and I love rich and I'm looking forward to what you're going to do by trying to put these together.
19:51 Yeah.
19:52 Well, I've been watching.
19:54 Yeah, we've been watching Rich, of course, and FastAPI a lot.
19:57 Yeah.
19:58 And so this article's by Hayden Kotelman, I think, and it's FastAPI and Rich Tracebacks in Development.
20:07 So the idea is that one of the cool things that Rich has is these awesome tracebacks and logging.
20:15 They're just beautiful.
20:16 And I mean, if you can say a traceback is beautiful, it's because of rich, probably.
20:20 They look pretty great and the logging is pretty good.
20:24 So I'm just going to scroll down to some of these examples at the bottom.
20:28 It's kind of tiny, but the logging is nice and colorized and stuff.
20:33 And then the exceptions: one of the things with the tracebacks and exceptions is there's a highlighted line number.
20:39 It highlights the actual file name and kind of puts in lower, you know, more muted colors, the stuff you don't really need to care about right away.
20:48 And it's just kind of a nice way to do it.
20:51 And it gives you syntax highlighting, like keyword highlighting, in your code.
20:56 Yeah.
20:57 And that is the stack trace of a crash in the traceback.
21:00 And so we've seen some examples of how to use the Rich tracebacks from other programs, but I hadn't seen it actually written up by somebody else.
21:12 And so this is nice, using FastAPI. FastAPI
21:17 is awesome for building web APIs.
21:20 But how do you do this?
21:22 How do you get your application to do this?
21:24 And so I'm not going to scroll through all of this, but the gist of it is there are really only a few steps.
21:31 So this post walks through all of it with all the code, and for the most part, you create a data class with the logger configuration.
21:41 And then you need a function that will either install Rich as a handler or use the production log configuration.
21:47 I like that he puts this switch in place.
21:50 So the idea is, when you're debugging, you're going to use these nice tracebacks.
21:57 But when you're running in production, it's not going to use that.
22:00 It's just going to do the default logging.
22:03 And then you have to call logging.basicConfig with the new settings.
22:07 And then there's a little note that if you're using Uvicorn, you probably want to override the logger for that.
22:13 And that really sets it up.
22:15 And it's got all the code in place so that your FastAPI application can have these lovely logs and tracebacks during development.
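A minimal sketch of the development/production switch described here, using the standard logging module plus Rich's RichHandler (with rich_tracebacks=True) and rich.traceback.install. It is not the article's code: LogConfig, configure_logging, and the APP_ENV environment variable are hypothetical names, and the sketch falls back to plain logging if Rich isn't installed.

```python
from __future__ import annotations

import logging
import os
from dataclasses import dataclass


@dataclass
class LogConfig:
    """Hypothetical settings holder; names and defaults are illustrative."""
    level: str = "INFO"
    dev_format: str = "%(message)s"  # RichHandler already renders time and level
    prod_format: str = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def _rich_handler() -> logging.Handler | None:
    """Return a RichHandler with rich tracebacks, or None if Rich isn't installed."""
    try:
        from rich.logging import RichHandler
        from rich.traceback import install
    except ImportError:
        return None
    install(show_locals=False)  # pretty tracebacks for uncaught exceptions too
    return RichHandler(rich_tracebacks=True)


def configure_logging(dev: bool | None = None, config: LogConfig | None = None) -> None:
    config = config or LogConfig()
    if dev is None:
        # Hypothetical switch: anything other than APP_ENV=production counts as development.
        dev = os.getenv("APP_ENV", "development") != "production"
    handler = _rich_handler() if dev else None
    fmt = config.dev_format
    if handler is None:  # production, or Rich unavailable: plain stderr logging
        handler = logging.StreamHandler()
        fmt = config.prod_format
    logging.basicConfig(level=config.level, format=fmt, handlers=[handler], force=True)
    # Send Uvicorn's records through the root handler instead of its own handlers.
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        uv_logger = logging.getLogger(name)
        uv_logger.handlers.clear()
        uv_logger.propagate = True
```

Call configure_logging() once at startup, before the FastAPI app starts serving. Uvicorn can apply its own logging configuration when it starts, which may undo this; that is presumably why the article overrides Uvicorn's logger, so check Uvicorn's log_config option for your setup.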
22:22 Yeah, it's super neat.
22:24 David, are you a fan of either of these frameworks?
22:26 I haven't had a chance to use rich too much.
22:28 I have been watching Textual pretty closely on Twitter because it's just phenomenal.
22:32 What he's been able to do.
22:33 Like, how do you have a docking, scrolling side thing in a terminal window?
22:38 What's going on here?
22:38 I mean, I do, I love FastAPI.
22:41 I built my wife's website using Flask.
22:44 And I liked how FastAPI was similar to Flask in a lot of ways.
22:48 But, you know, some of the syntax was a little bit cleaner.
22:50 Although the newer version of Flask kind of borrows some of the same syntax.
22:54 And it's just got a lot of really good niceties built in. The API documentation was really, I think that's kind of clutch when you're learning a new framework too, because you're not having to do curl commands or anything like that.
23:06 You can just bring up a webpage and look at it, you know, visually, which is pretty nice.
23:10 So no, I really like FastAPI.
23:12 I just, you know, other than, you know, kind of building some small toy things, haven't had a really compelling reason to use it yet.
23:18 So yeah.
23:19 Yeah.
23:19 Very cool.
23:19 Toys are compelling reasons.
23:21 I think definitely, definitely.
23:24 Maybe some Arduino thing could run a FastAPI server or who knows.
23:27 All right.
23:28 So let me talk about some good news, good news, good news.
23:31 We've covered a couple of things about some visionary sponsors coming on to support Python and the PSF
23:39 and so on, which is fantastic, right?
23:41 I've certainly whinged a lot about people running, you know, multi-billion dollar revenue companies and doing nothing really to give back other than maybe a PR or something.
23:51 But we've got Microsoft, we've got Bloomberg, we've got Google as visionary sponsors, right?
23:55 And one of the things that that made possible is the CPython developer in residence.
24:02 I don't know if it's directly related to one of those or if it's just sort of like that sort of brought it all together.
24:06 But recently the PSF said they're going to have a developer in residence position, and well-known community member, friend of the show, Łukasz Langa, has applied and got hired.
24:19 He's now the developer in residence.
24:21 This is a little bit old news; it's from last month, but I wanted to make sure we gave it a quick shout out, because I think it's going to be pretty interesting to know that there is a developer-side person inside the PSF making sure things are going.
24:34 So the PSF has seven, eight, nine, I don't know, something like this,
24:37 I haven't got recent updates, but including this position,
24:41 full-time employees, right?
24:43 So there's a bunch of people who work there, but to my knowledge, this is the first like developer person rather than marketing, legal, whatever, right?
24:52 All that sort of business director, administrative side.
24:55 Apologies to everybody that works at the PSF.
24:58 That's like, don't forget me.
25:00 Yeah, no, no, no.
25:01 Those are super important, but it's interesting that there's not been a Python developer type of role within that group is all I'm saying.
25:10 So they put that out.
25:12 Łukasz Langa is now part of it.
25:14 And there are some interesting takeaways here.
25:16 So basically, let me just give a bit of a quote here for how Łukasz decided to sort of position this and how he sees it.
25:26 He said, I don't really want this to be like, hey, I'm the appointed CEO of Python.
25:31 So listen to what I have to say, right?
25:34 He said no, he's incredibly hopeful for Python because of this, and wanted to apply for it, and so on.
25:42 He says, I think it's a role with transformational potential for the project.
25:48 In short, I believe the mission of the developer in residence, the DIR, is to accelerate the developer experience of everybody else.
25:55 And that includes not just the core team, but most importantly, the drive-by contributors submitting pull requests and creating issues on the tracker.
26:03 So he's hoping that with this role, he can do things like make sure that there's a steady review of the stream of PRs and issues so they don't get stale and there's not a backlog.
26:14 Triage the issues.
26:15 Be present in the official communication channels to unblock people if they get stuck trying to contribute.
26:20 Keeping CI and test suites in a usable state and making them run quick.
26:25 And keeping tabs on where the work is most needed and the projects that are most important.
26:29 So he's sort of the, it sounds to me almost like the technical person in the room to help the community keep moving.
26:36 And just making sure, oh, everyone's having a problem.
26:39 Many people are having a problem trying to do a PR because they can't get CPython to build.
26:43 Let's make that incredibly simple for them and things like that.
26:46 Yeah, I like his attitude of where he's going with this.
26:50 Yep, yep.
26:51 If I didn't point it out, Łukasz is also the creator of Black, the Black formatter, which I know we've talked about in 100,000 variations here.
26:59 So that's great.
27:00 David, how do you feel about this?
27:02 I think it's great.
27:03 Any full-time person they can have working for the PSF or on Python directly is going to help increase stability.
27:09 And I like his approach too, where he's going to try to increase throughput by maximizing everybody else's efficiency.
27:16 It'd be easy to say, oh, I'm going to work on these features or on this, but he's most concerned about making development for Python as ergonomic as possible, which I think ultimately will create more throughput and, you know, a better Python in the long run.
27:30 Yeah.
27:30 And absolutely props to the PSF because it's easy to hire somebody and say, here's what I want you to produce for us.
27:37 It's harder to hire somebody and say, I want you to be an enabler of other people because it's hard to measure that, right?
27:43 Mm-hmm.
27:44 Yep.
27:44 One of the interesting things that I think that he's doing is, I'm not sure if he's going to keep this up, but it looks like he has so far, is he puts out weekly report posts of what he's been doing.
27:54 So this, I can't imagine having that much public scrutiny over what my work week looks like.
28:01 But I mean, it's really cool.
28:02 So much time working on CI, come on.
28:06 So it's pretty, pretty, pretty impressive.
28:09 And it's cool that he's, he's doing that.
28:12 That's a, the entire Python world is watching.
28:15 No pressure or anything.
28:16 Yeah.
28:18 He did say he was a little nervous about this because this is the first year of this position.
28:23 And so the success or failure that he has will influence like whether it continues and, you know, what happens sort of in the future.
28:31 So super cool.
28:32 Let me get a little feedback from the audience here.
28:34 So Sam Morley.
28:36 Kate says, good for Łukasz.
28:38 He's great.
28:38 I watched a bunch of videos he did on YouTube about making music with asyncio.
28:42 I haven't seen those.
28:43 I have to check them out.
28:44 And Dean out in the live stream says, CEO of Python reminds me of a known joke in my country where this famous newscaster was shouting, get me the person in charge of the internet.
28:54 Get me the person in charge of the internet.
28:58 That's great.
28:59 That's great.
28:59 Dean, you have to let us know what country that is.
29:02 That's awesome.
29:02 All right.
29:03 Brian, you want the next one?
29:04 What's that?
29:05 You're next.
29:06 No, you already did this, right?
29:07 Yeah.
29:07 Yeah.
29:08 David's next.
29:08 I got to keep track of what's happening here.
29:10 David, you're next.
29:11 Okay.
29:11 Yep.
29:12 So my next item is a library or framework.
29:16 I'm not sure which one it falls under, called Dagster.
29:19 It is a data orchestrator for machine learning, analytics, and ETL.
29:24 It's one of the first attempts I tried for any kind of data pipeline.
29:29 And it's based in Python.
29:32 So you programmatically build up your pipeline using Python, with different decorators depending on what you're building: a solid or some other piece of the pipeline, or configuration.
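Dagster's real decorators aren't reproduced here. As a rough sketch of the general idea only (plain Python, not Dagster's API), the hypothetical `step` decorator below registers functions, and the hypothetical `run_pipeline` feeds each step's output into the next step's input. Dagster itself, which used decorators such as `@solid` and `@pipeline` at the time of this episode, also handles typing, configuration, logging, and running on other engines.

```python
# A toy, hypothetical pipeline builder: NOT Dagster's API, only the idea of
# registering steps with a decorator and wiring each output into the next input.
from typing import Any, Callable, Dict, List

STEPS: List[Callable[..., Any]] = []


def step(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a pipeline step (hypothetical decorator)."""
    STEPS.append(fn)
    return fn


@step
def extract() -> List[Dict[str, Any]]:
    # Stand-in for reading rows from a file or database.
    return [{"part": "A", "defects": 2}, {"part": "B", "defects": 0}]


@step
def transform(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Keep only rows with at least one defect.
    return [r for r in rows if r["defects"] > 0]


@step
def load(rows: List[Dict[str, Any]]) -> int:
    # Stand-in for writing to a warehouse; report how many rows were "loaded".
    return len(rows)


def run_pipeline() -> Any:
    """Run the registered steps in order, passing each result to the next step."""
    result = STEPS[0]()
    for fn in STEPS[1:]:
        result = fn(result)
    return result


if __name__ == "__main__":
    print(run_pipeline())  # prints: 1
```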
29:46 And it took a little bit to kind of wrap my head around it.
29:49 I think it had more to do with the just kind of understanding how pipelines are typically constructed in industry.
29:55 But once I got my head wrapped around it, it was really simple to use.
29:59 I felt like I could produce things pretty quickly.
30:00 One really nice thing is that they allow you to essentially work on your pipeline locally, then deploy to production on something like Kubernetes, Airflow, or Dask, whatever underlying engine you want to run your pipeline on.
30:16 And, you know, there's very little transition there.
30:20 You know, you're not developing something local and having to completely change it for, you know, like a cluster or, you know, larger scale.
30:29 And another really nice feature it has is a UI called Dagit.
30:33 So you could do everything via the command line if you want to, but it does come with a really nice UI that allows you to see an overview of your pipeline.
30:43 It allows you to test it using the playground.
30:46 You can update your configuration in the playground.
30:49 You can look at previous runs to see if they passed or failed.
30:53 It gives detailed logging and error messaging.
30:55 So it's, you know, this by itself is pretty nice on top of an already very nice tool.
31:02 So I can give a quick demo too.
31:06 I think this is the first part of their tutorial, where you have multiple solids.
31:12 So these represent different, different pieces of processing.
31:16 And then, like I said, you can use the playground.
31:18 It'll check all of your configuration, everything to make sure it's correct before it lets you run anything.
31:23 So if you have something misconfigured, it's not going to blow up halfway through a, you know, a 30 minute job.
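The up-front configuration check David describes can be sketched like this. It is a hypothetical stand-in, not Dagster's config system: `CONFIG_SCHEMA` and `validate_config` are made-up names, and the only point is that every problem gets reported before any step starts running.

```python
# Hypothetical sketch: validate the whole run config before anything executes,
# so a typo fails immediately instead of halfway through a long job.
from typing import Any, Dict

# Each (made-up) step declares the config keys it needs and their expected types.
CONFIG_SCHEMA: Dict[str, Dict[str, type]] = {
    "extract": {"path": str},
    "transform": {"min_defects": int},
}


def validate_config(config: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError listing every problem; return None if the config is valid."""
    problems = []
    for step_name, fields in CONFIG_SCHEMA.items():
        step_cfg = config.get(step_name, {})
        for key, expected in fields.items():
            if key not in step_cfg:
                problems.append(f"{step_name}.{key} is missing")
            elif not isinstance(step_cfg[key], expected):
                problems.append(
                    f"{step_name}.{key} should be {expected.__name__}, "
                    f"got {type(step_cfg[key]).__name__}"
                )
    if problems:
        raise ValueError("; ".join(problems))
```

Calling `validate_config({"extract": {}, "transform": {"min_defects": "1"}})` raises a `ValueError` that names both the missing `extract.path` and the wrong type for `transform.min_defects`, before any step has done any work.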
31:28 And then when you, oh no.
31:29 Oh, that's nice.
31:29 Like that?
31:30 Oh no.
31:31 That's unfortunate.
31:32 Yeah.
31:32 No.
31:33 So I'll probably, I'll probably forgo the, the real time demonstration.
31:38 I think my terminal probably died is what that was.
31:40 But yeah, it'll actually show a run in sequence and show the different pieces that they're completing and feeding into the other piece too.
31:48 So it's not so much for this because it's a very small, quick pipeline.
31:54 But if you have like longer SQL queries or something like that, it'll actually kind of show in real time, you know, how it's processing.
31:59 So you can kind of get a visual intuition to what's going on on top of everything else too.
32:04 So yeah, a couple of the resources around this too, if you want someone that explains it a little bit better than I do.
32:10 The Data Engineering Podcast had an episode, and Software Engineering Daily also did an episode about Dagster.
32:16 So, you know, that's kind of where I first learned about it.
32:19 And there's a lot of really good information in those podcasts.
32:21 Yeah.
32:21 These data pipeline frameworks are super interesting.
32:26 I've certainly realized just how valuable they can be.
32:28 Dean asks, David, how is this compared to Airflow?
32:32 Do you have any idea?
32:32 Have you tried?
32:33 Have you looked at either?
32:34 Yeah.
32:34 This was, I haven't used Airflow.
32:36 This is the first, my first stab at any kind of data pipeline.
32:39 And in my current job, we're not using Airflow or Dagster.
32:42 We're using one of the cloud-based tools.
32:44 So it's, I think Airflow is more draggy, droppy, more visual, but I could be wrong about that.
32:50 One thing I really liked about Dagster is, at least compared to what I'm currently using,
32:56 is that you could programmatically create these interfaces.
32:59 And technically the tool I'm using now has an API that you can throw JSON against to create your different resources and everything.
33:06 But it's nice having Python code because that works a little bit better with my brain than a lot of the draggy, droppy stuff.
33:12 Yeah, yeah.
33:13 I did have the Airflow folks on Talk Python, not this show, a little while ago.
33:21 It's not out yet, but last week maybe.
33:23 And they pointed out that it's mostly, it's like pretty much all Python here as well.
33:27 So you program it in Python over on Airflow.
33:30 And then you have similar visual tools to actually see what's happening, but you can't interact with it through those things.
33:39 You can just like kind of watch it and debug it and stuff from my understanding.
33:42 So I would put them in a pretty similar category.
33:44 I would say one thing that's pretty interesting is, well, that's not what I meant to pull up.
33:47 Actually, the Airflow GitHub repo is what I wanted to point out.
33:51 I was really surprised to learn that Airflow has 22,000 stars on GitHub, which kind of blew my mind.
33:56 I thought of it as like a little framework that people might use.
33:59 Apparently it's popular.
34:00 I'm not really sure about Dagster.
34:02 I guess I could look as well.
34:03 I think it's relatively new.
34:04 So I'd be surprised if it were quite as popular as Airflow.
34:07 But one nice thing Dagster can do: if you have Airflow pipelines that you're running,
34:16 you can use that server to run Dagster too.
34:18 It can basically compile your pipeline into something that's compatible with Airflow if you need to do that.
34:23 So I think there are a couple of different ways you can translate between the two.
34:28 So it seems like a pretty interesting tool.
34:31 And like I said, I had developed a small pipeline in my previous job.
34:34 It was kind of my first stab at pipelines, to eliminate an Excel sheet.
34:39 I was doing a bunch of horrible, awful SQL queries.
34:41 I could just imagine that people are trying to do this with Excel and it was probably wrong.
34:45 Not necessarily incorrect, but it was wrong to do it.
34:48 Well, it was interesting.
34:51 Excel is just very interesting to reverse engineer.
34:54 It's a lot of go-to statements.
34:56 It's ubiquitous, but it's definitely, as far as programming production systems, not a good tool.
35:03 Yeah.
35:04 Yeah.
35:04 Very cool.
35:04 All right.
35:05 So I got some more real-time updates here.
35:07 Teddy says, I know one of the big differences with Airflow is that in Dagster you can use the output of a task as the input of the next task.
35:13 From what I understand, Dagster is kind of a second generation data orchestration.
35:16 Unsure which generation Airflow would be, but here we go.
35:21 And then Airflow mostly assumes you store and load data in each task, even though Airflow has something called XCom, which allows you to pass the output as the input of the next.
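To make Teddy's distinction concrete, here is a hypothetical sketch of the two hand-off styles in plain Python. Neither function uses Airflow or Dagster: `run_direct` passes one task's output straight into the next, while `run_store_and_load` writes it to storage and reads it back, the pattern Teddy says Airflow mostly assumes. (XCom, Airflow's built-in mechanism for passing small values between tasks, is not modeled here.)

```python
# Hypothetical sketch of two ways to hand data between tasks.
# Not Airflow's or Dagster's API; the task functions are made up.
import json
import tempfile
from pathlib import Path
from typing import List


def produce() -> List[int]:
    return [3, 1, 2]


def consume(values: List[int]) -> int:
    return sum(values)


# Style 1: pass the output straight into the next task's input.
def run_direct() -> int:
    return consume(produce())


# Style 2: each task writes its result to storage and the next task reads it
# back, as you would when tasks may run in separate processes or machines.
def run_store_and_load(workdir: Path) -> int:
    handoff = workdir / "produce_output.json"
    handoff.write_text(json.dumps(produce()))  # task 1: store
    values = json.loads(handoff.read_text())   # task 2: load
    return consume(values)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_direct(), run_store_and_load(Path(tmp)))  # prints: 6 6
```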
35:31 Okay.
35:31 Interesting.
35:32 Yeah.
35:32 Thanks for all that background info there.
35:34 I haven't used either, but I definitely, definitely think they're both neat.
35:37 And I feel there's a lot of places that are just like, well, how else are we going to do it?
35:41 Of course, we're going to use that spreadsheet.
35:42 Right.
35:42 And if they had tools like this, it would be very empowering.
35:45 One of the things I find very interesting about these frameworks is usually what you end up building is like the little piece, like load the CSV into the database or run the report that gets me the revenue for the day.
35:56 And what you end up building are very, very small pieces and you don't have to worry about the reusability, the reproducibility, the durability.
36:04 You just go like, I'm going to build an incredibly small bit of Python and we'll just click it in as part of this workflow, which really seems to empower people almost like the microservices story, but for data processing without all the hard deployment side of things.
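As an illustration of the kind of small piece Michael means, here is a minimal sketch of a "load the CSV into the database" task plus a "revenue for the day" report, using only the standard library's `csv` and `sqlite3` modules. The `sales` table, its columns, and both function names are made up for this example; an orchestrator would be the part that schedules and chains tasks like these.

```python
# Two deliberately tiny, hypothetical pipeline tasks built on the standard library.
import csv
import io
import sqlite3


def load_csv_into_db(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert rows from CSV text (with a header) into a `sales` table; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (day TEXT, revenue REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["day"], float(r["revenue"])) for r in reader]
    conn.executemany("INSERT INTO sales (day, revenue) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def daily_revenue(conn: sqlite3.Connection, day: str) -> float:
    """The companion report task: total revenue for one day (0.0 if none)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0.0) FROM sales WHERE day = ?", (day,)
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    n = load_csv_into_db(conn, "day,revenue\n2021-08-11,10.5\n2021-08-11,4.5\n2021-08-12,3\n")
    print(n, daily_revenue(conn, "2021-08-11"))  # prints: 3 15.0
```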
36:18 I hope that, if they don't already have it, they put a tool connected with Dagster called Dagnabbit, because it needs to be there.
36:26 I think maybe some sort of capture tool or something.
36:30 Dagnabbit would be good.
36:31 Yeah.
36:32 I love the UI bit of it as well.
36:34 All right.
36:34 Quick bit of follow up.
36:36 I guess, Brian, you want to start and you got any extras today?
36:38 I've got just a vanity extra.
36:42 So one of the things we noticed: Will mentioned Textual.
36:50 We talked about Textual briefly.
36:52 The stars on Textual are just going through the roof.
36:57 I love the graph.
36:58 Is this, like, the xkcd style from Matplotlib or something?
37:02 What is this?
37:03 It's a, it's, I have no idea what it is.
37:07 That's great.
37:07 Anyway, show us the other pictures.
37:09 This is the, yeah, the stars are insane.
37:10 It's like a vertical line on a graph.
37:12 One of my, one of my own projects has a similar trajectory.
37:16 So I wanted to just highlight that.
37:18 It's looking up too. Of course, I only have 16 stars.
37:23 Will has like 3000.
37:24 A little different, but still look, it's kind of the same.
37:29 Do you think?
37:29 Yeah.
37:30 That's awesome.
37:31 It's 15 stars.
37:33 Most of my repos.
37:34 Hey, you just got to extrapolate out a little bit.
37:37 No, that's, that's really cool.
37:38 Awesome.
37:39 David, do you have any extra stuff you want to throw out?
37:40 Sorry, Brian.
37:41 I had one, one extra.
37:43 I didn't load it on my screen over here.
37:45 Let me see if I can pop it over real quick.
37:48 But I, and this isn't Python, but I know SQL and Python.
37:53 Are you going to go back to some nostalgic time on the internet where you open up a DOS prompt
37:58 and type win to start Windows?
38:00 What is this?
38:00 This is Modern SQL.
38:02 It's a really fantastic slideshow that goes through a lot of updates.
38:07 So if you're still doing SQL the old fashioned way, it shows you how you can replace that with,
38:12 you know, better cleaner, more concise versions.
38:16 And there are so many things in here where I was doing a lot of just horrible, hacky tricks
38:21 to get them to work, that you could take care of in one line of SQL.
38:24 And that, you know, even with some of the newer things I've learned, like there's just so many,
38:27 so many great, great, you know, I don't know if you call them tools or methods or what,
38:32 but, you know, Python and SQL tend to work together a lot, especially in the data space.
38:37 So if you're kind of like me where you, you have some, some, I guess, self-taught SQL experience,
38:42 something like this can be very helpful, to kind of learn some of the, I guess,
38:46 better practices for, for different things that you might want to try to do with SQL.
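The Modern SQL slides aren't reproduced here, but as one example of the kind of feature David means, a common table expression (`WITH`) plus a window function (`ROW_NUMBER() OVER (...)`) picks the latest reading per machine in a single query, where older code often reached for a correlated subquery or a self-join. The `readings` table and its data are made up, and window functions assume SQLite 3.25 or newer, which most current Python builds include.

```python
# Latest reading per machine with a CTE and a window function, run through
# Python's built-in sqlite3 module. Table and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE readings (machine TEXT, taken_at TEXT, temp REAL);
    INSERT INTO readings VALUES
        ('press-1', '2021-08-11 08:00', 70.0),
        ('press-1', '2021-08-11 09:00', 72.5),
        ('lathe-2', '2021-08-11 08:30', 65.0),
        ('lathe-2', '2021-08-11 07:30', 61.0);
    """
)

LATEST_PER_MACHINE = """
WITH ranked AS (
    SELECT machine, taken_at, temp,
           -- number each machine's rows from newest (1) to oldest
           ROW_NUMBER() OVER (PARTITION BY machine ORDER BY taken_at DESC) AS rn
    FROM readings
)
SELECT machine, taken_at, temp
FROM ranked
WHERE rn = 1
ORDER BY machine
"""

latest = conn.execute(LATEST_PER_MACHINE).fetchall()
print(latest)
# prints: [('lathe-2', '2021-08-11 08:30', 65.0), ('press-1', '2021-08-11 09:00', 72.5)]
```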
38:49 No, this is great because I, I learned SQL like in the nineties.
38:53 So it's changed a lot since then.
38:55 And I was just thinking the same thing, Brian. It's been at least 10 years since I've tried to
39:00 refresh my SQL skills,
39:02 so there's probably a lot of stuff where it's, oh, you shouldn't do this.
39:06 Michael, why do you do this?
39:07 If you use this other keyword, it's more efficient, safer, faster.
39:10 Come on.
39:10 Yeah.
39:11 That's it.
39:12 I'm jealous of the people learning SQL now.
39:15 Yeah.
39:16 How about you, Michael?
39:17 Got anything extras?
39:18 I got some follow-up, some follow-up from last time.
39:21 This comes to us from John Hagan.
39:25 And I think I'm probably the one who said this.
39:28 I said, oh, there's a really cool typing PEP.
39:29 What I liked was being able to use lowercase dict and lowercase list as type hints rather than
39:36 from typing import capital-L List or capital-D Dict, right?
39:40 I said, oh, that's coming in 3.10.
39:41 Fantastic.
39:42 He's like, you know, that's already in 3.9.
39:44 So it's kind of already out.
39:45 Oh, right.
39:46 Okay.
39:47 But he did point out some things that are coming that are neat.
39:49 So for example, previously, if I wanted something potentially optional, it could
39:55 be None or it could be a list,
39:56 and the list, if it is a list, has strings, you had to say Optional bracket List bracket
40:01 str.
40:01 And those are all capital because they have this parallel type implementation over in typing,
40:06 right?
40:06 In Python 3.9, I can now say Optional of lowercase list of str.
40:12 And you might think who cares if it's lowercase or uppercase l?
40:15 Well, the difference is you don't have to do an import and explain to people who don't
40:19 know that code.
40:19 Like, oh, you've got to go import these other type things to say the type.
40:22 Yes, I know list is right there, but you can't use list.
40:24 You got to do something else, right?
40:26 So that's the feature I was excited about, which I said was in 3.10, but it's actually in 3.9.
40:30 So hooray.
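A small sketch of the PEP 585 change Michael describes, assuming Python 3.9 or newer: the built-in `list` and `dict` can be subscripted directly in annotations, so only `Optional` still comes from `typing`. The function names are made up for illustration.

```python
# PEP 585 (Python 3.9+): built-in collections work as generic types.
from typing import List, Optional


# Before 3.9: capitalized aliases had to be imported from typing.
def longest_old(words: Optional[List[str]]) -> Optional[str]:
    return max(words, key=len) if words else None


# 3.9+: lowercase built-ins are used directly; only Optional is imported.
def longest_new(words: Optional[list[str]]) -> Optional[str]:
    return max(words, key=len) if words else None


def count_words(words: list[str]) -> dict[str, int]:
    # Tally how many times each word appears.
    counts: dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts
```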
40:31 But he also pointed out that the union operators were simplified.
40:36 It used to be you would have a similar syntax for union as optional.
40:39 You would say Union bracket one thing, comma, the other thing.
40:43 But now you can just say type one, pipe (vertical bar), type two.
40:48 And this actually allows us to model optional without importing optional.
40:51 So instead of Optional of list of str, we can just have list of str pipe None.
40:57 Yeah, this is cool.
40:58 And I'm glad somebody pointed it out, because the 3.10 announcements don't say anything about
41:03 optional.
41:04 But in effect, they do.
41:06 You don't have to use this anymore.
41:08 But are you going to start using this?
41:10 The pipe thing?
41:12 Well, yeah.
41:13 And the optional thing.
41:14 Because I started to.
41:15 And then I realized that if I start using that, then my code is 3.10 only.
41:19 Yes, exactly.
41:20 Which depends on the scenarios, right?
41:24 So for, say, Talk Python training, the code all behind that, I control the server.
41:28 Yeah, nobody's looking at it.
41:30 It's easy for me to make it the brand new thing.
41:32 If I were to say generate, if I were going to build an example app for a course, then I
41:38 would be hesitant to use this right away.
41:39 I might wait a year or two.
41:41 Because I don't want to have to have people have a bad experience.
41:43 Like, well, I have 3.9.
41:44 That's pretty new.
41:45 That should work.
41:45 Like, nope, that doesn't work because I didn't want to say the word optional, right?
41:48 Yeah.
41:49 And if it was an open source project, I guess it would depend on if I wanted to support older
41:55 versions.
41:55 Probably even longer there.
41:57 Wait.
41:57 I don't know.
41:58 What do you think?
41:58 Yeah, I was thinking a library specifically, you'd probably want to almost stick with the
42:04 3.5 to 3, at least for a while to kind of flush out people that are using some
42:09 the older versions of Python.
42:10 Yeah, I think 3.9, I'm using 3.9 on everything now, but I think for a lot of people, that's
42:17 still pretty aggressive to have a 3.9 or higher requirement for a library.
42:21 Yeah, I agree.
42:22 A couple of bits of real-time feedback out there.
42:25 Sam and Dean both say there are Dunder future imports that you can do now that will enable
42:31 some of this stuff already.
42:32 So, like, from Dunder future import pipe.
42:35 I don't know if that's true.
42:38 Or if it's a joke.
42:39 Well, I do know that the Dunder future stuff does support the newer type information.
42:45 I don't know about for pipe.
42:47 Okay.
42:47 Yeah.
42:48 Yeah.
42:48 Okay.
42:49 We can do some after coding on this.
42:53 Coding after the recording.
42:54 And we'll know.
42:55 Oh, Dean says he's kidding.
42:58 Yeah.
42:58 So, but you really can.
43:00 Thank you.
43:01 You really can get some of this other type information with the dunder future import.
43:06 Okay.
43:06 Are you ready for a joke?
43:09 Yeah.
43:10 All right, Brian.
43:11 So, you're going to have to help me along here.
43:13 Okay.
43:14 So, there's two developers staring very worried at a screen.
43:18 They have one section, then a big, long, quiet section, and then some more.
43:24 So, you be the very first person, and I'll be the second person here.
43:27 Okay.
43:27 Okay.
43:28 I hope it works.
43:29 Do not hope.
43:30 Pray.
43:31 Pray.
43:31 Pray it works.
43:34 Have you ever been there and just in this situation where you're just like, oh, you must.
43:38 It must work.
43:39 If this doesn't work, we're done.
43:40 Yeah.
43:41 Yeah.
43:42 And that's so much on the software side of things.
43:44 But when I was a manufacturing engineer, there was so many times we'd be troubleshooting a machine on a Saturday for eight hours straight.
43:50 And you'd think you made that.
43:51 And everybody's just holding their breath, crossing their fingers.
43:54 It worked.
43:55 It worked.
43:55 It worked.
43:56 It's like you want to go home someday.
43:57 Yeah.
43:58 I remember how.
43:59 Go ahead, Brian.
44:00 No, I definitely feel this when I'm working on C++ code because you have to wait for it to compile and then load it and then test it and stuff like that.
44:11 But even with Python stuff, I still feel this when I'm working on CI tools because the continuous integration, you're not sure if you got it right, the syntax right, the YAML right or whatever until you push it and see what happens.
44:23 Yeah.
44:24 Yeah.
44:24 CI is a good point.
44:25 You have so little visibility in there.
44:27 And if it's not working, there's one bit of real time follow up on mine here.
44:31 It's like if you come over here and you look at the PEP 585, it does say the implementation of some of these new features under typing.
44:40 This is the one that came out in 3.9.
44:43 It says you can say from __future__ import annotations and then start using lowercase list and things like that.
44:48 Lowercase D.
44:49 Who knows?
44:50 I know Dean said he was joking, but maybe you really can get the pipe to come out that way.
44:54 But at least you can get these sorts of 3.9-level changes all the way back to 3.7, it looks like.
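A sketch of the `__future__` route, assuming Python 3.7 or newer: with `from __future__ import annotations` (PEP 563), annotations are stored as strings instead of being evaluated, so even the `|` syntax can appear in annotations on older interpreters. Tools that evaluate annotations at runtime, such as `typing.get_type_hints`, will still fail there. The last few lines check what Sam found: `from __future__ import pipe` is not a real feature and is rejected at compile time.

```python
from __future__ import annotations  # must be the first statement (PEP 563, 3.7+)


def first_word(words: list[str] | None) -> str | None:
    """Return the first word, or None for None or an empty list."""
    return words[0] if words else None


# The annotation is kept as a plain string and never evaluated at definition
# time, which is why this module also imports fine on 3.7 through 3.9.
print(first_word.__annotations__["words"])  # prints: list[str] | None

# There is no `pipe` feature in __future__, so the compiler rejects it.
try:
    compile("from __future__ import pipe", "<demo>", "exec")
    pipe_from_future_works = True
except SyntaxError:
    pipe_from_future_works = False
```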
45:02 Okay.
45:02 All right.
45:03 Cool, cool.
45:03 Well, that was a lot of fun.
45:05 Yeah, it was.
45:06 I had another one, but I'm going to save it.
45:07 So.
45:08 Good.
45:08 All right.
45:09 Well, I'm looking forward to hear about it next week.
45:10 David, thank you for joining us.
45:12 Thank you for having me.
45:13 Yeah.
45:14 Yeah.
45:14 And thanks for all the tips and stuff you've had throughout the years.
45:17 And yeah, it's really good to have you here.
45:18 And congratulations on your first dev job.
45:21 That's fantastic.
45:22 That is fantastic.
45:23 And thanks for, thanks Dean for correcting us in real time.
45:29 That's awesome.
45:30 It's good.
45:31 Yeah, absolutely.
45:32 Yeah.
45:33 Thank you everyone.
45:33 And oh, Sam does sadly show us that import pipe from the future doesn't work, but yeah.
45:39 Thanks everyone.
45:40 See y'all later.
45:41 Bye.
45:41 Thank you.
45:42 Thanks for listening to Python Bytes.
45:44 Follow the show on Twitter via at Python Bytes.
45:47 That's Python Bytes as in B-Y-T-E-S.
45:50 Get the full show notes over at Pythonbytes.fm.
45:53 If you have a news item we should cover, just visit Pythonbytes.fm and click submit in the
45:57 nav bar.
45:58 We're always on the lookout for sharing something cool.
46:00 If you want to join us for the live recording, just visit the website and click live stream
46:04 to get notified of when our next episode goes live.
46:07 That's usually happening at noon Pacific on Wednesdays over at YouTube.
46:12 On behalf of myself and Brian Okken, this is Michael Kennedy.
46:15 Thank you for listening and sharing this podcast with your friends and colleagues.