Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #246: Love your crashes, use Rich to beautify tracebacks

Recorded on Wednesday, Aug 11, 2021.

00:00 Hey there, thanks for listening.

00:01 Before we jump into this episode, I just want to remind you that this episode is brought to you by us over at Talk Python Training and Brian through his pytest book.

00:10 So if you want to get hands-on and learn something with Python, be sure to consider our courses over at Talk Python Training.

00:17 Visit them via pythonbytes.fm/courses.

00:20 And if you're looking to do testing and get better with pytest, check out Brian's book at pythonbytes.fm/pytest.

00:28 Enjoy the episode.

00:29 Welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:33 This is episode 246, recorded August 11th, 2021.

00:37 I'm Michael Kennedy.

00:38 - And I'm Brian Okken.

00:39 - And I'm David Smith.

00:40 - Hey, David Smith, welcome.

00:42 So good to have you here.

00:43 - It's good to be here.

00:44 - Yeah, you've been a suggester of topics, I believe.

00:48 You've sent in some ideas and thoughts for us, and well, we're gonna get a good dose of that today, for sure.

00:54 - But honestly, if I'd known that you were gonna open this up, I probably would have hoarded some of those, 'cause it was a little bit of a scramble to be like, oh yeah, I already gave them that tip.

01:02 So yeah, had to dig a little bit.

01:03 - Yeah, you've already shared all your favorites.

01:05 Well, your loss is our gain, because you've made it easier for us in the past.

01:09 So thanks for sharing those things.

01:10 And yeah, thanks for being here.

01:13 It's gonna be great to have you.

01:14 - Definitely.

01:15 - Yeah, wanna give the quick elevator pitch on you?

01:17 What do people know about you?

01:19 - Well, I'm a recent tech convert, I'll say.

01:23 Over the last 10 years, I've been working in the manufacturing space, either in quality engineering or manufacturing engineering.

01:28 And over the last couple years, been using Python a lot more heavily.

01:33 Used to do a lot of VBA and Excel, which was painful.

01:36 And I got a suggestion from one of our equipment suppliers to say, "Hey, use Python, it's really, really nice." I kind of resisted doing it 'cause I didn't want to learn something new.

01:44 It seemed intimidating 'cause it's a programming language.

01:46 I'm not a programmer, but I finally caved when it came to trying to automate plotting, which is pretty painful in Excel.

01:54 And yeah, once I started on it and had something useful working in a couple hours, I was hooked.

02:00 And then I started looking for more and more resources, found your show and got more and more into it from there.

02:05 Started digging into the web and it's just been a, I'd say an upward spiral from there.

02:09 And probably about two and a half weeks ago, I started my first, I guess, official tech role, in a similar kind of domain, for an automotive supplier doing engineering work. - Fantastic.

02:21 - So it's been really exciting to be able to use Python full time as part of my job because, you know, the bits of time I got to use Python before were always the parts I liked the most.

02:31 So I'm happy to be doing it, you know, on purpose.

02:33 - Awesome. - Yeah, me too.

02:34 - Congratulations.

02:35 - I wish I could do it full time.

02:37 - I remember my first full time software development job.

02:41 I was like, I can't believe they're paying me to do this.

02:43 I better figure this stuff out before they fire me.

02:45 I can't believe I'm doing this.

02:46 It was so great.

02:47 - Yeah. - So good.

02:48 All right, well, congratulations and happy to have you here.

02:50 Brian, I feel like we should document this.

02:52 - Definitely should document it.

02:53 And test our docs too.

02:55 So one of the things I'd like to tie, did I just try to edit?

02:59 There we go.

03:00 Something that came up recently was Vincent Warmerdam.

03:04 I think we've had him on the show.

03:06 - Mm-hmm, we have. - A couple episodes ago, yeah.

03:08 - Yeah, so Vincent announced that he's got a library called mktestdocs, and I kind of love this.

03:17 So the idea is it's a bunch of utilities that you can use to help test your documentation.

03:25 It doesn't do it right out of the box.

03:27 You have to create your own test files to do this.

03:31 But the idea, like the first example that he shows in his readme, is that you've got a Markdown file with some Python code blocks in it.

03:45 And you can make a test that goes through, reads the Markdown, grabs the Python code and runs it.

03:51 And if there are any problems with it, if there are any exceptions, it fails the test.
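A rough, hand-rolled sketch of that idea, for illustration only: this is not mktestdocs's own code or API. It assumes code blocks are fenced with a `python` info string, pulls each one out of a Markdown string with a regular expression, and executes it, so any exception (including a failing `assert`) surfaces as an error. The helper names `grab_python_blocks` and `run_markdown` are hypothetical.

```python
from __future__ import annotations

import re

# Build the three-backtick fence programmatically so this snippet can itself
# sit inside a Markdown code block without closing it early.
FENCE = "`" * 3

# Match a block opened with the "python" info string, capturing everything up
# to the next closing fence (non-greedy, across newlines).
PYTHON_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def grab_python_blocks(markdown: str) -> list[str]:
    """Return the source of every python-fenced block, in document order."""
    return PYTHON_BLOCK.findall(markdown)


def run_markdown(markdown: str) -> None:
    """Execute each python block; any exception propagates to the caller."""
    namespace: dict = {}
    for block in grab_python_blocks(markdown):
        # Blocks share one namespace (an assumption of this sketch), so a
        # later snippet can use names defined by an earlier one.
        exec(block, namespace)


good_doc = "Some prose.\n" + FENCE + "python\nx = 1 + 1\nassert x == 2\n" + FENCE + "\n"
bad_doc = FENCE + "python\nassert 1 + 1 == 3\n" + FENCE + "\n"

run_markdown(good_doc)  # runs cleanly
try:
    run_markdown(bad_doc)
except AssertionError:
    print("a code block in the docs is broken")
```

In a real test suite, the call to `run_markdown` would sit inside a test function so the test runner reports the failure.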

03:57 This is just brilliant.

03:59 There's examples in here for doing it with docstrings and even class docstrings.

04:04 And then Vincent even did, he does calmcode, and he did a little calmcode video on how to use this.

04:13 - Yeah, and you're putting that in the show notes for people, right, to check out?

04:16 - Yep, there's a link to the tutorial with the video.

04:20 The suggestion, or the use case that he was talking about at first, was that maybe you're using MkDocs for documentation, so you've got a bunch of Markdown.

04:29 But my use case is going to be blogs.

04:33 So I write--

04:34 - Yeah, I think that's a huge use case, actually.

04:36 - Yeah, I've got Python code in my blog source code.

04:40 It's markdown files.

04:42 I totally want, it's on my to-do list, to try this to make sure that the blog content is accurate.

04:48 - That is super cool.

04:49 You know, one more thing that you might find interesting: I think this is a more true software engineering type of solution, but another, sort of WYSIWYG, as-you-work style of solution is PyCharm.

05:03 If you have a Markdown file and you have Python code in there, it will highlight the errors and actually show you if symbols are missing and stuff.

05:11 So if you had the Markdown associated with the sample code and then you like do stuff with your little examples, it may actually show you the errors live as well.

05:20 - Oh, that's cool.

05:21 - Yeah, I mean, that's not like a CI sort of keep it fixed, but that's a as you type kind of thing.

05:25 - Yeah, and the other comment that he had is, I normally don't put assertions that things are valid in documentation, but the comment in the readme is that if you put asserts in there, they'll get checked also.

05:39 So you've got like unit tests built into your documentation.

05:43 - Super cool. David, what do you think?

05:45 - It's interesting.

05:46 I'm just trying to figure out, are you doing a parameterized test and looking at your inputs versus outputs for the code that's in the documentation?

05:54 Or how do you actually know it's testing correctly?

05:57 Is it just checking that it's valid Python?

05:59 The little code snippet that we're showing on the screen, and there's also a link in the show notes to the readme, uses parametrize. In this example, I'm saying, go look in my docs folder, and every Markdown file it finds in there will show up as a parameterization of the test.

06:27 So this test will run once per file.

06:31 If I've got three Markdown files in there, the test will run three times.

06:36 >> This is the most comprehensive and yet extremely short test I've seen.

06:40 It's really like three lines, and it will basically traverse a Markdown file hierarchy type thing.

06:48 - Oh, I do tons of really tiny tests.

06:50 So yeah.

06:51 - Yeah, nice, nice, nice.

06:53 All right, Avaro, welcome to the live stream.

06:55 Happy to have you here.

06:56 Let's see, let's move on to the next one.

06:58 I think speaking of users giving us, our listeners giving us ideas and helping us out here, I wanna talk about something that I've been hanging on to for a little while, since March, but I finally decided it's time to talk about it.

07:12 And that is creating queues out of process, sort of asynchronous queue processing.

07:19 So if I've got, say, a web app or an API, or even if I'm testing a bunch of hardware and I wanna kick off a bunch of jobs, I don't necessarily wanna block on all of them.

07:31 I might wanna push them down so other things can work on them.

07:35 You know, if I'm gonna send a bunch of emails, if you've ever tried to send a thousand emails in order synchronously.

07:40 It turns out that times out your web request.

07:42 Don't do that.

07:43 So a better idea would be to like push them to a queue and have some sort of background process go, oh, there's new emails to send.

07:48 Let me jam those on down the line.

07:50 So Scott Hacker sent over a pointer to this library, a small but cool little one called, well, they just called it qr3.

07:59 And qr3 is a queue for Redis.

08:02 And the three means Python 3, 'cause there used to be a qr that's not Python 3 compatible.

08:07 So here's like a re-imagining of that for Python 3 or just a compatibility that got moved over.

08:14 So it's pretty cool.

08:16 If you check it out, the API and implementation, or the usage, is quite simple, as you could imagine.

08:22 So all you gotta do is, well, it's built upon redis-py.

08:26 You've gotta have Redis installed.

08:27 That could be, you know, wherever.

08:29 It could even be Redis as a service on one of these cloud platforms, or in Docker, or run locally.

08:33 You have redis-py and then you just go over and you create a queue.

08:38 So you just say queue and you give it a name and then some server connect info like location, authentication and whatnot.

08:45 And then all you've got to do is you push items to it.

08:47 They could be just really simple things like a bunch of email addresses you're gonna send, but it could also be really complicated.

08:53 Like for example, it could be say, Pydantic models that store all the data that you need to process that request.

09:00 So that's pretty cool.

09:01 The default way it gets data over is through cPickle, and cPickle is better than pickle but still has issues and other restrictions.

09:12 Some of the restrictions are that you can't put certain types of objects in, like it wouldn't make sense to serialize a database connection that has an open socket, or a thread, or some weird thing like that, right? But for most of the sort of "here's the data you need to process" messages, all that stuff will work. And you can also create your own serializer on a per-queue basis, which is kind of cool.

09:35 So if you said, I want to only work with Pydantic models, you could put in the to-dictionary and from-dictionary transformation with the validation and all that kind of stuff.

09:44 I personally would not use cPickle, because one of the things you can run into is if you upgrade your version of Python on one server but not the other, 'cause you're in the process of going from one to the other,

09:55 and something has a different structure in memory and gets put over there.

09:59 The other ones can't read it.

10:00 Like, there are always these challenges with pure binary formats.

10:03 I don't know, I would probably serialize it as JSON or something and deserialize it back.
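Here is a minimal sketch of the JSON approach. It does not use qr3's API: a `collections.deque` stands in for the Redis list so the example runs without a server, and the `EmailJob` record and the `push_job`/`pop_job` helpers are hypothetical. The point is that what sits on the queue is plain JSON text, which any worker can decode regardless of its Python version, whereas a pickle can only be loaded by a reader that has the same classes importable and supports the pickle protocol that was used.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass

# Stand-in for a Redis list; a real setup would push/pop through Redis.
fake_redis_list: deque = deque()


@dataclass
class EmailJob:
    # Hypothetical job record: everything a worker needs to send one email.
    to: str
    subject: str


def push_job(job: EmailJob) -> None:
    # Serialize to JSON text instead of pickle bytes.
    fake_redis_list.append(json.dumps(asdict(job)))


def pop_job() -> EmailJob:
    # FIFO: take the oldest item and rebuild the dataclass from the dict.
    raw = fake_redis_list.popleft()
    return EmailJob(**json.loads(raw))


push_job(EmailJob(to="a@example.com", subject="Welcome"))
push_job(EmailJob(to="b@example.com", subject="Receipt"))
print(pop_job())  # EmailJob(to='a@example.com', subject='Welcome')
```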

10:08 But anyway, it's pretty cool. What do you guys think?

10:10 This looks nice. I actually haven't used queues in Python before, but it's on my to-do list.

10:15 Because, I mean, designing complex systems by breaking them up into different processes with queues back and forth is a cool way to do it.

10:23 Yeah, I'm kind of inspired by this.

10:25 I kind of want to do more stuff with queues as well. David?

10:28 Oh, it seems like a really clean, simple way to use queues.

10:30 I'm with Brian, I haven't really used it in a Python context before, but like the examples you gave are perfect.

10:36 You know, emails take a long time, so you don't want to be tying up your main application; you need to dump those off into a background task.

10:43 And this looks really, really simple to use.

10:45 So, you know, it seems like it'd be worth a try for sure.

10:49 Yeah, for sure.

10:50 Other things are like you need to generate a report that takes 30 seconds: you know, kick off the generation, then see if it's in the database, and just do some sort of Ajax poll until it's there or whatever.

11:00 It has some more features.

11:02 So it has a queue, which is first in first out, as you can imagine.

11:05 It has a capped, I call it a capped collection.

11:08 I feel like it should be capped queue because it's implemented behind the scenes as a capped collection.

11:13 They also say a bounded queue is another AKA.

11:16 So the idea is, if you're doing analytics and logging and you're trying to eventually process that and save it to the database, you might say, you know what, we really don't want this queue to get more than a hundred thousand items at a time, 'cause we should be writing this to the database, and if something goes wrong, it can completely wreck the server.

11:33 So you create these capped queues where you're like, I'm gonna start throwing away old stuff if we don't get to it in time.
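The "throw away old stuff" behavior can be seen in-process with the standard library's `collections.deque(maxlen=...)`: once the cap is reached, each new item silently evicts the oldest one. This is only an analogy assumed for illustration; qr3 implements its capped collection on top of Redis, and its exact eviction details aren't shown here. (In plain Redis, a common way to get a similar effect is pushing with `LPUSH` and then trimming with `LTRIM`.)

```python
from collections import deque

# Keep at most 3 pending analytics events; the oldest are dropped first.
events = deque(maxlen=3)

for n in range(5):
    events.append(f"event-{n}")

print(list(events))  # ['event-2', 'event-3', 'event-4']
```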

11:37 There's a deque, which to me sounds like getting stuff out of a queue, but oh no, it's a double-ended queue.

11:42 A double-ended queue, it should be a, yeah.

11:45 Anyway, the idea is you can basically put stuff onto the front or the back, and you can pop stuff off the front and the back.

11:53 So you could, for example, put low-priority items on the back, or if something's really important, you could kick it right up to the front of the queue.

12:01 And you can also do a stack.

12:04 You can also do a priority queue, which is pretty close to what I described, except you can't jump ahead of the things that have the same priority, right?

12:12 Like if there's super urgent and then low, you can put a new super urgent thing behind the other super urgent ones, but it would still appear before all the others, things like that.
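Python's own `collections.deque` shows the same double-ended operations in-process. This is the standard-library type, not qr3's Redis-backed deque, whose method names may differ.

```python
from collections import deque

jobs = deque(["report", "cleanup"])

jobs.append("low-priority-export")  # add to the back
jobs.appendleft("urgent-fix")       # jump the line: add to the front

print(jobs.popleft())  # urgent-fix (taken from the front)
print(jobs.pop())      # low-priority-export (taken from the back)
print(list(jobs))      # ['report', 'cleanup']
```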
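An in-process version of that ordering rule can be sketched with the standard-library `heapq` module. Here smaller numbers mean more urgent (a convention assumed by this sketch), and a running counter breaks ties so items with the same priority come out first-in, first-out. The `push`/`pop` helpers are hypothetical, and qr3's Redis-based priority queue may order ties differently.

```python
import heapq
import itertools

URGENT, LOW = 0, 1            # smaller number = served first (assumed convention)
_counter = itertools.count()  # arrival order, used to break priority ties
heap: list = []


def push(item: str, priority: int) -> None:
    # Tuples compare element by element: priority first, then arrival order.
    heapq.heappush(heap, (priority, next(_counter), item))


def pop() -> str:
    return heapq.heappop(heap)[2]


push("nightly report", LOW)
push("server down", URGENT)
push("disk almost full", URGENT)  # lands behind the earlier urgent item

print([pop() for _ in range(3)])
# ['server down', 'disk almost full', 'nightly report']
```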

12:21 So this is all pretty neat.

12:23 What I really like about this is obviously Python has queues built in, right?

12:27 Like that's just a data type.

12:28 List itself could basically be a queue.

12:30 You can pop stuff off the front and shazam, you have a queue.

12:33 But this is out of process, right?

12:35 This means it works if you have to scale out to multiple worker processes in any sort of API, or if you want the queue to be durable across app restarts, things like that.

12:44 And if you think, oh, I'm not gonna scale out, I don't have multiple servers:

12:48 almost every Python web app and web API runs with multiple worker processes at a minimum.

12:52 So yeah, you're scaling out.

12:54 Anyway, I think this is pretty useful.

12:55 And if you're all about Redis, this is cool.

12:57 Redis seems nice.

12:58 I'm kind of inspired to do something like this with MongoDB, but I'm also busy.

13:01 So probably not right away.

13:03 And John Sheehan out there in the live stream is telling me that he learned a few years ago that deque is pronounced "deck."

13:10 So yeah, double, yeah.

13:12 All right, so deck, thanks.

13:14 And then Teddy out in the live stream says, I'm not too familiar with queues, but how would it work if you queue processes that execute Python code? Wouldn't they end up being processed sequentially because of the Python GIL?

13:27 So are you ending up with serial processing because of this?

13:33 I think it depends on just how you create the workers, right?

13:36 So there's two ends that you build.

13:38 One end puts stuff in the queue, then you literally build the other end that goes to the queue and says, give me the next item.

13:43 And that's stored in Redis, which obviously can support multiple clients.

13:47 So if you just scaled out the consumers of the queue messages, the things running the jobs, then you would escape the GIL, right?

13:56 Because you would have multiple processes.

13:57 You can do, you can do multiple things, feeding the queue as well.

14:00 Yes.

14:01 Yeah.

14:01 Multiple web requests or something.

14:03 Yeah, absolutely.

14:04 Absolutely.

14:05 All right, David, what you got for us?

14:07 All right.

14:08 Well, are either of you pandas users?

14:12 I'm a pandas admirer and I use it a little bit, but I always feel like when I come to pandas, I know there's way more I should be doing with this.

14:19 It's so cool, but I don't use it as much as I should.

14:23 - And I used pandas pretty heavily in my previous job to do a lot of analysis, especially on one-dimensional data sets.

14:30 And it always happened.

14:33 When I first started using pandas, I was doing a lot of really bad things like iterrows and that type of thing.

14:37 And the more you kind of learn about it, the better you get at doing set-type operations.

14:41 But even in the last couple of months, you'd think I'd have everything down, but the API is huge.

14:47 And I always had these aha moments because I learned about something like transform.

14:52 And, you know, once I realized what you could do with transform, it simplified so many things that I was doing.

14:57 And the first item I have is an article that says 25 Pandas Functions You Didn't Know Existed.

15:02 I don't normally like these articles because they almost feel a little bit clickbaity.

15:06 But this one actually had a handful of aha moments for me.

15:09 So I thought I would go ahead and share it.

15:11 So I have them listed in the show notes, kind of the aha moments for me.

15:14 But between is a really nice one. I think it would be considered a method on the data frame or a series, and basically it allows you to simplify logic: instead of trying to say greater than or equal to blank and less than or equal to blank, you can just say between values, very similar to the BETWEEN operator you would use in a SQL query.

15:35 Styler, I had no idea existed; you can actually apply styles to the tables coming out of pandas.

15:42 I do a lot to try to make my notebooks really, really pretty so that I can convert them to HTML or another format and share them with the business. The business folks don't typically like notebooks, but I'm trying, because I can't stand the intermediate step of copying to PowerPoint. But this would definitely help. You can do gradients, and there are a bunch of different functions behind that.
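A small sketch of the difference, with made-up price values. `between` is a `Series` method, and both bounds are inclusive by default.

```python
import pandas as pd

prices = pd.Series([120, 450, 980, 1500, 2300])

# Chained comparisons: both bounds written out by hand.
manual = (prices >= 400) & (prices <= 1500)

# Series.between does the same thing in one call.
tidy = prices.between(400, 1500)

print(prices[tidy].tolist())  # [450, 980, 1500]
print(manual.equals(tidy))    # True
```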

16:05 Options is another one I've kind of played with a little bit.

16:09 But there's one in here that I wanted to try before the show and hadn't had a chance: you can change the plotting backend in pandas from Matplotlib to something else.

16:16 So at some point, I'm going to try changing it to Plotly because that's my preferred plotting library for most things.

16:22 convert_dtypes is really nice; if you know you have a categorical type set of information, you can dramatically reduce how much memory is taken.

16:30 Mask was a nice one; it basically allows you to quickly convert particular values, or values that meet a criterion, to another value.

16:43 I was often doing this in multiple stages; this would clean up that code significantly.

16:49 nsmallest and nlargest also could have been very helpful. Essentially, it's similar to a max or a min, but instead of just pulling a single value, you can pull, in this case, five. And clip, and at_time.
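A hedged illustration of the memory point, with made-up data. Strictly speaking, the large savings for a column of repeated labels come from the `category` dtype (via `astype`), while `convert_dtypes()` converts columns to pandas' nullable dtypes, so the sketch shows both side by side.

```python
import pandas as pd

# 30,000 rows but only three distinct labels: a good fit for "category".
df = pd.DataFrame({"line": ["press", "weld", "paint"] * 10_000})

as_text = df["line"].memory_usage(deep=True)
as_category = df["line"].astype("category").memory_usage(deep=True)

# Categorical storage keeps each label once, plus a small integer code per row.
print(as_category < as_text)  # True

# convert_dtypes() is a different tool: it picks the best nullable dtypes
# (a dedicated string dtype here, Int64 for integer columns with gaps, ...).
print(df.convert_dtypes().dtypes)
```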
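A sketch of `mask` with made-up sensor readings, where -999.0 is a hypothetical "no reading" sentinel value:

```python
import pandas as pd

readings = pd.Series([12.1, -999.0, 11.8, -999.0, 12.4])

# mask() replaces values wherever the condition is True.
# With no replacement given, they become NaN.
cleaned = readings.mask(readings == -999.0)

# A replacement value can be supplied as the second argument.
zeroed = readings.mask(readings < 0, 0.0)

print(int(cleaned.isna().sum()))  # 2
print(zeroed.tolist())            # [12.1, 0.0, 11.8, 0.0, 12.4]
```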

17:04 So like, if I want to see the five largest revenue-producing customers in my data frame, I could just quickly do that. Yeah.

17:11 Yep. And like with anything else in pandas, there are a couple of other methods you could use to get that done too. But it's just so much cleaner to do diamonds.nlargest(5, "price"); it's very clean and fast, instead of having multiple lines to do a transformation and then another transformation and then another chain. So I wanted to suggest this article; like I said, I've been doing pandas for a couple of years, and I still have these moments.

17:35 And with this article, some of them may not be aha moments for me, but they may be aha moments for someone else, because everybody probably knows 20% of the pandas API, and maybe a slightly different 20%.

17:46 Yeah, this is really neat. I love these types of things; I mean, it's super easy to just scan through and decide whether or not it's really helpful to you.
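The diamonds example translates roughly as follows. The numbers are made up, and the real dataset has more columns.

```python
import pandas as pd

# A tiny made-up stand-in for the diamonds dataset.
diamonds = pd.DataFrame(
    {
        "carat": [0.3, 1.2, 0.9, 2.1, 0.5, 1.5],
        "price": [450, 5200, 2900, 15800, 1100, 8700],
    }
)

# One call each, instead of sort_values(...) followed by head(...).
top = diamonds.nlargest(3, "price")
bottom = diamonds.nsmallest(2, "price")

print(top["price"].tolist())     # [15800, 8700, 5200]
print(bottom["price"].tolist())  # [450, 1100]
```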

17:54 The one for me, the pandas one that had the biggest like, oh my goodness, was web scraping and like pulling HTML tables and turning those into data frames.

18:04 So, like, obviously I can go with requests and Beautiful Soup and do something, but then you still end up with just an HTML table. But with pandas you can say read_html, and then just give me table three as a data frame. Like, it's ridiculous, right?

18:20 Now pandas has some really nice I/O tools too, around CSVs, Parquet, the most common data format types, and even some of the less common ones.

18:30 It's a really nice library overall. But yeah, like I said, there are always some aha moments, and it's nice to have an article that highlights several of them.

18:39 Yeah, super cool.

18:40 So, go ahead, Brian.

18:42 The one that jumps right out at me was the number one: I didn't know that you could just use ExcelWriter with pandas. That's pretty cool.

18:51 And I think there's another wrapper around ExcelWriter that kind of simplifies converting a data frame to Excel, but I think ExcelWriter lets you do some more intricate things with Excel.

19:02 Yeah, that's pretty cool.

19:03 Yeah, that's super cool. All right, before we move on, really quick from the live stream: I liked when you asked if anyone uses pandas and likes it. Dean Langsam just said, yes, all caps, beautiful, but then also pointed out this project that he built that gives you live tips while you work with pandas in notebooks, called dovpanda.

19:23 So literally, I'm just checking this out now. But as you work with it, you can see here it gives you little tips like, oh, by the way, did you know you can concatenate like this? If you specify axis=1, you get, you know, such and such. It gives you little tips and tricks as you work with it. So people can check it out. Yeah. Yeah. Aha moments. Exactly. Exactly. Thanks. Brian, I do love some FastAPI and I love Rich, and I'm looking forward to what you're gonna do by putting these together.

19:51 - Yeah, well, I've been watching, we've been watching Rich, of course, and FastAPI a lot.

19:58 And so this article's by Hayden Kotelman, I think, and it's FastAPI and Rich Tracebacks in Development.

20:08 So the idea is that one of the cool things that Rich has is these awesome tracebacks and logging; they're just beautiful.

20:16 And I mean, if you can say a traceback is beautiful, it's probably because of Rich.

20:20 They look pretty great and the logging is pretty good.

20:24 So the, I'm just gonna scroll down to some of these examples at the bottom.

20:28 So the, oh, it's kind of tiny, but the logging is nice and colorized and stuff.

20:33 And then the exceptions, one of the things with the tracebacks and exceptions is there's a highlighted line number.

20:39 It highlights the actual file name and puts in lower, more muted colors, the stuff you don't really need to care about right away.

20:49 It's just a nice way to do it.

20:51 >> It gives you syntax highlighting and keyword highlighting in your code.

20:56 >> Yeah.

20:57 >> That is, in the stack trace of a crash, in the traceback.

21:01 >> We've seen some examples of how to use the rich tracebacks from other programs, but I haven't seen it actually written up by somebody else.

21:12 This is nice. Using FastAPI, FastAPI is awesome for building web APIs.

21:20 But how do you do this? How do you get your application to do this?

21:25 I'm not going to scroll through all of this, but the gist of it is, there's really only a few steps.

21:31 This post walks through all of it with all the code.

21:35 For the most part, you create a data class with the logger configuration.

21:41 Then you need a function that will either install Rich as a handler or use the production log configuration.

21:47 I like that he puts this switch in place.

21:50 The idea around this is when you're debugging, you're going to use these nice tracebacks.

21:57 But when it's in production, it's not going to use that. It's just going to do the default logging.

22:03 Then you have to call logging.basicConfig with the new settings, and then there's a little note that if you're using Uvicorn, you probably want to override the logger for that.

22:13 And that's it. It really sets it up and it's got all the code in place so that your FastAPI application can have these lovely logs and tracebacks during development.
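The overall shape of that setup can be sketched with the standard `logging` module. This is not the article's code: the `LoggerConfig` dataclass and the `get_logger_config`/`setup_logging` functions are hypothetical names, and the sketch falls back to a plain `StreamHandler` if Rich isn't installed. `RichHandler(rich_tracebacks=True)` comes from Rich (which also offers `rich.traceback.install()` for uncaught exceptions), and the `uvicorn`, `uvicorn.error`, and `uvicorn.access` logger names are the ones Uvicorn uses, though it's worth checking them against your version.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class LoggerConfig:
    # Everything logging.basicConfig() needs, bundled in one place.
    handlers: list = field(default_factory=list)
    format: str = "%(levelname)s: %(asctime)s %(message)s"
    date_format: str = "%Y-%m-%d %H:%M:%S"
    level: int = logging.INFO


def get_logger_config(dev: bool) -> LoggerConfig:
    if dev:
        try:
            from rich.logging import RichHandler
        except ImportError:
            pass  # Rich not installed: fall through to the plain setup
        else:
            # RichHandler renders its own time/level columns, so keep the
            # format minimal to avoid printing them twice.
            return LoggerConfig(
                handlers=[RichHandler(rich_tracebacks=True)],
                format="%(message)s",
                level=logging.DEBUG,
            )
    # Production (or no Rich): a plain stderr handler with the default format.
    return LoggerConfig(handlers=[logging.StreamHandler()])


def setup_logging(dev: bool) -> None:
    config = get_logger_config(dev)
    logging.basicConfig(
        level=config.level,
        format=config.format,
        datefmt=config.date_format,
        handlers=config.handlers,
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    # Route Uvicorn's loggers through the root handlers configured above.
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        uv_logger = logging.getLogger(name)
        uv_logger.handlers = []
        uv_logger.propagate = True
```

Calling `setup_logging(dev=True)` early in development startup (and `dev=False` in production) keeps the switch in one place.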

22:22 Yeah, that's super neat.

22:24 David, are you a fan of either of these frameworks?

22:26 I haven't had a chance to use Rich too much. I have been watching Textual pretty closely on Twitter because it's just phenomenal what he's been able to do.

22:34 How do you have a docking scrolling side thing in a terminal window? What's going on here?

22:39 I do. I love FastAPI. I built my wife's website using Flask, and I liked how FastAPI was similar to Flask in a lot of ways. But, you know, some of the syntax was a little bit cleaner, although the newer version of Flask kind of borrows some of the same syntax. And it's just got a lot of really good stuff built in. The API documentation was really, I think that's kind of clutch when you're learning a new framework too, because you're not having to do curl commands or anything like that; you can just bring up a web page and poke at it, you know, visually, which is pretty nice. So yeah, I really like FastAPI. I just, you know, other than building some small toy things, haven't had a really compelling reason to use it yet. So yeah, yeah, very cool. Toys are compelling reasons, I think.

23:23 >> Definitely. Maybe some Arduino thing could run a FastAPI server, who knows?

23:27 So let me talk about some good news.

23:30 Good news. We've covered a couple of things about some visionary sponsors coming on to support Python and the PSF and so on, which is fantastic.

23:41 I've certainly whinged a lot about people running multi-billion dollar revenue companies and doing nothing really to give back, other than maybe a PR or something.

23:51 But we've got Microsoft, we've got Bloomberg, we've got Google as visionary sponsors, right?

23:56 And one of the things that that made possible is the CPython developer in residence.

24:02 I don't know if it's directly related to one of those or if it's just sort of like that, sort of brought it all together.

24:06 But recently the PSF said they're gonna have a developer in residence position.

24:12 And well-known community member, friend of the show, Łukasz Langa, applied and got hired.

24:19 He's now the developer in residence.

24:21 This is a little bit old news, it's from last month, but I wanted to make sure we gave it a quick shout-out because I think it's going to be pretty interesting to have a developer-side person inside the PSF making sure things are moving.

24:34 So the PSF has seven, eight, nine, I don't know, something like that,

24:37 I haven't seen recent updates, including this position,

24:41 full-time employees. Right.

24:43 So there's a bunch of people who work there.

24:45 But to my knowledge, this is the first like developer person rather than marketing, legal, whatever, right?

24:52 All that sort of business director, administrative side.

24:55 So this is pretty interesting.

24:56 Apologies to everybody that works at the PSF.

24:58 That's like, don't forget me.

25:00 Yeah, no, no, no.

25:01 Those are super important, but it's, it's interesting that there's not been a Python developer type of role within that group is all I'm saying.

25:10 So they put that out.

25:12 Łukasz Langa is now part of it.

25:14 And there's some interesting takeaways here.

25:16 So basically, let me just give a bit of a quote here for how Łukasz decided to position this and how he sees it.

25:26 He said, I don't really want this to be like, hey, I'm the, you know, the appointed CEO of Python.

25:31 So listen to what I have to say, right?

25:34 Now he's incredibly hopeful for Python because of this, and wanted to apply for it,

25:42 and so on. He says, I think it's a role with transformational potential for the project. In short, I believe the mission of the developer in residence, the DIR, is to accelerate the developer experience of everybody else.

25:55 And that includes not just the core team, but most importantly, the drive-by contributors submitting pull requests and creating issues on the tracker.

26:03 So he's hoping that with this role, he can do things like make sure that there's a steady review of the stream of PRs and issues so they don't get stale and there's not a backlog, triage the issues, be present in the official communication channels to unblock people if they get stuck trying to contribute, keeping CI and test suites in a usable state, making them run quick, and keeping tabs on where the work is most needed in the projects that are most important.

26:29 So it sounds to me almost like he's the technical person in the room, helping the community keep moving and just noticing, oh, everyone's having a problem.

26:39 Many people are having a problem trying to do a PR because they can't get CPython to build. Let's make that incredibly simple for them, and things like that.

26:46 Yeah, I like his attitude of where he's going with this.

26:50 So, yeah, yeah. In case I didn't point it out, Łukasz is also the creator of Black, the Black formatter, which I know we've talked about in 100,000 variations here. So that's great.

27:00 David, how do you feel about this?

27:02 I think it's great. Any full-time person they can have working for the PSF or on Python directly is gonna help increase stability, and I like his approach of trying to increase throughput by maximizing everybody else's efficiency. It'd be easy to say, oh, I'm going to work on these features or on this, but he's most concerned about making development for Python as ergonomic as possible, which I think ultimately will create more throughput and, you know, a better Python in the long run.

27:30 Yeah, and absolutely props to the PSF because it's easy to hire somebody and say, here's what I want you to produce for us.

27:37 It's harder to hire somebody and say, I want you to be an enabler of other people because it's hard to measure that.

27:44 One of the interesting things that I think he's doing, and I'm not sure if he's going to keep this up, but it looks like he has so far, is he puts out weekly report posts of what he's been doing.

27:55 I can't imagine having that much public scrutiny over what my work week looks like, but I mean, >> Brian, why do you spend so much time working on CI?

28:04 Come on.

28:05 >> It's pretty impressive and it's cool that he's doing that.

28:13 The entire Python world is watching, no pressure or anything.

28:17 >> Yeah, he did say he was a little nervous about this because this is the first year of this position.

28:23 The success or failure he has will influence whether it continues and what happens in the future.

28:31 So super cool.

28:32 Let me get a little feedback from the audience here.

28:34 So Sam Orlehate says, "Good for Łukasz.

28:38 He's great. I watched a bunch of videos he did on YouTube about making music with asyncio." I haven't seen those. I'll have to check them out.

28:44 And Dean out in the live stream says, "CEO of Python reminds me of a well-known joke in my country where this famous newscaster was shouting, 'Get me the person in charge of the internet.

28:55 Get me the person in charge of the internet.'" That's great.

29:00 Dean, you'll have to let us know what country that is. That's awesome.

29:02 All right, Brian, you're up with the next one?

29:04 What's that?

29:05 You're next.

29:06 No, you already did this, right?

29:07 Yeah, David's next.

29:08 I got to keep track of what's happening here.

29:10 David, you're next.

29:11 Okay.

29:12 So my next item is a library or framework (I'm not sure which one it falls under) called Dagster.

29:19 It is a data orchestrator for machine learning, analytics, and ETL.

29:24 It's one of the first tools I tried for any kind of data pipeline.

29:30 And it's based in Python, so you programmatically build up your pipeline in Python and use different decorators depending on what you're building: whether it's a solid or some other piece of the pipeline, or whether you're doing configuration.

29:47 And it took a little bit to wrap my head around it. I think that had more to do with just understanding how pipelines are typically constructed in industry. But once I got my head wrapped around it, it was really simple to use, and I felt like I could produce things pretty quickly.

30:01 One really nice thing is that they allow you to essentially work on your pipeline locally, then deploy to production on something like Kubernetes, or to Airflow or Dask, or whatever underlying engine you want to run your pipeline on, and there's very little transition there.

30:21 You're not developing something locally and then having to completely change it for a cluster or a larger scale. Another really nice feature is a UI called Dagit. You could do everything via the command line if you want to, but it comes with a really nice UI that gives you an overview of your pipeline and lets you test it using the playground. You can update your configuration in the playground, and you can look at previous runs to see whether they passed or failed, with detailed logging and error messages. So this by itself is pretty nice, on top of an already very nice tool. I can give a quick demo, too. This is, I think, the first part of their tutorial, where you have multiple solids; these represent different pieces of processing. And like I said, you can use the playground, and it'll check all of your configuration to make sure it's correct before it lets you run anything. So if you have something misconfigured, it's not going to blow up halfway through a 30-minute job. And then when you run that... oh, no, no, no. So I'll probably forgo the real-time demonstration.

31:38 I think my terminal probably died is what that was. But yeah, it will actually show a run in sequence and show the different pieces as they're completing and feeding into the other piece too.

31:48 It doesn't show much here because it's a very small, quick pipeline.

31:54 But if you have longer SQL queries or something like that, it'll actually show in real time how it's processing, so you can get a visual intuition for what's going on on top of everything else. So yeah, there are a couple of resources on this too, if you want someone who explains it a little better than I do.
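To make the "build the pipeline in Python with decorators, and check the configuration before anything runs" idea concrete, here is a toy sketch in plain Python (3.9+). It is not Dagster's API: step, validate, run_pipeline, STEPS, and the config keys are hypothetical names made up for illustration; real Dagster pipelines use Dagster's own decorators and types.

```python
from typing import Any, Callable

# A step takes the run config plus the previous step's output.
Step = Callable[[dict[str, Any], Any], Any]

# Registered steps in definition order: (name, required config keys, function).
STEPS: list[tuple[str, tuple[str, ...], Step]] = []


def step(*required_keys: str) -> Callable[[Step], Step]:
    """Hypothetical decorator: register a function as the next pipeline step."""
    def register(fn: Step) -> Step:
        STEPS.append((fn.__name__, required_keys, fn))
        return fn
    return register


@step("rows")
def load_rows(config: dict[str, Any], _upstream: Any) -> list[float]:
    # Stand-in for "load the CSV": take the raw prices from the config.
    return [float(x) for x in config["rows"]]


@step("tax_rate")
def add_tax(config: dict[str, Any], prices: list[float]) -> list[float]:
    # Each step receives the previous step's output as its input.
    return [round(p * (1 + config["tax_rate"]), 2) for p in prices]


@step()
def daily_revenue(config: dict[str, Any], prices: list[float]) -> float:
    return round(sum(prices), 2)


def validate(config: dict[str, Any]) -> list[str]:
    """Collect every missing config key for every step, before anything runs."""
    return [f"{name}: missing '{key}'"
            for name, keys, _ in STEPS for key in keys if key not in config]


def run_pipeline(config: dict[str, Any]) -> Any:
    problems = validate(config)
    if problems:
        # Fail fast instead of blowing up halfway through a long job.
        raise ValueError("; ".join(problems))
    result: Any = None
    for _name, _keys, fn in STEPS:
        result = fn(config, result)
    return result


print(run_pipeline({"rows": [10, 20], "tax_rate": 0.1}))  # 33.0
```

Because validation looks at every step's requirements up front, a config missing tax_rate is rejected before load_rows ever runs.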

32:10 The Data Engineering Podcast had an episode, and Software Engineering Daily also did an episode about Dagster.

32:16 So, you know, that's kind of where I first learned about it.

32:19 And there's a lot of really good information in those podcasts.

32:21 Yeah, these data pipeline frameworks are super interesting. I've certainly realized just how valuable they can be.

32:28 Dean asks, David, how is this compared to Airflow? Do you have any idea?

32:32 Have you tried? Have you looked at either?

32:34 I haven't used Airflow. This is my first stab at any kind of data pipeline. In my current job, we're not using Airflow or Dagster; we're using one of the cloud-based tools. I think Airflow is more draggy-droppy, more visual, but I could be wrong about that. One thing I really liked about Dagster, at least compared to what I'm currently using, is that you can programmatically create these interfaces. Technically, the tool I'm using now has an API that you can throw JSON against to create your different resources, but it's nice having Python code because that works a little better with my brain than a lot of the draggy-droppy stuff.

33:13 I did have the Airflow folks on the show, on Talk Python, not this show, a little while ago.

33:21 It's not out yet, but we recorded it maybe last week.

33:23 And they pointed out that it's pretty much all Python there as well.

33:27 So you program it in Python over on Airflow and then you have similar visual tools to actually see what's happening but you can't interact with it through those things.

33:39 You can just like kind of watch it and debug it and stuff from my understanding.

33:42 So I would put them in a pretty similar category.

33:44 I would say one thing that's pretty interesting is... that's not what I wanted to pull up.

33:47 Actually, the Airflow GitHub repo is what I wanted to point out.

33:51 I was really surprised to learn that Airflow has 22,000 stars on GitHub which kind of blew my mind.

33:56 I thought of it as like a little framework that people might use.

33:59 Apparently, it's popular. I'm not really sure about Dagster.

34:02 I guess I could look as well.

34:03 I think it's relatively new, so I'd be surprised if it were quite as popular as Airflow. But one nice thing about Dagster is that if you have Airflow pipelines you're already using, you can use that server to run Dagster too. It can basically compile to something that's compatible with Airflow if you need to do that. So there are a couple of different targets you can translate it to, and it seems like a pretty interesting tool. And like I said, I developed a small pipeline at my previous job as my first stab at pipelines, to eliminate an Excel sheet that was doing a bunch of horrible, awful SQL queries. I can just imagine people trying to do this with Excel, and it was probably wrong.

34:45 It was not necessarily incorrect, but it was wrong to do it.

34:48 Well, it was interesting.

34:52 Excel's just very interesting to reverse engineer.

34:54 It's a lot of go-to statements.

34:56 It's ubiquitous, but it's definitely, as far as programming production systems, not a good tool.

35:02 Yeah, very cool.

35:04 All right, so I got some more real-time updates here.

35:07 Teddy says, "I know one of the big differences with Airflow is that you can use the output of a task as the input of the next task.

35:13 From what I understand, Dagster is kind of a second generation data orchestration." Unsure which generation Airflow would be, but here we go.

35:21 And then Airflow mostly assumes you store and load data in each task, even though Airflow has something called XCom, which allows you to pass the output as input of the next.

35:31 Okay, interesting.

35:32 Yeah, thanks for all that background info there.
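Teddy's distinction can be sketched with two plain-Python styles. This is a toy illustration only, not Airflow's or Dagster's API: the function names are hypothetical, and a temporary directory stands in for whatever database, bucket, or XCom-like store a real deployment would use.

```python
import json
import tempfile
from pathlib import Path


# Style 1: a task's output is passed directly as the next task's input.
def extract() -> list[int]:
    return [3, 1, 2]


def transform(values: list[int]) -> list[int]:
    return sorted(values)


direct = transform(extract())


# Style 2: each task stores its result, and the next task loads it.
def extract_to_store(store: Path) -> None:
    (store / "extract.json").write_text(json.dumps([3, 1, 2]))


def transform_from_store(store: Path) -> None:
    values = json.loads((store / "extract.json").read_text())
    (store / "transform.json").write_text(json.dumps(sorted(values)))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    extract_to_store(store)
    transform_from_store(store)
    stored = json.loads((store / "transform.json").read_text())

print(direct, stored)  # [1, 2, 3] [1, 2, 3]
```

Both styles produce the same result; the difference is whether the framework hands data between tasks or each task is responsible for persisting and reloading it.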

35:34 I haven't used either, but I definitely think they're both neat.

35:37 And I feel there's a lot of places that are just like, "Well, how else are we going to do it?

35:41 Of course, we're going to use that spreadsheet." Right? And if they had tools like this, it would be very empowering.

35:45 One of the things I find very interesting about these frameworks is that what you end up building is the little piece, like load the CSV into the database, or run the report that gets me the revenue for the day. What you end up building are very, very small pieces, and you don't have to worry about the reusability, the reproducibility, the durability. You just go, "I'm going to build an incredibly small bit of Python, and we'll just click it in as part of this workflow," which really seems to empower people, almost like the microservices story but for data processing, without all the hard deployment side of things.

36:18 - I hope that, if they don't already have it, they put a tool connected with Dagster called Dagnabit, 'cause it needs to be there, I think.

36:28 Maybe some sort of capture tool or something.

36:30 Dagnabit would be good.

36:31 - Yeah, yeah, I love the UI bit of it as well.

36:34 All right, quick bit of follow-up.

36:36 I guess, Brian, you wanna start?

36:37 You got any extras today?

36:39 - I've got just a vanity extra.

36:42 So one of the things we noticed: we talked about Will's project, Textual, briefly.

36:53 The stars on Textual is just going through the roof.

36:57 I love the graph.

36:59 Is this the xkcd style of Matplotlib or something?

37:03 What is this?

37:04 It's a-- I have no idea what it is.

37:07 But it's--

37:07 Anyway, show us the other pictures.

37:09 Yeah, the stars are insane.

37:10 It's like a vertical line on a graph.

37:12 One of my own projects has a similar trajectory, so I just wanted to highlight that it's looking up too. Of course, I only have 16 stars.

37:23 Will has like 3,000, a little different, but still, look, it took off on about the same day, I think.

37:29 Yeah, that's awesome.

37:31 Sixteen stars is more than most of my repos.

37:34 So you just got to extrapolate it a little bit.

37:37 No, that's really cool.

37:38 Awesome.

37:39 David, do you have any extra stuff you want to throw out?

37:41 Sorry, Brian.

37:41 Yeah, I had one extra.

37:43 I didn't load it on my screen over here.

37:45 Let me see if I can pop it over real quick.

37:48 But I...

37:50 And this isn't Python, but I know SQL and Python tend to play a lot together.

37:53 Are you going to go back to some nostalgic time on the internet where you open up a DOS prompt and type win to start Windows? What is this?

38:00 This is a modern SQL.

38:02 It's a really fantastic slideshow that goes through a lot of updates.

38:07 So if you're still doing SQL the old fashioned way, it shows you how you can replace that with, you know, better, cleaner, more concise versions of them.

38:16 There are so many things in here where I was doing horrible, hacky tricks to get them to work that you could take care of in one line of SQL.

38:24 And even with some of the newer things I've learned, there are just so many great... I don't know if you call them tools or methods or what. But, you know, I've found Python and SQL tend to work together a lot, especially in the data space. So if you're like me and have some self-taught SQL experience, something like this can be very helpful for learning better practices for different things you might want to do with SQL.
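As one illustration of an older workaround versus a newer SQL feature (chosen here for illustration, not necessarily an example from the slideshow), here is a running total computed two ways with Python's built-in sqlite3 module. The table and data are made up; window functions were standardized in SQL:2003 and need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2021-08-09', 10), ('2021-08-10', 25), ('2021-08-11', 5);
""")

# Older workaround: a correlated subquery re-scans the table for every row.
old = conn.execute("""
    SELECT s.day,
           (SELECT SUM(s2.amount) FROM sales AS s2 WHERE s2.day <= s.day)
    FROM sales AS s
    ORDER BY s.day
""").fetchall()

# Window function: one OVER clause expresses the running total directly.
new = conn.execute("""
    SELECT day, SUM(amount) OVER (ORDER BY day)
    FROM sales
    ORDER BY day
""").fetchall()

print(new)  # [('2021-08-09', 10), ('2021-08-10', 35), ('2021-08-11', 40)]
```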

38:50 No, this is great because I learned SQL like in the 90s. So it's changed a lot since then.

38:55 And I was just thinking the same thing, Brian. It's been at least 10 years since I've tried to refresh my SQL skills, so there's probably a lot of stuff where it's, "Oh, you shouldn't do this. Why would you do this?" If you use this other keyword, it's more efficient, safer, faster.

39:10 Come on.

39:11 - Yeah.

39:11 - That's like a--

39:12 - Jealous of the people learning SQL now.

39:15 - Yeah.

39:16 - How about you, Michael?

39:17 Got anything extras?

39:18 - I got some follow-up, some follow-up from last time.

39:22 This comes to us from John Hagan.

39:25 And I think I'm probably the one who said this.

39:28 I said last time, "Oh, there's this really cool thing about being able to use lowercase dict and lowercase list as type hints rather than from typing import capital-L List or capital-D Dict," right?

39:40 So, oh, that's coming in 3.10. Fantastic.

39:44 He's like, you know, that's already in 3.9.

39:44 So it's kind of already out.

39:46 Oh, right. Okay.

39:47 But he did point out some things that are coming that are neat.

39:49 So, for example, previously, if I wanted something optional, it could be None or it could be a list, and

39:57 if it is a list, it has strings, you'd have to say Optional bracket List bracket str, and those are capitalized because they have this parallel type implementation over in typing, right?

40:06 In Python 3.9, I can now say Optional of lowercase-l list bracket str, and you might think, who cares if it's a lowercase or uppercase L?

40:15 Well, the difference is you don't have to do an import and explain to people who don't know that code, "Oh, you've got to go import this other type thing to state the type."

40:22 Yes, I know list is right there, but you can't use list.

40:25 You got to do something else, right?

40:26 So that's the feature I was excited about. I said it was in 3.10, but it's in 3.9, so hooray.
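A minimal sketch of the change being described (PEP 585, Python 3.9+): the built-in list and dict can be subscripted directly in annotations, so the capitalized aliases from typing are no longer needed. The word-count functions are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List  # only needed for the pre-3.9 spelling


def word_counts_old(lines: List[str]) -> Dict[str, int]:
    """Pre-3.9 style: capitalized aliases imported from typing."""
    return dict(Counter(word for line in lines for word in line.split()))


def word_counts(lines: list[str]) -> dict[str, int]:
    """Python 3.9+ style: the built-in list and dict are subscriptable."""
    return dict(Counter(word for line in lines for word in line.split()))


print(word_counts(["a b", "b"]))  # {'a': 1, 'b': 2}
```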

40:31 But he also pointed out that the union operators were simplified.

40:36 It used to be you would have a similar syntax for Union as for Optional.

40:39 You would say Union, bracket, one thing, comma, the other thing.

40:43 But now you can just say type one, pipe (vertical bar), type two.

40:48 And this actually allows us to model optional without importing optional.

40:52 So instead of Optional of list of string, we can just have list of string, pipe, None.

40:57 >> Yeah, this is cool.

40:58 I'm glad somebody pointed it out because the 3.10 announcements don't say anything about Optional.

41:04 But in effect, they do.

41:06 You don't have to use this anymore.

41:08 But are you going to start using this?

41:11 >> The piping?

41:12 >> Well, yeah, and the optional thing.

41:14 Because I started to, and then I realized that if I start using that, my code is 3.10-only.

41:19 >> Yes, exactly. It depends on the scenarios.

41:24 So for, say, Talk Python Training, I control the server that all the code behind it runs on.

41:29 - Yeah, nobody's looking at that.

41:30 - It's easy for me to make it the brand new thing.

41:32 If I were going to build an example app for a course, then I would be hesitant to use this right away.

41:39 I might wait a year or two, because I don't want people to have a bad experience.

41:43 Like, well, I have 3.9, that's pretty new.

41:45 That should work.

41:46 Like, nope, that doesn't work because I didn't want to say the word Optional, right?

41:49 - Yeah.

41:50 - And if it was an open source project, I guess it would depend on whether I wanted to support older versions.

41:56 Probably wait even longer there.

41:58 I know, what do you think?

41:59 - Yeah, I was thinking, for libraries specifically, you'd probably want to almost stick with 3.5.3, at least for a while, to kind of flush out people that are using some of the older versions of Python.

42:11 Yeah, I think 3.9, I'm using 3.9 on everything now, but I think for a lot of people, that's still pretty aggressive to have a 3.9 or higher requirement for a library.

42:22 - Yeah, I agree.

42:23 Couple of bits of real-time feedback out there.

42:26 Sam and Dean both say there are Dunder Future imports that you can do now that will enable some of this stuff already.

42:32 So like, from __future__ import pipe.

42:36 - I don't know if that's true or if it's a joke.

42:40 - Well, I do know that the Dunder Future stuff does support the newer type information.

42:46 I don't know about for pipe.

42:47 - Okay. - Yeah, yeah.

42:49 Okay, we can do some after coding on this.

42:53 Coding after the recording and we'll know.

42:56 Oh, Dean says he's kidding, yeah.

42:59 But you really can, thank you, you really can get some of this other type information with the dunder future imports.

43:06 Okay, ready for a joke?

43:09 - Yeah.

43:10 - All right, Brian.

43:11 So you're gonna have to help me along here.

43:14 - Okay.

43:15 - So there's two developers staring very worried at a screen.

43:19 They have one section, then a big long quiet section, and then some more.

43:24 So you be the very first person and I'll be the second person here.

43:27 - Okay.

43:28 - Okay, I hope it works.

43:30 - Do not hope, pray.

43:32 Pray it works.

43:35 Have you ever been there and just in this situation where you're just like, oh, it must work.

43:39 If this doesn't work, we're done.

43:41 - Yeah, not so much on the software side of things, but when I was a manufacturing engineer, there were so many times we'd be troubleshooting a machine on a Saturday for eight hours straight.

43:50 And you think you made it.

43:52 Everybody's just holding their breath, crossing their fingers.

43:54 Work, work, because you want to go home someday.

43:57 Yeah, I remember how... Go ahead, Brian.

44:00 No, I definitely feel this when I'm using...

44:03 when I'm... you're working on C++ code, because you have to, you know, wait for it to compile, and then test, load it, and then test it, and stuff like that.

44:11 But even with Python stuff, I still feel this when I'm working on CI tools, because the continuous integration, you have to...

44:17 you're not sure if you got the syntax right, the YAML right or whatever until you push it and see what happens.

44:23 Yeah, CI is a good point. You have so little visibility in there.

44:27 And if it's not working... There's one more real-time follow-up on mine here.

44:31 If you come over here and look at PEP 585, it does describe the implementation of some of these new features under typing.

44:40 This is the one that came out in 3.9. So you can say from __future__ import annotations and then start using lowercase list and things like that, lowercase dict.

44:49 Who knows? I know Dean said he was joking, but maybe you really can get the pipe to come out that way.

44:54 But at least you can use these sorts of 3.9-level changes going back to 3.7, it looks like.
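A minimal sketch of why the __future__ trick works: with from __future__ import annotations (PEP 563, Python 3.7+), annotations are stored as strings and never evaluated at definition time, so even the 3.10 pipe syntax parses inside annotations on older versions. There is no "pipe" __future__ feature; this only works because the annotation is not evaluated. count_tags is a hypothetical example.

```python
# Must come before any other statement in the module.
from __future__ import annotations


def count_tags(tags: list[str] | None) -> dict[str, int]:
    """Count tag occurrences; None is treated as an empty list."""
    counts: dict[str, int] = {}
    for tag in tags or []:
        counts[tag] = counts.get(tag, 0) + 1
    return counts


print(count_tags.__annotations__["tags"])  # list[str] | None
print(count_tags(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```

One caveat: anything that evaluates the annotation strings at runtime, such as typing.get_type_hints, still needs a Python version that understands that syntax.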

45:02 Okay.

45:02 Alright, cool cool.

45:03 Well, that was a lot of fun.

45:05 Yeah, it was. I had another one, but I'm going to save it.

45:08 Good. Alright, well, I'm looking forward to hearing about it next week.

45:11 David, thank you for joining us.

45:12 Thank you for having me.

45:13 Yeah, and thanks for all the tips and stuff you've had throughout the years.

45:16 And it's really good to have you here.

45:18 And congratulations on your first dev job.

45:21 That's fantastic.

45:22 That is fantastic.

45:23 And thanks, Dean, for correcting us in real time.

45:29 That's awesome.

45:30 That's good.

45:31 Yeah, absolutely.

45:32 Thank you, everyone.

45:33 And oh, Sam does sadly show us that import pipe from the future doesn't work.

45:38 But yeah, thanks, everyone.

45:40 See you all later.

45:41 Bye.

45:42 Thanks for listening to Python Bytes.

45:44 Follow the show on Twitter via @pythonbytes.

45:47 That's Python Bytes as in B-Y-T-E-S.

45:50 Get the full show notes over at pythonbytes.fm.

45:53 If you have a news item we should cover, just visit pythonbytes.fm and click submit in the nav bar.

45:58 We're always on the lookout for sharing something cool.

46:00 If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live.

46:08 That's usually happening at noon Pacific on Wednesdays over at YouTube.

46:12 On behalf of myself and Brian Okken, this is Michael Kennedy.

46:15 Thank you for listening and sharing this podcast with your friends and colleagues.
