Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #215: A Visual Introduction to NumPy

Recorded on Wednesday, Jan 6, 2021.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 215, recorded January 6th.

00:09 One of my favorite dates, 21.

00:10 I'm Brian Okken.

00:11 I'm Michael Kennedy.

00:12 And we have Jason.

00:13 Hello.

00:14 Yeah, hey Jason, nice to have you here. Jason McDonald.

00:16 Yeah, it's good to be here. Thank you for having me.

00:18 Yeah, thanks for joining us. Oh, and Brian, I think he's going to cover something we haven't really covered much on the show, GUIs.

00:24 Oh, good.

00:25 Yeah.

00:26 Actually, to be honest, I know this is a longstanding joke on the show for longtime listeners, but we haven't covered GUIs that much recently. There was a long stretch where we did, though.

00:35 Yeah, yeah, that was probably like a year ago.

00:37 Yeah, yeah, I like my programming projects and my brownies to be gooey.

00:41 And fudge, come on, fudge is good.

00:45 And you like bad jokes, so you'll fit in nice.

00:47 Oh, absolutely.

00:48 No, if anyone likes puns, follow my Twitter.

00:50 I post an original every Monday.

00:52 (laughs)

00:53 - Nice.

00:54 I heard that there's gonna be a lot of exciting news for space in 2021.

00:58 So I kind of wanna bring a little space and Python together.

01:00 - That's good, yeah.

01:02 - Yeah.

01:02 So the first topic that I wanna talk about is this video by a woman in the UK who is an astrophysicist.

01:11 She goes by the name Dr. Becky, which is cool.

01:14 She has a fantastic YouTube channel.

01:16 She's also a Python developer and she works in cosmology, which is pretty cool.

01:22 And she did this video that I'd just like to highlight for people who maybe are coming into Python, not from the, hey, I'm gonna create a microservice set of APIs talking to Docker, but more from the, hey, I do some kind of science or data science or something like that.

01:39 And the video is called, The Five Ways That I Use Code As An Astrophysicist.

01:44 Cool, huh? - Yeah.

01:45 - Yeah, so she basically lays out the idea that, as a modern-day scientist, you can barely do your job if you're not doing some sort of programming.

01:55 And of course, one of the best languages, technologies for programming these days is Python in the data science space, right?

02:02 - Surprise, surprise.

02:03 - Yeah, no big surprise there since 2012, I would say.

02:06 And so she covers five different things with examples of each.

02:09 So I thought it was just a nice resource for people who are either getting into Python from the science side, or maybe they're teachers and people ask them, "Why should I not just use MATLAB or some other custom tool?" Like, let me show you.

02:22 So here's some really cool examples of real astronomy being done with Python, but it's also super accessible to even like middle schoolers, I would say.

02:32 And number one is image processing of galaxies from telescopes.

02:37 So you can do things like noise removal.

02:40 So it turns out that when you're taking pictures of galaxies, even if there's no actual background light or disturbances, just the basic disturbance in the actual sensors themselves will put little marks and imperfections in the images.

02:55 So using Python to go through and clean those up makes it much easier to get started.

03:01 And the size of these pictures and the amount of data coming in from some of these new telescopes is stunningly large.

03:07 - Yeah, for sure.

03:08 Another one is data analysis.

03:11 So if you're trying to find the brightness of some part of an image, say maybe you're looking for a transit of an exoplanet, right?

03:18 You want to constantly monitor the brightness of a star, or in her case, what she's studying, it just blows my mind, she's studying galaxies.

03:25 Like when you see pictures of stars and you zoom in, you're like, oh, that's not a star, that's a galaxy.

03:29 Right, it's just, you know, like I still can't really get my mind around that, but she talks about one of her data sets that has 600,000 rows of like brightness of galaxies.

03:38 So 600,000 galaxies, each with information that they're comparing.

03:42 So that's pretty awesome, right?

03:43 - Yep.

03:44 There's an example about the theory that most galaxies have a supermassive black hole in the middle.

03:50 There's also this idea that possibly the size of the black hole and the size of the galaxy, these things kind of grow in mass together.

03:57 So she has all this data. She's like, well, let's do some statistical fits of black hole size and galaxy size.

04:02 Also, the color of galaxies can indicate the relative speed or rate of star formation.

04:08 And the age.

04:09 And the age, exactly. Yeah. All tied together.

04:12 And so she's using Python for that.

04:14 Finally, data visualization, you know, pretty straightforward: drawing graphs and pictures.

04:19 And the last part, which was my favorite, is simulation.

04:22 So there's two really cool examples.

04:24 What happens if a star gets too close to a black hole and gets, she said, spaghetti-ified, that's cool.

04:31 And the other one is examples of galaxies colliding, which is just, again, mind-blowing, but really cool computational examples of all that.

04:40 So I wanted to highlight this video because it's super accessible, but it's also really neat to show concrete examples of real science being done with Python.

04:48 - Yeah, I thought it was cool when she was talking to her colleague about building the simulations of the universe.

04:54 You have a simulation of the universe, where do you start on that?

04:57 It's like, we think we have project blocking.

05:01 You know, it's like you start on a project, it's like, yeah, I'm just gonna build a tool.

05:03 Where do I begin?

05:04 It's like, I'm gonna build a simulation of the entire universe.

05:06 Where do I start?

05:08 - Exactly.

05:09 I'm going to simulate gravity at a collective scale.

05:12 Let's just do that.

05:13 Yeah.

05:14 Awesome.

05:15 So if people are out there and they're interested in this kind of stuff, yeah, this is all in one video.

05:20 Yeah.

05:22 Yeah.

05:22 Robert says star galaxy.

05:24 It's big.

05:25 Yeah.

05:25 They're both huge, but man, I just can't get my head around galaxy-size stuff.

05:30 So, a star as a primitive type in the universe, and a galaxy as a collection.

05:38 (laughing)

05:39 That's what I just immediately go to right there.

05:41 - Yeah, yeah, exactly.

05:43 Exactly, yeah. So Brian, it's about a 15-minute video; half of it is the stuff that I talked about, and the other half is what Jason touched on.

05:49 She actually interviews one of her colleagues who does more of the simulation side of programming.

05:55 - That's pretty cool, yeah.

05:56 - Yeah, yeah. - I'll have to check that out.

05:58 - Yeah, it's definitely worth it.

05:59 - Yeah, I enjoyed it.

05:59 I don't actually do much data science at all, so seeing data science stuff is always interesting, because most of my work is in application development.

06:09 I don't usually work with a lot of data, so it's nice to see that side of it explained in this really cool, relevant way.

06:14 Instead of, well, here are statistics on the number of people who buy cheese every weekend, which only the supermarket cares about.

06:21 Galaxies, wow.

06:22 - Exactly, getting better click-through rates on your ads is not super compelling, but I think it's really valuable to see alternate perspectives, right?

06:30 We all get into our own little world of like, this is what programming is, this is what Python is for, and then, you know, it's bigger.

06:37 - I wanna talk about NumPy a little bit.

06:39 - All right, tell us about it.

06:40 - Well, I've used NumPy off and on a lot, and it's definitely a staple in scientific computing, machine learning, and all sorts of stuff.

06:48 But I'm starting to use it more, and I've realized that I had the wrong mental model.

06:54 I'd been thinking of arrays as kind of like lists, but sort of different.

06:59 And so I came across this article.

07:02 It's a couple of years old, but it's "A Visual Intro to NumPy and Data Representation," and it really helps me understand what you can do with NumPy and have a good mental picture of what arrays are in NumPy. It talks about arrays, matrices, and n-dimensional arrays. For instance, even just creating an array: I knew how to create one, you just initialize it with a list and you get an array. But I didn't know you could just say, I want an array of ones, or an array of zeros, or an array pre-filled with random numbers.
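The array-creation options mentioned above look roughly like this in NumPy. This is a minimal sketch; the random array uses the `np.random.default_rng` Generator API, which may differ from what the article itself shows.

```python
import numpy as np

# Initialize an array from an ordinary Python list.
data = np.array([1, 2, 3])

# Pre-filled arrays: the shape is passed as an int or a tuple.
ones = np.ones(3)          # array([1., 1., 1.])
zeros = np.zeros((2, 3))   # a 2x3 matrix of 0.0

# Random floats in [0, 1), seeded so the run is reproducible.
rng = np.random.default_rng(seed=0)
noise = rng.random((2, 3))
```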

07:40 That's pretty.

07:41 And then he talks about arithmetic you can do with them and slicing stuff.
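A short sketch of the element-wise arithmetic and slicing being described, with made-up values for illustration:

```python
import numpy as np

data = np.array([1, 2, 3])
ones = np.ones(3)

# Arithmetic is element-wise: no explicit loop needed.
total = data + ones        # array([2., 3., 4.])
scaled = data * 1.6        # the scalar is applied to every element

# Slicing works like Python lists and extends to multiple axes.
first_two = data[0:2]        # array([1, 2])
matrix = np.array([[1, 2], [3, 4], [5, 6]])
middle_row = matrix[1, :]    # array([3, 4])
first_column = matrix[:, 0]  # array([1, 3, 5])
```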

07:45 You know, Brian, like when we talk about Pythonic code all the time, like, oh, you could write code in this way where you kind of hack a numerical for loop, but you should do this other way.

07:54 And that would be more Pythonic.

07:55 I suspect there's also a NumPy way.

07:58 A NumPy way, right?

08:00 Instead of filling stuff up yourself, you're like, oh, you should just call ones for this one.

08:03 And there are a lot of other cool ways of conceptualizing things, right?

08:08 - Yeah.

08:09 Well, and it's worth remembering, and I've said this quite a few times, not here obviously, but I regularly like to remind people, abstractions are there to save us typing, never to save us thinking.

08:21 It's like, it helps to have that mental model, as you put it, Brian, straight, because if your mental model is wrong, you're prone both to cargo cult programming ("I do it this way because that's the way I was taught") and to forcing a familiar pattern onto the wrong sort of problem, and you don't realize what it is you're really working with. So understanding what's happening under the hood, even if you don't know all the technical details of the implementation, is always important for choosing the right idiomatic patterns. Yeah, yeah. And you'll hear stuff like, oh, well, Python is slow. It's like, well, that's because you're doing it wrong. Don't do it that way. For example, use something like NumPy, right?

09:01 And, for instance, one of the things I really loved about this article was the explanation of the dot product. I've never had to use a dot product, but somebody described it to me several times and I was like, yeah, okay, weird. But then with the visual representation of it, I just stared at it and read it for like 30 seconds, and: oh, that's easy. Now I get it. And I'll have it forever, because that sunk in there pretty well.
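As a worked example of the dot product: multiply matching elements, then add up the products. The values below are arbitrary; `np.dot` and the `@` operator are standard NumPy.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Multiply element by element, then sum: 1*4 + 2*5 + 3*6 = 32
by_hand = sum(x * y for x, y in zip(a, b))
with_numpy = np.dot(a, b)   # same result, computed in compiled code
also_numpy = a @ b          # @ does the same for 1-D arrays

# For 2-D arrays, each output cell is the dot product of a row of m and a column of n.
m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
product = m @ n  # [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]] = [[19, 22], [43, 50]]
```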

09:27 That's one of the reasons why I went to it.

09:30 I had this problem where I have large arrays, but they're not huge.

09:36 They're in the thousands of numbers, say, and I need to compare one array to another.

09:44 I know equality works for free, but I wanted to compare item by item to make sure every element is less than the corresponding element in the other array.

09:54 I didn't know how to do that.

09:55 And I'm like, I think NumPy would probably do that easy.

09:58 - Can you do one NumPy array less than the other?

10:01 - Yeah, so if you say less than, it compares element by element and gives you an array of trues and falses.

10:07 And then you can do all.

10:08 - Yeah, do an all on it, yeah.

10:09 - You can just say all of these two arrays less than or equal to each other and I get exactly what I want.

10:15 It's a very expressive, simple line of code.
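A minimal sketch of the element-wise check just described. `all_less_equal` is a hypothetical helper name; the core of it is the single expression `(a <= b).all()`.

```python
import numpy as np

def all_less_equal(a, b):
    """Return True if every element of a is <= the matching element of b."""
    a, b = np.asarray(a), np.asarray(b)
    # a <= b is evaluated element by element and yields a boolean array.
    mask = a <= b
    return bool(mask.all())  # np.all(a <= b) is equivalent

# By contrast, plain lists compare lexicographically, not element by element:
# [1, 6] < [2, 0] is True because only the first differing item decides it.
```

For example, `all_less_equal([1, 2, 3], [1, 5, 4])` is true, while `all_less_equal([1, 6, 3], [1, 5, 4])` is false because of the middle element.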

10:17 - Yeah, it's that kind of stuff I was thinking of when I was talking about like the NumPy, NumPyonic way or whatever.

10:23 - Idiomatic NumPy.

10:24 - Thank you.

10:25 (laughing)

10:26 That's like one or two lines, and it's really fast.

10:29 Whereas you could loop over each item individually and it not only is more code, but it's also slower.
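Here is a rough side-by-side of the loop version and the NumPy version, as a sketch with randomly generated data. Timings depend heavily on the machine and NumPy version, so the snippet prints them rather than claiming a particular speedup.

```python
import timeit
import numpy as np

rng = np.random.default_rng(seed=1)
a = rng.random(5_000)
b = a + 0.5  # every element of b is larger than the matching element of a

def loop_check(a, b):
    # Pure-Python loop: one interpreter-level comparison per element.
    for x, y in zip(a, b):
        if not x <= y:
            return False
    return True

def numpy_check(a, b):
    # Vectorized: the comparison and the reduction both run in compiled code.
    return bool(np.all(a <= b))

loop_seconds = timeit.timeit(lambda: loop_check(a, b), number=20)
numpy_seconds = timeit.timeit(lambda: numpy_check(a, b), number=20)
print(f"loop: {loop_seconds:.4f}s  numpy: {numpy_seconds:.4f}s")
```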

10:33 - Yeah, and I also like that there's the intermediate step that gives me an array of trues and falses, because on the debugging side I need to be able to wrap this in something and pick, say, the first five elements that don't match.

10:49 I mean, if any one of them is false, the whole statement's false.

10:54 I don't wanna list all the thousands that are wrong, but I wanna be able to list a few to say, at least these ones aren't right.
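One way to keep that intermediate boolean array and report just a few failures, as a sketch: `first_mismatches` and its `limit` parameter are hypothetical names, and `np.flatnonzero` returns the indices where the mask is true.

```python
import numpy as np

def first_mismatches(a, b, limit=5):
    """Return up to `limit` (index, a_value, b_value) triples where a <= b fails."""
    a, b = np.asarray(a), np.asarray(b)
    # Keep the boolean array instead of collapsing it with all().
    failing = np.flatnonzero(~(a <= b))
    return [(int(i), a[i].item(), b[i].item()) for i in failing[:limit]]
```

With `a = np.arange(10)` and `b = np.full(10, 4)`, this reports indices 5 through 9, and an empty list means every element passed.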

11:02 - Yeah. - Yeah.

11:03 - It's good.

11:04 - I'm gonna try out NumPy now.

11:05 I now have a reason to try it out.

11:09 - Exactly, like why am I not using this in certain situations?

11:12 Magnus on the live stream says, "Two dimensions is okay, three is hard, but N, then my mind blows." Yeah, I actually did a bunch of math research in four-dimensional stuff: two-dimensional, but with complex numbers, so four-dimensional, sort of. And yeah, it's just hard.

11:26 Well, one of my weird knacks as a programmer is I actually can think in six dimensions.

11:30 I mentioned before the podcast that I had a head injury a few years ago, so I'm a minor traumatic savant. I can think in six dimensions, and the best way I can explain it, if you're trying to do it without having a really bizarre brain like mine, is to think of the fourth dimension as a timeline. For each point on the timeline you have space represented as a cube, so you have this row of cubes, which represents the timeline.

11:53 It becomes a lot easier to think of four dimensional arrays when you think of it in that fashion.
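The "row of cubes" picture maps directly onto a 4-D NumPy array whose first axis is time. A small sketch with hypothetical sizes (5 time steps of 3x3x3 cubes):

```python
import numpy as np

# Hypothetical sizes: 5 time steps, each a 3x3x3 cube of values.
timesteps, depth, rows, cols = 5, 3, 3, 3
frames = np.zeros((timesteps, depth, rows, cols))

# Indexing the first axis picks out one cube on the timeline (a view, not a copy).
cube_at_t2 = frames[2]

# Fill each cube with its own time index so the structure is easy to see.
for t in range(timesteps):
    frames[t] = t

# A single value needs all four coordinates: (time, depth, row, col).
value = frames[4, 0, 1, 2]
```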

11:57 Yeah.

11:58 And the way that we did it, we actually had animations of that three-dimensional thing, and the animations were moving through that bit.

12:04 But still, it's no easy thing.

12:07 Yeah.

12:08 It's easier when you're an animator to wrap your head around 4D than if you're just a, you know, an ordinary run of the mill programmer like most of us.

12:14 Brian, would you say that that's a GUI type of solution?

12:17 No.

12:18 Maybe you could do something with Qt.

12:21 Yeah, yeah, yeah.

12:22 Oh yeah, yeah, Jason.

12:25 Yeah, who knows?

12:26 It's possible.

12:27 So that's our next topic.

12:29 Grab it, Jason.

12:30 Yeah, well, OK.

12:31 Well, I was really excited to discover that Qt 6 just released on December 8th.

12:36 So Qt, yeah, it is officially pronounced "cute," although it's very debatable.

12:41 People are like, oh, it's "Q-T."

12:42 It's "cute."

12:43 GIF, JIF, come on.

12:44 Yeah, exactly.

12:45 Anyway, whatever you're going to call it.

12:46 It just released.

12:47 And this includes the Python bindings, so PySide6 and Shiboken6, which is the...

12:54 So PySide2 was for Qt 5, as if that made sense. PySide6 is for Qt 6. Qt 6, even how I'm doing it.

13:01 Anyway, so that just released. And you also have PyQt6 if you prefer Riverbank's version, but in either case you're gonna wind up with all the Qt 6 features.

13:13 I think the coolest thing here, if you're doing really fancy graphics, is that Qt 5 and prior had a hard dependency on OpenGL, and they've actually put what they call the Rendering Hardware Interface, an abstraction layer, into Qt. So now it can natively support whatever the 3D graphics API is on that device, whether it's DirectX, Vulkan, Metal, whatever you want it to work with.

13:41 But it uses the native one by default?

13:42 You could.

13:43 You could tell it to use whatever you want. That is so cool.

13:47 Yeah, and there's a bunch of other optimizations and fixes in here.

13:51 I am really excited because, and I discovered this was actually introduced in 5.15, they now support snake case.

14:00 That's for those of us who are PEP 8 addicts and really hated the fact that Qt kind of seemed to force you to use camel case.

14:07 You can use snake case.

14:08 There is a setting for it.

14:11 You can also use properties instead of getters and setters as of Qt 6.

14:15 So you can just rely on properties, and that makes it a lot easier to write idiomatic Python code with Qt, which is kind of fun.

14:25 >> Well, it just feels wrong, right?

14:26 getWidth, setWidth, all those things.

14:29 >> They also have this cool thing called property binding where you can actually link those together now too.

14:33 It's like you can link the width and the height.

14:35 So when you change the width, height automatically changes.

14:37 Nice. Yeah, I really want to build some stuff with Qt.

14:40 I've got a few app ideas in mind.

14:42 What I don't have is time sadly.

14:45 Can you help me with that?

14:46 Jason, can you tell me how to just have more time?

14:48 Well, I know I have a reputation as a Time Lord, but unfortunately, I can't control the flow of time.

14:55 If I find my TARDIS, I'll pick you up and drop you off, you know, 10 years ago, and you can relive those 10 years and do additional things.

15:02 Okay, that'd be nice.

15:04 - Obie, go.

15:05 - Yep.

15:06 Let's see, actually a couple of questions from the live stream.

15:09 Magnus asks, "Any news about Qt going mobile?" - I actually am ashamed to admit I don't know.

15:16 - I don't know either.

15:18 But I think the bigger, more interesting question would be: could you write a Python Qt application and make it mobile?

15:27 Right, I think that's where it gets really interesting.

15:30 'Cause if you pick another language, like C++, there are other options you might be able to choose.

15:36 And then maybe you know this one, you're gonna ask, are there any well-known Python apps built with Qt?

15:41 - Oh yeah, yeah, put me on the spot.

15:42 I'm trying to think of one.

15:43 Mine, it's not well-known, but I built Timecard in Qt.

15:47 If you look up, if you look up Timecard, it's just a time tracking app that I built.

15:53 But actually there's quite a lot built with Qt.

15:57 Anything with a K in front of it. If you're in KDE, the entire KDE stack is built on top of Qt.

16:04 There's actually quite a bit of it that's done in Python.

16:07 Names are escaping me off the top of my head here.

16:11 Anything in the KDE universe is Qt.

16:14 You're either going to get C++ or Python.

16:17 Python is certainly a lot faster to write.

16:19 >> FileZilla, apparently, is built.

16:22 One that I know for sure is written in it, and it's one of my favorite apps actually, is Robomongo, or Robo 3T as it got renamed.

16:31 I believe it's just C++, it's not Python Qt, but that one's a really nice one as well.

16:37 Actually, there's a huge long list, I'll put it in the show notes over here, of a bunch of apps written as well.

16:43 - It's definitely a lot easier to write something in Qt.

16:47 I've used a lot of different UI toolkits and Qt's definitely one of the easiest.

16:51 - Yeah, the thing that I like about it is it looks like it belongs.

16:54 'Cause so many apps you build with these sort of cross-platform things and it's just like, "Okay, well, that's not how the file dialog is supposed to look." You just know it's alien, but you're like, "No, no, this looks like it belongs here." - Well, and packaging's the other half of it, because I tried to build something with Kivy, and I love Kivy from a development standpoint.

17:11 It's really cool.

17:12 From a packaging standpoint, it's like beating yourself to death with a wet trout.

17:15 So, and actually, if you're gonna do cross-platform, then actually, GTK's horrible too, because it's really hard to get it to package on Windows a lot of times.

17:26 - Yeah. - It just works.

17:27 It just packages everywhere.

17:28 - Yeah, that's great.

17:29 That's nice.

17:30 All right, Brian, I think this episode is brought to everyone by us.

17:34 - Wonderful.

17:35 We're good people. - Yeah, so we are.

17:36 We're doing a lot of work out there, as everyone probably knows.

17:39 If you're into testing, check out Brian's pytest book.

17:43 If you're looking to take a Python course, we are just about to pass 200 hours of Python courses over at Talk Python Training.

17:51 I'm working on a new course on how to build web apps, not web APIs, but web apps with FastAPI, super neat stuff.

17:58 So that's, that should be out in a week or two.

17:59 So anyway.

18:00 - I wanted to bring up that there was kind of a spike in pytest book sales in the last quarter of 2020, and I'm hoping that some schools are trying to teach testing while they're teaching other stuff.

18:13 - Yeah, that'd be super cool.

18:14 - Yeah, it's nice to see more about testing tools other than unittest.

18:18 I mean, unittest has its place, but I've got a book coming out in May, and when I wrote the chapter on testing, one of my editors was like, thank you for not forcing me to edit yet one more unittest chapter.

18:29 Nice.

18:30 What's your book on?

18:31 Oh, my book's called Dead Simple Python.

18:33 It introduces the Python language and its idiomatic practices to people who are coming from another language.

18:42 So if you don't want to have to sit through yet one more explanation of what a variable or a function is, or a class is, you can pick this up and it dives straight into the fine details of why idiomatic patterns are what they are in Python.

18:54 Nice.

18:54 That's, yeah, that's a great idea.

18:56 The courses or books that say, "We're going to pretend you know nothing about the world, and we're going to force you to go through everything from scratch every time," that drives me crazy.

19:04 The only other thing that drives me crazy, Brian, is when my Python GC is doing stuff when I know that it doesn't need to.

19:12 Yeah, I like to not have to think about the garbage collector.

19:14 And you generally don't, right?

19:16 Like, one of the things that genuinely surprises me is the fact that we don't really talk about memory very much in Python.

19:23 It's like, oh, okay, I think it cleans itself up.

19:24 That's good.

19:24 Now what?

19:25 Let's go, let's go about stuff, right?

19:26 But if you dig into it, it's pretty interesting.

19:29 There's a lot of stuff around allocation we've covered before, but it's quite unique.

19:33 But Python is also somewhat unique in the sense that it has like two modes.

19:37 So it has reference counting, and I would say 98% of all memory management cleanup happens on the reference counting side.

19:46 This is totally made up, these numbers.

19:47 But I would say maybe even more like 99.5%, unless you're building a certain kind of app with interesting algorithms.

19:56 Most apps don't create cycles.

19:59 And the only reason we have garbage collection in addition to the reference counting is to catch those cycles, right?

20:05 You know, I've got a customer object, I've got it out of a SQLAlchemy database.

20:09 It has a relationship over to the orders.

20:11 I go to the orders, the orders have a link back to the customer.

20:14 Maybe traversing that lazy-loaded list has created a cycle, and now I need the GC to save me.
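The customer/orders cycle can be reproduced without an ORM. In this sketch, `Customer` and `Order` are hypothetical plain classes standing in for database models, and the behavior shown is CPython's: `del` alone does not free a cycle, but `gc.collect()` does.

```python
import gc
import weakref

class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = []

class Order:
    def __init__(self, customer):
        # The back-reference to the customer is what closes the cycle.
        self.customer = customer
        customer.orders.append(self)

def cycle_needs_gc():
    """Show that a reference cycle survives `del` until the cycle collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from running mid-demo
    try:
        customer = Customer("example")
        Order(customer)
        probe = weakref.ref(customer)
        del customer
        # Reference counting alone cannot free it: the order still points back.
        alive_after_del = probe() is not None
        gc.collect()
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```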

20:19 So for the rule about when the garbage collector runs, you can ask it: import the gc module and call gc.get_threshold. I couldn't remember if it's singular or plural without switching to my screen, but it's singular, get_threshold, and it returns three numbers.

20:35 They're not the same units, which makes them really hard to understand.

20:38 The first number is how many allocations of container objects, so class instances, dictionaries, lists, tuples, things that could contain other stuff.

20:47 So it counts things that could potentially participate in a cycle; numbers and strings are not even considered by the GC.

20:53 Specifically, it's how many allocations of container types there have been beyond what reference counting has already deallocated.

20:59 So if I had a list and I put a thousand class instances in it by allocating them and filling it up, then I would hold on to a thousand, and none of them would have become garbage.

21:08 So the first number that comes back is, well, how big is that number before we just run a GC, no matter what?

21:13 And the default is 700.
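You can inspect these values directly. The snippet reads the thresholds rather than assuming them, since the defaults (700, 10, 10 on the CPython versions current at the time) may differ in newer releases. `gc.get_count()` reports the live counters that are compared against those thresholds.

```python
import gc

# Three thresholds: generation 0, generation 1, generation 2.
# Generation 0 is the allocations-minus-deallocations count discussed above.
threshold0, threshold1, threshold2 = gc.get_threshold()

# The current counters that get compared with those thresholds.
count0, count1, count2 = gc.get_count()
print(f"thresholds: {(threshold0, threshold1, threshold2)}, counts: {(count0, count1, count2)}")
```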

21:14 So my example there, if I create a list of a thousand objects, that's a GC that's gonna run.

21:19 It doesn't matter if there's cycles, there's no cycles, it just doesn't matter.

21:23 Like I've made a thousand of them, that's over 700, so we're gonna run a GC.

21:26 And the other two numbers control how often you run a whole-memory GC versus a small, recent-object GC.

21:32 And what occurred to me is, you know, my website has a lot of pages that pull back thousands of items. Any website that uses a database and an ORM that pulls stuff back and hangs onto it, not just streaming over the items but putting them in a list or something temporarily, is going to have the GC run any time you do that with more than a thousand, right?

21:51 They're just looking for anything to throw away, basically.

21:53 Yeah.

21:54 But you know, you're still in the process of building the list of them.

21:57 I got to get 10,000.

21:59 Well, guess what?

22:00 That means you're going to have 14 GCs and you're just in the process of building the list.
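To see how many collections happen while a list is being built, you can hook `gc.callbacks`, which CPython calls with `"start"` and `"stop"` phases around each collection. `Item` and `count_gen0_collections` are hypothetical names for this sketch; exact counts depend on the interpreter version.

```python
import gc

class Item:
    """Hypothetical stand-in for an ORM row object."""
    def __init__(self, n):
        self.n = n

def count_gen0_collections(n_items, threshold0):
    """Build a list of n_items objects and count the GC runs that start meanwhile."""
    old_threshold = gc.get_threshold()
    was_enabled = gc.isenabled()
    starts = []

    def on_gc(phase, info):
        # Each gc.callbacks entry receives ("start" | "stop", info_dict).
        if phase == "start":
            starts.append(info["generation"])

    gc.collect()  # start from freshly reset counters
    gc.set_threshold(threshold0, *old_threshold[1:])
    gc.enable()
    gc.callbacks.append(on_gc)
    try:
        items = [Item(i) for i in range(n_items)]  # none of these become garbage
        del items
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*old_threshold)
        if not was_enabled:
            gc.disable()
    return len(starts)
```

Comparing `count_gen0_collections(10_000, 700)` with `count_gen0_collections(10_000, 50_000)` should show roughly 10,000 / 700 collections for the first call and few or none for the second.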

22:04 And I'm like, that's kind of weird.

22:06 That seems excessive to me.

22:08 And then I went and looked at the site map on Talk Python Training, where we're pulling back thousands of transcripts and all sorts of stuff to generate all the pages on there.

22:15 77, there's 77 GCs to render the site map.

22:18 There's no cycles.

22:19 There's not one.

22:20 So I'm like, that's not good.

22:21 Well, let me think about that for a second.

22:22 So what I ended up doing was I said, well, what if I made the threshold 10,000?

22:26 Actually, I ended up on 50,000.

22:28 So only run the GC if you get more than 50,000 allocations without deallocation.
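A one-time startup call along these lines is a minimal sketch of that change. The transcript doesn't say what was done with the other two thresholds, so this version assumes they are left unchanged; `raise_gen0_threshold` is a hypothetical helper name.

```python
import gc

def raise_gen0_threshold(threshold0=50_000):
    """Raise only the generation-0 threshold, keeping the other two as they are."""
    _, threshold1, threshold2 = gc.get_threshold()
    gc.set_threshold(threshold0, threshold1, threshold2)
    return gc.get_threshold()

# Call once at application startup, before the app starts serving requests.
raise_gen0_threshold()
```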

22:33 What was really interesting is that doing that made my unit tests, which include many, many integration tests on Talk Python Training, run 10 to 12% faster, just by setting that one line.

22:43 And it basically does not use more memory in my case.

22:46 Is that crazy?

22:47 Well, it makes sense.

22:48 Most performance issues just come down to memory: memory allocation and deallocation.

22:56 I spend more time in C++ than I do in Python, and we don't have a garbage collector over there.

23:02 So you have to do all this manually.

23:04 >> You know how much work it is, right?

23:06 >> Yeah.

23:06 >> Yeah, exactly. Doing it wrong is why stuff is slow.

23:09 People are like, "Well, Python is slower than C++." Well, C++ has the potential.

23:14 It has the potential to be faster than Python.

23:16 But it really depends on how you write that code.

23:18 Because well-written code is always going to run faster than poorly written code.

23:22 It doesn't matter what the two languages are.

23:24 >> Yeah. I realized that in my world, in my type of application, I almost never create cycles, but I often get back more than 700 class objects, which potentially also have dictionaries in the mix as the data is being allocated and converted into classes.

23:40 Like there's gotta be a lot of places where that's happened.

23:42 So I just set this number to say, you know what?

23:44 Let's waste a little bit of memory.

23:46 And if there are cycles, we'll come back and get them later.

23:48 And because there's almost no cycles, there's almost no memory growth.

23:51 For example, the server is running something like eight worker processes; take one of them.

23:55 And I made this change.

23:57 And I think after running for a week without restarting any of the processes, it went from 1.89 gigs of memory usage to 1.91.

24:05 So it was about 20 megs more memory usage.

24:09 And yet like 10% speed up by just changing like one call at startup.

24:14 It was insane.

24:15 - Well, and think about Dr. Becky's code, you know, going back to the astrophysicist thing here, with the sizes of data structures she's working with, or any data scientist who's listening; they're usually dealing with 10,000, 100,000, a million items.

24:29 You know, you combine this with all the stuff that we talked about with NumPy and with data processing.

24:34 We talked about how long it takes to do some of these data regressions.

24:37 How much would this help?

24:38 >> Yeah, exactly. If that data is being done in Python and it's not just purely being pushed down into the C data science layer, then yeah, that's really interesting, I think.

24:49 >> Although I would caution at the same time that there's no such thing as a magic bullet.

24:55 So you have to understand why this is going to speed things up.

24:58 You can't just say, "Well, I'll just copy and paste that line my colleague has, that he got from Michael Kennedy, because it'll make the code faster."

25:04 No, you have to know why.

25:05 - Yeah. - It makes the code faster.

25:07 - It's an easy test, and in some cases it makes sense.

25:09 People can check it out.

25:10 I thought it was really, it just so surprised me.

25:12 I was walking along with it, I'm like, wait a minute, that must mean something weird is going on.

25:16 And then I looked at just one of my pages: why would I do 77 GCs on a single page load?

25:21 That's crazy.

25:22 And so I just started exploring this and here we are.

25:24 - So did you, whatever you're linking to, does it talk about how you can test how many garbage collections happen?

25:31 - Let me see.

25:32 I'm linking to a Twitter thread, and it's way deep down.

25:37 No, but there is a way to do it.

25:40 If you go to the GC, you can say, I think it's set debug stats or something.

25:45 I'll look it up real quick while we're talking.

25:48 I'll throw it in at the end here, but yeah, it's, there is a way to do it.

25:51 Actually, I got it right here.

25:53 Hold on, give me just a sec.

25:54 The way you do it is you call gc.set_debug and pass in a flag, and the value is gc.DEBUG_STATS.
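A small context manager makes it easy to turn `gc.DEBUG_STATS` on around one piece of code (say, a single page render) and restore the previous debug flags afterwards. `gc_stats_logging` is a hypothetical helper name; the statistics go to stderr.

```python
import contextlib
import gc

@contextlib.contextmanager
def gc_stats_logging():
    """Temporarily enable gc.DEBUG_STATS, which prints collection statistics to stderr."""
    previous = gc.get_debug()
    gc.set_debug(previous | gc.DEBUG_STATS)
    try:
        yield
    finally:
        gc.set_debug(previous)

with gc_stats_logging():
    # Any collection in here, automatic or explicit, writes a short report to stderr.
    gc.collect()
```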

26:01 >> Okay.

26:02 >> That thing was just lighting up.

26:04 When I turned that on, it would just light up; it completely filled the terminal with GC debug output, GC, GC, GC, over and over and over when I hit that one page.

26:14 Then changing it, guess what? Made it better.

26:16 >> Yeah. Now, we should probably be PC about the GC and call the garbage collector the programmatic sanitation engineer.

26:24 Well, it doesn't take offense.

26:29 It's just there to help us out.

26:32 Brian, it's probably a pretty awesome library, honestly, the GC library.

26:36 >> Probably, but it's built in.

26:38 Yeah. Of course, I'm susceptible to clicking on the listicle.

26:43 >> Who isn't? Come on.

26:45 >> Right. But we don't cover them very much, but I really like this.

26:48 This article is "Top 10 Python Libraries of 2020," but their criteria were interesting.

26:54 The criteria were that it has to be a library that was launched or popularized in 2020, has to be well maintained, has to have had maintenance changes since its launch date, and it has to be just outright cool, so that you should check it out.

27:07 So I'm going to go through a handful of these.

27:09 They listed 10.

27:10 I don't know if I'll get to all of them, since there's like four of them that are machine-learning focused that I--

27:16 I think cool is relative.

27:18 Yeah.

27:19 But the first one was Typer.

27:22 And I'm really a fan of Typer now.

27:25 Is it really just 2020?

27:26 And I went back and looked, and it was released in December of 2019.

27:31 So Sebastián Ramírez is killing it for sure.

27:33 And then I looked and I'm like, well, FastAPI, when did that come out? That was the previous December.

27:38 So FastAPI was released at the end of 2018, and then Typer a year later.

27:44 He's just crushing it.

27:45 Yeah.

27:45 So yeah, nice.

27:46 Huge fan of both of those.

27:49 A big fan of Rich also. Rich actually just showed up last year, in 2020. And Rich is beautiful, beautiful formatting in the terminal. And yes, it's beautiful. Oh, it's really great.

28:03 It is glorious.

28:04 I'm even using it in applications where I just need the tables.

28:09 So if you need to print out a table in the command line, tables are kind of hard, and there were other, specialized table libraries.

28:19 But this one is great: it works, you don't have to specify the width, it comes up with the width on its own.

28:27 And if you shrink the terminal to really narrow, or make it wide, it'll word wrap correctly and stuff.
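As an illustration of the table behavior described here, a minimal sketch with Rich (it assumes the `rich` package is installed from PyPI; the table title, column names, and row text are made up). No column widths are given, so Rich measures the content and sizes the columns itself, wrapping cell text to fit the console width. Passing `color_system=None` stands in for a terminal without color support.

```python
import io

from rich.console import Console
from rich.table import Table


def render_libraries_table(width: int) -> str:
    """Render a small example table at the given console width and return the plain text."""
    table = Table(title="Libraries mentioned")
    # No widths specified: Rich picks column widths from the content and the console size.
    table.add_column("Library", style="bold")
    table.add_column("What it does")
    table.add_row("Typer", "Build command-line apps from type-hinted functions")
    table.add_row("Rich", "Rich text, tables, and tracebacks in the terminal")
    table.add_row("Scalene", "CPU and memory profiler for Python")

    buffer = io.StringIO()
    # color_system=None simulates a terminal without color: styles are dropped, text stays.
    console = Console(file=buffer, width=width, color_system=None)
    console.print(table)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_libraries_table(width=80))
    print(render_libraries_table(width=40))  # same table, cell text wrapped to fit
```

At a narrow width the long descriptions wrap onto extra lines instead of overflowing, which is the "word wrap correctly" behavior mentioned above.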

28:35 >> Wow.

28:35 >> That's incredible.

28:38 Even just for tables.

28:40 >> Yeah, which is awesome.

28:41 >> The third one is Dear PyGui.

28:44 I think we covered this, maybe we did.

28:45 >> I don't remember. I mean, we did go on our GUI rant, so it feels like it should be.

28:49 >> Yeah. It's a GUI project.

28:53 Nice pictures though, at least.

28:55 >> Yeah.

28:56 >> I've been drooling over Dear PyGui for a while.

29:00 I haven't had an opportunity to use it yet, but I've been looking at it, so.

29:04 - Yeah, so of the last few I wanna highlight, Pretty Errors looks neat.

29:08 I haven't tried that yet, but it's a way to, yeah.

29:12 - That is glorious as well.

29:13 - Better tracebacks, so.

29:15 - I mean, ideally you don't show errors to people, but if you're going to, let's make them at least readable.

29:19 This is great.

29:20 - And let's train ourselves too.

29:22 You know, it's like, you know, we're gonna have to read the, we're gonna spend at least half our life reading error messages, face it.

29:27 So let's at least make them readable.

29:29 - And another quarter crying about what we just couldn't figure out.

29:32 (laughing)

29:33 - And then the last two that I wanna highlight are Diagrams and Scalene.

29:38 Diagrams is a library, look at that picture.

29:42 It's intended for cloud architecture drawings, but it's written in Python.

29:49 You write these diagrams in Python.

29:52 And so because they're text, you can check them in with version control.

29:57 - Oh, that's cool.

29:58 - Which is nice.

29:59 I'd like to see more diagrams done this way; it would be great for not just, you know, network diagrams, but other diagrams too.

30:07 - Flowcharts would be great.

30:08 I still flowchart.

30:09 - Yeah, so the last one is Scalene, which is a CPU and memory profiler for Python that handles multi-threading well and distinguishes between time spent in Python versus native code.

30:23 That's pretty cool.

30:24 I definitely need to try this out.

30:26 I also like that you don't have to modify your code.

>> Yeah, that's really cool.

30:31 Yeah, absolutely. Those are cool.

30:34 There's a bunch of great ideas there and I really need to find a use for rich.

30:38 >> A solution looking for a problem again.

30:40 >> I write a lot of little terminal apps and stuff and I'm just like, maybe you'll put a little color in here or something and I just need to take the time and go, no, this is a UI that I should pay more attention to, not just some random thing with text.

30:53 >> Yeah. We find this cool stuff.

30:55 It's like I want to use, I feel the need to use this somewhere.

30:58 (laughs)

30:59 - Exactly.

31:00 - So I had a little application where, just like I said, with the tables, I'm like, I don't think it needs colors.

31:06 I'm just showing a table.

31:08 But the default for rich is to show colors and you don't have to pick them, it just picks them.

31:13 So the heading and the lines between were different colors if you're on a color terminal.

31:19 And if you're not on a color terminal, it works anyway.

31:21 It just figures that out for you and lovely, love it.

31:24 - Yeah, that's awesome.

31:25 That's awesome.

31:26 It's very awesome.

31:27 Awesome. Speaking of awesome.

31:29 So, PEP 518 rolled out a while back.

31:33 It introduced this thing called pyproject.toml, I guess it's pronounced "toml" or whatever.

31:39 I'll say that, pyproject.toml.

31:41 So the idea behind this was that it was going to be this configuration file, you know, one configuration file to rule them all. And of course, Python, we like things to be simple.

31:51 Well, ironically, this turned into a really political thing, which I'm still trying to wrap my head around.

31:55 So basically, the nice thing about this repository is that it keeps track of all the projects that have adopted pyproject.toml, either optionally or as mandatory, for configuration.

32:07 So instead of having to have a dozen configuration files in your project for all these different tools, you can just use this one.

32:14 And so it's got this big list.

32:15 What I find interesting is this part down here at the bottom.

32:18 If you go down to, yeah, just scroll just slightly here, just slightly, just a little bit up.

32:24 That's going to sound weird on the podcast.

32:25 Anyway, so if you're going to--

32:28 so these are projects that are, quote unquote, discussing the use of pyproject.toml.

32:32 But if you actually look at these, it's kind of odd.

32:36 The big sticking points-- because these are the projects that are stopping people from really just going all in on pyproject.toml.

32:42 And there's even some--

32:43 talk about circular dependencies.

32:45 Or some are like, well, I'll do it when they do it.

32:48 And they're like, well, I will do it when they do it, which makes you wonder if it's a serious excuse.

32:52 So mypy is the weirdest.

32:54 Guido van Rossum himself said, well, it doesn't solve anything.

32:57 Someone said, can we just add this, please?

32:59 Just add it.

32:59 It's easy.

33:00 Here's the PR.

33:01 Somebody did the PR.

33:02 He's like, nah, it doesn't solve anything.

33:04 And he closed it.

33:05 It's like, it does solve something.

33:07 It's one less file I have to deal with.

33:10 That is a solution.

33:12 Flake8, they have a couple of concrete objections.

33:15 One is the fact that we don't have a standard TOML parser in the Python standard library.

33:20 So that could be a problem.

33:23 - Interesting.

33:24 You're adding another dependency to just support having this format.

33:28 - Exactly, yeah.

33:29 But then again, it's a common dependency with a bunch of other tools that are already in use and it almost doesn't matter.

33:36 Pip, someone said, I don't understand this.

33:39 pip to change its behavior so mere presence of the file doesn't change functionality.

33:42 I can't wrap my head around what he's referring to there.

33:46 But the stupid thing is someone already did flake9, which is an exact fork of Flake8 that just adds pyproject.toml support.

33:52 So it's like, it's done.

33:55 They just have to merge it.

33:57 And actually the same thing happened with Bandit.

34:00 Someone actually implemented it in 2019.

34:02 The PR has been sitting there, untouched since 2019.

34:05 So over a year has gone by, it's there, and Bandit is not picking it up.

34:09 They're silent.

34:10 Read the Docs is saying it's too much work.

34:12 Like it's a lot of work for us to have the multiple.

PyOxidizer, shockingly, hasn't even said anything. It's from 2019. They're like the new trendy, trend-setting packaging thing, and they haven't said anything about this. So I'm trying to figure out why this is so controversial, because it seems so obvious: you have one file to store all of the settings for all the different tools. And yet everybody seems to want to do their own thing with this.

- Well, I know that, you know, Pipenv and Poetry and Flit and some of these other tools suggest a workflow.

34:52 I feel like I hear this file format being used along with those and, you know, telling people we're gonna have a different way for you to like, work with your projects and manage dependencies and stuff.

35:03 And you know, that, I think that's part of the source of this and I don't know if it's just necessarily all mixed together.

35:09 Brian, what do you think?

35:10 You know more about this than I do.

35:12 I think a lot of projects are on the side of, like for instance, Coverage was, I don't know where they are on the list.

35:21 - That they adopted.

35:22 - That they adopted, okay.

35:23 - Yeah.

35:24 - Well, Coverage had this thing, and other tools were talking about, you know, there's no built-in TOML parser, and they didn't have any other dependencies, so they didn't want to add a third-party dependency just for this, if they're just using it for packaging or settings or something.

35:41 But, so, I do think we will see a lot... I don't think it's an unreasonable argument, because there are reasons, you know, it's the same reason why requests isn't in the standard library: because they're making changes.

35:54 But I do think that the basic format of TOML, enough to handle a pyproject.toml, isn't gonna change much.

36:02 So I think enough of a TOML parser to handle pyproject.toml, something like that, we need built into Python.

36:13 Yeah, and we have PEP 518.

36:15 So we have some standard already.

36:18 Yeah, so I think we'll see a big.

36:21 I would like to see at least even if it isn't the mainstream one.

36:25 Even if most projects that are OK with a third-party dependency use something else for a TOML parser, there could be some built-in, stripped-down version in the standard library.

36:35 I think that's I think that's.

36:37 Yeah, I see. You could solve that problem by just vendoring it.

36:41 Just like here's the two files that make up the parser.

36:43 We're just going to make it part of this package.

36:46 So now we're good to go.
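The fallback described here can be sketched as a conditional import: prefer a TOML parser in the standard library if one exists, then an installed third-party copy, then a copy vendored inside the tool's own package. This is an illustration, not how any of the projects discussed actually do it. `tomllib` only arrived in the standard library in Python 3.11 (PEP 680), `tomli` is the PyPI package it was based on, and the `mytool._vendor` path and the `read_version` helper are hypothetical.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    try:
        import tomli as tomllib  # third-party package with the same load()/loads() API
    except ModuleNotFoundError:
        # Hypothetical vendored copy shipped inside the tool's own package.
        from mytool._vendor import tomli as tomllib  # type: ignore[no-redef]

from typing import Optional


def read_version(pyproject_text: str) -> Optional[str]:
    """Return [project].version from pyproject.toml text, or None if it isn't set."""
    project = tomllib.loads(pyproject_text).get("project", {})
    return project.get("version")


if __name__ == "__main__":
    print(read_version('[project]\nname = "demo"\nversion = "0.1.0"\n'))  # 0.1.0
```

Because the rest of the code only ever refers to `tomllib`, the tool does not care which of the three sources won; dropping the vendored copy later is a one-line change.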

36:47 I don't know.

36:48 Sounds good.

36:48 Well, I think that's it for all of our items.

36:51 Brian, you got anything you actually want to share with folks?

36:53 Yeah, it's my birthday.

36:55 Yay!

36:55 Woo!

36:56 Happy birthday to you!

36:57 Man, you're looking good for 28, brother.

37:01 So I'm 51, and I heard today that that's just one shy of a full deck.

37:05 - I've never been accused of playing with a full deck myself, but I will say, don't let anyone tell you that you're old, because it says in Genesis, God said man's years shall be limited to 120. Half of 120 is 60, so biblically, 60 is middle-aged. You're not even middle-aged. You've got a way to go.

- I keep telling everybody that I don't look a day over 73.

37:36 Oh, you're good, man.

37:39 A couple of happy birthdays.

37:40 And also, someone's asking if you're still a fan of Flit.

37:43 Yeah, I love Flit, especially since they adopted the src directory.

37:47 Yeah, that's right. That's awesome.

37:49 Yeah, that's saved my life.

37:52 Jason, anything extra that you want to throw out there?

37:54 I mean, maybe people have a place they could get notified about your upcoming book or something like that.

37:58 Yeah. You know, following me on Twitter is probably the best way to do that.

38:02 I'm @CodeMouse92 on Twitter.

38:03 And then actually follow No Starch Press too.

38:07 I mean, No Starch Press is awesome to begin with.

38:10 That's where you're doing the book.

38:11 Yeah, exactly.

38:12 They're my publisher.

38:13 No Starch.

38:14 I don't think they ever put out a bad book.

38:16 I love that publisher.

38:17 You can actually ask my mother: when my book contract got accepted, I actually screamed very high-pitched.

38:27 That's awesome.

38:28 So yeah, follow No Starch Press for updates on that and all the other awesome stuff.

38:32 They got some other incredible books coming up too.

38:35 So I'll go ahead and ask her.

38:37 So what's your mom's Twitter handle?

38:39 My mom's Twitter handle?

38:40 Oh, she doesn't have a Twitter handle actually.

38:43 So I'll have to put you in touch directly.

38:44 I think unfortunately.

38:45 Awesome.

38:46 Well, cool.

38:47 Thanks for being here again.

38:50 So I have a couple of items to throw out here.

38:53 Actually, Brian, this almost could have been an extra, extra, extra, extra, extra, extra, hear all about it,

38:57 but they're real short, so I didn't do that.

38:59 Django 3.1.5 is released.

39:01 Django 3, didn't we just go to Django 2 or something?

39:05 That's, I mean, that's good.

39:06 That's really good to hear.

39:07 So awesome on that.

39:08 Python 3.10 alpha 4 is available for testing.

39:12 - Now the new parser is gonna be in that one, which is gonna be--

39:15 - Oh, that's the PEG parser that Guido's been working on?

39:17 - Yeah, that's gonna revolutionize the language, eventually.

39:22 - Yeah, yeah, it'll definitely make it possible to do more.

39:24 And in other releases, SciPy 1.6.0 was released.

39:28 I learned about a cool project.

39:29 So we talked about like avoiding Excel for the Python data science stack, right?

39:35 Like just stop doing Excel.

39:36 There's all these weird errors.

39:37 Like the organization that defines or governs how you can name genes has come up with rules for names you can't use.

39:46 And the reason they can't be used is they'll be parsed incorrectly into other data types by Excel, for example.

39:52 So there's a lot of issues you might run into with Excel and that's all good, but there's this project called PyXLL.

40:00 And this is actually a paid product.

40:03 They're not sponsoring the show.

40:04 I just think it's kind of neat.

40:05 So just spreading the word.

40:06 But anyway, if it's interesting for you, what you can do is it's a plugin for Excel that will embed Jupyter into Excel and allow you to write functions and macros in Excel in Python.

40:17 So it basically adds Python, the programming language, to Excel, which is good.

40:22 - Yeah.

40:23 - It's better than VBA.

40:24 Let's see.

40:25 - Oh, I started in VBA, tell me about it.

40:27 (laughing)

40:28 Anything's better than VBA.

40:30 - So someone on Twitter asked if PyCharm works okay on my Apple Mac mini M1.

40:37 And PyCharm, and JetBrains in general, just released a whole bunch of their tooling with separate installers for native Apple Silicon versions.

40:46 And so I've got a cool little video that I'm gonna link to in the show notes.

40:51 It's just a 5-second video of, here,

40:52 I open up PyCharm, and basically, from the time you click on open project till the project's open, if you've opened the project before, so there's that caveat.

41:01 But at that point, if you click on it, you can't perceive any delay.

41:05 Like, by the time you're letting up the mouse, the whole project is loaded and ready to work on.

41:09 It's like it's insane.

41:11 Yeah, I will.

41:12 I will consider picking up PyCharm again when they add live sharing.

41:14 They're working on it.

41:16 There is something called code with me.

41:18 Yeah, yeah.

41:18 So I have not tried it.

41:19 I have no one to code with.

41:20 I'm sorry, but email me later.

41:23 We'll start filming.

41:23 Yeah, exactly.

41:24 We'll go together.

41:26 So also, since I got my M1 like 3, 4 weeks ago, whatever, I've only used this for all my Python work, and it's still going strong.

41:36 I even had to send in my MacBook Pro because it had started shutting down, the battery was so bad.

41:42 It would shut down at 75%. Like, you know, when it's too low, it'll shut down, and as the battery gets bad, maybe it shuts down at 10% instead of zero.

41:49 If I'm doing video work, it'll actually shut down at 75% till I plug it back in.

41:53 So it's all M1 until that comes back.

41:55 - Well, I'm still on my System76 Linux.

41:58 I can't speak to Apple.

41:59 I do love my System76.

42:01 - That's cool.

42:02 I just, I think this whole like new ARM architecture stuff that they're doing, it's gonna be interesting.

42:08 You know, I think Microsoft's following suit or trying in parallel with them.

42:13 It just felt to me like Intel and AMD, that's just the way it was gonna be forever and it's not necessarily the case.

42:18 I don't I don't have a problem with.

42:20 I don't have a problem with competition.

42:21 What I have a problem with is software companies making their own, you know, architecture, and it only works on their architecture.

42:27 That's what you move towards and then you wind up with a totally fragmented industry.

42:30 Yeah, I think that's not going to be great.

42:32 No, don't do it.

42:33 Microsoft is not worth it.

42:35 Awesome, alright, well that's my extra extra extra extra extra extra Brian.

42:41 Nice. I want to get an M1.

42:43 I'd like to get a mini.

42:44 Yeah, the mini is fantastic.

42:45 I really, really like it.

42:47 It's not even funny.

42:48 It's not even a it's not even a joke.

42:49 I'm being serious, but we do need a joke.

42:51 Yes, I have a joke.

42:53 Alright, yeah, you got the joke this week.

42:55 I actually do have the joke this week.

42:57 Yeah, and so why?

42:58 Why did the programmer always refuse to check his code into the repository?

43:01 Why? He was afraid to commit.

43:03 So yeah, yeah, if you want.

43:07 If you want a regular dose of my... That is one of my originals.

43:10 If you want to get a dose of my absolutely horrific puns, you can follow me on Twitter at your own peril.

43:14 I post one every Monday.

43:16 I've got a new one, so awesome.

43:17 Nice.

43:18 Yeah.

43:18 Thanks for being on the show.

43:19 Yeah, it was fun.

43:20 Yeah, thanks.

43:21 See y'all.

43:21 Thanks, everyone out there on the live stream.

43:23 And thanks, everyone who listened.
