Transcript #220: What, why, and where of friendly errors in Python
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds. This is episode 220, recorded February 10th, 2021. I'm Michael Kennedy.
00:10 I'm Brian Okken.
00:11 And we have a special guest, Hannah. Welcome.
00:12 Hello.
00:13 Hannah Stepnick, welcome to the show. It is so great to have you here.
00:16 Thank you. I'm happy to be here.
00:18 Yeah, it's good to have you. It's so cool. The internet is a global place. We can have
00:23 people from all over. So we've decided to make it an all Portland show this time.
00:27 We could do this in person, actually. Well, not really, because we can't go anywhere. But
00:30 theoretically, geographically, anyway. Yeah, so all three of us are from Portland, Oregon. Very nice.
00:35 Before we jump into the main topics, a few quick things. One, this episode is brought to you by
00:41 Datadog. Check them out at pythonbytes.fm/datadog. And Hannah, do you just want to give people a quick
00:45 background on yourself?
00:47 Yeah, so I'm Hannah. I have written a book, which is weird to say, about pandas. But I also just go
00:56 around, like, give talks at various conferences, like on Python. So yeah, like I gave a talk on re-architecting
01:03 a legacy code base recently.
01:05 That sounds interesting and challenging.
01:06 Yeah.
01:07 What was the legacy language? Was it Python or something?
01:10 It was Python. It was like a Flask web application. And then also the front end of it was Vue, like
01:18 Vue.js.
01:19 Oh, yeah.
01:20 So yeah, that's been a fun project. That was through work as developers. Like, you're pretty much always
01:26 working with some form of legacy code. Just depends on how legacy it really is.
01:30 Well, what could be cutting edge in one person's viewpoint might be super legacy in another, right?
01:36 Like, it's Python 3.5. You wouldn't believe it.
01:38 Right.
01:39 Yeah. Very cool. Well, it's great to have you here. I think maybe we'll start off with our
01:46 first topic, which is sort of along the lines of the data science world, some tie-ins to your book.
01:51 And of course, whenever you go to JetBrains, you've got to run your CLI to accept the cookies,
01:56 which is fantastic. And so this topic, this first topic I want to cover is from JetBrains. And it's
02:02 entitled, we downloaded 10 million Jupyter notebooks. I almost said 10,000. 10 million Jupyter notebooks
02:08 from GitHub. Here's what we learned. So this is an article or analysis done by Alena Guzharina.
02:14 And yeah, pretty neat. So they went through and downloaded a whole bunch of these notebooks and
02:20 just analyzed them. And there's many, many of them are publicly accessible. And a couple of years ago,
02:25 there were 1.2 million Jupyter notebooks that were public. As of last October, it was eight times as
02:33 many. 9.7 million notebooks available on GitHub. That's crazy, right?
02:37 Wow.
02:38 Yeah. So this is a bunch of really nice pictures and interactive graphs and stuff. So I encourage
02:43 people to go check out the webpage. So for example, one of the questions was, well, what language do you
02:49 think is the most popular for data science, just by judging on the main language of the notebook?
02:54 Hannah, you want to take a guess?
02:55 Oh yeah. Python, for sure. Without a doubt.
02:58 That's for sure. The second one, I'm pretty sure no one who's not seen this, there's no way they're
03:05 going to guess. It's NaN. We have no idea. Like we look, we can't tell what language this is in there.
03:13 But then the other contenders are R and Julia. And often people say, oh yeah, well, Julia,
03:18 maybe I should go to Julia from Python. Well, maybe, but that's not where the trends are. Like
03:22 there's 60,000 versus 9 million, you know, as the ratio, I don't know what that number is,
03:26 but it's a percent of a percent type of thing. Wow.
03:29 They also talk about the Python 2 versus 3 growth or difference. So in 2008, it was about 50%
03:37 was Python 2. And in 2020, Python 2 is down to 11%. And I was thinking about this 11%.
03:43 Like, why do you guys think people, there's still 11% there hanging around?
03:47 I mean, I would guess, speaking of legacy applications, probably it's just hasn't been
03:53 touched, but yeah. Yeah. Those are very likely the ones that were like the original 2016, 17 ones that
03:59 were not quite there. They're still public, right? GitHub doesn't get rid of them. The other one is I
04:04 was thinking, you know, a lot of people do work on Mac or maybe even on some Linux machines that just
04:10 came at the time with Python two. So they're just like, well, I'm not going to change anything. It just,
04:13 I just need to view this thing. I've got Python, problem solved, right? They didn't know
04:17 that there's more, more than one Python. There's a good breakdown of the different versions. Another
04:21 thing that's interesting is looking at the different languages, not language, different libraries used
04:27 during this. So like NumPy is by far the most likely used. And then a tie is pandas and matplotlib,
04:32 and then scikit-learn, and then OS actually for traversing stuff. And then there's a huge long tail.
04:37 And they also talk about combinations like pandas and NumPy are common, and then pandas,
04:41 and then like seaborn, scikit-learn, pandas, NumPy, matplotlib, and so on as a combo. And so
04:46 that's really interesting, like what sets of tools data scientists are using. Yeah. And then another
04:51 one is they looked at deep learning libraries and PyTorch seems to be crushing it in terms of growth,
04:56 but not necessarily in terms of popularity. So it grew 1.3 times or 130%, whereas TensorFlow is more
05:03 popular, but only grew 30% and so on. So there's a lot of these types of statistics in there. I think
05:07 people will find interesting if they want to dive more into this ecosystem. You know, it's one thing
05:12 to have a survey and go fill out the survey, like ask people, what do you use? You know, what platform do
05:17 you run on? Windows or Linux? Like, okay, well, that's not really a reasonable question, but I guess
05:21 Windows, you know, like, but if you just go and look at what they're actually doing on places like
05:26 GitHub, I think you can get a lot of insight. Yeah, for sure. Yeah, I know I use, like I'll go to GitHub
05:31 pretty frequently, like at work when I'm, you know, just like browsing, like, I wonder how you do this
05:36 thing or like, what's the most common way to do this? Or yeah, absolutely. There's just look up,
05:40 like, what's the most popular. So it's a pretty good sign if a lot of people are using it.
05:44 It is. One thing I should probably make better use of is I know they started adding dependencies,
05:49 like, oh, if you go to Flask, it'll show you Flask is used in these other GitHub repos and stuff.
05:54 Like you could find interesting little connections. I think, oh, this other project uses this cool
05:58 library. I know nothing about, but if they're using it, it's probably good. Yeah, for sure.
06:01 Yeah. I love the dependency feature of looking who's using it. It's neat. Yeah, absolutely. So,
06:07 Brian, you going to cover something on testing this time? Yeah. Will we make you?
06:11 I wanted to bring up something we brought up before. So there's a project called pytest-pythonpath,
06:19 and it's just a little tiny plugin for pytest. And we did cover it briefly way back in episode 62,
06:27 too. But at the time, I brought it up as a way
06:34 to just shim, like be able to have your test code, see your source code, but as just like a shortcut,
06:42 like a stop gap until you actually put together like proper packaging for your source code. But the
06:47 more I talk to real-life people who are testing all sorts of software, and hardware even,
06:55 the more I see that that's a simplistic view of the world. So thinking that everybody is working on
07:00 packages is not realistic. There are applications, for instance, where they're never going to set
07:07 up or pull their code together as a package. And that's legitimate. So if you have an
07:13 application and your, your source code is in your source directory and your test code is in your test
07:18 directory, it's just, your tests are just not going to be able to see your source code right off
07:23 the bat. What's more tricky is that, depending on how you run it, they will or they won't.
07:30 Yeah. Right. Right. If you say run it with PyCharm and you open up the whole thing and it can like put
07:34 together the paths, you're all good. But if you then just go into the directory and type pytest, well,
07:38 maybe not.
07:38 It doesn't work. And it just confuses a lot of people. And so more and more, I'm recommending people
07:44 use this little plugin. And really, the big benefit is,
07:52 it does a few things, but the biggie is just that you can add a Python path
08:00 setting within your pytest.ini file, and you stick your ini file at the top of your project.
08:05 And then you just give it a relative path to where your source code is, like source or src
08:11 or something else. And then pytest from then on will be able to see your source code.
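For reference, a minimal sketch of the setting described here. It assumes the pytest-pythonpath plugin is installed, that the ini file sits at the top of the project, and that the code lives in a directory named src (the directory name is only an example):

```ini
# pytest.ini, at the top of the project
[pytest]
# Option provided by the pytest-pythonpath plugin.
# The path is given relative to the directory that holds this file.
python_paths = src
```

With that in place, running pytest from the project root should let test files import modules that live under src.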
08:17 It's a really simple solution. It's just, I, I, that's way better than what I do.
08:23 I don't think it's a stop gap. I think it's awesome. So yeah, I totally agree. What I do a lot of times
08:27 is certain parts of my code. I'm like, this is going to get imported. For me, the real tricky thing is
08:32 Alembic, the database migration tool, and the tests and the web app. And usually I can get the
08:39 tests and the web app to work just fine running them directly. But for some reason, Alembic always
08:43 seems to get weird, like working directories that don't line up in the same way. So it can't import
08:47 stuff. So a lot of times I'll put at the top of some file, you know, go to the Python path and add,
08:54 you know, get the directory name from dunder file and go to the parent, add that to the Python path.
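The workaround described here can be sketched roughly as follows. The helper name add_parent_to_path is made up for illustration; in a real file you would pass it `__file__`:

```python
import os
import sys


def add_parent_to_path(file_path):
    """Put the parent of the directory containing file_path on sys.path."""
    here = os.path.dirname(os.path.abspath(file_path))  # directory of the file
    parent = os.path.dirname(here)                       # one level up
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Typical use at the top of a script or test module:
#     add_parent_to_path(__file__)
project_root = add_parent_to_path(os.path.join("proj", "tests", "conftest.py"))
print(project_root)
```

It works, but every entry point needs the same few lines, which is the repetition a single project-level setting avoids.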
09:00 And now it's going to work from then on basically. And, this seems like a nicer one, although it doesn't
09:05 help me with Alembic, but still. But it might, you might be able to add the Alembic path right to it.
09:11 So yeah, yeah, for sure. Very cool. So it says, yeah, go ahead, Hannah.
09:14 Oh, I was just going to say, yeah, like this is something I like pretty much every time I set up a new project.
09:19 Like I always have to screw with the Python path. I always like run it initially. And then it's like, Oh,
09:25 can't find blah, blah, blah. And I'm like, Oh, here we go again.
09:28 But I usually always run my projects from Docker though. So I just, you know, hard code that stuff,
09:35 like just once you get it set up. That's cool. Nice. I dream of days when I can use Docker again,
09:41 I have an M1 Mac and it's in super early, early beta stages. Yeah. It's okay. I don't,
09:47 I don't mind too much because I don't use it that much, but still cool. Brian,
09:51 it says something about .pth. I'm guessing path files. Do you know anything about this?
09:55 I have no idea what those are. Oh, .pth files. So, yeah,
10:01 they are a way to, I don't know a lot, I don't know the real big details, but
10:07 it's a way to have a list of different paths within that file. And if you,
10:15 not import it, if you include it in your path, then Python, I think, includes all of the
10:22 contents into, anyway, I'm actually blowing smoke. I don't know the details. Okay. Sorry.
10:26 Yeah. But apparently you can have a little more control with .pth files, whatever those are.
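For listeners curious about .pth files, here is a small sketch using only the standard library. A .pth file is a text file in a site directory where each line names a directory to add to sys.path. The temporary directory, the src folder, and the demo.pth name are all made up for the demonstration:

```python
import os
import site
import sys
import tempfile


def demo_pth_file():
    """Create a throwaway site directory with a .pth file and process it."""
    site_dir = tempfile.mkdtemp()
    src_dir = os.path.join(site_dir, "src")  # hypothetical source directory
    os.mkdir(src_dir)

    # One path per line; a relative path is resolved against the site directory.
    with open(os.path.join(site_dir, "demo.pth"), "w") as fh:
        fh.write("src\n")

    # addsitedir() adds site_dir itself and processes the .pth files inside it.
    site.addsitedir(site_dir)
    return site_dir, src_dir


site_dir, src_dir = demo_pth_file()
print(src_dir in sys.path)
```

Python does the same processing automatically at startup for .pth files found in site-packages.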
10:30 Yeah. I don't know much about that either. Yeah. Unfortunately. I mean, I've been using
10:34 os.path. So what do I know? All right. Speaking of what do I know? I could definitely learn more
10:40 about pandas and that's one of your items here, huh? Hannah? Yeah. So I thought maybe I just give
10:48 like a little snippet of kind of like some of the stuff I talk about in the book. Fantastic. So yeah,
10:55 here we go. So if we're looking at pandas in terms of like the dependency hierarchy, well,
11:03 and I guess I should start at the beginning. So what is pandas if you're not familiar with it?
11:08 It's a data analysis library for Python. So it's used for doing big data operations. And so like,
11:16 if we look at the dependency hierarchy of pandas, it kind of goes like pandas, which is dependent on
11:21 NumPy, which deep down is dependent on this thing called BLAS, which is Basic Linear Algebra Subprograms.
11:28 Right. And wasn't there something with BLAS and a Windows update in a certain version,
11:33 I think recently? I can't remember. I feel like there was some update that like made that thing
11:37 that wasn't working. Yeah. Usually a big challenge around numpy and versioning and stuff to make it
11:42 work in the short term. Yeah. Usually the BLAS library is built into your OS already. And it just
11:49 points at that. But if you're using something like Anaconda, I think by default, like it installs
11:55 Intel MKL and uses that. But yeah, if you're using like Linux or just like out of the box,
12:01 whatever's on Windows, which is what it is, if you like pip install it, then yeah, there could
12:06 certainly be issues with like dependency mismatches. Yeah. So, and I've like greatly simplified this,
12:15 but in terms of kind of like the languages and walking down that dependency hierarchy,
12:22 you start out in Python with pandas and then NumPy is partially Python and partially C and then BLAS is
12:31 pretty much always written in assembly. And if you don't know what assembly is, it's basically like a
12:35 very, very, very, like probably the lowest level language you can program in. And it's essentially
12:40 like CPU instructions for your processor. And so I've taken this just like basic example here and I'm
12:48 going to kind of like roll with it. So if we're doing just like a basic addition in pandas, say like
12:56 we have column A and we want to add that with column B and like store it back into column C.
13:01 Like a traditional linear algebra vector addition type thing.
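The running example, adding column A to column B and storing the result in column C, might look like this in pandas. The values are invented for illustration:

```python
import pandas as pd

# Two small columns with made-up values.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# One vectorized expression adds the columns element by element,
# with no explicit Python loop over the rows.
df["C"] = df["A"] + df["B"]

print(df)
```

The point of writing it this way is that the whole column is handed to the library at once, which leaves it free to decide how to carry out the row-wise additions.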
13:05 Traditional vector math. So pandas, like if you, if you look at these operations, each,
13:13 each of these like additions on a per row basis is independent, meaning like you could conceivably run
13:20 like each of those additions for each row, like in parallel. Like there's no reason why you have to go
13:25 like row by row. and that's essentially like what kind of like big data analysis libraries are
13:32 like at their core is they, they like understand this conceptually and try to parallelize things as
13:38 much as possible. and so that's kind of like the first like fundamental understanding that you have
13:42 to have, like when working with pandas is like, you should be doing things in parallel as much as you
13:47 can. which means understanding the API and understanding like which functions in the API
13:51 will let you do things in parallel. so like if we're just not using pandas at all, say like
13:59 we're just inventing our own sort of like technique for this, like you might think, well, like each of
14:04 these rows could be broken up like into a thread, right? So like we could say like thread one is going to
14:09 run like the first row addition. And then like thread two is going to run the second row, et cetera.
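A rough sketch of that "one thread per row" idea in plain Python, using the standard library's thread pool and made-up data. It produces the right answer, but under CPython's global interpreter lock the threads take turns executing Python bytecode, so it is not expected to run the additions in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

# Two "columns" with invented values.
col_a = [1, 2, 3, 4]
col_b = [10, 20, 30, 40]


def add_row(i):
    """Add one row; each call is independent of every other row."""
    return col_a[i] + col_b[i]


# Hand each row to a worker thread; map() returns results in row order.
with ThreadPoolExecutor(max_workers=4) as pool:
    col_c = list(pool.map(add_row, range(len(col_a))))

print(col_c)
```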
14:14 but you might find that we'll run into issues with this, in terms of the GIL. So like the GIL,
14:21 otherwise known as the global interpreter lock in Python, prevents us from really like
14:27 running a multi-threaded operation in parallel. Basically, the rule
14:35 is Python can run one Python opcode at a time and that's it. All right. It doesn't matter if you've
14:41 got, you know, 16 cores, it's one at a time. Yeah. Yeah. And this like is really terrible for,
14:50 yeah. For, for like trying to do things in parallel. Right. So like that, that kind of
14:56 use case is out. Like, pandas and NumPy and all that stuff is not going to be able to use
15:01 multi-threading. and so, and like, I just want to point out like Python, like at its core has
15:10 this like fundamental problem, which is why they went with the GIL. So like Python manages memory for you.
15:17 and it, how it does that is it keeps track of references to know when to, free up memory.
15:26 so like when memory can be like completely destroyed and somebody else can use it essentially.
15:33 and like that's something you've got to do stuff like Brian sometimes probably has to do with C and
15:37 like free and all those things. Right. Yeah, exactly. Yeah. Yeah. So like C, you have to do this with
15:43 yourself with like malloc and free and all that stuff. But, with Python, it does it for you,
15:49 but that comes at a cost, which means like every single object in Python has this little like counter,
15:54 which is like a reference counter. and so basically like way back in the day, like when
16:00 threading first became a thing, like in order to kind of like avoid this threading problem,
16:07 they came up with the GIL, which basically says you can only run one thread at a time or like
16:13 one opcode at a time as, as you said.
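The per-object reference counter mentioned here can be observed from Python itself. This is a CPython-specific sketch; the absolute numbers returned by sys.getrefcount vary between versions, so only the change is checked:

```python
import sys


def refcount_delta():
    """Return how much the reference count grows when a second name is bound."""
    data = [1, 2, 3]
    before = sys.getrefcount(data)
    alias = data  # a second reference to the same list object
    after = sys.getrefcount(data)
    return after - before


print(refcount_delta())
```

Every such increment and decrement is an update to shared state, which is the bookkeeping the GIL protects when several threads are running.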
16:15 And attempts have been made to remove it. Like Larry Hastings has been working on something
16:20 called the Gilectomy, the removal of the GIL, for a while. And the main problem is, if you take
16:25 it away, the way it works now is you have to do lock on all memory access, all variable access,
16:30 which actually has a bigger hit than a lot of the benefits you would get, at least in the single
16:35 threaded case. And I know Guido said, like, we really don't want to make changes to this
16:39 if it's going to mean slower single-threaded Python. So probably not for a while.
16:43 Yeah. Yeah. Yeah. And that, that is a big problem. So like, I mean, if generally what people use,
16:49 like instead of threads in Python is they use like multiprocessing and they spin up multiple Python
16:55 processes. Right. And like that truly kind of like achieves the parallelism. but anyways,
17:01 I digress. So, we can't use threads because of the GIL, but what's interesting to note is when you're,
17:10 running NumPy at its very low level in C, like when you enter and look at the C files,
17:16 it actually is not subject to the GIL anymore because you're in C. and so you can potentially
17:21 run, you know, multi-threaded things in C, and call it from Python. so, but beyond that,
17:31 if we look at BLAS, BLAS has, built in like parallelization for like, hardware parallelization.
17:38 and how it does that is through vector registers. so if you're not familiar with like the
17:46 architecture of CPUs and stuff, like at its core, you basically, only have like, only can
17:53 have a certain small set, maybe like three or four values in your CPU at any one time that you're running
18:00 like adds and multiplies on. And like how that works is you load those values like into the CPU from
18:07 memory. And that load can be quite time consuming. It's really just based on like how far away your memory is from
18:14 from your CPU at the end of the day, like physically on your board. Right. Right. Is it in the cache?
18:18 Is it in the RAM? Yes. Yeah. And that's why we have caches. So like caches are like memory that's closer
18:24 to your CPU. consequently it's also smaller. but that's, that's how you can kind of, you might hear
18:31 like people say like, oh, like so-and-so wrote this really performant program and it like utilizes like the
18:37 size of the cache or whatever. So like basically like if you can load all of that data, like into your cache and
18:43 run the operations on it without ever like having to go back out to memory, like you can make a really
18:48 fast program. Yeah. Yeah. It could be like a hundred times faster than regular memory. Yeah. Yeah. And so
18:53 essentially like that's what BLAS is trying to do like underneath, and NumPy too, is they're trying
19:00 to take this giant set of data and break it into chunks and load those chunks into your cache and
19:08 operate on those chunks. and then dump them back out to memory and load the next chunk.
19:13 Yeah, very cool. Yeah. Thanks for pointing that out. Like I didn't realize that BLAS leveraged some of
19:18 the OS native stuff, nor that it had like special CPU instruction type optimizations. That's pretty cool.
19:24 Yeah. Yeah. so like it has, like on top of the registers, it also has these things called
19:31 like vector registers, which actually can hold like multiple values at a time in your CPU. so like,
19:38 we could take this like simple example of, like the addition and we could actually, well, we can't
19:43 run those like row per row calculations, in parallel with threads. We can with vector registers.
19:51 Okay. and the limitation there is that the memory has to be, sequential when you load it in.
19:57 this is definitely at a level lower than I'm used to working at. How about you?
20:03 But yeah, so, anyways, this is just like kind of the stuff that I talk about in my book.
20:08 it's not necessarily about like how to use pandas. but it's, it's about like kind of like
20:14 what's going on underneath pandas. And then like, once you kind of like build that foundation of
20:18 understanding, like you can understand like better how pandas is working and like how to use it correctly
20:24 and what all the various functions are doing. Fantastic. Yeah. So people can check out your book.
20:28 Got a link to it in the show notes. So, very nice. It's offering me the European,
20:33 the Euro price, which is fine. I don't mind. So yeah. So like, I mean, it's on Amazon too.
20:38 It's on a lot of different platforms, but I figured I'd just point directly to the publishers.
20:43 Yeah, no, that's perfect. Perfect. Quick comment. Roy Larson says NumPy and Intel MKL can cause issues sometimes on Windows if something else in the system
20:54 uses Intel MKL. Okay. Yeah. Interesting. I have no experience with that, but I can believe it. Intel
21:00 has a lot of interesting stuff. They even have a special Python compiled version,
21:04 I think, for Intel, that you can use potentially. I'm not sure. They have some high performance version.
21:08 Yeah. Yeah. Yeah, they do. Yeah.
21:10 Nice. Also in Portland, you can keep it in Portland. There we go.
21:15 Now, before we move on to the next item, let me tell you about our sponsor today.
21:19 Thank you to Datadog for sponsoring. And if you're having trouble visualizing latency,
21:25 CPU, memory bottlenecks, things like that in your app, and you don't know why you don't know where it's
21:30 coming from or how to solve it, you can use Datadog to correlate logs and traces at the level of
21:35 individual requests, allowing you to quickly troubleshoot your Python app. Plus they have
21:38 a continuous profiler that allows you to find the most resource consuming parts of your production code
21:44 all the time at any scale with minimal overhead. So you just point it at your production
21:47 server, run it, which is not normally something you want to do with diagnostic tools, but you can with
21:51 their continuous profiler, which is pretty awesome. You'll be the hero that got that app back on track at
21:56 your company, get started with a free trial at pythonbytes.fm/datadog, or just click the link in
22:01 your podcast player show notes. Now, I'm sure you all have heard that working with pickle has all sorts
22:08 of issues, right? The pickle is a way to say, take my Python thing, make a binary version of bits that
22:13 looks like that Python thing so I can go do stuff with it, right? That's generally got issues, not the
22:19 least of which actually are around the security stuff. So like when you unpickle something to deserialize it,
22:25 that is actually potentially running arbitrary code. So people could send you a pickle virus.
22:30 I don't know what that is like a bad, a rotten pickle or whatever. That wouldn't be good.
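To make the security point concrete, here is a deliberately harmless sketch. The class name Sneaky is invented; its `__reduce__` method tells pickle which callable to invoke when the bytes are loaded, and here that callable is eval with a trivial expression:

```python
import pickle


class Sneaky:
    def __reduce__(self):
        # Tell pickle: "to rebuild this object, call eval('6 * 7')".
        # A hostile payload could name any importable callable instead.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

# Loading the bytes runs the callable; the "object" we get back is its result.
result = pickle.loads(payload)
print(result)
```

The receiver only calls pickle.loads, yet code chosen by the sender runs, which is why unpickling untrusted data is unsafe.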
22:34 So there's a library I came across that solves a lot of the pickle problems.
22:39 It's supposed to be faster than pickle and it was cleverly named quickle.
22:43 Either of you heard of this thing?
22:46 No.
22:47 Yeah, it's cool, right? So here's the deal. It's a fast serialization format for a subset of Python types.
22:54 So you can't pickle everything, but you can pickle like way more say than JSON. And the
22:59 reasons they give to use it are it's fast. If you check out the benchmarks, I'll pull those up in a
23:03 second. It's one of the fastest ways to serialize things in Python. It's safe, which is important.
23:09 And unlike pickle deserializing a user provided message does not allow arbitrary code execution.
23:14 That seems like the minimum bar. Like, oh, I got stuff off the internet. Let's try to execute that.
23:19 What's that going to do? Oh, look, it's reading all my files. That's nice.
23:22 All right.
23:23 It's also flexible because it supports more types. And we'll also learn about a bunch of other
23:30 libraries while we're at it here, which is kind of cool. A bunch of things I heard of like
23:34 MessagePack or, well, JSON, you may have heard of that. And the other main problem you get with some
23:39 of these binary formats is you can end up in a situation where you can't read something
23:44 if you make a change to your code. Like, so imagine I've got a user object and I've pickled them
23:48 and put them into a Redis cache. We upgrade our web app, which adds a new field to the user object.
23:53 That stuff is still in cache. After we restart, we try to read it. Oh, that stuff isn't there anymore.
23:58 You can't, you know, use your cache anymore. Everything's broken, et cetera, et cetera. So it has a concept of
24:03 schema evolution, having different versions of like history. So there's ways that older messages can be
24:09 read without errors, which is pretty cool. Yeah. That's nice. Yeah. Neat, huh? Yeah. I'll pull up the benchmarks.
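The stale-cache scenario can be illustrated without quickle itself. The sketch below swaps in JSON and a standard-library dataclass purely to show the idea; the User class and its plan field are hypothetical. A field added in a later release carries a default, so a message written before the upgrade still decodes:

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    plan: str = "free"  # hypothetical field added in a later release


# A message cached by the old version of the app, before "plan" existed.
cached = json.dumps({"name": "ada", "email": "ada@example.com"})

# The new code can still read it because the new field has a default.
user = User(**json.loads(cached))
print(user)
```

This is only an analogy for schema evolution in general; check the quickle documentation for how its own mechanism works.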
24:14 There's actually a pretty cool little site here. It shows you some examples on how to use it. I mean,
24:18 it's incredibly simple. It's like, dump this as a string, read this, you know, deserialize this.
24:22 It's real simple. So, but there's quite interesting analysis, live analysis where you can click around
24:29 and you can actually look at like load speed versus, like, serialize versus deserialize speed,
24:35 how much memory is used and things like that. And it compares against pickle tuples,
24:39 protobuf, pickle itself, orjson, msgpack, quickle, and quickle Structs.
24:44 There's a lot of things. I mean, I knew about two of those, I think. That's cool.
24:48 But these are all different ways. And you can see, like in all these pictures, generally,
24:52 at least the top one where it's time shorter is better. Right? So you can see if you go with
24:57 there, like quickle Structs, quick rule of thumb, it's maybe four or five times faster than pickle,
25:02 which I presume is a way faster than JSON, for example.
25:04 You know, you'll also see the memory size, which actually varies by about 50% across the
25:09 different things. Also speed of loading up a whole bunch of different objects and so on. So yeah,
25:14 you can come check out these analyses here. Let's see all the different libraries that we had. Yeah,
25:19 I guess we read them all off basically there, but yeah, there's a bunch of different ways which are,
25:23 you know, not pickle itself to do this kind of binary serialization, which is pretty interesting.
25:28 I think it does. Protobuf, that's pretty cool. Actually, I want to try this out. It looks neat.
25:33 Yeah. Yeah, it looks really right. And one of the things I was just looking at the source code,
25:37 I love that they use pytest to test this. Of course, you should use pytest. But the, I can't believe
25:45 I'm saying this, but this would be the perfect package to test with a Gherkin syntax. Don't you think?
25:50 Because it's a pickle thing. Oh my gosh. You've got to use the Gherkin syntax.
25:54 So yeah, you definitely should. And Roy threw out another one, like the uqfoundation
26:02 dill package deals with many of the same issues, but it's binary and has all the same
26:07 sort of versioning challenges you might run into. Well, Dill, the Dill package. That's funny.
26:12 Yeah, pretty good. Pretty good. All right. So anyway, like, you know, I'm,
26:16 I'm kind of a fan of JSON these days. I've had enough XML with custom namespaces in my life that
26:22 I really don't want to go down that path and XSLT and all that. But, you know, I've really shied away
26:27 from these binary formats for a lot of these reasons here. But, you know, this might make me interested.
26:33 If I was going to say throw something into a cache, the whole point is put it in the cache,
26:36 get it back, read it fast. This might be decent. Yeah. Yeah. It definitely seems to address a lot of the
26:42 concerns I have with pickle for sure. Yeah. And I don't, did I talk about the types
26:46 somewhere in here? We have time. Yeah. Here's, there's quite a list of types. You know, one's
26:50 really nice: datetime. I can't do that with JSON. Why is, why in the world doesn't JSON support
26:54 some sort of time information? Oh, well, but you've got most of the fundamental types that you might run
26:59 into. All right. So, quick, give it a quick look. All right, Brian, what you got here?
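The datetime complaint is easy to reproduce with the standard library. A sketch with an arbitrary timestamp: json.dumps rejects a datetime, and a common workaround is to round-trip it as an ISO 8601 string:

```python
import json
from datetime import datetime

stamp = datetime(2021, 2, 10, 12, 0)

# The json module has no encoding for datetime objects.
try:
    json.dumps({"recorded": stamp})
    supported = True
except TypeError:
    supported = False

# Workaround: store an ISO 8601 string and parse it back after loading.
encoded = json.dumps({"recorded": stamp.isoformat()})
decoded = datetime.fromisoformat(json.loads(encoded)["recorded"])

print(supported, decoded)
```

The cost of the workaround is that both sides must agree, outside the format itself, on which strings are really timestamps.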
27:05 Well, I was actually reading a different article. But the, it came up, we, I think we've talked about
27:14 Friendly-traceback. It's a package that just sort of tries to make your tracebacks nicer. But
27:20 I didn't realize it had a console built in. So I was pretty blown away by this. So there's a,
27:28 it's, you know, it's not trivial to get set up. It's not that terrible, but you,
27:31 you have to start your own console, start the REPL, import friendly_traceback, and then do
27:38 friendly_traceback.start_console(). But at that point, you have just like the normal console, but you have better
27:45 tracebacks. And then also you have all these different cool functions you can call like,
27:50 what, where, why, and explain and more. And basically if something goes
27:58 wrong while you're playing with Python, you can interrogate it and ask like for more information.
28:04 And that's just pretty cool. The, the why is really great. So if you have the, one of the examples I saw
28:11 before, and I'm, I think I might start using this when teaching people is, we often have like
28:17 exceptions, like you assigned to None or you assigned to something that can't be assigned,
28:21 or you, you, you didn't match up the bracket and the parentheses or something like that correctly.
28:27 and you'll get like just syntax error and it'll point to the syntax error, but you might
28:32 not know more. So you can just type why(), W-H-Y with parentheses. Cause it's a, or yeah,
28:39 because it's a function and it'll tell you why, why it's like a, the great storytelling,
28:45 right. The five whys of a bug. Yeah. So then you get the W's of a bug. Yep. You can say
28:52 what, like to, to repeat what the error was, why we'll tell you why that was an error. And then
28:58 specifically what you did wrong. And then where it will show you if you've, if you've been asking
29:03 all sorts of questions and you lost where the actual trace back was, you can say where, and it'll point
29:08 to directly to it. And, I think this is going to be cool. I think I'll use this when trying to teach,
29:13 especially kids, but really just people new to Python. Tracebacks can be very helpful for them.
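The setup described above might look roughly like this. It is a minimal sketch, assuming the package is installed (for example with `pip install friendly-traceback`) and still exposes `start_console()` under the `friendly_traceback` name as described in the episode; the helper name `start_friendly_console` is made up for illustration.

```python
# Sketch only: assumes the friendly-traceback package is installed and
# exposes start_console(), as described in the episode.
try:
    import friendly_traceback
except ImportError:  # package not installed
    friendly_traceback = None


def start_friendly_console():
    """Start the friendly REPL if the package is available (hypothetical helper)."""
    if friendly_traceback is None:
        print("friendly-traceback is not installed; try: pip install friendly-traceback")
        return False
    friendly_traceback.start_console()  # blocks until the console is closed
    return True


# Inside that console, after an error, the helpers mentioned above are
# called like ordinary functions:
#   what()   # repeat what the error was
#   why()    # explain why it is an error
#   where()  # show where the traceback points
```

Calling `start_friendly_console()` from a regular Python session should then drop into the friendlier REPL, if the assumptions above hold for the installed version.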
29:19 Yeah. Like even, I know, like I sometimes have to look up like certain error messages that I'm like,
29:24 not familiar with. So yeah, that would be super helpful. I could just do it right in the console.
29:28 Yeah. I totally agree. You're going to have to help me find a W that goes with this,
29:32 but I want the, what would be effectively Google, open, close parenthesis?
29:40 You know, because so often you get this huge trace back and you've got these errors. And if
29:43 you go through and you select it, like for example, the error you see on the screen,
29:46 UnboundLocalError, local variable greetings in quotes, referenced before assignment. Well,
29:52 the quotes mean, oftentimes in search, that it must have the word greetings. And that's the one thing that
29:57 is not relevant to the, the, the Googling of it. Right? So if I'm a beginner and I even try to Google
30:02 that I might get a really wrong message. Right? So if you could say, Google this in a way that is
30:08 most likely going to find the error, but without carrying through like variable details, file
30:14 name details, but just the essence of the error, that would be fantastic. Now, how do we say that with W?
30:21 You could just say, Whoa, or, or maybe www or WTF. I mean, come on, there's a lot of WTF.
30:31 But wouldn't that be great. And so that's also part of this package that you see,
30:36 at their main site where you've got these really cool, like visualized stuff, right? Where it's
30:42 sort of more tries to tell you the problem of the error with the help text and whatnot.
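The "search for the essence of the error" idea from a moment ago could be sketched with the standard library alone. This is a hypothetical helper, not a feature of friendly traceback: the name `search_query` and the quote-stripping rule are assumptions, and it only handles names in single quotes.

```python
import re


def search_query(exc: BaseException) -> str:
    """Build a search string from an exception without user-specific names."""
    message = str(exc)
    # Drop single-quoted identifiers such as 'greetings'.
    message = re.sub(r"'[^']*'", "", message)
    # Collapse the whitespace left behind by the removal.
    message = re.sub(r"\s+", " ", message).strip()
    return f"python {type(exc).__name__} {message}"


def greet():
    print(greetings)  # read before the local assignment below
    greetings = "hello"


try:
    greet()
except UnboundLocalError as exc:
    query = search_query(exc)
    print(query)
```

The exact wording of the message differs between Python versions, so the printed query will too; the point is only that the variable name no longer appears in it.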
30:45 Yeah. Yeah. This is cool. Also uses rich, which is a cool library we talked about as well.
30:49 I love rich. I include rich in everything now, even just, just to print out simple,
30:54 better tables. It's great. Yeah, for sure. Hannah, do you see yourself using this or is it,
30:59 are you more, more in a notebooks? Oh no. I, I mean, I usually use like the PDB debugger. So yeah,
31:07 I mean, I'm not sure if really this as it is would be, like a problem. It would depend on how
31:14 much information it has about like obscure errors from dependent libraries which is usually what I
31:20 end up looking at these days but yeah I mean conceivably like yeah that could be helpful
31:25 yeah if we get that WTF feature added then yeah oh yeah for sure gosh speaking of errors let's uh cover your last item last item of the show uh yeah so um I uh at work
31:39 uh work in um the security org and I write uh like automation tools for them which means uh
31:47 sometimes the repos that we work on get to be like test subjects um for for new like requirements and
31:55 such um and so recently uh our org was exploring uh like static code analysis looking for like
32:04 security vulnerabilities in the code um and so I ran across bandit and I integrated bandit
32:09 into our we don't have time to uh go through these old legacy code and fix these problems oh wait this
32:15 is what it means oh sorry yes we can do that right now that's the kind of report you got from
32:21 bandit yeah exactly um so yeah we integrated bandit into our legacy code base and we actually it's funny
32:29 you say that because I the bug that I found using bandit was actually like a from the legacy code um
32:35 that does not surprise me yeah uh so it was it was a pretty stupid like error um like it was pretty
32:44 obvious like if you were doing code review but because it was legacy code and it was like already there
32:49 um I just like never noticed um but it was basically like issuing like a request with like no verify
32:55 uh so it was like an unverified like http request um and bandit was like yeah this broken ssl
33:03 certificate keeps breaking it I just told it to ignore it oh yeah yeah well and I honestly like I think that
33:09 might have been why it was there in the first place because I I know like the oh like several years ago
33:14 like had some certificate issues so yeah that might be and it was it was like an internal talking to
33:20 internal so it was like maybe even a self-signed certificate that nothing trusted but they get
33:26 technically there yeah yeah it was like we'll just we'll just do that um but yeah so um bandit is
33:33 basically like like a linter but it looks for security issues um so you could just like pip install it um
33:40 and then just like run it on your code and it will find a bunch of different potential security issues
33:45 like just by like statically analyzing your code um and I've uh pretty much like come to the opinion
33:52 that like why haven't I done this on all of my other projects like I should be doing this on every single
33:58 project um like because you know like as as like a developer I always run like lint and black and stuff
34:05 like that um so I figure you know I should probably be running bandit too yeah cool yeah well very nice uh
34:12 it's a good recommendation for people as well and it's got a lot of cool you can go and actually see
34:16 the list of the things that it tests for and even has test plugins as well which is pretty cool yeah
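The unverified request described earlier is usually a one-argument shortcut. As a hedged sketch using only the standard library, here is one way to keep verification on while still trusting an internal certificate authority; `build_tls_context` and `ca_bundle` are illustrative names, not anything from the show or from bandit.

```python
import ssl
from typing import Optional


def build_tls_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a client context that verifies certificates.

    ca_bundle is a hypothetical path to an internal CA certificate (PEM),
    for services signed by a private or self-signed authority.
    """
    # create_default_context() enables certificate and hostname checks.
    return ssl.create_default_context(cafile=ca_bundle)


context = build_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)

# The shortcut a security linter is meant to catch looks like this instead:
#   requests.get(url, verify=False)      # with the requests library
#   ssl._create_unverified_context()     # with the standard library
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)`. Running `bandit -r .` over a project is the usual way to scan it, and a `# nosec` comment suppresses a finding on a single line.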
34:21 yeah so you can like make your make your own if you want um and it has like all the common linter sort of
34:27 like functionality like ignore these files or like ignore these rules or even you know like ignore this
34:32 rule on this particular line stuff like that yeah absolutely which is pretty sweet I love that things like
34:37 bandit are around because um uh thankfully uh developing web stuff is becoming easier and easier
34:45 but it's then now the barrier to to entry is lower you still have to have all those security concerns
34:52 that you had before that normal I mean usually people were just had more experience but they would make
34:58 mistakes anyway but now I think this is one of the reasons why I love this is because people new to it
35:03 might be terrified about the security part but having uh bandit on there looking over their shoulder is
35:08 great yeah yeah like don't publish with the debug setting on in Flask or Django or anything like that
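One common way to avoid shipping with debug switched on is to make it opt-in through the environment rather than hard-coded. A minimal sketch, assuming a made-up variable name `APP_DEBUG` and a hypothetical Flask object called `app`:

```python
import os


def debug_enabled(environ=os.environ) -> bool:
    """Debug stays off unless APP_DEBUG (a made-up name) is explicitly truthy."""
    return environ.get("APP_DEBUG", "").strip().lower() in {"1", "true", "yes"}


# Hypothetical usage with a Flask app object named `app`:
#   app.run(debug=debug_enabled())
print(debug_enabled({}))                      # False: nothing set
print(debug_enabled({"APP_DEBUG": "true"}))   # True: explicitly enabled
```

Defaulting to off means that forgetting to configure something fails safe instead of exposing a debugger in production.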
35:14 simple obvious stuff and like honestly like having worked in the security org for about a year now like
35:21 I've come to the understanding that a lot of security issues stem from just like basic like duh sort of
35:29 misconfigurations so like something like this is perfect and I really really like that you added um
35:36 you you wrote in the show notes um some pre-commit uh how to how to hook this up with pre-commit because
35:42 I think having it in pre-commit or in a ci pipeline is important because um like you guys were joking
35:48 about often security problems come in because somebody's just trying to fix something that broke yeah but
35:54 they don't really realize how many other things it affects so yeah yeah yeah besides down just we got
36:00 to make it work quick just just turn on the debug thing we'll just look real quick and then you forget
36:03 to turn it off or whatever yeah yeah for sure yeah yeah just stupid human errors nice all right I want to go
36:11 back real quick Brian because uh your uh mention of friendly traceback got a lot of stuff so let me just do a
36:17 quick uh audience reaction Robert says it is cool Brian John Sheehan says I was just thinking it's something
36:23 the same would be cool it's a great teaching concept Anthony says super useful um John says I've been doing
36:28 more demo code in the console rather than the IDE this looks like it would help w how to fix it w
36:36 wow how w i love it Robert very good Zach says uh what is this magic this looks amazing and so on all
36:44 right well thanks everyone uh I'm glad you all like that uh so that's it for our main items you know um
36:50 Brian you got any extras you want to throw out there you were uh doing something with climate change or what
36:55 are you doing this week um yeah I'm sharing the room with some people just a sec uh the uh I did do
37:02 two meetups uh with uh with uh NOAA and uh then with the Aberdeen python meetup wait wait I got
37:09 I got to interrupt you really quick did all the talk that Hannah did about bandit and viruses get you
37:14 it's all right I'm sorry sorry about that carry on well I missed all this talk with Hannah that Hannah
37:25 had about viruses and in hacking and stuff with bandit did it make you nervous and you had to put on your
37:31 your mask no I just I'm in a group meeting in their group room and somebody came in but it's okay I'm
37:37 just teasing carry on um the that's funny I also wanted to look like a bandit yeah exactly but I was
37:44 thrilled that uh NOAA uh asked me to to speak to them that was neat and then the python Aberdeen people
37:51 um and also like but they mentioned that Ian from the python Aberdeen group said that he had an arrangement
37:57 with you that when you Michael that when the the pandemic is over you're gonna go over and they're
38:02 gonna you're gonna do like a whiskey tour or something like that so I'm I don't know the
38:06 details but it sounds good to me already anyway if that happens I want to go along yeah it's a python
38:12 Bytes outing let's do it and then we have uh uh the pdx west meetup tomorrow you're gonna speak
38:19 that's kind of exciting so yeah it's gonna be fun and it's virtual so people can attend however
38:23 um I'm also I've got feedback from both uh you and um and Matt Harrison gave me some feedback so I'm
38:31 updating my training page on testing code so um because I really like working with teams so I'd and
38:37 anybody else wants to give me feedback on my training page maybe I could I'd love to hear it so yeah or maybe
38:43 they even want to have some pytest training for their team yeah I mean testing is something that uh
38:48 I think teaching a team at a time is a great thing because people can uh can really um I don't know that
38:53 we can talk about their their particular problems not general problems it's good so yeah for sure well
38:59 you also need more of a team buy-in on testing right because like if one person writes code and
39:03 won't write the test another person is like really concerned about making the test fast it's super
39:07 frustrating yeah the person who doesn't want to run the test keeps breaking the build but like you know
39:12 anyway it's a team sort of sport in that regard yep yeah all right awesome so I got a couple quick
39:17 things PEP 634 structural pattern matching in python has been accepted for python 3.10 that's like
39:24 imagine a switch case that has about a hundred different options that's what it is yeah with
39:29 like like regex not quite but sort of like style like you can have like these patterns and stuff that
39:34 happen in the cases I don't know how to feel about this like if uh let me put a perspective like if the
39:40 walrus operator was controversial like this is like this is like a way bigger change to the language so
39:45 I don't know it it's both awesome and terrifying yes exactly yeah I was gonna say I'm kind of surprised
39:51 yeah yeah so am I Hannah that like this got accepted it seemed to be sort of counter to the simplicity of
39:56 python like I I'm not at all against having a simple switch statement that does certain things but
40:01 this seems like a lot I may come to love it one thing that maybe would help me come to a better
40:05 understanding and acceptance was if the PEP page had at least one example of it in use like the
40:10 whole page that talks about all the details says I don't believe there's a single code sample ever
40:15 well there's a tutorial page as well so oh is there there's the tutorial page okay maybe that's where
40:20 I should be going to check it out yeah but it still sort of feels like a five barrel foot gun yeah
40:25 it does well but the page that I'm looking like the PEP thing that I'm linking to the official PEP I
40:29 don't think it has uh does it have the tutorial yeah no you're right it does it does um somewhere down
40:35 yeah PEP 636 yeah it's a different PEP that is the tutorial for the PEP interesting I didn't realize
40:40 that it's kind of meta honestly anyway I to me I'm a little surprised it's accepted fine um I know people
40:46 worked really hard on it and congratulations a lot of people really want it comes from Haskell right so
40:50 Haskell had this like pattern matching like alternate struct thing I don't know I just feel
40:53 like Haskell and Python are far away from each other so that's my first impression I will
40:58 probably come to love it at some point uh PyCon registration is open so if you want to go to PyCon
41:03 you want to attend and be more part of it than just like watching the live stream on YouTube be part of
41:07 that I think I'm going to try to make a conscious effort to attend the virtual conference not just
41:11 catch some videos so you can do that yeah PyCon is awesome like just I my first conference was PyCon
41:18 and then I went to other conferences and I was like what is wrong with these conferences like
41:23 yeah I know I feel the same way I know it's uh it's really really special I'm sure the virtual one
41:31 will be good I can't wait for the in-person stuff to come back because it really for sure yeah it's a
41:36 whole another experience in person I consider it basically my um geek holiday where I get get away
41:41 and like just get to hang out with my geek friends I happen to learn stuff on there totally
41:44 and then Python Web Conf is coming up and that's uh registration is open for that as well um and I
41:51 suppose probably PyCascades which Brian and I are on a panel at there as well oh nice I put I put a link
41:56 into an hour of code for Minecraft which has to do with programming Minecraft with Python if people are
42:01 looking to teach kids stuff uh that looks pretty neat so um my daughter's super into Minecraft I don't
42:06 do anything with it but if if you are and you want to make it part of your curriculum uh that's pretty
42:10 cool Hannah anything you want to throw out there before uh we break out the joke nope I'm good
42:15 awesome do it do it all right all right so this one we have something a little more interactive for
42:20 everyone we've got a um a song about PEP 8 about writing clean code this is written and and uh produced
42:28 sung by Leon Sandoy uh goes by Lemon and him and his team over at Python Discord he runs Python Discord and
42:34 apparently it was a team effort creating this and the reason I'm covering it is a bunch of people sent it
42:38 over so Michael Rogers Valet uh sent it over so you should cover this Dan Bader said check this out
42:43 Alan McElroy said hey check out this thing so all right I actually uh spoke to Lemon and said hey do
42:49 you mind if we play this he said no that'd be awesome give us a shout out of course so we're
42:53 gonna actually play the song as part of this in the live stream you get the video on the audio you get
42:57 well audio so I'm gonna kick this off and we'll come back and I'd love to hear Brian and Hannah's thoughts
43:02 here we go you don't need any curly braces just four spaces just four spaces
43:26 wildcard imports should be avoided in most cases in most cases try to make sure there's no trailing white space it's confusing it's confusing
43:47 trailing commas go behind list items git blame titans git blame titans
43:57 and comments are important as long as they're maintained when comments are misleading it will drive people insane
44:09 just try to be empathic just try to be empathic just try to be a friend it's really not that hard just adhere to
44:22 PEP 8. PEP 8.
44:33 constants should be named, all capital letters, and live forever, live forever.
44:44 And camel case is not for python, never ever, never ever.
44:55 And never use a bare exception, be specific, be specific.
45:06 No one likes the horizontal scroll bar, keep it succinct, keep it succinct.
45:17 And comments are important, as long as they're maintained.
45:23 When comments are misleading, it will drive people insane.
45:29 Just try to be empathic, just try to be a friend.
45:34 It's really not that hard, just adhere to.
45:40 PEP 8.
45:44 PEP 8.
45:50 PEP 8.
45:55 PEP 8.
46:02 PEP 8.
46:04 PEP 8.
46:08 PEP 8.
46:08 That was amazing.
46:09 I can sympathize with so much of what he's saying.
46:14 I'm just having flashbacks to a discussion I had with my teammate about comments.
46:19 And being like, "No, this comment doesn't actually describe what the code is doing."
46:27 It's worse than having no comment. It really is.
46:30 It really is, yeah.
46:31 I love it.
46:32 Or if it describes literally what the code is doing and not high-level.
46:36 Exactly.
46:37 Why or background or anything other than...
46:40 The why.
46:41 The why is important.
46:42 Yeah.
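Several of the rules from the song, plus the point about comments explaining the why, fit in one short sample. This is an illustrative sketch only; the function and constant names are invented.

```python
def parse_ports(raw_values):
    """Convert strings to port numbers, skipping entries that are not integers."""
    ports = []
    for raw in raw_values:  # four spaces per indentation level
        try:
            ports.append(int(raw))
        except ValueError:  # a specific exception, never a bare "except:"
            # Why: one bad entry should not discard the valid ones.
            continue
    return ports


# Constants are named in capital letters.
DEFAULT_PORTS = [
    "80",
    "443",
    "n/a",  # trailing comma: adding an item later touches only one line
]

print(parse_ports(DEFAULT_PORTS))  # [80, 443]
```

The comment inside the `except` block says why bad entries are skipped, which is the part the code itself cannot express.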
46:43 I love it.
46:44 So, two things.
46:45 Lemon and team, well done on the song.
46:47 And man, you've got a great voice.
46:48 That's actually...
46:49 It was beautiful and funny.
46:51 Yeah.
46:52 It was amazing.
46:53 All right.
46:53 Well, Brian, we probably should wrap it up.
46:54 Yeah.
46:55 All right.
46:56 Well, Hannah, thanks so much for being here.
46:58 It's good to have you on the show.
46:59 And Brian, thanks as always.
47:00 Everyone, thanks for listening.
47:01 Thanks for having me.
47:02 Bye.
47:03 Bye, all.
47:04 Thank you for listening to Python Bytes.
47:05 Follow the show on Twitter via @pythonbytes.
47:07 That's Python Bytes as in B-Y-T-E-S.
47:10 And get the full show notes at pythonbytes.fm.
47:13 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
47:17 We're always on the lookout for sharing something cool.
47:20 On behalf of myself and Brian Okken, this is Michael Kennedy.
47:23 Thank you for listening and sharing this podcast with your friends and colleagues.