Transcript #220: What, why, and where of friendly errors in Python
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:04 This is episode 220, recorded February 10th, 2021.
00:09 I'm Michael Kennedy.
00:10 I'm Brian Okken.
00:11 And we have a special guest, Hannah. Welcome.
00:12 Hello.
00:13 Hannah Stepnik, welcome to the show. It is so great to have you here.
00:17 Thank you. I'm happy to be here.
00:18 Yeah, it's good to have you.
00:20 It's so cool, the internet is a global place.
00:22 We can have people from all over.
00:23 So we've decided to make it an all Portland show this time.
00:26 We could do this in person, actually.
00:28 - Well, not really because we can't go anywhere, but theoretically, geographically anyway.
00:32 Yeah, so all three of us are from Portland, Oregon.
00:34 Very nice.
00:35 Before we jump into the main topics, a few quick things.
00:38 One, this episode is brought to you by Datadog.
00:41 Check them out at pythonbytes.fm/datadog.
00:44 And Hannah, do you just want to give people a quick background on yourself?
00:47 - Yeah, so I'm Hannah.
00:50 I have written a book, which is weird to say, about pandas, but I also just go around, I give talks at various conferences, like on Python.
01:01 So yeah, like I gave re-architecting a legacy code base recently.
01:05 - That sounds interesting and challenging.
01:06 - Yeah.
01:07 (laughs)
01:08 - What was the legacy language?
01:09 Was it Python or something?
01:10 - It was Python.
01:12 It was like a Flask web application.
01:15 And then also the front end of it was Vue, like Vue.js.
01:19 - Oh yeah, uh-huh.
01:20 - So yeah, that's been a fun project.
01:22 That was through work as developers.
01:25 Like you're pretty much always working with some form of legacy code, just depends on how legacy it really is.
01:30 - Well, what could be cutting edge in one person's viewpoint might be super legacy in another, right?
01:36 Like it's Python 3.5, you wouldn't believe it.
01:38 - Right.
01:40 - Yeah, very cool.
01:42 Well, it's great to have you here.
01:44 I think maybe we'll start off with our first topic, which is sort of along the lines of the data science world, some tie-ins to your book.
01:51 And of course, whenever you go to JetBrains, you've got to run your CLI to accept the cookies, which is fantastic.
01:57 And so this topic, this first topic I want to cover is from JetBrains and it's entitled, We Downloaded 10 Million Jupyter Notebooks.
02:06 I almost said 10,000, 10 million Jupyter Notebooks from GitHub.
02:09 Here's what we learned.
02:10 So this is an article or analysis done by Alena Guzharina and yeah, pretty neat.
02:16 So they went through and downloaded a whole bunch of these notebooks and just analyzed them.
02:21 There's many, many of them are publicly accessible.
02:24 And a couple of years ago, there were 1.2 million Jupyter notebooks that were public.
02:29 As of last October, it was eight times as many, 9.7 million notebooks available on GitHub.
02:36 That's crazy, right?
02:37 - Wow.
02:38 - Yeah, so this is a bunch of really nice pictures and interactive graphs and stuff.
02:42 So I encourage people to go check out the webpage.
02:45 So for example, one of the questions was, well, what language do you think is the most popular for data science just by judging on the main language of the notebook.
02:54 Hannah, you wanna take a guess?
02:55 - Oh yeah, Python for sure, without a doubt.
02:58 (laughing)
02:59 - That's for sure.
03:01 The second one, I'm pretty sure no one who's not seen this, there's no way they're gonna guess.
03:06 It's NaN.
03:07 We have no idea.
03:11 We looked, we can't tell what language this is in there.
03:14 But then the other contenders are R and Julia.
03:16 And often people say, oh yeah, well Julia, maybe I should go to Julia from Python.
03:20 Well, maybe, but that's not where the trends are.
03:22 Like there's 60,000 versus 9 million, you know, as the ratio, I don't know what that number is, but it's a percent of a percent type of thing.
03:29 - Wow.
03:30 - They also talk about the Python 2 versus 3 growth or difference.
03:34 So in 2008, it was about 50% was Python 2.
03:38 And in 2020, Python 2 is down to 11%.
03:42 And I was thinking about this 11%, like, why do you guys think people, there's still 11% there hanging around?
03:47 I mean I would guess, speaking of legacy applications, probably it just hasn't been touched, but also--
03:55 - Yeah, those are very likely the ones that were like the original 2016, 17 ones that were not quite there.
04:01 They're still public, right?
04:02 GitHub doesn't get rid of them.
04:03 The other one is, I was thinking, a lot of people do work on Mac, or maybe even on some Linux machines that just came at the time with Python 2, that are just like, well, I'm not gonna change anything.
04:13 I just need to view this thing.
04:15 I have Python, problem solved, right?
04:17 They didn't know that there's more than one Python.
04:19 There's a good breakdown of the different versions.
04:21 Another thing that's interesting is looking at the different languages, not language, different libraries used during this.
04:27 So like NumPy is by far the most likely used, and then a tie is Pandas and Matplotlib, and then scikit-learn, and then OS actually for traversing stuff, and then there's a huge long tail.
04:37 And they also talk about combinations, like Pandas and NumPy are common, and then Pandas, and then like Seaborn, scikit-learn, Pandas, NumPy, Matplotlib, and so on as a combo.
04:46 So that's really interesting, like what sets of tools data scientists are using.
04:50 And then another one is they looked at deep learning libraries and PyTorch seems to be crushing it in terms of growth, but not necessarily in terms of popularity.
04:58 So it grew 1.3 times or 130%, whereas TensorFlow is more popular, but only grew 30% and so on.
05:05 So there's a lot of these types of statistics in there.
05:07 I think people will find interesting if they wanna dive more into this ecosystem.
05:12 You know, it's one thing to have survey, you can go fill out the survey, like ask people, what do you use?
05:16 You know, what platform do you run on?
05:17 Vue.js or Linux?
05:19 Like, okay, well that's not really a reasonable question, but I guess Vue.js, you know?
05:23 But if you just go and look at what they're actually doing on places like GitHub, I think you can get a lot of insight.
05:27 - Yeah, for sure.
05:28 Yeah, I know I use, like I'll go to GitHub pretty frequently like at work when I'm, you know, just like browsing, like I wonder how you do this thing, or like what's the most common way to do this?
05:38 - Yeah, absolutely.
05:39 - I just look up like what's the most popular.
05:41 So it's a pretty good sign a lot of people are using it.
05:44 - It is, one thing I should probably make better use of is I know they started adding dependencies, like, oh, if you go to Flask, it'll show you Flask is used in these other GitHub repos and stuff.
05:54 Like, you could find interesting little connections, I think, oh, this other project uses this cool library I know nothing about, but if they're using it, it's probably good.
06:01 - Yeah, for sure.
06:02 - Yeah, I love the dependency feature of looking who's using it.
06:05 - Yeah, absolutely.
06:06 So, Brian, you gonna cover something on testing this time?
06:09 - Yeah, I wanna--
06:10 - If we make you?
06:11 (laughing)
06:13 I wanted to bring up something we brought up before.
06:15 So there's a project called pytest-pythonpath, and it's just a little tiny plugin for pytest.
06:23 And we did cover it briefly way back in episode 62, but at the time I brought it up as, so, okay, so I brought it up as a way to just shim, like be able to have your test code see your source code, but as just like a shortcut, like a stop gap until you actually put together like proper packaging for your source code.
06:47 But the more I talk to real life people who are testing all sorts of software and hardware even, that's a simplistic view of the world.
06:57 So thinking of everybody is working on packages is not real.
07:02 There's applications, for instance, that they're never going to set up, hold their code together as a package.
07:09 And that's legitimate.
07:12 So if you have an application and your source code is in your source directory and your test code is in your test directory, it's just, your tests are just not gonna be able to see your source code right off the bat.
07:24 - Right, right.
07:25 What's more tricky is depending on how you run it, they will or they won't, right?
07:31 If you say run it with PyCharm and you open up the whole thing and it can put together the paths, you're all good, but if you then just go into the directory and type pytest, well, maybe not.
07:39 - It doesn't work and it just confuses a lot of people.
07:41 And so more and more, I'm recommending people to use this little plugin.
07:47 And really, the big benefit is it gives you--
07:54 it does a few things.
07:55 But the biggie is just you can add a Python path setting within your pytest.ini file.
08:03 And you stick your .ini file at the top of your project.
08:06 And then you just give it a relative path to where your source code is like source or SRC or something else.
08:12 And then pytest from then on will be able to see your source code.
08:17 It's a really simple solution.
08:19 It's just, I--
08:21 - That's way better than what I do.
08:23 - I don't think it's a stopgap.
08:24 I think it's awesome, so.
08:25 - Yeah, I totally agree.
08:26 What I do a lot of times is certain parts of my code, I'm like, this is gonna get imported.
08:31 But for me, the real tricky thing is Alembic, the database migration tool and the tests and the web app.
08:38 And usually I can get the tests in the web app to work just fine running them directly.
08:41 But for some reason, Alembic always seems to get weird, like working directories that don't line up in the same way.
08:46 So it can't import stuff.
08:48 So a lot of times I'll put at the top of some file, go to the Python path and add, get the directory name from dunder file and go to the parent, add that to the Python path.
09:00 And now it's gonna work from then on basically.
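The manual approach described here can be sketched roughly as follows. This is a minimal illustration, not code from the show: the helper name `add_parent_to_path` and the example file location are hypothetical.

```python
import os
import sys


def add_parent_to_path(file_path):
    """Put the parent of the directory containing file_path on sys.path.

    Hypothetical helper: in a real script you would pass __file__.
    """
    here = os.path.dirname(os.path.abspath(file_path))
    parent = os.path.dirname(here)
    if parent not in sys.path:
        # Insert at the front so project modules are found first.
        sys.path.insert(0, parent)
    return parent


# Typical use at the top of a script that cannot import project code:
# add_parent_to_path(__file__)
```

The plugin discussed above moves this kind of path setup out of individual files and into one setting in the pytest configuration file.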
09:03 And this seems like a nicer one, although it doesn't help me with Alembic, but still.
09:06 - But it might, you might be able to add the Alembic path right to it, so.
09:12 - Yeah, yeah, for sure.
09:13 - Pretty cool, so it's, yeah, go ahead, Hannah.
09:15 - Oh, I was just gonna say, yeah, like, this is something I like pretty much every time I set up a new project, like, I always have to screw with the Python path.
09:22 I always, like, run it initially, and then it's like, oh, can't find blah, blah, blah, and I'm like, oh, here we go again.
09:28 But I usually always run my projects from Docker, though, so I just, you know, hard code that stuff, like just directly in the environment variables.
09:37 - Once you get it set up, yeah, that's cool.
09:39 Nice.
09:40 I dream of days when I can use Docker again, have an M1 Mac and it's in super early, early beta stages.
09:46 - Oh no.
09:47 - Yeah, it's all good.
09:48 I don't mind too much because I don't use it that much, but it's still cool.
09:50 Brian, it says something about .pth, I'm guessing path files.
09:55 Do you know anything about this?
09:55 I have no idea what those are.
09:57 - Oh, .pth files.
09:58 So there's, yeah, there are a way to, I don't know a lot.
10:04 I don't know the detail, the real big details, but it's a way to have a, you can have a list of different paths within that file.
10:13 And if you import it or don't import it, if you include it in your path, then Python, I think, includes all of the contents into, anyway, actually, I'm blowing smoke.
10:25 I don't know the details.
10:26 - Okay. - Sorry.
10:27 - Yeah, but apparently you can have a little more control with .pth files, whatever those are.
10:31 - Yeah, I don't know much about that either.
10:32 - Yeah. - Unfortunately.
10:34 I mean, I've been using os.path, so what do I know?
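For reference, a `.pth` file is a plain text file in a site directory, and Python's standard `site` module appends each existing directory listed in it to `sys.path`. The sketch below demonstrates that with throwaway temporary directories; the function name `demo_pth_file` and the file name `demo.pth` are made up for illustration.

```python
import os
import site
import sys
import tempfile


def demo_pth_file():
    """Create a throwaway site directory with a .pth file and load it."""
    site_dir = tempfile.mkdtemp()
    extra_dir = tempfile.mkdtemp()  # stands in for a project's src/ folder

    # Each line of a .pth file names a directory to add to sys.path.
    with open(os.path.join(site_dir, "demo.pth"), "w") as fh:
        fh.write(extra_dir + "\n")

    # addsitedir() adds site_dir itself and processes any .pth files in it.
    site.addsitedir(site_dir)
    return extra_dir


extra = demo_pth_file()
print(os.path.abspath(extra) in sys.path)
```

In normal use the `.pth` file sits in an environment's site-packages directory and is picked up automatically at interpreter startup, so no explicit `addsitedir()` call is needed.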
10:36 All right.
10:37 (both laughing)
10:38 Speaking of what do I know, I could definitely learn more about pandas, and that's one of your items here, right, Hannah?
10:43 - Yeah, so-- - Tell us about it.
10:45 - I thought maybe I just give a little snippet of some of the stuff I talk about in the book.
10:53 - Mm-hmm, yeah, fantastic.
10:55 - So yeah, here we go.
10:58 So if we're looking at pandas in terms of the dependency hierarchy, Well, and I guess I should start at the beginning.
11:05 So what is pandas?
11:07 If you're not familiar with it, it's a data analysis library for Python.
11:11 So it's used for doing big data operations.
11:15 And so like, if we look at the dependency hierarchy of pandas, it kind of goes like pandas, which is dependent on NumPy, which deep down is dependent on this thing called BLAS, which is Basic Linear Algebra Subprograms.
11:29 - Right, and wasn't there something with BLAS and a Windows update in a certain version, I think recently, I can't remember.
11:35 I feel like there was some update that made that thing that wasn't working.
11:38 - Yeah, usually--
11:39 - So there was a big challenge around NumPy and versioning and stuff to make it work in the short term privacy, okay.
11:43 - Yeah, usually the BLAS library is built into your OS already, and it just points at that.
11:49 But if you're using something like Anaconda, I think by default it installs Intel MKL and uses that.
11:58 But yeah, if you're using Linux or just out of the box, whatever's on Windows, which is what it is if you pip install it, then yeah, there could certainly be issues with dependency mismatches.
12:10 Yeah, so, and I've greatly simplified this, but in terms of kind of the languages and walking down that dependency hierarchy, you start out in Python with pandas, And then NumPy is partially Python and partially C.
12:30 And then BLAS is pretty much always written in assembly.
12:33 And if you don't know what assembly is, it's basically like a very, very, very, like probably the lowest level language you can program in.
12:39 And it's essentially like CPU instructions for your processor.
12:44 And so I've taken this just like basic example here and I'm gonna kind of like roll with it.
12:51 So if we're doing just like a basic addition in pandas, say like we have column A and we want to add that with column B and like store it back into column C.
13:01 Like a traditional linear algebra vector addition type thing.
13:05 Yes.
13:06 Traditional vector math.
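The column addition being described looks roughly like this in pandas. It is a small sketch with made-up data; only the column names A, B, and C come from the discussion.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Vectorized: one expression adds every row, with no Python-level loop.
df["C"] = df["A"] + df["B"]

# Equivalent row-by-row loop, shown only for contrast.
# It is typically much slower on large frames.
looped = [a + b for a, b in zip(df["A"], df["B"])]
```

Both forms give the same values; the vectorized form hands the whole column to NumPy at once, which is what lets the lower layers do the work efficiently.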
13:08 So pandas, like if you look at these operations, each of these like additions on a per row basis is independent, meaning like you could conceivably run like each of those additions for each row like in parallel.
13:23 There's no reason why you have to go row by row.
13:26 And that's essentially what big data analysis libraries are at their core, is they understand this conceptually and try to parallelize things as much as possible.
13:38 And so that's kind of the first fundamental understanding that you have to have when working with pandas is you should be doing things in parallel as much as you can, which means understanding the API and understanding which functions in the API will let you do things in parallel.
13:54 So if we're just not using pandas at all, say we're just inventing our own technique for this, you might think, well, each of these rows could be broken up into a thread.
14:07 So we could say thread one is going to run the first row addition, and then thread two is going to run the second row, et cetera.
14:15 But you might find that we'll run into issues with this in terms of the GIL.
14:20 So like the GIL is otherwise known as the global interpreter lock in Python prevents us from really like running a multi-threaded app operation like in parallel.
14:32 Basically Python can run, the rule is it can run one Python opcode at a time.
14:39 Yeah.
14:39 And that's it, right?
14:40 It doesn't matter if you've got, you know, 16 cores.
14:43 It's one at a time.
14:44 Yeah, yeah.
14:46 And this is really terrible for trying to do things in parallel, right?
14:53 So that kind of use case is out.
14:57 Like, pandas and NumPy and all that stuff is not going to be able to use multithreading.
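The "one thread per row" idea can be sketched with the standard library as below. The data and the `add_row` helper are hypothetical. The result is correct, but in a standard CPython build with the GIL the threads take turns running Python bytecode, so this gives no real parallel speedup for CPU-bound work like addition.

```python
from concurrent.futures import ThreadPoolExecutor

col_a = [1, 2, 3, 4]
col_b = [10, 20, 30, 40]


def add_row(pair):
    """Add the two values of one row."""
    a, b = pair
    return a + b


# One task per row, as in the "thread per row" idea.
# pool.map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    col_c = list(pool.map(add_row, zip(col_a, col_b)))
```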
15:01 And so, I just want to point out, like, Python, at its core, has this fundamental problem, which is why they went with the GIL.
15:15 So like Python manages memory for you.
15:18 And how it does that is it keeps track of references to know when to free up memory.
15:27 So like when memory can be like completely destroyed and somebody else can use it essentially.
15:33 And like that's something--
15:34 - Otherwise you gotta do stuff like, Brian sometimes probably has to do with C and like malloc and free and all those things, right?
15:40 - Yeah, exactly, yeah.
15:41 Yeah, so like C you have to do this yourself with like malloc and free and all that stuff.
15:46 But with Python, it does it for you, but that comes at a cost, which means like every single object in Python has this little like counter, which is like a reference counter.
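The per-object reference counter can be observed in CPython with `sys.getrefcount`. This is a small sketch: the absolute numbers vary between Python versions (the call itself may add a temporary reference), so only the changes are meaningful.

```python
import sys


def refcount_demo():
    """Show the count rising and falling as a second reference comes and goes."""
    data = []                        # a fresh object
    before = sys.getrefcount(data)
    alias = data                     # a second reference to the same list
    during = sys.getrefcount(data)
    del alias                        # drop the second reference
    after = sys.getrefcount(data)
    return before, during, after


before, during, after = refcount_demo()
print(during - before, after - before)
```

When the count for an object drops to zero, CPython frees its memory right away, which is the bookkeeping that has to stay consistent when several threads touch the same object.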
15:56 And so basically like way back in the day, like when threading first became a thing, like in order to kind of like avoid this threading problem, they came up with the GIL, which basically says you can only run one thread at a time, or like one opcode at a time, as you said.
16:15 - And attempts have been made to remove it.
16:17 Like Larry Hastings has been working on something called the Gilectomy, the removal of the GIL, for a while.
16:23 And the main problem is if you take it away, the way it works now is you have to do lock on all memory access, all variable access, which actually has a bigger hit than a lot of the benefits you would get, at least in the single threaded case.
16:35 And I know Guido said, like, if we really don't want to make changes to this, if it's going to mean slower single threaded Python, they're probably not for a while.
16:43 - Yeah, yeah, yeah, and that is a big problem.
16:46 So like, I mean, generally what people use, like instead of threads in Python, is they use like multiprocessing and they spin up multiple Python processes, right, and like that truly kind of like achieves the parallelism.
17:00 But anyways, I digress.
17:03 So we can't use the GIL, but what's interesting to note is when you're running NumPy at its very low level in C, like when you enter and look at the C files, it actually is not subject to the GIL anymore 'cause you're in C.
17:20 And so you can potentially run, you know, multi-threaded things in C and call it from Python.
17:27 But beyond that, if we look at BLAS, BLAS has built-in like parallelization for like hardware parallelization.
17:38 And how it does that is through vector registers.
17:42 So if you're not familiar with like the architecture of CPUs and stuff, like at its core, you basically only have like, only can have a certain small set, maybe like three or four values in your CPU at any one time that you're running like adds and multiplies on.
18:02 And like how that works is you load those values like into the CPU from memory.
18:08 and that load can be quite time consuming.
18:10 It's really just based on how far away your memory is from your CPU at the end of the day, like physically on your board.
18:17 >> Right. Is it in cache?
18:18 Is it in regular RAM?
18:20 >> Yes. That's why we have caches.
18:22 Caches are memory that's closer to your CPU.
18:26 Consequently, it's also smaller.
18:28 But that's how you might hear people say, "So-and-so wrote this really performant program and it utilizes the size of the cache" or whatever.
18:38 So basically, if you can load all of that data into your cache and run the operations on it without ever having to go back out to memory, you can make a really fast program.
18:49 - Yeah, yeah, it could be like 100 times faster than regular memory.
18:52 - Yeah, yeah, and so essentially, that's what BLAS is trying to do underneath and NumPy is they're trying to take this giant set of data and break it into chunks and load those chunks into your cache and operate on those chunks and then dump them back out to memory and load the next chunk.
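The chunking pattern can be sketched conceptually in plain Python as below. This only illustrates the load, operate, store loop: pure Python has no control over the CPU cache, and the function name `add_in_chunks` and the `chunk_size` value are hypothetical. The real libraries do this in native code with sizes tuned to the hardware.

```python
def add_in_chunks(a, b, chunk_size=4):
    """Add two equal-length lists one fixed-size chunk at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    for start in range(0, len(a), chunk_size):
        # "Load" one chunk of each input.
        chunk_a = a[start:start + chunk_size]
        chunk_b = b[start:start + chunk_size]
        # Operate on the chunk, then "store" the results.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    return out


result = add_in_chunks(list(range(10)), list(range(10, 20)))
```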
19:14 - Very cool, thanks for pointing that out.
19:16 I didn't realize that BLAS leveraged some of the OS native stuff, nor that it had special CPU instruction type optimization.
19:24 That's pretty cool.
19:25 - Yeah, yeah.
19:26 So it has, on top of the registers, it also has these things called vector registers, which actually can hold multiple values at a time in your CPU.
19:37 So we could take this simple example of the addition and we could actually, well we can't run those per row calculations in parallel with threads, we can with vector registers.
19:51 - Okay, yeah.
19:52 - And the limitation there is that the memory has to be sequential when you load it in.
19:58 - This is definitely at a level lower than I'm used to working at.
20:00 How about you, Brian?
20:01 (laughing)
20:03 But yeah, so anyways, this is just like kind of the stuff that I talk about in my book.
20:09 It's not necessarily about like how to use pandas, but it's about like kind of like what's going on underneath pandas.
20:16 And then like once you kind of like build that foundation of understanding, like you can understand like better how pandas is working and like how to use it correctly and what all the various functions are doing.
20:27 - Fantastic, yeah, so people can check out your book.
20:29 Got a link to it in the show notes, so very nice.
20:31 It's offering me the Euro price, which is fine.
20:36 I don't mind.
20:39 - Yeah, so it's on Amazon too.
20:41 It's on a lot of different platforms, but I figured I'd just point directly to the publishers.
20:45 - Yeah, no, that's perfect.
20:49 Quick comment, Roy Larson says, NumPy and Intel MKL cause issues sometimes, particularly on Windows, if something else in the system uses Intel MKL.
20:57 - Yeah, interesting.
20:58 I have no experience with that, but I can believe it.
21:00 Intel has a lot of interesting stuff.
21:01 They even have a special Python compiled version, I think, for Intel to use potentially.
21:07 I'm not sure, they have some high performance version.
21:08 - Yeah, yeah, yeah, they do, yeah.
21:11 - Also in Portland, keep it in Portland, there we go.
21:14 Now, before we move on to the next item, let me tell you about our sponsor today.
21:19 Thank you to Datadog.
21:21 So they're sponsoring Datadog.
21:23 And if you're having trouble visualizing latency, CPU, memory bottlenecks, things like that in your app, and you don't know why, you don't know where it's coming from or how to solve it, you can use Datadog to correlate logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python app.
21:38 Plus, they have a continuous profiler that allows you to find the most resource consuming parts of your production code all the time at any scale with minimal overhead.
21:46 So you just point out your production server, run it, which is not normally something you want to do with diagnostic tools, but you can with their continuous profiler, which is pretty awesome.
21:54 So be the hero that got that app back on track at your company, get started with a free trial at pythonbytes.fm/datadog, or just click the link in your podcast or your show notes.
22:03 Now, I'm sure you all have heard that working with pickle has all sorts of issues, right?
22:09 The pickle is a way to say, take my Python thing, make a binary version of bits that looks like that Python thing so I can go do stuff with it, right?
22:16 That's generally got issues, not the least of which actually are around the security stuff.
22:23 So like you unpickle, something to deserialize it back is actually potentially running arbitrary code.
22:28 So people could send you a pickle virus.
22:31 I don't know what that is, like a bad, a rotten pickle or whatever.
22:33 That wouldn't be good.
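The security issue comes from the fact that a pickle can tell the loader to call a function. The sketch below uses a deliberately harmless callable, `os.getcwd`; the class name `Sneaky` is made up for the example.

```python
import os
import pickle


class Sneaky:
    """An object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # pickle stores "call os.getcwd with no arguments".
        # An attacker could name a far more dangerous callable here.
        return (os.getcwd, ())


payload = pickle.dumps(Sneaky())

# Loading does not rebuild a Sneaky object.
# It calls os.getcwd() and returns whatever that call returns.
loaded = pickle.loads(payload)
```

This is why the standard library documentation warns against unpickling data from untrusted sources.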
22:34 So there's a library I came across that solves a lot of the pickle problems.
22:39 It's supposed to be faster than pickle and it was cleverly named Quickle.
22:43 (laughing)
22:45 Have either of you heard of this thing?
22:46 - No.
22:47 - Yeah, it's cool, right?
22:48 So here's the deal.
22:50 It's a fast serialization format for a subset of Python types.
22:54 You can't pickle everything, but you can pickle way more, say, than JSON.
22:58 And the reasons they give to use it are it's fast.
23:02 If you check out the benchmarks, I'll pull those up in a second, it's one of the fastest ways to serialize things in Python.
23:07 It's safe, which is important.
23:09 Unlike pickle, deserializing a user-provided message does not allow arbitrary code execution, hooray.
23:15 That seems like the minimum bar.
23:17 Like, oh, I got stuff off the internet.
23:18 Let's try to execute that.
23:19 What's that gonna do?
23:21 Oh, look, it's reading all my files, that's nice.
23:23 All right.
23:25 Also, it's flexible 'cause it supports more types.
23:28 And we'll also learn about a bunch of other libraries while we're at it here, which is kind of cool.
23:33 A bunch of things I heard of like msgpack, or well, JSON, you may have heard of that.
23:37 And the other main problem you get with some of these binary formats is you can end up where, in a situation where you can't read something if you make a change to your code.
23:45 Like, so imagine I've got a user object and I've pickled them and put them into a Redis cache.
23:50 We upgrade our web app, which adds a new field to the user object.
23:54 That stuff is still in cache.
23:55 After we restart, we try to read it.
23:57 Oh, that stuff isn't there anymore.
23:58 You can't use your cache anymore.
24:00 Everything's broken, et cetera, et cetera.
24:02 So it has a concept of schema evolution, having different versions of like history.
24:07 So there's ways that older messages can be read without errors, which is pretty cool.
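The general idea of reading older messages after a field is added can be illustrated as below. This is not Quickle's mechanism or API: it swaps in plain JSON and a standard-library dataclass, and the `User` class, its fields, and `load_user` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    # Field added in a later release; the default lets old messages load.
    plan: str = "free"


def load_user(raw):
    """Decode a cached JSON message into the current User class."""
    return User(**json.loads(raw))


# A message written before the "plan" field existed, and one written after.
old_message = json.dumps({"name": "ada", "email": "ada@example.com"})
new_message = json.dumps({"name": "bob", "email": "bob@example.com", "plan": "pro"})

old_user = load_user(old_message)
new_user = load_user(new_message)
```

Because the new field has a default, the message cached before the upgrade still decodes instead of breaking the cache.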
24:11 - Yeah, that's nice.
24:12 - Yeah, neat, huh?
24:13 I'll pull up the benchmarks.
24:14 There's actually a pretty cool little site here.
24:16 shows you some examples on how to use it.
24:17 I mean, it's incredibly simple.
24:19 It's like, dump this as a string, read this, deserialize this, it's real simple.
24:23 So, but there's quite interesting analysis, live analysis where you can click around and you can actually look at like load speed versus read, like serialize versus deserialize speed, how much memory is used and things like that.
24:36 And it compares against pickle tuples, protobuf, pickle itself, orjson, msgpack, quickle, and quickle Structs.
24:45 There's a lot of things.
24:46 I mean, I knew about two of those, I think.
24:48 That's cool.
24:49 But these are all different ways.
24:50 And you can see, like in all these pictures, generally, at least the top one where it's time shorter is better, right?
24:55 So you can see, if you go with their quickle Structs, it's, quick rule of thumb, maybe four or five times faster than pickle, which I presume is way faster than JSON, for example.
25:05 And you'll also see the memory size, which actually varies by about 50% across the different things.
25:10 Also speed of load in a whole bunch of different objects and so on.
25:14 So yeah, you can come check out these analyses here.
25:17 Let's see all the different libraries that we had.
25:19 Yeah, I guess we read them all off basically there.
25:21 But yeah, there's a bunch of different ways which are not pickle itself to do this kind of binary serialization, which is pretty interesting, I think.
25:29 - It does protobuf, that's pretty cool.
25:31 Actually, I wanna try this out.
25:33 It looks neat. - Yeah, yeah.
25:34 It looks really neat, right?
25:35 - And one of the things, I was just looking at the source code.
25:37 I love that they use pytest to test this.
25:40 Of course, you should use pytest.
25:43 But the, I can't believe I'm saying this, but this would be the perfect package to test with a Gherkin syntax, don't you think?
25:50 'Cause it's a pickle thing.
25:52 - Oh my gosh, you've got to use the Gherkin syntax.
25:55 (laughing)
25:57 Yeah, you definitely should.
25:59 And Roy threw out another one: uqfoundation's
26:03 dill package deals with many of the same issues, but because it's binary, it has all the same sort of versioning challenges you might run into as well.
26:10 - dill, the dill package, that's funny.
26:12 (laughing)
26:14 - Yeah, pretty good, pretty good.
26:15 All right, so anyway, like, you know, I'm kind of a fan of JSON these days.
26:19 I've had enough XML with custom namespaces in my life that I really don't want to go down that path and XSLT and all that, but, you know, I've really shied away from these binary formats for a lot of these reasons here, but you know, this might make me interested.
26:33 If I was gonna say throw something into a cache, the whole point is put it in the cache, get it back, read it fast, this might be decent.
26:39 - Yeah, yeah, it definitely seems to address a lot of the concerns I have with pickle, for sure.
26:44 - Yeah, and I don't, did I talk about the types?
26:46 Somewhere in here we have, yeah, here's, there's quite a list of types.
26:50 You know, one's really nice, date/time.
26:51 You can't do that with JSON.
26:52 Why in the world doesn't JSON support some sort of time information?
26:56 Oh, well, but you've got most of the fundamental types that you might run into.
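The date/time limitation of JSON is easy to see with the standard library. The sketch below shows the failure and one common workaround, storing an ISO 8601 string; the `recorded` key and the timestamp are made up.

```python
import json
from datetime import datetime

stamp = datetime(2021, 2, 10, 9, 30)

try:
    json.dumps({"recorded": stamp})
    failed = False
except TypeError:
    # The json module has no built-in encoding for datetime objects.
    failed = True

# Common workaround: store an ISO 8601 string and parse it back on load.
encoded = json.dumps({"recorded": stamp.isoformat()})
decoded = datetime.fromisoformat(json.loads(encoded)["recorded"])
```

The workaround relies on both sides agreeing that the field holds a timestamp, which is the bookkeeping a format with a native date/time type avoids.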
27:00 All right, so Quickle, give it a quick look.
27:03 (both laughing)
27:04 All right, Brian, what you got here?
27:06 - Well, I was actually reading a different article, But it came up--
27:13 I think we've talked about Friendly Traceback.
27:16 It's a package that just sort of tries to make your tracebacks nicer.
27:20 But I didn't realize it had a console built in.
27:24 So I was pretty blown away by this.
27:27 So it's not trivial to get set up.
27:30 It's not that terrible.
27:31 But you have to start your own console, start the REPL, import friendly_traceback, and then call friendly_traceback.start_console().
27:39 But at that point, you have just like the normal console, but you have better tracebacks.
27:46 And then also you have all these different cool functions you can call, like what, where, why, and explain and more.
27:56 And basically if something goes wrong while you're playing with Python, you can interrogate it and ask for more information.
28:04 And that's just pretty cool.
28:06 The why is really great.
28:08 So if you have one of the examples I saw before, and I think I might start using this when teaching people, is we often have exceptions like you assigned to none, or you assigned to something that can't be assigned, or you didn't match up the bracket in the parenthesis or something like that correctly.
28:27 You'll get just syntax error, and it'll point to the syntax error, but you might not know more.
28:34 So you can just type why(), with parentheses because it's a function.
28:40 It'll tell you why.
28:42 >> Why?
28:43 >> Where?
28:43 >> It's like the great storytelling, the five whys of a bug.
28:47 >> Yeah.
28:49 >> The five whys of a bug.
28:51 >> Yeah. You can say what to repeat what the error was, why it will tell you why that was an error, and then specifically what you did wrong, and then where it will show you.
29:02 If you've been asking all sorts of questions, and you lost where the actual trace back was, you can say where and it'll point directly to it.
29:10 I think this is going to be cool.
29:12 I think I'll use this when trying to teach, especially kids, but really just people new to Python.
29:16 Tracebacks can be very difficult.
29:18 >> It's going to be really helpful for them.
29:20 I know I sometimes have to look up certain error messages that I'm not familiar with.
29:25 Yeah, that would be super helpful.
29:27 I could just do it right in the console.
29:28 >> Yeah, I totally agree. You're going to have to help me find a W that goes with this.
29:32 But what I want would be effectively google, open, close parentheses.
29:37 You know, because so often you get this huge traceback and you've got these errors.
29:43 And if you go through and you select it, like, for example, the error you see on the screen, an UnboundLocalError, local variable 'greetings' in quotes, referenced before assignment.
29:52 Well, the quotes oftentimes mean in search that it must have the word greetings.
29:56 And that's the one thing that is not relevant to the Googling of it.
30:00 So if I'm a beginner and I even try to Google that, I might get a really wrong message.
30:05 If you could say, Google this in a way that is most likely going to find the error, but without carrying through variable details, file name details, but just the essence of the error, that would be fantastic.
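The "search for the essence of the error" idea could be sketched roughly like this. It is a hypothetical helper, not a feature of Friendly Traceback: the name searchable_message and the rule of simply dropping anything in quotes are assumptions for illustration, and the regex would mishandle messages that contain apostrophes such as "can't".

```python
import re


def searchable_message(exc: BaseException) -> str:
    """Return 'ExceptionType: message' with quoted, code-specific names removed."""
    text = f"{type(exc).__name__}: {exc}"
    # Drop anything in single or double quotes (variable names, file names).
    text = re.sub(r"""(['"]).*?\1""", "", text)
    # Collapse the whitespace left behind by the removal.
    return re.sub(r"\s+", " ", text).strip()


def greet():
    print(greetings)     # read before the assignment below
    greetings = "hello"  # this assignment makes `greetings` local to greet()


try:
    greet()
except UnboundLocalError as exc:
    print(exc)                      # full message, including 'greetings'
    print(searchable_message(exc))  # same message without the variable name
```

The message text for UnboundLocalError differs between Python versions, but in each case the variable name is the part that would be stripped before searching.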
30:17 Now, how do we say that with W?
30:20 >> You just say, whoa.
30:23 >> Or maybe www.
30:27 >> There you go.
30:28 >> Or WTF. I mean, come on, there's some options here.
30:30 >> WTF. WTF is good.
30:32 >> Wouldn't that be great? That's also part of this package that you see at their main site, where you've got this really cool visualized stuff, where it tries more to tell you the problem behind the error with the help text and whatnot.
30:45 >> Yeah.
30:45 >> Yeah, this is cool. Also uses Rich, which is a cool library we talked about previously as well.
30:50 >> I love Rich. I include Rich in everything now, even just to print out simple better tables. It's great.
30:56 >> Yeah, for sure.
30:57 - Hannah, do you see yourself using this, or are you more in notebooks?
31:02 - Oh, no, I mean, I usually use the PDB debugger, so yeah, I mean, I'm not sure if this as it is would be a problem, it would depend on how much information it has about obscure errors from dependent libraries, which is usually what I end up looking at these days.
31:22 But yeah, I mean, conceivably, yeah, that could be helpful.
31:25 - Yeah, if we get that WTF feature added, then it's gonna go-- - Yeah, oh yeah, for sure, gosh.
31:29 (laughing)
31:31 - Speaking of errors, let's cover your last item, last item of the show.
31:34 - Woohoo, yeah, so at work I'm in the security org, and I write automation tools for them, which means sometimes the repos that we work on get to be test subjects for new requirements and such. And so recently, at work we were exploring static code analysis, looking for security vulnerabilities in the code. And so I ran across Bandit and I integrated Bandit into our...
32:11 We don't have time to go through this old legacy code and fix these problems. Oh, wait, this is what it means? Oh, sorry. Yes, we can do that right now. That's the kind of report you got from Bandit?
32:22 Yeah, exactly.
32:24 So yeah, we integrated Bandit into our legacy code base.
32:28 And we actually, it's funny you say that because the bug that I found using Bandit was actually from the legacy code.
32:35 That does not surprise me.
32:38 Yeah.
32:39 So it was a pretty stupid error.
32:43 It was pretty obvious if you were doing a code review, but because it was legacy code and it was already there, I just like never noticed, but it was basically like issuing like a request with like no verify.
32:56 So it was like an unverified like HTTP request.
33:00 And then it was like, no.
33:02 This broken SSL certificate keeps breaking.
33:04 I just told you to ignore it.
33:06 Oh, yeah.
33:07 Yeah.
33:08 Well, and I honestly like I think that might have been why it was there in the first place.
33:11 Because I know like the like several years ago, like had some certificate issues.
33:17 So yeah, that might be, and it was like an internal talking to internal, so it was like, eh.
33:25 - Maybe even a self-signed certificate that nothing trusted, but like, it technically was there.
33:30 - Yeah, it was like, eh, we'll just do that.
33:32 But yeah, so Bandit is basically like a linter, but it looks for security issues.
33:41 So you could just pip install it, and then just run it on your code and it will find a bunch of different potential security issues just by statically analyzing your code.
33:49 And I've pretty much come to the opinion that why haven't I done this on all of my other projects?
33:56 I should be doing this on every single project.
34:00 Because as a developer, I always run Lint and Black and stuff like that.
34:06 So I figured I should probably be running Bandit, too.
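To make "statically analyzing your code" concrete, here is a toy sketch of the kind of check being described: finding calls that pass verify=False without ever running the code. This is not how Bandit is implemented, the function name find_unverified_requests is made up, and the URL in the sample is a placeholder.

```python
import ast


def find_unverified_requests(source: str) -> list[int]:
    """Return the line numbers of calls that pass the keyword verify=False."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "verify"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is False
            ):
                flagged.append(node.lineno)
    return sorted(flagged)


# The sample is only parsed, never executed, so `requests` need not be installed.
SAMPLE = '''\
import requests

ok = requests.get("https://internal.example/api")
bad = requests.get("https://internal.example/api", verify=False)
'''

print(find_unverified_requests(SAMPLE))
```

A real tool covers many more patterns and reports severity and confidence, but the principle is the same: walk the syntax tree and flag risky constructs.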
34:10 - Yeah, cool.
34:11 Yeah, well, very nice.
34:12 It's a good recommendation for people as well.
34:14 And it's got a lot of cool, you can go and actually see the list of the things that it tests for and even has test plugins as well, which is pretty cool.
34:21 - Yeah, yeah.
34:22 So you can like make your own if you want.
34:24 And it has like all the common linter sort of functionality, like ignore these files or ignore these rules or even ignore this rule on this particular line, stuff like that.
34:35 - Yeah, absolutely.
34:36 - Which is pretty sweet.
34:36 - I love that things like Bandit are around because thankfully, developing web stuff is becoming easier and easier, but it's then now the barrier to entry is lower.
34:50 You still have to have all the security concerns that you had before that normal, I mean, usually people just had more experience, but they would make mistakes anyway.
34:59 But now I think this is one of the reasons why I love this is because people new to it might be terrified about the security part, but having Bandit on there looking over their shoulders.
35:08 Great.
35:09 - Yeah. - Yeah.
35:10 - Like don't publish with the debug setting on in Flask or Django or anything like that.
35:15 - Simple, obvious stuff.
35:16 And like, honestly, like having worked in the security org for about a year now, like I've come to the understanding that a lot of security issues stem from just like basic, like duh, sort of misconfigurations.
35:31 So like something like this is perfect.
35:33 - And I really like that you added, you wrote in the show notes, a pre-commit, how to hook this up with pre-commit, because I think having it in pre-commit or in a CI pipeline is important because like you guys were joking about, often security problems come in because somebody's just trying to fix something that broke, but they don't really realize how many other things it affects.
35:58 - Yeah. - Yeah.
35:59 - Besides that, we gotta make it work quick.
36:01 Just turn on the debug thing.
36:02 Just look real quick and then you forget to turn it off or whatever, yeah.
36:05 - Yeah, for sure.
36:06 Yeah, yeah, just stupid human errors.
36:09 - Nice, all right, I wanna go back real quick, Brian, 'cause your mention of friendly traceback got a lot of stuff so let me just do a quick audience reaction.
36:19 Robert says, "It is cool, Brian." John Sheehan says, "I was just thinking something the same would be cool.
36:24 It's a great teaching concept." Anthony says, "Super useful." John says, "I've been doing more demo code in the console rather than an IDE, and this looks like it would help."
36:33 W how to fix it?
36:35 W wow how W.
36:37 I love it Robert.
36:38 Very good.
36:39 Zach says, what is this magic?
36:42 This looks amazing.
36:43 And so on.
36:44 All right.
36:44 Well, thanks everyone.
36:45 I'm glad you all like that.
36:47 So that's it for our main items.
36:49 You know, Brian, you got any extras you want to throw out there?
36:53 You were doing some of the climate change or what are you doing this week?
36:56 Yeah, I'm sharing a room with some people.
36:59 Just like the I did do two meetups with with Noah and then with the Aberdeen Python meetup.
37:09 Wait, I gotta interrupt you really quick.
37:10 Did all that talk that Hannah did about Bandit and viruses get you?
37:14 I'm sorry, sorry about that.
37:21 Carry on.
37:22 I missed it.
37:23 Did all this talk that Hannah had about viruses and hacking and stuff with Bandit,
37:29 Did it make you nervous and you had to put on your mask?
37:32 - No, just I'm in a group meeting in a group room and somebody came in.
37:37 - It's okay, I'm just teasing, carry on.
37:39 - That's funny, I also wanted to look like a bandit.
37:42 - Yeah, exactly.
37:44 - But I was thrilled that Noah asked me to speak to them, that was neat, and then the Python Aberdeen people.
37:51 And also, they mentioned that Ian from the Python Aberdeen group said that he had an arrangement with you, Michael, that when the pandemic is over, you're gonna go over and you're gonna do like a whiskey tour or something like that.
38:06 - I don't know the details, but it sounds good to me already.
38:08 Let's get this happening.
38:09 - If that happens, I wanna go along.
38:11 - It's a Python Bytes outing, let's do it.
38:13 - And then we have the PDX West meetup tomorrow.
38:18 You're gonna speak, that's kind of exciting.
38:20 - Yeah, it's gonna be fun.
38:21 And people, it's virtual, so people can attend however.
38:24 - Also, I've got feedback from both you and Matt Harrison.
38:30 So I'm updating my training page on testing code because I really like working with teams.
38:36 So if anybody else wants to give me feedback on my training page, I'd love to hear it.
38:42 So that's good.
38:43 - Yeah, or maybe they even want to have some pytest training for their team.
38:46 - Yeah, I mean, testing is something that I think teaching a team at a time is a great thing because people can really, I don't know, we can talk about their particular problems, not general problems, it's good.
38:58 - Yeah, for sure.
38:59 Well, you also need more of a team buy-in on testing, right?
39:01 'Cause like if one person writes code and won't write the test, and another person is like really concerned about making the test fast, it's super frustrating when the person who doesn't wanna run the test keeps breaking the build.
39:11 But anyway, it's a team sort of sport in that regard.
39:15 - Yep. - Yeah.
39:16 - All right, awesome.
39:16 So I got a couple of quick things.
39:18 PEP 634, structural pattern matching in Python has been accepted for Python 3.10.
39:23 That's like, imagine a switch case that has about 100 different options.
39:28 That's what it is.
39:29 - Yeah.
39:30 - With like, like regex, not quite, but sort of like that style, like you can have these patterns and stuff that happen in the cases.
39:36 I don't know how to feel about this.
39:37 Like, let me put it in perspective: if the walrus operator was controversial, this is like a way bigger change to the language.
39:46 So I don't know.
39:47 - It's both awesome and terrifying.
39:48 - Yes, exactly.
39:49 - Yeah, I was gonna say I'm kind of surprised.
39:51 - Yeah, yeah, so on my end, like this got accepted.
39:54 It seemed to be sort of counter to the simplicity of Python.
39:57 Like I'm not at all against having a simple switch statement that does certain things, but this seems like a lot.
40:02 I may come to love it.
40:03 One thing that maybe would help me come to a better understanding and acceptance was if the PEP page had at least one example of it in use.
40:10 Like the whole page that talks about all the details, I don't believe there's a single code sample anywhere.
40:15 - Well, there's a tutorial page as well.
40:17 - Oh, is there?
40:18 There's the tutorial page.
40:19 Okay, maybe that's where I should be going to check it out.
40:21 Yeah.
40:22 - This sort of feels like a five barrel foot gun.
40:25 - Yeah, it does.
40:26 Well, but the page that I'm looking at, like the PEP thing that I'm looking at, the official PEP, I don't think it has, does it have a tutorial?
40:32 Yeah, no, you're right, it does.
40:33 It does somewhere down.
40:35 - Yeah, PEP 636.
40:37 - Yeah, it's a different PEP that is the tutorial for the PEP.
40:39 Interesting, I didn't realize that.
40:40 It's kind of meta, honestly.
40:42 Anyway, to me, I'm a little surprised this was accepted.
40:44 Fine.
40:45 I know people worked really hard on it, and congratulations, a lot of people really want it.
40:48 It comes from Haskell, right?
40:50 So Haskell had this like pattern matching, and alternate struct thing.
40:53 I don't know, I just feel like Haskell and Python are far away from each other.
40:56 So that's my first impression.
40:58 I will probably come to love it at some point.
41:00 PyCon registration is open.
41:02 So if you want to go to PyCon, you want to attend and be more part of it than just watching the live stream on YouTube, be part of that.
41:07 I think I'm going to try to make a conscious effort to attend the virtual conference, not just catch some videos.
41:12 So you can do that.
41:13 - PyCon is awesome.
41:15 My first conference was PyCon, and then I went to other conferences, and I was like, what is wrong with these conferences?
41:23 Like, why do they suck so much?
41:25 - I know, I feel the same way.
41:27 I know.
41:28 It's really, really special.
41:30 I'm sure the virtual one will be good.
41:31 I can't wait for the in-person stuff to come back 'cause it really is a new experience.
41:34 - For sure, yeah.
41:35 It's a whole nother experience in person.
41:37 - I consider it basically my geek holiday where I get away and just get to hang out with my geek friends.
41:42 I happen to learn stuff on there.
41:44 - Totally.
41:45 - And then Python WebConf is coming up, and registration is open for that as well.
41:50 And I suppose probably PyCascades, which Brian and I are on a panel out there as well.
41:55 - Oh, nice.
41:55 - I put a link into an hour of code for Minecraft, which has to do with programming Minecraft with Python.
42:01 If people are looking to teach kids stuff, that looks pretty neat.
42:04 So my daughter's super into Minecraft.
42:06 I don't do anything with it.
42:07 But if you are and you wanna make it part of your curriculum, that's pretty cool.
42:10 Hannah, anything you wanna throw out there before we break out the joke?
42:13 - Nope, I'm good.
42:15 - Awesome.
42:16 - Do it, do it.
42:17 All right, so this one, we have something a little more interactive for everyone.
42:21 We've got a song about PEP 8, about writing clean code.
42:25 This is written, produced, and sung by Leon Sandoy, who goes by Lemon, and his team over at Python Discord.
42:33 He runs Python Discord, and apparently it was a team effort creating this, and the reason I'm covering it is a bunch of people sent it over.
42:39 So Michael Rogers of LA sent it over and said you should cover this, Dan Bader said, check this out.
42:43 Alan McElroy said, hey, check out this thing.
42:45 So, all right, I actually spoke to Lemon and said, "Hey, do you mind if we play this?" He said, "No, that'd be awesome.
42:51 Give us a shout out." I said, "Of course." So we're gonna actually play the song as part of this.
42:54 In the live stream, you get the video.
42:56 On the audio, you get, well, audio.
42:58 So I'm gonna kick this off and we'll come back, and I'd love to hear Brian and Hannah's thoughts.
43:02 Here we go.
43:03 (gentle piano music)
43:15 You don't need any curly braces, just four spaces, just four spaces. Wildcard imports should be avoided in most cases, in most cases. Try to make sure there's no trailing whitespace, it's confusing, it's confusing. Trailing commas go behind list items, git blamed items, git blamed items. And comments are important, as long as they're maintained. When comments are misleading, it will drive people insane. Just try to be empathic, just try to be a friend. It's really not that hard, just adhere to PEP 8, PEP 8. Constants should be named all capital letters and live forever, live forever. And camel case is not for Python, never ever, never ever. And never use a bare exception, be specific, be specific. No one likes the horizontal scrollbar, keep it succinct, keep it succinct. And comments are important, as long as they're maintained. When comments are misleading, it will drive people insane. Just try to be empathic, just try to be a friend. It's really not that hard, just adhere to PEP 8, PEP 8, PEP 8, PEP 8. That was amazing. I can sympathize with so much of what he's saying. I'm just having flashbacks to a discussion I had with my teammate about comments. And being like, "No, this comment doesn't actually describe what the code is doing." It's worse than having no comment. It really is.
46:30 - It really is, yeah.
46:31 Or like if it describes like literally what the code is doing and not like, you know, kind of like high level sort of--
46:38 - Why or background or anything other than--
46:40 - The why, the why is important.
46:42 - Yeah, I love it.
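A few of the points from the song and the comments discussion, put into one small sketch. The names DEFAULT_PORT and parse_port are invented for illustration; the conventions themselves (capitalized constants, snake_case, specific exceptions, comments that explain why) are the ones mentioned above.

```python
DEFAULT_PORT = 8080  # constants are named in all capital letters


def parse_port(raw_value, default_port=DEFAULT_PORT):
    """Convert user input to a port number, falling back to a default."""
    try:
        return int(raw_value)
    # Why, not what: configuration often arrives as missing or free-form text,
    # and a bad value should not stop the service from starting. Catching only
    # TypeError and ValueError (instead of a bare `except:`) keeps unrelated
    # errors such as KeyboardInterrupt visible.
    except (TypeError, ValueError):
        return default_port


print(parse_port("443"))
print(parse_port("not a number"))
```

The comment above the except clause describes the reason for the fallback rather than restating what the code does, which is the distinction drawn in the conversation.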
46:44 So two things: Lemon and team, well done on the song, and man, you've got a great voice.
46:48 That's actually, it was beautiful and funny.
46:51 - Yeah. - Yeah, it was amazing.
46:53 - All right, well, Brian, we probably should wrap it up.
46:54 - Yeah, yeah, we're here.
46:56 - All right, well, Hannah, thanks so much for being here.
46:58 It's good to have you on the show.
46:59 And Brian, thanks as always.
47:00 Everyone, thanks for listening.
47:01 >> Thanks for having me.
47:02 >> Bye.
47:03 >> Bye.
47:04 >> Bye all.
47:05 >> Thank you for listening to Python Bytes.
47:06 Follow the show on Twitter via @pythonbytes.
47:07 That's Python Bytes as in B-Y-T-E-S.
47:10 And get the full show notes at pythonbytes.fm.
47:13 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
47:18 We're always on the lookout for sharing something cool.
47:20 On behalf of myself and Brian Okken, this is Michael Kennedy.
47:24 Thank you for listening and sharing this podcast with your friends and colleagues.