Transcript #178: Build a PyPI package from a Jupyter notebook
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds. This is episode 178, recorded April 15th, 2020. I am Brian Okken. I'm Michael
00:12 Kennedy. And this episode is brought to you by DigitalOcean. Who's first? I think I got my notes
00:17 wrong. Yeah. Well, I want to talk about something really quick before we actually get to the first
00:21 one. So we'll see. Okay. I just want to tell people about the YouTube channel. And obviously,
00:26 if people are watching on YouTube, they might know about the YouTube channel.
00:29 But most people subscribe to our podcast and we are multicasting and repurposing what we're doing
00:35 here on YouTube. We talked a little bit about it last time. So basically, each individual item
00:39 is now a separate YouTube video. And you can watch Brian and me talk about it if you want to consume
00:46 in that format and have a little bit of video and admire Brian's awesome shirts because he's got a
00:51 bunch he's going to be wearing throughout these different shows and it's going to be awesome.
00:54 Oh, you didn't have to set it up like that, man. I only have like one good shirt.
00:58 People loved the shirt for the first video we shared. That was like several comments about,
01:03 dude, your shirt is awesome.
01:04 Yeah. Go figure. Okay. So we're trying to teach you about Python also, but...
01:10 That's right. And fashion.
01:11 Shirts.
01:11 And fashion.
01:12 Yeah.
01:12 Yeah. It's pythonbytes.fm/YouTube. People could check that out.
01:15 Well, tell me about strings, Python. I mean, Michael.
01:19 I'll tell you about Python strings. You know what? Strings are confusing, man.
01:22 Especially when they're about numbers and dates, especially dates. So this seems to be like a
01:30 problem that vexes me permanently. And it's, you know, we talked about is programming Googling,
01:36 right? And our consensus was, you know, maybe in the early stages of your career, there's a lot of
01:42 Googling, but no, not really. You mostly just sit down and you think about the problems and you write
01:47 the code and you evolve the code. Like there's a lot of reading code before you actually do much
01:52 writing anyway.
01:53 But this topic in particular, I'm all about Googling this all the time. So Python has a
01:59 datetime.strptime for string parse time. You give it some piece of text like Wednesday, April 15th,
02:09 comma 10 colon 30 a.m. without a space. I want to take that and turn it into a datetime so I can
02:16 maybe compare it to something else, right? Like another time. How many days is that from now? Is
02:20 that in the future or in the past? I just need to store it in the database as not a string,
02:24 but a datetime because I want to order by it. I don't want it to be alphabetical, right? There's
02:27 all sorts of reasons you need to get a datetime from strings or go in the reverse. And yet the format,
02:34 you know, that's strptime has a, it has a format string that tells it how to look at the string and
02:42 then pull the pieces out. So would you know the format string for that example I told you about, like the Wednesday,
02:48 April 15th, 10 30 a.m.? That's definitely something I Google every single time.
02:51 Every time. And it's never quite right. So just for those of you listening, you really want to know
02:56 it's percent a space percent capital B space percent capital H comma space percent capital M colon percent
03:03 S a.m. Woo. Who would ever come up with that?
03:06 Well, I mean, these are intentionally short.
03:09 Yes, I know.
03:09 So that like they don't take up too much room, but they're and they sort of make sense. It's just
03:14 it's arbitrary, whether it's a capital Y or a lowercase y or capital D or lowercase D.
03:19 Right. And there's documentation you can go find. Like if you want the three-letter day of the
03:24 week, that's a percent lowercase a and whatnot. But putting that all together can be tricky.
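As a rough illustration of the parsing being discussed, here is a minimal sketch using datetime.strptime. The input text and the format string are assumptions chosen for the example (a year is added so the result is unambiguous); they are not the exact string read out in the episode.

```python
from datetime import datetime

# Hypothetical input; the year is included so the result is unambiguous.
text = "Wednesday, April 15, 2020 10:30AM"

# %A full weekday name, %B full month name, %d day of month,
# %Y four-digit year, %I hour on a 12-hour clock, %M minute, %p AM/PM marker.
fmt = "%A, %B %d, %Y %I:%M%p"

# String -> datetime, so it can be compared, sorted, or stored as a real date.
parsed = datetime.strptime(text, fmt)

# datetime -> string, using the same directives in the other direction.
round_trip = parsed.strftime(fmt)

print(parsed)
print(round_trip)
```

The same directives work for both strptime and strftime, which is why a single lookup tool can serve both directions. Weekday and month names depend on the active locale; this sketch assumes the default English names.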
03:28 So what I want to tell you about is this website by Lachlan Eagling. And it's let me see what the URL is.
03:36 It's pystrftime.com, which, of course, is linked in the show notes.
03:42 And the idea is you put the text you want to parse like Wednesday, April 15, April 2020 at the time or
03:51 whatever. You put that in there and you hit go and it will tell you that complex string that I told you
03:56 was the right answer to my parsing problem.
03:58 Yeah, it's awesome.
04:00 Yeah, you just put in like the output that you want it to produce and it tells you the magical
04:06 incantation.
04:07 Right, right. Or the format of the thing you want to parse. And depending whether you're trying to go
04:11 to that string or from that string. But yeah, super, super handy. This bad boy is bookmarked for
04:17 me for sure, because this is way better than Googling. I can put it in there. It gives me a
04:22 quick, quick answer. I can throw it into a Python REPL and see, hey, did it work or did it not work?
04:27 It's really easy.
04:28 Yeah, nice.
04:29 Yeah. So not super complicated, but very handy. So people can bookmark that and try it out.
04:34 Well, I want to tell you about something easier. Also, I got to kind of thank Jack.
04:40 Jack McHugh has sent us a few suggestions and they're usually pretty darn nice. And here's this
04:45 one's from Jack. Pandas dash bokeh. Bokeh? Oh, I forget how to say that.
04:51 I love the logo. Pandas bokeh. I say bokeh. I don't know. You know, but it's like that F stop
04:57 difference where like the person in a portrait is like crisp, but the background is faded.
05:01 Yeah. And the logo is pandas clear bokeh, like in the background faded. It's beautiful.
05:06 It's a pretty cool logo. So I'm going to quote some from their website or the readme. It says,
05:12 pandas bokeh provides a bokeh plotting backend for pandas, geopandas, and PySpark dataframes,
05:19 similar to the already existing visualization features of pandas. Importing the library adds
05:24 a complementary plotting method plot_bokeh on dataframes and series. Okay. So that's,
05:32 I mean, it's already built in and all it provides is plot_bokeh, another function on it. What's the
05:38 big deal? Well, it's so cool. It's so easy. And I was, I tried out some of these examples this
05:44 morning and it's just a little tiny bit of code and you call, like you've got a data frame and you
05:50 call plot bokeh on it and it pops open like an interactive graph that you can look at everything.
05:56 It's actually pretty incredible. You have to do something a little different. You can plot bokeh,
06:01 but if you want the normal plotting to do the same awesome stuff that it's built in, you can set an
06:07 option, one of the pandas options to switch out the plotting backend. So that's neat. So apparently
06:14 what it's really doing is switching out the backend. And to me, I mean, it's plotting is not terribly
06:20 difficult, but this interface, at least for me, it makes it a lot easier instead of having to work
06:26 with frames and plots to just call this thing. And then all the different options you can have,
06:31 you can, you know, different point, you know, want it to look like an asterisk instead of a point
06:36 or something, all other different color, different scale or different titles. All that stuff is options
06:42 you can pass into the plot function. And the other thing that I, that I like a few more things.
06:47 One of them is that when you pip install pandas-bokeh, it pulls everything in because
06:54 it depends on all the rest of the stuff. So you get all of it with a single install.
07:00 And it's able to do this interactively, but you can also generate
07:05 notebooks. Yeah. Yeah. You can generate notebooks and you can also generate standalone HTML files
07:11 with this in it.
07:12 This is really cool. And yeah, the fact you can generate standalone HTML, there's probably ways
07:17 to plug it into Flask sites, you know, Python websites and whatnot, pretty straightforward.
07:22 And the interactive bit is super nice. I mean, this is not about pandas interaction per se. This is just
07:29 bokeh, right? Being very cool and interactive, but you can zoom, you can pan as you move around,
07:35 you know, like it'll show you the marks on the graph and you can hide, you know, and sort of hide and show
07:40 elements. And there's even a cool example where they're showing the stock price of Apple versus Google.
07:46 And as you put the cursor along, it has the Apple logo next to Apple and the information,
07:52 like a little like card that talks about it over time, man, this is nice stuff. And all you got to
07:57 do is point it at a data frame. Not bad.
08:00 Yeah. And they've got a whole bunch of examples on their GitHub repo, with a bunch of working
08:04 examples too. Obviously for the examples, the data is just sort of random data that they're
08:09 throwing in there. But you know, once you know how to get your data, this does the rest of the work
08:14 for you. So it's cool.
08:15 Very cool. Yeah, it's just a great one. And thank you, Jack, for recommending it. And yeah,
08:19 it's a good one, Brian, for pulling it out.
08:20 We've had DigitalOcean as a sponsor for a while, and we really want to thank them. They've really
08:24 helped us out a lot. Plus, they're pretty darn cool. So thank you, DigitalOcean for
08:29 sponsoring this episode and many others. And in the past, we've told you about a lot of awesome things
08:35 with DigitalOcean, like their one-click install Kubernetes cluster support, their amazing new support
08:42 center and help documentation that's been around for a while. And our podcast runs on DigitalOcean,
08:47 and we're thrilled with it. And so if your business or your side project deserves great
08:52 hosting that will grow with you and let you scale affordably, I really definitely want you to swing
08:59 by pythonbytes.fm/DigitalOcean to grab the $100 credit for new users. But there's something else
09:06 I want to tell you about DigitalOcean that's really cool. They've got something they've started recently
09:11 that's called Hub for Good. And it's designed to support COVID-19 relief efforts where DigitalOcean
09:19 through this is supplying $100,000 in infrastructure credits for new not-for-profit projects.
09:26 They're also giving $50,000 to COVID-19 relief fund, their own relief fund, but still it's really cool.
09:34 And they're also trying to raise awareness for COVID-19 related projects and provide learning for
09:41 developers and also provide visibility for these projects. And so I headed over there this morning
09:47 and checked it out. And there's a bunch of cool projects starting out that are related to COVID-19.
09:53 It's not just this sort of stuff, but it's things like there's even a platform to help teachers
09:58 interact with students during quarantine. A lot of cool projects through this. So thank you, DigitalOcean.
10:04 Yeah, this is a great project. And obviously the infrastructure is great and we love it,
10:08 but this is very cool too. I didn't know about this.
10:10 Yeah, it's pretty neat.
10:11 Yeah. So speaking of not knowing, I feel like I've been kind of exploring the cave of Python,
10:18 which is large and vast. And I just come on like a whole nother area. I'm like, it opens up like,
10:24 what is this? How have I not known about this? And this is nbdev. Have you heard of nbdev?
10:30 No.
10:30 Yeah. Okay. So let me tell you about it and I'll get your impressions later. So nbdev takes
10:37 notebooks and basically makes them on par with writing proper Python packages and solves all
10:46 these different problems. It lets you generate what's got to be some of the best documentation
10:50 period for that library that is sort of backed by a notebook. So it lets you develop like full Python
10:59 packages and libraries and notebooks where you can have all your code, your unit tests and your
11:04 documentation all in one place, but then you can take it and upload it to PyPI and
11:09 make it a pip-installable library, and people have no idea that it came from a notebook.
11:13 Wow.
11:13 Is that crazy or what?
11:15 That's awesome. I got to check that out.
11:16 Yeah.
11:17 Yeah. And you know, you think about this idea of notebooks and to me, notebooks like burst on the
11:21 scene in the 2010-ish era, maybe 2012, 2011, like that timeframe. But this project references
11:30 this concept envisioned by Donald Knuth way back in 1983. And it says notebooks finally made literate
11:37 programming, this concept by Donald Knuth, a thing. So, you know, the old is new again, but in a really
11:43 cool way. And to me, this seems like just such a massive upgrade to notebooks. So notebooks have a
11:49 bunch of challenges in my view. Like I can't use a proper editor with it. Like if I don't use PyCharm or
11:54 VS Code and all of its navigation and its cool git blame and, like, history and just like all this
12:00 stuff is just not present, right? Documentation. I think that actually it really works well there,
12:06 right? But it's, it doesn't tie the documentation of the notebook to like parts of functionality that
12:12 might be created by the notebook, which is cool. One of the biggest problems with notebooks,
12:16 it's a benefit, but it's a big problem is if you run a notebook, it stores the output in the notebook.
12:22 So if you had like a bokeh plot or you had like a print of a data frame, that is in there and now it's
12:29 part of it. So if I'm working on a project and you're working on the same project and we both run the notebook
12:35 at different times or the same time, but separately, and it for some reason generates different results,
12:41 that's a merge conflict in Git, right? So basically you cannot use notebooks in like a sane way with Git
12:49 because anytime you work with it, if you're not careful and like don't remove all the output before
12:54 you save it, it's going to be a merge conflict. So this project has a Git pre-commit hook that will
13:01 remove that problem. So right before it gets committed, it'll automatically do the cleaning of that
13:06 metadata output. So it'll never have that as a conflict. It also has an ability to like a CLI go
13:12 just accept it. I just accept all the metadata changes. Mine are just right or whatever, right?
13:19 So it also has a CLI to automatically fix that. But if you do have those problems, but it also has this
13:24 pre-commit hook to avoid them entirely.
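To make the merge-conflict problem concrete, here is a minimal sketch of what "cleaning the output" can mean for a notebook file. It is not nbdev's implementation; it only assumes the standard notebook JSON layout (a "cells" list whose code cells carry "outputs" and "execution_count"), and the helper names are hypothetical.

```python
import json


def make_notebook(output_text, count):
    """Build a tiny one-cell notebook as a JSON string (demo helper)."""
    notebook = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["print(result)"],
                "execution_count": count,
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [output_text]}
                ],
            }
        ],
    }
    return json.dumps(notebook, indent=1, sort_keys=True)


def strip_outputs(notebook_json):
    """Return the notebook JSON with code-cell outputs and run counters removed."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(notebook, indent=1, sort_keys=True)


# Two people run the same notebook and get different results and run counters.
mine = make_notebook("0.4213\n", 7)
yours = make_notebook("0.9981\n", 12)

files_differ = mine != yours
cleaned_match = strip_outputs(mine) == strip_outputs(yours)
print(files_differ, cleaned_match)
```

Running a step like this before every commit means only the source of each cell reaches version control, so two runs of the same code no longer produce different files.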
13:25 Nice. That's a nice use for pre-commit too.
13:27 Yeah. It's super, super clever. So if I write a function in the notebook, I can put hash export
13:32 in that cell and that becomes a public function in the package.
13:37 Oh, cool.
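A notebook cell marked this way might look like the sketch below. The function is hypothetical, and the flag is written as the "hash export" comment described in the episode; the exact flag syntax may differ between nbdev versions.

```python
#export
def say_hello(to):
    "Return a greeting for `to`."
    return f"Hello {to}!"


# A later cell without the flag can act as both a test and a documented example.
greeting = say_hello("Brian")
assert greeting == "Hello Brian!"
print(greeting)
```

To plain Python the flag is only a comment, so the cell still runs normally in the notebook.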
13:37 Right. So I write like documentation and pictures and I would say hash export. Now that's part of my
13:43 library that I'm building. It also lets you create the structure for Python packages. So you have like
13:48 the setup.py and you can build wheels and whatnot automatically out of that. And it uses this
13:55 exported stuff. You can have your unit test in your notebook, which is pretty cool for the things that are in
14:01 there. And then finally you can edit it. You can take the edited library or the library that exported,
14:06 sorry, and then edit it with PyCharm or VS Code and then reverse export it. So what you can do is like say
14:14 push the changes that I've done with my editor back into the segments of the notebook where that code came from.
14:19 Oh man. Okay. I'm a little confused, but I got to try it out.
14:23 Yeah. You got to kind of read through it to get the sense, but there's just a bunch of stuff going on. Like all these things seem like,
14:28 yes, you should have been able to do that with notebooks, but obviously, right. That's not their origins, right?
14:32 They can't do everything at once, but all of these things seem awesome to me.
14:36 Yeah. Yeah. So in order to get started, it's going to basically create a Git repo for you is my understanding,
14:41 either on GitHub or GitLab. So you got to follow the getting started instructions and then you click a button and it'll like generate the
14:48 repo in the right structure, or you can use the CLI tooling to generate like the right repo with things like the Git pre-commit hooks and whatnot.
14:56 And if you're going to read the docs, check out nbdev.fast.ai.
15:01 'Cause this comes from the fast.ai people, the same ones that build the fastai framework.
15:07 So some of the docs render better. There's certain things on GitHub that like it says, and here's a cool picture.
15:13 And it's just like source code. It's not quite right. So, so maybe check out the final link at the bottom in this section to get to,
15:21 if you're going to like browse through it, but it's basically a, you get the same thing out of GitHub.
15:25 Anyway, this to me seems like a massive improvement for notebooks and sort of brings them more into,
15:33 I can do things like, for example, you can now have your notebook and its tests running as part of continuous integration.
15:41 Like, so these notebooks are now like full participants in CI/CD, you can create packages and put them on PyPI.
15:49 Yeah. There's all sorts of neat stuff. The documentation, like if you have a cool graph as part of your notebook,
15:55 that can become the documentation on PyPI
15:58 or Read the Docs for those functions. I mean, it's crazy cool.
16:02 How, how this is like taking some of the awesome parts of notebooks, like the doc side and turn that into the help docs.
16:09 And then also letting you export the functionality still as a proper CS type thing.
16:13 Yeah. I definitely got to check this out.
16:15 How did I not even know this existed? Like, this is awesome.
16:18 Well, I don't know how long, I mean, it looks like, it looks like five months to me is my guess.
16:21 Okay. So we're not that behind the ball.
16:23 No, we're not that behind. Yeah.
16:25 But this looks neat.
16:26 Yeah. It's very neat.
16:27 Plus Fast AI is pretty cool. So I think this is probably pretty solid.
16:30 Yeah, I agree. It's definitely got some solid people behind it. So very cool. Very cool.
16:34 Anyway, nbdev, quite neat.
16:37 I want to talk about something a little not neat, a little lighthearted. So this is a sort of a serious topic, but this is an article from Sebastian entitled Stop Naming Your Python Modules Utils.
16:52 And I don't think we've, I don't know if we've covered it before, but it's good advice. And it's something that happens. Basically, a lot of projects, public or private, will at some point end up having a utils.py or a utils package or something.
17:06 And this article is just saying, resist the urge. Utils is arguably one of the worst names for modules because it's very blurry and imprecise. Such a name does not say what the purpose of the code inside is.
17:22 On the contrary, a utils module can contain almost anything. By naming a module utils, a software developer lays down perfect conditions for an incohesive, or uncohesive, whatever, code blob.
17:37 And I have definitely seen this in action. I have been one of the culprits before, pulling out a little helper function that I had in one file.
17:49 And I wanted to use it in a different module. So I didn't know where to put it. So I stuck it in a utils.py, added a couple more. So there's just a few methods.
17:58 And I come back six months later and there's like a couple dozen just junk drawer functions from all over the place in there.
18:07 So if you start, people will add junk to it. So Sebastian lists a few excuses. It's just one function, but it grows.
18:15 There's no other place in the code to put it. Well, try harder. And I need a place for company commons.
18:21 I don't even really know what that means, but name it company or something. And also Django does it.
18:27 Well, I don't know if you're a, well, maybe they shouldn't have, but they have it now, so they're not going to change it.
18:32 So the advice is to try grouping your utility functions and naming them based on the role they play in how you're going to use them, or possibly grouping them by theme.
18:44 And also, if you see a utils.py crop up in a code review, just request that the person rename it to something else, if possible.
18:54 Just set up a CI rule to break the build if you see that file name.
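A build-breaking check like that could be sketched as follows. The list of forbidden names and the function names are assumptions for illustration; a real CI job would pass the returned code to sys.exit.

```python
import tempfile
from pathlib import Path

# Assumption: names treated as too vague. Adjust to taste.
FORBIDDEN_NAMES = {"utils", "misc"}


def find_vague_modules(root):
    """Return sorted relative paths of modules or packages with a forbidden name."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):
        is_module = path.is_file() and path.suffix == ".py" and path.stem in FORBIDDEN_NAMES
        is_package = path.is_dir() and path.name in FORBIDDEN_NAMES
        if is_module or is_package:
            hits.append(path.relative_to(root_path).as_posix())
    return sorted(hits)


def exit_code_for(root):
    """Return 1 (fail the build) if any vague module name is found, else 0."""
    return 1 if find_vague_modules(root) else 0


# Demo on a throwaway project layout.
with tempfile.TemporaryDirectory() as project:
    package = Path(project) / "shop"
    package.mkdir()
    (package / "conversions.py").write_text("# well-named module\n")
    (package / "utils.py").write_text("# junk drawer\n")
    offenders = find_vague_modules(project)
    status = exit_code_for(project)

print(offenders, status)
```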
18:59 Yeah. So what are your thoughts on this, on the utils?
19:01 See, I agree with Sebastian. Absolutely.
19:03 I understand the challenge because naming things in software is hard, but naming things in software is super important.
19:12 Because when you think about even just function names or class names or whatever,
19:19 usually what will happen is they'll get like a crummy, vague name and then a comment describing what they are doing.
19:25 And you're like, well, why don't you just make the name a little bit longer that says what it does?
19:30 And utils is kind of like the generic catch-all of saying like, well, I couldn't come up with a name.
19:35 So here it is.
19:36 We're just going to drop it here.
19:39 And in my code, I have like tons of different areas into which I organize it, you know, sort of like sub modules, I guess, or sub packages, if it's a package, but sometimes it's not technically a package.
19:52 And I try to come up with names that are meaningful, right?
19:55 Like I have something called number converter that will like try to parse an integer or return a default value instead of throwing an exception or it'll try to parse some other thing.
20:04 Or maybe it's called conversions.py or whatever, but it's not like utils, right?
20:08 Like there's, there's usually some kind of a better structure you can find that will help you do this.
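A helper of the kind described, living in a module with a purpose-revealing name, might look like this minimal sketch. The function name try_int and the default value are assumptions; the actual code from the episode is not shown.

```python
# number_converter.py (hypothetical module name taken from the discussion)

def try_int(text, default=0):
    """Parse text as an integer, returning default instead of raising."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # TypeError covers None and other non-string inputs,
        # ValueError covers strings that are not valid integers.
        return default


print(try_int("42"), try_int("forty-two"), try_int(None, default=-1))
```

The module name already tells a reader what belongs in the file, which is the point being made about utils.py.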
20:15 But, you know, there's that joke that, you know, naming things in computer science, that's one of the hardest problems, right?
20:20 And I do agree with that, but yeah, it's, it's worth the effort when you get it figured out.
20:25 If you don't believe me, you can just try it sometime.
20:27 If you're working on a group project, just put one function in utils and you will see it grow.
20:32 And you'll have to find it.
20:33 Is this like the broken window theory of software?
20:37 Yep.
20:38 And MISC doesn't count either.
20:40 If you'd name it MISC, it's just as bad.
20:42 That's right.
20:43 Yeah.
20:44 There's probably some synonyms here in the code world that don't count.
20:48 So yeah.
20:48 Awesome.
20:49 I want to tell you about this one next that helps with performance or understanding that performance more specifically of your code.
20:56 So I don't know how much profiling you guys do in your work.
21:00 How much does performance matter to you guys?
21:02 It matters a lot.
21:03 Yeah.
21:04 Yeah.
21:04 Yeah.
21:05 I'm building things that go into testing in a production line.
21:09 So every millisecond that it takes, takes a millisecond longer to get something shipped.
21:15 So yes.
21:16 It matters.
21:17 Yeah.
21:17 It matters.
21:18 I suppose I mostly spend my time on the web and obviously it matters there, right?
21:22 Like every hundred milliseconds.
21:23 I think Amazon measured is like 1% loss of orders or something ridiculous like that, right?
21:29 Like, so understanding your performance is good.
21:32 We've had good, good in quotes, profilers for Python.
21:37 And they typically tell you about this function spent this much time.
21:42 But another challenge is my program is using too much memory or worse.
21:47 It's something long running like a web app or some background process.
21:50 And it's like growing.
21:52 It's like sort of leaking memory.
21:55 Why is that?
21:57 So I came across this project called Scalene, which is a high performance and high precision CPU and memory profiler for Python.
22:06 Cool.
22:07 Yeah.
22:07 So it lets you either analyze CPU time or it actually lets you on a line by line basis say, here's some memory.
22:15 What line made this and where is it coming from?
22:18 Yeah.
22:19 And so that's cool.
22:20 But one of the challenges for profiling is when you're profiling your code, you can make it, you don't get the same behavior.
22:28 It's sort of like the Heisenberg uncertainty principle, right?
22:31 It does one thing, but when you measure it with the profiler, you've changed it.
22:35 So now you kind of got to say, well, that part where it was the network, that was 50%, but now you made the computational bits way slower.
22:43 So that network part looks just like 20, right?
22:45 Like you're affecting it.
22:46 So for example, if you use profile, the built-in profile module, it can make your code 30 times slower in a simple scenario than running it normally.
22:56 But you can use cProfile, which is the C-based one that's built in.
22:59 It only slows it down by 1.65 times.
23:03 So that's not too bad.
23:04 There's a line profiler that's 11 times slower.
23:07 And there's a whole bunch of other ones.
23:08 There's a memory profiler that's like over a thousand times slower.
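For reference, the built-in cProfile mentioned here can be driven from code roughly as in the sketch below. The workload function is hypothetical, and this shows the standard library profiler only, not Scalene.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Instrument only the region of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Render the per-function report into a string instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report is per function, which is the limitation being contrasted with per-line profilers in this discussion.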
23:12 So the Scalene thing has a nice comparison to all these things.
23:17 It says, well, how does Scalene do?
23:19 And it claims that it's got this built-in library that's much faster.
23:23 So for CPU stuff, it's 1.04 times the normal running time.
23:26 So like 4% slower.
23:28 And it does that through sampling, right?
23:31 It doesn't do instrumentation.
23:32 It doesn't rewrite the stuff.
23:33 It actually just asks frequently like, hey, where are you in the code?
23:36 But it still gets per line analysis of that, which is pretty cool.
23:40 And then the memory one is like another 10% slower because analyzing memory is hard.
23:45 But yeah, there's all sorts of cool stuff you can do with it.
23:49 The overhead is not too bad.
23:51 The precision is pretty good.
23:52 So like I said, it gives you like line by line level of how much time you're spending in various places.
23:59 It also is interesting in that it separates out the time spent running Python code from native code,
24:04 including like the base libraries and stuff.
24:06 So you can say like, I can only affect the Python stuff.
24:11 The other stuff is not a thing I can deal with.
24:14 So yeah, don't tell me about it or punish me for it.
24:18 Or maybe I do want to look at it, right?
24:19 Tell me about that.
24:20 So that's pretty cool.
24:21 And then also the memory stuff I think is pretty cool.
24:25 So it says it points to specific lines of code responsible for memory.
24:28 Memory growth.
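To give a feel for what attributing memory growth to a line means, here is a small sketch that uses the standard library's tracemalloc as a stand-in. It is not how Scalene works internally, and the allocation being measured is made up for the example.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical leak: keep roughly 1 MB of byte strings alive.
cache = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences between the snapshots, grouped by source line, largest first.
growth = after.compare_to(before, "lineno")
top = growth[0]
frame = top.traceback[0]
print(frame.filename, frame.lineno, top.size_diff)
```

Comparing two snapshots taken some time apart in a long-running process is a common way to spot which line keeps accumulating memory.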
24:29 And it's important.
24:31 It does this through a special memory allocator thing that comes with it.
24:35 And so while you can pip install Scalene, you can't inspect the memory allocation that way.
24:41 You have to go and install it directly and do some more setup.
24:44 On macOS, you can do brew install.
24:46 There's instructions in there on how to do that.
24:48 On other OSes, I have no idea what you do.
24:50 But you can't run the memory allocation directly.
24:54 You can't just say pip install it and then do the memory allocator.
24:57 There's some other lower subsystem that has to get installed for that to work.
25:00 Yeah.
25:00 And memory is an interesting one because it's a difficult one to chase down with Python.
25:06 Yeah.
25:06 It's very hard in Python because everything is a pointer.
25:09 Everything is an indirection.
25:11 It's not like, well, here's the block where we allocated this object or whatever, right?
25:16 Like it's pretty indirect.
25:18 And you don't typically have a hold of pointers in the memory address sense of it like you do in C or something, right?
25:25 So yeah, it's challenging.
25:26 I would love to see this integrated into PyCharm and VS Code.
25:30 Oh, yeah.
25:31 Right now, it just gives you a cool tabular text output or file output.
25:37 But if you could just right click in PyCharm and say, analyze with scalene, that'd be sweet.
25:41 Yeah, I wonder.
25:42 And also, that would solve some of the install thing.
25:45 So if you have to install it separately, some integration with PyCharm VS Code would be cool.
25:50 Right.
25:50 Like right now, you can do profiling.
25:52 And it's really awesome in PyCharm.
25:54 But I'm pretty sure it uses cProfile.
25:55 So yeah, who knows?
25:57 Someday, baby.
25:58 Hey, while we're talking about editors, I don't know about VS Code.
26:01 But I do know, backing up a little bit, I do know that PyCharm does open notebooks okay.
26:06 Awesome.
26:07 Yeah.
26:07 Just back there.
26:08 Anyway.
26:09 Yeah, yeah, nice.
26:09 I want to tell you a little bit about testing.
26:12 Awesome.
26:12 I'm really surprised that you're covering this.
26:15 But okay, yeah, go ahead.
26:15 Yeah, it's interesting.
26:19 Lately, you've been covering the testing articles.
26:21 I know.
26:21 Isn't that my role now?
26:23 No, go ahead.
26:24 This is great.
26:24 Tell us about it.
26:25 Yeah.
26:25 This is a person named Carolyn that wrote an article called From 1 to 10,000 Test Cases
26:31 in Under an Hour, A Beginner's Guide to Proper...
26:35 That's productive.
26:35 And imagine if Carolyn was getting paid by the test, right?
26:39 Like, we're evaluating your bonus for the year.
26:42 Like, I wrote five times as many tests as anyone else, and I just started this month.
26:45 Heck yeah.
26:47 I would totally use...
26:48 If I was paid by the test case, I would definitely use Hypothesis on every project.
26:52 All right.
26:54 So how did she do this?
26:55 What is this property-based testing?
26:57 Okay.
26:57 So hopefully people have heard of property-based testing, but it is...
27:00 So the...
27:01 It's as opposed to, like, what do we call it?
27:05 Example-based testing.
27:07 So...
27:07 And this is kind of how she goes through this discussion.
27:10 It's...
27:11 The article is really just a really excellent introduction to property-based testing and
27:16 using Hypothesis.
27:17 And it's...
27:18 I mean, she's using Hypothesis in the example, but the intent is just property-based testing
27:23 because you can...
27:24 It's the same sort of strategy with every other type of property-based testing library.
27:30 She just happens to be using Hypothesis and Python.
27:32 So that's nice.
27:34 But the...
27:34 She starts off with a unit test example of just doing...
27:39 She has, like, a string sort or a...
27:41 Not a string sort, but a...
27:42 List sort.
27:42 A list sorting thing.
27:44 And if you were doing example-based testing, you just pick a few example tests.
27:49 Example test cases where you would take the input and you know what the sorted output should
27:55 look like and you, you know, run it through the function and make sure that the sorted output
27:59 is equal to the expected one.
28:02 How would you do this with property-based testing?
28:04 And before she goes in...
28:06 And she does give an example of how to write some sort of test like that in property-based
28:11 testing.
28:11 But she stops and pauses and talks about kind of the different mindset.
28:15 You can't test against an exact example because you don't know what example is coming in.
28:20 So you have to think about property.
28:22 So like on a list sort thing, you don't have the exact answer, but you could check to make
28:27 sure that the length is the same and that you can use sets on both the input and output
28:33 to make sure that the contents of both are identical.
28:36 And then you can go through the answer and make sure that element-wise, every element i is
28:42 less than or equal to i plus one.
28:44 You know, there's ways to test sort without, you know, without just knowing the answer.
28:49 But it takes a mind shift a little bit.
28:51 And I think actually that's one of the benefits of property-based testing is thinking in terms
28:56 of that also.
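The three sort properties described here (same length, same contents, each element less than or equal to the next) can be sketched with the standard library alone. This is a hand-rolled stand-in that draws inputs from `random`, not Hypothesis's API; `check_sort_properties` and `random_int_list` are hypothetical helper names. It also assumes `Counter` instead of the sets mentioned in the discussion, because a set comparison would not notice a sort that drops or adds duplicates.

```python
import random
from collections import Counter


def check_sort_properties(sort_fn, data):
    """Check properties any correct sort must satisfy, without knowing the answer."""
    result = sort_fn(list(data))
    # Property 1: sorting never adds or drops elements.
    assert len(result) == len(data)
    # Property 2: same contents, including how often each value appears.
    assert Counter(result) == Counter(data)
    # Property 3: every element i is <= element i + 1.
    assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))


def random_int_list(rng, max_len=20):
    """Hypothetical input generator: a list of 0..max_len random integers."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]


rng = random.Random(178)  # fixed seed so a failing input can be reproduced
for _ in range(1000):
    check_sort_properties(sorted, random_int_list(rng))
```

None of the assertions mention a specific expected list, which is the mind shift being described.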
28:57 I also think it's nice that she talks about how this isn't a replacement for example-based
29:03 testing.
29:03 It is a complement to it.
29:06 And so you can mix them together.
29:08 Then she goes on to introduce some of the aspects of Hypothesis.
29:12 Like there are some cool strategies, like lists and integers, and being able to set
29:19 the max examples, so you can set how many cases get generated.
29:22 And that's where you can just set it to 10,000 and wham, you have 10,000 test cases right away,
29:27 and just let Hypothesis come up with the examples.
29:31 The real meat of the article, which I really appreciate, is this: the hard
29:36 part of property-based testing isn't just the syntax, though she does cover the syntax
29:42 and how to get this done.
29:42 It's also how to think about the properties. Coming up with what properties to
29:48 test for is the hard part.
29:49 And so taking a little time to talk about that, I think, is a great thing.
29:54 I'm also glad she threw in that one of the things you should check for with
29:59 property-based testing is making sure exceptions that get raised are expected exceptions.
30:05 So if you throw garbage in or different cases that don't make sense, you should know what
30:11 kind of exceptions are going to come out and that this can be caught with your tests with
30:15 Hypothesis.
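A minimal sketch of the "only expected exceptions" property, again using a hand-rolled `random` loop rather than Hypothesis itself. `parse_age` is a hypothetical function under test and `check_only_expected_errors` is a hypothetical helper; the assumption is that `ValueError` is the one documented failure mode.

```python
import random
import string


def parse_age(text):
    """Hypothetical function under test: parse a non-negative integer age."""
    age = int(text)  # int() raises ValueError for non-numeric strings
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


def check_only_expected_errors(func, text, expected=(ValueError,)):
    """Call func(text); tolerate the expected exceptions, let anything else escape."""
    try:
        func(text)
    except expected:
        pass  # a documented failure mode, so the property holds


rng = random.Random(0)  # fixed seed so a failing input can be reproduced
for _ in range(1000):
    # Garbage in: short random strings of printable characters.
    garbage = "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 8)))
    check_only_expected_errors(parse_age, garbage)
```

If `parse_age` ever started leaking an `IndexError` or `TypeError` on odd input, the loop would stop with that exception and the offending string would be easy to find.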
30:16 And then also a great use for all of this is to implement whatever functionality you wanted
30:22 in a very simplistic, but possibly slow or memory hoggy way or something.
30:27 And then you can compare the elegant version and the slow version within the tests to make
30:33 sure that they come up with the same answer.
30:35 This is also great.
30:36 If you're doing a refactoring, you can refactor part of your system and make sure that the
30:41 old and new way act the same.
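Comparing a simple, slow implementation against the optimized one might look like the sketch below. Both functions are hypothetical examples (order-preserving de-duplication), and the inputs come from a plain `random` loop standing in for a property-based testing library.

```python
import random


def dedupe_slow(items):
    """Obviously correct reference: O(n^2) scan of the result list."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def dedupe_fast(items):
    """Version under test: O(n), using a set to remember what was seen."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


rng = random.Random(0)  # fixed seed so a failing input can be reproduced
for _ in range(1000):
    # A small value range forces plenty of duplicates.
    data = [rng.randint(0, 9) for _ in range(rng.randint(0, 30))]
    # The property: both implementations agree on every input.
    assert dedupe_fast(data) == dedupe_slow(data), data
```

The same loop works for a refactoring: keep the old function around as the reference until the new one has agreed with it on enough generated inputs.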
30:43 So it's just a good introduction to all of this.
30:46 Yeah.
30:46 And property-based testing is, you're right,
30:49 It's such a mind shift and it's, I don't know, I haven't fully embraced it yet, but I feel
30:55 like there's probably some places where it would really be interesting and useful.
30:59 And I probably should just look into it.
31:01 You know, I, I don't know, I get stuck in my ways and then I just, I keep going that way.
31:04 At the end, she talks about if you're not using Python, what options you have as well, which
31:09 is kind of cool.
31:10 Right.
31:11 So it's like, hey, Hypothesis is cool in Python.
31:13 But if you're on TypeScript, we got fast-check.
31:15 We're on .NET.
31:16 They don't have dashes or A's or T's.
31:19 So there's FsCheck.
31:20 And in Java, there's this and C++ and Rust and so on.
31:24 So yeah, it looks like you could use the same thinking and ideas across different parts
31:31 of your stack, if you're having different technologies in there.
31:33 This is another example of if it shows up in every language, it's probably something you
31:39 should be paying attention to.
31:40 So that's a really, that's a good rule of thumb.
31:43 It's like, yeah, if I see it all over the place, right, this is a general CS sort of thing
31:48 that's important.
31:49 Yeah.
31:50 Yeah.
31:50 You know what else I like about going through stuff like this is you come across things
31:54 that you didn't know about, right?
31:56 For example, you'd think that I would know about JSON.
31:58 It seems pretty simple, like the JavaScript object notation.
32:01 But apparently there's like a JSON5 as well, which allows things like comments and whatnot
32:08 and multi-line strings and single quotes and elements that are not quoted for the keys
32:14 and so on.
32:15 And there's a whole cool library for JSON5 support if you want to have like
32:19 a little bit more human-friendly JSON.
32:22 I had no idea that was a thing.
32:24 Yeah.
32:24 Neither did I.
32:25 And I was just like, why can't I put a comment in JSON?
32:27 This is driving me crazy.
32:28 So what I do is I have like a field that says comment or like double slash in quotes.
32:33 And then I have the string that is the comment because you can't actually have comments, but
32:37 you can have ignored keys and values.
32:40 So that's how I have comments in my JSON.
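The workaround described here, a throwaway key that holds the comment, can be sketched with the standard `json` module. The key names `"//"` and `"comment"` and the `load_config` helper are assumptions for illustration, and the config contents are made up.

```python
import json

# Plain JSON has no comment syntax, so the "comment" is an ordinary key/value pair.
raw = """
{
    "//": "Retry settings for a hypothetical sync job",
    "retries": 3,
    "timeout_seconds": 30
}
"""


def load_config(text, comment_keys=("//", "comment")):
    """Parse JSON and drop the keys that are only there as comments."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if key not in comment_keys}


config = load_config(raw)
```

This only strips comment keys at the top level; nested objects would need the same filter applied recursively.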
32:41 But anyway, she talks about using the JSON5 library that's part of Python to support that.
32:46 Or not.
32:46 It's not built in, but it's a Python library.
32:48 You can use it to do that.
32:49 Pretty cool.
32:49 Yeah.
32:50 Nice.
32:50 Cool.
32:50 Yeah.
32:51 Well, I guess that's it for all of our items, huh, Brian?
32:52 Yeah, it is.
32:53 Got anything extra for us?
32:55 Yeah, I totally did.
32:56 But you nabbed it and put it in your section.
32:58 So go for it.
32:59 Tell us that you found a bunch of cool things there.
33:02 Yeah.
33:02 I want to get this one out of the way first.
33:05 Some sad news.
33:06 Have you, you've heard of Game of Life, right?
33:08 Yes.
33:09 Yeah.
33:09 Conway's Game of Life.
33:10 Yeah.
33:11 Conway's Game of Life.
33:12 Well, Conway, John Conway is, I'm going to link to an article that's a nice article talking about the Game of Life and John Conway.
33:20 But just an announcement that he is one of the victims of COVID-19, died from it recently.
33:27 So that's sad.
33:28 Yeah, it's definitely sad news.
33:30 Game of Life is kind of an excellent thing to have in the computer science realm.
33:34 Pretty neat.
33:35 So that's sad.
33:36 Something that's happy is GitHub is now free for all teams and individuals.
33:42 So that's a pretty cool announcement.
33:43 That's really awesome.
33:44 Yeah.
33:45 So previously you had to pay to have collaborators on a private repo.
33:50 I think maybe you could have some, but not a ton for private.
33:53 I can't remember.
33:54 Three, I think like that.
33:55 Yeah.
33:55 It's like evolving.
33:56 First you had to pay for private repos, then you didn't, but then you had to for collaborators.
34:00 And yeah, but that's awesome.
34:02 So it's much more free.
34:03 And then also for people who still pay GitHub, like me, it's half price.
34:08 It's 40.
34:09 It's, I don't know, whatever four divided by nine is.
34:12 It's now 44% of what you were paying before.
34:14 And people wonder like, why would you pay for GitHub organizations?
34:18 If you have an organization, so like Talk Python and the related training authors and content,
34:24 there's like a GitHub organization for Talk Python.
34:27 Have people collaborate on that.
34:30 You still have to pay, but it was $9 a month per user.
34:32 Now it's $4 a month per user.
34:34 So that's also bonus.
34:35 Yeah.
34:35 Yeah.
34:36 Pretty cool.
34:37 Yeah.
34:37 That's happy.
34:37 Yeah.
34:38 So last thing I wanted to bring up is that the PyCon US 2020 online is now live.
34:45 So there's a welcome video and more.
34:49 There's some talks linked and there's more on the way.
34:51 Yeah.
34:51 There's a nice welcome video from Emily Morehouse that she basically kicks off the virtual conference.
34:57 And this conference, I don't know if that's the right word for it.
35:00 This thing, this event is not like a lot of online virtual conferences.
35:06 Like on Saturday, we're all going to meet.
35:08 And then the talks are going to be these three hours and whatnot.
35:10 It's like, it's like a rolling release of information and videos that then you get to consume over the next couple of weeks.
35:17 So yeah, you're linking to the, basically the landing page for like stuff as it happens.
35:22 Right.
35:22 Yeah.
35:23 And I also recommend checking out, so if you go to any of the videos, like the welcome video,
35:29 and then go up and find the PyCon US 2020 top page and look at the videos there, then you can see them all listed as well.
35:39 But they're rolling out.
35:41 There's, and I know that they're not all recorded.
35:44 So some will come later.
35:45 For instance, I am still, I don't know if I will, but I'm still planning on recording my talk and posting it, just trying to figure out when to do that.
35:54 So.
35:54 Yeah.
35:55 Yeah.
35:55 Cool.
35:55 Anyway, I'm definitely looking forward to checking it out and see what comes along.
35:58 There's also, it's worth mentioning, that at that link
36:02 There's a place that has like the virtual expo and the expo hall is actually my favorite part of the conference is because you get to walk around and meet people and just, you know, see what's going on and you see all the companies and what they're doing.
36:14 But one of the things that happens there on Sunday in normal times is there's the like hiring job fair thing and all the job fair stuff is already up there.
36:25 So if people are looking for a Python job, there's like many, many links of this company's hiring for these four positions.
36:32 Click here.
36:33 This company's hiring for this position.
36:34 So if you're looking for a job, you want to get in there quick and grab the good ones and apply to them.
36:40 Yeah.
36:40 One of the things that that's missing is how am I going to last an entire year with no new t-shirts?
36:46 I know.
36:47 Well, you're going to have to be upping your game there in this video version here.
36:52 I know.
36:53 I love all the conference swag.
36:55 Yeah, exactly.
36:56 Like, how do you even do that?
36:58 How do you even find a good tech shirt?
36:59 Like that you buy?
37:00 I know that there, but it'll be different.
37:04 Well, you want to know something that wasn't funny is I almost forgot to put a joke in our show notes here.
37:09 Oh no.
37:09 So I pulled up the terminal and I typed pyjoke because I've pipx installed pyjokes.
37:15 So it's, it's right there in the command line.
37:18 Anytime you need a laugh.
37:19 And this one is about QA software quality folks.
37:22 And it's a take on a traditional one.
37:25 So here, I'll hit you with this.
37:26 See what you think.
37:27 How many QAs does it take to change a light bulb?
37:29 I don't know.
37:30 They noticed the room was dark.
37:31 They don't fix problems.
37:32 They find them.
37:33 Oh dear.
37:37 That's bad, right?
37:38 Yeah.
37:39 That's definitely why QA and development should be one team.
37:41 Absolutely.
37:42 Yeah.
37:43 All right.
37:43 Well, a good joke.
37:44 Nonetheless, a good pyjoke.
37:46 Thanks.
37:47 Well, this was lovely today.
37:49 So thanks for talking with me.
37:50 Yeah, absolutely.
37:51 Thanks.
37:51 As always.
37:52 Great to chat with you.
37:53 See you later.
37:53 Bye.
37:53 Thank you for listening to Python Bytes.
37:55 Follow the show on Twitter at Python Bytes.
37:58 That's Python Bytes as in B-Y-T-E-S.
38:01 And get the full show notes at Pythonbytes.fm.
38:04 If you have a news item you want featured, just visit Pythonbytes.fm
38:08 and send it our way.
38:09 We're always on the lookout for sharing something cool.
38:11 This is Brian Okken.
38:12 And on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast
38:16 with your friends and colleagues.