Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #178: Build a PyPI package from a Jupyter notebook

Recorded on Wednesday, Apr 15, 2020.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 178 recorded April 15th, 2020. I am Brian Okken.

00:11 I'm Michael Kennedy.

00:12 And this episode is brought to you by DigitalOcean. Who's first? I think I got my notes wrong.

00:18 Yeah. Well, I want to talk about something really quick before we actually get to the first one.

00:22 So we'll see.

00:23 Okay.

00:23 I just want to tell people about the YouTube channel. And obviously, if people are watching on YouTube, they might know about the YouTube channel.

00:29 But most people subscribe to our podcast and we are multicasting and repurposing what we're doing here on YouTube.

00:36 We talked a little bit about it last time.

00:37 So basically each individual item is now a separate YouTube video.

00:42 And you can watch Brian and me talk about it if you want to consume in that format and have a little bit of video and admire Brian's awesome shirts because he's got a bunch he's going to be wearing throughout these different shows and it's going to be awesome.

00:54 - Oh, you didn't have to set it up like that, man.

00:56 I only have like one good shirt.

00:58 - People loved the shirt for the first video we shared.

01:01 That was like several comments about, dude, your shirt is awesome.

01:05 - Yeah, go figure.

01:06 Okay, so we're trying to teach you about Python also.

01:09 - That's right, and fashion.

01:11 - Shirts.

01:12 - And fashion.

01:12 - Yeah.

01:13 - Yeah, it's at pythonbytes.fm/youtube.

01:15 People could check that out.

01:15 - Well, tell me about strings, Python.

01:17 I mean, Michael.

01:18 - I'll tell you about Python strings.

01:21 You know what, strings are confusing, man.

01:23 Especially when they're about numbers and dates.

01:27 Especially dates.

01:28 So this seems to be a problem that vexes me permanently.

01:32 And we talked about is programming Googling, right?

01:37 And our consensus was maybe in the early stages of your career, there's a lot of Googling, but no, not really.

01:44 You mostly just sit down and you think about the problems and you write the code and you evolve the code.

01:49 There's a lot of reading code before you actually do much writing anyway.

01:53 But this topic in particular, I'm all about Googling this all the time.

01:58 So Python has a datetime.strptime for string parse time.

02:03 You give it some piece of text like Wednesday, April 15th, comma, 10 colon 30 a.m. without a space.

02:13 I want to take that and turn it into a datetime so I can maybe compare it to something else, right?

02:18 Like another time.

02:19 How many days is that from now?

02:20 Is that in the future?

02:21 Is it in the past?

02:21 I just need to store it in the database as not a string, but a datetime because I want to order by it.

02:26 I want it to be alphabetical, right?

02:27 There's all sorts of reasons you need to get a date/time from strings or go in the reverse.

02:32 And yet, the format, that strptime has a, it has a format string that tells it how to look at the string and then pull the pieces out.

02:44 So, would you know about that example I told you about?

02:47 Like the Wednesday, April 15th, 10.30 a.m.?

02:49 - That's definitely something I Google every single time.

02:52 - Every time, and it's never quite right.

02:54 So just for those of you listening, you really want to know, it's %A space %B space %H comma space %M colon %S AM.

03:01 Woo, who would ever come up with that?

03:06 - Well, I mean, these are intentionally short.

03:09 - Yes, I know.

03:10 - So that they don't take up too much room, but they're, and they sort of make sense.

03:14 It's just, it's arbitrary, whether it's a capital Y or a lowercase y or capital D or lowercase d.

03:19 - Right, and there's documentation you can go find, like if you want the three-letter date, day of the week, that's a percent a lowercase and whatnot.

03:26 But putting that all together can be tricky.
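
For readers following along, here is a minimal sketch of the kind of format string being discussed. The sample text and the exact directives are assumptions chosen for illustration, not the string read out on the show; `%A`, `%B`, `%d`, `%Y`, `%I`, `%M`, and `%p` are standard `datetime` directives, and the month and weekday names assume an English locale.

```python
from datetime import datetime

# Assumed sample text; the episode's example is similar but not identical.
text = "Wednesday, April 15, 2020 10:30AM"

# %A full weekday name, %B full month name, %d day of month, %Y year,
# %I hour on a 12-hour clock, %M minutes, %p AM/PM marker.
fmt = "%A, %B %d, %Y %I:%M%p"

parsed = datetime.strptime(text, fmt)  # string -> datetime
round_trip = parsed.strftime(fmt)      # datetime -> string

print(parsed)
print(round_trip)
```

The same format string works in both directions, which is why a site that derives it from sample text helps with parsing as well as formatting.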

03:28 So what I want to tell you about is this website by Lachlan Eagling, and it's, let me see what the URL is.

03:36 It's pystrftime.com, which of course is linked in the show notes.

03:42 And the idea is you put the text you want to parse, like Wednesday, 15 April 2020 at the time or whatever, you put that in there and you hit go, and it will tell you that complex string that I told you was the right answer to my parsing problem.

03:59 - Yeah, it's awesome.

04:01 - Yeah.

04:01 - You just put in like the output that you want it to happen, and it tells you the magical incantation.

04:07 - Right, right, or the format of the thing you wanna parse, and depending on whether you're trying to go to that string or from that string.

04:13 But yeah, super, super handy.

04:16 This bad boy is bookmarked for me for sure, because this is way better than Googling.

04:20 I can put it in there, it gives me a quick, quick answer, I can throw it into a Python REPL and see, hey, did it work or did it not work?

04:27 It's really easy.

04:28 - Yeah, nice.

04:29 - Yeah, so not super complicated, but very handy.

04:32 So people can bookmark that and try it out.

04:34 - Well, I want to tell you about something easier also.

04:38 I got to thank Jack.

04:39 Jack McHugh has sent us a few suggestions and they're usually pretty darn nice.

04:44 And here's this one's from Jack, pandas-bokeh.

04:48 Bokeh?

04:49 Oh, I forget how to say that.

04:51 - I love the logo, Pandas-Bokeh.

04:54 I say Bokeh, I don't know.

04:56 You know, but it's like that F-stop difference where like the person in a portrait is like crisp but the background is faded.

05:01 And the logo is pandas, clear, and Bokeh, like, faded in the background.

05:05 It's beautiful.

05:06 - That's a pretty cool logo.

05:07 So I'm gonna quote some from their website or the readme.

05:12 It says, "Pandas-Bokeh provides a Bokeh plotting backend for pandas, GeoPandas, and PySpark data frames, similar to the already existing visualization features of pandas.

05:23 Importing the library adds a complementary plotting method, plot_bokeh, on data frames and series."

05:30 Okay, so it's already built-in and all it provides is plot_bokeh, another function on it. What's the big deal?

05:39 Well, it's so cool, it's so easy.

05:42 I tried out some of these examples this morning, and it's just a little tiny bit of code.

05:47 And you call, like you've got a data frame, and you call plot_bokeh on it, and it pops open like an interactive graph where you can look at everything.

05:57 It's actually pretty incredible.

05:58 You have to do something a little different.

06:00 You call plot_bokeh.

06:01 But if you want the normal plotting to do the same awesome stuff, that's built in too: you can set an option, one of the pandas options, to switch out the plotting back end.

06:13 So that's neat.

06:13 So apparently what it's really doing is switching out the back end.

06:17 And to me, I mean, plotting is not terribly difficult, but this interface, at least for me, it makes it a lot easier.

06:25 Instead of having to work with frames and plots to just call this thing.

06:29 And then there are all the different options you can have: if you want a marker to look like an asterisk instead of a point or something, or you want a different color, a different scale, or different titles.

06:41 All that stuff is options you can pass into the plot function.
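
A rough sketch of the workflow described here, assuming `pandas` and `pandas-bokeh` are installed. The sample numbers, the helper names `make_sample_data` and `plot_interactive`, and the output file name are all made up for illustration; `plot_bokeh` and `pandas_bokeh.output_file` come from the project's README.

```python
def make_sample_data():
    """Return made-up prices as plain Python data (illustrative values only)."""
    return {
        "day": [1, 2, 3, 4, 5],
        "apple": [10.0, 10.5, 10.2, 11.0, 11.4],
        "google": [20.0, 19.5, 20.3, 20.8, 21.1],
    }


def plot_interactive(data, html_path="prices.html", show=True):
    """Build an interactive line chart; when show is True it is written to
    html_path and opened in a browser."""
    # Imported here so the data helper above works without the plotting stack.
    import pandas as pd
    import pandas_bokeh  # importing this adds DataFrame.plot_bokeh

    pandas_bokeh.output_file(html_path)
    frame = pd.DataFrame(data).set_index("day")
    # Title, axis labels, and similar options are passed to the plot call itself.
    return frame.plot_bokeh(
        kind="line",
        title="Sample prices",
        xlabel="Day",
        ylabel="Price",
        show_figure=show,
    )


# Uncomment once pandas and pandas-bokeh are installed:
# plot_interactive(make_sample_data())
```

To route the regular `DataFrame.plot` call through Bokeh instead, the README describes setting `pd.set_option("plotting.backend", "pandas_bokeh")`.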

06:45 And the other thing that I like, a few more things, one of them is you just, when you pip install pandas-bokeh, it pulls everything in, 'cause all the rest of the stuff is dependent on it.

06:56 So you get all of it just for a simple install.

07:00 And it also generates, it's able to do this interactively, but you can also generate notebooks.

07:06 Yeah, you can generate notebooks, and you can also generate standalone HTML files with this in it.

07:12 - This is really cool.

07:13 And yeah, the fact you can generate standalone HTML, there's probably ways to plug it into Flask sites, you know, Python websites and whatnot, pretty straightforward.

07:23 And the interactive bit is super nice.

07:25 I mean, this is not about pandas interaction per se, this is just Bokeh, right?

07:31 Being very cool and interactive.

07:32 But you can zoom, you can pan as you move around, you know, like it'll show you the marks on the graph and you can sort of hide and show elements.

07:41 And there's even a cool example where they're showing the stock price of Apple versus Google.

07:47 And as you put the cursor along, it has the Apple logo next to Apple and the information, like a little card that talks about it over time.

07:55 Man, this is nice stuff.

07:57 And all you got to do is point it at a data frame, not bad.

08:00 - Yeah, and they've got a whole bunch of examples on there.

08:02 They have a repo with a bunch of working examples too.

08:06 Obviously for the examples, the data is just sort of random data that they're throwing in there.

08:11 But once you know how to get your data, this does the rest of the work for you.

08:15 So it's cool.

08:15 - Very cool.

08:16 Yeah, it's just a great one.

08:17 And thank you, Jack, for recommending it.

08:18 And yeah, it's a good one, Brian, for pulling it out.

08:20 - We've had DigitalOcean as a sponsor for a while, and we really want to thank them.

08:24 They've really helped us out a lot, and plus, they're pretty darn cool.

08:28 So thank you, DigitalOcean, for sponsoring this episode and many others.

08:32 And in the past, we've told you about a lot of awesome things with DigitalOcean, like their one-click install, Kubernetes cluster support, their amazing new support center and help documentation that's been around for a while.

08:45 And our podcast runs on DigitalOcean and we're thrilled with it.

08:49 And so if your business or your side project deserves great hosting and growth that will grow with you and let you scale affordably, I really definitely want you to swing by pythonbytes.fm/digitalocean to grab the $100 credit for new users.

09:06 But there's something else I want to tell you about DigitalOcean that's really cool.

09:09 They've got something they've started recently that's called Hub4Good, and it's designed to support COVID-19 relief efforts where DigitalOcean through this is supplying $100,000 in infrastructure credits for new not-for-profit projects.

09:26 They're also giving 50K to COVID-19 Relief Fund, their own relief fund, but still it's really cool.

09:34 And they're also trying to raise awareness for COVID-19 related projects and provide learning for developers and also provide visibility for these projects.

09:45 And so I headed over there this morning and checked it out and there's a bunch of cool projects starting out that are related to COVID-19.

09:53 It's not just this sort of stuff, but it's things like there's even a platform to help teachers interact with students during quarantine.

10:01 A lot of cool projects through this.

10:03 So thank you, DigitalOcean.

10:04 - Yeah, this is a great project.

10:06 And obviously their infrastructure is great and we love it, but this is very cool too.

10:10 I didn't know about this.

10:10 - Yeah, it's pretty neat.

10:11 - Yeah, so speaking of not knowing, I feel like I've been kind of exploring the cave of Python, which is large and vast, and I just come on like a whole 'nother area.

10:23 I'm like, it opens up, like what is this?

10:25 How have I not known about this?

10:27 And this is nbdev.

10:29 Have you heard of nbdev?

10:30 - No.

10:31 - Yeah, okay, so let me tell you about it, and I'll get your impressions later.

10:35 So nbdev takes notebooks and basically makes them on par with writing proper Python packages and solves all these different problems.

10:47 It lets you generate what's gotta be some of the best documentation, period, for that library that is sort of backed by a notebook.

10:56 So it lets you develop full Python packages and libraries in notebooks where you can have your code, your unit tests, and your documentation all in one place. But then you can take it and upload it to PyPI and make it a pip-installable library, and people have no idea that it came from a notebook.

11:13 Wow.

11:14 Is that crazy or what?

11:15 That's awesome. I gotta check that out.

11:17 Yeah. And you know, you think about this idea of notebooks and to me, notebooks like burst on the scene in the 2010-ish era, maybe 2012, 2011, like that timeframe. But this project references this concept envisioned by Donald Knuth way back in 1983.

11:35 And it says, "Notebooks finally made literate programming," this concept by Donald Knuth, "a thing." So, you know, the old is new again, but in a really cool way.

11:44 And to me, this seems like just such a massive upgrade to notebooks.

11:47 So, notebooks have a bunch of challenges, in my view.

11:51 Like, I can't use a proper editor with it.

11:53 Like, if I were to use PyCharm or VS Code, and all of its navigation, and its cool git blame, and history, and all this stuff is just not present.

12:02 Documentation, I think that actually it really works well there, but it doesn't tie the documentation of the notebook to parts of functionality that might be created by the notebook, which is cool.

12:14 One of the biggest problems with notebooks, it's a benefit, but it's a big problem, is if you run a notebook, it stores the output in the notebook.

12:22 So if you had a bokeh plot, or you had a print of a data frame, that is in there and now it's part of it.

12:29 So if I'm working on a project and you're working on the same project and we both run the notebook at different times or the same time but separately and it for some reason generates different results, that's a merge conflict in Git.

12:43 Right, so basically you cannot use notebooks in like a sane way with Git because anytime you work with it, if you're not careful and like don't remove all the output before you save it, it's gonna be a merge conflict.

12:57 So this project has a git pre-commit hook that will remove that problem.

13:02 So right before it gets committed, it'll automatically do the cleaning of that metadata output so it'll never have that as a conflict.
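
To make the idea concrete, here is a hand-written sketch of what clearing notebook output amounts to. This is not nbdev's implementation; `strip_outputs` is a hypothetical helper, and the two tiny notebook dictionaries are made up. It relies only on the fact that a notebook file is JSON whose code cells carry `outputs` and `execution_count` fields.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and counters cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Two runs of the same notebook that differ only in what was stored after running.
run_a = {
    "cells": [{
        "cell_type": "code",
        "source": ["print(total)"],
        "execution_count": 3,
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["41\n"]}],
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
run_b = copy.deepcopy(run_a)
run_b["cells"][0]["execution_count"] = 7
run_b["cells"][0]["outputs"][0]["text"] = ["42\n"]

# Once the stored output is gone, the two files serialize identically.
same_after_cleaning = (
    json.dumps(strip_outputs(run_a), sort_keys=True)
    == json.dumps(strip_outputs(run_b), sort_keys=True)
)
print(same_after_cleaning)
```

With nothing but source left in the file, two people running the same notebook no longer produce conflicting versions of it.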

13:10 It also has an ability to like a CLI go, just accept it.

13:14 I just accept all the metadata changes.

13:17 Mine are just right or whatever, right?

13:19 So it also has a CLI to automatically fix that if you do have those problems, but it also has this pre-commit hook to avoid them entirely.

13:26 - Nice, that's a nice use for pre-commit too.

13:28 - Yeah, it's super clever.

13:29 So if I write a function in the notebook, I can put hash export in that cell, and that becomes a public function in the package.

13:37 - Oh, cool.

13:38 - Right, so I write documentation and pictures, and I say hash export.

13:41 Now that's part of my library that I'm building.

13:44 It also lets you create the structure for Python packages.

13:48 So you have the setup.py, and you can do the build wheels and whatnot automatically out of that.

13:55 And it uses this exported stuff.

13:57 You can have your unit tests in your notebook, which is pretty cool, for the things that are in there.
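
A small sketch of what such notebook cells can look like, shown as one block with each cell separated by a blank line. It assumes the comment flags nbdev used around the time of this episode (`# default_exp` and `#export`); later releases changed the flag syntax, so treat the exact spelling as an assumption. The function name is illustrative.

```python
# default_exp core   <- first cell: names the module the exported code is written to

#export
def say_hello(to):
    "Say hello to somebody."
    return f"Hello {to}!"

# A cell with no flag stays in the notebook, where it doubles as a test and as docs.
assert say_hello("Brian") == "Hello Brian!"
```

Only the flagged cell ends up in the generated package; the assertion cell is what runs as the notebook's test.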

14:02 And then finally, you can edit it.

14:04 You can take the edited library, or the library that exported, sorry, and then edit it with PyCharm or VS Code, and then reverse export it.

14:13 So what you can do is like, say, push the changes that I've done with my editor back into the segments of the notebook where that code came from.

14:20 - Oh man, okay.

14:21 I'm a little confused, but I gotta try it out.

14:23 - Yeah, you gotta kinda read through it to get the sense, but there's just a bunch of stuff, like all these things seem like, yes, you should have been able to do that with notebooks, but obviously, right, that's not their origins, right?

14:32 They can't do everything at once, but all of these things seem awesome to me.

14:36 Yeah, so in order to get started, it's going to basically create a Git repo for you, is my understanding, either on GitHub or GitLab.

14:43 So you gotta follow the getting started instructions, and then you click a button, and it'll generate the repo in the right structure, or you can use the CLI tooling to generate the right repo with things like the Git commit pre-hooks and whatnot.

14:56 And if you're gonna read the docs, check out nbdev.fast.ai, 'cause this comes from the fast.ai people, the same ones that built the fastai library.

15:08 So some of the docs render better.

15:10 There's certain things on GitHub that it says, and here's a cool picture, and it's just like source code.

15:15 It's not quite right.

15:15 So maybe check out the final link at the bottom in this section to get to, if you're gonna like browse through it, but it's basically, you get the same thing out of GitHub.

15:25 Anyway, this to me seems like a massive improvement for notebooks and sort of brings them more into, I can do things, like for example, you can now have your notebook and its tests running as part of continuous integration.

15:42 Like, so these notebooks are now like full participants in CI/CD, you can create packages and put them on PyPI.

15:49 There's all sorts of neat stuff.

15:51 The documentation, like if you have a cool graph as part of your notebook, that can become the documentation on PyPI or Read the Docs for those functions. - That's crazy.

16:01 - I mean, it's crazy cool how this is like taking some of the awesome parts of notebooks, like the doc side and turn that into the help docs and then also letting you export the functionality still as a proper CS type thing.

16:13 - Yeah, I definitely gotta check this out.

16:15 - How did I not even know this existed?

16:17 Like, this is awesome.

16:18 - Well, I don't know how long, I mean, it looks like--

16:20 - It looks like five months to me, is my guess.

16:21 - Okay, so we're not that behind the times.

16:23 - No, we're not that behind, yeah.

16:25 - But this looks neat.

16:26 - Yeah, it's very neat.

16:27 - Plus Fast.ai is pretty cool, so I think this is probably pretty solid.

16:31 - Yeah, I agree, it's definitely got some solid people behind it, so very cool, very cool.

16:35 Anyway, nbdev, quite neat.

16:37 - I wanna talk about something a little not neat, a little lighthearted.

16:42 So this is sort of a serious topic, but this is an article from Sebastian entitled "Stop Naming Your Python Modules 'Utils'".

16:52 And I don't think we've, I don't know if we've covered it before, but it's good advice and it's something that happens.

16:58 Basically a lot of projects, public or private, will at some point end up having a utils.py or a utils package or something.

17:06 And this article is just saying, resist the urge.

17:11 Utils is arguably one of the worst names for modules because it's very blurry and imprecise.

17:17 Such a name does not say what the purpose of the code inside is.

17:23 And on the contrary, Utils module can, well, contain almost anything.

17:28 By naming a module Utils, software developer lays down perfect conditions for an incohesive, uncohesive, whatever, code blob.

17:38 And I have definitely seen this in action.

17:40 I have been one of the culprits before, pulling out a little helper function that I had in one file because I wanted to use it in a different module.

17:52 So I didn't know where to put it.

17:53 So I stuck it in a utils.py, added a couple more.

17:57 So there's just a few methods.

17:58 And I come back six months later, and there's like a couple dozen just junk drawer functions from all over the place in there.

18:07 So if you start, people will add junk to it.

18:10 So Sebastian lists a few excuses.

18:13 It's just one function, but it grows.

18:16 There's no other place in the code to put it.

18:18 Well, try harder.

18:19 And I need a place for company commons.

18:21 I don't really know what that means, but name it company or something.

18:25 And also Django does it.

18:27 Well, I don't know, maybe they shouldn't have, but they have it now, so they're not going to change it.

18:32 So the advice is to try grouping your utility functions and naming them based on the role of how you're going to use them, or possibly grouping them in themes.

18:45 And also, if you see a utils.py crop up in a code review, just request that the person rename it to something else, if possible.

18:55 - Just set up a CI rule to break the build if you see that file name.

18:58 (laughing)

18:59 - Yeah, so what are your thoughts on this, on utils?

19:02 - I agree with Sebastian, absolutely.

19:04 I understand the challenge, because naming things in software is hard, but naming things in software is super important because when you think about, like even just like function names or class names or whatever, right?

19:19 Usually what will happen is we'll get like a crummy, vague name and then a comment describing what they are doing.

19:26 And you're like, well, why don't you just make the name a little bit longer that says what it does.

19:30 And utils is kind of like the generic catch all of saying like, well, I couldn't come up with a name.

19:36 So here it is.

19:37 We're just going to drop it here.

19:38 And in my code, I have tons of different areas in which I organize it, sort of like submodules, I guess, or subpackages if it's a package, but sometimes it's not technically a package.

19:52 And I try to come up with names that are meaningful.

19:55 I have something called number converter that will try to parse an integer or return a default value instead of throwing an exception, or it'll try to parse some other thing, or maybe it's called conversions.py or whatever, but it's not like utils.
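
As an illustration of the kind of helper being described, here is a minimal sketch. The module name `conversions.py` is the one floated above; the function name `try_int` and its signature are hypothetical, not Michael's actual code.

```python
# conversions.py (a name that says what lives here, instead of utils.py)


def try_int(text, default=0):
    """Parse text as an integer, returning default instead of raising."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


print(try_int("42"))
print(try_int("forty-two", default=-1))
print(try_int(None))
```

A module named for its role gives the next helper an obvious home, or makes it obvious that it belongs somewhere else.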

20:08 There's usually some kind of better structure you can find that will help you do this.

20:15 You know, there's that joke that, you know, naming things in computer science, that's one of the hardest problems, right?

20:21 I do agree with that, but yeah, it's worth the effort when you get it figured out.

20:25 - If you don't believe me, you can just try it sometime.

20:27 If you're working on a group project, just put one function in utils and you will see it grow.

20:32 And you'll have to find it.

20:33 - Is this like the broken window theory of software?

20:36 - Yeah, definitely.

20:38 - Yep, and misc doesn't count either.

20:40 If you'd name it misc, it's just as bad.

20:42 - That's right.

20:43 Yeah, there's probably some synonyms here in the code world that don't count.

20:48 So yeah, awesome.

20:49 I want to tell you about this one next that helps with performance or understanding the performance more specifically of your code.

20:56 So I don't know how much profiling you guys do in your work.

21:00 How much does performance matter to you guys?

21:03 - It matters a lot.

21:03 - Yeah? - Yeah.

21:05 I'm building things that go into testing in a production line.

21:10 So every millisecond that it takes, takes a millisecond longer to get something shipped.

21:16 So yes, it matters.

21:17 - Yeah, it matters.

21:18 I suppose I mostly spend my time on the web, and obviously it matters there, right?

21:22 Like every hundred milliseconds, I think Amazon measured is like 1% loss of orders or something ridiculous like that, right?

21:29 Like, so understanding your performance is good.

21:32 We've had good, good in quotes, profilers for Python, and they typically tell you about this function spent this much time.

21:42 But another challenge is, my program is using too much memory or worse, it's something long running like a web app or some background process, and it's like growing.

21:52 It's like sort of leaking memory.

21:56 Why is that?

21:57 So I came across this project called Scalene, which is a high performance and high precision CPU and memory profiler for Python.

22:07 - Cool.

22:07 - Yeah, so it lets you either analyze CPU time or it actually lets you on a line by line basis say here's some memory, what line made this and where's it coming from.

22:19 Yeah, and so that's cool.

22:21 But one of the challenges for profiling is when you're profiling your code, you can make it, you don't get the same behavior.

22:28 It's sort of like the Heisenberg uncertainty principle.

22:31 It does one thing, but when you measure the profiler, you've changed it, so now you kind of got to say, well, that part where it was the network, that was 50%, but now you made the computational bits way slower, so that network part looks just like 20.

22:45 You're affecting it.

22:47 So for example, if you use profile, the built-in profile, it can make your code 30 times slower, for a simple scenario, than running it normally.

22:56 But you can use cProfile, which is the C-based one that's built in.

22:59 It only slows it down by 1.65 times.

23:03 So that's not too bad.
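
For reference, this is roughly what function-level profiling with the built-in cProfile looks like. It is a sketch with a toy workload (`sum_of_squares` is made up), and it shows the standard-library profiler rather than Scalene.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Toy workload so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = sum_of_squares(200_000)
profiler.disable()

# Render up to five entries, sorted by cumulative time, into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists time per function, which is the granularity the line-by-line tools discussed next try to improve on.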

23:04 There's a line profiler that's 11 times slower.

23:07 And there's a whole bunch of other ones.

23:08 There's a memory profiler that's like over a thousand times slower.

23:13 So the Scalene project has a nice comparison to all these things.

23:17 It says, well, how does Scalene do?

23:20 And it claims that it's got this built-in library that's much faster.

23:23 For CPU stuff, it's 1.04 times the speed.

23:26 So, like 4% slower.

23:29 And it does that through sampling, right?

23:31 It doesn't do instrumentation, it doesn't rewrite the stuff, it actually just asks frequently, like, "Hey, where are you in the code?" But it still gets per line analysis to that, which is pretty cool.

23:41 And then the memory one is like another 10% slower because analyzing memory is hard.

23:46 But yeah, there's all sorts of cool stuff you can do with it.

23:49 The overhead is not too bad, the precision is pretty good.

23:53 So like I said, it gives you like a line by line level of how much time you're spending in various places.

23:59 It also is interesting in that it separates out the time spent running Python code from native code, including like the base libraries and stuff.

24:07 So you can say, like, I can only affect the Python stuff.

24:11 The other stuff is not a thing I can deal with.

24:14 So yeah, don't tell me about it or punish me for it.

24:18 Or maybe I do want to look at it, right?

24:19 Tell me about that.

24:21 - So that's pretty cool.

24:22 And then also, the memory stuff, I think is pretty cool.

24:25 So it says it points to specific lines of code responsible for memory growth.
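
To show what attributing memory growth to a line of code means in practice, here is a sketch using the standard library's `tracemalloc`. This is a different mechanism from Scalene's own allocator, swapped in because it needs no extra install; `build_cache` is a made-up workload.

```python
import tracemalloc


def build_cache(n):
    # The list built on the next line is what holds on to memory.
    return [str(i) * 10 for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
cache = build_cache(50_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Compare the two snapshots grouped by source line, largest change first.
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    print(stat)
```

Each printed entry names a file and line number along with how many bytes were added there between the two snapshots.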

24:29 And it's important it does this through a special memory allocator thing that comes with it.

24:36 And so while you can pip install scalene, you can't inspect the memory allocation that way.

24:41 You have to go and install it directly and do some more setup.

24:45 On macOS you can do brew install.

24:46 There's instructions in there on how to do that.

24:48 On other OS's I have no idea what you do.

24:51 but you can't run the memory allocation directly.

24:54 You can't just say pip install it and then do the memory allocator.

24:57 There's something other like lower subsystem that has to get installed for that to work.

25:00 - Yeah, and memory is an interesting one because it's a difficult one to chase down with Python.

25:06 - Yeah, it's very hard in Python 'cause everything is a pointer, everything is an indirection.

25:11 It's not like, well, here's the block where we allocated this object or whatever, right?

25:16 Like it's pretty indirect.

25:18 And you don't typically have a hold of pointers in the, like, memory address sense of it, like you do in C or something, right?

25:25 So it's, yeah, it's challenging.

25:27 I would love to see this integrated into PyCharm and VS Code.

25:30 - Oh, yeah.

25:31 - Right, right now it just gives you, it gives you like a cool tabular text output or file output, but if you could just right click in PyCharm and say analyze with scalene, that'd be sweet.

25:42 - Yeah, I wonder, and also that would solve some of the install thing.

25:45 So if you have to install it separately, some integration with PyCharm VS Code would be cool.

25:50 - Right, like right now you can do profiling and it's really awesome in PyCharm, but I'm pretty sure it uses cProfile, so, yeah, who knows, someday maybe.

25:58 - Hey, while we're talking about editors, I don't know about VS Code but I do know, backing up a little bit, I do know that PyCharm does open notebooks okay, so.

26:07 - Awesome, yeah.

26:08 - Just backed that, anyway.

26:09 - Yeah, yeah, nice.

26:10 - I want to tell you a little bit about testing.

26:12 - Awesome, I'm really surprised that you're covering this but okay, yeah, go ahead.

26:16 (laughing)

26:18 - Yeah, it's interesting.

26:19 Lately, you've been covering the testing articles.

26:21 - I know, isn't that my role now?

26:23 No, go ahead.

26:24 This is great, tell us about it.

26:25 - Yeah, this is a person named Carolyn that wrote an article called "From 1 to 10,000 Test Cases in Under an Hour, Beginner's Guide to Proper--

26:35 - That's productive.

26:36 And imagine if Carolyn was getting paid by the test, right?

26:39 Like, we're evaluating your bonus for the year.

26:42 Like, I wrote five times as many tests as anyone else and I just started this month.

26:46 Yeah, I would totally use if I was paid by the test case, I would definitely use hypothesis on every project.

26:52 So how did she do this? What is this property based testing?

26:57 Okay, so hopefully people have heard of property-based testing. It's as opposed to, what do we call it, example-based testing, and this is kind of how she goes through the discussion. The article is really just an excellent introduction to property-based testing and using Hypothesis. She's using Hypothesis in the example, but the intent is just property-based testing, because it's the same sort of strategy with every other property-based testing library. She just happens to be using Hypothesis in Python. So that's nice. She starts off with a unit test example: she has, not a string sort, but a list-sorting thing.

27:44 And if you're doing example based testing, you just pick a few example tests, example test cases where you would take the input and you know what the sorted output should look like.

27:56 And you run it through the function and assert that the output is equal to the expected one.

28:02 How would you do this with property based testing?

28:05 And before she goes in, and she does give an example of how to write some sort of test like that in property-based testing, she stops and pauses and talks about kind of the different mindset.

28:16 You can't test against an exact example because you don't know what example is coming in.

28:20 So you have to think about properties.

28:22 So like on a list sort thing, you don't have the exact answer, but you could check to make sure that the length is the same and that you can use sets on both the input and output to make sure that the contents of both are identical.

28:37 And then you can go through the answer and make sure that element-wise, every element i is less than or equal to i plus one.

28:44 You know, there's ways to test sort without just knowing the answer.
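
The properties just listed can be sketched with nothing but the standard library. This is a hand-rolled illustration, not the article's code: `check_sort_properties` is a hypothetical helper, and it uses `Counter` rather than sets so that duplicate elements are checked as well.

```python
import random
from collections import Counter


def check_sort_properties(original, result):
    """Assert properties every correct sort must satisfy, without knowing the answer."""
    # Same number of elements in and out.
    assert len(result) == len(original)
    # Same contents; Counter is a stricter variant of the set comparison.
    assert Counter(result) == Counter(original)
    # Each element is less than or equal to the one after it.
    assert all(a <= b for a, b in zip(result, result[1:]))


rng = random.Random(178)  # fixed seed so the run is repeatable
for _ in range(1_000):
    values = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    check_sort_properties(values, sorted(values))
print("all properties held")
```

None of the assertions mention a specific expected list, which is the mind shift being described.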

28:49 But it takes a mind shift a little bit.

28:51 And I think actually that's one of the benefits of property-based testing is thinking in terms of that also.

28:58 I also think it's nice that she talks about how this isn't a replacement for example-based testing.

29:04 It is a complement to it.

29:06 And so you can mix them together.

29:08 Then she goes on to introduce some of the aspects of hypothesis. There are some cool strategies, like lists and integers, and you can set max examples to control how many examples get generated. That's where you can just set it to 10,000 and wham, you have 10,000 test cases right away.

29:28 And you just let hypothesis come up with the examples.
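
For readers who want to see what that looks like, here is a small sketch using hypothesis's `given`, `settings`, and the `lists` and `integers` strategies. It assumes the `hypothesis` package is installed; `sort_list` is a hypothetical stand-in for the function under test, and `max_examples` is kept at 500 here only so the sketch runs quickly.

```python
from hypothesis import given, settings, strategies as st


def sort_list(items):
    """Hypothetical function under test; here it just wraps the built-in sorted()."""
    return sorted(items)


@settings(max_examples=500)  # raise to 10_000 for the "10,000 test cases" effect
@given(st.lists(st.integers()))  # hypothesis generates lists of integers for us
def test_sort_list_properties(data):
    result = sort_list(data)
    assert len(result) == len(data)
    assert set(result) == set(data)
    # Compare each element with its successor.
    assert all(a <= b for a, b in zip(result, result[1:]))


# pytest would collect this automatically; calling it directly runs all examples.
test_sort_list_properties()
```

If any generated list violates a property, hypothesis reports a minimized failing input rather than the original random one.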

29:31 The real meat of the article, which I really appreciate, is that the hard part of property-based testing isn't just the syntax. Some of it is the syntax, and she does cover how to get this done, but coming up with which properties to test for is the hard part.

29:50 And so taking a little time to talk about that, I think this is a great thing.

29:54 I'm also glad she threw in that one of the things you should check for with tests, property-based testing, is making sure exceptions that get raised are expected exceptions.

30:06 So if you throw garbage in or different cases that don't make sense, you should know what kind of exceptions are gonna come out and that this can be caught with your tests with hypothesis.
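
One way to picture the "only expected exceptions" check is the sketch below. Everything in it is an illustrative assumption: `parse_age` is a made-up function, `ValueError` is chosen as its documented failure mode, and a seeded `random` loop stands in for hypothesis-generated text to keep the sketch dependency-free.

```python
import random
import string


def parse_age(text):
    """Hypothetical function under test: parse a non-negative integer age."""
    value = int(text)  # int() raises ValueError for non-numeric text
    if value < 0:
        raise ValueError("age must be non-negative")
    return value


def check_only_expected_exceptions(text):
    try:
        result = parse_age(text)
    except ValueError:
        return  # the documented failure mode, so this counts as a pass
    # Any other exception type is not caught above and fails the check.
    assert isinstance(result, int) and result >= 0


# Throw garbage in: short random strings of digits, letters, and punctuation.
rng = random.Random(0)
alphabet = string.digits + string.ascii_letters + " -+._"
for _ in range(500):
    garbage = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
    check_only_expected_exceptions(garbage)
```

The property is not about any particular return value; it is that the function either succeeds sensibly or fails in the one way its callers were told to expect.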

30:16 And then also a great use for all of this is to implement whatever functionality you wanted in a very simplistic, but possibly slow or memory hoggy way or something.

30:28 And then you can compare the elegant version and the slow version within the tests to make sure that they come up with the same answer.

30:35 This is also great if you're doing a refactoring, you can refactor part of your system and make sure that the old and new way act the same.
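
The slow-versus-elegant comparison can be sketched as follows. Both function names are hypothetical: `slow_sort` is a deliberately naive selection sort acting as the trusted reference, `fast_sort` stands in for the optimized or refactored version, and a seeded `random` loop again stands in for a property-based library.

```python
import random


def slow_sort(items):
    """Deliberately simple reference implementation (selection sort, O(n^2))."""
    remaining = list(items)
    result = []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)  # removes the first occurrence only
        result.append(smallest)
    return result


def fast_sort(items):
    """Stand-in for the 'elegant' or refactored version; here the built-in sorted()."""
    return sorted(items)


# The property: for any input, both implementations agree.
rng = random.Random(0)
for _ in range(300):
    data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
    assert fast_sort(data) == slow_sort(data)
```

The same shape works for a refactoring: keep the old code path around as the reference until the new one has agreed with it across many generated inputs.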

30:43 So it's just a good introduction to all of this.

30:46 - Yeah, and property-based testing is, you're right, such a mind shift. I don't know, I haven't fully embraced it yet, but I feel like there's probably some places where it would really be interesting and useful. I probably should just look into it.

31:01 I don't know, I get stuck in my ways and then I just keep going that way.

31:05 At the end, she talks about if you're not using Python, what options you have as well, which is kind of cool.

31:11 So it's like, hey, hypothesis is cool in Python, but if you're in TypeScript, we got fast-check.

31:16 If you're on .NET, they don't have dashes or A's or T's, so there's FsCheck.

31:21 And in Java, there's this and C++ and Rust and so on.

31:25 So yeah, it looks like you could use the same thinking and ideas across different parts of your stack if you're having different technologies in there.

31:34 - This is another example of, if it shows up in every language, it's probably something you should be paying attention to.

31:41 - That's a really, that's a good rule of thumb.

31:43 It's like, yeah, if I see it all over the place, right, this is a general CS sort of thing that's important.

31:49 Yeah. - Yeah.

31:50 - You know what else I like about going through stuff like this is you come across things that you didn't know about, right?

31:56 For example, you'd think that I would know about JSON.

31:59 It seems pretty simple, like the JavaScript object notation.

32:02 But apparently there's like a JSON5 as well, which allows things like comments and whatnot, and multi-line strings, and single quotes, and elements that are not quoted for the keys, and so on.

32:16 And there's a whole cool library for JSON5 support, if you wanna have like a little bit more human-friendly JSON.

32:22 - I had no idea that was a thing.

32:24 - Yeah, neither did I.

32:25 And I was just like, why can't I put a comment in JSON?

32:28 This is driving me crazy.

32:28 So what I do is I have like a field that says comment or like double slash in quotes.

32:33 And then I have the string that is the comment 'cause you can't actually have comments, but you can have ignored keys and values.

32:40 So that's how I have comments in my JSON.
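
The comment-as-a-key workaround described here can be sketched with the standard library `json` module. The key name `"//"` and the config fields are illustrative assumptions, not taken from any real file.

```python
import json

# Plain JSON has no comment syntax, so the "comment" is stored as an ordinary
# key/value pair that the application agrees to ignore.
raw = """
{
    "//": "Timeout is in seconds; keep it below the proxy limit.",
    "timeout": 30,
    "retries": 3
}
"""

config = json.loads(raw)
# Drop the pseudo-comment key so application code never sees it.
settings = {key: value for key, value in config.items() if key != "//"}
print(settings)  # prints {'timeout': 30, 'retries': 3}
```

One caveat with this trick: Python's `json` module keeps only the last value when a key is repeated within an object, so several `"//"` entries in the same object collapse into one after parsing.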

32:42 But anyway, she talks about using the JSON5 library that's part of Python to support that.

32:46 It's not built in, but it's a Python library.

32:48 You can use to do that.

32:49 Pretty cool.

32:50 - Yeah, nice.

32:51 - Yeah, well, I guess that's it for all of our items, huh, Brian?

32:53 - Yeah, it is.

32:54 Got anything extra for us?

32:55 - Yeah, I totally did, but you nabbed it and put it in your section, so go for it.

32:59 (laughing)

33:00 That tells me you found a bunch of cool things there.

33:02 - Yeah, I want to get this one out of the way first.

33:05 Some sad news.

33:07 You've heard of Game of Life, right?

33:08 - Yes, yeah, Conway's Game of Life.

33:10 - Yeah, Conway's Game of Life.

33:12 Well, John Conway, I'm going to link to a nice article talking about the Game of Life and John Conway, but just an announcement that he is one of the victims of COVID-19. He died from it recently.

33:27 So that's sad.

33:29 - Yeah, it's definitely sad news.

33:30 - Game of Life is kind of an excellent thing to have in the computer science realm.

33:35 Pretty neat.

33:36 So that's sad.

33:37 Something that's happy is GitHub is now free for all teams and individuals.

33:42 So that's a pretty cool announcement.

33:44 - That's really awesome, yeah.

33:45 So previously you had to pay to have collaborators on a private repo.

33:50 I think maybe you could have some, but not a ton per private.

33:53 I can't remember.

33:54 - Three, I think, something like that.

33:55 - Yeah, it's like evolving.

33:56 First you had to pay for private repos, then you didn't, but then you had to for collaborators.

34:01 Yeah, but that's awesome.

34:02 So it's much more free.

34:03 And then also, for people who still pay GitHub, like me, it's half price.

34:08 It's 40, it's, I don't know, whatever four divided by nine is.

34:12 It's now 44% of what you were paying before.

34:15 I have people wondering, like, why would you pay for GitHub?

34:17 Organizations.

34:19 If you have an organization, so like Talk Python and the related training authors and content, there's like a GitHub organization for Talk Python.

34:27 To have people collaborate on that, you still have to pay, but it was $9 a month per user. Now it's $4 a month per user.

34:34 So that's also bonus. Yeah. Yeah. Pretty cool. Yeah, that's happy.

34:37 Yeah. So the last thing I wanted to bring up is that the PyCon US 2020 online is now live.

34:45 So there's a welcome video and more talk.

34:49 - There's some talks linked and there's more on the way.

34:51 - Yeah, there's a nice welcome video from Emily Morehouse that she basically kicks off the virtual conference.

34:57 And this conference, I don't know if that's the right word for it, this thing, this event, is not like a lot of online virtual conferences, where it's like, on Saturday, we're all going to meet and then the talks are going to be these three hours and whatnot.

35:10 It's like a rolling release of information and videos that then you get to consume over the next couple weeks.

35:17 So yeah, you're linking to basically the landing page for stuff as it happens, right?

35:22 - Yeah, and I also recommend, if you go to any of the videos, like the welcome video, going up to find the PyCon US 2020 top page and looking at the videos there. Then you can see them all listed as well.

35:39 But they're rolling out, and I know that they're not all recorded, so some will come later.

35:46 For instance, I am still, I don't know if I will, but I'm still planning on recording my talk and posting it, just trying to figure out when to do that.

35:55 - Yeah, yeah, cool.

35:56 - Anyway. - I'm definitely looking forward to checking it out and see what comes along.

35:59 There's also, it's worth mentioning that there, at that link, there's a place that has the virtual expo.

36:06 And the expo hall is actually my favorite part of the conference.

36:09 It's 'cause you get to walk around and meet people and just see what's going on, and you see all the companies and what they're doing.

36:15 But one of the things that happens there on Sunday in normal times is there's the hiring job fair thing.

36:23 And all the job fair stuff is already up there.

36:26 So if people are looking for a Python job, there's many, many links of this company's hiring for these four positions, click here.

36:33 This company's hiring for this position.

36:34 So if you're looking for a job, you want to get in there quick and grab the good ones and apply to them.

36:40 - Yeah, one of the things that's missing is how am I gonna last an entire year with no new t-shirts?

36:46 - I know.

36:47 Well, you're gonna have to be upping your game there in this video version here.

36:53 I know, I love all the conference swag.

36:56 Yeah, exactly, like how do you even do that?

36:58 How do you even find a good tech shirt that you buy?

37:01 I know they're out there, but it'll be different.

37:04 Well, you wanna know something that wasn't funny is I almost forgot to put a joke in our show notes here.

37:09 - Oh no, you found one.

37:11 I pulled up the terminal and I typed pyjoke, because I have pipx-installed pyjokes.

37:18 So it's right there in the command line, anytime you need a laugh.

37:23 This one is about QA, software quality folks.

37:25 And it's a take on a traditional one.

37:28 So here, I'll hit you with this, see what you think.

37:30 How many QAs does it take to change a light bulb?

37:32 - I don't know.

37:34 - They notice the room was dark.

37:35 They don't fix problems, they find them.

37:36 (laughing)

37:39 - Oh dear.

37:38 - That's bad, right?

37:38 - Yeah, that's definitely why QA and development should be one team.

37:42 - Absolutely.

37:43 All right, well, a good joke nonetheless.

37:46 A good pyjoke.

37:46 - Thanks.

37:47 Well, this was lovely today, so thanks for talking with me.

37:50 - Yeah, absolutely.

37:51 Thanks, as always, great to chat with you.

37:53 See you later.

37:54 - Bye.

37:55 - Thank you for listening to Python Bytes.

37:56 Follow the show on Twitter @pythonbytes.

37:58 That's Python Bytes as in B-Y-T-E-S.

38:01 And get the full show notes at pythonbytes.fm.

38:04 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

38:09 We're always on the lookout for sharing something cool.

38:11 This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
