#178: Build a PyPI package from a Jupyter notebook

Published Wed, Apr 22, 2020, recorded Wed, Apr 15, 2020

This episode is brought to you by Digital Ocean: pythonbytes.fm/digitalocean

YouTube is going strong over at pythonbytes.fm/youtube

Michael #1: Python String Format Website

by Lachlan Eagling
Have you ever forgotten the arguments to datetime.strftime()?
Quick: What’s the format for Wed April 15, 10:30am?
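I don’t know, but the site says '%a %B %H, %M:%Sam', and it does parse! Strictly, though, that format reads 15 as the hour and 10:30 as minutes and seconds; '%a %B %d, %I:%M%p' maps each field to its actual meaning.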
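A quick way to check a format string is to try it in a Python REPL, as Michael suggests in the discussion. This sketch round-trips the example date with a format whose directives match each field, then shows what the quoted format actually produces. The year 2020 is added to the text as an assumption: without a year, strptime defaults to 1900, and recent Python versions warn about parsing a day of month with no year.

```python
from datetime import datetime

# Assumption: the year is included in the text so the parse is unambiguous.
text = "Wed April 15 2020, 10:30am"

# %a abbreviated weekday, %B full month name, %d day of month, %Y year,
# %I hour on a 12-hour clock, %M minute, %p AM/PM marker (matched case-insensitively)
fmt = "%a %B %d %Y, %I:%M%p"
parsed = datetime.strptime(text, fmt)
print(parsed)  # 2020-04-15 10:30:00

# Going the other way with strftime; %p prints "AM" in the default C locale.
print(parsed.strftime("%a %B %d, %I:%M%p"))  # Wed April 15, 10:30AM

# The format quoted in the notes parses the year-less text without error,
# but assigns the numbers to the wrong fields: 15 -> hour, 10 -> minute, 30 -> second.
misread = datetime.strptime("Wed April 15, 10:30am", "%a %B %H, %M:%Sam")
print(misread)  # 1900-04-01 15:10:30
```

Checking the parsed value, not just whether parsing succeeds, is what catches a format that "works" but means something else.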

Brian #2: Pandas-Bokeh

Suggested by Jack McKew
“Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.”
“With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling: df.plot_bokeh()”
You can also switch the default plotting of pandas to Bokeh with pd.set_option('plotting.backend', 'pandas_bokeh')
This interface looks a lot easier to me than working with frames and plots and shows and such.
Lots of options, and all collected in parameters to the plot call.
It can also output to a notebook or to a standalone HTML file.
Plus, a single pip install pandas-bokeh pulls in everything you need.

Michael #3: NBDev

nbdev is a library that allows you to fully develop a library in Jupyter Notebooks, putting all your code, tests and documentation in one place.
That is: you now have a true literate programming environment, as envisioned by Donald Knuth back in 1983!
This seems to be a massive upgrade for notebooks and related tooling
Creates Python packages out of a notebook
Creates documentation from the notebook
Solves the Git perma-conflict problem (stored outputs and metadata) with a Git pre-commit hook that cleans notebooks before each commit
Add #export to a cell to include its code (e.g., a function) in the package
Manages the boilerplate for creating Python packages (setup.py, etc.)
Makes testing possible inside notebooks
Navigate and edit your code in a standard text editor or IDE, and sync any changes automatically back into your notebooks (the reverse direction)
Follow the getting-started instructions.
Docs render slightly better at nbdev.fast.ai
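To make two of these ideas concrete, here is a toy sketch, not nbdev's implementation. It works directly on a notebook's JSON (nbformat 4): one function collects the code of cells tagged `#export` into module source, and another clears outputs and execution counts, roughly what a cleaning pre-commit hook does so that re-running a notebook doesn't create Git conflicts. The function names are hypothetical, and nbdev's real directive handling is richer (newer nbdev releases spell the directive `#| export`).

```python
import json


def cell_source(cell):
    """nbformat 4 stores a cell's source as one string or as a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def strip_outputs(nb):
    """Clear run-dependent fields so re-running a notebook doesn't change the file."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def export_source(nb, marker="#export"):
    """Concatenate the code of cells whose first line is `marker` (toy version)."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        lines = cell_source(cell).splitlines()
        if lines and lines[0].strip() == marker:
            chunks.append("\n".join(lines[1:]))  # drop the marker line itself
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    # A tiny in-memory notebook: one exported function, one scratch cell with output.
    nb = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": "# Adding numbers"},
            {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
             "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
            {"cell_type": "code", "metadata": {}, "execution_count": 2,
             "outputs": [{"output_type": "execute_result", "execution_count": 2,
                          "data": {"text/plain": ["3"]}, "metadata": {}}],
             "source": "add(1, 2)"},
        ],
    }
    print(export_source(nb))  # only the #export cell becomes module code
    print(json.dumps(strip_outputs(nb)["cells"][2], indent=1))
```

In practice the notebook would be read with json.load, and the exported text written to a .py file inside the package.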

Brian #4: Stop naming your python modules “utils”

Sebastian Buczyński, @EnforcerPL
Lots of projects, public and private, end up having a utils.py.
“utils is arguably one of the worst names for a module because it is very blurry and imprecise. Such a name does not say what is the purpose of code inside. On the contrary, a utils module can as well contain almost anything. By naming a module utils, a software developer lays down perfect conditions for an incohesive code blob. Since the module name does not hint team members if something fits there or not, it is likely that unrelated code will eventually appear there, as more utils.”
One occurrence of misbehavior invites more of them.
- I have seen this in action. I’ve put 2-3 hard-to-classify functions that were used in lots of modules into a utils.py, only to come back a few months later and find a couple dozen completely unrelated functions, now that the team had a junk drawer to throw things in.
Excuses:
- It’s just one function
- There is no other place to put this code
- I need a place for company commons
- But Django does it
Instead:
- Try naming based on role of the code or group functions by theme.
- If you see a utils.py crop up in a code review, request that it be renamed or split and renamed.
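As an illustration of naming by role, the "number converter" Michael describes in the discussion (parse an integer, or return a default instead of raising) could live in a module named for what it does. This is a sketch; the module name conversions.py and the function names are hypothetical.

```python
# conversions.py: named for its role, so it's obvious what does (and doesn't) belong here.


def try_int(text, default=None):
    """Parse text as an int; return default instead of raising on bad input."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


def try_float(text, default=None):
    """Parse text as a float; return default instead of raising on bad input."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return default
```

A caller then writes `from conversions import try_int`, and a reviewer can easily tell when an unrelated helper is being added to the wrong module.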

Michael #5: Scalene

A high-performance, high-precision CPU and memory profiler for Python
It claims to run orders of magnitude faster than other profilers while delivering far more detailed information.
Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
Scalene is precise. Unlike most other Python profilers, Scalene performs CPU profiling at the line level, pointing to the specific lines of code that are responsible for the execution time in your program.
Scalene separates out time spent running in Python from time spent in native code (including libraries).
Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
- Memory profiling requires a special install, not just pip (see the brew install instructions in the docs)
Scalene profiles copying volume, making it easy to spot inadvertent copying, especially due to crossing Python/library boundaries (e.g., accidentally converting numpy arrays into Python arrays, and vice versa).
See the performance comparison chart.
It would be nice to have it integrated into editors (PyCharm and VS Code).
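To illustrate what "sampling instead of instrumentation" means, here is a toy line-level sampler. It is not Scalene's implementation (Scalene does much more, including separating native time and tracking memory); it only shows the basic idea on a Unix-like system. A CPU-time interval timer (`signal.setitimer` with `ITIMER_PROF`) periodically interrupts the program, and the signal handler records which function and line were running. The helper names are hypothetical.

```python
import collections
import signal


def sample_lines(func, *args, interval=0.001):
    """Run func(*args), sampling the running Python line every `interval` s of CPU time.

    Unix-only (signal.setitimer / SIGPROF), main thread only. Python runs signal
    handlers between bytecodes, so this toy cannot properly attribute time spent
    inside long-running native (C) calls.
    """
    counts = collections.Counter()

    def on_sample(signum, frame):
        # `frame` is the Python frame that was executing when the handler ran.
        counts[(frame.f_code.co_name, frame.f_lineno)] += 1

    previous = signal.signal(signal.SIGPROF, on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        result = func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
        # restore the old handler (None means it wasn't installed from Python)
        signal.signal(signal.SIGPROF, previous if previous is not None else signal.SIG_DFL)
    return result, counts


def busy(n):
    """Deliberately slow pure-Python loop to give the sampler something to see."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    result, counts = sample_lines(busy, 3_000_000)
    for (name, line), hits in counts.most_common(5):
        print(f"{name}:{line}  {hits} samples")
```

Because the program runs unmodified between samples, overhead stays low. The tradeoff is that results are statistical, which is why a sampler needs enough samples to be meaningful.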

Brian #6: From 1 to 10,000 test cases in under an hour: A beginner's guide to property-based testing

Carolyn Stransky, @carolynstran
Excellent intro to property based testing and hypothesis
Starts with a unit test that uses example based testing.
Before showing a similar test using hypothesis, she talks about the different mindset of testing for properties instead of exact examples.
- For a sort function, don't check for the exact sorted list you should get,
- but instead, check properties:
  - the length should be the same
  - the result should contain the same things, for instance, using set for that assertion
  - you could walk the list element-wise and make sure each item is <= the next one (result[i] <= result[i+1])
She walks through the hypothesis decorators for generating input and shows how to use strategies such as some.lists and some.integers, plus the max_examples setting.
Goes on to discuss coming up with properties to test for, which really is the hard part of property based testing.
Checking for expected exceptions
Using a naive (reference) implementation, a technique useful in property-based testing, to compare two versions of a function. This is super useful for refactoring: test the new vs. old version on tons of input data.
json5 lib
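Here is a minimal sketch of those ideas with Hypothesis, assuming it is installed (pip install hypothesis). The insertion_sort function is a hypothetical stand-in for the code under test, and Python's built-in sorted() plays the role of the trusted "naive" reference.

```python
from hypothesis import given, settings
import hypothesis.strategies as some  # alias as in the notes; `st` is the usual convention


def insertion_sort(items):
    """Hypothetical function under test: returns a new, sorted list."""
    result = []
    for item in items:
        i = len(result)
        # walk left past every element greater than `item`
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result


@given(some.lists(some.integers()))
@settings(max_examples=500)
def test_sort_properties(items):
    result = insertion_sort(items)
    assert len(result) == len(items)                         # nothing lost or added
    assert set(result) == set(items)                         # same contents (ignores duplicate counts)
    assert all(a <= b for a, b in zip(result, result[1:]))  # each item <= the next


@given(some.lists(some.integers()))
def test_matches_reference(items):
    # Compare against a trusted implementation; the same pattern works for
    # checking a refactored version against the old one.
    assert insertion_sort(items) == sorted(items)


if __name__ == "__main__":
    # Functions decorated with @given can be called directly; pytest also collects them.
    test_sort_properties()
    test_matches_reference()
    print("all properties held")
```

Note that none of the property assertions names a specific expected output; Hypothesis supplies hundreds of generated lists and shrinks any failing one to a minimal example.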

Extras

John Conway, inventor of the Game of Life, has died of COVID-19
GitHub is now free for all teams (and individuals)
- including 2,000 Actions minutes/month
- unlimited collaborators, even on private repos
GitLab has a similar free tier
PyCon US 2020 Online
- Lots of talks already up, more on the way.

Joke

PyJoke delivers:

How many QAs does it take to change a lightbulb? They noticed that the room was dark. They don't fix problems, they find them.

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 178, recorded April 15th, 2020. I am Brian Okken. I'm Michael

00:12 Kennedy. And this episode is brought to you by DigitalOcean. Who's first? I think I got my notes

00:17 wrong. Yeah. Well, I want to talk about something really quick before we actually get to the first

00:21 one. So we'll see. Okay. I just want to tell people about the YouTube channel. And obviously,

00:26 if people are watching on YouTube, they might know about the YouTube channel.

00:29 But most people subscribe to our podcast and we are multicasting and repurposing what we're doing

00:35 here on YouTube. We talked a little bit about it last time. So basically, each individual item

00:39 is now a separate YouTube video. And you can watch Brian and me talk about it if you want to consume

00:46 in that format and have a little bit of video and admire Brian's awesome shirts because he's got a

00:51 bunch he's going to be wearing throughout these different shows and it's going to be awesome.

00:54 Oh, you didn't have to set it up like that, man. I only have like one good shirt.

00:58 People loved the shirt for the first video we shared. That was like several comments about,

01:03 dude, your shirt is awesome.

01:04 Yeah. Go figure. Okay. So we're trying to teach you about Python also, but...

01:10 That's right. And fashion.

01:11 Shirts.

01:11 And fashion.

01:12 Yeah.

01:12 Yeah. It's pythonbytes.fm/YouTube. People could check that out.

01:15 Well, tell me about strings, Python. I mean, Michael.

01:19 I'll tell you about Python strings. You know what? Strings are confusing, man.

01:22 Especially when they're about numbers and dates, especially dates. So this seems to be like a

01:30 problem that vexes me permanently. And it's, you know, we talked about is programming Googling,

01:36 right? Like in our consensus was, you know, maybe in the early stages of your career, there's a lot of

01:42 Googling, but no, not really. You mostly just sit down and you think about the problems and you write

01:47 the code and you evolve the code. Like there's a lot of reading code before you actually do much

01:52 writing anyway.

01:53 But this topic in particular, I'm all about Googling this all the time. So Python has a

01:59 datetime.strptime for string parse time. You give it some piece of text like Wednesday, April 15th,

02:09 comma 10 colon 30 a.m. without a space. I want to take that and turn it into a datetime so I can

02:16 maybe compare it to something else, right? Like another time. How many days is that from now? Is

02:20 that in the future is in the past. I just need to store it in the database as not a string,

02:24 but a datetime because I want to order by it. I don't want it to be alphabetical, right? There's

02:27 all sorts of reasons you need to get a datetime from strings or go in the reverse. And yet the format,

02:34 you know, that's strptime has a, it has a format string that tells it how to look at the string and

02:42 then pull the pieces out. So would you know about that example I told you about like the Wednesday,

02:48 April 15th, 10 30 a.m.? That's definitely something I Google every single time.

02:51 Every time. And it's never quite right. So just for those of you listening, you really want to know

02:56 it's percent a space percent capital B space percent capital H comma space percent capital M colon percent

03:03 S a.m. Woo. Who would ever come up with that?

03:06 Well, I mean, these are intentionally short.

03:09 Yes, I know.

03:09 So that like they don't take up too much room, but they're and they sort of make sense. It's just

03:14 it's arbitrary, whether it's a capital Y or a lowercase y or capital D or lowercase D.

03:19 Right. And there's documentation you can go find. Like if you want the three letter date day of the

03:24 week, that's a percent a lowercase and whatnot. But putting that all together can be tricky.

03:28 So what I want to tell you about is this website by Lachlan Eagling. And it's, let me see what the URL is.

03:36 It's pystrftime.com, which, of course, is linked in the show notes.

03:42 And the idea is you put the text you want to parse like Wednesday, April 15, April 2020 at the time or

03:51 whatever. You put that in there and you hit go and it will tell you that complex string that I told you

03:56 was the right answer to my parsing problem.

03:58 Yeah, it's awesome.

04:00 Yeah, you just put in like the output that you want it to have happen and it tells you the magical

04:06 incantation.

04:07 Right, right. Or the format of the thing you want to parse. And depending whether you're trying to go

04:11 to that string or from that string. But yeah, super, super handy. This bad boy is bookmarked for

04:17 me for sure, because this is way better than Googling. I can put it in there. It gives me a

04:22 quick, quick answer. I can throw it into a Python REPL and see, hey, did it work or did it not work?

04:27 It's really easy.

04:28 Yeah, nice.

04:29 Yeah. So not super complicated, but very handy. So people can bookmark that and try it out.

04:34 Well, I want to tell you about something easier. Also, I got to kind of thank Jack.

04:40 Jack McKew has sent us a few suggestions and they're usually pretty darn nice. And here's this

04:45 one's from Jack. Pandas dash bokeh. Bokeh? Oh, I forget how to say that.

04:51 I love the logo. Pandas bokeh. I say bokeh. I don't know. You know, but it's like that F stop

04:57 difference where like the person in a portrait is like crisp, but the background is faded.

05:01 Yeah. And the logo is pandas clear bokeh, like in the background faded. It's beautiful.

05:06 It's a pretty cool logo. So I'm going to quote some from their website or the readme. It says,

05:12 pandas bokeh provides a bokeh plotting backend for pandas, geopandas, and PySpark dataframes,

05:19 similar to the already existing visualization features of pandas. Importing the library adds

05:24 a complementary plotting method plot underscore bokeh on dataframes and series. Okay. So that's,

05:32 I mean, it's already built in and all it provides is plot bokeh, another function on it. What's the

05:38 big deal? Well, it's so cool. It's so easy. And I was, I tried out some of these examples this

05:44 morning and it's just a little tiny bit of code and you call, like you've got a data frame and you

05:50 call plot bokeh on it and it pops open like an interactive graph that you can look at everything.

05:56 It's actually pretty incredible. You have to do something a little different. You can plot bokeh,

06:01 but if you want the normal plotting to do the same awesome stuff that it's built in, you can set an

06:07 option, one of the pandas options to switch out the plotting backend. So that's neat. So apparently

06:14 what it's really doing is switching out the backend. And to me, I mean, it's plotting is not terribly

06:20 difficult, but this interface, at least for me, it makes it a lot easier instead of having to work

06:26 with frames and plots to just call this thing. And then all the different options you can have,

06:31 you can, you know, different point, you know, want it to look like an asterisk instead of a point

06:36 or something, all other different color, different scale or different titles. All that stuff is options

06:42 you can pass into the plot function. And the other thing that I, that I like a few more things.

06:47 One of them is, when you pip install pandas-bokeh, it pulls everything in because

06:54 it's all the rest of the stuff is dependent on it. So you get all of it just for a simple install.

07:00 And it also generates a general, it's able to do this interactively, but you can also generate

07:05 notebooks. Yeah. Yeah. You can generate notebooks and you can also generate standalone HTML files

07:11 with this in it.

07:12 This is really cool. And yeah, the fact you can generate standalone HTML, there's probably ways

07:17 to plug it into Flask sites, you know, Python websites and whatnot, pretty straightforward.

07:22 And the interactive bit is super nice. I mean, this is not about pandas interaction per se. This is just

07:29 bokeh, right? Being very cool and interactive, but you can zoom, you can pan as you move around,

07:35 you know, like it'll show you the marks on the graph and you can hide, you know, and sort of hide and show

07:40 elements. And there's even a cool example where they're showing the stock price of Apple versus Google.

07:46 And as you put the cursor along, it has the Apple logo next to Apple and the information,

07:52 like a little like card that talks about it over time, man, this is nice stuff. And all you got to

07:57 do is point it at a data frame. Not bad.

08:00 Yeah. And they've got a whole bunch of examples on there that GitHub repo with a bunch of working

08:04 examples too. Obviously for the examples, the data is just sort of random data that they're

08:09 throwing in there. But you know, once you know how to get your data, this does the rest of the work

08:14 for you. So it's cool.

08:15 Very cool. Yeah, it's just a great one. And thank you, Jack, for recommending it. And yeah,

08:19 it's a good one, Brian, for pulling it out.

08:20 We've had DigitalOcean as a sponsor for a while, and we really want to thank them. They've really

08:24 helped us out a lot. And they're plus, they're pretty darn cool. So thank you, DigitalOcean for

08:29 sponsoring this episode and many others. And in the past, we've told you about a lot of awesome things

08:35 with DigitalOcean, like their one-click install Kubernetes cluster support, their amazing new support

08:42 center and help documentation that's been around for a while. And our podcast runs on DigitalOcean,

08:47 and we're thrilled with it. And so if your business or your side project deserves great

08:52 hosting and that will grow with you and let you scale affordably, I really definitely want you to swing

08:59 by pythonbytes.fm/DigitalOcean to grab the $100 credit for new users. But there's something else

09:06 I want to tell you about DigitalOcean that's really cool. They've got something they've started recently

09:11 that's called Hub for Good. And it's designed to support COVID-19 relief efforts where DigitalOcean

09:19 through this is supplying $100,000 in infrastructure credits for new not-for-profit projects.

09:26 They're also giving $50,000 to COVID-19 relief fund, their own relief fund, but still it's really cool.

09:34 And they're also trying to raise awareness for COVID-19 related projects and provide learning for

09:41 developers and also provide visibility for these projects. And so I headed over there this morning

09:47 and checked it out. And there's a bunch of cool projects starting out that are related to COVID-19.

09:53 It's not just this sort of stuff, but it's things like there's even a platform to help teachers

09:58 interact with students during quarantine. A lot of cool projects through this. So thank you, DigitalOcean.

10:04 Yeah, this is a great project. And obviously the infrastructure is great and we love it,

10:08 but this is very cool too. I didn't know about this.

10:10 Yeah, it's pretty neat.

10:11 Yeah. So speaking of not knowing, I feel like I've been kind of exploring the cave of Python,

10:18 which is large and vast. And I just come on like a whole nother area. I'm like, it opens up like,

10:24 what is this? How have I not known about this? And this is NBDev. Have you heard of NBDev?

10:30 No.

10:30 Yeah. Okay. So let me tell you about it and I'll get your impressions later. So NBDev takes

10:37 notebooks and basically makes them on par with writing proper Python packages and solves all

10:46 these different problems. It lets you generate what's got to be some of the best documentation

10:50 period for that library that is sort of backed by a notebook. So it lets you develop like full Python

10:59 packages and libraries and notebooks where you can have all your code, your unit tests and your

11:04 documentation all in one place, but then you can take it and upload it to PyPI and

11:09 make it a pip-installable library that people have no idea came from a notebook.

11:13 Wow.

11:13 Is that crazy or what?

11:15 That's awesome. I got to check that out.

11:16 Yeah.

11:17 Yeah. And you know, you think about this idea of notebooks and to me, notebooks like burst on the

11:21 scene in the 2010-ish era, maybe 2012, 2011, like that timeframe. But this project references

11:30 this concept envisioned by Donald Knuth way back in 1983. And it says notebooks finally made literate

11:37 programming, this concept by Donald Knuth, a thing. So, you know, the old is new again, but in a really

11:43 cool way. And to me, this seems like just such a massive upgrade to notebooks. So notebooks have a

11:49 bunch of challenges in my view. Like I can't use a proper editor with it. Like if I don't use PyCharm or

11:54 VS Code and all of its navigation and its cool Git blame and like history and just like all this

12:00 stuff is just not present, right? Documentation. I think that actually it really works well there,

12:06 right? But it's, it doesn't tie the documentation of the notebook to like parts of functionality that

12:12 might be created by the notebook, which is cool. One of the biggest problems with notebooks,

12:16 it's a benefit, but it's a big problem is if you run a notebook, it stores the output in the notebook.

12:22 So if you had like a bokeh plot or you had like a print of a data frame, that is in there and now it's

12:29 part of it. So if I'm working on a project and you're working on the same project and we both run the notebook

12:35 at different times or the same time, but separately, and it for some reason generates different results,

12:41 that's a merge conflict in Git, right? So basically you cannot use notebooks in like a sane way with Git

12:49 because anytime you work with it, if you're not careful and like don't remove all the output before

12:54 you save it, it's going to be a merge conflict. So this project has a Git pre-commit hook that will

13:01 remove that problem. So right before it gets committed, it'll automatically do the cleaning of that

13:06 metadata output. So it'll never have that as a conflict. It also has an ability to like a CLI go

13:12 just accept it. I just accept all the metadata changes. Mine are just right or whatever, right?

13:19 So it also has a CLI to automatically fix that. But if you do have those problems, but it also has this

13:24 pre-commit hook to avoid them entirely.

13:25 Nice. That's a nice use for pre-commit too.

13:27 Yeah. It's super, super clever. So if I write a function in the notebook, I can put hash export

13:32 in that cell and that becomes a public function in the package.

13:37 Oh, cool.

13:37 Right. So I write like documentation and pictures and I would say hash export. Now that's part of my

13:43 library that I'm building. It also lets you create the structure for Python packages. So you have like

13:48 the setup.py and you can build wheels and whatnot automatically out of that. And it uses this

13:55 exported stuff. You can have your unit test in your notebook, which is pretty cool for the things that are in

14:01 there. And then finally you can edit it. You can take the edited library or the library that exported,

14:06 sorry, and then edit it with PyCharm or VS Code and then reverse export it. So what you can do is like say

14:14 push the changes that I've done with my editor back into the segments of the notebook where that code came from.

14:19 Oh man. Okay. I'm a little confused, but I got to try it out.

14:23 Yeah. You got to kind of read through it to get the sense, but there's just a bunch of stuff going on. Like all these things seem like,

14:28 yes, you should have been able to do that with notebooks, but obviously, right. That's not their origins, right?

14:32 They can't do everything at once, but all of these things seem awesome to me.

14:36 Yeah. Yeah. So in order to get started, it's going to basically create a Git repo for you is my understanding,

14:41 either on GitHub or GitLab. So you got to follow the getting started instructions and then you click a button and it'll like generate the

14:48 repo in the right structure, or you can use the CLI tooling to generate like the right repo with things like the Git pre-commit hooks and whatnot.

14:56 And if you're going to read the docs, check out nbdev.fast.ai.

15:01 Because this comes from the fast.ai people, the same ones who built the fastai framework.

15:07 So some of the docs render better. There's certain things on GitHub that like it says, and here's a cool picture.

15:13 And it's just like source code. It's not quite right. So, so maybe check out the final link at the bottom in this section to get to,

15:21 if you're going to like browse through it, but it's basically a, you get the same thing out of GitHub.

15:25 Anyway, this to me seems like a massive improvement for notebooks and sort of brings them more into,

15:33 I can do things like, for example, you can now have your notebook and its tests running as part of continuous integration.

15:41 Like, so these notebooks are now like full participants in CI/CD; you can create packages and put them on PyPI.

15:49 Yeah. There's all sorts of neat stuff. The documentation, like if you have a cool graph as part of your notebook,

15:55 that can become the documentation on PyPI

15:58 or Read the Docs for those functions. I mean, it's crazy cool.

16:02 How, how this is like taking some of the awesome parts of notebooks, like the doc side and turn that into the help docs.

16:09 And then also letting you export the functionality still as a proper CS type thing.

16:13 Yeah. I definitely got to check this out.

16:15 How did I not even know this existed? Like, this is awesome.

16:18 Well, I don't know how long, I mean, it looks like, it looks like five months to me is my guess.

16:21 Okay. So we're not that behind the ball.

16:23 No, we're not that behind. Yeah.

16:25 But this looks neat.

16:26 Yeah. It's very neat.

16:27 Plus Fast AI is pretty cool. So I think this is probably pretty solid.

16:30 Yeah, I agree. It's definitely got some solid people behind it. So very cool. Very cool.

16:34 Anyway, nbdev, quite neat.

16:37 I want to talk about something a little not neat, a little lighthearted. So this is a sort of a serious topic, but this is a article from Sebastian entitled Stop Naming Your Python Modules Utils.

16:52 And I don't think we've, I don't know if we've covered it before, but it's good advice. And it's something that happens. Basically, a lot of projects, public or private, will at some point end up having a utils.py or a utils package or something.

17:06 And this article is just saying, resist the urge. Utils is arguably one of the worst names for modules because it's very blurry and imprecise. Such a name does not say what the purpose of the code inside is.

17:22 And on the contrary, utils module can well contain almost anything. By naming a module utils, software developer lays down perfect conditions for an incohesive or uncohesive whatever code blob.

17:37 And I have definitely seen this in action. I have been one of the culprits before of having a pulling out a little helper function that I had in one file.

17:49 And I wanted to use it in a different module. So I didn't know where to put it. So I stuck it in a utils.py, added a couple more. So there's just a few methods.

17:58 And I come back six months later and there's like a couple dozen just junk drawer functions from all over the place in there.

18:07 So if you start, people will add junk to it. So Sebastian lists a few excuses. It's just one function, but it grows.

18:15 There's no other place in the code to put it. Well, try harder. And I need a place for company commons.

18:21 I don't even really know what that means, but name it company or something. And also Django does it.

18:27 Well, I don't know if you're a, well, maybe they shouldn't have, but they have it now, so they're not going to change it.

18:32 So the advice is to try name, try grouping your utility functions and naming them based on the role of how you're going to use it, or possibly group them in themes.

18:44 And also, if you see a utils.py crop up in a code review, just request that the person rename it to something else, if possible.

18:54 Just set up a CI rule to break the build if you see that file name.

18:59 Yeah. So what are your thoughts on this, on the utils?

19:01 See, I agree with Sebastian. Absolutely.

19:03 I understand the challenge because naming things in software is hard, but naming things in software is super important.

19:12 Because when you think about even just function names or class names or whatever,

19:19 usually what will happen is they'll get like a crummy, vague name and then a comment describing what they are doing.

19:25 And you're like, well, why don't you just make the name a little bit longer that says what it does?

19:30 And utils is kind of like the generic catch-all of saying like, well, I couldn't come up with a name.

19:35 So here it is.

19:36 We're just going to drop it here.

19:39 And in my code, I have like tons of different areas of which I organize it, you know, sort of like sub modules, I guess, if it's a, or sub packages, if it's a package, but not, sometimes it's not technically a package.

19:52 And I try to come up with names that are meaningful, right?

19:55 Like I have something called number converter that will like try to parse an integer or return a default value instead of throwing an exception or it'll try to parse some other thing.

20:04 Or maybe it's called conversions.py or whatever, but it's not like utils, right?

20:08 Like there's, there's usually some kind of a better structure you can find that will help you do this.

20:15 But, you know, there's that joke that, you know, naming things in computer science, that's one of the hardest problems, right?

20:20 And I do agree with that, but yeah, it's, it's worth the effort when you get it figured out.

20:25 If you don't believe me, you can just try it sometime.

20:27 If you're working on group project, just put one function in utils and you will see it grow.

20:32 And you'll have to find it.

20:33 Is this like the broken window theory of software?

20:37 Yep.

20:38 And MISC doesn't count either.

20:40 If you'd name it MISC, it's just as bad.

20:42 That's right.

20:43 Yeah.

20:44 There's probably some synonyms here in the code world that don't count.

20:48 So yeah.

20:48 Awesome.

20:49 I want to tell you about this one next that helps with performance or understanding that performance more specifically of your code.

20:56 So I don't know how much profiling you do at work.

21:00 How much does performance matter to you guys?

21:02 It matters a lot.

21:03 Yeah.

21:04 Yeah.

21:05 I'm building things that go into testing in a production line.

21:09 So every millisecond that it takes, takes a millisecond longer to get something shipped.

21:15 So yes.

21:16 It matters.

21:17 Yeah.

21:17 It matters.

21:18 I'm supposed to mostly spend my time on the web and obviously it matters there, right?

21:22 Like every hundred milliseconds.

21:23 I think Amazon measured is like 1% loss of orders or something ridiculous like that, right?

21:29 Like, so understanding your performance is good.

21:32 We've had good, good in quotes, profilers for Python.

21:37 And they typically tell you about this function spent this much time.

21:42 But another challenge is my program is using too much memory or worse.

21:47 It's something long running like a web app or some background process.

21:50 And it's like growing.

21:52 It's like sort of leaking memory.

21:55 Why is that?

21:57 So I came across this project called Scalene, which is a high performance and high precision CPU and memory profiler for Python.

22:06 Cool.

22:07 Yeah.

22:07 So it lets you either analyze CPU time or it actually lets you on a line by line basis say, here's some memory.

22:15 What line made this and where is it coming from?

22:18 Yeah.

22:19 And so that's cool.

22:20 But one of the challenges for profiling is when you're profiling your code, you can make it, you don't get the same behavior.

22:28 It's sort of like the Heisenberg uncertainty principle, right?

22:31 It does one thing, but when you measure the profiler, you've changed it.

22:35 So now you kind of got to say, well, that part where it was the network, that was 50%, but now you made the computational bits way slower.

22:43 So that network part looks just like 20, right?

22:45 Like you're affecting it.

22:46 So for example, if you use profile, the built-in profile, it can make your code 30 times slower in a simple scenario than running it normally.

22:56 But you can use cProfile, which is the C-based one that's built in.

22:59 It only slows it down by 1.65 times.

23:03 So that's not too bad.

23:04 There's a line profiler that's 11 times slower.

23:07 And there's a whole bunch of other ones.

23:08 There's a memory profiler that's like over a thousand times slower.

23:12 So the Scalene project has a nice comparison to all these things.

23:17 It says, well, how does Scalene do?

23:19 And it claims that it's got this built-in library that's much faster.

23:23 So for CPU stuff, it's only 1.04 times slower.

23:26 So like 4% slower.

23:28 And it does that through sampling, right?

23:31 It doesn't do instrumentation.

23:32 It doesn't rewrite the stuff.

23:33 It actually just asks frequently like, hey, where are you in the code?

23:36 But it still gets per line analysis of that, which is pretty cool.
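To make the sampling idea concrete, here is a toy line-level sampler. A background thread periodically looks up the frame the main thread is executing via CPython's `sys._current_frames()` and tallies (file, line) pairs. This only illustrates sampling in general, assumes CPython, and is not how Scalene itself is implemented; its resolution is also limited by how often the GIL switches threads.

```python
import sys
import threading
import time
from collections import Counter


def sample_thread(target_ident: int, interval: float,
                  stop: threading.Event, counts: Counter) -> None:
    """Every `interval` seconds, record which file:line the target thread is running."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            counts[(frame.f_code.co_filename, frame.f_lineno)] += 1
        time.sleep(interval)


def workload() -> float:
    total = 0.0
    for i in range(3_000_000):  # hot loop: most samples should land on these lines
        total += i ** 0.5
    return total


counts: Counter = Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_thread,
    args=(threading.get_ident(), 0.001, stop, counts),
    daemon=True,
)
sampler.start()
workload()
stop.set()
sampler.join()

# Lines with the most samples are where the time went; no code was rewritten.
for (filename, lineno), n in counts.most_common(3):
    print(f"{filename}:{lineno}  {n} samples")
```

The workload runs unmodified; the only cost is the sampler thread waking up, which is why sampling profilers can keep overhead low while still reporting per-line results.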

23:40 And then the memory one is like another 10% slower because analyzing memory is hard.

23:45 But yeah, there's all sorts of cool stuff you can do with it.

23:49 The overhead is not too bad.

23:51 The precision is pretty good.

23:52 So like I said, it gives you like line by line level of how much time you're spending in various places.

23:59 It also is interesting in that it separates out the time spent running Python code from native code,

24:04 including like the base libraries and stuff.

24:06 So you can say like, I can only affect the Python stuff.

24:11 The other stuff is not a thing I can deal with.

24:14 So yeah, don't tell me about it or punish me for it.

24:18 Or maybe I do want to look at it, right?

24:19 Tell me about that.

24:20 So that's pretty cool.

24:21 And then also the memory stuff I think is pretty cool.

24:25 So it says it points to specific lines of code responsible for memory growth.

24:29 And it's important.

24:31 It does this through a special memory allocator thing that comes with it.

24:35 And so while you can pip install Scalene, you can't inspect the memory allocation that way.

24:41 You have to go and install it directly and do some more setup.

24:44 On macOS, you can do brew install.

24:46 There's instructions in there on how to do that.

24:48 On other OSes, I have no idea what you do.

24:50 But you can't do the memory profiling directly.

24:54 You can't just pip install it and then use the memory allocator.

24:57 There's some other lower-level subsystem that has to get installed for that to work.

25:00 Yeah.

25:00 And memory is an interesting one because it's a difficult one to chase down with Python.

25:06 Yeah.

25:06 It's very hard in Python because everything is a pointer.

25:09 Everything is an indirection.

25:11 It's not like, well, here's the block where we allocated this object or whatever, right?

25:16 Like it's pretty indirect.

25:18 And you don't typically have a hold of pointers in the memory address sense of it like you do in C or something, right?

25:25 So yeah, it's challenging.
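Scalene's memory mode needs that extra setup, but for a quick look at which lines are growing memory, the standard library's `tracemalloc` module can compare two snapshots grouped by source line. This is a different tool from Scalene, swapped in here only because it ships with Python; the ever-growing `_cache` and `handle_request` below are a hypothetical leak.

```python
import tracemalloc

# Hypothetical "leak": a module-level cache that only ever grows.
_cache: list = []


def handle_request(i: int) -> None:
    _cache.append(bytes(10_000))  # retained forever, so memory keeps growing


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1_000):
    handle_request(i)

after = tracemalloc.take_snapshot()
# Differences grouped by source line, sorted by largest growth first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
tracemalloc.stop()
```

The top entry should point at the `append` line, which is the "where is this memory coming from" answer, without any custom allocator.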

25:26 I would love to see this integrated into PyCharm and VS Code.

25:30 Oh, yeah.

25:31 Right now, it just gives you a cool tabular text output or file output.

25:37 But if you could just right-click in PyCharm and say, analyze with Scalene, that'd be sweet.

25:41 Yeah, I wonder.

25:42 And also, that would solve some of the install issues.

25:45 So if you have to install it separately, some integration with PyCharm or VS Code would be cool.

25:50 Right.

25:50 Like right now, you can do profiling.

25:52 And it's really awesome in PyCharm.

25:54 But I'm pretty sure it uses cProfile.

25:55 So yeah, who knows?

25:57 Someday, baby.

25:58 Hey, while we're talking about editors, I don't know about VS Code.

26:01 But backing up a little bit, I do know that PyCharm does open notebooks okay.

26:06 Awesome.

26:07 Yeah.

26:07 Just going back there.

26:08 Anyway.

26:09 Yeah, yeah, nice.

26:09 I want to tell you a little bit about testing.

26:12 Awesome.

26:12 I'm really surprised that you're covering this.

26:15 But okay, yeah, go ahead.

26:15 Yeah, it's interesting.

26:19 Lately, you've been covering the testing articles.

26:21 I know.

26:21 Isn't that my role now?

26:23 No, go ahead.

26:24 This is great.

26:24 Tell us about it.

26:25 Yeah.

26:25 This is a person named Carolyn who wrote an article called From 1 to 10,000 Test Cases

26:31 in Under an Hour: A Beginner's Guide to Proper...

26:35 That's productive.

26:35 And imagine if Carolyn was getting paid by the test, right?

26:39 Like, we're evaluating your bonus for the year.

26:42 Like, I wrote five times as many tests as anyone else, and I just started this month.

26:45 Heck yeah.

26:47 I would totally use...

26:48 If I was paid by the test case, I would definitely use Hypothesis on every project.

26:52 All right.

26:54 So how did she do this?

26:55 What is this property-based testing?

26:57 Okay.

26:57 So hopefully people have heard of property-based testing, but it is...

27:00 So the...

27:01 It's as opposed to, like, what do we call it?

27:05 Example-based testing.

27:07 So...

27:07 And this is kind of how she goes through this discussion.

27:10 It's...

27:11 The article is really just a really excellent introduction to property-based testing and

27:16 using Hypothesis.

27:17 And it's...

27:18 I mean, she's using Hypothesis in the example, but the intent is just property-based testing

27:23 because you can...

27:24 It's the same sort of strategy in every other property-based testing library.

27:30 She just happens to be using Hypothesis and Python.

27:32 So that's nice.

27:34 But the...

27:34 She starts off with a unit test example of just doing...

27:39 She has, like, a string sort or a...

27:41 Not a string sort, but a...

27:42 List sort.

27:42 A list sorting thing.

27:44 And if you were doing example-based testing, you just pick a few example tests.

27:49 Example test cases where you would take the input and you know what the sorted output should

27:55 look like, and you, you know, run it through the function and make sure the sorted output

27:59 is equal to the expected one.

28:02 How would you do this with property-based testing?

28:04 And before she goes in...

28:06 And she does give an example of how to write some sort of test like that in property-based

28:11 testing.

28:11 But she stops and pauses and talks about kind of the different mindset.

28:15 You can't test against an exact example because you don't know what example is coming in.

28:20 So you have to think about property.

28:22 So like on a list sort thing, you don't have the exact answer, but you could check to make

28:27 sure that the length is the same, and you can use sets on both the input and output

28:33 to make sure that the contents of both are identical.

28:36 And then you can go through the answer and make sure that, element-wise, every element i is

28:42 less than or equal to element i plus one.

28:44 You know, there's ways to test sort without, you know, without just knowing the answer.

28:49 But it takes a mind shift a little bit.

28:51 And I think actually that's one of the benefits of property-based testing: it gets you thinking

28:56 in those terms.
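Here is a minimal, library-free sketch of those sort properties, checked against randomly generated lists. `check_sort_properties` is a hypothetical helper. It uses `collections.Counter` rather than `set` for the contents check, because sets plus length alone would not catch a duplicate being swapped for another value (`[1, 1, 2]` and `[1, 2, 2]` have equal sets and equal lengths).

```python
import random
from collections import Counter
from typing import Callable, List


def check_sort_properties(sort_fn: Callable[[List[int]], List[int]],
                          data: List[int]) -> None:
    """Assert properties any correct sort must have, without knowing the exact answer."""
    result = sort_fn(list(data))  # pass a copy so `data` stays untouched
    # 1. Same length as the input.
    assert len(result) == len(data)
    # 2. Same contents, counting duplicates (a multiset comparison).
    assert Counter(result) == Counter(data)
    # 3. Ordered: element i is <= element i + 1 for every adjacent pair.
    assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))


# Hand-rolled input generation; a library like Hypothesis automates this.
rng = random.Random(0)
for _ in range(1_000):
    sample = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    check_sort_properties(sorted, sample)
```

A buggy "sort" such as `lambda xs: sorted(set(xs))` fails these checks as soon as an input contains a duplicate.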

28:57 I also think it's nice that she talks about how this isn't a replacement for example-based

29:03 testing.

29:03 It is a complement to it.

29:06 And so you can mix them together.

29:08 Then she goes on to introduce some of the aspects of Hypothesis.

29:12 Like there are some cool strategies, like lists and integers, and being able to set

29:19 max_examples, so you can set how many.

29:22 And that's where you can just set it to 10,000 and, wham, you have 10,000 test cases right away.

29:27 And you just let Hypothesis come up with the examples.
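With Hypothesis, the same properties become a decorated test, and the example count is a setting. This sketch assumes Hypothesis is installed (`pip install hypothesis`). `st.lists(st.integers())` is the strategy that generates the input lists, and `max_examples=10_000` raises the number of generated cases from the default of 100.

```python
from collections import Counter

from hypothesis import given, settings, strategies as st


@settings(max_examples=10_000)  # generate 10,000 cases instead of the default 100
@given(st.lists(st.integers()))  # strategy: lists of arbitrary integers
def test_sorted_has_sort_properties(xs):
    result = sorted(xs)
    assert len(result) == len(xs)                           # same length
    assert Counter(result) == Counter(xs)                   # same contents
    assert all(a <= b for a, b in zip(result, result[1:]))  # non-decreasing
```

Under pytest it is collected like any other test; calling `test_sorted_has_sort_properties()` directly also runs all the generated examples. If a property fails, Hypothesis shrinks the input to a small counterexample before reporting it.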

29:31 The real meat of the article, which I really appreciate, is the hard

29:36 part of property-based testing. Some of it's the syntax, and she does cover the syntax

29:42 and how to get this done.

29:42 But it's also how to think about the properties; coming up with what properties to

29:48 test for is the hard part.

29:49 And so taking a little time to talk about that, I think that's a great thing.

29:54 I'm also glad she threw in that one of the things you should check for with

29:59 property-based tests is making sure the exceptions that get raised are expected exceptions.

30:05 So if you throw garbage in, or cases that don't make sense, you should know what

30:11 kind of exceptions are going to come out, and you can catch that in your tests with

30:15 Hypothesis.
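A small sketch of the "only expected exceptions" property, using a hypothetical `parse_port` function and hand-rolled random strings instead of Hypothesis. For any input, the function must either return a valid port or raise `ValueError`; any other exception type escapes the `try` block and fails the check.

```python
import random
import string


def parse_port(text: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(text)  # raises ValueError on garbage like "abc" or ""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def check_only_expected_errors(text: str) -> None:
    try:
        port = parse_port(text)
    except ValueError:
        return  # the documented, expected failure mode
    # No exception: the result must satisfy the contract.
    assert 0 <= port <= 65535
    # Any other exception type propagates and fails the check.


rng = random.Random(1)
alphabet = string.digits + string.ascii_letters + " -+_"
for _ in range(5_000):
    text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 6)))
    check_only_expected_errors(text)
```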

30:16 And then also a great use for all of this is to implement whatever functionality you wanted

30:22 in a very simplistic, but possibly slow or memory-hoggy way or something.

30:27 And then you can compare the elegant version and the slow version within the tests to make

30:33 sure that they come up with the same answer.

30:35 This is also great.

30:36 If you're doing a refactoring, you can refactor part of your system and make sure that the

30:41 old and new way act the same.
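The "slow but obviously correct reference" idea can be sketched like this, using the maximum-subarray problem as a hypothetical example. A brute-force version that tries every slice acts as the oracle for Kadane's linear-time algorithm, and random inputs check that both agree.

```python
import random
from typing import List


def max_subarray_slow(xs: List[int]) -> int:
    """Brute-force reference: sum every non-empty contiguous slice."""
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))


def max_subarray_fast(xs: List[int]) -> int:
    """Kadane's algorithm, O(n): the 'elegant' version we want to trust."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)  # extend the running slice or restart at x
        best = max(best, current)
    return best


rng = random.Random(2)
for _ in range(2_000):
    xs = [rng.randint(-50, 50) for _ in range(rng.randint(1, 15))]
    assert max_subarray_fast(xs) == max_subarray_slow(xs), xs
```

The same pattern covers refactoring: keep the old implementation around as the oracle until the new one has matched it on many generated inputs.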

30:43 So it's just a good introduction to all of this.

30:46 Yeah.

30:46 And property-based testing, you're right,

30:49 it's such a mind shift and, I don't know, I haven't fully embraced it yet, but I feel

30:55 like there's probably some places where it would really be interesting and useful.

30:59 And I probably should just look into it.

31:01 You know, I don't know, I get stuck in my ways and then I just, I keep going that way.

31:04 At the end, she talks about if you're not using Python, what options you have as well, which

31:09 is kind of cool.

31:10 Right.

31:11 So it's like, hey, Hypothesis is cool in Python.

31:13 But if you're on TypeScript, we've got fast-check.

31:15 If you're on .NET,

31:16 they don't have dashes or A's or T's.

31:19 So there's FsCheck.

31:20 And in Java, there's this and C++ and Rust and so on.

31:24 So yeah, it looks like you could use the same thinking and ideas across different parts

31:31 of your stack, if you have different technologies in there.

31:33 This is another example of if it shows up in every language, it's probably something you

31:39 should be paying attention to.

31:40 So that's a really, that's a good rule of thumb.

31:43 It's like, yeah, if I see it all over the place, right, this is a general CS sort of thing

31:48 that's important.

31:49 Yeah.

31:50 Yeah.

31:50 You know what else I like about going through stuff like this is you come across things

31:54 that you didn't know about, right?

31:56 For example, you'd think that I would know about JSON.

31:58 It seems pretty simple, like the JavaScript object notation.

32:01 But apparently there's a JSON5 as well, which allows things like comments and whatnot

32:08 and multi-line strings and single quotes and unquoted keys

32:14 and so on.

32:15 And there's a whole cool library for JSON5 support if you want to have

32:19 a little bit more human-friendly JSON.

32:22 I had no idea that was a thing.

32:24 Yeah.

32:24 Neither did I.

32:25 And I was just like, why can't I put a comment in JSON?

32:27 This is driving me crazy.

32:28 So what I do is I have like a field that says comment or like double slash in quotes.

32:33 And then I have the string that is the comment because you can't actually have comments, but

32:37 you can have ignored keys and values.

32:40 So that's how I have comments in my JSON.
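The pseudo-comment trick described above works with the standard `json` module: the note lives under a key your code ignores, and you filter those keys out after loading. A minimal sketch, assuming a flat config and the `"//"` key convention; note that repeating the same key in one object keeps only the last value, so several comments at one level need distinct keys.

```python
import json

config_text = """
{
    "//": "Timeouts are in seconds; this key stands in for a comment.",
    "timeout": 30,
    "retries": 3
}
"""

raw = json.loads(config_text)
# Drop the pseudo-comment keys before using the data.
config = {k: v for k, v in raw.items() if not k.startswith("//")}
print(config)  # {'timeout': 30, 'retries': 3}
```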

32:41 But anyway, she talks about using the JSON5 library that's part of Python to support that.

32:46 Or not.

32:46 It's not built in, but it's a Python library.

32:48 You can use it to do that.

32:49 Pretty cool.

32:49 Yeah.

32:50 Nice.

32:50 Cool.

32:50 Yeah.

32:51 Well, I guess that's it for all of our items, huh, Brian?

32:52 Yeah, it is.

32:53 Got anything extra for us?

32:55 Yeah, I totally did.

32:56 But you nabbed it and put it in your section.

32:58 So go for it.

32:59 Tell us about it; you found a bunch of cool things there.

33:02 Yeah.

33:02 I want to get this one out of the way first.

33:05 Some sad news.

33:06 Have you, you've heard of Game of Life, right?

33:08 Yes.

33:09 Yeah.

33:09 Conway's Game of Life.

33:10 Yeah.

33:11 Conway's Game of Life.

33:12 Well, John Conway, and I'm going to link to a nice article about the Game of Life and John Conway,

33:20 is one of the victims of COVID-19. He died from it recently.

33:27 So that's sad.

33:28 Yeah, it's definitely sad news.

33:30 Game of Life is kind of an excellent thing to have in the computer science realm.

33:34 Pretty neat.

33:35 So that's sad.

33:36 Something that's happy is GitHub is now free for all teams and individuals.

33:42 So that's a pretty cool announcement.

33:43 That's really awesome.

33:44 Yeah.

33:45 So previously you had to pay to have collaborators on a private repo.

33:50 I think maybe you could have some, but not a ton for private.

33:53 I can't remember.

33:54 Three, I think, something like that.

33:55 Yeah.

33:55 It's like evolving.

33:56 First you had to pay for private repos, then you didn't, but then you had to for collaborators.

34:00 And yeah, but that's awesome.

34:02 So it's much more free.

34:03 And then also for people who still pay GitHub, like me, it's half price.

34:08 It's 40.

34:09 It's, I don't know, whatever four divided by nine is.

34:12 It's now 44% of what you were paying before.

34:14 And people wonder like, why would you pay for GitHub organizations?

34:18 If you have an organization, like Talk Python and the related training authors and content,

34:24 there's a GitHub organization for Talk Python

34:27 so people can collaborate on that.

34:30 You still have to pay, but it was $9 a month per user.

34:32 Now it's $4 a month per user.

34:34 So that's also bonus.

34:35 Yeah.

34:36 Pretty cool.

34:37 Yeah.

34:37 That's happy.

34:37 Yeah.

34:38 So last thing I wanted to bring up is that the PyCon US 2020 online is now live.

34:45 So there's a welcome video and more.

34:49 There's some talks linked and there's more on the way.

34:51 Yeah.

34:51 There's a nice welcome video from Emily Morehouse in which she basically kicks off the virtual conference.

34:57 And this conference, I don't know if that's the right word for it.

35:00 This thing, this event is not like a lot of online virtual conferences.

35:06 Like on Saturday, we're all going to meet.

35:08 And then the talks are going to be these three hours and whatnot.

35:10 It's like, it's like a rolling release of information and videos that then you get to consume over the next couple of weeks.

35:17 So yeah, you're linking to basically the landing page for stuff as it happens.

35:22 Right.

35:22 Yeah.

35:23 And I also recommend this: if you go to any of the videos, like the welcome video,

35:29 and then go up and find the PyCon US 2020 top page and look at the videos there, you can see them all listed as well.

35:39 But they're rolling out.

35:41 And I know that they're not all recorded yet.

35:44 So some will come later.

35:45 For instance, I don't know if I will, but I'm still planning on recording my talk and posting it; I'm just trying to figure out when to do that.

35:54 So.

35:54 Yeah.

35:55 Yeah.

35:55 Cool.

35:55 Anyway, I'm definitely looking forward to checking it out and seeing what comes along.

35:58 It's also worth mentioning that at that link

36:02 there's a place that has the virtual expo. The expo hall is actually my favorite part of the conference, because you get to walk around and meet people and just, you know, see what's going on, and you see all the companies and what they're doing.

36:14 But one of the things that happens there on Sunday in normal times is the hiring job fair, and all the job fair stuff is already up there.

36:25 So if people are looking for a Python job, there are many, many links: this company's hiring for these four positions.

36:32 Click here.

36:33 This company's hiring for this position.

36:34 So if you're looking for a job, you want to get in there quick and grab the good ones and apply to them.

36:40 Yeah.

36:40 One of the things that's missing is, how am I going to last an entire year with no new t-shirts?

36:46 I know.

36:47 Well, you're going to have to step up your game there in this video version here.

36:52 I know.

36:53 I love all the conference swag.

36:55 Yeah, exactly.

36:56 Like, how do you even do that?