Transcript #54: PyAnnotate your way to the future
Return to episode page view on github00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.
00:06 This is episode 54, recorded November 28th, 2017.
00:11 I'm Michael Kennedy.
00:12 And I'm Brian Okken.
00:12 And Brian, I feel like we got some pretty good stuff lined up for this week. What do you think?
00:16 Yeah, we do.
00:17 Totally.
00:18 Before we get to that though, let's just say thank you to DigitalOcean.
00:21 They want you to know about Spaces at do.co/python.
00:24 Spaces is awesome. It's like AWS S3 but 10 times better.
00:29 maybe even more so. I'll tell you more about that later.
00:31 But Brian, you have some fantastic news for the stability of Python open source infrastructure, right?
00:39 Yes, this just came out yesterday, an announcement that the Python Software Foundation has awarded a $170,000 grant that the money came from the Mozilla open source program and it's to improve the sustainability of PyPI.
00:56 that is our packaging index that everybody uses.
00:59 Yeah, and we've talked about the challenges that PyPI had previously.
01:02 I've actually done an entire like panel episode on Talk Python.
01:06 It's a ways back. It's in like the 60s, 70s range in the episode number.
01:10 But this has been a really big problem and it's really been like on the shoulders of Donald Stuft to just keep pip and PyPI running, right?
01:20 There are other people involved with trying to keep it up and running.
01:24 But really that's all that they have time for right now. There was effort for the new warehouse code base, but there's, Donald has switched jobs recently and cannot spend as much time as he was before working on it.
01:39 So there's a big gap there and we need some work. So there's a lot of people that have asked, this warehouse thing, I thought it was going to become the new PyPI, what's up?
01:49 Still not the default. I know, you know, the site basically works, it uses the same database, so it doesn't get out of sync. And you know, if you go to pypi.org or pypi.io, you end up there. And it's a much better experience than the funky double pypi URL that's at python.org.
02:07 >> There are some administrative capabilities that, for instance, if you're pushing up a new package, you will notice.
02:14 You still have to go use the old API to create an account.
02:20 There are some backwards compatible administrative capabilities that are needed in order to get this going and farther.
02:27 Also, it's used by so many people that we have to migrate slowly, a little bit slowly and carefully, and hopefully this grant will be enough to at least get us started and get that done.
02:41 So I'm excited about it.
02:42 Yeah, that'd be super awesome.
02:43 Maybe they can take a page out of how the Instagram folks migrated from Python 2 and the older version of Django to Python 3 and the newer version of Django, where at first it just rolled out to the internal people and then a small group and so on.
02:58 It's either that, them or their Facebook, same company, but I can't remember exactly the product, but I think it was Instagram.
03:03 I think it'll be pretty good a plan put together.
03:05 They've got in the article that we link up and they do talk about one of the first steps is redirecting some of the production traffic to the warehouse and then gradually going over migrating that over.
03:19 And then again, the main thing is to try to get all the administrative capabilities up to snuff.
03:24 Yeah.
03:24 Nice.
03:24 I don't know what the timeline is like, but I'm looking forward to seeing some of those changes.
03:28 You know, I'm looking forward to that red pre-production website banner thing being gone.
03:33 Yeah, yeah, definitely.
03:34 Because the site, at least from a consumer perspective, is really, really great.
03:37 I think they could actually take that down now and just say, "Admin people, if you want to maintain your package, go over here." It's still kind of a messy thing to have to try to teach people how to put up new packages.
03:47 It's still a convoluted instruction set.
03:50 Yep, for sure.
03:51 So how often do you use type annotations?
03:54 Python's a dynamic language.
03:57 You might say, "Here's a function called register, and it has a thing called user." Maybe that's the user's email, maybe that's a user object, maybe it's something else.
04:05 Like, you could annotate that, but do you do that?
04:07 I try to do it for at least the API for a package.
04:11 That's what I've been using it for.
04:12 Yeah, that's a really great point. I do that as well.
04:14 I don't, like, go over the top and, like, annotate everything in my code.
04:19 But I find as you cross, like, major architectural boundaries, which hopefully you've put into your application, you know, you've got, like, a data access layer, and you've got some other layer that's using it.
04:28 Like, if you annotate just that data access layer, Like that really flows a lot of good checking through.
04:35 So one of the tools that has been around for a while and is actually, as I understand it, one of the main projects that Guido van Rossum has been working on is mypy, which is an experimental optional type checker for Python.
04:51 Yeah, it's cool, right?
04:52 So basically what it does is it's like Flake 8 or something, you run it against your code.
04:56 And if you've used these type annotations, which are just editor notes, basically.
05:01 They have no runtime behavior for most frameworks.
05:05 I've seen some people try to make use of it, and it's been pretty cool what I've seen.
05:08 But generally, it's just like a, here's a note for the editors to give you some hints.
05:13 mypy will check that through, as it follows the flow of your code, right?
05:19 - Yeah. - So that's pretty good.
05:20 It even works on Python 2, which doesn't support type annotations, but there's like a docstring style of doing it.
05:26 So the big announcement is that Dropbox has just released something called PyAnnotate.
05:33 So PyAnnotate builds on mypy, and instead of just going, okay, great, so you wrote this code, and then you went and you added type annotations, I can tell you if it's correct, PyAnnotate will say, you wrote a bunch of code, or you inherited a bunch of code, I will annotate it for you.
05:49 That is awesome.
05:50 - It's pretty cool.
05:51 - Yeah, so basically, if you've got some amount of code you want to annotate, what you do is you can go and like import some profiler hooks, and you can do it just on a function by function or call graph by call graph section and say, start annotating, collecting annotation information here, stop there.
06:10 And it generates a JSON file with all the info, and then if you want, you can run a separate command line argument, a utility, pass it that JSON file plus your source files, and it will then go put the type annotations in it.
06:23 So I think this is huge, and I really like it.
06:26 I think it has a potential of being huge.
06:28 There's a few things I'm on the fence about.
06:30 Like what?
06:31 Like it only does the Python 2 style comment annotations so far.
06:37 Yeah, that's not so amazing.
06:38 Well, hold on, let me look.
06:40 Let me pull this up.
06:41 So one of the things I think this is actually coming from is the fact that Dropbox is trying to move away from Python 2.
06:52 I'm pretty sure that's why this whole thing exists.
06:54 You're right, it does do the Python 2 style, which is kind of annoying, but I guess it wouldn't be that much work to like migrate it up.
07:01 Maybe some enterprising person will add that feature, the Python 3 style, which I think is much, much nicer.
07:07 - Version of PyAnnotate.
07:08 - Yeah, a PyAnnotate 3.
07:09 - Yeah, one of the comments is, it's pull requests accepted.
07:13 - Beautiful.
07:15 Yeah, that's really cool.
07:16 So I think the plan is those guys have one of the largest code bases in Python, period.
07:23 And it's all in Python 2.
07:25 Well, I should say all, I don't know all.
07:26 I think much of it is in Python 2.
07:28 And so here's a great way to like prepare this for some kind of automated migration or much stronger migration story.
07:37 - Yeah, it's definitely a step in the right direction.
07:40 I think it's really cool.
07:40 - Yeah, very cool.
07:41 Maybe somebody will take this and do something fun with it.
07:43 - One of the other parts of it is the little boilerplate that you've got to do to try to import your code and run it to generate that stuff, there's somebody already, the Kencho Engineering has released a project called pytest Annotate that makes this a little bit cleaner.
08:02 So with pytest Annotate, you can run, just run your tests against your code without doing any hooks into your code for the PyAnnotate.
08:12 And it will generate all, it does all of the start and stop the resume and stop whatever it is. Yeah, the resume. And it generates that stuff for you with that. Again, there's these are all in the early phases. And there's a few caveats with it. But I played with it a little bit. And it's a lot. It's pretty easy. There's just a couple lines of code to generate some to get annotations out of your code. It's pretty cool. Yeah, I think that's really great. And so basically, you can run individual tests or all the sets of tests and everything under test will then have type annotation information available for it.
08:49 Then one more line, command line thing, and you'll put it back in code.
08:53 Yeah, I tried it out.
08:54 One of the things I do like about the PyAnnotate is there's, by default, it doesn't modify your code, but it tells you what you should change.
09:02 And then if you want to have it actually write the code, you add a -w flag and it'll write it.
09:07 So that's a good behavior. I like it.
09:09 Yeah, it gives you the option to see what's going to happen before you actually commit.
09:14 I mean, we have source control, I hope.
09:16 People are using source control.
09:18 - Yeah.
09:18 - But still.
09:20 Awesome, so before we get to the next item, I want to tell you guys about DigitalOcean Spaces.
09:24 So DigitalOcean Spaces is online object storage, file storage for your applications, and all the other things you might use something like Amazon S3 for.
09:34 But it's much, much more affordable instead of being, say, $93 for the first terabyte of traffic, it's five.
09:42 And you get free inbound traffic, all sorts of really good stuff.
09:46 And after that, it's still 10 times, nine times cheaper than AWS.
09:50 So really great, same APIs, you can just switch over there, super easy, more or less just point your client at a different URL, and you're still doing the same type of thing.
10:00 So check it out at do.co/python.
10:03 Speaking of server code that wants to store stuff in places and link other people to it, Have you ever created a systemd service for Linux?
10:12 I have not.
10:12 I haven't either. It always seemed like kind of a complicated thing that you'd have to set up.
10:17 So systemd is the more modern sort of daemon service for at least Ubuntu.
10:23 I think on other ones as well that I've only played with Ubuntu.
10:26 So that's a really early one that I encountered on.
10:29 And there's this guy who created it just showing how to use a Python script as a system daemon in the systemd service, then you can control it with like service control and all those sorts of things just like you would say nginx or microwizky or some other major built-in server component.
10:47 And it is super, super easy.
10:49 You basically create a Python file and you create this little .service file.
10:55 Those are both in the gist.
10:56 Copy into a certain location, run a few command line arguments to enable them and start them.
11:01 And off it goes. You just have a little while true go do your stuff work in your Python script and it'll just run indefinitely and even auto start when Linux boots.
11:11 Oh, that's cool.
11:11 And it's super easy, right? Are you looking at the code?
11:13 Yeah, I mean it's just a handful of lines of code. That's it.
11:16 Yeah, and it's basically a configuration.
11:18 It's probably like eight lines of configuration, half of which is like headers.
11:22 So it's really, really super easy.
11:24 So if you need to have stuff running in the background and just run with your system on Linux, check this out if you want to write that in Python.
11:31 Nice. Cool.
11:32 Yeah, for sure.
11:34 You were talking about pytest before.
11:36 pytest is shiny and new again, right?
11:38 Yes, there's a new version came out, pytest 3.3.
11:42 And there's quite a few changes, one of which is they're not supporting a couple versions of Python anymore. I think 2.6 and 3.3 are out now.
11:56 So you have to do either 2.7 and above or I guess 2.7 or 3.4 and above. Yeah, that's right. The Python 3.3 just went out of support in its own right.
12:08 So those are probably tied. I'm not sure about 2.6. There's a bunch of new features which are exciting, but the most exciting thing is just a visual thing for me is that there's pytest now displays a progress percentage while running tests. So you get along the right hand side of your terminal window, you'll get like percentage of tests done. And I imagine And it's based on just the number of, it does collections first, it's probably just the number of tests.
12:34 Yeah, it probably doesn't go, okay, this one last time took 10 seconds, and this one took one.
12:41 So you have, you know, you've whatever, right?
12:42 I don't know that for sure.
12:43 But I'm guessing that.
12:44 So yeah, yeah, it'd be awesome if it if it had kind of both, but I can totally see why that wouldn't make any sense.
12:49 Yeah.
12:50 And then one of the other things that pytest has always been great about is capturing standard out and standard error and display those.
12:57 If there's for test failures by default, you can display them all the time if you feel like it.
13:03 And also, you can write tests around cap around the captured output and test against that.
13:09 And they've added built in support for capturing the output from the standard logging module, which is quite helpful for people using the logging module.
13:20 Yeah, how nice.
13:21 That's pretty cool.
13:22 I gotta go out and test my entire book to make sure that it still compiles, it still runs against my test 3.3.
13:29 - Ah, the joys of being an author.
13:30 You're never done.
13:31 - Yeah, I'm pretty sure everything looks pretty compatible, so it shouldn't be an issue.
13:35 - That's cool.
13:36 Think of it this way, like someday it'll break bad enough, you'll have to write a version to a second edition.
13:42 - Yeah, that's the plan.
13:44 - Think of those, yeah, for sure.
13:46 Cool, all right, so I wanna wrap this episode up with something pretty straightforward, but also it kind of gives you a really unique technique.
13:55 So it turns out that if you're going to create a dictionary, as we all know, there's multiple ways to do this in Python.
14:01 Same for list, same for strings, same for tuples and so on.
14:05 Like I could say D equals open curly, close curly.
14:08 That's the sort of language way, or there's the more type driven way where I say D equals dict, open close parentheses, right?
14:16 So you either use the curly braces or use the dict.
14:19 Similarly, list or square brackets or set, I guess set you can do it, but tuples, things like that.
14:24 So there's the type way, and then there's the built-in way.
14:27 It turns out that the built-in way is faster.
14:32 Okay, that's kind of an interesting piece of trivia, but what's really interesting is this guy wrote an article called why D equals curly braces is faster than D equals dict.
14:45 And he goes through the analysis, and he uses the dis module.
14:48 and he goes through and he actually disassembles the line that uses curly braces and the line that uses dict and analyzes why the one is like 20% slower or whatever the numbers turn out to be.
15:00 - It's fun and nerdy.
15:02 It looks like just one extra byte code or something like that.
15:05 - Yeah, the main thing that makes it slow is when you use the type way, you're effectively calling a function.
15:12 And when you're calling a function, it needs to load the global variables and check to see if that function is overridden in the local scope rather than in the major scope.
15:24 So it can't be convinced that str or dict or whatever is what the built-in one means.
15:30 So it has to kind of load up the state and check it out and then carry on.
15:34 And it turns out that that makes that slower.
15:36 And so this is all interesting, right?
15:38 But it's kind of just like a little trivia trick.
15:42 But the reason I brought up this article is if you look farther down at the end, he analyzes something that has nothing to do with this whole dict versus curly thing.
15:53 So let's suppose we're going to do some mathematical calculations with like math.floor and logarithms and so on.
16:01 There's a way to structure it.
16:03 We're using the functions directly out of, say, out of the globals that you've imported.
16:09 So you say import math and then math.floor, math.log10, and so on.
16:13 And then there's another way to like pass those into the function, the passing it in means you get to skip that load global for really hot loops or really short functions that are called super frequently.
16:23 And that's like 22% faster by just passing them in from the outside than calling them directly. So if you're really trying to optimize something, this is a super simple non obvious trick to get like a significant speed up.
16:38 That actually to get around loading globals.
16:40 Isn't that weird?
16:41 Yeah, I didn't know you could get around that. So that's cool.
16:44 I didn't either.
16:45 Apparently you can.
16:46 And I just think it's an interesting way of going like, all right, here's this incongruity.
16:50 Like why would these have any different speed?
16:52 They're effectively doing the same thing in the end.
16:55 And then using the dissimilar to analyze it and then seeing, okay, well, here's the problem.
16:58 How do we get around that?
16:59 Let's make this other unrelated thing faster.
17:01 I think that's just fascinating.
17:02 Yeah, that's pretty cool.
17:03 For sure.
17:04 All right.
17:05 Well, that's pretty much it for our news this week, everyone.
17:08 Hopefully enjoy it.
17:09 I thought all of them were very, very cool.
17:12 I do have one follow up item for you, Brian.
17:14 - Okay, great.
17:15 - Remember I told you guys a couple of weeks ago about All Work, All Play, that weird eSports championship thing that apparently has taken over?
17:21 - Yeah.
17:22 - So there's this article that came out in Ars Technica that caught my attention that's really closely related to that.
17:28 And I love Ars Technica.
17:29 It says, "F1 eSports is now more exciting "than the real F1." As in Formula One, like the many, many million dollar racing teams.
17:39 And it says, "Look, watching the eSports version "is actually more interesting." and they go through and they talk about why that is.
17:45 It was just like the first world championship of F1 and they have the 20 minute race video with real announcers and the super excited Italian guy as one of the announcers.
17:55 And if you look through the comments, I think they might be right.
17:58 I think eSports F1 might actually be more interesting than real F1 racing.
18:02 And I love racing things, like real racing, not game racing.
18:05 - That's kind of cool though.
18:07 I'll have to go check this out.
18:08 - Yeah, yeah, so if this sounds interesting to you guys, check it out.
18:10 Watch that video for like five minutes wait for the Italian announcer. He's awesome. All right. Great. Well, hopefully you guys can enjoy that and find some cool stuff in the news. Brian, thank you for sharing this with everyone.
18:23 Yeah, thank you. You bet. Bye.
18:24 Bye.