Transcript for Episode #54:
PyAnnotate your way to the future

Recorded on Tuesday, Nov 28, 2017.

00:00 KENNEDY: Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode #54, recorded November 28th, 2017. I’m Michael Kennedy.

00:00 OKKEN: And I’m Brian Okken.

00:00 Brian, I feel like we’ve got some pretty good stuff lined up for this week. What do you think?

00:00 Yeah, we do.

00:00 Totally. Before we get to that though, let’s just say, ‘Thank you’ to DigitalOcean. They want you to know about Spaces at do.co/python. Spaces is awesome. It’s like AWS S3, but ten times better, maybe even more so. We’ll tell you more about that later.

00:00 You have some fantastic news about the stability of Python open source infrastructure, right?

00:00 Yes, this just came out yesterday, an announcement that the Python Software Foundation has awarded a $170,000 grant. The money came from the Mozilla Open Source program and it’s to improve the sustainability of PyPI. That is our packaging index that everybody uses.

00:00 Yeah, and we’ve talked about the challenges that PyPI had previously. I’ve actually done an entire panel episode on Talk Python. It’s a ways back; it’s in the 60s/70s range in the episode numbers. This has been a really big problem and it’s really been on the shoulders of Donald Stufft to just keep pip and PyPI running, right?

00:00 There are other people involved with trying to keep it up and running, but really that’s all they have time for right now. There was effort for the new warehouse code base, but Donald has switched jobs recently and cannot spend as much time as he was before working on it, so there’s a big gap there and we need some work. There’s a lot of people that have asked, ‘This warehouse thing, I thought it was going to become the new PyPI? What’s up?’ It’s still not the default.

00:00 I know. The site basically works. It uses the same database so it doesn't get out of sync. And if you go to PyPI.org or PyPI.io, you end up there and it’s a much better experience than the funky double PyPI URL at Python.org.

00:00 There’s some administrative capabilities that, for instance, if you’re pushing up a new package you will notice. You still have to go use the old API to create an account and there’s some backwards compatible administrative capabilities that are needed in order to get this going farther. And also, it’s used by so many people that you have to migrate slowly and carefully. Hopefully this grant will be enough to at least get us started and get that done. So, I’m excited about it.

00:00 Yeah, that’d be super awesome. Maybe they can take a page out of how the Instagram folks migrated from Python 2 and the older version of Django to Python 3 and the newer version of Django, where at first it just rolled out to the internal people and then a small group and so on. It’s either them or Facebook. Same company but I can't remember exactly what product but I think it was Instagram.

00:00 In the article we link to, they do talk about first steps as redirecting some of the production traffic to Warehouse and then gradually migrating that over. Then, again, the main thing is to try to get all the administrative capabilities up to snuff. I don’t know what the timeline is like, but I’m looking forward to seeing some of those changes.

00:00 You know, I’m looking forward to that red pre-production website banner being gone.

00:00 Yeah, definitely.

00:00 The site, at least from a consumer perspective is really great. I think they can actually take that down now and say, ‘Admin people, if you want to maintain your package, go over here.’

00:00 It’s still kind of a messy thing, to teach people how to put up new packages. It’s still a convoluted instruction set.

00:00 For sure. So, how often do you use type annotations? Python is a dynamic language. You might say, ‘Here’s a function called register. It has a thing called user. Maybe that’s the user’s email, maybe it’s the user object, maybe it’s something else.’ You could annotate that. Do you do that?

00:00 I try to do it for at least the API for a package. That’s what I’ve been using it for.

00:00 Yeah, that’s a really great point. I do that as well. I don’t go over-the-top and annotate everything in my code, but I find it helps where you cross the major architectural boundaries that hopefully you’ve put into your application. You’ve got a data access layer, and you’ve got some other layer that’s using it. If you annotate just that data access layer, that really flows a lot of good checking through.

00:00 One of the tools that has been around for a while, and as I understand it one of the main projects that Guido van Rossum has been working on, is Mypy, which is an experimental, optional type checker for Python.

00:00 Yeah.

00:00 Yeah, it’s cool, right? So, basically what it does is it’s like Flake8 or something. You run it against your code and if you’ve used these type annotations, which are just editor notes basically, they have no runtime behavior for most frameworks. I've seen some people try to make use of it and it’s been pretty cool what I’ve seen. Generally, it’s just a, ‘Here’s a note for the editors to give you some hints.’ Mypy will check that through as it follows the flow of your code, right?

00:00 Yeah.

00:00 So, that’s pretty good. It even works on Python 2, which doesn’t support type annotations, but there’s a docstring-style of doing it.
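
For readers following along, the two annotation styles mentioned here can be sketched as below. This is only an illustration: the `register` function and its parameters are hypothetical, borrowed from the example earlier in the conversation, and the bodies are placeholders.

```python
from typing import List


# Python 3 style: annotations are part of the function signature.
def register(user_email: str, tags: List[str]) -> bool:
    # Placeholder body; a real implementation would create the account.
    return bool(user_email)


# Python 2-compatible style: the same information in a type comment,
# which checkers such as Mypy can read but the interpreter ignores.
def register_py2(user_email, tags):
    # type: (str, List[str]) -> bool
    return bool(user_email)


# Annotations are stored as metadata and are not enforced at runtime.
print(register.__annotations__)
print(register_py2.__annotations__)  # empty: comments leave no runtime trace
```

Only the first function carries anything at runtime; the comment form exists purely for external tools.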

00:00 The big announcement is that Dropbox has just released a thing called PyAnnotate. So, PyAnnotate builds on Mypy. Instead of just going, ‘Okay, great. So, you wrote this code and then you went and added type annotations. I can tell you if it’s correct,’ PyAnnotate will say, ‘You wrote a bunch of code or you inherited a bunch of code, I will annotate it for you.’ That is awesome.

00:00 It’s pretty cool.

00:00 Yeah, so basically if you’ve got some amount of code that you want to annotate, what you do is you go and import some profiler hooks. And you can do it just on a function-by-function or call-graph-by-call-graph basis and say, ‘Start collecting annotation information here, stop there.’ It generates a JSON file with all the info and then, if you want, you can run a separate command line utility, pass it the JSON file plus your source files, and it will go put the type annotations in.
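
To make the "profiler hooks" idea concrete, here is a toy sketch that uses only the standard library. It is not PyAnnotate's implementation or API; the names `observed`, `_profile_hook`, and `gcd` are made up for illustration. It simply records the argument types seen on each call between a start point and a stop point, then prints them as JSON.

```python
import json
import sys
from collections import defaultdict

# Function name -> set of observed argument-type signatures.
observed = defaultdict(set)


def _profile_hook(frame, event, arg):
    # The interpreter invokes this hook for profiling events; we only
    # care about calls into Python-level functions.
    if event != "call":
        return
    code = frame.f_code
    arg_names = code.co_varnames[:code.co_argcount]
    signature = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
    observed[code.co_name].add(signature)


def gcd(a, b):
    while b:
        a, b = b, a % b
    return a


sys.setprofile(_profile_hook)  # start collecting here
gcd(12, 18)
sys.setprofile(None)           # stop collecting here

# Convert the sets of tuples into JSON-friendly lists.
report = {name: sorted(list(sig) for sig in sigs) for name, sigs in observed.items()}
print(json.dumps(report, indent=2))
```

A real tool would also record return types and then rewrite the source files from the collected data; this sketch stops at the collection step.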

00:00 I think this is huge and I really like it.

00:00 I think it has the potential of being huge. There’s a few things I’m on the fence about.

00:00 Like what?

00:00 Like, it only does the Python 2-style comment annotations, so far.

00:00 Yeah, that’s not so amazing. Well, hold on, let me look. Let me pull this up. So, one of the things I think this is actually coming from is the fact that Dropbox is trying to move away from Python 2. I’m pretty sure that’s why this whole thing exists. You’re right, it does do the Python 2-style, which is kind of annoying. It wouldn’t be that much work to migrate it up. Maybe some enterprising person will add that feature, the Python 3-style, which I think is much nicer.

00:00 Version PyAnnotate.

00:00 Yeah, a PyAnnotate 3.

00:00 One of the comments is, ‘Is pull requests accepted?’

00:00 (Laughs) Beautiful. Yeah, that’s really cool. So, I think the plan is, those guys have one of the largest code bases in Python, period, and it’s all in Python 2. Well, I shouldn’t say all. I don’t know if it’s all. I think much of it is in Python 2, so here’s a great way to prepare this for some kind of automated migration or much stronger migration story.

00:00 Yeah, it’s definitely a step in the right direction. I think it’s really cool.

00:00 Yeah, very cool. Maybe somebody will take this and do something fun with it.

00:00 One of the other parts of it is the little boilerplate that you’ve got to do to try to import your code and run it to generate that stuff. There’s somebody already on that. Kensho Engineering has released a project called pytest-annotate that makes this a little bit cleaner. With pytest-annotate, you can run your tests against your code without putting any hooks into your code. And it does all of the resume and stop. It generates that stuff for you.

00:00 These are all in early phases and there’s a few caveats with it. I played with it a little bit and it’s pretty easy. It’s just a couple lines of code and it generates some annotations to add to your code. It’s pretty cool.

00:00 Yeah, I think that’s really great. Basically, you can run individual tests and all the sets of tests and everything under test will then have type annotation information available for it. One more command line thing and you’ll put it back in the code.

00:00 Yeah, I tried it out. One of the things I do like about PyAnnotate is that by default it doesn’t modify your code, but it tells you what you should change. And then if you want it to actually write the code, you add a -w flag and it will write it. It’s a good behavior; I like it.

00:00 Yeah, I like the option to see what’s going to happen before you actually commit. We have source control. I hope people are using source control. (Laughs)

00:00 Before we get to the next item, I want to tell you guys about DigitalOcean Spaces.

00:00 DigitalOcean Spaces is online object storage, file storage for your applications and all the other things you might use something like Amazon S3 for. But it’s much, much more affordable. Instead of being $93 for the first terabyte of traffic, it’s $5. And you get free inbound traffic, all sorts of really good stuff. After that, it’s still nine times cheaper than AWS. Really great. Same APIs. You can just switch over there super easy. More or less, just point your client at a different URL and you’re still doing the same type of thing. So, check it out at do.co/Python.

00:00 Speaking of server code that wants to store stuff in places and link other people to it, have you ever created a systemd service on Linux?

00:00 I have not.

00:00 I haven’t either. It always seemed like a complicated thing that you’d have to set up. Systemd is the more modern sort of daemon service manager for at least Ubuntu. I think it’s on other distributions as well, but I’ve only played with Ubuntu, so that’s the one I’ve encountered it on. And there’s this guy who created a gist just showing how to use a Python script as a system daemon in a systemd service. And you can control it with systemctl and all those sorts of things, just like you would, say, nginx or uWSGI or some other major built-in server component. It’s super, super easy. You basically create a Python file and you create this little .service file; those are both in the gist. Copy them into a certain location, run a few commands to enable and start the service, and off it goes. You have a little ‘while True, go do your stuff’ loop in your Python script and it will run indefinitely, even auto-starting with Linux.

00:00 Oh, that’s cool.

00:00 And it’s super easy. Are you looking at the code?

00:00 Yeah, it’s just a handful of lines of code, that’s it.

00:00 It’s basically a configuration. It’s probably like eight lines of configuration, half of which is headers. So, it’s really super easy. If you need to have stuff run in the background and run with your system on Linux, check this out if you want to write that in Python.
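
A rough sketch of what the two files described above might look like. This is not the gist itself: the service name, the paths in `ExecStart`, and the `do_work` function are all hypothetical, and the unit file is shown as a Python string only so that both pieces sit in one listing.

```python
import time

# Hypothetical unit file; on Ubuntu it would typically be saved as
# something like /etc/systemd/system/demo-worker.service.
UNIT_FILE = """\
[Unit]
Description=Demo Python worker
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/demo/worker.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def do_work():
    """Placeholder for whatever the background job actually does."""
    return time.time()


def main(max_iterations=None, delay=60.0):
    """Loop forever by default; max_iterations exists only to make testing easy."""
    done = 0
    while max_iterations is None or done < max_iterations:
        do_work()
        done += 1
        time.sleep(delay)
    return done


# The deployed worker.py would end by calling main() with no arguments.
print(main(max_iterations=2, delay=0))
```

With files along these lines in place, the usual steps are `systemctl daemon-reload`, then `systemctl enable` and `systemctl start` with the service name, run with root privileges.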

00:00 Cool.

00:00 For sure. So, you were talking about pytest before. Pytest is shiny and new again, right?

00:00 Yes, a new version came out, pytest 3.3 and there’s quite a few changes, one of which is they’re not supporting a couple versions of Python anymore. I think 2.6 and 3.3 are out now, so you have to do 2.7 or 3.4 and above.

00:00 That’s right. Python 3.3 just went out of support in its own right, so those are probably tied. I’m not sure about 2.6.

00:00 There’s a bunch of new features, which are exciting, but the most exciting thing is just the visual thing for me. Pytest now displays a progress percentage while running tests. Along the right-hand side of your terminal window, you’ll get percentage of tests done. I imagine it’s based on – it does collections first – the number of tests.

00:00 Yeah, it probably doesn’t go, ‘This one last time took ten seconds and this one took one…’

00:00 I don’t know that for sure but I’m guessing that.

00:00 It’d be awesome if it had kind of both. I can totally see why; that wouldn't make any sense.

00:00 One of the other things that pytest has always been great about is capturing standard output and standard error and displaying those for test failures. That’s the default; you can display them all the time if you feel like it. Also, you can write tests around the captured output and test against that. They’ve added built-in support for capturing the output from the standard logging module, which is quite helpful for people using the logging module.
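
As a small illustration of the logging capture, here is a sketch of a test using pytest's `caplog` fixture. The logger name `demo.accounts` and the `register` function are invented for the example; the test function is meant to be collected and run by pytest, which supplies `caplog`.

```python
import logging

logger = logging.getLogger("demo.accounts")


def register(email):
    # Code under test: logs a message and returns a normalized address.
    logger.info("registered %s", email)
    return email.lower()


def test_register_logs(caplog):
    # caplog is provided by pytest; at_level temporarily lowers the
    # threshold so INFO records from this logger are captured.
    with caplog.at_level(logging.INFO, logger="demo.accounts"):
        register("Brian@Example.com")
    assert "registered Brian@Example.com" in caplog.text
```

Saved as a test file and run with pytest 3.3 or later, the assertion checks the captured log text directly, with no extra plugin needed.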

00:00 Yeah, how nice.

00:00 It’s pretty cool. Now, I've got to go out and test my entire book to make sure it still runs against pytest 3.3.

00:00 Ah, the joys of being an author. You’re never done.

00:00 Yeah, I’m pretty sure everything looks pretty compatible so it shouldn’t be an issue.

00:00 That’s cool. Think of it this way, someday it will break bad enough, you’ll have to write a version two, a second edition. (Laughs)

00:00 Yeah, that’s the plan.

00:00 Yeah, for sure. So, I want to wrap this episode up with something pretty straightforward, but also, it kind of gives you a really unique technique. So, it turns out, if you’re going to create a dictionary, as we all know, there’s multiple ways to do this in Python. Same for lists, same for strings, same for tuples and so on. I could say d = {}, that’s the sort of language way. Or there’s the more type-driven way where I say d = dict(). You can either use the curly braces or use dict; similarly, list or [] (square brackets), tuples, things like that. So, there’s the type way, and then there’s the built-in way.

00:00 It turns out that the built-in way is faster. That’s kind of an interesting piece of trivia. But what’s really interesting is this guy wrote an article called, “Why d={} is faster than d=dict.” He goes through the analysis and he used the dis module. He goes through and he actually disassembles the line that uses {} and the line that uses dict, and analyzes why one is 20% slower, or whatever the numbers turn out to be.

00:00 It’s fun and nerdy. It looks like just one extra byte code or something like that.
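
The comparison can be reproduced with a few lines. This is a sketch rather than the article's code: the helper `opnames` is made up here, the exact instruction names vary between Python versions, and the timings depend on the machine, so no numbers are claimed.

```python
import dis
import timeit


def opnames(source):
    """Return the bytecode instruction names for a single expression."""
    code = compile(source, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("{}")
call_ops = opnames("dict()")

# The literal builds the dict directly; the call first has to look up the
# name `dict` (LOAD_NAME here, LOAD_GLOBAL inside a function) and then
# perform a function call.
print("{}     ->", literal_ops)
print("dict() ->", call_ops)

# Rough timing comparison; results vary by machine and Python version.
print("{}    :", timeit.timeit("{}", number=200_000))
print("dict():", timeit.timeit("dict()", number=200_000))
```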

00:00 Yeah, the main thing that makes it slow is when you use the type way, you’re effectively calling a function and when you’re calling a function, it needs to load the global variables and check to see if that function is overridden in the local scope, rather than in the major scope. So, it can’t be convinced that str or dict or whatever is what the built-in one means. So, it has to load up the state and check it out and then carry on. And it turns out that that makes that slower. This is all interesting, right? It’s kind of just like a little trivia trick. But the reason I brought up this article is if you look farther down at the end, he analyzes something that has nothing to do with this whole dict versus {} thing. He says, ‘Let’s suppose we’re going to do some mathematical calculations with math.floor and logarithms and so on.’ There’s a way to structure it using the functions directly out of the globals you’ve imported. Say you import math, then math.floor and so on. There’s another way, which is to pass those in to the function. Passing them in means you get to skip that load global for really hot loops or really short functions that are called super frequently. And that’s like 22% faster by just passing them in from the outside than calling them directly. So, if you’re really trying to optimize something, this is a super simple, non-obvious trick to get a significant speed up.
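
A minimal sketch of the pass-it-in trick, assuming `math.floor` as the hot function. The function names `floors_global` and `floors_local` are illustrative, not from the article, and any speed difference should be measured on your own machine rather than taken from the figure quoted above.

```python
import math
import timeit


def floors_global(values):
    result = []
    for v in values:
        # Each iteration looks up the global `math`, then its attribute `floor`.
        result.append(math.floor(v))
    return result


def floors_local(values, floor=math.floor):
    result = []
    for v in values:
        # `floor` was bound once as a default argument, so this is a
        # fast local-variable lookup inside the loop.
        result.append(floor(v))
    return result


data = [x * 0.5 for x in range(1000)]

# Both versions compute the same thing; only the name lookup differs.
assert floors_global(data) == floors_local(data)

print("global lookup:", timeit.timeit(lambda: floors_global(data), number=500))
print("passed in    :", timeit.timeit(lambda: floors_local(data), number=500))
```

The trade-off is readability: the extra default argument changes the function's signature, so it is usually reserved for loops that profiling has shown to be hot.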

00:00 To get around loading globals? Huh.

00:00 Isn’t that weird?

00:00 Yeah, I didn’t know you could get around that, so that’s cool.

00:00 I didn’t either. Apparently, you can. I just think it’s an interesting way of going, ‘Here’s this incongruity. Why would these have a different speed? They’re effectively doing the same thing in the end,’ and then using the disassembly to analyze it, and seeing, ‘Here’s the problem. How do we get around that? Let’s make this other unrelated thing faster.’ I think that’s just fascinating.

00:00 Yeah. That’s pretty cool.

00:00 Alright, well, that’s pretty much it for our news this week, everyone. Hopefully you enjoyed it. I thought all of them were very cool.

00:00 I do have one follow-up item for you, Brian.

00:00 Okay, great.

00:00 Remember, I told you guys a couple weeks ago about, “All Work, All Play,” that eSports champion thing that has apparently taken over?

00:00 Yeah.

00:00 So, there’s this article that came out in Ars Technica that caught my attention, that’s really closely related to that. And I love Ars Technica. It says, “F1 eSports Now More Exciting Than Real F1.” As in Formula 1, the many million dollar racing teams. It says that watching the eSports version is actually more interesting and they go through and talk about why that is. There was just the first World Championship of F1 and they have real announcers and a super excited Italian guy as one of the announcers. If you look through the comments, I think they might be right. I think eSports might actually be more interesting than real F1 racing, and I love real racing things.

00:00 That’s kind of cool. I’ll have to check this out.

00:00 If that sounds interesting to you, watch that video for five minutes and wait for the Italian announcer. He’s awesome. (Laughs)

00:00 Hopefully you guys can enjoy that and find some cool stuff in the news. Brian, thank you for sharing this with everyone.

00:00 Yeah, thank you.

00:00 You bet. Bye.

00:00 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. Get the full show notes at pythonbytes.fm. If you have a news item that you want featured, just visit pythonbytes.fm and send it our way. We’re always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
