

Transcript #54: PyAnnotate your way to the future

Recorded on Tuesday, Nov 28, 2017.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 54, recorded November 28th, 2017.

00:10 I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:12 And Brian, I feel like we've got some pretty good stuff lined up for this week.

00:15 What do you think?

00:16 Yeah, we do.

00:17 Totally.

00:17 Before we get to that, though, let's just say thank you to DigitalOcean.

00:20 They want you to know about Spaces at do.co slash Python.

00:24 Spaces is awesome.

00:25 It's like AWS S3, but 10 times better.

00:29 Maybe even more so.

00:30 I'll tell you more about that later.

00:31 But Brian, you have some fantastic news for the stability of Python open source infrastructure, right?

00:38 Yes.

00:39 This just came out yesterday: an announcement that the Python Software Foundation has been awarded a $170,000 grant, with the money coming from the Mozilla Open Source Support program.

00:52 And it's to improve the sustainability of PyPI.

00:56 That is our packaging index that everybody uses.

00:59 Yeah.

00:59 And we've talked about the challenges that PyPI had previously.

01:02 I've actually done an entire panel episode on Talk Python.

01:05 It's a ways back.

01:07 It's in the 60s, 70s range in the episode number.

01:10 But this has been a really big problem.

01:12 And it's really been on the shoulders of Donald Stufft to just keep pip and PyPI running, right?

01:20 There are other people involved with trying to keep it up and running.

01:23 But really, that's all that they have time for right now.

01:26 There was effort for the new warehouse code base.

01:30 But Donald has switched jobs recently and cannot spend as much time as he was before working on it.

01:40 So there's a big gap there.

01:42 And we need some work.

01:43 So a lot of people have asked: this Warehouse thing, I thought it was going to become the new PyPI.

01:48 What's up?

01:49 Still not the default.

01:50 I know.

01:50 You know, the site basically works.

01:53 It uses the same database, so it doesn't get out of sync.

01:56 And, you know, if you go to pypi.org or pypi.io, you end up there.

02:01 And it's a much better experience than the funky double PyPI URL that's at python.org.

02:07 There are some administrative capabilities that, for instance, if you're pushing up a new package, you will notice.

02:14 You still have to go use the old API to create an account.

02:19 And there are some backwards-compatible administrative capabilities that are needed in order to move this farther along.

02:27 And also, it's used by so many people that we kind of have to migrate slowly and carefully.

02:34 And hopefully this grant will be enough to at least get us started and get that done.

02:40 So I'm excited about it.

02:42 Yeah, that'd be super awesome.

02:43 Maybe they can take a page out of how the Instagram folks migrated from Python 2 and the older version of Django to Python 3 and the newer version of Django, where at first it just rolled out to the internal people and then a small group and so on.

02:58 It's either them or Facebook.

03:00 Same company, but I can't remember exactly the product.

03:02 But I think it was Instagram.

03:03 I think they've put together a pretty good plan.

03:05 In the article that we link to, they talk about how one of the first steps is redirecting some of the production traffic to Warehouse and then gradually migrating the rest over.

03:19 And then again, the main thing is to try to get all the administrative capabilities up to snuff.

03:24 Yeah, nice.

03:24 I don't know what the timeline is like, but I'm looking forward to seeing some of those changes.

03:28 You know, I'm looking forward to that red pre-production website banner thing being gone.

03:33 Yeah, yeah, definitely.

03:34 Because the site, at least from a consumer perspective, is really, really great.

03:37 I think they could actually take that down now and just say, admin people, if you want to maintain your package, go over here.

03:42 It's still kind of a messy thing to have to try to teach people how to put up new packages.

03:47 It's still a convoluted instruction set.

03:50 Yep, for sure.

03:51 So how often do you use type annotations?

03:54 Python's a dynamic language.

03:56 You might say, here's a function called register, and it has a thing called user.

04:01 Maybe that's the user's email.

04:03 Maybe that's a user object.

04:04 Maybe it's something else.

04:05 Like, you could annotate that.

04:06 But do you do that?

04:07 I try to do it for at least the API for a package.

04:10 That's what I've been using it for.

04:12 Yeah, that's a really great point.

04:13 I do that as well.

04:14 I don't, like, go over the top and, like, annotate everything in my code.

04:18 But I find as you cross, like, major architectural boundaries, which hopefully you've put into your application, you know, you've got, like, a data access layer, and you've got some other layer that's using it.

04:28 Like, if you annotate just that data access layer, like, that really flows a lot of good checking through.
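
A minimal sketch of what annotating just a boundary layer might look like. The `User` class, the `find_user` helper, and the in-memory store are hypothetical, invented here for illustration; only the `register` function taking a `user` comes from the discussion.

```python
from typing import Dict, Optional


class User:
    """Hypothetical user record, invented for this sketch."""

    def __init__(self, email: str, name: str) -> None:
        self.email = email
        self.name = name


# Stand-in for a real database table (assumption: a plain dict keyed by email).
_users: Dict[str, User] = {}


def register(user: User) -> bool:
    """Store the user and return True, or return False if the email is taken."""
    if user.email in _users:
        return False
    _users[user.email] = user
    return True


def find_user(email: str) -> Optional[User]:
    """Return the stored user for this email, or None if there is none."""
    return _users.get(email)


print(register(User("ada@example.com", "Ada")))  # first registration succeeds
print(register(User("ada@example.com", "Ada")))  # duplicate email is rejected
```

With the signature annotated, an editor or a checker such as mypy can flag a caller that passes an email string where a `User` object is expected, even if the calling code has no annotations of its own.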

04:35 So one of the tools that has been around for a while, and it's actually, as I understand it, one of the main projects that Guido van Rossum has been working on is mypy, which is an experimental optional type checker for Python.

04:51 Yeah.

04:51 Yeah, it's cool, right?

04:52 So basically what it does is it's like Flake8 or something.

04:55 You run it against your code.

04:56 And if you've used these type annotations, which are just editor notes, basically, they have no runtime behavior for most frameworks.

05:04 I've seen some people try to make use of it, and it's been pretty cool what I've seen.

05:08 But generally it's just, like, here's a note for the editors to give you some hints.

05:13 mypy will check that through as it follows the, you know, the flow of your code, right?

05:19 Yeah.

05:19 So that's pretty good.

05:20 It even works on Python 2, which doesn't support type annotations, but there's, like, a comment-based style of doing it.
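
For comparison, here is a small sketch of the two spellings defined in PEP 484: a type comment, which is valid in both Python 2 and Python 3, and the Python 3 inline form. The `average` functions are made up for this example.

```python
from typing import List


def average(values):
    # type: (List[float]) -> float
    """Python 2 compatible: the signature lives in a type comment."""
    return sum(values) / len(values)


def average_py3(values: List[float]) -> float:
    """Python 3 only: the same signature written inline."""
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))
print(average_py3([1.0, 2.0, 3.0]))

# The comment form is invisible at runtime; only the inline form is recorded.
print(average.__annotations__)
print(average_py3.__annotations__)
```

A type checker reads both forms the same way. The difference is that the comment is ignored by the interpreter, which is what lets it run unchanged on Python 2.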

05:26 So the big announcement is that Dropbox has just released something called PyAnnotate.

05:33 So PyAnnotate builds on mypy, and instead of just going, okay, great, so you wrote this code, and then you went and you added type annotations, I can tell you if it's correct.

05:42 PyAnnotate will say, you wrote a bunch of code or you inherited a bunch of code.

05:46 I will annotate it for you.

05:49 That is awesome.

05:50 It's pretty cool.

05:51 Yeah.

05:51 So basically if you've got some amount of code you want to annotate, what you do is you can go and, like, import some profiler hooks.

06:00 And you can do it just on a function by function or, you know, call graph by call graph section and say, start collecting annotation information here.

06:09 Stop there.

06:10 And it generates a JSON file with all the info.

06:12 And then if you want, you can run a separate command-line utility, pass it that JSON file plus your source files, and it will then go put the type annotations in them.
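
To make the idea of collecting types through a profiler hook concrete, here is a toy sketch built only on the standard library's `sys.setprofile`. It is not PyAnnotate's code or API: the names `start_collecting`, `stop_collecting`, `dump_stats`, and the example `register` function are all invented for illustration, and the sketch only looks at plain positional arguments.

```python
import json
import sys
from collections import defaultdict

# Maps a function name to the set of argument-type tuples seen at runtime.
_observed = defaultdict(set)


def _profile_hook(frame, event, arg):
    """Record the argument types of every Python function call."""
    if event != "call":
        return
    code = frame.f_code
    # Only plain positional arguments; *args and keyword-only are ignored here.
    arg_names = code.co_varnames[:code.co_argcount]
    if not arg_names:
        return  # no arguments, so nothing to record
    signature = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
    _observed[code.co_name].add(signature)


def start_collecting():
    sys.setprofile(_profile_hook)


def stop_collecting():
    sys.setprofile(None)


def dump_stats():
    """Return the observations as a JSON string."""
    data = {name: sorted(list(sig) for sig in sigs)
            for name, sigs in _observed.items()}
    return json.dumps(data, indent=2, sort_keys=True)


def register(user, age):
    """Unannotated function whose argument types we want to discover."""
    return "{} ({})".format(user, age)


start_collecting()
register("ada@example.com", 36)
register("bob@example.com", 41)
stop_collecting()
print(dump_stats())
```

The real tool goes further than this sketch: as described above, it writes the collected information to a JSON file, and a separate command-line step uses that file to insert annotations into the source.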

06:23 So I think this is huge, and I really like it.

06:26 I think it has a potential of being huge.

06:28 There's a few things I'm on the fence about.

06:30 Oh, like what?

06:30 Like it only does the Python 2 style comment annotations so far.

06:36 Yeah, that's not so amazing.

06:38 Well, hold on.

06:39 Let me look.

06:40 Let me pull this up.

06:41 So one of the things I think this is actually coming from is the fact that Dropbox is trying to move away from Python 2.

06:51 I'm pretty sure that's why this whole thing exists.

06:53 You're right.

06:54 It does do the Python 2 style, which is kind of annoying.

06:57 But I guess, you know, it wouldn't be that much work to, like, migrate it up.

07:01 Maybe some enterprising person will add that feature, the Python 3 style, which I think is much, much nicer.

07:07 Version of PyAnnotate.

07:08 Yeah, a PyAnnotate 3.

07:09 Yeah.

07:10 One of the comments is that pull requests are accepted.

07:13 Beautiful.

07:15 Yeah, that's really cool.

07:16 So I think the plan is those guys have one of the largest code bases in Python, period.

07:23 And it's all in Python 2.

07:24 Well, I shouldn't say all.

07:26 I don't know.

07:26 I think much of it is in Python 2.

07:28 And so here's a great way to, like, prepare this for some kind of automated migration or much stronger migration story.

07:37 Yeah, it's definitely a step in the right direction.

07:39 I think it's really cool.

07:40 Yeah, very cool.

07:41 Maybe somebody will take this and do something fun with it.

07:43 One of the other parts of it is the little boilerplate that you've got to do to try to import your code and run it to generate that stuff.

07:52 There's somebody already.

07:53 Kensho Engineering has released a project called pytest-annotate that makes this a little bit cleaner.

08:01 So with pytest-annotate, you can just run your tests against your code without adding any PyAnnotate hooks to your code.

08:11 And it will generate all.

08:14 It does all of the start and stop or the...

08:17 The resume and stop, whatever it is, yeah.

08:18 Yeah, the resume.

08:19 And it, yeah, it generates that stuff for you with that.

08:23 Again, these are all in early phases.

08:25 And there's a few caveats with it.

08:28 But I played with it a little bit.

08:30 And it's a lot.

08:32 It's pretty easy.

08:32 There's just a couple lines of code to generate some type annotations out of your code.

08:37 It's pretty cool.

08:38 Yeah, I think that's really great.

08:39 And so basically, you can run individual tests or all the sets of tests.

08:43 And everything under test will then have type annotation information available for it.

08:49 And then one more command-line step, and it'll put them back in your code.

08:53 Yeah, I tried it out.

08:54 One of the things I do like about PyAnnotate is that, by default, it doesn't modify your code.

08:59 But it tells you what you should change.

09:01 And then if you want to have it actually write the code, you add a -w flag and it'll write it.

09:06 So that's a good behavior.

09:08 I like it.

09:09 Yeah, it gives you the option to see what's going to happen before you actually commit.

09:13 I mean, we have source control.

09:15 I hope people are using source control.

09:17 Yeah.

09:18 But still.

09:19 Awesome.

09:20 So before we get to the next item, I want to tell you guys about DigitalOcean Spaces.

09:24 So DigitalOcean Spaces is online object storage, file storage for your applications, and all the other things you might use something like Amazon S3 for.

09:34 But it's much, much more affordable.

09:36 Instead of being, say, $93 for the first terabyte of traffic, it's $5.

09:43 And you get free inbound traffic, all sorts of really good stuff.

09:46 And after that, it's still 10 times, nine times cheaper than AWS.

09:50 So really great.

09:51 Same APIs.

09:52 You can just switch over there super easy.

09:55 More or less just point your client at a different URL, and you're still doing the same type of thing.

10:00 So check it out at do.co slash Python.

10:03 Speaking of server code that wants to store stuff in places and link other people to it, have you ever created a systemd service for Linux?

10:11 I have not.

10:12 I haven't either.

10:13 It always seemed like kind of a complicated thing that you'd have to set up.

10:17 So systemd is the more modern sort of daemon service for at least Ubuntu.

10:23 I think on other ones as well, but I only play with Ubuntu.

10:26 So that's really the only one that I've encountered it on.

10:29 And there's this guy who created a gist showing how to run a Python script as a daemon using a systemd service.

10:38 And then you can control it with, like, systemctl and all those sorts of things, just like you would, say, Nginx or uWSGI or some other major built-in server component.

10:47 And it is super, super easy.

10:49 You basically create a Python file, and you create this little .service file.

10:55 Those are both in the gist.

10:56 Copy them to the right location, run a few commands to enable and start the service, and off it goes.

11:02 You can just have a little while true loop that does your work in your Python script, and it'll just run indefinitely and even auto start when Linux boots.
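
A rough sketch of what such a script might look like. This is not the gist from the episode: the service name, the file paths, and the `do_work` function are hypothetical, and the unit file is shown as a Python string only so that everything sits in one place.

```python
import time

# Hypothetical unit file, e.g. /etc/systemd/system/example-worker.service.
# The paths and the description are assumptions made for this sketch.
UNIT_FILE = """\
[Unit]
Description=Example Python worker

[Service]
ExecStart=/usr/bin/python3 /opt/example/worker.py
Restart=always

[Install]
WantedBy=multi-user.target
"""


def do_work(tick):
    """Placeholder for the real job the service performs."""
    return "tick {}".format(tick)


def main(max_ticks=None, delay=5.0):
    """Loop forever by default; max_ticks exists only to allow short demo runs."""
    tick = 0
    while max_ticks is None or tick < max_ticks:
        # flush=True so lines reach the service log promptly
        print(do_work(tick), flush=True)
        tick += 1
        time.sleep(delay)
    return tick


# A real service script would end by calling main() with no arguments.
# Here the loop is limited so the sketch finishes on its own.
main(max_ticks=3, delay=0)
```

Under these assumptions, `sudo systemctl enable example-worker` would register the service to start at boot and `sudo systemctl start example-worker` would start it immediately.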

11:11 Oh, that's cool.

11:11 And it's super easy, right?

11:12 Are you looking at the code?

11:13 I mean, it's like...

11:14 Yeah.

11:14 I mean, it's just a handful of lines of code.

11:16 That's it.

11:16 Yeah, and it's basically a configuration.

11:18 It's probably like eight lines of configuration, half of which is like headers.

11:22 So it's really, really super easy.

11:24 So if you need to have stuff run in the background and just run with your system on Linux, check this out if you want to write that in Python.

11:31 Nice.

11:31 Cool.

11:32 Yeah, for sure.

11:33 So you were talking about pytest before.

11:36 pytest is shiny and new again, right?

11:38 Yes.

11:39 A new version came out, pytest 3.3.

11:43 And there's quite a few changes, one of which is they're not supporting a couple versions of Python anymore.

11:51 I think it's 2.6 and 3.3 are out now.

11:56 So you have to use either 2.7 or 3.4 and above.

12:03 Yeah, that's right.

12:04 The Python 3.3 just went out of support in its own right.

12:08 So those are probably tied.

12:09 I'm not sure about 2.6.

12:09 There's a bunch of new features which are exciting.

12:12 But the most exciting thing for me is just a visual thing: pytest now displays a progress percentage while running tests.

12:21 So you get along the right-hand side of your terminal window, you'll get like percentage of tests done.

12:27 And I imagine it's based on just the number of – it does collections first and it's probably just the number of tests.

12:34 Yeah, it probably doesn't go, okay, this one last time took 10 seconds.

12:38 And this one took one, so you have – whatever, right?

12:42 I don't know that for sure, but I'm guessing that.

12:44 Yeah, yeah.

12:45 It'd be awesome if it had kind of both, but I can totally see why that wouldn't make any sense.

12:49 Yeah, and then one of the other things that pytest has always been great about is capturing standard out and standard error and displaying those.

12:56 That's for test failures by default, but you can display them all the time if you feel like it.

13:02 And also, you can write tests around the captured output and test against that.

13:09 And they've added built-in support for capturing the output from the standard logging module,

13:17 which is quite helpful for people using the logging module.
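
A short sketch of testing against captured output and captured log records, using pytest's built-in `capsys` and `caplog` fixtures. The `restock` function and the `inventory` logger name are hypothetical, made up for this example.

```python
import logging

logger = logging.getLogger("inventory")  # hypothetical logger name


def restock(item, count):
    """Hypothetical code under test: prints a line and may log a warning."""
    print("restocked {} x{}".format(item, count))
    if count > 100:
        logger.warning("large restock of %s: %d", item, count)
    return count


def test_restock_prints(capsys):
    # capsys captures everything written to stdout and stderr during the test.
    restock("widget", 3)
    out, err = capsys.readouterr()
    assert out == "restocked widget x3\n"
    assert err == ""


def test_restock_logs_warning(caplog):
    # caplog collects records emitted through the standard logging module.
    restock("widget", 500)
    assert "large restock of widget: 500" in caplog.text
```

Saved in a file such as `test_restock.py` (the name is an assumption) and run with `pytest`, both tests receive their fixtures by argument name, so no import of pytest is needed in the file.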

13:20 Oh, yeah.

13:21 How nice.

13:21 That's pretty cool.

13:22 Now I've got to go out and test my entire book to make sure that it still runs against pytest 3.3.

13:29 Ah, the joys of being an author.

13:30 You're never done.

13:31 Yeah, I'm pretty sure everything looks pretty compatible, so it shouldn't be an issue.

13:35 That's cool.

13:36 Think of it this way.

13:37 Like someday it'll break badly enough that you have to write a version two, a second edition.

13:43 Yeah, that's the plan.

13:44 Think of those.

13:44 Yeah, for sure.

13:46 Cool.

13:47 All right.

13:47 So I want to wrap this episode up with something pretty straightforward, but also it kind of gives you a really unique technique.

13:55 So it turns out that if you're going to create a dictionary, as we all know, there's multiple ways to do this in Python.

14:01 Same for list, same for string, same for tuples, and so on.

14:04 Like I could say D equals open curly, close curly.

14:08 That's the sort of language way.

14:11 Or there's the more type-driven way where I say D equals dict, open, close parenthesis, right?

14:16 So you either use the curly braces or use the dict.

14:18 Similarly, list or square brackets or set.

14:22 I guess set you can't do it, but tuples, things like that.

14:24 So there's the type way, and then there's the built-in way.

14:27 It turns out that the built-in way is faster.

14:31 Okay.

14:32 That's kind of an interesting piece of trivia.

14:37 But what's really interesting is this guy wrote an article called why D equals curly braces is faster than D equals dict.

14:44 And he goes through the analysis, and he uses the dis module.

14:48 And he goes through and he actually disassembles the line that uses curly braces and the line that uses dict

14:54 and analyzes why the one is like 20% slower or whatever the numbers turn out to be.

15:00 It's fun and nerdy.

15:02 It looks like just one extra bytecode or something like that.

15:05 Yeah.

15:05 The main thing that makes it slow is when you use the type way, you're effectively calling a function.

15:12 And when you're calling a function, it needs to load the global variables

15:17 and check to see if that name has been overridden somewhere in your module rather than still being the built-in one.

15:24 So it can't be sure that str or dict or whatever is what the built-in one means.

15:30 So it has to kind of load up the state and check it out and then carry on.

15:34 And it turns out that that makes that slower.
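
The comparison can be reproduced with the standard `dis` and `timeit` modules. This is a sketch in the spirit of the article rather than its actual code: the helper names are invented, and the exact opcode names and timings vary between Python versions and machines.

```python
import dis
import timeit


def make_literal():
    return {}


def make_call():
    return dict()


def opnames(func):
    """Return the names of the bytecode instructions in a function."""
    return [ins.opname for ins in dis.get_instructions(func)]


# The literal builds the dict directly with a BUILD_MAP instruction.
print(opnames(make_literal))

# The call first has to look up the name "dict" (LOAD_GLOBAL) and then call it.
print(opnames(make_call))

# Rough timings; the absolute numbers depend on the machine and Python version.
print("literal:", timeit.timeit(make_literal, number=1000000))
print("dict(): ", timeit.timeit(make_call, number=1000000))
```

The name lookup is needed because `dict` could have been rebound to something else in the module, so the interpreter cannot assume it still refers to the built-in type.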

15:36 And so this is all interesting, right?

15:38 But it's kind of just like a little trivia trick.

15:42 But the reason I brought up this article is if you look farther down at the end,

15:45 he analyzes something that has nothing to do with this whole dict versus curly thing.

15:52 It says, let's suppose we're going to do some mathematical calculations with like math.floor and logarithms and so on.

16:01 There's a way to structure it.

16:03 One where you're using the functions directly out of the globals that you've imported.

16:09 So you say import math and then math.floor, math.log10, so on.

16:12 And then there's another way to like pass those into the function.

16:15 The passing it in means you get to skip that load global for really hot loops or really short functions that are called super frequently.

16:23 And that's like 22% faster by just passing them in from the outside than calling them directly.
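
One common way to pass the functions in is through default argument values, which turns the global lookups into local ones. Whether the article uses exactly this form is an assumption; the `digits_*` functions below are invented for illustration and the measured speedup will differ by machine and Python version.

```python
import dis
import math
import timeit


def digits_global(n):
    """Looks up the global name math (and its attributes) on every call."""
    return math.floor(math.log10(n)) + 1


def digits_local(n, floor=math.floor, log10=math.log10):
    """Receives the functions as arguments, so they are local variables."""
    return floor(log10(n)) + 1


def opnames(func):
    """Return the names of the bytecode instructions in a function."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Only the first version contains LOAD_GLOBAL instructions.
print("LOAD_GLOBAL" in opnames(digits_global))
print("LOAD_GLOBAL" in opnames(digits_local))

# Rough timings of the two versions on the same input.
for label, func in (("global lookups", digits_global),
                    ("passed-in locals", digits_local)):
    seconds = timeit.timeit("f(12345)", globals={"f": func}, number=200000)
    print("{}: {:.3f}s".format(label, seconds))
```

The trade-off is a slightly odd signature, so this is usually worth it only for very hot loops or tiny functions that are called extremely often, as described above.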

16:30 So if you're really trying to optimize something, this is a super simple, non-obvious trick to get like a significant speedup.

16:38 That's actually to get around loading globals.

16:40 Isn't that weird?

16:41 Yeah.

16:42 I didn't know you could get around that.

16:43 So that's cool.

16:44 I didn't either.

16:45 Apparently you can.

16:46 And I just think it's an interesting way of going like, all right, here's this incongruity.

16:50 Like, why would these have any different speed?

16:52 They're effectively doing the same thing in the end.

16:54 And then using the disassembler to analyze it and then seeing, okay, well, here's the problem.

16:58 How do we get around that?

16:59 Let's make this other unrelated thing faster.

17:01 I think that's just fascinating.

17:02 Yeah, that's pretty cool.

17:03 For sure.

17:03 All right.

17:05 Well, that's pretty much it for our news this week, everyone.

17:08 Hopefully you enjoyed it.

17:09 I thought all of them were very, very cool.

17:11 I do have one follow-up item for you, Brian.

17:14 Okay, great.

17:14 Remember I told you guys a couple of weeks ago about All Work, All Play, that weird esports

17:19 championship thing that apparently has taken over?

17:21 Yeah.

17:22 So there's this article that came out in Ars Technica that caught my attention that's

17:26 really closely related to that.

17:27 And I love Ars Technica.

17:29 It says, F1 esports is now more exciting than the real F1.

17:34 As in Formula One, like the many, many million dollar racing teams.

17:38 And it says, look, watching the esports version is actually more interesting.

17:42 And they go through and they talk about why that is.

17:45 They just had, like, the first world championship of F1 esports.

17:48 And they have the 20-minute race video with real announcers and this super excited Italian

17:53 guy as one of the announcers.

17:54 And if you look through the comments, I think they might be right.

17:58 I think esports F1 might actually be more interesting than real F1 racing.

18:02 And I love racing things, like real racing, not game racing.

18:05 That's kind of cool, though.

18:06 I'll have to go check this out.

18:08 Yeah, yeah.

18:08 So if this sounds interesting to you guys, check it out.

18:10 Watch that video for like five minutes and wait for the Italian announcer.

18:13 He's awesome.

18:14 All right.

18:16 Great.

18:16 Well, hopefully you guys can enjoy that and find some cool stuff in the news.

18:21 Brian, thank you for sharing this with everyone.

18:23 Yeah, thank you.

18:24 You bet.

18:24 Bye.

18:24 Thank you for listening to Python Bytes.

18:27 Follow the show on Twitter via at Python Bytes.

18:30 That's Python Bytes as in B-Y-T-E-S.

18:33 And get the full show notes at pythonbytes.fm.

18:36 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

18:41 We're always on the lookout for sharing something cool.

18:44 On behalf of myself and Brian Okken, this is Michael Kennedy.

18:47 Thank you for listening and sharing this podcast with your friends and colleagues.
