Transcript #51: How to make your code 80 times faster
Return to episode page view on github00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 51, recorded November 7th, 2017.
00:10 I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:12 And we got a bunch of awesome Python news lined up for you, as always.
00:16 Before we get to that, though, let's just say thank you to Datadog.
00:19 Yeah, thanks, Datadog.
00:20 Yeah, Datadog is sponsoring this episode.
00:22 They've got some really cool sort of whole platform monitoring tools.
00:27 And you get a t-shirt if you do their little tutorials.
00:29 So we'll talk more about that in a minute.
00:31 But now I'd like to explore the United States with some data science.
00:35 Yeah, so I ran across this article called Exploring United States Policing Data with Python.
00:41 And since that's actually kind of probably a hot topic in a lot of parsing some of that data,
00:49 I thought this would be a fun thing to walk through.
00:52 So I walked through about half of the paper.
00:55 Anyway, but it goes through using Jupyter and IPython and all of those fun tools like Pandas and NumPy to grab some publicly available data.
01:08 It's in a CSV file that's zipped.
01:09 And you can just import that directly or read it directly with the appropriate tools with a Jupyter notebook and ask some questions like the race of people that get pulled over more often and things like that.
01:25 I know it's a very political topic.
01:28 I don't really want to get into that part of it.
01:29 It is interesting.
01:30 But mostly I think it's a very riveting example for walking through why it's important for more and more people to be able to examine public data and be able to figure out what's going on.
01:44 Yeah, I think it's really interesting that you bring this up, this example of working with public data.
01:48 Like there's a ton of data that we could be asking and answering questions about, right?
01:54 This policing data is an example.
01:55 Was it fun for you to like play data scientists and play with Jupyter and Matplotlib and stuff like that?
02:02 Yeah, it really was.
02:03 And one of the things I really actually enjoyed was that the example goes through pretty quickly because it doesn't stop to tell you exactly what all the code does.
02:12 It just has the code snippet that you just plop into a Jupyter cell and hit shift enter and it just runs it and plots things.
02:23 And I know I can look that stuff up later.
02:26 There were a few gotchas that I ran into that I'll put in the show notes.
02:30 Since this was my first time playing with Jupyter, I didn't really know that you had to hit shift enter, but I remember hearing that from somebody else to get it to run.
02:39 Yeah, it's kind of its own world, but it's really nice.
02:41 Yeah, but the example really does start with just from the beginning.
02:45 If you've never run it before, you can walk through some of this.
02:48 It's pretty cool.
02:48 There's some really interesting stuff you can do with police data and data science.
02:53 I don't remember exactly the details, so just take this as kind of a general idea.
02:58 But I think partially derivative, they had someone on talking about analyzing this kind of stuff.
03:04 And it was something like when there was some kind of complaint or episode of violence involving police, it was pretty frequent.
03:12 And it was like that the policeman involved in that violence had previously somehow just come off of some other horrible thing.
03:19 Like policemen who went to like check on a tech to deal with a suicide and then pulled somebody over who was, you know, noncompliant was much more likely to have a violent interaction with that person at the traffic stop.
03:33 So you can do things like say, well, let's change our policy so that people who just had some traumatic event get the rest of the day off so they can process that.
03:42 Right.
03:42 I mean, these are really powerful and important things.
03:45 One of the things I want to caution people for is, is I like I'm a I'm clearly an amateur data scientist because I just did this one thing and just followed a tutorial.
03:56 But now you have that title.
03:57 Yeah, no, but but there is.
04:01 Be careful when you're drawing when you draw conclusions and plot things and show charts.
04:06 Suddenly it looks more legitimate.
04:08 And yes, there's there's good information you can find, but there's also you got to be careful as well.
04:14 So at the beginning in this article, for instance, not all the data is filled in for all police stops.
04:21 So you have to deal with like fields that are empty.
04:25 What do you do with that?
04:26 And this article deals with it by just throwing them away, like throwing away a bunch of it picks one field and says, well, I can fill that one in.
04:35 But the rest of them, rest of the rows, if there if there if there's any empties, just throw them away.
04:41 I don't think that that's valid.
04:43 And I think there's a probably a better solution.
04:45 But but I think that if you're going to publish something, you should discuss what you what you did to clean up the data also.
04:51 One of the things we've talked a lot about here on the show is performance.
04:55 Sometimes that performance comes in terms of asynchronous programming when you're waiting on the network and things like that.
05:02 Or other times, maybe looking at things like PyPy.
05:06 So most people know there's all these different implementations and runtimes for Python.
05:11 But if you're a new listener, maybe new to Python, we have CPython is what most people mean when they talk about Python.
05:18 We also have a JIT just in time compiler version of Python called PyPy.
05:23 We have IronPython.
05:24 We have Jitthon.
05:25 We have Cython.
05:27 There's all these different variations.
05:28 So one of the ones that promises to take basic working Python code and make it a lot faster is PyPy.
05:35 So you'll hear people talk about, hey, you could make your code five times faster with PyPy under, you know, a whole bunch of constraints.
05:42 So the show, the thing I want to talk about this time on this show is how somebody went and took their code for IoT stuff and made it 80 times faster with PyPy.
05:51 Yeah, that's incredible.
05:52 80 times.
05:53 Yeah.
05:53 And I don't think they even really changed the code much.
05:56 They did change one little bit of an inner loop, but that was about it.
05:59 All right.
05:59 So here's the deal.
06:01 This person was working on evolutionary algorithms, which they were trying to create basically a self-learning adjusting algorithm that could evolve a logic to control a quadcopter.
06:12 Oh, nice.
06:13 Right?
06:13 Like one of these drone things.
06:15 It was a simulated one, but, you know, it could be hooked to a real one.
06:18 It didn't really matter.
06:19 And in order to drive the quadcopter, this object had to basically run a certain operation every, you know, so often.
06:28 And the so often is like 100 times a second.
06:30 So quite often.
06:31 And they would input this numpy array.
06:34 They would do some processing on it and output another one, like how much thrust goes to each motor and things like that.
06:39 So this is happening 100 times a second.
06:41 And they ran it with CPython.
06:43 And their test that they ran, not just once, but a whole bunch of operations, it took about six seconds.
06:49 And they said, okay, well, let's try PyPy.
06:51 I heard making, you know, using PyPy makes it faster.
06:54 So they just sort of typing Python space my program, they type PyPy space my program.
06:59 And it turns out it was, wait for it, five and a half times slower.
07:03 Oh, no.
07:03 That's not faster.
07:05 Exactly.
07:06 It's like, oh, who sold me this bill of goods, man?
07:09 I thought this was supposed to be faster.
07:10 So it turns out that the integration with numpy, which is what they're using, actually some of the C interops,
07:19 stuff is quite a bit slower.
07:21 So they actually use this PyPy implementation of numpy.
07:25 It's called numpypy instead of numpy.
07:28 And that worked.
07:29 That made it faster.
07:30 So now it was two times faster just by switching out the imports in the library.
07:34 So that's pretty good.
07:35 I said, but you know, we actually think we could do better.
07:38 And so what they did is they started profiling it and they looked at where it was slow and said, you know, this thing, actually, we could just rewrite this algorithm just a tiny bit.
07:47 So it's more friendly to the JIT compiler.
07:49 And they got it to go 80 times faster than the original CPython plus numpy version.
07:54 No, that's nice.
07:55 That's awesome, right?
07:56 And that was even 35 times faster than the native PyPy plus numpypy version.
08:00 So I think that's a really interesting lesson.
08:03 Yeah.
08:03 Nice use of profiler, too.
08:05 Yeah, exactly.
08:05 That's the point I was going to make is like, yeah, it's one thing to throw PyPy or some other technology at it.
08:10 But it's not always just going to be like this silver bullet that solves your problem, right?
08:14 But PyPy plus a little intelligent problem solving, like that made a huge difference for these guys.
08:20 Yeah.
08:21 So, I mean, you said this was a simulation?
08:23 Yeah, they're doing it as a simulation.
08:25 Okay.
08:26 I wonder if you can run, I don't know whether or not you can run PyPy on a little tiny quadcopter controller.
08:33 I guess it depends on your definition of tiny.
08:36 Like if it's raspberry pie, I think you probably could.
08:39 If it's eight of fruit size, maybe that might be too small.
08:44 I actually don't know.
08:44 Okay, cool.
08:45 Yeah.
08:45 So the next one is an interesting one I saw go by and I saw that you picked here as well.
08:51 It has to do with the longevity of open source projects.
08:54 Yeah.
08:55 This is actually a Wired article called Giving Open Source Projects Life After a Developer's Death.
09:02 And I hadn't really thought about that too much before.
09:06 But there is, I mean, we've got more and more, as the article goes on, we've got more and more critical projects using open source projects.
09:17 And there's a lot of them that don't have that many maintainers or sometimes just a handful or one.
09:23 So really, how do you deal with that?
09:26 So part of the article is just talking about this as a problem.
09:31 But then also there's possibly some solutions.
09:35 And I was just wondering if we had any solutions.
09:39 I also had some terrible puns I was going to try to throw in.
09:44 What to do after you hit your corporeal seg fault and raise an end of life exception.
09:50 Yeah, it's definitely something that people, I guess, really want to think of.
09:55 I mean, if you're in a business, they do talk about the, like, what if so-and-so gets hit by a bus?
10:01 Right?
10:02 Will that kill the business?
10:03 Well, if this open source project is used by many businesses, it could be that that actually kills a lot of projects, a lot of companies.
10:11 Yeah, so I didn't know this was out there.
10:13 Apparently, there's a place called libraries.io that has a bus factor evaluator for different libraries.
10:21 So you can plug in a library that you depend on and look up its bus factor.
10:26 Oh, really?
10:27 Yeah, how many of the core developers would have to be on a bus that got hit before the project went away.
10:35 But one of the things that it did bring up, which I looked up some of the Python stuff, and some of them are core things that even though there's a handful of maintainers, I think it would get picked up anyway because it's part of the standard library.
10:48 But there's definitely others that are of concern.
10:52 And one person points out that perhaps we could build some more things into, like, GitHub or PyPI or other places to have maybe errors put in place.
11:04 So you could say, you know, if I don't check in for like six months, then transfer ownership to these people or something.
11:12 Oh, that's a pretty cool idea.
11:13 Almost like an escrow for open source.
11:16 Yeah, like, I mean, if you get a big investment account, you got to list who gets it if something happens to you.
11:21 So maybe something ought to be like that for open source projects.
11:24 That sounds like projects somebody could create and integrate with GitHub.
11:27 Yeah, maybe.
11:29 One of the other solutions that I have seen is, like, the pytest community for a lot of the plugins.
11:37 Try to encourage people to, once you get quite a few users of your plugin, to push it over to a development group on GitHub.
11:47 And then anybody on the group can maintain it if necessary.
11:50 Yeah, that makes a lot of sense.
11:51 I mean, you could always, like, fork the repo and just say, the real one is here.
11:55 But I can just see a lot of skirmishes around, like, no, your fork is not real.
11:58 My fork is real.
11:59 Like, you know, people could fight over ownership, right?
12:01 Yeah, and that's doable on GitHub, but you can't really do that on PyPI that I know of.
12:08 I don't know how to get transfer ownership on PyPI.
12:11 Yeah, I think you got to contact them directly and just lobby your case, right?
12:15 Which doesn't sound like a great long-term, widespread solution.
12:18 I don't have that problem currently of having any super popular packages, but, you know.
12:23 Yeah, that's cool.
12:24 Definitely worth thinking about.
12:26 Yeah, so before we get to the next topic, let me tell you about Datadog, right?
12:30 So performance and bottlenecks, these don't just exist in just your application code, right?
12:35 Like, just because your code is slow, well, you could be waiting on the database.
12:38 The database could be waiting on some Linux internal behavior.
12:42 Who knows, right?
12:43 So these are layers upon layers upon layers across systems that we really build our apps with.
12:47 And Datadog lets you view all of those as, like, one whole thing.
12:51 So let's say you have a Python web app running on Flask.
12:55 It's built on Mongo, hosted on a scaled-out set of Ubuntu servers, running Nginx and Michael
12:59 Whiskey.
12:59 Datadog will let you, like, view and monitor all those things.
13:03 It's like one system.
13:04 So that is really, really super awesome.
13:06 And the more you scale out and the more diverse your system gets, the better Datadog can help
13:11 you.
13:11 They got a getting started tutorial.
13:13 Just take a few moments.
13:14 And if you finish it, they'll send you a sweet Datadog t-shirt.
13:17 So check that out at pythonbytes.fm/Datadog and see what you've been missing.
13:21 I still have to do that.
13:22 I need that t-shirt.
13:22 Yeah, we got to get our t-shirt.
13:24 So maybe this is the wrong season here in the Northern Hemisphere.
13:28 Maybe this is going to resonate a little more with our Australian friends.
13:33 But this next project I want to talk about is a solar-powered, internet-connected lawn sprinkler.
13:38 Oh, nice.
13:39 Yeah.
13:40 So Lennon, one of our listeners, friends of the show, sent this in and said, hey, I created
13:44 this really cool project.
13:45 And I'd like to share it with everyone.
13:46 I thought, yeah, this is actually a really neat example.
13:48 So I thought we'd throw it in here.
13:50 And the idea is he went and got this little tiny Adafruit feather huzzah board.
13:56 And this is like a little tiny microchip type thing.
14:00 But it has Wi-Fi.
14:01 So that's important.
14:02 You can plug it in somewhere and talk to it.
14:05 And it works with MicroPython.
14:07 So MicroPython is the Python that works in the smallest devices.
14:11 Like you said, the PyPy before.
14:13 I'm not sure if you can get PyPy to run on this one.
14:15 But MicroPython is so super level.
14:17 It basically is the operating system.
14:19 Your app is basically the operating system.
14:21 Yeah.
14:22 So you can take like a Lambda function and connect it right to a hardware interrupt directly.
14:26 Like that's how low level.
14:27 That's insane, right?
14:28 Oh, that's nice.
14:29 Yeah.
14:30 Right?
14:30 And he combines a couple of other interesting things.
14:33 He combines Home Assistant, which is the biggest home automation project in Python.
14:38 Like a really cool app that integrates tons and tons of different IoT and smart home things.
14:44 And he gives a really nice list of like, here's every single piece of hardware I used.
14:50 Here's the solar board that I used.
14:51 Here's the container for the feather huzzah board.
14:54 And it's just a really nice example of a small, compact IoT project.
15:00 Yeah.
15:00 And useful and not creepy.
15:01 I like it.
15:02 Yeah, exactly.
15:03 The more we seem to go back and forth on these little IoT things.
15:06 I really want to create one of these that goes on my front door that uses machine learning
15:10 to determine what type of person is on my front door when they ring the doorbell.
15:13 Yeah.
15:13 Or a dinosaur.
15:14 Yeah.
15:14 Or a dinosaur.
15:15 That's right.
15:15 Yeah.
15:16 And also I had to put a link in here to Talk Python episode 108 where I actually had the guys from Adafruit come on and talk about a whole bunch of these
15:24 different projects.
15:25 But yeah, nice job, London.
15:27 This is a cool one.
15:28 Yeah.
15:28 And shout out to Adafruit too for doing all sorts of cool stuff with hardware and software.
15:32 I like that.
15:33 Yeah.
15:34 They have a big educational aspect to them.
15:36 Not just education, education, but teaching people who want to learn about IoT.
15:40 I'm definitely planning on playing with some of these little devices in MicroPython.
15:44 It just seems really fun.
15:45 Okay.
15:46 So I am going to be perfectly honest with this last one.
15:49 I had a...
15:50 My last thing I was going to talk about was going to be another packaging story, but I
15:56 kind of went down a rabbit hole.
15:57 So instead of getting into that, that's my homework for next week.
16:02 So I've already set up what I'm going to talk about next week.
16:04 But what I want to highlight is some books that came out.
16:09 So we had some new Python books that came out recently.
16:12 It's a big week for Python books.
16:14 Yeah, we've got Python Tricks from Dan Bader.
16:18 That came out.
16:19 And Matt Harrison's Illustrated Guide to Python 3.
16:22 And I have just...
16:25 Actually, I'm going to read, at least peruse both of these.
16:28 I've just started Python Tricks, and I like the format.
16:30 It's cool.
16:31 And then I'm going to take a read for Matt Harrison's book.
16:36 And the cover is awesome.
16:39 And I want to try to get...
16:40 Actually, I want to have that around my office so that other people can...
16:43 Can look up Python 3 stuff pretty easy.
16:45 That seems really nice.
16:47 I looked through it as well.
16:47 And the illustrated aspect is cool on Matt's.
16:51 So yeah, congrats both to Dan and Matt on this.
16:53 This is cool.
16:54 Yeah.
16:54 And then the last thing is that I was on Twitter.
16:57 We were...
16:57 I was talking with a handful of people, authors.
17:00 There are some Python books out there that really could use some Amazon review love.
17:07 So I'm going to drop a link to my book and Harry Percival's Test Driven Development, which has been out for a while, but it's still...
17:15 It's only got six reviews.
17:17 And I know a lot of people have read this stuff.
17:19 And then also...
17:19 Yeah.
17:20 Harry's book is really great.
17:21 Obey the Testing Goat and all that stuff.
17:23 And then also the Greenfields, Two Scoops of Django.
17:27 I know a lot of people have started Django or gotten a lot better at it using this.
17:32 So go out and show some Amazon love to these people.
17:35 Yeah, definitely.
17:36 I think it really helps if you've read a book to write a review.
17:40 If you've taken a class, give a review, right?
17:43 Like these things actually make a big difference.
17:44 Yeah, it really does.
17:45 And we're all trying to do things the right way and trying to support the community.
17:50 So...
17:51 Yep.
17:51 It's cool.
17:52 Awesome.
17:52 Well, the last one I want to talk about is...
17:55 Sort of harkens back to the first one that you did, the data science space and the Anaconda
18:00 distribution.
18:01 So you probably know Anaconda distribution is an alternate distribution for Python.
18:07 It's basically CPython.
18:09 But instead of just being a standalone CPython where you pip install stuff, it comes packaged
18:14 with most of the machine learning, data science, and popular libraries you already need pre-compiled
18:21 for your machine.
18:22 So if you want to use some weird package that requires like a Fortran compiler or something,
18:27 you can just, you know, install it.
18:29 Either it's come with Anaconda or you can't install it and it actually downloads the binary
18:35 version.
18:36 So there's no worries about it, you know, not installing correctly.
18:40 Yeah, there's also the free distribution.
18:42 There's both paid and free distributions.
18:45 But even the free one is one of the few multiple package distributions that is completely legitimate
18:53 to do within a company as long as you're not reselling that itself.
18:58 Oh yeah, that's awesome.
18:59 So the news about Anaconda distribution is version 5 is released.
19:04 So they have 100 packages have been updated or added.
19:07 They have JupyterLab, Alpha Preview included, updated MKL.
19:11 Nice.
19:12 That's the Intel high performance compiled stuff.
19:15 So it used the Intel sort of low level speed ups for the machine learning and computational
19:20 stuff.
19:21 New compilers for macOS and Linux.
19:24 So what is that?
19:25 That's Clang and GCC respectively.
19:27 But what's important here is they flipped the flags on the compile steps for all of these
19:32 things to use the most newest and compatible flags for security.
19:39 So the stuff that gets compiled out of here is less likely to suffer from things like buffer
19:44 overflow sort of attacks and stuff.
19:47 So that's really cool.
19:48 The compilers are now a little safer for you.
19:50 They've got updated Condoforge and some other things.
19:54 They're creating another channel that has to do with this new compiler thing I talked about
19:57 and so on.
19:57 So pretty cool.
19:58 I still have on my to-do list to go check out JupyterLab.
20:01 Have you done any of that?
20:02 I have not checked out JupyterLab.
20:04 Okay.
20:04 It looks fun.
20:05 Yeah, it definitely does.
20:06 There's a lot of cool stuff happening with Jupyter and like social coding and stuff these
20:10 days.
20:10 It's exciting.
20:11 Well, that's it for my news this week, Brian.
20:14 Anything else you want to add?
20:15 I did find out last week that the Python testing with pytest is available on, what is that
20:22 thing?
20:22 Safari Books Online?
20:24 Yeah, Safari Online.
20:25 It's now there.
20:26 That's awesome.
20:27 Tons of people have that as just available to them as part of their company, right?
20:31 So people can now get your book that way.
20:33 I don't think I do.
20:35 I'm going to check it out.
20:37 Yeah, people can check it out.
20:38 I have no idea how if any of that money comes back to me with that, but I'm glad that a lot
20:42 of people can read it.
20:43 I think one of the things that's great about creating something like a book or a course
20:47 or whatever, even a podcast per se, is that people just use it and enjoy it, right?
20:51 You put a lot of energy to creating it.
20:53 And if it just sat there like digitally silent, that would be sad.
20:57 Yeah, it would be.
20:58 So cool.
20:59 Yeah, I'm glad it's out there and yet another channel for people.
21:02 All right.
21:03 Well, Brian, thanks again.
21:04 Thank you.
21:04 for everything this week and talk to you all later.
21:07 All right.
21:07 Bye.
21:07 Thank you for listening to Python Bytes.
21:10 Follow the show on Twitter via at Python Bytes.
21:12 That's Python Bytes as in B-Y-T-E-S.
21:15 And get the full show notes at pythonbytes.fm.
21:19 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
21:23 We're always on the lookout for sharing something cool.
21:26 On behalf of myself and Brian Okken, this is Michael Kennedy.
21:29 Thank you for listening and sharing this podcast with your friends and colleagues.