Transcript #47: PyPy now works with way more C-extensions and parking your package safely
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 47, recorded October 11th, 2017.
00:09 I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:12 And we've got a bunch of cool stuff lined up for you.
00:13 So, hey, Brian, how's it going?
00:15 It's going really good.
00:15 Yeah, yeah, great.
00:16 Hey, before we get to your first item, I want to say thanks to DigitalOcean.
00:19 They've sponsored a bunch of episodes coming up.
00:21 They're really supporting the show.
00:22 And the thing they want me to tell you about is Spaces, which is like Amazon S3,
00:27 but like literally three times better and you get a two-month trial.
00:30 So check it out at do.co slash Python.
00:34 And we'll talk more about that later.
00:35 How about Fast?
00:37 Fast Python, Brian.
00:38 What do you think?
00:38 I'm excited.
00:39 So PyPy is a fast implementation of Python.
00:42 And it's good to see that there's still work coming out.
00:46 And one of the exciting bits of news just recently is version 5.9, at least on the PyPy 2.7 version of this release,
00:56 has Pandas and NumPy in it as well, which is super exciting.
01:00 That's actually a really big deal because they had not been supported.
01:04 That's one of the things that was a challenge with PyPy.
01:06 Like it was great.
01:08 It was much faster.
01:08 In many ways, it was like five times faster than regular CPython.
01:13 However, it didn't support any of the C extensions.
01:15 You couldn't integrate things like NumPy and stuff.
01:18 And so it was like you get a subset of Python that's super fast, but there might be things you want to do that you can't.
01:23 And oh, by the way, a lot of those are computational, which is exactly where people care about it being fast.
01:26 Yeah.
01:27 So it's awesome to see that coming on.
01:28 So NumPy and Pandas have come on, and I'm sure that eventually they'll come on on the 3.5 branch as well.
01:35 Yeah, for sure.
01:36 And you also have notes about Cython as well, right?
01:39 Yeah.
01:39 So part of what helps with this is that it includes Cython 0.27.1,
01:47 which supports a lot more Cython projects on PyPy.
01:52 I'm not sure what the Cython story was before this release, but that's pretty exciting.
01:57 Yeah, that's cool.
01:58 Yeah, I think the biggest news here is that CFFI has been updated and the C API extensions for many, many projects now work with PyPy,
02:08 whereas previously they did not.
02:10 And so it's not just Pandas and NumPy.
02:13 Those are the headline ones.
02:14 But there's a bunch of things that previously couldn't work with PyPy because of the C extensions.
02:18 Well, guess what?
02:18 Now they can.
02:19 That's pretty awesome.
02:20 Yeah.
02:20 And then another bit of news with this release is the optimized JSON parser
02:26 for both memory and speed, which should help for people trying to pull in JSON.
02:31 So that's good.
02:32 Yeah, that's awesome.
02:33 I think people use JSON every now and then.
02:35 Not really sure.
02:35 All the microservices, it's just like the network lights are above those JSON messages.
02:40 So that's really cool, and that's all pretty straightforward.
02:43 I want to show you some stuff that is not straightforward.
02:47 So there's this project on GitHub that has really taken off.
02:51 There's a ton of people contributing to it.
02:53 So let me pull up the main page and see.
02:56 There's 17 contributors who are doing a lot of work on this project, and it has about 3,600 stars called WTF Python.
03:06 So if you've heard of, have you seen the Wat video about JavaScript and Ruby,
03:11 which is hilarious?
03:12 You know, Python is lucky in that there's not that many weird edge cases,
03:15 but this repository will show you, actually, there's some weird cases.
03:20 So have you seen this, Brian?
03:22 No, I haven't.
03:23 This is pretty funny.
03:24 Yeah, I pulled out four items, but there's a bunch, and this is super active on GitHub.
03:28 I'm getting all these notifications from it.
03:29 That's cool.
03:30 Like, one is about skipping lines.
03:33 You say, like, value equals 11.
03:35 Value equals 32.
03:36 What is value?
03:37 It's 11.
03:38 Huh?
03:38 What is going on here?
03:40 There's another one that's similar in the same section.
03:43 It says, quote E, equal, equal, quote E, false.
03:47 Okay.
03:48 And things like that.
03:50 And it's about encoding and some interesting stuff.
03:53 So each one of these has, like, a really simple, you know, like, three or four lines of code and then the explanation.
03:58 And the explanation, I think, is where this gets interesting.
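The two look-alike puzzles above can be reproduced with a short sketch. It assumes the trick is a Cyrillic letter that renders like a Latin "e"; the variable names here are made up for illustration and are not taken from the repository.

```python
import unicodedata

latin_e = "e"          # U+0065, the ordinary Latin letter
cyrillic_e = "\u0435"  # U+0435, a Cyrillic letter that renders like "e"

# The two strings look identical on screen but are different code points.
print(latin_e == cyrillic_e)
print(unicodedata.name(latin_e))
print(unicodedata.name(cyrillic_e))

# The same trick with identifiers: the second assignment creates a new
# name that only looks like "value", so the original binding is untouched.
namespace = {}
exec("value = 11\nvalu\u0435 = 32", namespace)
print(namespace["value"])
```

Printing the Unicode names is usually the quickest way to spot this kind of look-alike character in source code.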
04:01 So another one is modifying dictionaries.
04:04 Like, these are super good ways to trick people.
04:07 Like, create a dictionary with one item.
04:08 Go through for each item in it.
04:11 Delete that item and add a new one.
04:13 And then print that out.
04:14 How many times did that loop run, do you think?
04:15 I have no idea.
04:16 It's either one or error or something is what I would guess, right?
04:20 But the answer is eight.
04:21 Exactly eight.
04:22 You're like, what?
04:23 Why does it run eight?
04:25 Why doesn't it run one, infinite, or zero, or error?
04:31 Like, those are the three.
04:32 Zero, one, or infinity.
04:33 Eight doesn't make any sense.
04:34 But if you look at the implementation, the dictionaries are pre-allocated
04:37 because you're typically adding stuff.
04:39 They want to grow in, like, a doubling sort of way.
04:41 Not an every-time-you-add-something, it's-got-to-reallocate-and-copy-things-around sort of way.
04:46 And so what they do is they pre-allocate a certain number of items.
04:49 And this trick, like, leverages assigning into those new slots until it runs out.
04:55 So this is crazy.
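The dictionary puzzle described above can be sketched as follows. The iteration count is an implementation detail: as far as I know, the CPython versions current when this was recorded ran the loop eight times, while newer interpreters may stop earlier or raise a RuntimeError. The sketch therefore counts the iterations instead of assuming a number, and the safety cap is an addition for illustration only.

```python
d = {0: None}
runs = 0

try:
    for key in d:
        runs += 1
        del d[key]          # remove the item we are currently visiting
        d[key + 1] = None   # add a new one, so the size stays at 1
        if runs >= 100:     # safety cap, not part of the original puzzle
            break
except RuntimeError as exc:
    # Some interpreter versions detect the mutation and refuse to continue.
    print("iteration aborted:", exc)

print("loop body ran", runs, "time(s); dict now holds", len(d), "item")
```

Because the size never changes, the usual "dictionary changed size during iteration" check does not fire, which is what lets the loop keep walking the pre-allocated slots on older interpreters.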
04:57 I'll give you one more example.
04:58 Is, let's go with the is.
05:00 Is is not what it is.
05:02 So if you say A equals 256, B equals 256, A is B is true.
05:07 However, if you say A equals 257 and B equals 257, A is B is false.
05:14 Do you know why?
05:14 It's another crazy one.
05:16 This is insane.
05:17 And the reason is, I believe the first 256 numbers, maybe negative as well, I'm not sure,
05:23 are pre-allocated for performance reasons.
05:25 And every time you, like, literally say the number seven, like, that points to this pre-allocated
05:31 flyweight pattern type thing.
05:33 But beyond that, these get allocated on demand.
05:36 So you're basically asking, is the pointer to 257 equal to the other pointer 257?
05:40 And there's no longer this tracking between them and they get dropped.
05:43 So there's just, there's tons of this craziness going on here.
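A minimal sketch of the identity puzzle above, assuming CPython, which keeps a cache of small integers (currently -5 through 256). The integers are built with int() at runtime because two equal literals in the same script can be merged by the compiler, which would hide the effect.

```python
# Build the integers at runtime so the compiler cannot share constants.
a, b = int("256"), int("256")
c, d = int("257"), int("257")

# Equality compares values and is always what you want for numbers.
print(a == b, c == d)

# Identity compares object addresses. On CPython, 256 comes from the
# small-integer cache, while each 257 is a separately allocated object.
print(a is b)
print(c is d)
```

The practical takeaway is to compare numbers with ==, and to reserve is for singletons such as None.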
05:47 That's pretty fun.
05:47 Yeah, that's nice.
05:49 So I think this is a fun project.
05:50 I really commend the people working on it.
05:52 It's great.
05:53 And I definitely, I want to do something with this later.
05:55 I just haven't figured out quite what the details are yet, but there's got to be something
05:58 fun here.
05:58 So this makes me feel like I should go practice my Python.
06:01 Like, maybe I'm not as good as I thought I was because that dictionary thing going eight
06:04 times kind of like took me for a loop for a bit.
06:07 Anything in the WTF Python would be evil to try to bring up at a job interview.
06:12 But it'd be very evil.
06:14 Yeah.
06:14 But if they answered it, think of that.
06:16 Yeah, that'd be good.
06:17 I ran across this, it's a recent article called Python Exercises.
06:22 And I've done this before.
06:24 So when you're trying to either brush up on Python skills or find some questions to ask
06:31 at an interview or something, you're trying to come up with some decent questions.
06:35 And a lot of the questions out there are, they seem to be sort of generic questions around
06:41 like any language.
06:42 And they just happen to be done in Python.
06:44 This is a collection of questions that are, some of them are pretty easy to start off
06:50 with, like basic syntax stuff.
06:51 But there are some things that actually check just Python and some use of the standard library.
06:57 And I think it's a nice collection.
07:00 It goes through syntax, of course, and then some text processing and OS integration and decorators,
07:08 generators.
07:09 And you can get into quite a few things.
07:12 But I think it's a nice set.
07:14 It's not too huge.
07:15 It's a good one to look at.
07:16 Yeah, yeah.
07:16 And they don't seem too trivial.
07:18 They're like, given this set of data, parse it into a CSV file, start the subprocess, things
07:24 like that.
07:24 It's really, it's pretty nice, actually.
07:25 Yeah.
07:26 And then at the end, the last thing they talk about is testing, which I very much appreciate.
07:30 I think it's important to make sure.
07:33 I've started trying to send out code examples before I bring somebody in for
07:39 an interview, asking them to solve some coding problem, but also to write a test to prove
07:44 it works.
07:44 And I think that's a good thing to add.
07:45 Absolutely.
07:46 Yeah, that's really cool.
07:47 Great that they include that at the end as well.
07:49 So I've got another thing you should test for.
07:51 Before I tell you about it, though, I want to tell you about Spaces.
07:54 So Spaces is DigitalOcean's new service, which lets you basically store files on the internet
08:01 and either privately or publicly pass them around, right?
08:04 So kind of like Amazon S3, but much, much more affordable.
08:08 So instead of charging you nine cents per gigabyte, they charge you one cent.
08:12 And you can use exactly the same tools.
08:14 So, you know, like I use Transmit for my Mac.
08:17 I love that to manage all my stuff in the cloud.
08:20 And when I switched to DigitalOcean Spaces, which I did just because I saw the offer, I'm
08:24 like, this is so much better before we even talked about this.
08:27 I just pointed my Transmit at that and it just kept on working.
08:31 Just said, hey, there's an S3 thing over here and here's the key.
08:33 So if you are using S3 or some other sort of shared cloud storage for files and things
08:40 like that, you definitely should check out DigitalOcean Spaces at do.co slash Python
08:46 and check it out.
08:47 There's a two month free trial and then it's really, really affordable and straightforward.
08:51 I love it.
08:51 Nice.
08:52 The audio you're listening to right now came straight out of there.
08:54 So beautiful.
08:55 Have you heard of Pickle?
08:57 Oh, yeah.
08:57 Not the gherkins, but the built-in way to serialize stuff.
09:03 I don't remember why, but I try to avoid it because I've heard there's problems.
09:06 Yeah.
09:07 There's two major problems with Pickle.
09:08 One of them is it stores a binary representation of your objects.
09:13 And so if you do things like rename a field or maybe even reorder stuff, right?
09:18 If you add a field, remove a field, there's all sorts of stuff where like just the versioning
09:22 of your classes or your data, if that changes, you can no longer properly deserialize these things.
09:28 It's not great.
09:29 So that can be a problem.
09:31 And that's probably reason enough to use JSON or some other format.
09:34 However, right in the documentation, it says, warning, the Pickle module is not intended to be secure against erroneous or maliciously constructed data.
09:43 Never unpickle data received from an untrusted or unauthenticated source.
09:49 All right.
09:49 So I think people see this like, okay, that looks bad.
09:51 Let's get out of here.
09:52 And they just bail as they should.
09:54 Like, I think even the versioning stuff alone is already an issue.
09:58 So like, I think there was an issue with somebody caching stuff.
10:01 And when they were switching from Python 2 to Python 3, the in-memory representation of like
10:06 date time or some part of the memory was a different representation and the Pickle and stuff
10:11 started to conflict with each other.
10:13 Anyway, this article I want to talk about is called Exploiting Misuse of Python's Pickle.
10:19 So if you've ever read that warning and gone, huh, that sounds bad.
10:23 I can kind of imagine what that might look like.
10:25 I'm going to stay away from it.
10:26 This one shows you exactly how to do bad things.
10:30 And bad things begin with, let's create a remote shell and start executing code.
10:37 And maybe even let us log in remotely over SSH to this machine by sending a little bit of binary data,
10:43 like 50 bytes, 100 bytes, something super small, over to this machine.
10:48 And then we'll just log in and go from there.
10:49 That sounds bad, right?
10:50 Yeah.
10:51 Jeez.
10:51 So the idea is when you unpickle something, there's a way, there's a few hooks where you
10:55 can run arbitrary Python code.
10:57 And so they say, well, let's just use subprocess.Popen and create a shell for us.
11:04 So you just put that command in like your dunder reduce (__reduce__), I think it's called.
11:07 And then you've got shells and that's bad.
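The hook described above can be illustrated with a deliberately harmless sketch. The Payload class is hypothetical, and os.getcwd stands in for the subprocess.Popen call mentioned in the discussion; the point is only that unpickling calls whatever callable the data names.

```python
import os
import pickle


class Payload:
    """Hypothetical class whose pickled form asks the loader to call a function."""

    def __reduce__(self):
        # pickle stores (callable, args); pickle.loads() later calls
        # callable(*args). An attacker would name something dangerous here.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())

# Loading the bytes does not rebuild a Payload. It runs os.getcwd() and
# hands back whatever that call returned.
result = pickle.loads(blob)
print(type(result).__name__, "returned from a", len(blob), "byte pickle")
```

Note that the receiving side never needs the Payload class at all. The bytes alone are enough to trigger the call, which is why the documentation warns against unpickling untrusted data.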
11:10 So for those of you out there wondering, what is this warning about?
11:14 Exactly.
11:14 Why should I be super scared?
11:16 Here's why.
11:17 Great little example.
11:18 Super approachable.
11:19 Yeah.
11:19 Wacky.
11:19 Yeah.
11:20 Wacky.
11:20 So if I was running like a Django website, I probably wouldn't want to like use that
11:24 as my exchange format on my services, right?
11:26 No.
11:26 And there's so many other better formats anyway.
11:28 So.
11:29 JSON, JSON.
11:29 JSON.
11:30 Yeah.
11:31 For sure.
11:31 All right.
11:32 So what do you got next for us?
11:33 I've got a complete beginner's guide to Django.
11:35 Awesome.
11:36 This is a seven part series and it looks like six parts are done already.
11:41 And the seventh part is coming up soon.
11:43 And it's, it kind of goes through quite a bit of Django.
11:47 I know there's already a lot of Django tutorials out there, but the interesting thing I think that
11:52 makes this one stand out is it's kind of, it has an academic feel to it, I think.
11:58 And if that's kind of your thing, you might like this.
12:01 Well, it has a chalkboard.
12:02 It has a beaker and it has a Superman flying.
12:05 So these are all good signs.
12:06 Yeah.
12:07 Well, it has some like comic like drawings in it too and stuff.
12:10 Yeah.
12:10 Yeah.
12:10 Yeah.
12:10 Actually, I think this is really nice.
12:12 The graphics are wonderful.
12:14 They've got little wireframes to help you design the web pieces, some nice graphics
12:19 for file structure.
12:20 It seems super approachable to me.
12:22 I kind of got lost with some of the UML diagrams and whatnot, but it's well written.
12:27 People should check it out if they want to learn Django.
12:30 So maybe.
12:31 Yep.
12:31 Absolutely.
12:32 And it's based on Python, not legacy Python.
12:34 So this is all good as well.
12:35 Yeah.
12:36 So if you're looking to pick up Django, that's a good place to do it.
12:40 All right.
12:41 So do you remember when we talked about the malicious packages being uploaded?
12:46 Yes.
12:47 PyPI?
12:47 Yeah.
12:48 Do you remember what they were targeting?
12:50 Like how were they making those, getting people to install them?
12:52 Well, there were a couple of ways.
12:54 They were naming packages after standard library modules on PyPI, and then also misspellings.
12:59 Exactly.
13:00 So we have a new GitHub project called PyPI dash Parker.
13:05 So this is a cool project by a guy named Matt.
13:08 And he sent this over and said, Hey, you should check this out.
13:10 I don't think a lot of people know about it yet, but it's, it's really cool.
13:13 So the idea is, you know, we had this debate about how do people check and how people verify
13:18 what gets uploaded to PyPI.
13:20 Should there be like a committee that reviews it?
13:22 And all that sounded really bad.
13:24 And so he's created this library that says, look, the self-serve ability of people to just
13:31 upload things to PyPI.
13:33 This is a good thing.
13:34 Let's not get rid of it.
13:36 Let's just try to solve this typo squatting problem.
13:39 So what he's done is he's created this thing called the PyPI Parker and it's an extension to
13:45 distutils.
13:46 So it's a separate command that you can run on it.
13:50 So if I was like Kenneth Reitz and I created requests, I could do this: I could run the
13:56 setup.py and give it, I think it's park.
13:59 And it will actually generate additional packages that I can upload to PyPI.
14:04 And there'll be the various reasonable misspellings of requests.
14:08 And when you import them, it'll raise an error, an import error and says, no, no, no.
14:14 This thing that you pip installed, you misspelled that.
14:16 Go get the real one over here.
14:18 So it gives them like a help message and all that kind of stuff.
14:20 So, one, it gives the ownership of these misspellings to the
14:26 original package owner.
14:27 And then for the people trying to accidentally use those, it will give them the warning to say,
14:34 you've misspelled this, but here's what you actually should be looking for.
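The behavior described above, where importing a parked name fails with a pointer to the real package, might look roughly like the sketch below. This is a hypothetical illustration, not the code PyPI Parker actually generates; the function name, the message wording, and the example misspelling are all assumptions.

```python
def parked_import_error(typo_name, real_name):
    """Build the error a parked placeholder package could raise on import."""
    return ImportError(
        f"'{typo_name}' is a parked name, not a real package. "
        f"You probably meant '{real_name}': pip install {real_name}"
    )


# A parked package's __init__.py would raise at import time. Here the
# raise is wrapped in try/except so the sketch can run on its own.
try:
    raise parked_import_error("reqeusts", "requests")
except ImportError as exc:
    message = str(exc)
    print(message)
```

Raising at import time means the mistake surfaces the first time the code runs, rather than the misspelled package silently doing nothing.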
14:37 I think that's great.
14:38 Yeah.
14:38 That's cool.
14:39 Yeah.
14:40 So well done, Matt.
14:40 If you're a package owner, check this out.
14:42 It might be helpful.
14:43 Since I'm not writing so much anymore, I'm thinking about writing a couple new open source projects.
14:48 So I'll probably be in that boat soon.
14:50 Yeah.
14:51 Nice.
14:51 So you should use PyPI Parker and then give us a report.
14:53 Okay.
14:54 Awesome.
14:54 That's our six items for the week.
14:55 So hopefully everyone enjoyed them.
14:57 Brian, what else is going on?
14:58 Well, I'm just getting ready for Halloween actually.
15:00 So.
15:01 I know.
15:01 Houses around here getting scary.
15:02 A lot of creatures and various cobwebs.
15:05 But I have not been as busy as you have lately.
15:07 What have you been up to?
15:08 I have just released a brand new course and you can find it at freemongodbcourse.com and
15:15 that should give you pretty much all you need to know about it.
15:18 So I have this paid course, which is like a seven hour, super in-depth thing.
15:21 And I wanted to come up with a way for people to get started with Python, get started with
15:26 MongoDB.
15:26 And then if you want to learn more, you can like take the paid course or things like that.
15:31 So just drop over at freemongodbcourse.com and sign up.
15:35 There's really no strings attached.
15:36 You just have to create an account and then you can go take the class.
15:39 Oh, another thing I wanted to point out, this is maybe not worth a whole item.
15:42 And this is not my thing.
15:44 This is just something I saw from Donald Stufft, who runs PyPI and the website and all that kind
15:51 of stuff.
15:51 He sent out a tweet that said, Python 3 usage has doubled in the past year according to
15:57 download stats on PyPI.
15:59 Oh, that's cool.
16:00 Yeah.
16:00 So legacy Python is definitely on the downward trend, even though it's still the majority
16:05 of things that get downloaded.
16:06 Yeah.
16:07 So way to go, Donald, for putting that out there and nice to see that trend continuing.
16:11 All right.
16:13 Well, thank you everyone for listening.
16:14 Brian, thanks for finding these things and sharing with everyone.
16:17 Yeah.
16:17 Thank you.
16:17 Thank you for listening to Python Bytes.
16:21 Follow the show on Twitter via at Python Bytes.
16:23 That's Python Bytes as in B-Y-T-E-S.
16:26 And get the full show notes at pythonbytes.fm.
16:29 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
16:34 We're always on the lookout for sharing something cool.
16:37 On behalf of myself and Brian Okken, this is Michael Kennedy.
16:40 Thank you for listening and sharing this podcast with your friends and colleagues.