Transcript #47: PyPy now works with way more C-extensions and parking your package safely

Recorded on Wednesday, Oct 11, 2017.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 47, recorded October 11th, 2017.

00:09 I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:12 And we've got a bunch of cool stuff lined up for you.

00:13 So, hey, Brian, how's it going?

00:15 It's going really good.

00:15 Yeah, yeah, great.

00:16 Hey, before we get to your first item, I want to say thanks to DigitalOcean.

00:19 They've sponsored a bunch of episodes coming up.

00:21 They're really supporting the show.

00:22 And the thing they want me to tell you about is Spaces, which is like Amazon S3,

00:27 but like literally three times better and you get a two-month trial.

00:30 So check it out at do.co slash Python.

00:34 And we'll talk more about that later.

00:35 How about Fast?

00:37 Fast Python, Brian.

00:38 What do you think?

00:38 I'm excited.

00:39 So PyPy is a fast implementation of Python.

00:42 And it's good to see that there's still work coming out.

00:46 And one of the exciting bits of news just recently is version 5.9, at least on the PyPy 2.7 version of this release,

00:56 has Pandas and NumPy in it as well, which is super exciting.

01:00 That's actually a really big deal because they had not been supported.

01:04 That's one of the things that was a challenge with PyPy.

01:06 Like it was great.

01:08 It was much faster.

01:08 In many ways, it was like five times faster than regular CPython.

01:13 However, it didn't support any of the C extensions.

01:15 You couldn't integrate things like NumPy and stuff.

01:18 And so it was like you get a subset of Python that's super fast, but there might be things you don't want to do.

01:23 And oh, by the way, a lot of those are computational, which is exactly where people care about speed.

01:26 Yeah.

01:27 So it's awesome to see that coming on.

01:28 So getting NumPy and Pandas come on, and I'm sure that eventually it'll come on on the 3.5 branch as well.

01:35 Yeah, for sure.

01:36 And you also have notes about Cython as well, right?

01:39 Yeah.

01:39 So part of what helps with this is that the release includes Cython 0.27.1,

01:47 which supports a lot more Cython projects on PyPy.

01:52 I'm not sure what the Cython story was before this release, but that's pretty exciting.

01:57 Yeah, that's cool.

01:58 Yeah, I think the biggest news here is that CFFI has been updated and the C API extensions for many, many projects now work with PyPy,

02:08 whereas previously they did not.

02:10 And so it's not just Pandas and NumPy.

02:13 Those are the headline ones.

02:14 But there's a bunch of things that previously couldn't work with PyPy because of the C extensions.

02:18 Well, guess what?

02:18 Now they can.

02:19 That's pretty awesome.

02:20 Yeah.

02:20 And then another bit of news with this release is the optimized JSON parser

02:26 for both memory and speed, which should help for people trying to pull in JSON.

02:31 So that's good.

02:32 Yeah, that's awesome.

02:33 I think people use JSON every now and then.

02:35 Not really sure.

02:35 All the microservices, it's just like the network lights are above those JSON messages.

02:40 So that's really cool, and that's all pretty straightforward.

02:43 I want to show you some stuff that is not straightforward.

02:47 So there's this project on GitHub that has really taken off.

02:51 There's a ton of people contributing to it.

02:53 So let me pull up the main page and see.

02:56 There's 17 contributors who are doing a lot of work on this project, and it has about 3,600 stars called WTF Python.

03:06 So have you seen the Wat video about JavaScript and Ruby,

03:11 which is hilarious?

03:12 You know, Python is lucky in that there's not that many weird edge cases,

03:15 but this repository will show you, actually, there's some weird cases.

03:20 So have you seen this, Brian?

03:22 No, I haven't.

03:23 This is pretty funny.

03:24 Yeah, I pulled out four items, but there's a bunch, and this is super active on GitHub.

03:28 I'm getting all these notifications from it.

03:29 That's cool.

03:30 Like, one is about skipping lines.

03:33 You say, like, value equals 11.

03:35 Value equals 32.

03:36 What is value?

03:37 It's 11.

03:38 Huh?

03:38 What is going on here?

03:40 There's another one that's similar in the same section.

03:43 It says, quote E, equal, equal, quote E, false.

03:47 Okay.

03:48 And things like that.

03:50 And it's about encoding and some interesting stuff.
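
A minimal sketch of the kind of lookalike-character trick being described, assuming the second "e" is the Cyrillic letter U+0435 rather than the Latin U+0065. The variable names and the use of exec are illustrative choices, not taken from the repository.

```python
import unicodedata

latin_e = "e"          # U+0065
cyrillic_e = "\u0435"  # U+0435, renders almost identically in most fonts

# The two characters look the same but are different code points.
print(latin_e == cyrillic_e)         # False
print(unicodedata.name(latin_e))     # LATIN SMALL LETTER E
print(unicodedata.name(cyrillic_e))  # CYRILLIC SMALL LETTER IE

# Python identifiers may contain non-ASCII letters, so the two
# assignments below create two different variables.
namespace = {}
exec("value = 11\nvalu\u0435 = 32", namespace)
print(namespace["value"])            # 11
```

The second assignment never touches the first variable, which is why the value still reads 11.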

03:53 So each one of these has, like, a really simple, you know, like, three or four lines of code and then the explanation.

03:58 And the explanation, I think, is where this gets interesting.

04:01 So another one is modifying dictionaries.

04:04 Like, these are super good ways to trick people.

04:07 Like, create a dictionary with one item.

04:08 Go through for each item in it.

04:11 Delete that item and add a new one.

04:13 And then print that out.

04:14 How many times did that loop run, do you think?

04:15 I have no idea.

04:16 It's either one or error or something is what I would guess, right?

04:20 But the answer is eight.

04:21 Exactly eight.

04:22 You're like, what?

04:23 Why does it run eight?

04:25 Why doesn't it run one, infinite, or zero, or error?

04:31 Like, those are the three.

04:32 Zero, one, or infinity.

04:33 Eight doesn't make any sense.

04:34 But if you look at the implementation, the dictionaries are pre-allocated

04:37 because you're typically adding stuff.

04:39 They want to grow in, like, a doubling sort of way.

04:41 Not where every time you add something, it's got to reallocate and copy things around.

04:46 And so what they do is they pre-allocate a certain number of items.

04:49 And this trick, like, leverages assigning into those new slots until it runs out.
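
A hedged sketch of the loop being described, written so it reports what happens instead of assuming the answer. The eight quoted here reflects the CPython versions current at recording time. The count depends on how the dictionary is laid out internally, so other versions may stop after a different number of iterations or raise RuntimeError. The function name and the safety limit are illustrative.

```python
def count_loop_runs(limit=1000):
    """Delete and re-add a key while iterating; report how far the loop gets."""
    d = {0: None}
    runs = 0
    error = None
    try:
        for key in d:
            del d[key]         # remove the item currently being visited
            d[key + 1] = None  # add a new one, so the size stays at 1
            runs += 1
            if runs >= limit:  # safety stop, in case an implementation never ends
                break
    except RuntimeError as exc:
        error = exc
    return runs, error


runs, error = count_loop_runs()
print("loop body ran", runs, "time(s)")
if error is not None:
    print("interpreter stopped the loop:", error)
```

Either way, the takeaway is the same: mutating a dictionary while iterating over it gives results that depend on implementation details.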

04:55 So this is crazy.

04:57 I'll give you one more example.

04:58 Is, let's go with the is.

05:00 Is is not what it is.

05:02 So if you say A equals 256, B equals 256, A is B is true.

05:07 However, if you say A equals 257 and B equals 257, A is B is false.

05:14 Do you know why?

05:14 It's another crazy one.

05:16 This is insane.

05:17 And the reason is, I believe the first 256 numbers, maybe negative as well, I'm not sure,

05:23 are pre-allocated for performance reasons.

05:25 And every time you, like, literally say the number seven, like, that points to this pre-allocated

05:31 flyweight pattern type thing.

05:33 But beyond that, these get allocated on demand.

05:36 So you're basically asking, is the pointer to 257 equal to the other pointer 257?

05:40 And there's no longer this tracking between them and they get dropped.
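
A small sketch of the identity behavior, assuming CPython, which caches the integers from -5 through 256. The numbers are built with int() at runtime because literals written in the same file can be merged by the compiler, which would hide the effect. Other implementations are free to behave differently.

```python
import platform

# Build the integers at runtime so the compiler cannot share constants.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

# Equality compares values and always holds here.
print(small_a == small_b, big_a == big_b)  # True True

# Identity compares objects; the results below are CPython-specific.
print(platform.python_implementation())
print(small_a is small_b)  # True on CPython: both names point at the cached 256
print(big_a is big_b)      # False on CPython: two separate 257 objects
```

This is why `is` should be reserved for identity checks such as `x is None`, and `==` used for comparing numbers.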

05:43 So there's just, there's tons of this craziness going on here.

05:47 That's pretty fun.

05:47 Yeah, that's nice.

05:49 So I think this is a fun project.

05:50 I really commend the people working on it.

05:52 It's great.

05:53 And I definitely, I want to do something with this later.

05:55 I just haven't figured out quite what the details are yet, but there's got to be something

05:58 fun here.

05:58 So this makes me feel like I should go practice my Python.

06:01 Like, maybe I'm not as good as I thought I was because that dictionary thing going eight

06:04 times kind of threw me for a loop for a bit.

06:07 Anything in the WTF Python would be evil to try to bring up at a job interview.

06:12 But it'd be very evil.

06:14 Yeah.

06:14 But if they answered it, think of that.

06:16 Yeah, that'd be good.

06:17 I ran across this, it's a recent article called Python Exercises.

06:22 And I've done this before.

06:24 So when trying to either brush up on Python skills or find some questions to ask

06:31 at an interview or something, trying to come up with some decent questions.

06:35 And a lot of the questions out there are, they seem to be sort of generic questions around

06:41 like any language.

06:42 And they just happen to be done in Python.

06:44 This is a collection of questions that are, some of them are pretty easy to start off

06:50 with, like basic syntax stuff.

06:51 But there are some things that actually check just Python and some use of the standard library.

06:57 And I think it's a nice collection.

07:00 It goes through syntax, of course, and then some text processing and OS integration and decorators,

07:08 generators.

07:09 And you can get into quite a few things.

07:12 But I think it's a nice set.

07:14 It's not too huge.

07:15 It's a good one to look at.

07:16 Yeah, yeah.

07:16 And they don't seem too trivial.

07:18 They're like, given this set of data, parse it into a CSV file, start the subprocess, things

07:24 like that.

07:24 It's really, it's pretty nice, actually.

07:25 Yeah.

07:26 And then at the end, the last thing they talk about is testing, which I very much appreciate.

07:30 I think it's important to make sure.

07:33 I've started with trying to do, send out code examples to, before I bring somebody in for

07:39 an interview, ask them to solve some coding problem, but also to write a test to prove

07:44 it works.

07:44 And I think that's a good thing to add.

07:45 Absolutely.

07:46 Yeah, that's really cool.

07:47 Great that they include that at the end as well.

07:49 So I've got another thing you should test for.

07:51 Before I tell you about it, though, I want to tell you about Spaces.

07:54 So Spaces is DigitalOcean's new service, which lets you basically store files on the internet

08:01 and either privately or publicly pass them around, right?

08:04 So kind of like Amazon S3, but much, much more affordable.

08:08 So instead of charging you nine cents per gigabyte, they charge you one cent.

08:12 And you can use exactly the same tools.

08:14 So, you know, like I use Transmit for my Mac.

08:17 I love that to manage all my stuff in the cloud.

08:20 And when I switched to DigitalOcean Spaces, which I did just because I saw the offer, I'm

08:24 like, this is so much better before we even talked about this.

08:27 I just pointed my Transmit at that and it just kept on working.

08:31 Just said, hey, there's an S3 thing over here and here's the key.

08:33 So if you are using S3 or some other sort of shared cloud storage for files and things

08:40 like that, you definitely should check out DigitalOcean Spaces at do.co slash Python

08:46 and check it out.

08:47 There's a two month free trial and then it's really, really affordable and straightforward.

08:51 I love it.

08:51 Nice.

08:52 The audio you're listening to right now came straight out of there.

08:54 So beautiful.

08:55 Have you heard of Pickle?

08:57 Oh, yeah.

08:57 Not the gherkins, but the built in a way to serialize stuff.

09:03 I don't remember why, but I try to avoid it because I've heard there's problems.

09:06 Yeah.

09:07 There's two major problems with Pickle.

09:08 One of them is it stores a binary representation of your objects.

09:13 And so if you do things like rename a field or maybe even reorder stuff, right?

09:18 If you add a field, remove a field, there's all sorts of stuff where like just the versioning

09:22 of your classes or your data, if that changes, you can no longer properly deserialize these things.

09:28 It's not great.

09:29 So that can be a problem.

09:31 And that's probably reason enough to use JSON or some other format.

09:34 However, right in the documentation, it says, warning, the Pickle module is not intended to be secure against erroneous or maliciously constructed data.

09:43 Never unpickle data received from an untrusted or unauthenticated source.

09:49 All right.

09:49 So I think people see this like, okay, that looks bad.

09:51 Let's get out of here.

09:52 And they just bail as they should.

09:54 Like, I think even the versioning stuff alone is already an issue.

09:58 So like, I think there was an issue with somebody caching stuff.

10:01 And when they were switching from Python 2 to Python 3, the in-memory representation of like

10:06 date time or some part of the memory was a different representation and the Pickle and stuff

10:11 started to conflict with each other.

10:13 Anyway, this article I want to talk about is called Exploiting Misuse of Python's Pickle.

10:19 So if you've ever read that warning and gone, huh, that sounds bad.

10:23 I can kind of imagine what that might look like.

10:25 I'm going to stay away from it.

10:26 This one shows you exactly how to do bad things.

10:30 And bad things begin with, let's create a remote shell and start executing code.

10:37 And maybe even let us log in remotely over SSH to this machine by sending a little bit of binary data,

10:43 like 50 bytes, 100 bytes, something super small, over to this machine.

10:48 And then we'll just log in and go from there.

10:49 That sounds bad, right?

10:50 Yeah.

10:51 Jeez.

10:51 So the idea is when you unpickle something, there's a way, there's a few hooks where you

10:55 can run arbitrary Python code.

10:57 And so they say, well, let's just use subprocess.Popen and create a shell for us.

11:04 So you just put that command in like your dunder reduce, I think it's called.

11:07 And then you've got shells and that's bad.
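
A deliberately harmless sketch of the `__reduce__` hook mentioned above. Instead of subprocess.Popen, it asks the unpickler to call eval on a bit of arithmetic, which is enough to show that loading the bytes runs a callable chosen by whoever produced them. The class name and payload are illustrative and not taken from the article.

```python
import pickle


class Payload:
    def __reduce__(self):
        # Return (callable, args). Pickle stores this pair, and the
        # unpickler calls callable(*args) when the data is loaded.
        # An attacker would put something like subprocess.Popen here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
print(len(blob), "bytes on the wire")

# Loading does not rebuild a Payload object; it runs the stored call.
result = pickle.loads(blob)
print(result)  # 42
```

The receiving side never needs the Payload class at all, which is why unpickling data from an untrusted source is unsafe.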

11:10 So for those of you out there wondering, what is this warning about?

11:14 Exactly.

11:14 Why should I be super scared?

11:16 Here's why.

11:17 Great little example.

11:18 Super approachable.

11:19 Yeah.

11:19 Wacky.

11:19 Yeah.

11:20 Wacky.

11:20 So if I was running like a Django website, I probably wouldn't want to like use that

11:24 as my exchange format on my services, right?

11:26 No.

11:26 And there's so many other better formats anyway.

11:28 So.

11:29 JSON, JSON.

11:29 JSON.

11:30 Yeah.

11:31 For sure.

11:31 All right.

11:32 So what do you got next for us?

11:33 I've got a complete beginner's guide to Django.

11:35 Awesome.

11:36 This is a seven part series and it looks like six parts are done already.

11:41 And the seventh part is coming up soon.

11:43 And it's, it kind of goes through quite a bit of Django.

11:47 I know there's already a lot of Django tutorials out there, but the interesting thing I think that

11:52 makes this one stand out is it's kind of, it has an academic feel to it, I think.

11:58 And if that's kind of your thing, you might like this.

12:01 Well, it has a chalkboard.

12:02 It has a beaker and it has a Superman flying.

12:05 So these are all good signs.

12:06 Yeah.

12:07 Well, it has some like comic like drawings in it too and stuff.

12:10 Yeah.

12:10 Yeah.

12:10 Yeah.

12:10 Actually, I think this is really nice.

12:12 The graphics are wonderful.

12:14 They've got little wireframes to help you design the web pieces, some nice graphics

12:19 for file structure.

12:20 It seems super approachable to me.

12:22 I kind of got lost with some of the UML diagrams and whatnot, but it's well written.

12:27 People should check it out if you want to learn Django.

12:30 So maybe.

12:31 Yep.

12:31 Absolutely.

12:32 And it's based on Python, not legacy Python.

12:34 So this is all good as well.

12:35 Yeah.

12:36 So if you're looking to pick up Django, that's a good place to do it.

12:40 All right.

12:41 So do you remember when we talked about the malicious packages being uploaded?

12:46 Yes.

12:47 PyPI?

12:47 Yeah.

12:48 Do you remember what they were targeting?

12:50 Like how were they making those, getting people to install them?

12:52 Well, there were a couple of ways.

12:54 There were naming standard library things in PyPI and then also misspellings.

12:59 Exactly.

13:00 So we have a new GitHub project called PyPI dash Parker.

13:05 So this is a cool project by a guy named Matt.

13:08 And he sent this over and said, Hey, you should check this out.

13:10 I don't think a lot of people know about it yet, but it's, it's really cool.

13:13 So the idea is, you know, we had this debate about how do people check and how people verify

13:18 what gets uploaded to PyPI.

13:20 Should there be like a committee that reviews it?

13:22 And all that sounded really bad.

13:24 And so he's created this library that says, look, the self-serve ability of people to just

13:31 upload things to PyPI.

13:33 This is a good thing.

13:34 Let's not get rid of it.

13:36 Let's just try to solve this typo squatting problem.

13:39 So what he's done is he's created this thing called the PyPI Parker and it's an extension to

13:45 distutils.

13:46 So it's a separate command that you can run on it.

13:50 So if I was like Kenneth Reitz and I created requests, I could do this: I could run the

13:56 setup.py and give it, I think it's park.

13:59 And it will actually generate additional packages that I can upload to PyPI.

14:04 And there'll be the various reasonable misspellings of requests.

14:08 And when you import them, it'll raise an error, an import error and says, no, no, no.

14:14 This thing that you pip installed, you misspelled that.

14:16 Go get the real one over here.

14:18 So it gives them like a help message and all that kind of stuff.
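
A rough sketch of the idea only, not pypi-parker's actual code: a placeholder module whose sole job is to fail loudly and point at the real package. The function name import_parked and the misspelling "requets" are hypothetical, and the placeholder is simulated in memory rather than installed.

```python
import types


def import_parked(parked_name, real_name):
    """Simulate importing a placeholder module registered under a misspelled name."""
    module = types.ModuleType(parked_name)
    module.__dict__["MESSAGE"] = (
        f"'{parked_name}' is a parked name, not a real package. "
        f"You probably meant to install '{real_name}'."
    )
    # The placeholder's entire body is a single raise statement.
    exec("raise ImportError(MESSAGE)", module.__dict__)


try:
    import_parked("requets", "requests")
except ImportError as exc:
    message = str(exc)
    print(message)
```

The point of the design is that the typo fails immediately with a helpful message instead of silently installing someone else's code.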

14:20 So, one, it gives the ownership of these misspellings to the

14:26 original package owner.

14:27 And then for the people trying to accidentally use those, it will give them the warning to say,

14:34 you've misspelled this, but here's what you actually should be looking for.

14:37 I think that's great.

14:38 Yeah.

14:38 That's cool.

14:39 Yeah.

14:40 So well done, Matt.

14:40 If you're a package owner, check this out.

14:42 It might be helpful.

14:43 Since I'm not writing so much anymore, I'm thinking about writing a couple new open source projects.

14:48 So I'll probably be in that boat soon.

14:50 Yeah.

14:51 Nice.

14:51 So you should use PyPI Parker and then give us a report.

14:53 Okay.

14:54 Awesome.

14:54 That's our six items for the week.

14:55 So hopefully everyone enjoyed them.

14:57 Brian, what else is going on?

14:58 Well, I'm just getting ready for Halloween actually.

15:00 So.

15:01 I know.

15:01 Houses around here getting scary.

15:02 A lot of creatures and various cobwebs.

15:05 But I have not been as busy as you have lately.

15:07 What have you been up to?

15:08 I have just released a brand new course and you can find it at freemongodbcourse.com and

15:15 that should give you pretty much all you need to know about it.

15:18 So I have this paid course, which is like a seven hour, super in-depth thing.

15:21 And I wanted to come up with a way for people to get started with Python, get started with

15:26 MongoDB.

15:26 And then if you want to learn more, you can like take the paid course or things like that.

15:31 So just drop over at freemongodbcourse.com and sign up.

15:35 There's really no strings attached.

15:36 You just have to create an account and then you can go take the class.

15:39 Oh, another thing I wanted to point out, this is maybe not worth a whole item.

15:42 And this is not my thing.

15:44 This is just something I saw from Donald Stufft, who runs PyPI and the website and all that kind

15:51 of stuff.

15:51 He sent out a tweet that said, Python 3 usage has doubled in the past year according to

15:57 download stats on PyPI.

15:59 Oh, that's cool.

16:00 Yeah.

16:00 So legacy Python is definitely on the downward trend, even though it's still the majority

16:05 of things that get downloaded.

16:06 Yeah.

16:07 So way to go, Donald, for putting that out there and nice to see that trend continuing.

16:11 All right.

16:13 Well, thank you everyone for listening.

16:14 Brian, thanks for finding these things and sharing with everyone.

16:17 Yeah.

16:17 Thank you.

16:17 Thank you for listening to Python Bytes.

16:21 Follow the show on Twitter via at Python Bytes.

16:23 That's Python Bytes as in B-Y-T-E-S.

16:26 And get the full show notes at Pythonbytes.fm.

16:29 If you have a news item you want featured, just visit Pythonbytes.fm and send it our way.

16:34 We're always on the lookout for sharing something cool.

16:37 On behalf of myself and Brian Okken, this is Michael Kennedy.

16:40 Thank you for listening and sharing this podcast with your friends and colleagues.
