Transcript #47: PyPy now works with way more C-extensions and parking your package safely
Return to episode page view on github00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds. This is episode 47 recorded October 11th 2017 I'm Michael Kennedy and I'm Brian Okken and we got a bunch of cool stuff lined up for you. So hey Brian, how's it going?
00:15 Yeah, it's going really good. Yeah. Yeah, great. Hey before we get to your first item I want to say thanks to DigitalOcean They've sponsored a bunch of episodes coming up and they're really supporting the show and the thing they want me to tell you about is Spaces which is like Amazon s3 but like literally three times better and you get a two-month trial So check it out at do dot co slash Python and we'll talk more about that later How about fast fast Python Brian? What do you think? I'm excited. So pi pi is fast implementation and It's good to see that there's still work coming out and one of the exciting bits of news just recently is version 5.9, at least on the the PiPi 2.7 version of this release has Pandas and NumPy in it as well, which is super exciting. That's actually a really big deal because They had not been supported. That's one of the things that was a challenge with PiPi like it was great It was much faster in many many ways. It was like five times faster than regular CPython However, it didn't support any of the C extensions you couldn't integrate things like NumPy and stuff.
01:18 And so it was like, you get a subset of Python that's super fast, but there might be things you don't want to do.
01:23 And oh, by the way, a lot of those are computational and where people care about when it's fast.
01:26 - Yeah.
01:27 - So it's awesome to see that coming on.
01:28 - So getting NumPy and pandas come on, and I'm sure that eventually it'll come on on the 3.5 branch as well.
01:36 - Yeah, for sure.
01:36 And you also have notes about Cython as well, right?
01:39 - Yeah, so it includes the part of the help with this And what it includes is Cython 0.27.1, which supports a lot more Cython projects on PyPy.
01:52 I'm not sure what the Cython story was before this release, but that's pretty exciting.
01:57 Yeah, that's cool.
01:58 Yeah, I think the biggest news here is that CFFI has been updated.
02:03 And the C API extensions for many, many projects now work with PyPy, whereas previously, they did not.
02:11 And so it's not just pandas and NumPy.
02:13 are the headline ones. But there's a bunch of things that previously couldn't work with PyPy because of the C extensions. Well, guess what? Now they can. That's pretty awesome.
02:20 Yeah. And then another another bit of news with this release is the optimized JSON parser for both memory and speed, which should help for people trying to pull in JSON. So that's good.
02:32 Yeah, that's awesome. I think people use JSON every now and then not really sure. All the microservices, it's just like the network lights are above those JSON messages. So that's That's really cool and that's all pretty straightforward.
02:43 I want to show you some stuff that is not straightforward.
02:47 There's this project on GitHub that has really taken off.
02:51 There's a ton of people contributing to it.
02:54 Let me pull up the main page and see.
02:57 There's 17 contributors who are doing a lot of work on this project.
03:01 It has about 3,600 stars called WTF Python.
03:07 If you've seen the Watt video about JavaScript and Ruby, which is hilarious.
03:13 Python is lucky in that there's not that many weird edge cases, but this repository will show you actually there's some weird cases.
03:21 Have you seen this, Brian?
03:22 No, I haven't.
03:23 This is pretty funny.
03:24 Yeah, I pulled out four items, but there's a bunch.
03:27 This is super active on GitHub.
03:28 I'm getting all these notifications from it.
03:30 That's cool.
03:31 One is about skipping lines.
03:33 You say value equals 11, value equals 32.
03:36 is value? It's 11. Huh? What is going on here? There's another one that similar in the same section says, quote, e, equal, equal, quote, e, false. Okay. And things like that. And it's, it's about encoding and some interesting stuff. So each one of these has like a really simple, you know, like three or four lines of code, and then the explanation and the explanation, I think is where this gets interesting. So another one is modifying dictionaries, like, These are super good ways to trick people like create a dictionary with one item, go through for each item in it, delete that item and add a new one, and then print that out. How many times did that loop run? Do you think? I have no idea. It's either one or error or something is what I would guess, right? But the answer is eight. Exactly eight. Like what? Why? Why is it right? Why doesn't it run one infinite, or zero or error like those are the three zero one or infinity eight doesn't make any sense. But if you look at the implementation, the dictionaries are pre allocated, because you're typically adding stuff, they want to grow and like a doubling sort of way, not a, every time you add something, it's got to reallocate and copy around things. And so what they do is they pre allocate a certain number of items in this trick, like leverages, assigning into those new slots until it runs out. So this is crazy. I'll give you one more example is let's go with it is not what it is. So if you say a equals 256, b equals 256, a is b is true. However, if you say a is 257, and b is 257, a is b is false. Do you know why? Another crazy one. This is insane. And the reason is, I believe the first 126 numbers, maybe negative as well, I'm not sure, are pre allocated for performance reasons. And every time you like literally say the number seven, like that points to this pre-allocated flywheel pattern type thing.
05:33 But beyond that, these get allocated on demand.
05:36 So you're basically asking, is the pointer to 257 equal to the other pointer 257, and there's no longer this tracking between them and they get dropped.
05:43 So there's just, there's tons of this craziness going on here.
05:47 That's pretty fun.
05:48 Yeah, that's nice.
05:49 So I think this is a fun project.
05:51 I really commend the people working on it.
05:53 It's great.
05:54 And I definitely, I want to do something with this later.
05:55 I just haven't figured out quite what the details are yet.
05:57 But there's got to be something fun here.
05:59 So this makes me feel like I should go practice my Python.
06:01 Like maybe I'm not as good as I thought I was because that dictionary thing going eight times kind of like took me for a loop for a bit.
06:07 Anything in the WTF Python would be evil to try to bring up at a job interview.
06:12 It'd be very evil.
06:14 But if they answered it, think of that.
06:16 Yeah, that would be good.
06:18 I ran across this.
06:19 It's a recent article called Python Exercises.
06:23 And I've done this before.
06:24 So trying to either brush up on Python skills or trying to find some questions to ask at an interview or something, trying to come up with some decent questions.
06:35 And a lot of the questions out there are, they seem to be sort of generic questions around like any language and they just happen to be "do it in Python".
06:45 This is a collection of questions that are, some of them are pretty easy to start off with like basic syntax stuff but there are some things that check actually just Python and some use of the standard library and I think it's a nice collection. It goes through syntax of course and then some text processing and OS integration and decorators, generators and you can get in into quite a few things but I think it's a nice set. It's not too huge. It's a good one to look Yeah, yeah, and they don't seem too trivial.
07:19 They're like, given this set of data, parse it to a CSV file, start the sub-process, things like that.
07:25 It's really, it's pretty nice, actually.
07:26 Yeah, and then at the end, the last thing they talk about is testing, which I very much appreciate.
07:31 I think it's important to make sure, I've started with trying to do, send out code examples to before I bring somebody in for an interview, ask them to solve some coding problem, but also to write a test to prove it works.
07:44 I think that's a good thing to add.
07:46 - Absolutely.
07:46 Yeah, that's really cool.
07:47 Great that they include that at the end as well.
07:50 So I've got another thing you should test for.
07:51 Before I tell you about it though, I wanna tell you about Spaces.
07:54 So Spaces is DigitalOcean's new service, which lets you basically store files on the internet and either privately or publicly pass them around, right?
08:04 So kind of like Amazon S3, but much, much more affordable.
08:08 So instead of charging you nine cents per gigabyte, they charge you one cent.
08:12 And you can use exactly the same tools.
08:14 So, you know, like I use transmit for my Mac.
08:17 I love that to manage all my stuff in the cloud.
08:20 And when I switched to DigitalOcean Spaces, which I did just 'cause I saw the offer, I'm like, this is so much better before we even talked about this.
08:28 I just pointed my transmit at that and it just kept on working.
08:31 Just said, hey, there's an S3 thing over here and here's the key.
08:33 So if you are using S3 or some other sort of shared cloud storage for files and things like that, You definitely should check out DigitalOcean Spaces at do.co/python and check it out.
08:47 There's a two month free trial and then it's really, really affordable and straightforward, I love it.
08:51 - Nice.
08:52 - The audio you're listening to right now came straight out of there, so beautiful.
08:56 Have you heard of Pickle?
08:57 - Oh yeah.
08:58 - Not the gherkins, but the built-in way to serialize stuff.
09:03 - I don't remember why, but I try to avoid it because I've heard there's problems.
09:07 - Yeah, there's two major problems with Pickle.
09:08 One of them is it stores a binary representation of your objects.
09:13 And so if you do things like rename a field, or maybe even reorder stuff, right, if you add a field, remove a field, there's all sorts of stuff where like, just the versioning of your classes or your data, if that changes, you can no longer properly serialize these things. It's not great.
09:30 So that can be a problem. And that's probably reason enough to use JSON or some other format.
09:35 However, right in the documentation it says, "Warning, the pickle module is not intended to be secure against erroneous or maliciously constructed data.
09:44 Never unpickle data received from an untrusted or unauthenticated source." Alright, so I think people see this like, "Okay, that looks bad.
09:51 Let's get out of here." And they just bail, as they should.
09:54 I think even the versioning stuff alone is already an issue.
09:59 I think there was an issue with somebody caching stuff, and when they were switching from Python to Python 3, the in-memory representation of like, date time or some, some part of the memory was a different representation and the pickling stuff started to conflict with each other.
10:14 Anyway, this article I want to talk about is called exploiting misuse of Python's pickle.
10:19 So if you've ever read that warning, and gone, huh, that sounds bad, I can kind of imagine what that might look like.
10:25 I'm going to stay away from it.
10:27 This one shows you exactly how to do bad things.
10:31 And bad things begin with let's create a remote shell and start executing code and maybe even let us log in remotely over SSH to this machine by sending a little bit of binary data like 50 bytes, 100 bytes, something super small over to this machine and then we'll just log in and go from there.
10:50 That sounds bad, right?
10:51 Yeah, geez.
10:52 So the idea is when you unpickle something, there's a way, there's a few hooks where you you can run arbitrary Python code.
10:58 They say, "Well, let's just use subprocess.popen and create a shell for us." You just put that command in your dunder reduce, I think it's called, and then you've got shells, and that's bad.
11:12 For those of you out there wondering, "What is this warning about?
11:14 Exactly, why should I be super scared?" Here's why.
11:17 Great little example, super approachable.
11:19 >> Yeah, wacky.
11:20 >> Yeah, wacky.
11:21 If I was running a Django website, I probably wouldn't want to use that as my exchange format on my services, right?
11:26 No, and there's so many other better formats anyway.
11:29 JSON, JSON.
11:30 JSON, yeah.
11:31 For sure.
11:32 All right, so what do you got next for us?
11:33 I've got a complete beginner's guide to Django.
11:36 Awesome.
11:37 This is a seven-part series, and it looks like six parts are done already, and the seventh part is coming up soon.
11:44 And it kind of goes through quite a bit of Django.
11:48 I know there's already a lot of Django tutorials out there, but the interesting thing I think that makes this one stand out is it's kind of I has an academic feel to it I think and if that's kind of your thing you might like this. Well it has a chalkboard, it has a beaker, and it has a Superman flying so these are all good signs. Yeah well it has some like comic like drawings in it too and stuff. Yeah yeah yeah actually I think this is really nice the graphics are wonderful they've got little wireframes to help you design the web pieces some nice graphics for file structure. It seems super approachable to me. I kind of got lost with some of the UML diagrams and whatnot but it's well written. People should check it out if you want to learn Django. So maybe. Yep, absolutely and it's based on Python not legacy Python so this is all good as well. Yeah so if you're looking to pick up Django that's a good place to do it. All right so do you remember when we talked about the malicious packages being uploaded to - PyPi? - Yeah.
12:48 - Do you remember what they were targeting?
12:50 Like how were they making those, getting people to install them?
12:52 - Well, there were a couple ways.
12:54 There were naming standard library things in PyPI, and then also misspellings.
12:59 - Exactly.
13:00 So we have a new GitHub project called PyPI-Parker.
13:05 So this is a cool project by a guy named Matt.
13:08 He sent this over and said, "Hey, you should check this out." I don't think a lot of people know about it yet, but it's really cool.
13:13 So the idea is, you know, We had this debate about how do people check and how people verify what gets uploaded to PyPI, should there be a committee that reviews it, and all that sounded really bad.
13:24 So he's created this library that says, look, the self-serve ability of people to just upload things to PyPI, this is a good thing.
13:34 Let's not get rid of it.
13:36 Let's just try to solve this typo squatting problem.
13:40 So what he's done is he's created this thing called the PyPI Parker, and it's an extension to distutils.
13:46 It's a separate command that you can run on it.
13:51 If I was like Kenneth Wrights and I create a request, you do this and I could run the setup py and give it, I think it's park.
14:00 It will actually generate additional packages that I can upload to PyPI.
14:05 There'll be the various reasonable misspellings of requests.
14:09 When you import them, it'll raise an error, an import error, and says, "No, no, no.
14:14 thing that you pip installed, you misspelled that. Go get the real one over here. So it gives them like a help message and all that kind of stuff. So it one blocks the ownership or provide it gives the ownership of these misspellings to the original package owner.
14:28 And then for the people trying to accidentally use those, it will give them the warning to say you've misspelled this, but here's what you actually should be looking for. I think that's great.
14:39 Yeah, that's cool.
14:40 Yeah. So well done, Matt. If you're a package owner, check this out. It might be helpful.
14:43 Since I'm not writing so much anymore, I'm thinking about writing a couple new open source projects so I'll probably be in that boat soon.
14:50 Yeah, nice.
14:51 So you should use PyPI Parker and then give us a report.
14:54 Okay.
14:55 Awesome.
14:56 That's our six items for the week so hopefully everyone enjoyed them.
14:57 Brian, what else is going on?
14:58 Well, I'm just getting ready for Halloween actually.
15:01 I know.
15:02 Houses around here are getting scary.
15:03 A lot of creatures and various cobwebs.
15:05 But I have not been as busy as you have lately.
15:08 What have you been up to?
15:09 I have just released a brand new course, and you can find it at freemongodbcourse.com, and that should give you pretty much all you need to know about it.
15:18 So I have this paid course, which is like a seven hour super in-depth thing, and I wanted to come up with a way for people to get started with Python, get started with MongoDB, and then if you want to learn more, you can take the paid course or things like that.
15:31 So just drop over at freemongodbcourse.com and sign up.
15:35 There's really no strings attached.
15:36 You just have to create an account, and then you can go take the class.
15:39 Oh, another thing I wanted to point out, this is maybe not worth a whole item, and this is not my thing, this is just something I saw, is Donald Stuft, who runs PyPI, and the website and all that kind of stuff, he sent out a tweet that said, "Python 3 usage has doubled in the past year "according to download stats on PyPI." - Oh, that's cool.
16:00 - Yeah, so, legacy Python is definitely on the downward trend, even though it's still the majority of things that get downloaded.
16:07 Yeah, so way to go, Donald, for putting that out there and nice to see that trend continuing.
16:12 All right.
16:13 Well, thank you everyone for listening.
16:14 Brian, thanks for finding these things and sharing with everyone.
16:17 Yeah, thank you.
16:19 Thank you for listening to Python Bytes.
16:21 Follow the show on Twitter via @pythonbytes.
16:23 That's Python Bytes as in B-Y-T-E-S.
16:27 And get the full show notes at pythonbytes.fm.
16:30 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
16:35 the lookout for sharing something cool.
16:37 On behalf of myself and Brian Okken, this is Michael Kennedy.
16:40 Thank you for listening and sharing this podcast with your friends and colleagues.