Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


« Return to show page

Transcript for Episode #99:
parse - the regex antidote in Python

Recorded on Wednesday, Oct 10, 2018.

00:00 KENNEDY: Hey everyone, Michael here. I want to take a quick moment for an editorial comment before we get to this week's show. On the last episode, we covered a story from Bloomberg about China implanting hardware hacking devices into motherboards for servers. Since that article came out, there's been a lot of pushback from the organizations named: Amazon, Apple, and others. And there's been a few articles that raised some doubts about the veracity of the original Bloomberg article. I linked to an article in Forbes called Doubts Swirl About Bloomberg's China Chip Hack Report. This doesn't mean the original article was false or implausible, but it may be. And because of that, I felt like I should add this disclaimer and warning about the coverage we had on Episode 98. Sorry about that. Now let's get on to some Python-focused topics on this episode with Brian. Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 99, recorded October 10th, 2018. I'm Michael Kennedy.

00:59 OKKEN: I'm Brian Okken.

01:00 KENNEDY: Brian, we're coming up really close on Episode 100.

01:02 OKKEN: Yeah, one more, this is 99, wow.

01:04 KENNEDY: Yeah, we're going to have to do something cool for that one. But for now, I think it is super cool that DigitalOcean is sponsoring the show, not just for today but for the rest of the year.

01:12 OKKEN: That is very cool.

01:13 KENNEDY: Yeah, thank you, DigitalOcean. Check them out at pythonbytes.fm/digitalocean. Get $100 credit for new users. I think I had said this before as a joke to you, and you didn't necessarily agree with it, but your first item here may belial some agreement that if you have a problem and you solve it with a regular expression, you now have two problems.

01:34 OKKEN: Yes, yeah, well, definitely. Within code, at least you have two things to support.

01:38 KENNEDY: Yeah, that's right. But you've run across some library that's actually really awesome for simple, what you might think of as regex problems.

01:44 OKKEN: I got this from a tweet from Kenneth Reitz. He said, oh, yeah, everybody, by the way, parse is a thing, or something like that. Parse is a library that, the tagline is it's the opposite of format. So in the general sense of it, there's a bunch of things you can do. You can parse strings. You can search inside strings. You can find all the element patterns or whatever from a string. But you give it both the string that you're searching and then also a, instead of a regular expression for what to search for, basically the same string but with parts of it replaced with the curly braces or something like that to say, if I were to have printed it with format, using this string, I would've got this output. And reverse that out, and then you can get the results out to see all the stuff.

02:37 KENNEDY: That's awesome. So you could say, this is episode {} of Python Bytes. And then you could actually parse it, and that little curly, curly would say, give you the number. Well, I guess the first example would give you a string, and you could put a :d, and it would actually give you an integer, 99.

02:52 OKKEN: Yeah, definitely.

02:54 KENNEDY: That is cool.

02:55 OKKEN: It has some cool things, too, 'cause if you were going to do that and pass in elements of a dictionary, you can have this thing return basically things that look like dictionaries, with named elements and both positional elements and named elements. And it's pretty neat. And I was playing with it, like the for each or the findall. So you can put that in a loop to say, for all the elements in, and I gave it a big file, finding a whole bunch of colon, or a CSV file or something and pulling out elements. And it works really good. And the thing I like about it is it's more readable than a regular expression.

03:32 KENNEDY: Yeah, for sure.

03:33 OKKEN: So if you've got something simple like that that you've got multiple people that have to be able to support it, I think this is a good choice.

03:40 KENNEDY: Yeah, I love it. It's a really cool example. And you can tell that's it's probably written by somebody who understands regex well under the covers, but you don't have to think about it, 'cause it has a compiled mode and things like that, which regex often do.

03:51 OKKEN: Yeah, and you can pass in a pattern, apparently. But if you're going to figure out patterns, then why not just use regex? So, yeah.

04:00 KENNEDY: Yeah, quite cool.

04:01 OKKEN: Anyway.

04:02 KENNEDY: I like it. I'm going to see if I can use it next time I need something like this. So this next one I want to talk about has to do with GUIs. Can you believe it?

04:10 OKKEN: Yeah, we've covered that a few times.

04:11 KENNEDY: I think we have. So this one is called the fman Build System, and it comes from the project which is like a dual-pane file explorer for Windows and Mac and so on from Michael Herrmann. So it's a pretty cool project. But I'm not interested so much in covering the desktop app that he built, per se, but the tool to build it, so the fman Build System, right? So what it lets you do is it lets you create GUI apps for Windows, Mac, and Linux, as in here's my .app file or my .exe. Go click it. In fact, it gives you an installer, right, and like a proper installer for Windows, one of those drag here. It has the applications folder for a .app file disk image in Mac OS. This is nice.

04:57 OKKEN: Wow, that's kind of like one of the missing pieces that we've had for packaging and sending out things.

05:03 KENNEDY: It really is, right? It's quite cool. So like I said, Windows, Mac, and Linux. Works really well. It's what he uses for his projects. It's open source, so you can use it for free on opensource projects. It's licensed under the GPL for commercial stuff, so you can basically buy a license for it. Now, if you're using Qt, you also have to buy a license for Qt. And that's kind of a complicated story. Looking to figure out a little more about that. Honestly I don't really know the full story there. It's sort of, I got this commercial side and this opensource side. But at least if you're doing opensource stuff, I think it's a really cool option.

05:40 OKKEN: Yeah, I like that. Even the idea of being able to, matching the model, similar model is what Qt's model is, is a decent idea.

05:49 KENNEDY: Yeah, and if you're, the point is to package Qt apps, it's almost probably unavoidable.

05:54 OKKEN: Yeah, yeah, definitely. And I also like, I got to give a shout out to Michael Herrmann, that it's not trivial to say, take a piece of your project and pull it off, as he did with the build system so that it can be usable for other people. That's pretty cool.

06:12 KENNEDY: Yeah, absolutely. So JSON, not a whole lot of validation there, is there?

06:16 OKKEN: Well, I think there's a lot of ways to validate JSON. But I don't know if everybody does. I don't think I've, in all the times where I've used JSON to talk with different parts of an application, I usually just kind of assume it's all working right.

06:32 KENNEDY: Yeah, for sure.

06:33 OKKEN: But there are validators out there. And this one, one I want to cover has been recommended by a few people. The documentation's a little light, so I think, it's called fastjsonschema.

06:45 KENNEDY: But the name is descriptive.

06:47 OKKEN: Yeah, definitely. And I'm not sure what the, so one of the articles that I'm going to link to, it's got four different libraries, including fastjsonschema. And I'm not sure what they were validating. It's way faster than everything else. So jsonschema takes 5 seconds. jsonspec takes 7. And then his was the fastjsonschema, 250 milliseconds. And I'm not sure how big of a data set this is to have anything take 5 seconds or 7 seconds, but.

07:17 KENNEDY: Yeah, it must be the same size, one would hope, you know?

07:20 OKKEN: Yeah, it's a compiling scheme. So the scheme is, it's a pretty simple interface. I think. Like I said, the documentation's a little slim. But you describe a schema in terms of what the types of each element is supposed to be like. And I think there's some optional keywords and stuff like that you can throw in there. And then you compile it. You compile that into your own validator. So this is, like we were talking about with regex, you compile it so that it runs faster. And that's what this does, is just comes up, you compile your own validator. And then you can use that to validate any strings that you want.

08:02 KENNEDY: Yeah, it's cool, yeah. So JSON Schema is a separate specification, and you can even learn about it at json-schema.org, that allows you to create a secondary JSON file that is the type system for the original data exchange, right. So if I have an address, I could say, okay, my schema is this is an object. It has properties like post office box and extended address. Those are strings, and so on. You can even have dependencies and stuff, like so the post office box must be a valid street address, which is defined elsewhere. So this is pretty cool. You take those files, you feed it to this validator, and it'll take anything you get back from, say, a web service or something, and say, yeah, this is valid or not.

08:47 OKKEN: And this project's been around for a while, but the big news lately is that there's multiple drafts of this JSON Schema. And the tool we're talking about covers drafts 4, 6, and 7.

08:58 KENNEDY: Right, which is pretty nice. Cool, yeah, very cool. This is separate to other ones, and they're apparently kind of stale, didn't cover the latest drafts and things. So nice find, and it's way, way faster. Now, before we get to the next one, I want to tell you about a cool feature at DigitalOcean. So at DigitalOcean, you can of course login and say, create me, what they call a Droplet in your virtual machine or various other things, load balancers, firewalls, and so on. And it'll spin up your machine, and off you go. And you get some choices, like various versions of Ubuntu and other stuff. But what you can do if you like is you can create your own local virtual machine, whatever you want, some kind of Linux, as long as you can install a few dependencies that it needs to interact with the DigitalOcean infrastructure, and upload that. And from then on, you can just click a button and say, create me my super special private server, as many as you want. With their API or whatever.

09:48 OKKEN: Very cool, definitely.

09:50 KENNEDY: Yeah, pretty cool. So if you want more control over how your virtual machines get created and what they even look like, check them out at pythonbytes.fm/digitalocean. New users get $100 credit. And they've got a bunch of cool stuff that you can do with all their infrastructure. Speaking of infrastructure, a lot of people use IPython these days in the whole data science base, right?

10:10 OKKEN: Yeah, very big.

10:12 KENNEDY: And people might be tired of me going on and on about async. I know people are not a fan, but it's just so powerful. And when it's used at the right time, very, very nice. But until recently, IPython was a thing that you put Python code into, and async was a thing that you did in files, you know, applications that executed Python code. But they didn't really go together.

10:35 OKKEN: Yeah, I'm still trying to get my head around them going together. But yeah.

10:40 KENNEDY: So here's the thing. If I have an async library that I want to use, basically the only way to use it in IPython previously, I believe, was to spin up all the infrastructure to sort of host the async loop yourself, which is like five or six lines to just call the function. So now in IPython, you can just say, await, give it a function, and it just runs it automatically.

11:02 OKKEN: Oh, okay. That's cool.

11:04 KENNEDY: So that's, yeah, so IPython 7 is out. And one of the big features that it has is the interactive shell now supports async and await, which is really cool.

11:14 OKKEN: Yeah, that's very neat.

11:15 KENNEDY: Yeah, so this one came to us from Nick Spirit. Thank you, Nick, for sending it in. And this is written by Matthias Bussonnier. And he is the guy who originally cloned the term legacy Python for the world, as far as I can tell.

11:29 OKKEN: Yeah, I think you're right.

11:31 KENNEDY: Yeah, he wrote a cool article called Planning an Early Death for Python 2 or something friendly like that and talked about referring to it as legacy Python, which, I think it's great. So he also wrote this, and he works on IPython and whatnot quite a bit. Talks about how, when IPython dropped support for Python 2, how were they able to sort of make these features possible, right? If you want to support these types of things, it was much harder to do so if you want to use a Python 3.5 feature but you also want to support Python 2. So it's cool how they talk a lot about that.

12:05 OKKEN: Yeah, and I also think the opensource community is a little, is sort of changing. We have this idea kind of from, I think, other commercial applications that you should support as many platforms as possible, or your library should support as many versions of Python as possible.

12:21 KENNEDY: Right, if it can support 2.1, that'd be awesome.

12:24 OKKEN: Yeah. But at the same time, there's the reality that if you only support the more recent versions, you can clean up your code and have it be an easier code base for other developers to work in and increase your opensource contributions. But I think that's a very real thing. And I think that's one of, like you said, it's one of the things that they probably addressed with IPython, so.

12:48 KENNEDY: Yeah, definitely. It sounds like it here, where they talked about doing the same thing for Django, was we were able to delete a bunch of code. The easiest way to maintain code is to not have it. So yeah, it's a great point. And here's another example.

13:01 OKKEN: A lot of people are using IPython to teach Python. And there's a debate as to whether or not that's a good thing. But at least now they'll be able to teach async, as well.

13:13 KENNEDY: Yeah, it's a good point, yeah. There's a lot of presentations and stuff done there. And now it's nice and easy to call it. Super cool. All right, what's the next one you got for us?

13:19 OKKEN: I have a library called Molten. And Molten is...

13:22 KENNEDY: It's a first study in volcanoes, or what is this?

13:24 OKKEN: It's an API framework. So the link we're going to include has a little video demo on it. But it's like a REST API framework, used similar to API Star. And in fact, kind of the motivation, there's a motivation page that talks about how API Star is kind of awesome, but there's some of the implementation, like a hook model for middleware that this author didn't quite like. So they took inspiration from API Star and Rocket, which is a REST framework, and tried to make this one. And it's Python 3 only, because they're leveraging type hints and type annotation.

14:08 KENNEDY: Yeah, in a really nice way.

14:10 OKKEN: It's really clean-looking. You can implement an API fairly quickly. But there's also built-in, speaking of schema validation, there is schema validation built into this system so that you know the code that you're writing to deal with requests coming in, they're already going to be valid before it even hits you. So you won't be hit with invalid data, which is pretty cool.

14:37 KENNEDY: Yeah, there's a lot of cool validation. So for example, the hello world type method for a web view method is just like, def hello(name: str, age: int) -> str. And it actually grabs the value, say out of the URL or somewhere, puts it in there, converts it to an integer, and passes it. And you don't have to figure out, how do I go get that from the route match data or other weird data sources like that. So that's really cool. And then they take it farther. They so, okay, well, you could have a string and an integer in the function, or if you've got something more interesting, you could define an actual class that has a little decorator. It has a schema, it has an ID that's an optional int, that it has a description, it has a status that can be certain values and has default values, all sorts of cool stuff. And then you can just say, this web method takes this, like they have a todo example, as a class. It takes a todo item, and it automatically pulls it out, does the validation and checking, and yeah, I'm loving this. This is great.

15:39 OKKEN: Yeah, and also you can define the schema on the output, as well, to make sure that you're complying, I think. It's kind of neat. And the other, there's a couple other neat features of it, or at least features of it, whether or not you like it. The middleware is a functional programming-based middleware. And a lot of the different pieces, like if you want to have a database management, they're all set up to allow them to be isolated easily, so using dependency injection. It's a thing, and it allows you to sort of test your different components by themselves and swap out new ones, so it's fun.

16:14 KENNEDY: Yeah, it looks really cool. Well done on that project, you guys. But I think, it looks like something if I was building an API, I might be pretty excited about using it. So I want to round out this episode with something a little fluffy but nice to just remind everybody why we're here and why we use the tools and the technology that we do. So this last one is called A Python Love Letter.

16:38 OKKEN: Well, I love Python.

16:39 KENNEDY: Yeah, I love it too. So this was actually a thing posted to a Reddit thread by a guy who's pretty new to Python. And he posted it, said, Dear Python, where have you been all my life, right? And the thing, the thing he posted was pretty interesting. But also the comments, right. There are many, many comments, and just all the people either agreeing or disagreeing or whatever. So the guy says, look, I'm not a developer. But I've been tinkering with programming for BASIC and Perl and whatnot. And for some reason, he decided he's done with shell scripts. So we've heard that before, right? He's going to go write some Python. And he said, look, I went and I learned Python. And no, I didn't go from zero to production in a day. But if my coworkers'll leave me alone, I might be at production tomorrow. Which is, you know, I think that just kind of sums up a lot of what happens in the Python space.

17:29 OKKEN: Yeah, that's neat. Just kind of a fun story.

17:31 KENNEDY: Yeah, it's definitely a fun story. A couple of the comments that came up that I thought were interesting were, one person said, welcome to the club. I came up on C++. My job highly trained me in C and assembly. And every project I touch, can't we do 90% of this, 95% of this in Python? And we do. Right, we don't need inline assembly most of the time. Another person said, I've a chip on my shoulder. I want to do things the hard way and understand them. So I went C++, 'cause that's real programming, dang it. But later, after suffering a lot, I kind of learned that doing things smarter was way better than doing the hard way and whatnot. So he sort of found his way to Python. I guess one other person said, I felt exactly the same way, decided to learn it. What a breath of fresh air. Sadly, there are few things in my life that make me feel like this. Python and Bitcoin give me the same level of enjoyment. I've used Java, Ruby, Scala, Objective C, C++, et cetera. And nothing feels as good as Python does. So definitely, definitely cool. And then this person, this is what was notable to me, Brian, closed out his comment, is, hell, my next two planned tattoos are Bitcoin and Python logos on my wrists. Way to go. That's some commitment.

18:45 OKKEN: The Python, fine. But you're probably going to regret the Bitcoin one.

18:49 KENNEDY: Is there an abstract cryptocurrency that's going to encapsulate whatever comes next? I agree, I agree. Anyway, I thought that was fun, and it just reminds us what a great community and ecosystem.

19:00 OKKEN: Yeah, definitely. I also just wanted to say, assembly code? Real men program in bits.

19:07 KENNEDY: That's right, 0, 1, 1, 0, baby.

19:12 OKKEN: Anyway.

19:13 KENNEDY: All right, so that's it for our official items. But I see you have one little extra one here that will also bring fun, excitement, and joy to any presentation, if you're just sitting down with a coworker, or even at a meetup, right?

19:25 OKKEN: Yeah, and Oliver Beswalter got me excited about this, and it's Power Mode. And I'm linking to something called Power Mode 2, which is a plugin for PyCharm. But there's Power Modes in a couple different, it started in Atom, I think, and people have probably done it other places. It just makes programming more fun.

19:47 KENNEDY: This is funny. You introduced this to me. So let me just sort of give people a little description. So imagine as you start typing, it's kind of like, a bit like a comic book. The faster you type, the more excited your editor gets. If you copy and duplicate a method, like a big, bam, pow thing'll pop out. Sparks shoot off of your cursor. The faster you type, the more intense it gets. Yeah, it's super productive and awesome.

20:16 OKKEN: I've left it, I've turned off the shaking screen, which is a little unsettling to me, and the flames. But the rest of it, the sparks flying and everything like that, I've been using it for like a week or a week and a half.

20:29 KENNEDY: You leave it on all the time? That's awesome.

20:34 OKKEN: Because I really like it when I copy and paste and it goes, bam!

20:37 KENNEDY: I know, bam, pow.

20:38 OKKEN: Yeah, so it's nice.

20:39 KENNEDY: That's cool. Yeah, Power Mode, if you're using PyCharm, it's definitely fun to check out. And I got the link in the show notes.

20:44 OKKEN: Cool.

20:45 KENNEDY: All right, well, that's a fun one to close it out, for sure. All right, thanks, Brian. And shall we?

20:49 OKKEN: Yeah, thank you.

20:50 KENNEDY: Yeah, chat with you next time. Bye, everyone.

20:52 OKKEN: Okay, bye.

20:53 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit PythonBytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page