Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #99: parse - the regex antidote in Python

Return to episode page view on github
Recorded on Wednesday, Oct 10, 2018.

00:00 Hey everyone, Michael here. I want to take a quick moment for an editorial comment before we get to this week's show. On the last episode, we covered a story from Bloomberg about China implanting hardware hacking devices into motherboards for servers. Since that article came out, there's been a lot of pushback from the organizations named Amazon, Apple, and others.

00:19 And there's been a few articles that raised some doubts about the veracity of the original Bloomberg I'm linking to an article in Forbes called "Doubts Swirl About Bloomberg's China Tip Hack Report." This doesn't mean the original article is false or implausible, but it may be.

00:36 And because of that, I felt like I should add this disclaimer and warning about the coverage we had on episode 98.

00:42 Sorry about that.

00:43 Now let's get on to some Python-focused topics on this episode with Brian.

00:48 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:53 This is episode 99, recorded October 10th, 2018.

00:58 I'm Michael Kennedy.

00:59 - I'm Brian Okken.

01:00 - Brian, we're coming up really close on episode 100.

01:02 - Yeah, one more, this is 99, wow.

01:04 - Yeah, we're gonna have to do something cool for that one.

01:06 But for now, I think it is super cool that DigitalOcean is sponsoring the show, not just for today, but for the rest of the year.

01:12 - That is very cool.

01:13 - Yeah, thank you, DigitalOcean.

01:14 Check them out at pythonbytes.fm/digitalocean.

01:17 Get $100 credit for new users.

01:20 I think I had said this before as a joke to you, And you didn't necessarily agree with it.

01:24 But your first item here may belial some agreement, that if you have a problem and you solve it with a regular expression, you now have two problems.

01:33 - Yes, yeah, well, definitely.

01:36 We think code, at least you have two things to support.

01:38 - Yeah, that's right.

01:39 But you've run across some library that's actually really awesome for like simple, what you might think of as Redge X problems.

01:44 - I got this from a tweet from Kenneth Reitz.

01:47 He's like said, "Oh yeah, everybody, by the way, parse is a thing or something like that.

01:53 Parse is a library that the tagline is, it's the opposite of format.

01:59 In the general sense of it, there's a bunch of things you can do.

02:03 You can parse strings, you can search inside strings, you can find all the element patterns or whatever from a string.

02:10 But you give it both the string that you're searching, and then also instead of a regular expression for what to search for, basically the same string but with parts of it replaced with the curly braces or something like that to say if I were to have printed it with format using this string I would have gotten this output and reverse that out and then you can get the results out to see all the stuff.

02:37 That's awesome. So you could say like this is episode curly curly of Python bytes and then you could actually parse it and that little curly curly would say give you the number, well I guess the first example would give you a string and you could put a colon d and it would actually give you an integer 99.

02:53 Yeah, definitely. That is cool. It has some like cool things too because if you were going to do that and pass in elements of a dictionary, you can have this thing return basically things that look like dictionaries with named elements and both positional elements and named elements.

03:09 And it's pretty neat. And I was playing with it like the for each or the find all.

03:14 So you can you put that in a loop to say like, for all the elements in and I gave it a big file, finding a whole bunch of colon or a CSV file or something and pulling out elements and it works really good.

03:28 And the thing I like about it is it's more readable than a regular expression.

03:32 So yeah, for sure, if you've got something, something simple like that, that you've got multiple people that have to be able to support it.

03:38 I think this is a good choice.

03:40 >> Yeah, I love it.

03:40 It's a really cool example.

03:42 And you can tell that it's probably written by somebody who understands Regex well under the covers.

03:47 But you don't have to think about it, because it has a compiled mode and things like that, which Regex often do.

03:52 >> Yeah, and you can pass in a pattern apparently, but if you're going to figure out patterns, then why not just use Regex?

03:59 So, yeah.

04:00 >> Yeah, quite cool.

04:01 >> Anyway. >> I like it.

04:02 I'm going to see if I can use it next time I need something like this.

04:05 So this next one I want to talk about has to do with GUIs.

04:08 Can you believe it?

04:09 - Yeah, we've covered that a few times.

04:11 - I think we have.

04:12 So this one is called the FMAN Build System and it comes from the project which is like a dual pane file explorer for Windows and Mac and so on from Michael Herman.

04:24 So it's a pretty cool project but I'm not interested so much in covering the desktop app that he built per se but the tool to build it.

04:32 So the FMAN Build System, right?

04:34 So what it lets you do is it lets you create GUI apps for Windows, Mac, and Linux.

04:39 As in, here is my .app file or my .exe.

04:42 Go click it. In fact, it gives you an installer.

04:44 Right? And it like, a proper installer for Windows.

04:48 One of those, you know, drag here and has like the applications folder for a .app file, disk image and macOS.

04:55 This is nice.

04:57 Wow, that's kind of like one of the missing pieces that we've had for packaging and sending out things.

05:03 It really is, right? It's quite cool. So, Like I said, Windows, Mac, and Linux works really well.

05:09 It's what he uses for his project.

05:11 It's open source, so you can use it for free on open source projects.

05:16 It's licensed under the GPL for commercial stuff, so you can basically buy a license for it.

05:22 Now, if you're using Qt, you also have to buy a license for Qt, and that's kind of a complicated story.

05:29 Looking to figure out a little bit more about that.

05:31 Honestly, I don't really know the full story there.

05:34 It's sort of I got this commercial side and this open source side, but at least if you're doing open source stuff, I think it's a really cool option.

05:40 >> Yeah, I like that. Even the idea of being able to, matching the model, similar model is what Qt's model is a decent idea.

05:49 >> Yeah, and if you're at the point as to package Qt apps, it's almost probably unavoidable.

05:54 >> Yeah, definitely. I also like, I got to give a shout out to Michael Herman.

05:59 It's not trivial to say, take a piece of your project and pull it off as he did with the build system so that it can be usable for other people.

06:10 That's pretty cool.

06:11 >> Yeah, absolutely. So, JSON, not a whole lot of validation there, is there?

06:15 >> Well, I think there's a lot of ways to validate JSON, but I don't know if everybody does.

06:22 I don't think I've ever in all the times where I've used Jason to talk with different parts of the application, I usually just kind of assume it's all working, right?

06:32 Yeah, for sure.

06:33 But there are validators out there and this one, the one I want to cover is, has been recommended by a few people.

06:39 It's, the documentation is a little light. So I think it's called fast Jason schema.

06:45 But the name is descriptive.

06:47 Yeah, definitely.

06:48 And I'm not sure what the, so one of the articles I'm going to link to It's got four different libraries, including FastJSON Schema, and I'm not sure what they were validating.

06:59 It's way faster than everything else.

07:02 So JSON Schema takes five seconds, JSON Spec takes seven, and then his was the FastJSON Schema, 250 milliseconds.

07:11 I'm not sure how big of a dataset this is to have anything take five seconds or seven seconds.

07:17 >> Yeah. It must be the same size one would hope.

07:19 >> Yeah. It's a compiling scheme.

07:21 So the kind of the scheme is, it's a pretty simple interface.

07:26 I think, like I said, the documentation is a little slim, but you describe a schema in terms of what the types of each element is supposed to be like.

07:37 And I think there's some optional keywords and stuff like that you can throw in there.

07:41 And then you compile it, you compile that into your own validator.

07:46 So this is like we were talking about with Regex, You compile it so that it runs faster.

07:52 And that's what this does.

07:52 It just comes up, you compile your own validator, and then you can use that to validate any strings that you want.

08:01 - Yeah, it's cool.

08:02 Yeah, so JSON schema is a separate specification, and you can even learn about it at json-schema.org that allows you to create a secondary JSON file that is the type system for the original data exchange.

08:18 Right, so if I have like an address, I could say, okay, here my schema is, this is an object that has properties like post office box and extended address, those are strings and so on.

08:29 You can even have like dependencies and stuff like so, the post office box must be a valid street address, which is defined, you know, elsewhere.

08:37 So this is pretty cool.

08:38 You take those files, you feed it to this validator and it'll take anything you get back from say, a web service or something and say, yeah, this is valid or not.

08:47 - And this project's been around for a while, but the big news lately is that there's multiple drafts of this JSON schema, and the tool we're talking about covers drafts four, six, and seven.

08:59 - Right, which is pretty nice.

09:00 Yeah, very cool, so there was other ones, and they were apparently kind of stale, didn't cover the latest drafts and things.

09:05 So, nice find, and it's way, way faster.

09:08 Now, before we get to the next one, I want to tell you about a cool feature at DigitalOcean.

09:12 So at DigitalOcean, you can, of course, log in and say, create me what they call a droplet, virtual machine or various other things, load balancers, firewalls, and so on.

09:21 And it'll spin up your machine and off you go. And you get some choices like various versions of Ubuntu and other stuff. But what you can do, if you'd like, is you can create your own local virtual machine of whatever you want, some kind of Linux, as long as you can install a few dependencies that it needs to interact with the DigitalOcean infrastructure and upload that. And from then on, you can just click a button and say, "Create me my super special private server as many as you want with their API or whatever.

09:48 Very cool.

09:49 Definitely.

09:50 Yeah, pretty cool. So if you want more control over how your virtual machines get created and what they even look like, check them out at pythonbytes.fm/digitalocean.

09:59 New users get $100 credit, and they've got a bunch of cool stuff that you can do with all their infrastructure.

10:05 Speaking of infrastructure, a lot of people use IPython these days in the whole data science space, right?

10:10 Yeah, very big.

10:11 And people might be tired of me going on and on about async.

10:15 I know some people are not a fan, but it is just so powerful.

10:19 And when it's used at the right time, very, very nice.

10:22 But until recently, IPython was a thing that you put Python code into.

10:27 And async was a thing that you did in files, you know, applications that executed Python code.

10:33 But they didn't really go together.

10:35 Yeah, I'm still trying to get my head around them going together.

10:38 But yeah.

10:39 So here's the thing, if I have an async library that I want to use, basically, the only way to use it in IPython previously, I believe, was to spin up all the infrastructure to sort of host the async loop yourself, which is like five or six lines to just call the function.

10:56 So now, in IPython, you can just say, "Oh, wait." Give it a function, and it just runs it automatically.

11:01 Oh, okay. That's cool.

11:03 So, yeah, IPython 7 is out, And one of the big features that it has is the interactive shell now supports async and await, which is really cool.

11:13 Yeah, that's very neat.

11:15 Yeah, so this one came to us from Nick Spirit. Thank you, Nick, for sending it in.

11:19 And this is written by Mathias Boussigny. And he is the guy who originally cloned the term legacy Python for the world, as far as I can tell.

11:28 Yeah, I think you're right.

11:31 Yeah, he wrote a cool article called "Planning an Early Death for Python 2" or something, and friendly like that, and talked about referring to it as legacy Python, which I think is great.

11:41 So he also wrote this, and he works on IPython and whatnot quite a bit, talks about how when IPython dropped support for Python 2, how are they able to sort of make these features possible?

11:53 If you want to support these types of things, it was much harder to do so if you want to use a Python 3.5 feature, but you also want to support Python 2.

12:02 So it's cool how they talk a lot about that.

12:05 And I also think the open source community is a little, is sort of changing.

12:09 We, we had this idea that kind of from, from I think other commercial applications that you should support as many platforms as possible, or like your library should support as many versions of Python as possible.

12:21 Right.

12:22 If it gets a port two, one, that'd be awesome.

12:23 But, but at the same time, there's a reality that if you only support the more recent versions, you can clean up your code and have it be an easier code base for other developers to work in and increase your open source contributions. And I think that's a very real thing. And I think that's one of like, like you said, it's one of the things that they probably addressed with Python.

12:48 Yeah, definitely. It sounds like it here, when they talked about doing the same thing for Django was we were able to delete a bunch of code. And the easiest way to maintain code is to not have it. Yeah. So yeah, it's a great point. And Here's another example.

13:01 A lot of people are using iPython to teach Python.

13:04 And whether or not that there's a debate as to whether or not that's a good thing.

13:08 But at least now they'll be able to teach async.

13:12 Yeah, that's a good point.

13:13 Yeah, it's there's a lot of presentations and stuff done there.

13:15 And now it's nice and easy to call it.

13:17 Super cool.

13:17 Yeah.

13:18 All right.

13:18 What's the next one you got for us?

13:19 I have a library called Molten.

13:21 And Molten is...

13:22 Is it for studying volcanoes or what is this?

13:24 It's an API framework.

13:26 So the link we're going to include has a little video demo on it.

13:30 But it's like a rest API framework used similar to like API star.

13:37 And in fact, the kind of the motivation, there's a motivation page that talks about how API star is kind of awesome. But there's some of the implementation like a hook model for middleware that this author didn't quite like.

13:53 So they took inspiration from API star and rocket, which is a rust framework, and tried to make this one and it's a Python three only because they're they're leveraging type hints and type annotation.

14:08 Yeah, in a really nice way.

14:10 It's really clean looking. There's a you can implement a an API fairly quickly. But there's also built in speaking of schema validation.

14:20 There is schema validation built into the system so that you know the code that you're writing to deal with requests or the coming in, they're already going to be valid before it even hits you.

14:33 So you won't be hit with invalid data, which is pretty cool.

14:37 Yeah, there's a lot of cool validation.

14:39 So for example, the hello world type method for a web view method is just like def hello and then name colon str age colon int.

14:48 and it actually, you know, grabs the value, say, out of the URL or somewhere, puts it in there, converts it to an integer, and passes it, and you don't have to figure out, you know, how do I go get that from the route match data or other weird data sources like that, so that's really cool.

15:02 And then they take it farther, you say, okay, well, you could have like a string and an integer in the function, or if you've got something more interesting, you could define an actual class that has a little decorator, so it's a schema, like it has an ID that's an optional int, that it has a description that has a status that can be, you know, certain values and has default values, all sorts of cool stuff.

15:25 And then you just say, this web method takes this, like, we have a to do example as a class, right?

15:32 It takes a to do item and it automatically pulls that out, does the validation and check in. Yeah, I'm loving this. This is great.

15:38 Yeah, and also you can define the schema on the output as well to make sure that you're complying.

15:43 I think it's kind of neat.

15:45 And the other, there's a couple other neat features of it.

15:47 or at least features of it, whether or not you like it.

15:51 The middleware is a functional programming-based middleware.

15:55 And a lot of the different pieces, like if you want to have a database management, they're all set up to allow them to be isolated easily.

16:05 So using dependency injection, it's a thing, and it allows you to sort of test your different components by themselves or swap out new ones.

16:13 So it's fun.

16:14 - Yeah, it looks really cool.

16:15 I'm well done on that project, you guys.

16:17 I think it looks like something if I was building an API, I might be pretty excited about using.

16:23 So I wanna round out this episode with something a little fluffy, but nice to just remind everybody why we're here and why we use the tools and the technology that we do.

16:34 So this last one is called a Python love letter.

16:38 - Well, I love Python.

16:40 - Yeah, I love it too.

16:41 So this was actually a thing posted to a Reddit thread by a guy who's pretty new to Python.

16:46 And he posted it, it said, "Dear Python, "where have you been all my life?" Right?

16:50 And the thing he posted was pretty interesting, but also the comments, right?

16:56 There are many, many comments, and just all the people either agreeing or disagreeing or whatever.

17:00 So the guy says, "Look, I'm not a developer, "but I've been teeing with programming "for Basic and Perl and whatnot." And for some reason he decided, you know, he's done with shell scripts, so we've heard that before, right?

17:11 He's gonna go write some Python.

17:14 and said, "Look, I went and I learned Python, "and no, I didn't go from zero to production in a day, "but if my coworkers will leave me alone, "I might be in production tomorrow." Which is, you know, I think that's just, like, kind of sums up a lot of what happens in the Python space. - Yeah, that's neat.

17:30 Just kind of a fun story.

17:31 - Yeah, it's definitely a fun story.

17:32 A couple of the comments that came up that I thought were interesting were, one person said, "Welcome to the club.

17:38 "I came up on C++.

17:40 "My job highly trained me in C and assembly, And every project I touch, can't we do 90% of this, 95% of this in Python?

17:48 And we do, right?

17:49 We don't need inline assembly most of the time.

17:53 Another person said, I have a chip on my shoulder.

17:55 I want to do things the hard way and understand them.

17:57 So I went C++, 'cause that's real programming, dang it.

18:00 But later, after suffering a lot, I kind of learned that doing things smarter was way better than doing the hard way and whatnot.

18:08 So he loves, sort of found his way to Python.

18:12 I guess one other person said, I felt exactly the same way, decided to learn it.

18:16 What a breath of fresh air.

18:17 Sadly, there are a few things in my life that make me feel like this.

18:19 Python and Bitcoin give me the same levels of enjoyment.

18:22 I've used Java, Groovy, Scala, Objective-C, C, C++, et cetera, and nothing feels as good as Python does.

18:30 So definitely, definitely cool.

18:32 And then this person, this is what was notable to me, Brian, closed out his comment, is, hell, my next two planned tattoos are Bitcoin and Python logos on my wrist.

18:42 Way to go.

18:43 That's some commitment.

18:44 The Python, fine, but you're probably going to regret the Bitcoin one.

18:48 Is there an abstract cryptocurrency that is going to encapsulate whatever comes next?

18:54 I agree. I agree.

18:55 Anyway, I thought that was fun and it just reminds us what a great community and ecosystem.

19:00 Yeah, definitely. I also just wanted to say, assembly code, real man, program in bits.

19:06 That's right. 0110, baby.

19:09 So, anyway.

19:12 Alright, so that's for our official items.

19:14 But I see you have one little extra one here that will also bring fun, excitement, and joy to any presentation if you're just sitting down with a co-worker or even at like a meetup, right?

19:25 Yeah, and it's, Oliver Bestwalter got me excited about this and it's Power Mode.

19:32 And I'm linking to something called Power Mode 2, which is a plugin for PyCharm, but there's power modes in a couple different, it started in Atom, I think, and people have probably done it other places.

19:45 It just makes programming more fun.

19:47 - This is funny.

19:48 You introduced this to me, so let me just sort of give people a little description.

19:53 So imagine as you start typing, it's kind of like a bit like a comic book.

19:59 The faster you type, the more excited your editor gets.

20:02 If you copy and like duplicate a method, like a big bam pow thing will pop out, sparks shoot off of your cursor, the faster you type, the more intense it gets.

20:11 Yeah, it's super, super productive and awesome.

20:14 (laughing)

20:16 - I've left it, I've turned off the shaking screen, which is a little unsettling to me, and the flames, but the rest of it, the sparks flying and everything like that, I've been using it for like a week or a week and a half.

20:28 - You leave it on all the time?

20:30 - Yeah.

20:31 (laughing)

20:32 - That's awesome.

20:34 because I really like it when I copy and paste and it goes BAM!

20:37 I know. BAM! POW!

20:38 That's nice.

20:39 That's cool.

20:40 Yeah, Power Mode, if you're using PyJarm, is definitely fun to check out.

20:42 It's got the link in the show notes. Cool.

20:44 Okay.

20:44 All right. Well, that's a fun one to close it out for sure.

20:47 All right. Thanks, Brian.

20:48 And chat with you.

20:49 Yeah, thank you.

20:49 Yeah, chat with you next time.

20:51 Bye, everyone.

20:51 Okay. Bye.

20:52 Thank you for listening to Python Bytes.

20:54 Follow the show on Twitter via @pythonbytes.

20:57 That's Python Bytes as in B-Y-T-E-S.

21:00 And get the full show notes at pythonbytes.fm.

21:03 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

21:07 We're always on the lookout for sharing something cool.

21:11 On behalf of myself and Brian Auchin, this is Michael Kennedy.

21:14 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page