

Transcript #107: Restructuring and searching data, the Python way

Recorded on Wednesday, Dec 5, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 107, recorded December 5th, 2018. I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And this episode is brought to you by DigitalOcean. Check them out at pythonbytes.fm/DigitalOcean.

00:17 Huge supporters of the show, great product, and you get $100 free credit for new users.

00:22 So check them out. I'll tell you more about them later. But Brian, how you been?

00:26 I'm doing really good.

00:28 Good. So I hear you're working on your stand-up act.

00:30 No.

00:33 No? Your stand-up comedy?

00:35 No, but I do find lots of things funny. And we've got the first topic turned into a Twitter discussion that ended in a joke.

00:45 And so I'm going to share that later in the show.

00:47 Right. But good jokes, punchlines go at the end, right?

00:50 Yeah.

00:50 So the topic I want to talk about is glom, which I'd actually heard about before.

00:57 It's a package started by Mahmoud Hashemi, who brought us ZeroVer and other great things.

01:03 It's a package to try to reshape data.

01:06 So if you've got JSON, or really any data structure that's in one type and one shape, and you need it in another shape or you need some of it pulled out, that's what glom is written for.

01:18 But it's written to be used kind of like a regular expression is.

01:22 It's a general purpose tool that you can use to translate from one thing to another.

01:27 And one of the cool things about it is that it's path-based.

01:32 You can access things with path-based access.

01:35 As an example, if you had a three-deep dictionary, you'd have to pass in...

01:43 A dictionary of dictionaries of dictionaries sort of thing.

01:45 Yeah.

01:46 Or maybe two levels and then an item.

01:47 So it's sort of a lot of brackets and colons and brackets and quotes and stuff to specify that.

01:54 So they've got a shorthand version that you can say like A.B.C or something like that instead of all the brackets.
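
To make the shorthand concrete, here is a minimal, standard-library-only sketch of dotted-path access into nested dictionaries. It is not glom's implementation: `get_path` and the sample `data` are hypothetical names used only for illustration. As described in the episode, the glom call has the shape glom(data, 'a.b.c').

```python
from functools import reduce

# Hypothetical sample data: a dictionary of dictionaries of dictionaries.
data = {"a": {"b": {"c": "d"}}}


def get_path(target, path):
    """Walk a dotted path such as 'a.b.c' through nested dictionaries."""
    return reduce(lambda current, key: current[key], path.split("."), target)


# The bracket-heavy version and the path version reach the same value.
assert data["a"]["b"]["c"] == get_path(data, "a.b.c")
```

The point of the path form is only readability: one string replaces a chain of brackets and quotes.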

02:02 It's a fairly simple interface to think about.

02:04 It's glom, and then you have the target data and a target specification.

02:08 And then you've got some other things that you can do like default.

02:13 So if like there's some data that's missing, there's a lot of Python ways to do this sort of thing.

02:18 But glom is rather complete.

02:20 It does a lot of neat things.

02:21 And one of the neat things it does is, as you're specifying the from-and-to portion of your data transformation, sometimes something might not be there.

02:30 Like if you were expecting element C in a really nested dictionary and it's not there, or that element just doesn't exist, you might get something weird in normal Python.

02:42 Like the famous "'NoneType' object is not subscriptable."

02:46 And it doesn't tell you anything about what went wrong.

02:49 So one of the things glom does is give you better error messages.

02:52 Like "could not access 'c', part 2 of the path a.b.c," which is like, oh, well, that's way more useful than "something on this line was None," basically.
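
The difference between the two kinds of error message, plus the default mentioned earlier, can be sketched with the standard library alone. This is an illustration of the idea and not glom's code: `PathError`, `get_path`, and the message wording are assumptions made up for this example.

```python
_MISSING = object()  # sentinel, so that None can still be passed as a default


class PathError(LookupError):
    """Hypothetical error type that records where the path walk failed."""


def get_path(target, path, default=_MISSING):
    """Follow a dotted path; on failure, return the default or explain where it broke."""
    current = target
    for index, key in enumerate(path.split(".")):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as exc:
            if default is not _MISSING:
                return default
            raise PathError(
                f"could not access {key!r}, part {index} of path {path!r}"
            ) from exc
    return current


# Hypothetical data where the middle of the path is None.
data = {"a": {"b": None}}

try:
    get_path(data, "a.b.c")
except PathError as err:
    message = str(err)  # names the failing key and its position in the path

# With a default, the missing piece is not an error at all.
fallback = get_path(data, "a.b.c", default="nothing here")
```

Plain `data["a"]["b"]["c"]` would fail here with the "'NoneType' object is not subscriptable" message and no hint about which step was at fault.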

03:02 Yeah, exactly.

03:03 And then there's more built in, since it's being used in production.

03:07 Mahmoud is using it at work as well.

03:09 It's got a bunch of cool things like built-in data exploration and debugging features.

03:14 So when things do go wrong, you can sort of interactively try to figure out what went wrong.

03:20 That's really cool.

03:21 I love this, the way that it works.

03:23 It seems really nice.

03:24 I feel like you could almost do a minor tweak to it to make it even cooler, where you can do straight attribute access.

03:32 So you say glom, parenthesis, data, and then the string 'a.b.c'.

03:36 It feels to me like you could extend it and say glom(data).a.b.c and have it understand that and apply it, so it doesn't look like function calls.

03:47 It looks more like attribute access once you sort of glomify an object.
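
The attribute-access idea floated here is speculation in the episode, not a described glom feature. A rough sketch of what such a wrapper could look like in plain Python follows; the class name `Dotted` is hypothetical.

```python
class Dotted:
    """Hypothetical wrapper that turns attribute access into key lookups."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so _target is unaffected.
        try:
            value = self._target[name]
        except (KeyError, TypeError) as exc:
            raise AttributeError(name) from exc
        # Wrap nested dictionaries so the chain of dots can continue.
        return Dotted(value) if isinstance(value, dict) else value


data = {"a": {"b": {"c": "d"}}}
value = Dotted(data).a.b.c
```

One trade-off of this design is that keys which are not valid Python identifiers, or which collide with real attributes, cannot be reached with dots, which is one reason a string path is the more general interface.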

03:50 Who knows?

03:50 But either way, I still think this is really nice, especially if you're working with data that comes, like you said, in nested dictionaries or things like that,

04:00 where you haven't built like some sort of object structure to pack it into with like marshmallow or something.

04:06 You're just like, I'm going to work with this dictionary and it's kind of painful.

04:08 This seems like it takes a lot of the pain away.

04:10 Yeah, I have a use case right now. We took an off-the-shelf JSON reporter for pytest that reports all the test output in JSON.

04:23 And it's nice, but it reports way more than we care about.

04:28 So we're going to use this, or something like this, to translate from what we're getting to a data structure that's easier to work with.

04:36 Yeah, that's quite cool.

04:37 Yeah.

04:37 Super nice.

04:39 So I think there's this topic I want to bring up.

04:42 Just let me know if we've covered it before.

04:45 It has to do with GUIs and Python.

04:46 So who's doing standup now?

04:51 I think you're doing the standup.

04:52 Yeah, I know.

04:53 Pretty much.

04:54 Oh my gosh.

04:54 So long ago, you and I, we started down this path on this journey of exploring, you know, what we thought were the UI frameworks, like wxPython, the Phoenix release, and Qt for Python coming along.

05:07 Those were like the big pieces of news, and they still are.

05:10 But it seems like every week, somebody's like, oh, I know you guys have talked about 26 other cool UI frameworks, but do you know about X?

05:18 Yeah.

05:18 Right?

05:19 And, you know, even the guy behind PySimpleGUI is doing all sorts of cool stuff since we started talking about it on the show.

05:27 And there's a lot of cool things happening here.

05:29 Yeah.

05:29 You picked out a really neat one.

05:31 This is a GUI thing really focused on scientific computing with Python, and it's really, really simple.

05:38 It's not for building super complicated things.

05:41 The idea is I've got some object or set of objects, and I would like to create a GUI around it.

05:49 So, you know, for example, they have like this camera concept, and the camera has a gain and an exposure and some functions and stuff like that.

05:57 Like you can take a picture based on those settings.

06:00 And what you can do, it's a little bit like SQLAlchemy in that you specify:

06:04 These are the traits of this object.

06:07 And then you use this thing called TraitsUI from Enthought, and you can upgrade that to a form, basically.

06:15 So you can say, show the camera, and it pops up a form.

06:17 It says, what is the gain?

06:18 What is the exposure?

06:19 And you can even control the widgets that go there.

06:22 So like an up, down, numerical thing, and so on.

06:25 You can tack on graphs through this Chaco thing, also from Enthought.

06:31 And it's just a really simple way to take an object, show it to the user in a GUI form, and get their values back.
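
The "declare the attributes, then get a form for free" idea can be imitated with the standard library. The sketch below is only an analogy and does not use the Traits or TraitsUI APIs: `Camera`, `form_rows`, and `apply_form` are hypothetical names, and no window is opened.

```python
from dataclasses import dataclass, fields


@dataclass
class Camera:
    # Hypothetical attributes mirroring the camera example in the discussion.
    gain: float = 1.0
    exposure: float = 0.01


def form_rows(obj):
    """Describe one form row per declared field: (label, type name, current value)."""
    return [(f.name, f.type.__name__, getattr(obj, f.name)) for f in fields(obj)]


def apply_form(obj, submitted):
    """Copy user-entered strings back onto the object, converting by declared type."""
    for f in fields(obj):
        if f.name in submitted:
            setattr(obj, f.name, f.type(submitted[f.name]))
    return obj


camera = Camera()
rows = form_rows(camera)             # what a GUI layer would render
apply_form(camera, {"gain": "2.5"})  # what the user typed into the form
```

The declared types do double duty: they tell the form layer which widget makes sense and how to convert what the user typed back into a value.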

06:37 It's pretty cool.

06:38 And so the mindset kind of is, again, a lot of the people using Python are people for whom programming isn't their main job.

06:46 So this is something where people would, they need access to, you know, like let's say a device interaction or something like this example.

06:53 But they need to be able to control it in a user interface.

06:56 It doesn't have to be beautiful, but actually this looks pretty good.

07:01 It doesn't look terrible.

07:01 And what's cool is the foundational framework.

07:04 It'll actually find its way to select wxPython or PySide, which is the Qt for Python variant, or PyQt5.

07:15 So it'll cycle through the known frameworks and basically say, well, I found wxPython, so we're using that, for example.
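
Cycling through known toolkits and using the first one that is installed can be sketched as below. This is not how the Enthought libraries implement it: the candidate module names, their order, and the `pick_toolkit` helper are assumptions for illustration.

```python
from importlib.util import find_spec

# Assumed import names for the toolkits mentioned; the order is illustrative.
CANDIDATES = ("wx", "PySide2", "PyQt5")


def pick_toolkit(candidates=CANDIDATES, fallback="tkinter"):
    """Return the first candidate module that is importable, else the fallback."""
    for name in candidates:
        if find_spec(name) is not None:
            return name
    return fallback


toolkit = pick_toolkit()
```

`find_spec` only checks whether a top-level module can be found; it does not import it, so probing for several heavy GUI toolkits stays cheap.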

07:20 Which is really cool because a lot of those frameworks are much better looking than, say, Tkinter by default.

07:26 Yeah, that's cool.

07:27 Yeah, so if you ship your little app, like you PyInstaller it with, you know, wxPython, it'll use that.

07:33 You PyInstaller it with Qt for Python, it'll do that.

07:36 That's really cool.

07:37 Now I kind of want to go out and see if I can write like an oscilloscope interface with this.

07:41 Yeah, you should do it.

07:41 But like I got other things to do.

07:43 Oh, come on, you got a few hours, don't you?

07:45 Yeah.

07:46 Awesome.

07:48 All right, well, what's next?

07:49 Another, taking data from one format and putting it in another one.

07:53 I found another tool that I figured I'd cover in the same episode because I'm comparing them at the same time.

07:59 And so this one is called Pampy, P-A-M-P-Y.

08:03 It's "the pattern matching for Python you always dreamed of."

08:07 That's their tagline.

08:08 It's a very small, focused library, and it's got a neat interface that's pretty easy to catch on to.

08:15 It's got a really interesting interface, yeah.

08:17 Yeah.

08:17 So the example that we're going to stick in the show notes is you just say from pampy import match and underscore.

08:25 So they're reusing underscore and using it as a thing.

08:30 And so you give it a pattern, like a known data structure pattern.

08:35 And then you put these blanks in the places where you expect other values.

08:41 And then you call match with any data you want.

08:44 And then this pattern.

08:45 And then it spits out as many variables as you've put underscores in if they match.

08:51 So you can just sort of go through a whole bunch of data and pull out just the bits you need as long as they match the pattern.

08:58 This is kind of similar to the one you had before.

09:00 But it's like regular expressions applied to hierarchical structure of data in like a weird way.

09:07 So let me see if I can try to visualize this for folks.

09:09 So if you have a variable that is a list, and in that list you have one, and then the next item is actually the list two, three, and then four, you can say match, you know, a list of one, then a list that contains an underscore and a three, and then an underscore.

09:25 And then when you run it through that, it'll say, well, we found a match and the values for the two underscores were two and four.

09:32 That's pretty cool.

09:33 And then the last thing you pass in is the what to do if you find a match.

09:39 And so you can pass in a function that takes that many parameters or a lambda expression or something if you want.

09:45 And it'll call your function with those parameters and do whatever.

09:51 Yeah.

09:51 And you can also just write a function that returns the value so you can capture it, which is kind of cool as well.
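
The list example and the "what to do on a match" step can be mimicked in a few lines of plain Python. This is a simplified sketch of the idea, not Pampy's implementation: `ANY` stands in for the underscore placeholder, and `captures` and this `match` are hypothetical helpers that only handle lists and literal values.

```python
ANY = object()  # stand-in for the underscore placeholder described above


def captures(value, pattern):
    """Return the values found at ANY positions, or None if there is no match."""
    if pattern is ANY:
        return [value]
    if isinstance(pattern, list):
        if not isinstance(value, list) or len(value) != len(pattern):
            return None
        found = []
        for item, sub_pattern in zip(value, pattern):
            sub = captures(item, sub_pattern)
            if sub is None:
                return None
            found.extend(sub)
        return found
    # Literal values must be equal and capture nothing.
    return [] if value == pattern else None


def match(value, pattern, action):
    """Call action with the captured values when the pattern matches."""
    found = captures(value, pattern)
    if found is None:
        raise ValueError("no match")
    return action(*found)


data = [1, [2, 3], 4]
result = match(data, [1, [ANY, 3], ANY], lambda a, b: (a, b))
```

Here the action receives two arguments because the pattern has two placeholders, and whatever the action returns becomes the result of the match.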

09:55 Yeah.

09:56 Yeah.

09:56 Very nice.

09:57 I like it.

09:58 It's one of those things that I think looks really cool and I think would be really useful, but I would forget to use it.

10:03 You know, so I guess I got to remember to use this thing next time that I have like a situation where it would be a really good fit and where, you know, it's a match for the problem I'm solving.

10:13 Nice.

10:15 But it's one of those things also I like.

10:17 I like to see more packages that are just small, sharp tools for one use case or use them for whatever.

10:23 But, I mean, I use screwdrivers for all sorts of stuff.

10:26 But, you know.

10:27 Yeah.

10:27 The little back-end part is good for beating stuff in, like nails and whatnot.

10:30 Yeah.

10:31 I think that's a great point.

10:32 All right.

10:33 Now, before we get on to the next one, which has some pretty practical applications, actually, I just want to tell you all about DigitalOcean.

10:39 So, one of the features I've been really happy with lately is their idea of projects because you go to some of these cloud providers and there's just tons of assets.

10:49 There's servers.

10:49 There's IP addresses.

10:50 There's load balancers.

10:52 They're all just spread in there and you don't know which one goes with which.

10:55 Maybe you've got a QA environment or a staging environment and a production one.

10:59 Which goes with which unless you've named it really carefully and even then it's hard.

11:02 So, at DigitalOcean, you can go create a project, like a production Python Bytes server, you know, as a project, and put the servers and the floating IP addresses and all that in there.

11:13 Same for staging and so on.

11:15 So, they've got all sorts of cool features.

11:17 If you check them out at pythonbytes.fm/DigitalOcean, you'll get $100 credit for new users.

11:23 And definitely working out well for us.

11:25 You guys should check them out.

11:26 Speaking of getting checked out, sometimes people get sick or they may be sick.

11:32 You have to go to the doctor and the doctor takes some kind of picture and says, I looked at this scan and either you're okay or you're not okay, right?

11:41 It turns out, though, that analyzing pictures for patterns is something that AI can do really well, right?

11:48 Yeah.

11:48 So, Google recently, in this article, it's so funny, it says they took this off-the-shelf AI and they pointed it at mammogram scans to try to detect breast cancer.

12:02 And what they found out was a couple of things that were super, super interesting.

12:08 So, first, this thing they called LYNA was able to correctly identify tumorous regions 99% of the time.

12:17 The AI was.

12:18 That's amazing.

12:19 I mean, it's not 100%, but it is much better than doctors.

12:24 I can't remember what the doctor percentage was, but it was way off.

12:27 If you have, if it's really a bad case, then it's pretty easy.

12:30 But this is like early detection, right?

12:32 And catching cancer early is the key.

12:34 And this is like much, much better than doctors did.

12:37 So, that's really great.

12:39 So, I guess the first, the question is, does this mean that all the radiologists and their jobs and the cancer pathologists, their jobs are just gone, right?

12:49 Is that what it means, right?

12:51 Because that could be what AI means, say, for truck drivers or taxi drivers.

12:55 But you always think of those as kind of low-end jobs.

12:57 But really, are people who have medical degrees in danger of being kicked out of a job by AI?

13:03 I honestly am on the fence.

13:04 I don't really know.

13:05 Like this, this is not a great sign for that skill because computers are getting so good at it.

13:12 But one good sign is they did a second trial where they took six pathologists and they let them do diagnosis with and without the AI's assistance.

13:22 And they said with the assistance, the doctors found it easier to detect these small problems and it only took half as long.

13:28 Yeah.

13:28 Well, that's what I was going to say.

13:30 I mean, like it says 99% of the time, but that's not a real statistic.

13:34 We want to know like how many false positives, how many false negatives.
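
Why a single "99%" figure says little on its own can be shown with a small calculation. The counts below are invented for illustration and are not taken from the article; `rates` is a hypothetical helper.

```python
def rates(true_pos, false_pos, true_neg, false_neg):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = true_pos + false_pos + true_neg + false_neg
    accuracy = (true_pos + true_neg) / total
    positives = true_pos + false_neg  # scans that really contain a tumor
    negatives = true_neg + false_pos  # scans that really do not
    sensitivity = true_pos / positives if positives else 0.0
    specificity = true_neg / negatives if negatives else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical numbers: 1,000 scans, 10 of which contain a tumor.
# A useless model that always answers "no tumor" still scores 99% accuracy.
always_negative = rates(true_pos=0, false_pos=0, true_neg=990, false_neg=10)

# A model with the same accuracy can behave very differently.
useful_model = rates(true_pos=9, false_pos=9, true_neg=981, false_neg=1)
```

Both hypothetical models are "right 99% of the time," yet the first misses every tumor and the second catches nine out of ten, which is why the false positive and false negative counts matter.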

13:38 Yeah.

13:39 There's going to be gray area where like the computer says, yep, there's cancer there.

13:43 And I'm 100% sure or, you know, close.

13:46 Well, those cases, the doctor probably would have found it also.

13:50 But having the computers do it is going to be better.

13:53 And then the gray area is we're going to always need doctors to look at the stuff that's like questionable, like 50% chance that there might be.

14:01 And they can look at it and go, yeah, maybe we should redo the test or something or whatever.

14:07 I don't know about other countries.

14:09 But I think all of us have a shortage of doctors.

14:12 Yes.

14:13 If we can have the same doctors do 10 or 100 times more patients with the help of AI, then go for it.

14:20 Let's do it.

14:20 Yeah.

14:21 I think that's the real bright point here is to have more doctors and not just having more doctors, but having doctors more evenly distributed.

14:30 In a large country like the U.S., there's very rural parts and there's very urban parts.

14:36 And the access to doctors you have in a big city versus, you know, 100 miles from a big city in a tiny town, that is not the same.

14:43 Right.

14:44 But I can easily see taking a scan at your local doctor's, shooting it up to the cloud.

14:48 It says this, you jump on a Zoom meeting with another doctor for five minutes.

14:53 It says, hey, here's what the AI says.

14:55 I checked it over.

14:56 I agree.

14:56 Here's what we're going to do.

14:57 Either, you know, you come to the city for treatment or actually you're fine.

15:01 You just hang out.

15:02 So I think in terms of the democratization of this for people, this is really good.

15:08 Yeah.

15:08 And speeding things up too.

15:10 It might be that on the walk back from the scanning area of your doctor's office back to

15:18 your normal room, in that time, maybe we could have an answer for you instead of having to call you later tomorrow or something.

15:25 So it's all good.

15:27 Yeah, it's definitely good.

15:28 All right.

15:28 So this next one, is this like a little bit like a hundred days of code?

15:32 What is this?

15:33 I think it is, but it's like Christmasy.

15:36 So this is Advent of Code.

15:39 And this has been around since 2015 and it's at adventofcode.com.

15:45 It's a set of fun code challenges that they reveal one per day for 25 days in December.

15:52 And you've got just small programming puzzles covering a wide variety of skill sets.

15:58 But they're sort of geared from easy to hard.

16:01 And there's not a particular programming language you have to use.

16:05 So a lot of people have said, or I've heard people say, they solve them in their most comfortable language.

16:10 But then also you've got puzzles of past years available too.

16:15 If you're learning a new language, you can try to solve these puzzles in a new language as well.

16:20 Yeah, I really like it.

16:21 That's pretty cool.

16:21 And the fact that it comes one a day is pretty sweet.

16:25 Yeah.

16:25 And it says it doesn't need a lot of computational power, so it should be accessible.

16:29 Yeah.

16:29 And then we've also put in a link to a GitHub repo that's called Awesome Advent of Code, which is a whole bunch of extra resources, like links to where people have posted their solutions in particular languages or things like that.

16:42 So if you're really into it, you can check that out also.

16:45 Yeah.

16:45 I love it.

16:46 And it's quite timely.

16:48 Yeah.

16:48 I guess people are maybe a couple of days behind.

16:50 They'll have to do a few in a row, right?

16:52 Being December 5th, but that's okay.

16:54 All right.

16:55 The last one is a nice year end type of thing as well.

17:00 And it has to do with the sunsetting of legacy Python, which most people agree, I think, is a good thing, right?

17:07 Definitely.

17:08 Yeah, definitely.

17:08 So when I think of some of the holdouts for legacy Python, Python 2, if you will, it's often these enterprises. They have big code bases.

17:20 They don't really want to change them.

17:22 They don't have a large motivation to change them.

17:23 They're often using something like Red Hat Linux because they want the stability of that, the long-term support of that.

17:29 So the news is Red Hat Linux 8 is now updated for Python 6.

17:35 Sorry, 3.6 by default.

17:37 6 would be awesome.

17:38 That'd be a huge announcement.

17:39 No, 3.6 by default instead of 2.7.

17:42 So that's pretty interesting, right?

17:45 Yes.

17:46 By default.

17:47 Yeah.

17:48 I think I'm linking to the Reddit page.

17:50 Yeah.

17:51 I'm linking to the Reddit discussion that then links to the main article because there's some funny stuff in there.

17:55 And I think, Brian, I don't know if this comes from us in any way, or maybe Matias who started this way back when.

18:01 But the very first comment was just simply correcting the title to say, no, you know, you didn't mean to say 2.7.

18:08 You meant to say legacy Python.

18:09 Yes.

18:12 Yes.

18:13 Keep going, people.

18:14 Keep going.

18:14 Yes.

18:15 So, yeah, it's pretty cool.

18:16 They said they have only limited support for Python 2.7.

18:20 And also, no version of Python will be installed by default.

18:23 So you've got to install 3.6 as well.

18:25 But that's what most of the stuff defaults to.

18:28 And that actually, that's kind of cool because then with nothing installed by default, we can probably use some statistics better.

18:37 Yes.

18:38 It's hard to tell otherwise. If it just comes with your installation, then we don't really know what people are choosing.

18:46 Right.

18:46 Absolutely.

18:47 Yeah.

18:48 So there's a couple of comments that are interesting.

18:49 It says Python 2.7 is available as a package, but it'll have a shorter life.

18:55 And the reason it's still available is to facilitate a smoother transition to Python 3.

18:59 That's one.

19:00 And they also say customers are advised to use python3 or python2 directly, because of the shebangs, or sorry, hashbangs, that you put at the top of the file, like to say this should be executed in Bash.

19:14 This should be executed in Python.

19:15 Well, now you have to specify a major version.

19:18 You can't say like.

19:20 Yeah, Python 3 or Python 2.

19:21 Yeah, you can't just say Python up there.

19:23 That's actually an error.

19:24 It'll say you have to say Python 2 or 3 if you want this to actually run because they want you to opt in and not just choose some sort of default thing.
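
A script's first line can be checked for this mechanically. The sketch below flags shebang lines that ask for plain `python` with no major version; the helper names are hypothetical, and it only handles the common direct-path and `env` forms.

```python
def shebang_interpreter(first_line):
    """Return the interpreter name from a shebang line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # "#!/usr/bin/env python3" names the interpreter in the second word.
    if parts[0].endswith("/env") and len(parts) > 1:
        return parts[1]
    return parts[0].rsplit("/", 1)[-1]


def is_unversioned_python(first_line):
    """True when the shebang asks for plain 'python' with no major version."""
    return shebang_interpreter(first_line) == "python"


flagged = [
    line
    for line in ("#!/usr/bin/python", "#!/usr/bin/env python3", "#!/bin/bash")
    if is_unversioned_python(line)
]
```

Scripts that come back flagged would need their first line changed to name `python3` or `python2` explicitly, which is the opt-in behavior described above.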

19:32 It's pretty cool.

19:33 Yeah.

19:33 So another step towards the present future.

19:37 Yeah.

19:37 So I've never seen hashbang before.

19:40 Yeah.

19:40 I usually see the shebang, but they say hashbangs here.

19:43 Okay.

19:43 It must be the enterprise term.

19:45 Maybe.

19:47 Cool.

19:47 Well, that's pretty much all the news we have for this week.

19:50 There's tons more.

19:51 We're always not covering all the items.

19:54 There's so much going on.

19:55 But that's our news.

19:57 I do want to throw out one thing here.

19:59 And I know, Brian, I'm still waiting for that punchline there.

20:02 So before we get to that, though, I want to say thanks to Brian McCullough over at Techmeme.

20:08 So Techmeme is a website that's got like all the latest news on tech, which is pretty cool.

20:11 And they have a podcast called Techmeme Ride Home.

20:15 You can check that out.

20:16 So the reason I'm bringing this up here is it's a pretty cool show.

20:19 It's kind of like Python Bytes, but more for like general tech.

20:23 You know, like, oh, Google acquired this company or this thing's happening to the iPhone or whatever.

20:29 Right.

20:29 So it's good analysis.

20:31 It's well done.

20:32 It's about the same length.

20:32 But the reason I'm calling him out and saying thank you is he actually covered Python Bytes as the first recommended podcast on his show.

20:40 So I just want to say thanks, Brian.

20:41 And you guys can check out their show as well.

20:43 Yeah, definitely.

20:43 And because they did that, which is it's a really cool call out, too.

20:47 Thanks, Brian.

20:48 But I listened to a couple of episodes and I kind of liked it.

20:51 It's nice.

20:51 Yeah, it's nice.

20:52 I like it.

20:52 It's a good sort of a cousin of the show, if you will.

20:55 Yeah.

20:56 All right.

20:56 All right.

20:56 Tell me about this punchline, man.

20:58 Okay.

20:58 So I had heard of glom before, but I heard about it a lot more when I had Mahmoud on Test & Code.

21:05 And we talked about glom, but we mostly were talking about how difficult it was to test it.

21:11 Because if you're using a high level construct, you don't have to write very much code for it.

21:16 So your code can be 100 percent covered, but you really haven't covered all the cases yet.

21:21 So how do you deal with that?

21:22 So we talked about that.
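
The gap between "100 percent covered" and "all the cases covered" is easy to reproduce with a one-line function. The sketch below is a generic illustration, not code from glom or from the Test & Code episode; `get_path` and the sample inputs are hypothetical.

```python
from functools import reduce
from operator import getitem


def get_path(target, path):
    """One-line lookup: every line runs as soon as the function is called once."""
    return reduce(getitem, path.split("."), target)


# This single check executes every line of get_path, so line coverage is 100%.
assert get_path({"a": {"b": 1}}, "a.b") == 1

# Inputs the check above never exercised, each behaving differently.
untested_inputs = [
    ({"a": {}}, "a.b"),        # missing key
    ({"a": None}, "a.b"),      # None in the middle of the path
    ({"a": [10, 20]}, "a.0"),  # list index written as a string
]

outcomes = []
for target, path in untested_inputs:
    try:
        outcomes.append(get_path(target, path))
    except Exception as exc:
        outcomes.append(type(exc).__name__)
```

Because the high-level construct hides the branching inside `reduce` and `getitem`, a coverage report has no lines left to flag even though three distinct failure modes were never tested.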

21:23 And then Anthony Shaw got on Twitter and started talking about some of the ways we could increase the coverage of glom.

21:31 And then I pointed out holes in his solution.

21:34 And then he replied with this joke.

21:37 And the joke originally came from Brenan Keller.

21:41 And it's a QA engineer walks into a bar.

21:45 He orders a beer.

21:46 Orders zero beers.

21:47 Orders 9,999,000 beers.

21:51 Orders a lizard.

21:53 Orders minus one beers.

21:54 Orders a random set of characters.

21:56 Okay.

21:58 Now a first real customer walks in and asks where the bathroom is.

22:01 The bar bursts into flames, killing everyone.

22:04 I love it.

22:05 It's so perfect.

22:07 Anyway.

22:09 It has nothing to do with anything.

22:12 It's just funny.

22:13 Yeah.

22:13 No, it's really good.

22:14 I like it.

22:15 It's great.

22:15 Thanks for sharing.

22:16 And thanks for doing this podcast with me.

22:18 It's been fun.

22:18 It's fun as always.

22:20 We're going to keep it rolling.

22:21 Strong into 2019 for sure.

22:23 Catch you later.

22:24 Bye.

22:24 Bye.

22:25 Thank you for listening to Python Bytes.

22:26 Follow the show on Twitter via @pythonbytes.

22:29 That's Python Bytes as in B-Y-T-E-S.

22:32 And get the full show notes at pythonbytes.fm.

22:35 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

22:40 We're always on the lookout for sharing something cool.

22:43 On behalf of myself and Brian Okken, this is Michael Kennedy.

22:46 Thank you for listening and sharing this podcast with your friends and colleagues.
