Transcript #107: Restructuring and searching data, the Python way
Return to episode page view on github00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 107, recorded December 5th, 2018. I'm Michael Kennedy.
00:10 And I'm Brian Ockin.
00:11 And this episode is brought to you by DigitalOcean. Check them out at pythonbytes.fm/digitalocean.
00:17 Huge supporters of the show, great product, and you get $100 free credit for new users.
00:22 So check them out. I'll tell you more about them later. But Brian, how you been?
00:27 I'm doing really good.
00:28 - Good, so I hear you're working on your stand-up act.
00:31 (laughing)
00:33 - No.
00:33 - No, your stand-up comedy?
00:35 - No, but I do find lots of things funny.
00:39 And we've got a, the first topic turned into a, into a Twitter discussion that ended in a joke, and so I'm gonna share that later in the show.
00:48 - Right, but like good jokes, punchlines go at the end, right?
00:50 - Yeah.
00:51 - Okay, cool.
00:52 - So the topic I wanna talk about is Glam, which I'd actually heard about.
00:57 It's a package started by Mahmoud Hashemi, who brought us Zerover and other great things.
01:03 It's a package to try to reshape data.
01:05 So if you've got like JSON or really any data that is in or data structure that's in one type and one shape and you need it in another shape or you need some of it out, that's what Glom is written for.
01:18 But it's written to be kind of like, kind of used like a regular expression is.
01:22 It's a general purpose tool that you can use to translate from one thing to another.
01:27 And some of the cool things about it are that it's like a path-based, you can access things with path-based access.
01:36 Like, as an example, if you were gonna have a 3D dictionary, you'd have to pass in--
01:43 - A dictionary of dictionaries of dictionaries, sort of thing. - Yeah, so--
01:46 - Or maybe two levels and then an item.
01:48 - So it's sort of a lot of brackets and colons, and brackets and quotes and stuff to specify that.
01:54 So they've got a shorthand version that you can say like a.b.c or something like that instead of all the brackets.
02:02 It's a fairly simple interface to think about.
02:04 It's a GLOM and then you have a target data, target specification, and then you've got some other things that you can do like default.
02:13 So if like there's some data that's missing.
02:15 There's a lot of Python ways to do this sort of thing, but GLOM is sort of rather complete.
02:20 It does a lot of neat things.
02:21 And one of the neat things it does is as you're specifying the from portion of your data transformation, sometimes something might not be there.
02:30 Like, if you were expecting element C in a really nested dictionary, and if it's not there, that element just doesn't exist, you might get something weird in normal Python, like the famous none type object is not subscriptable.
02:47 And it doesn't tell you anything about what went wrong.
02:49 So one of the things Colum does is gives you better error messages.
02:52 I could not access C part two of the path ABC, which is like, oh, well that's way more useful than something on this line was none, basically.
03:02 - Yeah, exactly.
03:03 And then they also built in, since it's being used in production, Mahmoud is using it at work as well.
03:10 It's got a bunch of cool things like built-in data exploration and debugging features.
03:15 So when things do go wrong, you can sort of interactively try to figure out what went wrong.
03:20 - That's really cool.
03:21 I love the way that it works.
03:23 It seems really nice.
03:25 I feel like you could almost do a little--
03:27 like a minor tweak to it to make it even cooler, where you can do straight attribute access.
03:32 So you say glom parentheses data, and then the string a.b.c.
03:36 It feels to me like you could extend it, say glom of data dot a.b.c, and have it understand that and sort of apply it so it doesn't look like function calls.
03:47 It looks more like attribute access once you sort of glomify an object.
03:50 Who knows?
03:50 But either way, I still think this is really nice, especially if you're working with data that comes, like you said, in nested dictionaries or things like that, where you haven't built some sort of object structure to pack it into, like Marshmallow or something.
04:06 You're just like, I'm gonna work with this dictionary, and it's kind of painful.
04:09 This seems like it takes a lot of the pain away.
04:10 - Yeah, I have a use case right now that we're pulling JSON out of, we took an off-the-shelf JSON reporter for pytest that reports all the test output in JSON.
04:24 And it's nice, but it reports like way too much than we care about.
04:28 So we're gonna use this to, or something like this to translate from what we're getting to a data structure that's easier to work with.
04:36 - Yeah, that's quite cool.
04:38 Super nice.
04:39 So I think there's this topic I wanna bring up.
04:42 Let me just know if we've covered it before.
04:45 It has to do with GUIs and Python?
04:46 (laughing)
04:49 - So who's doing standup now?
04:51 I think you're doing the standup.
04:52 - Yeah, I know, pretty much, oh my gosh.
04:54 So long ago, you and I, we started down this path on this journey of exploring what we thought were the UI frameworks, like WXPython, the Phoenix release, and Python for Qt coming along.
05:08 Those were the big pieces of news, and there still are.
05:10 But it seems like every week, somebody's like, oh, I know you guys have talked about 26 other cool UI frameworks, but do you know about X?
05:18 Yeah.
05:18 Right?
05:19 And even the guy behind Python Simple GUI is doing all sorts of cool stuff since we started talking about it on the show.
05:27 And there's a lot of cool things happening here.
05:29 Yeah.
05:30 You picked out a really neat one.
05:31 This is a really scientific computing Python GUI focused thing.
05:37 And it's really, really simple.
05:38 It's not for building super complicated things.
05:41 The idea is I've got some object or set of objects.
05:45 and I would like to create a GUI around it.
05:49 So, you know, for example, they have like a, this camera concept and the camera has a gain and an exposure and some functions and stuff like that.
05:57 Like you can take a picture based on those settings.
06:00 And what you can do, it's a little bit like SQLAlchemy that you specify, these are the traits of this object.
06:07 And then you use this thing called traits UI from InThought and you can upgrade that to like a form basically.
06:15 So you can say, show the camera, and it pops up a form, it says, what is the gain, what is the exposure, and you can even control the widgets that go there, so like an up/down numerical thing and so on.
06:26 You can pack on graphs through this Kakoa thing, also from InThot, and it's just a really simple way to take an object, show it to the user in a GUI form, and get their values back.
06:37 - It's actually, it's pretty cool, and so the mindset kind of is, people that are, again, a lot of people are using Python that are not programming isn't their main job.
06:46 So this is something where people would, they need access to, you know, like let's say a device interaction or something, like this example, but they need to be able to control it in a user interface.
06:57 It doesn't have to be beautiful, but actually this looks pretty good.
07:01 - It doesn't look terrible, and what's cool is the foundational framework.
07:04 It'll actually find its way to select like WX Python or PySide, which is the Python for Qt variant, or PyQt 5, so it'll cycle through the known frameworks and basically say, well, I found WXPython, so we're using that, for example.
07:21 Which is really cool, because a lot of those frameworks are much better looking than, say, TK Enter by default.
07:26 - Yeah, that's cool.
07:27 - Yeah, so you can, if you ship your little app, like you Py installer it with WXPython, it'll use that.
07:33 You Py installer it with Qt for Python, it'll do that.
07:36 That's really cool.
07:37 - Now I kinda wanna go out and see if I can write an oscilloscope interface with this.
07:41 - Yeah, you should do it.
07:42 - I've got other things to do.
07:43 - Oh, come on, you've got a few hours, don't you?
07:46 - Yeah.
07:46 (laughing)
07:47 - Awesome, all right, well, what's next?
07:49 - Another, taking data from one format and putting it in another one.
07:53 I found another tool that I figured I'd cover in the same episode because I'm comparing them at the same time.
08:00 And so this one is called PAMPY, P-A-M-P-Y.
08:04 It's Pattern Matching for Python You Always Dreamed Of.
08:07 That's their tagline.
08:08 It's a very small focused library that it's kind of got a neat interface that's pretty easy to catch up.
08:15 >>It's got a really interesting interface, yeah.
08:17 >>Yeah, so like the example that we're gonna stick in the show notes is you just say from PAMPI import match and underscore.
08:25 So they're overusing, they're reusing underscore or using it as a thing.
08:30 And so you give it a pattern of known, like a known data structure pattern.
08:36 And then you put these blanks in the places where you expect other values.
08:41 And then you call match with any data you want and then this pattern, and then it spits out as many variables as you've put underscores in if they match.
08:52 So you can just sort of go through a whole bunch of data and pull out just the bits you need as long as they match the pattern.
08:59 - This is kind of similar to the one you had before, but it's like regular expressions applied to a hierarchical structure of data.
09:06 in like a weird, weird way.
09:07 So let me see if I can try to like visualize this for folks.
09:09 So if you have a variable that is a list and unless you have one, and then the next item is actually the list two, three, and then four, you can say, match, you know, list of one comma, some underscore, a list that contains an underscore and a three and then an underscore.
09:26 And then every, whenever you run it through that, it'll say, well, we found a match and the values for the two underscores were two and four.
09:33 That's pretty cool.
09:34 And the last thing you pass in is the what to do if you find a match.
09:39 And so you can pass in a function that takes that many parameters or a lambda expression or something if you want, and it'll call your function with those parameters and do whatever.
09:51 >>Yeah, and you can also just write a function that returns the value so you can capture it, which is kind of cool as well.
09:56 Very nice.
09:57 I like it.
09:58 It's one of those things that I think looks really cool and I think would be really useful, but I would forget to use it.
10:04 You know, so I guess I gotta remember to use this thing next time that I have like a situation where it would be a really good fit.
10:11 Where, you know, it's a match for the problem I'm solving.
10:14 - Nice.
10:15 But it's one of those things also I like.
10:17 I'd like to see more packages that are just small, sharp tools for one use case or use them for whatever.
10:23 But I mean, I use screwdrivers for all sorts of stuff.
10:27 But, you know.
10:27 - Yeah, the little backhand part's good for beating stuff in like nails and whatnot.
10:31 I think that's a great point.
10:33 All right, now before we get on to the next one, which has some pretty practical applications, actually, I just want to tell you all about DigitalOcean.
10:40 So one of the features I've been really happy with lately is their idea of projects.
10:45 Because you go to some of these cloud providers, and there's just tons of assets.
10:49 There's servers.
10:50 There's IP addresses.
10:51 There's load balancers.
10:52 They're all just spread in there.
10:54 And you don't know which one goes with which.
10:55 Maybe you've got a QA environment or a staging environment and a production one.
10:59 Which goes with which, unless you've it really carefully and even then it's hard. So at DigitalOcean you can go create a project like a production Python bytes server, you know, as project input the servers and the floating IP addresses and all that in there. Same for staging and so on. So they've got all sorts of cool features. If you check them out at pythonbytes.fm/digitalocean you'll get $100 credit for new users and definitely working out well for us. You guys should check them out. Speaking of getting checked Sometimes people get sick or they may be sick and you have to go to the doctor and the doctor takes some kind of picture and says, "I looked at this scan "and either you're okay or you're not okay." It turns out though that analyzing pictures for patterns is something that AI can do really well.
11:48 - Yeah.
11:49 - So Google recently took, in this article, it's so funny, it says, "Well, they took this off-the-shelf AI and they pointed it at mammogram scans to try to detect a breast cancer. And what they found out was a couple of things that were super, super interesting. First, this thing they called Lina was able to correctly identify tumorous regions 99% of the time. The AI was. That's amazing. I mean it's not a 100%, but it is much better than doctors.
12:24 I can't remember what the doctor percentage was, but it was way off.
12:27 If you have, if it's really a bad case, then it's pretty easy, but this is like early detection, right, and catching cancer early is the key, and this is like much, much better than doctors did.
12:37 So that's really great.
12:40 So I guess the first, the question is, does this mean that all the radiologists and their jobs and the cancer pathologists, their jobs are just gone, right?
12:50 Is that what it means, right?
12:51 'Cause that could be what AI means, say, for truck drivers or taxi drivers.
12:55 But you always think of that as kind of low-end jobs.
12:57 But is that really, do people who have medical degrees, are they in danger of being kicked out of a job by AI?
13:03 I honestly am on the fence.
13:04 I don't really know.
13:05 Like, this is not a great sign for that skill because computers are getting so good at it.
13:12 But one good sign is they did a second trial where they took six pathologists and they let them do diagnosis with and without the AI's assistance.
13:23 And they said with the assistance, the doctors found it easier to detect these small problems and it only took half as long.
13:28 - Yeah, well that's what I was gonna say.
13:30 I mean, it says 99% of the time, but that's not a real statistic.
13:35 We wanna know how many false positives, how many false negatives.
13:39 There's gonna be gray area where the computer says, yep, there's cancer there.
13:43 And I'm 100% sure, or close.
13:47 In all those cases, the doctor probably would have found it also, but having the computers do it is going to be better.
13:54 And then the gray area is we're going to always need doctors to look at the stuff that's questionable, like 50% chance that there might be, and they can look at it and go, "Yeah, maybe we should redo the test or something," or whatever.
14:08 I don't know about other countries, but I think all of us have a shortage of doctors.
14:13 If we can have the same doctors do 10 or 100 times more patients with the help of AI, then go for it.
14:20 Let's do it.
14:21 Yeah, I think that's the real bright point here is to have more doctors and not just having more doctors, but having doctors more evenly distributed.
14:30 In a large country like the US, there's very rural parts and there's very urban parts.
14:36 And the access to doctors you have in a big city versus 100 miles from a big city in a in a tiny town, that is not the same, right?
14:44 But I can easily see taking a scan at your local doctors, shooting it up to the cloud, it says this.
14:50 You jump on a Zoom meeting with another doctor for five minutes, it says, hey, here's what the AI says, I checked it over, I agree, here's what we're gonna do.
14:58 Either you come to the city for treatment, or actually you're fine, you just hang out.
15:02 So I think in the democratization of this for people, I think this is really good.
15:08 - Yeah, and speeding things up too, It might be that on the walk back from the scanning area of your doctor's office back to your normal room, in that time, maybe we could have an answer for you instead of having to call you later tomorrow or something.
15:26 So it's all good.
15:27 - Yeah, it's definitely good.
15:28 All right, so this next one, is this a little bit like 100 Days of Code?
15:32 What is this?
15:33 - I think it is, but it's like Christmas-y.
15:36 So this is the Advent of Code.
15:40 And this has been around since 2015, and it's at adventofcode.com.
15:45 It's just sort of a fun code challenges that they reveal one per day for 25 days in December.
15:53 And you've got just small programming puzzles covering a wide variety of skill sets, but they're sort of geared both easy to hard and there's not a particular programming language you can use.
16:05 So a lot of people have said, or I've heard people say they solve them in their most comfortable language, but then also you've got puzzles of past years available too.
16:15 If you're learning a new language, you can try to solve these puzzles in a new language as well.
16:20 - Yeah, I really like it.
16:21 That's pretty cool, and the fact that it comes one a day is pretty sweet.
16:25 Yeah, and it says it doesn't need a lot of computational power, so it should be accessible.
16:29 - Yeah, and then we've also put a link into a GitHub repo that's called Awesome Advent of Code, which is a whole bunch of extra resources like links to where people have posted their solutions in particular languages or things like that.
16:42 So if you're really into it, you can check that out also.
16:45 - Yeah, I love it and it's quite timely.
16:48 I guess people are maybe a couple days behind.
16:50 They'll have to do a few in a row, right?
16:53 Being December 5th, but that's okay.
16:55 All right, the last one is a nice year-end type of thing as well and it has to do with the sun setting of Legacy Python, which most people agree I think is a good thing, right?
17:07 - Definitely.
17:08 - Yeah, definitely.
17:09 So when I think of some of the holdouts for legacy Python, Python 2, if you will, it's often these enterprises, they have big code bases, they don't really wanna change them, they don't have a large motivation to change them.
17:24 They're often using something like Red Hat Linux because they want the stability of that, the long-term support of that.
17:29 So the news is Red Hat Linux 8 is now updated for Python six, sorry, three, six by default.
17:38 Six would be awesome, that'd be a huge announcement.
17:39 No, three, six by default instead of two, seven.
17:43 So that's pretty interesting, right?
17:45 - Yes, very interesting. - By default, yeah.
17:48 I think I'm linking to the Reddit page.
17:51 Yeah, I'm linking to the Reddit discussion that then links to the main article because there's some funny stuff in there.
17:55 And I think, Brian, I don't know if this comes from us in any way or maybe Matias who started this way back when, But the very first comment was just simply correcting the title to say, no, you didn't mean to say 2.7, you meant to say legacy Python.
18:10 (laughing)
18:12 Yes.
18:12 Keep going people, keep going.
18:15 So, yeah, it's pretty cool.
18:16 They said they have only limited support for Python 2.7 and also no version of Python will be installed by default.
18:24 So you've got to install 3 as well, but that's what most of the stuff defaults to.
18:29 - Actually, that's kind of cool because then with nothing installed by default, we can probably use some statistics better because it's hard to tell.
18:40 If it just comes with your install installation, then we don't really know what people are choosing.
18:46 - Right, absolutely.
18:48 Yeah, so there's a couple comments that are interesting.
18:50 It says Python 2.7 is available as a package, but it'll have a shorter life, and the reason it's still available is to facilitate a smoother transition to Python 3.
18:59 That's one.
19:00 And they also say, "Customers are advised to use Python 3 "or Python 2 directly," because the shabangs, or sorry, hashbangs that you put at the front, at the file, like to say, "This should be executed in Bash.
19:14 "This should be executed in Python." Well, now you have to specify a major version.
19:19 You can't say-- - Yeah, Python 3 or Python 2. - Yeah, you can't just say Python up there, that's actually an error.
19:24 You'll see you have to say Python 2 or 3 if you want this to actually run, because they want you to opt in and not just choose some sort of default thing.
19:32 It's pretty cool.
19:33 So another step towards the present future.
19:37 - Yeah, so I've never seen Hashbang before.
19:40 - Yeah, I usually see it as shabang, but they say Hashbangs here, yeah.
19:43 - Okay.
19:44 - It must be the enterprise term.
19:45 (laughing)
19:46 - Maybe.
19:47 - Cool.
19:48 Well, that's pretty much all the news we have for this week.
19:50 There's tons more.
19:51 We're always not covering all the items.
19:54 There's so much going on, but that's our news.
19:57 I do want to throw out one thing here, and I know Brian, I'm still waiting for that punchline there.
20:02 So before we get to that, though, I want to say thanks to Brian McCullough over at TechMeme.
20:08 So TechMeme is a website that's got all the latest news on tech, which is pretty cool.
20:12 And they have a podcast called The Long Ride Home.
20:15 You can check that out.
20:16 So the reason I'm bringing this up here is it's a pretty cool show.
20:19 It's kind of like Python Bytes, but more for general tech.
20:24 You know, like, oh, Google acquired this company, or this thing's happening to the iPhone or whatever, right?
20:29 So it's good analysis, it's well done, it's about the same length.
20:33 But the reason I'm calling him out and saying thank you is he actually covered Python Bytes as the first recommended podcast on his show.
20:40 So I just wanna say thanks, Brian, and you guys can check out their show as well.
20:43 - Yeah, definitely, and because they did that, which it's a really cool call out too, thanks, Brian.
20:48 But I listened to a couple episodes and I kinda liked it, it's nice.
20:52 - Yeah, it's nice, I like it.
20:52 It's a good, it's sort of a cousin of the show, if you will.
20:55 - Yeah.
20:56 - All right, all right, tell me about this punchline, man.
20:58 - Okay, so I had heard of Glam before, but I heard about it a lot more when I had Mahmood on testing code.
21:05 And we talked about Glam, but we mostly were talking about how difficult it was to test it, because if you're using a high-level construct, you don't have to write very much code for it, so your code can be 100% covered, but you really haven't covered all the cases yet.
21:21 So how do you deal with that?
21:22 So we talked about that.
21:24 And then Anthony Shaw got on Twitter and started talking about some of the ways we could increase the coverage of Glam.
21:31 And then I pointed out holes in his solution.
21:35 And then he replied with this joke.
21:38 And the joke originally came from Brennan Keller.
21:42 And it's a QA engineer walks into a bar.
21:45 He orders a beer.
21:47 Orders zero beers.
21:48 Orders 9,999,000 beers.
21:52 Orders a lizard.
21:53 orders minus one beers, orders a random set of characters.
21:57 Okay, now a first real customer walks in and asks where the bathroom is.
22:02 The bar bursts into flames, killing everyone.
22:04 (laughing)
22:05 - I love it.
22:06 It is so perfect.
22:07 (laughing)
22:09 - Anyway.
22:09 It has nothing to do with anything, it's just funny.
22:13 - Yeah, no, it's really good, I like it.
22:15 It's great, thanks for sharing.
22:16 - Man, thanks for doing this podcast with me.
22:18 It's been fun.
22:19 - It's fun as always.
22:20 We're gonna keep it rolling.
22:21 strong into 2019 for sure. Catch you later.
22:24 >> Yeah. All right. Bye.
22:25 >> Bye.
22:25 >> Thank you for listening to Python Bytes.
22:27 Follow the show on Twitter via @pythonbytes, that's Python Bytes as in B-Y-T-E-S, and get the full show notes at pythonbytes.fm.
22:36 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
22:40 We're always on the lookout for sharing something cool.
22:43 On behalf of myself and Brian Okken, this is Michael Kennedy.
22:46 Thank you for listening and sharing this podcast with your friends and colleagues.