« Return to show page
Transcript for Episode #107:
Restructuring and searching data, the Python way
00:00 KENNEDY: Hello, and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds. This is Episode 107 recorded December 5th, 2018. I'm Michael Kennedy.
00:11 OKKEN: And I'm Brian Okken.
00:12 KENNEDY: And this episode is brought to you by Digital Ocean. Check them out at pythonbytes.fm/digitalocean. Huge supporters of the show, great product, and you get $100 free credit for new users. So check them out, I'll tell you more about them later, but Brian, how you been?
00:27 OKKEN: I'm doing really good.
00:28 KENNEDY: Good. So, I hear you working on your stand up act.
00:30 OKKEN: No.
00:33 KENNEDY: No? Your stand-up comedy.
00:35 OKKEN: No but I do find lots of things funny. And, we've got a... The first topic turned into a Twitter discussion that ended in a joke. And so I'm going to share that later in the show right.
00:48 KENNEDY: Right, but good jokes, punchlines go at the end right?
00:51 OKKEN: So, the topic I'm going to talk about is glom which I'd actually heard about. It's a package started by Mahmood Ishami who brought us ZeroVer and other great things. It's a package to try to reshape data. So if you got like JSON or really any data, or any data structure that's in one type and wants shape and you need it in another shape or you need some of it out. That's what glom is written for but its written to be kind of like, kind of used like a regular expression is. It's a general-purpose tool that you can use to translate from one thing to another. Some other cool things about it are, It's like a path based, you can access things with the path-based access. Like as an example, if you were going to have a 3D dictionary, You'd have to pass in...
01:43 KENNEDY: A dictionary of dictionaries of dictionaries, sort of. Or two levels and an item.
01:48 OKKEN: So it's sort of a lot of brackets and columns, brackets and quotes and stuff to specify that. So they've got a shorthand version that you can say like A.B.C or something like that instead of all the brackets. It's a fairly simple interface to think about it. It's a glom and then you have a target data, target specification and then you've got some other things that you can do like default, so if like there's some data that's missing. There's a lot of python ways to do this sort of thing, but glom's sort of rather complete. It does a lot of neat things, and one of the neat things it does is, as you're specifying the from portion of your data transformation, sometimes something might not be there, like if you were expecting element C in like a really nested dictionary. If it's not there that element just doesn't exist. You might get something weird in normal Python like the famous, "NoneType' object is not subscriptable." And it doesn't tell you anything about what went wrong, so one of the things glom does is give you better error messages.
02:52 KENNEDY: Like "could not access c Part 2 of the path a.b.c", which is like, oh, well, that's way more useful than something on this line was. You know, none basically.
03:02 OKKEN: Yeah, exactly and then they also built in, since it's being used in production. Mahmood is using it at work as well, it's got a bunch of cool things like built in data exploration and debugging features. So when things do go wrong. You can sort of interactively try to figure out what went wrong, so.
03:21 KENNEDY: That's really cool. I love this, the way that it works, it seems really nice. I feel like you could almost do a little like a minor tweak to it to make it even cooler where you can do straight attribute access. So you say, glom, parenthesis data, and then the string a.b.c. It feels to me like you could extend it, say glom of data .a.b.c and have it understand that and sort of apply it. So it doesn't look like function calls, it looks more like attribute access, once you sort of glomify an object, who knows. But either way I still think this is really nice, especially if you're working with data, that comes, like you said in nested dictionaries or things like that, where you haven't built some sort of object structure to pack it into with like marshmallow or something, you're just like I'm going to work with this dictionary and it's kind of painful. This seems like it takes a lot of the pain away.
04:11 OKKEN: Yeah, I have a used case right now that we're pulling JSON out of, we took an off the shelf JSON reporter for pytest that reports all the test output in JSON and it's nice, but it reports like way too much than we care about so we're going to use this to, or something like this to translate from what we're getting to a data structure that's easier to work with.
04:36 KENNEDY: Yeah, that's quite cool. Super nice. So I think there's this topic I want to bring up. Let me just know if we've covered it before, it has to do with GUI's in Python.
04:48 OKKEN: So who's doing stand-up now? I think you're doing the stand-up.
04:53 KENNEDY: Yeah, I know pretty much. Oh my gosh. So long ago, we started down this path, on this journey of exploring like you know what we thought were the UI frameworks, like wxPython, the Phoenix release and Python for Qt coming along and those were like the big pieces of news and they still are. But it seems like every week somebody's like, oh, I know you guys have talked about 26 other cool UI frameworks, but do you know about X? Right. And you know, even the guy behind Python Simple GUI is like doing all sorts of cool stuff since we started talking about it on the show and there's a lot of cool things happening here.
05:29 OKKEN: Yeah, you picked out a really neat one.
05:31 KENNEDY: This is a really scientific computing Python GUI focused thing and it's really, really simple, It's not for building super complicated things. The idea is, I've got some object or set of objects and I would like to create a GUI around it. So for example, they have like, this camera concept, and the camera has a gain and an exposure and some functions and stuff like that, like you can take a picture based on those settings. What you can do, it's a little bit like SQLAlchemy in that you specify these are the traits of this object, and then you use this thing called TraitsUI from EnThought and you can upgrade that to like a form basically. So you say show the camera and it pops up a form, it says what is the gain? What is the exposure and you can even control the widgets that go there so like an up down numerical thing and so on. You can pack on graphs through this Chaco thing also from EnThought and it's just a really simple way to take an object show it to the user in a GUI form and get their values back.
06:39 OKKEN: It's actually pretty cool. The mind set kind of of is, people again, a lot of people are using Python that are not, programming is not their main job so this is something where people would they need access to you know like let's say a device interaction or something like this example but they need to be able to control it any user interface it doesn't have to be beautiful, but actually this looks pretty good.
07:01 KENNEDY: It doesn't look terrible and what's cool is the foundational framework it will actually find its way to select like wxPython or pyside? Which is the Python for Qt variant or PyQt 5, so it will cycle through the known frameworks and basically say, well I found wxPython so were using that. For example, which is really cool because a lot of those frameworks are much better looking then say Tkinter by default.
07:27 OKKEN: Yeah, well that's cool.
07:27 KENNEDY: Yeah, so if you ship your little app like pyinstaller with wxPython it will use that, pyinstaller with Qt for Python, it'll do that. It's really cool.
07:37 OKKEN: Now I kind of want to go out and see if I can write a oscilloscope interface with this but,
07:41 KENNEDY: Yeah, you should do it.
07:42 OKKEN: But I've got other things to do.
07:43 KENNEDY: Aw, come on. You've got a few hours, don't you? Yeah Awesome, well what's next?
07:49 OKKEN: Another taking data from one format and putting it in another one. I found another tool, I figured I'd cover them in the same episode because I'm comparing them at the same time. And so this one is called pampy, P-A-M-P-Y. It's pattern matching for Python you've always dreamed of, that's their tagline. It's a very small focused library that it's kind of got a neat interface, it's pretty easy.
08:15 KENNEDY: It's got a really interesting interface, yeah.
08:18 OKKEN: So the example that were going to stick in the show notes is. You just say from pampy, import match and _. So they're reusing _. Using it as a thing and so you give it a pattern, of known data structure pattern. And then you put these blanks in the places where you expect other values. And and then you call match with any data you want. And then this pattern and then it spits out as many variables as you've put underscores in. If they match. So you can just sort of go through a whole lot like a whole bunch a data and pull out just the bits you need. As long as they match the pattern.
08:59 KENNEDY: This is kind of similar to the one you had before but it's like regular expressions applied to hierarchical structure of data in like a weird, weird way. So let me see if I can try to visualize this for folks. So, if you have a variable that is a list, and in the list you have 1 and then the next item is actually [2, 3, 4], you can say match, you know. [1, [_, 3, _]], a list that contains an underscore and a 3 and then an underscore and then 4, whenever you run it through that, it'll say well we found a match and the values for the two underscores were 2 and 4. That's pretty cool.
09:34 OKKEN: And then the last thing you pass in is the what to do if you find a match. So you can pass in a function that takes that many parameters. Or a Lambda expression or something if you want and it will call your function with with those parameters and do whatever, so.
09:51 KENNEDY: Yeah, I can always write a function that returns the value so you can capture it which is kind cool. So very nice. I like it, it's one of those things that I think looks really cool and I think would be really useful, but I would forget to use it. You know, so I guess I got to remember to use this thing next time that I have, like a situation where it would be a really good fit. Where it's a match for the problem I'm solving.
10:14 OKKEN: Nice. It's one of those things also I like, I like to see more packages that are just small sharp tools for one use case or using for whatever but I use screwdrivers for all sorts of stuff but yeah.
10:28 KENNEDY: The little back hand part is good for beating stuff in like nails and whatnot. I think that's a great point. Alright, now before we get on to the next one which has some pretty practical applications actually. I just want to tell you all about the Digital Ocean. So one of the features I've been really happy with lately is their idea of projects, 'cause go to some of these cloud providers and there's just tons of assets, there's servers, there's IP addresses, there's load balancers. They're all just spread in there and you don't know which one goes with which maybe you've got a QA environment or a staging environment and a production one which goes with which unless you've like, named it really carefully and even then it's hard. At digital ocean you can you go create a project like a production Python Bytes server, you know, it's project input the servers and the floating IP addresses and all that in there saying for staging and so on, so they've got all sorts of cool features. If you check them out at pythonbytes.fm/digitalocean you'll get $100 credit for new users and definitely working out well for us. You guys should check them out. Speaking of getting checked out sometimes people get sick or they may be sick and have to go to doctor and the doctor takes some kind of picture and says, I looked at this. This scan and either you're ok or you're not ok, right. It turns out though that analyzing pictures for patterns is something that AI can do really well right. So Google recently took in this article, it's so funny, they took this off the shelf AI and they pointed it at mammogram scans to try to detect a breast cancer. What they found out was a couple of things that were super, super interesting. First, this thing they called Lyna was able to correctly identify tumorous regions 99% of the time. The AI was.
12:19 OKKEN: That's amazing.
12:20 KENNEDY: I mean, it's not 100% but it is much better than doctors. I can't remember what the doctor percentage was but it was way off. If you have, if it's really a bad case then it's pretty easy but this is like early detection, right and catching cancer early is the key and this is like much, much better than doctors do. So, that's really great. So I guess first the question is, does this mean that all the radiologists and their jobs and all the cancer pathologists, their jobs are just gone, right? Is that what it means? Because that could mean AI being for truck drivers or taxi drivers. But you always think of them as kind of low end jobs but do people who have medical degrees, are they in danger of being kicked out of a job by AI? I honestly am on the fence, I don't really know. This is not a great sign for that skill because computers are getting so good at it, but one good sign is they did a second trial where they took six pathologists and they let them do diagnosis with and without the AI's assistance. They said with the assistance the doctors found it easier to detect these small problems and it only took half as long.
13:28 OKKEN: Yeah, well that was what I was going to say. It says 99% of the time but that's not a real statistic. We want to know how many false positives, how many false negatives. There's going to be a grey area where the computer says, yup, there's cancer there. I'm 100% sure or, ya know, close. While those cases, the doctor probably would have found it also. But having the computers do it is going to be better. Then the gray areas, we're going to always need doctors to look at the stuff that's questionable, Like 50% chance that there might be, and they can look at it and go yeah, maybe we should redo the test or something or whatever. I don't know about other countries, but I think all of us have a shortage of doctors.
14:12 KENNEDY: Yes
14:13 OKKEN: If we can have the same doctors do 10 or 100 times more patients with the help of AI then go for it. Let's do it.
14:21 KENNEDY: Yeah, I think that's the real bright point here is to have more doctors and not just having more doctors but having doctors more evenly distributed. In a large country like the US, there is very rural parts and there's very urban parts and the access to doctors you have in a big city versus 100 miles from a big city, in a tiny town, that is not the same. Right, but I can easily see taking a scan at your local doctors, shooting it up to the cloud, it says this, you jump on a zoom meeting with another doctor for five minutes, it says here's what the AI says, I checked it over, I agree, here's what we're going to do. Either you come to this city for treatment or actually you're fine, you just hang out. So I think in the democratization of this for people, I think this is really good.
15:08 OKKEN: Yeah, in speeding things up too it might be that on the walk back from the scanning area of your doctor's office back to your normal room. In that time, maybe we can have an answer for you. Instead of having to call you later or tomorrow or something. It's all good.
15:26 KENNEDY: Yeah, it's definitely good. So this next one, is this something like 100 Days of Code, What is this?
15:33 OKKEN: I think it is, but it's like Christmasy. This is the Advent of Code. And this has been around since 2015, and it's at adventofcode.com. It's just sort of a fun, code challenges that they reveal one per day for 25 days in December and you've got just small programming puzzles covering a wide variety of skill sets but they're sort of geared both easy to hard and there's not a particular programming language you can use so a lot of people have said they solve them in their most comfortable language but then also you've got puzzles of past years available too. If you're learning a new language you can try to solve these puzzles in a new language as well.
16:20 KENNEDY: Yeah, I really like it that's pretty cool and the fact that it comes one a day is pretty sweet. Yeah and it doesn't need a lot of computational power so it should be accessible.
16:29 OKKEN: Yeah, and we've also put a link into a Github repo that's called awesome-advent-of-code, which is a whole bunch of extra resources like links to where people have posted their solutions in particular languages. Things like that, so if you're really into it, you can check that out also.
16:46 KENNEDY: I love it and it's quite timely. I guess people are maybe a couple days behind, I'll have to do a few in a row, right, it being December 5th but that's ok. The last one is a nice year-end type of thing as well, it has to do with the sun setting of Legacy Python, which most people agree I think is a good thing, right?
17:07 OKKEN: Definitely
17:08 KENNEDY: Yeah, definitely so when I think of some of the holdouts for Legacy Python, Python 2 if you will. It's often these enterprises, they have big codebases, They don't really want to change them, they don't have the large motivation to change them. They're often using something like Red Hat Linux because they want the stability of that, the long term support of that. So the news is Red Hat Linux 8 is now updated for Python 6. Sorry, 3 6, by default. 6 would be awesome, that would be a huge announcement. No 3.6 by default instead of 2.7, so that's pretty interesting right. By default yeah, I think I'm linking to the Reddit page. Yeah, I'm linked into the reddit discussion that then links to the main article because there's some funny stuff in there and I think Brian, I don't know if this comes from us anyway. Or maybe Mattias who started this way back when but the very first comment was just simply correcting the title to say. No, you know you didn't mean to say 2.7, you meant to say Legacy Python. Yes. Keep going people, keep going. It's pretty cool they said they have only limited support for Python 2.7. Also, no version of Python will be installed by default, so you've got to install 3 as well. But that's what most of the stuff defaults to.
18:28 OKKEN: Actually that's kind of cool because then with nothing installed by default, we can probably use some of the statistics better because yes, it's hard to tell if it just comes with your install installation, then we don't really know what people are choosing.
18:46 KENNEDY: Right. Absolutely. Yes, so there's a couple comments that are interesting. Since Python 2.7 is available as a package, but it will have a shorter life and the reason it's still available is to facilitate a smoother transition to Python 3. That's one, and they also say customers are advised to use Python 3 or Python 2 directly because the shebangs, I'm sorry, hashbangs that you put at the front, at the file like to say this should be executed in Bash, this should be executed in Python. Well, now, you have to specify major version. You can't just say Python up there, that's actually an error, so you have to say Python, 2 or 3 if you want this to actually run because they want you to opt in and not just choose some sort of default thing, it's pretty cool. So another another step towards the present future so.
19:38 OKKEN: I never seen hashbang before.
19:41 KENNEDY: I usually you see the shebang, they say hashbangs here. It must be the enterprise term.
19:46 OKKEN: Maybe.
19:47 KENNEDY: Cool. Well, I guess that's pretty much all the news we have for this week. There's tons more. We're always not covering all the items. There's so much going on, that's our news but I do want to throw out one thing here and I know Brian I'm still waiting for that punchline there, so before we get to that, though. I want to say thanks to Brian McCall over at Techmeme. So Techmeme is a website that's got like all the latest news on Tech, which is pretty cool and they have a podcast called The Long Ride Home. You can check it out. So the reason I'm bringing this up here, it's a pretty cool show. It's kind of like Python Bytes but more for like general tech, you know like Oh, Google acquired this company or this things happening with the iphone or whatever right so it's is good analysis, it's well done. It's about the same length. But the reason I'm calling him out saying. Thank you is he actually covered Python Bytes as the first recommended podcast on his show so I just want to say thanks Brian and you guys can check out their show as well.
20:43 OKKEN: Yeah definitely and because they did that, which is a really cool call out too. Thanks Brian. I listen to a couple episodes, I kind of liked it. It's nice.
20:52 KENNEDY: Yeah, it's nice. I like it. It's a good, sort of a cousin of this show if you will. Yeah, all right. Tell me about this punchline man.
20:58 OKKEN: Okay, so I'd heard of glom before but I heard about it a lot more when I had Mahmood on Testing Code. We talked about glom but we mostly were talking about how difficult it was to test it because if you're using a high level construct you don 't have to write very much code for it so your code can be 100% covered but you really haven't covered all the cases yet so how do you deal with that. So we talked about that. And then Anthony Shaw got on Twitter and started talking about some of the ways we could increase the coverage with glom and then I pointed out holes in his solution and then he replied with this joke and the joke originally came from Brennan Keller and it's a QA Engineer walks into a bar, he orders a beer, orders zero beers. Orders 9 billion 999 thousand beers, orders a lizard, orders minus one beers, orders a random set of characters. Okay, now the first real customer walks in and asks where the bathroom is. The bar bursts into flames killing everyone.
22:06 KENNEDY: I love it. That is so perfect.
22:09 OKKEN: Anyway. It has nothing to do with anything it's just funny.
22:14 KENNEDY: It's really good I like it it's great thanks for sharing.
22:16 OKKEN: Man, thanks for doing this podcast with me.
22:18 KENNEDY: It's been fun. It's fun as always. We're going to keep it rolling. Strong into 2019 for sure. Catch you later.
22:24 OKKEN: Bye.
22:25 KENNEDY: Thank you for listening to Python Bytes. Follow the show on twitter via at @pythonbytes, that's python bytes as in B-Y-T-E-S and get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.