« Return to show page
Transcript for Episode #106:
Fluent query APIs on Python collections
0:00 KENNEDY: Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 106, recorded November 30, 2018. I'm Michael Kennedy.
0:09 OKKEN: And I'm Brian Okken.
0:11 KENNEDY: Hey Brian. How you doing?
0:12 OKKEN: I'm good, it's early in the morning, man.
0:14 KENNEDY: Yeah, good morning man. We're recording a little earlier than our average lunchtime get together. But, you know, this is what we do for our audience. We get up when it's dark, and we talk about Python.
0:26 OKKEN: Yes we do.
0:27 KENNEDY: So before we get on to the topics, just want to say thank you to DigitalOcean. They're sponsoring the show like they did for much of this year. Check them out at pythonbytes.fm/digitalocean, get $100 free credit for new users. Speaking of cloud hosting and whatnot, one of the main innovations that's been coming along in the last couple years is this whole idea of DevOps, right?
0:48 OKKEN: I don't know, DevOps has been around for a while, and really important for a lot of people. But I think Python is getting involved more and more with DevOps. We've also talked about packaging tools and dependency management a lot on this podcast, and others as well. We've talked about Poetry and pipenv and, I don't know if we've covered pip-tools before, but...
1:11 KENNEDY: I don't think we have.
1:12 OKKEN: pip-tools is a package that combines, it has a couple commands. pip compile, and pip sync, which are ways to work with pip based, usage based packages. And I'm not really going to get into that too much, but what I've found is interesting was there's an article recently written by Hynek that is called, "Python Application Dependency Management in 2018." So, basically, he's got a use case. He's working at DevOps, or DevOps problems space, and the requirements he's got for doing packaging is he wants to be able to specify immediate dependencies, like a requirements.txt or something. But then, be able to have a tool resolve the dependency tree. I know I need Django, but if Django needs a bunch of stuff, I don't want to have to specify all that. And then, hopefully, have hashes like the lock files, and then integrate these with with tox, so you can run tests. And then also, the last one is in deploying servers, I don't want to actually have to activate a virtual environment, and then install everything. I want to be able to install dependencies, an application and a dependency into a virtual environment without having to actually go in there and do it. Those all seem reasonable, but so far, he's tried pipenv, and he's tried poetry, and right now he's sticking with pip-tools based solution.
2:46 KENNEDY: By using the pip-tools thing itself, or just pip?
2:48 OKKEN: A combination of them, because you can't just use pip alone for a lot of this stuff so, for instance, pip-tools is one of the things you can create hashes with. You can create the lock-files, essentially. And working with multiple requirements files, like a requirements.txt, or like a requirements, he has a work model which I haven't seen before, which is sticking multiple requirements files in a requirements directory. So you've got a main...
3:17 KENNEDY: Oh, interesting.
3:18 OKKEN: Requirements/main.txt, and then a dev.txt. It's for development requirements. And then you also have to be able to update them. So, occasionally, you got pinned requirements, but you want to be able to update them to the newest and test that. So all these things, it requires a work model that should, these are reasonable requests. One of the things I really liked about this was that it was just, we have an environment now, that we want to be able to I think it's a good thing, we want to be able to not slam people for, make fun of, or bash on different projects too much. But you also want to be able to be honest and say "Hey, I tried this thing, and it didn't work for me." And so, one of the things I liked about this is I think it's a good example article of saying "Hey, this was my use case. I tried this thing, this thing didn't quite work for me, and here's why." But also, don't slam the people who made that thing because I think it's cool that they're working on the problem, it just didn't work quite right for me yet.
4:24 KENNEDY: Yeah, this whole management pack tooling stuff, dependency tooling, it feels a little bit like the editor wars, "Don't tell me to use VIM, I want to use Emacs." or, "I want to use PyCharm." or whatever. Not quite, but it seems something like people get into their workflow, and they really like their workflow, and they kind of want to stick with it. I think there's also some limitations to some of these tools. They're built to address different problems than the one you're trying to solve.
4:50 OKKEN: Right, there are no one-size-fits-all things. I found that refreshing, because I actually really like the idea of Poetry. I like the idea of pipenv and Poetry, but I don't have any specifics right now, but I jumped in and tried to use it in some of my workflows. Like Hynek, some of the things I tried just didn't quite work. So, I have to go back to old ways.
5:15 KENNEDY: Yeah, I guess that makes sense. The more that those tools try to do for you all at once, the more likely they are to not exactly fit your workflow. One way that's really cool to extend applications would be to have the ability for someone to just drop a file into a directory, accessing a common API, and have that just become part of your app, or part of the capabilities of your app, right? Like a plugin, right?
5:40 OKKEN: Yeah, that'd be cool.
5:41 KENNEDY: So, there's a cool library called pluginlib. Pluginlib. P-L-U-G-I-N-lib. And it addresses this problem in a couple ways. When I first heard about this, I'm like, "Well, can't you just put a file in a directory and be good with it? And try to load it?" But, the more I looked at what this library's doing, it's actually solving a lot of problems for us. For one, it's an interesting example of metaclasses, and programming with metaclasses. But, you don't really have to deal with it, because you just use the API and it works. But one of the things, plugins are validated when they're loaded, instead of when they're used, right? Right away, they're validated. Makes sure everything fits the API. They can be loaded through either file paths, through entry points, through module names and so on. You can have multiple versions of the same plugin.
6:38 OKKEN: Oh, interesting.
6:39 KENNEDY: Yeah, pretty interesting. You can blacklist them, so if somebody puts some kind of plug in there, you're like "No, we're not going to allow you to load this, even if we find it." configure something like that. You can group them, you can have conditionals like this plugin works on MacOS, but not on Linux, for example. All sorts of cool stuff like that. So, if you're thinking about having this extensible plugin model, give pluginlib a look. It looks like it handles a pretty simple problem, but has a lot of support for you while it's doing it.
7:10 OKKEN: Yeah, this is pretty neat. Let's check this out.
7:13 KENNEDY: Yeah, I just don't work on any apps that really that makes a lot of sense for me, but if I did, I would definitely want to have a look at that.
7:19 OKKEN: Yeah, and it looks like it is based on, or partly based on, abstract methods of class hierarchies. And I think I remember looking at this, and this doesn't scare me off, but the particular application I was working with, we didn't have any classes, uses anywhere, and I didn't really want to introduce them for this.
7:40 KENNEDY: Yeah like, let's throw in some abstract base classes and hierarchies to a thing that doesn't even use classes. Yeah, that makes sense.
7:47 OKKEN: But at the same time, writing a plugin for an application is a very specific need, and I think it's fine to match the model of the rest of your systems. Something to look into.
7:58 KENNEDY: Yeah, for sure. What do you got for us next?
7:59 OKKEN: For some reason, I'm a little interested in testing, sometimes.
8:03 KENNEDY: Really?
8:04 OKKEN: Yeah. One of the PyBites guys, Bob Belderbos, he wrote an article called, "How to test your Django app with Selenium and pytest." And yeah, pytest is one of my favorite things too. It's a really nice write-up of, I'm going to quote the beginning of it, it says, "In this article, I will show you how to test a Django app with pytest and Selenium." One of the things I'm just intrigued by is they don't use a toy app. They use their code challenges platform, and compare the logged out homepage versus the logged in dashboard. And, I think that's really cool and brave of them to publish it. This is how we're testing our own stuff!
8:45 KENNEDY: Publicly, on the internet.
8:47 OKKEN: Yeah, and it talks through I probably wouldn't have done it if part of the project setup, they're using the activate and deactivate of a virtual environment to load environmental variables. I don't have the specific links to it, but I know there's a ways to do this within pytest itself, as well, using something else.
9:08 KENNEDY: Yeah, I know. It's kind of cool to do that, but at the same time, it makes it hard to I think it's kind of hidden in the whole setup, right? There's nowhere in the code that you see that, and you basically accomplish that by editing the activate script to set some environment variables when you activate the environment, but if you go to another machine, you check it out again at some other location, that kind of stuff is not there. So, you know, it's hard to store passwords and secrets. They definitely don't want to put them into this repo either, so what are you going to do? But it is tricky.
9:41 OKKEN: Yeah, and you wouldn't be able to use tox with this, because tox is going to create its own virtual environment. Yeah, so maybe I'll have to write up an example of how I would tackle that.
9:50 KENNEDY: Yeah, that actually would be interesting.
9:51 OKKEN: But the rest of it, even though they're not mocking out a database itself, they do show how to do that, how to get the database setup. Yeah, anyway, it's kind of a cool example. I love working with Selenium, and this is a neat write-up.
10:09 KENNEDY: It's cool, you know, Maybe not everyone knows about Selenium. They maybe know about web scraping, they may be trying to test it that way. They also maybe know about testing inside of apps. Maybe just quickly tell folks who are not totally familiar. What is this like? Is this like requests, or what is this?
11:19 KENNEDY: You can interact with it as if it were just literally in your browser, because it's literally in a hidden browser in the headless mode. So yeah, it's definitely an interesting way to do it. Like, here's an example. You're like, "Let's take these things and put in a username, put in a password, and then click this button. Find this button and click it." The login button and stuff like that is really unique. Okay, so before we get on to flupy, which is the next thing I want to talk about, let's talk about DigitalOcean real quick. So, you probably built a lot of apps, have a lot of cloud resources running so far this year, but which one goes with which project? That's always been something that's driven me crazy about other places, like go to AWS, and there's tons of virtual machines running and stuff. I feel like, "What is this? Do we need these? I don't even know anymore." So, it'd be really nice if we could group them together. And at DigitalOcean, they created this idea called "projects," where you can group droplets, and load balancers, and IPs, and all that into application categories that they work with. Like, you could have your app, you could have your app sandbox testing QA environment, and you could have multiple ones of those. And so, it's a really nice super cool way to organize your environment in DigitalOcean. Check them out at pythonbytes.fm/digitalocean. And, like I said, for people who are new users, you get $100 free credit to play around. So, that's also really nice to have. So, I'm a big fan of fluent APIs in general. And maybe not everyone knows the term. It's like when you take a thing, and you call a function on it, and that function itself returns back the same object. Instead of saying I've got a variable, I first initialize it to a list, and then I do some operation on that list and it changes it, and I do another operation, maybe it turns it into a set, then I pass that set somewhere else. I could just say, let's say customers, or let's say numbers, that's an example that's easier. So, say numbers.map.filter.skip.take.orderby just one after another, all in one line. That's the fluent API in general. So, the thing that I want to point out is actually two libraries. You can take your flavor of this fluent API and apply it to collections. Because, there's often more than one thing you might want to do to a collection. You might want to change its values. You might want to filter it. You might want to only take a couple of them, or put them into groups and things like that. So, there's one called I originally wanted to pronounce it as "fluppy," but flupy is in fluent Python. And it lets you do exactly what I was describing on collections. So, given any collection, you can upgrade it to a fluent collection by just calling "flu" and passing the collection. Then, it has all these operations like, you say .map, and pass it to lambda, and then on that result, you say .filter, pass another lambda.chunk, to break it into pages I believe, then .take will give you only three chunks of five back, for example. So you do that all in effectively one line. Maybe you want to wrap it for readability, but you don't have to use a bunch of intermediate variables or intermediate collections or anything like that.
14:30 OKKEN: This is nice.
14:32 KENNEDY: It's slick right? And the thing that's really cool is it's all based on yield and generators. Every step in this chain is some generator type of operation. Which means, you could pass it a billion items and it's not going to filter a billion items, map a billion items, and all that. It's just going to take them one at a time through this data pipeline that you build. The example that I put in the show notes uses count. Just, all of it. But, you know, it ends with a break it into pages of five, and take three pages. It only takes 15 items out. Even though you pass it, in the way that that works, of course, with generators, because you can just pull through it until the last part stops pulling. And then it basically stops iterating this infinite collection, really nice.
15:21 OKKEN: In the little example you give, you did like a for item in pipeline at the end, too. So you can use the result as a generator, also.
15:30 KENNEDY: Yeah, exactly. Because it's a generator, that whole sequence of operations doesn't actually execute until you iterate it, until you pass it to something that itself does. Like, pass it to a list or to a dictionary, or something like that, yeah. Or use it in another one of these pipelines. So it's really cool to build up these data pipeline, data transformation things using flupy. And they also have a CLI. So you can do this kind of stuff from the CLI for apparently this guy's data science team, not everyone necessarily is great with Linux, but they would like to do a little bit more with these tools and with these ideas, these Python data pipelines. So you can actually execute these sorts of operations on the command line, as well.
16:13 OKKEN: I like it. And then would you want to cover Asq?
16:16 KENNEDY: I do, I want to cover Asq. A-S-Q. So this is another one, I don't think we've covered it, but I guess it's possible that we have. There's kind of two philosophies here. They both do the same general thing, flupy and Asq. But the operations that you get to apply these fluent collections changes or varies. The flupy is like map and filter and so on, and the other idea is if we work with this LINQ operation, this LINQ API, which comes from the C# space, is actually, I think one of the better parts of C#. It really maps very closely to the SQL query language with where, select, orderby, and so on. So, asq is very similar to this for Python. It's exactly the same thing. A fluent API on top of collections. But here's a collection of words, like zero, one, two, three as written out. You can say query(words).orderby(len) then by and you pass it, here's just ordering alphabetically, but you can pass something else. And take(5).select(str.upper).to_list(). And what you get back is a list of the first five in that ordered by length one, so the short ones like one, six, ten, and so on. To me this database like operators of asq is a little bit nicer than the map filter reduce syntax of the other one. But they're both valuable, and they both have the same idea so I kind of thought it'd be fun to just cover them both here.
17:53 OKKEN: I think it's good there's both of them. Because it kind of depends on what sort of mindset you're thinking about. It's a different way to think of similar problems.
18:01 KENNEDY: Yeah, but they both solve similar problems in really interesting ways. You could say a list comprehension. A list comprehension could do a filter, and it could do a map, but it doesn't I wish there was just a little bit more on those types of things. Like, for example, wouldn't it be great if there was a orderby clause in a list comprehension instead of having to do the list comprehension and then do a sorted on top of the result? Things like that. It's like you could almost not do these things with list comprehensions, but not quite. You can't do a paging and stuff. So it just adds that little bit extra that often you really need and practice.
18:38 OKKEN: Yeah, true.
18:40 KENNEDY: Nice. So, there's a pretty exciting blog that's kicking back into gear.
18:43 OKKEN: Yeah, it's a blog called "Neopythonic." And it's Guido's blog. The last time we saw him, looks like the last blog post before November was in July of 2016. So, one of the benefits of him stepping down as BDFL, is that he can blog now, so this is a good.
19:05 KENNEDY: Yeah, and the one from July was just him turning off comments because people were spamming us, so it was even farther back than that. It's great to see him getting back into action, so we have this, we have him doing some more interviews with the MIT podcasts, video series and whatnot, so it's really great.
19:25 OKKEN: So he's just answering listener questions, or student questions, really. And a couple of the questions are whether to choose a nine-to-five job, or be an entrepreneur, and the other one is will AI make human software developers redundant? Of course, the answer to the second one is no. Because software development is not boring, and AI is to replace boring tasks, like driving a car. But the nine-to-five versus entrepreneurs is interesting, because, as we know, he's worked for other people for a while.
19:59 KENNEDY: Yeah, and some really interesting places, right? Like Google and Dropbox and stuff like that.
20:04 OKKEN: Yeah, I mean they kind of tie together because at the end of the nine-to-five question, he said, "Just be sure to not take a job that can be immediately replaced by a computer."
20:15 KENNEDY: Yeah, for sure. Did you ever see the posters called "de-motivaters?" I think that's what it's called.
20:20 OKKEN: No, it sounds great!
20:22 KENNEDY: You know how those posters that corporations will hang up, they just make you cringe? It'll be like an eagle flying over, like, ocean with a sunset and it'll say, "We all soar together" or, "We can all soar higher" Something, you're like, whoa, no, come on. So, de-motivators have pictures like that, but the caption isn't something super positive. One example would be there's a hang glider flying over an ocean into a sunset, and the caption is something to the effect of, "It's so great to dream big and to fly out, soar towards your dream. It's even better to sit on the shore and watch people sink into the ocean when they..." You know, stuff like this, right? There's one about what you're talking about here, and it says, It's got some picture of people working together, and it says, "Your job is boring. So boring that it'll probably be replaced by robots in 10 years. You should find a new job." But it's got this beautiful picture. Maybe I'll try to find it, and link to it, but yeah, they're really good.
21:27 OKKEN: Yeah, that's dark. Nice.
22:49 OKKEN: That's cool.
23:07 OKKEN: Okay, well this'll be fun to play with.
23:10 KENNEDY: Yeah, it's really interesting. And then they also have a backend database, and backend services where you can write some code that runs in a docker container, or something. So you have a server-side bit, and a client-side bit. But most of the code you write, by default, would probably end up in, if you don't restructure things, probably end up as Python running the browser. And it's really quite nice. So here's a cool way for people who don't really know that much about web apps, but they know Python, A really nice way to quickly get up to speed with, what I think of is forms-over-data type web apps, here's a grid. Here's a bunch of input boxes I need to enter that data, save it, put it into a list, and show it to you. Really quite common types of apps for internet-type apps. So, check that out, it's pretty cool, and it does have a paid version, but there's also a free version. Maybe it's interesting to people, but I think it's just interesting as a case study of how they're using Python in the browser as well.
24:08 OKKEN: Yeah, cool.
24:09 KENNEDY: Alright well, that's it for our six items this week. Brian, got anything else you want to share with everyone?
24:15 OKKEN: Yeah, I've got big news. Big news! Anyway, not huge news, but I wrote a book called "Python testing with Pytest." Did you know that?
24:25 KENNEDY: Yes, I did. It's awesome.
24:27 OKKEN: Yeah, so we talked about it a lot for the first year we were doing this. But, the news related to that is there's a second printing out. I updated it to ran everything with Python 3.7, and the modern version of pytest at the time, and then also fixed a whole bunch of the errata that people that pointed out things like little typos, and things like that. And updated the source code, stuff like that. So it's available. P2 is available. P2 must be second printing. How about you, any news?
25:02 KENNEDY: No, nothing. Nothing right now. I am in just massive podcast recording mode trying to get ready for the holidays. Both so that I can take a week or two off, and it's also hard to get guests to come on. Over the holidays, obviously, they want to do chill-out stuff. It's been all recording, all the time. I have cool things I have recorded, but they're not out yet. And we've even done one together, right? That'll be fun.
25:27 OKKEN: Yeah, and then I had you on mine too. Because I have the same problem you do.
25:33 KENNEDY: We've got a lot of fun stuff coming on Test and Code and Talk Python, and of course, as always, PythonBytes.
25:39 OKKEN: So we're not taking a break over Christmas break? We're going to try to go all the way through?
25:44 KENNEDY: That's right. We may physically be taking a break, but due to the magic of time shifting in podcasting, the podcast will not be taking a break.
25:51 OKKEN: Yes, wonderful. Well, thanks for talking to me this morning.
25:54 KENNEDY: Yeah, you bet. Bye. Thank you for listening to PythonBytes. Follow the show on Twitter via @PythonBytes. That's PythonBytes, as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, This is Michael Kennedy, Thank you for listening and sharing this podcast with your friends and colleagues.