


Transcript for Episode #196:
Version your SQL schemas with git + automatically migrate them

Recorded on Wednesday, Aug 19, 2020.

00:00 Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 196, recorded August 19, 2020. I am Brian Okken. And I'm Michael Kennedy. And we have a sponsor this week, Datadog. Thank you, Datadog. Oh, yeah. Thanks, Datadog. First off, I want to talk about Django a little bit. I've always heard Django is super easy, and that's why people choose it, because it's really easy to get started, and it has all these things that make working with Django easy, and so on. Right? Yeah, I think there's a lot going for it. The community seems pretty awesome. There are a lot of tutorials, and there's a lot of expertise that can help you out. So there's an interesting article by Daniele Varrazzo called "Surviving Django (if you care about databases)". I mean, "Surviving Django", right off the start, that's an odd title for an article about Django. It's going to be kind of hard to summarize, but basically he has a different take on how to deal with databases than what is normally taught around Django. It's an interesting perspective, and the gist of it centers around this: a lot of parts of Django are meant to be database agnostic, so you could use MySQL or Postgres or something else. But he says that in reality people don't do that. People don't really switch databases that much. So if you really want to utilize a database, and some of the great things about whatever database you pick, maybe not being database agnostic is good. He also talks about how to set up schemas and database migrations using the database itself, not the built-in Django stuff. At first it seems a little bit like, why would I do that? It seems more technical than what I want to do with Django. But there is some reasoning around it. And then he also shows exactly how to do this, how to do migrations, how to do schemas, and it really doesn't look that bad.
It's an interesting take. I was curious about what the rest of the Django community would feel about this. But then, in the comments on the article, there's a really nice, civilized discussion between the author and Paolo Melchiorre, I think, and Andrew Godwin (I've definitely heard of Andrew before), and some others, talking about that take. One interesting comment was that well-written articles like this are a good way to point out some of the possible pitfalls with Django, because, you know, there are a lot of fans of Django who really aren't going to talk about the bad parts. And this isn't necessarily a bad part. It's just something to be aware of. Another really interesting comment, by Andrew, was that he agrees that at some point in a project or company's life, when it's big enough, SQL migrations are the way to go instead of the Django migrations. Migrations in the out-of-the-box state are mostly there to supplement rapid prototyping. Like a lot of Django, they can be removed or ignored progressively, if and when you outgrow the single set of design constraints you accepted when you chose them. So that take, that using Django's migrations and all the agnostic stuff might be good early on, and then maybe slowly moving toward using your database more directly later, is an interesting one. Yeah, that's cool. A bit of "practicality beats purity" on both ends. This article also made me really appreciate the Django community, because this was not a flame war. This was a civilized discussion about a technical topic. And what, on the internet? Yeah.

03:48 It's great. Yeah, that's really cool. However, a few comments. One, I've switched from one database backend to another three or four times on major projects, where you're like, you know what, this is just not doing it, or we've outgrown this, or whatever. So it happens. But at the same time, that's usually not MySQL to Postgres. It's usually relational to non-relational, or something massive where it's going to require a rewrite anyway. So I do like the idea of saying, you have this capability to be completely agnostic, but you're working with the lowest common denominator there, and that's usually not the best choice if you're writing an application. Maybe if you're working on a library, tons of people are going to use it in ways you don't anticipate. But if it's an application, you know how it's going to be used most often. Yeah, also, some of those speed improvements you can get out of a database, you really can't do too much of with the agnostic front end. You kind of need to know the specifics of that database. So, pretty cool. For this next one, I want to talk about an interesting pattern that Python uses, I guess, an interesting technique. So you know the id function, right? You can say id of

05:00 thing, and it'll give you a number back. And it basically tells you where it is in memory. Are you familiar with this? I guess I don't use this. Yes. It's for when you want to know, if I'm given two variables, are they actually referring to the same object, or do they just have the same value? Right? Like if I had a dictionary, and I want to know, is it the same dictionary, or does it just have the same keys and the same values for those keys? You can say id of one thing and id of the other. In CPython that'll actually give you the memory address, but in all Pythons it gives you a unique identifier that is guaranteed to be different if they're different objects and the same if it's the same object, right? Okay. Okay. So one of the things that Python does is really interesting. And this is all research I've pulled up from working on my Python memory management course, which is probably out by the time this comes out. But you don't have to take that to care about this. So one of the things that's really interesting in Python is that everything is a pointer, right, allocated on the heap, including numbers and strings and other small stuff that might be allocated on the stack in languages like C# or C++ or whatever, right? So numbers in Python are way more expensive than they are in languages that treat them as value types rather than reference types. For example, the number four uses 28 bytes of memory in Python, whereas the number four could use one, two, four, or eight bytes in the languages that treat them as value types, depending on whether they're shorts or longs or whatever. Right. So there's this cool design pattern called the flyweight pattern, and I'll just give you the quick rundown on that. Flyweight is a software design pattern. A flyweight is an object that minimizes memory usage by sharing as much data as possible with similar objects. Right. So that's from Wikipedia, and I'll link over to that. Python does that for numbers.
So if you compute, through some mathematical function, the number 16, and then some other way you compute the number 16, and then somewhere else you parse a string into the number 16, those are all literally the same 16 in memory. Okay. Okay, because 16 is pretty common. But if you computed 423 three different ways, that would be three copies of 423. So Python uses this flyweight pattern for the numbers from negative five to 256, and you'll only ever have one of each of those in the runtime. But beyond 256, or below negative five, those are always recreated. Isn't that interesting? It's very interesting. Yeah. So it doesn't matter how they come about. Basically, if the runtime is going to generate the number, say, seven, as an integer, it's going to use the same seven, which is pretty cool. I actually have some example code that people can play with. It creates two lists of a whole bunch of numbers, in separate ways, and then says, you know, are these the same number or not, which is pretty cool. I was just playing with it right now. If you assign x to one, you can do an id of both x and one, and it'll show up as the same number. But if you assign x to minus 10, x and minus 10 have different ids. Isn't that funky? Yeah, because numbers in Python are extra expensive, Python takes special care to not recreate these very common numbers, and apparently "very common" means negative five to 256. Good.
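For anyone who wants to try the small-number sharing described above, here is a minimal sketch. It is not the example code mentioned in the episode. The helper names `built_twice` and `is_shared` are made up for illustration, and the behaviour it probes is a CPython implementation detail, so other interpreters or future versions may differ.

```python
import sys


def built_twice(n):
    """Build the integer n two separate ways at runtime.

    Going through str() and arithmetic keeps the compiler from folding both
    values into one shared constant, so any sharing comes from the runtime.
    """
    parsed = int(str(n))            # parse it from a string
    computed = int(str(n - 1)) + 1  # compute it with arithmetic
    return parsed, computed


def is_shared(n):
    """True when both ways of building n hand back the very same object."""
    a, b = built_twice(n)
    return a is b  # equivalent to id(a) == id(b) while both are alive


if __name__ == "__main__":
    # Values inside and outside the range mentioned in the episode.
    for n in (-6, -5, 0, 16, 256, 257, 423):
        print(n, "shared" if is_shared(n) else "separate objects")
    # How many bytes one small int object takes on this interpreter.
    print("size of the int 4 in bytes:", sys.getsizeof(4))
```

Comparing with `is` (or `id`) only makes sense while both objects are still alive, which is why the helper keeps both values in local variables before comparing them.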

08:28 Anyway, I thought this flyweight design pattern concept, and how it's applied to the numbers, might be interesting to people. And there's a little example code that I included there. It's not really an article, it's more like an idea with some code. Yeah. So, as a user, can I use the flyweight pattern in Python for other stuff? You totally should. Yeah, like, imagine you've got some objects you're creating, and instead of recreating them over and over when they're being used in a lot of places, you could totally create some kind of shared lookup for certain common ones. Maybe you're creating states, US states or countries or something, and the state has a bunch of information about it, but then you often have to go, all right, what state is this? Give me that information. Right? You don't necessarily need to recreate that. You could just create 50 states, keep them in memory, and never allocate them again. Okay. I guess caching and memoization are ways to do something similar, but only one thing at a time. Exactly. The big, important thing here, to make this work correctly, is that they have to be immutable, right? Because if one person gets the state Georgia, and it has certain values, and another person gets it and says, oh, it has a new county, let's add that, then it's like, wait a minute, I've changed it for everyone, because I haven't created a different thing. So it's got to be immutable, which is why it works for numbers, and you could do it for strings and things like that. Okay. That's pretty cool. Something else that's really cool is Datadog. So thank you, Datadog, for sponsoring this episode. Let me ask you a question. You have an app in production that's slower than you like. Its performance is all over the place. Sometimes fast.

10:00 Sometimes slow. Now here's an important question: do you know why? With Datadog, you will. You can troubleshoot your app's performance with Datadog's end-to-end tracing. Use the detailed flame graphs to identify bottlenecks and latency in that finicky app of yours. Be the hero that got the app back on track at your company. Get started today with a free trial at pythonbytes.fm/datadog. Awesome. Thanks, Datadog. You know what else is awesome? Pip installing a thing, when I pip install something and it happens right away, and it's not like 30 seconds of compile time, like, say, uWSGI, just to get the thing installed. And I don't have to have MSBuild or vcvars.bat set up right, or whatever.

10:46 Yeah, so I'm definitely grateful for wheels. There were fewer wheels in the world when we started this podcast, I'm pretty sure. Yep. Most of the common packages have migrated to distributing wheels, and package authors have had to care about this a lot. So I want to talk about this article. It's on the Real Python blog, from Brad Solomon, called "What Are Python Wheels and Why Should You Care?". One of the things I really love about this is, like I said, a lot of package authors have already gone through this and understand some of the ramifications, but as a normal, casual user of pip install, we don't really think about it. The first half of this article talks about what the user's perspective is, and it's a nice look at what happens when you say pip install something. And it's cool because it uses an example. I'm glad they list an example, and it's a particular version of uWSGI, because most packages are wheels now. But if you install something that is not a wheel, it's probably a tarball. I don't know if there are other options other than tarballs. But anyway, the tarball is something that ends in .tar.gz, so it's tarred and gzipped. And that's a whole bunch of Unix speak that you don't really have to care about. But pip downloads this blob of stuff, and then unpacks it, and then pip calls setup.py and some other stuff to build the wheel after you download it. And then it labels it, and then it installs it. There's a whole bunch of steps in there. Plus, it's calling setup.py, so there could be really any code in there. And so that's kind of creepy. The difference is, if you actually have a wheel instead of a tarball, pip install will just pull this down and install it, and it doesn't call setup.py. That's really nice, actually, because it's one of the things I think a lot of people don't realize until they're like, oh, wait, what just happened?
When you pip install something, you're running semi-arbitrary code off of the internet. That's not ideal, right? With wheels, you don't have to run that, because basically it's the sdist version that runs setup.py, I believe. So it's really nice that wheels can cut out that Python execution. Plus, I'm not sure what the technology is here, I think it's probably just that it's already precompiled, and there are operating system specifics, but wheels tend to be smaller than the tarballs, so they download a lot faster. Wheels have a bunch of stuff in the name, and it's not just random stuff, it's specific stuff. It says what distribution it is, it's got the version number, it's got maybe build identifiers, and which Python it's for, whether it's Python 2 versus Python 3 or a specific version. And then the platform is one of the important bits. If you have compiled code, then there's kind of a different CI pipeline to try to build all those wheels. But on the user end, we don't care about it. One of the interesting bits about moving towards wheels is that there's often a whole bunch of different packages up there. That's something users will see if they look at what downloads are available. There'll be this whole slew of stuff. And for the most part, you don't have to care about that. If you do pip install, it will just pick the right one for your operating system. However, it's good to be aware of those, because if you are creating a cache of stuff at your office or something, you may want to cache more of those, depending on which operating systems are being used around you. So that little discussion, I think, is pretty cool. Anyway, I'm not going to get too much into it. This is a good article for "Yeah, I use wheels, but what are they?" It doesn't get too deep into it, but it's nice.
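To make the "stuff in the name" concrete, here is a small sketch that splits a wheel filename into the pieces listed above. It assumes the standard layout of distribution, version, optional build tag, Python tag, ABI tag, and platform tag, separated by hyphens. The `WheelName` type, the `parse_wheel_filename` helper, and the `example_pkg` filenames are made up for illustration and are not taken from the article.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    """The pieces encoded in a wheel filename."""
    distribution: str
    version: str
    build: Optional[str]   # optional build identifier
    python_tag: str        # e.g. py3, cp38
    abi_tag: str           # e.g. none, cp38, abi3
    platform_tag: str      # e.g. any, win_amd64


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename on hyphens into its five or six fields."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    # Hypothetical filenames: one pure-Python wheel, one platform-specific.
    for name in (
        "example_pkg-1.0.0-py3-none-any.whl",
        "example_pkg-1.0.0-cp38-cp38-win_amd64.whl",
    ):
        print(parse_wheel_filename(name))
```

The platform-specific name is the kind of file that shows up many times over in a package's download list, once per operating system and Python version, which is why a local cache may need more than one of them.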
Yeah, wheels are definitely nice. And another solid article from Real Python. So, very nice. You know what else is good? pandas. I've heard pandas does a lot of cool stuff. Actually, pandas is really, really cool. You can do a whole bunch of interesting things with it.

14:51 And Jack McKew, he's been on fire lately. He's created all these different projects. He keeps sending them over, and I'm like, oh my God, and it's like, no, this is another one I created.

15:00 And a lot of them are cool. One of the things he created was Awesome Python Bytes. So hat tip to Jack on that. That's cool, it's like all the awesome stuff that we happen to have covered periodically. But this one is called pandas_alive. And so, Brian, to get the experience of this one, you need to open it up and just scroll through the readme on the GitHub page. Just look at the animations. You've probably seen these racing histograms or racing bar charts that show stuff happening over time, like, here's the popularity of web browsers all the way back from 1993. First it was Mozilla, and then Netscape, and then IE, and then, you know, you see them growing and moving over time. So this is a package where, if you have a pandas DataFrame in a really simple format, where the columns are basically the different things you want to graph, and they're all arranged by a common date, and they just have numbers, you can turn that into a really cool bar chart race type of thing, or a line graph race, where it's just this animation of those over time, over the dates that you have in there. Oh, I really like this. Isn't this cool? Yeah. And, I mean, the race charts and stuff, those are cool. But then you can also do the line graphs, growing and zooming. Yeah, you can do line graphs. And you can do other types of things, little scatter plot type things. You can also do pie charts. But you can even have them together. And you have maps. So if you want to have a map evolving over time, with different countries or counties fading in and out, you could have those two graphs animated side by side at the same time. So you can have the chart of the bars as well as the map, all animated together in one graph. Cool. Seems pretty awesome. Well done, Jack. It's based on, I believe, matplotlib. And basically, it'll render a bunch of different matplotlib renderings into an animated GIF.
So all you have to do is just go, like, DataFrame dot plot_animated, and give it a file name. And then this happens. Oh, that's cool. So you could just generate this GIF and then put it wherever? Exactly. You put it on your website, or put it wherever you want. You could share it on Twitter, I guess, even, right? It doesn't require a JavaScript backend running something in your Jupyter notebook, and all that kind of stuff to wire up. No, it's just an animated GIF that comes out. Nice. This is mesmerizing. I could just watch these all day. You could watch it for quite a while. So, yeah. Anyway, I really think that's a cool project if you want to visualize data over time, and there are a lot of good reasons to do that. And one of the things it has there is animated maps. But maps are something else, also. There's also a map function, which has nothing to do with geographic maps. You probably learned Python a long time ago, but do you remember being surprised by map? Oh, yeah, map and all those things, they always confused me. And I've always tried to basically avoid them.

17:50 And I've successfully mostly done that. But I also know how useful they can be. So tell us about this. This is an article from Kathryn Hancox, "How To Use the Python Map Function". I'm sure people have heard of the map function. It's an extremely useful thing, and it's a built-in. What it does, if you're not familiar with it, is it takes two or more parameters. The first parameter to map is the function that you want to apply. And then, let's say you give it an iterable as the second argument, like a list or something. It takes that function that you passed in and applies it to absolutely every element of the iterable. The normal use is often with a lambda function or something, to apply some quick thing. If I want to do x times two, or x squared, or something like that, and apply that to every element, you can do that, and you can take one list and make another. I think it's good for people to read about these every once in a while if they're not using them often, because they do come in handy in places all the time, for me at least. It's not an obvious thing if you're not used to this sort of function from other languages. I wasn't, coming from C. Yeah, and maybe Perl has something like this, but I never used it. So that's the normal use of applying it. One of the things I like about this tutorial is it goes through a few different things. So, applying lambdas to a list or an iterable. And then the function that you apply doesn't have to be a lambda. It could be your own user-defined function, or it could be a built-in function that you map over it. I wanted to warn people about the part where she's talking about the user-defined function. It's oddly complex, for some reason.
I'm not sure why this was made so complex, because user-defined functions just work like any other function you use with map. But one of the things that I got out of it is something I had forgotten: map applies the function to the iterable one element at a time, and it doesn't do it ahead of time.

20:00 So, for instance, and I was like, really? And I had to prove it to myself by putting a print statement or something in a function. But what happens is, let's say I've got an iterable hooked up to grab a huge data chunk out of a stream or something. I can apply some function to each element as I'm pulling it out, using map to do that. So I can iterate over map. Map returns a map object, which, whatever, it doesn't matter. If you use it in an iteration, every element you get is the answer after you apply the function. Like a custom generator type thing. Yeah. And then, if you want it as something solid, you can convert it to a list or a tuple or something like that, if you want to do everything at once, like turning a generator into a list.
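The "prove it to myself" experiment described above can be sketched like this. The `double` function, the `calls` list, and the `demo` helper are made-up names. Recording the calls in a list stands in for the print statement mentioned in the episode.

```python
calls = []  # records each argument the mapped function actually receives


def double(x):
    """A user-defined function that notes when it is called."""
    calls.append(x)
    return x * 2


def demo():
    """Show that map() does no work until its result is iterated."""
    calls.clear()
    lazy = map(double, [1, 2, 3])  # returns a map object, nothing runs yet
    before = list(calls)           # snapshot: no calls so far
    first = next(lazy)             # pulls one element, so one call happens
    after_first = list(calls)
    rest = list(lazy)              # convert the remainder to a "solid" list
    return before, first, after_first, rest


if __name__ == "__main__":
    print(demo())
    # The same idea with a lambda, converted to a list in one step.
    print(list(map(lambda x: x * x, [1, 2, 3])))
```

Because the function only runs as elements are requested, the same pattern works when the iterable is a stream that is too large, or too open-ended, to process up front.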

20:49 There's some honesty here too. One of the other things I often forget about map is that you can map across multiple iterables. If you have a function that takes multiple arguments, you can pass it multiple iterables, and it'll take them element-wise, so the nth element out of each list, pass those to the function, and then return the answer to that, which is cool. The other thing, a good comment in this, because it's a similar problem area, is that comprehensions kind of do the same thing. So when would you use map versus a comprehension? And the advice in this article is that comprehensions are very useful for smaller data sets, but often for large data sets map can be more powerful. So that's a reason. And sometimes you want to do operations where, if you had to go over different collections, the data would make a really nasty-looking comprehension and stuff. So, cool. It also can do pandas type of things a little bit, like multiplying vectors, right? Like if I've got two lists, and I want to have pieces put together, like that power example that's in there, right? It'll take the first element of the first one and the first element of the second one, and then apply the function, and generate a new list, effectively, as if you had sort of done vector multiplication, which is cool. Or like cross multiplication. Yeah, I often use map also when I want to muck with something, and it seems a little cleaner to me to iterate through something if I know I'm looking for something and I'm not going to get to the end of the data, or I'm using endless data. So we spoke earlier about databases, and I've got another one for us, this cool thing called automigrate. It's a project called automigrate. Okay. So what it does is, it's kind of like you talked about with Django migrations. And we also have SQLAlchemy migrations with Alembic. But some people, either they're not using an ORM at all, in which case those tools are useless.
Or they want to very carefully write the SQL scripts that control their databases. For some people, there's a group of DBAs that manage the database, and that's that, right? We're not going to run just random tooling against the database, we're going to run scripts that are very carefully considered. So this automigrate thing, what it will do is, if you have those DDL, data definition language, scripts that say create table, add column, and so on, all you have to do is have the script that says, here's how we create something from scratch. You put that into GitHub, and then you make changes to it. To add a column, I go and edit the CREATE TABLE statement, and I just type the new column in there. And what this will do is, it'll look at your Git history, and it'll do diffs on the CREATE TABLE statements, and it'll generate the migration scripts from that. Now, that's really cool. That's neat, right? So all you've got to do is maintain the "here's how I create the database" script, and it'll work out how to go from this version to that version. Here's the script that would actually do it. It'll do all that stuff for you. Nice. Yeah. So if that's your flow, if your flow is to work with these DDL files, i.e. SQL files, this seems like a great tool. Now, they do say, oh, this is way better than an ORM or something, because in those, like Alembic, what you have to do is go and write the migration scripts. Here's how you migrate up, here's how you migrate down. But they left out a little important thing, dash dash autogenerate, which looks at all of your classes and your database and goes, here's the difference, we automatically wrote that for you. Which I think is way nicer, even than this project. So I think Alembic is better. But the big requirement there is that you are using SQLAlchemy.
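As a rough illustration of the "diff two CREATE TABLE statements and emit a migration" idea, here is a toy sketch. It is not how the project discussed above works: it takes two DDL strings directly instead of reading Git history, it only detects added columns, and it leans on an in-memory SQLite database to parse the DDL. The names `columns_of`, `add_column_migration`, `OLD_DDL`, and `NEW_DDL` are made up for illustration.

```python
import sqlite3


def columns_of(ddl: str, table: str) -> dict:
    """Run the DDL in a throwaway in-memory SQLite database and read back
    the columns of `table` as {column_name: declared_type}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    finally:
        conn.close()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in rows}


def add_column_migration(old_ddl: str, new_ddl: str, table: str) -> list:
    """Return ALTER TABLE statements for columns present only in new_ddl."""
    old_cols = columns_of(old_ddl, table)
    new_cols = columns_of(new_ddl, table)
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {col_type};"
        for name, col_type in new_cols.items()
        if name not in old_cols
    ]


# Two versions of the same "create from scratch" script, as if taken from
# two commits; the second one has a new column typed into it.
OLD_DDL = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
NEW_DDL = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);"

if __name__ == "__main__":
    for statement in add_column_migration(OLD_DDL, NEW_DDL, "users"):
        print(statement)
```

A real tool also has to handle dropped or renamed columns, type changes, indexes, and the SQL dialect of the target database, none of which this sketch attempts.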
If you're not using SQLAlchemy to do these migrations, but you're using the scripts instead to define your database, like I'm sure a lot of people are doing, especially at the larger companies where there's a database team or DBAs and so on, then this seems like a really cool project for it. That said, the converse is actually pretty cool. What it can do is look at a database and generate your SQLAlchemy files for you. That's pretty cool. That's nice. Yeah, it'll generate ORM definitions from

20:49 SQL, right, using the SQLAlchemy generator, which is pretty awesome. So you can say, here are my CREATE TABLE scripts, generate me the corresponding SQLAlchemy thing to match that. So that direction is pretty awesome also. So which one does that? This one, this automigrate. It'll look at your DDL, the create-these-tables scripts, and it'll turn it into Python SQLAlchemy classes. But the reverse, it was saying, oh, it's painful to use Alembic in the other direction. But if you use the autogenerate feature of Alembic, then it's also not painful. But there are certainly a couple of use cases that are pretty awesome here. One is starting from all the create stuff: given a database, just ramp me up to getting a SQLAlchemy set of classes that will talk to it as quickly as possible. That's really cool. Yeah. If I've got a schema change, is there a version number that's stored in the database somewhere to say which version of the schema is being used? Yeah, I have no idea about this thing. Okay. With SQLAlchemy and Alembic, there is a version number. It says, I'm version such-and-such hash, and each of the migrations has one of those hashes. And each migration says, the one that came before me is this, and the one that comes after me is that. So they can look at an existing database and say, you're version X? Yes, exactly. For Alembic. I have no idea about this thing. This thing could potentially look at the table, basically ask the database to script the CREATE TABLE stuff for me, and then compare that to what it has, potentially. I have no idea if it's that smart, though. Okay. Yeah, but it looks like it could be handy for a lot of folks. Well, I've had a rough week, so I've got no extra stuff. No extra stuff? No extra stuff. I don't have too much, either. I have a little bit. I just want to give a shout out that we have a ton of new courses coming.
And I just want to encourage people, if they're interested in these, to go to training.talkpython.fm/getnotified and put their email there, if they haven't created an account or signed up there before. Because we have Moving from Excel to Python with Pandas coming out, Getting Started with Data Science coming out, and Python Memory Management Tips coming out. All three of those will probably be out within, like, a couple of weeks. And then Getting Started with Git, and Python Design Patterns as well. So there's a bunch of cool stuff. If you want to hear about any of those, just be sure to get on the mailing list. Oh, wow. That's cool. If I didn't talk to you every week, I would totally get on this mailing list. Awesome. But I'm already on it. I'm sure you are. Because you do talk to me, though, you get jokes, definitely. But everybody who listens gets them too. So that's right. This is a fun game to play. The idea is you take some actual, legitimate classical painting. And, you know, if you go to an art gallery, it'll say, like, Flowers in Bloom, oil on canvas, Monet, 19-something, or 1722, or something like that, on the little placard underneath. So the game is to reinterpret these paintings in modern tech speak. Okay, yeah. So here, I'll do the first one. I put three in the show notes that people can check out. I'll describe this to you, then I'll read the little thing. So there's a ship that seems to be on fire, with some extremely strong guys trying to drag the ship out of the water. Or maybe they're pushing it into the water. And a bunch of folks on the edge, seeing it off. It's like a Viking ship. I think they're actually cremating somebody and sending them out. Anyway, it's this historical picture. And the placard says: "Engineers remove dead code after dropping a feature flag." Sir Frank Bernard Dicksee, 1893, oil on canvas.

20:49 Do you want to do the next one? Oh, sure. You pull it up. Okay. How do you describe this? This is a Picasso picture of, like, an abstract violin. Yeah. Yeah. It's hard to tell, really, what's going on. It kind of looks like a violin. And the title is "CSS without comments". That's good. By Pablo Picasso, 1912. All right, the last one. The last one. By the way, there are hundreds of these, and they're all really good. So this one is a little disturbing. There's a person who looks deathly ill, with a bunch of, like, gargoyles over them, and a priest with a crucifix, kind of glowing, apparently trying to ward off the gargoyles. And the placard says: "Experienced developer deploys hotfix on production." Francisco Goya, oil on canvas, circa 1788. That's good. Yeah. So there are just so many of these, you can go through them all day. It's really fun. Didn't we do that once, at one of the PyCons? I think you might have been with us. I know Chris Medina, Kelsey Hightower, and I were walking around the Portland Art Museum, basically playing this game. We were coming up with placards. It was fun. And were you there for that? You might have been. No, I wasn't.

20:49 I missed that one. That was good. I remember that, when we could go to conferences, and there were other people around you, close. It was weird. Actually, we don't need anybody to contact us and tell us that we have no idea when different painters were alive. But thanks, and cool, good for you if you know that. Awesome. Yeah, these are really good. If you enjoy this kind of stuff, there are hundreds of fun pictures to go through. And I think it's also amusing that we often pick visual jokes for an audio format. So sure, why not? Do it the hard way, that's what we do, right? Let's do it with abstract art. Yeah.

20:49 Yeah. Anyway, awesome. All right. Well, thanks, Brian. Thank you. Bye. Thank you for listening to Python Bytes. Follow the show on Twitter at @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. This is Brian Okken, and on behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
