#243: Django unicorns and multi-region PostgreSQL

Published Wed, Jul 21, 2021, recorded Wed, Jul 21, 2021

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to

00:04 your earbuds. This is episode 243, recorded July 21st, 2021. And I'm Brian Okken.

00:11 And I'm Michael Kennedy.

00:13 And I'm Simon Willison.

00:14 Welcome, Simon. Thanks for agreeing to show up today.

00:17 No problem at all. I've been looking forward to this.

00:19 If anybody doesn't know who you are, can we do a quick, who's Simon?

00:23 Sure. So yeah, my name's Simon Willison. I've been doing Python bits and pieces for

00:28 around about 20 years now. So I'm a co-creator of the Django web framework from many,

00:33 many years ago. I think Django has definitely celebrated its 15th birthday now.

00:38 But more recently, I've been working on a set of open source tools around

00:42 this project I have called Datasette, which is a web application for exploring a relational database,

00:48 a SQLite database. But it also has tools for publishing those databases online,

00:52 building those databases out of lots of different sources of data. I'm trying to

00:56 bootstrap an entire ecosystem of data and analytics tooling around SQLite, because it turns out

01:02 everyone in the world has SQLite, even though they don't necessarily know that they have it.

01:06 And there's some really cool stuff that you can do with it.

01:08 Yeah, it's a really cool project.

01:09 Yeah, it is. If you wanted to create your own personal search engine, that would let you just

01:13 go and say, search your Gmail, your Twitter, your Instagram, and your file system all at once.

01:18 Yep.

01:19 That's pretty much it, right?

01:20 That's part of the tooling. Yeah, there's a whole side of it, which I've called Dogsheep for

01:25 ridiculous reasons. But the Dogsheep project is about personal analytics, it's about getting

01:30 your personal tweets and messages and all of the personal data about yourself into one place.

01:36 So you've got essentially a little mini data warehouse on your laptop that you can use to

01:40 query aspects of your own life. And that's been a really fun way of driving features in the software,

01:45 which can then be applied to like company databases and so forth as well.

01:49 Yeah, super cool.

01:50 Well, if I didn't want to do SQLite, I might want to use Mongo. What do you think?

01:55 You may want to. And so there's some big news around MongoDB. MongoDB 5 is out, which, you know,

02:03 I'm all about MongoDB, which makes me super excited. Probably won't switch right away because I

02:08 don't actually need the features that are there, but I'm super excited to see things going strong.

02:13 So some of the things that are relevant, and I think they're really relevant to Python people,

02:18 especially the data science side. So basically there's, there's two important things. One has

02:25 to do with working with time series and the other has to do with stability of the app that you don't

02:31 want to keep changing so that you can upgrade your database, right? Like if the database API

02:36 slightly changes, you don't want to have to deal with those incompatibilities until you're ready to take

02:40 advantage of the benefits of making those changes. So one of the things it comes with, in the

02:45 database, is native time series schemas and collection types. That's incredible. Yeah. So

02:52 you can do really interesting things like a moving average as a query across your data, and store

02:58 data in a format that's meant to make that incredibly fast and low latency, but you can also do like,

03:03 I would like the numerical derivative over time as a moving average, as a query or the integral

03:10 of this collection has it. So you can do like math as part of your query and get it to calculate those

03:17 things in really interesting ways. So the time series has things like clustered indexes and window

03:22 functions and all sorts of interesting things. So that's one. It automatically optimizes your schema

03:28 for highly efficient storage, which is pretty cool. That's, I think, independent of the time series, but I'm not

03:33 a hundred percent sure. The other big thing is the versioned API for future-proof apps.

03:39 So suppose you build against version, I guess five is the one that has it. So you build against version

03:44 five of MongoDB. And then eventually at some point, like, version seven comes along and it's like, oh, you can do this

03:49 new way of querying, but it's going to break some stuff. So if you want to use it, you've got to fix your app.

03:53 You can just say, I want the database to look like version five forever. And no matter what version

04:00 is in production, it'll, it'll behave the right way according to what you said you wanted it to

04:05 behave right. So you could say, I want version seven to be like five for me, but it can be version seven

04:09 for someone else. That kind of thing. Yeah. The other thing, the way that you talk to it, the way that you

04:13 interact with it is through just a terminal app you fired up or a command prompt app and you talk to it.

04:19 And traditionally this thing has been gross. It's been like, it's fine, but it has zero syntax

04:24 highlighting. It has zero autocomplete those types of things, right? So they're introducing a new shell.

04:31 So traditionally you would have typed mongo, Enter, connected. Now you type mongosh because the old

04:37 one is still there for compatibility reasons, but that one now has syntax highlighting, better error

04:42 checking, pretty printing, autocomplete, things like that. So if you're going to do stuff on the shell,

04:48 then you really should just run the new one. That's pretty cool. I'm going to go with

04:50 mongosh as the "Oh my gosh." Oh my gosh. What are you doing? Yeah. I'm running the shell, the new one. I know that's

04:58 pretty awesome. And then also they, they're talking about having serverless, serverless instances.

05:05 So like Lambda, Lambda type functions where you don't actually have to manage the database or things

05:10 like that. So I didn't know a whole lot about it. You can also watch the keynote and actually

05:14 their whole conference; the keynote is probably most relevant here. It turns out that, for a

05:19 public billion-dollar company or whatever they're worth, it's incredibly amateurish and,

05:24 and like more like a talent fair of like a high school or something like that, but whatever you'll

05:29 still learn. I mean, it's like, you'll, you'll see it's, it's like super. I have to check it out now.

05:34 Yeah. It's like worth watching for the, like the blush worthy, like, Oh,

05:40 Oh, Oh, come on. Okay. Well, let's just move on now, please. But nonetheless, you do, they do,

05:46 demo some interesting things and whatnot. So that's probably enough on that. But if you're

05:51 into MongoDB, MongoDB five has a lot of cool things to talk about there.
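
A minimal sketch of the moving-average-as-a-query idea from this segment, assuming it maps onto the `$setWindowFields` aggregation stage that arrived with MongoDB 5.0. The field names (`sensor`, `ts`, `temp`), the output name `movingAvg`, and the three-document window are all hypothetical. The function only builds the pipeline as plain Python data, so nothing here talks to a server.

```python
def moving_average_pipeline(value_field, time_field, partition_field, window_size=3):
    """Build an aggregation pipeline that adds a trailing moving average.

    All field names passed in are illustrative assumptions, not taken
    from the episode.
    """
    return [
        {
            "$setWindowFields": {
                # One independent series per distinct partition value.
                "partitionBy": f"${partition_field}",
                # Window positions are defined relative to this ordering.
                "sortBy": {time_field: 1},
                "output": {
                    "movingAvg": {
                        "$avg": f"${value_field}",
                        # Current document plus the (window_size - 1) before it.
                        "window": {"documents": [-(window_size - 1), 0]},
                    }
                },
            }
        }
    ]


pipeline = moving_average_pipeline("temp", "ts", "sensor")
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)` on a collection holding the readings.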

05:54 You know what else is cool and coming up?

05:57 Python 3.11.

05:59 We don't even have Python 3.10 yet. So, well, I do. The beta is available

06:06 for 3.10, you can run it, but the alpha is around for 3.11, which is neat. Nice.

06:12 And what I wanted to highlight here was enhanced error locations

06:20 in tracebacks. I'm so excited about this. This is so cool. So, I mean, Python has not been

06:26 that bad for tracebacks. I've, I've dealt with worse tracebacks, but the, it points out what

06:32 line that's going on, but sometimes there's like weird stuff, like None not being dereferenceable or

06:36 something. And you don't know what's going on, but now, in 3.11, it will point to exactly

06:42 what part of the line has the error, with little carets underneath pointing

06:47 exactly where it's at. That is actually super cool. So like the example you got on the screen here on the

06:52 announcement, you've got multiple objects having their fields accessed, like point_1.x, point_2.x. And the

07:00 error is 'NoneType' object has no attribute 'x', which is probably the most common error that you'll ever

07:06 find in Python. But what I like about it that you're pointing out here is like the second object is the

07:13 one that is none. And it actually highlights, no, no, not the first one, the second one, because there's

07:17 nothing about the error message that would tell you which of these two things was the problem.
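
A small sketch of the situation just described, assuming a hypothetical `Point` class and `manhattan_distance` function. Two `.x` accesses share one line, and only the second operand is `None`. The exception message is the same either way; on Python 3.11 and later, running this from a file should additionally underline the failing sub-expression.

```python
import traceback
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


def manhattan_distance(point_1, point_2):
    # Two attribute accesses with the same name on one line: if one operand
    # is None, the message alone does not say which one it was.
    return abs(point_1.x - point_2.x) + abs(point_1.y - point_2.y)


report = ""
try:
    manhattan_distance(Point(1, 2), None)  # the second argument is the culprit
except AttributeError:
    # Capture the formatted traceback instead of letting the script crash.
    report = traceback.format_exc()

print(report)
```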

07:21 Yeah. That's awesome. Yeah. And it's, it's deep into the, so if you have a deep stack trace,

07:27 it'll show you exactly where into it. And even like there, there's another example where it shows

07:32 like, deep into a dictionary, a deep dictionary dereference or something, right. And it

07:39 points out exactly which index is the one that's messing up. So that's pretty amazing. Also even

07:46 math, arithmetic expressions, like a division by zero, you've got multiple divisions, which one

07:52 is the problem? And it'll show you exactly which one it is. The thing I love about this

07:57 change is this is one of those things. This is absurdly difficult. Like this is like acres

08:02 of computer science and a bunch of people working together on this for, I couldn't even imagine

08:06 how long it took them to do, to make something, which is just a beautiful little incremental

08:11 improvement to our lives as Python developers. But if you, if you, I think the release notes

08:15 actually talk about some of the internal changes they had to make to get this to work. This

08:19 is like really deep stuff and it's totally worth it for what you get out of it. But it's, I think,

08:24 I think it's easy to look at this and think, okay, that's a reasonably sensible, small change.

08:28 And this was not a small change at all.

08:29 And I think it's going to dramatically increase the on-ramping of new people into Python because,

08:35 being able to like figure out what's wrong with your code, that's, you know, basics.

08:41 I mean, some of us old hatters, are used to digging into like confusing tracebacks, but,

08:47 some new people are not. So if we can make them less confusing, that'll be great.

08:51 Right. When I work with new programmers, it's so common. They get a traceback and

08:55 they freeze because this utter, utter meaningless junk has just shown up on their screen. And what

09:00 are they supposed to do with that? And here it feels like this is just such a huge improvement because

09:04 at least it's pointing to the bit in the giant blob of text that they should be paying attention to.

09:08 Yeah. Lovely. Yeah. I want it in 3.10 though, but we have to wait till 3.11.

09:12 From __future__ import nice stack trace, or traceback. Yeah. Very cool. All right. So Simon,

09:20 you got the third one. Tell us all about it. Okay. So fly.io, a hosting provider who I've been,

09:26 they launched about a year ago. I've been following along because they're doing some really interesting

09:29 stuff around hosting Docker containers and all my stuff is in Docker containers. So I'm always looking for

09:34 things where I can throw a Docker container to host online. Their secret sauce is that they do

09:39 geographic hosting. So you can ask them to run your container in like Tokyo and San Francisco and London,

09:46 and they will do that and they will direct traffic to the closest version of that app. So it's this

09:51 thing. I worked at Eventbrite for many years. And one of the things I was always trying to figure out was,

09:56 okay, could we run Eventbrite close to our users? Could we have like European, a database in Europe

10:01 and a database in New York and give people a faster experience that way? Incredibly difficult to do.

10:06 Right. But what a lot of people do is they do CDNs. So the static content, but then there's one server

10:12 somewhere that is really the one, right? That's the problem. It's the database, it's the application code

10:17 and then it's the database server, especially. And so what fly.io are doing is making it so much easier

10:23 to do this, that you could start a project and have it geographically distributed from day one without

10:27 having to think particularly hard about it. So I like that about them. But then they wrote this,

10:31 this, this article came out within the last week, I think. And it talks about their plan for multi-region

10:37 databases. And in that case, they're talking about Postgres and this desire to have Postgres data,

10:43 have like Postgres databases distributed around the world. And so when you're doing that, splitting up,

10:48 having writes go to multiple places remains incredibly difficult, but a very common pattern is you say,

10:53 okay, we're going to have the lead database in, I don't know, New York, and all of the writes

10:58 go to that. And then any of the reads get spread out to a replica database that's running in different

11:04 places around the world. And that's still a really difficult thing to set up with the geographic load

11:08 balancing. So what they propose is basically run your application all the way around the world and set

11:14 it up so that if anyone tries to write to the database and they're not talking to the lead database

11:19 server, the error gets caught and the application server replies to the Fly CDN and says, hey,

11:26 rerun this request against the leader database in New York. And so the user doesn't see anything at

11:30 all. The user attempts to do something and it works. And what's actually happened is they tried to do a

11:36 write against Tokyo. Tokyo said, oh, we can't handle writes. Fly invisibly, sort of internally, redirected it to

11:43 New York. And the write happened against New York and the result came back. And so this takes

11:46 geographically distributing your database reads, which used to be, I mean, I was thinking it was

11:52 going to be a team of engineers for six months to get this working. And it's just baked into their

11:56 platform. It's this incredibly elegant piece of sort of systems engineering design that they've done.

12:02 And I was fascinated. I've banged my head against this problem for so long and they just solved it.

12:07 You know, they just said, hey, here's a way it will work. We've shipped it, try it out.

12:11 As something of an architecture nerd, this really fascinated me.

12:16 This is fascinating. Yeah. And I can see just, you know, we've got like the retry

12:21 decorators and stuff for various Python functions. Like I could see almost a, you know,

12:26 like a retry-the-write decorator that you put on them. And it just goes, it catches the error and it just

12:32 goes, nope, we're going to send it everywhere it goes. And then, then return the result, right? Like

12:35 Yeah. And it's basically put decorators anywhere you're ever going to do a write and you're good

12:39 to go. Exactly. And in fact, they've even got example code for Ruby on Rails where you don't even

12:44 have to do that. They catch the database error that says, you know, you tried to do a write in a read

12:49 only transaction and they turn that into an HTTP header that replays it against the lead region.

12:54 And that's it. On the one hand, it's kind of an awful,

12:58 kludgy hack, but it's also genius. Like this is taking six months of engineering work and turning

13:03 it into add these five lines of code. Now your application works all the way around the world.

13:08 It fascinates me. Yeah. This is pretty interesting. Yeah.
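
The catch-and-replay idea discussed above can be sketched as a wrapper around a request handler. This is an illustration under stated assumptions, not Fly's code: `ReadOnlyTransactionError` is a hypothetical stand-in for whatever the database driver raises, the region codes are made up for the example, the 409 status is an arbitrary choice, and the `fly-replay: region=...` response header follows my understanding of Fly's documentation and should be checked there.

```python
class ReadOnlyTransactionError(Exception):
    """Hypothetical stand-in for a driver's 'write in a read-only transaction' error."""


PRIMARY_REGION = "ewr"  # assumed region code for the lead database


def replay_writes_in_primary(handler, current_region):
    """Wrap a handler so writes rejected by a read replica are replayed elsewhere."""

    def wrapped(request):
        try:
            return handler(request)
        except ReadOnlyTransactionError:
            if current_region == PRIMARY_REGION:
                # Already in the lead region, so this is a genuine failure.
                raise
            # Ask the proxy to rerun the whole request in the lead region.
            return {
                "status": 409,
                "headers": {"fly-replay": f"region={PRIMARY_REGION}"},
                "body": "",
            }

    return wrapped


def save_comment(request):
    # Pretend the local replica rejected the INSERT.
    raise ReadOnlyTransactionError("cannot execute INSERT in a read-only transaction")


tokyo_handler = replay_writes_in_primary(save_comment, current_region="nrt")
response = tokyo_handler({"method": "POST", "path": "/comments"})
```

Reads never hit the `except` branch, so they stay on the nearby replica; only requests that actually attempt a write pay for the extra hop.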

13:13 They also, I've got, there's one other link in the show notes. There's a second article they put out

13:17 a few days ago, which is just doing something. It's more about using Redis as a cache in your

13:24 geographical data centers. So you can have a local Redis, like, because I mean, their argument is

13:31 people in London tend to be interested in other things that people in London are interested in. Ditto for

13:35 Tokyo. So actually distributing your cache by city normally gives you really good cache hit rates.

13:41 But they also pointed out that, and I didn't know that Redis could do this. Redis can be set up to

13:46 allow writes to supposedly read-only replicas. So you can have a local cache that you're writing to and

13:52 reading from, but still have that leader Redis in your main data center that can send writes out to all

13:57 of those replicas. So that gives you cache invalidation from a central point. You can, in your sort of

14:03 lead Redis, you can say, "Okay, everyone delete the cache entry for whatever this thing is." And all of

14:08 those replicas around the world will then delete that cache entry, even though normally they're acting

14:12 independently. And yeah, it's, again, this is for, if you're a systems architecture design nerd, the stuff that

14:18 they're doing is so interesting. I think it's interesting, and I'm not one of those.

14:22 Maybe you are and you didn't realize.

14:25 You will be next year. You will be next year. Fantastic. Yeah, this is super cool as well. And

14:31 yeah, it seems really useful. You know, and it's perfectly in line with like, let's take our app and

14:37 put the logic in multiple places. Because that person is unlikely to move from Tokyo to

14:44 Virginia during a session. But once they start in one place, they're going to stay in that place.

14:51 And so the cache would reasonably just have like their local data on that one instance, right?

14:56 Yeah.

14:57 Yeah.

14:57 Cool. But maybe your CDN, or not your CDN, your CMS has generated a page and everybody needs that

15:03 always to be in sync, right? There's that global data as well. Yeah. So very cool. I like this.

15:08 Check it out.

15:09 Indeed.

15:09 Well, let's talk about unicorns.

15:11 I love unicorns. So unicorns, the magical creature. And Simon, I'm so glad that you're here

15:16 because we can get your thoughts on this, even if you maybe haven't been like deep down in it.

15:21 So not too long ago, we talked about HTMX, which I'm still a big fan of HTMX. It's a cool like

15:27 sprinkling of magic JavaScript stuff onto your page to make it more interactive. But if you're

15:33 doing Django, HTMX is very relevant, but there's also this thing called Django Unicorn at django-unicorn.com.

15:40 It's a magical full stack framework for Django. So the idea is that you can create these templates,

15:46 these interactive templates without going and rewriting everything in like some front end

15:52 framework, like React or something like that. You can skip the JavaScript build tools because

15:56 you know, you've got a lot less of that. And you can skip a bunch of serializers and just use Django

16:02 for like the API bits. So you install Unicorn, you create a component, and then at the top of your

16:07 template, you put load, you know, {% load unicorn %}, and then you can just give it a,

16:12 one of these names. So for example, here's a little task. Task one is tell people about unicorn.

16:18 I can add that as too many will tell people about unicorn. And you can see like this cool little thing

16:24 is interacting and it's not refreshing the page, right? It's like a front end framework type of thing.

16:30 But the way that you write it is you just put some extra little pieces on there, like

16:36 unicorn:submit.prevent, and we're going to do this add function instead. And if somebody hits the

16:42 escape key, we're going to change the value. And you know, that's not JavaScript. Those are just

16:47 HTML attributes, but they turn into JavaScript, right? Which is very cool. So, and then you just put your

16:53 regular Django template business down and, and off it goes. And it turns it into basically something

16:59 that's way more front end framework friendly. Simon, what do you think?

17:02 So as far as I can tell, the real magic here is that they're using, they're doing the trick

17:08 where you render the HTML on the server. In this case, reusing your Django template. And then

17:13 they send back JSON with a blob of HTML in it, which you then essentially write into innerHTML to update

17:19 the page. And I love this pattern. Like, this is sort of fun. I've always been a big fan of the

17:25 progressive enhancement, method of writing JavaScript where you get the stuff to more or

17:30 less work without any JavaScript at all. And then if there's JavaScript, then you get in page, page

17:34 updates and all of that kind of thing. But one of the problems I've seen with

17:40 lots of engineering shops that try and do that is that you're now writing your templates

17:44 twice. You have the Django templates that know how to do something, and then you have front end templates

17:48 using react or handlebars or whatever that know how to do something. And you have to keep those in sync,

17:53 which is an enormous waste of time for everyone involved. So what they're doing here then is

17:57 they're handling that they're cleaning up that inconsistency for you. You write a, you write a

18:03 Django template. They can then render, they can use that template in Python code to generate just that

18:08 fragment of HTML, send that back and have that displayed on the page. So yeah, I think this is

18:12 a really interesting approach. I've not spent much time with Django unicorn itself, but,

18:16 it also reminds me a bit of the, I think it's called Hotwire. The Ruby on Rails

18:22 community built this very exciting framework, again, around these kinds of principles,

18:28 just shipping blobs of HTML back and forth. I feel like

18:33 the mad rush towards single page applications over the past 10 years has mostly resulted in applications that load slower and take longer for

18:41 people to build. And they're so inconsistent and they make me so crazy. For example, I'll go to

18:48 like a bank or something and I'll say, all right, I'm going to run my 1Password,

18:52 pre-fill the page and you'll see it fill out the page. And then you try to submit it. It goes,

18:57 please fill out this field. And there's clearly like an email address or something in there.

19:01 What do you got to do? Go put a space, delete the space. So the JavaScript event triggers because

19:05 they're like, not really, not really HTML. It's all that junk. And it's just like,

19:11 yeah, you know what I mean. But it turns out what people actually want is they don't want a full page

19:16 reload. Like anyone who's getting into single page apps and so on really, they just don't want that

19:20 flicker when the browser reloads everything. So using this trick where if JavaScript is available,

19:25 you update a section of the page using stuff that came back from an Ajax API totally works. And that,

19:30 that feels like the model here and also the Hotwire model from Rails.

19:34 Exactly. Yeah. So the HTMX, the Hotwire and this, it's all about, let's not write new stuff. Let's

19:40 just take the views and the templates already doing their magic. And let's just put the little pieces

19:45 in there to make them dynamic, which I'm all about this. This is great.

19:48 What I've missed is why is this a Django thing? Is it, is it because it uses the Django templates or is

19:54 that? It looks like it. Yeah. It looks like the magic here is that it's using Django templates.

20:00 It's because it has its own. And the models.

20:03 It provides its own views, too, because it needs to provide views that provide a

20:07 JSON API where you can send it data from a form. It then renders that Django template in Python code

20:12 and then sends you back the stuff. So there's two sides to this, right? There's the Python Django

20:17 view functions they've written, but they've also written a sort of eight kilobytes, I think of

20:20 JavaScript that, that, that, that hooks it up on the front end. Cool. Nice.
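
The pattern described here (render the fragment on the server, wrap it in JSON, let a small script write it into the page) can be sketched without Django at all. This is a stand-in under loud assumptions: `string.Template` plays the role of the Django template, `add_task_view` is a hypothetical view function, and the `dom` key in the JSON payload is an invented name, not necessarily what Django Unicorn sends.

```python
import json
from html import escape
from string import Template

# Stand-in for a Django template that renders the task list component.
TASK_LIST = Template('<ul id="tasks">$items</ul>')


def render_tasks(tasks):
    """Render the list fragment, escaping user-supplied text."""
    items = "".join(f"<li>{escape(task)}</li>" for task in tasks)
    return TASK_LIST.substitute(items=items)


def add_task_view(tasks, form_data):
    """Handle an 'add' call: update state, re-render, and wrap the HTML in JSON."""
    tasks = tasks + [form_data["task"]]
    return json.dumps({"dom": render_tasks(tasks)})


# What the front end would receive and then assign to an element's innerHTML.
payload = json.loads(
    add_task_view(["Tell people about Unicorn"], {"task": "Write <docs>"})
)
```

Because the same template renders both the first page load and every later update, there is no second, client-side copy of the markup to keep in sync.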

20:24 Yep. Yep. Very neat. So not very much code at all to get your Django to become more dynamic,

20:30 which is great. Yeah. So, I don't think unicorns are blue. I'm not really sure what

20:37 color unicorns are. I feel like they could be any color. Like they might be rainbow, but, but this, that actually,

20:42 that's not a rainbow. It's not a rainbow. I want to, I want to talk about blue and I'm, I'm, I think I'm,

20:50 I think I'm ready, to have tomatoes thrown at me or something for bringing this up.

20:55 But so Blue is an alternative to Black. Anyway. So I love Black. I think Black's awesome,

21:05 but there are times where you can't use it, for specific reasons. And I'm thinking

21:13 here mainly, basically, about the decision that Black made to, not default to,

21:20 but enforce, double quotes on strings instead of single quotes. There are some code bases where

21:27 there's already a standard to use single quotes. And then there's also code bases where there's so

21:32 many strings that actually have mixed quotes. So you've got, single quotes and then double quotes

21:39 inside. And you know, mine end up mixed sometimes because if I want to put quote something in the

21:45 actual string, I'll use single quotes on the outside. But if I'm going to say it's a good idea,

21:50 I'll put double quotes on the outside. So I don't have to escape the single quote. You know, like if,

21:55 if you're going to have one of the quotes in the string, then just go with the other one is often

21:58 something I'll end up doing. Oh, but actually, black does that for you. If you've got a string

22:03 with a single quote in a string with a double quote, and that's the one time that black will use single

22:07 quotes, which is kind of neat. Okay. Okay. That's good. Yeah. Good to know. I do like that,

22:11 but okay. So if this is this, the sticking point is really just the quotes, then maybe try blue.

22:17 So blue is, is actually, I was worried that it was going to be a fork of black. It's not a fork. It's,

22:23 it sort of includes Black and, like, overrides some of the functionality,

22:29 and specifically just a few things. So the differences are: it defaults to single-quoted strings,

22:35 except for places where we love double quotes, like

22:40 docstrings and triple-quoted strings. For some reason, those look weird with single quotes.

22:45 So, I'm on board with that. It defaults the line length to 79 and I don't really care,

22:52 'cause I always override that to like 120 or something like that. And I like that Black allows

22:58 that overriding. and then the other thing that I didn't even think about, which is kind of nice is,

23:03 one of the things black does is, takes the hash. like if you have, hash comments

23:08 on the, on your right side of your code, you've got like a block of them. Like, like maybe you're

23:13 talking about an entire block of code. So you have a block of comments. Black will, like, remove the

23:18 white space in front of the hash, whereas blue will leave those alone. So you can have block comments

23:24 on the side. that's really it. That's the only difference. and I think having this

23:30 around is a neat thing. An interesting quote from the docs is that they actually don't want to keep,

23:35 keep this project alive for very long. They'd really like these to just be options in black.

23:40 Yeah.

23:41 I don't know how viral they'll get, but.

23:43 Yeah.

23:43 I don't think that's going to happen. I think black is pretty hardcore guarantee.

23:49 like they're very into not adding configuration where they can still avoid it.

23:53 Yeah. in researching this, one of the things I, somehow missed about black,

24:00 maybe I haven't read the documentation in a long time, but a couple of years ago,

24:03 it added, the ability to have format off and format on. So, one of the things,

24:09 for instance, occasionally, not very often, occasionally I've got a large chunk of data

24:16 set up in, in like a, a list or, or dictionary, something with, that I have called the,

24:23 I have them aligned with comma alignment, like an old style CSV table. and black totally like

24:30 a 1980s C programmer.

24:32 Yeah. Oh, sure. but black totally tears that apart, but for that you can, you can turn formatting

24:38 off and, I appreciate that.
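
A short illustration of the format-off/format-on feature mentioned here, using Black's `# fmt: off` and `# fmt: on` comments around a hand-aligned table. The `SENSORS` data is made up for the example.

```python
# fmt: off
SENSORS = [
    # name,      unit,   low,   high
    ("temp",     "C",    -40,    125),
    ("humidity", "%",      0,    100),
    ("pressure", "hPa",  300,   1100),
]
# fmt: on

# Code outside the markers is formatted as usual.
widest_range = max(SENSORS, key=lambda row: row[3] - row[2])
```

Everything between the two comments keeps its column alignment; without them, the formatter would collapse the extra spaces.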

24:40 Oh, that's cool. That's a good feature.

24:42 Yeah. See, it does have a little bit of, a little bit of give. but yeah.

24:47 Yeah. That's cool. Yeah. Very good one. Very good one.

24:50 So we got next.

24:51 Oh, okay. so this is, there's a link in the show notes. This, this is an article that,

24:57 somebody wrote about using Tesseract OCR to build yourself a searchable index of your screenshots.

25:04 and I got really excited about this because Tesseract is like, Tesseract's been around since 1995,

25:10 I think it's a huge, it was started off at Hewlett Packard and it's pretty much still the leading

25:15 light of OCR in the open source space, but I've never managed to get it to work. And I've always

25:19 wanted OCR that I can just run. And thanks to this article, I can actually use Tesseract now. So

25:25 I've got a couple of demos here. Can we see this? Yeah. So, I grabbed a screenshot just of the,

25:30 a random slide from our conversation earlier and I can run, let's see, I think it's tesseract

25:36 screenshot.png. I'll put it in a file called screenshot. Dash l: you have to tell it the language

25:41 that you're using because that affects how it does these things. And it's got, like, 70-odd languages,

25:45 I think. And I'm going to say, I want that as a TXT file and you run it. And now if I cat

25:51 screenshot.txt, this is the launch today, MongoDB 5.0. This is the screenshot I took of our conversation

25:58 earlier. A better example even would be the, I took a screenshot of Python documentation just now.

26:04 So I can run that same command, except I'll do it against Python docs.png, Python docs.png. I'll call

26:11 it P screenshot. There we go. Okay. And now if I cat this, this is pretty decent OCR against the screenshot

26:19 of the Python documentation. The really fun thing though, is that you can say you want it as a PDF file.

26:25 And if you do that, it will give you a PDF, which is visually identical to the screenshot,

26:29 but has selectable text on it. So you can copy and paste out of that PDF. So, the chap whose

26:36 article is linked in the, in the notes, his trick is he has a folder on his

26:43 computer that he saved screenshots to, and he has a automated script that then turns those screenshots

26:49 into these annotated PDFs, which means that spotlight on his Mac can now search them. So anything that he

26:54 drops into that folder, a few seconds later becomes available to global search on his computer.

26:58 I think that's a really neat trick.

27:00 I love it. That's great.

27:02 Then the, so yeah, there's so much stuff I want to do with this. Yeah, it was

27:08 Alexandru Nedelcu, I don't know if I'm pronouncing that correctly, who wrote all of this up. But yeah,

27:14 you can install it with Homebrew. It's brew install tesseract. There's

27:19 actually a Python library, I think it's called pytesseract, which I thought was doing

27:23 complicated things with C modules. Actually, if you read the source, it's just shelling out to this

27:28 command. So apparently that's the state of the art in, in Python, OCR is shell out to the tesseract

27:34 command line tool, which I'm perfectly happy to do, you know?
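
A minimal sketch of the shell-out approach described here, using Python's subprocess module. This is not pytesseract's actual source; the function names `build_tesseract_command` and `ocr_image` are hypothetical, and it assumes the tesseract binary is already on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng", fmt="txt"):
    """Build the argument list for: tesseract IMAGE OUTPUT_BASE -l LANG FMT.

    Tesseract appends the file extension itself, so output_base has no suffix.
    """
    if fmt not in {"txt", "pdf"}:
        raise ValueError(f"unsupported output format: {fmt}")
    return ["tesseract", str(image), str(output_base), "-l", lang, fmt]


def ocr_image(image, lang="eng", fmt="txt"):
    """Run tesseract on one image and return the path of the file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    image = Path(image)
    output_base = image.with_suffix("")  # screenshot.png -> screenshot
    command = build_tesseract_command(image, output_base, lang, fmt)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.{fmt}")
```

Calling `ocr_image("screenshot.png", fmt="pdf")` would correspond to the searchable-PDF case from the episode, while the default `fmt="txt"` matches the plain-text example.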

27:37 Yeah. I really like this. You know, it's, if you've got a bunch of image data and you want to

27:43 be able to do interesting things with it, like here's a really quick and easy way to do it. Right.

27:47 Right. It's super simple. The, this article also, I didn't know that you could use the Mac,

27:53 launchd, I think. You can add a launch agent, which automatically runs a script when a file

28:00 is saved in a certain folder. So in this case, he's got a launch script that runs the, the, the

28:04 Tesseract OCR stuff, but this is great right now. I can automate any folder on my Mac to do

28:09 basically anything using this system that's built into the operating system, but I didn't know how to

28:13 use. I didn't know you could do that either. That's great. That's cool. Yeah. Yeah. That's awesome.

28:18 I feel like this is right up your alley, Simon, you know, with Datasette, Dogsheep, and like,

28:24 oh, here's this data we got from this, this automation. And yet I just can't dig into it. And now you can.

28:29 I'm really excited about this. Although, so Apple Photos, the next version of macOS,

28:34 Apple Photos is going to do OCR on all of your photographs for you. So you can search for text

28:39 in pictures that you've taken. And, if it's anything like the current version of

28:44 Apple Photos, all of that data is going to be stored in SQLite databases on your computer.

28:48 Like I've been having a huge amount of fun building things against my Apple Photos library,

28:54 because they already run machine learning labeling against your photos. They know when you take a photo of

28:59 a dog and they tag it with dog and the word dog is in a SQLite database on your computer.

29:04 So once you've figured that out, you can run SQL queries against photos you've taken and say,

29:09 show me every photo I've taken of a dog that was in San Francisco, like, in the month of May.

29:17 And you get results back, which is crazy interesting.
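
A query of the kind described here might look roughly like the sketch below. The `photos` and `labels` tables and their columns are hypothetical stand-ins invented for illustration; the real Apple Photos database uses a different, more complicated schema.

```python
import sqlite3

# In-memory database with a made-up schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photos (id INTEGER PRIMARY KEY, taken TEXT, city TEXT);
    CREATE TABLE labels (photo_id INTEGER REFERENCES photos(id), label TEXT);
    INSERT INTO photos VALUES
        (1, '2021-05-03', 'San Francisco'),
        (2, '2021-05-19', 'Oakland'),
        (3, '2021-06-01', 'San Francisco');
    INSERT INTO labels VALUES (1, 'dog'), (2, 'dog'), (3, 'dog'), (1, 'park');
    """
)

# Every photo labeled "dog", taken in San Francisco, in the month of May.
rows = conn.execute(
    """
    SELECT photos.id, photos.taken
    FROM photos
    JOIN labels ON labels.photo_id = photos.id
    WHERE labels.label = ?
      AND photos.city = ?
      AND strftime('%m', photos.taken) = ?
    ORDER BY photos.taken
    """,
    ("dog", "San Francisco", "05"),
).fetchall()

print(rows)
```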

29:19 Yeah.

29:20 That's pretty cool.

29:21 Yeah. That's super cool. I love the stuff that you're doing with that.

29:25 Is it just local or is there, are they caching that in their own databases as well?

29:30 Oh, well, so they synchronize it all. So if you're using iCloud, your photos are synchronized up to

29:35 their servers so that when you take a photo on your phone, it shows up on your computer automatically, but all

29:40 of the actual local data storage is all SQLite database files. Apple are really big into SQLite.

29:45 So yeah, there are just these files littering your computer with your address book in there and all of

29:50 your iMessages and all of your photo metadata, it's just sat there waiting for you to dig in and play

29:56 with it.

29:56 Nice. With Datasette, probably.

29:59 Right? Yep. I've got a script called, I'll add it to the show notes. I've got a script called

30:05 dogsheep-photos, which uploads your photos to your own S3 bucket so that you can actually link to them and

30:11 embed them on web pages. And it extracts all of that SQLite data into a more usable format. So yeah,

30:17 I've got an online database of all of my photographs that I update every now and then with the script.

30:23 And it works. It's phenomenal what you can do with it.

30:25 Cool.

30:26 Out in the live stream, Brandon, hey, Brandon says, this is fantastic. Definitely excited.

30:31 And also taking a step back to yours, Brian, David Colton. Hey, David says, I'm using double quotes now

30:38 in Black, but my typing has not evolved yet to double quotes. So you just pass it through the single

30:43 quote to double quote compiler process called Black. And then you got it all adapted. That's nice.

30:48 Yeah.

30:49 Black has given me back, I estimate, 5% of my programming time that used to be spent worrying

30:56 about indentation and suchlike, and I got all of that back, thanks to Black. I never even

31:01 think about how I indent or style my code at all. I'll literally write horrible run-on

31:07 lines that go on for ages and then run Black and it formats it nicely. And I forget about it. It's,

31:13 it's wonderful. It's fantastic.

31:14 That's cool.

31:15 Yeah. Great.

31:16 Got any extras for us, Michael?

31:18 You know, I do. I always do. Unless I have an extra, extra, extra, hear all about it. Then I

31:23 guess I still do. So, we talked about strongtyping last time, which lets you do cool stuff like

31:31 go and put a decorator onto a function and say, well, this one, you know, if it has type

31:37 annotations or type information, like Python itself just does, if you put the @match_typing

31:43 decorator on there, it'll verify at runtime that you said it took an integer and you actually pass

31:48 an integer, not a list or whatever to that parameter. Right. Yeah.
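
To make the idea concrete, here is a minimal, self-contained sketch of runtime checking driven by type annotations. It is not the strongtyping implementation and does not use that library's API; the decorator name `check_types` is hypothetical, and the sketch only handles plain classes such as `int` or `str`, skipping generics like `list[int]` and `*args`/`**kwargs`.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_types(func):
    """Raise TypeError when an argument does not match a plain-class annotation."""
    signature = inspect.signature(func)
    hints = get_type_hints(func)
    skip_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if signature.parameters[name].kind in skip_kinds:
                continue  # *args and **kwargs are not checked in this sketch
            expected = hints.get(name)
            # Skip unannotated parameters and parameterized generics.
            if get_origin(expected) is not None or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def double(n: int) -> int:
    return n * 2
```

With this in place, `double(4)` runs normally, while `double([1])` raises a `TypeError` before the function body executes.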

31:52 Well, Felix, who maintains this project, reached out today to say that it actually does a whole lot more,

31:56 and that you should check some other things out. I just wanted to highlight a couple of things that

31:59 he pointed out. One, you know, we're all familiar with the namedtuple and you, you say

32:05 the type name in a quote, and then you say the fields or the elements, attributes, in a list, either

32:12 space or comma separated, like spell, mana, effect, and so on. So this one has a typed_namedtuple where

32:19 you can put the type information in very similar ways to what Python would have like colon,

32:23 str, colon list, and so on. And then you get actual runtime type validation that your

32:29 data going into your named tuple is actually the type of data you expect in your named tuple.

32:33 Oh, nice.

32:34 That's good.

32:34 Isn't that neat?

32:35 Yeah.

32:36 Yeah. So there's that. And then also, I love this about our show. It's, it's kind of blows my

32:41 mind that this, this is how the world works. And I really appreciate this. Everyone who plays along,

32:46 we'll say things like, oh, I wish we could specify indexes in Beanie. And then like the next episode,

32:52 we're like, Hey, look, Roman added a way to do indexes in Beanie. And I said, this is awesome that it applies to

32:59 functions, but why couldn't it apply to classes? It's basically the same thing. And so now six days

33:04 ago, we have a new feature. You can also apply strong typing to classes as well or something

33:11 like that. So well done. Well done.

33:13 Is it because you asked for it? Because I mean, I asked for single quotes in black and I didn't get

33:19 that, but...

33:20 Well, I mean, it also may depend on the size of the project. The more input they get, the less

33:27 influence any individual statement may have on it. Right. Yeah. Anyway, I feel like thanks for

33:33 working on that and the extra information there. Yeah.

33:35 I actually, one other thing. Yes. I have finally, I've been working to make sure that we don't have to

33:42 have one of these completely useless, dreadful talks on technology. Our site uses cookies. Here's our

33:50 cookie policy. Do you accept our cookie policy or do you not accept our cookie policy? AKA, would you

33:55 like our website to work or would you like to go away? Like that's kind of what the button so often

33:59 means. Right. And so I thought, I removed all the analytics. I removed anything else that we might

34:05 be doing third party. We're good. And I went to Python Bytes and I'm like, wait, there's,

34:10 there's DoubleClick. There's Facebook, there's Google. There's like, what is all this stuff?

34:14 And we started including the live stream YouTube embed and it started bringing that back. And I'm like,

34:20 why would Google be putting in Facebook? That sucks. And there was also the Disqus

34:25 conversation stuff that people have really stopped using. They all just go and chat on the

34:29 YouTube streams. Now they want to have a live comment type of thing. So I'm like, well, I'll

34:33 just take that out. That got rid of the Facebook one. And then, but what do you do about, about

34:38 that? So I, instead of embedding the YouTube player, I said, I'm going to figure out a way to get the

34:44 picture automatically from YouTube, the poster. And then when you hover over it, it just has a play

34:49 icon. It says play on YouTube and it opens up a new window. And I thought I was all clever by just putting

34:54 the image there, but serving it from Google. No, there's now like the YouTube image servers putting

35:01 tracking cookies on our site. I'm like, well, come on. Why is this so hard? So now on the server,

35:06 we use requests. We download the image anytime it has to be shown on a page, put it in MongoDB.

35:12 And then if you pull it, we serve it back out so we can like strip the cookies, the tracking cookies out.
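
The download-once-and-serve-it-yourself idea can be sketched as follows. This is an illustration under stated assumptions, not the site's actual code: it swaps in the standard library's urllib for requests, uses an in-memory dict as a stand-in for MongoDB, and the function names and the thumbnail URL pattern are assumptions.

```python
from urllib.request import urlopen

# In-memory stand-in for the MongoDB collection mentioned in the episode.
_thumbnail_cache = {}


def fetch_thumbnail(video_id):
    """Download a video's poster image on the server (assumed URL pattern)."""
    url = f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"
    with urlopen(url, timeout=10) as response:
        return response.read()


def get_thumbnail(video_id, fetch=fetch_thumbnail):
    """Return cached image bytes, going upstream only on a cache miss."""
    if video_id not in _thumbnail_cache:
        _thumbnail_cache[video_id] = fetch(video_id)
    return _thumbnail_cache[video_id]
```

Because the browser only ever requests the image from the site's own server, the third-party image host never gets a chance to set cookies on the visitor.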

35:17 Nice. And now, now when you look at the tracking content, none detected on the site,

35:23 but why, why world does it have to be so hard? I just want to.

35:26 Isn't it amazing how it used to be YouTube embeds were the absolute gold standard for

35:31 embedding video on a webpage. Like that, why would you do anything else? And now actually I'm beginning

35:35 to think, you know what? Post the video, the .mp.mod file or whatever yourself and stick on an HTML5 video

35:42 embed. And that's probably a better experience for your users as well. Because you know, when they click the

35:47 video on their mobile phone, it'll play full screen and they won't have to hop through to the YouTube

35:50 app and all of that kind of thing. Yeah, absolutely. Yeah. So anyway, just quick shout out, like this is

35:55 taking several passes, but I think it's finally 100% no tracking. I mean, we weren't putting there before,

36:02 but like it was seeping in from just like what we might include on the page as content. Right. So anyway,

36:08 there you have it, Brian. That was my weekend. How was it? Nice. Well, thanks. I appreciate you doing all that work for us.

36:14 Yeah. David Colton has the wash hands emoji. There we go. We're all better. Yeah. Well, I've got no

36:21 extras. Simon, do you have anything extra you want to share? I've got one. So Textual is the,

36:27 you know, and Will McGugan, who's working on Rich has been building Textual, which I know you've talked

36:31 about on the podcast before. What I would encourage people to do is pay close attention because I've never

36:36 seen a piece of open source software developed this quickly. Like every day he's posting this video where he's

36:43 like, oh, and here's the new feature where today he posted a video of it doing full like tree view on a file system,

36:50 which you could interact with with your mouse in the terminal. And when you clicked on a file, it would open it in a separate panel with,

36:56 like with, with syntax highlighting. It's, it's absolutely astonishing. It's like turning into one of the better ways of building a

37:04 GUI application and it's running in, in text in the terminal. We can almost have just a section of the show

37:10 called what's, what's Will up to. You really could. Absolutely. Yeah. He's, he's reimplemented CSS grid,

37:17 the CSS grid mechanism for terminal applications. It's brilliant. And yeah, I'm just having such a great time

37:23 watching him do all of this stuff. And he seems to be live streaming it.

37:27 I don't think so, but he posts like little five minute videos on Twitter every day of the stuff that

37:33 he's doing. But I feel inadequate watching him work this fast, but just saying. It's such a delight

37:39 though. It's like he was, he was born to build this piece of software and now he's building it and we all get

37:44 to watch him do it. Yeah. That's great. Yeah. Henry Schreiner out in the live stream says textual is

37:50 amazing. Indeed. It's, it's quite, quite something. Yeah. And I know, I remember when he was trying to

37:55 name it and textual didn't even come up on my radar as something that might be possible, but it's,

38:00 it's so obvious now like graphical and textual. Yeah. It makes sense. It's cool. So, hey, how about a joke?

38:08 Maybe. Oh man, I got some jokes for us. Two jokes. The one, I'm not really sure how to convey it,

38:13 but I, yes, I'll do my best. I want you to sing. No, man, this is you. This is you, bro. All right.

38:20 So first one here is, I could definitely do this one. This one is, from John on Twitter,

38:26 but pointed out to us by Nick Moore, who was previously on the show not too long ago. Thanks,

38:30 Nick. And this one poses, I think also this is perfect for when Simon is on the show. It says,

38:35 what do you get when you select star from goblins, dragons, elves, and comma unicorns,

38:42 a query tale. Oh my goodness.

38:46 It's a fairy tale, a query tale. It's bad.

38:49 It's terrible. It's bad. Oh, wow.

38:50 Well, I wanted to share one that people could actually share with their, this isn't in the

38:56 list, but one that people, I just read recently, people might be able to share with their kids.

39:01 In the Northwest, we've got Sasquatch, right? So, you know what they, yeah, what do they call Bigfoot in Europe?

39:08 Big meter. Oh, it's pretty bad. Quick tip: if you're ever near Santa Cruz in California,

39:18 there is a Bigfoot museum in a log cabin in the woods outside of Santa Cruz called the Bigfoot discovery

39:24 experience. And it is not a joke. It is very serious. And there is a man there who will take

39:29 you through all of his evidence for big, Bigfoot. And it takes about an hour. He's got maps and

39:34 plaster casts of feet, footprints and a map with pins on it. And it's fascinating. I could not recommend

39:40 it more. Yeah.

39:42 I wonder if the COVID pandemic has affected the Bigfoot population.

39:46 Oh, you, you should, well, go, go, go, go. You can call him up and ask him while I was talking to him.

39:51 He got a phone call to answer questions about Bigfoot. So he will, he will answer your calls. Yeah.

39:57 All right. Hey, Brian, your joke got a groan all the way from Australia.

40:01 Nice.

40:03 Was it mine? I'm not sure. It could have been either. Honestly.

40:06 Yeah.

40:07 I think I'm going to go with the meter one.

40:10 They were both pretty bad.

40:11 All right. I'll see what I can do with this next one here. So if, if you're a kid of the nineties,

40:18 I guess it's probably the time, there's Pinky and the Brain.

40:21 And apparently on one of the 10 places I have to write your name, I typed it too quickly and wrote Brain.

40:29 Yeah. And Brett Cannon caught it.

40:33 And so, so he, he did a take on Pinky and the Brain and it starts out.

40:38 What do you want to do today, Brian?

40:40 Same thing

40:41 we do every Wednesday, Michael: help Python take over the world.

40:45 It's Michael and the brain. Yes.

40:48 Michael and the brain.

40:49 One's into testing others in the GUIs.

40:52 They're both into making Python seem sane.

40:55 They're Michael.

40:56 They're Michael and the brain brain brain.

40:58 Yeah.

40:59 Yeah.

40:59 Fantastic.

41:00 I love it.

41:00 Phenomenal.

41:01 We need to have somebody that's got like musical talent to actually put this together as something.

41:07 So anyway, yes.

41:08 Someone who is not me because it won't come out.

41:10 Well, so we'll put in this with the lyrics in the show notes.

41:14 I think we should leave them there so that we are accepting submissions.

41:17 Yes.

41:18 And if they are, if they pass, we may actually play them on one of the next episodes.

41:23 Oh, I'd love it.

41:24 Yeah.

41:24 Could be the new theme song, Brian.

41:25 Yeah.

41:27 I'm getting tired of the old theme song.

41:30 Yeah, exactly.

41:31 Which is no theme song.

41:32 All right.

41:35 Well, thanks.

41:36 Thanks a lot for showing up, Michael, and thanks, Simon.

41:39 This was fun.

41:40 Thanks for having me.

41:41 Yep.

41:41 You bet.

41:41 Bye everyone.

41:42 Thanks for listening to Python Bytes.

41:44 Follow the show on Twitter via at Python Bytes.

41:47 That's Python Bytes as in B-Y-T-E-S.

41:50 Get the full show notes over at Pythonbytes.fm.

41:53 If you have a news item we should cover, just visit Pythonbytes.fm and click,

41:57 submit in the nav bar.

41:58 We're always on the lookout for sharing something cool.

42:00 If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live.

42:07 That's usually happening at noon Pacific on Wednesdays over at YouTube.

42:12 On behalf of myself and Brian Okken, this is Michael Kennedy.

42:15 Thank you for listening and sharing this podcast with your friends and colleagues.
