Transcript #243: Django unicorns and multi-region PostgreSQL
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to
00:04 your earbuds. This is episode 243, recorded July 21st, 2021. And I'm Brian Okken.
00:11 And I'm Michael Kennedy.
00:13 And I'm Simon Willison.
00:14 Welcome, Simon. Thanks for agreeing to show up today.
00:17 No problem at all. I've been looking forward to this.
00:19 If anybody doesn't know who you are, can we do a quick, who's Simon?
00:23 Sure. So yeah, my name's Simon Willison. I've been doing Python bits and pieces for
00:28 around about 20 years now. I'm a co-creator of the Django web framework from many,
00:33 many years ago. I think Django has definitely celebrated its 15th birthday by now.
00:38 But more recently, I've been working on a set of open source tools around
00:42 this project I have called Datasette, which is a web application for exploring a relational database,
00:48 a SQLite database. It also has tools for publishing those databases online and
00:52 for building those databases out of lots of different sources of data. I'm trying to
00:56 bootstrap an entire ecosystem of data and analytics tooling around SQLite, because it turns out
01:02 everyone in the world has SQLite, even though they don't necessarily know that they have it.
01:06 And there's some really cool stuff that you can do with it.
01:08 Yeah, it's a really cool project.
01:09 Yeah, it is. If you wanted to create your own personal search engine, one that would let you just
01:13 go and search your Gmail, your Twitter, your Instagram, and your file system all at once.
01:18 Yep.
01:19 That's pretty much it, right?
01:20 That's part of the tooling. Yeah, there's a whole side of it, which I've called Dogsheep for
01:25 ridiculous reasons. But the Dogsheep project is about personal analytics: it's about getting
01:30 your personal tweets and messages and all of the personal data about yourself into one place.
01:36 So you've got essentially a little mini data warehouse on your laptop that you can use to
01:40 query aspects of your own life. And that's been a really fun way of driving features in the software,
01:45 which can then be applied to like company databases and so forth as well.
01:49 Yeah, super cool.
01:50 Well, if I didn't want to do SQLite, I might want to use Mongo. What do you think?
01:55 You may want to. And there's some big news around MongoDB: MongoDB 5.0 is out. You know,
02:03 I'm all about MongoDB, so that makes me super excited. I probably won't switch right away, because I
02:08 don't actually need the new features, but I'm super excited to see things going strong.
02:13 So some of the things in it, I think, are really relevant to Python people,
02:18 especially on the data science side. Basically there are two important things. One has
02:25 to do with working with time series, and the other has to do with stability: you don't
02:31 want to have to keep changing your app just so you can upgrade your database, right? If the database API
02:36 changes slightly, you don't want to have to deal with those incompatibilities until you're ready to take
02:40 advantage of the benefits of making those changes. So the first thing it comes with is
02:45 native time series schemas and collection types in the database. That's incredible. Yeah. So
02:52 you can do really interesting things like computing a moving average as part of a query, across data that's stored
02:58 in a format that's meant to make that incredibly fast and low latency. You can also ask for,
03:03 say, the numerical derivative over time as a moving average, or the integral
03:10 over this collection, as a query. So you can do math as part of your query and get the database to calculate those
03:17 things in really interesting ways. The time series support has things like clustered indexes and window
03:22 functions and all sorts of interesting things. So that's one. It also automatically optimizes your schema
03:28 for highly efficient storage, which is pretty cool. I think that's independent of the time series support, but I'm not
03:33 a hundred percent sure. The other big thing is the versioned API for future-proof apps.
03:39 So suppose you build against version five, which I guess is the first one that has it. You build against version
03:44 five of MongoDB, and then at some point version seven comes along and it's like, oh, you can do this
03:49 new way of querying, but it's going to break some stuff, so if you want to use it, you've got to fix your app.
03:53 Instead, you can just say, I want the database to look like version five forever. And no matter what version
04:00 is in production, it'll behave the right way, according to how you said you wanted it to
04:05 behave, right? So you could say, I want version seven to act like five for me, but it can be version seven
04:09 for someone else. That kind of thing. Yeah. The other thing is the way that you talk to it, the way that you
04:13 interact with it, is through a terminal app you fire up, a command prompt app, and you talk to it.
04:19 And traditionally this thing has been gross. I mean, it's fine, but it has zero syntax
04:24 highlighting, zero autocomplete, those types of things, right? So they're introducing a new shell.
04:31 Traditionally you would have typed mongo, hit enter, and connected. Now you type mongosh, because the old
04:37 one is still there for compatibility reasons, but the new one has syntax highlighting, better error
04:42 checking, pretty printing, autocomplete, things like that. So if you're going to do stuff on the shell,
04:48 then you really should just run the new one. That's pretty cool. I'm going to go with
04:50 mongosh as the, oh my gosh. Oh my gosh. What are you doing? Yeah. I'm running the shell, the new one. I know, that's
04:58 pretty awesome. And then they're also talking about having serverless instances.
05:05 So like Lambda-type functions, where you don't actually have to manage the database or things
05:10 like that. I didn't know a whole lot about it. You can also watch the keynote, and actually
05:14 their whole conference, but the keynote is probably most relevant here. It turns out that, for a
05:19 public billion-dollar company or whatever they're worth, it's incredibly amateurish,
05:24 more like a high school talent show or something like that, but whatever, you'll
05:29 still learn. I mean, you'll see. I have to check it out now.
05:34 Yeah. It's worth watching for the blush-worthy moments, like, oh,
05:40 oh, oh, come on. Okay. Well, let's just move on now, please. But nonetheless, they do
05:46 demo some interesting things. So that's probably enough on that. But if you're
05:51 into MongoDB, MongoDB 5.0 has a lot of cool things to talk about.
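A minimal sketch of the time series and window-function features mentioned above, assuming MongoDB 5.0+ and the pymongo driver. `$setWindowFields` is the 5.0 aggregation stage behind window functions (the `$derivative` and `$integral` operators plug into the same `output` slot). The collection, database, and field names (`readings`, `metrics`, `sensor`, `timestamp`, `value`) are hypothetical, and the calls that need a live server are left commented out.

```python
from __future__ import annotations

from typing import Any


def timeseries_options(time_field: str, meta_field: str, granularity: str = "minutes") -> dict[str, Any]:
    """Options for creating a MongoDB 5.0 time series collection."""
    return {
        "timeseries": {
            "timeField": time_field,     # required: the BSON date field
            "metaField": meta_field,     # optional: which series a document belongs to
            "granularity": granularity,  # "seconds", "minutes" or "hours"
        }
    }


def moving_average_pipeline(
    meta_field: str, time_field: str, value_field: str, window: int = 3
) -> list[dict[str, Any]]:
    """Aggregation pipeline adding a trailing moving average to each document."""
    return [
        {
            "$setWindowFields": {
                "partitionBy": f"${meta_field}",  # one window per series
                "sortBy": {time_field: 1},        # oldest first
                "output": {
                    "movingAverage": {
                        "$avg": f"${value_field}",
                        # the current document plus the (window - 1) documents before it
                        "window": {"documents": [-(window - 1), 0]},
                    }
                },
            }
        }
    ]


# Usage against a real server (needs pymongo and MongoDB 5.0+, so left commented out):
# from pymongo import MongoClient
# db = MongoClient()["metrics"]
# db.create_collection("readings", **timeseries_options("timestamp", "sensor"))
# for doc in db.readings.aggregate(moving_average_pipeline("sensor", "timestamp", "value")):
#     print(doc)
```

Building the pipeline as plain dictionaries keeps the query logic testable without a database; the server does the actual averaging.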
05:54 You know what else is cool and coming up?
05:57 Python 3.11.
05:59 We don't even have Python 3.10 yet. Well, I do: the beta is available
06:06 for 3.10, you can run it, but the alpha is around for 3.11, which is neat. Nice.
06:12 And what I wanted to highlight here was enhanced error locations
06:20 in tracebacks. I'm so excited about this. This is so cool. I mean, Python has not been
06:26 that bad for tracebacks. I've dealt with worse tracebacks. It points out what
06:32 line is failing, but sometimes there's weird stuff, like None not being dereferenceable or
06:36 something, and you don't know what's going on. Now in 3.11, it will point to exactly
06:42 what part of the line has the error, with little carets underneath pointing
06:47 exactly where it is. That is actually super cool. So in the example you've got on the screen here from the
06:52 announcement, you've got multiple objects having their fields accessed, like point_1.x and point_2.x. And the
07:00 error is 'NoneType' object has no attribute 'x', which is probably the most common error that you'll ever
07:06 see in Python. But what I like about it, which you're pointing out here, is that the second object is the
07:13 one that is None. And it actually highlights: no, not the first one, the second one. Because there's
07:17 nothing about the error message that would tell you which of these two things was the problem.
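A small, self-contained reproduction of the situation described above, modeled on the point_1.x / point_2.x example. The `Point` class and values are illustrative. Run as a script, on Python 3.11 or later the printed traceback underlines just `point_2.x`; on earlier versions it only shows the whole line, and the error message itself is the same either way.

```python
from __future__ import annotations

import traceback
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def manhattan_distance(point_1: Point | None, point_2: Point | None) -> float:
    # With point_2 set to None, the failing sub-expression is `point_2.x`.
    # Python 3.11+ underlines exactly that part of the line in the traceback;
    # older versions only show the whole line.
    return abs(point_1.x - point_2.x) + abs(point_1.y - point_2.y)


if __name__ == "__main__":
    try:
        manhattan_distance(Point(1, 2), None)
    except AttributeError:
        traceback.print_exc()
```

The message alone ('NoneType' object has no attribute 'x') cannot tell you which operand was None; the new column information in the traceback can.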
07:21 Yeah. That's awesome. Yeah. And it goes deep: if you have a deep stack trace,
07:27 it'll show you exactly where in it. And there's another example where it shows
07:32 a deep dictionary dereference or something, right? And it
07:39 points out exactly which index is the one that's messing up. So that's pretty amazing. It even handles
07:46 arithmetic expressions: with a division by zero where you've got multiple divisions, which one
07:52 is the problem? It'll show you exactly which one it is. The thing I love about this
07:57 change is that it's one of those things that's absurdly difficult. This is acres
08:02 of computer science and a bunch of people working together on it for, I couldn't even imagine
08:06 how long it took them, to make something which is just a beautiful little incremental
08:11 improvement to our lives as Python developers. I think the release notes
08:15 actually talk about some of the internal changes they had to make to get this to work. This
08:19 is really deep stuff, and it's totally worth it for what you get out of it. But I think
08:24 it's easy to look at this and think, okay, that's a reasonably sensible, small change.
08:28 And this was not a small change at all.
08:29 And I think it's going to dramatically improve the on-ramping of new people into Python, because
08:35 being able to figure out what's wrong with your code is, you know, basic.
08:41 I mean, some of us old hats are used to digging into confusing tracebacks, but
08:47 new people are not. So if we can make them less confusing, that'll be great.
08:51 Right. When I work with new programmers, it's so common: they get a traceback and
08:55 they freeze, because this utterly meaningless junk has just shown up on their screen. And what
09:00 are they supposed to do with that? Here it feels like this is such a huge improvement, because
09:04 at least it's pointing to the bit in the giant blob of text that they should be paying attention to.
09:08 Yeah. Lovely. Yeah. I want it in 3.10 though, but we have to wait till 3.11.
09:12 from __future__ import nicer tracebacks. Yeah. Very cool. All right. So Simon,
09:20 you've got the third one. Tell us all about it. Okay. So fly.io is a hosting provider;
09:26 they launched about a year ago. I've been following along because they're doing some really interesting
09:29 stuff around hosting Docker containers, and all my stuff is in Docker containers. So I'm always looking for
09:34 places where I can throw a Docker container to host it online. Their secret sauce is that they do
09:39 geographic hosting. So you can ask them to run your container in, say, Tokyo and San Francisco and London,
09:46 and they will do that, and they will direct traffic to the closest copy of that app.
09:51 I worked at Eventbrite for many years, and one of the things I was always trying to figure out was,
09:56 okay, could we run Eventbrite close to our users? Could we have a database in Europe
10:01 and a database in New York and give people a faster experience that way? It's incredibly difficult to do.
10:06 Right. But what a lot of people do is use CDNs for the static content, but then there's one server
10:12 somewhere that is really the one, right? That's the problem. It's the application code,
10:17 and then especially the database server. And what fly.io are doing is making it so much easier
10:23 to do this that you could start a project and have it geographically distributed from day one without
10:27 having to think particularly hard about it. So I like that about them. But then they wrote this
10:31 article, which came out within the last week, I think. It talks about their plan for multi-region
10:37 databases. In this case they're talking about Postgres, and this desire to have
10:43 Postgres databases distributed around the world. When you're doing that, splitting up
10:48 writes across multiple places remains incredibly difficult, but a very common pattern is you say,
10:53 okay, we're going to have the leader database in, I don't know, New York, and all of the writes
10:58 go to that. And then the reads get spread out to replica databases that are running in different
11:04 places around the world. And that's still a really difficult thing to set up with geographic load
11:08 balancing. So what they propose is basically: run your application all the way around the world, and set
11:14 it up so that if anyone tries to write to the database and they're not talking to the leader database
11:19 server, the error gets caught and the application server replies to Fly's proxy and says, hey,
11:26 rerun this request against the leader database in New York. So the user doesn't see anything at
11:30 all. The user attempts to do something and it works. What's actually happened is they tried to do a
11:36 write against Tokyo, Tokyo said, oh, we can't handle writes, Fly invisibly and internally redirected it to
11:43 New York, the write happened against New York, and the result came back. So this takes
11:46 geographically distributing your database reads, which I had always assumed was
11:52 going to take a team of engineers six months to get working, and it's just baked into their
11:56 platform. It's this incredibly elegant piece of systems engineering design that they've done.
12:02 And I was fascinated. I've banged my head against this problem for so long, and they just solved it.
12:07 You know, they just said, hey, here's a way it will work. We've shipped it, try it out.
12:11 As something of an architecture nerd, this really fascinated me.
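A minimal sketch of the replay trick as Python WSGI middleware. It assumes, as described in Fly's article, that a `fly-replay` response header makes Fly's proxy re-run the request elsewhere; the `region=<code>` header value, the `PRIMARY_REGION` environment variable, and the 409 status are assumptions here, and `ReadOnlyTransactionError` is a hypothetical stand-in for your driver's read-only error (with psycopg2 it would be `psycopg2.errors.ReadOnlySqlTransaction`).

```python
from __future__ import annotations

import os
from typing import Callable, Iterable


class ReadOnlyTransactionError(Exception):
    """Hypothetical stand-in for the database driver's read-only error
    (with psycopg2 this would be psycopg2.errors.ReadOnlySqlTransaction)."""


def replay_in_primary(app: Callable, primary_region: str | None = None) -> Callable:
    """WSGI middleware: when a write reaches a read replica, ask Fly's proxy
    to replay the whole request in the primary (leader) region."""
    primary = primary_region or os.environ.get("PRIMARY_REGION", "")

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            # Assumes the error is raised before the wrapped app starts its response.
            return app(environ, start_response)
        except ReadOnlyTransactionError:
            headers = [("fly-replay", f"region={primary}"), ("Content-Type", "text/plain")]
            start_response("409 Conflict", headers)
            return [b"replaying in primary region\n"]

    return middleware
```

The user never sees the 409: in this design the proxy intercepts the header and sends the original request to an instance next to the leader database, whose response goes back to the user.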
12:16 This is fascinating. Yeah. And I can see, you know, we've got the retry
12:21 decorators and stuff for various Python functions. I could almost see a
12:26 retry-the-write decorator that you put on them. It catches the error and just
12:32 goes, nope, we're going to send it where it needs to go, and then returns the result, right?
12:35 Yeah. So you basically put decorators anywhere you're ever going to do a write, and you're good
12:39 to go. Exactly. And in fact, they've even got example code for Ruby on Rails where you don't even
12:44 have to do that. They catch the database error that says, you know, you tried to do a write in a read-only
12:49 transaction, and they turn that into an HTTP header that replays it against the leader region.
12:54 And that's it. On the one hand, it's kind of an awful,
12:58 kludgy hack, but it's also genius. This is taking six months of engineering work and turning
13:03 it into: add these five lines of code, and now your application works all the way around the world.
13:08 It fascinates me. Yeah. This is pretty interesting. Yeah.
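Michael's in-process alternative, the retry-the-write decorator, could look roughly like this. It is a hypothetical sketch, not Fly's approach: `retry_on_leader`, `get_leader_connection`, and `ReadOnlyTransactionError` are illustrative names, and it assumes the first argument of the decorated function is a database connection. Retrying is only safe because the read-only error aborts the replica transaction, so nothing half-written is left behind.

```python
from __future__ import annotations

import functools
from typing import Any, Callable, TypeVar

T = TypeVar("T")


class ReadOnlyTransactionError(Exception):
    """Hypothetical stand-in for the driver's 'read-only transaction' error."""


def retry_on_leader(get_leader_connection: Callable[[], Any]) -> Callable:
    """Run the function against the local connection first; if it turns out to
    need a write, run the same call again against the leader connection."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(conn: Any, *args: Any, **kwargs: Any) -> T:
            try:
                return func(conn, *args, **kwargs)
            except ReadOnlyTransactionError:
                # Same arguments, but pointed at the leader database.
                return func(get_leader_connection(), *args, **kwargs)

        return wrapper

    return decorator
```

Compared with the header-based replay, this keeps the request in the local region but pays the cross-region latency on the database connection instead.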
13:13 There's one other link in the show notes: a second article they put out
13:17 a few days ago, which is more about using Redis as a cache in your
13:24 geographically distributed data centers. So you can have a local Redis, because their argument is that
13:31 people in London tend to be interested in the same things other people in London are interested in. Ditto for
13:35 Tokyo. So distributing your cache by city normally gives you really good cache hit rates.
13:41 But they also pointed out something I didn't know Redis could do: Redis can be set up to
13:46 allow writes to supposedly read-only replicas. So you can have a local cache that you're writing to and
13:52 reading from, but still have a leader Redis in your main data center that can send writes out to all
13:57 of those replicas. That gives you cache invalidation from a central point. In your
14:03 lead Redis, you can say, "Okay, everyone delete the cache entry for whatever this thing is." And all of
14:08 those replicas around the world will then delete that cache entry, even though normally they're acting
14:12 independently. And yeah, again, if you're a systems architecture design nerd, the stuff that
14:18 they're doing is so interesting. I think it's interesting, and I'm not one of those.
14:22 Maybe you are and you didn't realize.
14:25 You will be next year. You will be next year. Fantastic. Yeah, this is super cool as well. And
14:31 yeah, it seems really useful. You know, it's perfectly in line with: let's take our app and
14:37 put the logic in multiple places. Because a person is unlikely to move from Tokyo to
14:44 Virginia during a session. Once they start in one place, they're going to stay in that place.
14:51 And so the cache would reasonably just have their local data on that one instance, right?
14:56 Yeah.
14:57 Yeah.
14:57 Cool. But maybe your CDN, or not your CDN, your CMS has generated a page and everybody needs that
15:03 to always be in sync, right? There's that global data as well. Yeah. So very cool. I like this.
15:08 Check it out.
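A toy, in-memory model of the Redis pattern described above, to make the data flow concrete. This is not Redis client code: the classes are hypothetical. In real Redis, a replica is pointed at its leader with the `replicaof` directive and made writable with `replica-read-only no`; writes made directly on such a replica stay local (and are discarded on a full resync), while commands executed on the leader, such as `DEL`, are replicated to every replica.

```python
from __future__ import annotations

from typing import Callable


class RegionalReplica:
    """Toy stand-in for a writable Redis replica used as a regional cache."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self.data.get(key)

    def set_local(self, key: str, value: str) -> None:
        # A write made directly on the replica: it stays in this region only.
        self.data[key] = value

    def apply_from_leader(self, op: str, key: str) -> None:
        # Commands arriving on the replication stream from the leader.
        if op == "DEL":
            self.data.pop(key, None)


class Leader:
    """Toy leader: every command it executes is fanned out to all replicas."""

    def __init__(self, replicas: list[RegionalReplica]) -> None:
        self.replicas = replicas

    def invalidate(self, key: str) -> None:
        for replica in self.replicas:
            replica.apply_from_leader("DEL", key)


def cached(replica: RegionalReplica, key: str, compute: Callable[[], str]) -> str:
    """Read-through cache against the local replica."""
    value = replica.get(key)
    if value is None:
        value = compute()
        replica.set_local(key, value)
    return value
```

Each region fills its own cache independently, and the single `invalidate` call on the leader is what keeps globally shared data, like the CMS page, consistent everywhere.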
15:09 Indeed.
15:09 Indeed.
15:09 Well, let's talk about unicorns.
15:11 I love unicorns. So unicorns, the magical creature. And Simon, I'm so glad that you're here
15:16 because we can get your thoughts on this, even if you maybe haven't been like deep down in it.
15:21 So not too long ago, we talked about HTMX, and I'm still a big fan of HTMX. It's a cool
15:27 sprinkling of JavaScript magic onto your page to make it more interactive. But if you're
15:33 doing Django, HTMX is very relevant, but there's also this thing called Django Unicorn, at django-unicorn.com.
15:40 It's a magical full-stack framework for Django. The idea is that you can create these
15:46 interactive templates without going and rewriting everything in some front end
15:52 framework like React or something like that. You can skip the JavaScript build tools because
15:56 you've got a lot less of that, and you can skip a bunch of serializers and just use Django
16:02 for the API bits. So you install Unicorn, you create a component, and then at the top of your
16:07 template you put {% load unicorn %}, and then you can just give it
16:12 one of these component names. So for example, here's a little task list. Task one is tell people about Unicorn.
16:18 I can add another one, tell people about Unicorn, and you can see this cool little thing
16:24 is interactive, and it's not refreshing the page, right? It's like a front end framework type of thing.
16:30 But the way that you write it is you just put some extra attributes on there, like unicorn:submit.prevent,
16:36 so we prevent the normal submit and call this add function instead. And if somebody hits the
16:42 Escape key, we're going to change the value. And you know, that's not JavaScript. Those are just
16:47 HTML attributes, but they turn into JavaScript, right? Which is very cool. And then you just put your
16:53 regular Django template business down, and off it goes. It turns it into basically something
16:59 that's way more front end framework friendly. Simon, what do you think?
17:02 So as far as I can tell, the real magic here is that they're doing the trick
17:08 where you render the HTML on the server, in this case reusing your Django template. And then
17:13 they send back JSON with a blob of HTML in it, which you then essentially write into innerHTML to update
17:19 the page. And I love this pattern. This is sort of fun. I've always been a big fan of the
17:25 progressive enhancement method of writing JavaScript, where you get the stuff to more or
17:30 less work without any JavaScript at all, and then if there's JavaScript, you get in-page
17:34 updates and all of that kind of thing. But one of the problems I've seen with
17:40 lots of engineering shops that try to do that is that you're now writing your templates
17:44 twice. You have the Django templates that know how to do something, and then you have front end templates
17:48 using React or Handlebars or whatever that know how to do the same thing. And you have to keep those in sync,
17:53 which is an enormous waste of time for everyone involved. So what they're doing here is
17:57 cleaning up that inconsistency for you. You write a
18:03 Django template, and they can use that template in Python code to generate just that
18:08 fragment of HTML, send that back, and have it displayed on the page. So yeah, I think this is
18:12 a really interesting approach. I've not spent much time with Django Unicorn itself, but
18:16 it also reminds me a bit of, I think it's called Hotwire. The Ruby on Rails
18:22 community built this very exciting framework, again around these kinds of principles,
18:28 just shipping blobs of HTML back and forth. I feel like
18:33 the mad rush towards single page applications over the past 10 years has mostly resulted in applications that load slower and take longer for
18:41 people to build. And they're so inconsistent, and they make me so crazy. For example, I'll go to
18:48 a bank or something and I'll say, all right, I'm going to run 1Password
18:52 to pre-fill the page, and you'll see it fill out the page. And then you try to submit it, and it goes,
18:57 please fill out this field. And there's clearly an email address or something in there.
19:01 What have you got to do? Go put a space in, delete the space, so the JavaScript event triggers, because
19:05 it's not really HTML. It's all that junk. And it's just like,
19:11 yeah, you know what I mean. But it turns out what people actually want is they don't want a full page
19:16 reload. For anyone who's getting into single page apps and so on, really, they just don't want that
19:20 flicker when the browser reloads everything. So using this trick, where if JavaScript is available
19:25 you update a section of the page using stuff that came back from an Ajax API, totally works. And
19:30 that feels like the model here, and also the Hotwire model from Rails.
19:34 Exactly. Yeah. So HTMX, Hotwire, and this, it's all about: let's not write new stuff. Let's
19:40 just take the views and the templates that are already doing their magic, and just put the little pieces
19:45 in there to make them dynamic. I'm all about this. This is great.
19:48 What I've missed is, why is this a Django thing? Is it because it uses the Django templates, or is
19:54 that? It looks like it. Yeah. It looks like the magic here is that it's using Django templates.
20:00 It's because it has its own. And the models.
20:03 It provides its own views too, because it needs to provide views that offer a
20:07 JSON API you can send form data to. It then renders that Django template in Python code
20:12 and sends you back the result. So there are two sides to this, right? There's the Python Django
20:17 view functions they've written, but they've also written about eight kilobytes, I think, of
20:20 JavaScript that hooks it up on the front end. Cool. Nice.
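A stdlib-only sketch of the server side of the pattern Simon describes: re-render just the changed fragment on the server and send it back as JSON for the page script to drop into `innerHTML`. This is not Django Unicorn's code or wire format; `string.Template` stands in for a Django template, and `render_tasks` and `add_task_view` are hypothetical names.

```python
from __future__ import annotations

import json
from html import escape
from string import Template

# Stand-ins for a Django template fragment (hypothetical markup, not Django Unicorn's).
TASK_LIST = Template("<ul>$items</ul>")
TASK_ITEM = Template("<li>$task</li>")


def render_tasks(tasks: list[str]) -> str:
    """Server-side render of just the fragment that changes."""
    items = "".join(TASK_ITEM.substitute(task=escape(task)) for task in tasks)
    return TASK_LIST.substitute(items=items)


def add_task_view(request_body: str, tasks: list[str]) -> str:
    """Handle the browser's AJAX POST: update state, re-render, reply with JSON."""
    payload = json.loads(request_body)
    tasks.append(payload["task"])
    # The small client-side script would assign this string to element.innerHTML.
    return json.dumps({"html": render_tasks(tasks)})
```

Because the same template renders both the initial page and the AJAX fragment, there is only one copy of the markup to keep in sync, which is the point Simon makes about not writing templates twice.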
20:24 Yep. Yep. Very neat. So not very much code at all to get your Django to become more dynamic,
20:30 which is great. Yeah. So, I don't think unicorns are blue. I'm not really sure what
20:37 color unicorns are. I feel like they could be any color. Like, they might be rainbow, but that actually,
20:42 that's not a rainbow. It's not a rainbow. I want to talk about Blue, and I think
20:50 I'm ready to have tomatoes thrown at me or something for bringing this up.
20:55 So Blue is an alternative to Black. Anyway, I love Black. I think Black's awesome,
21:05 but there are times when you can't use it, for specific reasons. And I'm thinking
21:13 here mainly about the decision that Black made not just to default to,
21:20 but to enforce, double quotes on strings instead of single quotes. There are some code bases where
21:27 there's already a standard to use single quotes. And then there are also code bases where there are so
21:32 many strings that actually have mixed quotes. So you've got single quotes and then double quotes
21:39 inside. And you know, mine end up mixed sometimes, because if I want to quote something in the
21:45 actual string, I'll use single quotes on the outside. But if I'm going to say it's a good idea,
21:50 I'll put double quotes on the outside, so I don't have to escape the single quote. You know, if
21:55 you're going to have one of the quotes in the string, then going with the other one is often
21:58 something I'll end up doing. Oh, but actually, Black does that for you. If you've got a string
22:03 with a double quote inside it, that's the one time that Black will use single
22:07 quotes, which is kind of neat. Okay. Okay. That's good. Yeah. Good to know. I do like that,
22:11 but okay. So if the sticking point is really just the quotes, then maybe try Blue.
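A simplified model of the quote choice discussed above, written for illustration only; it is not Black's or Blue's actual implementation. The rule assumed here: use the preferred quote (double for Black, single for Blue, per its README) unless that would require more escaped quotes than the alternative.

```python
def preferred_quote(body: str, prefer: str = '"') -> str:
    """Pick the outer quote for a string literal whose contents are `body`.

    Simplified model: keep the preferred quote unless the body contains more
    of that quote character (which would need escaping) than of the other one.
    """
    if prefer not in ('"', "'"):
        raise ValueError("prefer must be a single or double quote")
    other = "'" if prefer == '"' else '"'
    return other if body.count(prefer) > body.count(other) else prefer
```

With `prefer='"'` this mimics the Black behavior mentioned in the conversation: `it's` stays double-quoted, while `say "hi"` gets single quotes. Passing `prefer="'"` approximates Blue's default.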
22:17 So Blue is, actually, I was worried that it was going to be a fork of Black. It's not a fork. It
22:23 sort of includes Black and overrides some of the functionality,
22:29 specifically just a few things. So the differences are: it defaults to single-quoted strings,
22:35 except for places where we love double quotes, like
22:40 docstrings and triple-quoted strings. For some reason, those look weird with single quotes.
22:45 So I'm on board with that. It defaults the line length to 79, and I don't really care,
22:52 because I always override that to like 120 or something like that, and I like that Black allows
22:58 that overriding. And then the other thing, which I didn't even think about but is kind of nice, is
23:03 how it handles the hash. If you have hash comments
23:08 on the right side of your code, maybe you've got a block of them, like maybe you're
23:13 talking about an entire block of code, so you have a block of comments, Black will collapse the
23:18 whitespace in front of the hash, whereas Blue will leave it alone. So you can have block comments
23:24 on the side. That's really it. That's the only difference. And I think having this
23:30 around is a neat thing. An interesting quote from the docs is that they actually don't want to
23:35 keep this project alive for very long. They'd really like these to just be options in Black.
23:40 Yeah.
23:41 I don't know how viral they'll get, but.
23:43 Yeah.
23:43 I don't think that's going to happen. I think Black is pretty hardcore about that;
23:49 they're very into not adding configuration where they can avoid it.
23:53 Yeah. In researching this, one of the things I somehow missed about Black,
24:00 maybe because I haven't read the documentation in a long time, is that a couple of years ago
24:03 it added the ability to turn formatting off and on, with # fmt: off and # fmt: on. For instance,
24:09 occasionally, not very often, I've got a large chunk of data
24:16 set up in a list or a dictionary or something,
24:23 and I have it aligned in columns, like an old-style CSV table. And Black totally...
24:30 Like a 1980s C programmer.
24:32 Yeah. Oh, sure. But Black totally tears that apart, and for that you can turn formatting
24:38 off, and I appreciate that.
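Black's `# fmt: off` and `# fmt: on` comments mark a region it leaves exactly as written, which is what keeps a hand-aligned table like the one Brian describes intact. The table contents below are made up for illustration.

```python
# Black leaves everything between these two comments exactly as written,
# so the column alignment survives a `black` run.
# fmt: off
LOOKUP = [
    # name      code   weight
    ("alpha",   1,     0.50),
    ("bravo",   22,    1.25),
    ("charlie", 333,   10.00),
]
# fmt: on
```

Without the markers, Black would normalize the spacing inside each tuple and the columns would no longer line up.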
24:40 Oh, that's cool. That's a good feature.
24:42 Yeah. See, it does have a little bit of give. But yeah.
24:47 Yeah. That's cool. Yeah. Very good one. Very good one.
24:50 So what have we got next?
24:51 Oh, okay. So there's a link in the show notes. This is an article that
24:57 somebody wrote about using Tesseract OCR to build yourself a searchable index of your screenshots.
25:04 And I got really excited about this, because Tesseract has been around since 1995,
25:10 I think. It's huge. It started off at Hewlett-Packard, and it's pretty much still the leading
25:15 light of OCR in the open source space, but I've never managed to get it to work. And I've always
25:19 wanted OCR that I can just run. Thanks to this article, I can actually use Tesseract now. So
25:25 I've got a couple of demos here. Can we see this? Yeah. So I grabbed a screenshot of
25:30 a random slide from our conversation earlier, and I can run, let's see, I think it's tesseract
25:36 screenshot.png, and I'll put it in a file called screenshot. You have to tell it the language
25:41 that you're using, because that affects how it does these things, and it supports something like 70-odd languages,
25:45 I think. And I'm going to say I want that as a TXT file, and you run it. And now if I cat
25:51 screenshot.txt: this is the launch today, MongoDB 5.0. This is the screenshot I took of our conversation
25:58 earlier. An even better example would be, I took a screenshot of the Python documentation just now.
26:04 So I can run that same command, except I'll do it against Python docs.png, and I'll call
26:11 the output P screenshot. There we go. Okay. And now if I cat this, this is pretty decent OCR of the screenshot
26:19 of the Python documentation. The really fun thing, though, is that you can say you want it as a PDF file.
26:25 And if you do that, it will give you a PDF, which is visually identical to the screenshot,
26:29 but has selectable text on it. So you can copy and paste out of that PDF. So the chap whose
26:36 article is linked in the show notes, his trick is that he has a folder on his
26:43 computer that he saves screenshots to, and an automated script that turns those screenshots
26:49 into these annotated PDFs, which means that Spotlight on his Mac can now search them. So anything that he
26:54 drops into that folder, a few seconds later becomes available to global search on his computer.
26:58 I think that's a really neat trick.
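A minimal sketch of that screenshot workflow, assuming the `tesseract` command-line tool is installed (for example with Homebrew) and uses its documented form `tesseract <image> <outputbase> -l <lang> <config>`, where a `txt` or `pdf` config selects the output. It is not the article author's script; the function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def tesseract_cmd(image: Path, fmt: str = "pdf", lang: str = "eng") -> list[str]:
    """Build `tesseract <image> <outputbase> -l <lang> <fmt>`.

    Tesseract appends the .pdf/.txt extension itself, so the output base has none.
    """
    out_base = image.with_suffix("")
    return ["tesseract", str(image), str(out_base), "-l", lang, fmt]


def pending_screenshots(folder: Path) -> list[Path]:
    """PNG screenshots that don't have an OCR'd PDF next to them yet."""
    return sorted(p for p in folder.glob("*.png") if not p.with_suffix(".pdf").exists())


def ocr_folder(folder: Path) -> None:
    # Passing a list (no shell) keeps filenames with spaces intact.
    for image in pending_screenshots(folder):
        subprocess.run(tesseract_cmd(image), check=True)
```

Skipping images that already have a PDF makes the script safe to re-run every time the folder changes, which is what a folder-watching trigger needs.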
27:00 I love it. That's great.
27:02 There's so much stuff I want to do with this. Yeah, it was Alex,
27:08 Alexandru Nedelcu, I don't know if I'm pronouncing that correctly, who wrote all of this up. But yeah,
27:14 you can install it with Homebrew: it's brew install tesseract. There's
27:19 actually a Python library, I think it's called pytesseract, which I thought was doing
27:23 complicated things with C modules. Actually, if you read the source, it's just shelling out to this
27:28 command. So apparently the state of the art in Python OCR is to shell out to the tesseract
27:34 command line tool, which I'm perfectly happy to do, you know?
27:37 Yeah. I really like this. If you've got a bunch of image data and you want to
27:43 be able to do interesting things with it, here's a really quick and easy way to do it. Right.
27:47 Right. It's super simple. This article also taught me something I didn't know: you can use the Mac's
27:53 launchd to add a launch agent, which automatically runs a script when a file
28:00 is saved in a certain folder. So in this case, he's got a launch script that runs the
28:04 Tesseract OCR stuff, but this is great. Now I can automate any folder on my Mac to do
28:09 basically anything using this system that's built into the operating system, but I didn't know how to
28:13 use. I didn't know you could do that either. That's great. That's cool. Yeah. Yeah. That's awesome.
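A launch agent is a property list. The `WatchPaths` key tells launchd to start the job whenever a listed path is modified. The sketch below generates such a plist with the standard library's `plistlib`. The label, script path and folder are hypothetical placeholders, and the real article's agent may be configured differently:

```python
import plistlib


def watch_folder_agent(label: str, script: str, folder: str) -> bytes:
    """Build a launchd agent plist (XML bytes) that runs `script` when `folder` changes."""
    agent = {
        "Label": label,                # unique job name
        "ProgramArguments": [script],  # command (and any arguments) to run
        "WatchPaths": [folder],        # launchd starts the job when this path is modified
    }
    return plistlib.dumps(agent)
```

You would save the output as something like `~/Library/LaunchAgents/com.example.ocr-screenshots.plist` and load it with `launchctl load <path>`. Newer macOS releases also offer `launchctl bootstrap`, so check `man launchctl` on your system.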
28:18 I feel like this is right up your alley, Simon, with Datasette, Dogsheep, and like,
28:24 oh, here's this data we got from this automation, and yet I just can't dig into it. And now you can.
28:29 I'm really excited about this. Also, in the next version of macOS,
28:34 Apple Photos is going to do OCR on all of your photographs for you. So you can search for text
28:39 in pictures that you've taken. And if it's anything like the current version of
28:44 OS X Photos, all of that data is going to be stored in SQLite databases on your computer.
28:48 I've been having a huge amount of fun building things against my Apple Photos library,
28:54 because they already run machine learning labeling against your photos. They know when you take a photo of
28:59 a dog and they tag it with dog and the word dog is in a SQLite database on your computer.
29:04 So once you've figured that out, you can run SQL queries against photos you've taken and say,
29:09 show me every photo I've taken of a dog in San Francisco in the month of May.
29:17 And you get results back, which is crazy interesting.
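Apple's real Photos database schema is large and undocumented. The sketch below therefore uses a deliberately simplified, hypothetical two-table schema, `photos` and `labels`, with made-up sample rows. It shows the shape of a "dogs in San Francisco in May" query. It is not a query that will run against an actual Photos library:

```python
import sqlite3

# Hypothetical, simplified schema; the real Photos database is far more complex.
SCHEMA = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, taken_at TEXT, city TEXT);
CREATE TABLE labels (photo_id INTEGER REFERENCES photos(id), label TEXT);
"""

QUERY = """
SELECT DISTINCT photos.id, photos.taken_at
FROM photos
JOIN labels ON labels.photo_id = photos.id
WHERE labels.label = 'dog'
  AND photos.city = 'San Francisco'
  AND strftime('%m', photos.taken_at) = '05'
ORDER BY photos.taken_at
"""


def dogs_in_sf_in_may(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    return conn.execute(QUERY).fetchall()


# Made-up sample rows, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO photos VALUES (?, ?, ?)", [
    (1, "2021-05-02 10:00:00", "San Francisco"),
    (2, "2021-06-10 12:00:00", "San Francisco"),
    (3, "2021-05-20 09:30:00", "Oakland"),
])
conn.executemany("INSERT INTO labels VALUES (?, ?)", [
    (1, "dog"), (1, "park"), (2, "dog"), (3, "dog"),
])
print(dogs_in_sf_in_may(conn))  # [(1, '2021-05-02 10:00:00')]
```

The label lives in its own table, so one photo can carry many machine-learning tags, and `DISTINCT` keeps a photo from appearing twice.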
29:19 Yeah.
29:20 That's pretty cool.
29:21 Yeah. That's super cool. I love the stuff that you're doing with that.
29:25 Is it just local, or are they caching that in their own databases as well?
29:30 Oh, well, so they synchronize it all. So if you're using iCloud, your photos are synchronized up to
29:35 their servers, so you take a photo on your phone and it shows up on your computer automatically, but all
29:40 of the actual local data storage is SQLite database files. Apple are really big into SQLite.
29:45 So yeah, there are just these files littering your computer with your address book in there and all of
29:50 your iMessages and all of your photo metadata, it's just sat there waiting for you to dig in and play
29:56 with it.
29:56 Nice. With Datasette, probably.
29:59 Right? Yep. I've got a script, I'll add it to the show notes, called
30:05 dogsheep-photos, which uploads your photos to your own S3 bucket so that you can actually link to them and
30:11 embed them on web pages. And it extracts all of that SQLite data into a more usable format. So yeah,
30:17 I've got an online database of all of my photographs that I update every now and then with the script.
30:23 And it works. It's phenomenal what you can do with it.
30:25 Cool.
30:26 Out in the live stream, Brandon, hey, Brandon, says this is fantastic. Definitely excited.
30:31 And also, taking a step back to yours, Brian, David Colton, hey, David, says, I'm using double quotes now
30:38 in Black, but my typing has not evolved yet to double quotes. So you just pass it through the single
30:43 quote to double quote compiler process called Black, and then you've got it all adapted. That's nice.
30:48 Yeah.
30:49 Black has given me back, I estimate, 5% of my programming time, which I used to spend worrying
30:56 about indentation and suchlike. I got all of that back thanks to Black. I never even
31:01 think about how I indent or style my code at all. I'll literally write horrible run-on
31:07 lines that go on for ages and then run Black, and it formats them nicely and I forget about it.
31:13 It's wonderful. It's fantastic.
31:14 That's cool.
31:15 Yeah. Great.
31:16 Got any extras for us, Michael?
31:18 You know, I do. I always do. Unless I have an "extra, extra, read all about it," then I
31:23 guess I still do. So, we talked about strongtyping last time, which lets you do cool stuff like
31:31 go and put a decorator onto a function and say, well, this one, you know, if it has type
31:37 annotations or type information, like Python itself does, if you put the @match_typing
31:43 decorator on there, it'll verify at runtime that you said it took an integer and you actually passed
31:48 an integer, not a list or whatever to that parameter. Right. Yeah.
31:52 Well, Felix, who maintains this project, reached out today to say that it actually does a whole lot more
31:56 and that we should check some other things out. I just wanted to highlight a couple of things that
31:59 he pointed out. One: we're all familiar with the namedtuple, where you put
32:05 the type name in quotes, and then the fields or attributes in a list, either
32:12 space- or comma-separated, like spell, mana, fact, and so on. So this one has a typed named tuple where
32:19 you can put the type information in a very similar way to what Python would have, like colon
32:23 str, colon list, and so on. And then you get actual runtime type validation that your
32:29 data going into your named tuple is actually the type of data you expect.
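To make the idea concrete, here is a minimal, standard-library-only sketch of a runtime type-checking decorator. It is not strongtyping's implementation and does not mirror its `@match_typing` API. It only checks parameters annotated with plain classes such as `int` or `str`, and skips generics like `list[int]` as well as `*args`/`**kwargs`:

```python
import functools
import inspect
import typing


def check_types(func):
    """Raise TypeError when an argument doesn't match its plain-class annotation."""
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue  # *args / **kwargs are not checked in this sketch
            expected = hints.get(name)
            # Only plain classes are checked; generics such as list[int]
            # have a typing origin and are skipped.
            if (
                isinstance(expected, type)
                and typing.get_origin(expected) is None
                and not isinstance(value, expected)
            ):
                raise TypeError(
                    f"{name}={value!r} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def double(x: int) -> int:
    return x * 2
```

With the decorator, `double(21)` works as usual, while `double([1])` raises `TypeError` instead of quietly returning `[1, 1]`.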
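For comparison, here is one way to get a validated named tuple from the standard library alone. This is not how strongtyping's typed named tuple works. The `Spell` type and its `name`/`mana` fields are hypothetical, loosely borrowed from the example field names above. The sketch subclasses a `typing.NamedTuple` and checks each field against its annotation in `__new__`:

```python
from typing import NamedTuple, get_type_hints


class _Spell(NamedTuple):
    name: str
    mana: int


class Spell(_Spell):
    """A NamedTuple subclass that validates field types when constructed."""

    __slots__ = ()  # keep instances as lightweight as the base tuple

    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls, *args, **kwargs)
        for field, expected in get_type_hints(_Spell).items():
            value = getattr(instance, field)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field} must be {expected.__name__}, got {type(value).__name__}"
                )
        return instance
```

`Spell("fireball", 3)` works, and `Spell("fireball", "three")` raises `TypeError`. One caveat: `_make()` and `_replace()` build instances via `tuple.__new__` directly, so they bypass this check.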
32:33 Oh, nice.
32:34 That's good.
32:34 Isn't that neat?
32:35 Yeah.
32:35 Yeah.
32:36 Yeah. So there's that. And then also, I love this about our show. It kind of blows my
32:41 mind that this, this is how the world works. And I really appreciate this. Everyone who plays along,
32:46 we'll say things like, oh, I wish we could specify indexes in Beanie. And then like the next episode,
32:52 we're like, hey, look, Roman added a way to do indexes in Beanie. And I said, it's awesome that this applies to
32:59 functions, but why couldn't it apply to classes? It's basically the same thing. And so now six days
33:04 ago, we have a new feature. You can also apply strong typing to classes as well or something
33:11 like that. So well done. Well done.
33:13 Is it because you asked for it? Because I mean, I asked for single quotes in black and I didn't get
33:19 that, but...
33:20 Well, I mean, it also may depend on the size of the project. The more input they get, the less
33:27 influence any individual statement may have on it. Right. Yeah. Anyway, thanks for
33:33 working on that and for the extra information there. Yeah.
33:35 Actually, one other thing. Yes. I've been working to make sure that we finally don't have to
33:42 have one of these completely useless, dreadful talks on technology. Our site uses cookies. Here's our
33:50 cookie policy. Do you accept our cookie policy or do you not accept our cookie policy? AKA, would you
33:55 like our website to work or would you like to go away? Like that's kind of what the button so often
33:59 means. Right. And so I thought I'd removed all the analytics, anything else that we might
34:05 be doing third party. We're good. And I went to Python Bytes and I'm like, wait, there's
34:10 DoubleClick. There's Facebook, there's Google. There's like, what is all this stuff?
34:14 It turns out that when we started including the live stream YouTube embed, it started bringing them back. And I'm like,
34:20 why would Google be putting in Facebook? That sucks. And there was also the Disqus
34:25 conversation stuff that people have really stopped using. They all just go and chat on the
34:29 YouTube streams. Now they want to have a live comment type of thing. So I'm like, well, I'll
34:33 just take that out. That got rid of the Facebook one. But then, what do you do about
34:38 that? So instead of embedding the YouTube player, I said, I'm going to figure out a way to get the
34:44 picture automatically from YouTube, the poster. And then when you hover over it, it just has a play
34:49 icon. It says play on YouTube and it opens up a new window. And I thought I was all clever by just putting
34:54 the image there, but it was served from Google. No, now the YouTube image servers are putting
35:01 tracking cookies on our site. I'm like, well, come on. Why is this so hard? So now on the server,
35:06 we use requests. We download the image anytime it has to be shown on a page, put it in MongoDB.
35:12 And then when you request it, we serve it back out ourselves, so we can strip the tracking cookies out.
35:17 Nice. And now, when you look at the tracking content: none detected on the site.
35:23 But why in the world does it have to be so hard? I just want to...
35:26 Isn't it amazing how YouTube embeds used to be the absolute gold standard for
35:31 embedding video on a webpage? Why would you do anything else? And now actually I'm beginning
35:35 to think, you know what? Host the video, the .mp4 file or whatever, yourself and stick it in an HTML5 video
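Here is a sketch of the "download the poster server-side and serve it ourselves" idea. As described above, the site uses requests and MongoDB. This version swaps in `urllib.request` and an in-memory dict so it stays self-contained. The thumbnail URL pattern is an assumption to verify against YouTube, and the class and function names are hypothetical:

```python
import urllib.request
from typing import Callable

# Assumed thumbnail URL pattern; verify against YouTube before relying on it.
THUMBNAIL_URL = "https://img.youtube.com/vi/{video_id}/hqdefault.jpg"


def fetch_bytes(url: str) -> bytes:
    """Download a URL server-side, so the visitor's browser never contacts YouTube."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


class ThumbnailCache:
    """Serve YouTube poster images from our own server instead of Google's.

    A dict stands in for the MongoDB collection mentioned in the episode.
    """

    def __init__(self, fetch: Callable[[str], bytes] = fetch_bytes):
        self._fetch = fetch
        self._store: dict[str, bytes] = {}

    def get(self, video_id: str) -> bytes:
        if video_id not in self._store:
            url = THUMBNAIL_URL.format(video_id=video_id)
            # Only the image bytes are kept; YouTube's response headers
            # (including any Set-Cookie) are never passed on to visitors.
            self._store[video_id] = self._fetch(url)
        return self._store[video_id]
```

A web route would return `cache.get(video_id)` with an `image/jpeg` content type. Because the browser only ever talks to your own domain, no third-party cookies can be set.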
35:42 embed. And that's probably a better experience for your users as well. Because you know, when they click the
35:47 video on their mobile phone, it'll play full screen and they won't have to hop through to the YouTube
35:50 app and all of that kind of thing. Yeah, absolutely. Yeah. So anyway, just quick shout out, like this is
35:55 taking several passes, but I think it's finally 100% no tracking. I mean, we weren't putting any there before,
36:02 but like it was seeping in from just like what we might include on the page as content. Right. So anyway,
36:08 there you have it, Brian. That was my weekend. How was it? Nice. Well, thanks. I appreciate you doing all that work for us.
36:14 Yeah. David Colton has the wash hands emoji. There we go. We're all better. Yeah. Well, I've got no
36:21 extras. Simon, do you have anything extra you want to share? I've got one. So
36:27 Will McGugan, who's working on Rich, has been building Textual, which I know you've talked
36:31 about on the podcast before. What I would encourage people to do is pay close attention because I've never
36:36 seen a piece of open source software developed this quickly. Every day he's posting a video where he's
36:43 like, oh, and here's the new feature. Today he posted a video of it doing a full tree view of a file system,
36:50 which you could interact with using your mouse in the terminal. And when you clicked on a file, it would open it in a separate panel
36:56 with syntax highlighting. It's absolutely astonishing. It's turning into one of the better ways of building a
37:04 GUI application, and it's running as text in the terminal. We can almost have just a section of the show
37:10 called what's Will up to. You really could. Absolutely. Yeah. He's reimplemented CSS grid,
37:17 the CSS grid mechanism for terminal applications. It's brilliant. And yeah, I'm just having such a great time
37:23 watching him do all of this stuff. Is he live streaming it?
37:27 I don't think so, but he posts little five-minute videos on Twitter every day of the stuff that
37:33 he's doing. I feel inadequate watching him work this fast, just saying. It's such a delight
37:39 though. It's like he was born to build this piece of software, and now he's building it and we all get
37:44 to watch him do it. Yeah. That's great. Yeah. Henry Schreiner out in the live stream says Textual is
37:50 amazing. Indeed. It's quite something. Yeah. And I remember when he was trying to
37:55 name it, and Textual didn't even come up on my radar as something that might be possible, but it's
38:00 so obvious now, like graphical and textual. Yeah. It makes sense. It's cool. So, hey, how about a joke?
38:08 Maybe. Oh man, I got some jokes for us. Two jokes. The one, I'm not really sure how to convey it,
38:13 but I, yes, I'll do my best. I want you to sing. No, man, this is you. This is you, bro. All right.
38:20 So the first one here, I could definitely do this one. This one is from John on Twitter,
38:26 but pointed out to us by Nick Moore, who was previously on the show not too long ago. Thanks,
38:30 Nick. And this one poses, I think also this is perfect for when Simon is on the show. It says,
38:35 what do you get when you SELECT * FROM goblins, dragons, elves, unicorns?
38:42 A query tale. Oh my goodness.
38:46 It's a fairy tale, a query tale. It's bad.
38:49 It's terrible. It's bad. Oh, wow.
38:50 Well, I wanted to share one that people could actually share with their... this isn't in the
38:56 list, but it's one I read recently that people might be able to share with their kids.
39:01 In the Northwest, we've got Sasquatch, right? So, you know, what do they call Bigfoot in Europe?
39:08 Big meter. Oh, it's pretty bad. Quick tip: if you're ever near Santa Cruz in California,
39:18 there is a Bigfoot museum in a log cabin in the woods outside of Santa Cruz called the Bigfoot discovery
39:24 experience. And it is not a joke. It is very serious. And there is a man there who will take
39:29 you through all of his evidence for Bigfoot. And it takes about an hour. He's got maps and
39:34 plaster casts of feet, footprints and a map with pins on it. And it's fascinating. I could not recommend
39:40 it more. Yeah.
39:42 I wonder if the COVID pandemic has affected the Bigfoot population.
39:46 Oh, you should go. Or you can call him up and ask him: while I was talking to him,
39:51 he got a phone call to answer questions about Bigfoot. So he will answer your calls. Yeah.
39:57 All right. Hey, Brian, your joke got a groan all the way from Australia.
40:01 Nice.
40:03 Was it mine? I'm not sure. It could have been either, honestly.
40:06 Yeah.
40:07 I think I'm going to go with the meter one.
40:10 They were both pretty bad.
40:11 All right. I'll see what I can do with this next one here. So if you're a kid of the nineties,
40:18 I guess that's probably the time, there's Pinky and the Brain.
40:21 And apparently, in one of the 10 places I have to write your name, I typed it too quickly and wrote Brain.
40:29 Yeah. And Brett Cannon caught it.
40:33 And so he did a take on Pinky and the Brain, and it starts out:
40:38 What do you want to do today, Brian?
40:40 Same thing
40:41 we do every Wednesday, Michael: help Python take over the world.
40:45 It's Michael and the brain. Yes.
40:48 Michael and the brain.
40:49 One's into testing, the other's into GUIs.
40:52 They're both into making Python seem sane.
40:55 They're Michael.
40:56 They're Michael and the brain brain brain.
40:58 Yeah.
40:59 Yeah.
40:59 Fantastic.
41:00 I love it.
41:00 Phenomenal.
41:01 We need to have somebody that's got like musical talent to actually put this together as something.
41:07 So anyway, yes.
41:08 Someone who is not me because it won't come out.
41:10 Well, we'll put this with the lyrics in the show notes.
41:14 I think we should leave them there so that we are accepting submissions.
41:17 Yes.
41:18 And if they are, if they pass, we may actually play them on one of the next episodes.
41:23 Oh, I'd love it.
41:24 Yeah.
41:24 Could be the new theme song, Brian.
41:25 Yeah.
41:27 I'm getting tired of the old theme song.
41:30 Yeah, exactly.
41:31 Which is no theme song.
41:32 All right.
41:35 Well, thanks.
41:36 Thanks a lot for showing up, Michael, and thanks, Simon.
41:39 This was fun.
41:40 Thanks for having me.
41:41 Yep.
41:41 You bet.
41:41 Bye everyone.
41:42 Thanks for listening to Python Bytes.
41:44 Follow the show on Twitter via @pythonbytes.
41:47 That's Python Bytes as in B-Y-T-E-S.
41:50 Get the full show notes over at Pythonbytes.fm.
41:53 If you have a news item we should cover, just visit Pythonbytes.fm and click
41:57 submit in the nav bar.
41:58 We're always on the lookout for sharing something cool.
42:00 If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live.
42:07 That's usually happening at noon Pacific on Wednesdays over at YouTube.
42:12 On behalf of myself and Brian Okken, this is Michael Kennedy.
42:15 Thank you for listening and sharing this podcast with your friends and colleagues.