Transcript #243: Django unicorns and multi-region PostgreSQL
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 243, recorded July 21st, 2021.
00:10 And I'm Brian Okken.
00:12 And I'm Michael Kennedy.
00:13 And I'm Simon Willison.
00:14 Welcome, Simon. Thanks for agreeing to show up today.
00:17 No problem at all. I've been looking forward to this.
00:19 If anybody doesn't know who you are, can we do a quick "Who's Simon?" >> Sure. So yeah, my name's Simon Willison.
00:25 I've been doing Python bits and pieces for around about 20 years now. So I'm a co-creator of the Django web framework from many, many years ago. I think Django is celebrating its 15th birthday now. But more recently, I've been working on a set of open source tools around this project I have called Datasette, which is a web application for exploring a relational database, a SQLite database, but it also has tools for publishing those databases online and building those databases out of lots of different sources of data.
00:55 I'm trying to bootstrap an entire ecosystem of data and analytics tooling around SQLite because it turns out everyone in the world has SQLite even though they don't necessarily know that they have it.
01:06 And there's some really cool stuff that you can do with it.
01:08 >> Yeah, it's a really cool project.
01:09 >> Yeah, it is.
01:10 If you wanted to create your own personal search engine that would let you just go and say search your Gmail, your Twitter, your Instagram, and your file system all at once.
01:19 That's pretty much it, right?
01:20 >> That's part of the tooling.
01:21 Yeah, there's a whole side of it, which I've called Dogsheep for ridiculous reasons.
01:26 But the Dogsheep project is about personal analytics.
01:29 It's about getting your personal tweets and messages and all of the personal data about yourself into one place.
01:36 So you've got essentially a little mini data warehouse on your laptop that you can use to query aspects of your own life.
01:41 And that's been a really fun way of driving features in the software, which can then be applied to like company databases and so forth as well.
01:48 Yeah, super cool.
01:50 >> Well, if I didn't want to do SQLite, I might want to use Mongo. What do you think?
01:55 >> You may want to.
01:57 So there's some big news around MongoDB.
01:59 MongoDB 5 is out, which I'm all about MongoDB, which makes me super excited.
02:05 Probably won't switch right away because I don't actually need the features that are there, but I'm super excited to see things going strong.
02:13 So some of the things that are relevant, and I think they're really relevant to Python people, especially the data science side.
02:20 So basically there's two important things.
02:23 One has to do with working with time series and the other has to do with stability of the app that you don't want to keep changing so that you can upgrade your database, right?
02:34 Like if the database API slightly changes, you don't wanna have to deal with those incompatibilities until you're ready to take advantage of the benefits of making those changes.
02:43 So one of the things that comes with it is native time series collections and schemas, right in the database.
02:50 That's incredible.
02:51 Yeah.
02:51 So you can do really interesting things like a moving average as a query across your time series data, which is stored in a format that's meant to make that incredibly fast and low latency.
03:02 But you can also do like, I would like the numerical derivative over time as a moving average, as a query, or the integral of this collection.
03:12 So you can do like math as part of your query and get it to calculate those things in really interesting ways.
03:19 So the time series has things like clustered indexes and window functions and all sorts of interesting things.
03:25 So that's one.
03:26 It automatically optimizes your schema for high-efficient storage, which is pretty cool.
03:31 That's, I think, independent of the time series, but not a hundred percent sure.
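The moving-average query described above could be sketched in Python roughly like this. It is an illustration, not something shown in the episode: the field names (sensor, ts, temp) are made up, and the pipeline is only built as a plain data structure that you would hand to your driver's aggregate call. The second function is a pure-Python reference for what a trailing window average computes.

```python
def moving_average_pipeline(window_size=3):
    """Build a $setWindowFields stage (MongoDB 5.0+) for a trailing average.

    The field names sensor, ts and temp are hypothetical.
    """
    return [
        {
            "$setWindowFields": {
                "partitionBy": "$sensor",
                "sortBy": {"ts": 1},
                "output": {
                    "movingAvg": {
                        "$avg": "$temp",
                        # Current document plus the (window_size - 1) before it.
                        "window": {"documents": [-(window_size - 1), 0]},
                    }
                },
            }
        }
    ]


def trailing_average(values, window_size=3):
    """Pure-Python reference for the window computed by the stage above."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        result.append(sum(window) / len(window))
    return result


if __name__ == "__main__":
    print(moving_average_pipeline())
    print(trailing_average([10.0, 20.0, 30.0, 40.0]))
```

The first few results average fewer readings because the window has not filled up yet.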
03:35 It has, the other big thing is the versioned API for future-proof apps.
03:39 So suppose you build against version, I guess five is the one that has it.
03:43 You build against version five of MongoDB.
03:45 And then at some point, like, version seven comes along and it's like, oh, you can do this new way of querying, but it's going to break some stuff.
03:51 So if you want to use it,
03:53 you've got to fix your app.
03:53 Or you can just say, I want the database to look like version five forever.
03:58 And no matter what version is in production, it'll, it'll behave the right way, according to what you said, you wanted it to behave, right?
04:05 So you could say, I want version seven to be like five for me, but it can be version seven for someone else.
04:09 That kind of thing.
04:10 Yeah.
04:11 The other thing, the way that you talk to it, the way that you interact with it is through just a terminal app you fired up or a command prompt app and you talk to it.
04:19 And traditionally this thing has been gross.
04:21 It's been like, it's fine, but it has zero syntax highlighting.
04:25 It has zero auto-complete, those types of things, right?
04:29 So they're introducing a new shell.
04:31 So traditionally you would have typed mongo, enter, and connected.
04:34 Now you type mongosh, because the old one is still there for compatibility reasons,
04:38 but that one now has syntax highlighting, better error checking, pretty printing, auto-complete, things like that.
04:46 If you're going to do stuff on the shell, then you really should just run the new one.
04:49 >> That's pretty cool. I'm going to go with Mongoosh as the-
04:52 >> Mongoosh. Oh, Mongoosh, what are you doing?
04:56 Yeah, run the shell, the new one.
04:57 I know it. That's pretty awesome.
04:59 Then also, they're talking about having serverless instances.
05:05 Like Lambda type functions, where you don't actually have to manage the database or things like that.
05:10 So I didn't know a whole lot about it.
05:12 You can also watch the keynote and actually the whole conference.
05:15 But the keynote is probably most relevant here.
05:17 It turns out that for a public, billion-dollar company, or whatever they're worth,
05:22 it's incredibly amateurish, more like a high school talent show or something like that.
05:28 But whatever, you'll still learn.
05:30 I mean, it's like you'll see.
05:31 It's like super.
05:32 >> I have to check it out now.
05:34 >> Yeah, it's worth watching for the blush worthy.
05:40 Like, "Oh, come on.
05:42 Okay, well, let's just move on now, please." But nonetheless, they do demo some interesting things and whatnot.
05:49 That's probably enough on that.
05:50 But if you're into MongoDB, MongoDB 5 has a lot of cool things to talk about there.
05:55 >> What else is cool and coming up?
05:57 >> Python.
05:58 >> 3.11. We don't even have Python 3.10 yet.
06:03 Well, I do. The beta is available for 3.10.
06:07 You can run it, but the Alpha is around for 3.11, which is neat.
06:11 >> Nice.
06:12 >> What I wanted to highlight here was enhanced error locations in tracebacks.
06:21 I'm so excited about this. This is so cool.
06:24 Python has not been that bad for tracebacks.
06:27 I've dealt with worse tracebacks. It points out what line is going on, but sometimes there's weird stuff like "NoneType is not subscriptable" or something, and you don't know what's going on.
06:39 But now in 3.11, it will point to exactly what part of the line has the error, with little carets underneath pointing exactly where it's at.
06:48 >> That is actually super cool.
06:50 So like the example you got on the screen here on the announcement, you've got multiple objects accessing their fields, like point_1.x, point_2.x, and the error is "'NoneType' object has no attribute 'x'", which is probably the most common error that you'll ever find in Python.
07:08 But what I like about it that you're pointing out here is like the second object is the one that is none.
07:14 And it actually highlights, no, no, not the first one, the second one, 'cause there's nothing about the error message that would tell you which of these two things was the problem.
07:22 That's awesome.
07:23 - Yeah, and if you have a deep stack trace, it'll show you exactly where in it.
07:29 And there's even another example where it shows deep into a dictionary.
07:35 >> A four-level deep dictionary dereference or something, right?
07:39 >> It points out exactly which index is the one that's messing up.
07:43 So that's pretty amazing.
07:45 Also, even math, arithmetic expressions like a division by zero.
07:50 You've got multiple divisions, which one is the problem?
07:53 It'll show you exactly which one it is.
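A small script makes the ambiguity discussed above concrete. This is a sketch with made-up names (Point, x_distance), not the example from the release notes. The error message alone does not say which of the two objects on the line was None, and that is the gap the 3.11 caret markers fill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    x: float
    y: float


def x_distance(point_1: Optional[Point], point_2: Optional[Point]) -> float:
    # Two attribute accesses on one line: the exception message is the same
    # whichever of the two objects turns out to be None.
    return abs(point_1.x - point_2.x)


def describe_failure() -> str:
    """Trigger the error and return only its message."""
    try:
        x_distance(Point(1, 2), None)
    except AttributeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(describe_failure())
    # Call x_distance(Point(1, 2), None) without the try/except on 3.11 or
    # later to see the traceback underline the point_2.x part of the line.
```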
07:56 >> The thing I love about this change is this is one of those things, this is absurdly difficult.
08:01 This is like acres of computer science and a bunch of people working together on this for, I couldn't even imagine how long it took them, to make something which is just a beautiful little incremental improvement to our lives as Python developers. I think the release notes actually talk about some of the internal changes they had to make to get this to work. This is really deep stuff and it's totally worth it for what you get out of it. But I think it's easy to look at this and think, okay, that's a reasonably sensible small change, and this was not a small change at all.
08:30 >> I think it's going to dramatically increase the on-ramping of new people into Python because being able to figure out what's wrong with your code, that's basics.
08:41 Some of us old hatters are used to digging into confusing tracebacks, but some new people are not.
08:49 If we can make them less confusing, that'll be great.
08:51 >> Right. When I work with new programmers, it's so common, they get a traceback and they freeze because this utter, utter meaningless junk has just shown up on their screen.
09:00 And what are they supposed to do with that?
09:01 And here it feels like this is just such a huge improvement because at least it's pointing to the bit in the giant blob of text that they should be paying attention to.
09:08 - Yeah, lovely.
09:10 - Yeah, I guess. - I want it in 3.10 though.
09:11 But we have to wait till 3.11.
09:13 - from __future__ import nice_stack_trace, or traceback.
09:17 Yeah, very cool.
09:18 All right, so Simon, you got the third one.
09:21 Tell us all about it.
09:22 - Okay, so Fly.io is a hosting provider that launched about a year ago.
09:27 I've been following along because they're doing some really interesting stuff around hosting Docker containers, and all my stuff is in Docker containers, so I'm always looking for places where I can throw a Docker container and have it hosted online.
09:37 Their secret sauce is that they do geographic hosting, so you can ask them to run your container in Tokyo and San Francisco and London, and they will do that, and they will direct traffic to the closest version of that app.
09:50 It's this thing, I worked at Eventbrite for many years, And one of the things I was always trying to figure out was, okay, could we run Eventbrite close to our users?
09:59 Could we have a database in Europe and a database in New York and give people a faster experience that way?
10:05 Incredibly difficult to do.
10:06 >> Right.
10:07 >> And Fly.io- >> What a lot of people do is they do CDNs, so the static content.
10:10 >> Right.
10:11 That's easy.
10:12 >> But then there's one server somewhere that is really the one that, yeah.
10:15 >> It's the database.
10:16 It's the application code, and then it's the database server especially.
10:19 What Fly.io are doing is making it so much easier to do this that you could start a project and have it geographically distributed from day one without having to think particularly hard about it. I like that about them. This article came out within the last week, I think, and it talks about their plan for multi-region databases. In their case, they're talking about Postgres and this desire to have Postgres databases distributed around the world. When you're doing that, having writes go to multiple places remains incredibly difficult.
10:52 But a very common pattern is you say, "Okay, the lead database is in, I don't know, New York, and all of the writes go to that." Then any of the reads get spread out to replica databases that are running in different places around the world.
11:05 That's still a really difficult thing to set up with the geographic load balancing.
11:09 What they propose is basically: run your application all the way around the world and set it up so that if anyone tries to write to the database and they're not talking to the lead database server, the error gets caught and the application server replies to the Fly CDN and says, "Hey, rerun this request against the lead database in New York." The user doesn't see anything at all.
11:31 The user attempts to do something and it works.
11:34 What actually happened is they tried to do a write against Tokyo.
11:37 Tokyo said, "Oh, we can't handle writes." Fly invisibly internally redirected to New York.
11:43 The write happened against New York and the result came back.
11:45 And so this takes geographically distributing your database reads, which used to be, I mean, I was thinking it was going to be a team of engineers for six months to get this working.
11:55 And it's just baked into their platform.
11:57 It's this incredibly elegant piece of sort of systems engineering design that they've done.
12:02 And I was fascinated, you know, I've banged my head against this problem for so long, and they just solved it.
12:07 You know, they just said, hey, here's a way that will work.
12:10 We've shipped it, try it out.
12:12 As something of an architecture nerd, this really fascinated me.
12:16 - This is fascinating, yeah.
12:18 And I can see just, you know, we've got like the retry decorators and stuff for various Python functions.
12:24 Like I could see almost a, you know, like a retry-the-write decorator that you put on them.
12:30 And it just goes, it catches the error and it just goes, nope, we're gonna send it everywhere it goes and then return the result, right?
12:35 Like it basically puts decorators anywhere you're gonna ever do a write and you're good to go.
12:40 - Exactly.
12:41 They have example code for Ruby on Rails. They catch the database error that says you tried to do a write in a read-only transaction and turn it into an HTTP header that replays it against the lead region. And that's it. On the one hand, it's kind of an awful, kludgy hack, but it's also genius. This is taking six months of engineering work and turning it into "add these five lines of code and now your application works all the way around the world."
13:09 It fascinates me.
13:10 - Yeah, this is pretty interesting.
13:13 Yeah.
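The catch-and-replay idea, combined with the decorator suggestion above, could be sketched in framework-free Python like this. Everything here is illustrative: ReadOnlyTransactionError stands in for whatever your database driver raises, the region codes are hypothetical, and the fly-replay header name and 409 status reflect one reading of Fly's write-up, so check their documentation before relying on them.

```python
PRIMARY_REGION = "ewr"  # hypothetical code for the region with the lead database


class ReadOnlyTransactionError(Exception):
    """Stand-in for the driver error raised when a replica receives a write."""


def replay_writes(handler):
    """Wrap a handler so writes that hit a replica are replayed elsewhere."""
    def wrapped(request, region):
        try:
            return handler(request, region)
        except ReadOnlyTransactionError:
            # Ask the proxy to rerun the whole request in the primary region.
            return 409, {"fly-replay": f"region={PRIMARY_REGION}"}, ""
    return wrapped


@replay_writes
def save_comment(request, region):
    """Toy handler: POSTs write to the database, everything else only reads."""
    if request["method"] == "POST" and region != PRIMARY_REGION:
        raise ReadOnlyTransactionError("cannot write in a read-only transaction")
    return 200, {}, "ok"


if __name__ == "__main__":
    print(save_comment({"method": "POST"}, "nrt"))  # replica: asks for a replay
    print(save_comment({"method": "POST"}, "ewr"))  # primary: write succeeds
```

Reads never raise, so they are served by the nearest replica. Only the occasional write pays for the extra hop.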
13:14 - They also, I've got, there's one other link in the show notes.
13:16 There's a second article they put out a few days ago, which is just doing something, it's more about using Redis as a cache in your geographical data centers.
13:26 So you can have a local Redis, because I mean, their argument is people in London tend to be interested in other things that people in London are interested in.
13:35 Ditto for Tokyo.
13:36 So actually distributing your cache by city normally gives you really good cache hit rates.
13:42 But they also pointed out that, and I didn't know that Redis could do this, Redis can be set up to allow writes to supposedly read-only replicas.
13:50 So you can have a local cache that you're writing to and reading from, but still have that leader Redis in your main data center that can send writes out to all of those replicas.
13:59 So that gives you cache invalidation from a central point.
14:02 You can, in your sort of lead Redis, you can say, okay, everyone delete the cache entry for whatever this thing is.
14:08 And all of those replicas around the world will then delete that cache entry, even though normally they're acting independently.
14:13 And yeah, it's, again, this is for, if you're a systems architecture design nerd, the stuff that they're doing is so interesting.
14:20 - I think it's interesting and I'm not one of those, but.
14:23 - Maybe you are and you didn't realize.
14:25 (laughing)
14:26 - You will be next year.
14:27 You will be next year.
14:29 Fantastic.
14:29 Yeah, this is super cool as well.
14:31 And it seems really useful, you know, and it's perfectly in line with like, let's take our app and put the logic in multiple places because that person is unlikely to move from Tokyo to Virginia during a session.
14:46 But once they start in one place, they're going to stay in that place.
14:51 So the cache would, would reasonably just have like their local data on that one instance.
14:56 Right.
14:56 Yeah.
14:57 Yeah.
14:57 Cool.
14:58 But maybe your CDN, or not your CDN, your CMS has generated a page and everybody needs that always to be in sync, right?
15:04 There's that global data as well.
15:06 Yeah, so very cool.
15:07 I like this.
15:08 Check it out. - Neat.
15:09 - Neat indeed.
15:10 - Well, let's talk about unicorns.
15:11 - I love the unicorns.
15:12 So unicorns, the magical creature.
15:15 And Simon, I'm so glad that you're here 'cause we can get your thoughts on this, even if you maybe haven't been like deep down in it.
15:21 So not too long ago, we talked about HTMX, which I'm still a big fan of HTMX.
15:26 It's a cool sprinkling of magic, JavaScript-y stuff, onto your page to make it more interactive.
15:33 But if you're doing Django, HTMX is very relevant, but there's also this thing called Django Unicorn at django-unicorn.com.
15:40 It's a magical full stack framework for Django.
15:44 So the idea is that you can create these templates, these interactive templates without going and rewriting everything in like some front end framework like React or something like that.
15:54 you can skip the JavaScript build tools because you know, you got a lot less of that and you can skip a bunch of serializers and just use Django for like the API bits.
16:04 So you install Unicorn, you create a component, and then at the top of your template you put, you know, {% load unicorn %}, and then you can just give it one of these names.
16:13 So for example, here's a little task.
16:15 Task one is tell people about Unicorn.
16:18 I can add that, adds too many.
16:21 I'll tell people about Unicorn.
16:23 And you can see like this cool little thing is interacting and it's not refreshing the page, right?
16:28 It's like a front end framework type of thing.
16:30 But the way that you write it is you just put some extra template pieces on there, like unicorn:submit.prevent.
16:38 And you're gonna do this add function instead.
16:40 And if somebody hits the escape key, we're gonna change the value.
16:45 And that's not JavaScript.
16:46 Those are just HTML attributes, but they turn into JavaScript, right?
16:50 Which is very cool.
16:51 So, and then you just put your regular Django template business down and off it goes.
16:57 And it turns it into basically something that's way more front-end framework friendly.
17:02 Simon, what do you think?
17:03 So as far as I can tell, the real magic here is that they're doing the trick where you render the HTML on the server, in this case using your Django template.
17:12 And then they send back JSON with a blob of HTML in it, which you then essentially write into innerHTML to update the page.
17:19 And I love this pattern.
17:21 Like this is sort of fun.
17:24 I've always been a big fan of the progressive enhancement method of writing JavaScript, where you get the stuff to more or less work without any JavaScript at all.
17:32 And then if there's JavaScript, then you get in-page, page updates and all of that kind of thing.
17:37 But there's also, one of the problems I've seen with all sorts of, lots of engineering shops that try and do that is that you end up writing your templates twice.
17:44 You have the Django templates that know how to do something.
17:46 And then you have front end templates using React or handlebars or whatever that know how to do something, and you have to keep those in sync, which is an enormous waste of time for everyone involved.
17:55 What they're doing here then is they're handling that, they're cleaning up that inconsistency for you.
18:02 You write a Django template, they can use that template in Python code to generate just that fragment of HTML, send that back and have that displayed on the page.
18:12 I think this is a really interesting approach.
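The render-on-the-server pattern described above can be sketched in a few lines of plain Python. This is not Django Unicorn's code: it uses an f-string where a real project would render a Django template, and the function names and the "html" key are made up for illustration.

```python
import html
import json


def render_task_list(tasks):
    """Render the fragment with the same logic the full page would use."""
    items = "".join(f"<li>{html.escape(task)}</li>" for task in tasks)
    return f'<ul id="tasks">{items}</ul>'


def add_task_response(tasks, new_task):
    """Handle an 'add' call: update the state, re-render, return JSON."""
    updated = tasks + [new_task]
    return json.dumps({"html": render_task_list(updated)})


if __name__ == "__main__":
    print(add_task_response(["Tell people about Unicorn"], "Ship it"))
```

On the client, a small script would take the html value from the response and assign it to the matching element's innerHTML, so the markup is defined in one place only.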
18:13 I've not spent much time with Django Unicorn itself, but it also reminds me a bit of the, I think it's called Hotwire.
18:20 The Ruby on Rails community built this very exciting framework, again, against these kinds of principles, just shipping blobs of HTML back and forth.
18:30 I feel like it's something like the mad rush towards single-page applications over the past 10 years, has mostly resulted in applications that load slower and take longer for people to build.
18:42 >> They're so inconsistent and they make me so crazy.
18:46 For example, I'll go to like a bank or something and I'll say, all right, I'm going to run 1Password to pre-fill the page, and you'll see it fill out the page, and then you try to submit it.
18:57 It goes, please fill out this field.
18:58 And there's clearly like an email address or something in there.
19:01 What do you got to do?
19:02 Go put a space, delete the space.
19:03 So the JavaScript event triggers because they're like, not really, it's all that junk.
19:10 And it's just like, yeah, you know what I mean?
19:13 What people actually want is they don't want a full page reload.
19:17 Like anyone who's getting into single page apps and so on, really, they just don't want that flicker when the browser reloads everything.
19:22 So using this trick where if JavaScript is available, you update a section of the page using stuff that came back from an Ajax API totally works.
19:30 And that feels like the model here and also the Hotwire model from Rails.
19:33 - Exactly, yeah.
19:35 So the HTMX, the Hotwire, and this, it's all about let's not write new stuff.
19:40 Let's just take the views and the templates already doing their magic and let's just put the little pieces in there to make them dynamic, which I'm all about this.
19:47 This is great.
19:48 >> What I've missed is why is this a Django thing?
19:51 Is it because it uses the Django templates or is that?
19:55 >> It looks like it, yeah.
19:56 It looks like the magic here is that it's using Django templates.
20:00 >> And the view as well.
20:03 >> It provides its own views because it needs views that provide a JSON API you can send data to from a form.
20:10 It then renders that Django template in Python code and then sends you back the stuff.
20:13 There's two sides to this.
20:15 There's the Python Django view functions they've written, but they've also written about eight kilobytes, I think, of JavaScript that hooks it up on the front end.
20:23 >> Cool. Nice.
20:24 >> Yeah. Very neat. Not very much code at all to get your Django to become more dynamic, which is great.
20:31 >> Yeah. I don't think unicorns are blue.
20:36 >> I'm not really sure what color unicorns are.
20:38 I feel like they could be any color.
20:39 They might be rainbow, but actually that's not a rainbow.
20:44 >> It's not a rainbow.
20:45 I want to talk about blue.
20:48 I think I'm ready to have tomatoes thrown at me or something for bringing this up.
20:55 Blue is an alternative to black.
21:01 Anyway, I love black.
21:04 I think black's awesome.
21:05 but there are times where you can't use it for specific reasons.
21:12 I'm thinking here basically about the decision that Black made to, not default to, but enforce double quotes on strings instead of single quotes.
21:24 There are some code bases where there's already a standard to use single quotes, and then there's also code bases where there's so many strings that actually have mixed quotes.
21:36 You've got single quotes and then double quotes inside.
21:40 >> Mine end up mixed sometimes because if I want to put quote something in the actual string, I'll use single quotes on the outside.
21:47 But if I'm going to say it's a good idea, I'll put double quotes on the outside, so I don't have to escape the single quote.
21:55 If you're going to have one of the quotes in the string, then just go with the other one is often something I'll end up doing.
22:00 >> Actually, Black does that for you.
22:02 if you've got a string with a double quote in, that's the one time that black will use single quotes, which is neat.
22:08 >> Okay. That's good. Good to know.
22:10 >> I do like that.
22:12 If the sticking point is really just the quotes, then maybe try blue.
22:17 Blue is actually, I was worried there was going to be a fork of black.
22:21 It's not a fork. It includes black and it overrides some of the functionality, specifically just a few things.
22:32 So the differences are: it defaults to single-quoted strings, except for places where we love double quotes, like docstrings and triple-quoted strings.
22:43 For some reason, those look weird with single quotes.
22:45 So I'm on board with that.
22:48 It defaults the line lengths to 79, and I don't really care because I always override that to 120 or something like that.
22:55 I like that black allows that overriding.
22:59 Then the other thing that I didn't even think about, which is nice, is one of the things Black does is takes the hash.
23:06 If you have hash comments on your right side of your code, you've got a block of them.
23:12 Maybe you're talking about an entire block of code, so you have a block of comments.
23:17 Black will remove the white space in front of the hash, whereas blue will leave those alone, so you can have block comments on the side.
23:25 That's really it. That's the only difference.
23:28 I think having this around is a neat thing.
23:32 Interesting quote from the doc is that they actually don't want to keep this project alive for very long.
23:38 They'd really like these to just be options in black.
23:40 [LAUGHTER]
23:41 >> Yeah.
23:42 >> I don't know how far they'll get.
23:43 >> I don't think that's going to happen.
23:45 I think black is pretty hardcore, they're very into not adding configuration where they can still avoid it.
23:54 >> Yeah. In researching this, one of the things I somehow missed about Black, maybe I haven't read the documentation in a long time, but a couple of years ago, it added the ability to turn formatting off and back on with # fmt: off and # fmt: on.
24:08 One of the things, for instance, occasionally, not very often, occasionally I've got a large chunk of data set up in a list or dictionary or something, that I have them aligned with comma alignment, like an old style CSV table.
24:29 >> Or a 1980 C programmer.
24:32 >> Yeah, sure. But black totally tears that apart.
24:36 But for that, you can turn formatting off and I appreciate that.
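For instance, a hand-aligned table like the one described above can be fenced off with the two marker comments. The data here is made up for illustration.

```python
# fmt: off
READINGS = [
    # name,     channel,  gain
    ("alpha",         1,  10.5),
    ("beta",         12,   2.0),
    ("gamma",       128,   0.25),
]
# fmt: on


def total_gain(rows):
    """Sum the gain column of the hand-aligned table."""
    return sum(gain for _name, _channel, gain in rows)


if __name__ == "__main__":
    print(total_gain(READINGS))
```

Black leaves everything between the two comments as written and formats the rest of the file as usual.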
24:40 >> That's cool. That's a good feature.
24:42 >> See, it does have a little bit of give.
24:46 >> Yeah, that's cool. Yeah, very good one.
24:50 >> What do we got next?
24:51 >> Okay. There's a link in the show notes to this.
24:55 This is an article that somebody wrote about using Tesseract OCR to build yourself a searchable index of your screenshots.
25:05 And I got really excited about this because Tesseract is, like, Tesseract's been around since 1995, I think.
25:10 It started off at Hewlett-Packard, and it's pretty much still the leading light of OCR in the open source space, but I've never managed to get it to work, and I've always wanted OCR that I can just run.
25:22 And thanks to this article, I can actually use Tesseract now.
25:25 I've got a couple of demos here. Can we see this?
25:27 Yeah. I grabbed a screenshot just of the random slide from our conversation earlier, and I can run, let's see, I think it's tesseract screenshot.png.
25:37 I'll put it in a file called screenshot-.
25:39 You have to tell it the language that you're using because that affects how it does these things.
25:44 It's like 70 languages, I think.
25:46 I'm going to say I want that as a txt file, and you run it, and now if I cat screenshot.txt, This is the launch today, MongoDB 5.0.
25:56 This is the screenshot I took of our conversation earlier.
25:59 A better example even would be, I took a screenshot of Python documentation just now.
26:04 So I can run that same command except I'll do it against Python docs.png.
26:09 Python docs.png, I'll call it pscreenshot.
26:12 There we go. Now if I cat this, this is pretty decent OCR against the screenshot of a file of documentation.
26:21 The really fun thing though is that you can say you want it as a PDF file.
26:25 If you do that, it will give you a PDF which is visually identical to the screenshot but has selectable text on it.
26:31 You can copy and paste out of that PDF.
26:34 The chap whose article is linked in the notes, his trick is he has a folder on his computer that he saves screenshots to, and he has an automated script that then turns those screenshots into these annotated PDFs, which means that Spotlight on his Mac can now search them.
26:53 Anything that he drops into that folder, a few seconds later becomes available to global search on his computer.
26:59 I think that's a really neat trick.
27:01 >> I love it. That's great.
27:03 >> Yeah, there's so much stuff I want to do with this.
27:06 Yeah, it was Alexandru Nedelcu,
27:10 I don't know if I'm pronouncing that correctly, who wrote all of this up.
27:13 But yeah, you can install it with Homebrew, it's brew install tesseract.
27:18 There's actually a Python library called, I think it's called pytesseract, which I thought was doing complicated things with C modules.
27:25 Actually, if you read the source, it's just shelling out to this command.
27:29 Apparently, that's the state of the art in Python OCR, is shell out to the Tesseract command line tool, which I'm perfectly happy to do.
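Shelling out to the command line tool, as described above, might look like the sketch below. It assumes a tesseract binary on the PATH and uses the argument order from the demo (image, output base name, language, then output type). The helper names are made up, and this is not pytesseract's actual code.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="eng", output_format="txt"):
    """Build the argument list for the tesseract CLI.

    output_format names a tesseract output config such as "txt" or "pdf".
    """
    return ["tesseract", image_path, output_base, "-l", lang, output_format]


def run_ocr(image_path, output_base, lang="eng", output_format="txt"):
    """Run tesseract if it is installed; return the output path or None."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(
        tesseract_command(image_path, output_base, lang, output_format),
        check=True,
    )
    return f"{output_base}.{output_format}"


if __name__ == "__main__":
    print(tesseract_command("screenshot.png", "screenshot"))
    print(tesseract_command("screenshot.png", "screenshot", output_format="pdf"))
```

Passing "pdf" instead of "txt" asks for the searchable PDF mentioned earlier.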
27:38 >> I really like this. If you've got a bunch of image data and you want to be able to do interesting things with it, here's a really quick and easy way to do it.
27:48 >> Right. It's super simple.
27:49 This article also, I didn't know that you could use launchd on the Mac, I think.
27:55 You can add a launch agent which automatically runs a script when a file is saved in a certain folder.
28:01 In this case, he's got a launch script that runs the Tesseract OCR stuff, but this is great.
28:07 Now I can automate any folder on my Mac to do basically anything using the system that's built into the operating system that I didn't know how to use.
28:14 >> I didn't know you could do that either.
28:15 That's great. That's cool.
28:17 Yeah, that's awesome. I feel like this is right up your alley, Simon, you know, with Datasette, Dogsheep, and like, oh, here's this data we got from this automation, and yet I just can't dig into it. And now you can. >> I'm really excited about this. Although, in the next version of macOS, Apple Photos is going to do OCR on all of your photographs for you.
28:38 So you can search for text in pictures that you've taken. And if it's anything like the current version of macOS Photos, all of that data is going to be stored in SQLite databases on your computer.
28:48 I've been having a huge amount of fun building things against my Apple Photos library because they already run machine learning labeling against your photos.
28:58 They know when you take a photo of a dog and they tag it with dog and the word dog is in a SQLite database on your computer.
29:04 Once you've figured that out, you can run SQL queries against photos you've taken and say, show me every photo I've taken of a dog that was in San Francisco in the month of May and you get results back, which is crazy interesting.
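To make the "dog photos in San Francisco in May" query concrete, here is a small sketch using Python's built-in `sqlite3`. The `photos` and `labels` tables are an invented, simplified schema with made-up rows. The real Apple Photos database is laid out differently, so treat this only as an illustration of the kind of query being described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per photo, one row per machine-learning label.
    CREATE TABLE photos (id INTEGER PRIMARY KEY, taken_at TEXT, city TEXT);
    CREATE TABLE labels (photo_id INTEGER REFERENCES photos(id), label TEXT);
    INSERT INTO photos VALUES
        (1, '2021-05-03', 'San Francisco'),
        (2, '2021-05-19', 'Oakland'),
        (3, '2021-06-01', 'San Francisco'),
        (4, '2021-05-22', 'San Francisco');
    INSERT INTO labels VALUES (1, 'dog'), (2, 'dog'), (3, 'dog'), (4, 'cat');
    """
)


def photos_with_label(conn, label, city, month):
    """Return ids of photos with the given label, city, and 'YYYY-MM' month."""
    rows = conn.execute(
        """
        SELECT photos.id
        FROM photos
        JOIN labels ON labels.photo_id = photos.id
        WHERE labels.label = ?
          AND photos.city = ?
          AND strftime('%Y-%m', photos.taken_at) = ?
        ORDER BY photos.id
        """,
        (label, city, month),
    ).fetchall()
    return [row[0] for row in rows]


dog_photos = photos_with_label(conn, "dog", "San Francisco", "2021-05")
print(dog_photos)  # only photo 1 matches all three conditions
```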
29:20 >> Yeah.
29:20 >> That's pretty cool.
29:21 >> Yeah, that's super cool.
29:23 I love the stuff that you're doing with that.
29:25 >> Is it just local or are they caching that in their own databases as well?
29:30 >> Well, they synchronize it all.
29:32 If you're using iCloud, your photos are synchronized up to their servers.
29:36 You take a photo on your phone, it shows up on your computer automatically, but all of it, the actual local data storage is all SQLite database files.
29:43 Apple are really big into SQLite.
29:46 Yeah, there are just these files littering your computer with your address book in there and all of your iMessages and all of your photo metadata.
29:53 It's just sat there waiting for you to dig in and play with it.
29:56 >> Nice. With Datasette, probably.
29:59 >> Right. Yeah. I've got a script, I'll add it to the show notes, called dogsheep-photos, which uploads your photos to your own S3 bucket so that you can actually link to them, embed them on web pages, and it extracts all of that SQLite data into a more usable format.
30:17 Yeah, I've got an online database of all of my photographs that I update every now and then with the script, and it works.
30:24 It's phenomenal what you can do with it.
30:26 >> Cool.
30:26 >> Out in the live stream, Brandon.
30:28 Hey, Brandon. Says, "This is fantastic.
30:30 Definitely excited." Also taking a step back to yours, Brian.
30:34 David Colton, hey, David, says, "I'm using double quotes now in Black, but my typing has not evolved yet to double quotes." So you just pass it through the single-quote-to-double-quote compiler process called Black, and then you've got it all adapted. That's nice.
30:48 I'd say Black has given me back, I estimate, 5% of my programming time. I used to spend it worrying about indentation and such like, and I got all of that back. Thanks to Black, I never even think about how I indent or style my code at all.
31:04 I just say, I'll literally write horrible run on lines that go on for ages, and then run black and it formats it nicely and I forget about it.
31:13 It's fantastic.
31:15 >> That's cool.
31:15 >> Yeah, great.
31:16 >> Got any extras for us, Michael?
31:18 >> You know I do. I always do.
31:20 Unless I have an extra, extra, extra, hear all about it, then I guess I still do.
31:25 We talked about strongtyping last time, which lets you do cool stuff like go and put a decorator onto a function and say, well, if this one has type annotations, the regular type information Python itself supports, then if you put the @match_typing decorator on there, it'll verify at runtime that you said it took an integer and you actually passed an integer, not a list or whatever, to that parameter.
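The general technique of checking annotations at call time can be sketched with the standard library alone. The `check_types` decorator below is a made-up stand-in written for this transcript, not the project's own decorator or implementation. It only handles plain classes such as `int` or `str`, and it does not handle `*args` or `**kwargs`.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Raise TypeError when an argument does not match its annotation.

    Only plain classes (int, str, list, ...) are checked. Generic hints
    such as list[int] are skipped to keep the sketch short.
    """
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # type(expected) is type filters out generic aliases like list[int].
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def double(count: int) -> int:
    return count * 2
```

With this in place, `double(4)` runs normally, while `double([1, 2])` raises a `TypeError` before the function body executes.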
31:51 Right.
31:51 Well, Felix, who maintains this project, reached out and said, "Hey, that actually does a whole lot more. You should check some other things out."
31:58 I just wanted to highlight a couple of things that he pointed out.
32:00 One, we're all familiar with the named tuple: you say the type name in quotes, and then you say the fields, either as a list or as a space- or comma-separated string, like spell, mana, fact, and so on.
32:15 So this one has a typed_namedtuple where you can put the type information in, very similar to how Python would have it, like colon str, colon list, and so on.
32:25 And then you get actual runtime type validation that the data going into your named tuple is actually the type of data you expect.
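A validated named tuple can be approximated with `collections.namedtuple` from the standard library. The `checked_namedtuple` helper and the `Spell` fields below are hypothetical. The library's own API takes its type information in a different form, so this only shows the idea of rejecting wrongly typed fields at construction time.

```python
from collections import namedtuple


def checked_namedtuple(typename, fields):
    """Create a namedtuple class that validates field types on construction.

    `fields` maps each field name to the plain class its value must be.
    """
    base = namedtuple(typename, list(fields))

    def __new__(cls, *args, **kwargs):
        instance = base.__new__(cls, *args, **kwargs)
        for name, expected in fields.items():
            value = getattr(instance, name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{typename}.{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return instance

    # Subclass the plain namedtuple so the check runs on every construction.
    return type(typename, (base,), {"__slots__": (), "__new__": __new__})


Spell = checked_namedtuple("Spell", {"name": str, "mana": int})
fireball = Spell("fireball", 12)
print(fireball.mana)
```

Constructing `Spell("fireball", "lots")` would raise a `TypeError` because `mana` is declared as `int`.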
32:33 Oh, nice.
32:34 Does that mean?
32:35 Yeah.
32:36 Yeah.
32:36 So there's that.
32:37 And then also I love this about our show.
32:40 It kind of blows my mind that this is how the world works.
32:43 And I really appreciate this.
32:44 Everyone who plays along, we'll say things like, oh, I wish we could specify indexes in Beanie.
32:50 and then the next episode we're like, "Hey, look, Roman added a way to do indexes in Beanie." I said, "This is awesome that it applies to functions, but why couldn't it apply to classes?
33:01 It's basically the same thing." So now, six days ago, we have a new feature.
33:06 You could also apply strong typing to classes as well, or something like that. So well done.
33:14 >> Is it because you asked for it?
33:15 Because I asked for single quotes in black and I didn't get that.
33:20 Well, I mean, it also may depend on the size of the project.
33:25 The more input they get, the less influence any individual statement may have on it.
33:31 Right.
33:32 Yeah.
33:33 Anyway, I feel like thanks for working on that and the extra information there.
33:35 Yeah.
33:36 I actually, one other thing.
33:37 Yes.
33:38 I have finally, I've been working to make sure that we don't have to have one of these completely useless dreadful talks on technology.
33:48 Our site uses cookies.
33:50 our cookie policy. Do you accept our cookie policy or do you not accept our cookie policy?
33:54 AKA, would you like our website to work or would you like to go away? That's kind of what the button so often means, right? And so I thought I'd removed all the analytics. I'd removed anything else that we might be doing third party. We're good. And I went to Python Bytes and I'm like, wait, there's DoubleClick, there's Facebook, there's Google. What is all this stuff? We had started including the live stream YouTube embed, and it started bringing that back, and I'm like, why would Google be putting in Facebook?
34:22 That sucks.
34:23 And there was also the Disqus conversation stuff that people have really stopped using.
34:28 They all just go and chat on the YouTube streams now.
34:31 If they want to have a live comment type of thing.
34:33 So I'm like, well, I'll just take that out.
34:34 That got rid of the Facebook one.
34:37 And then, but what do you do about that?
34:39 So instead of embedding the YouTube player, I said, I'm going to figure out a way to get the picture automatically from YouTube, the poster.
34:48 And then when you hover over it, it just has a play icon.
34:50 It says play on YouTube and it opens up a new window.
34:52 And I thought I was all clever by just putting the image there, but serving it from Google.
34:57 No, there's now like the YouTube image servers putting tracking cookies on our site.
35:02 I'm like, well, come on, why is this so hard?
35:04 So now on the server, we use requests.
35:08 We download the image.
35:09 Anytime it has to be shown on a page, put it in MongoDB.
35:12 And then if you pull it, we serve it back out so we can like strip the cookies, the tracking cookies out.
35:17 And now, now when you look at the tracking content, none detected on the site.
35:23 But why, world, does it have to be so hard?
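The download-once-and-serve-it-yourself approach can be sketched roughly as follows. This is not the site's actual code: it uses `urllib` from the standard library instead of requests, an in-memory dict in place of MongoDB, and an assumed thumbnail URL pattern. `ThumbnailCache` and the other names are made up for illustration.

```python
from typing import Callable, Dict
from urllib.request import urlopen


def thumbnail_url(video_id: str) -> str:
    # Assumed URL pattern for a video's poster image.
    return f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes at `url` on the server side."""
    with urlopen(url, timeout=10) as response:
        return response.read()


class ThumbnailCache:
    """Fetch each poster image once, then serve it from our own storage."""

    def __init__(self, fetch: Callable[[str], bytes] = fetch_bytes) -> None:
        self._fetch = fetch
        self._images: Dict[str, bytes] = {}  # stand-in for a database collection

    def get(self, video_id: str) -> bytes:
        if video_id not in self._images:
            self._images[video_id] = self._fetch(thumbnail_url(video_id))
        return self._images[video_id]
```

Because the visitor's browser only ever requests the image from the site itself, no third-party cookies ride along with it.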
35:26 I just wouldn't put it like this.
35:27 Isn't it amazing how it used to be YouTube embeds were the absolute gold standard for embedding video on a webpage.
35:32 Like why would you do anything else?
35:34 And now actually I'm beginning to think, you know what, post the video, the .mp4 or .mov file or whatever, yourself and stick in an HTML5 video embed.
35:42 And that's probably a better experience for your users as well.
35:46 'cause when they click the video on their mobile phone, it'll play full screen and they won't have to hop through to the YouTube app and all of that kind of thing.
35:52 - Yeah, absolutely.
35:53 Yeah, so anyway, just quick shout out, like this is taking several passes, but I think it's finally 100% no tracking.
36:01 I mean, we weren't putting it there before, but like it was seeping in from just like what we might include on the page as content, right?
36:07 So anyway, there you have it, Brian.
36:09 That was my weekend.
36:10 How was yours? - Nice.
36:11 Well, thanks.
36:12 I appreciate you doing all that work for us.
36:15 >> Yeah. David Coles has the wash hands emoji.
36:19 There we go. We're all better.
36:20 >> Yeah. Well, I've got no extras.
36:22 Simon, do you have anything extra you want to share?
36:24 >> I've got one. So Textual is the thing that Will McGugan, who's working on Rich, has been building, which I know you've talked about on the podcast before.
36:33 What I would encourage people to do is pay close attention because I've never seen a piece of open-source software developed this quickly.
36:40 Every day, he's posting a video where he's like, "Here's the new feature." Today he posted a video of it doing a full tree view on a file system, which you could interact with using your mouse in the terminal.
36:53 When you clicked on the file, it would open it in a separate panel with syntax highlighting.
36:59 It's absolutely astonishing.
37:01 It's like turning into one of the better ways of building a GUI application, and it's running in text in the terminal.
37:07 >> We can almost have just a section of the show called, "What's Will Up To?" >> You really could, absolutely.
37:14 He's re-implemented CSS Grid, the CSS Grid mechanism for terminal applications.
37:21 It's brilliant. Yeah, I'm just having such great time watching him do all of this stuff.
37:25 >> He seems to be live streaming it.
37:28 >> I don't think so, but he posts little five-minute videos on Twitter every day of the stuff that he's doing.
37:33 >> But I feel inadequate watching him work this fast.
37:37 But just saying.
37:38 >> It's such a delight though.
37:39 It's like he was born to build this piece of software, and now he's building it and we all get to watch him do it.
37:45 >> Yeah, that's great.
37:46 >> Yeah. Henry Schreiner, hey, out in the live stream says Textual is amazing.
37:50 Indeed, it's quite something.
37:53 >> Yeah. I remember when he was trying to name it, and textual didn't even come up on my radar as something that might be possible, but it's so obvious now, like graphical and textual.
38:04 Yeah, it makes sense. It's cool.
38:06 Hey, how about a joke maybe?
38:08 Oh man, I got some jokes for us.
38:10 Two jokes.
38:11 The one I'm not really sure how to convey it, but I guess I'll do my best.
38:16 I want you to sing.
38:17 No, man, this is you.
38:18 This is you, bro.
38:19 All right.
38:20 So first one here is, I could definitely do this one.
38:24 This one is, from John on Twitter, but pointed out to us by Nick Moore, who was previously on the show not too long ago.
38:30 Thanks, Nick.
38:30 And this one poses, I think also this is perfect for when Simon is on the show.
38:35 It says, what do you get when you select star from goblins, dragons, elves, and comma unicorns?
38:42 A query tale.
38:44 Oh my goodness.
38:45 It's close to a fairy tale, a query tale.
38:48 It's bad.
38:49 Terrible.
38:49 It's bad.
38:50 Oh, wow.
38:53 Well, I wanted to share one that people could actually share with their--
38:55 this isn't in the list, but one that I just read recently, people might be able to share with their kids.
39:02 In the Northwest, we've got Sasquatch, right?
39:05 So, yeah, what do they call Bigfoot in Europe?
39:08 Big meter.
39:09 Oh, quick tip.
39:16 If you're ever near Santa Cruz in California, there is a Bigfoot museum in a log cabin in the woods outside of Santa Cruz called the Bigfoot Discovery Experience.
39:25 And it is not a joke.
39:26 It is very serious.
39:28 And there was a man there who will take you through all of his evidence for Bigfoot.
39:31 And it takes about an hour.
39:33 He's got plaster casts of footprints and a map with pins on it.
39:37 It's fascinating.
39:39 I could not recommend it more.
39:41 >> I wonder if the COVID pandemic has affected the Bigfoot population.
39:47 >> You should. You can call him up and ask him.
39:50 While I was talking to him, he got a phone call to answer questions about Bigfoot.
39:54 >> Oh, yeah. I can call him up.
39:55 >> He will answer your calls.
39:57 >> Yeah.
39:57 >> All right.
39:58 >> Hey, Brian, your joke got a groan all the way from Australia.
40:02 >> Nice.
40:04 >> Or was it mine? I'm not sure.
40:05 It could have been either, honestly.
40:07 >> Yeah.
40:07 >> I'm going to go with the meter one.
40:10 >> They were both pretty bad.
40:11 >> All right. I'll see what I can do with this next one here.
40:16 If you go back to the 90s, I guess that's probably the time.
40:20 There's Pinky and the Brain.
40:22 Apparently, in one of the 10 places I have to write your name, I typed it too quickly and wrote Brain.
40:30 >> Yeah, and Brett Cannon caught it.
40:34 >> He did a take on Pinky and the Brain, and it starts out, "What do you want to do today, Brian?" >> Same thing we do every Wednesday, Michael, help Python take over the world.
40:46 >> It's Michael and the Brain.
40:48 Yes, Michael and the Brain.
40:50 One's into testing, other's into GUIs.
40:52 They're both into making Python seem sane.
40:55 They're Michael, they're Michael, and the Brain, Brain, Brain.
40:58 >> Brain, Brain, Brain.
40:59 >> Yeah. Fantastic.
41:00 >> I love it. Thank you.
41:02 >> We need to have somebody that's got musical talent to actually put this together as something.
41:07 >> Yes. Someone who is not me because it won't come out well.
41:11 >> So with the lyrics in the show notes, I think we should leave them there.
41:16 >> We are accepting submissions.
41:18 >> Yes.
41:18 >> If they pass, we may actually play them on one of the next episodes.
41:23 >> I would love it.
41:24 >> Yeah.
41:24 >> Could be the new theme song, Brian.
41:26 >> Yeah.
41:26 >> The dawning of an era.
41:28 >> I'm getting tired of the old theme song.
41:30 Yeah, exactly. Which is no theme song.
41:32 All right. Well, thanks. Thanks a lot for showing up, Michael. And thanks, Simon.
41:39 Thanks for having me.
41:41 Yep. You bet. Bye, everyone.
41:42 Thanks for listening to Python Bytes. Follow the show on Twitter via @pythonbytes.
41:47 That's Python Bytes as in B-Y-T-E-S. Get the full show notes over at pythonbytes.fm.
41:53 If you have a news item we should cover, just visit pythonbytes.fm and click submit in the nav bar. We're always on the lookout for sharing something cool.
42:00 If you want to join us for the live recording, just visit the website and click live stream to get notified of when our next episode goes live.
42:07 That's usually happening at noon Pacific on Wednesdays over at YouTube.
42:12 On behalf of myself and Brian Okken, this is Michael Kennedy.
42:15 Thank you for listening and sharing this podcast with your friends and colleagues.