Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #217: Use your cloud SSD for fast, cross-process caching

Recorded on Tuesday, Jan 19, 2021.

00:00 - Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 217, recorded, what is it, January 19, 2021.

00:12 I'm Brian Okken.

00:13 - I'm Michael Kennedy.

00:14 - And I'm Ogie Moore.

00:15 - Welcome, thanks for joining us.

00:16 - Thanks for having me. - Thanks for having me.

00:18 - Yeah, thanks for coming.

00:19 - Who's first?

00:20 Michael's first.

00:20 - I'm first.

00:21 You wanna talk about caching?

00:22 I got some cool stuff to talk about with caching.

00:24 So I recently got a recommendation from Ian Maurer, who was talking about genetics and biology over on Talk Python, I think 154, so a while back, but he pointed out this project called Python Disk Cache.

00:38 And it just seems like such a cool project to me.

00:42 So one of the big problems, or not problems, one of the trade-offs or the mix of resources we have to work with when we're running stuff in the cloud, so often has to do with limited RAM, limited memory in that regard, and limited CPU, but we usually have a ton of disk space.

00:57 For example, on my server, I think I'm using like five gigs out of 25 gigs, but I've only got, you know, two or four gigs of RAM, right?

01:06 But one of the things you can do to make your code incredibly fast is to cache stuff that's expensive, right?

01:12 If you're gonna do a complicated series of database queries, maybe just save the result and refresh it every so often or something like that, right?

01:18 Well, this library here is kind of the simplest version of one of these caches.

01:23 Like people often recommend memcached, they talk about Redis.

01:28 You might even store something in your database and then pull it back out.

01:31 And all those things are fine.

01:32 They just have extra complexity.

01:33 Now I have a separate database server to talk to if I didn't have one before.

01:37 I've got a Redis caching server now I gotta share.

01:40 What if you just use that extra hard disk space to make your app faster?

01:44 A lot of these cloud systems, like Linode, for example, they have SSDs for the hard drive.

01:49 So if you store something and then read it back, it's gonna be blazing fast, right?

01:52 So, DiskCache is all about allowing you to, you know, put this thing in the cache, get it from the cache, but it actually stores it in the file system.

01:59 That's pretty cool, right?

02:00 - Yeah.

02:01 - Yeah, so it's super easy to use.

02:03 You can just come up here and say import diskcache, and to get an item, I just index the cache like a dictionary basically, and to put it back, same thing.

02:11 You give it a key and a value.

02:12 It is basically like a dictionary, but it persists across runs.

02:16 It's multi-threaded, multi-process safe, and all those kinds of things.
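
For anyone who wants to see it, here's a minimal sketch of that dictionary-style usage; the path and keys are made up for illustration, so check the DiskCache docs for the full API.

```python
import diskcache

# Everything lands in files plus a SQLite index inside this directory.
cache = diskcache.Cache("/tmp/demo-cache")

cache["expensive-report"] = {"rows": 10_000}    # dict-style set
print(cache["expensive-report"])                # dict-style get
print(cache.get("missing-key", default=None))   # get() with a default instead of KeyError

cache.set("greeting", "hello", expire=60)       # optional expiration, in seconds
cache.close()
```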

02:20 So, incredibly cool.

02:22 It's pure Python.

02:23 It runs in process.

02:24 So there's not like a server to manage.

02:26 It has a hundred percent test coverage, hours of stress testing.

02:29 It's focused on performance.

02:30 And actually, Django has a built-in caching API, and you can plug this into Django.

02:36 So when people say cache with my thing, even third-party apps and stuff, you can automatically start using this, which is pretty awesome.
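
As a rough sketch of that Django hookup, the settings look something like this; the backend path and option names follow the DiskCache documentation, but double-check them against the current docs before copying.

```python
# settings.py (sketch)
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",          # DiskCache's Django-compatible backend
        "LOCATION": "/var/tmp/django-disk-cache",    # any directory the app can write to
        "TIMEOUT": 300,                              # default expiration for keys, in seconds
        # "SHARDS": 8,                               # optional tuning knobs from the docs
        # "DATABASE_TIMEOUT": 0.010,
    }
}
```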

02:42 It has support for eviction.

02:44 So least recently used first, and so on.

02:49 You can tag things and say these can get evicted sooner and whatnot.
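
A quick hedged sketch of the eviction and tagging bits just mentioned; the policy name and the evict-by-tag call are taken from the DiskCache docs, so verify them if you rely on this.

```python
import diskcache

# Pick an eviction policy when creating the cache (default is least-recently-stored).
cache = diskcache.Cache("/tmp/demo-cache", eviction_policy="least-recently-used")

# Tag related items so they can be dropped together, ahead of normal eviction.
cache.set("thumbnail:42", b"...jpeg bytes...", tag="thumbnails")
cache.set("thumbnail:43", b"...jpeg bytes...", tag="thumbnails")
cache.evict("thumbnails")   # remove everything stored under that tag
```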

02:54 So really, really nice, incredibly easy to use.

02:56 I definitely recommend people check it out, because it's very nice.

02:59 It has different kinds of data structures that you can work with, like a fanout cache, a Django cache, a regular cache, and so on.

03:05 So if you wanna work with some code and it's possibly going to run in multiple processes, or it's gonna start and stop and then run again, and you want to not have to recompute everything, check out this cache, I guess.

03:17 - Evictions, aren't evictions like on hold for 2020?

03:21 - Yeah, well, because of COVID, you're gonna need more disk space now, just kidding.

03:25 (laughing)

03:27 - Now this looks cool.

03:27 So one of the things I was confused about is, it's called DiskCache, but what's the difference between that and just, like, a key-value store database?

03:37 - Well, a key-value store database in practice would be no different.

03:41 But you have a separate server.

03:46 Like, there is a server process that runs somewhere, that you have to have a connection string and stuff for, and you talk to it that way.

03:52 This is like, I have a file and I use the same API to talk to it.

03:56 So instead of having another server to manage, another place to run it, you just say like, let me just put it on the SSD and that's probably quite fast.

04:03 - Cool, yeah.

04:04 - And then we got a quick question here.

04:06 Brandon asks, do they talk about any way to scale this out, say multiple servers behind a load balancer?

04:11 I did not see anything.

04:12 I'm pretty sure, as far as I can tell, that it's local, just sort of a per-machine type of thing. It does go across processes, but I haven't seen anything talking about multi-machine.

04:24 I guess you could set up a, like a microservice, but at that point you might as well just have Redis.

04:28 - Yeah, yeah.

04:29 Redis is kind of on my list of things to try here pretty soon too.

04:33 - Yeah, absolutely.

04:35 - Another thing I want to check out is, well, I've been liking TOML lately.

04:41 - Yeah, TOML's great.

04:42 - That was great.

04:43 - I heard that it reached 1.0.

04:45 - Yeah, so it's at 1.0 now.

04:48 And I think that they were kind of headed there anyway.

04:53 So I was looking through the changelog.

04:55 Looks like they had several release candidates.

04:58 And anyway, we'll talk about it a little bit.

05:01 So it's at 1.0 now.

05:04 I mean, a lot of us don't really understand, maybe I'm speaking for myself, don't really get what all the specification means.

05:11 I just use it.

05:12 It just works. It's easy.

05:14 And one of the things I use it for is the pyproject.toml file.

05:18 It's mostly what I use it for.

05:20 But pyproject.toml is taking off, and this is at 1.0, so what does this mean?

05:26 I'm hoping that this means that we have, like, a package built into Python that parses TOML.

05:33 - Yeah, now the language is stable, right?

05:35 - Yeah.

05:36 - Maybe it means I need to learn more about toml.

05:38 - Maybe.

05:40 But I think there's talk about it.

05:41 I'm not sure what the state of it is.

05:43 Maybe we could get Brett or somebody to talk about it.

05:46 But in the meantime, if you want to play with 1.0, with Python, I think there might be limited choices.

05:54 So I went out and looked.

05:55 There's a page on the project page that shows, it's like down at the bottom, it shows the different projects that implement the various versions of TOML.

06:07 And there's one project, or actually a handful of C++ projects, that support 1.0.0, the most recent version of TOML.

06:18 And then various support levels for different other things.

06:23 There's a 1.0.0 release candidate one that's supported by TOMLkit.

06:28 So TOML Kit is a Python project, and I think that might be sufficient to try out most of the new features.

06:36 And then--

06:36 >>Oh, nice.

06:38 Then there's what I would think of as just the Toml project in Python.

06:42 That one, it supports 0.5.0.

06:46 So I'm not sure what's going on there.

06:48 It'd be great if it would support the latest.

06:51 But then I'm like, what does that mean?

06:52 What's different between 0.5.0 and 1.0?

06:56 And so I went and looked at the changelog.

06:58 There are three things that jump out that look like they're new, real changes.

07:03 One of them is leading zeros in exponent parts of floats are permitted.

07:07 So, okay.

07:09 Then allowing raw character tabs in basic strings and multi-line basic strings, that seems reasonable.

07:16 And then the difficult one might be allowing heterogeneous values in arrays, which is cool.

07:24 And then, yeah, so apparently it wasn't there before.
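
To make that last one concrete, here's a tiny sketch using TOML Kit, which targets 1.0.0-rc1 as mentioned above, so a mixed array should parse; treat it as an illustration rather than a compatibility guarantee.

```python
import tomlkit

doc = tomlkit.parse("""
# Heterogeneous values in a single array -- newly allowed in TOML 1.0
mixed = [1, "two", 3.0]
""")
print(list(doc["mixed"]))   # -> roughly [1, 'two', 3.0]
```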

07:27 - Yeah, but none of those seem like super common stuff that's gonna be a big breaking change.

07:32 Like, oh, well, of course we use heterogeneous types in here.

07:34 Like we're just gonna mix it up and have random stuff in our array, right?

07:37 It seems like the built-in, or the pure Python one, is probably still decent.

07:43 - Right, and I guess there's a whole bunch of these that are listed as 'clarify,' but it is a specification, so a clarification might be very important; I'm just not sure how important that is.

07:55 It probably affects the implementation, but I'm putting this out because I'd like to hear from people that know more than I do about this and how this affects Python and if we should care about it.

08:06 - Yeah, yeah, for sure.

08:07 That's very cool to see it coming along, and it definitely lends some support to the whole pyproject.toml stuff.

08:12 Yeah, hey, before we move on to Augie's first topic, Martin Boris asks, "Just wondering, is this disk cache thing I mentioned a simple way to share data between uvicorn and gunicorn workers?" Yes, exactly.

08:24 That's exactly why it matters, because it goes across processes in general, and that includes multiple worker processes.

08:34 'Cause normally you would cache in process memory, so you've gotta do it like 10 times, you've got it all fanned out across different processes running, so this will solve that for sure.
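
A hedged sketch of what that cross-worker sharing can look like; FanoutCache and its memoize decorator are documented in DiskCache, but the directory and function here are made up for illustration.

```python
import diskcache

# Every gunicorn/uvicorn worker opens the same directory, and DiskCache
# coordinates access through SQLite, so they all share one cache.
cache = diskcache.FanoutCache("/tmp/myapp-cache", shards=4)

@cache.memoize(expire=300)   # cache results for five minutes, across workers
def expensive_report(day: str) -> dict:
    # ...heavy database queries or API calls would go here...
    return {"day": day, "total": 42}
```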

08:42 And then one for you, Brian, from Magnus Carlsen.

08:45 - Yeah, does PEP 621--

08:48 - The TOML spec, whatever the PEP is for that.

08:50 - Specify the version of TOML to use?

08:52 I don't know, I'll have to ask Brett about that too.

08:55 - Yeah, I don't know either, sorry.

08:56 All right, Augie, what you got?

08:57 - Well, I'm here, well, thank you for inviting me again.

09:01 This is actually two consecutive weeks of you hosting mechanical engineers as guests on the podcast.

09:07 - Why not?

09:08 - So thanks for being inclusive.

09:12 But I wanted to talk about PyQt Graph, which is not new, but it's--

09:17 - Yeah, people maybe don't know though, so tell them about it.

09:19 - Yeah, absolutely.

09:20 So PyQtGraph is a plotting library, but it's a little different from the likes of Matplotlib and its variants or derivatives, or Bokeh.

09:32 PyQtGraph uses the Qt framework and it's meant for embedding interactive plots within GUI applications.

09:42 And as a consequence of using Qt, you can actually get some really high performance out of it. Matplotlib is absolutely phenomenal for generating plots for publications or, you know, for static media on websites.

09:58 But the moment you try and do anything like with mouse interactions, you might be in for a bit of a tough time.

10:03 - But with this you're running on native, with Qt you're running natively on the OS, right?

10:10 - Absolutely, yeah, there's no client-server relationship like you would get with Bokeh, which you might need in certain situations.

10:18 Anyway, part of the PyQtGraph library, which I guess I should mention I'm a maintainer of, is that we actually bundle an example application.

10:32 So if you're ever curious about the library and its capabilities, you know, and don't feel like reading through dozens of pages of documentation, you can just run this example app, which I have on the screen share, and it shows you the list of various--

10:43 - And this comes with PyQtGraph, right?

10:46 - Yes, yeah, it's bundled in the library.

10:48 So if you pip install PyQtGraph, you get this.
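
For listeners who want to try it, the bundled demo can be launched from Python like this; it's the same thing as running `python -m pyqtgraph.examples` from a shell.

```python
import pyqtgraph.examples

pyqtgraph.examples.run()   # opens the example browser window
```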

10:50 And here's some of the basic, you know, plots, but, yeah, and as you can see, you get our mouse interactivity going, and we can do zoom behavior.

11:01 But what's really cool about this library is that the example here, basic plotting, is generated with this code right here.

11:09 All those plots were in, I can't tell how many lines, maybe 70 lines total.

11:14 >> Yeah.

11:14 >> But anyway, within this editor here, you can change any of the code and experiment with it yourself.

11:20 Here on the tab, you see all these different items.

11:23 It does 2D, we have some 3D capability, which you need the PyOpenGL library for.

11:28 This one is just maybe a dozen lines of code, but you have a couple of plots here, and then just with the mouse interactivity, we can sub-select, or here you can get crosshairs and get information about the data points underneath the mouse.

11:42 For an analysis tool, it can be incredibly powerful.

11:48 If you're generating tools for any engineering or scientific analysis where you want the user to be able to interact with the data in some way, zoom in, zoom out, things like that, PyQtGraph might be a really good option for you.
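
As a rough idea of scale, a bare-bones interactive PyQtGraph plot is only a handful of lines; this sketch assumes a Qt binding such as PyQt5 is installed, and the exact `exec` call may differ by version.

```python
import numpy as np
import pyqtgraph as pg

app = pg.mkQApp()    # create (or reuse) the QApplication
win = pg.plot(np.random.normal(size=300).cumsum(), title="Random walk")
win.setLabel("bottom", "sample")
win.setLabel("left", "value")
app.exec_()          # start the Qt event loop; pan/zoom with the mouse
```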

12:03 - Yeah, absolutely.

12:04 Can you run the basic plotting thing one real quick?

12:07 - Oh yeah, of course.

12:08 - So when I was looking at this, the thing that stood out to me was, while the graphs are beautiful and they look good, the first couple of layers, like, I could probably do that in Bokeh or Plotly, Matplotlib, something like that, right?

12:22 but the nice interaction between multiple graphs, as you zoom in one, the other zooms in, or that super high frequency yellow one, for people listening, it's refreshing, you know, many, many times a second, right?

12:34 Getting high frame rates out of those like Jupyter notebooks sounds tricky.

12:39 - Yeah, and I'm actually really glad you brought up high frame rates.

12:42 I'm actually on the verge of merging a pull request to integrate CuPy support, which is CUDA NumPy-like arrays, for some of the image data, and on some of our benchmarks we're showing being able to go from maybe 20 frames per second for images up to over 150 frames per second, at which point monitors can't keep up, but you lessen the CPU load substantially.

13:08 - Yeah, that's fantastic.

13:09 We got a comment question from Anthony Shaw.

13:13 - I use the built-in grapher app in macOS.

13:20 I do not know what the built-in grapher app is.

13:23 So I am afraid I don't know how to answer that.

13:27 - You don't know if it can replace or not.

13:29 I don't know either, but yeah.

13:30 - But PyQt Graph has a couple dependencies.

13:35 You need some Qt bindings, and right now we support Qt 5, 5.12 and newer.

13:41 Up until very recently, PyQtGraph supported virtually any Qt bindings you could install, even going back a decade, but eventually I had to put an ax to that.

13:50 That was just too much work.

13:52 And so we support Qt 5.12 or newer.

13:58 We don't support Qt 6 yet, although there is a pull request in to add support for PySide 6, which was discussed on this show just two weeks ago.

14:07 - It just came out, right?

14:08 - Right, it just came out, and I'm really thankful for contributors, you know, that are submitting these pull requests.

14:15 I often feel bad that I can't keep up with the rate that they're coming in, but it's still appreciated.

14:21 - Are you looking for contributors to the project?

14:24 - Yeah, absolutely.

14:26 Not just contributors to the code, but also people that are willing to look over pull requests or willing to test out pull requests mainly.

14:35 With the plotting library, sometimes testing can be really difficult 'cause like visual artifacts, like how do I test for that, right?

14:45 And so sometimes a big chunk of our testing is, well, does this break or does this look right?

14:51 And having somebody else verify that kind of stuff is a really big help.

15:00 So if you're interested in this, feel free to reach out to me directly or take a look at our issue tracker or pull request tracker.

15:12 Yeah, and I guess the last thing I should say is it's primarily used in scientific and engineering applications.

15:18 Periodically I go through the git log and look at the email addresses that people are contributing from, and it's NASA Ames Research Center and a bunch of places like that.

15:29 But I get a kick out of that.

15:33 - Yeah, that's super, super cool.

15:34 Nice, thanks for sharing that and good work on it.

15:36 - Well, another cool thing is Linode and they're sponsoring this episode.

15:40 Thank you, Linode.

15:41 Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy, and scale your modern applications faster and easier. Whether you're developing a personal project or managing larger workloads, you deserve simple, affordable, and accessible cloud computing solutions.

15:58 As listeners of Python Bytes, you get a $100 free credit. You can find all about those details at pythonbytes.fm/linode. Linode also has data centers around the world with the same simple and consistent pricing regardless of location.

16:14 Choose the data center nearest to your users.

16:16 You also receive 24/7, 365-day human support with no tiers or handoffs, regardless of your plan size.

16:25 You can choose shared and dedicated compute instances, or you can use your $100 credit on S3-compatible object storage, managed Kubernetes, and more.

16:34 If it runs on Linux, it runs on Linode.

16:38 Visit pythonbytes.fm/linode and click on the create free account button to get started.

16:45 - Awesome, thanks for supporting the show, Linode.

16:47 Ah, okay, Brian, I wanna cover something that comes to us from two listeners.

16:51 This comes from Jim Kring, who pointed out some really interesting aspects of how Python is being used in this whole Parler social media kerfuffle, and a great article by my good friend and fellow Portlander, Mark Little.

17:07 So let's go over the article first.

17:10 So you guys heard there was basically an attempt to overthrow the US government, did you guys hear that?

17:14 That was lovely, god, what idiots.

17:17 So a lot of the people who were there got kicked off of official social media, and they went to this site called Parler.

17:23 So Parler, according to Wikipedia, is an American alt-tech micro-blogging and social media networking service, and it has a significant user base of Donald Trump supporters, conservatives, conspiracy theorists, and right-wing extremists.

17:37 Not my words, that's Wikipedia.

17:38 So a lot of the people who stormed the Capitol, tried to get into Congress, and tried to stop the counting of the votes decided to live-blog it on their personal accounts.

17:49 But a lot of them were no longer on Twitter and whatnot, although some were, so they were on Parler.

17:56 And they probably came to realize, you know, it's probably not a good idea to have video showing me charging into the Capitol as, like, hundreds of people are being arrested and charged with federal crimes, right?

18:06 At the same time, Parler was getting kicked off of Apple's App Store for the iOS.

18:12 They were getting kicked off of the Google Play Store.

18:14 They were getting banned in a lot of places.

18:16 So there was this, hacker's not the right word, the sort of data-savior person, I guess you could say, who came along and realized it would be great if we could download all of that content and save it and hand it over to journalists at, say, like ProPublica, hand it over to the FBI and so on.

18:32 It turns out it wasn't very hard to do.

18:34 There were a couple of things, if you look through the Ars Technica article, about how the code behind Parler was a coding mess.

18:43 And I've tried to figure out what technology was used to implement it, and I just couldn't find that anywhere.

18:47 Anyway, it says the reason this woman was so successful at grabbing all this data, and she got like a million videos and a whole bunch of pictures, is there's a whole host of mistakes.

18:59 So the public API for it used no authentication.

19:03 Let me rephrase that, restate that.

19:05 The public API is zero authentication, no rate limiting, nothing.

19:09 Just, yeah, sure, we'll just, go ahead.

19:10 - There you go.

19:11 (laughing)

19:12 - You have it all.

19:13 Secondly, when a user deleted their post, the site didn't remove it, it just flagged it as deleted so it wouldn't show up in the feed, which in and of itself is not necessarily bad, but you pair that with every post having an auto-incrementing ID, which meant you could just enumerate.

19:28 You're like, oh, I'm on post 500.

19:30 Well, let's see what 501 is.

19:31 It doesn't matter if it's deleted, give me that.

19:34 That's crazy, right?

19:35 So she wrote a script in Python to go download it.

19:40 And you can actually see like, here's all the videos and all the stuff and their IDs and whatnot.

19:45 And in here, this is the one that Jim sent over.

19:48 If you look, there's a gist here that shows you how to download a video from Parler.

19:53 Let's go down and find, is it here?

19:55 No, maybe it's not there.

19:56 I think it might be back.

19:57 There's a part where it shows the, how do you download it with Python and so on.

20:02 So you just go through and, like, you know, screen-scrape it, traditional Python right there.

20:06 So apparently Python was used to free and capture all of this.

20:10 Oh, another thing that they did in Parler that made it easy to get: when you upload videos and images to places like Twitter, they'll auto-strip the EXIF data, like the geolocation and whatnot, from the images.

20:23 Parler didn't strip any of that, they just posted it as-is, right?

20:24 So like geolocation, camera name, all that kind of stuff is all in there.
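
For the curious, reading that kind of metadata back out takes only a few lines with Pillow; the file name here is hypothetical, and `get_ifd` assumes a reasonably recent Pillow release.

```python
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

img = Image.open("some_uploaded_photo.jpg")   # hypothetical file
exif = img.getexif()

# Main EXIF directory: camera make, model, timestamps, and so on.
print({TAGS.get(tag, tag): value for tag, value in exif.items()})

# GPS coordinates live in a nested IFD (tag 0x8825), unless the upload
# service stripped them the way Twitter and friends do.
gps = exif.get_ifd(0x8825)
print({GPSTAGS.get(tag, tag): value for tag, value in gps.items()})
```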

20:30 So there's just a bunch of badness.

20:33 They've since been kicked off of AWS, because, you know, crimes.

20:37 And now they're apparently trying to get hosted in a server in Russia, is that right, Augie?

20:43 - Yeah, there was a, actually I think there's an article on Ars Technica that went up this morning that they're somewhat partially online on some Russian infrastructure, which--

20:53 - Yeah, they're only partially online, because I looked and it, they're like, it says something like, well, we're trying to come back.

20:59 Here's a couple of posts.

21:02 It's not, yeah, it's not all the way back, right?

21:04 They're experiencing technical difficulties, as in the world hates them and is trying to make them go away.

21:10 So I'm not here to try to make this a political statement or anything like that.

21:14 That's not why I covered the story.

21:15 I covered it 'cause I thought it's very interesting, both the security side and how people were able to leverage Python to sort of grab this stuff before it's gone.

21:23 Some of the journalists were asking, like, is there a more accessible way to get the data?

21:27 And the woman who got it is like, yes, we're going to build some better way for you to get it.

21:32 But right now, it's like I had to run into the burning building and grab the files before they were gone.

21:36 >> Yeah. The other thing I want to point out about the story is it's not like Parler was lacking funding to develop these tools.

21:43 They had, from what I understand, they had significant financial backing.

21:47 >> Yeah.

21:48 >> Whether they did not have the technical expertise, the time, I don't know.

21:52 But I'm really curious as more fallout comes from this.

21:57 There's going to be some good stories from a technical standpoint on here.

22:02 Absolutely.

22:03 Well, pretty, pretty insane.

22:04 All right, Brian, let's move on to something more devy, developer, web devy.

22:09 Well, you know, maybe if you want to scrape the web or something else.

22:13 Absolutely.

22:14 Yeah, we've got a suggestion from Douglas Nichols.

22:18 Thanks, Douglas.

22:20 Best of the web development with Python.

22:22 So we've seen...

22:23 I would not put Parler in that list.

22:26 So we've seen the best of lists like this before.

22:30 I'm kind of a fan of them.

22:32 But one of the things I liked about this is the icons are nice.

22:37 So there's a whole bunch of different icons that are used to help.

22:40 You can see the likes or the lows and stuff of different projects.

22:45 And then there's icons for--

22:46 you can search for Flask projects or things like that.

22:49 That's nice.

22:50 But it's a pretty big comprehensive list.

22:53 We've got web frameworks, HTTP client servers, authorization tools, URL utilities, open API, GraphQL, which is nice to see.

23:03 There's even web testing and markdown listed, how to access third-party APIs.

23:09 But then near the end, I really liked seeing there's a bunch of utilities sections.

23:14 So there's a Flask utilities and FastAPI and Pyramid and Django utilities, which are really neat.

23:20 And what I really was pleased to see was that even though FastAPI is, what, a couple years old now?

23:26 There's a whole bunch of FastAPI projects that are there to make FastAPI easier, like using SQLAlchemy or coming up with the contributions thing or--

23:39 Yeah, fantastic.

23:40 Different React, how to use React with it, things like that.

23:43 So yeah, nice if you're trying to check out--

23:46 want to look at different tools that are available for web development with Python.

23:51 This might be a good place to peruse.

23:53 I feel like that's one of the big challenges in general, you know, with people coming into Python or getting into a new framework, it's like there's 500 libraries to do a thing.

24:02 Yes.

24:03 Which one should I use?

24:04 Not, can I find a library, but there's too many, right?

24:06 Yeah.

24:07 Yeah.

24:08 So do you have a suggestion for that?

24:09 Well, I think these awesome lists are super good, right?

24:11 Because they're somewhat vetted and whatnot.

24:11 I recommend, so, for instance, well, it's harder now, but if I was building something new with a web development or web interface piece, and I didn't know which framework to pick, which is one of the starter decisions.

24:28 It's the people I have around me as resources.

24:31 I know that you know about Pyramid, but you're also fairly knowledgeable about FastAPI.

24:40 I know some people that are Django friendly and know quite a bit about Django.

24:45 If you've got a couple of friends that already know one of these big hitters, I would go with that so that you can ask them questions.

24:53 - Well, maybe you don't even pick the same thing, but you could ask, like, you chose this one.

24:58 Tell me, you looked at a lot of the other ones.

24:59 Why did you pick that?

25:01 Yeah.

25:01 - Oh, yeah, yeah, that's a good idea.

25:03 - Yeah, for sure.

25:04 Like maybe FastAPI makes sense for me, it doesn't make sense for you, but you can then see why it made sense for me and not for you or whatever.

25:09 - Yeah.

25:10 - Yeah, absolutely.

25:11 All right.

25:12 - All right, am I up now?

25:14 - Yeah, you're up.

25:14 - So, Mr. Shaw being in the audience here is a bit of a surprise, but one of the things I wanted to talk about is, I'm going to butcher this.

25:24 I apologize.

25:25 Pyjion.

25:27 I think it's pronounced 'pigeon.'

25:28 Oh my goodness.

25:30 Not.

25:31 Okay.

25:31 Yes.

25:32 What a wonderful name.

25:33 And I've been fascinated by this.

25:38 And so, what Pyjion is, well, this feels so awkward to talk about somebody else's project when they're in the audience here.

25:47 - It's a JIT extension of CPython that compiles Python code using the .NET 5 CLR.

25:54 And what's been fascinating to me about this is this is like a whole area of software that I have absolutely no experience with.

26:04 Like I know nothing about it, but I've been following what Anthony's been talking about on Twitter, and he's been explaining what he's doing along the way in these Twitter-sized increments that I feel like I'm able to follow along with, and I've found this project absolutely fascinating. I'm seeing the rates of improvement over time and I've just been absolutely blown away.

26:32 And so I think this has been absolutely amazing, and I'm really curious. One of the benchmarks that Anthony's been using is his own Python implementation of the n-body problem, which is sort of funny that it's come up, 'cause I've been wanting to do an n-body plotting example in PyQtGraph, and of course this has been on my to-do list for some time, so now I'm curious if I should even attempt it, or if it's even remotely possible, to try and integrate those functionalities together.
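
For anyone curious what trying Pyjion looks like, here's a hedged sketch based on its documented enable/disable calls; it assumes the .NET 5 runtime and the pyjion package are installed, and the benchmark function is just a stand-in, not Anthony's actual n-body code.

```python
import timeit
import pyjion

def hot_loop(n: int) -> float:
    total = 0.0
    for i in range(1, n):
        total += 1.0 / (i * i)
    return total

baseline = timeit.timeit(lambda: hot_loop(200_000), number=20)

pyjion.enable()    # JIT-compile subsequent Python frames via the .NET CLR
jitted = timeit.timeit(lambda: hot_loop(200_000), number=20)
pyjion.disable()

print(f"interpreter: {baseline:.3f}s  pyjion: {jitted:.3f}s")
```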

27:08 - Yeah, that's cool.

27:10 And go ahead.

27:11 - No, no, go ahead.

27:12 - Oh, sorry.

27:13 The other thing that I've recently made use of, which is not particularly new, is NumPy's dunder array function (__array_function__) functionality, which is specified in NEP 18.

27:32 And what that allows for is using NumPy functions on things that aren't necessarily NumPy arrays.

27:43 So for example, with CuPy, you can use the NumPy methods that would operate on the array, but use them on a CuPy array.

27:53 And this is not limited to CuPy.

27:55 There's other libraries that offer this functionality too, but this makes it so much easier to integrate various libraries together, with really minimal code impact and near-identical APIs.
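
Here's a toy sketch of the NEP 18 protocol itself; CuPy implements this for real, while the class below just logs the dispatch and falls back to NumPy, so it runs without a GPU.

```python
import numpy as np

class LoggingArray:
    """A duck array that participates in NumPy dispatch via NEP 18."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        print(f"dispatching {func.__name__} through LoggingArray")
        # Unwrap our wrapper and let plain NumPy do the actual work.
        unwrapped = [a.data if isinstance(a, LoggingArray) else a for a in args]
        return func(*unwrapped, **kwargs)

arr = LoggingArray([1.0, 2.0, 3.0])
print(np.mean(arr))   # NumPy hands np.mean off to __array_function__
```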

28:12 And earlier I was talking about the pull request for bringing CuPy support into PyQtGraph, and this functionality, which CuPy implements, has made the integration so much easier.

28:24 - Nice, 'cause you guys are already implemented on top of NumPy, and it's just like, we're just gonna go through this layer basically.

28:28 - Yeah, I mean, there's some other gotchas that you have to handle, right, with handing stuff off to the GPU and stuff like that.

28:34 "Yeah, no, that's..." But the actual size of the diff was not that big, you know, for what you would think.

28:41 - Well, and you think what it means to run on a CPU or run on a GPU.

28:45 Like, that's a very different whole set of computing and assumptions and environments and, right, and so on.

28:53 And to make that a very small merge is crazy.

28:56 - Right, yeah, no, it's fantastic.

29:00 Yeah, as I said, it's nothing new.

29:02 This functionality has existed, and has been enabled by default in NumPy, since version 1.17, which I believe is almost coming up on two years old now.

29:12 But this is the first time I've made use of this functionality or I've been impacted by this functionality directly, and I'm so appreciative of it.

29:19 - Yeah, fantastic.

29:21 And that's super cool.

29:22 I've not really found a reason for me to work with CuPy or anything like that, but I'm just really excited about the possibilities for the people for whom it does matter, you know?

29:31 How about you, Brian? - Yeah, absolutely.

29:32 Yeah, I actually, I always, every time I hear about it, I write a note down and say, "Oh, I got to check this out. Looks neat." Absolutely.

29:40 Well, there we go. There's our six items. Do you have anything extra for us, Michael?

29:45 This almost could be an extra, extra, extra, hear all about it. So I'm just going to throw a few things out really quick. One, I got my new M1 not long ago and actually had to send in my old laptop. Its battery was dying, its motherboard was dying, all sorts of things.

30:00 So I had to put it in a box and send it away.

30:02 I'm like, I don't really want to put my data in here.

30:05 So I just formatted that as well.

30:06 So now I have two brand new computers.

30:08 I'm trying to think, like, all right, I've been getting bugged by how much spying, monitoring, observation all these different companies are doing.

30:15 So I've started running just Firefox, but also, you know, a lot of times things, like for example StreamYard, I can't use a green screen on Firefox.

30:22 I have to use Chrome, it says.

30:24 I'm like, I don't really want to use Chrome, but I want a green screen, so here I am.

30:28 So I've started using Brave.

30:29 Whenever something says I have to have Chrome, I started using Brave, which is a more privacy-focused browser.

30:34 So I thought that was interesting.

30:35 And just turning on a VPN like all the time, just to limit people observing, not that I really need to keep anything super secret.

30:43 Two conferences are coming out with calls for proposals that are due quite soon.

30:48 So the Python Web Conf has its call for proposals out.

30:53 The conference is actually March 24th.

30:57 That order is not quite right.

30:59 Is it 22nd to 26th?

31:01 If you look at their site, the days that it's on are sort of not in order anyway.

31:07 End of March, there's a cool online conference.

31:09 They did this last year, Six Feet Up did, and they're doing it again this year.

31:13 I'm actually speaking here.

31:15 Brian, are you speaking there?

31:16 The Web Conf?

31:17 Yeah.

31:17 No.

31:18 Well, there's a call for papers, so you could be on YouTube.

31:20 Yeah.

31:22 And I think they expanded it out to be like five days or something.

31:24 So there'll be a lot of content, which is very cool.

31:26 So I'll be giving a talk on Python memory deep dive there, I believe.

31:30 And then the big one, PyCon.

31:32 PyCon is virtual again this year, but the call for proposals has gone out, and they're due February 12th.

31:38 So if you want to be part of PyCon, you know, get out there and send something in.

31:42 Are you going to submit something?

31:44 I will probably do it.

31:45 Yeah.

31:45 It means I got more work to do, but yeah, I think I'll do it.

31:49 You got any plans?

31:51 I'll probably submit something, maybe three, four, five, six, seven, eight, nine, - Yeah.

31:55 - 10 proposals.

31:56 (laughing)

31:57 - The more you submit, the better chances you got.

31:59 Augie, you gonna submit to either?

32:01 - There's talk amongst us PyQtGraph maintainers about doing a tutorial session at SciPy.

32:08 So I might, no, that's not listed here, but we're considering doing that, which SciPy is also virtual this year.

32:14 - Oh yeah, that'd make a lot of sense.

32:15 Yeah, that's cool, awesome.

32:16 Then the final 'hear, hear, hear all about it' extra is Apple's launching a racial equity and justice initiative, which I think is pretty cool.

32:25 Basically, they're setting up centers to teach programming and other entrepreneurship skills in underserved communities, right?

32:33 And I know there's, again, a lot of political stuff around all this, but to me, I would just love to be in a world where I look around the community and it looks representative of everybody, right?

32:42 People feel included.

32:43 Like, tech is such a wonderful space.

32:44 I think this is a cool initiative.

32:47 Obviously, it could be, hopefully they deliver it in the right way.

32:50 It's not just like, "We're gonna teach everyone how to build iPhone apps," like that's all the world is, right?

32:54 You know, it's a more broad sort of conversation.

32:56 It could go any which way.

32:58 And hopefully it's just a start.

32:59 Like if you look, they're saying they're donating a hundred million dollars to this cause, which is a lot of money, but it's also only eight hours of profit to Apple.

33:07 So yeah, it's got room to grow, I suppose.

33:09 Anyway, I just want to give a shout out to that as well.

33:12 That seemed pretty cool.

33:13 Hi, Brian, how about you?

33:14 More conference stuff?

33:15 - Well, PyCascades is, actually I don't remember when it is, but-

33:20 - February possibly.

33:20 - February, probably.

33:22 - Yep, February 20th, it starts.

33:24 And the schedule's up.

33:27 So I wanted to announce the schedule's there so you can check it out.

33:30 There's still tickets available and you can see what's gonna happen.

33:34 I really had fun at the in-person PyCascades and I think they did a good job for the online one in 2020.

33:42 So, and we're gonna be there.

33:44 - Yeah, we are.

33:44 We're on a panel.

33:45 - Yeah.

33:46 - Along with Ali Spittel.

33:47 - Yeah, should be fun.

33:48 But there's--

33:49 - It should definitely be fun.

33:49 - About podcasting, but there's like another panel about writing technical books that looks good.

33:55 There's a bunch of cool talks that I'm looking forward to seeing.

33:58 - Yeah, me too.

33:59 It looks great.

33:59 I love all these online conferences that it's pretty accessible to everybody.

34:03 Last year, if we would announce this, people would be like, "Oh, well, I'm not in Portland, so it doesn't matter to me." - Yeah. - Right, but.

34:08 Augie, I know you got some stuff to shout out real quick, but also a quick question, a follow-up from Anthony.

34:15 "Why is there AVX extensions "for native matrix multiplication on supported CPUs?

34:20 It'd be interesting if that extension supported the same for non NumPy arrays.

34:24 Thoughts, ideas?

34:25 Yeah, I...

34:26 The...

34:27 Yes, I'm sure you can use those extensions on...

34:30 I mean, NumPy doesn't have a monopoly on AVX extensions.

34:34 You know, it just needs...

34:35 Whatever library you use, I think it just needs to...

34:37 Or it would need to be compiled with the Intel MKL BLAS extension, which is...

34:45 Goes into build systems, which is way over my head.

34:48 And, yeah, I used to live in the C++ world and whatnot, but I'm far from that world that you and Anthony are inhabiting these days.

34:57 Right.

35:02 So, yeah, in short, I'm not sure.

35:02 But in terms of the extras, a couple of things I wanted to bring attention to: I've been loving the Anthony Explains video series, and these are made by, I'm gonna mispronounce his last name.

35:13 Anthony Sottile, he's been a guest on, I can't remember if he's been a guest here, but I think he's been a guest on Talk Python To Me.

35:20 He maintains pre-commit, he's a pytest developer and maintains-- - Anthony Sottile.

35:24 - Sottile. - Yeah.

35:26 - And I've been absolutely loving his Anthony Explains playlist series.

35:32 The other resource that I've recently found myself having to make use of is Learn X in Y Minutes.

35:37 Sometimes I have to write something in a tech stack or in a language I have absolutely no familiarity with.

35:45 And so that resource has been absolutely amazing for the five minute overview, right?

35:52 On the real basic operations.

35:53 And then the other one is this book I've been reading, Working in Public.

35:58 And I think Guido plugged it a while ago on his Twitter feed, but it talks about maintaining open source projects and some of the issues that arise. I'm still not done with it, but I think it's both helpful from a maintainer point of view, you know, for sanity checking that your experiences might not be as isolated as they feel.

36:22 And I think it's helpful for new open source contributors to see what things might look like from the maintainer's perspective as well.

36:29 - I love that book. - Yeah, I've heard really good things.

36:31 Yeah, have you read it, Brian?

36:32 - It has an audio book version, so I listened to it.

36:34 And you wouldn't think like a book on open source would be good audio, but it was great.

36:39 - Yeah, fantastic.

36:40 Yeah, awesome.

36:41 All right, well, Brian, should we do a joke?

36:43 - Yes, we should.

36:44 - All right, so I put two jokes into the show notes.

36:47 One of them is a rap song, which I know Brian's especially fond of.

36:51 It's a rap song about working at the help desk.

36:55 So if you're the help desk for your company, or I guess public support as well, this one's for you; it's called "Here to Help." And man, it is so funny.

37:03 It's a video song on YouTube, so it doesn't really make sense to cover it here, but I thought I'd throw it in there as a pre-recommendation to what I'm going to actually talk about.

37:13 Augie, what do you think? I see you smiling.

37:14 >> I have to say that song was just jam after jam after jam.

37:19 >> It is. I need you to click your right mouse button.

37:23 I only have one mouth.

37:26 So here's the actual Python related joke for us.

37:31 It's a tech support thing.

37:33 Brian, why don't you be the person that needs some help?

37:36 >> Okay. Hi.

37:37 This is a chat, by the way.

37:38 Tech support, how may I help you?

37:40 Hi, I've got a problem.

37:41 Your program is telling me to get a pet snake.

37:44 I don't want one.

37:45 Excuse me?

37:46 It's giving me a message telling me I need a snake to run it.

37:49 Okay, read the message to me, please.

37:52 Python required to run the script.

37:54 That's terrible.

37:56 That is terrible.

37:57 Terribly good is what it is.

37:58 Yeah.

37:59 So, hey, I wanted to add some humor as well.

38:03 All right, do it.

38:04 So I saw this on Twitter and it was a quote from, how do I, I don't know how to pronounce that name.

38:10 Byrne? Brian? Um, Byrne.

38:12 I don't know.

38:12 A quote from Byrne Hobart: running a successful open source project is just Good Will Hunting in reverse, where you start out as a respected genius and you end up being a janitor who gets into fights.

38:24 Yeah, that's awesome.

38:26 And it goes right along with the book recommendation as well.

38:32 Well, that's a good way to put a cap on it.

38:32 Yep.

38:33 All right.

38:33 Well, thank you, Brian. Thank you, Augie.

38:35 Thank you for having me.
