#278: Multi-tenant Python applications

Published Fri, Apr 8, 2022, recorded Wed, Apr 6, 2022

Watch the live stream:

Watch the live stream replay

About the show

Sponsored by: Microsoft for Startups Founders Hub.

Special guest: Vuyisile Ndlovu

Brian #1: dunk - a prettier git diff

Darren Burns
Uses Rich
“⚠️ This project is very early stages” - whatever, I like it.
Recommendation is to use less as a pager for it
- git diff | dunk | less -R

Michael #2: Is your Python code vulnerable to log injection?

via Adam Parkin
Let’s just appreciate log4jmemes.com for a moment
Ok, now we can talk about Python

We can freak out the logging with line injection

    "hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza"

Results in two lines for one statement

    INFO:__main__:user 'bob' commented: 'hello'.
    INFO:__main__:user 'alice' commented: 'I like pineapple pizza'.
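A minimal sketch of the injection above, using only the standard library. The logger setup and the `log_comment` helper are assumptions made for illustration; they are not taken from the article. The last part shows one way a JSON-encoded record keeps the same input on a single line.

```python
import io
import json
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("__main__")
logger.handlers.clear()
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)


def log_comment(user, comment):
    """Hypothetical helper: logs untrusted comment text as-is."""
    logger.info("user '%s' commented: '%s'.", user, comment)


# The attacker-controlled comment contains a newline and a forged log prefix.
malicious = "hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza"
log_comment("bob", malicious)

lines = stream.getvalue().splitlines()
for line in lines:
    print(line)

# JSON encoding escapes the newline, so the record stays on one line.
structured = json.dumps({"user": "bob", "comment": malicious})
print(structured)
```

The single `logger.info` call produces two lines, and the second one looks like a genuine entry for a different user.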

The safest solution is to simply not log untrusted text. If you need to store it for an audit trail, use a database.
Alternatively, structured logging can prevent newline-based attacks.
Padding a ton? One such case is abusing padding syntax. Consider this message:
*"%(user)999999999s"*
This will pad the user with almost a gigabyte of whitespace.
Mitigation: To eliminate these risks, you should always let logging handle string formatting.
See this discussion: Safer logging methods for f-strings and new-style formatting
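The padding problem and its mitigation can be sketched like this. The `capture` helper, the logger name, and the small width of 12 (standing in for the attacker's 999999999) are assumptions chosen to keep the demo harmless.

```python
import io
import logging


def capture(log_call):
    """Run log_call(logger) and return the text a plain handler wrote."""
    stream = io.StringIO()
    logger = logging.getLogger("padding_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    log_call(logger)
    return stream.getvalue().rstrip("\n")


context = {"user": "bob"}
user_text = "%(user)12s"  # an attacker would use a huge width instead of 12

# Unsafe: untrusted text becomes part of the format string, and logging
# then applies %-formatting to it with the context dict.
unsafe = capture(lambda log: log.info("comment: " + user_text, context))

# Safer: the format string is a constant and untrusted text is only an
# argument, so logging never interprets it as a format.
safe = capture(
    lambda log: log.info(
        "comment: %(text)s by %(user)s", {"text": user_text, "user": "bob"}
    )
)

print(repr(unsafe))
print(repr(safe))
```

In the unsafe call the user name is padded out to the requested width; in the safer call the attacker's text is logged literally.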

Vuyisile #3: Building multi tenant applications with Django

Free book by Agiliq, covers different approaches to building Software as a service applications in Python/Django.
Covers four approaches to multi tenancy, namely:
1. Shared database with shared schema
2. Shared database with isolated schema
3. Isolated database with a shared app server
4. Completely isolated tenants using Docker
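As a rough illustration of approach 1 (shared database, shared schema), here is a framework-free sketch using `sqlite3` rather than Django. The table layout, the `tenant_from_host` rule, and the `TenantScope` class are hypothetical names invented for this example and are not taken from the book.

```python
import sqlite3


def tenant_from_host(host):
    """Hypothetical rule: 'acme.example.com:8000' -> 'acme'."""
    hostname = host.split(":")[0]
    return hostname.split(".")[0].lower()


class TenantScope:
    """Wraps a connection so every query carries the tenant filter."""

    def __init__(self, conn, tenant):
        self.conn = conn
        self.tenant = tenant

    def notes(self):
        rows = self.conn.execute(
            "SELECT body FROM notes WHERE tenant = ? ORDER BY id",
            (self.tenant,),
        )
        return [body for (body,) in rows]


# One shared table; a tenant column separates the rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, tenant TEXT NOT NULL, body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO notes (tenant, body) VALUES (?, ?)",
    [("acme", "first"), ("acme", "second"), ("globex", "other")],
)

acme = TenantScope(conn, tenant_from_host("acme.example.com:8000"))
globex = TenantScope(conn, tenant_from_host("globex.example.com"))
print(acme.notes())
print(globex.notes())
```

Keeping the filter in one place means a single forgotten WHERE clause cannot expose another tenant's rows; in Django, the same lookup would typically live in middleware.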

Brian #4: Should you pre-allocate lists in Python?

Redowan Delowar
Discussion of 3 ways to build up a list
- Start empty and append: l=[]; l.append(1); …
- Pre-allocate: l = [None] * 10_000; …
- List comprehension: l = [i for i in range(10_000)]
Interesting discussion and results
- The times (filling the list with the index):
  - append: 499 µs ± 1.23 µs
  - pre-allocate: 321 µs ± 71.1
  - comprehension: 225 µs ± 711
- Python lists dynamically allocate extra memory when they run out, and it’s pretty fast at doing this.
- Pre-allocation can save a little time.
- Conclusion: use comprehensions when you can, otherwise, don’t sweat it unless you really need to shave off as much time as possible
Of note: this was just measuring time, no discussion of memory usage.
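The three approaches can be compared with a short script. This is a sketch under assumptions: the function names are made up here, timings will differ by machine and Python version, and `sys.getsizeof` reports only the list object itself, not the integers it refers to.

```python
import sys
import timeit

N = 10_000


def build_append():
    result = []
    for i in range(N):
        result.append(i)
    return result


def build_preallocated():
    result = [None] * N
    for i in range(N):
        result[i] = i
    return result


def build_comprehension():
    return [i for i in range(N)]


builders = {
    "append": build_append,
    "pre-allocate": build_preallocated,
    "comprehension": build_comprehension,
}

RUNS = 50
for name, build in builders.items():
    seconds = timeit.timeit(build, number=RUNS) / RUNS
    size = sys.getsizeof(build())
    print(f"{name:>14}: {seconds * 1e6:8.1f} us per build, list object {size} bytes")
```

The size column gives a first look at the memory question: a list grown by `append` may hold spare capacity, while `[None] * N` is allocated for exactly `N` items.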

Michael #5: mockaroo and tonic

Do you need to generate fake data?
Mockaroo lets you generate realistic data based on data types (car registrations, credit cards, dates, etc.)
Tonic takes your actual production data and reworks it into test data (possibly stripping out PII)
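For small cases, the same idea can be approximated locally. The sketch below is a stand-in built on the standard library; it does not use Mockaroo's or Tonic's APIs, and the field names and value lists are invented for illustration.

```python
import csv
import datetime
import io
import random

FIRST_NAMES = ["Ada", "Grace", "Linus", "Guido", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Knuth"]
GENDERS = ["Female", "Male", "Non-binary"]


def fake_customers(count, seed=0):
    """Return reproducible fake customer rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    start = datetime.date(2022, 1, 1)
    rows = []
    for customer_id in range(1, count + 1):
        signup = start + datetime.timedelta(days=rng.randrange(365))
        rows.append(
            {
                "id": customer_id,
                "first_name": rng.choice(FIRST_NAMES),
                "last_name": rng.choice(LAST_NAMES),
                "gender": rng.choice(GENDERS),
                "signup_date": signup.isoformat(),
            }
        )
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = fake_customers(5, seed=1)
print(to_csv(rows))
```

Seeding the generator keeps test fixtures stable between runs.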

Vuyisile #6:

Brachiograph —the cheapest, simplest possible Python powered pen plotter by Daniele Procida
Low tech Raspberry Pi project that can be built for < $50 using common household objects like a clothes peg and an ice cream stick

Extras

Brian:

April 8 new date for Python Issues migrating to GH

Michael:

ngrok has a detailed web explorer

Vuyisile:

Thunder Client: VS Code extension, a lightweight client for testing REST APIs and a Postman alternative

Joke: Linux world in tatters

Related: Origin of the joke - Lapsus$ claims to leak 90% of Microsoft Bing's source code

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 278, recorded April 6th, 2022.

00:10 I'm Michael Kennedy.

00:11 And I'm Brian Okken.

00:12 And I'm Vuyisile Ndlovu.

00:14 Welcome, Vuyisile. It's really great to have you here.

00:17 You know, I'm really excited and I feel honored to be here.

00:20 So thanks for the opportunity.

00:21 Yeah, it's going to be great to share some Python news with you.

00:24 Now, before we jump into all those things, tell us a bit about yourself.

00:28 What do you do? What are you into?

00:30 Firstly, I'd like to say I'm still relatively early on in my career.

00:34 And I'm from Zimbabwe.

00:36 And we have a small but growing Python community here.

00:39 And for a long time, I didn't have any community.

00:43 So podcasts like your Talk Python podcast were the only way I got to connect with community members.

00:49 So it's really great to be here.

00:51 But on the question about me, I'm a software developer.

00:55 I work in the back end.

00:57 I work for a company called Ideation.ai.

01:00 It's a health tech startup that's building information systems that help clinicians manage patients and hospitals better.

01:09 So I work mostly on APIs and microservices using Python, Django, Postgres mostly.

01:15 That sounds like a really fun project.

01:17 And we know that health care needs help and automation and modernization.

01:22 So thanks.

01:23 That's awesome.

01:24 Yeah, very cool.

01:25 All right.

01:25 Brian, should we kick it off?

01:26 Yeah, sure.

01:28 When you talk about getting drunk or what is this?

01:29 Getting drunk.

01:31 Oh, sorry.

01:31 I must have misread that.

01:34 So this was just announced a few days ago from Darren Burns.

01:38 He's the engineer that's helping William.

01:44 Is it Will?

01:44 Will.

01:45 With Rich and everything.

01:48 The Rich Empire.

01:50 Yeah.

01:51 So this is just really cool because I often wanted to, the dunk, he released dunk.

01:57 So dunk is a prettier git diff tool.

02:03 And it uses Rich and it's just a command line tool and it's beautiful.

02:08 So you just pip install it and then you do a git diff on something.

02:14 It could be one file or it could be, it's usually a commit, right?

02:18 So you do a diff of whatever you have now or other stuff.

02:21 And it just, instead of doing the weird, like the hard to read command line.

02:26 The plus, minus, plus.

02:27 Yeah.

02:28 Yeah, that thing.

02:29 It's got these nice, just this nice colors with Rich of like, you know, what was added,

02:35 what was green for added, red for taken out.

02:37 And the line numbers.

02:40 It's beautiful.

02:41 And it's still a work in progress, but I'm using it already.

02:44 It's just great.

02:45 This is fantastic.

02:46 When I first looked at this, I thought it was like a GUI window that was showing, but

02:51 no, that's just the terminal.

02:52 Yeah.

02:53 Yeah.

02:54 Yeah.

02:54 It's pretty cool.

02:55 You know, text editor, right?

02:58 Yeah.

02:59 Definitely.

02:59 Some text editors have, have something that like this nice, but just on the command line,

03:04 it's super cool.

03:05 the, one of my first questions with it was sometimes I have a lot of diff stuff.

03:10 So does this have a pager?

03:11 And the answer from Darren was, it does not have a pager, but, but you can use a less

03:18 dash capital R.

03:20 I don't know what the R does, but anyway, if you pipe git diff to dunk and then pipe it

03:26 to less dash R, you've got a diff with a pager that works for me.

03:29 I'll just alias that to something.

03:31 So, yeah.

03:32 Yeah.

03:32 That's really cool.

03:33 Anyway, pretty quick, pretty short, topic.

03:36 But, for people that are looking at git diffs a lot, this is a super handy thing

03:41 to look at.

03:42 Yeah.

03:42 This is neat.

03:42 I usually do a lot of my diffs in PyCharm and it actually looks real similar to that UI.

03:47 Vuyisile, what about you?

03:49 How do you see your diffs?

03:50 Yeah.

03:51 I do most of my coding in Visual Studio Code and I use the Visual Studio diff viewer

03:57 for that. Same.

03:57 It's pretty similar to this.

03:59 Nice.

04:00 Yeah.

04:00 This looks great.

04:01 I think very nice.

04:02 Yeah.

04:04 Very nice one.

04:05 How about we start with some memes?

04:06 So we all heard about log4j.

04:10 and my favorite one was the take on the XKCD about little Bobby tables, the little, little,

04:18 little JNDI we call him or something like that.

04:21 I can't remember, but you know, all the jokes aside, like, you know, here's Homer Simpson,

04:26 zero days without log4j CVE.

04:29 And, maybe the best one was that, that this guy right here, this, this guy,

04:34 he looks like he's probably about 75 or retired and says, upgrading log4j three

04:39 times.

04:39 It wasn't that stressful.

04:40 Says Dave, 28 years old.

04:42 But in all seriousness, like, is there a log4py?

04:48 Like, is this something that we should consider?

04:49 And my first thought was, yeah, no, we're good.

04:52 Like we don't have this stupid, like remote method invocation where you can inject like,

04:56 a function call as a string inside of your, your log message.

05:02 Oh no.

05:03 But here's Arie Bovenberg who wrote an article that says, yeah, it's not anywhere near as

05:08 severe as that, but there are some things you should consider.

05:10 And so for your consideration, I present this article and some ideas.

05:14 So it says, look, here's the basics of logging.

05:17 And this is using Python's built in logger.

05:19 I'm, I'm a fan of Logbook and Loguru and the sort of higher level nicer things, but

05:25 nonetheless, here's the basics, right?

05:27 So you can log like say logger.info or trace or whatever, and then put out a message like

05:33 hello world.

05:33 There's no injection there.

05:34 You can also do this thing, which is really the crux of the problem across the board is

05:39 you can say, here's a formatted string and the data that formats it.

05:44 so you can put in the problem with log4j was even if the string was fully evaluated

05:50 as user input or something, you take some user input and you fully validate it.

05:54 It still could have, it'll still get like reinterpreted for these remote, like trying to find you

06:00 what machine am I running on, or am I production or debug?

06:02 Like, let me go call this function and find out or just call it to hack you.

06:06 But so the Python version doesn't have that, but you can do this like format string and pass

06:11 this context variable thing, like pass a data structure in, in, in that case, some bad stuff

06:17 can actually happen here.

06:19 Right?

06:20 So that's fine.

06:21 So as well, what about, what if I wrote my, as my name instead of, or my message instead

06:26 of hello, I wrote hello, quote, backslash n, INFO main user Alice commented something else.

06:34 And you would, you pass that over.

06:35 And what you would end up with is a log message that was supposed to be one line that ends up

06:39 like two.

06:40 So that could cause some confusion, right?

06:43 That might, might be problematic.

06:44 It's not going to result you in being hacked, but there's more like denial of service type of

06:49 thing.

06:50 So like one thing you could say is, well, just don't use backslash N, like take those

06:53 out.

06:53 But there are all sorts of freaky Unicode ways to like restructure similar meanings and stuff.

07:00 So another one has to do with formatting.

07:03 So if you're logging in some information and it's just a regular F string, that's probably

07:08 fine.

07:09 But if what you're logging into the F string, you can later get evaluated again, passing this

07:15 data structure, asking the logger to fill out the format string.

07:20 Then you can pass interesting stuff.

07:22 One of the more interesting ones was you could say percent, parenthesis, variable name, close

07:29 parenthesis, 99999999999s.

07:33 And what that'll do is it'll pad the username with a gigabyte of white space and then try to

07:38 have you write it to the log file.

07:40 Oh, great.

07:41 So that's bad, right?

07:43 Right.

07:45 Yeah.

07:45 You could also do things if you knew the data structure that was being passed in to fill

07:50 out the log string.

07:51 You could sort of try to reach out and get variable names out of it by putting a formatted

07:57 string in there.

07:58 And if you marry that with the huge piece of text, that'll make the logging really slow.

08:04 So you could put in like different things.

08:07 And if you see, oh, this message actually makes the request really slow, you could infer that

08:11 maybe that data is actually in the variable being passed over.

08:15 So then you could try to get it to write it to a file if you have, say, file access, but

08:19 not other types of access.

08:20 Anyway.

08:21 So there's a bunch of things.

08:21 So basically the long story short is don't mix like F string formatting along with passing

08:27 more data to the log file.

08:29 One or the other, because the logger knows how to look for some of these things when it

08:34 takes the data and puts it in the format, but it doesn't do that for the original string.

08:38 So careful about mix and match.

08:40 Final thing.

08:40 There's actually, it's been included in a PEP and there's a discussion on discuss.python.org.

08:46 And there's actually a pretty interesting discussion with a bunch of core devs there.

08:49 So you can see that's maybe a better follow up there, but pretty interesting.

08:53 There's no log4py, but that doesn't mean you can just completely go crazy with unverified

09:00 user input.

09:01 You should trust your users though.

09:03 I know.

09:03 Why not?

09:06 They're so friendly and considerate.

09:08 Yeah.

09:08 Why not?

09:08 The real ones are.

09:10 You know, when this, when this log4j vulnerability came out and I realized that it wasn't really a big

09:16 problem in Python, I didn't pay any attention to it.

09:19 And now I'm, I'm actually shocked that you could do a denial of service attack using that.

09:23 Yeah, exactly.

09:24 I think that's what it basically becomes is there's two aspects.

09:28 One is you can sort of crush the server by having it write so much data.

09:31 The other that they pointed out here was if your goal is to try to obscure regular hacking,

09:38 if you could wreck the log file with so much data that it's really difficult for people to parse the

09:44 log file, you might be able to hide yourself a little bit better for longer.

09:47 So anyway, there's some interesting stuff there.

09:49 All right, Vuyisile, over to you.

09:51 Yep.

09:52 Thanks.

09:52 If you're building a software as a service platform in Python and Django, there are a few

09:59 things to think about, you know, like the architecture you're going to use, what type of database

10:03 you're going to use, whether you use a single database or multiple databases and all these

10:08 things.

10:08 So while I was getting ready for this call, I found this book, it's called Building Multi-Tenant

10:14 Applications with Django.

10:15 And it's by an author that you've actually covered on the show.

10:18 It's a company, I think, called Agiliq.

10:21 So this book is free.

10:23 It's open source.

10:23 Anyone's free to read it, download it.

10:26 And it goes through the different approaches that you'd have to follow.

10:30 I mean, the different architecture designs that you should consider when building software as a

10:35 service or multi-tenant applications.

10:37 And so one of the things they cover here is email where you're using queries to isolate

10:42 the data.

10:43 Something like Postgres, you must do that.

10:46 This book goes over the different approaches you can use to build multi-tenancy apps, right?

10:51 And then it also covers some third-party packages that you can install that help do a lot of the boilerplate code for you.

10:58 That's really nice.

11:00 Because I've considered this.

11:01 It'd be so great if you're doing some sort of software as a service type thing where people log in and you want that group just to see all their data and all their records and stuff.

11:10 But it's so scary because if you just forget the where clause on just one, on just one.

11:15 Exactly.

11:16 They get everybody's data, which is really bad, right?

11:19 And so this is really cool.

11:21 Yeah.

11:21 Yeah, this is neat.

11:22 So the book covers things like using HTTP headers or subdomains in the request to identify different tenants and how you do that, how you capture that using middleware in Django.

11:35 That's cool.

11:37 So some of the middleware is django-multitenant, django-tenant-schemas, or django-db-multitenant.

11:44 Not a ton of variation in the name in there, but it's still pretty cool, right?

11:49 And some of them use schemas and some of them use isolated databases, right?

11:53 Yeah.

11:54 Nice.

11:54 So it will depend on what your tolerance for cost is and database management.

12:00 So if you don't mind having a database for each client, you could do that.

12:04 And then you'd have to do migrations on each database, whenever you make updates to the application.

12:09 Or if you just want to have a single shared database, you can do that and isolate using schemas.

12:13 Yeah.

12:13 I hadn't thought about having to migrate every separate database, but yeah, that's a ton of work.

12:18 The deployment all of a sudden looks really rough, doesn't it?

12:21 Yeah.

12:22 But that's true isolation there.

12:23 Yeah, exactly.

12:24 There's no way you're going to make a mistake there.

12:26 Do you guys do anything like this with your healthcare products?

12:30 Yeah.

12:31 We use one of these approaches.

12:32 I can't tell you which one, but we use our software as a, what do you call it?

12:38 Software as a service.

12:40 We have a number of clients.

12:41 They need to have a central login, like a single application that they can all log in to and view only their data.

12:48 And we can't have information from one client leaking over into another.

12:51 Yeah.

12:52 Cool.

12:52 All right.

12:53 Well, really neat.

12:53 I'm sure that'll be super valuable to people.

12:55 Indeed.

12:56 Yeah.

12:57 Now, Brian, before we move on, how about I tell you about our sponsor?

13:00 Once again, Microsoft is here.

13:02 So let's hear from them before we carry on.

13:04 This episode of Python Bytes is brought to you by Microsoft for Startups Founders Hub.

13:09 Starting a business is hard.

13:11 By some estimates, over 90% of startups will go out of business in just their first year.

13:16 With that in mind, Microsoft for Startups set out to understand what startups need to be successful

13:21 and to create a digital platform to help them overcome those challenges.

13:25 Microsoft for Startups Founders Hub was born.

13:28 Founders Hub provides all founders at any stage with free resources to solve their startup challenges.

13:34 The platform provides technology benefits, access to expert guidance and skilled resources,

13:40 mentorship and networking connections, and much more.

13:43 Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups

13:48 to be investor backed or third party validated to participate.

13:52 Founders Hub is truly open to all.

13:55 So what do you get if you join them?

13:56 You speed up your development with free access to GitHub and Microsoft Cloud computing resources

14:01 and the ability to unlock more credits over time.

14:04 To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI,

14:09 a global leader in AI research and development, to provide exclusive benefits and discounts.

14:14 Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know.

14:19 You'll have access to their mentorship network, giving you a pool of hundreds of mentors

14:23 across a range of disciplines and areas like idea validation, fundraising, management and coaching,

14:29 sales and marketing, as well as specific technical stress points.

14:33 You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders

14:37 themselves.

14:38 Make your idea a reality today with the critical support you'll get from Founders Hub.

14:43 To join the program, just visit pythonbytes.fm/foundershub.

14:47 All one word. The link's in your show notes.

14:49 Thank you to Microsoft for supporting the show.

14:53 This is a topic that has been very interesting to me, sort of this memory story around Python lists.

14:58 Yeah.

14:59 I'm looking forward to this one you got to share.

15:02 So I was interested.

15:03 This is a, we're going to present an article called Pre-Allocated Lists in Python by Redowan

15:09 Delowar, I think.

15:11 Anyway, I've always, I've thought about this before because one of the things that happens

15:16 with when you allocate a list in Python, if it's empty, it's not really empty.

15:20 There's some data there already.

15:22 And one of the first things the article talks about is this data structure, a C struct that

15:27 Python uses to store basically the info about the list.

15:31 But it's still space, but it's, you know, it's still, it's empty, supposedly.

15:35 And then when you, and normally you kind of just append to it.

15:39 So you, or one way to add things to a list is to just append one thing after another.

15:45 And what Python does, it's kind of a neat algorithm, is it allocates more than it needs.

15:51 So if you add, if it, you, you add like five things or six things or something, and there's

15:55 not enough space, it'll, it'll, and I don't remember the real algorithm, but it chunks a

16:00 bigger portion.

16:01 And then if you run out of space again, you get more space added to it.

16:06 Right.

16:06 Because the last thing you want to do is reallocate for every bite at a time and copy the whole

16:12 list as you're adding a thousand items.

16:14 That would be super bad.

16:15 Right.

16:15 So this, this article talks about three different ways.

16:18 Like, let's say, if you know, you know, you're going to have 10,000 elements in a list.

16:22 and, in this example, it's just counting, you know, a zero through, you know, 9,999,

16:30 and filling it into the list.

16:31 but, but there's, that's, I think that that's irrelevant.

16:36 It's the same sort of work, for each kind of list, but it takes three kinds.

16:40 Well, the first kind is starting with an empty list and just appending every time.

16:44 And that seems like it would be slow, but, it's not actually not.

16:48 The other two ways are to preallocate.

16:50 And I'm like, how would you preallocate?

16:52 but, his, his technique was to, to take like None and, assign your list None times 10,000.

17:03 So you had a 10,000 element list of Nones.

17:06 That's fine.

17:07 and then.

17:08 Long as that's not a valid value, you're fine.

17:11 Yeah.

17:11 and then the other, the third way, was to take, let's see, where is it?

17:20 Is to, to do a list comprehension and do, and just assign your list, the list comprehension,

17:26 and then put a for loop, for I in range 10,000 in the middle of it.

17:32 and in, in the case, in this case, if you, if you weren't really just counting to

17:37 a 10,000 and doing something else, it would be a similar sort of thing of you'd have a for

17:40 loop to fill this, this in.

17:42 And I actually had no guesses as to what would be fastest.

17:46 So the final say when he was doing timing on this was that the append

17:53 method actually was, was the slowest, but not terrible.

17:57 it's pretty efficient and the pre allocate method, it shaved.

18:02 So we had a 499 microseconds on his machine.

18:05 and then 321 on, the pre allocate.

18:10 so that's not even half as, I mean, it's not an order of magnitude, but it is faster.

18:15 And the list comprehension was 225.

18:18 So that was about half.

18:19 It was about, twice as fast as the append was to use the list comprehension and list comprehension

18:25 is actually the most readable of the three, I think.

18:28 So, it's just sort of a, that I guess it's an interesting article to look at like how

18:34 to discuss like how, how this, this allocating and allocating extra memory happens, with

18:42 append.

18:42 But it also, is interesting that the pre allocate, it seems like that would be the

18:47 fast, one of the faster ones.

18:49 And it's not, so interesting.

18:51 yeah.

18:52 I wonder if, I don't think the list has this.

18:56 I know in other languages it does where you, when you create the list empty, you can say,

19:00 I would like to initialize you with this capacity.

19:02 Yeah.

19:03 Right.

19:04 And if it was like a built-in way to say, when you allocate your inner C level array pointers,

19:09 make it this big to start with, but still sort of fill into it before you start your growing

19:14 algorithm.

19:14 Yeah.

19:15 Maybe that'd be a cool PEP for some of the containers if it's not there, but yeah, this,

19:18 I think it's natural that the list comprehension is fastest.

19:21 And also it doesn't, it means you don't end up with a weird programming model where you have a list.

19:26 It's length is one thing, but that's not what you should actually work with.

19:29 I think that's, it's probably not worth it except for extreme cases.

19:33 A couple of things that I was, I found interesting about this that I'd like to pursue a little further is it

19:38 didn't talk about memory space.

19:40 So one of the benefits of pre-allocating is you're not allocating more than you need,

19:44 but I don't know if you're not allocating.

19:46 I don't know what the Python algorithm is.

19:49 But, but the, so I'd, I'd like to see this with space.

19:55 So how much memory is being used by the three methods.

19:58 The other thing that would be interesting to see is to throw NumPy in the mix because I

20:04 know NumPy has some more efficient, I mean, it's a completely different beast, but still.

20:08 You work with homogeneous data that's numbers or something or strings.

20:12 Yeah.

20:12 What do you think about this?

20:13 Do you have to worry about these little details?

20:16 Are you guys under like heavy performance pressure?

20:18 You know, no, not, not right now, at least.

20:21 I've never had to think about like C level things and I'm actually taken aback that so much

20:27 goes into allocating stuff to a list because in Python, you know, allocating stuff to a

20:31 list is just create the list and put stuff in there, you know?

20:33 Yeah.

20:34 So this is eyeopening to me.

20:36 Yeah.

20:36 It's pretty cool.

20:37 It's not like C where you have to pre-allocate it and then fill it out and or something funky

20:41 like that.

20:42 Mm-hmm.

20:43 Yeah.

20:43 So, yeah.

20:43 So what Will McGugan is saying, I think the list comprehension will pre-allocate because the range object

20:48 has a dunder length hint method that reports its size.

20:52 And yeah.

20:53 So I think maybe the time saving we're getting is that we're not filling it in with

20:57 Nones to begin with, but actually filling it in with the data we want.

21:01 Okay.

21:02 Yeah.

21:02 Good to know.

21:03 Thank you, Will.

21:04 I've more than once had a, not argument, but a disagreement where somebody said, but you

21:10 need to show me because, and it's, oh, you have a for loop and you just append to the list.

21:16 That's the same as a list comprehension.

21:17 They're doing the same thing.

21:18 The outcome, the final result is the same, but the information that Python has to work with

21:24 is much more, like, well, Will says it here.

21:26 It can take all the information it has to work with.

21:28 I say, oh, look, it's going to be this long as we loop.

21:30 And you're going to just add stuff to the list, not use it in other interesting ways.

21:33 So just go and, and jam on it.

21:36 Right.

21:36 Yeah.

21:37 All right.

21:38 Speaking of working with some data, let me tell you about this cool project called Mockaroo.

21:44 You guys familiar with this?

21:45 No, no, no.

21:47 So here's the story.

21:48 Imagine you needed some data and you want this for testing or this could be testing like unit

21:56 testing.

21:56 This could be development.

21:57 Like one of the big problems with UI apps is having something to display just so that

22:04 it, it fills it out.

22:05 If I'm going to like fill out a webpage and I say, I want to work on the CSS of this, this

22:09 table or the CSS of this list.

22:11 If there's nothing in the list, what are you going to do?

22:14 Right.

22:14 So you want to have some realistic data to work with.

22:18 So this Mockaroo is this free thing that has all these different types of data that you can

22:22 work with.

22:23 So I can come over here and just say, I want some data and I want it in a CSV format

22:27 or SQL table or Firebase or Excel or XML or, you know, my favorite probably is JSON.

22:35 And then you can say, all right, well, I'm going to have an ID here.

22:38 We have like a customer table.

22:39 So ID, first name, last name, but it has also things like gender.

22:42 And one of the types you can pick is gender.

22:45 So it has all these well-known data types.

22:47 So if I go and type in, I want a gender, not only will it say male, female or something,

22:52 it gives you like a list.

22:53 So I can have gender written out as female, male or non-binary.

22:57 I could have gender abbreviated as M or F or just binary.

23:01 So you can have like lots of control.

23:03 So if I wanted to like, you say auto or car, what do I got to type in a car?

23:07 You can do like car makes, models, registration numbers, all of these things.

23:12 So you can say this one is a gender abbreviated and like you fill it out.

23:16 Then you can just say, generate me this data exactly like you want.

23:21 And then download it in whatever format.

23:23 Like I said, CSV, SQL, insert statements, JSON, Excel.

23:27 Isn't that cool?

23:28 That is pretty cool.

23:29 So I've used this more than once.

23:31 I can see a use case for this already.

23:34 Awesome.

23:35 Yeah, right?

23:35 Yeah.

23:36 Yeah.

23:36 And I kind of liked the first option when you were selecting the gender type, having it be

23:43 animal names.

23:44 That'd be fun.

23:45 Yeah.

23:46 I mean, there's all like, there's all these, there's all these, that's crazy.

23:49 There's all these different data formats.

23:51 So you've got like cars.

23:54 What else we got here?

23:56 Credit cards, goods, ISBNs for books, numbers on a normal distribution, passwords.

24:02 Even MongoDB object IDs.

24:04 That's cool.

24:05 Oh, that is pretty cool.

24:06 Yeah.

24:06 So you have e-commerce stuff, money, stock market symbols, locations, healthcare.

24:14 Let's see.

24:15 How about that?

24:15 You got your drug companies, your NHS numbers and all those different things.

24:20 Oh, it's because I'm searching for common.

24:22 Why is car keep showing up?

24:23 Animal common names.

24:24 Yeah.

24:25 Yes.

24:25 You could have a wombat or a jungle kangaroo.

24:28 I mean, these are all some fun, right?

24:30 Yeah.

24:31 So these are all super neat.

24:32 You can get up to like a thousand rows for free.

24:34 And then I think you have to pay if you need more than that.

24:36 And then a follow on, I believe this is from the same company, full disclosure, these guys

24:41 sponsored Talk Python.

24:42 But I've wanted to talk about this even before.

24:44 So they have this thing, the service called Tonic that you can then point at your production

24:51 database and it'll do things like generate me something that looks exactly like production

24:56 data, but doesn't have any personally identifiable information so that I can give it to the developers

25:01 to test with real looking data with real variations from our clients, but is sort of safe.

25:08 Like if they lose their laptop or whatever, or they just leave it open, it's not going to

25:12 destroy something.

25:13 Right.

25:13 Yeah.

25:13 That's pretty cool.

25:14 Yeah.

25:15 So you basically connect it to your database and then it will go along and sort of create

25:22 data that looks more like what you actually have instead of just this Mockaroo data.

25:27 So pretty neat.

25:28 Anyway, if you need to do some testing, you need to generate fake data, not just for like

25:33 pytest testing, but also UI development and just something to work with.

25:37 These are both good options.
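
For a handful of test rows you may not need a service at all. The following is a minimal sketch using only Python's standard library. It is not Mockaroo's or Tonic's API; the value pools and the helper names (`ANIMALS`, `CAR_MAKES`, `fake_row`, `fake_rows`, `to_csv`) are made up for illustration.

```python
import csv
import io
import random

# Hypothetical value pools; a service like Mockaroo offers far more field types.
ANIMALS = ["wombat", "jungle kangaroo", "red panda", "otter"]
CAR_MAKES = ["Toyota", "Ford", "Volvo", "Fiat"]


def fake_row(rng: random.Random, row_id: int) -> dict:
    """Build one fake record from the pools above."""
    return {
        "id": row_id,
        "animal": rng.choice(ANIMALS),
        "car": rng.choice(CAR_MAKES),
        # Price drawn from a normal distribution, clamped at zero.
        "price": round(max(0.0, rng.gauss(50.0, 15.0)), 2),
    }


def fake_rows(count: int, seed: int = 0) -> list:
    """Return `count` fake records; a fixed seed keeps test data repeatable."""
    rng = random.Random(seed)
    return [fake_row(rng, i) for i in range(1, count + 1)]


def to_csv(rows: list) -> str:
    """Serialize the records as CSV text, header row first."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Seeding the generator is the main design choice here: the same seed yields the same rows, which is usually what you want in a pytest fixture.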

25:38 Very cool.

25:38 Yeah.

25:39 Cool.

25:40 Sam out in the audience says, this is fantastic.

25:42 I agree.

25:43 And Will says, yeah, super useful.

25:45 I could see even using this for testing development of Rich and Textual out there.

25:50 So very cool.

25:50 All right.

25:51 Vuyisile, off to you.

25:53 Last one.

25:53 All right.

25:54 So this is a fun project that a good friend of mine, Daniele Procida made.

25:58 He's demoed it at a couple of conferences.

26:01 It's called the BrachioGraph.

26:03 The goal for this project is to make a pen plotter powered by Python and make it as cheap

26:09 as possible using common things you can find in the house.

26:12 So it's a plotter.

26:14 It uses a Raspberry Pi, an ice cream stick, and a clothespin to draw, and a pencil, of course.

26:20 Right.

26:22 So it's got Python code that turns an image into, I think it's called a raster.

26:28 It rasterizes an image into coordinates on a piece of paper.
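
As a rough illustration of the idea of turning an image into coordinates on paper, here is a minimal sketch. It is not BrachioGraph's actual code or API; the tiny `IMAGE` grid, the `pixels_to_points` helper, and the paper dimensions are all assumptions made up for this example.

```python
# A tiny black-and-white "image": 1 means draw here, 0 means leave blank.
IMAGE = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]


def pixels_to_points(image, paper_width_cm=6.0, paper_height_cm=6.0):
    """Map each dark pixel to the (x, y) center of its cell on the paper.

    The origin is the paper's bottom-left corner, so image row 0 (the top)
    gets the largest y value.
    """
    rows = len(image)
    cols = len(image[0])
    cell_w = paper_width_cm / cols
    cell_h = paper_height_cm / rows
    points = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value:
                x = (c + 0.5) * cell_w
                y = paper_height_cm - (r + 0.5) * cell_h
                points.append((x, y))
    return points


for point in pixels_to_points(IMAGE):
    print(point)
```

A real plotter would still have to convert each (x, y) point into arm angles for its servos, which this sketch leaves out.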

26:34 I could have used this yesterday.

26:35 Oh my gosh.

26:36 This is great.

26:36 Yeah.

26:38 So I don't know if I can play video here, but it looks pretty cool when it's actually printing

26:43 out or plotting out an image.

26:45 Let me see if I can get it to work here.

26:47 But yeah, it has a motor that then does everything and it can draw very basic images.

26:54 Oh, wow.

26:55 It's a fun project that you can work on.

26:58 And it costs, I mean, the setup for this costs less than 50 US dollars.

27:02 And it's a pretty, pretty fun project.

27:05 Oh, I would have gotten an A in art class if I had this.

27:10 No, I love it.

27:10 This is really neat.

27:11 People should definitely play the video and watch it because it's fascinating.

27:14 Yeah.

27:15 Yeah.

27:15 The website has how-to guides and documentation on how to build this, what things you need,

27:23 what sources to the software and everything.

27:25 And it's also an open source project that anyone can contribute to if you're interested.

27:30 This is really neat.

27:31 One of the things I like about simple things like this is they're great projects to

27:37 start kids with because it's very real and physical.

27:40 Yeah.

27:41 I was thinking this would be awesome in a teaching scenario as well.

27:44 Yeah.

27:45 Cool.

27:45 All right.

27:46 This is a great one.

27:46 And I love it.

27:47 Very neat to do with Python and stuff.

27:49 All right.

27:50 Well, I think that's it for our main items.

27:52 Brian, do you got anything you'd like to share?

27:54 Oh, we covered last, I think we covered last week that the Python issues were migrating to

28:00 GitHub and it might be on April Fool's Day and it was not.

28:05 So next plan looks like April 8th.

28:08 Next, one more week.

28:09 If we keep talking about it, it's never going to happen.

28:11 Okay.

28:11 Like a watched pot sort of thing.

28:14 Exactly.

28:14 Yes.

28:15 Well, I'm waiting for it to happen.

28:17 I want it to happen.

28:18 I know.

28:19 The transformation will be complete at that point, right?

28:21 So next week, we won't cover it at all unless it's already happened.

28:25 But if it's delayed again, we won't cover it again until that.

28:28 Yeah, exactly.

28:29 We're not getting roped into this three times.

28:31 Vuyisile, anything else you want to give a shout out to?

28:34 Yeah, yeah.

28:34 Just one thing is a project that I found recently.

28:38 It's called Thunder Client.

28:40 It's a Postman-alternative VS Code extension and it's lightweight.

28:44 You download it and install it in less than a second and you can get started sending requests.

28:50 And it has less setup than Postman.

28:52 Right.

28:53 And it doesn't need any, it's like, it's easy to install.

28:56 Yeah, so if you were testing APIs, like construct a JSON thing, put this header in, you want to call it.
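
For comparison, the same kind of request (a JSON body plus a custom header) can be put together in plain Python. This is a minimal sketch using the standard library's `urllib.request`; the endpoint URL, the token, and the `build_json_request` helper are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_json_request(url, payload, token):
    """Build (but do not send) a POST request with a JSON body and headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical endpoint and token, for illustration only.
request = build_json_request(
    "https://api.example.com/items",
    {"name": "wombat", "quantity": 2},
    token="not-a-real-token",
)
print(request.get_method(), request.full_url)
print(request.header_items())
# To actually send it: urllib.request.urlopen(request)
```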

29:01 Yeah, Thunder Client for VS Code.

29:03 Very nice.

29:03 Yeah.

29:04 Thunder, Thunder.

29:05 If you're using VS Code.

29:06 Exactly.

29:08 Yeah.

29:09 Yeah.

29:09 You just switch tabs, you know, instead of switching applications.

29:13 So that shaves a few microseconds off your workflow.

29:15 Yeah, exactly.

29:16 That's cool.

29:17 I love it.

29:18 All right.

29:19 Nice.

29:19 I've got just one thing I believe today.

29:22 This is really short.

29:23 I've spoken about ngrok at ngrok.com before, about how it's really cool for exposing.

29:27 If you're like wanting to expose an API to the outside world that you're developing or you need to debug it.

29:34 I've used this for like webhooks.

29:35 So this company, I need to integrate with their webhook.

29:39 So I need them to call this, but it's not working.

29:41 So I want like a breakpoint on my machine.

29:43 But how do they get to my machine?

29:45 Just run ngrok and it'll tunnel it right through the firewalls using SSH reverse tunnels.

29:51 That's all good.

29:51 What I discovered working on yet another integration project was that there's actually this super rich inspector that I think people haven't noticed in there.

30:01 If you fire up an ngrok thing and then you go to localhost:4040, every request comes through.

30:06 You can see the summary, the HTTP headers, the cookies, the response, the status codes, the duration, all that.

30:14 If you're using ngrok for that sort of use case, be sure to check out this like live web view that lets you dive into.

30:20 It's almost like the dev tools, the network tab of the dev tools, but for just people coming in rather than you consuming stuff.

30:28 So it's pretty cool.
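
To make the webhook scenario concrete, here is a minimal sketch of a local receiver you could put a breakpoint in. It uses only Python's standard library; the names `summarize_request`, `WebhookHandler`, and `serve`, and the port 8000, are assumptions for this example and are not part of ngrok.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_request(method, path, headers, body):
    """Collect the kind of details a request inspector would show."""
    return {
        "method": method,
        "path": path,
        "headers": dict(headers),
        "body": body.decode("utf-8", errors="replace"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept POSTed webhooks and print a summary of each one."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        summary = summarize_request(
            self.command, self.path, self.headers.items(), body
        )
        print(json.dumps(summary, indent=2))  # a good place for a breakpoint
        self.send_response(204)  # acknowledge with "No Content"
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted (blocks the calling thread)."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

With `serve()` running, `ngrok http 8000` in another terminal should, assuming a default ngrok setup, forward public requests to this handler, and the inspector mentioned above would show the same requests.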

30:29 That's neat.

30:30 Are you guys ready for a joke?

30:32 Yes.

30:32 Shall we finish it out with a joke?

30:35 You may have heard recently that the Microsoft source code for Bing was taken by the Lapsus$ group.

30:44 And people thought this was some folks in like Brazil or somewhere in South America.

30:49 It turns out it was a bunch of British teenagers.

30:53 With about like $14 million in Bitcoin and whatever.

30:57 So they had gotten a hold of some of the Windows and Bing source code, I believe it was.

31:02 And there was like, oh my gosh, is this going to reveal a bunch of zero days because people can go through the source code?

31:07 Well, we don't do that much Windows, at least on the server and Python.

31:11 There's some, but not as much.

31:12 But we use a lot of Linux, right?

31:14 For all the Talk Python, Python Bytes stuff, we've got like a fleet of eight Linux servers.

31:19 Now, Brian, when I saw this headline, I really began to worry that maybe some vulnerabilities would be discovered or some kind of problem would happen here.

31:28 So the headline is, Linus Torvalds confirms the Lapsus$ breach after hackers publish the Linux kernel source code to the internet.

31:37 Okay.

31:41 In a blog post on Tuesday, published hours after Lapsus$ posted a torrent file containing partial source code from the Linux kernel, the geek man himself revealed that his branch was cloned by the hacking group, granting attackers unlimited power to... the article stops there.

31:58 Oh, man.

32:00 How many times do you have to read?

32:02 Exactly.

32:04 I think being open source, it's probably okay.

32:07 Yeah.

32:09 Yeah.

32:10 Yeah.

32:11 Oh, no.

32:12 It's published the source.

32:14 They published the source to Linux.

32:16 What are we going to do?

32:17 The programming humor just never stops.