#324: JSON in My DB?

Published Tue, Feb 21, 2023, recorded Tue, Feb 21, 2023

Watch the live stream replay

About the show

Sponsored by Compiler Podcast from Red Hat.

Connect with the hosts

Michael: @mkennedy@fosstodon.org
Brian: @brianokken@fosstodon.org
Show: @pythonbytes@fosstodon.org
Special guest, Erin Mullaney: @erinrachel@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Brian #1: Use TOML for <code>.env</code> files?

Brett Cannon
.env files are used to store default settings that can be overridden by environmental variables.
Possibly brought on by twelve-factor app design.
Supported by python-dotenv, which is also used by pydantic, pipenv, and others.
One issue is that it’s not a defined standard.
- from python-dotenv docs “The format is not formally specified and still improves over time. That being said, .env files should mostly look like Bash files.”
Adafruit decided that an upcoming CircuitPython will use TOML as the format for settings.toml files, which are to be used mostly how .env files are being used.
Brett notices this may fix things for Python for VS Code, and other people as well.
So… Is this a good idea? I think so.

Michael #2: Pydantic gets serious funding

via Mark Little (was on episode 285)
Sequoia backs open source data-validation framework Pydantic to commercialize with cloud services.
Pydantic Services Inc. emerges from stealth today with $4.7 million in seed funding.
Pydantic’s new commercial entity will incorporate a swath of new tools and services that are both “powered-by and inspired-by the Pydantic library”
Pydantic will start with an initial team of six, with the first three engineers based in Montana, Chicago and Berlin.
“With $4.7 million in the bank, Colvin said that they’re continuing to rewrite parts of Pydantic in Rust, with a view toward making it more efficient via a ten-fold performance improvement.”

Erin #3: JSON Fields for performance (Denormalization)

David Stokes
Using JSON fields when you design your databases is a good way to improve database query performance.
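As a rough illustration of the idea, the sketch below keeps free-form item attributes in a single JSON column instead of spreading them over extra tables and joins. It uses SQLite from the standard library with a plain text column as a stand-in (the talk is about MySQL, which has a native JSON type); the table, column, and item names are made up, and it shows the data layout only, not a performance measurement.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# "details" holds a JSON document, so new attributes need no schema change.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, details TEXT)")

# Hypothetical music-store items with different sets of attributes.
guitars = [
    ("Strat-style", {"strings": 6, "pickups": ["single", "single", "single"]}),
    ("Seven-string", {"strings": 7, "tremolo": True}),
]
conn.executemany(
    "INSERT INTO items (name, details) VALUES (?, ?)",
    [(name, json.dumps(details)) for name, details in guitars],
)

# One query on one table returns everything; the application decodes the JSON.
rows = conn.execute("SELECT name, details FROM items ORDER BY id").fetchall()
catalog = {name: json.loads(details) for name, details in rows}
```

With a native JSON column type, as in MySQL or Postgres, the database can also filter and index on values inside the document rather than leaving all of that to application code.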

Brian #4: f-strings with pandas and Jupyter keyboard shortcuts

Kevin Markham
After a couple-year break from blogging, friend of the show Kevin Markham has a couple of great, short, useful posts.
How to use Python's f-strings with pandas
- My favorite bit is the part about using f-strings for dictionary keys
Fly through Jupyter with keyboard shortcuts 🚀
- I’m a sucker for a rocket emoji
- Not an overwhelming list. Just the essentials for even the casual Jupyter user.
- Examples
  - Esc and Enter for command mode/edit mode
  - a and b for creating a new cell above or below current cell.
  - m and y for changing the cell type to Markdown or code.
  - Shift+m to merge cells
  - so many more
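For the f-string post above, here is a small sketch of the two ideas that stood out: putting an expression inside the braces, and using an f-string to build a string key inside a loop. A plain dictionary stands in for DataFrame columns so the example needs no pandas; the variable names and data are hypothetical, not taken from Kevin's post.

```python
name = "kevin"
days_completed = 73

# Any expression can go inside the braces, not just a bare variable.
greeting = f"Hello, {name.upper()}!"

# A little arithmetic plus a format spec: share of a 365-day year still left.
remaining = f"{(365 - days_completed) / 365:.0%} of the year remaining"

# Stand-in for columns looked up by string label (hypothetical data).
sales = {"sales_2021": 120, "sales_2022": 150, "sales_2023": 90}

# Build each string key with an f-string inside the loop.
totals = {year: sales[f"sales_{year}"] for year in (2021, 2022, 2023)}
```

The same pattern should carry over to a DataFrame, where `df[f"sales_{year}"]` selects a column by its string label.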

Michael #5: BioGPT

“GPT” for biomedical text generation and mining
As motivation, let’s see what ChatGPT can do with arrow anti-patterns in Python.
Smaller models and “Large” models
Used via an API rather than chat style.
BioGPT has also been integrated into the Hugging Face transformers library
Play with it here.

Erin #6: Code Mentorship and Communicating with Newer Devs

Sheena O’Connell
Sheena O’Connell gave a talk at DjangoCon about her work at Umuzi, training unemployed young people in underserved communities in Africa and also was on Django Chat Podcast.
Dmitriy Chukhin
Caktus Group is trying a new mentorship program for folks who don’t have the necessary training.

Extras:

Michael:

News is, these are no longer news: Security Researchers Uncover 700+ Malicious Open-Source Packages in npm and PyPI
Git security vulnerabilities announced, again
git ignores
- https://github.com/github/gitignore
- https://gitignore.io

Erin:

DjangoCon is in October in Durham, NC this year (Oct 15-20)

Joke:

Remember your pointers?

Episode Transcript

WebVTT format On GitHub

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 324, recorded February 21st, 2023.

00:10 I'm Michael Kennedy.

00:12 And I'm Brian Okken.

00:13 And I'm Erin Mullaney.

00:15 And this episode is brought to you by Compiler, a podcast from Red Hat.

00:18 Tell you more about them.

00:19 Erin, it's awesome to have you on the show.

00:21 Thanks for joining us.

00:22 Thanks for asking me to be on.

00:24 Yeah, you bet.

00:25 Yeah.

00:27 Why don't you tell folks a bit about yourself before we jump into the topics?

00:30 Yeah, I'm Erin Mullaney.

00:32 I've been a web developer since around the year 2000.

00:35 I currently work at Energy Solutions as a code base lead on a Django project there,

00:41 which means that I write and review a lot of Django and Python code on a day-to-day basis.

00:47 Energy Solutions, where I work, is an energy consulting company that's mission-driven to

00:53 protect the environment through different energy things.

00:57 To be real, not specific.

00:59 I specifically work on a Django project that facilitates energy efficiency programs.

01:05 And energy efficiency is actually a super powerful and cost-effective way to combat climate change.

01:13 And that's according to the U.S. Department of Energy.

01:16 Yeah, that's awesome.

01:18 All the wasted energy and bad insulation and other things like that.

01:22 That's really cool.

01:23 Yeah.

01:23 That's good work.

01:24 Really quickly, before we dive into Brian's item here, how are you feeling about Django and the recent changes?

01:29 I feel like it's picked up a lot of momentum lately.

01:31 It's picked up some new features like async stuff.

01:34 Is that exciting for you and your team?

01:36 Yeah, for sure.

01:37 It's exciting.

01:38 I am coming from a background where I was actually coding in a different web framework for years and switched over to Django.

01:46 So I'm just happy to hear that more and more people are downloading it and using it.

01:54 I wanted to stick around because I like it.

01:58 Yeah, absolutely.

01:59 All right, Brian, you want to kick us off here?

02:01 Sure.

02:02 So this one, first one's coming from Brett Cannon.

02:04 So he wrote an article called "Use TOML for .env files?"

02:09 And so there's the question at the end, and we'll talk about that.

02:15 But I just ran across, I mean, I don't know, because I'm not a web developer very much, I mean, I'm getting more so now, but I wasn't really familiar with the .env files until just recently.

02:27 And so one of the great things about this article is it talks about kind of what these are.

02:32 So what these are often is you've got settings for your application.

02:39 And there's an idea of a 12-factor app design, which I kind of like read about many years ago and forgot about.

02:47 But one of the ideas is you don't want to like have too many differences between your development environment and your live environment.

02:54 And one of the ways you do this is using environmental variables to store things like login credentials and all that sort of junk.

03:01 And in Python, one of the ways we do that is through .env files and also through a project called python-dotenv, which is used by Pydantic and a lot of other projects.

03:14 And what this does is it allows you to have defaults in there.

03:18 So in your development environment, you might have something silly, some silly credentials, or looking up somewhere.

03:26 But then in your live environment, those are actually set by the production server to set those secrets.

03:33 And so the question really is, what's the format of this?

03:37 So, and I kind of never really thought about it before.

03:41 And basically the problem is it's not defined.

03:45 And it's in...

03:48 There exists a text file that has secrets.

03:49 Yeah.

03:50 So with that.

03:50 Yeah.

03:51 And it says it's kind of like Bash-ish files or something.

03:56 It's by the...

03:57 It's a format that's not formally specified and improves over time according to the python-dotenv readme.

04:04 But that's not really...

04:06 What does that mean?

04:07 It kind of means it's your application so that you can define it however you want, right?

04:12 But maybe we should have some standardization.

04:15 So Brett was looking further into this.

04:18 And one of the solutions that Adafruit came up with was let's not use .env, but actually just do a settings.toml.

04:27 And it's used for the same thing, to store secrets such as passwords and API keys.

04:32 So they're using TOML.

04:34 And then basically, when you just do a normal, simple TOML file, it looks pretty much like any other .env file that people have used.

04:43 So really that's the question that Brett is posing is, can we just standardize on this?

04:49 Why don't we just, you know, standardize .env as .toml, as TOML format?

04:54 And I think why not?

04:56 Mostly it'll work for everybody already.

05:00 And then you could do cool things if we did TOML.

05:03 You could extend it a bit.

05:04 So like in the VS Code code base, they're talking about like using categories and specific table to hit.

05:11 You'd have multiple tables in there instead of just the global one.

05:14 But I think that's a cool idea.

05:16 I like the ability to have multiple things like test and maybe dev or like a connection string to a database or something.

05:22 Yeah.

05:23 It wouldn't make me sad if it was JSON as well.

05:26 I know Erin is going to make a cameo for JSON later.

05:31 But, you know, TOML seems to be winning on these things and I would be okay with TOML as well.

05:36 So Erin, you do web development.

05:39 Do you use .env files or this sort of a setting?

05:42 We use settings.

05:45 We, yeah, we don't use .env files.

05:49 We have, we do have local settings, but yeah.

05:52 Cool.

05:53 I'm not really a Django developer.

05:54 So maybe is it built into Django to have some solution for this?

05:58 We, yeah, I'd have to like, I'm not a, I get it running on my machine and then I go and I code.

06:04 Yeah.

06:05 So all the OS stuff.

06:07 Yeah.

06:08 All the OS stuff is not, yeah.

06:10 It's not stuff I worry about unless I'm installing a new requirement or something.

06:14 Well, yeah, Django does have its way of managing settings that predates this stuff, I believe, as well.

06:20 All right.

06:20 Yeah, that makes sense.

06:21 Well, Michael, should we switch to Pydantic?

06:24 I have some, I have some crazy news for you.

06:26 Yeah, let's do it.

06:27 First, huge, huge congrats over to Samuel Colvin.

06:33 And I've had him on the show to talk about Pydantic before.

06:36 Pydantic is one of the more exciting libraries, I think, especially in the API space.

06:41 But also Python Bytes itself is powered by Beanie, the MongoDB ORM or ODM.

06:48 And that uses Pydantic models as its validation and exchange.

06:52 Like the things that are mapped to MongoDB are Pydantic classes.

06:56 So here's the news.

06:57 The Sequoia, like one of the biggest VC firms in California, in the world probably,

07:04 backs open source data-validation framework Pydantic to commercialize with cloud services.

07:09 That's crazy, huh?

07:10 Yeah.

07:11 We are a long way from the buy me a coffee, donate PayPal button that you see on various projects in this.

07:18 And I think it's just a sign of the open source space finding its way to support really successful projects

07:27 and to support people whose time and energy and contributions to the world would be better spent to create,

07:32 further this library than say potentially like, well, how can we get like 1% of 1% increase on ad clicks by using my library or something like that?

07:43 You know, working for like companies that don't necessarily contribute so much.

07:46 So some of the highlights here, you'll notice when I said we're a long ways from buying me a cup of coffee,

07:53 Pydantic Services Incorporated emerges from stealth today with $4.7 million in seed funding.

07:59 Wow.

07:59 Yeah.

08:00 That's big coffee.

08:01 That is a lot of coffee.

08:03 That's like coffee for life.

08:05 Some of that fancy kind, you know, the weird variations and stuff.

08:10 Yeah.

08:10 Anyway, so it's not just Sequoia.

08:12 It's Partech, it's Irregular Expressions, it's Zapier co-founder Bryan Helmig, who's also been on Talk Python before,

08:20 and some other folks, co-founder of Sentry, David Cramer, and so on.

08:24 So let me see.

08:25 I wrote down some of the highlights of this whole article that I wanted to hit on.

08:29 First of all, also this comes from Mark Little, who was a guest on show 285 and also a friend of mine.

08:35 So thanks, Mark, for sending that in.

08:36 The new, the whole, like, so you might be wondering, okay, well, $4.7 million is amazing.

08:42 It's a lot of support.

08:42 It means Pydantic is only going to get better and stronger.

08:45 But what the heck are you going to get for your $4.7 million?

08:49 So the idea is that this new commercial entity, it'll incorporate a bunch of tools and services that are powered by and inspired by the Pydantic library.

09:00 And from what I can tell is its primary goal is to make Pydantic really, really good further, right?

09:06 There's already this big project for 2.0 for rewriting the core in Rust.

09:11 This is the last time I had Samuel on the show on Talk Python to talk about that, which is going to make it a lot faster.

09:16 But something a little bit akin to a platform as a service, something a little bit like a Heroku, where you can push Python code to production in simple ways.

09:27 But using the validation and the data exchange and the understanding that Pydantic has for data as part of this.

09:34 So final thing, then I'll get your thoughts on this, is you're going to start with an initial team of six.

09:39 The first three engineers are based in Montana, Chicago, and Berlin of various places.

09:44 And so, yeah, I wish all the luck to the Pydantic team and to Samuel and folks.

09:50 I think this is great.

09:51 What do you all think?

09:52 I think this is great.

09:55 I like the conversion to Rust.

09:57 That's pretty exciting.

09:58 Yeah.

09:59 Yeah.

10:00 How's this sit with you?

10:01 Does this surprise you?

10:02 No, it's cool.

10:05 It's very cool.

10:07 I mean, I'm just Googling it because I didn't research it ahead of this talk.

10:12 But, yeah, it sounds like it can be used with any Python-based framework.

10:16 Yeah, it came out of Fast.

10:18 Yeah, it came out of FastAPI.

10:20 And it plays many important roles in FastAPI.

10:23 It's the data validation.

10:24 It's also the type hints that does the automatic data conversion.

10:27 But it also drives the Swagger, OpenAPI documentation, and all those things.

10:32 But it's been used way, way more places, for example, like Beanie, which I mentioned, or SQLModel, and plenty of others.

10:39 And it's just starting to gain a ton of momentum as a really solid data exchange for Python that's not directly talking to databases.

10:47 So, yeah, it should be good to see it grow.

10:48 What does that mean, not directly talking to databases?

10:52 Meaning it just reads what comes back from the API and validates that?

10:57 Yeah, it basically will take any JSON.

11:00 Or if you could take a TOML document and you could turn it into a Python dictionary, then you could pass that on and have it validated.

11:06 So you could say things like, this class has a list, which is a list of orders.

11:14 And there can be no more than three orders in the list.

11:17 And they have to be orders.

11:18 And this thing has to be a number.

11:20 And just all that kind of logic gets expressed in the model there.

11:23 Yeah.

11:24 Yeah, it's cool.

11:24 So one, just, I guess, a random thing.

11:27 So it's a team of six, first three engineers based in Montana, Chicago, or Berlin.

11:31 I wonder who's in Montana.

11:34 And I guess if you had to choose one of three places to live, would you choose Montana, Chicago, or Berlin?

11:40 Gosh, I could.

11:41 I think I'd go with Berlin.

11:44 I could make a case for Montana or Berlin.

11:47 They both are awesome in their own separate ways.

11:49 Like, what's your spare time look like, I guess?

11:54 Yeah.

11:54 I mean, I do love the theaters in Chicago.

11:57 The theaters in Chicago are beautiful.

11:59 I do too.

12:00 But I'm thinking of motorcycle riding for days in Montana and the cities and all that stuff in Berlin.

12:05 Erin, where would you live?

12:06 Man, between those, that's really a hard choice.

12:10 I moved to North Carolina for shorter winters, so it seems like Chicago would be out for that reason.

12:15 Because they have even longer.

12:16 Montana might really be out.

12:17 Winters, yeah.

12:18 So I would need to research what had the shortest winter, but also had really good vegan food.

12:23 Like, Chicago has amazing vegan food, but the winters, I just can't.

12:28 I think Berlin's going to be your bet.

12:29 Yeah.

12:30 Yeah.

12:30 All right.

12:31 Awesome.

12:31 Well, over to you.

12:33 What's your first topic?

12:34 Okay, cool.

12:36 Yeah, and I just wanted to go back to the TOML topic because I kind of froze on that one.

12:40 So we are using a YAML file for local settings, not a TOML file.

12:44 I haven't actually seen TOML before.

12:45 I don't really know how different looking it is.

12:48 But yeah, and settings are kind of baked into Django for outside of the local environment stuff.

12:54 Cool.

12:55 But yeah, so my next, so my topic was, my first topic is JSON fields for performance and thinking about JSON fields in terms of what they are, which is kind of like denormalized data.

13:06 I'm really interested in the topic of normalization and denormalization and specifically how JSON fields are basically denormalized and mutable data that's probably living in an otherwise normalized database.

13:21 So I was interested in this topic and I searched to see if I could find it anywhere online.

13:26 And yeah, so what we're showing here is this was a talk given by David Stokes at PHP UK in 2019 called How Denormalizing Your Data with JSON Can Boost Query Performance.

13:40 I always mispronounce.

13:43 Do you guys pronounce it JSON or JSON?

13:44 And I'm sure you've talked about this before.

13:47 I guess, I hadn't really thought, I'd say JSON, like on top.

13:51 Yeah, yeah.

13:52 But I, Brian, where are you laying on this?

13:54 Just like the name, Jason.

13:56 Jason, yeah.

13:57 Oh my gosh.

13:57 It is Jason.

13:58 I disagree.

13:59 It's like, it's the name.

14:00 According to the creator, it is Jason.

14:02 Okay.

14:03 It's Jason.

14:03 Creator of Jason.

14:04 Got it.

14:05 It's Jason.

14:05 But I will mispronounce it a lot.

14:07 And it stands for JavaScript Object Notation.

14:11 But yeah, I think my Philly comes out because I'm always saying JSON.

14:15 So, yeah.

14:17 So David Stokes gave this talk.

14:18 He is a technology evangelist.

14:21 And a lot of the talk was about MySQL as a backend in particular.

14:25 But the parts of the talk that I found really interesting are the history lesson.

14:30 And I kind of have it highlighted here.

14:32 It starts at around minute 2:50, where he talked about how Edgar Codd at IBM developed the idea

14:39 of relational data because hardware was expensive at the time.

14:43 So having relational tables and normalized data was a way to not have duplication of data.

14:50 And normalized data, just a quick definition is like, or example, is like taking an address

14:57 and breaking it down into parts.

15:00 So experts, you know, had been saying for years at this point, like normalizing data is the way to go.

15:07 You want to normalize your data.

15:08 And then during this history talk, he mentioned, and then NoSQL came in and shook things up.

15:14 And after that, SQL added JSON data types or a mutable data type.

15:20 So you don't have to define and normalize your whole database.

15:24 You can kind of have these mutable fields.

15:26 So, okay.

15:28 So anyway, the history lesson, I just found that super interesting as a data person.

15:33 Do you guys find that interesting at all?

15:35 I do.

15:36 Yeah.

15:37 I think that this concept of mutable schema, not mutable data per se, but that the schema

15:45 itself doesn't have to be as controlled and as strictly guarded by a DBA that goes through

15:53 some giant process to figure out what you do, can add a ton of flexibility to the way that

15:59 you evolve your app.

16:01 Right.

16:01 So there doesn't necessarily have to be a DBA.

16:03 It could be like, well, how, how are we going to schedule the downtime so that we can do the

16:09 schema migration as we roll out this new feature?

16:12 Right.

16:13 Like those kinds of things can get challenged, challenging.

16:16 If you roll out the code first, then, and it's some kind of relational thing, you're using

16:22 SQLAlchemy or something like that.

16:23 It's going to crash saying that the code doesn't match the database.

16:26 You roll out the database first, you know, it may no longer match what the code that's

16:30 running is.

16:31 And like, there's always this, well, what do I do?

16:33 And having some of this more mutable schema, in this case, they're talking about MySQL.

16:39 I believe it's basically the same for Postgres, where you can have columns that are JSON.

16:43 And then you can, you just say to the database, the schema is JSON, but your code knows, well,

16:49 it's actually a list of these things with these properties in it.

16:52 And you want to add a new property?

16:53 Great.

16:54 You add a new property.

16:55 As long as your code can deal with it, super.

16:56 So I think it's, it's certainly something people should consider.

16:59 It really adds a lot of flexibility.

17:01 You don't need necessarily a normalization table because you can just put the stuff, you

17:06 know, in a list, for example.

17:07 Yeah.

17:07 And not only flexibility, but also quicker querying.

17:13 So yeah, so I really liked starting around minute 14, which is, this is what I was kind of looking

17:20 for when I was looking for this topic.

17:22 So I really liked that he gave this talk about it.

17:25 He goes over an example of a music store and you have these items in a music store, like

17:31 guitars, and you don't want to have to add a field every time there's a new guitar feature,

17:37 right?

17:37 So you have these, these JSON fields in your database.

17:42 And like you said, they're available in lots of different backends.

17:46 We use Postgres and yeah, we use JSON fields all over the place.

17:50 So, and he has this really cool diagram where he shows, you know, reducing database dives and

17:55 many-to-many joins where you're diving from, you know, one index into another, into

18:01 another to just to get at the data that you can get at the top level if you have it in

18:06 this JSON field.

18:06 Right.

18:07 You don't have to do a multi-way, many to many join when it's just in there directly, right?

18:13 Because you have more flexibility.

18:14 It doesn't have to be tabular.

18:16 Yeah.

18:17 So I found it really cool.

18:18 We use JSON fields in one of our big Django projects quite a bit.

18:22 And yeah, our data is totally, our schemas are normalized.

18:27 But we, we find it really helpful for also for reporting, making reporting really, really

18:33 fast because of that database dive that you don't need to do.

18:37 And also for tracking snapshots of data.

18:41 So something happened on this date and then the relational record changed, but the JSON gives

18:48 you the snapshot of, of what the user did on that date.

18:52 So that's really useful too.

18:53 Because if the snapshot doesn't match the current schema, well then how are you going

18:57 to store it?

18:57 Like that gets to be a problem, but just JSON is JSON.

18:59 That's right.

19:00 Yeah.

19:01 I guess I've taken this kind of to the far extreme in my world.

19:05 So I'm a huge advocate, but doing, I do almost all my work on MongoDB, which means it's, it's

19:10 all JSON all the way down.

19:12 All right.

19:13 So, but I think it's absolutely fabulous way to work.

19:17 I love it.

19:17 The operational side of not doing massive migrations all the time.

19:21 It's really, really good.

19:22 Yeah.

19:22 And I'm actually working on a blog, a blog article about it because I couldn't find what

19:27 I specifically wanted to talk about today.

19:29 So I'm, I'm writing up a blog article.

19:31 It's not published.

19:32 It won't, it'll be published next month.

19:34 But yeah, I'll share it later with you guys.

19:37 Yeah.

19:38 Please do.

19:38 And I think that's, I think that's a great, actually a great thing for people to do is

19:43 just, there's a discussion of something and it does, if you can't find an article that

19:48 expresses what you want to express, then write one.

19:51 It's great.

19:51 Yep.

19:52 Indeed.

19:53 All right, Brian, how about I tell everyone about our sponsor before we move on?

19:56 Oh, that's a great idea.

19:58 Yeah.

19:58 As I said at the beginning, this episode is brought to you by the compiler podcast from

20:04 Red Hat.

20:04 And just like you out there listening, we're big fans of podcasts, Brian and I, and we're

20:11 happy to share one of the most highly respected, one from the most highly respected open source

20:15 companies, Compiler, original podcast from Red Hat.

20:18 It brings together a curious team of red hatters to simplify tech topics, provide insight for

20:23 a new generation of IT professionals.

20:25 And the show covers topics like what are the components of a software stack?

20:29 Are big mistakes that big of a deal?

20:31 And do you have to know how to code to contribute and get started in open source?

20:35 And not all, not always.

20:37 Depends on how you're trying to contribute.

20:38 So Compiler closes the gap between those who are new to technology and those behind the inventions

20:45 and services shaping our world.

20:47 And they bring together stories and perspectives from the industry and simplify its language,

20:51 culture, and movements in a way that's fun, informative, and guilt-free.

20:55 I recently listened to Are We as Productive as We Think?

20:58 And that episode is really fun.

21:00 There's a bunch of good advice in there.

21:03 As a developer, owner of our tech company, and a technologist, these productivity hacks such

21:08 as time boxing, focusing on one task at a time, and incorporating intentional breaks into your

21:14 workday all stood out as super relevant.

21:16 They suggest that by creating an honest self-image of your productivity habits and being intentional

21:21 about how you spend your time, you can reduce the overwhelm of multitasking that you have to do

21:26 and increase your focus and creativity leading to you being more successful, for sure.

21:32 So learn more about Compiler at pythonbytes.fm/compiler.

21:35 The link is in your podcast show notes.

21:37 Thanks to Compiler and Red Hat for keeping this podcast going strong.

21:41 Awesome.

21:42 All right.

21:42 Yeah.

21:43 Thanks.

21:43 Fun show.

21:44 And tell us...

21:46 You gonna take us to school, Brian?

21:47 Yeah.

21:48 So Kevin Markham is a friend of the show, a friend of ours.

21:51 Ran into him a lot during...

21:54 When I was going to conferences more.

21:56 That's hopefully coming up again.

21:59 What are those?

22:00 Conferences.

22:01 You know what?

22:01 People get together in real life.

22:03 But...

22:04 So Kevin took a little bit of a break.

22:06 He used to write a lot.

22:08 And I guess I hadn't noticed.

22:11 But there's a break between August of 2021 and then now in February of 2023.

22:17 So a couple-year break.

22:19 And we all need that.

22:20 That's fine.

22:21 But these articles are great.

22:22 So a couple of new articles that he has.

22:24 I'm gonna pop through a couple of them.

22:26 How to use f-strings with pandas.

22:28 So basically, it's a good discussion of f-strings.

22:33 If you're not comfortable with f-strings already, this is a good intro to why f-strings are great to pop in values.

22:42 I don't know if it's really that pandas-specific.

22:44 But one of the things I really loved, I'm gonna pop to my favorite part of this article.

22:49 So...

22:50 And I forget to do this.

22:52 So I'm glad that he points these out.

22:53 So one of the things is you can...

22:55 It's not just taking a value and putting it in brackets so that you can print it.

22:59 But you can do...

23:00 It's an expression in the brackets.

23:02 So you can call like upper for a name variable so that you can print it in uppercase and not have to do that before you pass it to the f-string.

23:10 Or you could do things like, you know, a little bit of math.

23:14 So if you've got like...

23:15 This is an example.

23:15 Had days completed.

23:16 And he did like, you know, 365 minus that divided by...

23:20 So get a percentage.

23:21 So this is pretty cool to think.

23:24 Remember, if the only place you're gonna use the value is within the string, you could just do it within the expression.

23:31 So this is a good one.

23:32 The part that it really...

23:34 It never really occurred to me to do that I wanted to highlight was he had different columns of data within like a data frame and referencing them with a string index.

23:44 And then using an f-string as the...

23:48 To pick the index within a, you know, a loop.

23:52 And it never occurred to me to use f-strings to generate the index in...

23:58 So for a string index, this is a cool idea.

24:01 Yeah, that is wild.

24:02 I like it.

24:03 Highlight.

24:03 The other article is a fly through of Jupyter keyboard shortcuts.

24:08 And I guess I just have to say I'm a huge fan of the rocket emoji.

24:13 I wonder why.

24:16 Yeah.

24:17 But the...

24:18 I like...

24:20 This is not overwhelming.

24:21 So especially for people that use...

24:24 I mean, if you use it a lot and you don't know keyboard shortcuts, this would be a good intro.

24:28 But people like me that just pop in, use it every once in a while for something.

24:34 These are useful just for those people too.

24:36 It's not an overwhelming list.

24:37 There's some great stuff like just, you know, hitting escape and enter to go back and forth between command mode and edit mode, for instance.

24:44 And then I'm going to try to remember this one.

24:47 A and B for create a cell above or below the current cell.

24:50 So these are just some really great little Jupyter tricks to make yourself more productive and not have to touch the mouse as much.

24:59 So anyway, some good things here.

25:02 I think it's great.

25:02 I wish actually Jupyter had more hotkeys.

25:05 There's really a lot more they could do there.

25:07 But knowing the ones that are there, I think it's pretty excellent.

25:10 Yeah.

25:12 For me, I often try to use Vim shortcuts and it's just not...

25:17 It doesn't work.

25:18 It's just not going to have it.

25:18 Erin, what are your thoughts here?

25:21 The f-string article was really nice.

25:25 Yeah.

25:26 It's hard to find a good f-string article that tells you all these different things you can do.

25:31 So I was just scanning through it and we use f-strings quite a bit.

25:36 And if we have old format Python strings that are in the code that we're updating in a pull request,

25:43 we always ask the developer to please update those old ones.

25:47 To use f-strings as well.

25:49 Oh, that's a good idea.

25:49 They're just so much more readable.

25:50 As you're going through it, go ahead and fix them.

25:52 Yeah.

25:53 Yeah.

25:53 Instead of like fixing them all, just go through and fix the ones that you're touching.
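
As a small illustration of the kind of update being described here, this sketch writes the same message in the older `%` and `str.format` styles and then as an f-string. The variable names and the message text are made up for the example.

```python
# Hypothetical values, only for illustration.
name = "Erin"
count = 3

# Older formatting styles that a reviewer might ask to have updated.
percent_style = "%s has %d open pull requests" % (name, count)
format_style = "{} has {} open pull requests".format(name, count)

# The f-string version puts the variables right where they are used.
f_style = f"{name} has {count} open pull requests"

# All three produce the same text; only the readability differs.
assert percent_style == format_style == f_style
print(f_style)
```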

25:57 Does pyupgrade do that?

25:58 Or I can't remember.

26:00 I can tell you that flynt does.

26:01 flynt.

26:02 Yeah, that's it.

26:03 flynt.

26:03 So I've taken flynt and run it against large projects that I've done.

26:09 And in the early days, it introduced one bug out of 20,000 lines of code.

26:13 But it rewrote like 1,000 string formats of various versions.

26:19 And I found it to be really helpful.

26:21 And that's F-L-Y-N-T.

26:23 Yeah.

26:24 For the podcast listeners.

26:26 Exactly.

26:27 Thank you.

26:28 Yeah.

26:29 So this is really good.

26:30 So if you ask people to do that, you could suggest like, and you could try just running this

26:35 on your code.

26:35 Yeah.

26:36 And just make sure it doesn't break anything.

26:38 But it's been pretty stable since the oddities it hit.

26:41 That's cool.

26:41 We'll check that out.

26:42 Cool.

26:43 Indeed.

26:43 All right.

26:44 Brian, you all done with yours?

26:44 Yeah.

26:45 And I just did look it up.

26:47 I think pyupgrade also does it.

26:49 Oh, no.

26:50 Anthony Lister out there in the audience is just trying to egg us on.

26:54 Single quotes or double quotes with those f-strings.

26:56 See last episode.

26:59 Yeah, exactly.

27:00 That was a whole debate last episode.

27:01 All right.

27:02 My next item is BioGPT.

27:06 And so we've heard about ChatGPT.

27:10 And this is similar stuff, but applied to biology.

27:13 So, Brian, you said you hadn't heard.

27:14 Create me a cat that barks.

27:16 Exactly.

27:16 And now make it mutate into a snake.

27:19 How many generations will this take?

27:23 Three.

27:23 All right.

27:24 So I want to just, as a way to, you know, it's not really easy for me to demo this.

27:28 So like, let me, as a way of motivation, just show you like a ChatGPT thing.

27:32 Since you were just asking about it, Brian.

27:33 Okay.

27:34 Check this out.

27:34 Here's a cool program that talks about how you should never write insanely nested

27:40 code.

27:40 You should instead use, so for people listening, this is like, it says, is this a platypus?

27:46 If self dot is mammal.

27:47 And then if self dot has fur, then if self dot has a beak, and so on and so on.

27:52 It's like nested over.

27:53 So the code starts in the middle, maybe a bit to the right of the screen.

27:57 And it says return true, right?

27:58 Like you shouldn't do that.

27:59 What should you do?

28:00 You should write guarding clauses.

28:01 So check this out, Brian.

28:02 If I go over to ChatGPT and I say, I'm going to give you a program in Python.

28:08 I want you to name it Arrow.

28:11 And it'll say, sure.

28:12 Arrow sounds like a great name.

28:13 And I give it this.

28:14 And it talks about what it does.

28:16 It checks whether it's a platypus.

28:17 And say, rewrite Arrow to be less nested using guarding clauses.

28:23 Certainly.

28:24 Here you go, it says.

28:25 And what did it write?

28:26 Exactly.

28:28 The new pattern that you should have used.
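
For readers who cannot see the screen, here is a minimal sketch of the two shapes being contrasted: a deeply nested check and the same check rewritten with guard clauses. The `Animal` class and its three attributes are assumptions based only on what is read aloud (mammal, fur, beak); the code shown on the stream has more conditions and may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    # Hypothetical attributes, named after the checks mentioned on the show.
    is_mammal: bool
    has_fur: bool
    has_beak: bool


def is_platypus_nested(animal: Animal) -> bool:
    # "Arrow" shape: each condition pushes the code one level to the right.
    if animal.is_mammal:
        if animal.has_fur:
            if animal.has_beak:
                return True
    return False


def is_platypus_guarded(animal: Animal) -> bool:
    # Guard clauses: leave early on each failing condition, stay flat.
    if not animal.is_mammal:
        return False
    if not animal.has_fur:
        return False
    if not animal.has_beak:
        return False
    return True


print(is_platypus_guarded(Animal(is_mammal=True, has_fur=True, has_beak=True)))
```

Both functions return the same answer for every input; the guarded version just keeps the happy path at the left margin.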

28:30 Is that insane?

28:30 What do you think, Brian?

28:31 Erin?

28:32 I wouldn't write the code like this anyway.

28:35 But okay.

28:36 All right.

28:37 Now that.

28:38 So Arrow checks for a platypus.

28:43 Platypus.

28:44 Platypus.

28:45 We'll fix it.

28:46 Whatever.

28:46 Oh, here.

28:47 Hold on.

28:48 Platypus.

28:49 Rewrite it to check for crocodiles.

28:54 Look at this.

28:55 So sure.

28:56 No problem.

28:57 We're going to write, is it a crocodile?

28:58 And look, the tests are, is it a reptile?

29:00 Has scales?

29:01 Does it have jaws?

29:03 Does it have a four chamber heart?

29:04 Wow.

29:05 Is that insane?

29:07 All I did is say, I'm going to give you this code, and just start asking questions.

29:11 So, okay.

29:12 So impressive, right?

29:13 So, back to BioGPT.

29:15 Think of what this can do for doctors and nurses and people trying to understand like written

29:21 text of this.

29:23 So it contains, this BioGPT contains an implementation specifically trained for like medical analysis.

29:32 Kind of like ChatGPT is a general analysis tool.

29:36 This one is like specifically for medicine.

29:39 Okay.

29:39 So pretty cool.

29:41 Apparently it can do PubMedQA tests.

29:44 I have no idea what that is, but if I was a doctor, I'm sure this is like, how good are you at

29:49 answering questions? It does it with 81% accuracy, which might sound like, well, that's 19% not good

29:54 enough, but I bet you doctors don't do it at 100% accuracy either.

29:58 You know, there's a lot of examples where AI is predicting cancer sooner or better or more

30:04 accurately than.

30:05 And I bet it's better than like Dr. Google and looking up your symptoms and thinking you

30:11 have the worst thing.

30:12 Yeah.

30:13 Yeah, exactly.

30:14 Well, that's what I was curious about if it was named like, like, what am I dying of

30:17 today?

30:18 Will, will I die GPT?

30:21 I don't want to know.

30:22 Oh, it seems grumpy.

30:24 I don't know.

30:24 So it comes with different models.

30:26 It has the GP, the BioGPT one, but it also has the large one.

30:30 And my experience with this stuff is the large models are where it's at.

30:33 The, the regular ones are quick, but they're not very accurate.

30:36 You want to go for the large model.

30:37 So there's a bunch of different ones, like one fine-tuned for

30:44 relation extraction tasks on KD-DTI, which is a certain type of data set, or other ones.

30:49 So you can pick which ones it is.

30:51 And then you just start writing Python code.

30:53 So you can either use a PyTorch style of programming, or I think down here, there's a Hugging Face

30:58 variant as well.

30:59 So it seems a little bit cleaner, a little bit nicer.

31:02 So you just, your model is from_pretrained, microsoft slash biogpt.

31:08 And there's even a thing where you can try it out down here.

31:11 There's like a live, yeah.

31:13 Answering questions, for example. You can pull this up and you can ask it questions.

31:18 For example, this one: should chest wall irradiation be included after dah, dah, dah, dah, dah?

31:25 Yes.

31:26 It's just, yes.

31:27 I don't know.

31:28 People can play around with examples.

31:29 Like I said, I'm not a doctor.

31:30 I don't really know a reasonable thing to ask it, but it's, it's a weird world that we live

31:35 in and it has lots of positives and lots of negatives.

31:38 I'm sure that we're going to come to learn about, but BioGPT, if you're working on analyzing

31:42 medical texts, check this out.

31:44 It's from Microsoft.

31:45 I think anything that would reduce the amount of time doctors and medical professionals have

31:50 to spend on the computer is probably good.

31:52 So if this means they need to enter less things in because it's just like figuring stuff out

31:57 for them, then that would be really powerful.

32:00 But if it's just another tool that they have to use on the internet that makes them not get

32:05 to be face to face with their patients, then I'm just kind of skeptical of it.

32:09 Yeah.

32:09 I feel like you could ask it questions like we gave this person, oh, here's their symptoms.

32:14 We gave them this diagnosis.

32:15 Is that consistent with, you know, historical things?

32:18 And it could do a lot of comparisons and analysis.

32:20 Or do you think this person has this disease instead of just yes or no?

32:24 It's like, why do you think that, you know, you could have this conversation with it and

32:28 it may be able to tell you.

32:29 Yeah.

32:29 That's really cool.

32:30 Indeed.

32:31 All right.

32:32 Well, I guess I was joking about it a little bit, but I think there's a lot of power there.

32:37 I mean, like you said, I don't know if we can get doctors actually seeing people more,

32:43 but also, you know, maybe a 911 call could like, if we determine it's not an emergency

32:49 yet, but maybe we could direct the person to the right place faster.

32:52 I mean, there's lots of places where maybe somebody not with a, like the full degree,

32:58 but somebody that's still pretty involved with medicine can, can utilize this to ask the,

33:03 ask better questions and get somebody to somewhere faster.

33:06 Right.

33:07 Or even highlight, you know, what were the key takeaways from this visit with the doctor?

33:11 Yeah.

33:11 Right.

33:12 Yeah.

33:12 So anyway, it's cool.

33:15 Yep.

33:16 All right.

33:16 One more bit of feedback out there.

33:17 Will McGugan.

33:18 Hey, Will.

33:18 This is the kind of thing I'd like to see AI used for, not putting artists and

33:24 copywriters out of business.

33:25 Yeah, I agree.

33:26 Define people's good work, not necessarily replacing it.

33:30 Yeah.

33:30 We'll see where it goes.

33:31 All right.

33:32 Erin, you've got the last one.

33:34 Okay.

33:34 Great.

33:35 So yeah, talking about code mentorship and communicating with new developers, that's my next topic.

33:41 So Sheena O'Connell gave a talk at DjangoCon last year.

33:46 I attended that conference, but I missed this talk and watched it online later.

33:51 And it's about her work at Umuzi, training unemployed young people in underserved communities in Africa.

33:59 So her company had to quickly build an online learning management system when the pandemic hit in 2020.

34:06 And they built that LMS in Django, which is why she was giving a talk at DjangoCon.

34:12 Before then, the learning was all done in person.

34:14 So anyway, you might think like, that's cool and all, but what is like, how can I apply that to me?

34:21 And I think that this talk is really excellent.

34:24 I also think, I don't know if you all have ever listened to the Django Chat podcast.

34:29 They had Sheena on.

34:31 And she talked about her work at Umuzi.

34:35 And she talked about getting learners to review each other and also teaching green developers how to use GitHub and things like that.

34:44 So they don't, quote, bother their teammates too much once they get into their jobs after they're finished at Umuzi.

34:51 And she specifically said, the quote I liked was, what sort of thing does a person need to know in order to not annoy their coworkers in the first three months?

35:00 So I really liked thinking about the learning in that way.

35:04 And yeah, so something we started doing recently where I work is we had been doing code reviews.

35:12 Me and the other codebase lead had been kind of just doing them all ourselves.

35:16 And our project manager, Matt, suggested a new requirement where two non-codebase leads have to review any pull request before any codebase lead looks at it.

35:29 So that's something we just implemented.

35:32 Do either of you have familiarity with pull requests and code reviews in your day-to-day?

35:38 Yes.

35:39 Yeah.

35:41 So I have to say it's been really helpful to us.

35:46 And I liked Sheena talking about that on the Django Chat podcast.

35:51 She also mentioned that at Umuzi, the learners review each other.

35:56 So someone who is further along in her course gets to both learn how to review code and also review someone else's answer.

36:05 Because, you know, with Python, there are like a lot of different correct answers, right?

36:10 So just like reactivating that part of their brain to look back at the previous answers is kind of cool.

36:18 Yeah.

36:19 Where are you going to?

36:19 I also think that it's cool that they're learning more than just loops, variables, functions, you know, but how to coexist as a teammate in a software team.

36:29 Yeah.

36:29 Yeah, that's cool.

36:30 Yeah.

36:31 Good find there.

36:32 Yeah.

36:33 So we're always looking for new ways to like onboard developers.

36:38 And another cool idea that Sheena had was writing half solutions and leaving gaps for others to fill in the blank.

36:45 I thought that was kind of cool because when we onboard a new developer to our code base, it can be really rocky.

36:52 And I kind of thought like, oh, that might be kind of neat.

36:55 Instead of giving them a whole ticket to work on, half finishing a ticket and letting them fill in the blanks is kind of cool.

37:02 And just one more article that I found about this was on the Caktus blog.

37:07 I used to work at Caktus as a Django developer.

37:11 And so I still follow their blog quite often.

37:14 And they had this recent blog post from Dimitri Chukin about their new internal mentorship program there where they have three different paths.

37:25 And one is apprenticeship for folks just starting out as developers.

37:30 One is for fellowship.

37:32 And that's for people who are currently training in one of those coding camps.

37:35 And then the third one, which is really kind of special, is mentorship for high school students.

37:41 So I thought that was kind of neat.

37:44 We're still, where I work, we're still figuring out how to onboard people.

37:48 I feel like that is one of the hardest things.

37:50 Do you both know what I'm talking about?

37:53 Onboarding is extremely difficult.

37:55 And it depends on how much, well, it depends on the skill set you need people to have.

38:00 I mean, when you have like a diverse set of skills, we always face that.

38:04 So I've got, I need somebody that knows both Python well, testing practices well, C++ well.

38:10 And it'd be great if they also knew like RF measurements and stuff like that.

38:15 And you just can't find those people.

38:17 So you have to pick where you want somebody to complement somebody else, and know that you're going to have to help train.

38:24 Right.

38:24 And support them in the other areas.

38:26 Yeah.

38:27 Yeah.

38:28 Cool.

38:28 And one of the things that you mentioned, code reviews: we use code reviews a lot for communication.

38:36 Not necessarily for people to catch what somebody else is doing wrong, but to make sure that everybody understands what the rest of the team is working on.

38:44 So, especially for long-running things, we have a practice of using draft code reviews.

38:50 So, code reviews in draft, and GitLab won't let you merge it if it says draft in the title.

38:56 So then people can just keep updating that and get feedback even when the code's not ready yet.

39:04 So good way to do that.

39:06 Yeah.

39:06 Okay.

39:06 Cool.

39:07 Nice.

39:08 Well, nice find, Erin.

39:10 All right.

39:10 Nice.

39:10 That's all of our items, Brian.

39:12 Got some extras for us to share.

39:14 Anything else you want to throw out there real quick?

39:15 No, I'm spending most of my extra time getting my talk ready for PyCascades.

39:19 So PyCascades is coming up soon.

39:20 Yeah, indeed.

39:21 Coming up very soon.

39:22 Excellent.

39:23 Erin, how about you?

39:24 Want to throw anything out there?

39:25 Yeah.

39:25 DjangoCon US is in Durham, which is 15 minutes from where I live.

39:30 So I'm excited.

39:31 Nice.

39:32 North Carolina is a fun place to visit.

39:34 Yes.

39:35 It's generally warm, although not always warm, but generally warmer than a lot of places.

39:39 It's generally warmer and it's in October.

39:41 So it'll be kind of a nice time of year, probably.

39:44 Hopefully not boiling hot, but yeah, probably not.

39:47 Cool.

39:47 I'll have to try to see if I can get an excuse to get out there.

39:50 That'd be fun.

39:51 All right.

39:51 Excellent.

39:52 Anything else?

39:52 Is that it?

39:53 How about you?

39:54 Yeah, I got one.

39:55 You know, I do.

39:56 All right.

39:57 So, an article came out a few days ago.

39:59 Security researchers uncover 700 malicious open source packages on npm and PyPI.

40:05 This used to be a thing that could even be a headline.

40:08 I think we even headlined it; like, was it the title of one of our shows, Brian?

40:11 The news here is not this.

40:13 The news is that this stuff is just not news anymore.

40:16 So people be careful out there when you pip install stuff, make sure you spell it right.

40:21 That's generally the worst thing, the typosquatting.

40:25 So anyway, I didn't realize that that's how they were.

40:29 Oh, that's so smart.

40:30 Like they might put a virus in request instead of requests with the plural, you know what I mean?

40:36 Or if you transpose two letters. And there's some stuff that the PyPA is trying

40:41 to do to work on that, but it's still tricky.
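
To make the typosquatting risk concrete, here is a rough sketch of a local sanity check you could run on a package name before installing it. It uses the standard library's `difflib` to flag names that look suspiciously close to a well-known package. The `KNOWN_PACKAGES` list, the `possible_typo` helper, and the 0.8 cutoff are all assumptions for illustration; this is not how PyPI or pip detect anything.

```python
import difflib

# Hypothetical short list of names you actually intend to install.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "django", "flask", "pytest"]


def possible_typo(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    A name that is itself in the known list is treated as fine.
    """
    name = name.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["request", "reqeusts", "requests"]:
    lookalike = possible_typo(candidate)
    if lookalike:
        print(f"{candidate!r} looks a lot like {lookalike!r}; double-check the spelling")
    else:
        print(f"{candidate!r} is not flagged")
```

A check like this only catches near misses against names you already list, so it complements careful typing rather than replacing it.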

40:44 Or standard lib stuff that you don't have to install.

40:47 It's just there.

40:48 People.

40:48 Right.

40:49 Right.

40:50 Yeah.

40:51 And they create a package for that.

40:53 All right.

40:53 That's not the end of it.

40:55 Another one.

40:56 Brian, do you remember I announced, hey everybody, update your Git?

40:59 There's a security vulnerability in Git.

41:01 This is the first time this has happened in a really long time.

41:04 Yeah.

41:04 I said, make sure you update Git, that you install 2.39.1 or higher.

41:11 Well, guess what?

41:11 2.39.1 has a vulnerability that's completely different.

41:16 But if you try to clone from a malicious repository, you're going to be having a bad day.

41:21 So update your Git again.

41:23 All right.

41:23 And then also, I'm working on a project now where I needed an ignore file, but the

41:30 project was originally created in one language and I wanted the ignore file for another.

41:35 And I was basically going to combine them.

41:37 So maybe you all know this, maybe you don't, but GitHub, when you go to create a new

41:42 project, you can choose what kind of project is it?

41:44 Is it C++?

41:45 Is it Python?

41:46 Is it Dart?

41:47 Is it Flutter?

41:47 And you'll get a different .gitignore for that.

41:50 Well, there's actually a repo github.com/github/gitignore.

41:54 And every single language that you could have chosen in that dropdown has its ignore file here.

42:00 So for example, the Python one, this, it's checked into this project.

42:04 So when you say create a new Python project, what comes out as the ignore is actually this

42:09 file.

42:09 So if there's people out there who really need a change to the default behavior of the

42:15 Python .gitignore for projects, you know, you could go do a PR for this, but the way I use it is I just

42:19 said, I also need one on Flutter or, there's not a Flutter one, but there's a Dart one.

42:24 So I got, I grabbed the one for Dart and piled that in there as well.
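
Combining two ignore files by hand works fine, but as a sketch of the idea, here is one way to merge the text of several `.gitignore` files while skipping patterns that already appeared. The `merge_gitignores` helper and the sample contents are hypothetical; they are not the real files from the github/gitignore repository. One caveat: order can matter in a `.gitignore` (for example with `!` negation rules), so review the result rather than trusting it blindly.

```python
def merge_gitignores(*contents):
    """Concatenate .gitignore texts, dropping repeated patterns.

    Comments and blank lines are always kept; a pattern is kept only
    the first time it appears.
    """
    seen = set()
    merged = []
    for text in contents:
        for line in text.splitlines():
            rule = line.strip()
            is_pattern = bool(rule) and not rule.startswith("#")
            if is_pattern:
                if rule in seen:
                    continue  # already covered by an earlier file
                seen.add(rule)
            merged.append(line)
    return "\n".join(merged) + "\n"


# Made-up sample contents, only for illustration.
python_ignore = "# Python\n__pycache__/\n*.py[cod]\n.idea/\n"
dart_ignore = "# Dart\n.dart_tool/\nbuild/\n.idea/\n"

print(merge_gitignores(python_ignore, dart_ignore), end="")
```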

42:28 Add it to, or even if you're not using GitHub, you can use this for.

42:32 Yeah, exactly.

42:32 It has nothing to do with GitHub.

42:33 It's just, you have access to every version of an ignore file that GitHub thinks is good.

42:39 Related to that is gitignore.io.

42:42 This is another one.

42:43 You come down here and search for other stuff.

42:45 Like for example, there was no Flutter one in the GitHub repo, but over here I can

42:50 put Flutter and here's my Flutter one for all the crazy build code generation madness you

42:54 get.

42:54 It's a project by Toptal, gitignore.io, and you just put it here.

42:58 I'm looking for whatever.

43:01 And then, you know, see.

43:03 Type pytest, see if it'll do it.

43:05 No results found.

43:07 Oh, oh, sad.

43:09 Yeah.

43:09 Sad face.

43:10 Okay.

43:10 But anyway, if you're looking for ignores, for projects, there you go.

43:14 Those are kind of nice.

43:15 Cool.

43:15 Nice.

43:16 All right.

43:17 Are you all ready for a joke?

43:18 Yeah.

43:19 Yes.

43:20 Brian, I thought about you on this one in particular, so we'll see, we'll see what you think

43:23 of it.

43:24 So this is one.

43:25 It has, it's a cartoon and it has a cartoon character looking at two red buttons.

43:29 They're both going to do something massive.

43:31 One has the star asterisk character and one has the ampersand.

43:35 And there's the person there just sweating out like, their fingers in the middle,

43:40 doesn't know which one to pick.

43:41 And it said, my C code isn't working.

43:43 I know it involves pointers.

43:47 What do you think, Brian?

43:48 I would not hire this person.

43:50 So the star will dereference the pointer, turning a pointer into one less level of pointing,

43:57 down to a value, whereas the ampersand will take a variable and make it a pointer.

44:01 Or if it is a pointer, make it a pointer to a pointer or even more so.
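
For Python folks who have never touched C, the two operators can be loosely imitated with the standard library's `ctypes` module. This is only an analogy sketched for illustration: `ctypes.pointer(x)` plays the role of `&x`, and `.contents` plays the role of `*p`. The variable names are made up.

```python
import ctypes

value = ctypes.c_int(42)

# Roughly `&value` in C: take a variable and make a pointer to it.
ptr = ctypes.pointer(value)

# Roughly `&ptr`: a pointer to a pointer, one more level of pointing.
ptr_to_ptr = ctypes.pointer(ptr)

# Roughly `*ptr`: dereference, one less level of pointing.
assert ptr.contents.value == 42

# Roughly `**ptr_to_ptr`: dereference twice to get back to the value.
assert ptr_to_ptr.contents.contents.value == 42

# Roughly `*ptr = 7`: writing through the pointer changes the original.
ptr.contents.value = 7
assert value.value == 7
print(value.value)
```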

44:05 Which one do you press?

44:06 Oh my gosh.

44:07 It should be obvious by context.

44:10 It says a C++.

44:11 Erin, do you have to do any of this kind of crazy stuff?

44:16 Are you thankfully above and beyond the pointer world?

44:19 I am.

44:19 Yeah.

44:20 Thankfully not.

44:21 Yeah.

44:22 No, no, no C++ in my world.

44:24 Yeah.

44:25 All right.

44:25 Well, that's what I got.

44:26 I brought that one for you, Brian.

44:28 It's good.

44:28 Thanks.

44:29 I'll incorporate that into my next interview.

44:33 You need to change a string and you're given a variable.

44:39 Which one of these do you push?

44:40 All right.

44:42 All right.

44:43 Cool.

44:43 All right.

44:44 Well, Erin, it's been great to have you on the show.

44:46 Thanks for being here.

44:47 Thanks for having me.

44:48 Nice to meet you both.

44:49 Yeah, you bet.

44:50 And Brian, thanks as always.

44:51 See you.
