#149: Python's small object allocator and other memory features

Published Wed, Sep 25, 2019, recorded Wed, Sep 18, 2019

Sponsored by Datadog: pythonbytes.fm/datadog

Brian #1: Dropbox: Our journey to type checking 4 million lines of Python

Continuing saga, but this is a cool write up.
Benefits
- “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
  - Can this function return None?
  - What is this items argument supposed to be?
  - What is the type of the id attribute: is it int, str, or perhaps some custom type?
  - Does this argument need to be a list, or can I give a tuple or a set?”
- Type checker will find many subtle bugs.
- Refactoring is easier.
- Running type checking is faster than running large suites of unit tests, so feedback can be faster.
- Typing helps IDEs with better completion, static error checking, and more.
Long story, but really cool learnings of how and why to tackle adding type hints to a large project with many developers.
Conclusion: mypy is great now because Dropbox needed it to be.
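A minimal sketch of how annotations answer the questions quoted above. The function names and data are hypothetical, made up for illustration; they are not taken from the Dropbox article.

```python
from typing import Dict, Iterable, Optional, Union


def find_user_id(name: str, ids_by_name: Dict[str, int]) -> Optional[int]:
    """Can this function return None? The signature says yes."""
    return ids_by_name.get(name)


def total_length(items: Iterable[str]) -> int:
    """Any iterable of strings is accepted: a list, a tuple, or a set."""
    return sum(len(item) for item in items)


def normalize_id(raw_id: Union[int, str]) -> int:
    """The id may arrive as an int or a str; an int is always returned."""
    return int(raw_id)


if __name__ == "__main__":
    ids = {"brian": 1, "michael": 2}  # hypothetical data
    print(find_user_id("brian", ids))
    print(find_user_id("guido", ids))
    print(total_length(("ab", "cde")))
    print(normalize_id("42"))
```

With signatures like these, a checker such as mypy should be able to flag a caller that uses the result of `find_user_id` as an int without first checking for None.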

Michael #2: Setting Up a Flask Application in Visual Studio Code

Video, but also as a post
Follow on to the same in PyCharm:
- video and post
Steps outside VS Code
- Clone repo
- Create a virtual env (via venv)
- Install requirements (via requirements.txt)
- Set the FLASK_APP environment variable
- flask deploy ← custom command for DB
VS Code
- Open the folder where the repo and venv live
- Open any Python file to trigger the Python subsystem
- Ensure the correct VENV is selected (bottom left)
- Open the debugger tab, add config, pick Flask, choose your app.py file
- Debug menu, start without debugging (or with)
Adding tests via VS Code
- Open the command palette (Cmd+Shift+P), run Python: Discover Tests, then select the framework, the directory of tests, and the file pattern; a new tests bottle icon appears on the right bar

Brian #3: Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know

How data scientists can go about choosing between the multiprocessing and threading and which factors should be kept in mind while doing so.
Does not consider async, but still some great info.
Overview of both concepts in general and some of the pitfalls of parallel computing.
The specifics in Python, with the GIL
Use threads for waiting on IO or waiting on users.
Use multiprocessing for CPU intensive work.
The surprising bit for me was the benchmarks
- Using something speeds up the code. That’s obvious.
- The difference between the two isn’t as great as I would have expected.
A discussion of merits and benefits of both.
And from the perspective of data science.
A few more examples, with code, included.
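A rough sketch of the rule of thumb above, using the standard library's `concurrent.futures`. The workloads and the names `cpu_task`, `io_task`, and `timed_map` are hypothetical stand-ins, not the article's benchmarks, and timings will vary by machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """CPU-bound stand-in: sum of squares below n."""
    return sum(i * i for i in range(n))


def io_task(delay):
    """I/O-bound stand-in: sleeping releases the GIL, like a blocking read."""
    time.sleep(delay)
    return delay


def timed_map(executor_cls, fn, args, workers=4):
    """Run fn over args in the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    executors = [("threads", ThreadPoolExecutor), ("processes", ProcessPoolExecutor)]
    for name, executor_cls in executors:
        _, cpu_seconds = timed_map(executor_cls, cpu_task, [200_000] * 4)
        _, io_seconds = timed_map(executor_cls, io_task, [0.1] * 4)
        print(f"{name}: cpu {cpu_seconds:.2f}s, io {io_seconds:.2f}s")
```

Both executors share the same `map` interface, so switching between threads and processes is a one-name change, which makes it cheap to measure both on your own workload.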

Michael #4: ORM - async ORM

And https://github.com/encode/databases
The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite.
SQLAlchemy core for query building.
databases for cross-database async support.
typesystem for data validation.
Because ORM is built on SQLAlchemy core, you can use Alembic to provide database migrations.
You need to be pretty async savvy.
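To show what "async savvy" means in practice, here is a toy sketch of the awaitable create/filter calling pattern. `InMemoryManager`, `Track`, and `demo` are hypothetical stand-ins written with the standard library only; this is not the orm package's API and no database is involved.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, ClassVar, List


class InMemoryManager:
    """Hypothetical stand-in for a model's `objects` manager (not the orm package)."""

    def __init__(self, model: type) -> None:
        self._model = model
        self._rows: List[Any] = []

    async def create(self, **fields: Any) -> Any:
        # A real async driver would await the database here.
        await asyncio.sleep(0)
        row = self._model(**fields)
        self._rows.append(row)
        return row

    async def filter(self, **criteria: Any) -> List[Any]:
        await asyncio.sleep(0)
        return [
            row for row in self._rows
            if all(getattr(row, key) == value for key, value in criteria.items())
        ]


@dataclass
class Track:
    album: str
    title: str
    objects: ClassVar[InMemoryManager]  # attached in demo()


async def demo() -> List[Track]:
    # Fresh manager per run so repeated calls do not share rows.
    Track.objects = InMemoryManager(Track)
    await Track.objects.create(album="Hypothetical Album", title="First")
    await Track.objects.create(album="Hypothetical Album", title="Second")
    await Track.objects.create(album="Other", title="Third")
    return await Track.objects.filter(album="Hypothetical Album")


if __name__ == "__main__":
    for track in asyncio.run(demo()):
        print(track.title)
```

Every data operation is a coroutine, so callers have to be coroutines too; that is the "going all in on async" trade-off.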

Brian #5: Getting Started with APIs

dataquest.io post
Conceptual introduction of web APIs
Discussion of GET status codes, including a nice list with descriptions.
- examples:
  - 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
  - 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
endpoints
endpoints that take query parameters
JSON data
Examples in Python for using:
- requests to query endpoints.
- json to load and dump JSON data.
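A small standard-library sketch of the pieces listed above: query parameters, status codes, and JSON. The endpoint URL and parameter names are hypothetical, and the post itself uses requests; with that library, `requests.get(endpoint, params=params)` builds the query string for you and `response.json()` parses the body.

```python
import json
from http import HTTPStatus
from urllib.parse import urlencode


def build_url(endpoint: str, params: dict) -> str:
    """Append query parameters to an endpoint URL."""
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


def describe_status(code: int) -> str:
    """Return the status code with its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


def parse_body(body: str) -> dict:
    """Load a JSON response body into Python objects."""
    return json.loads(body)


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    url = build_url("https://api.example.com/songs", {"artist": "Queen", "limit": 5})
    print(url)
    for code in (200, 301, 400):
        print(describe_status(code))
    data = parse_body('{"songs": ["a", "b"]}')
    print(json.dumps(data, indent=2))
```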

Michael #6: Memory management in Python

This article describes memory management in Python 3.6.
Everything in Python is an object. Some objects can hold other objects, such as lists, tuples, dicts, classes, etc.
Such an approach requires a lot of small memory allocations.
To speed up memory operations and reduce fragmentation, Python uses a special manager on top of the general-purpose allocator, called PyMalloc.
Layered managers
- RAM
- OS VMM
- C-malloc
- PyMem
- Python Object allocator
- Object memory
Three levels of organization
- To reduce overhead for small objects (less than 512 bytes) Python sub-allocates big blocks of memory.
- Larger objects are routed to standard C allocator.
- three levels of abstraction — arena, pool, and block.
- A block is a chunk of memory of a certain size. Each block can hold only one Python object of a fixed size. Block sizes range from 8 to 512 bytes and must be a multiple of eight.
- A collection of blocks of the same size is called a pool. Normally, the size of the pool is equal to the size of a memory page, i.e., 4 KB.
- The arena is a 256 KB chunk of memory allocated on the heap, which provides memory for 64 pools.
Python's small object manager rarely returns memory back to the operating system.
An arena gets fully released if and only if all the pools in it are empty.
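The arithmetic behind those numbers can be sketched as below. The constants are the ones quoted from the article for Python 3.6 and may differ in other versions; the function names are hypothetical, and treating a request of exactly 512 bytes as "small" is an assumption based on the 8 to 512 byte block range.

```python
import sys

ALIGNMENT = 8                  # block sizes are multiples of eight
SMALL_REQUEST_THRESHOLD = 512  # assumed inclusive; larger requests go to C malloc
POOL_SIZE = 4 * 1024           # one memory page
ARENA_SIZE = 256 * 1024


def block_size_for(request_bytes: int):
    """Round a request up to its block size, or None if it is not 'small'."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    if request_bytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (request_bytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def pools_per_arena() -> int:
    """How many page-sized pools fit in one arena."""
    return ARENA_SIZE // POOL_SIZE


if __name__ == "__main__":
    print(pools_per_arena())
    for obj in (1.0, "python", (1, 2, 3)):
        size = sys.getsizeof(obj)  # actual sizes depend on the interpreter build
        print(type(obj).__name__, size, block_size_for(size))
```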

Extras

Brian:

Tuesday, Oct 6, Python PDX West
Thursday, Sept 26, I’ll be speaking at PDX Python, downtown.
Both events, mostly, I’ll be working on new programming jokes unless I come up with something better. :)

Michael:

Jokes: A few I liked from the dad joke list.

What do you call a 3.14 foot long snake? A π-thon
What if it’s 3.14 inches, instead of feet? A μ-π-thon
Why doesn't Hollywood make more Big Data movies? NoSQL.
Why didn't the div get invited to the dinner party? Because it had no class.

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 149, recorded September 18th, 2019.

00:10 I'm Michael Kennedy.

00:11 And I am Brian Okken.

00:12 And this episode is brought to you by Datadog.

00:13 I'll tell you more about them later.

00:15 Brian, this first item that you have here, it actually sparked some philosophical sort of challenge to my way of seeing the world here.

00:25 So why don't you run it by and I'll tell you about my problem.

00:28 Maybe you can help me through it.

00:29 I'm curious about this now.

00:30 Yes.

00:31 I'm pretty sure we've covered this before, but Dropbox is kind of behind a lot of the push to do different type checking or type hinting and checking those type hints within Python.

00:44 The mypy project is, I think, spearheaded by Dropbox.

00:48 Yes.

00:48 There's an article that they put out called Our Journey to Type Checking 4 Million Lines of Python.

00:54 Wow.

00:55 4 million lines.

00:56 That's a big code base.

00:57 That's a lot of Python.

00:59 Yeah.

00:59 I wonder how much of it's interconnected.

01:01 You know, like, you've got all these little utilities and nothing actually depends on it directly.

01:06 Maybe they depend on the output.

01:08 On the other hand, there could be like a super complicated sort of monolith thing.

01:14 It's interesting to think about that much code.

01:16 That is a ton.

01:17 They're leading a lot of stuff, but one of the – I like this.

01:19 So why?

01:20 I mean, that's not free.

01:21 You don't have a huge code base and move it to type checking.

01:25 You don't get that for free.

01:26 So there has to be benefits to this cost.

01:29 And that's one of the things.

01:31 So this article does talk about their – kind of their – does go through some of their story of how they did it.

01:37 What I really liked is it covered some of the benefits.

01:39 And this isn't even that surprising.

01:41 It says experience tells us that understanding code becomes the key to maintaining developer productivity, and that grows with a larger code base.

01:51 So without type annotations, basic reasoning such as figuring out what the valid arguments to a function are or the return types – that's a key one for me – becomes kind of a hard problem.

02:04 And just answering those questions quickly, more quickly, what does this function return?

02:10 Does it return None sometimes?

02:11 Can it return None?

02:12 Things like that.

02:14 These become more and more of a drain as you're looking at a larger code base.

02:17 I mean, that's definitely true.

02:19 You spend more time reading code than writing it.

02:21 So thinking about the types as you're writing it and putting those in place, especially for interfaces to functions, those are an easy win.

02:29 I like it.

02:31 They talked about some of the other benefits that the type checker is actually finding subtle bugs that they wouldn't have caught easily without it.

02:39 Refactoring becomes easier.

02:40 And then running the type checking is faster than running the suite of unit tests, so the feedback can be faster.

02:48 And I didn't think about that aspect of it.

02:51 That's pretty interesting.

02:51 To include type checking as part of like a TDD flow.

02:55 I haven't tried that.

02:57 That would be kind of fun.

02:57 And then one of the things I do know is that the IDEs such as Visual Studio Code and PyCharm allow for better completion and static error checking and a whole bunch of goodies that you get from the IDEs if you have type hints in there.

03:14 But anyway, the other part of the story that I think is they talk about is the improvements to mypy to fit their needs.

03:21 And so if you like mypy now, it's probably because Dropbox needed it to be really good.

03:28 So anyway, it's a good article.

03:30 I'm a big fan of type hinting and stuff.

03:33 I think all these things here that you've laid out, I definitely think they're all true.

03:37 I would say absolutely the biggest one for me is making the IDEs and the editors just better.

03:43 When I get the return value of a function that declares its return type and I hit dot on that variable, boom, there's the list of the things that I can do.

03:52 I type one or two characters.

03:54 It autocompletes.

03:54 I just, you know, just flow.

03:56 And yes, it's in the docs.

03:57 What comes back from some of these things?

03:59 Yes, you can go look them up.

04:00 What arguments or what operations you can do on them?

04:03 But if it's one character or two typing and it's just always there, it just massively improves what you're doing and your confidence and the speed.

04:11 And it doesn't take you out of that flow.

04:13 And I really appreciate that aspect of it.

04:15 One of the things I'm embracing more and more is things that can return multiple types because we definitely can do that in Python.

04:21 So things like arguments that can be set to None, that are either None or a Boolean, or that can be a single element or a list of those types of elements.

04:32 Those sorts of things are great because if they're one of the types most of the time, you don't even really think about making sure that it works for the other one.

04:40 For sure.

04:40 So you want to hear my philosophical dilemma?

04:43 Yeah, I do.

04:44 All right.

04:44 So in that article, it says something to the effect of mypy is an open source project and the core team is employed by Dropbox.

04:53 One of the people who is doing major work on this project is Guido Van Rossum.

04:57 Yeah.

04:58 Yeah.

04:58 I think he did something in Python, like created things like that.

05:02 Right.

05:02 He created the language and whatnot.

05:04 And it wasn't until, gosh, I don't know, well into the 2010s or something like that until type hinting became a thing in the language.

05:13 So Python was created.

05:15 Its sort of core essence is a language without type declarations.

05:20 Right.

05:21 So here's my philosophical debate.

05:22 Like, would Guido have gone back and said in 1991, actually a little bit of type hints should have been how Python originally came into the world?

05:33 Or is this something that you have to go through and you're like, oh, it's fine when you have 100 lines of code that don't have any type information.

05:42 But if you have 4 million, all of a sudden you're in a bad place with 4 million and hundreds of people working on it.

05:48 Well, all of a sudden these types now are super valuable because here he is working explicitly on this thing that, you know, he probably decided not to have in his original language.

05:57 And there's my dilemma.

05:58 I think it's the size thing.

06:00 It's helpful for large projects for tiny little things.

06:03 It's not.

06:04 I mean, has it ever bothered you that there are no type declarations in Bash scripts?

06:10 Yeah, not really, I guess.

06:12 I don't do really huge Bash applications.

06:14 Yeah, that's probably some form of anti-pattern right there, isn't it?

06:18 Yeah, I don't know.

06:20 Maybe it's also the tooling, right?

06:21 Like the editors do a lot more with that information now.

06:24 It is an interesting question of why didn't it have it to begin with?

06:28 If someone else was working on this, sure, okay, these are two philosophies and they kind of come together or don't in different ways.

06:33 But it's the same person, right?

06:35 So that was my thought as I was looking through this article.

06:38 Yeah.

06:39 Yeah.

06:39 But cool.

06:39 I'm happy to see them doing it.

06:41 And I like to bring this sort of stuff into my code as well.

06:44 I think it makes it better.

06:45 All right.

06:46 Well, what do you got for us?

06:47 I did mention that we have these editors these days that do so much more than they did in 1991.

06:54 And namely, this would be PyCharm and Visual Studio Code, right?

06:58 Those are the two main ones.

06:59 Obviously, there's others.

07:00 But these are the main ones that are like super rich, right?

07:02 Yeah.

07:03 Our friend Miguel Grinberg decided he was going to put together a cool video about setting up Visual Studio Code to work with a full-fledged Flask application.

07:13 So with PyCharm, I think it's pretty straightforward, right?

07:16 PyCharm kind of is what it is.

07:18 You go in and like, all right, here's the project.

07:20 I see that.

07:21 Here's how I run stuff.

07:23 Here's how – like it's sort of really clear what you do.

07:26 There's a lot of stuff going on there and it's really busy.

07:28 But you can look at it and see what you're supposed to do.

07:30 With Visual Studio Code, I don't feel that way.

07:32 I look at it and I go like, all right, I know that this thing can be configured and adapted to do all this amazing stuff.

07:38 And it gives me no breadcrumbs or hints on how to even like take that first step.

07:44 I'm like, man, I know this thing's cool, probably.

07:46 But I'm just going to edit this file and go on, right?

07:49 But this is a video that also has a blog post version from Miguel.

07:57 And it's actually a follow-up to doing the same thing in PyCharm about a year ago.

08:00 And I think the reason he did it in PyCharm, even though I just told you how easy it was, is he's doing it in PyCharm Community, which is not officially able to support web development.

08:09 It's the free version.

08:10 So he's like, how do you set up a web development project in a thing that's not meant for that or officially configured for that or whatever?

08:17 Anyway, so it goes through and it sort of walks you through all the steps.

08:20 And you know what?

08:21 It's really nice.

08:22 And I think the grand finale, you will appreciate it, Brian.

08:25 So as I think a lot of people do, so here's what we're going to do.

08:28 We're going to go set up.

08:29 We're going to clone the repo.

08:30 We're going to create a virtual environment.

08:31 We're going to install the requirements and sort of configure environment variables, maybe run some custom flask commands like flask deploy, which initializes the database or does database migrations and all that kind of stuff in the terminal before we actually get to the editor.

08:48 And this is how I work as well.

08:49 How about you?

08:50 Do you like start from within PyCharm or do you kind of get to it eventually?

08:53 Oh, no, same thing.

08:55 I'm setting up.

08:56 Well, I've got a few extra little hooks to create an environment and activate an environment because I'm doing that on the command line all the time anyway.

09:04 Like if I'm going to clone a repo and stuff, I'm just going to do that.

09:08 Same.

09:08 And I have all these aliases and stuff that will like do multiple steps at once and make it a little bit nicer and so on.

09:14 All right.

09:15 So all that is in terminal.

09:16 But then he says, all right, here's what we're going to do in VS Code.

09:18 You're going to open the folder, which is a thing you could do in VS Code, and it will automatically find the virtual environment.

09:24 But in order for all that stuff to happen, you have to encourage Visual Studio Code to go into Python mode.

09:31 So just open any Python file.

09:33 And that like activates all the little subsystems that fire up like the environment variable detection and all that kind of stuff, the virtual environment detection and so on.

09:41 And then he says, all right, now what we want to do is how do you run the thing?

09:45 So he talks about how to set up a run configuration in the debugger.

09:50 So you open the debugger tab, add a configuration, and you can actually pick Flask.

09:55 And it knows all about Flask.

09:56 It asks you a couple of questions like, well, what's the app.py called?

09:59 And things like that.

10:01 So then it'll set it all up.

10:02 And then you can run it in the debugger or run it without, and that's pretty nice.

10:06 And then it says, finally, there's another thing about this UI that, like I said, it's kind of like water, right?

10:12 It can be whatever you want, but you don't look at water and go, I bet that could be like a sculpture of a seal if I froze it and carved it down, right?

10:21 So it's our example, but yeah, sure, go on.

10:23 Yeah, right.

10:24 Like, okay, ice sculptures.

10:25 So there's another command you can run in VS Code.

10:28 And this I didn't know about is you can ask it to discover Python tests.

10:31 That's nice.

10:32 Yeah.

10:33 So you can say discover Python tests, and it'll hunt through and find all the tests in your project.

10:37 And it'll even offer what test frameworks you want to run.

10:40 You want to run unittest or pytest or whatever.

10:43 And then once you do that, like a new UI element sort of pops up, and now you can run your tests in a pretty cool runner.

10:49 So it's about a half hour video.

10:51 It's good, I think.

10:52 And there's something really nice about seeing it in action.

10:55 I'm a big fan of learning through, you know, video stuff, as people might imagine, since I put some time and energy into it.

11:01 But it's one thing to read it.

11:02 It's another to, like, see just that sort of process gone through and explain step by step.

11:07 And Miguel does a good job, and I like it.

11:10 At the end, he also talks about a limitation of handling crashing Flask applications with a debugger.

11:18 And he says it's a Flask thing, not a VS Code thing.

11:21 So you have to do it in both PyCharm and VS Code.

11:24 But he shows you the little workaround.

11:26 Yeah, basically you have to stop going through the Flask run option and go to the Flask.py or app.py, run it, and then override some settings in the run there.

11:38 So yeah, it's pretty straightforward.

11:38 But that's definitely a nice touch as well.

11:40 Yeah, and then the other thing I wanted to touch on is when he's showing how to run tests in the video, they're just sort of magically running in the background.

11:47 And you don't see what they're doing.

11:49 And he doesn't cover this, but at the bottom of the screen or at the bottom of your VS Code window, there's some icons that show you the status of the tests.

11:59 And if you click on that, that's where you go look at the output and look at the failures and whatever.

12:03 Yeah, very cool.

12:04 Nice.

12:05 So that's a good one.

12:06 Another thing that I'm a big fan of is parallel programming.

12:09 And you've got a few things on that one for us, huh?

12:12 There's an article called Multiprocessing vs. Threading in Python, What Every Data Scientist Needs to Know.

12:19 It talked about multiprocessing and threading.

12:22 It did not talk about async.

12:24 And I don't know if that's appropriate or not, if async is even something that would be useful for data science or not.

12:32 Sometimes.

12:33 Not computationally, though.

12:34 In any case, I liked it.

12:36 Because a lot of people from data science are coming into programming.

12:39 Like we know, they're coming in not as programmers.

12:41 They're coming in from other fields.

12:43 So there's a lot of background computer science knowledge that they just don't have.

12:48 Or, you know, there might be gaps.

12:50 So that's one of the reasons why I picked this, because I like it.

12:53 I like that it talked about some of the basic concepts of parallelism, parallel computing, how to think about it.

13:00 It has some diagrams.

13:01 And then what the difference between multiprocessing and threading is in general.

13:07 Specifically, threading is within one process.

13:11 You've got a bunch of stuff going on.

13:13 And multiprocessing is you get a bunch of processes.

13:16 But there's tradeoffs.

13:18 And then it also talks about specifically that Python has a GIL, so it's a little different.

13:23 But because of the GIL, so, you know, it talks about that threads wait on.

13:27 You can use either one.

13:29 But in general, the general rule of thumb is CPU intensive work.

13:33 You need multiprocessing.

13:35 If you're IO bound or waiting on users, then threads are fine for that.

13:41 So the surprising bit to me was the charts and some of the graphs that he has.

13:47 Because he sort of does some benchmarks of code, running something on, like, both CPU intensive and IO intensive work.

13:57 And how it speeds up with multiprocessing and multithreading.

14:02 Obviously, throwing more processors at it helps.

14:07 Or more threads.

14:08 But what surprised me is that the difference between the two wasn't really that great.

14:13 I thought it would be more pronounced.

14:15 Basically, if you're not sure which one to use, pick one.

14:19 And it'll speed up your code.

14:20 Interesting, yeah.

14:22 I kind of thought it would be, even with CPU intensive stuff, at least with stuff he was showing,

14:28 that even multithreading helped speed things up.

14:31 So I think this is good.

14:33 And then he goes through a couple of data, specifically data science examples,

14:36 and shows the code and how to throw multiprocessing and multithreading at data science problems.

14:42 That sounds super useful.

14:43 And the comparisons are interesting.

14:45 These benchmarks are always so full of landmines and special cases.

14:51 And I didn't use it that way.

14:52 So I didn't get the right results that you said.

14:54 You know, like, they're just so tricky to get them right.

14:56 But it is cool to have them here.

14:58 I like that a lot.

14:59 One thing I would like to throw out there is, you know, a lot of times you have these sort of,

15:03 I could do it this way or I could do it that way.

15:05 And we'll see what we get.

15:06 And then sometimes it's this, sometimes it's that.

15:09 So now you've got to know two APIs and how you combine them.

15:11 And I'm a big fan of the unsync, U-N-S-Y-N-C library, which takes the async programming model and applies it to multiprocessing to threads and async methods and makes it all nice and clean.

15:26 Just a couple of decorators and they're all the same.

15:28 So do you still have to pick?

15:29 You have to pick at the implementation level.

15:32 So imagine you have three functions.

15:34 One of them is async because it actually uses async and await.

15:38 One is just a regular function you'd like to run on a thread.

15:41 One is a regular function.

15:43 Sorry, one is a function that does computational stuff and one does waiting.

15:46 So you just put a decorator.

15:48 You say @unsync on the regular async one.

15:51 That will run on asyncio.

15:52 On the one that's doing waiting stuff that would work for threads, you just say @unsync and it automatically runs on threads if it's not an async method.

16:00 On the last one, you would say @unsync(cpu_bound=True).

16:05 But then once you consume those, the way you program against it, they're all the same regardless of which style it is.

16:13 So it's like when you define the function, like, oh, this is a CPU bound one.

16:15 Oh, this one is actually async.

16:17 So it just is async and it just adapts.

16:19 It's a pretty cool library.

16:22 It's 126 lines of Python in one file.

16:25 And it does all that to unify all these APIs.

16:27 It's great.

16:27 Wow, that's cool.

16:28 Yeah, so pretty cool.

16:29 Anyway, yeah, this is really nice and certainly something people want to think about.

16:33 It's a little bit tricky.

16:35 We'll see if this is still a discussion in a couple of years, right?

16:38 In Python 3.9, there's talk of maybe using subinterpreters to remove the limitation of the GIL inside of single processes and all sorts of stuff.

16:47 Eric Snow is working on that.

16:48 So if they actually got that working, then you'd probably be better because you can share data better, more richly and faster within a single process.

16:57 And it's about to get even more crazy.

17:01 That's a long discussion.

17:03 Yeah.

17:03 How much more do you have to care about, like, blocking and stuff like that?

17:08 Yeah.

17:08 Yeah, it brings all that stuff back in because you don't have the GIL anymore.

17:12 Actually, with the subinterpreters, they're talking about a mechanism to explicitly share data in a safe way between them.

17:18 So still, it's faster, though.

17:20 Okay.

17:20 Cool.

17:20 Well, speaking of making things faster, you know, if you're looking at your app and you wonder what's going on, it would be nice to see everything that's going on across all the layers, across the database, across the web tier, things like that.

17:34 So you should check out Datadog.

17:36 They're sponsoring this episode.

17:37 It's a modern cloud monitoring and cloud scale monitoring platform that brings together metrics and logs and distributed traces all in one place.

17:44 So it auto instruments things like Django and Flask and Postgres means you get to see everything across all those boundaries.

17:52 And it helps you optimize your Python apps in just a few minutes.

17:55 Start monitoring your environment for free and get a sweet Datadog t-shirt.

17:59 Just visit pythonbytes.fm/Datadog to get started.

18:02 Nice.

18:02 Well, not to be outdone by your async stuff.

18:06 I also chose some async stuff here.

18:09 So remember, we talked about Starlette a little while ago.

18:13 And Starlette comes from this GitHub organization called Encode, E-N-C-O-D-E.

18:18 And that place is full of magic.

18:21 So they have Uvicorn, which is the ASGI server.

18:25 That's pretty awesome, like Gunicorn.

18:27 But for async, based on the uvloop event loop, and so on.

18:33 And there's Starlette.

18:34 There's also a Django REST framework.

18:35 But there's HTTPX, which we talked about last time.

18:38 And the last thing I want to just cover is a few more things in here.

18:42 Because like I said, there's a lot of great stuff.

18:43 There's a project just simply called ORM.

18:47 We've got SQLAlchemy and Django ORM.

18:51 And these guys just said, you know what?

18:52 Well, just the term ORM is just free in Python.

18:55 So let's just do that.

18:56 Which is an async ORM.

18:59 And they also have a thing called databases, which adds async support for talking to all these different databases.

19:07 Postgres and whatnot.

19:09 So this is a really cool project, especially this ORM one.

19:13 Because it's kind of like SQLAlchemy.

19:16 And it's actually based on the SQLAlchemy core for building queries.

19:21 And that gives you a bunch of benefits, right?

19:23 That means if you already have some stuff that works with SQLAlchemy, to some degree it will be similar.

19:27 It means that Alembic, which is the tool to do database migrations on SQLAlchemy, also works with this ORM.

19:36 So you can automatically just apply Alembic to it.

19:38 And that's pretty cool.

19:39 Wow.

19:39 Yeah, it uses this database project that I talked about for cross-database async support.

19:45 And it also has this thing called typesystem for data validation, which is pretty cool.

19:49 I hadn't heard of that either.

19:50 But yeah, it's a really sweet async API for working with databases and ORMs.

19:59 So the way you create the models, it's very similar to SQLAlchemy.

20:01 It's not identical, but it's similar.

20:04 And then from there on, you just work with it kind of like you would do normal ORM stuff, right?

20:10 Like I would say, if I'm working on an album, I might say album.objects.create.

20:16 Or maybe I would do some kind of filter.

20:19 So I'd say track.objects.filter, and I would do something.

20:22 But every one of these operations is async.

20:25 So you just put await in front of it.

20:26 And if you have something you've got to scale, a whole lot of concurrent data traffic, like say a website,

20:33 well, this is a pretty good combo.

20:35 So like in the future, will we just have an await in front of every other word?

20:41 Everything.

20:42 Exactly.

20:42 So I was going to point out that you've got to be pretty async and await savvy to be doing it.

20:47 Like there's a lot of awaiting, isn't there?

20:50 Yeah.

20:52 I think if you want to work with this library, you just have to say, we're just going all in on async.

20:56 And that's the way it goes, right?

20:58 No, it's good.

20:59 If you're already working with async, that's when you would think, hey, I wonder if there's an async ORM that I can use.

21:04 Yeah.

21:05 Yeah, it looks good.

21:06 And I like that it's based on SQLAlchemy Core.

21:08 That means a big chunk of like the database conversation and the, say, the table creation and the migrations,

21:16 all that stuff is already known and proven and working really well.

21:20 It's just this API kind of sits around the side of the traditional SQLAlchemy conversation,

21:27 like directly with the database.

21:29 I do wish that SQLAlchemy would take this approach.

21:32 I interviewed Mike Bayer about it a long time ago.

21:34 And like four years ago, he said, I don't really think it's going to make that big of a difference.

21:39 But I think it actually would make a huge difference.

21:42 You just got to think about, you know, what is your goal, right?

21:45 If your goal is performance, it probably won't make a big difference.

21:48 If your goal is scalability, it can make a tremendous difference, right?

21:52 Are you trying to make an individual user's experience a little bit faster?

21:56 Or are you trying to make the website not take 10 concurrent users, but 10,000, right?

22:02 Like it probably might even make it a tiny bit slower for that one person, but it might make that 10 to 10,000 like no big deal.

22:09 So it depends on what you're after, right?

22:11 Yeah, definitely.

22:12 Speaking of what you're after, what's next for us?

22:14 One of the things you might be after is some data on somebody else's website, like through an API.

22:19 Yes.

22:19 There's more and more people.

22:20 And I think it's great kind of doing the data science stuff of people coming into Python and programming from just trying to get their work done.

22:29 And this is a dataquest.io blog post called Getting Started with APIs.

22:34 And it's not getting started writing APIs.

22:37 It's getting started consuming them with Python.

22:41 If you kind of know what all this stuff is, but you haven't really thought about the basics, that's why I picked up this post is because it's really good with the basics.

22:50 It's really good with the basics, right?

23:20 If you want to specify it.

23:21 So with APIs, you can have parameters to your queries to say, I only want the data for this user.

23:28 Or they gave an example of Spotify music or something.

23:33 You don't want to have all the data for all the songs that Spotify knows about, but maybe just the songs from a particular artist or something.

23:41 So things like that are good.
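As a rough sketch of what query parameters look like on the wire, the standard library can build and parse the query string. The endpoint `https://api.example.com/songs` and the `artist` and `limit` parameters are made up for illustration; they do not belong to any real API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.example.com/songs"  # hypothetical endpoint


def build_url(base: str, **params: str) -> str:
    # Append the filters as a query string: ?key=value&key2=value2
    return f"{base}?{urlencode(params)}"


url = build_url(BASE_URL, artist="Miles Davis", limit="10")
print(url)

# A server would parse the same string back into its filters.
print(parse_qs(urlsplit(url).query))
```

With the requests library, the same filters are usually passed as a dictionary through the `params` argument of `requests.get`, which does this encoding for you.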

23:44 But this is actually the first time I've seen this, and they're probably all over the place.

23:48 But it talked about status codes, especially GET status codes, because that's what we're doing here is retrieving things.

23:55 And had a nice list of all the descriptions and things that you might run into for error codes, including like the 301, which isn't necessarily a problem, but you're getting redirected.

24:07 So maybe you want to know about that.

24:08 And then the 400 is something's not wrong on their end.

24:14 It's wrong on your end.

24:15 It's the server thinks you made a bad request.

24:18 So that might be an endpoint that expects data or parameters, but you didn't send any parameters with it.

24:24 Or you sent an int when it expected a string or whatever.
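The status code families mentioned here can be looked up with the standard library's `http.HTTPStatus`. The `describe` helper is a hypothetical name for this sketch, and the wording of each category is an editorial summary rather than text from the article.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    # HTTPStatus raises ValueError for codes it does not know.
    status = HTTPStatus(code)
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error (problem with the request)"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {status.phrase}: {kind}"


for code in (200, 301, 400, 404, 500):
    print(describe(code))
```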

24:27 And then, you know, it talks about endpoints and endpoints that take query parameters, endpoints being the specific API.

24:35 So we think of a service providing an API, but it's usually not just one API.

24:40 It's usually a whole bunch of related different bits of data that you can query together or query separately for different aspects of it.

24:49 And then, of course, what APIs usually return is JSON data.

24:53 So it has a little bit of an explanation for what JSON looks like.

24:57 And then using the JSON module to convert back and forth between native Python stuff and JSON.
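A short round trip with the `json` module shows the conversion in both directions. The payload below is invented sample data, not output from a real API.

```python
import json

# Invented sample response body, as an API might return it.
payload = '{"artist": "Miles Davis", "albums": 3, "touring": false, "label": null}'

data = json.loads(payload)  # JSON text -> Python dict
print(type(data).__name__, data["artist"], data["albums"])

# JSON false and null become Python False and None.
print(data["touring"], data["label"])

text = json.dumps(data, indent=2, sort_keys=True)  # Python dict -> JSON text
print(text)
```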

25:04 And it also talks about requests and a bunch of examples for how to pull this.

25:08 So if you're getting started trying to pull some data from an API somewhere, this is a good way to get started.

25:15 It's a nice blend of theory and steps, right?

25:18 It doesn't just say, well, you open up requests and you do this.

25:20 It's like, here's what an API is.

25:22 Here's what the HTTP verbs mean.

25:24 Here's what the status codes are.

25:27 Here's how you get to that.

25:28 And, you know, how do you, like, manifest that in Python and stuff?

25:31 Yeah, it's nice.

25:31 Yeah, but it's not at the level of, like, a college course lecture.

25:35 It's just enough to get the concepts right.

25:38 Exactly.

25:38 It's not trying to make you read the RESTful dissertations and things like that.

25:44 Yeah, I don't even know if it mentions REST, even though that's what we're talking about.

25:47 Cool.

25:47 That's probably a good thing.

25:48 That was overdone for a while.

25:50 Now, last thing I want to cover is memory management in Python.

25:53 This is an article entitled Memory Management in Python.

25:56 But what it really is, is it's a narrow slice, but a common slice of memory management in Python.

26:02 So you probably don't think about memory very much in Python, huh, Brian?

26:05 No, I usually forget about it.

26:06 Yeah, just forget about it.

26:07 That's right.

26:09 So you don't use malloc or free or new or any of these things.

26:14 Definitely not delete.

26:15 If you use delete, it means something else, sort of, and things like that, right?
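To illustrate the remark about delete: Python's `del` statement removes a name, not the object behind it. This is a small sketch with made-up variable names; the list stays alive as long as another name still refers to it.

```python
def demo():
    items = ["a", "b", "c"]
    alias = items   # second name bound to the same list object
    del items       # unbinds the name 'items'; the list itself is untouched
    # The object is still reachable through 'alias'.
    return alias, "items" in locals()


survivor, name_still_bound = demo()
print(survivor, name_still_bound)
```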

26:19 Yeah.

26:20 So I think it's actually pretty interesting that the story of understanding how the runtime experience is in CPython is kind of opaque a little bit, right?

26:30 There's not a lot written about memory management, which is why I decided to pick this thing and talk a little bit about what it covers.

26:37 Because I think it doesn't really matter that you know this in some sense, right?

26:43 Like your Python code will still work, but you more closely understand what your code is doing, how that might map over to like CPU architectures and caches and RAM and all that kind of stuff.

26:54 And, you know, just having a high-level understanding of that's good.

26:58 Yeah, so here's a pretty deep, detailed article, not too long, get to it pretty quick, about memory management in Python.

27:04 But it only covers, like I said, a little bit.

27:07 It's really about how does small object allocation and deallocation happen in Python.

27:14 It doesn't talk about the GIL, which is about thread safety and memory allocation.

27:17 It doesn't talk about reference counts.

27:19 It doesn't talk about garbage collection for cycles or much else.

27:24 So it's all about small objects.

27:26 But most things we make in Python are small objects.

27:29 Even when they're big, they're really just a bunch of small things all pointed at each other, right?

27:33 So if I've got like a list of a million items, I don't have each of those items is 10 bytes.

27:39 I don't have 10 million bytes.

27:41 I have this big list with a bunch of things.

27:44 But then each one of those is a pointer out to its actual thing that it is, right?

27:49 Even when you have strings or even numbers, right?

27:53 A lot of languages, numbers are allocated on the stack and treated as value types and stuff.

27:58 But, you know, everything is an object.

28:00 So every little thing that you make has to get allocated and deallocated.
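The idea that a list is a block of pointers to separately allocated objects can be checked roughly with `sys.getsizeof`. This sketch assumes CPython, and `list_slot_bytes` is a hypothetical helper name. Exact sizes vary by version and platform, so no specific numbers are claimed.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # bytes per pointer on this build


def list_slot_bytes(n: int = 1000) -> float:
    # getsizeof reports the list object only; it does not follow the pointers.
    empty = sys.getsizeof([])
    filled = sys.getsizeof([None] * n)
    return (filled - empty) / n


print("pointer size:      ", POINTER_SIZE)
print("bytes per list slot:", list_slot_bytes())

# Even small values are full objects with their own headers.
print("int:", sys.getsizeof(1), "float:", sys.getsizeof(1.5), "str:", sys.getsizeof("a"))
```

The per-slot cost of the list should be about one pointer, while each number or string it points to is a separate, larger object.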

28:03 So understanding how these small objects get allocated, that's pretty interesting.

28:08 So that's what this article talks about.

28:10 So I'll try to like summarize some of the stuff covered there.

28:13 One of the problems you have with memory allocation is that memory can get super fragmented, right?

28:18 If I just allocate a bunch of stuff and then delete it and keep allocating it and just let that grow, you know, just keep adding on the end, wherever the memory is.

28:27 And I want to interact with that.

28:28 That can really mess up like reading from RAM and getting stuff on cache to be high performance and stuff like that, right?

28:34 So what Python does is it actually pre-allocates these little 256K chunks and then it partitions those up and it plucks in the small objects into those spaces and then will potentially take them back out and then reuse those spaces that it had already allocated when it needs to make a new small thing.

28:55 Okay.

28:56 All right.

28:56 So that's supposed to help with memory optimization, the locality stuff, the fragmentation and so on.

29:03 So there's a special memory manager in Python called PyMalloc, general purpose allocator.

29:10 On top of like C malloc, there's a Python allocator, right?

29:14 So there's like this layer, we have RAM, we have the operating system's virtual memory management.

29:19 We have C's malloc.

29:20 We have this PyMem, PyMalloc thing.

29:23 We have the Python object allocator that then figures out where to place these things and we actually have object memory.

29:29 So there's a lot of stuff going on here and they break it into three levels of organization.

29:34 Okay.

29:35 So for small objects, which are things that are individually smaller than 512 bytes, right?

29:41 Not like maybe a list that has a bunch of stuff, but each, each little bit smaller, right?

29:45 So those are the things we're talking about.

29:48 And what happens is it gets broken into these three things called the block, the pool and the arena.

29:54 So a block is a chunk of memory of a certain size and it only holds Python objects of a certain size.

30:03 So maybe there's a block that holds 16 byte Python things.

30:07 Yeah.

30:08 Okay.

30:08 It's weird.

30:09 Yeah.

30:10 So the reason is, is Python can then, it knows how to exactly fill up and then reuse those blocks.

30:17 Oh yeah.

30:17 Right.

30:18 So if it's like, oh, I'm going to get a bunch of numbers, all the numbers are the same size unless they become utterly huge.

30:23 So we can just like allocate them into the spot.

30:26 Some of those numbers go away.

30:27 We got another block.

30:27 We dropped that new number pointer in right there or the number, which we then point at right there and so on.

30:32 So there's these different blocks.

30:34 Each one is a uniform size between eight and 512 bytes.

30:39 And then the blocks are managed by this thing called a pool, which is usually limited to a memory page size.

30:45 So four kilobytes.

30:47 And then the pools are managed as these things called arenas.

30:51 And these are the things that are allocated on the heap.

30:54 I believe they are 256K pieces of memory, which hold 64 pools, which hold some number of blocks and things like that.
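The arithmetic behind those numbers can be written out. The constants are the figures quoted in the episode (256 KiB arenas, 4 KiB pools, sizes from 8 to 512 bytes in steps of 8); newer CPython releases have changed some of them, so treat this as a sketch. `size_class` and `max_blocks_per_pool` are illustrative names, not CPython functions.

```python
ARENA_SIZE = 256 * 1024        # 256 KiB, as quoted in the episode
POOL_SIZE = 4 * 1024           # 4 KiB, one memory page
ALIGNMENT = 8                  # block sizes step by 8 bytes
SMALL_REQUEST_THRESHOLD = 512  # larger requests bypass the small object allocator


def size_class(nbytes: int) -> int:
    # Round a request up to the next multiple of ALIGNMENT.
    if not 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        raise ValueError("not a small-object request")
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def max_blocks_per_pool(block_size: int) -> int:
    # Upper bound only: a real pool also spends some bytes on its own header.
    return POOL_SIZE // block_size


pools_per_arena = ARENA_SIZE // POOL_SIZE
num_size_classes = SMALL_REQUEST_THRESHOLD // ALIGNMENT

print(pools_per_arena, num_size_classes)
print(size_class(1), size_class(17), size_class(512))
print(max_blocks_per_pool(16))
```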

31:04 Right.

31:04 So there's this really intricate way in which memory is trying to be grouped together and then also trying to be reused without reallocating it from the operating system.

31:14 Okay.

31:15 Right.

31:15 So even though Python might new up a bunch of objects, it actually says, well, but we already have this block that holds those size of things.

31:23 And there's some spots in there.

31:24 So let's fill that bad boy up.
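The reuse described here can be modeled with a toy free list. `ToyPool` is a hypothetical class that only tracks byte offsets inside an imaginary 4 KiB pool. It is a simplified illustration of the idea, not CPython's actual implementation.

```python
class ToyPool:
    """Toy model of a pool that hands out fixed-size blocks."""

    def __init__(self, block_size: int, pool_size: int = 4096):
        self.block_size = block_size
        self.capacity = pool_size // block_size
        self.next_unused = 0   # index of the first never-used block
        self.free_list = []    # offsets of blocks that were freed

    def alloc(self) -> int:
        # Prefer a previously freed block before touching fresh space.
        if self.free_list:
            return self.free_list.pop()
        if self.next_unused >= self.capacity:
            raise MemoryError("pool is full")
        offset = self.next_unused * self.block_size
        self.next_unused += 1
        return offset

    def free(self, offset: int) -> None:
        # Remember the slot so the next alloc can reuse it.
        self.free_list.append(offset)


pool = ToyPool(block_size=16)
a, b, c = pool.alloc(), pool.alloc(), pool.alloc()
pool.free(b)
d = pool.alloc()  # reuses the slot that b occupied
print(a, b, c, d)
```

Nothing is requested from the operating system when `d` is allocated. The freed slot is simply handed out again, which is the behavior the hosts describe.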

31:25 Oh, all right.

31:25 Yeah.

31:26 Anyway, so it's pretty interesting how all the stuff is working together, but that's the Python small object allocator.

31:33 Never thought about that before, but kind of interesting.

31:35 Also, I'm trying to visualize like a sports arena with 64 swimming pools in it.

31:41 That's not a bad one.

31:43 And then each pool is filled with exactly the same size people or creatures swimming around, something like that.

31:48 Yeah.

31:49 Yeah, there you go.

31:49 That makes a lot of sense.

31:50 The first part of it totally made sense.

31:52 The last bit, maybe not so much.

31:54 All right.

31:54 Well, anyway, what I like about this article is it seems like it has a lot of stuff from like, here's the actual C code that defines what an arena is.

32:04 Here you can see it's like a doubly linked list and how it all fits together.

32:06 And it's just got some good analysis.

32:08 So have a look if you've wondered about this.

32:11 All right.

32:11 Well, that's it for our main items.

32:12 I know, Brian, you have big news for the entire world if they live near Portland.

32:18 If they live in Portland or really close to Portland.

32:22 Or want to come to Portland.

32:23 September 26th, I'll be speaking downtown at the Portland Python user group.

32:29 And I'm still working on my talk, but I'll be there.

32:32 That'll be fun.

32:33 And then I'll probably polish it more.

32:35 And people have to volunteer for this other talk.

32:38 So on October 6th, it's the inaugural first day of meeting the Python PDX West.

32:46 So we've got a new user group for Python in town.

32:50 I'm hosting it along with you.

32:52 Yeah, it'll be fun.

32:53 I'm really looking forward to it.

32:54 Yeah.

32:54 And you'll be speaking there.

32:55 I will.

32:56 And I'm trying to get other people to volunteer to speak.

32:58 And if they don't, then it'll just be you and me speaking.

33:01 But I think it'll be fun.

33:02 So we've got a bunch of people signed up so far.

33:05 So it's filling up fast.

33:07 People should sign up.

33:08 That's cool.

33:08 Maybe we could do a live Python Bytes sometime there as well at the end of the day or something.

33:12 Who knows?

33:13 That's a great idea.

33:14 Yeah.

33:14 We could have.

33:15 Maybe not Tuesday, October 6th, but maybe someday we can make that happen.

33:18 Maybe someday.

33:18 Yeah.

33:19 Yeah.

33:19 That's great news.

33:19 If you happen to be around, definitely drop in.

33:23 That'd be great.

33:24 It's on meetup.com, right?

33:25 People want to sign up there.

33:26 Yep.

33:26 And a link in the show notes.

33:27 Do you have any intention of recording, live casting, or otherwise spreading this in a farther path?

33:34 It's not a bad idea.

33:35 We don't have anything like that set up right away.

33:38 In the future, maybe we could do that.

33:39 Probably people would be interested in watching these.

33:42 But I also want to make it really accessible to people that are new to presenting as well.

33:47 I'd love to have people come in and do a talk that they're working on.

33:52 It's not quite polished yet.

33:53 I want it to not just be experts talking to everybody else, but I'd like it to be people working out things that they're just interested in.

34:02 So I think it would be good.

34:03 Yeah.

34:03 That sounds like a great philosophy for it.

34:05 How about you?

34:06 Any extras?

34:07 I have a couple. Presenting and speaking: PyCon 2020, which is a little earlier this year.

34:12 I believe it's like in April or something.

34:14 The website's up.

34:15 Yeah.

34:16 So April 15th to 23rd.

34:19 So the call for proposals is now open for PyCon 2020.

34:23 So if you would like to be considered, a talk of yours to be considered there, then now is the time.

34:29 Yeah.

34:29 Go ahead and submit those because you know you're only going to spend like a week writing it up anyway.

34:33 So I may as well get that done.

34:34 That's right.

34:36 Do it like a Band-Aid.

34:37 Stop worrying about it.

34:38 Just get it over with.

34:39 Yeah.

34:39 Pull it right off.

34:40 All right.

34:40 Another thing I just...

34:41 Have you heard of GitBook?

34:42 Yeah, but I haven't really looked into it much.

34:45 I hadn't either.

34:45 I was interviewing the guy, Joe, from Masonite, the Masonite Web Framework.

34:50 And I noticed that Masonite's documentation is written in GitBook.

34:57 And so I looked at it and GitBook is pretty interesting.

34:59 You can use it as kind of like almost a Basecamp project management type thing.

35:05 So stuff, personal notes or things you want to track or stuff like that.

35:09 But you can also use it for documentation and knowledge bases and whatnot.

35:13 So it looked pretty cool.

35:15 And so I thought I'd just, you know, let people know that it's out there.

35:17 It's free for small teams, like with some limitations.

35:20 It costs a little bit of money for non-trivial small teams, like $7 a user.

35:26 But it's also free for open source and nonprofit teams, which is kind of cool.

35:30 So I'm also a big fan of Read the Docs.

35:32 So it's, you know, I'm not saying they shouldn't use that.

35:35 But here's an interesting project that I ran across that I hadn't heard of.

35:38 It looks nice.

35:39 If people, for some reason, are opposed to Read the Docs, I don't know why you would be.

35:43 Or just like this look better is another opportunity.

35:46 So good to have options.

35:48 Good to have options.

35:49 Also good to have laughs.

35:50 Yeah, let's do some jokes.

35:51 All right.

35:52 How about you go first?

35:53 Okay.

35:54 So I pulled these out of a list of dad jokes you had posted somewhere on our Trello, but

36:00 changed it a little bit.

36:01 So what do you call a 3.14 foot long snake?

36:05 I don't know.

36:05 Well, that would be a pi-thon, of course.

36:07 With the Greek symbol thon, yeah?

36:10 Python?

36:10 Yeah.

36:10 So if it's not feet, but 3.14 inches, then what is it?

36:15 It's a micro python.

36:16 It's a micro python, a mu python.

36:18 Yeah.

36:20 I feel like we're back in calculus or physics.

36:21 Yeah.

36:22 So do you want to do some of these?

36:23 Sure.

36:24 So why doesn't Hollywood make more big data movies?

36:28 I don't know.

36:28 Why?

36:28 NoSQL.

36:29 This last one, it's a little bit crass.

36:32 It's, I don't know, it's a little low level, but I'll see what I can do here.

36:35 So why didn't the angle bracket div get invited to the dinner party?

36:40 I don't know.

36:40 Why?

36:41 It had no class.

36:42 Oh, yeah.

36:45 That's a good one.

36:45 All right.

36:45 Well, thanks for throwing those in there.

36:47 These are fun.

36:47 Yeah.

36:48 Thank you once again for talking with me on a nice Wednesday.

36:51 Absolutely.

36:52 See you later.

36:53 Bye.

36:53 Thank you for listening to Python Bytes.

36:55 Follow the show on Twitter via @pythonbytes.

36:57 That's Python Bytes as in B-Y-T-E-S.

37:00 And get the full show notes at pythonbytes.fm.

37:04 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

37:08 We're always on the lookout for sharing something cool.

37:10 On behalf of myself and Brian Okken, this is Michael Kennedy.

37:14 Thank you for listening and sharing this podcast with your friends and colleagues.
