#149: Python's small object allocator and other memory features
Published Wed, Sep 25, 2019,
recorded Wed, Sep 18, 2019
Sponsored by Datadog: pythonbytes.fm/datadog
Brian #1: Dropbox: Our journey to type checking 4 million lines of Python
- Continuing saga, but this is a cool write up.
- Benefits
- “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
- Can this function return None?
- What is this items argument supposed to be?
- What is the type of the id attribute: is it int, str, or perhaps some custom type?
- Does this argument need to be a list, or can I give a tuple or a set?”
- Type checker will find many subtle bugs.
- Refactoring is easier.
- Running type checking is faster than running large suites of unit tests, so feedback can be faster.
- Typing helps IDEs with better completion, static error checking, and more.
- “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
- Long story, but really cool learnings of how and why to tackle adding type hints to a large project with many developers.
- Conclusion. mypy is great now, because DropBox needed it to be.
Michael #2: Setting Up a Flask Application in Visual Studio Code
- Video, but also as a post
- Follow on to the same in PyCharm:
- Steps outside VS Code
- Clone repo
- Create a virtual env (via venv)
- Install requirements (via requirements.txt)
- Setup flask app ENV variable
- flask deploy ← custom command for DB
- VS Code
- Open the folder where the repo and venv live
- Open any Python file to trigger the Python subsystem
- Ensure the correct VENV is selected (bottom left)
- Open the debugger tab, add config, pick Flask, choose your app.py file
- Debug menu, start without debugging (or with)
- Adding tests via VS Code
- Open command pallet (CMD SHIFT P), Python: Discover Tests, select framework, select directory of tests, file pattern, new tests bottle on the right bar
Brian #3: Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know
- How data scientists can go about choosing between the multiprocessing and threading and which factors should be kept in mind while doing so.
- Does not consider async, but still some great info.
- Overview of both concepts in general and some of the pitfalls of parallel computing.
- The specifics in Python, with the GIL
- Use threads for waiting on IO or waiting on users.
- Use multiprocessing for CPU intensive work.
- The surprising bit for me was the benchmarks
- Using something speeds up the code. That’s obvious.
- The difference between the two isn’t as great as I would have expected.
- A discussion of merits and benefits of both.
- And from the perspective of data science.
- A few more examples, with code, included.
Michael #4: ORM - async ORM
- And https://github.com/encode/databases
- The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite.
- SQLAlchemy core for query building.
- databases for cross-database async support.
- typesystem for data validation.
- Because ORM is built on SQLAlchemy core, you can use Alembic to provide database migrations.
- Need to be pretty async savy
Brian #5: Getting Started with APIs
- dataquest.io post
- Conceptual introduction of web APIs
- Discussion of GET status codes, including a nice list with descriptions.
- examples:
301
: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.400
: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
- examples:
- endpoints
- endpoints that take query parameters
- JSON data
- Examples in Python for using:
requests
to query endpoints.json
to load and dump JSON data.
Michael #6: Memory management in Python
- This article describes memory management in Python 3.6.
- Everything in Python is an object. Some objects can hold other objects, such as lists, tuples, dicts, classes, etc.
- such an approach requires a lot of small memory allocations
- To speed-up memory operations and reduce fragmentation Python uses a special manager on top of the general-purpose allocator, called PyMalloc.
- Layered managers
- RAM
- OS VMM
- C-malloc
- PyMem
- Python Object allocator
- Object memory
- Three levels of organization
- To reduce overhead for small objects (less than 512 bytes) Python sub-allocates big blocks of memory.
- Larger objects are routed to standard C allocator.
- three levels of abstraction —
arena
,pool
, andblock
. - Block is a chunk of memory of a certain size. Each block can keep only one Python object of a fixed size. The size of the block can vary from 8 to 512 bytes and must be a multiple of eight
- A collection of blocks of the same size is called a pool. Normally, the size of the pool is equal to the size of a memory page, i.e., 4Kb.
- The arena is a chunk of 256kB memory allocated on the heap, which provides memory for 64 pools.
- Python's small object manager rarely returns memory back to the Operating System.
- An arena gets fully released If and only if all the pools in it are empty.
Extras
Brian:
- Tuesday, Oct 6, Python PDX West,
- Thursday, Sept 26, I’ll be speaking at PDX Python, downtown.
- Both events, mostly, I’ll be working on new programming jokes unless I come up with something better. :)
Michael:
Jokes: A few I liked from the dad joke list.
- What do you call a 3.14 foot long snake? A π-thon
- What if it’s 3.14 inches, instead of feet? A μ-π-thon
- Why doesn't Hollywood make more Big Data movies? NoSQL.
- Why didn't the
div
get invited to the dinner party? Because it had no class.