Episode #288: Performance benchmarks for Python 3.11 are amazing
Watch the live stream:
About the show
Sponsored by us! Support our work through:
- Suggested by several listeners
- “Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
- Lazy | eager execution
- SIMD (Single Instruction/Multiple Data)
- Query optimization
- Powerful expression API
- Rust | Python | ...”
- Python API syntax is set up to allow parallel execution while sidestepping GIL issues, for both lazy and eager use cases. From the docs: "Do not kill parallelization"
The syntax is very functional and pipeline-esque:
    import polars as pl

    q = (
        pl.scan_csv("iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .groupby("species")
        .agg(pl.all().sum())
    )

    df = q.collect()
Polars User Guide is excellent and looks like it’s entirely written with Python examples.
- Includes a 30 min intro video from PyData Global 2021
Michael #2: PSF Survey is out
- Have a look, their page summarizes it better than my bullet points will.
- Found through Vincent D. Warmerdam’s excellent intro videos on gin on calmcode.io
- Quickly make parts of your code configurable through a configuration file with the
It’s an interesting take on config files. (Example from Vincent)
    # simulate.py
    @gin.configurable
    def simulate(n_samples):
        ...

    # config.py
    simulate.n_samples = 100
You can specify:
- required settings:
- blacklisted settings:
- external configurations (specify values to functions your code is calling)
- references to other functions:
dnn.activation_fn = @tf.nn.tanh
- Documentation suggests that it is especially useful for machine learning.
- From motivation section:
- “Modern ML experiments require configuring a dizzying array of hyperparameters, ranging from small details like learning rates or thresholds all the way to parameters affecting the model architecture.
- Many choices for representing such configuration (proto buffers, tf.HParams, ParameterContainer, ConfigDict) require that model and experiment parameters are duplicated: at least once in the code where they are defined and used, and again when declaring the set of configurable hyperparameters.
- Gin provides a lightweight dependency injection driven approach to configuring experiments in a reliable and transparent fashion. It allows functions or classes to be annotated as @gin.configurable, which enables setting their parameters via a simple config file using a clear and powerful syntax. This approach reduces configuration maintenance, while making experiment configuration transparent and easily repeatable.”
- via Eduardo Orochena
- Performance may be the biggest feature of all
- Python 3.11 has
- task groups in asyncio
- fine-grained error locations in tracebacks
- the Self type, for annotating methods that return an instance of their class
- The "Faster CPython Project" to speed up the reference implementation.
- See my interview with Guido and Mark: talkpython.fm/339
- Python 3.11 is 10-60% faster than Python 3.10 according to the official figures
- And a 1.22x speed-up with their standard benchmark suite.
- Not arriving as a stable release until October
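For a feel of the asyncio task groups mentioned above, here is a minimal sketch using `asyncio.TaskGroup`, which is new in 3.11. The `double` and `main` coroutines are made-up names for illustration, and the `asyncio.gather` fallback is an addition so the snippet still runs on older interpreters.

```python
import asyncio
import sys


async def double(n):
    await asyncio.sleep(0.01)  # stand-in for real I/O work
    return n * 2


async def main():
    if sys.version_info >= (3, 11):
        # Leaving the block waits for every task; if one fails,
        # the remaining tasks in the group are cancelled.
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(double(n)) for n in range(3)]
        return [task.result() for task in tasks]
    # Fallback for interpreters older than 3.11, which lack TaskGroup.
    return list(await asyncio.gather(*(double(n) for n in range(3))))


results = asyncio.run(main())
print(results)  # → [0, 2, 4]
```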