Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book

#288: Performance benchmarks for Python 3.11 are amazing

Published Tue, Jun 14, 2022, recorded Tue, Jun 14, 2022
Watch this episode on YouTube
Watch the live stream replay

About the show

Sponsored by us! Support our work through:

Brian #1: Polars: Lightning-fast DataFrame library for Rust and Python

  • Suggested by several listeners
  • “Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
    • Lazy | eager execution
    • Multi-threaded
    • SIMD (Single Instruction/Multiple Data)
    • Query optimization
    • Powerful expression API
    • Rust | Python | ...”
  • The Python API syntax is set up to allow parallel execution while sidestepping GIL issues, for both lazy and eager use cases. From the docs: Do not kill parallelization
  • The syntax is very functional and pipeline-esque:

    import polars as pl

    # Build a lazy query; nothing is read until collect() is called.
    q = (
        pl.scan_csv("iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .groupby("species")
        .agg(pl.all().sum())
    )
    df = q.collect()
    
  • Polars User Guide is excellent and looks like it’s entirely written with Python examples.

  • Includes a 30 min intro video from PyData Global 2021

Michael #2: PSF Survey is out

  • Have a look; their page summarizes it better than my bullet points will.

Brian #3: Gin Config: a lightweight configuration framework for Python

  • Found through Vincent D. Warmerdam’s excellent intro videos on gin on calmcode.io
  • Quickly make parts of your code configurable through a configuration file with the @gin.configurable decorator.
  • It’s an interesting take on config files. (Example from Vincent)

        # simulate.py
        import gin

        @gin.configurable
        def simulate(n_samples):
            ...

        # config.py
        simulate.n_samples = 100
    
  • You can specify:

    • required settings: def simulate(n_samples=gin.REQUIRED)
    • blacklisted settings: @gin.configurable(blacklist=["n_samples"]) (newer Gin releases name this parameter denylist)
    • external configurations (specify values to functions your code is calling)
    • references to other functions: dnn.activation_fn = @tf.nn.tanh
  • Documentation suggests that it is especially useful for machine learning.
  • From motivation section:
    • “Modern ML experiments require configuring a dizzying array of hyperparameters, ranging from small details like learning rates or thresholds all the way to parameters affecting the model architecture.
    • Many choices for representing such configuration (proto buffers, tf.HParams, ParameterContainer, ConfigDict) require that model and experiment parameters are duplicated: at least once in the code where they are defined and used, and again when declaring the set of configurable hyperparameters.
    • Gin provides a lightweight dependency injection driven approach to configuring experiments in a reliable and transparent fashion. It allows functions or classes to be annotated as @gin.configurable, which enables setting their parameters via a simple config file using a clear and powerful syntax. This approach reduces configuration maintenance, while making experiment configuration transparent and easily repeatable.”
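To make the dependency-injection idea above concrete, here is a toy sketch that uses only the standard library. It is not Gin's implementation or API: `configurable`, `bind`, and `_BINDINGS` are hypothetical names invented for illustration, and real Gin reads its bindings from a config file rather than from `bind` calls. In this sketch, arguments passed explicitly by the caller take precedence over configured values.

```python
import functools
import inspect

# Hypothetical registry mapping "function_name.parameter" to a configured value.
_BINDINGS = {}


def bind(key, value):
    """Record a configured value, e.g. bind("simulate.n_samples", 100)."""
    _BINDINGS[key] = value


def configurable(fn):
    """Toy decorator: fill in parameters the caller did not supply from _BINDINGS."""
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Work out which parameters the caller already provided.
        supplied = signature.bind_partial(*args, **kwargs).arguments
        for name in signature.parameters:
            key = f"{fn.__name__}.{name}"
            if name not in supplied and key in _BINDINGS:
                kwargs[name] = _BINDINGS[key]
        return fn(*args, **kwargs)

    return wrapper


@configurable
def simulate(n_samples=10):
    # Stand-in body: return one placeholder value per sample.
    return list(range(n_samples))


# Plays the role of the line "simulate.n_samples = 3" in a config file.
bind("simulate.n_samples", 3)
```

Calling `simulate()` picks up the configured value, while `simulate(2)` keeps the explicit argument. The point of the pattern is that the function signature stays the single place where the parameter is declared.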

Michael #4: Performance benchmarks for Python 3.11 are amazing

  • via Eduardo Orochena
  • Performance may be the biggest feature of all
  • Python 3.11 has
    • task groups in asyncio
    • fine-grained error locations in tracebacks
    • the Self type, for annotating methods that return an instance of their class
  • The "Faster CPython Project" aims to speed up the reference implementation.
    • See my interview with Guido and Mark: talkpython.fm/339
    • Python 3.11 is 10-60% faster than Python 3.10 according to the official figures
    • That works out to a 1.22x speed-up on their standard benchmark suite.
  • Not arriving as stable until October
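Two of the language features listed above, asyncio task groups and the Self type, can be sketched as follows. This is a minimal illustration, not taken from the episode: `double`, `run_all`, and `Query` are hypothetical names, and the fallbacks for interpreters older than 3.11 are only there so the sketch still runs on them.

```python
import asyncio

try:
    from typing import Self  # available in Python 3.11+
except ImportError:
    # Assumption for older interpreters: degrade to Any so the sketch still runs.
    from typing import Any as Self


async def double(n):
    """Hypothetical coroutine standing in for real async work."""
    await asyncio.sleep(0)
    return n * 2


async def run_all(values):
    """Run double() concurrently for every value and return results in order."""
    if hasattr(asyncio, "TaskGroup"):
        # Python 3.11+: the group waits for all of its tasks when the block exits.
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(double(v)) for v in values]
        return [t.result() for t in tasks]
    # Fallback for older interpreters.
    return list(await asyncio.gather(*(double(v) for v in values)))


class Query:
    """Hypothetical builder used only to illustrate the Self annotation."""

    def __init__(self):
        self.filters = []

    def where(self, clause: str) -> Self:
        # Returning self lets calls be chained; Self keeps subclasses typed correctly.
        self.filters.append(clause)
        return self


if __name__ == "__main__":
    print(asyncio.run(run_all([1, 2, 3])))
    print(Query().where("a > 1").where("b < 2").filters)
```

With a task group, the tasks are awaited together when the `async with` block exits, so the results can be read directly afterwards.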

Extras

Michael:

Joke: Why wouldn't you choose a parrot for your next application?

