Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book

#236: Fuzzy wuzzy wazzy fuzzy was faster

Published Wed, Jun 2, 2021, recorded Wed, Jun 2, 2021

Watch the live stream:

Watch this episode on YouTube
Watch the live stream replay

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click Got a promo code? Redeem and enter PYTHONBYTES

Special guest: Anastasiia Tymoshchuk

Brian #1: Using accessible colors, monolens & CMasher

  • Tweet by Matthew Feickert, @HEPfeickert
    • “I need to give some serious praise to fellow Scikit-HEP dev Hans Dembinski on his excellent monolens tool for interactive simulation of kinds of color blindness. It works really quite well and the fact that is a pipx install away is awesome!”
  • monolens lets you “view part of your screen in greyscale or simulated colorblindness”
    • So simple. Just pops up a box that you can drag around your monitor and view stuff in greyscale.
  • Reply tweet by Niko, @NikoSercevic
    • “I mean to use cmasher so I know it’s cb friendly”
  • CMasher : “Scientific colormaps for making accessible, informative and cmashing plots”
    • Provides a collection of scientific colormaps and utility functions to be used by different Python packages and projects, mainly in combination with matplotlib.
    • Lots of great colormaps that are color blindness friendly.
    • Just specify the CB friendly colormaps with plots, super easy.
          # Import CMasher to register colormaps
          import cmasher as cmr
      
          # Import packages for plotting
          import matplotlib.pyplot as plt
          import numpy as np
      
          # Access rainforest colormap through CMasher or MPL
          cmap = cmr.rainforest                   # CMasher
          cmap = plt.get_cmap('cmr.rainforest')   # MPL
      
          # Generate some data to plot
          x = np.random.rand(100)
          y = np.random.rand(100)
          z = x**2+y**2
      
          # Make scatter plot of data with colormap
          plt.scatter(x, y, c=z, cmap=cmap, s=300)
          plt.show()
      

Michael #2: rapidfuzz: Rapid fuzzy string matching in Python and C++

  • via Mikael Honkala
  • Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance
  • “you mention fuzzywuzzy for fuzzy text matching in the last episode, and wanted to mention the rapidfuzz package as a high-performance alternative.”
  • “non-rigorous performance testing of several alternatives (including fuzzywuzzy), and rapidfuzz came out on top with a sizable margin.”
  • Simple Ratio example:
        > from rapidfuzz import fuzz
        > fuzz.ratio("this is a test", "this is a test!")
        96.55171966552734
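To make the ratio above less of a black box, here is a pure-Python sketch of an insert/delete-based similarity score. It assumes the simple ratio is 100 × (1 − distance / combined length), where the distance counts only insertions and deletions. The function names `indel_distance` and `simple_ratio` are hypothetical, and this is not rapidfuzz's implementation, which is written in C++ and far faster.

```python
def indel_distance(a: str, b: str) -> int:
    """Number of insertions plus deletions needed to turn a into b."""
    # Longest common subsequence (LCS) via dynamic programming, row by row.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[-1]
    # Every character outside the LCS must be deleted or inserted.
    return len(a) + len(b) - 2 * lcs


def simple_ratio(a: str, b: str) -> float:
    """Similarity from 0 to 100; 100 means the strings are identical."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * (1 - indel_distance(a, b) / total)


# One inserted "!" across 29 total characters gives roughly 96.55.
print(simple_ratio("this is a test", "this is a test!"))
```

The dynamic-programming table is the part that gets expensive on long strings or large candidate lists, which is where an optimized library pays off.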
    

Anastasiia #3: Structlog to improve your logs

  • One of the best ways to improve logs is to add more structure to them
  • Why do we even need to care about logs?
    • logs can provide visibility into what is actually happening in production
    • logs can help with tracing a bug, especially if they are machine-readable and easily parseable
    • logs can give you a clue why a bug or an exception occurred
  • It’s super easy to start with Structlog, and it is also easy to integrate with the ELK stack for further processing
  • Features that you will get if you switch your logs to structlog:
    • readable structure of logs in key-value pairs
    • easy to parse with any post processor to visualise logs and to have more visibility for your code
    • you can create custom log levels and separate specific logs with event keys for each log
  • I have been working with structured logs for a couple of years and recommend that everyone try them
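The key-value idea can be sketched with only the standard library. The helper `log_event`, the logger name, and the field names below are all hypothetical, and this is not structlog's API. structlog provides this kind of output, plus processors and bound context, out of the box.

```python
import io
import json
import logging


def log_event(logger: logging.Logger, event: str, **fields) -> str:
    """Emit one machine-readable line: the event name plus key-value context."""
    line = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(line)
    return line


# Send log output to an in-memory stream so the example is self-contained.
stream = io.StringIO()
logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

log_event(logger, "order_created", order_id=42, user="anna")

# Because each line is JSON, a post-processor can parse it back into fields.
parsed = json.loads(stream.getvalue())
print(parsed["event"], parsed["order_id"])
```

Parsing the line back into a dictionary is the step that a log pipeline such as the ELK stack would perform at scale.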

Brian #4: xfail now works with pytest-subtests

Michael #5: BaseSettings in Pydantic

  • via Denis Roy
  • Create a model that inherits from BaseSettings
  • The model initialiser will attempt to determine the values of any fields not passed as keyword arguments by reading from the environment.
  • This makes it easy to:
    • Create a clearly-defined, type-hinted application configuration class
    • Automatically read modifications to the configuration from environment variables
    • Manually override specific settings in the initialiser where desired (e.g. in unit tests)
  • Get values from OS ENV or .env files
  • Also has support for secrets files

Anastasiia #6: Take care of the documentation and your team will thank you later

  • Sphinx and ReadTheDocs will make developers' lives so much easier
  • Everyone knows the importance of documentation, but how do you keep it up to date?
  • In my experience, I have tried using Confluence, describing new features in detailed Jira tickets, and writing hints in Google Docs and sharing them with the team. It does not work: the documentation quickly becomes outdated and piles up
  • Benefits of implementing continuous documentation for the code:
    • easy to support by writing docstrings, updating them when needed
    • easy to find needed information in a centralised documentation
    • easy to keep versioning for each new release of the code
    • ReadTheDocs is free for open source code
    • Sphinx will generate code reference documentation for the code

Extras

Michael

Brian

  • pytest uses. Please comment on this thread if you know of some great projects that use pytest, if they converted from something else, or just find it interesting that they use pytest.

Joke

First time recursion

