Episode #236: Fuzzy wuzzy wazzy fuzzy was faster
Watch the live stream:
About the show
Sponsored by Sentry:
- Sign up at pythonbytes.fm/sentry
- And please, when signing up, click Got a promo code? Redeem and enter PYTHONBYTES
Special guest: Anastasiia Tymoshchuk
- Tweet by Matthew Feickert, @HEPfeickert
- “I need to give some serious praise to fellow Scikit-HEP dev Hans Dembinski on his excellent monolens tool for interactive simulation of kinds of color blindness. It works really quite well and the fact that is a pipx install away is awesome!”
- monolens lets you “view part of your screen in greyscale or simulated colorblindness”
- So simple. Just pops up a box that you can drag around your monitor and view stuff in greyscale.
- Reply tweet by Niko, @NikoSercevic
- “I mean to use cmasher so I know it’s cb friendly”
- CMasher: “Scientific colormaps for making accessible, informative and cmashing plots”
- Provides a collection of scientific colormaps and utility functions to be used by different Python packages and projects, mainly in combination with matplotlib.
- Lots of great colormaps that are color blindness friendly.
- Just specify the CB friendly colormaps with plots, super easy.
    # Import CMasher to register colormaps
    import cmasher as cmr

    # Import packages for plotting
    import matplotlib.pyplot as plt
    import numpy as np

    # Access rainforest colormap through CMasher or MPL
    cmap = cmr.rainforest                    # CMasher
    cmap = plt.get_cmap('cmr.rainforest')    # MPL

    # Generate some data to plot
    x = np.random.rand(100)
    y = np.random.rand(100)
    z = x**2 + y**2

    # Make scatter plot of data with colormap
    plt.scatter(x, y, c=z, cmap=cmap, s=300)
    plt.show()
- via Mikael Honkala
- Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance
- “you mention fuzzywuzzy for fuzzy text matching in the last episode, and wanted to mention the rapidfuzz package as a high-performance alternative.”
- “non-rigorous performance testing of several alternatives (including fuzzywuzzy), and rapidfuzz came out on top with a sizable margin.”
- Simple Ratio example:
    > from rapidfuzz import fuzz
    > fuzz.ratio("this is a test", "this is a test!")
    96.55171966552734
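To make the scoring idea concrete, here is a minimal pure-Python sketch of an edit-distance ratio. It is not rapidfuzz's implementation, which is far faster, and the names `edit_distance` and `similarity_ratio` are made up for illustration. It assumes the ratio is 100 × (1 − distance / combined length), with a substitution counted as one delete plus one insert. Under that assumption the result lands close to the score quoted above.

```python
def edit_distance(a: str, b: str, substitution_cost: int = 1) -> int:
    """Weighted Levenshtein distance; inserts and deletes always cost 1."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Score from 0 to 100; assumed formula, see the note above."""
    if not a and not b:
        return 100.0
    distance = edit_distance(a, b, substitution_cost=2)
    return 100.0 * (1 - distance / (len(a) + len(b)))


print(edit_distance("kitten", "sitting"))  # classic Levenshtein: 3
print(round(similarity_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

The two-row table keeps memory proportional to the shorter loop, which is fine for short strings. For matching against large lists, a dedicated library is the better choice.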
Anastasiia #3: Structlog to improve your logs
- One of the best ways to improve logs is to add more structure to them
- Why do we even need to care about logs?
- logs give you visibility into production and what is actually happening there
- logs can help with tracing a bug, especially if they are machine-readable and easily parseable
- logs can give you a clue why a bug or an exception occurred
- It’s super easy to get started with structlog, and it’s also easy to integrate with the ELK stack for further processing
- Features that you will get if you switch your logs to structlog:
- readable structure of logs in key-value pairs
- easy to parse with any post processor to visualise logs and to have more visibility for your code
- you can create custom log levels and separate specific logs with event keys for each log
- I have been working with structured logs for a couple of years and recommend that everyone try them
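To show what "key-value pairs that are easy to parse" means in practice, here is a stdlib-only sketch that emits each log record as one JSON object. It is not structlog's API. structlog provides this kind of output with much less ceremony. The class `KeyValueJsonFormatter`, the logger name, and the `fields` key are all hypothetical names chosen for the example.

```python
import io
import json
import logging


class KeyValueJsonFormatter(logging.Formatter):
    """Render each record as a single JSON object of key-value pairs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
        }
        # Anything passed as extra={"fields": {...}} becomes a top-level key.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes structured lines to the given stream."""
    logger = logging.getLogger("structured_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(KeyValueJsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("order_placed", extra={"fields": {"order_id": "A-1", "items": 3}})

line = buffer.getvalue().strip()
print(line)

# A post-processor can read the line back without any regex parsing.
parsed = json.loads(line)
print(parsed["event"], parsed["items"])
```

Because every line is valid JSON with a stable `event` key, a log pipeline can filter and aggregate on fields directly instead of parsing free text.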
Brian #4: xfail now works with pytest-subtests
- Admittedly, there may be few people that care about this, but I’m one of them.
- subtests are a kinda weird feature of unittest that came in with Python 3.4
- They’re really a context manager that you can use within a test function
- pytest started supporting them through a plugin, pytest-subtests, sometime in 2019
- With the plugin, you can use either the unittest style, or a fixture style, without unittest.
- It addresses a similar problem to the one pytest-check solves: allowing multiple failures per test case.
- But, like I said, they have some quirks.
- See Paul Ganssle’s Subtests in Python
- T&C 111: Subtests in Python with unittest and pytest
- One quirk is that xfail didn’t work right. It’s discussed in both links above.
- Anyway, it’s fixed now, thanks to maybe-sybr, as of version 0.5.0
- So you can now trust that xfail will work properly with subtests
Michael #5: BaseSettings in Pydantic
- via Denis Roy
- Create a model that inherits from BaseSettings
- The model initialiser will attempt to determine the values of any fields not passed as keyword arguments by reading from the environment.
- This makes it easy to:
- Create a clearly-defined, type-hinted application configuration class
- Automatically read modifications to the configuration from environment variables
- Manually override specific settings in the initialiser where desired (e.g. in unit tests)
- Get values from OS ENV or .env files
- Also has support for secrets files
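The behavior listed above (typed defaults, values read from the environment, explicit overrides winning) can be modeled in a few lines of standard-library Python. This is only a sketch of the idea, not pydantic's implementation or API. With pydantic you would subclass BaseSettings rather than write a loader. The class `AppSettings`, its fields, the `load` method, and the helper `_coerce` are all hypothetical.

```python
import os
from dataclasses import dataclass, fields


def _coerce(raw: str, target: type):
    """Convert an environment string to the annotated field type."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


@dataclass
class AppSettings:
    """Illustrative settings; names and defaults are made up."""
    database_url: str = "sqlite:///local.db"
    worker_count: int = 1
    debug: bool = False

    @classmethod
    def load(cls, environ=None, **overrides):
        """Build settings from keyword overrides, then env vars, then defaults."""
        environ = os.environ if environ is None else environ
        values = {}
        for field in fields(cls):
            if field.name in overrides:
                values[field.name] = overrides[field.name]  # explicit value wins
                continue
            raw = environ.get(field.name.upper())
            if raw is not None:
                values[field.name] = _coerce(raw, field.type)
            # otherwise the dataclass default applies
        return cls(**values)


fake_env = {"WORKER_COUNT": "4", "DEBUG": "true"}
from_env = AppSettings.load(environ=fake_env)
print(from_env)

# In a unit test, a keyword argument overrides whatever the environment says.
overridden = AppSettings.load(environ=fake_env, worker_count=2)
print(overridden.worker_count)
```

Passing the environment in as a plain dict keeps the sketch easy to test. Omitting the `environ` argument falls back to `os.environ`.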
Anastasiia #6: Take care of the documentation and your team will thank you later
- Sphinx and ReadTheDocs will make developers’ lives so much easier
- Everyone knows the importance of documentation, but how do you keep it up to date?
- In my experience, the usual approaches do not work. I tried using Confluence, describing new features in detailed Jira tickets, and writing hints in Google Docs and sharing them with the team. The documentation quickly becomes outdated and piles up.
- Benefits of implementing continuous documentation for the code:
- pipx is now part of the PyPA (via Matthew Feickert)
- I’ll be “speaking” at Manning’s Developer Productivity conference. It’s free, so feel free to sign up.
- Cloud bills, they do pile up!
- Flake8-FastAPI (via Brian Skinn)
- Want to contribute to Jupyter? Add a localization.
- pytest uses. Please comment on this thread if you know of some great projects that use pytest, if they converted from something else, or just find it interesting that they use pytest.
First time recursion