#186: The treebeard will guard your notebook
Published Thu, Jun 18, 2020,
recorded Wed, Jun 10, 2020
Sponsored by us! Support our work through:
Michael #1: sidetable - Create Simple Summary Tables in Pandas
- by Chris Moffitt
- Makes it easy to build a frequency table and simple summary of missing values in a DataFrame.
Example without and with
A useful tool when starting data exploration on a new data set
- At its core, sidetable is a super-charged version of pandas `value_counts` with a little bit of `crosstab` mixed in.
- Once sidetable is imported, you have a new accessor on all your DataFrames, `stb`, that you can use to build summary tables.
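To make concrete what a frequency table and a missing-values summary contain, here is a minimal plain-Python sketch using only the standard library. It is not sidetable's implementation, and the helper names `freq_table` and `missing_table` and the sample records are made up for illustration. With sidetable itself you would reach the equivalent through the accessor, for example `df.stb.freq(['class'])` or `df.stb.missing()`.

```python
from collections import Counter


def freq_table(values):
    """Return rows of (value, count, percent, cumulative_count, cumulative_percent),
    ordered from most to least common."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    running = 0
    for value, count in counts.most_common():
        running += count
        rows.append((value, count, 100 * count / total, running, 100 * running / total))
    return rows


def missing_table(records):
    """Return rows of (key, missing, total, percent_missing) for a list of dicts,
    treating None (or an absent key) as missing."""
    keys = sorted({key for record in records for key in record})
    total = len(records)
    rows = []
    for key in keys:
        missing = sum(1 for record in records if record.get(key) is None)
        rows.append((key, missing, total, 100 * missing / total))
    # Put the columns with the most missing values first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample data, loosely shaped like a passenger list.
passengers = [
    {"class": "Third", "deck": None},
    {"class": "First", "deck": "C"},
    {"class": "Third", "deck": None},
    {"class": "Second", "deck": None},
]

for row in freq_table(p["class"] for p in passengers):
    print(row)
for row in missing_table(passengers):
    print(row)
```

The cumulative columns are what make this more useful than a bare count: you can see at a glance how many categories cover most of the data.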
Brian #2: tabulate
- suggested by Tom McDermott
- Pretty-print tabular data in Python, a library and a command-line utility.
    from tabulate import tabulate

    table = [["Sun", 696000, 1989100000],
             ["Earth", 6371, 5973.6],
             ["Moon", 1737, 73.5],
             ["Mars", 3390, 641.85]]
    headers = ["Planet", "R (km)", "mass (x 10^29 kg)"]
    table_str = tabulate(table, headers=headers)
    print(table_str)

    Planet      R (km)    mass (x 10^29 kg)
    --------  --------  -------------------
    Sun         696000           1.9891e+09
    Earth         6371        5973.6
    Moon          1737          73.5
    Mars          3390         641.85
- lots of table formats, including
- simple (the default; matches the `simple_tables` extension in Pandoc Markdown)
- github (github flavored markdown)
- pipe
- jira
- mediawiki
- html
- plain (just spaces)
- different column alignment options
- number formatting
Michael #3: treebeard - ci for notebooks
- via Brian Skinn
- Continuous Integration for binder-ready repos
- A solution for setting up continuous integration on data science projects requiring minimal configuration.
- Functionality:
- Automatically installs dependencies for binder-ready repos (which can use conda, pip, or pipenv)
- Runs notebooks in the repo (using papermill)
- Uploads outputs, providing versioned URLs and nbconverted output notebooks
- Integrates with repos via a GitHub App
- Slack notifications
- A secret store for integrating with existing infrastructure
- A notebook that can run all code cells successfully will be tagged as successful. Treebeard shows a summary of all notebook statuses once execution is finished.
Brian #4: Upcoming features in venv/virtualenv
- In episode 184, we discussed virtualenv and venv.
- Coming in Python 3.9, venv will get an `--upgrade-deps` flag.
- From the help text: `--upgrade-deps  Upgrade core dependencies: pip setuptools to the latest version in PyPI`
- It’s listed as being changed in 3.8, but it just missed 3.8 by a smidge and will have to wait until 3.9, which is available as beta now. Here’s beta 3.
- Automatically updates pip and setuptools in the new environment.
- virtualenv is also getting a new goodie, periodic update.
- Not only does it create environments with updated setuptools, pip, wheel packages, it will periodically go out and check for updates to make sure it’s ready for your next virtual environment.
- You can also manually have it update, with the `--upgrade-embed-wheels` flag.
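For anyone creating environments from a script rather than the command line, the `--upgrade-deps` behavior is also exposed through the standard library's `venv.EnvBuilder`. The sketch below assumes Python 3.9 or later for the `upgrade_deps` argument and falls back to a plain builder on older versions; the helper names `make_builder` and `create_env` are hypothetical. Nothing is created on disk unless `create_env` is called, since that step needs network access to reach PyPI.

```python
import sys
import venv


def make_builder():
    """Build an EnvBuilder roughly equivalent to `python -m venv --upgrade-deps`."""
    kwargs = {"with_pip": True}
    # upgrade_deps arrived in EnvBuilder in Python 3.9, alongside the CLI flag.
    if sys.version_info >= (3, 9):
        kwargs["upgrade_deps"] = True
    return venv.EnvBuilder(**kwargs)


def create_env(path):
    """Create the environment at `path`; upgrading pip/setuptools needs network access."""
    make_builder().create(path)


builder = make_builder()
print(builder.with_pip, getattr(builder, "upgrade_deps", False))
```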
Michael #5: PEP 582 now!
- via Luiz Irber
- This PEP proposes to add to Python a mechanism to automatically recognize a __pypackages__ directory and prefer importing packages installed in this location over user or global site-packages.
- How virtual environments work is a lot of information for anyone new. It takes a lot of extra time and effort to explain them.
- Different platforms and shell environments require different sets of commands to activate the virtual environments.
- Virtual environments need to be activated on each opened terminal.
- Tools like pip can be used to install the required dependencies directly into this directory.
- Still in draft status, but targeted at Python 3.8?
- https://github.com/David-OConnor/pyflow implements PEP 582
- Unfortunately, it requires running everything via `pyflow` for now.
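The lookup the PEP describes can be roughly emulated today by adjusting `sys.path` yourself. The sketch below is only an illustration of the idea, not the PEP's implementation: it assumes the draft's `__pypackages__/X.Y/lib` layout, and the helper names `pypackages_dir` and `enable_pypackages` are made up. It inserts the local directory just ahead of the first site-packages entry so that local packages win.

```python
import os
import sys
import tempfile


def pypackages_dir(base_dir):
    """Directory the draft would search, e.g. <base_dir>/__pypackages__/3.8/lib."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return os.path.join(base_dir, "__pypackages__", version, "lib")


def enable_pypackages(base_dir, path=None):
    """Insert the local package directory ahead of any site-packages entries.

    Returns the directory that is on the path, or None when it does not exist.
    """
    path = sys.path if path is None else path
    local = pypackages_dir(base_dir)
    if not os.path.isdir(local):
        return None
    if local in path:
        return local
    # Position of the first site-packages/dist-packages entry, or the end of the list.
    index = next(
        (
            i
            for i, entry in enumerate(path)
            if entry.rstrip(os.sep).endswith(("site-packages", "dist-packages"))
        ),
        len(path),
    )
    path.insert(index, local)
    return local


# Demo against a throwaway directory and a fake search path.
with tempfile.TemporaryDirectory() as project:
    os.makedirs(pypackages_dir(project))
    fake_path = ["/project", "/usr/lib/python3/site-packages"]
    inserted = enable_pypackages(project, fake_path)
    print(fake_path.index(inserted))
```

Calling `enable_pypackages(os.getcwd())` at the top of a script would apply the same idea to the real `sys.path`, with no activation step in the shell.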
Brian #6: awesome pyproject.toml projects
- “We think `pyproject.toml` is pretty awesome, so this awesome list contains projects already using it, or discussing its inclusion.”
- Testing and formatting apparently switched pretty quick
- coverage.py
- pytest
- tox
- ward (new to me; tests have no function names, the test descriptions are strings)
- black
- isort
- code analysis projects
- pylint
- unimport
- wemake-python-styleguide
- packaging projects
- some articles on pyproject.toml
- and a list of projects discussing the switch
- Python Bytes awesome list
Extras:
Brian:
- new website for Pragmatic
Michael:
Joke:
- Spouse: "Honey, on the way home from work, please stop at the market and buy 1 bottle of milk. If they have eggs, bring 6."
- Me: I came back with 6 bottles of milk.
- Spouse: "Why the hell did you buy 6 bottles of milk? It's just the two of us!"
- Me: "Why do you think? Because they had eggs!"