Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book

Episode #224: Join us on a Python adventure back to 1977

Published Wed, Mar 10, 2021, recorded Wed, Mar 10, 2021.



Special guest: Calvin Hendryx-Parker

Live stream

Michael #1: AWSimple

  • by James Abel
  • AWSimple is a more object oriented interface on top of boto3 for some of the common “serverless” AWS services: S3, DynamoDB, SNS, and SQS.
  • Features:
    • Simple Object Oriented API on top of boto3
    • One-line S3 file write, read, and delete
    • Automatic S3 retries
    • Locally cached S3 accesses
    • True file hashing (SHA512) for S3 files (S3's etag is not a true file hash)
    • DynamoDB full table scans (with local cache option)
    • DynamoDB secondary indexes
    • Built-in pagination (e.g. for DynamoDB table scans and queries). Always get everything you asked for.
    • Can automatically set SQS timeouts based on runtime data (can also be user-specified)
  • Caching: S3 objects and DynamoDB tables can be cached locally to reduce network traffic, minimize AWS costs, and potentially offer a speedup.
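
To make the "true file hash" point concrete: an S3 ETag is not always a digest of the object's bytes (for multipart uploads, for example, it is not a plain hash of the content), so a content hash has to be computed separately. The following is a minimal standard-library sketch of a chunked SHA512 file hash. It is not AWSimple's implementation, and the function name and chunk size are assumptions made for illustration.

```python
import hashlib


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA512 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; not part of AWSimple's API.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this digest for a local file against one stored alongside the remote object is one way to decide whether a cached copy is still valid.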

Brian #2: coverage and installed packages

  • I’ve covered coverage.py a lot on Test & Code, starting with episode 12, and even talked about it on episode 147, and many others.
  • Except there’s something I missed, hidden in plain sight, all this time.
  • The value passed to coverage run --source, or to pytest --cov when using the pytest-cov plugin, is not just a path.
  • “You can specify source to measure with the --source command-line switch, or the [run] source configuration value. The value is a comma- or newline-separated list of directories *or package names*. If specified, only source inside these directories or packages will be measured.” - coverage.py docs, (emphasis mine)
  • Up to now I was using a trick I picked up somewhere (I don’t remember where): run coverage from the top-level project directory, specify the source as the project source, and add a [paths] section to .coveragerc whose source setting lists both the project source and the site-packages directory.
  • Then the report would show the coverage of the source code, even though it was the site-packages code that was running.
  • That trick is still nice because the report then shows paths in your project directory, which are usually shorter relative paths.
  • However, it’s not essential. You can just specify the source as the package name, without the above trick, and coverage will report the coverage of the installed package. That is usually good enough.
  • Super cool
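
As a rough illustration of the two settings discussed above, here is a sketch of what such a .coveragerc might contain, held in a Python string and parsed with configparser so the values can be inspected. The package name my_package and the src/ layout are assumptions, not taken from the episode.

```python
import configparser

# Hypothetical .coveragerc contents. "my_package" and the src/ layout are
# placeholders for your own project.
COVERAGERC = """\
[run]
# A package name works here, not only a directory path.
source = my_package

[paths]
# Optional: report installed-package paths as project source paths.
source =
    src/my_package
    */site-packages/my_package
"""

parser = configparser.ConfigParser()
parser.read_string(COVERAGERC)

run_source = parser["run"]["source"]
path_aliases = parser["paths"]["source"].split()

print(run_source)
print(path_aliases)
```

With a file like this in place, the equivalent command-line forms would be along the lines of coverage run --source=my_package -m pytest, or pytest --cov=my_package with pytest-cov installed.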

Calvin #3: Finding Mona Lisa in the Game of Life with JAX

  • by Atul Vinayak
  • Lots of great code examples
  • Showcases the speed increase you can get from running JAX on a GPU versus unvectorized code on a CPU
  • Initial implementation took days of CPU time to get a rough result
  • JAX compiles numpy to highly vectorized code to run on a GPU
  • Requires some refactoring of the code to optimize for a highly parallel run on GPUs
  • Post includes link to notebook used for the project
  • “Running ~1000 iterations for a 483px wide Mona Lisa on the google colab GPU runtime only takes around 40 seconds!”
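
The kind of refactoring involved can be sketched with a single vectorized Game of Life step. This is not the code from the post: it is a minimal sketch assuming a wrap-around (toroidal) board of 0s and 1s. The fallback to NumPy when JAX is not installed is added here only so the example runs anywhere.

```python
try:
    import jax.numpy as xp
    from jax import jit
except ImportError:  # fallback so the sketch also runs without JAX
    import numpy as xp

    def jit(func):
        return func


@jit
def life_step(board):
    """Advance a 0/1 board by one generation, wrapping at the edges."""
    # Sum the eight shifted copies of the board instead of looping over cells.
    neighbors = sum(
        xp.roll(xp.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (board == 0) & (neighbors == 3)
    survives = (board == 1) & ((neighbors == 2) | (neighbors == 3))
    return xp.where(born | survives, 1, 0)


# A "blinker": three live cells in a row that flip orientation each step.
blinker = xp.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
)
after = life_step(blinker)
```

Because every cell is updated by whole-array operations, the same function can be compiled for a GPU without any per-cell Python loop.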

Michael #4: Python Package Index nukes 3,653 malicious libraries uploaded soon after security shortcoming highlighted

  • From Mark Little
  • Recall Google’s Python goal was around PyPI security.
  • Related (from @tonny) Poison packages – “Supply Chain Risks” user hits Python community with 4000 fake modules
  • PyPI has removed 3,653 malicious packages uploaded days after a security weakness in the use of private and public registries was highlighted.
  • “Developers are often advised to review any code they import from an external library though that advice isn't always followed.” ← yeah
  • Last month, security researcher Alex Birsan demonstrated how easy it is to take advantage of these systems through a form of typosquatting that exploited the interplay between public and private package registries.
  • Birsan set out to see whether he could identify the names of private packages used inside companies and create malicious packages using those library names to place in the public package registries – the indexes that keep track of available software modules.
  • The names of private packages turned out to be rather easy to find, particularly in the Node.js/JavaScript ecosystem because private package.json files show up rather often in public software repositories.
  • So Birsan crafted identically named libraries that he designed to sneak system configuration data through corporate firewalls.
  • The challenge then became getting applications that require private libraries to look for those file names in a polluted public source. As it turns out, it's common for corporate software developers to rely on a hybrid configuration for their applications, one that references private internal packages but also supports fetching dependencies from a public registry, in order to ensure packages are up-to-date.
  • The companies that Birsan managed to attack with this technique include Apple, Microsoft, Netflix, PayPal, Shopify, Tesla, Uber, and Yelp. And for his efforts, he has been awarded at least $130,000 from bug bounty programs involving these firms.
  • Birsan's success in carrying out such attacks should set off alarm bells. Software supply chain attacks present a higher degree of risk than many threat scenarios because they have the potential to affect so many downstream victims.
  • Makes me want to set up devpi + devpi-constrained just for internal projects.
  • What to do?
  • Don’t do mass bogus uploads like this to prove your point. We appreciate the message you are trying to deliver, but it’s already been documented so you are just making distracting work for other people who could more usefully be doing something else for the project.
  • Don’t choose a PyPI package just because the name looks right. Check that you really are downloading the right module from the right publisher. Even legitimate modules sometimes have names that clash, compete or confuse.
  • Don’t hook internal projects to external repositories by mistake. If you are using Python packages that you haven’t published externally, then the one thing you can be sure of is that all external copies of “your” package are imposter modules, probably malware.
  • Don’t blindly download package updates into your own development or build systems. Test and review everything you download before you approve it for use. Remember that packages typically include update-time scripts that run when you do the update, so malware infections could be delivered as part of the update process, not of the module source code that ultimately gets installed.
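
One small, offline check in the spirit of the advice above is to compare your internal package names against a list of public names after normalizing them the way PEP 503 describes (lowercase, with runs of -, _ and . treated as a single -). The sketch below assumes you already have such a list of public names. The function names and sample data are hypothetical.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_name_collisions(internal_names, public_names):
    """Return internal names that match a public name after normalization.

    Hypothetical helper: public_names is whatever list of public project
    names you have collected; nothing is fetched from the network here.
    """
    public = {normalize(name) for name in public_names}
    return sorted(name for name in internal_names if normalize(name) in public)


# Made-up example data.
collisions = find_name_collisions(
    ["Acme_Utils", "acme-billing"],
    ["acme.utils", "requests"],
)
print(collisions)
```

Any name this reports is one where an installer consulting both a private and a public index could end up choosing between two different projects.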

Brian #5: python-adventure

  • Brandon Rhodes
  • “This is a faithful port of the ‘Adventure’ game to Python 3 from the original 1977 FORTRAN code by Crowther and Woods (it is driven by the same advent.dat file!) that lets you explore Colossal Cave, where others have found fortunes in treasure and gold, though it is rumored that some who enter are never seen again.”
  • “For extra authenticity, the output of the Adventure game in this mode, python3 -m adventure, is typed to your screen at 1200 baud.”
  • “Colossal Cave Adventure is the first known work of interactive fiction and, as the first text adventure game, is considered the precursor for the adventure game genre.” - Wikipedia
  • related:
  • side note:
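
For a feel of what "typed to your screen at 1200 baud" means, here is a rough sketch of baud-limited output. It is not how python-adventure implements it. It assumes 10 bits on the wire per character (8 data bits plus start and stop bits), which works out to about 120 characters per second at 1200 baud. The function name and parameters are hypothetical.

```python
import sys
import time


def type_at_baud(text, baud=1200, bits_per_char=10, out=sys.stdout, sleep=time.sleep):
    """Write text one character at a time, paced to a serial line speed.

    Assumption: each character costs bits_per_char bits on the wire.
    Returns the total time spent pausing, in seconds.
    """
    delay = bits_per_char / baud
    for ch in text:
        out.write(ch)
        out.flush()
        sleep(delay)
    return delay * len(text)


if __name__ == "__main__":
    type_at_baud("YOU ARE STANDING AT THE END OF A ROAD BEFORE A SMALL BRICK BUILDING.\n")
```

The out and sleep parameters exist only so the pacing can be tested without real delays.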

Calvin #6: Exciting New Features in Django 3.2

  • From Haki Benita
  • Upcoming LTS Release in the 3 series
  • Expected in April
  • Post highlights some interesting new features that you might not have noticed
  • New Features
    • Covering Indexes in Postgres support (performance plus!)
    • Timezones are hard and TruncDate now helps keep you from pulling out the foot cannon
    • JSONObject DB Functions, helping the unstructured data world keep using Postgres
    • Signal.send_robust() now logs exceptions so you don’t have to!
    • The new QuerySet.alias() method allows creating reusable aliases for expressions (Performance!)
    • The new display decorator makes creating calculated admin fields cleaner
    • Value expressions now detect their type, more cleanup that lets the ORM figure it out
    • Notable missing feature is Async ORM, but this will be awesome when it lands
    • More are listed on the Django 3.2 Release Page

Extras:

Michael:

Calvin:

  • DjangoCon Europe 2021 CFP is Open until 4/1 https://2021.djangocon.eu/talks/cfp/
  • Python Web Conf 2021
    • 4 Tracks this year
    • 60 Amazing Speakers (almost 20% women)
    • Tickets: Professional $199, Student $99
    • Grants Available!

Joke:

    /** Logger */
    private Logger logger = Logger.getLogger();
    // This is black magic
    // from
    // *Some stackoverlow link
    // Don’t play with magic, it can BITE.
    # For the sins I am about to commit, may Guido van Rossum forgive me
    // Remove this if you wanna be fired
    }
    catch(Exception ex)
    {
       // Houston, we have a problem
    }
    int getRandomNumber()
    {
    Return 4; // chosen by fair dice roll.
    // guaranteed to be random.
    }

https://twitter.com/LinuxHandbook/status/1368974401979383810

