Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book

#20: Finding similar but not identical images in 128 bits via Python

Published Wed, Apr 5, 2017, recorded Tue, Apr 4, 2017

Sponsored by Rollbar, thank you! rollbar.com/pythonbytes

#1 Brian: Duplicate image detection with perceptual hashing in Python

  • Ben Hoyt
  • From Jetsetter.com, Invitation-Only Travel Community
  • We use a perceptual image hash called dHash (“difference hash”), which was developed by Neal Krawetz in his work on photo forensics. It’s a very simple but surprisingly effective algorithm that involves the following steps (to produce a 128-bit hash value)
    • Convert the image to grayscale
    • Downsize to a 9x9 square of gray values (or 17x17 for a larger, 512-bit hash)
    • Calculate the “row hash”: for each row, move from left to right, and output a 1 bit if the next gray value is greater than or equal to the previous one, or a 0 bit if it’s less (each 9-pixel row produces 8 bits of output)
    • Calculate the “column hash”: same as above, but for each column, move top to bottom
    • Concatenate the two 64-bit values together to get the final 128-bit hash
  • Fast: Python is not very fast at bit twiddling, but all the hard work of converting to grayscale and downsizing is done by a C library: ImageMagick+wand or PIL.
  • Available via github: https://github.com/Jetsetter/pybktree

#2 Michael: Google Open Source/Python

  • subprocess32: A reliable subprocess module for Python 2
  • Grumpy: A Python to Go transcompiler and runtime
  • Python Fire: Automatically turns any Python object or module into a command line interface (CLI)
  • Python Client for Google Maps Services: Python client library for Google Maps API Web services
  • Hyou: Pythonic Interface to manipulate Google Spreadsheet
  • oauth2l: A simple CLI tool to get an OAuth token
  • mock_maps_apis: Small AppEngine application that can mock some of the Google Maps APIs
  • TensorFlow: TensorFlow is a fast, flexible, and scalable open source machine learning library

#3 Brian: How to Handle Missing Data with Python

  • Jason Brownlee
  • Real-world data often has missing values.
  • Data can have missing values for a number of reasons such as observations that were not recorded and data corruption.
  • Handling missing data is important as many machine learning algorithms do not support data with missing values.

#4 Michael: hug REST framework

  • Drastically simplify API development over multiple interfaces
  • With hug, design and develop your API once, then expose it however your clients need to consume it (locally, over HTTP, or through the command line)
  • hug is the fastest and most modern way to create APIs on Python3
  • hug has been built from the ground up with performance in mind.
    • It is built to consume resources only when necessary
    • compiled with Cython to achieve amazing performance
  • Built in version management
  • Automatic documentation
  • Annotation powered validation
  • Write once. Use everywhere (CLI, Python package, Web API)

#5 Brian CLI with Click

#6 Michael: Python's Instance, Class, and Static Methods Demystified

  • From realpython.com, guest post from Dan Bader
  • demystify what’s behind class methods, static methods, and regular instance methods
  • Python 3 by default

        class MyClass:
            def method(self):
                return 'instance method called', self
    
            @classmethod
            def classmethod(cls):
                return 'class method called', cls
    
            @staticmethod
            def staticmethod():
                return 'static method called'
    
  • Instances is clear but static and class are not so much

    • static and class methods are also available on instances
    • choice between class vs static method (do you want inheritance?)
    • instance methods can also access the class itself through the self.__class__ attribute

Follow ups

David Bieber from Google and Python Fire sent us this note: The program noted that Fire has one "heavy" dependency, IPython. Just wanted to chime in with this: we have a game plan to remove IPython as a required dependency, but we're not there yet. (Contributions are welcome!)

News from us

Brian

Michael


Want to go deeper? Check our projects