Episode #20: Finding similar but not identical images in 128 bits via Python

Published Wed, Apr 5, 2017, recorded Tues, Apr 4, 2017.

Sponsored by Rollbar, thank you! rollbar.com/pythonbytes

#1 Brian: Duplicate image detection with perceptual hashing in Python

  • Ben Hoyt
  • From Jetsetter.com, Invitation-Only Travel Community
  • We use a perceptual image hash called dHash (“difference hash”), which was developed by Neal Krawetz in his work on photo forensics. It’s a very simple but surprisingly effective algorithm that involves the following steps (to produce a 128-bit hash value):
    • Convert the image to grayscale
    • Downsize to a 9x9 square of gray values (or 17x17 for a larger, 512-bit hash)
    • Calculate the “row hash”: for each row, move from left to right, and output a 1 bit if the next gray value is greater than or equal to the previous one, or a 0 bit if it’s less (each 9-pixel row produces 8 bits of output)
    • Calculate the “column hash”: same as above, but for each column, move top to bottom
    • Concatenate the two 64-bit values together to get the final 128-bit hash
  • Fast: Python is not very fast at bit twiddling, but all the hard work of converting to grayscale and downsizing is done by a C library: ImageMagick+wand or PIL.
  • Available via GitHub: https://github.com/Jetsetter/pybktree
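
The dHash steps above can be sketched in plain Python. This is a minimal illustration, not the library's code: it assumes the grayscale conversion and the downsizing to 9x9 have already been done, so the input is a 9x9 grid of gray values. The names dhash and hamming_distance are hypothetical, and the sketch uses only the first 8 rows and 8 columns of differences so that each half comes out to 64 bits. A real implementation may differ in details such as the exact comparison.

```python
def dhash(gray, size=8):
    """Return a 2 * size * size bit hash for a (size+1) x (size+1) grid.

    `gray` is a list of rows of gray values. Producing that grid from an
    image (grayscale + downsize) is assumed to happen elsewhere.
    """
    width = size + 1
    if len(gray) != width or any(len(row) != width for row in gray):
        raise ValueError("expected a %dx%d grid of gray values" % (width, width))

    row_hash = 0
    col_hash = 0
    for y in range(size):
        for x in range(size):
            # Row hash: 1 if the pixel to the right is >= this pixel.
            row_bit = int(gray[y][x + 1] >= gray[y][x])
            row_hash = (row_hash << 1) | row_bit
            # Column hash: 1 if the pixel below is >= this pixel.
            col_bit = int(gray[y + 1][x] >= gray[y][x])
            col_hash = (col_hash << 1) | col_bit

    # Concatenate the two halves: row hash in the high bits.
    return (row_hash << (size * size)) | col_hash


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Two synthetic 9x9 "images": one gets brighter toward the bottom right,
# the other gets darker.
brightening = [[x + y for x in range(9)] for y in range(9)]
darkening = [[100 - x - y for x in range(9)] for y in range(9)]

# Same as `brightening` except for a single altered pixel.
nearly_same = [row[:] for row in brightening]
nearly_same[0][1] = -1

print(hamming_distance(dhash(brightening), dhash(nearly_same)))
print(hamming_distance(dhash(brightening), dhash(darkening)))
```

Similar images give hashes that differ in only a few bits, so comparing two images comes down to the Hamming distance between their hashes rather than a test for exact equality.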

#2 Michael: Google Open Source/Python

  • subprocess32: A reliable subprocess module for Python 2
  • Grumpy: A Python to Go transcompiler and runtime
  • Python Fire: Automatically turns any Python object or module into a command line interface (CLI)
  • Python Client for Google Maps Services: Python client library for Google Maps API Web services
  • Hyou: Pythonic Interface to manipulate Google Spreadsheet
  • oauth2l: A simple CLI tool to get an OAuth token
  • mock_maps_apis: Small AppEngine application that can mock some of the Google Maps APIs
  • TensorFlow: TensorFlow is a fast, flexible, and scalable open source machine learning library

#3 Brian: How to Handle Missing Data with Python

  • Jason Brownlee
  • Real-world data often has missing values.
  • Data can have missing values for a number of reasons such as observations that were not recorded and data corruption.
  • Handling missing data is important as many machine learning algorithms do not support data with missing values.
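
As a rough illustration of what handling missing values can mean, here is a minimal standard-library sketch of two common strategies: dropping incomplete rows, and filling gaps with the column mean. It is not taken from the article. The function names and the sample data are hypothetical, and missing values are assumed to be marked with None.

```python
from statistics import mean


def drop_incomplete(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if None not in row]


def impute_column_means(rows):
    """Replace each None with the mean of the known values in its column.

    Raises statistics.StatisticsError if a column has no known values.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [means[i] if value is None else value for i, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical data set: three observations, two numeric features.
rows = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]

print(drop_incomplete(rows))
print(impute_column_means(rows))
```

Dropping rows is simple but discards data (here, two of three observations), while imputation keeps every row at the cost of inventing plausible values.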

#4 Michael: hug REST framework

  • Drastically simplify API development over multiple interfaces
  • With hug, design and develop your API once, then expose it however your clients need to consume it (locally, over HTTP, or through the command line)
  • hug describes itself as the fastest and most modern way to create APIs on Python 3
  • hug has been built from the ground up with performance in mind.
    • It is built to consume resources only when necessary
    • Compiled with Cython to achieve amazing performance
  • Built in version management
  • Automatic documentation
  • Annotation powered validation
  • Write once. Use everywhere (CLI, Python package, Web API)

#5 Brian: CLI with Click

#6 Michael: Python's Instance, Class, and Static Methods Demystified

  • From realpython.com, guest post from Dan Bader
  • Demystifies what’s behind class methods, static methods, and regular instance methods
  • Python 3 by default
    class MyClass:
        def method(self):
            return 'instance method called', self

        @classmethod
        def classmethod(cls):
            return 'class method called', cls

        @staticmethod
        def staticmethod():
            return 'static method called'
  • Instance methods are straightforward; static and class methods are less obvious
    • Static and class methods can also be called on instances
    • Choosing between a class method and a static method: do you need access to the class, for example so that subclasses are respected?
    • Instance methods can also access the class itself through the self.__class__ attribute
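
To make the inheritance question concrete, here is a small sketch. The Pizza and VeganPizza classes and their methods are hypothetical, chosen only for illustration. A class method receives whichever class it was called on, so a factory defined on the base class builds the right type for subclasses. A static method receives neither the instance nor the class.

```python
class Pizza:
    def __init__(self, toppings):
        self.toppings = list(toppings)

    def describe(self):
        # Instance method: reaches the class through self.__class__.
        return "%s with %s" % (self.__class__.__name__, ", ".join(self.toppings))

    @classmethod
    def margherita(cls):
        # Class method: cls is the class the call was made on,
        # so subclasses get instances of themselves.
        return cls(["mozzarella", "tomatoes"])

    @staticmethod
    def is_valid_topping(name):
        # Static method: a plain function namespaced under the class.
        return bool(name.strip())


class VeganPizza(Pizza):
    pass


print(Pizza.margherita().describe())
print(VeganPizza.margherita().describe())

# Class and static methods can also be called on an instance.
pizza = Pizza(["basil"])
print(pizza.margherita().describe())
print(pizza.is_valid_topping("  "))
```

If margherita were a static method that returned Pizza(...) directly, VeganPizza.margherita() would produce a plain Pizza, which is the practical reason to prefer a class method for factories.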

Follow ups

David Bieber from Google and Python Fire sent us this note: The program noted that Fire has one "heavy" dependency, IPython. Just wanted to chime in with this: we have a game plan to remove IPython as a required dependency, but we're not there yet. (Contributions are welcome!)

News from us

Brian

Michael

