
neural-hash-collider

Find target hash collisions for Apple's NeuralHash perceptual hash function.

For example, starting from a picture of this cat, we can find an adversarial image that has the same hash as the picture of the dog in this post:

$ python collide.py --image cat.jpg --target 59a34eabe31910abfb06f308
...
# took about 2.5 minutes to run on an i7-5930K

We can confirm the hash collision using nnhash.py from AsuharietYgvar/AppleNeuralHash2ONNX:

$ python nnhash.py dog.png
59a34eabe31910abfb06f308
$ python nnhash.py adv.png
59a34eabe31910abfb06f308

How it works

NeuralHash is a perceptual hash function that uses a neural network. Images are resized to 360x360 and passed through a neural network to produce a 128-dimensional feature vector. Then, the vector is projected onto R^96 using a 128x96 "seed" matrix. Finally, to produce a 96-bit hash, the 96-dimensional vector is thresholded: negative entries turn into a 0 bit, and non-negative entries turn into a 1 bit.
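The projection-and-threshold step can be sketched in a few lines. This is an illustration, not the repository's code: neural_hash_bits is a hypothetical helper, and the MSB-first bit packing is an assumption (nnhash.py defines the canonical encoding):

import numpy as np

def neural_hash_bits(features: np.ndarray, seed: np.ndarray) -> str:
    """Project a 128-dim feature vector to 96 dims, then threshold to bits."""
    assert features.shape == (128,) and seed.shape == (128, 96)
    projected = features @ seed                # 96 real values
    bits = (projected >= 0).astype(np.uint8)   # negative -> 0, non-negative -> 1
    # Pack 96 bits into 12 bytes (MSB-first, an assumption) and hex-encode,
    # giving a 24-character hash like 59a34eabe31910abfb06f308.
    return np.packbits(bits).tobytes().hex()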

This entire process, except for the thresholding, is differentiable, so we can use gradient descent to find hash collisions. This exploits a well-known property of neural networks: they are vulnerable to adversarial examples.

We can define a loss that captures how close an image is to a given target hash: this loss is basically just the NeuralHash algorithm as described above, but with the final "hard" thresholding step tweaked so that it is "soft" (in particular, differentiable). Exactly how this is done (choices of activation functions, parameters, etc.) can affect convergence, so it can require some experimentation. Refer to collide.py to see what the implementation currently does.
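For illustration only, one common way to soften the threshold is to replace the hard sign with a tanh and compare against target signs. The soft_hash_loss below is a hypothetical variant; the actual choices in collide.py may differ:

import tensorflow as tf

def soft_hash_loss(projected, target_bits, temperature=1.0):
    """Differentiable surrogate for matching a 96-bit target hash.

    projected: the 96 real values before thresholding.
    target_bits: the target hash as 96 values in {0, 1}.
    """
    # Map target bits to signs: 0 -> -1, 1 -> +1.
    target_signs = 2.0 * tf.cast(target_bits, tf.float32) - 1.0
    # tanh is a smooth stand-in for the hard sign; temperature sets sharpness.
    soft_bits = tf.tanh(projected / temperature)
    # Zero exactly when every soft bit agrees with its target sign.
    return tf.reduce_sum(tf.square(soft_bits - target_signs))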

After choosing the loss function, we can follow the standard method to find adversarial examples for neural networks: we perform gradient descent.
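A minimal sketch of that loop, assuming model maps a preprocessed image tensor to the 96 pre-threshold values and soft_hash_loss is the surrogate sketched above. The step size, iteration count, and [0, 1] pixel range are illustrative assumptions, not collide.py's actual settings:

import tensorflow as tf

def find_collision(model, image, target_bits, steps=1000, lr=2.0):
    x = tf.Variable(image)  # start from the source image, not from noise
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = soft_hash_loss(model(x), target_bits)
        grad = tape.gradient(loss, x)
        x.assign_sub(lr * grad)                  # vanilla gradient descent step
        x.assign(tf.clip_by_value(x, 0.0, 1.0))  # keep pixels in a valid range
    return x.numpy()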

Prerequisites

  • Get Apple's NeuralHash model following the instructions in AsuharietYgvar/AppleNeuralHash2ONNX and either put all the files in this directory or supply the --model / --seed arguments
  • Install Python dependencies: pip install onnx coremltools onnx_tf tensorflow numpy Pillow

Usage

Run python collide.py --image [path to image] --target [target hash] to generate a hash collision. Run python collide.py --help to see all the options, including tunable knobs like the learning rate.

Limitations

The code in this repository is intended to be a demonstration, and perhaps a starting point for other exploration. Tweaking the implementation (choice of loss function, choice of parameters, etc.) might produce much better results than this code currently achieves.

The code in this repository currently implements a simple loss function that just measures the distance to the target hash value. Starting from a particular image happens to produce a final image that looks somewhat similar. To enforce this property more directly, the loss function could be modified to penalize making the image look different, e.g. by adding the l2 distance between the original image and the computed adversarial example (another standard technique), or we could use projected gradient descent to project onto an l-infinity ball centered at the original image as we optimize (yet another standard technique), as sketched below.
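For the l-infinity variant, the projection applied after each gradient step is small. project_linf below is a hypothetical helper, with epsilon and the [0, 1] pixel range chosen only for illustration:

import tensorflow as tf

def project_linf(x, x_orig, epsilon=8.0 / 255.0):
    """Clip x back into the l-infinity ball of radius epsilon around x_orig."""
    x = tf.clip_by_value(x, x_orig - epsilon, x_orig + epsilon)
    return tf.clip_by_value(x, 0.0, 1.0)  # also stay in the valid pixel range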

The code in this repository does not currently use any fancy optimization algorithm, just vanilla gradient descent.

