
ML Privacy Meter — A comprehensive tool for Privacy Attacks on your ML model

source link: https://towardsdatascience.com/ml-privacy-meter-a-comprehensive-tool-for-privacy-attacks-on-your-ml-model-b4d3f2e05cb4?gi=1522f2d60d83

ML Privacy Meter — A comprehensive tool to quantify privacy risks of Machine Learning Models

ML Privacy Meter, available at https://github.com/privacytrustlab/ml_privacy_meter/tree/master/tutorials


Photo by Markus Spiske on Unsplash

Deep Learning is increasingly being used to solve problems across domains due to improvements in computational power, as well as the availability of large amounts of data to train the models. With this rising ubiquity of Deep Learning, it is vital that these models be more robust, secure and private.

However, research shows that deep learning and machine learning models, when improperly trained, are often prone to various types of privacy vulnerabilities. One such attack is the membership inference attack [1], where the attacker tries to infer whether a particular data record was part of the training set. The data used to train a model is usually drawn from the real world, such as real images for an image classification problem, or actual users’ medical histories for a medical diagnosis application. Therefore, models that leak such data can be a threat to the privacy of individual members of the dataset.

The ML Privacy Meter [2] is a tool that analyses how prone a machine learning model is to membership inference attacks. The tool mounts attacks on a trained target model, assuming either blackbox or whitebox access to it, and reports the inference accuracy of the attack. Whitebox attacks can exploit the gradients of the target model’s parameters, intermediate layer outputs, or the model’s predictions to infer training-set membership of an input, while blackbox attacks use only the target model’s predictions [3]. The attack is performed by building an inference model that takes the exploitable target-model components computed on a data record and returns the probability that the record is a member of the training set.
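To make the blackbox setting concrete, here is a deliberately simplified membership heuristic based only on prediction confidence. It illustrates the intuition that overfitted models tend to be more confident on their training points; it is not the inference model that ML Privacy Meter trains, and the function name and threshold are hypothetical.

import numpy as np

def naive_membership_guess(target_model, x, threshold=0.9):
    """Guess 'member' if the model's top predicted probability exceeds the threshold."""
    probs = target_model.predict(x[np.newaxis, ...])[0]  # blackbox access: predictions only
    return float(np.max(probs)) > threshold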

Installing the Tool

The prerequisites for installing the ML Privacy Meter are Python 3.6 and TensorFlow 2.1. The tool’s dependencies are installed first, followed by the tool itself.

pip install -r requirements.txt
pip install -e .

Attacking the Model

The tool provides a method to attack a given trained model. The user is required to provide the model weights file, the data used for training the model, and the complete dataset.

The tool assumes that the user has an already trained Keras model, along with its complete dataset and the subset of the dataset used for training. The format of the data files is described in the ‘datasets/README’ file. The Keras model can either be loaded directly from a saved model file using Keras’ tf.keras.models.load_model() function, or initialized as in the code snippet below, with the trained weights loaded using the model’s load_weights() function.

We load the complete target Keras model to be analyzed, with its parameters and weights.
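A minimal sketch of this step, assuming TensorFlow 2 / Keras; the file paths and the architecture-building helper are placeholders rather than the tutorial’s exact code.

import tensorflow as tf

# Option 1: load a fully saved Keras model (architecture + weights) directly.
cmodelA = tf.keras.models.load_model('alexnet_pretrained.h5')  # hypothetical path

# Option 2: rebuild the architecture in code and load only the trained weights.
# cmodelA = build_alexnet(input_shape=(32, 32, 3), num_classes=100)  # hypothetical helper
# cmodelA.load_weights('alexnet_weights.h5')                         # hypothetical path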

We then create an attack handler by providing the complete dataset and the dataset used for training. The attack handler extracts the data and creates the training and test batches for the attack model; a usage sketch follows the parameter list below.

The parameters used in the attack_data() function are:

  • dataset_path : Path to the .txt dataset file. (Examples showing how to download and create one are given in the ‘datasets’ directory of the project.)
  • member_dataset_path : Path to the numpy data file containing the subset used to train the target model. (Examples showing how to download and create one are given in the ‘datasets’ directory of the project.)
  • batch_size : Batch size used while training the inference model.
  • attack_percentage : Percentage of the training data used for training the inference model.
  • normalization : Boolean, set to true if the data needs to be normalized. The handler then computes the means and standard deviations used to normalize the data; these values may be overridden as shown in the ‘tutorials/attack_alexnet.py’ example.
  • input_shape : Shape of the input data to the model.
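A minimal sketch of creating the handler with these parameters, assuming the handler lives at ml_privacy_meter.utils.attack_data as in the repository tutorials; the module path and the file paths below are assumptions/placeholders.

import ml_privacy_meter

# Create the attack data handler; it batches the complete dataset and the
# target model's training subset for training and testing the inference model.
datahandlerA = ml_privacy_meter.utils.attack_data.attack_data(
    dataset_path='datasets/cifar100.txt',               # complete dataset in the tool's .txt format
    member_dataset_path='datasets/cifar100_train.npy',  # numpy file with the target model's training subset
    batch_size=100,
    attack_percentage=10,                               # % of training data used to train the inference model
    input_shape=(32, 32, 3),
    normalization=True)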

Next, the whitebox class is initialized to generate the inference model components. For whitebox attacks, the user can specify which layers’ gradients they would like to exploit, as well as the other parts of the neural network the whitebox attacker has access to. For blackbox attacks, only the final layer outputs are used in the attack parameters.

In the example below, a whitebox attack is carried out that exploits the gradients of the last layer and the outputs of the last two layers of the model. The loss value and the one-hot encoding of the true label are also used (see the argument descriptions below).

Finally, the train_attack() function is called to perform the actual attack and generate the results. During this step, the selected model components are used to train the attack model according to the given parameters.
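As a minimal sketch, once the attack object has been configured (see the initialize() snippets further below), the attack is launched with a single call:

# Train the inference (attack) model and evaluate membership inference;
# the results appear in the logs folder and in the console when it finishes.
attackobj.train_attack()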

The arguments used in the initialize() function, as they appear in the configuration examples below, are:

  • target_train_model : the trained target model whose training-set membership is being inferred
  • target_attack_model : the model on which the attack is evaluated (the same model in the examples below)
  • train_datahandler / attack_datahandler : the data handlers created with attack_data()
  • layers_to_exploit : indices of the layers whose outputs are exploited
  • gradients_to_exploit : indices of the layers whose gradients are exploited (whitebox only)
  • exploit_loss : whether the loss value is exploited
  • device : the device to run the attack on (None in the examples)

Once the attack is executed, the attack results can be viewed in the logs folder, and in the console.

End-to-end example on the Alexnet CIFAR-100 Attack

To perform an attack as in Nasr et al [3], the Alexnet model trained on the CIFAR-100 dataset is used. The whitebox attack can be performed on the model while exploiting the gradients, final layer outputs, loss values and label values.

First, the pretrained model archive in the `tutorials/models` directory is extracted into the root directory of the project.

unzip tutorials/models/alexnet_pretrained.zip -d .


Pretrained CIFAR-100 Alexnet Model (Converted from Pytorch to Keras)

Note: The user can also train their own model to attack, similar to the example in `tutorials/alexnet.py`.

Then, the script to download the required data files is executed. This downloads the dataset file and training set file and converts them into the format required by the tool.

cd datasets
chmod +x download_cifar100.sh
./download_cifar100.sh

Then, the main attack code is executed to get the results.

python tutorials/attack_alexnet.py

The `attackobj` in the file initializes the whitebox class with the attack configuration. The following are some example configurations that can be set in this call.

Note: The code explicitly sets the means and standard deviations for normalizing the images, according to the CIFAR-100 distribution.
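A sketch of such an override, assuming the data handler exposes means and stddevs attributes; the attribute names and the exact values are assumptions (see tutorials/attack_alexnet.py for the actual code).

# Override the normalization statistics with approximate CIFAR-100 per-channel values.
datahandlerA.means = [0.5071, 0.4865, 0.4409]     # assumed attribute name; approximate channel means
datahandlerA.stddevs = [0.2673, 0.2564, 0.2762]   # assumed attribute name; approximate channel std devs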

1. Whitebox attack — Exploit the final layer gradients, final layer outputs, loss values and label values (DEFAULT)

attackobj = ml_privacy_meter.attack.whitebox.initialize(
    target_train_model=cmodelA,
    target_attack_model=cmodelA,
    train_datahandler=datahandlerA,
    attack_datahandler=datahandlerA,
    layers_to_exploit=[26],
    gradients_to_exploit=[6],
    device=None)

2. Whitebox attack — Exploit final two model layer outputs, loss values and label values

attackobj = ml_privacy_meter.attack.whitebox.initialize(
    target_train_model=cmodelA,
    target_attack_model=cmodelA,
    train_datahandler=datahandlerA,
    attack_datahandler=datahandlerA,
    layers_to_exploit=[22, 26],
    device=None)

3. Blackbox attack — Exploit final layer output and label values

attackobj = ml_privacy_meter.attack.whitebox.initialize(
    target_train_model=cmodelA,
    target_attack_model=cmodelA,
    train_datahandler=datahandlerA,
    attack_datahandler=datahandlerA,
    layers_to_exploit=[26],
    exploit_loss=False,
    device=None)
