Use hyperparameter tuning

Hyperparameters are variables that govern the process of training a model, such as batch size or the number of hidden layers in a deep neural network. Hyperparameter tuning searches for the best combination of hyperparameter values by optimizing metric values across a series of trials. Metrics are scalar summaries that you add to your trainer, such as model accuracy.

Learn more about hyperparameter tuning on Vertex AI. For a step-by-step example, refer to the Vertex AI: Hyperparameter Tuning codelab.

This page shows you how to:

  • Prepare your training application.
  • Create and configure a hyperparameter tuning job.
  • Manage hyperparameter tuning jobs.

Prepare your training application

In a hyperparameter tuning job, Vertex AI creates trials of your training job with different sets of hyperparameters and evaluates the effectiveness of a trial using the metrics you specified. Vertex AI passes hyperparameter values to your training application as command-line arguments. For Vertex AI to evaluate the effectiveness of a trial, your training application must report your metrics to Vertex AI.

The following sections describe:

  • How Vertex AI passes hyperparameters to your training application.
  • Options for passing metrics from your training application to Vertex AI.

To learn more about the requirements for custom training applications that run on Vertex AI, read Training code requirements.

Handle the command-line arguments for the hyperparameters you want to tune

Vertex AI sets command-line arguments when it calls your training application. Use these command-line arguments in your code:

  1. Define a name for each hyperparameter argument and parse it using whatever argument parser you prefer, such as argparse. Use the same argument names when configuring your hyperparameter training job.

    For example, if your training application is a Python module named my_trainer and you are tuning a hyperparameter named learning_rate, Vertex AI starts each trial with a command like the following:

    python3 -m my_trainer --learning_rate learning-rate-in-this-trial

    Vertex AI determines the learning-rate-in-this-trial and passes it in using the learning_rate argument.

  2. Assign the values from the command-line arguments to the hyperparameters in your training code, as in the sketch that follows this list.
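
For example, a minimal my_trainer entry point might parse and use its hyperparameter arguments as follows; the batch_size argument is an illustrative placeholder:

import argparse

def parse_args():
    """Parse the hyperparameter values that Vertex AI passes as command-line flags."""
    parser = argparse.ArgumentParser()
    # Argument names must match the parameter names that you configure
    # in the hyperparameter tuning job.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--batch_size", type=int, default=32)  # illustrative
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # Assign the tuned values in your training code.
    print(f"learning_rate={args.learning_rate}, batch_size={args.batch_size}")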

Learn more about the requirements for parsing command-line arguments.

Report your metrics to Vertex AI

To report your metrics to Vertex AI, use the cloudml-hypertune Python package. This library provides helper functions for reporting metrics to Vertex AI.
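
For example, after each evaluation you might report a metric with a minimal sketch like the following; the accuracy tag is illustrative and must match the metric name in your tuning job configuration:

import hypertune  # from the cloudml-hypertune package

# Report the trial's current metric value to Vertex AI.
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag="accuracy",  # must match the job's metric name
    metric_value=0.987,                    # value computed by your evaluation
    global_step=1000,                      # for example, the training step or epoch
)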

Learn more about reporting hyperparameter metrics.

Create a hyperparameter tuning job

You can create a HyperparameterTuningJob by using the Google Cloud console, the gcloud CLI, the Vertex AI SDK for Python, or the REST API. The following steps use the Google Cloud console.

In the Google Cloud console, you cannot create a HyperparameterTuningJob resource directly. However, you can create a TrainingPipeline resource that creates a HyperparameterTuningJob.

The following instructions describe how to create a TrainingPipeline that creates a HyperparameterTuningJob and doesn't do anything else. If you want to use additional TrainingPipeline features, like training with a managed dataset, read Creating training pipelines.

  1. In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.

    Go to Training pipelines

  2. Click add_box Create to open the Train new model pane.

    Note: You can type model.new into a browser to go directly to the model creation page.
  3. On the Training method step, specify the following settings:

    1. In the Dataset drop-down list, select No managed dataset.

    2. Select Custom training (advanced).

    Click Continue.

  4. On the Model details step, choose Train new model or Train new version. If you select Train new model, enter a name of your choice, MODEL_NAME, for your model. Click Continue.

  5. On the Training container step, specify the following settings:

    1. Select whether to use a Pre-built container or a Custom container for training.

    2. Depending on your choice, do one of the following:

      • If you chose a pre-built container, specify the framework and framework version, the Cloud Storage location of your Python training package, and the name of your Python module.
      • If you chose a custom container, specify the URI of the container image.

    3. In the Model output directory field, you may specify the Cloud Storage URI of a directory in a bucket that you have access to. The directory does not need to exist yet.

      This value gets passed to Vertex AI in the baseOutputDirectory API field, which sets several environment variables that your training application can access when it runs.

    4. Optional: In the Arguments field, you can specify arguments for Vertex AI to use when it starts running your training code. The maximum length for all arguments combined is 100,000 characters. The behavior of these arguments differs depending on what type of container you are using:

      • If you are using a pre-built container, Vertex AI passes the arguments as command-line flags to your Python module.
      • If you are using a custom container, Vertex AI overrides your container's CMD instruction with the arguments.

    Click Continue.

  6. On the Hyperparameter tuning step, select the Enable hyperparameter tuning checkbox and specify the following settings:

    1. In the New Hyperparameter section, specify the Parameter name and Type of a hyperparameter that you want to tune. Depending on which type you specify, configure the additional hyperparameter settings that appear.

      Learn more about hyperparameter types and their configurations.

    2. If you want to tune more than one hyperparameter, click Add new parameter and repeat the previous step in the new section that appears.

      Repeat this for each hyperparameter that you want to tune.

    3. In the Metric to optimize field and the Goal drop-down list, specify the name and goal of the metric that you want to optimize.

    4. In the Maximum number of trials field, specify the maximum number of trials that you want Vertex AI to run for your hyperparameter tuning job.

    5. In the Maximum number of parallel trials field, specify the maximum number of trials to let Vertex AI run at the same time.

    6. In the Search algorithm drop-down list, specify a search algorithm for Vertex AI to use.

    7. Ignore the Enable early stopping toggle, which has no effect.

    Click Continue.

  7. On the Compute and pricing step, specify the following settings:

    1. In the Region drop-down list, select a region that supports custom training.

    2. In the Worker pool 0 section, specify compute resources to use for training.

      If you specify accelerators, make sure the type of accelerator that you choose is available in your selected region.

      If you want to perform distributed training, then click Add more worker pools and specify an additional set of compute resources for each additional worker pool that you want.

    Click Continue.

  8. On the Prediction container step, select No prediction container.

  9. Click Start training to start the custom training pipeline.
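
If you prefer to create the HyperparameterTuningJob programmatically instead of through the console, the following Vertex AI SDK for Python sketch shows the same kind of job; the project, bucket, training package, and executor image are placeholder values:

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="PROJECT_ID", location="us-central1",
                staging_bucket="gs://BUCKET")

# The custom job that runs a single trial of your training application.
trial_job = aiplatform.CustomJob(
    display_name="my-trial-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "python_package_spec": {
            # Placeholder pre-built training container and package location.
            "executor_image_uri": "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12.py310:latest",
            "package_uris": ["gs://BUCKET/my_trainer-0.1.tar.gz"],
            "python_module": "my_trainer",
        },
    }],
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name="my-tuning-job",
    custom_job=trial_job,
    metric_spec={"accuracy": "maximize"},  # metric reported with cloudml-hypertune
    parameter_spec={
        # Passed to the trainer as --learning_rate in each trial.
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
    },
    max_trial_count=20,
    parallel_trial_count=4,
)
hp_job.run()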

Hyperparameter training job configuration

Hyperparameter tuning jobs search for the best combination of hyperparameters to optimize your metrics. Hyperparameter tuning jobs do this by running multiple trials of your training application with different sets of hyperparameters.

When you configure a hyperparameter tuning job, you must specify the following details:

  • The hyperparameters you want to tune and the metrics that you want to use to evaluate trials.

    Learn more about selecting hyperparameters and metrics.

  • Details about the number of trials to run as a part of this tuning job, such as the following:

    • The maximum number of trials to run (maxTrialCount).
    • The maximum number of trials to run in parallel (parallelTrialCount).
    • The number of failed trials to allow before the job ends (maxFailedTrialCount).

  • Details about the custom training job that is run for each trial, such as the following:

    • The machine type that the trial jobs run on and the accelerators that the jobs use.

      Note: Currently, Vertex AI does not support hyperparameter tuning jobs that require TPUs.
    • The details of the custom container or Python package job.

      Learn more about training code requirements.

Limit the number of trials

Decide how many trials you want to allow the service to run and set the maxTrialCount value in the HyperparameterTuningJob object.

There are two competing interests to consider when deciding how many trials to allow:

  • time (and therefore cost)
  • accuracy

Increasing the number of trials generally yields better results, but this is not always the case. Usually, there is a point of diminishing returns after which additional trials have little or no effect on accuracy. Before starting a job with a large number of trials, you might want to start with a small number of trials to gauge the effect your chosen hyperparameters have on your model's accuracy.

To get the most out of hyperparameter tuning, don't set your maximum number of trials lower than ten times the number of hyperparameters you tune. For example, if you tune three hyperparameters, set maxTrialCount to at least 30.

Parallel trials

You can specify how many trials can run in parallel by setting parallelTrialCount in the HyperparameterTuningJob.

Running parallel trials has the benefit of reducing the elapsed time the training job takes; the total processing time required is typically unchanged. However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When trials run in parallel, some start without the benefit of results from trials that are still running.

If you use parallel trials, the hyperparameter tuning service provisions multiple training clusters (or multiple individual machines in the case of a single-process trainer). The worker pool spec that you set for your job is used for each individual training cluster.

Handle failed trials

If your hyperparameter tuning trials exit with errors, you might want to end the training job early. Set the maxFailedTrialCount field in the HyperparameterTuningJob to the number of failed trials that you want to allow. After this number of trials fails, Vertex AI ends the training job. The maxFailedTrialCount value must be less than or equal to maxTrialCount.

If you do not set maxFailedTrialCount, or if you set it to 0, Vertex AI uses the following rules to handle failing trials:

  • If the first trial of your job fails, Vertex AI ends the job immediately. Failure during the first trial suggests a problem in your training code, so further trials are also likely to fail. Ending the job lets you diagnose the problem without waiting for more trials and incurring greater costs.
  • If the first trial succeeds, Vertex AI might end the job after failures during subsequent trials based on one of the following criteria:
    • The number of failed trials has grown too high.
    • The ratio of failed trials to successful trials has grown too high.

These rules are subject to change. To ensure a specific behavior, set the maxFailedTrialCount field.
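
In the Vertex AI SDK for Python, this corresponds to the max_failed_trial_count argument of HyperparameterTuningJob; extending the earlier sketch (the value 5 is illustrative):

hp_job = aiplatform.HyperparameterTuningJob(
    display_name="my-tuning-job",
    custom_job=trial_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
    },
    max_trial_count=20,
    max_failed_trial_count=5,  # end the job after five failed trials
)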

Manage hyperparameter tuning jobs

The following sections describe how to manage your hyperparameter tuning jobs.

Retrieve information about a hyperparameter tuning job

The following sample demonstrates how to retrieve information about a hyperparameter tuning job by using the gcloud CLI.

Use the gcloud ai hp-tuning-jobs describe command:

gcloud ai hp-tuning-jobs describe ID_OR_NAME \
    --region=LOCATION

Replace the following:

  • ID_OR_NAME: either the name or the numerical ID of the HyperparameterTuningJob. (The ID is the last part of the name.)

    You might have seen the ID or name when you created the HyperparameterTuningJob. If you don't know the ID or name, you can run the gcloud ai hp-tuning-jobs list command (shown after this list) and look for the appropriate resource.

  • LOCATION: the region where the HyperparameterTuningJob was created.
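
For example, to list the hyperparameter tuning jobs in a region:

gcloud ai hp-tuning-jobs list --region=LOCATION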

Cancel a hyperparameter tuning job

The following sample demonstrates how to cancel a hyperparameter tuning job by using the gcloud CLI.

Use the gcloud ai hp-tuning-jobs cancel command:

gcloud ai hp-tuning-jobs cancel ID_OR_NAME \
    --region=LOCATION

Replace the following:

  • ID_OR_NAME: either the name or the numerical ID of the HyperparameterTuningJob. (The ID is the last part of the name.)

    You might have seen the ID or name when you created the HyperparameterTuningJob. If you don't know the ID or name, you can run the gcloud ai hp-tuning-jobs list command and look for the appropriate resource.

  • LOCATION: the region where the HyperparameterTuningJob was created.

Delete a hyperparameter tuning job

The following code samples demonstrate how to delete a hyperparameter tuning job using the Vertex AI SDK for Python and the REST API.
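
With the Vertex AI SDK for Python, a minimal deletion sketch might look like the following; the project, location, and job ID are placeholders:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION")

# Retrieve the existing job by its full resource name, then delete it.
hp_job = aiplatform.HyperparameterTuningJob.get(
    resource_name="projects/PROJECT_ID/locations/LOCATION/hyperparameterTuningJobs/JOB_ID"
)
hp_job.delete()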

To delete a hyperparameter tuning job with the REST API, use the delete method of the hyperparameterTuningJob resource.

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • NAME: The name of the hyperparameter tuning job. The job name uses the following format: projects/{project}/locations/{location}/hyperparameterTuningJobs/{hyperparameterTuningJob}.

HTTP method and URL:

DELETE https://LOCATION-aiplatform.googleapis.com/v1/NAME

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://LOCATION-aiplatform.googleapis.com/v1/NAME"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
    -Method DELETE `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/NAME" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.
