AutoML Reading Note 1

source link: https://xijunlee.github.io/2018/11/25/HPO/


2018-11-25

Although several mature machine learning frameworks, such as Keras and PyTorch, have helped machine learning techniques become widespread, data scientists and machine learning engineers still face the difficulty of choosing hyperparameters for their models. It has been shown empirically that, for a given model, different hyperparameter choices can lead to very different performance (accuracy, recall, etc.). There is therefore a trend toward automating hyperparameter selection for machine learning models, which is part of AutoML.

Recently I have been reading the (still ongoing) book AutoML, whose Chapter 1 introduces the existing methods for hyperparameter optimization of machine learning models and discusses several open problems as well as future research directions of this subfield. I wrote this post after reading that chapter. Note that all references can be found in the book.

What is a hyperparameter?

Hyperparameter optimization has led to new state-of-the-art performance on important machine learning benchmarks in several studies. Besides, automated HPO is clearly more reproducible than manual search. It also facilitates fair comparisons, since different methods can only be compared fairly if they all receive the same level of tuning for the problem at hand.

Problem statement of hyperparameter optimization
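Although the chapter's exact notation may differ, the usual formulation treats HPO as a black-box minimization over the hyperparameter space. A minimal sketch, with \(A_\lambda\) denoting the learning algorithm run with hyperparameters \(\lambda\):

```latex
% Hyperparameter optimization as black-box optimization (a common formulation):
% \Lambda      -- hyperparameter search space
% A_\lambda    -- learning algorithm A run with hyperparameters \lambda
% \mathcal{L}  -- validation loss of the model trained by A_\lambda
\lambda^{*} = \operatorname*{arg\,min}_{\lambda \in \Lambda}
  \mathbb{E}_{(D_{\mathrm{train}},\, D_{\mathrm{valid}})}
  \left[ \mathcal{L}\!\left(A_{\lambda}, D_{\mathrm{train}}, D_{\mathrm{valid}}\right) \right]
```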

Methods to optimize hyperparameters

Black-box hyperparameter optimization

Bayesian optimization is an iterative algorithm built around a probabilistic surrogate model and an acquisition function. In each iteration, the surrogate model is fitted to all observations of the target function made so far. Then the acquisition function, which uses the predictive distribution of the probabilistic model, determines the utility of different candidate points, trading off exploration and exploitation.
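To make the loop concrete, here is a minimal sketch in Python, using a Gaussian process surrogate from scikit-learn and expected improvement as the acquisition function. The function names, the random candidate sampling, and the toy objective are my own illustrative assumptions, not the book's reference implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_y):
    """Acquisition function: expected improvement (for minimization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    gamma = (best_y - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

def bayes_opt(objective, bounds, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    dim = bounds.shape[0]
    # Initial random design to seed the surrogate model
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        # 1. Fit the surrogate to all observations made so far
        gp.fit(X, y)
        # 2. Maximize the acquisition over random candidates
        #    (a cheap stand-in for a proper inner optimization)
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, dim))
        ei = expected_improvement(cand, gp, y.min())
        x_next = cand[np.argmax(ei)]
        # 3. Evaluate the expensive target function and update the data
        y = np.append(y, objective(x_next))
        X = np.vstack([X, x_next])
    return X[np.argmin(y)], y.min()

# Toy usage: "tune" one hyperparameter (a learning rate) of a fake objective
if __name__ == "__main__":
    objective = lambda x: (np.log10(x[0]) + 2.0) ** 2  # minimum at lr = 1e-2
    best_x, best_y = bayes_opt(objective, bounds=np.array([[1e-4, 1.0]]))
    print("best lr:", best_x[0], "loss:", best_y)
```

Random candidate sampling here stands in for the inner optimization of the acquisition function, which real libraries solve more carefully (e.g. with gradient-based local restarts).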

Multi-fidelity optimization

Multi-task Bayesian optimization uses a multi-task Gaussian process to model the performance of related tasks and to automatically learn the tasks' correlation during the optimization process. This method can dynamically switch between cheaper, low-fidelity tasks and the expensive, high-fidelity target task based on a cost-aware information-theoretic acquisition function. In practice, the proposed method starts exploring the configuration space on the cheaper task and only switches to the more expensive configuration space in later parts of the optimization, approximately halving the time required for HPO.
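To illustrate just the cost-aware trade-off (not the actual multi-task Gaussian-process machinery or the information-theoretic acquisition), here is a sketch that picks the next evaluation by expected improvement per unit cost. The task names, costs, and EI values below are made-up assumptions.

```python
import numpy as np

# Hypothetical per-evaluation costs: a cheap low-fidelity task
# (e.g. a data subsample) vs. the expensive high-fidelity target task.
COSTS = {"low_fidelity": 1.0, "high_fidelity": 20.0}

def pick_by_ei_per_cost(ei_by_task):
    """Select the (task, candidate index) that maximizes EI per unit cost.

    `ei_by_task` maps a task name to an array of expected-improvement
    values over the same candidate set; in the real method these would
    come from a joint multi-task Gaussian process.
    """
    best = None
    for task, ei in ei_by_task.items():
        idx = int(np.argmax(ei))
        score = ei[idx] / COSTS[task]
        if best is None or score > best[0]:
            best = (score, task, idx)
    return best[1], best[2]

# Toy usage: early in the search the cheap task often wins the per-cost trade-off.
ei_by_task = {
    "low_fidelity": np.array([0.30, 0.25, 0.10]),
    "high_fidelity": np.array([0.90, 0.80, 0.40]),
}
task, idx = pick_by_ei_per_cost(ei_by_task)
print(task, idx)  # -> low_fidelity 0, since 0.30/1.0 > 0.90/20.0
```

As the surrogate's expected improvement on the cheap task dries up, the per-cost score of the expensive target task eventually dominates, which mirrors the switch to the high-fidelity task described above.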

Applications

In the future

