Neural architecture search (NAS) is one of the most active research areas in machine learning, with hundreds of papers published in the last few years (see this website). In NAS, the goal is to use an algorithm (sometimes even a neural network) to learn the best neural architecture for a given dataset.
The most popular techniques for NAS include reinforcement learning, evolutionary algorithms, Bayesian optimization, and gradient-based methods. Each has its strengths and drawbacks. Bayesian optimization (BayesOpt), for example, is theoretically one of the most promising methods and has been hugely successful for hyperparameter optimization in ML, but it is very challenging to run for NAS in practice. BayesOpt works by building a model of how accuracy varies across the space of neural architectures and then using that model to automatically decide which architecture to try next. See our previous blog post for an introduction to BayesOpt for NAS. Setting up BayesOpt for NAS, however, requires a huge amount of human effort: someone has to hand-craft a distance function between architectures and tune a Gaussian process.
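To see where the hand-crafted pieces come in, here is a minimal sketch of GP-based BayesOpt over a toy discrete search space. Everything in it is an illustrative assumption: architectures are 12-bit vectors, the hand-crafted distance is Hamming distance inside an exponential kernel with a hand-picked length scale, the acquisition function is an upper confidence bound, and the hypothetical `toy_accuracy` function stands in for actually training each network.

```python
import numpy as np

NUM_BITS = 12  # hypothetical: each architecture is a 12-bit on/off encoding
rng = np.random.default_rng(0)


def toy_accuracy(arch):
    # Hypothetical stand-in for "train this network and report validation accuracy".
    weights = np.linspace(-1.0, 1.0, NUM_BITS)
    return 0.9 + 0.01 * float(arch @ weights)


def hamming_kernel(A, B, length_scale=3.0):
    # Hand-crafted similarity: k(a, b) = exp(-hamming(a, b) / length_scale).
    dist = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-dist / length_scale)


def gp_posterior(X_train, y_train, X_test, jitter=1e-6):
    # GP regression with the Hamming kernel on standardized targets.
    mean, std = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mean) / std
    L = np.linalg.cholesky(hamming_kernel(X_train, X_train) + jitter * np.eye(len(X_train)))
    K_s = hamming_kernel(X_test, X_train)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # prior variance k(x, x) = 1
    return K_s @ alpha, np.sqrt(var)  # posterior mean and std, in standardized units


def bayesopt(num_init=5, num_iters=15, pool_size=300, beta=2.0):
    # Toy discrete search space: a fixed pool of random architectures.
    pool = rng.integers(0, 2, size=(pool_size, NUM_BITS)).astype(float)
    evaluated = list(rng.choice(pool_size, size=num_init, replace=False))
    scores = [toy_accuracy(pool[i]) for i in evaluated]
    for _ in range(num_iters):
        remaining = np.setdiff1d(np.arange(pool_size), evaluated)
        mu, sigma = gp_posterior(pool[evaluated], np.array(scores), pool[remaining])
        # Upper confidence bound: favor high predicted accuracy or high uncertainty.
        pick = remaining[np.argmax(mu + beta * sigma)]
        evaluated.append(pick)
        scores.append(toy_accuracy(pool[pick]))
    best = int(np.argmax(scores))
    return pool[evaluated[best]], scores[best]


best_arch, best_acc = bayesopt()
print(best_arch.astype(int), round(best_acc, 4))
```

For real NAS cells, choosing the distance function (along with the length scale, jitter, and acquisition parameters) is exactly the manual work that makes GP-based BayesOpt hard to apply.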
In our new paper, we design BANANAS, a novel NAS algorithm that uses Bayesian optimization with a neural network model instead of a Gaussian process. That is, in every iteration of Bayesian optimization, we train a meta neural network to predict the accuracy of unseen neural architectures in the search space. This removes the problems with BayesOpt for NAS described above: the model is powerful enough to predict neural network accuracies, and there is no need to construct a distance function between neural networks by hand.
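A minimal sketch of that loop follows, under assumptions that are ours rather than the paper's: a 12-bit toy encoding, a hypothetical `toy_accuracy` objective in place of real training, a tiny one-hidden-layer NumPy network as the meta neural network, and a small ensemble of such networks to get an uncertainty estimate. The acquisition step (one independent Gaussian draw per candidate, then pick the highest draw) is a simple Thompson-style rule chosen for illustration.

```python
import numpy as np

NUM_BITS = 12  # hypothetical binary architecture encoding


def toy_accuracy(arch):
    # Hypothetical stand-in for training a network and reading off validation accuracy.
    weights = np.linspace(-1.0, 1.0, NUM_BITS)
    return 0.9 + 0.01 * float(arch @ weights)


def train_mlp(X, y, seed, hidden=32, lr=0.05, epochs=500):
    """Fit a one-hidden-layer ReLU network by full-batch gradient descent.

    Returns a function mapping encodings to predicted (standardized) accuracy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        H = np.maximum(X @ W1 + b1, 0.0)  # hidden ReLU activations
        err = H @ w2 + b2 - y             # residuals
        # Gradients of half the mean squared error (dH uses w2 before its update).
        dH = np.outer(err, w2) * (H > 0)
        w2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Z: np.maximum(Z @ W1 + b1, 0.0) @ w2 + b2


def bananas_style_search(num_init=10, num_iters=10, pool_size=300, ensemble_size=5, seed=0):
    rng = np.random.default_rng(seed)
    pool = rng.integers(0, 2, size=(pool_size, NUM_BITS)).astype(float)  # toy search space
    evaluated = list(rng.choice(pool_size, size=num_init, replace=False))
    scores = [toy_accuracy(pool[i]) for i in evaluated]
    for it in range(num_iters):
        X, y = pool[evaluated], np.array(scores)
        y_mean, y_std = y.mean(), y.std() + 1e-12
        # Retrain an ensemble of meta neural networks on every architecture evaluated so far.
        models = [train_mlp(X, (y - y_mean) / y_std, seed=100 * it + k)
                  for k in range(ensemble_size)]
        remaining = np.setdiff1d(np.arange(pool_size), evaluated)
        preds = np.stack([model(pool[remaining]) for model in models])  # (ensemble, candidates)
        mu, sigma = preds.mean(axis=0), preds.std(axis=0)
        # Thompson-style acquisition (assumption): one independent draw per candidate.
        pick = remaining[np.argmax(rng.normal(mu, sigma))]
        evaluated.append(pick)
        scores.append(toy_accuracy(pool[pick]))
    best = int(np.argmax(scores))
    return pool[evaluated[best]], scores[best]


best_arch, best_acc = bananas_style_search()
print(best_arch.astype(int), round(best_acc, 4))
```

Note that nothing in this loop compares two architectures directly: the meta network only ever sees encodings, which is why no hand-written distance function is needed.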
We encode each neural architecture with a path-based encoding scheme, which drastically improves the predictive accuracy of our meta neural network. The results are promising even with little training data: after training on just 200 random neural architectures, the meta neural network predicts the validation accuracy of unseen architectures to within 1% of their true accuracy on average, across multiple search spaces. BANANAS also uses a novel variant of Thompson sampling as the acquisition function in Bayesian optimization.
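One way to compute a path-based encoding can be sketched as follows, assuming a NASBench-style cell: a DAG whose node 0 is the input, whose last node is the output, and whose intermediate nodes each carry one of three operations. The op names, the five-intermediate-node limit, and the indexing scheme are assumptions of this sketch. The idea is that each possible input-to-output path (a sequence of operations) gets its own slot in a binary feature vector.

```python
import numpy as np

# Assumed op vocabulary and cell size (NASBench-style; adjust for other spaces).
OPS = ["conv1x1-bn-relu", "conv3x3-bn-relu", "maxpool3x3"]
MAX_INTERMEDIATE = 5  # at most 5 intermediate nodes between input and output


def enumerate_paths(adjacency, ops):
    """Return every input-to-output path as a tuple of intermediate op names.

    Assumes node 0 is the input, node len(ops) - 1 is the output, and the
    adjacency matrix is upper-triangular (edges go from lower to higher index).
    """
    out = len(ops) - 1
    paths = []

    def dfs(node, path):
        if node == out:
            paths.append(tuple(path))
            return
        for nxt in range(node + 1, len(ops)):
            if adjacency[node][nxt]:
                # The output node carries no operation, so it adds nothing to the path.
                dfs(nxt, path if nxt == out else path + [ops[nxt]])

    dfs(0, [])
    return paths


def path_index(path):
    # Paths with k ops occupy a block of 3**k slots placed after all shorter paths;
    # within a block, the op sequence is read as a base-3 number.
    offset = sum(len(OPS) ** k for k in range(len(path)))
    code = 0
    for op in path:
        code = code * len(OPS) + OPS.index(op)
    return offset + code


def path_encoding(adjacency, ops):
    # One binary feature per possible path: 1 + 3 + 9 + 27 + 81 + 243 = 364 slots.
    size = sum(len(OPS) ** k for k in range(MAX_INTERMEDIATE + 1))
    encoding = np.zeros(size, dtype=np.int8)
    for path in enumerate_paths(adjacency, ops):
        encoding[path_index(path)] = 1
    return encoding


# Hypothetical 5-node cell: input -> conv3x3 -> conv1x1 -> output, plus input -> maxpool -> output.
adjacency = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
])
ops = ["input", "conv3x3-bn-relu", "maxpool3x3", "conv1x1-bn-relu", "output"]
encoding = path_encoding(adjacency, ops)
print(len(encoding), np.flatnonzero(encoding))  # → 364 [3 7]
```

The vector has one slot per possible path, so its length grows exponentially with the maximum path length; under the cell-size assumptions here it stays at 364 entries, small enough to feed directly into a meta neural network.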
We tested BANANAS on two of the most popular search spaces, NASBench and DARTS. Our algorithm performed better than all other algorithms we tried, including evolutionary search, reinforcement learning, standard BayesOpt, AlphaX, ASHA, and DARTS. The best architecture found by BANANAS achieved a 2.57% test error on CIFAR-10, on par with state-of-the-art NAS algorithms.
Included in the GitHub repository is a Jupyter notebook that lets you easily train a meta neural network on the NASBench dataset. Input your favorite combination of hyperparameters to try to achieve the best prediction accuracy on NASBench!