Search Algorithms¶

Choose how Kubeflow searches for the best hyperparameters.

Overview¶

Different algorithms have different trade-offs:

Algorithm	Best For	Trade-off
Random Search	Quick exploration, simple problems	May miss optimal regions
Bayesian Optimization	Expensive training, small budgets	More overhead per trial
Grid Search	Exhaustive search, few parameters	Doesn’t scale well

Random Search (Default)¶

Randomly samples hyperparameters from the search space:

from kubeflow.optimizer.types import RandomSearch

client.optimize(
    trial_template=template,
    search_space=search_space,
    algorithm=RandomSearch(),
)

When to use:

You have a large search space
Training is relatively fast
You want a simple baseline

Pros: Simple, parallelizes well, surprisingly effective.

Cons: No learning between trials, may miss optimal regions.

Bayesian Optimization¶

Uses a probabilistic model to guide the search:

from kubeflow.optimizer.types import BayesianOptimization

client.optimize(
    trial_template=template,
    search_space=search_space,
    algorithm=BayesianOptimization(),
)

When to use:

Training is expensive (hours per run)
You have a limited compute budget
You want to minimize the number of trials

Pros: Learns from previous trials, converges faster.

Cons: Doesn’t parallelize as well, more complex.

Grid Search¶

Exhaustively tries all combinations:

from kubeflow.optimizer.types import GridSearch

# Grid search works best with discrete choices
search_space = {
    "learning_rate": Search.choice([0.001, 0.01, 0.1]),
    "batch_size": Search.choice([32, 64]),
}
# This will try all 6 combinations

client.optimize(
    trial_template=template,
    search_space=search_space,
    algorithm=GridSearch(),
)

When to use:

You have very few hyperparameters
You need to try all combinations
Search space is already discrete

Pros: Exhaustive, guaranteed to find best in the grid.

Cons: Exponential scaling, impractical for many parameters.

Controlling the Search¶

Limit number of trials:

from kubeflow.optimizer.types import TrialConfig

client.optimize(
    trial_template=template,
    search_space=search_space,
    trial_config=TrialConfig(max_trials=20),  # Stop after 20 trials
)

Run trials in parallel:

client.optimize(
    trial_template=template,
    search_space=search_space,
    trial_config=TrialConfig(
        max_trials=50,
        parallel_trials=5,  # Run 5 at a time
    ),
)

Algorithm Recommendations¶

Scenario	Recommended Algorithm
First exploration of a new model	Random Search with 20-50 trials
Training takes hours per run	Bayesian Optimization
Only 2-3 hyperparameters to tune	Grid Search
Large compute budget available	Random Search with many parallel trials
Need to find good config quickly	Bayesian Optimization with early stopping