Tutorial: Katib Hyperparameter Tuning

Example 1: TensorFlow

To complete this tutorial:

  1. If you have not done so already, download the Kubeflow tutorials zip file, which contains sample files for all of the included Kubeflow tutorials.
  2. Deploy the example file:

    kubectl apply -f tensorflow-example.yaml
  3. Open the Kubeflow UI and navigate to Home > View Katib experiments.
  4. Click the experiment name, and then observe the running trials.
  5. Check the experiment status:

    kubectl get experiment
  6. Check the experiment trials:

    kubectl get trial

Example 2: Random Algorithm

This example may take some time to finish, depending on the resources allocated.

The following hyperparameters can be tuned:

  • --lr - Learning rate
  • --num-layers - Number of layers in the neural network
  • --optimizer - Optimizer

To launch an experiment using the random algorithm example:

  1. If you have not done so already, download the Kubeflow tutorials zip file, which contains sample files for all of the included Kubeflow tutorials.
  2. Deploy the example file:

    kubectl apply -f random-example.yaml

This example embeds the hyperparameters as command-line arguments. You can pass hyperparameters in another way (for example, as environment variables) by editing the template defined in the TrialTemplate.GoTemplate.RawTemplate section of the YAML file. The template uses the Go template format.
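
To make the environment-variable alternative more concrete, here is a minimal sketch of a helper that a trial container's entrypoint script might use to turn environment variables back into the same flags. The variable names LR, NUM_LAYERS, and OPTIMIZER and the function name hyperparameter_args are assumptions chosen for illustration; they are not defined by Katib or by the example file.

```shell
# Hypothetical entrypoint helper: build training flags from environment
# variables. LR, NUM_LAYERS and OPTIMIZER are assumed names for illustration.
hyperparameter_args() {
  # Refuse to continue if any of the expected variables is missing or empty.
  if [ -z "${LR:-}" ] || [ -z "${NUM_LAYERS:-}" ] || [ -z "${OPTIMIZER:-}" ]; then
    echo "LR, NUM_LAYERS and OPTIMIZER must all be set" >&2
    return 1
  fi
  # Emit the flags in the same form the argument-based example uses.
  printf '%s\n' "--lr=${LR} --num-layers=${NUM_LAYERS} --optimizer=${OPTIMIZER}"
}
```

With LR=0.02, NUM_LAYERS=3, and OPTIMIZER=adam set, the helper prints --lr=0.02 --num-layers=3 --optimizer=adam, which the entrypoint could then append to the training command.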

This example randomly generates the following hyperparameters:

  • --lr - Learning rate (type: double).
  • --num-layers - Number of layers in the neural network (type: integer).
  • --optimizer - Optimizer (type: categorical).
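
The idea of random generation across the three parameter types can be sketched locally as follows. This is only an illustration of drawing one value per hyperparameter, not the Katib implementation; the value ranges, the optimizer names, and the function name sample_hyperparameters are assumptions, and the real search space is whatever the example YAML file declares.

```shell
# Sketch of random search: draw one value per hyperparameter.
# The ranges and optimizer names below are assumed for illustration only.
sample_hyperparameters() {
  # $1 is an optional seed, so that a draw can be reproduced.
  awk -v seed="${1:-$(date +%s)}" 'BEGIN {
    srand(seed)
    lr = 0.01 + rand() * 0.02                    # double, roughly 0.01 to 0.03
    layers = 2 + int(rand() * 4)                 # integer, 2 to 5
    n = split("sgd adam ftrl", optimizers, " ")
    optimizer = optimizers[1 + int(rand() * n)]  # categorical, one of the list
    printf "--lr=%.4f --num-layers=%d --optimizer=%s\n", lr, layers, optimizer
  }'
}
```

Each call prints one line of flags in the form --lr=... --num-layers=... --optimizer=..., which corresponds to the arguments of a single trial. Passing the same seed twice yields the same line.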

Check the experiment status:

    kubectl describe experiment random-example

Example 3: PyTorch

This example may take some time to finish, depending on the resources allocated.

  1. If you have not done so already, download the Kubeflow tutorials zip file, which contains sample files for all of the included Kubeflow tutorials.
  2. Deploy the example file:

    kubectl apply -f pytorch-example.yaml
  3. Open the Kubeflow UI and navigate to Home > View Katib experiments.
  4. Click the experiment name, and then observe the running trials.
  5. Check the experiment status:

    kubectl get experiment
  6. Check the experiment trials:

    kubectl get trial

Clean Up

Delete the examples with the following commands:
  • Random algorithm example:
    kubectl delete -f random-example.yaml
  • TensorFlow example:
    kubectl delete -f tensorflow-example.yaml
  • PyTorch example:
    kubectl delete -f pytorch-example.yaml

Sample Katib Commands

To check experiment results using the kubectl CLI:

  • List experiments:

    kubectl get experiment
    NAME                STATUS      AGE
    random-experiment   Succeeded   25m
  • Check the experiment result (use a name from the list of experiments):

    kubectl get experiment random-example -o yaml
  • List trials:

    kubectl get trials
    NAME                         STATUS      AGE
    random-experiment-24lgqghm   Succeeded   26m
  • Check trial details:

    kubectl get trials random-experiment-24lgqghm -o yaml
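
When an experiment has many trials, it can help to summarize the table output rather than read it line by line. The following sketch defines two small filters that read kubectl get output from standard input. They assume the default table layout shown above (NAME, STATUS, AGE columns); the function names status_of and count_by_status are hypothetical helpers, not kubectl or Katib features.

```shell
# Print the STATUS column for one named resource.
# Reads `kubectl get experiment` or `kubectl get trials` table output on stdin.
status_of() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Count resources per status, skipping the header row and blank lines.
# Reads the same table output on stdin.
count_by_status() {
  awk '$1 != "NAME" && NF >= 2 { count[$2]++ }
       END { for (s in count) print s, count[s] }' | sort
}
```

For example, kubectl get trials | count_by_status prints one line per status with the number of trials in that status, and kubectl get experiment | status_of random-experiment prints the status of the named experiment.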

To check the status using the interface:

  1. Go to the Kubeflow home page.
  2. Click the View Katib experiments button.
  3. Click the name of the experiment.
  4. After all of the trials have succeeded, observe the generated experiment graph.