Tutorial: Training with TensorFlow (Financial Series)

Prerequisites:
  • An Internet connection is required to download the dependencies needed for this tutorial. This tutorial cannot be completed in air-gapped environments.
  • If you have not already done so, download the Kubeflow tutorials zip file before beginning this tutorial. The zip file contains sample files for all of the included Kubeflow tutorials.

The following tutorial is based on the example at https://github.com/mapr/kubeflow-examples/tree/master/financial_time_series.

Step 1: Mount the Volume for Storing a Model

  1. Log in to the KubeDirector notebook as an LDAP user.
  2. Extract pvc-tf-training-fin-series.yaml from the Kubeflow tutorials zip file.
  3. Upload pvc-tf-training-fin-series.yaml, which defines the Persistent Volume Claim (PVC), to the KubeDirector notebook.
  4. Open either the web terminal in the HPE Ezmeral Runtime Enterprise UI or a Terminal session within the KubeDirector notebook.
    NOTE
    By default, you cannot execute kubectl commands in a newly created KubeDirector notebook. To work with kubectl, use one of the following methods:
    • Through the HPE Ezmeral Runtime Enterprise UI:
      1. In the HPE Ezmeral Runtime Enterprise UI, navigate to the Tenant section and initialize a web terminal with the corresponding button.
      2. Start a new Terminal session inside the KubeDirector notebook. Check that the files in your KubeDirector notebook have file permissions that allow you to work with them.
      3. Move all files you want to work with to the following path:
        /bd-fs-mnt/TenantShare
      4. You can now access the files inside the web terminal with kubectl.
    • From inside the KubeDirector notebook:
      1. To authorize your user inside the KubeDirector notebook, execute the following Jupyter code cell:
        # Configure kubectl for your user; you are prompted for your password.
        from ezmllib.kubeconfig.ezkubeconfig import set_kubeconfig
        set_kubeconfig()
      2. A prompt appears below the code cell you executed. Enter your user password in the prompt.
      3. kubectl is now enabled for your KubeDirector notebook. Start a Terminal session in the KubeDirector notebook to work with kubectl.
  5. Apply the .yaml file to create the PVC:

    kubectl apply -f pvc-tf-training-fin-series.yaml
  6. Verify that the PVC was created and is in the Bound state:

    kubectl get pvc

    The results should look like this:

    NAME    STATUS   VOLUME                                         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pvctf   Bound    mapr-pv-edeb3067-0332-44cf-88d8-a44be8c39f7c   10Gi       RWX            default        21m
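The PVC check above can also be scripted from a notebook cell. The following minimal Python sketch assumes kubectl is already enabled in the notebook and that the claim is named pvctf, as in the sample output (the actual name is set in pvc-tf-training-fin-series.yaml). It parses the JSON form of kubectl get pvc instead of the printed table. The helper names get_pvc_list, pvc_phases, and is_bound are hypothetical and are not part of ezmllib or Kubeflow.

```python
import json
import subprocess


def get_pvc_list(namespace=None) -> dict:
    """Run `kubectl get pvc -o json` and return the parsed List object.

    Raises subprocess.CalledProcessError if kubectl exits with an error.
    """
    cmd = ["kubectl", "get", "pvc", "-o", "json"]
    if namespace:
        cmd += ["-n", namespace]  # otherwise kubectl uses the current namespace
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def pvc_phases(pvc_list: dict) -> dict:
    """Map each PVC name to its status.phase (for example "Bound" or "Pending")."""
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in pvc_list.get("items", [])
    }


def is_bound(pvc_list: dict, name: str = "pvctf") -> bool:
    """True if the named PVC exists and its phase is Bound (name is an assumption)."""
    return pvc_phases(pvc_list).get(name) == "Bound"
```

In a notebook cell, is_bound(get_pvc_list()) should return True once the claim shows Bound. Reading status.phase from JSON avoids depending on the table's column widths, which change with the length of names.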

Step 2: Exploration Phase

To complete the exploration phase:

  1. Log in to the KubeDirector notebook.
  2. Perform the following:
    1. Upload FinancialTimeSerieswithFinanceData.ipynb.
    2. If your environment is behind a proxy, open the uploaded notebook, add a cell above the first step, and insert the following lines. Replace YOUR_PROXY with your proxy URL and YOUR_NO_PROXY_LIST with a comma-separated list of hosts that should bypass the proxy:
      %env https_proxy=YOUR_PROXY
      %env http_proxy=YOUR_PROXY
      %env no_proxy=YOUR_NO_PROXY_LIST

      %env HTTPS_PROXY=YOUR_PROXY
      %env HTTP_PROXY=YOUR_PROXY
      %env NO_PROXY=YOUR_NO_PROXY_LIST
    3. Walk through the notebook step by step to better understand the problem and the suggested solutions.
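The %env magic sets environment variables in the notebook's Python kernel. If you prefer plain Python, for example in a script rather than a notebook, the following sketch sets the same six variables through os.environ. The function name set_proxy_env and the example values are placeholders. The no_proxy value is treated as a comma-separated list of hosts that bypass the proxy, which is the common convention for these variables.

```python
import os
from typing import MutableMapping, Optional


def set_proxy_env(proxy: str, no_proxy: str,
                  env: Optional[MutableMapping[str, str]] = None) -> None:
    """Set the proxy variables in both lower and upper case, like the %env cell.

    proxy:    proxy URL, e.g. "http://proxy.example.com:8080" (placeholder)
    no_proxy: comma-separated hosts that bypass the proxy (placeholder)
    env:      mapping to update; defaults to os.environ of this process
    """
    target = os.environ if env is None else env
    values = {"https_proxy": proxy, "http_proxy": proxy, "no_proxy": no_proxy}
    for name, value in values.items():
        target[name] = value          # lower-case form, e.g. https_proxy
        target[name.upper()] = value  # upper-case form, e.g. HTTPS_PROXY
```

Calling set_proxy_env("http://proxy.example.com:8080", "localhost,127.0.0.1") in the first cell has the same effect on the kernel as the six %env lines; both spellings are set because some tools read only one of them.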

Step 3: Training Phase

To complete the training phase:

  1. Upload financial-series-tfjob.yaml to the KubeDirector notebook and apply it:
    kubectl apply -f financial-series-tfjob.yaml
  2. Verify that the TensorFlow job (TFJob) was created successfully:
    kubectl get tfjobs
    NAME          STATE     AGE
    trainingjob   Created   2m47s
  3. Verify that the pods are created, start running, and then complete:
    kubectl get pods | grep trainingjob
    trainingjob-ps-0            0/1     Completed   0     5m39s
    trainingjob-worker-0        0/1     Completed   0     5m39s
  4. Check the logs to follow the training process:
    kubectl logs trainingjob-ps-0
    The output should appear as follows:
    … 
    INFO:tensorflow:SavedModel written to: b'model/1/saved_model.pb'
    INFO:tensorflow:SavedModel written to: b'model/1/saved_model.pb'
    INFO:root:copy files to /data/model/1
    5000 0.5607639
    10000 0.5755208
    15000 0.5946181
    20000 0.6145833
    25000 0.6302083
    30000 0.6449653
    Precision =  0.9142857142857143
    Recall =  0.2222222222222222
    F1 Score =  0.35754189944134074
    Accuracy =  0.6006944444444444
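To check the reported numbers, the following sketch parses the log excerpt shown above and recomputes the F1 score as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The tutorial does not label the two-column lines, so the sketch assumes the first column is a training step and the second an accuracy value. The names LOG, parse_metrics, f1_score, and f1_is_consistent are hypothetical.

```python
import math
import re

# Log excerpt copied from the output above. Assumed format: "<step> <accuracy>"
# progress lines followed by "<Name> = <value>" summary lines.
LOG = """\
INFO:root:copy files to /data/model/1
5000 0.5607639
10000 0.5755208
15000 0.5946181
20000 0.6145833
25000 0.6302083
30000 0.6449653
Precision =  0.9142857142857143
Recall =  0.2222222222222222
F1 Score =  0.35754189944134074
Accuracy =  0.6006944444444444
"""

PROGRESS_LINE = re.compile(r"(\d+) ([0-9.]+)")
METRIC_LINE = re.compile(r"([A-Za-z0-9 ]+?)\s*=\s*([0-9.]+)")


def parse_metrics(log: str):
    """Return ([(step, accuracy), ...], {metric_name: value}) from the log text."""
    progress, metrics = [], {}
    for raw in log.splitlines():
        line = raw.strip()
        m = PROGRESS_LINE.fullmatch(line)
        if m:
            progress.append((int(m.group(1)), float(m.group(2))))
            continue
        m = METRIC_LINE.fullmatch(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
        # Any other line (INFO messages and so on) is ignored.
    return progress, metrics


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)


def f1_is_consistent(metrics: dict, rel_tol: float = 1e-9) -> bool:
    """Check the logged F1 Score against the logged Precision and Recall."""
    expected = f1_score(metrics["Precision"], metrics["Recall"])
    return math.isclose(expected, metrics["F1 Score"], rel_tol=rel_tol)
```

With the values above, the recomputed F1 matches the logged F1 Score to within floating-point rounding. The gap between precision (about 0.91) and recall (about 0.22) means the model misses most positive cases, but the positive predictions it does make are mostly correct.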

Step 4: Clean Up the Namespace

To clean up the namespace:
  1. Delete the job and its pods with the following command:
    kubectl delete tfjob trainingjob
  2. Delete the PVC:
    kubectl delete -f pvc-tf-training-fin-series.yaml
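The two cleanup commands can also be run from a notebook cell. This sketch issues them in the same order as the steps above and has a dry-run mode that only prints them. The resource names come from this tutorial; clean_up and CLEANUP_COMMANDS are hypothetical names, and the runner parameter exists only so that subprocess.run can be swapped out when testing.

```python
import subprocess

# Commands from Step 4, in order: the TFJob first, then the PVC.
CLEANUP_COMMANDS = [
    ["kubectl", "delete", "tfjob", "trainingjob"],
    ["kubectl", "delete", "-f", "pvc-tf-training-fin-series.yaml"],
]


def clean_up(dry_run: bool = True, runner=subprocess.run) -> list:
    """Run the Step 4 commands in order, or only print them if dry_run is True.

    With dry_run=False, each command is run with check=True, so a failing
    kubectl call raises an exception and stops the remaining steps.
    """
    executed = []
    for cmd in CLEANUP_COMMANDS:
        if dry_run:
            print(" ".join(cmd))
        else:
            runner(cmd, check=True)
        executed.append(cmd)
    return executed
```

Call clean_up() first to review the commands, then clean_up(dry_run=False) to delete the resources.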