Notebooks

Describes how to identify and debug issues with notebooks.

The Default User Jupyter Notebook Cannot Connect to Kubeflow

When you try to connect to your default user notebook, the Kubeflow UI returns the following message:
Couldn't find any information for the status of this notebook

This occurs when a username starts with a number, such as 3user, because notebooks cannot have names that start with a number.

When a user is added to HPE Ezmeral Unified Analytics Software, the system automatically creates a default notebook for the user and assigns the notebook a name in the following format:

<username>-notebook

If the username starts with a number, such as 3user, the default user notebook name also starts with a number (3user-notebook), which is not supported. When this occurs, Kubeflow does not recognize the notebook because of its name and cannot connect to it.

Workaround

Use either of the following options to resolve the issue:
Option 1
Create a new notebook with the same image and configurations. Make sure that the notebook name consists only of lowercase alphanumeric characters and, optionally, dashes (-), and that it starts with a letter (a-z). For example, you can name a notebook my-notebook-1, but you cannot name a notebook 1-my-notebook.
Option 2
Ask your HPE Ezmeral Unified Analytics Software admin to delete the user account and then create a new one with a username that adheres to the Username Attribute naming requirements, as described in AD/LDAP Servers.
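
The naming rule above can be checked before you create a notebook or request a new username. The following is a minimal sketch, not part of the product: the function names is_valid_notebook_name and default_notebook_name are hypothetical, and the pattern encodes only the rules stated in this section (lowercase alphanumeric characters and dashes, starting with a letter). The platform may enforce additional constraints that are not covered here.

```shell
# Hypothetical helper: returns 0 if the name follows the rules stated above
# (starts with a-z, then only a-z, 0-9, or dashes), and 1 otherwise.
is_valid_notebook_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]*$'
}

# Hypothetical helper: builds the default notebook name for a username,
# using the <username>-notebook format described above.
default_notebook_name() {
  printf '%s-notebook\n' "$1"
}

# Example: report whether the default notebook for a username would be accepted.
check_username() {
  if is_valid_notebook_name "$(default_notebook_name "$1")"; then
    echo "ok"
  else
    echo "unsupported"
  fi
}
```

For example, check_username 3user reports unsupported because 3user-notebook starts with a number, while is_valid_notebook_name my-notebook-1 succeeds.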

“No healthy upstream” Error in Notebook Server Connection

When you connect to the notebook server, you may get the "no healthy upstream" error message because the notebook pod is unhealthy. To identify the cause, check the pod logs and events, either in the Kubeflow UI or manually with kubectl commands.

Using Kubeflow UI
To access pod logs and events and to check the container status from the Kubeflow UI, follow these steps:
  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. Click the Tools & Frameworks icon on the left navigation bar.
  3. Navigate to the Kubeflow tile under the Data Science tab and click Open.
  4. In the Kubeflow Central Dashboard UI, click Notebooks on the left navigation bar.
  5. Click <your-unhealthy-notebook-name> to view the notebook details.
  6. To check the current status of the container, click the OVERVIEW tab and look for the Conditions section, which shows the current status of the container.
  7. To access pod logs, click the LOGS tab.
  8. To access pod events, click the EVENTS tab.

Using kubectl Commands
To access pod logs and events and to check the container status from the command line, follow these steps:
  • To get pod events and container statuses, run:
    kubectl describe pod -n <user-ns> <notebook-name>-0
    Output:
    Name:             temp-0
    Namespace:        hpedemo-user01
    
    .........
    
      temp:
        Container ID:
        Image:          gcr.io/mapr-252711/kubeflow/notebooks/jupyter-tensorflow-full:ezaf-v1.8.0
        Image ID:
        Port:           8888/TCP
        Host Port:      0/TCP
        State:          Waiting
          Reason:       PodInitializing
        Ready:          False
        Restart Count:  0
    
    .......
    
    Events:
      Type     Reason                  Age   From                     Message
      ----     ------                  ----  ----                     -------
      Warning  FailedScheduling        48s   default-scheduler        0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
      Warning  FailedScheduling        46s   default-scheduler        0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
      Normal   Scheduled               44s   default-scheduler        Successfully assigned hpedemo-user01/temp-0 to mip-bd-dev04.mip.storage.hpecorp.net
      Normal   SuccessfulAttachVolume  44s   attachdetach-controller  AttachVolume.Attach succeeded for volume "mapr-pv-bd0db07c-4e43-4e78-8503-7f61649a7bd0"
      Normal   Pulling                 35s   kubelet                  Pulling image "marketplace.us1.greenlake-hpe.com/ezua/istio/proxyv2:1.16.2"
      Normal   Pulled                  34s   kubelet                  Successfully pulled image "marketplace.us1.greenlake-hpe.com/ezua/istio/proxyv2:1.16.2" in 1.127945155s (1.127954107s including waiting)
      Normal   Created                 34s   kubelet                  Created container istio-validation
      Normal   Started                 34s   kubelet                  Started container istio-validation
      Normal   Pulling                 33s   kubelet                  Pulling image "marketplace.us1.greenlake-hpe.com/ezua/istio/proxyv2:1.16.2"
      Normal   Pulled                  29s   kubelet                  Successfully pulled image "marketplace.us1.greenlake-hpe.com/ezua/istio/proxyv2:1.16.2" in 4.611252056s (4.611259156s including waiting)
      Normal   Created                 29s   kubelet                  Created container istio-proxy
      Normal   Started                 28s   kubelet                  Started container istio-proxy
      Normal   Pulling                 27s   kubelet                  Pulling image "gcr.io/mapr-252711/kubeflow/notebooks/jupyter-tensorflow-full:ezaf-v1.8.0"
  • To get pod logs, run:
    kubectl logs -n <user-ns> <notebook-name>-0

Result:

You can now identify the issue by checking pod logs, events, and the current status of the container.
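
When the describe output is long, it can help to narrow the view. The following sketch wraps a few standard kubectl commands in shell functions. The function names are hypothetical, the first argument stands for <user-ns>, and the second for <notebook-name> (the pod name is <notebook-name>-0, as in the commands above). It assumes that your account is allowed to read events and PersistentVolumeClaims in the namespace.

```shell
# Hypothetical helpers. Usage: <function> <user-ns> <notebook-name> [container]

# List only the events that involve the notebook pod, oldest first.
notebook_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2-0" \
    --sort-by=.metadata.creationTimestamp
}

# Print the logs of a single container in the pod, for example istio-proxy.
notebook_container_logs() {
  kubectl logs -n "$1" "$2-0" -c "$3"
}

# List the PersistentVolumeClaims in the namespace. This is useful when the
# events report "pod has unbound immediate PersistentVolumeClaims".
notebook_pvcs() {
  kubectl get pvc -n "$1"
}
```

For the pod in the sample output, notebook_events hpedemo-user01 temp lists the scheduling and image pull events, and notebook_container_logs hpedemo-user01 temp istio-proxy shows the logs of the sidecar container only.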

Memory Accumulation and Unreleased Memory in Jupyter Notebooks

Memory consumption keeps increasing as Jupyter notebooks are run. Even after you close a notebook, its memory is not released, which leads to a gradual accumulation of objects in memory with each notebook run. Eventually, memory reaches its limit, the notebook server becomes unusable, and you must launch a new notebook server.

To release the memory, follow these steps to shut down the kernels of notebooks that are no longer in use:

  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. Click the Notebooks icon on the left navigation bar of the HPE Ezmeral Unified Analytics Software screen.
  3. Connect to the notebook server.
  4. Open the notebook you want to close.
  5. Click File in the menu bar.
  6. Select Close and Shutdown Notebook.
  7. Repeat the process for any other notebooks that are no longer in use.

Result:

By closing the notebooks with the Close and Shutdown Notebook option, you ensure that the associated kernel is properly shut down, which releases the memory it was using. This prevents the accumulation of objects in memory and keeps the notebook server usable for longer periods.
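
To confirm that memory is released after you shut down kernels, you can compare the memory usage of the notebook pod before and after. The following is a sketch that is not part of the procedure above. It assumes that you have kubectl access to the user namespace and that the Kubernetes Metrics API is available in the cluster, which kubectl top requires. The function name notebook_memory is hypothetical.

```shell
# Hypothetical helper. Usage: notebook_memory <user-ns> <notebook-name>
# Shows current CPU and memory usage for each container in the notebook pod.
# Assumes the Metrics API (for example, metrics-server) is available.
notebook_memory() {
  kubectl top pod -n "$1" "$2-0" --containers
}
```

Run the function once before and once after shutting down the kernels. A lower memory value for the notebook container suggests that the kernels were released.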

Specified Image Pull Policy Not Applied to a Pod

When you create a notebook server and set the imagePullPolicy to IfNotPresent or Never, the specified image pull policy is not applied to the pod. In both cases, the imagePullPolicy of the pod is set to Always.

To verify that the specified image pull policy is not applied to a pod, follow these steps:
  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. Click the Notebooks icon on the left navigation bar of the HPE Ezmeral Unified Analytics Software screen.
  3. Click New Notebook Server. The Kubeflow Notebooks UI opens.
  4. Enter the name of the notebook server.
  5. Click Custom Notebook.
  6. Click Advanced Options.
  7. Set Image pull policy to IfNotPresent.
  8. To launch the notebook server, click Launch.
  9. After creating the notebook server, click <your-notebook-name> to view the notebook details.
  10. Click the YAML tab.
  11. Select Show the full YAML of the Pod.
  12. Locate the imagePullPolicy property for the image used in creating the notebook.

Result:

The imagePullPolicy is set to Always.
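
The same check can be done from the command line. The following sketch prints the imagePullPolicy of every container in the notebook pod by using a kubectl JSONPath query. The function name notebook_pull_policies is hypothetical, and the pod name <notebook-name>-0 follows the convention used in the kubectl commands earlier in this topic.

```shell
# Hypothetical helper. Usage: notebook_pull_policies <user-ns> <notebook-name>
# Prints one line per container: the container name, a tab, and its
# imagePullPolicy as recorded in the pod specification.
notebook_pull_policies() {
  kubectl get pod -n "$1" "$2-0" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.imagePullPolicy}{"\n"}{end}'
}
```

The output includes sidecar containers such as istio-proxy, so look for the line whose container name matches the notebook.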