Making Prediction Calls With Deployed Models

This topic describes the requirements and provides a template for making a prediction call to a deployed model.

Prerequisites

Required access rights: Project Administrator or Project Member

About this task

This task is part of the process to put a model into production. After the model has been developed and registered in the HPE Ezmeral Runtime Enterprise model registry, you deploy the model, which enables prediction calls to be sent to this model.

Procedure

  1. Get an auth token for the prediction call.

    To make a prediction on the deployed model, a user token is required. The procedure to get the token depends on whether you are using the graphical user interface (GUI) or the command line.

    Using the GUI:

    1. Navigate to the project in the new UI, as described in HPE Ezmeral Runtime Enterprise new UI.
    2. Select View All on the Deployed Models panel. The Deployed Models screen opens.

      Deployed Models screen
    3. Select Copy Auth Token. The Copy Auth Token menu appears.

      Copy Auth Token menu
    4. Enter your password and select Copy.
    Using the command line:
    1. Record the URL of the Kubeflow Dashboard for use in a later step. To display the URL, in a tenant view, select the Dashboard tab.
    2. Enter the following kubectl command:
      kubectl get svc kftoken-svc -n prism-ns -o yaml
    3. In the output, the annotation for hpecp-internal-gateway/10001 provides the URL to use to obtain the auth token. Record that URL.
    4. To get the auth token, enter the following command, substituting your own values for <variable> items:
      curl --location --request POST 'http://<auth-token-provider-url>/token' \
      --header 'Content-Type: application/json' \
      --data-raw '{"kubeflow_dashboard": "<dashboard-url>", "user": "<username>", "password": "<password>"}'

    The token expires after 24 hours by default. After the token expires, existing processes continue to run, but subsequent requests are returned with a 403 error, and you must obtain a new token.
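    The token request above can also be scripted. The following is a minimal Python sketch, using only the standard library, that builds the same JSON body the curl command sends; the dashboard URL, username, and password shown are hypothetical placeholders, and the POST itself is left commented because the token-provider URL comes from your own cluster.

    ```python
    import json

    def build_token_request(dashboard_url: str, user: str, password: str) -> str:
        """Build the JSON body for the auth-token POST shown above."""
        return json.dumps({
            "kubeflow_dashboard": dashboard_url,
            "user": user,
            "password": password,
        })

    # Hypothetical example values:
    body = build_token_request("https://dashboard.example.com", "alice", "s3cret")

    # To send the request (substitute the URL recorded in the previous step):
    #
    # import urllib.request
    # req = urllib.request.Request(
    #     "http://<auth-token-provider-url>/token",
    #     data=body.encode("utf-8"),
    #     headers={"Content-Type": "application/json"},
    # )
    # token = urllib.request.urlopen(req).read().decode("utf-8")
    ```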

  2. Retrieve the model endpoint by selecting Actions > Copy Model Endpoint for a running model.
  3. Make a prediction call.

    Using the model endpoint, an auth token, and data, make a prediction call on the deployed model.

    For example:
    curl --cookie "authservice_session=xxxxxxxxxxxxxxxxxxxxxxxxxxx" \
    -X POST -H 'Content-Type: application/json' -d '{"data": {"ndarray":["This film has great actors"]}}' \
    http://localhost:8003/seldon/seldon/movie/api/v1.0/predictions
    A general template for a prediction call using curl is as follows:
    curl --cookie "authservice_session=<auth-token>" -X POST \
    -H 'Content-Type: application/json' -d <data> <access-point>
    The <data> in this call is a JSON representation of an array or a dataframe. For example, the following represents an array:
    -d '{"data": {"ndarray":[[39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]]}}' 
    When the data contains a names field, a dataframe is assumed. Whether the prediction call sends a dataframe or an array depends on the input requirements of the model artifact.
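    The two payload shapes can be sketched as follows. This is a minimal Python illustration of the JSON bodies described above; the column names ("age", "workclass") are hypothetical and must match whatever your model artifact expects.

    ```python
    import json

    def ndarray_payload(rows):
        """Array-style payload: the model receives a bare ndarray."""
        return {"data": {"ndarray": rows}}

    def dataframe_payload(names, rows):
        """Dataframe-style payload: the 'names' field labels each
        column, so the input is treated as a dataframe."""
        return {"data": {"names": names, "ndarray": rows}}

    # Array input, matching the -d example above:
    array_body = json.dumps(
        ndarray_payload([[39, 7, 1, 1, 1, 1, 4, 1, 2174, 0, 40, 9]]))

    # Dataframe input (hypothetical column names):
    df_body = json.dumps(dataframe_payload(["age", "workclass"], [[39, 7]]))
    ```

    Either body is passed as the <data> argument in the curl template above.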

    For more information about the Seldon prediction call, see External Prediction API in the official Seldon Core documentation (link opens an external site in a new browser tab or window).