Running Experiments With MLflow

MLflow is an open-source platform for managing the machine learning lifecycle.

The following commands are performed by the acme.joe user with account ID c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc.

Create a data to hold the trial metrics and artifacts written by MLflow as part of your experiment runs:

$ eai data new mlflow_data

Prepare the worker image that the experiment will launch for each configuration. As part of this step, adapt your code to use MLflow for logging metrics, parameters, and artifacts.

import random
import mlflow


if __name__ == '__main__':
    with mlflow.start_run() as run:
        # The experiment agent sets the trial's hyperparameters on the run.
        print("Parameters:")
        for k, v in run.data.params.items():
            print(f"- {k}={v}")

        # Main training loop: log a snapshot of the metrics at each step.
        for i in range(1, 10):
            print(i)
            metrics = {
                'loss': random.random() * 10 * (1 / i),
                'f1': i + random.random() * 10 * (1 / i)
            }
            mlflow.log_metrics(metrics, step=i)

Next, create a custom Dockerfile that includes (at least) Python and MLflow.

FROM python:3.6
RUN pip install mlflow==1.6.0
COPY main.py .
CMD python main.py

Build, tag and push the worker image:

$ export ACCOUNT_ID=c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc
$ export IMAGE=registry.console.elementai.com/$ACCOUNT_ID/experiment-example
$ docker build -t $IMAGE .
$ docker push $IMAGE

Create an experiment configuration defining the hyperparameters you want to tune as well as the trial worker’s job spec. In the following example we use grid search.

{
    "title": "EAI Toolkit Experiment",
    "worker_config": {
        "image": "registry.console.elementai.com/c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc/experiment-example",
        "resources": {
            "mem": 1,
            "cpu": 1.25
        }
    },
    "parameters": {
        "name": "GridSearch",
        "config": {
            "dimensions": {
                "hidden_dim": [
                    1
                ],
                "lr": [
                    1e-3, 1e-4
                ],
                "embedding_dim": [
                    300
                ],
                "batch_size": [
                    32
                ]
            }
        }
    }
}
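
Grid search expands every combination of the dimension values into one trial. The sketch below illustrates that expansion (the logic here is illustrative, not the agent's actual implementation):

```python
from itertools import product

# Dimensions copied from the grid-search configuration above.
dimensions = {
    "hidden_dim": [1],
    "lr": [1e-3, 1e-4],
    "embedding_dim": [300],
    "batch_size": [32],
}

# Each trial is one point in the Cartesian product of all dimension values.
keys = sorted(dimensions)
trials = [dict(zip(keys, values))
          for values in product(*(dimensions[k] for k in keys))]

for trial in trials:
    print(trial)
```

With the values above, the grid yields two trials, one per learning rate.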

Place the above experiment configuration in a file in a new directory and create a new data from it:

$ mkdir -p grid_search_experiment
$ $EDITOR grid_search_experiment/eai-experiment.json
$ eai data new grid_search_experiment ./grid_search_experiment

If you later want to tweak the hyperparameters, edit the file and reupload:

$ $EDITOR grid_search_experiment/eai-experiment.json
$ eai data upload acme.joe.grid_search_experiment ./grid_search_experiment

Create a file named eai-experiment-agent.yaml with a job spec for launching the experiment.

image: registry.console.elementai.com/shared.image/eai-experiment-agent:latest
isProcessAgent: true
preemptable: false
restartable: false

resources:
  mem: 4
  cpu: 2

environmentVars:
  # Assuming that the acme.joe.mlflow_data data has ID c512e62b-c103-453f-9921-313e2ac38fac
  - EAI_EXPERIMENT_ARTIFACT_DATA=c512e62b-c103-453f-9921-313e2ac38fac@latest:/mlflow_data
  - EAI_EXPERIMENT_CONFIG_PATH=/experiment/eai-experiment.json
  - MLFLOW_TRACKING_URI=/mlflow_data

# Mount both the data volume containing the configuration and the one for metrics
data:
  - acme.joe.mlflow_data@latest:/mlflow_data
  - acme.joe.grid_search_experiment@latest:/experiment
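
Both the environmentVars and data entries above use the same "name-or-id@revision:mount-point" format. The helper below is a hypothetical parser written only to illustrate the format; the field names are assumptions, not the agent's actual data model:

```python
# Illustrative parser for the "<name-or-id>@<revision>:<mount point>" data
# specification format; the returned field names are assumptions.
def parse_data_spec(spec: str) -> dict:
    source, _, mount_point = spec.partition(":")
    name_or_id, _, revision = source.partition("@")
    return {"data": name_or_id, "revision": revision, "mount": mount_point}

print(parse_data_spec("acme.joe.mlflow_data@latest:/mlflow_data"))
```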

Finally, launch the experiment:

$ eai job submit --file eai-experiment-agent.yaml --env EAI_EXPERIMENT_ID=experiment_0
$ eai job logs -f

If you later want to launch another experiment based on the above eai-experiment-agent.yaml configuration file, make sure to also change the value of the EAI_EXPERIMENT_ID environment variable passed via the --env EAI_EXPERIMENT_ID=... argument.

Note that a current limitation of the system requires the worker_config image to be specified with the account ID rather than the fully qualified account name. For example, use:

registry.console.elementai.com/c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc/experiment-example

Instead of:

registry.console.elementai.com/acme.joe/experiment-example

Also note that any data you specify in your worker_config must also be specified in eai-experiment-agent.yaml, although the actual mount points can differ. Furthermore, all data specified in the experiment's worker_config must use the data ID so that the system can validate access permissions. For example, in the eai-experiment.json configuration file:

{
    "title": "...",
    "worker_config": {
        "image": "...",
        "resources": {
            "...": "..."
        },
        "data": [
            "c6df2044-6a61-4cba-9aa9-de58462fc517@version:/mount/point"
        ]
    },
    "parameters": {
        "...": "..."
    }
}

And then in eai-experiment-agent.yaml:

# ...
data:
  - acme.joe.mlflow_data@latest:/mlflow_data
  - acme.joe.grid_search_experiment@latest:/experiment
  # Any additional data mounted in the experiment's worker_config
  - c6df2044-6a61-4cba-9aa9-de58462fc517@version:/mnt/c6df2044-6a61-4cba-9aa9-de58462fc517

If you want to view the data using the MLflow web interface, you can launch a job with the shared eai-experiment-mlflow image:

$ eai job submit --image registry.console.elementai.com/shared.image/eai-experiment-mlflow:latest \
    --data acme.joe.mlflow_data:/mlflow_data \
    --env MLFLOW_TRACKING_URI=/mlflow_data

Once the job is running, use its job ID to connect to the web interface in your browser.

Note: Control-C will not stop the server. To terminate the server, you need to kill the job itself:

$ eai job kill f52f7cd3-174f-48e8-a30a-8913e63cd68a