Running Experiments With mlflow
MLflow is an open source platform for managing the machine learning lifecycle.
The following commands are performed by the acme.joe user with account ID c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc.
Create a data to hold the trial metrics and run artifacts written by mlflow as part of your experiments:
$ eai data new mlflow_data
Prepare the worker image that the experiment will launch for each configuration. As part of this step, adapt your code to use mlflow for logging metrics, parameters, and artifacts.
import random

import mlflow

if __name__ == '__main__':
    with mlflow.start_run() as run:
        print("Parameters:")
        for k, v in run.data.params.items():
            print(f"- {k}={v}")

        # main loop
        for i in range(1, 10):
            print(i)
            metrics = {
                'loss': random.random() * 10 * (1 / i),
                'f1': i + random.random() * 10 * (1 / i)
            }
            mlflow.log_metrics(metrics, i)
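Note that MLflow stores run parameters as strings, so a worker that reads them back via run.data.params typically converts them before use. A minimal helper sketch (the coerce_params name is hypothetical, not part of mlflow):

```python
def coerce_params(params):
    """Convert MLflow's string-valued params into int/float where possible."""
    out = {}
    for k, v in params.items():
        for cast in (int, float):
            try:
                out[k] = cast(v)
                break
            except ValueError:
                continue
        else:
            out[k] = v  # leave non-numeric values as strings
    return out

print(coerce_params({"lr": "0.001", "batch_size": "32", "optimizer": "adam"}))
# {'lr': 0.001, 'batch_size': 32, 'optimizer': 'adam'}
```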
Next, create a custom Dockerfile that includes (at least) Python and mlflow.
FROM python:3.6
RUN pip install mlflow==1.6.0
COPY main.py .
CMD python main.py
Build, tag and push the worker image:
$ export ACCOUNT_ID=c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc
$ export IMAGE=registry.console.elementai.com/$ACCOUNT_ID/experiment-example
$ docker build -t $IMAGE .
$ docker push $IMAGE
Create an experiment configuration defining the hyperparameters you want to tune as well as the trial worker’s job spec. In the following example we use grid search.
{
  "title": "EAI Toolkit Experiment",
  "worker_config": {
    "image": "registry.console.elementai.com/c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc/experiment-example",
    "resources": {
      "mem": 1,
      "cpu": 1.25
    }
  },
  "parameters": {
    "name": "GridSearch",
    "config": {
      "dimensions": {
        "hidden_dim": [1],
        "lr": [1e-3, 1e-4],
        "embedding_dim": [300],
        "batch_size": [32]
      }
    }
  }
}
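For intuition, grid search enumerates the Cartesian product of the dimensions above, so each trial receives one combination of values. A minimal sketch of that expansion (the agent's actual implementation may differ):

```python
import itertools

# Dimensions copied from the experiment configuration above
dimensions = {
    "hidden_dim": [1],
    "lr": [1e-3, 1e-4],
    "embedding_dim": [300],
    "batch_size": [32],
}

names = sorted(dimensions)
trials = [
    dict(zip(names, combo))
    for combo in itertools.product(*(dimensions[n] for n in names))
]
print(len(trials))  # 2 trials: one per learning rate
for t in trials:
    print(t)
```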
Place the above experiment in a file in a new directory and create a new data based on it:
$ mkdir -p grid_search_experiment
$ $EDITOR grid_search_experiment/eai-experiment.json
$ eai data new grid_search_experiment ./grid_search_experiment
If you later want to tweak the hyperparameters, edit the file and reupload:
$ $EDITOR grid_search_experiment/eai-experiment.json
$ eai data upload acme.joe.grid_search_experiment ./grid_search_experiment
Create a file named eai-experiment-agent.yaml with a job spec for launching the experiment.
image: registry.console.elementai.com/shared.image/eai-experiment-agent:latest
isProcessAgent: true
preemptable: false
restartable: false
resources:
  mem: 4
  cpu: 2
environmentVars:
  # Assuming that the acme.joe.mlflow_data data has ID c512e62b-c103-453f-9921-313e2ac38fac
  - EAI_EXPERIMENT_ARTIFACT_DATA=c512e62b-c103-453f-9921-313e2ac38fac@latest:/mlflow_data
  - EAI_EXPERIMENT_CONFIG_PATH=/experiment/eai-experiment.json
  - MLFLOW_TRACKING_URI=/mlflow_data
# Mount both the data volume containing the configuration and the one for metrics
data:
  - acme.joe.mlflow_data@latest:/mlflow_data
  - acme.joe.grid_search_experiment@latest:/experiment
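The data entries and the EAI_EXPERIMENT_ARTIFACT_DATA value above all follow the same reference@revision:/mount/point shape. A small illustrative parser (not part of the toolkit, shown only to document the format):

```python
def parse_data_spec(spec):
    """Split 'reference@revision:/mount/point' into its three parts."""
    ref_and_rev, _, mount = spec.partition(":")
    ref, _, rev = ref_and_rev.partition("@")
    return ref, rev, mount

print(parse_data_spec("acme.joe.mlflow_data@latest:/mlflow_data"))
# ('acme.joe.mlflow_data', 'latest', '/mlflow_data')
```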
Finally, launch the experiment:
$ eai job submit --file eai-experiment-agent.yaml --env EAI_EXPERIMENT_ID=experiment_0
$ eai job logs -f
If you later want to launch another experiment based on the above eai-experiment-agent.yaml configuration file, make sure to also change the value of the EAI_EXPERIMENT_ID environment variable passed via the --env EAI_EXPERIMENT_ID=... argument.
Note: current limitations of the system require that the worker_config image be specified with the account ID rather than the fully qualified account name. For example, use:
registry.console.elementai.com/c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc/experiment-example
instead of:
registry.console.elementai.com/acme.joe/experiment-example
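A quick sanity-check sketch for this rule, assuming account IDs are standard UUIDs (the uses_account_id helper is hypothetical, not a toolkit function):

```python
import re

# Image references look like: registry.console.elementai.com/<account>/<image>
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def uses_account_id(image_ref):
    """Return True if the account segment of the image reference is a UUID."""
    parts = image_ref.split("/")
    return len(parts) >= 3 and bool(UUID_RE.match(parts[1]))

print(uses_account_id(
    "registry.console.elementai.com/c6b6fbc6-a135-4afa-b2f7-e9d8f52195bc/experiment-example"
))  # True
print(uses_account_id(
    "registry.console.elementai.com/acme.joe/experiment-example"
))  # False
```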
Also note that any data you specify in your worker_config must also be specified in eai-experiment-agent.yaml, although the actual mount points can be different. Furthermore, all data specified in the experiment's worker_config must use the data ID in order for the system to validate access permissions. For example, in the eai-experiment.json configuration file:
{
  "title": "...",
  "worker_config": {
    "image": "...",
    "resources": {
      "...": "..."
    },
    "data": [
      "c6df2044-6a61-4cba-9aa9-de58462fc517@version:/mount/point"
    ]
  },
  "parameters": {
    "...": "..."
  }
}
And then in eai-experiment-agent.yaml:
# ...
data:
  - acme.joe.mlflow_data@latest:/mlflow_data
  - acme.joe.grid_search_experiment@latest:/experiment
  # Any additional data mounted in the experiment's worker_config
  - c6df2044-6a61-4cba-9aa9-de58462fc517@version:/mnt/c6df2044-6a61-4cba-9aa9-de58462fc517
If you want to view the logged results in the mlflow web interface, launch a job with the shared mlflow image:
$ eai job submit --image registry.console.elementai.com/shared.image/eai-experiment-mlflow:latest \
--data acme.joe.mlflow_data:/mlflow_data \
--env MLFLOW_TRACKING_URI=/mlflow_data
Once the job is running, use its job ID to connect to the web interface in your browser.
Note: Control-C will not stop the server. To terminate the server, you need to kill the job itself:
$ eai job kill f52f7cd3-174f-48e8-a30a-8913e63cd68a