Label Data

Running Label Studio to label data

Label Studio is an open source data labelling tool. You can run it locally on your machine, or deploy it with Docker on-premises, on your own infrastructure, or on a public cloud service where you can configure a virtual private cloud. It supports many data formats and many types of labelling tasks. Machine learning models can also be integrated with Label Studio to provide label predictions or to support active learning, for example with BaaL.

A Toolkit job can deploy Label Studio by referencing the latest Label Studio Docker image. Label Studio can be configured in an air-gapped environment so that no data leaves the infrastructure. See the Label Studio docs for more information about security. Note that even if security features are disabled, the usual Toolkit security remains in effect when Label Studio is deployed as a Toolkit job.

The following commands are run as the acme.joe user. Submit a job with the following:

# Placeholders to replace before running:
#   ${FILES_DATA}    data object (UUID or full name) containing data already on Toolkit to be labelled, if applicable
#   ${STORAGE_DATA}  data object (UUID or full name) that stores the Label Studio sqlite database, uploaded data and any exported labels
#   ${ACCOUNT}       account (UUID or full name) where the job will be launched
#   ${JOBNAME}       job name
#   ${EMAIL} and ${PASSWORD}  email and password for the initial login to Label Studio
# The LABEL_STUDIO_LOCAL_FILES_* variables configure source cloud storage of local data files (if already on Toolkit).
$ eai job new --no-header --fields id \
  --image heartexlabs/label-studio:latest \
  --data ${FILES_DATA}:/label-studio/files \
  --data ${STORAGE_DATA}:/label-studio/data \
  --account ${ACCOUNT} \
  --name ${JOBNAME} \
  --env LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
  --env LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/files \
  --env LABEL_STUDIO_USERNAME=${EMAIL} \
  --env LABEL_STUDIO_PASSWORD=${PASSWORD}
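
The job command above depends on several placeholders, and an empty one produces a confusing failure only after submission. A minimal sketch of a pre-flight check follows. The function name `require_vars` is hypothetical and not part of the eai CLI; it only inspects shell variables and does not contact Toolkit.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Names are passed as plain strings and must be trusted, since eval is used
# to look up each value in a POSIX-compatible way.
require_vars() {
  _missing=0
  for _name in "$@"; do
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "missing required variable: ${_name}" >&2
      _missing=1
    fi
  done
  return "${_missing}"
}

# Usage: run the check, then submit the job only if it passes.
# require_vars ACCOUNT JOBNAME EMAIL PASSWORD || exit 1
```

The check reports all missing names in one pass rather than stopping at the first, so a single run is enough to fix the environment.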

Once the job is running, use its job ID to open the Label Studio web interface in your browser.

You need to be logged in with your Toolkit credentials to access the job URL. When prompted, use the login credentials specified at job launch to access the Label Studio interface.

Because the Label Studio database is stored on the mounted data object, no labelling progress is lost if the Toolkit job times out. Restart the job to continue labelling:

$ eai job retry f52f7cd3-174f-48e8-a30a-8913e63cd68a

If you want to end the Label Studio service, you need to kill the job itself:

$ eai job kill f52f7cd3-174f-48e8-a30a-8913e63cd68a
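
In a script, the retry and kill commands act on whatever ID they are given, so it can help to check the value first. A minimal sketch, assuming job IDs always have the lowercase 8-4-4-4-12 hexadecimal shape of the example ID above (the source does not state this as a guarantee). The names `is_job_id` and `JOB_ID` are hypothetical.

```shell
# Hypothetical helper: succeed only if $1 looks like the example job ID,
# i.e. five groups of lowercase hex digits of length 8-4-4-4-12.
is_job_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Usage: capture the ID printed by `eai job new --no-header --fields id ...`
# into JOB_ID, then guard the follow-up command:
# is_job_id "${JOB_ID}" && eai job kill "${JOB_ID}"
```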

Running Data Labeller to label data

Data Labeller (formerly known as Libellule) is an application made by ElementAI, native to the Toolkit, and built specifically to label data. It supports multiple labelling tasks, mostly image annotation for computer vision. Although it is currently unmaintained and unsupported by the Toolkit team, it is still reported to work.

View the documentation to learn how to launch a Data Labeller job from the CLI. Alternatively, a job can be quickly launched from the Toolkit console.

Note that the data to be labelled should already be available on Toolkit; otherwise, you will need to upload it first. If you launch the job from the console, the task types available in the drop-down menu are currently restricted to vision tasks. Additionally, the Toolkit data object that stores the data to be labelled must be tagged before it can be selected from the drop-down menu. To tag the data object, open a terminal and run:

$ eai proxy

Open a second terminal and run the following, replacing ${DATA} with the data UUID or full name:

$ curl localhost:8080/v1/data/${DATA} -X PUT -d '{"tags": [{"key": "data-labeller-assets", "value": "image"}]}'
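
The request body in the tagging command is easy to mistype by hand. The sketch below builds the same JSON from a key and a value. The function name `tag_payload` is hypothetical, and the helper does no JSON escaping, so it is only suitable for simple values such as the ones shown in this section.

```shell
# Hypothetical helper: print the JSON body for a single tag.
# $1 is the tag key, $2 is the tag value. No escaping is performed.
tag_payload() {
  printf '{"tags": [{"key": "%s", "value": "%s"}]}\n' "$1" "$2"
}

# Usage (assumes `eai proxy` is running in another terminal and DATA holds
# the data UUID or full name):
# curl "localhost:8080/v1/data/${DATA}" -X PUT -d "$(tag_payload data-labeller-assets image)"
```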