How to integrate your model training with Google Vertex AI

Labelbox provides code to leverage Vertex AI and other Google Cloud services for running model training jobs. You can customize a model training pipeline based on the Labelbox reference implementation. You can run ETL jobs, train and deploy models, and track model performance, all from a single service.

The code deploys a service called the Coordinator to Google Cloud. It exposes a REST API for launching various pipelines. The Coordinator only has to be deployed once and can then be controlled from the Labelbox web app (WIP). The custom model training pipeline is designed to extend easily to custom jobs and pipelines.

We support the following models with no additional configuration required:

Supported model types

  • Image Radio Classification
  • Image Checklist / Dropdown Classification
  • Image Bounding Box Detection
  • Text Radio Classification
  • Text Checklist / Dropdown Classification
  • Text Named Entity Recognition

We've compiled key steps and requirements needed to successfully leverage Vertex AI and other Google Cloud Services for running model training jobs.

Short video tutorials show how to set up the integration, and the GitHub repo has more detailed instructions to follow along with.

What are the requirements?

Step 1: Create a service account in the Google Cloud UI. This account must have the following permissions:

  • Basic editor permissions
  • Secret manager admin

Step 2: Download the private key for the service account. You can put this anywhere on your computer.

  • Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the private key:
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/model-training-credentials.json
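
Before moving on, it can help to confirm that the variable really points at a usable key. The following is a minimal sketch, not part of the Labelbox code: check_credentials_file is a hypothetical helper, and it assumes the key was downloaded in the JSON format, which contains a "private_key" field.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Labelbox code): sanity-check a key file path.
check_credentials_file() {
    key_path="$1"
    if [ -z "$key_path" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
        return 1
    fi
    if [ ! -r "$key_path" ]; then
        echo "Key file is missing or unreadable: $key_path" >&2
        return 1
    fi
    # Assumption: the key is a JSON service account key with a "private_key" field.
    if ! grep -q '"private_key"' "$key_path"; then
        echo "File does not look like a service account key: $key_path" >&2
        return 1
    fi
    echo "Credentials file looks usable: $key_path"
}
```

A typical call would be check_credentials_file "${GOOGLE_APPLICATION_CREDENTIALS:-}". The check only looks at the file's shape; it does not verify that Google accepts the key.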

Step 3: Make sure Docker and docker-compose are installed.

Step 4: Make sure the gcloud CLI is installed and configured for the proper service account.

  • Run the following to install:
curl https://sdk.cloud.google.com | bash
  • Run the following to load env vars:
source ~/.<bash_profile/zshrc/bashrc>
  • Log in with gcloud auth login. You can also log in as the service account directly with:
gcloud auth activate-service-account SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com --key-file=$GOOGLE_APPLICATION_CREDENTIALS
  • Set the correct Google Cloud project with:
gcloud config set project PROJECT_ID
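
To confirm that the project setting took effect, one option is to read it back with gcloud config get-value project. The sketch below wraps that in a small function. check_gcloud_project is a hypothetical name, and the expected project is whatever you used in place of PROJECT_ID.

```shell
#!/bin/sh
# Hypothetical helper: compare the active gcloud project with the expected one.
check_gcloud_project() {
    expected="$1"
    # gcloud prints the configured project on stdout; notices go to stderr.
    active="$(gcloud config get-value project 2>/dev/null)"
    if [ "$active" = "$expected" ]; then
        echo "gcloud is using project $active"
    else
        echo "gcloud project is '$active', expected '$expected'" >&2
        return 1
    fi
}
```

For example, check_gcloud_project PROJECT_ID returns a non-zero status when gcloud is still pointed at a different project.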

Step 5: Connect Docker to GCR by running:

gcloud auth configure-docker
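
Steps 3 through 5 assume that docker, docker-compose, and gcloud are all on your PATH. A quick preflight check could look like the sketch below. missing_tools is a hypothetical helper and only tests whether each command can be found, not whether it is configured correctly.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

Running missing_tools docker docker-compose gcloud prints nothing and returns zero when everything is installed.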

How to deploy the service

Step 1: Create a .env file to keep track of the following env vars (copy .env example to get started):


  • This is the name that all of the Google resources will use. This enables multiple deployments (e.g., prod-training-service or dev-training-service).


  • The GCS bucket will store all of the artifacts. If the bucket doesn't exist, it will automatically be created.


  • This refers to your Google Cloud project name.


  • This can be anything. You will have to use the same secret when making a request to the service.


  • This is the path to the application credentials.


  • This is the Google service account. It will have the following format: <name>@<project>.iam.gserviceaccount.com.


  • This is created in the Labelbox app under 'Workspace' settings.
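
A mistyped service account address is an easy mistake to make here. As a rough illustration, the format given above, <name>@<project>.iam.gserviceaccount.com, can be checked with a shell pattern. is_service_account_email is a hypothetical helper, and the pattern is deliberately loose: it does not validate which characters Google allows in the name or project.

```shell
#!/bin/sh
# Hypothetical helper: loose shape check for <name>@<project>.iam.gserviceaccount.com
is_service_account_email() {
    case "$1" in
        # At least one character before "@" and before ".iam.gserviceaccount.com".
        ?*@?*.iam.gserviceaccount.com) return 0 ;;
        *) return 1 ;;
    esac
}
```

The function returns zero for a value such as trainer@my-project.iam.gserviceaccount.com (a made-up example) and non-zero for an ordinary email address.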

Step 2: Once the .env file has the correct values, you can load the env vars by running source .env

  • If you update the .env file, make sure to re-run source .env to load the latest env vars.
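
One detail worth knowing: source .env sets variables in the current shell, but if the file uses plain KEY=value lines without export, scripts started from that shell will not inherit them. The sketch below shows one way to handle that, assuming such a file. load_env and require_vars are hypothetical helpers, and the variable names you pass to require_vars should be the ones from your own .env file.

```shell
#!/bin/sh
# Hypothetical helper: source an env file with auto-export switched on.
load_env() {
    env_file="${1:-.env}"
    # Without a slash, "." may search PATH instead of the current directory.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ ! -f "$env_file" ]; then
        echo "No such env file: $env_file" >&2
        return 1
    fi
    set -a          # export every variable assigned from here on
    . "$env_file"   # POSIX spelling of source
    set +a          # stop auto-exporting
}

# Hypothetical helper: report each named variable that is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing env var: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, load_env .env followed by require_vars GOOGLE_APPLICATION_CREDENTIALS would stop early if the credentials path was left blank.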

Step 3: Deploy the service

  • To the cloud: ./deployment/deploy.sh
  • Locally: ./run.sh

Step 4: Test that it is running with:

  • curl http://ip/ping
  • ip will be for a local deployment and the remote ip will be printed to the console when you run the deployment script.
  • The server will respond with pong if the deployment was successful.
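
A freshly deployed service may take a moment to start answering, so a single curl can fail even when the deployment is fine. The following sketch retries the ping a few times. wait_for_pong is a hypothetical helper, and it assumes the response body is exactly the string pong.

```shell
#!/bin/sh
# Hypothetical helper: poll <base_url>/ping until the body is "pong"
# or the attempts run out.
# Usage: wait_for_pong BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_pong() {
    base_url="$1"
    attempts="${2:-5}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors; -sS: no progress bar, but keep error messages.
        body="$(curl -fsS --max-time 5 "$base_url/ping" 2>/dev/null || true)"
        if [ "$body" = "pong" ]; then
            echo "Service is up at $base_url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "No pong from $base_url after $attempts attempts" >&2
    return 1
}
```

Call it with the address of your deployment, for example wait_for_pong "http://ip" 10 3, where ip is the value described above.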

How to test the integration

Step 1: Visit the Labelbox Models tab

Step 2: Create a model and a model run with a flat, single-type ontology (bounding box, NER, or classification).

Step 3: Navigate to 'Settings' in your model run and click 'Model training'.

  • Input your remote IP from the output of the deployment script and your chosen service secret.

Step 4: Save the IP and service secret you entered.

Step 5: Click 'Train model' and select a 'job type'

Once you select the desired ML task, Labelbox will help train your model and pull the inferences back in to provide model metrics, allowing you to iterate quickly on your model.

For more detailed troubleshooting instructions, refer to the GitHub repo or reach out to our support team. You can also refer to our documentation for an overview of the model training integration.