Deploy a pre-trained TensorFlow.js model using Node in Cloud Run

Serving TensorFlow.js' Toxicity Detection model using Google Cloud.

Like a broken record, I often find myself repeating the phrase "a deployed machine learning model is a happy model." While a bit inaccurate (models don't have feelings yet), the idea behind it (and this is my opinion) is that an ML model shines when it is deployed and accessible to others.

This article shows how to deploy a pre-trained toxicity detection model in TensorFlow.js using Node.js and Google Cloud's Cloud Run. But before getting there, let me describe the tools we will use.

About TensorFlow.js

TensorFlow.js is TensorFlow's JavaScript counterpart, a library for training, executing, and deploying machine learning models. Because it is a JavaScript library, we can run it not just in the browser but also in a backend application using Node.js, which is the example we'll see today.

TensorFlow.js comes with several pre-trained models that function out of the box. These models support a range of use cases, including object detection, image classification, word embedding, and the one we will use, the text toxicity detection model.

About the Text Toxicity Detection model

The toxicity detection model is a pre-trained model that detects six types of toxic content, plus an overall toxicity class, in a string. The six types are identity attack, insult, obscene, severe toxicity, sexually explicit, and threat. For example, according to the model, the line "you suck" is an insult and toxic overall.

Toxic. Photo by Beth Jnr on Unsplash.

About Google Cloud's Cloud Run

Cloud Run is Google Cloud's managed serverless platform for running scalable containerized applications. Its major feature, I would say, is how it simplifies getting a container into production (they describe it as "container to production in seconds") and how it automatically manages the underlying infrastructure.

The service is not free. However, Google Cloud's free tier offers 180,000 vCPU-seconds, 2 million requests, and 1 GiB of networking (within North America) per month, which is more than enough for a prototype project. Another cost to consider is the Container Registry storage for the Docker image; prices start at $0.020 per GB per month.
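To make "more than enough" a bit more concrete, here is a rough back-of-the-envelope sketch. It rests on my own simplifying assumptions, not on Cloud Run's billing rules: each request is billed for its full latency on the given number of vCPUs, requests never overlap, and memory charges and billing granularity are ignored. The function name and the example traffic numbers are hypothetical.

```javascript
// Free-tier figures quoted in this article; check current pricing before relying on them.
const FREE_VCPU_SECONDS = 180000;
const FREE_REQUESTS = 2000000;

// Hypothetical estimator. Assumes every request is billed for its full latency,
// requests never run concurrently, and ignores memory and billing granularity.
function estimateFreeTierUsage(requestsPerMonth, avgLatencySeconds, vcpus = 1) {
  const vcpuSeconds = requestsPerMonth * avgLatencySeconds * vcpus;
  return {
    vcpuSeconds,
    vcpuShare: vcpuSeconds / FREE_VCPU_SECONDS, // fraction of the CPU allowance
    requestShare: requestsPerMonth / FREE_REQUESTS, // fraction of the request allowance
    withinFreeTier:
      vcpuSeconds <= FREE_VCPU_SECONDS && requestsPerMonth <= FREE_REQUESTS,
  };
}

// Example: 100,000 requests per month, 0.5 seconds each, on 1 vCPU
console.log(estimateFreeTierUsage(100000, 0.5));
```

Under those assumptions, 100,000 requests at 0.5 seconds each use 50,000 vCPU-seconds, roughly 28% of the CPU allowance.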

Building the application

The application we will build is a web service on Node.js that serves the toxicity detection model. Exposing the model will allow the user to make predictions via a POST call to the service. After writing the app, we will deploy it on Cloud Run.

Writing the service

Let's start the tutorial by installing Node.js on our machines; you can find the instructions at https://nodejs.org/en/download/. Once it's installed, create a working directory at your preferred location. Then, in that location, create a package.json file (a document containing metadata about the project) and copy the following into it:

{
    "name": "tfjs-node-toxicity-server",
    "version": "0.0.1",
    "description": "A server that exposes a Toxicity detection model",
    "scripts": {
        "serve": "node index.js"
    },
    "dependencies": {
        "@tensorflow/tfjs-node": "^2.7.0",
        "@tensorflow-models/toxicity": "^1.2.2",
        "express": "^4.17.1"
    }
}

The first three keys are the project's name, version, and description. Following them is the scripts property, a dictionary of commands. In our case, we have the key serve, whose value is node index.js, meaning that executing npm run serve will run node index.js; you can see it as an alias. Here, the aliased command is already shorter than npm run serve, but with a longer command you would appreciate having the shortcut. Last, we have the project's dependencies: @tensorflow/tfjs-node (the Node.js variant of TensorFlow.js), @tensorflow-models/toxicity (the toxicity model), and express, a framework for making web applications. After defining the file, run npm i to install the dependencies.

Next, create a new file named index.js, where we will write the service. The code below is the complete program:

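const tf = require('@tensorflow/tfjs-node'); // registers the native Node.js backend
const toxicity = require('@tensorflow-models/toxicity');
const express = require('express');

const threshold = 0.9; // minimum prediction confidence
const port = process.env.PORT || 8080; // Cloud Run provides PORT

async function runServer() {
  // Download and load the pre-trained toxicity model
  const model = await toxicity.load(threshold);
  const app = express();

  // Parse JSON request bodies into req.body
  app.use(express.json());

  app.post('/prediction', (req, res) => {
    model.classify([req.body.sentence])
      .then((predictions) => {
        // Send the response to the user
        res.json({
          predictions,
        });
      })
      .catch((err) => {
        // Report failures instead of leaving the rejection unhandled,
        // which terminates the process on Node.js 15 and later
        console.error(err);
        res.status(500).json({ error: 'Prediction failed' });
      });
  });

  app.listen(port, () => {
    console.log(`Listening on port ${port}`);
  });
}

runServer().catch((err) => {
  // Exit if the model cannot be loaded
  console.error(err);
  process.exit(1);
});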

The first three lines load the required libraries. Then, we define two constants: threshold, the model's minimum prediction confidence, and port, the port on which we listen for requests, whose value is that of the environment variable PORT or, if it is unset, 8080. Why are we reading PORT? Because Cloud Run tells the container which port to listen on through the PORT environment variable (8080 by default), so the deployed service must use it.

After the variables, we have the program's main and only function, runServer(). At its start, we load the toxicity model (with only one line!), passing threshold as the argument. Following it, we create the web server with express and call app.use(express.json()) to parse incoming JSON request bodies (see the Express documentation for express.json()). Below this line, we create the service's only endpoint, a POST route at /prediction, and its handler function.

Inside the handler function, we call the model's classify() method with a list containing the request's sentence property; you can name the property however you want, just make sure that your request JSON uses the same name. Now, you might ask: why is the argument a list and not a string? (I wondered the same.) That's because the model can predict the toxicity of more than one sentence at a time, but for simplicity, I'm using one. Speaking of simplicity, note that we are not doing any form of pre-check of the string. In a real or production setting, we should at least make sure the field (sentence, here) exists and that its value is not an empty string (unless you consider that something toxic :)). Since classify() returns a Promise, we chain the method then() with a callback function that receives the predictions object and sends it to the user.
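As a sketch of that pre-check, here is a hypothetical helper, extractSentences() (my own name, not part of the original service or of the toxicity library). It rejects missing, non-string, or blank sentences and, as an extra assumption, also accepts a sentences array so a single request can use classify()'s support for several inputs.

```javascript
// Hypothetical request validator for the /prediction endpoint.
// Returns { sentences } on success or { error } describing the problem.
function extractSentences(body) {
  // typeof null is 'object', so check for null explicitly
  if (body === null || typeof body !== 'object') {
    return { error: 'Request body must be a JSON object' };
  }
  // Assumption: a `sentences` array is accepted in addition to `sentence`
  const raw = Array.isArray(body.sentences) ? body.sentences : [body.sentence];
  if (raw.length === 0) {
    return { error: 'No sentences provided' };
  }
  for (const s of raw) {
    // Covers a missing field (undefined), non-strings, and blank strings
    if (typeof s !== 'string' || s.trim() === '') {
      return { error: 'Each sentence must be a non-empty string' };
    }
  }
  return { sentences: raw };
}

// Possible use inside the /prediction handler:
// const { sentences, error } = extractSentences(req.body);
// if (error) return res.status(400).json({ error });
// model.classify(sentences).then((predictions) => res.json({ predictions }));
```

Returning 400 for bad input keeps malformed requests from ever reaching the model.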

The JSON returned by the service has a key predictions, a list with one entry per toxicity type. Each entry contains the label and a results list with one element per input sentence. Each element has probabilities, where "0" is the probability that the sentence does not have the label and "1" the probability that it does, and match, which is true if the label's probability exceeds the threshold, false if the opposite probability does, and null if neither does. Below is the model's output after predicting with the phrase "you are awful."

{
    "predictions": [
        {
            "label": "identity_attack",
            "results": [
                {
                    "probabilities": {
                        "0": 0.9944738745689392,
                        "1": 0.0055261640809476376
                    },
                    "match": false
                }
            ]
        },
        {
            "label": "insult",
            "results": [
                {
                    "probabilities": {
                        "0": 0.01695447973906994,
                        "1": 0.9830455780029297
                    },
                    "match": true
                }
            ]
        },
        {
            "label": "obscene",
            "results": [
                {
                    "probabilities": {
                        "0": 0.9975994229316711,
                        "1": 0.0024006012827157974
                    },
                    "match": false
                }
            ]
        },
        {
            "label": "severe_toxicity",
            "results": [
                {
                    "probabilities": {
                        "0": 0.9999971389770508,
                        "1": 0.0000028739373192365747
                    },
                    "match": false
                }
            ]
        },
        {
            "label": "sexual_explicit",
            "results": [
                {
                    "probabilities": {
                        "0": 0.9994100332260132,
                        "1": 0.0005900039686821401
                    },
                    "match": false
                }
            ]
        },
        {
            "label": "threat",
            "results": [
                {
                    "probabilities": {
                        "0": 0.9982189536094666,
                        "1": 0.0017810455756261945
                    },
                    "match": false
                }
            ]
        },
        {
            "label": "toxicity",
            "results": [
                {
                    "probabilities": {
                        "0": 0.015485585667192936,
                        "1": 0.9845144152641296
                    },
                    "match": true
                }
            ]
        }
    ]
}
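Most of that output concerns labels that did not match, so a client may only care about the labels whose match is true. The hypothetical helper below, matchedLabels(), sketches that reduction. It assumes the structure shown above, where each label has one results entry per input sentence, and it treats a null match (neither probability above the threshold) as no match.

```javascript
// Hypothetical helper: for each input sentence, list the labels whose match is true.
function matchedLabels(predictions) {
  // Every label carries one result per input sentence
  const sentenceCount = predictions.length > 0 ? predictions[0].results.length : 0;
  const perSentence = Array.from({ length: sentenceCount }, () => []);
  for (const { label, results } of predictions) {
    results.forEach((result, i) => {
      // false and null are both treated as "not matched"
      if (result.match === true) {
        perSentence[i].push(label);
      }
    });
  }
  return perSentence;
}

// For the "you are awful" response above:
// matchedLabels(response.predictions) returns [['insult', 'toxicity']]
```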

Back in the runServer() function, after app.post(), we have the method app.listen() to listen on the established port. Then, at the very end of the script, we call runServer(). And that's the service!

To test it, open a terminal in the project's directory and execute npm run serve to launch the service. The server only starts listening after the model has been downloaded and loaded, so wait for the "Listening on port 8080" message, and then send a prediction request with the following cURL command (make sure the port matches the one in the code):

curl -X POST "http://localhost:8080/prediction" -H "accept: application/json" -H "Content-Type: application/json" -d '{"sentence":"you are awful"}'

The response should look like the JSON above.
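If you'd rather test from JavaScript than with cURL, here is a minimal client sketch. It assumes Node.js 18 or later, where fetch() is available globally, and a server running on localhost:8080; buildPredictionRequest() and predict() are names I made up for illustration.

```javascript
// Builds the same request the cURL command sends
function buildPredictionRequest(sentence) {
  return {
    method: 'POST',
    headers: {
      accept: 'application/json',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ sentence }),
  };
}

// Sends the request with the global fetch() (Node.js 18+ assumed)
async function predict(sentence, baseUrl = 'http://localhost:8080') {
  const response = await fetch(`${baseUrl}/prediction`, buildPredictionRequest(sentence));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Example:
// predict('you are awful').then((data) => console.log(JSON.stringify(data, null, 2)));
```

Passing a different baseUrl lets the same client target the deployed service later.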

Did it work? Great! Now that we have the service running in our local environment, let's take a flight and put it up in the cloud. (I haven't been on an airplane in months, and I kinda miss it, so sorry for the silly joke.)

Let's put it somewhere up there. Photo by Daniel Páscoa on Unsplash.

Deploying it to Cloud Run

Deploying the service to Cloud Run is, in my opinion, a straightforward process that takes a few steps. The first is writing the Dockerfile that containerizes the application. Mine looks like this:

FROM node:15

WORKDIR /usr/src/app

COPY package*.json ./

RUN npm install

COPY . ./

CMD [ "node", "index.js" ]

In the first line, we select the base image, Node. Following this, we set the working directory, copy the package.json and package-lock.json files, and run npm install to install the dependencies (just like we did before). Last, we copy the rest of the directory and execute node index.js to start the service. Two things here may seem strange: first, copying the package.json files on their own, and second, running node index.js instead of npm run serve. The former is because I prefer installing the dependencies before copying the rest of the code; since Docker caches each layer, the npm install step is reused on later builds as long as the package files don't change. The latter is to make the Dockerfile more readable, since node index.js is easier to understand than npm run serve. Besides this, I'm using a .dockerignore file to avoid copying the Dockerfile and the node_modules/ directory.

The second step is building the container image using Google's Cloud Build and adding it to the Container Registry (the "place" where you store Docker images). You can do so with the command gcloud builds submit --tag gcr.io/your-gcp-project/node-toxic-service, where your-gcp-project is your Google Cloud project's ID. Keep in mind you need to have installed Google Cloud's command-line tool (gcloud). For a "how-to," please see the following resource: https://cloud.google.com/sdk/gcloud.

Last, we deploy to Cloud Run using gcloud run deploy --image gcr.io/your-gcp-project/node-toxic-service --platform managed. After running it, the tool will ask you to specify the Google Cloud location of choice and whether you want to allow unauthenticated invocations to the service. I selected no. When done, it will print out the address where the service is. Mine is https://node-toxic-service-sfy7qphkba-ue.a.run.app, and yes, the service is up in case you want to try it (please don't abuse it!). And with that, we finish!

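To test the service, use the same cURL command as before, but replace the localhost address with your service's address and remove the port number. If you did not allow unauthenticated invocations, Cloud Run rejects requests without an identity token, so also add a header such as -H "Authorization: Bearer $(gcloud auth print-identity-token)". Without that header, the command looks like this:

curl -X POST "https://node-toxic-service-sfy7qphkba-ue.a.run.app/prediction" -H "accept: application/json" -H "Content-Type: application/json" -d '{"sentence":"you are awful"}'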

If you try mine and it is inactive (because it has received no requests), the response might be "Service Unavailable." If so, give it a few seconds or a minute before trying again.
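Rather than retrying by hand, a client can wait and retry automatically. The sketch below is a hypothetical helper, not part of the service: it retries while the response status is 503 (Service Unavailable), doubling the wait each time. The attempt count and base delay are arbitrary choices, and send is assumed to be any async function returning a response with a status field, such as a call to fetch() in Node.js 18 or later.

```javascript
// Delays between attempts: baseMs, 2*baseMs, 4*baseMs, ... (attempts - 1 values)
function backoffDelays(attempts, baseMs) {
  return Array.from({ length: attempts - 1 }, (_, i) => baseMs * 2 ** i);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls send() up to `attempts` times, retrying only on 503 responses
async function withRetry(send, attempts = 4, baseMs = 5000) {
  let response = await send();
  for (const delay of backoffDelays(attempts, baseMs)) {
    if (response.status !== 503) break;
    await sleep(delay);
    response = await send();
  }
  return response;
}
```

For example, withRetry(() => fetch(url, options)) would try up to four times, waiting 5, 10, and 20 seconds between attempts.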

Recap

A deployed ML model is a happy model. Or so I believe. Therefore, I like to try alternative ways of deploying and serving my models. This article explains how we can deploy a Node.js service that uses TensorFlow.js and its pre-trained toxicity detector model in Google Cloud. First, we built the service, which uses the express library for the web service layer. Then, we wrote a Dockerfile for containerizing the application before adding it to Cloud Run.

You can find the complete source code on my GitHub at https://github.com/juandes/tf-js-examples/tree/main/node-od-service.

What will you build with it? Please share it with me!

Shameless plug: For more TensorFlow.js, check out my book about TensorFlow.js, named Practical TensorFlow.js (Apress, 2020).

Thanks for reading :)


Feature image by Ben White on Unsplash