The nuts and bolts of VertexAI – networking

When we access a VertexAI prediction endpoint, we usually do so via the Google API endpoint, i.e. via a public IP address. For some applications, it is helpful to be able to connect to a prediction endpoint via a private IP address from within a VPC. Today, we will see how this works and how the same approach also allows you to connect to an existing VPC from within a pipeline job.

VPC Peering

The capability of the Google Cloud Platform that we will leverage to establish a connection between VertexAI services and our own VPCs is called VPC peering, so let us discuss this first.

Usually, two different networks are fully isolated, even if they are part of the same project. Each network has its own IP range, and virtual machines running in one network cannot directly connect to virtual machines in another network. VPC peering is a technology that Google offers to bridge two VPCs by adding appropriate routes between the involved subnets.

To understand this, let us consider a simple example. Suppose you have two networks, say vpc-a and vpc-b, with subnets subnet-a (range 10.200.0.0/20) and subnet-b (10.210.0.0/20).

Google will then establish two routes in each of the networks. The first route, called the default route, sends all traffic to the gateway that connects the VPC with the internet. The second route, called the subnet route (which exists for every subnet), has a higher priority and makes sure that traffic to a destination address within the IP address range of the respective subnet is delivered within the VPC.
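
You can inspect these routes yourself. For vpc-a, for instance, the following command should list the default route plus one subnet route per subnet:

gcloud compute routes list \
    --filter="network=vpc-a"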

As expected, there is no route connecting the two networks. This changes if you decide to peer them. Peering sets up additional routes in each of the networks so that traffic targeting a subnet in the respective peer network is routed into that network, thus establishing a communication channel between the two networks. If, in our example, you peer vpc-a and vpc-b, then a route will be set up in vpc-a that sends traffic with a destination in the range 10.210.0.0/20 to vpc-b, and vice versa. Effectively, we map subnet-b into the IP address space of vpc-a and subnet-a into the IP address space of vpc-b.
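
In gcloud, peering our two example networks manually could look like this (a sketch – the peering names are arbitrary, and the connection only becomes active once both sides have been created):

gcloud compute networks peerings create peer-a-to-b \
    --network=vpc-a \
    --peer-network=vpc-b
gcloud compute networks peerings create peer-b-to-a \
    --network=vpc-b \
    --peer-network=vpc-a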

Note that peering is done on the network level and applies to all subnets of both networks involved. As a consequence, this can only work if the address ranges of the subnets in the two networks do not overlap, as we would otherwise produce conflicting routes. This is the reason why peering does not work with networks created in auto mode, but requires the custom subnet mode.

Private prediction endpoints

Let us now see how VPC peering can be applied to access prediction endpoints from within your VPC using a private IP address. Obviously, we need a model that we can deploy. So follow the steps in our previous post on models to upload a model with ID vertexaimodel to the Vertex AI registry (i.e. train a model, create an archive, upload the archive to GCS and the model to the registry).

Next, we will deploy the model. Recall that the process of deploying a model consists of two steps – creating an endpoint and deploying the model to this endpoint. When we want to deploy a model that can be reached from within a VPC, we need to create what Google calls a private endpoint instead of an ordinary endpoint. This is very similar to an endpoint, with the difference that we need to pass the name of a VPC as an additional parameter. Behind the scenes, Google will then create a peering between the network in which the endpoint is running and our VPC, very similar to the manual peering sketched in the previous section.

For this to work, we first need a few preparations. We set up our VPC and create a few firewall rules that allow SSH access to machines running in this network (which we will need later) as well as ICMP and internal traffic.

gcloud compute networks create peered-network \
                        --subnet-mode=custom
gcloud compute firewall-rules create peered-network-ssh \
                                --network peered-network \
                                --allow tcp:22
gcloud compute firewall-rules create peered-network-icmp \
                                --network peered-network \
                                --allow icmp
gcloud compute firewall-rules create peered-network-internal \
                                --network peered-network \
                                --source-ranges=10.0.0.0/8 \
                                --allow tcp 

Now we need to let Google know which IP address range it can use to establish a peering. This needs to be a range that we do not actively use. To inform Google about our choice, we create an address range with the special purpose VPC_PEERING. We also need to create a peering between our network and the service network that Google uses.

gcloud compute addresses create peering-range \
    --global \
    --purpose=VPC_PEERING \
    --addresses=10.8.0.0 \
    --prefix-length=16 \
    --network=peered-network
gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --network=peered-network \
    --ranges=peering-range \
    --project=$GOOGLE_PROJECT_ID
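
At this point, it cannot hurt to verify that the peering with the service network has been established. The following command should list it (the name of the peering itself is generated by Google):

gcloud services vpc-peerings list \
    --network=peered-network \
    --project=$GOOGLE_PROJECT_ID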

To create a private endpoint and deploy our model to it, we can use more or less the same Python code that we have used for an ordinary endpoint, except that we create an instance of the class PrivateEndpoint instead of Endpoint and pass the network to which we want to peer as an additional argument (note, however, that private endpoints currently do not support traffic splitting, so the corresponding parameters are not allowed).

from google.cloud import aiplatform as aip

#
# Create a private endpoint peered with our VPC - note that the network
# needs to be specified using the project number, not the project ID
#
endpoint = aip.PrivateEndpoint.create(
    display_name = "vertex-ai-private-endpoint",
    project = google_project_id,
    location = google_region,
    network = f"projects/{project_number}/global/networks/{args.vpc}"
)
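
The snippet above only covers the first step. The second step – deploying the model to this endpoint – could look like the following sketch (the machine type is an assumption, and no traffic-split parameters are passed, as private endpoints do not support them):

#
# Look up the model in the registry and deploy it to the private endpoint
#
model = aip.Model(model_name = "vertexaimodel")
endpoint.deploy(
    model = model,
    machine_type = "n1-standard-2",
)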

Let us try this – simply navigate to the directory networking in the repository for this series and execute the command

python3 deploy_pe.py --vpc=peered-network

Again, this will probably take some time, but the first step (creating the endpoint) should only take a few seconds. Once the deployment completes, we should be able to run predictions from within our VPC. To verify this, let us next bring up a virtual machine in our VPC, more precisely within a subnet inside our VPC that we need to create first. For that purpose, set the environment variable GOOGLE_ZONE to your preferred zone within the region GOOGLE_REGION and run

gcloud compute networks subnets create peered-subnet \
       --network=peered-network \
       --range=10.210.0.0/20 \
       --purpose=PRIVATE \
       --region=$GOOGLE_REGION
gcloud compute instances create client \
       --project=$GOOGLE_PROJECT_ID \
       --zone=$GOOGLE_ZONE \
       --machine-type=e2-medium \
       --network=peered-network \
       --subnet=peered-subnet \
       --create-disk=boot=yes,device-name=instance-1,image-family=ubuntu-2204-lts,image-project=ubuntu-os-cloud

Next, we need to figure out at which address our client can reach the newly created endpoint. So run gcloud ai endpoints list to find the ID of our endpoint and then use gcloud ai endpoints describe to print out some details. In the output, you will find a property called predictHttpUri. Take note of that address, which should be a combination of the endpoint ID, the domain aiplatform.googleapis.com and the ID of the model. Then SSH into our client machine and run a curl against that address (set URL in the command below to the address that you have just noted).
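
If you prefer to script the lookup, something along these lines should work (note that the --format field path is an assumption about the structure of the describe output – double-check it against what you actually see):

ENDPOINT_ID=$(gcloud ai endpoints list \
    --region=$GOOGLE_REGION \
    --filter="displayName=vertex-ai-private-endpoint" \
    --format="value(name)")
URL=$(gcloud ai endpoints describe $ENDPOINT_ID \
    --region=$GOOGLE_REGION \
    --format="value(deployedModels[0].privateEndpoints.predictHttpUri)")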

gcloud compute ssh client --zone=$GOOGLE_ZONE
URL=...
curl \
        --data '{"instances": [ [0.5, 0.35]]}' \
        --header "Content-Type: application/json" $URL

At first glance, this seems to be very similar to a public endpoint, but there are a few notable differences. First, you can use getent hosts on the client machine to convince yourself that the DNS name in the URL does in fact resolve to a private IP address in the range that we have defined for the peering (when I tried this, I got 10.8.0.6, but the result could of course be different in your case). We can even repeat the curl command with the IP address instead of the DNS name, and we should get the same result.
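
Concretely, this check could look as follows (10.8.0.6 is the address from my test run – substitute whatever getent returns in your case):

# Extract the hostname from the URL and resolve it
HOST=$(echo $URL | cut -d'/' -f3)
getent hosts $HOST
# Repeat the prediction with the hostname replaced by the resolved IP
curl \
    --data '{"instances": [ [0.5, 0.35]]}' \
    --header "Content-Type: application/json" \
    ${URL/$HOST/10.8.0.6}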

The second thing that is interesting here is that we did not have to associate our virtual machine with any service account. In fact, everyone who has access to your VPC can invoke the endpoint, and there is no additional permission check. This is in contrast to a public endpoint, which can only be reached by passing a bearer token in the request. So you might want to be very thoughtful about who has access to your network before deploying a model to a private endpoint – which is in fact not private at all.

We also remark that creating a private endpoint is not the only way to reach a prediction endpoint via a private IP address. The alternative approach that Google offers is called Private Service Connect, which provides a Google API endpoint at an IP address inside your VPC. You can then deploy a model as usual to an ordinary endpoint, but use this private access point to run predictions.

Connecting to your VPC from a VertexAI pipeline

So far, we have seen how we can reach a prediction endpoint from within our VPC using a peering between the Google service network and our VPC. However, this also works in the other direction – once we have the peering in place, we can also reach a service running in our VPC from within jobs and pipelines running on the VertexAI platform.

Let us quickly discuss how this works for a pipeline. Again, the preconditions are as above – a VPC, a reserved IP address range in that VPC and a peering between your VPC and the Google service network.

Once you have that, you can build a pipeline that somehow accesses your VPC, for instance by submitting an HTTP GET request to a server running in your VPC. When you submit this job, you need to specify the additional parameter network, as we have done when creating our private endpoint. In Python, this would look as follows.

#
# Make sure that project_number is the number (not the ID) of the project
# you are using
#
project_number = ...
#
# Create a job
#
job = aip.PipelineJob(
    display_name = "Connectivity test pipeline",
    template_path = "connect.yaml",
    pipeline_root = pipeline_root,
    location = location,
)
#
# Submit the job, passing the network to peer with
#
job.submit(service_account = service_account,
           network = f"projects/{project_number}/global/networks/{args.vpc}")

Again, this will make Vertex AI use the peering to connect the running job to your network so that you can access the server. I encourage you to try this out – just modify any of the pipelines we have discussed in our corresponding post to submit an HTTP request to a simple Flask server or an NGINX instance running in your VPC to convince yourself that this really works.
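
If you want a starting point, here is a minimal sketch of such a pipeline, written with the KFP v2 SDK (the server address is a placeholder – use the private IP of whatever you run in your VPC; the output file matches the template_path used above):

from kfp import dsl, compiler

@dsl.component(base_image = "python:3.10", packages_to_install = ["requests"])
def connectivity_test():
    import requests
    # 10.210.0.2 is a placeholder for the private IP of your server in
    # the peered subnet - this call only succeeds if the job has been
    # submitted with the network parameter set
    response = requests.get("http://10.210.0.2", timeout = 10)
    print(f"Got status code {response.status_code}")

@dsl.pipeline(name = "connectivity-test")
def my_pipeline():
    connectivity_test()

compiler.Compiler().compile(my_pipeline, "connect.yaml")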

This closes our post for today. Next time, we will take a look at how you can use Vertex AI managed datasets to organize your training data.
