A PyTorch model deployed on n1-highcpu-16 machines in the us-central1 region of Google Cloud exhibits high latency, particularly for requests coming from Singapore. The model classifies transactions as fraudulent or not using numerical and categorical features.
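Much of the Singapore latency comes from physical distance. The back-of-envelope sketch below estimates a lower bound on network round-trip time using only propagation delay. It rests on several assumptions: approximate coordinates (us-central1 is in Iowa, USA), light in fiber traveling at about 200,000 km/s (roughly two thirds of c), and a straight great-circle path. It ignores routing detours, queuing, TLS handshakes, and model inference time, so real latencies are higher.

```python
import math

# Approximate (lat, lon) in degrees. These are assumptions for illustration:
# us-central1 is located in Iowa, USA; the second point is Singapore.
US_CENTRAL1 = (41.26, -95.86)
SINGAPORE = (1.35, 103.82)

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s in fiber = 200 km per millisecond


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(a, b):
    """Lower bound on round-trip time: propagation delay only, ideal fiber path."""
    return 2 * great_circle_km(a, b) / FIBER_KM_PER_MS


if __name__ == "__main__":
    d = great_circle_km(SINGAPORE, US_CENTRAL1)
    rtt = min_rtt_ms(SINGAPORE, US_CENTRAL1)
    print(f"distance ~{d:.0f} km, minimum RTT ~{rtt:.0f} ms")
```

Under these assumptions the distance comes out at roughly 15,000 km, so every request from Singapore pays about 150 ms of round-trip propagation before the model even runs. Faster CPUs in us-central1 cannot remove that floor; only serving from a closer region can.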
Several solutions are proposed:
The suggested answer is C: deploy the model to Vertex AI private endpoints in both a US region and the Singapore region (asia-southeast1), so that each request is served from a nearby region and latency is minimized.
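The two-region design only pays off if each client is sent to its nearest deployment. The sketch below illustrates that idea with client-side nearest-region selection by great-circle distance. It is not the Vertex AI API: the region coordinates are approximate, and the endpoint URLs and the names `nearest_region` and `endpoint_for` are hypothetical placeholders.

```python
import math

# Approximate (lat, lon) of each serving region (assumption for illustration).
REGION_LOCATIONS = {
    "us-central1": (41.26, -95.86),     # Iowa, USA
    "asia-southeast1": (1.35, 103.82),  # Singapore
}

# Hypothetical endpoint addresses; real Vertex AI endpoint addresses differ.
REGION_ENDPOINTS = {
    "us-central1": "https://us-central1.fraud-model.example/predict",
    "asia-southeast1": "https://asia-southeast1.fraud-model.example/predict",
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(client_location):
    """Return the serving region with the smallest distance to the client."""
    return min(
        REGION_LOCATIONS,
        key=lambda region: haversine_km(client_location, REGION_LOCATIONS[region]),
    )


def endpoint_for(client_location):
    """Map a client location to the endpoint of its nearest serving region."""
    return REGION_ENDPOINTS[nearest_region(client_location)]


if __name__ == "__main__":
    print(endpoint_for((1.29, 103.85)))   # a client in Singapore
    print(endpoint_for((40.71, -74.01)))  # a client in New York
```

A client in Singapore resolves to the asia-southeast1 endpoint and a client in New York to us-central1. In production this selection is more commonly done by a global load balancer or geo-aware DNS than by client code; the sketch only shows the routing decision itself.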