u/Relative-Security-75

Hi everyone,

I'm designing a secure architecture for a desktop application and I would love a sanity check from this community, especially regarding networking and cost traps.

Context & Workload:

Client: A desktop executable (Delphi) running on our customers' local machines over the public internet.

Backend: A custom, heavy LLM hosted on our own GCP Compute Engine VM (requires GPUs).

Volume: ~30,000 requests/month containing mixed media (mostly video, plus images and text). Estimated egress: ~1.8 TB/month.
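For a sense of scale, here is a quick back-of-the-envelope calculation from the two figures above. It assumes decimal units (1 TB = 1,000,000 MB), a 30-day month, and that the egress is spread evenly across requests, which real traffic will not be. The function names are made up for illustration.

```python
# Rough sizing from the figures in the post.
# Assumptions: decimal units, a 30-day month, evenly spread traffic.
REQUESTS_PER_MONTH = 30_000
EGRESS_TB_PER_MONTH = 1.8

MB_PER_TB = 1_000_000
SECONDS_PER_MONTH = 30 * 24 * 3600


def avg_egress_mb_per_request(egress_tb, requests):
    """Average egress per request, in megabytes."""
    return egress_tb * MB_PER_TB / requests


def avg_egress_mbit_per_second(egress_tb):
    """Egress averaged over the whole month, in megabits per second."""
    return egress_tb * MB_PER_TB * 8 / SECONDS_PER_MONTH


per_request = avg_egress_mb_per_request(EGRESS_TB_PER_MONTH, REQUESTS_PER_MONTH)
sustained = avg_egress_mbit_per_second(EGRESS_TB_PER_MONTH)
print(f"{per_request:.0f} MB per request on average")
print(f"{sustained:.2f} Mbit/s averaged over the month")
```

So every hop in the path has to cope with payloads of roughly 60 MB on average, and peaks will be well above the monthly average rate.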

Hard Constraints (My hands are tied here!):

No Managed Services (Vertex AI, etc.): The team configuring the LLM requires it to run on a dedicated VM, so managed services like Vertex AI are off the table for this project.

No VPN: End-users cannot be forced to use a VPN. It must be a standard HTTPS request from the desktop app.

No Public IP on VM: The security team requires that the LLM VM stay strictly private (no external IP) to protect the expensive GPU compute.

API Key Auth: We need a robust way to validate the x-api-key header before traffic reaches the internal network, so that unauthorized requests are blocked and request floods never hit our expensive GPU instances.

Proposed Architecture:

Client sends a POST request (HTTPS/TLS 1.3) with x-api-key in the header.

Google Cloud API Gateway receives the request and validates the API key, rejecting invalid keys immediately.

Cloud Run (Reverse Proxy): Since API Gateway cannot route directly to a VPC internal IP, it forwards the valid request to a simple Cloud Run service (just a tiny proxy container).

VPC / VM: The Cloud Run service uses Direct VPC Egress to forward the request to the internal IP of the LLM VM.

Response: The VM processes the video/text and sends the payload back through the same path.
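To make the "tiny proxy container" concrete, here is a minimal sketch of what it could look like, using only the Python standard library. It is an illustration under assumptions, not a production proxy. The UPSTREAM_URL variable and the 10.0.0.2 address are hypothetical, the key is assumed to have been validated upstream already (so it is not forwarded), and the whole body is buffered in memory, which is a real concern for video-sized payloads.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical internal address of the LLM VM; not taken from the post.
UPSTREAM_URL = os.environ.get("UPSTREAM_URL", "http://10.0.0.2:8080")

# Headers that are not passed on: connection-level headers, the original
# Host and Content-Length (urllib sets its own), and the API key, which is
# assumed to have been checked before the request reached this proxy.
SKIPPED_HEADERS = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
    "host", "content-length", "x-api-key",
}


def forwardable_headers(headers):
    """Return only the headers that should be sent on to the VM."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SKIPPED_HEADERS
    }


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)  # buffers the full upload in memory
        upstream_request = urllib.request.Request(
            UPSTREAM_URL + self.path,
            data=body,
            headers=forwardable_headers(self.headers),
            method="POST",
        )
        try:
            with urllib.request.urlopen(upstream_request, timeout=300) as response:
                status = response.status
                content_type = response.headers.get(
                    "Content-Type", "application/octet-stream")
                payload = response.read()
        except urllib.error.HTTPError as error:
            # The VM answered with an error status; relay it unchanged.
            status = error.code
            content_type = error.headers.get("Content-Type", "text/plain")
            payload = error.read()
        except OSError:
            # Connection refused, timeout, DNS failure, and similar.
            status, content_type, payload = 502, "text/plain", b"upstream unreachable"

        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main():
    # Cloud Run passes the listening port in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    ThreadingHTTPServer(("0.0.0.0", port), ProxyHandler).serve_forever()
```

Calling main() starts the server. A real deployment would more likely use an off-the-shelf reverse proxy that streams bodies rather than buffering them, but the shape of the hop is the same.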

My specific questions for the experts:

The API Gateway + Cloud Run Bridge: I know using a tiny Cloud Run container as a reverse proxy to reach the VPC is a common workaround for API Gateway's lack of native VPC support. Is this still the recommended best practice, or is there a cleaner/cheaper way that doesn't involve managed LLM APIs?

Load Balancers vs. API Gateway: I considered using an External HTTPS Load Balancer with NEGs instead of the Gateway, but I would lose the out-of-the-box API Key management. Am I missing a way to easily validate API keys at the Load Balancer level without building custom auth logic on the VM itself?
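For comparison, this is roughly how small the "custom auth logic" would be if the key check lived in a proxy behind the load balancer instead of on the VM. It is only a sketch under assumptions: the key store, the demo key, and the function names are hypothetical, and it leaves out key issuance, rotation, and per-key quotas, which are the parts a managed gateway actually saves you from building.

```python
import hashlib
import hmac


def key_digest(api_key):
    """SHA-256 digest of a key, so raw keys never need to be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).digest()


# Hypothetical key store; in practice this would be loaded from a secret store.
VALID_KEY_DIGESTS = {key_digest("demo-key-customer-a")}


def is_valid_api_key(presented_key):
    """Check the value of the x-api-key header against the known digests."""
    if not presented_key:
        return False
    candidate = key_digest(presented_key)
    # compare_digest avoids leaking how many leading bytes matched.
    return any(
        hmac.compare_digest(candidate, known)
        for known in VALID_KEY_DIGESTS
    )


print(is_valid_api_key("demo-key-customer-a"))
print(is_valid_api_key("not-a-real-key"))
```

The proxy would call is_valid_api_key on the header value and return 401 before opening any connection to the VM.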

Cost Blindspots: I've estimated the network egress (1.8 TB) at around $216/month (South America), on top of the much larger cost of running the GPU VM. Are there any hidden networking costs (e.g., inter-zone traffic, Cloud Run egress to the VPC) at this volume of video data that I should be aware of?
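Here is the arithmetic behind that estimate, plus a way to see how any additional per-GB charge would scale at this volume. The $0.12/GB figure is simply what $216 over 1,800 GB implies, not a quoted price, and the $0.01/GB extra-hop rate is a pure placeholder to show the scaling, not a real GCP charge.

```python
# Figures from the post, using decimal units (1.8 TB = 1,800 GB).
EGRESS_GB = 1_800
ESTIMATED_EGRESS_USD = 216
REQUESTS_PER_MONTH = 30_000

# Rate implied by the estimate; an inference, not a published price.
implied_usd_per_gb = ESTIMATED_EGRESS_USD / EGRESS_GB
usd_per_request = ESTIMATED_EGRESS_USD / REQUESTS_PER_MONTH


def monthly_network_cost(volume_gb, usd_per_gb_by_hop):
    """Sum per-GB charges over every hop the same data passes through."""
    return sum(volume_gb * rate for rate in usd_per_gb_by_hop.values())


# Placeholder rates: the second entry is hypothetical and only shows how an
# extra per-GB charge on an internal hop would add up at this volume.
hops = {
    "internet_egress": implied_usd_per_gb,
    "hypothetical_extra_hop": 0.01,
}

print(f"implied rate: ${implied_usd_per_gb:.2f}/GB")
print(f"egress per request: ${usd_per_request:.4f}")
print(f"total with placeholder hop: ${monthly_network_cost(EGRESS_GB, hops):.2f}")
```

The point is that every extra hop billed per GB adds a flat amount for each cent of rate ($18/month per $0.01/GB here), so the per-hop rates are what need checking.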

Any feedback or red flags regarding this specific setup would be highly appreciated! Thanks!

Architecture Review: API Gateway to Private VM (No VPN) for heavy LLM video workload. Is Cloud Run proxy the best practice?