Here’s a high-level architecture of what you need to do to move from “model on EC2” to “SageMaker + MLOps, scalable Pan India”.
High-level architecture: what you need to do
1. Target picture (end state)
```
┌──────────────────────────────────────────────────────────────────┐
│ XpressBees / External clients                                    │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ API Gateway / ALB (optional) OR direct call to SageMaker endpoint│
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ AWS SageMaker (inference + MLOps)                                │
│ • Real-time endpoint (your FastAPI app)                          │
│ • Async / batch inference (optional)                             │
│ • Model Registry + Pipelines (build, test, deploy)               │
│ • Autoscaling (0–N instances, peak ~15)                          │
│ • Monitoring (drift, infra, inspection count, latency)           │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│ Supporting AWS services                                          │
│ • S3: model artifact, (optional) batch input/output              │
│ • RDS / Redshift / existing DB: request + result storage         │
│   (replaces sql_conn)                                            │
│ • VPC: endpoint + DB in the same VPC                             │
│ • IAM: role-based access for endpoints, pipelines, monitoring    │
└──────────────────────────────────────────────────────────────────┘
```
So at a high level you need: SageMaker endpoint(s) running your current app in a custom container, autoscaling, model + pipeline (MLOps), monitoring, and DB/access wired correctly.
2. What you need to do (by layer)
| # | Layer | What you need to do |
|---|---|---|
| 1 | Container | Package fast_prod_new.py + dependencies + sql_conn (or a replacement) in a Docker image that loads the model from the path SageMaker provides (/opt/ml/model) and runs your FastAPI app (e.g. via uvicorn). SageMaker's serving contract requires the container to listen on port 8080 and answer GET /ping (health check) and POST /invocations (inference), so adapt your routes and request/response format accordingly. |
| 2 | Model artifact | Put the trained model (full_14_08_2025_ch_28k or equivalent) in S3. Create a SageMaker Model that points to this image + artifact so the container gets the model at startup. |
| 3 | SageMaker endpoint | Create a real-time endpoint (or multi-variant) using that Model, on GPU instances (e.g. ml.g5/ml.g6) to meet the 14–17 s latency target and peak concurrency. |
| 4 | Scaling & queueing | Configure autoscaling (min/max instances, target metric) to handle peak concurrency of ~15 and queueing for 5 inspections as per the Readme. For queueing, either put an SQS queue in front or use SageMaker Asynchronous Inference, which queues requests natively. |
| 5 | Database | Replace or wrap sql_conn so the same tables (rvpxbverification_request_data, rvpxbverification, rvpxbverification_fail) are written from inside the SageMaker VPC (e.g. RDS, Redshift, or existing DB with correct security groups). |
| 6 | CI/CD & MLOps pipeline | Dev vs Prod: separate SageMaker endpoints or stages (e.g. dev/prod). Use SageMaker Pipelines (or CodePipeline + SageMaker) to: build container → run tests → deploy to dev → (manual/gated) promote to prod. Use Model Registry for versioned model artifacts. |
| 7 | Monitoring & reporting | Model Monitor: data drift, quality. Infra: CloudWatch for CPU/GPU/memory and endpoint latency. Business: “inspections done”, success/fail counts (from DB or logs). Role-based access: IAM and optionally API Gateway for who can call the endpoint. |
| 8 | Pre-checks (optional) | If you need “pre-checks (blurriness, angles, brightness)” before inference, add them either inside the same FastAPI service (before calling the model) or in a separate Lambda/step that calls your endpoint only when checks pass. |
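To make the serving contract from row 1 concrete, here is a minimal sketch using only the Python standard library; in your actual image the same two routes would be added to the existing FastAPI app instead, and `run_inference` is a hypothetical placeholder for the real InternVL call:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_inference(payload: dict) -> dict:
    """Hypothetical placeholder for the real model call in fast_prod_new.py."""
    return {"received": payload}


class SageMakerHandler(BaseHTTPRequestHandler):
    """Implements SageMaker's serving contract: the container must listen on
    port 8080 and answer GET /ping (health) and POST /invocations (inference)."""

    def do_GET(self):
        if self.path == "/ping":
            self._reply(200, {"status": "healthy"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            self._reply(200, run_inference(payload))
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):  # keep container logs quiet in this sketch
        pass


def serve() -> None:
    """Container entry point (blocks): bind the port SageMaker routes traffic to."""
    HTTPServer(("0.0.0.0", 8080), SageMakerHandler).serve_forever()
```

With FastAPI the equivalent is two route handlers (`@app.get("/ping")`, `@app.post("/invocations")`) and starting uvicorn on port 8080.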
3. Flow in one sentence per stage
- Build → build the Docker image from your code + deps; upload the model to S3; register the Model in SageMaker.
- Deploy → create a real-time SageMaker endpoint with that Model on a GPU instance type; set autoscaling.
- Integrate → point clients (or API Gateway) at the endpoint URL; make sure the DB and S3/CloudFront are reachable from the endpoint's VPC.
- Automate → CI/CD builds the image and model, runs tests, and deploys to dev then prod via the pipeline; the Model Registry tracks versions.
- Operate → monitor drift, infra, and business metrics; control access with IAM; schedule or scale so the service runs 6:00–24:00 as required.
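The Build and Deploy stages can be sketched with boto3. Every name, ARN, image URI, and S3 path below is a placeholder; the request shapes follow SageMaker's CreateModel / CreateEndpointConfig / CreateEndpoint APIs, and boto3 is imported lazily so the request builders can be inspected without AWS access:

```python
def model_request(name: str, image_uri: str, model_data_s3: str, role_arn: str) -> dict:
    """Request body for sagemaker.create_model: your ECR image + S3 artifact.
    SageMaker unpacks ModelDataUrl into /opt/ml/model inside the container."""
    return {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_s3,
        },
        "ExecutionRoleArn": role_arn,
    }


def endpoint_config_request(name: str, model_name: str,
                            instance_type: str = "ml.g5.xlarge") -> dict:
    """Request body for sagemaker.create_endpoint_config: one GPU variant to start."""
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }


def deploy(region: str = "ap-south-1") -> None:
    """Create model, endpoint config, and endpoint. Requires AWS credentials;
    all names here are illustrative."""
    import boto3  # lazy import: the builders above stay testable offline

    sm = boto3.client("sagemaker", region_name=region)
    sm.create_model(**model_request(
        "rvp-model", "<ecr-image-uri>", "s3://<bucket>/model.tar.gz", "<role-arn>"))
    sm.create_endpoint_config(**endpoint_config_request("rvp-config", "rvp-model"))
    sm.create_endpoint(EndpointName="rvp-endpoint", EndpointConfigName="rvp-config")
```

Endpoint creation is asynchronous; the endpoint becomes usable once its status reaches InService.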
4. Order of work (suggested)
1. Containerize the app and run it locally with the model; confirm /process_request works.
2. Upload the model to S3 and create a SageMaker Model (your image + artifact).
3. Create one SageMaker endpoint (single instance) and test it from a client.
4. Wire the DB (replace or adapt sql_conn) and verify writes from the endpoint.
5. Add autoscaling and an optional queue (SQS) for concurrency and queueing.
6. Add the pipeline (build → test → deploy dev → deploy prod) and the Model Registry.
7. Add monitoring (Model Monitor + CloudWatch + business metrics) and RBAC.
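Step 5 (autoscaling) is configured through Application Auto Scaling rather than SageMaker itself, by registering the endpoint variant as a scalable target and attaching a target-tracking policy. A sketch, where the policy name and the invocations-per-instance target are illustrative assumptions:

```python
def scalable_target_request(endpoint: str, variant: str = "AllTraffic",
                            min_cap: int = 1, max_cap: int = 15) -> dict:
    """Request for application-autoscaling register_scalable_target;
    max_cap = 15 matches the expected peak concurrency."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_cap,
        "MaxCapacity": max_cap,
    }


def scaling_policy_request(endpoint: str, variant: str = "AllTraffic",
                           invocations_per_instance: float = 5.0) -> dict:
    """Target-tracking policy on SageMakerVariantInvocationsPerInstance.
    The 5.0 target is an illustrative assumption: tune it from load tests."""
    return {
        "PolicyName": "rvp-target-tracking",  # illustrative name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # scale in slowly to ride out short lulls
            "ScaleOutCooldown": 60,   # scale out quickly under load
        },
    }


def attach_autoscaling(endpoint: str) -> None:
    import boto3  # lazy import: the builders above stay testable offline

    aas = boto3.client("application-autoscaling")
    aas.register_scalable_target(**scalable_target_request(endpoint))
    aas.put_scaling_policy(**scaling_policy_request(endpoint))
```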
This is the high-level architecture and the set of things you need to do; each step can be broken into smaller tasks (e.g. “write Dockerfile”, “implement SageMaker handler in container”) when you’re ready to implement.
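For reference, a client call against the finished endpoint might look like this; the endpoint name and payload are placeholders, and the call shape follows the sagemaker-runtime InvokeEndpoint API:

```python
import json


def invoke_kwargs(endpoint: str, payload: dict) -> dict:
    """Keyword arguments for sagemaker-runtime invoke_endpoint with a JSON body."""
    return {
        "EndpointName": endpoint,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def call_endpoint(endpoint: str, payload: dict) -> dict:
    """Invoke the real-time endpoint. Requires AWS credentials at call time."""
    import boto3  # lazy import: invoke_kwargs stays testable offline

    rt = boto3.client("sagemaker-runtime")
    resp = rt.invoke_endpoint(**invoke_kwargs(endpoint, payload))
    return json.loads(resp["Body"].read())
```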