production

8 guides tagged “production”.

Production ML Workflows: How We Serve an ONNX Model with FastAPI

A real, honest production architecture: an ONNX image classifier served by FastAPI on Railway, loaded from object storage at startup, with one shared inference session on CPU — and what we'd improve.

May 25, 2026 · 6 min read

The Silent Accuracy Killer: Preprocessing Mismatch

Your model scores 94% in the notebook and falls apart in production. The cause is usually not the model — it's a preprocessing mismatch between training and inference. Here's how to find and prevent it.

May 22, 2026 · 4 min read

inference debugging machine-learning production

Containerizing a FastAPI ML Service for Production

A practical guide to Dockerizing a FastAPI inference service: a lean multi-stage build, why you shouldn't bake the model into the image, sane defaults for uvicorn, and the mistakes that bloat images.

May 20, 2026 · 4 min read

fastapi docker mlops production deployment

Model Versioning and Rollback for ML Services

Models change more often than code, and a bad model can be worse than a bug. A practical guide to versioning model artifacts and rolling back fast when a new model underperforms.

May 18, 2026 · 4 min read

mlops production deployment machine-learning

Batching Inference Requests: Throughput vs Latency

Processing requests one at a time wastes hardware; batching them trades a little latency for a lot of throughput. How dynamic batching works, when it helps, and when a single shared session is enough.

May 17, 2026 · 4 min read

inference optimization mlops production

Monitoring ML Models in Production: Drift, Logging, and Alerts

A model that passed every test can still rot in production as the world changes. What to monitor for an ML service — latency, data drift, prediction drift — and how to start when you have zero instrumentation today.

May 16, 2026 · 5 min read

mlops monitoring production machine-learning

Securing an ML Inference API: Validation and Abuse Prevention

An ML endpoint that accepts file uploads is an attack surface. Practical hardening for inference APIs — input validation, size and rate limits, and the defenses that matter before you take traffic.

May 15, 2026 · 4 min read

security fastapi production mlops

Reducing Cold Starts in Containerized ML Services

When your service loads a model from object storage at boot, cold starts get slow. Why ML services start slowly, and the practical levers — image size, lazy loading, warm instances, and model size — to fix it.

May 14, 2026 · 4 min read

mlops production deployment optimization