mlops

10 guides tagged “mlops”.

AutoML vs Custom Models: When to Use Each

A decision framework for choosing between AutoML platforms and hand-built models — covering cost, control, accuracy, and the trade-offs that actually matter in production.

May 27, 2026 · 4 min read

ML Automation for Developers: AI Workflows That Work

How to automate the repetitive parts of the ML lifecycle — retraining, evaluation, and inference pipelines — using tools developers already know.

May 26, 2026 · 4 min read

ml-automation mlops machine-learning workflow

Production ML Workflows: How We Serve an ONNX Model with FastAPI

A real, honest production architecture: an ONNX image classifier served by FastAPI on Railway, loaded from object storage at startup, with one shared inference session on CPU — and what we'd improve.

May 25, 2026 · 6 min read

mlops production onnx fastapi inference

Server-Side vs On-Device ML Inference: How to Choose

The core trade-off in ML deployment: run inference on a central server or push it to the device. Real case: why TrichAi chose the server, what it cost, and when you'd choose differently.

May 24, 2026 · 5 min read

inference deployment mlops machine-learning

Containerizing a FastAPI ML Service for Production

A practical guide to Dockerizing a FastAPI inference service: a lean multi-stage build, why you shouldn't bake the model into the image, sane defaults for uvicorn, and the mistakes that bloat images.

May 20, 2026 · 4 min read

fastapi docker mlops production deployment

Model Versioning and Rollback for ML Services

Models change more often than code, and a bad model can be worse than a bug. A practical guide to versioning model artifacts and rolling back fast when a new model underperforms.

May 18, 2026 · 4 min read

mlops production deployment machine-learning

Batching Inference Requests: Throughput vs Latency

Processing requests one at a time wastes hardware; batching them trades a little latency for a lot of throughput. How dynamic batching works, when it helps, and when a single shared session is enough.

May 17, 2026 · 4 min read

inference optimization mlops production

Monitoring ML Models in Production: Drift, Logging, and Alerts

A model that passed every test can still rot in production as the world changes. What to monitor for an ML service — latency, data drift, prediction drift — and how to start when you have zero instrumentation today.

May 16, 2026 · 5 min read

mlops monitoring production machine-learning

Securing an ML Inference API: Validation and Abuse Prevention

An ML endpoint that accepts file uploads is an attack surface. Practical hardening for inference APIs — input validation, size and rate limits, and the defenses that matter before you take traffic.

May 15, 2026 · 4 min read

security fastapi production mlops

Reducing Cold Starts in Containerized ML Services

When your service loads a model from object storage at boot, cold starts get slow. Why ML services start slowly, and the practical levers — image size, lazy loading, warm instances, and model size — to fix it.

May 14, 2026 · 4 min read

mlops production deployment optimization