deployment

7 guides tagged “deployment”.

Getting Started with ONNX: Train and Deploy Custom Models

A practical, end-to-end guide to ONNX: what it is, how to export models from PyTorch and TensorFlow, run fast inference with ONNX Runtime, and ship to production.

May 28, 2026 · 5 min read

Server-Side vs On-Device ML Inference: How to Choose

The core trade-off in ML deployment: run inference on a central server or push it to the device. Real case: why TrichAi chose the server, what it cost, and when you'd choose differently.

May 24, 2026 · 5 min read

inference deployment mlops machine-learning

Choosing an ONNX Runtime Execution Provider: CPU, CUDA, TensorRT, CoreML

ONNX Runtime can dispatch the same model to very different hardware backends. A practical guide to execution providers — what each is for, how the fallback chain works, and how to choose.

May 21, 2026 · 4 min read

onnx inference optimization deployment

Containerizing a FastAPI ML Service for Production

A practical guide to Dockerizing a FastAPI inference service: a lean multi-stage build, why you shouldn't bake the model into the image, sane defaults for uvicorn, and the mistakes that bloat images.

May 20, 2026 · 4 min read

fastapi docker mlops production deployment

Choosing a Model Format: ONNX vs TorchScript vs SavedModel

Once a model is trained, how you serialize it shapes everything downstream. A practical comparison of ONNX, TorchScript, and TensorFlow SavedModel — portability, performance, and lock-in.

May 19, 2026 · 4 min read

onnx deployment machine-learning inference

Model Versioning and Rollback for ML Services

Models change more often than code, and a bad model can be worse than a bug. A practical guide to versioning model artifacts and rolling back fast when a new model underperforms.

May 18, 2026 · 4 min read

mlops production deployment machine-learning

Reducing Cold Starts in Containerized ML Services

When your service loads a model from object storage at boot, cold starts get slow. Why ML services start slowly, and the practical levers — image size, lazy loading, warm instances, and model size — to fix it.

May 14, 2026 · 4 min read

mlops production deployment optimization