A real, honest production architecture: an ONNX image classifier served by FastAPI on Railway, loaded from object storage at startup, with one shared inference session on CPU — and what we'd improve.
A practical guide to Dockerizing a FastAPI inference service: a lean multi-stage build, why you shouldn't bake the model into the image, sane defaults for uvicorn, and the mistakes that bloat images.
An ML endpoint that accepts file uploads is an attack surface. Practical hardening for inference APIs — input validation, size and rate limits, and the defenses that matter before you take traffic.
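
To make the input-validation point concrete, here is a minimal sketch of the kind of check an image-upload endpoint might run before any bytes reach the model. It uses only the Python standard library. The names `validate_upload`, `sniff_image_type`, `UploadRejected`, and the 5 MiB limit are hypothetical, chosen for illustration and not taken from the articles above. It assumes the service accepts only JPEG and PNG, and that the upload is already in memory; a real service would likely also cap the size while streaming the request body.

```python
from typing import Optional

# Hypothetical limit for illustration; tune to the model's real input needs.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

# Leading "magic" bytes of the formats this sketch assumes are allowed.
_MAGIC_PREFIXES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


class UploadRejected(ValueError):
    """Raised when an upload fails validation (hypothetical error type)."""


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the leading bytes, or None if unrecognized."""
    for prefix, mime in _MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime
    return None


def validate_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Check size and file signature; return the detected MIME type."""
    if not data:
        raise UploadRejected("empty upload")
    if len(data) > max_bytes:
        raise UploadRejected(f"upload exceeds {max_bytes} bytes")
    mime = sniff_image_type(data)
    if mime is None:
        # Do not trust the client-supplied Content-Type or file extension.
        raise UploadRejected("unsupported or unrecognized image format")
    return mime
```

A handler could call `validate_upload(body)` first and map `UploadRejected` to a 4xx response. Checking the signature only confirms what the file claims to be; decoding it safely and rate limiting callers are separate concerns.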