Securing an ML Inference API: Validation and Abuse Prevention
An inference endpoint that accepts an uploaded image is, from a security standpoint, a service that lets strangers send arbitrary bytes to your server and trigger expensive computation. That's not a reason to panic — it's a reason to validate at the boundary. This guide covers the practical hardening that belongs on any ML API before it sees real traffic.
The threat model
Be concrete about what can go wrong when anyone can POST to /analyze:
- Resource exhaustion — huge files or a flood of requests exhaust memory and CPU, or run up your bill.
- Malformed input — corrupt or hostile files crash the decoder or the model.
- Decompression bombs — a tiny file that expands to gigabytes when decoded.
- Abuse / scraping — someone runs your model as their free API.
None of these are exotic. All of them are cheap to defend against.
Validate at the boundary
The single most important principle: untrusted input gets checked before it reaches the model. The endpoint is the only door, so it's where the locks go.
Enforce content type and size
Check the declared type and cap the size before reading the whole body into memory:
```python
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

ALLOWED = {"image/jpeg", "image/png", "image/webp"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB

@app.post("/analyze")
async def analyze(file: UploadFile = File(...)):
    if file.content_type not in ALLOWED:
        raise HTTPException(415, "Unsupported media type")
    # Read at most one byte past the cap; anything longer is rejected.
    data = await file.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise HTTPException(413, "File too large")
    ...  # decode, validate, run the model
```
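Reading at most MAX_BYTES + 1 bytes means the handler never pulls an unbounded body into memory, and the check is based on bytes actually read, never on the client's declared length. One caveat: by the time the handler runs, Starlette has already parsed the multipart body and spooled the upload to a temporary file (which rolls over to disk as it grows), so this cap protects the handler's memory, not your disk or bandwidth. To refuse oversized requests outright, also cap the body size at the reverse proxy (for example, nginx's client_max_body_size).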
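The same cap can be enforced while the body is still arriving, which is what actually bounds memory when you consume a raw stream (Starlette's `Request.stream()`, for instance, yields the body as `bytes` chunks). The sketch below is framework-agnostic and assumes the stream is an async iterator of bytes; `read_capped` and `PayloadTooLarge` are hypothetical names, not part of FastAPI or Starlette.

```python
import asyncio
from typing import AsyncIterator, Optional


class PayloadTooLarge(Exception):
    """Hypothetical error; map it to HTTP 413 in your framework."""


async def read_capped(
    chunks: AsyncIterator[bytes],
    limit: int,
    declared_length: Optional[str] = None,
) -> bytes:
    """Read an async byte stream, refusing anything larger than `limit` bytes."""
    # Cheap early exit: a client that honestly declares an oversized body
    # is rejected before a single chunk is read.
    if declared_length is not None and declared_length.isdigit():
        if int(declared_length) > limit:
            raise PayloadTooLarge(f"declared {declared_length} bytes > {limit}")
    buf = bytearray()
    async for chunk in chunks:
        buf.extend(chunk)
        # Count what actually arrives: a lying or missing Content-Length
        # cannot push the buffer past limit plus one chunk.
        if len(buf) > limit:
            raise PayloadTooLarge(f"body exceeds {limit} bytes")
    return bytes(buf)
```

For an endpoint that accepts the image as the raw request body rather than a multipart form, a route taking `request: Request` could call `await read_capped(request.stream(), MAX_BYTES, request.headers.get("content-length"))` and translate `PayloadTooLarge` into a 413. The declared length is only used to reject early, never to accept.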
Don't trust the content type — verify the bytes
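A client can claim image/png and send anything. Confirm the file really is the image it claims by validating it with your image library, and treat a decode failure as a rejection, not a crash. Note that verify() checks the file's structure without decoding the pixel data, so a structurally valid file with corrupt pixel data can still fail later, when the pixels are actually loaded; keep that step inside the same error handling.

```python
import io

from fastapi import HTTPException
from PIL import Image

try:
    img = Image.open(io.BytesIO(data))
    img.verify()  # structural check; does not decode the pixel data
    img = Image.open(io.BytesIO(data))  # verify() leaves the object unusable, so re-open
except Exception:
    raise HTTPException(422, "Invalid or corrupt image") from None
```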
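Before handing bytes to Pillow at all, a cheap first filter is to compare the file's leading "magic bytes" against the declared type. This is a sketch that complements the Pillow check rather than replacing it: the signatures are the standard ones for JPEG, PNG, and WebP, and `sniff_image_type` and `matches_declared_type` are hypothetical helper names.

```python
from typing import Optional

# Standard file signatures for the three allowed types.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    # WebP is a RIFF container: b"RIFF", a 4-byte size, then b"WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def matches_declared_type(data: bytes, declared: Optional[str]) -> bool:
    """True only when the bytes agree with the client's declared content type."""
    sniffed = sniff_image_type(data)
    return sniffed is not None and sniffed == declared
```

In the handler this would sit between the size check and `Image.open`, rejecting a mismatch with the same 415 or 422. It turns away a renamed executable instantly, but a file with a valid header and a corrupt body still needs the decode check.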
Defend against decompression bombs
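A PNG of under 100 KB can declare a 25,000×25,000 image (625 megapixels) that needs gigabytes of memory once decoded and converted to RGB. Pillow has a guard, Image.MAX_IMAGE_PIXELS (about 89.5 million pixels by default): above that limit it emits a DecompressionBombWarning, and above twice the limit it raises DecompressionBombError. Keep it enabled, and add your own explicit dimension cap. Run the check right after Image.open, which reads only the header, so an oversized image is rejected before anything decodes its pixels:

```python
MAX_DIM = 8000  # far above the 224×224 the model actually needs

# width/height come from the header, so this runs before any pixel decode
if img.width > MAX_DIM or img.height > MAX_DIM:
    raise HTTPException(422, "Image dimensions too large")
```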
Since the model resizes everything to 224×224 anyway (see Preprocessing Mismatch), there's no legitimate reason to accept a 625-megapixel upload.
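To see why the header check is cheap: a PNG stores its width and height in the IHDR chunk, which immediately follows the 8-byte signature, so the dimensions can be read from the first 24 bytes without decompressing anything. The standard-library sketch below does exactly that and applies both a per-side cap and a total-pixel cap. It handles PNG only, and `png_dimensions`, `check_png_size`, and the `MAX_PIXELS` budget are hypothetical names and values; Pillow's `Image.open` likewise reads only the header for the formats it supports.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_DIM = 8000
MAX_PIXELS = 40_000_000  # hypothetical budget: about 120 MB as 8-bit RGB


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a valid IHDR header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def check_png_size(data: bytes) -> None:
    """Raise ValueError if the declared dimensions exceed either cap."""
    width, height = png_dimensions(data)
    if width > MAX_DIM or height > MAX_DIM:
        raise ValueError(f"dimensions {width}x{height} exceed {MAX_DIM} per side")
    if width * height > MAX_PIXELS:
        raise ValueError(f"{width * height} pixels exceed {MAX_PIXELS}")
```

The pixel cap matters because a per-side limit alone still admits 8000×8000 (64 megapixels); in the handler, a `ValueError` here would map to the same 422.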
Rate limiting
Validation stops bad requests; rate limiting stops too many requests. Without it, one client can saturate your CPU-bound inference and deny service to everyone else. A simple per-IP limit covers most of the risk:
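```python
from fastapi import File, Request, UploadFile
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter  # slowapi looks the limiter up on app.state
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)  # answers 429

@app.post("/analyze")
@limiter.limit("30/minute")  # must sit below the route decorator
async def analyze(request: Request, file: UploadFile = File(...)):  # slowapi needs `request`
    ...
```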
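For anything serious, also enforce limits at the edge (your platform's WAF or a reverse proxy) so abusive traffic never reaches your app. Two caveats for the in-app limiter: behind a proxy, get_remote_address sees the proxy's address rather than the client's unless the server is told to trust the proxy's forwarded headers (for uvicorn, --forwarded-allow-ips), and slowapi's default in-memory storage keeps separate counters in each worker process, so point storage_uri at shared storage such as Redis when you run several workers.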
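To make "30/minute" concrete, here is a minimal in-memory sliding-window limiter: each client key keeps the timestamps of its recent requests, and a request is refused once the window already holds the limit. It illustrates the idea under stated assumptions (single process, an injectable clock for testing) and is not how slowapi works internally; slowapi delegates to the `limits` library, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key.

    Keys are never evicted in this sketch; a long-running service would
    need to prune idle keys to keep memory bounded.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should answer 429 Too Many Requests
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that backs off regains capacity as its old requests age out; counting rejections too would punish clients that keep hammering.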
CORS: don't use a wildcard
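If a browser app calls your API, configure CORS for your origins, not *. A wildcard lets any website on the internet call your API from a user's browser and read the responses. CORS is enforced by browsers only, so it does nothing against curl or a script calling the API directly; that traffic is what validation and rate limiting are for.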
```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourapp.com"],  # exact origins, not "*"
    allow_methods=["POST"],  # the only method the API serves
    allow_headers=["*"],
)
```
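What the middleware does with that allowlist is easy to state: it echoes the request's Origin back in Access-Control-Allow-Origin only when that origin is on the list, and otherwise omits the header, so the browser refuses to let the calling page read the response. The standard-library sketch below shows that decision for simple requests; `cors_response_headers` is a hypothetical helper, not Starlette's code, and it ignores preflight details such as allowed methods and headers.

```python
from typing import Dict, Iterable, Optional


def cors_response_headers(origin: Optional[str],
                          allowed_origins: Iterable[str]) -> Dict[str, str]:
    """Headers to add to a response for a request carrying `origin`."""
    # Vary: Origin tells caches the response differs per requesting origin,
    # so a response meant for an allowed origin is not reused for another.
    headers = {"Vary": "Origin"}
    # Exact string match: "https://yourapp.com.evil.example" must not pass.
    if origin is not None and origin in set(allowed_origins):
        headers["Access-Control-Allow-Origin"] = origin
    return headers
```

The exact-match comparison is the point: prefix or substring checks on origins are a classic way to reintroduce the wildcard by accident.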
Don't leak internals in errors
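A stack trace in a 500 response hands an attacker a map of your system. FastAPI already returns a bare "Internal Server Error" for unhandled exceptions unless the app runs with debug=True, so the usual leaks are a debug flag left on in production or handlers that copy str(exc) into the response. Return a generic message to the client and log the detail server-side: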
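```python
import logging
from fastapi import Request
from fastapi.responses import JSONResponse
logger = logging.getLogger(__name__)

@app.exception_handler(Exception)
async def on_error(request: Request, exc: Exception):
    # exc_info=exc attaches the full traceback to the server-side log record
    logger.error("unhandled error", exc_info=exc)
    return JSONResponse(status_code=500, content={"error": "Internal error"})  # generic to client
```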
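A common refinement, sketched here as an assumption rather than part of the setup above, is to give the client an opaque reference ID that also appears in the log line, so a user's bug report can be matched to the full traceback without exposing it. The framework-agnostic helper below uses only the standard library; `build_error_response` and the `inference_api` logger name are hypothetical, and in FastAPI you would call the helper from the exception handler and wrap its result in a `JSONResponse`.

```python
import logging
import uuid
from typing import Any, Dict, Tuple

logger = logging.getLogger("inference_api")


def build_error_response(exc: BaseException) -> Tuple[int, Dict[str, Any]]:
    """Log the full exception server-side; return a generic body for the client."""
    error_id = uuid.uuid4().hex  # random, so it reveals nothing about the system
    # exc_info=exc attaches the traceback to this log record only.
    logger.error("unhandled error [%s]", error_id, exc_info=exc)
    return 500, {"error": "Internal error", "error_id": error_id}
```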
Secrets stay in the environment
Your R2/S3 keys and any API tokens belong in neither the code nor the container image. Read them from environment variables, keep them out of git (.env in .gitignore) and out of the Docker build context (.env in .dockerignore), and set them in your platform's secret manager. A model loaded from object storage (the pattern in Production ML Workflows) needs credentials; those credentials must never land in a commit or a Docker layer.
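One pattern that helps here: read every required secret once at startup and fail fast with a clear message if any is missing, rather than discovering it on the first request. The variable names below (`R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`) are hypothetical placeholders, and `load_required_secrets` is an illustrative helper; the optional `env` mapping exists only so the function can be tested without touching the real environment.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical names; use whatever your storage provider and platform expect.
REQUIRED_SECRETS = ("R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY")


def load_required_secrets(names: Iterable[str] = REQUIRED_SECRETS,
                          env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the named secrets, raising if any are unset or empty."""
    names = tuple(names)  # allow generators, since we iterate twice
    source = os.environ if env is None else env
    missing = [name for name in names if not source.get(name)]
    if missing:
        # Report the variable names, never their values, so logs stay clean.
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return {name: source[name] for name in names}
```

Calling this at module import, or in a FastAPI lifespan handler, makes a misconfigured deploy crash immediately instead of serving 500s.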
A pre-traffic checklist
Before an inference API takes real traffic:
- Content-type allowlist ✓
- Size cap enforced before full read ✓
- Bytes verified as a real image, decode failures handled ✓
- Dimension / pixel cap against decompression bombs ✓
- Per-IP rate limiting ✓
- CORS restricted to your origins ✓
- Generic error responses, detailed server logs ✓
- Secrets in env vars, never in code or image ✓
- Run the container as non-root (see Containerizing a FastAPI ML Service) ✓
Conclusion
Securing an ML API isn't about exotic ML-specific attacks — it's web security fundamentals applied to an endpoint that happens to run a model: validate input at the boundary, cap size and dimensions, rate-limit, lock down CORS, hide internal errors, and keep secrets in the environment. Do these before you take traffic, not after the first incident.
For the container hardening that complements this, see Containerizing a FastAPI ML Service. For the serving architecture it protects, see Production ML Workflows.