Choosing the right LLM for your product: a practical guide
"Which model should we use?" is the wrong first question. The right one is "what does each part of our product actually need?" Once you break a feature into its real tasks, the model choices usually pick themselves — and you often end up using more than one.
Match the model to the task
A quick classification or routing step does not need a frontier model; a small, fast one is cheaper and lower-latency. A nuanced summarisation or reasoning step might justify a larger model. Treat model choice as a per-task decision, not a single platform-wide bet. For each task, ask four questions:
- Latency: is this in a user’s critical path, or a background job?
- Cost: how many calls per request, and at what volume?
- Quality: how much does an occasional miss actually cost?
- Privacy: can this data leave your environment at all?
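The latency, cost and quality questions can be turned into a rough selection rule. The sketch below is one way to do that, not a definitive method. It assumes two hypothetical model tiers whose prices, latencies and quality ranks are placeholders rather than real vendor figures; `TaskProfile`, `ModelTier`, `monthly_cost_usd` and `pick_tier` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskProfile:
    """Answers to the latency, cost and quality questions for one task."""
    name: str
    latency_budget_ms: int    # latency: how long can this step take?
    calls_per_request: int    # cost: model calls per product request
    requests_per_month: int   # cost: expected volume
    miss_is_costly: bool      # quality: is an occasional miss expensive?


@dataclass(frozen=True)
class ModelTier:
    """A hypothetical tier. All numbers are placeholders, not real prices."""
    name: str
    cost_per_call_usd: float
    typical_latency_ms: int
    quality_rank: int         # higher means more capable, per your own evals


SMALL = ModelTier("small-fast", cost_per_call_usd=0.0002,
                  typical_latency_ms=300, quality_rank=1)
LARGE = ModelTier("large-capable", cost_per_call_usd=0.01,
                  typical_latency_ms=2500, quality_rank=2)


def monthly_cost_usd(task: TaskProfile, tier: ModelTier) -> float:
    """Calls per request x requests per month x price per call."""
    return task.calls_per_request * task.requests_per_month * tier.cost_per_call_usd


def pick_tier(task: TaskProfile, tiers: Sequence[ModelTier]) -> ModelTier:
    """Keep tiers that fit the latency budget, then trade quality against cost."""
    fits = [t for t in tiers if t.typical_latency_ms <= task.latency_budget_ms]
    if not fits:
        raise ValueError(f"no tier meets the latency budget for {task.name!r}")
    if task.miss_is_costly:
        # Misses are expensive: take the most capable tier that is fast enough.
        return max(fits, key=lambda t: t.quality_rank)
    # Misses are cheap: take the least expensive tier that is fast enough.
    return min(fits, key=lambda t: t.cost_per_call_usd)


routing = TaskProfile("routing", latency_budget_ms=500, calls_per_request=1,
                      requests_per_month=1_000_000, miss_is_costly=False)
summary = TaskProfile("summary", latency_budget_ms=5000, calls_per_request=2,
                      requests_per_month=50_000, miss_is_costly=True)

for task in (routing, summary):
    tier = pick_tier(task, [SMALL, LARGE])
    print(task.name, tier.name, round(monthly_cost_usd(task, tier), 2))
```

With these placeholder numbers the routing step lands on the small tier and the summary step on the large one, which is the "more than one model" outcome described above. Real thresholds should come from your own measurements.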
Privacy can decide it for you
If you are handling regulated or sensitive data, where the model runs matters as much as how good it is. Sometimes a smaller model you can host yourself beats a stronger hosted one you are not allowed to send the data to. We design for that constraint up front rather than discovering it at launch.
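One way to make that constraint explicit is to filter candidates by where they run before comparing quality. This is a minimal sketch under stated assumptions: the model names and quality scores are invented for illustration, and `Deployment`, `Candidate` and `best_allowed` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Deployment(Enum):
    SELF_HOSTED = "self_hosted"   # runs inside your environment
    HOSTED_API = "hosted_api"     # data is sent to a third party


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model name
    deployment: Deployment
    quality_score: float      # from your own evals; higher is better


def best_allowed(candidates: Sequence[Candidate],
                 data_may_leave_environment: bool) -> Candidate:
    """Apply the privacy constraint first, then pick on quality."""
    if data_may_leave_environment:
        allowed = list(candidates)
    else:
        allowed = [c for c in candidates
                   if c.deployment is Deployment.SELF_HOSTED]
    if not allowed:
        raise LookupError("no candidate satisfies the privacy constraint")
    return max(allowed, key=lambda c: c.quality_score)


candidates = [
    Candidate("hosted-strong", Deployment.HOSTED_API, quality_score=0.92),
    Candidate("local-small", Deployment.SELF_HOSTED, quality_score=0.81),
]

# Regulated data: the weaker self-hosted model is the only valid choice.
print(best_allowed(candidates, data_may_leave_environment=False).name)
# Unrestricted data: the stronger hosted model wins on quality.
print(best_allowed(candidates, data_may_leave_environment=True).name)
```

Filtering before ranking means a stronger hosted model never appears as an option for data it cannot legally receive.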
Build so you can switch
The model landscape shifts every few months. Put a thin abstraction between your product and any specific provider, keep your prompts and evals in one place, and you can swap or mix models without rewriting the app. The goal is a system that gets better as models do, not one welded to today’s best option.
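A thin abstraction of this kind can be very small. The sketch below assumes a single `complete(prompt)` method is enough for your tasks; `TextModel`, `EchoModel`, `PROMPTS` and `ModelRegistry` are hypothetical names, and `EchoModel` is a stand-in that returns its input so the example runs without any provider SDK or network access.

```python
from typing import Dict, Protocol


class TextModel(Protocol):
    """The only surface the product code depends on."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider; a real adapter would wrap a provider's client here."""
    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Prompts live in one place, keyed by task, so they can be versioned
# and evaluated independently of the model behind them.
PROMPTS: Dict[str, str] = {
    "route": "Classify this message as billing, support or sales: {text}",
    "summarise": "Summarise the following in two sentences: {text}",
}


class ModelRegistry:
    """Maps each task to whichever model currently serves it."""
    def __init__(self) -> None:
        self._by_task: Dict[str, TextModel] = {}

    def assign(self, task: str, model: TextModel) -> None:
        self._by_task[task] = model

    def run(self, task: str, text: str) -> str:
        prompt = PROMPTS[task].format(text=text)
        return self._by_task[task].complete(prompt)


registry = ModelRegistry()
registry.assign("route", EchoModel("small"))
registry.assign("summarise", EchoModel("large"))
print(registry.run("route", "My invoice is wrong"))

# Swapping the model for one task is a single assignment; callers do not change.
registry.assign("route", EchoModel("new-small"))
print(registry.run("route", "My invoice is wrong"))
```

Because callers only know task names, the same evals can be rerun against any new assignment before it goes live.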
When we build AI features, model selection is an engineering decision tied to your latency, cost and privacy needs — not a logo on a slide.