Latency
The time delay between sending a request to an AI model and receiving the response, critical for real-time user-facing applications.
How It Works
In AI applications, latency is typically measured two ways: time-to-first-token (TTFT), the delay before the first token appears, and total generation time. TTFT matters most for user experience because streaming makes the rest of the response feel fast. Typical TTFT for cloud APIs: GPT-5-mini ~200-400ms, GPT-5.2 ~400-800ms, Claude Opus 4.6 ~500-1000ms.
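The two measurements can be sketched in a few lines of Python. This is a minimal illustration, not a real API client: `simulated_stream` is a hypothetical stand-in for a streaming model endpoint, and the delay values are arbitrary.

```python
import time

def simulated_stream(n_tokens=20, first_token_delay=0.3, per_token_delay=0.01):
    """Hypothetical stand-in for a streaming model API: yields tokens with delays."""
    time.sleep(first_token_delay)  # the model "thinks" before emitting anything
    for i in range(n_tokens):
        yield f"token{i} "
        time.sleep(per_token_delay)

def measure_latency(stream):
    """Return (time-to-first-token, total generation time) in seconds."""
    start = time.perf_counter()
    ttft = None
    for _token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
    total = time.perf_counter() - start
    return ttft, total

ttft, total = measure_latency(simulated_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms")
```

The same `measure_latency` loop works against any token iterator, which is why streaming APIs make TTFT easy to observe: the first iteration of the loop is exactly the moment the user would see output begin.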
Factors that increase latency: larger models, longer prompts (more input tokens to process), complex reasoning modes, geographic distance to the API server, and provider load. Factors that decrease it: smaller models, shorter prompts, edge deployment, request caching, and streaming.
For production apps, target <500ms TTFT for conversational features and <2 seconds total for short responses. Strategies to reduce latency: (1) Use the smallest model that meets quality needs, (2) Keep prompts concise, (3) Enable streaming for all user-facing features, (4) Cache common requests, (5) Use provider regions closest to your users. For non-user-facing tasks, latency matters less and you can optimize for cost instead.
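Of the strategies above, request caching is the simplest to sketch. The example below is an assumed design, not a specific provider's feature: `generate_fn` is a placeholder for a real model call, and the cache key is just a hash of the model name and prompt.

```python
import hashlib
import json

_cache = {}

def cached_generate(prompt, model="example-model", generate_fn=None):
    """Return a cached response for identical (model, prompt) pairs.

    generate_fn stands in for a real API call; it runs only on a cache
    miss, so repeated identical requests skip the model (and its latency)
    entirely.
    """
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate_fn(prompt)  # slow path: call the model
    return _cache[key]

calls = []
def fake_model(prompt):
    calls.append(prompt)  # record each real "model" invocation
    return f"response to: {prompt}"

first = cached_generate("What is latency?", generate_fn=fake_model)
second = cached_generate("What is latency?", generate_fn=fake_model)  # served from cache
print(len(calls))  # the model was only called once
```

In production you would bound the cache's size and lifetime (for example with an LRU policy or a TTL), since model responses can go stale and prompts with user-specific content should not be shared across users.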
Common Use Cases
- Optimizing chat response times
- Choosing between model tiers
- Real-time feature design
- User experience benchmarking
Related Terms
Inference
The process of running a trained AI model to generate predictions or outputs from new inputs, as opposed to training the model.
Streaming
A method of receiving AI model output token-by-token in real time as it is generated, rather than waiting for the complete response.
Model Serving
The infrastructure and process of hosting a trained AI model and exposing it as an API endpoint for real-time or batch inference.
GPU / TPU
Specialized processors designed for the parallel mathematical operations that AI models require for training and inference.
Need help implementing Latency?
AI 4U Labs builds production AI apps in 2-4 weeks. We optimize for latency in real products every day.
Let's Talk