What an LLM does
At its core, an LLM predicts what text should come next given the text that came before. This simple mechanism, when scaled to billions of parameters and trained on internet-scale data, produces remarkably sophisticated behaviors: understanding questions, following instructions, reasoning through problems, summarizing documents, translating languages, and generating creative content.
For AI agents, LLMs provide the ability to interpret user requests, retrieve and synthesize information, decide on actions, and formulate responses. However, an LLM alone is not an agent. It needs surrounding infrastructure: knowledge retrieval (RAG), tool integration, memory systems, and workflow controls.
Major LLM families
- GPT-4 and GPT-4o (OpenAI): Among the most capable general-purpose models. Strong reasoning, instruction following, and function calling. GPT-4o offers faster responses and multimodal capabilities. Widely available through API and used by many AI agent platforms.
- Claude 3 family (Anthropic): Opus for complex reasoning, Sonnet for balanced performance, Haiku for speed and cost efficiency. Known for strong safety practices, long context windows, and nuanced instruction following. Popular for enterprise applications.
- Gemini (Google): Pro and Ultra models with strong multimodal capabilities. Native integration with Google's ecosystem. Flash models for faster responses. Competitive reasoning and coding abilities.
- Llama (Meta): Open-weight models that can run on your own infrastructure. Llama 3 offers competitive performance with the advantage of data control and customization. Requires more technical setup.
- Mistral and others: European models with strong performance-to-cost ratios. Often used for specialized deployments or cost-optimized configurations.
Capabilities
Modern LLMs can perform a wide range of tasks relevant to AI agents:
- Natural language understanding: Parse user requests, identify intent, extract key information, and handle variations in phrasing.
- Instruction following: Execute detailed instructions about format, tone, constraints, and workflow steps.
- Reasoning: Work through multi-step problems, consider alternatives, and explain decisions.
- Function calling: Structure outputs to trigger external tools, APIs, and workflows.
- Context handling: Maintain conversation history and reference earlier statements.
- Multimodal processing: Many models can understand images, audio, and documents alongside text.
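Function calling deserves a closer look, since it is the capability that turns an LLM into an agent component: the model emits a structured request, and your code executes it. A minimal sketch, assuming a JSON-schema-style tool definition similar to (but not exactly matching) the formats used by major LLM APIs; the `get_weather` tool and its dispatcher are illustrative stand-ins:

```python
import json

# Hypothetical tool schema in the JSON-schema style used by several LLM APIs.
# The exact field names vary by vendor; this is an illustrative shape only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call; returns canned data for the sketch.
    return f"Sunny in {city}"

DISPATCH = {"get_weather": get_weather}

def run_tool_call(model_output: str) -> str:
    """Parse a model's structured tool-call output and execute it locally.

    The model only *requests* the call; execution happens in our code,
    outside the model.
    """
    call = json.loads(model_output)
    fn = DISPATCH[call["name"]]
    return fn(**call["arguments"])

# A model emitting a structured call might produce output like this:
print(run_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

Keeping execution outside the model is a deliberate design choice: it means every action can be validated, logged, and permission-checked before anything happens.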
Limitations
Understanding LLM limitations is essential for building reliable AI agents:
- Hallucination: LLMs can generate plausible-sounding but false information. They do not distinguish between knowledge they have and patterns they infer. Always ground LLM outputs in verified sources.
- No inherent knowledge access: LLMs have no direct access to your business data, policies, or real-time information. They only know what was in their training data and what you provide through context or retrieval.
- Knowledge cutoffs: Training data has a cutoff date. Models do not know recent events, updated policies, or new product information unless provided through RAG.
- Reasoning failures: Complex reasoning can fail in subtle ways. Models may make logical errors, miss edge cases, or reach incorrect conclusions confidently.
- Prompt sensitivity: Small changes in wording can produce different outputs. Results may vary between runs on the same input.
- Security vulnerabilities: Prompt injection can override instructions. Models can be tricked into revealing training data patterns or bypassing constraints.
- Cost and latency: Larger models are slower and more expensive per token. Long conversations and complex retrieval add to costs.
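The cost point is worth making concrete. Because most APIs resend the full conversation history on every turn, input tokens tend to dominate long conversations. A quick back-of-the-envelope sketch, using placeholder per-million-token prices rather than any provider's actual rates:

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the dollar cost of one conversation.

    Prices are per million tokens. The figures used below are
    placeholders, not any provider's actual pricing.
    """
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# A multi-turn support conversation where history is resent each turn:
# cumulative input tokens far exceed output tokens.
cost = conversation_cost(input_tokens=60_000, output_tokens=4_000,
                         price_in_per_m=3.00, price_out_per_m=15.00)
print(f"${cost:.2f}")
```

Even at modest per-token prices, resent history compounds quickly, which is why many platforms summarize or truncate older turns.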
Model selection factors
When evaluating AI agent platforms, consider these LLM-related factors:
- Model choice: Does the platform let you choose models, or is it locked to one provider? Can you mix models for different tasks?
- Performance on your tasks: Test models against your specific evaluation set, not generic benchmarks. A model that excels at coding may struggle with your support conversations.
- Latency: What response times does the model deliver under load? How does latency change with context length and complexity?
- Cost: What is the cost per conversation, per token, per tool call? How do costs scale with usage?
- Data privacy: Where is the model hosted? Does data leave your region? What are the provider's data retention and training policies?
- Stability: How often does the model change? Can you pin to specific versions? What happens when the provider updates?
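Testing against your own evaluation set, as recommended above, does not require heavy tooling. A minimal harness sketch follows; the eval cases, the keyword-presence check, and the `fake_model` stub are all illustrative (a real harness would plug in actual provider calls and richer scoring such as rubrics or exact tool-call matching):

```python
from typing import Callable

# Illustrative eval cases; in practice these come from your own traffic.
EVAL_SET = [
    {"prompt": "Customer asks about refund policy for opened items.",
     "must_contain": "refund"},
    {"prompt": "Customer reports a billing error on their last invoice.",
     "must_contain": "invoice"},
]

def score_model(ask_model: Callable[[str], str]) -> float:
    """Fraction of eval cases whose response contains the required keyword.

    Keyword presence keeps the sketch self-contained; real harnesses
    use stronger checks.
    """
    passed = sum(
        case["must_contain"] in ask_model(case["prompt"]).lower()
        for case in EVAL_SET
    )
    return passed / len(EVAL_SET)

def fake_model(prompt: str) -> str:
    # Offline stub standing in for a provider API call.
    return f"Regarding your request: {prompt.lower()}"

print(score_model(fake_model))
```

Running the same `score_model` against each candidate model gives a like-for-like comparison on the tasks that actually matter to you.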
LLMs in AI agent architecture
The LLM is one component in a larger system:
- Input processing: User requests pass through intent detection, entity extraction, and context assembly before reaching the LLM.
- Knowledge retrieval: RAG systems fetch relevant documents, policies, and data to ground the LLM's responses.
- Tool integration: Function calling enables the LLM to trigger actions, but execution happens outside the model.
- Response filtering: Outputs may pass through moderation, PII detection, and business rule checks before reaching users.
- Memory systems: Conversation history and user context are stored and retrieved separately from the LLM itself.
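The pipeline above can be sketched end to end. Every component here is an illustrative stub (the retrieval is naive keyword matching, the "model" is a fake, and the filter only redacts email-like strings), but the stage ordering mirrors the architecture described: retrieve, assemble context, generate, filter.

```python
import re

def retrieve(query: str, knowledge_base: dict[str, str]) -> str:
    """Naive RAG step: return any document whose key appears in the query."""
    hits = [doc for key, doc in knowledge_base.items() if key in query.lower()]
    return "\n".join(hits)

def call_llm(prompt: str) -> str:
    # Stub for a model call; a real system would hit a provider API here.
    return f"[model answer grounded in]: {prompt}"

def filter_output(text: str) -> str:
    """Response-filtering stage: redact email-like strings before replying."""
    return re.sub(r"\S+@\S+", "[redacted]", text)

def handle_request(user_query: str, knowledge_base: dict[str, str]) -> str:
    context = retrieve(user_query, knowledge_base)               # knowledge retrieval
    prompt = f"Context:\n{context}\n\nQuestion: {user_query}"    # context assembly
    draft = call_llm(prompt)                                     # generation
    return filter_output(draft)                                  # response filtering

kb = {"returns": "Items may be returned within 30 days. Contact help@example.com."}
print(handle_request("What is your returns policy?", kb))
```

The point of the sketch is the shape, not the stubs: the LLM call is one line in a pipeline where grounding happens before it and safety checks happen after it.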
What buyers should ask
- Which LLMs does the platform support? Can I choose or switch models?
- How does the platform handle model updates and versioning?
- What is the pricing model for LLM usage? Are there caps or overage charges?
- How does the platform mitigate hallucination and ensure grounded responses?
- What happens when the primary model has an outage? Are there fallback options?
- Can I bring my own model or run models on my infrastructure?
- How are model outputs logged and audited for compliance?
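The outage question above usually comes down to whether the platform supports ordered fallback across providers. A minimal sketch of that pattern, assuming nothing about any real API (the `primary` and `secondary` callables are stubs, with the primary simulating an outage):

```python
# Fallback sketch: try providers in order, return the first success.
# Provider names and the callables are placeholders, not real endpoints.

def with_fallback(prompt: str, providers: list) -> str:
    """Try each (name, call) pair in order; return the first successful reply."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # a real system would narrow this to API errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def primary(prompt: str) -> str:
    # Stub simulating an outage on the primary model.
    raise ConnectionError("primary model unavailable")

def secondary(prompt: str) -> str:
    return f"(fallback model) reply to: {prompt}"

print(with_fallback("hello", [("primary", primary), ("secondary", secondary)]))
```

In production the fallback model often behaves differently from the primary, so the same evaluation set used for model selection should also be run against every fallback.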
Related terms
- AI Agent - The system architecture around the LLM
- RAG - Retrieval augmented generation for knowledge grounding
- Prompt Engineering - Designing instructions for the LLM
- Multimodal AI - LLMs extended to images, audio, and more
