Pre-Evaluation Checklist
Before contacting vendors, complete these steps to define your requirements; a sketch of how the answers might be captured follows the list.
- ☐ Define the workflow: Write one sentence describing the AI agent's job: who it helps, where conversations start, what information it uses, what actions it takes, and when humans must take over.
- ☐ List required channels: Which channels must the AI support? (Web chat, WhatsApp, email, Instagram, phone, in-app messaging)
- ☐ Identify knowledge sources: What documents, URLs, FAQs, or systems will train the AI? How often do they change?
- ☐ Define automation depth: Should the AI only answer questions, or also execute actions (refunds, order lookups, routing)?
- ☐ Set handoff rules: When should the AI escalate to humans? (Low confidence, sensitive topics, VIP customers, specific intents)
- ☐ Estimate volume: How many conversations per month? What's the growth projection?
- ☐ Set budget range: What can you afford monthly, including seats, channels, and overages?
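One way to keep these answers handy is a short, structured requirements brief you can paste into vendor emails and reuse during demos. The sketch below is purely illustrative; every value (channels, sources, volumes, budget) is a placeholder to replace with your own.

```python
# Illustrative requirements brief -- every value is a placeholder, not a recommendation.
requirements = {
    "workflow": (
        "Helps existing customers; conversations start in web chat and WhatsApp; "
        "uses the help center and order system; can look up orders; "
        "hands off to a human for refunds and billing disputes."
    ),
    "channels": ["web_chat", "whatsapp", "email"],
    "knowledge_sources": {
        "help_center": {"type": "url_crawl", "update_frequency": "weekly"},
        "returns_policy.pdf": {"type": "document", "update_frequency": "quarterly"},
    },
    "automation_depth": "answers + order lookups; refunds require human approval",
    "handoff_rules": ["low_confidence", "billing_dispute", "vip_customer"],
    "monthly_conversations": 4_000,   # current estimate
    "monthly_growth": 0.05,           # 5% projected month-over-month growth
    "monthly_budget_usd": 1_500,      # seats, channels, and overages included
}
```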
Platform Evaluation Criteria
Use this checklist to evaluate each platform on your shortlist.
| Category | Criteria | Platform A | Platform B |
|---|---|---|---|
| Channels | Web chat | ☐ | ☐ |
| | WhatsApp/SMS | ☐ | ☐ |
| | Email | ☐ | ☐ |
| | Social (Instagram, Messenger) | ☐ | ☐ |
| Knowledge | Document upload (PDF, docs) | ☐ | ☐ |
| | URL/website crawling | ☐ | ☐ |
| | FAQ/training interface | ☐ | ☐ |
| | Knowledge update frequency | ☐ | ☐ |
| Workflow | Multi-step conversations | ☐ | ☐ |
| | Action execution (APIs) | ☐ | ☐ |
| | CRM integrations | ☐ | ☐ |
| | Conditional logic/rules | ☐ | ☐ |
| Handoff | Context preservation | ☐ | ☐ |
| | Transcript visibility | ☐ | ☐ |
| | Confidence thresholds | ☐ | ☐ |
| | Agent suggested replies | ☐ | ☐ |
| Pricing | Transparent pricing | ☐ | ☐ |
| | Volume limits clear | ☐ | ☐ |
| | No hidden add-ons | ☐ | ☐ |
| | 12-month projection available | ☐ | ☐ |
Questions to Ask Vendors
Use these questions during demos and vendor calls. Record answers for comparison.
Knowledge & Training
- Can you demo with our actual documents and edge cases?
- How do you handle conflicting or outdated information?
- What file formats and size limits do you support?
- How quickly do knowledge updates propagate?
- Can we control which sources the AI uses for each topic?
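The last question is easier to evaluate if you arrive with a concrete topic-to-source mapping in mind. A hypothetical example of what that control could look like (the source names are made up):

```python
# Hypothetical topic-to-source routing -- source names are placeholders.
allowed_sources = {
    "returns":  ["returns_policy.pdf", "help_center/returns"],
    "shipping": ["help_center/shipping"],
    "pricing":  [],  # never answer from documents; hand off instead
}

def sources_for(topic: str) -> list[str]:
    """Sources the AI may cite for a topic; an empty list means escalate to a human."""
    return allowed_sources.get(topic, [])
```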
Channels & Coverage
- Which channels are included in our plan tier?
- Are WhatsApp/SMS fees included or pass-through?
- Do you support our specific social media accounts?
- Can conversations switch between channels?
- What are the channel-specific limitations?
Workflow & Actions
- What actions can the AI perform without custom code?
- Which integrations are native vs. require setup?
- Can we set approval gates for sensitive actions?
- How do you handle failed API calls?
- What's the rate limit for workflow actions?
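To make the approval-gate and failed-call questions concrete, here is a minimal sketch of how action execution might be wrapped so that sensitive actions require approval and failed API calls escalate to a human. The function names and action list are assumptions for illustration, not any platform's actual API.

```python
# Hypothetical action wrapper -- illustrates approval gates and failed-call handling only.
SENSITIVE_ACTIONS = {"issue_refund", "change_shipping_address"}

def run_action(action, payload, call_api, request_approval, escalate_to_human):
    """Execute an AI-triggered action behind an approval gate, with a human fallback."""
    if action in SENSITIVE_ACTIONS and not request_approval(action, payload):
        return escalate_to_human(reason="approval_denied", action=action, payload=payload)
    try:
        return call_api(action, payload)
    except Exception as error:  # a failed API call should never fail silently
        return escalate_to_human(reason="api_error", action=action, payload=payload, error=str(error))
```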
Handoff & Escalation
- What does a human agent see after escalation?
- Can we customize handoff triggers?
- How is customer context preserved?
- Can agents edit AI-suggested replies?
- What's the average handoff time?
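As a reference point when comparing answers, the sketch below shows one plausible shape for customizable handoff triggers (confidence threshold, sensitive topics, VIP customers). The threshold and topic names are placeholders, not any vendor's defaults.

```python
# Illustrative handoff triggers -- threshold and topics are placeholders.
HANDOFF_RULES = {
    "min_confidence": 0.7,
    "sensitive_topics": {"billing_dispute", "legal", "cancellation"},
    "always_escalate_vip": True,
}

def should_hand_off(confidence, topic, is_vip, rules=HANDOFF_RULES):
    """Return True when the conversation should go to a human agent."""
    if confidence < rules["min_confidence"]:
        return True
    if topic in rules["sensitive_topics"]:
        return True
    return is_vip and rules["always_escalate_vip"]
```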
Pricing & Limits
- What's included in the base price?
- What happens when we exceed limits?
- Are there implementation or onboarding fees?
- What add-ons might we need?
- Can you provide a 12-month cost projection?
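If a vendor cannot hand you a 12-month projection, you can approximate one yourself. The sketch below assumes a simple pricing structure (base platform fee, per-seat fee, a bundle of included conversations, and a per-conversation overage rate) plus steady monthly growth; every number is a placeholder to replace with the vendor's quoted rates.

```python
# Rough 12-month cost model -- all rates and volumes are placeholders, not real pricing.
def project_annual_cost(
    base_fee=500.0,               # monthly platform fee
    seats=5, seat_fee=50.0,       # human agent seats
    included_conversations=3000,  # conversations bundled into the base fee
    overage_rate=0.10,            # cost per conversation beyond the bundle
    starting_volume=4000,         # conversations in month one
    monthly_growth=0.05,          # 5% month-over-month growth
    months=12,
):
    total, volume = 0.0, float(starting_volume)
    for _ in range(months):
        overage = max(0.0, volume - included_conversations) * overage_rate
        total += base_fee + seats * seat_fee + overage
        volume *= 1 + monthly_growth
    return round(total, 2)

print(project_annual_cost())  # compare this figure against the vendor's own projection
```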
Security & Compliance
- What security certifications do you have?
- Where is data stored and processed?
- Do you train models on our data?
- Can we export our data if we switch platforms?
- What's your data retention policy?
Demo Checklist
Request these specific demos to evaluate real-world performance.
- ☐ Knowledge test: Upload your actual documents and ask questions requiring the newest policy, an exception, and a source that shouldn't be used.
- ☐ Channel test: Run the same issue through web chat and your other required channels to compare quality.
- ☐ Handoff test: Force a low-confidence or sensitive case and verify what the human agent receives.
- ☐ Integration test: Show exactly what the AI can read, write, update, or trigger in your existing systems.
- ☐ Failed answer test: Ask the vendor to demonstrate how you fix an incorrect answer after launch.
- ☐ Volume test: Ask about performance under expected monthly conversation volume.
Red Flags
Walk away or investigate further if you encounter these warning signs.
- Vendor cannot demo with your actual content or edge cases
- Pricing is unclear or requires sales calls for basic information
- Channel support is described as "available" but requires third-party providers
- No visible audit trail or approval gates for sensitive actions
- Knowledge import fails on your actual documents or formats
- No clear answer on data ownership or export capabilities
- No reference customers match your industry or use case
- Contract requires annual commitment without trial period
Decision Framework
Use this framework to score and compare your final candidates.
| Criteria | Weight | Platform A Score | Platform B Score |
|---|---|---|---|
| Workflow fit | 25% | /10 | /10 |
| Channel coverage | 15% | /10 | /10 |
| Knowledge quality | 20% | /10 | /10 |
| Handoff quality | 15% | /10 | /10 |
| Pricing transparency | 10% | /10 | /10 |
| Implementation ease | 10% | /10 | /10 |
| Security/compliance | 5% | /10 | /10 |
| Weighted Total | 100% | /10 | /10 |
Adjust weights based on your priorities. Support teams may weight handoff higher; ecommerce teams may weight integrations higher.
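The weighted total is simply each score multiplied by its weight, summed across criteria. A small worked example, using invented scores:

```python
# Worked example of the weighted total -- the scores are invented for illustration only.
weights = {
    "workflow_fit": 0.25, "channel_coverage": 0.15, "knowledge_quality": 0.20,
    "handoff_quality": 0.15, "pricing_transparency": 0.10,
    "implementation_ease": 0.10, "security_compliance": 0.05,
}
platform_a = {
    "workflow_fit": 8, "channel_coverage": 6, "knowledge_quality": 7,
    "handoff_quality": 9, "pricing_transparency": 5,
    "implementation_ease": 7, "security_compliance": 8,
}
weighted_total = sum(weights[c] * platform_a[c] for c in weights)
print(f"Platform A: {weighted_total:.2f}/10")  # about 7.25/10 with these example scores
```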
Pilot Checklist
Before full commitment, run a controlled pilot with these elements.
- ☐ Real knowledge sources: Use your actual documents, policies, and FAQs—not vendor demo content.
- ☐ Edge case test set: Prepare 20-50 questions covering common, edge, and failure scenarios; a scoring sketch follows this list.
- ☐ Named reviewers: Assign specific team members to review AI answers and flag issues.
- ☐ Written escalation policy: Define when humans must take over during the pilot.
- ☐ Success metrics: Define what success looks like before the pilot starts (resolution rate, accuracy, handoff rate).
- ☐ Cost tracking: Monitor actual usage costs vs. vendor projections.
- ☐ Feedback loop: Create a process for fixing failed answers and retesting.
- ☐ Exit criteria: Define what would cause you to reject the platform after pilot.
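For the edge case test set and success metrics, a lightweight scorecard keeps the pilot honest: store each question with its expected outcome, record what the AI actually did, and compare the resulting rates against the targets you agreed on before the pilot. The cases and target numbers below are assumptions to adapt, not a standard.

```python
# Minimal pilot scorecard sketch -- cases and targets are placeholders to adapt.
test_cases = [
    {"question": "What is the return window for sale items?", "expected": "correct_answer"},
    {"question": "Can I return a gift card?", "expected": "correct_answer"},  # edge case
    {"question": "I want to dispute a charge.", "expected": "handoff"},       # must escalate
]

# Reviewers record what actually happened for each case during the pilot.
observed = ["correct_answer", "wrong_answer", "handoff"]

accuracy = sum(case["expected"] == result for case, result in zip(test_cases, observed)) / len(test_cases)
handoff_rate = observed.count("handoff") / len(observed)

targets = {"accuracy": 0.90, "max_handoff_rate": 0.30}  # agree on these before the pilot starts
print(f"accuracy={accuracy:.0%}, handoff_rate={handoff_rate:.0%}")
print("pass" if accuracy >= targets["accuracy"] and handoff_rate <= targets["max_handoff_rate"] else "review")
```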
Related resources
- AI Chatbot Pricing Guide - Compare costs and calculate ROI
- How to Choose an AI Agent Platform - Complete buyer guide
- Our Methodology - How we evaluate platforms
- Vendor Scorecard - Request our evaluation scorecard
FAQ
What should I look for when buying an AI agent platform?
Look for workflow fit (channels, actions, handoff), knowledge training quality, integration depth, pricing transparency, and implementation support. Test with your actual content before committing.
How do I evaluate AI chatbot vendors?
Request demos with your own knowledge sources, test edge cases, verify channel coverage, model pricing at scale, and run a pilot with real conversations before full commitment.
What questions should I ask AI chatbot vendors?
Ask about channel coverage, knowledge training limits, workflow actions, handoff context, pricing at scale, implementation time, security compliance, and what happens when you exceed limits.
How long should an AI chatbot pilot be?
A meaningful pilot typically runs 2-4 weeks with real knowledge sources, edge cases, and defined success metrics. Rushed pilots often miss critical issues that later surface in production.
Next step
Ready to compare platforms?
Use our comparison pages to evaluate your top candidates side by side.
