Learn how to configure answer verification in Pika, where a second LLM agent automatically evaluates and verifies the accuracy of responses before they're sent to users.
What You'll Accomplish
By the end of this guide, you will:
- Enable answer verification for chat apps
- Configure verification prompts and criteria
- Understand the verification workflow
- Handle verification failures
- Monitor verification effectiveness
- Optimize verification performance
Prerequisites
- A running Pika installation
- Agents configured for your chat apps
- Understanding of your accuracy requirements
- Familiarity with LLM capabilities
Understanding Answer Verification
Answer verification uses a separate LLM agent to evaluate responses before delivery:
- Primary Agent generates a response
- Verifier Agent evaluates the response
- System either delivers or regenerates based on verification
- User receives verified, high-quality response
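The workflow above can be sketched as a generate, verify, retry loop. This is only an illustration under stated assumptions, not Pika's internal implementation: `runWithVerification`, `Generate`, `Verify`, and `Outcome` are hypothetical names, and the `maxAttempts` parameter simply mirrors the configuration option of the same name shown later in this guide.

```typescript
// Hypothetical sketch of the generate -> verify -> retry loop; not Pika's internal code.
type Verdict = { verified: boolean; failureReason?: string };
type Generate = (question: string, feedback?: string) => Promise<string>;
type Verify = (question: string, response: string) => Promise<Verdict>;

interface Outcome {
  response: string;
  verified: boolean;
  attempts: number;
}

async function runWithVerification(
  question: string,
  generate: Generate,
  verify: Verify,
  maxAttempts = 2, // assumed to behave like the maxAttempts option in this guide
): Promise<Outcome> {
  let feedback: string | undefined;
  let response = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // 1. The primary agent generates (or regenerates) a response.
    response = await generate(question, feedback);
    // 2. The verifier agent evaluates it.
    const verdict = await verify(question, response);
    // 3. Deliver on success; otherwise pass the failure reason to the next attempt.
    if (verdict.verified) {
      return { response, verified: true, attempts: attempt };
    }
    feedback = verdict.failureReason;
  }
  // Every attempt failed; the caller decides what the user finally sees.
  return { response, verified: false, attempts: maxAttempts };
}
```

Passing the failure reason back into `generate` is a design choice in this sketch: it gives the primary agent something concrete to fix instead of retrying blindly.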
Benefits
- Improved Accuracy: Catches errors before users see them
- Quality Assurance: Maintains consistent response quality
- Trust Building: Increases user confidence
- Error Detection: Identifies hallucinations and mistakes
Step 1: Enable Verification Feature
Configure answer verification at the site or chat app level.
Site-Wide Configuration
Location: apps/pika-chat/pika-config.ts
    export const pikaConfig: PikaConfig = {
      siteFeatures: {
        verifyResponse: {
          enabled: true,

          // Default verification prompt
          defaultVerificationPrompt: `Review the following response for accuracy, relevance, and quality.

    Original Question: {question}
    Agent's Response: {response}

    Evaluate:
    1. Accuracy: Is the information correct?
    2. Relevance: Does it answer the question?
    3. Completeness: Is anything important missing?
    4. Clarity: Is it easy to understand?

    If the response passes all criteria, respond with "VERIFIED".
    If issues exist, respond with "FAILED: [brief explanation]".`,

          // Optional: Max verification attempts
          maxAttempts: 2,

          // Optional: Verifier model
          verifierModelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
        }
      }
    };

Chat App Configuration
Enable for specific chat apps:

    chatApps: [
      {
        chatAppId: 'medical-advisor',
        title: 'Medical Advisor',
        agentId: 'medical-agent',

        // Enable verification for this high-stakes app
        features: {
          verifyResponse: {
            enabled: true,

            // Custom verification for medical domain
            verificationPrompt: `You are a medical accuracy reviewer.

    Patient Question: {question}
    AI Response: {response}

    Verify:
    1. Medical accuracy and current guidelines
    2. Appropriate disclaimers present
    3. No harmful advice given
    4. Encourages professional consultation when appropriate

    Respond "VERIFIED" only if all criteria met.
    Respond "FAILED: [reason]" if any issues found.`,

            maxAttempts: 3, // Higher for critical domains
            verifierModelId: 'anthropic.claude-3-opus-20240229-v1:0' // Use most capable model
          }
        }
      }
    ]

Step 2: Configure Verification Criteria
Define what makes a response acceptable.
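The verification prompts in this guide use placeholders such as `{question}` and `{response}` and ask the verifier to reply `VERIFIED` or `FAILED: [reason]`. As a rough sketch of how such a template might be filled in and the reply interpreted: `fillTemplate` and `parseVerdict` are hypothetical helpers, not Pika APIs, and how Pika actually substitutes placeholders or parses replies is an assumption here.

```typescript
// Hypothetical helpers for illustration; Pika's own substitution and parsing may differ.

// Replace {name} placeholders with supplied values; unknown placeholders are left as-is.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (whole: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, key)
      ? values[key]
      : undefined;
    return value ?? whole;
  });
}

interface ParsedVerdict {
  verified: boolean;
  failureReason: string | null;
}

// Interpret a verifier reply of the form "VERIFIED" or "FAILED: <reason>".
function parseVerdict(reply: string): ParsedVerdict {
  const text = reply.trim();
  if (/^VERIFIED\b/i.test(text)) {
    return { verified: true, failureReason: null };
  }
  const match = /^FAILED:\s*([\s\S]*)$/i.exec(text);
  // Anything unparseable counts as a failure, so a malformed reply is never treated as a pass.
  return {
    verified: false,
    failureReason: match?.[1]?.trim() || "Unrecognized verifier reply",
  };
}
```

Treating an unrecognized reply as a failure is the conservative choice in this sketch; it is also one reason to keep the required output format in the prompt strict and binary.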
General Purpose Verification
    verificationPrompt: `Evaluate the response for quality.

    User Question: {question}
    Agent Response: {response}
    Context: {context}

    Criteria:
    - Factually accurate
    - Directly addresses the question
    - Well-structured and clear
    - Appropriate tone and length
    - No contradictions or confusion

    Result: VERIFIED or FAILED: [explanation]`

Factual Accuracy Focus
    verificationPrompt: `Verify factual accuracy of the response.

    Question: {question}
    Response: {response}
    Available Tools: {tools}

    Check:
    1. All facts are correct and verifiable
    2. No speculation presented as fact
    3. Sources or reasoning provided when appropriate
    4. Uncertainty acknowledged when present

    If factually sound: VERIFIED
    If errors detected: FAILED: [specific issues]`

Safety and Compliance
    verificationPrompt: `Review response for safety and compliance.

    User Input: {question}
    AI Response: {response}

    Verify:
    1. No harmful, illegal, or unethical advice
    2. Appropriate warnings and disclaimers
    3. Complies with company policies
    4. Protects user privacy
    5. Escalates sensitive issues appropriately

    Safe to deliver: VERIFIED
    Issues found: FAILED: [concerns]`

Domain-Specific Verification
    // Financial advice verification
    verificationPrompt: `Review financial advice for accuracy and compliance.

    Client Question: {question}
    Advisor Response: {response}

    Requirements:
    1. Complies with financial regulations
    2. Includes required risk disclosures
    3. Factually accurate market information
    4. Appropriate for client's stated situation
    5. Recommends professional consultation when needed

    Approved: VERIFIED
    Issues: FAILED: [details]`

Step 3: Configure Verification Behavior
Control how the system handles verification results.
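Before looking at the individual options, it may help to see how a final delivery decision could be derived from them. The sketch below is an assumption about how `fallbackBehavior` and `fallbackMessage` (both described in the subsections that follow) might be applied once all attempts are in; `chooseDelivery`, `AttemptResult`, and `FallbackOptions` are hypothetical names, not part of Pika.

```typescript
// Hypothetical decision helper. Option names follow the config in this guide,
// but the logic is an assumption about how such options could be applied.
type FallbackBehavior = "show_error" | "show_last_attempt";

interface AttemptResult {
  response: string;
  verified: boolean;
}

interface FallbackOptions {
  fallbackBehavior: FallbackBehavior;
  fallbackMessage: string;
}

function chooseDelivery(attempts: AttemptResult[], options: FallbackOptions): string {
  // Deliver the first response that passed verification, if there is one.
  for (const attempt of attempts) {
    if (attempt.verified) return attempt.response;
  }
  // Nothing passed: either show the last unverified attempt or the fallback message.
  const last = attempts[attempts.length - 1];
  if (options.fallbackBehavior === "show_last_attempt" && last) {
    return last.response;
  }
  return options.fallbackMessage;
}
```

With `show_error`, an unverified answer never reaches the user; with `show_last_attempt`, the user always gets something, at the cost of possibly seeing a response the verifier rejected.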
Retry on Failure
    verifyResponse: {
      enabled: true,
      maxAttempts: 3, // Regenerate up to 3 times

      // Prompt modification for retry
      retryPrompt: `Previous response failed verification: {failureReason}

    Please provide an improved response that addresses these issues.

    Original Question: {question}`,
    }

Fallback Behavior
    verifyResponse: {
      enabled: true,
      maxAttempts: 2,

      // What to do if all attempts fail
      fallbackBehavior: 'show_error', // or 'show_last_attempt'

      fallbackMessage: 'I apologize, but I was unable to generate a sufficiently accurate response. Please try rephrasing your question or contact support for assistance.'
    }

Verification Timeout
    verifyResponse: {
      enabled: true,
      verificationTimeoutMs: 10000, // 10 seconds max for verification

      // Behavior on timeout
      onTimeout: 'deliver', // or 'fail'
    }

Step 4: Monitor Verification
Track verification performance and effectiveness.
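As one way to turn raw verification records into effectiveness numbers, the sketch below aggregates entries shaped like the `ResponseVerification` record shown under CloudWatch Metrics. The `summarize` function and its types are hypothetical, and the sketch assumes the records have already been exported and parsed into plain objects.

```typescript
// Hypothetical aggregation over already-parsed verification records.
// Field names follow the ResponseVerification record in this guide.
interface VerificationRecord {
  chatAppId: string;
  verified: boolean;
  attempts: number;
  verificationTimeMs: number;
  failureReason: string | null;
}

interface VerificationSummary {
  total: number;
  passRate: number; // fraction of records with verified === true
  avgAttempts: number;
  avgVerificationTimeMs: number;
  failureReasons: Record<string, number>; // count per failure reason
}

function summarize(records: VerificationRecord[]): VerificationSummary {
  const failureReasons: Record<string, number> = {};
  let passed = 0;
  let attempts = 0;
  let timeMs = 0;
  for (const record of records) {
    if (record.verified) {
      passed++;
    } else {
      const reason = record.failureReason ?? "unknown";
      failureReasons[reason] = (failureReasons[reason] ?? 0) + 1;
    }
    attempts += record.attempts;
    timeMs += record.verificationTimeMs;
  }
  const total = records.length;
  // Avoid dividing by zero when there are no records.
  const average = (sum: number): number => (total === 0 ? 0 : sum / total);
  return {
    total,
    passRate: average(passed),
    avgAttempts: average(attempts),
    avgVerificationTimeMs: average(timeMs),
    failureReasons,
  };
}
```

A falling pass rate or a rising average attempt count for one `chatAppId` is usually the first hint that either the agent or the verification prompt needs attention.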
CloudWatch Metrics
Verification automatically logs metrics:

    // Logged automatically
    {
      metric: 'ResponseVerification',
      agentId: 'medical-agent',
      chatAppId: 'medical-advisor',
      verified: true, // or false
      attempts: 1,
      verificationTimeMs: 1250,
      failureReason: null // or string explaining failure
    }

View Verification Logs
    # View verification results in CloudWatch
    aws logs tail /aws/lambda/pika-dev-converse --follow --filter-pattern "ResponseVerification"

    # Count failures
    aws logs filter-log-events \
      --log-group-name /aws/lambda/pika-dev-converse \
      --filter-pattern '"verified":false' \
      --start-time $(date -d '1 hour ago' +%s)000

Custom Verification Logging
    // In custom verification handler
    export function customVerificationLogger(result: VerificationResult) {
      if (!result.verified) {
        console.warn('Verification failed', {
          chatAppId: result.chatAppId,
          reason: result.failureReason,
          attempts: result.attempts,
          timestamp: new Date().toISOString()
        });

        // Send to monitoring service
        sendToDatadog({
          metric: 'verification.failure',
          tags: [`app:${result.chatAppId}`, `reason:${result.failureReason}`]
        });
      }
    }

Step 5: Optimize Verification
Balance accuracy with performance and cost.
Selective Verification
Only verify certain types of responses:

    verifyResponse: {
      enabled: true,

      // Verify only when certain conditions met
      verifyWhen: {
        // Verify if response contains specific keywords
        responseContains: ['medical', 'financial', 'legal', 'safety'],

        // Verify if tools were used
        toolsInvoked: true,

        // Verify for specific user types
        userTypes: ['external-user', 'trial-user'],
      }
    }

Faster Verifier Model
Use a faster model for verification:

    verifyResponse: {
      enabled: true,
      // Use Haiku for speed (Sonnet for balance, Opus for accuracy)
      verifierModelId: 'anthropic.claude-3-haiku-20240307-v1:0',
    }

Cached Verification
Cache verification results for similar responses:

    verifyResponse: {
      enabled: true,
      cacheResults: true,
      cacheTTLSeconds: 3600, // 1 hour

      // Cache key includes question and response
      cacheKeyGenerator: (question, response) => {
        return `${hashString(question)}-${hashString(response)}`;
      }
    }

Testing Checklist
Verify answer verification works correctly:
Best Practices
Verification Design
- Clear Criteria: Define specific, measurable verification criteria
- Binary Decision: Verification should clearly pass or fail
- Actionable Feedback: Failure reasons should guide regeneration
- Domain-Specific: Tailor verification to your use case
Performance
- Model Selection: Balance accuracy with speed
- Timeout Configuration: Prevent hanging requests
- Selective Verification: Don't verify everything
- Cache When Possible: Reduce redundant verifications
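To make the selective verification practice concrete, here is a minimal sketch of a predicate that decides whether a response should be sent to the verifier. The field names match the `verifyWhen` example earlier in this guide, but `shouldVerify` and `ResponseContext` are hypothetical, and combining the conditions with OR is an assumption; the guide does not state how Pika combines them.

```typescript
// Hypothetical predicate for selective verification.
// Assumption: conditions combine with OR, so any single match triggers verification.
interface VerifyWhen {
  responseContains?: string[];
  toolsInvoked?: boolean;
  userTypes?: string[];
}

interface ResponseContext {
  response: string;
  toolsInvoked: boolean;
  userType: string;
}

function shouldVerify(rule: VerifyWhen, ctx: ResponseContext): boolean {
  const text = ctx.response.toLowerCase();
  // Case-insensitive keyword match against the response text.
  const keywordHit = (rule.responseContains ?? []).some(
    (keyword) => text.indexOf(keyword.toLowerCase()) !== -1,
  );
  // Only counts when the rule asks for it and tools were actually used.
  const toolHit = rule.toolsInvoked === true && ctx.toolsInvoked;
  const userHit = (rule.userTypes ?? []).indexOf(ctx.userType) !== -1;
  return keywordHit || toolHit || userHit;
}
```

Note that in this sketch an empty rule never triggers verification, so a misconfigured `verifyWhen` would silently skip every response; a stricter variant could default to verifying when no condition is configured.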
Quality
- Test Extensively: Verify verification works as intended
- Monitor Failures: Track patterns in verification failures
- Iterate Prompts: Refine verification criteria over time
- User Feedback: Collect user input on quality
Cost Management
- Use Appropriate Models: Haiku for simple checks, Sonnet for balanced, Opus for critical
- Limit Attempts: Prevent excessive regeneration
- Cache Results: Reduce redundant API calls
- Selective Application: Verify only high-risk responses
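The caching practice can be illustrated with a small in-memory cache keyed on the question and response, in the spirit of the `cacheKeyGenerator` example earlier in this guide. Everything here is a sketch under assumptions: `VerificationCache` is a hypothetical class, and `hashString` is a stand-in implementation (an FNV-1a-style hash over UTF-16 code units), since the guide does not define that helper.

```typescript
// Stand-in for the guide's undefined hashString helper:
// a non-cryptographic FNV-1a-style 32-bit hash over UTF-16 code units.
function hashString(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Hypothetical in-memory cache of verification verdicts with a time-to-live.
class VerificationCache {
  private entries = new Map<string, { verified: boolean; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  private key(question: string, response: string): string {
    return `${hashString(question)}-${hashString(response)}`;
  }

  // Returns the cached verdict, or undefined if missing or expired.
  get(question: string, response: string): boolean | undefined {
    const key = this.key(question, response);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.verified;
  }

  set(question: string, response: string, verified: boolean): void {
    this.entries.set(this.key(question, response), {
      verified,
      expiresAt: this.now() + this.ttlMs,
    });
  }
}
```

A 32-bit hash can collide, in which case one response would inherit another's verdict; for high-stakes apps a longer cryptographic digest, or no caching at all, may be the safer trade-off.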
Common Patterns
Progressive Verification
Start with fast checks, escalate to thorough review:

    verifyResponse: {
      enabled: true,
      stages: [
        {
          name: 'quick-check',
          verifierModelId: 'anthropic.claude-3-haiku-20240307-v1:0',
          prompt: 'Quick safety and relevance check...',
          timeoutMs: 2000
        },
        {
          name: 'thorough-review',
          verifierModelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0',
          prompt: 'Detailed accuracy and quality review...',
          timeoutMs: 5000,
          onlyIfPreviousPassed: true
        }
      ]
    }

Confidence-Based Verification
Verify based on primary agent's confidence:

    verifyResponse: {
      enabled: true,
      verifyWhen: {
        primaryAgentConfidence: { lessThan: 0.8 },
        or: {
          responseLength: { greaterThan: 500 },
          toolsUsed: { count: { greaterThan: 2 } }
        }
      }
    }

Troubleshooting
All Responses Failing
- Review verification prompt for overly strict criteria
- Check verifier model has appropriate capabilities
- Test verification prompt independently
- Verify verifier has access to necessary context
Verification Too Slow
- Use faster verifier model (Haiku instead of Sonnet)
- Reduce verification timeout
- Implement caching
- Make verification prompt more concise
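When tuning the timeout, it helps to be clear about what a timeout does to the verdict. The sketch below races a verifier call against a timer and applies an `onTimeout` policy. The interpretation is an assumption: `'deliver'` is taken to mean the response is sent as if it had passed, and `'fail'` to mean it is treated as a failed verification. `verifyWithTimeout` is a hypothetical helper, not a Pika function.

```typescript
// Hypothetical timeout wrapper around a verifier call.
// Assumption: 'deliver' treats a timeout as a pass, 'fail' treats it as a failure.
type OnTimeout = "deliver" | "fail";

async function verifyWithTimeout(
  verify: () => Promise<boolean>,
  timeoutMs: number,
  onTimeout: OnTimeout,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(onTimeout === "deliver"), timeoutMs);
  });
  try {
    // Whichever settles first wins: the verifier's verdict or the timeout policy.
    return await Promise.race([verify(), timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast verdict.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The race only stops waiting; it does not cancel the underlying model call, so a timed-out verification may still incur its full cost. Lowering the timeout therefore improves latency but not spend.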
False Positives/Negatives
- Refine verification criteria
- Add specific examples to prompt
- Test with edge cases
- Iterate based on production data
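One way to measure false positives and false negatives is to run the verifier over a small set of responses whose correct verdict is already known. The sketch below counts the four outcomes. `evaluateVerifier` and its types are hypothetical, and the verifier is modelled as a synchronous function for brevity; in practice it would wrap an asynchronous call to the verifier model.

```typescript
// Hypothetical evaluation harness for a verifier against hand-labeled cases.
interface LabeledCase {
  response: string;
  shouldPass: boolean; // the verdict a human reviewer would give
}

interface Confusion {
  truePass: number; // verifier passed a good response
  trueFail: number; // verifier rejected a bad response
  falsePass: number; // verifier passed a bad response (the dangerous case)
  falseFail: number; // verifier rejected a good response (costs retries)
}

function evaluateVerifier(
  cases: LabeledCase[],
  verifier: (response: string) => boolean,
): Confusion {
  const result: Confusion = { truePass: 0, trueFail: 0, falsePass: 0, falseFail: 0 };
  for (const testCase of cases) {
    const passed = verifier(testCase.response);
    if (passed && testCase.shouldPass) result.truePass++;
    else if (!passed && !testCase.shouldPass) result.trueFail++;
    else if (passed) result.falsePass++;
    else result.falseFail++;
  }
  return result;
}
```

Re-running the same labeled set after each prompt change shows whether a refinement actually reduced `falsePass` or merely shifted errors into `falseFail`.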
High Costs
- Use less expensive verifier model
- Reduce max attempts
- Implement selective verification
- Cache verification results
Next Steps
- Enable Self-Correcting Responses - Automatic correction
- Use Instruction Assistance - Improve prompts
- Monitor with Traces - Debug verification
Related Documentation
- Verify Response Feature - Complete feature details
- Model Selection Guide - Choose appropriate models
- Monitoring Guide - Track performance