Learn how to enable the Verify Response feature that automatically evaluates AI responses using another LLM agent invocation, providing quality grading and automatic correction of inaccurate answers.
What You'll Accomplish
By the end of this guide, you will:
- Enable response verification at the site level
- Configure auto-reprompt quality thresholds
- Set up access control for the feature
- Override settings per chat app
- Understand the quality grading system
- Monitor verification results
Prerequisites
- A running Pika installation
- Access to pika-config.ts for site-level configuration
- Understanding of your quality requirements
- Users with appropriate access configured
Understanding Self-Correcting Responses
The Verify Response feature uses an independent LLM to evaluate each AI response, assign a quality grade, and automatically retry if the response falls below your quality threshold.
How It Works
- Initial Response: The agent generates an answer
- Verification: A separate LLM evaluates the response accuracy
- Grade Assignment: Response receives a grade (A, B, C, or F)
- Auto-Reprompt: If grade is below threshold, automatically retry
- Trace Display: Verification grade is shown to users (if traces enabled)
Quality Grades
| Grade | Classification | Description |
|---|---|---|
| A | Accurate | Factually accurate and complete |
| B | Accurate with Stated Assumptions | Accurate but contains clearly stated assumptions |
| C | Accurate with Unstated Assumptions | Accurate but contains unstated assumptions |
| F | Inaccurate | Inaccurate or contains made-up information |
Step 1: Enable at Site Level
Configure the Verify Response feature in your pika-config.ts.
Location: apps/pika-chat/pika-config.ts
export const pikaConfig: PikaConfig = {
    siteFeatures: {
        verifyResponse: {
            enabled: true,
            autoRepromptThreshold: 'C', // Retry on C or F grades
            userTypes: ['internal-user', 'external-user'],
            userRoles: ['customer-support'],
            applyRulesAs: 'or' // User needs userType OR userRole
        }
    }
};

Configuration Options
| Property | Type | Description |
|---|---|---|
| enabled | boolean | Enable the verify response feature |
| autoRepromptThreshold | 'B' \| 'C' \| 'F' | Grade threshold for auto-retry |
| userTypes | string[] | User types that can use this feature |
| userRoles | string[] | User roles that can use this feature |
| applyRulesAs | 'and' \| 'or' | How to combine userTypes and userRoles |
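To make the access-control properties concrete, here is a minimal sketch of how userTypes, userRoles, and applyRulesAs might combine into a single yes/no decision. The VerifyResponseAccess and ChatUser shapes and the canUseVerifyResponse function are hypothetical names used for illustration, not Pika's actual API. The sketch also assumes that an omitted list imposes no restriction and that applyRulesAs defaults to 'and'; check the feature reference for the real defaults.

```typescript
// Hypothetical shapes for illustration; Pika's real types may differ.
type ApplyRulesAs = 'and' | 'or';

interface VerifyResponseAccess {
    enabled: boolean;
    userTypes?: string[];
    userRoles?: string[];
    applyRulesAs?: ApplyRulesAs;
}

interface ChatUser {
    userType: string;
    roles: string[];
}

// Assumption: an omitted list imposes no restriction, and
// applyRulesAs is treated as 'and' unless it is explicitly 'or'.
function canUseVerifyResponse(user: ChatUser, access: VerifyResponseAccess): boolean {
    if (!access.enabled) {
        return false;
    }

    const checks: boolean[] = [];
    if (access.userTypes !== undefined) {
        checks.push(access.userTypes.includes(user.userType));
    }
    if (access.userRoles !== undefined) {
        const allowedRoles = access.userRoles;
        checks.push(user.roles.some((role) => allowedRoles.includes(role)));
    }
    if (checks.length === 0) {
        return true; // No restrictions configured
    }

    return access.applyRulesAs === 'or' ? checks.some(Boolean) : checks.every(Boolean);
}

// Example: an external user who holds the customer-support role
const supportUser: ChatUser = { userType: 'external-user', roles: ['customer-support'] };
const internalSupportOnly: VerifyResponseAccess = {
    enabled: true,
    userTypes: ['internal-user'],
    userRoles: ['customer-support'],
    applyRulesAs: 'and'
};

// With 'and' the user type check fails, so access is denied.
const allowedWithAnd = canUseVerifyResponse(supportUser, internalSupportOnly);
// With 'or' the matching role is enough, so access is granted.
const allowedWithOr = canUseVerifyResponse(supportUser, { ...internalSupportOnly, applyRulesAs: 'or' });
```

Under these assumptions, switching applyRulesAs from 'and' to 'or' widens access: the same user is rejected in the first call and accepted in the second.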
Step 2: Configure Auto-Reprompt Threshold
Choose when the system should automatically retry generating a response.
Threshold Options
// Retry only on inaccurate responses
autoRepromptThreshold: 'F' // Most lenient

// Retry on responses with unstated assumptions or worse (recommended)
autoRepromptThreshold: 'C' // Balanced

// Retry on responses with any assumptions or worse
autoRepromptThreshold: 'B' // Strictest

Threshold Guidance
Use 'F' (Inaccurate Only) when:
- Performance and cost are primary concerns
- Only critical inaccuracies need correction
- Users can handle responses with assumptions
Use 'C' (Recommended) when:
- Quality and performance need to be balanced
- The chat app serves a typical production use case
- Both inaccuracies and unstated assumptions should be caught
Use 'B' (Strictest) when:
- Absolute accuracy is critical
- Cost/performance are less important
- The domain is healthcare, finance, or otherwise compliance-heavy
Step 3: Configure Access Control
Determine which users should have verified responses.
All Users
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user', 'external-user']
}

Internal Users Only
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user']
}

Specific Roles
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user', 'external-user'],
    userRoles: ['customer-support', 'sales-rep'],
    applyRulesAs: 'or' // Either user type OR role
}

Combined Rules
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'B',
    userTypes: ['internal-user'],
    userRoles: ['quality-assurance'],
    applyRulesAs: 'and' // Must be internal AND have QA role
}

Step 4: Override Per Chat App (Optional)
Individual chat apps can customize verification settings.
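As a rough mental model for the overrides in this step, the sketch below resolves a chat app's settings against the site-level defaults. It is an illustration under stated assumptions, not Pika's actual resolution logic: the VerifyResponseSettings shape and resolveVerifyResponse function are hypothetical, and the sketch assumes that any field a chat app sets replaces the site value and that an app cannot switch the feature on when the site has it off. The real implementation may apply further rules, such as only allowing apps to narrow access.

```typescript
type Threshold = 'B' | 'C' | 'F';

// Hypothetical shape for illustration; Pika's real types may differ.
interface VerifyResponseSettings {
    enabled: boolean;
    autoRepromptThreshold?: Threshold;
    userTypes?: string[];
    userRoles?: string[];
}

// Assumptions: the site-level switch must be on for the feature to run at all,
// and any field the chat app sets replaces the site-level value.
function resolveVerifyResponse(
    site: VerifyResponseSettings,
    appOverride?: Partial<VerifyResponseSettings>
): VerifyResponseSettings {
    if (appOverride === undefined) {
        return { ...site };
    }
    return {
        ...site,
        ...appOverride,
        enabled: site.enabled && (appOverride.enabled ?? true)
    };
}

const siteDefaults: VerifyResponseSettings = {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user', 'external-user']
};

// A stricter threshold for one app; everything else is inherited from the site.
const strictApp = resolveVerifyResponse(siteDefaults, { autoRepromptThreshold: 'B' });

// Verification switched off for one app.
const faqApp = resolveVerifyResponse(siteDefaults, { enabled: false });
```

Keeping the merge in one function makes it easy to log the effective settings per chat app while testing a configuration.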
Disable for Specific Chat App
const simpleChatApp: ChatApp = {
    chatAppId: 'quick-faq',
    title: 'Quick FAQ',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: false // Disable verification for this app
        }
    }
};

Different Threshold Per App
const criticalChatApp: ChatApp = {
    chatAppId: 'medical-advice',
    title: 'Medical Information',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: true,
            autoRepromptThreshold: 'B', // Stricter than site default
            userTypes: ['internal-user', 'external-user']
        }
    }
};

More Restrictive Access
const internalChatApp: ChatApp = {
    chatAppId: 'internal-tools',
    title: 'Internal Tools',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: true,
            autoRepromptThreshold: 'C',
            userTypes: ['internal-user'], // More restrictive than site
            userRoles: ['engineer', 'analyst']
        }
    }
};

Step 5: Deploy and Test
Deploy Your Configuration
# If using local development
cd apps/pika-chat
pnpm run dev

# If deploying to AWS
cd services/pika
pnpm run deploy

Test Verification
- Start a chat session with verification enabled
- Ask questions that might have quality issues
- Observe verification badges in responses (A, B, C, F)
- Check for auto-reprompts when responses fall below threshold
Enable Traces for Visibility
To see verification grades and reasoning:
siteFeatures: {
    verifyResponse: {
        enabled: true,
        autoRepromptThreshold: 'C',
        userTypes: ['internal-user']
    },
    traces: {
        enabled: true, // Enable to see verification details
        userTypes: ['internal-user']
    }
}

Verification Process Details
How Verification Works
// Simplified verification flow
if (features.verifyResponse.enabled) {
    // 1. Generate main response
    let mainResponse = await invokeAgent(userQuestion);

    // 2. Verify the response
    let verificationResult = await invokeAgentToVerifyAnswer(
        userQuestion,
        mainResponse
    );

    // 3. Check if auto-reprompt is needed
    if (shouldAutoReprompt(verificationResult.grade, autoRepromptThreshold)) {
        // 4. Generate improved response
        mainResponse = await invokeAgent(
            userQuestion,
            'Please provide a more accurate response'
        );

        // 5. Verify the new response
        verificationResult = await invokeAgentToVerifyAnswer(
            userQuestion,
            mainResponse
        );
    }

    // 6. Add verification trace
    addVerificationTrace(verificationResult.grade);

    return mainResponse;
}

Auto-Reprompt Logic
Auto-reprompting triggers when:
- Response grade is at or below configured threshold
- Grades B, C, and F are "retryable" (Grade A is not)
- Feature is enabled and user has appropriate permissions
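The simplified flow calls shouldAutoReprompt without showing its body. A minimal sketch of that comparison follows, assuming grades are ordered A, B, C, F from best to worst and that a retry happens when the grade is at or below the threshold. The ranking table and function body are illustrative assumptions, not Pika's actual implementation.

```typescript
type Grade = 'A' | 'B' | 'C' | 'F';
type Threshold = Exclude<Grade, 'A'>; // 'A' is never a retryable grade

// Higher number means a better grade (assumed ordering: A > B > C > F).
const GRADE_RANK: Record<Grade, number> = { A: 3, B: 2, C: 1, F: 0 };

// Retry when the response grade is at or below the configured threshold.
function shouldAutoReprompt(grade: Grade, threshold: Threshold): boolean {
    return GRADE_RANK[grade] <= GRADE_RANK[threshold];
}

// With the recommended 'C' threshold, only C and F responses are retried.
const allGrades: Grade[] = ['A', 'B', 'C', 'F'];
const retriedAtC = allGrades.filter((grade) => shouldAutoReprompt(grade, 'C'));
```

Excluding 'A' from the Threshold type mirrors the rule that only B, C, and F are retryable, so an invalid threshold is caught at compile time.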
Use Cases
Customer Support
// Ensure customers receive accurate information
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['external-user']
}

Benefits:
- Accurate information for customers
- Build trust in AI responses
- Reduce support escalations
Healthcare & Finance
// Critical accuracy for sensitive domains
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'B', // Strictest
    userTypes: ['internal-user', 'external-user']
}

Benefits:
- Meet regulatory requirements
- Ensure information accuracy
- Reduce liability from inaccurate info
Internal Knowledge Management
// Verified information for employees
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user']
}

Benefits:
- High-quality internal documentation
- Policy compliance
- Employee training accuracy
Performance Considerations
Response Time Impact
- Doubled Processing: Each response requires at least two LLM calls (initial + verification)
- Auto-Reprompt Overhead: A response graded at or below the threshold triggers a retry, which is then verified again
- Mitigation: Enable selectively for critical chat apps only
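As a rough worked example, the call count can be derived from the simplified verification flow earlier in this guide, which retries at most once and verifies the retried answer again. The expectedLlmCalls function and the 25% reprompt rate are illustrative assumptions, not measured values.

```typescript
// Assumption, based on the simplified flow: every response costs 2 calls
// (answer + verification), and a reprompt adds 2 more (retry + re-verification).
function expectedLlmCalls(repromptRate: number): number {
    if (repromptRate < 0 || repromptRate > 1) {
        throw new RangeError('repromptRate must be between 0 and 1');
    }
    return 2 + 2 * repromptRate;
}

// Hypothetical example: if 25% of responses are reprompted, each response
// averages 2.5 LLM calls, compared with 1 call when verification is off.
const averageCalls = expectedLlmCalls(0.25);
```

The same arithmetic bounds the worst case: if every response is reprompted, each one costs four calls.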
Cost Implications
- Increased Token Usage: Verification requires additional tokens
- Retry Costs: Auto-reprompted responses use more compute
- Balance: Quality improvements vs increased operational costs
Optimization Strategies
- Threshold Tuning: Set appropriate thresholds to minimize unnecessary retries
- User-Based Enablement: Enable only for users who need high accuracy
- Chat App Targeting: Focus on chat apps where accuracy is most critical
- Monitor Metrics: Track verification rates and costs
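One way to track verification rates is to tally grades from whatever records you collect. The sketch below is an assumption-heavy illustration: the VerificationRecord shape and the summarizeVerifications function are hypothetical, and this guide does not specify how Pika exposes verification results beyond traces, so adapt the input to your own logging.

```typescript
type Grade = 'A' | 'B' | 'C' | 'F';

// Hypothetical record shape: the final grade of a response and whether
// it was auto-reprompted along the way.
interface VerificationRecord {
    grade: Grade;
    reprompted: boolean;
}

interface VerificationSummary {
    total: number;
    byGrade: Record<Grade, number>;
    repromptRate: number;
}

function summarizeVerifications(records: VerificationRecord[]): VerificationSummary {
    const byGrade: Record<Grade, number> = { A: 0, B: 0, C: 0, F: 0 };
    let reprompted = 0;
    for (const record of records) {
        byGrade[record.grade] += 1;
        if (record.reprompted) {
            reprompted += 1;
        }
    }
    const total = records.length;
    // Avoid dividing by zero when there are no records yet.
    return { total, byGrade, repromptRate: total === 0 ? 0 : reprompted / total };
}

const gradeSummary = summarizeVerifications([
    { grade: 'A', reprompted: false },
    { grade: 'A', reprompted: true },
    { grade: 'C', reprompted: false },
    { grade: 'F', reprompted: true }
]);
```

A rising reprompt rate or a growing share of F grades is a signal to revisit the threshold or the agent instructions.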
Testing Checklist
Verify the feature works correctly:
Troubleshooting
Verification Not Working
- Verify enabled: true in site configuration
- Check user types/roles are configured
- Ensure user has required permissions
- Review CloudWatch logs for errors
Auto-Reprompt Not Triggering
- Verify threshold configuration matches expectations
- Check user permissions for the feature
- Review verification grades in traces
- Ensure threshold is set to retryable grade ('B', 'C', or 'F')
Performance Issues
- Use a more lenient auto-reprompt threshold (e.g., 'F' instead of 'C')
- Reduce user access scope
- Disable for non-critical chat apps
- Monitor token usage in CloudWatch
High False Positive Rate
- Adjust threshold to be more lenient
- Review agent instructions for clarity
- Check if verification prompts are appropriate
- Monitor verification accuracy over time
Next Steps
- Monitor with Traces - View verification results
- Configure User Memory - Improve response quality
- Use Instruction Assistance - Better prompt engineering
Related Documentation
- Self-Correcting Capability - Learn more about verification
- Answer Reasoning - Understanding response quality
- Feature Configuration Reference - Complete feature options