
Enable Self-Correcting Responses

Learn how to enable the Verify Response feature, which evaluates each AI response with a separate LLM agent invocation, assigns a quality grade, and automatically corrects inaccurate answers.

By the end of this guide, you will:

  • Enable response verification at the site level
  • Configure auto-reprompt quality thresholds
  • Set up access control for the feature
  • Override settings per chat app
  • Understand the quality grading system
  • Monitor verification results

Before you begin, make sure you have:

  • A running Pika installation
  • Access to pika-config.ts for site-level configuration
  • An understanding of your quality requirements
  • Users with appropriate access configured

The Verify Response feature uses an independent LLM to evaluate each AI response, assign a quality grade, and automatically retry if the response falls below your quality threshold.

  1. Initial Response: The agent generates an answer
  2. Verification: A separate LLM evaluates the response accuracy
  3. Grade Assignment: Response receives a grade (A, B, C, or F)
  4. Auto-Reprompt: If grade is below threshold, automatically retry
  5. Trace Display: Verification grade is shown to users (if traces enabled)
| Grade | Classification | Description |
| --- | --- | --- |
| A | Accurate | Factually accurate and complete |
| B | Accurate with Stated Assumptions | Accurate but contains clearly stated assumptions |
| C | Accurate with Unstated Assumptions | Accurate but contains unstated assumptions |
| F | Inaccurate | Inaccurate or contains made-up information |
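
If it helps to picture the scale in code, the grades map to a small ordered set like the sketch below (illustrative only, not Pika's actual type definitions):

// Illustrative sketch of the grading scale; names are not Pika's real types.
type VerificationGrade = 'A' | 'B' | 'C' | 'F';

// Classifications taken from the table above.
const gradeClassifications: Record<VerificationGrade, string> = {
    A: 'Accurate',
    B: 'Accurate with Stated Assumptions',
    C: 'Accurate with Unstated Assumptions',
    F: 'Inaccurate'
};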

Configure the Verify Response feature in your pika-config.ts.

Location: apps/pika-chat/pika-config.ts

export const pikaConfig: PikaConfig = {
    siteFeatures: {
        verifyResponse: {
            enabled: true,
            autoRepromptThreshold: 'C', // Retry on C or F grades
            userTypes: ['internal-user', 'external-user'],
            userRoles: ['customer-support'],
            applyRulesAs: 'or' // User needs userType OR userRole
        }
    }
};
| Property | Type | Description |
| --- | --- | --- |
| enabled | boolean | Enable the verify response feature |
| autoRepromptThreshold | 'B' \| 'C' \| 'F' | Grade threshold for auto-retry |
| userTypes | string[] | User types that can use this feature |
| userRoles | string[] | User roles that can use this feature |
| applyRulesAs | 'and' \| 'or' | How to combine userTypes and userRoles |
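
Taken together, the table corresponds to a configuration shape roughly like the following; the interface name and optionality of the properties are assumptions here, so check Pika's own type definitions for the authoritative shape:

// Sketch only: the name and optional markers are assumptions, not Pika's real types.
interface VerifyResponseSiteFeature {
    enabled: boolean;
    autoRepromptThreshold?: 'B' | 'C' | 'F';
    userTypes?: string[];
    userRoles?: string[];
    applyRulesAs?: 'and' | 'or';
}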

Choose when the system should automatically retry generating a response.

// Retry only on inaccurate responses
autoRepromptThreshold: 'F' // Most lenient
// Retry on responses with unstated assumptions or worse (recommended)
autoRepromptThreshold: 'C' // Balanced
// Retry on responses with any assumptions or worse
autoRepromptThreshold: 'B' // Strictest

Use 'F' (Inaccurate Only) when:

  • Performance and cost are primary concerns
  • Only critical inaccuracies need correction
  • Users can handle responses with assumptions

Use 'C' (Recommended) when:

  • Balancing quality and performance
  • Most production use cases
  • Want to catch both inaccuracies and unclear assumptions

Use 'B' (Strictest) when:

  • Absolute accuracy is critical
  • Cost/performance are less important
  • Healthcare, finance, or compliance-heavy domains

Determine which users should have verified responses.

// Enable for all users
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user', 'external-user']
}

// Enable for internal users only
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user']
}

// Enable by user type or role
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user', 'external-user'],
    userRoles: ['customer-support', 'sales-rep'],
    applyRulesAs: 'or' // Either user type OR role
}

// Require both a user type and a role
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'B',
    userTypes: ['internal-user'],
    userRoles: ['quality-assurance'],
    applyRulesAs: 'and' // Must be internal AND have QA role
}
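
For intuition, the 'and'/'or' combination can be pictured as a check like the one below. This is a hypothetical sketch; the field and function names are not Pika's actual implementation:

// Hypothetical sketch of how userTypes/userRoles combine with applyRulesAs.
interface AccessRules {
    userTypes?: string[];
    userRoles?: string[];
    applyRulesAs?: 'and' | 'or';
}

function canUseVerifyResponse(
    user: { userType: string; roles: string[] },
    rules: AccessRules
): boolean {
    const typeMatch = (rules.userTypes ?? []).includes(user.userType);
    const roleMatch = user.roles.some((role) => (rules.userRoles ?? []).includes(role));
    // 'and': the user must satisfy both lists; 'or': satisfying either is enough.
    return rules.applyRulesAs === 'and' ? typeMatch && roleMatch : typeMatch || roleMatch;
}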

Individual chat apps can customize verification settings.

const simpleChatApp: ChatApp = {
    chatAppId: 'quick-faq',
    title: 'Quick FAQ',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: false // Disable verification for this app
        }
    }
};

const criticalChatApp: ChatApp = {
    chatAppId: 'medical-advice',
    title: 'Medical Information',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: true,
            autoRepromptThreshold: 'B', // Stricter than site default
            userTypes: ['internal-user', 'external-user']
        }
    }
};

const internalChatApp: ChatApp = {
    chatAppId: 'internal-tools',
    title: 'Internal Tools',
    // ... other properties
    features: {
        verifyResponse: {
            featureId: 'verifyResponse',
            enabled: true,
            autoRepromptThreshold: 'C',
            userTypes: ['internal-user'], // More restrictive than site
            userRoles: ['engineer', 'analyst']
        }
    }
};
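
Conceptually, a per-app override takes precedence over the site-level default. A minimal sketch of that resolution, with hypothetical names (not Pika's actual merge logic):

// Hypothetical illustration: per-app settings win over site defaults when present.
function resolveVerifyResponseConfig(
    siteConfig: { enabled: boolean; autoRepromptThreshold?: 'B' | 'C' | 'F' },
    chatAppOverride?: { enabled?: boolean; autoRepromptThreshold?: 'B' | 'C' | 'F' }
) {
    return {
        enabled: chatAppOverride?.enabled ?? siteConfig.enabled,
        autoRepromptThreshold:
            chatAppOverride?.autoRepromptThreshold ?? siteConfig.autoRepromptThreshold
    };
}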
Deploy your configuration changes:
# If using local development
cd apps/pika-chat
pnpm run dev

# If deploying to AWS
cd services/pika
pnpm run deploy
  1. Start a chat session with verification enabled
  2. Ask questions that might have quality issues
  3. Observe verification badges in responses (A, B, C, F)
  4. Check for auto-reprompts when responses fall below threshold

To see verification grades and reasoning:

siteFeatures: {
    verifyResponse: {
        enabled: true,
        autoRepromptThreshold: 'C',
        userTypes: ['internal-user']
    },
    traces: {
        enabled: true, // Enable to see verification details
        userTypes: ['internal-user']
    }
}
// Simplified verification flow
if (features.verifyResponse.enabled) {
    // 1. Generate main response
    let mainResponse = await invokeAgent(userQuestion);

    // 2. Verify the response
    let verificationResult = await invokeAgentToVerifyAnswer(userQuestion, mainResponse);

    // 3. Check if auto-reprompt is needed
    if (shouldAutoReprompt(verificationResult.grade, autoRepromptThreshold)) {
        // 4. Generate improved response
        mainResponse = await invokeAgent(userQuestion, 'Please provide a more accurate response');

        // 5. Verify the new response
        verificationResult = await invokeAgentToVerifyAnswer(userQuestion, mainResponse);
    }

    // 6. Add verification trace
    addVerificationTrace(verificationResult.grade);

    return mainResponse;
}

Auto-reprompting triggers when:

  • Response grade is at or below the configured threshold (see the sketch below)
  • Grades B, C, and F are "retryable" (Grade A is not)
  • Feature is enabled and user has appropriate permissions
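
A minimal sketch of that threshold check, assuming grades are ordered A (best) through F (worst); this is illustrative, not Pika's actual shouldAutoReprompt implementation:

// Illustrative threshold comparison; not Pika's actual implementation.
function shouldAutoReprompt(
    grade: 'A' | 'B' | 'C' | 'F',
    threshold: 'B' | 'C' | 'F'
): boolean {
    // Lower rank means a better grade; A (rank 0) is never retried.
    const rank = { A: 0, B: 1, C: 2, F: 3 } as const;
    // Retry when the response grade is as bad as, or worse than, the threshold.
    return rank[grade] >= rank[threshold];
}

// With threshold 'C', grades C and F trigger a retry; A and B do not.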
// Ensure customers receive accurate information
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['external-user']
}

Benefits:

  • Accurate information for customers
  • Build trust in AI responses
  • Reduce support escalations
// Critical accuracy for sensitive domains
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'B', // Strictest
    userTypes: ['internal-user', 'external-user']
}

Benefits:

  • Meet regulatory requirements
  • Ensure information accuracy
  • Reduce liability from inaccurate info
// Verified information for employees
verifyResponse: {
    enabled: true,
    autoRepromptThreshold: 'C',
    userTypes: ['internal-user']
}

Benefits:

  • High-quality internal documentation
  • Policy compliance
  • Employee training accuracy
Performance considerations:

  • Doubled Processing: Each response requires two LLM calls (initial + verification)
  • Auto-Reprompt Overhead: Poor responses trigger additional retries
  • Mitigation: Enable selectively for critical chat apps only

Cost considerations:

  • Increased Token Usage: Verification requires additional tokens
  • Retry Costs: Auto-reprompted responses use more compute
  • Balance: Quality improvements vs. increased operational costs (see the estimate below)
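
As a rough way to reason about the trade-off, you can estimate expected LLM calls per user message; the retry rate below is an assumed placeholder, not a measured figure:

// Back-of-the-envelope estimate of LLM calls per message with verification on.
// The retry rate is an assumed placeholder; measure your own rate in CloudWatch.
const retryRate = 0.1; // assume 10% of responses fall below the threshold

// Without verification: 1 call. With verification: initial + verify,
// plus (reprompt + re-verify) whenever a retry is triggered.
const expectedCallsWithoutVerification = 1;
const expectedCallsWithVerification = 2 + retryRate * 2;

console.log(expectedCallsWithVerification); // 2.2 calls per message at a 10% retry rate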
  1. Threshold Tuning: Set appropriate thresholds to minimize unnecessary retries
  2. User-Based Enablement: Enable only for users who need high accuracy
  3. Chat App Targeting: Focus on chat apps where accuracy is most critical
  4. Monitor Metrics: Track verification rates and costs

If the feature does not appear to be working, verify the following:

  • enabled: true is set in the site configuration
  • User types/roles are configured
  • The user has the required permissions
  • CloudWatch logs show no errors

If auto-reprompt is not triggering:

  • Verify the threshold configuration matches expectations
  • Check user permissions for the feature
  • Review verification grades in traces
  • Ensure the threshold is set to a retryable grade ('B', 'C', or 'F')

If costs are higher than expected:

  • Loosen the auto-reprompt threshold (e.g., 'F' instead of 'C')
  • Reduce user access scope
  • Disable the feature for non-critical chat apps
  • Monitor token usage in CloudWatch

If responses are being retried too often:

  • Adjust the threshold to be more lenient
  • Review agent instructions for clarity
  • Check whether the verification prompts are appropriate
  • Monitor verification accuracy over time