AI-Driven Insights

AI-Driven Insights provide automatic analytics for every chat session. Understand how your agents perform with metrics like goal completion, sentiment, complexity, and quality - all computed automatically without manual analysis.

Session Insights

For each completed session, Pika automatically calculates comprehensive metrics:

  • Goal Completion - Likelihood the user's goal was achieved
  • Sentiment - User's emotional tone throughout the conversation
  • Complexity - How difficult the session was to resolve
  • Quality Indicators - Response accuracy, completeness, helpfulness
  • Performance Metrics - Response time, turn count, tool usage
  • Issue Flags - Problems detected during the session

These insights enable data-driven agent improvement without manual session review.

Without automatic insights:

  • You don't know whether users accomplished their goals
  • User satisfaction is a black box
  • You can't identify problematic patterns
  • You have no baseline for measuring improvements
  • It's hard to prioritize what to fix

Insights transform thousands of chat sessions into actionable intelligence:

  • Track success rates across chat apps and agents
  • Identify struggling users who need intervention
  • Measure improvements after instruction changes
  • Compare performance across different agents
  • Prioritize work based on data, not guesses

Goal Completion

What it measures: Whether the user achieved what they came for

How it's computed: AI analyzes the conversation to determine:

  • Did the user's questions get answered?
  • Were requested actions completed?
  • Did the conversation reach natural resolution?
  • Did the user express satisfaction?

Example insights:

```json
{
  "goalCompletion": {
    "achieved": true,
    "confidence": 0.85,
    "evidence": "User confirmed order was found and address updated successfully"
  }
}
```
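To consume goal completion data across many sessions, a small typed helper is enough. This is a minimal sketch, assuming a hypothetical `GoalCompletionInsight` type that mirrors the JSON above; the `goalCompletionRate` function and its rule (count a session only when `achieved` is true and `confidence` meets a minimum, defaulting to the 0.7 used in the threshold example later on this page) are illustrative, not Pika's API.

```typescript
// Hypothetical shape mirroring the example JSON above (not an official Pika type).
interface GoalCompletionInsight {
  achieved: boolean;
  confidence: number; // 0-1
  evidence: string;
}

// Fraction of sessions whose goal was achieved with at least `minConfidence`.
// Returns 0 for an empty list to avoid dividing by zero.
function goalCompletionRate(
  insights: GoalCompletionInsight[],
  minConfidence = 0.7,
): number {
  if (insights.length === 0) return 0;
  const completed = insights.filter(
    (g) => g.achieved && g.confidence >= minConfidence,
  ).length;
  return completed / insights.length;
}

const sample: GoalCompletionInsight[] = [
  { achieved: true, confidence: 0.85, evidence: "Address updated" },
  { achieved: true, confidence: 0.55, evidence: "User went quiet" },
  { achieved: false, confidence: 0.9, evidence: "Order not found" },
  { achieved: true, confidence: 0.95, evidence: "Refund confirmed" },
];

const rate = goalCompletionRate(sample); // 0.5: two of the four sessions qualify
```

Requiring a minimum confidence keeps low-certainty "achieved" verdicts from inflating the rate; lowering `minConfidence` trades precision for coverage.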

Why it matters: The ultimate success metric - did we help the user?

Sentiment

What it measures: User's emotional tone throughout the conversation

How it's computed: Natural language analysis of user messages:

  • Positive indicators (thanks, great, perfect, etc.)
  • Negative indicators (frustrated, confused, angry, etc.)
  • Tone progression over the conversation
  • Overall sentiment trajectory

Example insights:

```json
{
  "sentiment": {
    "overall": "positive",
    "progression": ["neutral", "frustrated", "satisfied"],
    "finalSentiment": "satisfied"
  }
}
```
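The page describes sentiment as natural language analysis; the keyword-lexicon sketch below is a deliberately simplified stand-in, not Pika's method. It scores each user message against small positive and negative word lists (the lists and the `analyzeSentiment` name are assumptions) to show how a per-message progression, a final sentiment, and an overall sentiment can be derived. It uses coarse positive/negative/neutral labels rather than the finer labels in the example above.

```typescript
type Tone = "positive" | "negative" | "neutral";

// Illustrative word lists only; a real system would use a language model
// or trained classifier rather than keyword matching.
const POSITIVE = ["thanks", "thank", "great", "perfect", "awesome", "helpful"];
const NEGATIVE = ["frustrated", "confused", "angry", "wrong", "useless"];

// Score one message: +1 per positive word, -1 per negative word.
function scoreMessage(text: string): number {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let score = 0;
  for (const w of words) {
    if (POSITIVE.includes(w)) score += 1;
    if (NEGATIVE.includes(w)) score -= 1;
  }
  return score;
}

function toTone(score: number): Tone {
  if (score > 0) return "positive";
  if (score < 0) return "negative";
  return "neutral";
}

// Per-message progression, final tone, and overall tone from the summed score.
function analyzeSentiment(userMessages: string[]) {
  const scores = userMessages.map(scoreMessage);
  const progression = scores.map(toTone);
  const total = scores.reduce((a, b) => a + b, 0);
  return {
    overall: toTone(total),
    progression,
    finalSentiment: progression[progression.length - 1] ?? "neutral",
  };
}

const result = analyzeSentiment([
  "Where is my order?",
  "I'm frustrated, that's the wrong order",
  "Perfect, that's great, thanks!",
]);
// result.progression: ["neutral", "negative", "positive"], result.overall: "positive"
```

Tracking the progression, not just the total, is what lets a session that starts frustrated but ends satisfied be told apart from one that ends badly.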

Why it matters: Unhappy users signal problem sessions that need review

Complexity

What it measures: How difficult the session was to resolve

How it's computed: Multiple factors contribute:

  • Number of turns required
  • Tool invocations needed
  • Failures and retries
  • Topic switches
  • Clarification requests

Example insights:

```jsonc
{
  "complexity": {
    "score": 0.65, // 0-1 scale
    "factors": {
      "turns": 8, // More turns = higher complexity
      "toolCalls": 3,
      "failures": 1,
      "topicSwitches": 2
    }
  }
}
```
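The factors are listed, but the formula that combines them is not. One plausible sketch caps and normalizes each factor to 0-1 and then combines them with fixed weights summing to 1, which keeps the result on the documented 0-1 scale. The caps, weights, and `complexityScore` name are invented for illustration.

```typescript
// Hypothetical factor record; `clarifications` is optional because the
// sample JSON above does not include it.
interface ComplexityFactors {
  turns: number;
  toolCalls: number;
  failures: number;
  topicSwitches: number;
  clarifications?: number;
}

type Factor = keyof ComplexityFactors;

// Invented caps: a factor at or above its cap counts as maximally complex.
const CAPS: Record<Factor, number> = {
  turns: 20, toolCalls: 10, failures: 3, topicSwitches: 5, clarifications: 5,
};

// Invented weights; they sum to 1 so the score stays within 0-1.
const WEIGHTS: Record<Factor, number> = {
  turns: 0.3, toolCalls: 0.2, failures: 0.25, topicSwitches: 0.15, clarifications: 0.1,
};

function complexityScore(f: ComplexityFactors): number {
  let score = 0;
  for (const k of Object.keys(WEIGHTS) as Factor[]) {
    const normalized = Math.min((f[k] ?? 0) / CAPS[k], 1);
    score += WEIGHTS[k] * normalized;
  }
  return Math.round(score * 100) / 100; // two decimals, like the sample JSON
}

const sampleScore = complexityScore({ turns: 8, toolCalls: 3, failures: 1, topicSwitches: 2 });
// 0.32 with these invented caps and weights
```

With these placeholder values the sample factors score 0.32 rather than the 0.65 shown above, so treat the caps and weights as knobs to tune, not as Pika's formula.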

Why it matters: High-complexity sessions can reveal where the agent needs improvement

Quality Indicators

What it measures: Response quality across multiple dimensions

How it's computed: Analysis of agent responses:

  • Accuracy (factually correct)
  • Completeness (fully answered question)
  • Helpfulness (actionable information)
  • Appropriateness (tone and content)
  • Efficiency (minimal back-and-forth)

Example insights:

```json
{
  "quality": {
    "overall": 0.82,
    "dimensions": {
      "accuracy": 0.90,
      "completeness": 0.85,
      "helpfulness": 0.80,
      "efficiency": 0.75
    }
  }
}
```
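How the overall score is derived from the dimensions is not specified. An unweighted mean is the simplest assumption; for the sample dimensions it gives 0.825, close to the 0.82 shown. The sketch below adds a hypothetical `weights` parameter for teams that care more about, say, accuracy, and a `weakestDimension` helper that points at the first thing to improve. Both function names are illustrative.

```typescript
type QualityDimensions = Record<string, number>;

// Weighted mean of the dimension scores. Dimensions without an explicit
// weight get weight 1, so omitting `weights` yields a plain average.
function overallQuality(
  dims: QualityDimensions,
  weights: Record<string, number> = {},
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [name, score] of Object.entries(dims)) {
    const w = weights[name] ?? 1;
    weightedSum += w * score;
    totalWeight += w;
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}

// Name of the lowest-scoring dimension, or undefined if there are none.
function weakestDimension(dims: QualityDimensions): string | undefined {
  let weakest: string | undefined;
  let min = Infinity;
  for (const [name, score] of Object.entries(dims)) {
    if (score < min) {
      min = score;
      weakest = name;
    }
  }
  return weakest;
}

const dims: QualityDimensions = {
  accuracy: 0.9,
  completeness: 0.85,
  helpfulness: 0.8,
  efficiency: 0.75,
};

const plain = overallQuality(dims); // about 0.825
const accuracyHeavy = overallQuality(dims, { accuracy: 3 }); // about 0.85
const focus = weakestDimension(dims); // "efficiency"
```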

Why it matters: Identifies specific quality dimensions needing improvement

Performance Metrics

What it measures: Operational performance

Tracked metrics:

  • Time to first response
  • Average response time per turn
  • Total session duration
  • Tool execution times
  • Number of tool invocations

Example insights:

```jsonc
{
  "performance": {
    "timeToFirstResponse": 2.3, // seconds
    "avgResponseTime": 3.1,
    "totalDuration": 185, // seconds
    "toolInvocations": 3,
    "avgToolTime": 0.8
  }
}
```
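These fields can be derived from per-turn timestamps. In the sketch below, `TurnTiming` is a hypothetical record (Pika's actual trace format is not shown on this page). Times are seconds since session start, and total duration is measured from the first user message to the last agent response, which is one reasonable reading of "total session duration".

```typescript
// Hypothetical per-turn timing record; times are seconds since session start.
interface TurnTiming {
  userMessageAt: number;
  agentResponseAt: number;
  toolDurations: number[]; // seconds spent in each tool call during the turn
}

interface PerformanceInsight {
  timeToFirstResponse: number;
  avgResponseTime: number;
  totalDuration: number;
  toolInvocations: number;
  avgToolTime: number;
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);
const mean = (xs: number[]): number => (xs.length === 0 ? 0 : sum(xs) / xs.length);

function performanceMetrics(turns: TurnTiming[]): PerformanceInsight {
  const responseTimes = turns.map((t) => t.agentResponseAt - t.userMessageAt);
  const toolTimes: number[] = [];
  for (const t of turns) toolTimes.push(...t.toolDurations);
  return {
    timeToFirstResponse: responseTimes.length > 0 ? responseTimes[0] : 0,
    avgResponseTime: mean(responseTimes),
    totalDuration:
      turns.length > 0 ? turns[turns.length - 1].agentResponseAt - turns[0].userMessageAt : 0,
    toolInvocations: toolTimes.length,
    avgToolTime: mean(toolTimes),
  };
}

const perf = performanceMetrics([
  { userMessageAt: 0, agentResponseAt: 2.5, toolDurations: [0.5] },
  { userMessageAt: 30, agentResponseAt: 33.5, toolDurations: [0.75, 1] },
]);
// perf: 2.5s to first response, 3s average, 33.5s total, 3 tool calls at 0.75s average
```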

Why it matters: Performance issues indicate infrastructure or efficiency problems

Issue Flags

What it measures: Problems encountered during the session

Detected issues:

  • Tool failures
  • Verification failures (low grades)
  • Incomplete responses
  • User frustration signals
  • Policy violations
  • Error conditions

Example insights:

```json
{
  "issues": [
    {
      "type": "toolFailure",
      "tool": "order-lookup",
      "turn": 2,
      "severity": "medium"
    },
    {
      "type": "userFrustration",
      "turn": 5,
      "severity": "low",
      "evidence": "User said 'that's not what I asked'"
    }
  ]
}
```
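Issue lists become useful once they drive triage and trend counts. The sketch below assumes a hypothetical `SessionIssue` type mirroring the JSON above; the review rule in `needsReview` (any issue at or above a severity, or too many issues overall) and the `countByType` tally are illustrative choices, not documented Pika behavior.

```typescript
type Severity = "low" | "medium" | "high";

// Hypothetical issue shape mirroring the sample JSON above.
interface SessionIssue {
  type: string; // e.g. "toolFailure", "userFrustration"
  turn: number;
  severity: Severity;
  tool?: string;
  evidence?: string;
}

const SEVERITY_RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Invented triage rule: review the session if any issue is at least
// `minSeverity`, or if it accumulated `maxIssues` or more issues overall.
function needsReview(
  issues: SessionIssue[],
  minSeverity: Severity = "medium",
  maxIssues = 3,
): boolean {
  return (
    issues.length >= maxIssues ||
    issues.some((i) => SEVERITY_RANK[i.severity] >= SEVERITY_RANK[minSeverity])
  );
}

// Tally issue types across sessions to find the most common problems.
function countByType(sessions: SessionIssue[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const issues of sessions) {
    for (const issue of issues) {
      counts.set(issue.type, (counts.get(issue.type) ?? 0) + 1);
    }
  }
  return counts;
}

const example: SessionIssue[] = [
  { type: "toolFailure", tool: "order-lookup", turn: 2, severity: "medium" },
  { type: "userFrustration", turn: 5, severity: "low", evidence: "User said 'that's not what I asked'" },
];

const review = needsReview(example); // true: the tool failure is medium severity
```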

Why it matters: Surfaces problems that need immediate attention

How It Works

  1. Session completes

    The user ends the chat, or the session closes after a period of inactivity

  2. EventBridge schedules analysis

    Background process queued to run

  3. Insight runner analyzes session

    Lambda function computes all metrics:

    • Reads full conversation history
    • Analyzes user and agent messages
    • Reviews tool invocations and results
    • Incorporates verification grades
    • Considers traces and reasoning

  4. Results stored

    Insights written as JSON to:

    • DynamoDB (with session data)
    • S3 (for long-term storage)
    • OpenSearch (for searching and filtering)

  5. Available in the Admin Site

    Insights immediately queryable and explorable
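The five steps above can be sketched as a single handler. Everything here is hypothetical: the `InsightSink` interface stands in for the DynamoDB, S3, and OpenSearch writers, and the `analyze` callback stands in for the metric computations. Pika's actual Lambda code and AWS SDK calls are not shown on this page.

```typescript
interface ConversationMessage { role: "user" | "agent"; text: string }
interface Session { sessionId: string; messages: ConversationMessage[] }
interface SessionInsights {
  sessionId: string;
  computedAt: string;
  metrics: Record<string, unknown>;
}

// Stand-in for each destination (DynamoDB, S3, OpenSearch).
interface InsightSink {
  name: string;
  write(insights: SessionInsights): Promise<void>;
}

async function runInsights(
  session: Session,
  analyze: (s: Session) => Promise<Record<string, unknown>>,
  sinks: InsightSink[],
): Promise<SessionInsights> {
  // Step 3: compute metrics from the full conversation history.
  const metrics = await analyze(session);
  const insights: SessionInsights = {
    sessionId: session.sessionId,
    computedAt: new Date().toISOString(),
    metrics,
  };
  // Step 4: write the same JSON document to every store. Promise.all rejects
  // if any write fails, so the caller can retry the whole session.
  await Promise.all(sinks.map((sink) => sink.write(insights)));
  return insights;
}

// Usage with in-memory sinks in place of real AWS services.
const written: string[] = [];
const memorySink = (name: string): InsightSink => ({
  name,
  async write(insights) {
    written.push(`${name}:${insights.sessionId}`);
  },
});

const pending = runInsights(
  { sessionId: "s-123", messages: [{ role: "user", text: "Where is my order?" }] },
  async (s) => ({ turnCount: s.messages.length }),
  [memorySink("dynamodb"), memorySink("s3"), memorySink("opensearch")],
);
```

Keeping the stores behind one interface makes the handler easy to test and lets a failed write surface as a single retryable error.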

Flexible scheduling options:

  • Process immediately after session ends
  • Batch process every N minutes
  • Run during off-peak hours
  • On-demand processing for specific sessions

```typescript
const siteConfig = {
  sessionInsights: {
    enabled: true,
    computeImmediately: false, // Batch process instead
    scheduleExpression: 'rate(15 minutes)'
  }
};
```
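The `rate(15 minutes)` syntax matches Amazon EventBridge rate expressions, which suggests `scheduleExpression` is passed through to an EventBridge schedule. Assuming so, the hypothetical `rateToMillis` helper below (not part of Pika) converts such an expression to a batch interval, enforcing EventBridge's rule that a value of 1 takes the singular unit and larger values take the plural.

```typescript
// Milliseconds per unit accepted in EventBridge rate expressions.
const UNIT_MS: Record<string, number> = {
  minute: 60000,
  minutes: 60000,
  hour: 3600000,
  hours: 3600000,
  day: 86400000,
  days: 86400000,
};

function rateToMillis(expr: string): number {
  const match = /^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$/.exec(expr.trim());
  if (!match) throw new Error(`Not a rate expression: ${expr}`);
  const value = Number(match[1]);
  const unit = match[2];
  if (value < 1) throw new Error("Rate value must be a positive integer");
  // EventBridge requires "1 minute" but "15 minutes".
  const singular = !unit.endsWith("s");
  if ((value === 1) !== singular) {
    throw new Error(`Unit "${unit}" does not match value ${value}`);
  }
  return value * UNIT_MS[unit];
}

const batchInterval = rateToMillis("rate(15 minutes)"); // 900000 ms
```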

Control which insights to compute:

```typescript
const siteConfig = {
  sessionInsights: {
    enabled: true,
    metrics: {
      goalCompletion: true,
      sentiment: true,
      complexity: true,
      quality: true,
      performance: true,
      issues: true
    }
  }
};
```
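One way to honor the `metrics` switches is a registry of metric functions that skips disabled entries. The `MetricName` union, the registry, and the choice to treat an omitted switch as enabled are assumptions for illustration, not documented behavior.

```typescript
type MetricName =
  | "goalCompletion" | "sentiment" | "complexity" | "quality" | "performance" | "issues";
type MetricsConfig = Partial<Record<MetricName, boolean>>;

const ALL_METRICS: MetricName[] = [
  "goalCompletion", "sentiment", "complexity", "quality", "performance", "issues",
];

// Metrics missing from the config are treated as enabled (an assumption).
function enabledMetrics(config: MetricsConfig): MetricName[] {
  return ALL_METRICS.filter((name) => config[name] !== false);
}

// Run only the enabled computations from a registry of metric functions.
function computeSelected<S>(
  session: S,
  config: MetricsConfig,
  registry: Record<MetricName, (s: S) => unknown>,
): Partial<Record<MetricName, unknown>> {
  const out: Partial<Record<MetricName, unknown>> = {};
  for (const name of enabledMetrics(config)) {
    out[name] = registry[name](session);
  }
  return out;
}

// Placeholder computations over a transcript string.
const registry: Record<MetricName, (s: string) => unknown> = {
  goalCompletion: (s) => s.length > 0,
  sentiment: () => "neutral",
  complexity: () => 0.2,
  quality: () => 0.8,
  performance: () => ({ toolInvocations: 0 }),
  issues: () => [],
};

const computed = computeSelected("transcript text", { sentiment: false }, registry);
// computed has every metric except "sentiment"
```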

Define what counts as good or bad performance:

```typescript
const siteConfig = {
  sessionInsights: {
    thresholds: {
      minGoalCompletion: 0.7, // 70% confidence minimum
      minQuality: 0.75, // 75% quality minimum
      maxComplexity: 0.6, // Flag if complexity score > 0.6
      negativeSentiment: 'concern' // Alert on negative sentiment
    }
  }
};
```
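Applied to a single session, the thresholds might translate into review flags as sketched below. The `SessionSummary` shape and flag names are hypothetical, and treating `negativeSentiment: 'concern'` as "flag any session whose overall sentiment is negative" is an interpretation of the example, not documented behavior.

```typescript
// Mirrors the `thresholds` options above.
interface Thresholds {
  minGoalCompletion: number;
  minQuality: number;
  maxComplexity: number;
  negativeSentiment?: string; // 'concern' enables the sentiment flag
}

// Hypothetical per-session summary pulled from the insight JSON.
interface SessionSummary {
  goalCompletion: number; // likelihood the goal was achieved, 0-1
  quality: number; // quality.overall
  complexity: number; // complexity.score
  sentiment: "positive" | "neutral" | "negative";
}

function thresholdFlags(s: SessionSummary, t: Thresholds): string[] {
  const flags: string[] = [];
  if (s.goalCompletion < t.minGoalCompletion) flags.push("lowGoalCompletion");
  if (s.quality < t.minQuality) flags.push("lowQuality");
  if (s.complexity > t.maxComplexity) flags.push("highComplexity");
  if (t.negativeSentiment === "concern" && s.sentiment === "negative") {
    flags.push("negativeSentiment");
  }
  return flags;
}

const thresholds: Thresholds = {
  minGoalCompletion: 0.7,
  minQuality: 0.75,
  maxComplexity: 0.6,
  negativeSentiment: "concern",
};

// Sample values from the metric examples earlier on this page.
const flags = thresholdFlags(
  { goalCompletion: 0.85, quality: 0.82, complexity: 0.65, sentiment: "positive" },
  thresholds,
); // ["highComplexity"]
```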

Comprehensive views of your agents' performance:

Overview Dashboard:

  • Goal completion rate trend
  • Average sentiment over time
  • Quality score distribution
  • Most common issues

Session List with Filters:

  • Filter by goal achieved (yes/no)
  • Filter by sentiment (positive/negative/neutral)
  • Filter by quality score range
  • Filter by complexity level
  • Filter by date range
  • Filter by chat app or agent
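The Admin Site applies these filters for you. For offline analysis of exported insights, the same semantics can be sketched over a hypothetical `SessionRecord` shape; treating every criterion as optional and combining them with AND is an assumption about how the filters interact.

```typescript
interface SessionRecord {
  sessionId: string;
  chatAppId: string;
  agentId: string;
  startedAt: Date;
  goalAchieved: boolean;
  sentiment: "positive" | "neutral" | "negative";
  quality: number;
  complexity: number;
}

// Every criterion is optional; all supplied criteria must match.
interface SessionFilter {
  goalAchieved?: boolean;
  sentiment?: SessionRecord["sentiment"];
  qualityRange?: [number, number];
  complexityRange?: [number, number];
  dateRange?: [Date, Date];
  chatAppId?: string;
  agentId?: string;
}

const inRange = (x: number, [lo, hi]: [number, number]): boolean => x >= lo && x <= hi;

function filterSessions(sessions: SessionRecord[], f: SessionFilter): SessionRecord[] {
  return sessions.filter(
    (s) =>
      (f.goalAchieved === undefined || s.goalAchieved === f.goalAchieved) &&
      (f.sentiment === undefined || s.sentiment === f.sentiment) &&
      (f.qualityRange === undefined || inRange(s.quality, f.qualityRange)) &&
      (f.complexityRange === undefined || inRange(s.complexity, f.complexityRange)) &&
      (f.dateRange === undefined ||
        inRange(s.startedAt.getTime(), [f.dateRange[0].getTime(), f.dateRange[1].getTime()])) &&
      (f.chatAppId === undefined || s.chatAppId === f.chatAppId) &&
      (f.agentId === undefined || s.agentId === f.agentId),
  );
}

const sessions: SessionRecord[] = [
  { sessionId: "a", chatAppId: "support", agentId: "general-agent", startedAt: new Date("2024-05-01T10:00:00Z"), goalAchieved: false, sentiment: "negative", quality: 0.6, complexity: 0.7 },
  { sessionId: "b", chatAppId: "support", agentId: "general-agent", startedAt: new Date("2024-05-02T10:00:00Z"), goalAchieved: true, sentiment: "positive", quality: 0.9, complexity: 0.3 },
  { sessionId: "c", chatAppId: "sales", agentId: "specialist-agent", startedAt: new Date("2024-05-03T10:00:00Z"), goalAchieved: false, sentiment: "negative", quality: 0.5, complexity: 0.8 },
];

// Unhappy support sessions: only session "a".
const unhappySupport = filterSessions(sessions, { chatAppId: "support", sentiment: "negative" });
```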

Session Detail View:

  • All metrics for a specific session
  • Full conversation with insights overlay
  • Traces with quality indicators
  • Feedback (LLM and human)

See patterns across many sessions:

By Chat App:

Customer Support: 85% goal completion, 3.2 avg complexity
Sales Assistant: 92% goal completion, 2.1 avg complexity
Tech Support: 78% goal completion, 4.5 avg complexity

By Time Period:

Week 1: 82% goal completion, 0.78 quality score
Week 2: 88% goal completion, 0.84 quality score ← Improving!
Week 3: 91% goal completion, 0.87 quality score

By Agent:

general-agent: 85% goal completion
specialist-agent: 94% goal completion ← Better for complex queries
fallback-agent: 72% goal completion
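Rollups like these are group-by aggregations. A minimal sketch over hypothetical per-session rows (field names assumed) computes goal completion rate and average 0-1 complexity score per group; grouping by agent or by week only changes the key function.

```typescript
// Hypothetical flattened insight record, one per session.
interface InsightRow {
  chatApp: string;
  agent: string;
  goalAchieved: boolean;
  complexity: number;
}

interface Rollup {
  sessions: number;
  goalCompletionRate: number;
  avgComplexity: number;
}

// Group rows by an arbitrary key and compute per-group aggregates.
function rollup(rows: InsightRow[], key: (r: InsightRow) => string): Map<string, Rollup> {
  const totals = new Map<string, { n: number; achieved: number; complexity: number }>();
  for (const r of rows) {
    const k = key(r);
    const t = totals.get(k) ?? { n: 0, achieved: 0, complexity: 0 };
    t.n += 1;
    if (r.goalAchieved) t.achieved += 1;
    t.complexity += r.complexity;
    totals.set(k, t);
  }
  const out = new Map<string, Rollup>();
  totals.forEach((t, k) => {
    out.set(k, {
      sessions: t.n,
      goalCompletionRate: t.achieved / t.n,
      avgComplexity: t.complexity / t.n,
    });
  });
  return out;
}

const rows: InsightRow[] = [
  { chatApp: "Customer Support", agent: "general-agent", goalAchieved: true, complexity: 0.25 },
  { chatApp: "Customer Support", agent: "specialist-agent", goalAchieved: false, complexity: 0.75 },
  { chatApp: "Sales Assistant", agent: "general-agent", goalAchieved: true, complexity: 0.5 },
];

const byApp = rollup(rows, (r) => r.chatApp);
const byAgent = rollup(rows, (r) => r.agent);
```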

Track agent effectiveness:

  • Daily/weekly goal completion rates
  • Quality score trends
  • Performance regression detection
  • Comparison across agents

Data-driven refinement:

  1. Measure baseline metrics
  2. Make instruction changes
  3. Compare new metrics to baseline
  4. Iterate based on results
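Step 3 amounts to computing deltas against the baseline. In the sketch below, the 0.02 noise tolerance is an invented default, and metrics where lower is better (such as complexity) must be listed so their sign is flipped. The week 1 and week 2 goal completion and quality figures come from the time-period rollup above; the complexity figures are made up.

```typescript
type Verdict = "improved" | "regressed" | "unchanged";

interface Comparison {
  delta: number;
  verdict: Verdict;
}

function compareToBaseline(
  baseline: Record<string, number>,
  current: Record<string, number>,
  lowerIsBetter: string[] = [],
  tolerance = 0.02,
): Record<string, Comparison> {
  const result: Record<string, Comparison> = {};
  for (const name of Object.keys(baseline)) {
    if (!(name in current)) continue;
    const delta = current[name] - baseline[name];
    // Flip the sign for metrics where a decrease is good, such as complexity.
    const gain = lowerIsBetter.includes(name) ? -delta : delta;
    const verdict: Verdict =
      gain > tolerance ? "improved" : gain < -tolerance ? "regressed" : "unchanged";
    result[name] = { delta, verdict };
  }
  return result;
}

const week1 = { goalCompletion: 0.82, quality: 0.78, complexity: 0.4 };
const week2 = { goalCompletion: 0.88, quality: 0.84, complexity: 0.45 };

// goalCompletion and quality improved; complexity regressed (it went up).
const report = compareToBaseline(week1, week2, ["complexity"]);
```

The tolerance keeps small week-to-week noise from being reported as a change; size it to the variance you actually observe.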

Find issues quickly:

  • Sessions with goal completion < 70%
  • Negative sentiment sessions
  • High complexity sessions
  • Sessions with multiple issues

Improve satisfaction:

  • Identify frustration patterns
  • Reduce complexity in common scenarios
  • Optimize slow responses
  • Fix recurring failures

Understand load and performance:

  • Peak usage times
  • Average session duration
  • Tool usage patterns
  • Infrastructure bottlenecks

Define domain-specific measurements:

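```typescript
customInsights: [
  {
    name: 'policyCompliance',
    compute: (session) => {
      // Your custom logic, e.g. score how closely the agent followed policy (0-1)
      const complianceScore = 1.0; // placeholder value
      return complianceScore;
    }
  },
  {
    name: 'escalationRate',
    compute: (session) => {
      // Detect whether the session was escalated to a human
      const wasEscalated = false; // placeholder value
      return wasEscalated;
    }
  }
]
```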
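To show how such `compute` callbacks might be invoked, here is a hypothetical runner. The `CustomInsight` and `Session` types and the escalation heuristic are assumptions; the snippet above does not specify how Pika calls these functions or stores their results. A failing callback is recorded as `null` so one broken metric does not block the others.

```typescript
// Hypothetical session and custom-insight shapes for this sketch.
interface Session {
  messages: { role: "user" | "agent"; text: string }[];
}

interface CustomInsight {
  name: string;
  compute: (session: Session) => unknown;
}

// Run every custom insight; a throwing callback yields null instead of
// aborting the remaining insights.
function runCustomInsights(session: Session, insights: CustomInsight[]): Record<string, unknown> {
  const results: Record<string, unknown> = {};
  for (const insight of insights) {
    try {
      results[insight.name] = insight.compute(session);
    } catch {
      results[insight.name] = null;
    }
  }
  return results;
}

const customInsights: CustomInsight[] = [
  {
    name: "escalationRate",
    // Naive heuristic: did the user ask for a human?
    compute: (s) =>
      s.messages.some((m) => m.role === "user" && /\b(human|representative)\b/i.test(m.text)),
  },
  {
    name: "alwaysFails",
    compute: () => {
      throw new Error("not implemented");
    },
  },
];

const results = runCustomInsights(
  { messages: [{ role: "user", text: "Can I talk to a human?" }] },
  customInsights,
);
// results: { escalationRate: true, alwaysFails: null }
```

A per-session boolean like this becomes an escalation rate once averaged across sessions.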

Get notified of concerning patterns:

  • Goal completion drops below threshold
  • Spike in negative sentiment
  • Unusual number of failures
  • Performance degradation
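Alerts of this kind usually compare a recent window with a fixed floor or with the previous window. The sketch below assumes hypothetical per-window counts; the 0.7 goal completion floor (borrowed from the threshold example) and the 1.5x spike factor are illustrative defaults, not Pika settings.

```typescript
interface WindowStats {
  sessions: number;
  goalAchieved: number; // sessions whose goal was achieved
  negativeSentiment: number; // sessions with negative overall sentiment
  failures: number; // tool or verification failures
}

const ratio = (n: number, d: number): number => (d === 0 ? 0 : n / d);

function checkAlerts(
  previous: WindowStats,
  current: WindowStats,
  minGoalRate = 0.7,
  spikeFactor = 1.5,
): string[] {
  const alerts: string[] = [];
  if (current.sessions > 0 && ratio(current.goalAchieved, current.sessions) < minGoalRate) {
    alerts.push("goalCompletionBelowThreshold");
  }
  // A spike is a nonzero rate that exceeds the previous rate by spikeFactor.
  const spiked = (prev: number, cur: number): boolean => cur > 0 && cur > prev * spikeFactor;
  if (spiked(ratio(previous.negativeSentiment, previous.sessions), ratio(current.negativeSentiment, current.sessions))) {
    alerts.push("negativeSentimentSpike");
  }
  if (spiked(ratio(previous.failures, previous.sessions), ratio(current.failures, current.sessions))) {
    alerts.push("failureSpike");
  }
  return alerts;
}

const lastWeek: WindowStats = { sessions: 100, goalAchieved: 85, negativeSentiment: 5, failures: 2 };
const thisWeek: WindowStats = { sessions: 100, goalAchieved: 65, negativeSentiment: 12, failures: 2 };

// ["goalCompletionBelowThreshold", "negativeSentimentSpike"]
const alerts = checkAlerts(lastWeek, thisWeek);
```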

Compare across dimensions:

  • Agent A vs Agent B
  • This week vs last week
  • Internal users vs external users
  • Chat app X vs chat app Y

Use insights elsewhere:

  • Export to CSV/JSON
  • Feed into BI tools (Tableau, Looker)
  • Integrate with monitoring (Datadog, New Relic)
  • Generate executive reports
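For CSV export, the main pitfall is escaping. A minimal sketch, assuming a hypothetical flattened `ExportRow` (the column choice is illustrative), quotes any field containing a comma, quote, or line break and doubles embedded quotes, following RFC 4180.

```typescript
// Hypothetical flattened export record; pick the columns your BI tool needs.
interface ExportRow {
  sessionId: string;
  chatApp: string;
  goalAchieved: boolean;
  quality: number;
  sentiment: string;
}

// Quote a field if it contains a comma, quote, or line break (RFC 4180).
function csvField(value: unknown): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: ExportRow[]): string {
  const header: (keyof ExportRow)[] = ["sessionId", "chatApp", "goalAchieved", "quality", "sentiment"];
  const lines = [header.join(",")];
  for (const row of rows) {
    lines.push(header.map((h) => csvField(row[h])).join(","));
  }
  return lines.join("\r\n"); // RFC 4180 uses CRLF line endings
}

const csv = toCsv([
  { sessionId: "s1", chatApp: 'Support, "EU"', goalAchieved: true, quality: 0.82, sentiment: "positive" },
]);
```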

Understand normal performance:

  • Not every session will end with the goal achieved
  • Some complexity is normal
  • Initial quality may be lower
  • Track trends, not absolute perfection

Make insights review a habit:

  • Weekly review of key metrics
  • Monthly deep dive into patterns
  • Quarterly goal-setting based on data
  • Continuous improvement mindset

Insights are only valuable if you use them:

  • Prioritize fixes for common issues
  • Celebrate improvements
  • Share wins with the team
  • Document what worked

Holistic view of quality:

  • Insights + LLM Feedback + Human Feedback
  • Quantitative metrics + Qualitative reviews
  • Short-term spikes + Long-term trends
  • Technical metrics + Business outcomes

Insights are generated quickly:

  • Average: 1-2 seconds per session
  • Asynchronous (no user impact)
  • Batch processing optimized
  • Scales automatically

Minimal storage overhead:

  • Average 2-3 KB per session
  • Compressed JSON in DynamoDB
  • Indexed in OpenSearch
  • S3 for long-term archive

Insights use minimal resources:

  • Simple computation (not LLM-based)
  • Efficient batch processing
  • Pay only for compute used
  • Typical cost: <$0.001 per session

Enable Insights

Configure insights generation for your deployment.

View in Admin Site

Explore session insights in the admin interface.

Understanding Insights

Deep dive into insight computation architecture.

LLM-Generated Feedback

Qualitative analysis complements quantitative insights.

Self-Correcting Responses

Real-time quality control generates data for insights.
