Observability and evaluation

Aurelia should be observable like a guest-facing product layer, not treated as a black-box chat surface. That means tracking the interaction chain from launch to retrieval to handoff, then using that signal to improve both product behavior and hotel coverage.

This page covers

Telemetry model: Track the guest journey as a product flow, not as isolated chat events.
Answer review and evaluation loops: Aurelia needs a repeatable review process, not just a launch dashboard.
Operational health and alerts: Watch the system for freshness, latency, and evidence drift before the pilot scales.
Executive scorecard: Give leadership a small set of signals that reflect product value clearly.

Page details

Audience

CTOs, engineers, analytics owners, and product teams responsible for rollout quality.

Read time

7 min

Focus

Instrument Aurelia like a product system by tracking launches, evidence quality, answer outcomes, and handoffs.

Telemetry model

Track the guest journey as a product flow, not as isolated chat events.

Aurelia should emit a readable event chain that shows where the assistant launched, what kind of question it handled, which evidence class it used, and whether the guest moved on to the next meaningful step. That event stream is what turns a pilot into a product the team can operate and improve, rather than a demo.

Event	Why it matters	Representative fields
Assistant launched	Shows which surfaces create engagement	pageType, sectionId, launcherId
Prompt submitted	Reveals guest intent and friction clusters	promptText, promptCategory, pinnedHotelSlug
Evidence used	Shows whether the answer came from snapshot data or live verification	sourceClass, retrievalHits, usedLiveLookup
Answer shown	Lets teams review quality against actual outputs	answerType, responseLength, confidenceLabel
Next step taken	Measures whether the answer helped movement	compareOpened, hotelClicked, rateHandoffClicked
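
One way to keep the chain readable is to check each event against the fields in the table before it reaches analytics. The sketch below is illustrative only: the field names come from the table, but the event type strings (such as `assistant_launched`), the `{ type, payload }` event shape, and the `validateEvent` helper are assumptions, not part of the prototype contract.

```javascript
// Required fields per event, taken from the table above.
// ASSUMPTION: the event type strings and the { type, payload } shape are
// hypothetical; the real contract may name and nest these differently.
const REQUIRED_FIELDS = new Map([
  ["assistant_launched", ["pageType", "sectionId", "launcherId"]],
  ["prompt_submitted", ["promptText", "promptCategory", "pinnedHotelSlug"]],
  ["evidence_used", ["sourceClass", "retrievalHits", "usedLiveLookup"]],
  ["answer_shown", ["answerType", "responseLength", "confidenceLabel"]],
  ["next_step_taken", ["compareOpened", "hotelClicked", "rateHandoffClicked"]],
]);

// Returns a list of problems; an empty list means the event passes.
// A field counts as present even when its value is null (for example,
// pinnedHotelSlug when no hotel is pinned).
function validateEvent(event) {
  const required = REQUIRED_FIELDS.get(event?.type);
  if (!required) {
    return [`unknown event type: ${String(event?.type)}`];
  }
  const payload =
    typeof event.payload === "object" && event.payload !== null
      ? event.payload
      : {};
  return required
    .filter((field) => !(field in payload))
    .map((field) => `missing field: ${field}`);
}
```

Returning a list of problems instead of throwing lets the host log incomplete events without interrupting the guest session.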

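A representative host event hook, already supported by the prototype contract:

// Forward every Aurelia event to the host page's analytics client.
// `analytics` is assumed to be a tracking client the host page already loads.
window.PreferredConcierge?.init({
  onEvent: (event) => {
    analytics.track("aurelia_event", event);
  }
});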
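
To reconstruct the launch-to-handoff chain later, it can help to stamp each event with a shared identifier and a sequence number before forwarding it. This is a minimal sketch under assumptions: `createChainedHandler`, `chainId`, and `seq` are hypothetical names, and the in-memory tracker stands in for a real analytics client.

```javascript
// Wraps a tracking function so every event carries a shared chain id and an
// increasing sequence number.
// ASSUMPTION: `chainId` and `seq` are illustrative fields, not contract fields.
function createChainedHandler(track, chainId) {
  let seq = 0;
  return function onEvent(event) {
    seq += 1;
    track("aurelia_event", { ...event, chainId, seq });
  };
}

// Usage sketch: an in-memory tracker stands in for the host analytics client.
const sent = [];
const onEvent = createChainedHandler(
  (name, props) => sent.push({ name, props }),
  "chain-001"
);
onEvent({ type: "assistant_launched" });
onEvent({ type: "prompt_submitted" });
```

The returned handler has the same shape as the `onEvent` callback in the hook above, so in principle it could be passed to `init` in its place.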

Answer review and evaluation loops

Aurelia needs a repeatable review process, not just a launch dashboard.

Sample real conversations: Review prompts from each major page type so the team sees what guests actually ask instead of relying on imagined use cases.
Inspect evidence quality: Check whether the answer stayed grounded to the right hotel set, first-party data, or clearly attributed live sources.
Tag answer gaps: Separate missing data, weak prompt placement, poor context, and retrieval misses so the fix path is clear.
Ship targeted improvements: Update knowledge, prompt surfaces, context fields, or answer guardrails based on the actual failure mode.
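
The tagging step can be sketched as a small tally over reviewed conversations, so the most common failure mode surfaces first. The four gap categories come from the steps above; the tag strings, the `gapTag` field, and the `summarizeGaps` helper are hypothetical names chosen for illustration.

```javascript
// Gap categories named in the review steps above.
// ASSUMPTION: the tag strings and the `gapTag` field are hypothetical.
const GAP_TAGS = [
  "missing_data",
  "weak_prompt_placement",
  "poor_context",
  "retrieval_miss",
];

// Counts reviewed conversations per gap tag, most frequent first.
// Reviews with no gap (gapTag null or undefined) are skipped.
function summarizeGaps(reviews) {
  const counts = new Map(GAP_TAGS.map((tag) => [tag, 0]));
  for (const review of reviews) {
    if (review.gapTag == null) continue;
    if (!counts.has(review.gapTag)) {
      throw new Error(`unknown gap tag: ${review.gapTag}`);
    }
    counts.set(review.gapTag, counts.get(review.gapTag) + 1);
  }
  return [...counts.entries()]
    .map(([tag, count]) => ({ tag, count }))
    .sort((a, b) => b.count - a.count);
}
```

Rejecting unknown tags keeps reviewers on a shared vocabulary, which is what makes the counts comparable from one review cycle to the next.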

Operational health and alerts

Watch the system for freshness, latency, and evidence drift before the pilot scales.

Monitor answer latency separately for snapshot-only answers and live-verified answers.
Track how often the system falls back to live lookup because the core hotel snapshot is thin.
Flag repeated unanswered questions by hotel, destination, or prompt surface.
Watch rate handoff drop-off so the team can tell whether Aurelia is building confidence or creating another dead end.
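
As a rough illustration of the first two checks, the sketch below computes 95th-percentile latency separately for snapshot-only and live-verified answers, plus the share of answers that used live lookup. `usedLiveLookup` appears in the telemetry table; `latencyMs`, `percentile`, and `healthSnapshot` are assumed names, and the live-lookup share is only a proxy for a thin snapshot, since it does not record why the lookup happened.

```javascript
// Nearest-rank percentile; returns null for an empty list.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarizes answer records into the health signals described above.
// ASSUMPTION: each record has `usedLiveLookup` (boolean) and a hypothetical
// `latencyMs` (number) field.
function healthSnapshot(answers) {
  const snapshotOnly = answers
    .filter((a) => !a.usedLiveLookup)
    .map((a) => a.latencyMs);
  const liveVerified = answers
    .filter((a) => a.usedLiveLookup)
    .map((a) => a.latencyMs);
  return {
    snapshotP95Ms: percentile(snapshotOnly, 95),
    liveP95Ms: percentile(liveVerified, 95),
    liveFallbackRate:
      answers.length === 0 ? null : liveVerified.length / answers.length,
  };
}
```

Alert thresholds are deliberately left out: sensible values depend on pilot traffic and are not stated in this page.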

Important product distinction

Aurelia should not be measured like a support bot. The core question is whether it helps guests narrow the right stay faster and reach the next product step with more confidence.

Executive scorecard

Give leadership a small set of signals that reflect product value clearly.

Which prompt surfaces create the most qualified launches.
Which question classes most often lead to hotel-detail engagement or rate handoff.
Which hotels or destinations create the most unresolved questions.
How often live verification is needed because first-party knowledge is incomplete.
How qualified hotel evaluation and booking-path progression change after launch.
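
The second scorecard signal can be sketched as a per-category rate of guests who took a next step. This assumes events have already been joined into one record per session, which the page does not specify; `promptCategory`, `hotelClicked`, and `rateHandoffClicked` come from the telemetry table, while the session record shape and `nextStepRateByCategory` are hypothetical.

```javascript
// Share of sessions per prompt category that reached hotel detail or rate handoff.
// ASSUMPTION: `sessions` is a hypothetical pre-joined list with one record per
// guest session, carrying promptCategory, hotelClicked, and rateHandoffClicked.
function nextStepRateByCategory(sessions) {
  const totals = new Map();
  for (const s of sessions) {
    const row = totals.get(s.promptCategory) ?? { sessions: 0, progressed: 0 };
    row.sessions += 1;
    if (s.hotelClicked || s.rateHandoffClicked) row.progressed += 1;
    totals.set(s.promptCategory, row);
  }
  return [...totals.entries()]
    .map(([category, row]) => ({
      category,
      sessions: row.sessions,
      rate: row.progressed / row.sessions,
    }))
    .sort((a, b) => b.rate - a.rate);
}
```

Reporting the session count next to each rate helps leadership avoid over-reading categories with very little traffic.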
