Aurelia docs
Observability and evaluation
Aurelia should be observable like a guest-facing product layer, not treated as a black-box chat surface. That means tracking the interaction chain from launch to retrieval to handoff, then using that signal to improve both product behavior and hotel coverage.
Audience: CTOs, engineers, analytics owners, and product teams responsible for rollout quality.
Reading time: 7 min
Instrument Aurelia like a product system by tracking launches, evidence quality, answer outcomes, and handoffs.
Telemetry model
Track the guest journey as a product flow, not as isolated chat events.
Aurelia should emit a readable event chain that shows where the assistant launched, what kind of question it handled, which evidence class it used, and whether the guest moved on to the next meaningful step. That event stream is what turns a pilot from a demo into a system the team can actually operate.
| Event | Why it matters | Representative fields |
|---|---|---|
| Assistant launched | Shows which surfaces create engagement | pageType, sectionId, launcherId |
| Prompt submitted | Reveals guest intent and friction clusters | promptText, promptCategory, pinnedHotelSlug |
| Evidence used | Shows whether the answer came from snapshot data or live verification | sourceClass, retrievalHits, usedLiveLookup |
| Answer shown | Lets teams review quality against actual outputs | answerType, responseLength, confidenceLabel |
| Next step taken | Measures whether the answer helped movement | compareOpened, hotelClicked, rateHandoffClicked |
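
As a rough illustration of the table above, the sketch below checks an event against its representative fields before it is forwarded to analytics. The event type names (`assistant_launched` and so on), the `{ type, payload }` event shape, the `"snapshot"` value, and the `missingFields` helper are all assumptions made for this example. Only the field names come from the table, and the prototype contract may shape events differently or treat some fields as optional.

```javascript
// Hypothetical event type names mapped to the representative fields from the table.
const REPRESENTATIVE_FIELDS = new Map([
  ["assistant_launched", ["pageType", "sectionId", "launcherId"]],
  ["prompt_submitted", ["promptText", "promptCategory", "pinnedHotelSlug"]],
  ["evidence_used", ["sourceClass", "retrievalHits", "usedLiveLookup"]],
  ["answer_shown", ["answerType", "responseLength", "confidenceLabel"]],
  ["next_step_taken", ["compareOpened", "hotelClicked", "rateHandoffClicked"]],
]);

// Returns the representative fields absent from an event's payload.
// Assumes events look like { type, payload }; unknown types return null.
function missingFields(event) {
  const expected = REPRESENTATIVE_FIELDS.get(event.type);
  if (!expected) return null;
  const payload = event.payload ?? {};
  return expected.filter((field) => !(field in payload));
}

// Example: an evidence event that forgot to report retrieval hits.
const exampleEvent = {
  type: "evidence_used",
  payload: { sourceClass: "snapshot", usedLiveLookup: false }, // "snapshot" is an assumed value
};
console.log(missingFields(exampleEvent)); // [ 'retrievalHits' ]
```

A check like this could run in a development build or in the analytics pipeline, so that incomplete events are noticed before they distort review and scorecard numbers.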
A representative host event hook, already supported by the prototype contract:
    // Forward every Aurelia event to the host page's analytics client.
    window.PreferredConcierge?.init({
      onEvent: (event) => {
        analytics.track("aurelia_event", event);
      },
    });

Answer review and evaluation loops
Aurelia needs a repeatable review process, not just a launch dashboard.
- Sample real conversations
  Review prompts from each major page type so the team sees what guests actually ask instead of relying on imagined use cases.
- Inspect evidence quality
  Check whether the answer stayed grounded to the right hotel set, first-party data, or clearly attributed live sources.
- Tag answer gaps
  Separate missing data, weak prompt placement, poor context, and retrieval misses so the fix path is clear.
- Ship targeted improvements
  Update knowledge, prompt surfaces, context fields, or answer guardrails based on the actual failure mode.
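
One way to make the tagging step concrete is to tally reviewed conversations by page type and gap tag, so the dominant failure mode on each surface is visible before a fix is chosen. The sketch below is illustrative only: the review record shape (`pageType`, `gapTag`), the four tag strings, and the `tallyGaps` helper are assumptions derived from the categories above, not part of the Aurelia contract.

```javascript
// Hypothetical gap tags, named after the four failure modes listed above.
const GAP_TAGS = ["missing_data", "weak_prompt_placement", "poor_context", "retrieval_miss"];

// Counts reviewed conversations per page type and gap tag.
// Reviews without a gap (gapTag null or undefined) are skipped; unknown tags throw.
function tallyGaps(reviews) {
  const counts = {};
  for (const { pageType, gapTag } of reviews) {
    if (gapTag == null) continue;
    if (!GAP_TAGS.includes(gapTag)) throw new Error(`Unknown gap tag: ${gapTag}`);
    counts[pageType] ??= {};
    counts[pageType][gapTag] = (counts[pageType][gapTag] ?? 0) + 1;
  }
  return counts;
}

// Example review batch; page type names are made up for illustration.
const reviewedBatch = [
  { pageType: "hotel_detail", gapTag: "missing_data" },
  { pageType: "hotel_detail", gapTag: "missing_data" },
  { pageType: "destination", gapTag: "retrieval_miss" },
  { pageType: "destination", gapTag: null }, // answer was fine, no gap to tag
];
// hotel_detail shows 2 missing_data gaps; destination shows 1 retrieval_miss.
console.log(tallyGaps(reviewedBatch));
```

Rejecting unknown tags keeps the taxonomy small, which matters because each tag is meant to point to a different fix path.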
Operational health and alerts
Watch the system for freshness, latency, and evidence drift before the pilot scales.
- Monitor answer latency separately for snapshot-only answers and live-verified answers.
- Track how often the system falls back to live lookup because the core hotel snapshot is thin.
- Flag repeated unanswered questions by hotel, destination, or prompt surface.
- Watch rate handoff drop-off so the team can tell whether Aurelia is building confidence or creating another dead end.
Aurelia should not be measured like a support bot. The core question is whether it helps guests narrow the right stay faster and reach the next product step with more confidence.
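
The first three health checks listed above could be computed from a batch of answer records along the following lines. This is a minimal sketch under stated assumptions: only `usedLiveLookup` appears in the event table, while `latencyMs`, `answered`, `hotelSlug`, the `healthSummary` helper, and the default threshold are hypothetical. The live-lookup rate is also only a proxy for a thin snapshot, since it does not say why the lookup happened.

```javascript
// Assumed record shape, one object per submitted prompt:
// { usedLiveLookup: boolean, latencyMs: number, answered: boolean, hotelSlug: string }

// Nearest-rank percentile over a list of numbers (p between 0 and 100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function healthSummary(records, { unansweredThreshold = 3 } = {}) {
  // Split latency by evidence path so slow live lookups do not hide in the average.
  const snapshot = records.filter((r) => !r.usedLiveLookup);
  const live = records.filter((r) => r.usedLiveLookup);

  // Count unanswered prompts per hotel to surface repeated gaps.
  const unansweredByHotel = new Map();
  for (const r of records) {
    if (!r.answered) {
      unansweredByHotel.set(r.hotelSlug, (unansweredByHotel.get(r.hotelSlug) ?? 0) + 1);
    }
  }

  return {
    snapshotP95Ms: percentile(snapshot.map((r) => r.latencyMs), 95),
    liveP95Ms: percentile(live.map((r) => r.latencyMs), 95),
    liveLookupRate: records.length ? live.length / records.length : 0,
    flaggedHotels: [...unansweredByHotel]
      .filter(([, count]) => count >= unansweredThreshold)
      .map(([slug]) => slug),
  };
}
```

The same grouping could be repeated by destination or prompt surface. Rate handoff drop-off is left out here because it depends on how sessions are linked to downstream booking events.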
Executive scorecard
Give leadership a small set of signals that reflect product value clearly.
- Which prompt surfaces create the most qualified launches.
- Which question classes most often lead to hotel-detail engagement or rate handoff.
- Which hotels or destinations create the most unresolved questions.
- How often live verification is needed because first-party knowledge is incomplete.
- How qualified hotel evaluation and booking-path progression change after launch.
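
As one possible shape for the second scorecard signal, the sketch below ranks question classes by how often a conversation reached hotel-detail engagement or rate handoff. It assumes one flattened record per conversation, joined from the event chain. The field names (`promptCategory`, `hotelClicked`, `rateHandoffClicked`) come from the event table, but the flattened record, the join that produces it, the category values, and the `nextStepRateByCategory` helper are hypothetical.

```javascript
// Assumed input: one record per conversation, e.g.
// { promptCategory: "amenities", hotelClicked: true, rateHandoffClicked: false }
function nextStepRateByCategory(conversations) {
  const totals = new Map();
  for (const c of conversations) {
    const row = totals.get(c.promptCategory) ?? { conversations: 0, progressed: 0 };
    row.conversations += 1;
    // A conversation counts as progressed if either next step was taken.
    if (c.hotelClicked || c.rateHandoffClicked) row.progressed += 1;
    totals.set(c.promptCategory, row);
  }
  // Rank categories by the share of conversations that reached a next step.
  return [...totals]
    .map(([category, row]) => ({ category, ...row, rate: row.progressed / row.conversations }))
    .sort((a, b) => b.rate - a.rate);
}

// Example with made-up categories.
const scorecardRows = nextStepRateByCategory([
  { promptCategory: "amenities", hotelClicked: true, rateHandoffClicked: false },
  { promptCategory: "amenities", hotelClicked: false, rateHandoffClicked: false },
  { promptCategory: "location", hotelClicked: false, rateHandoffClicked: true },
]);
console.log(scorecardRows[0].category); // location
```

Reporting the conversation count next to the rate helps leadership avoid over-reading a category that has only a handful of conversations.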