Tourist
AI-Powered Audio Tour Guide
A voice-first travel companion that generates personalized audio narrations in real time as users explore cities worldwide. It supports both virtual tours and live, GPS-triggered experiences.
The Challenge
Real-Time Voice
Generate and play AI narrations with under 200ms of perceived latency. Users expect audio to start the moment they arrive at a point of interest.
Dual Experience
Support both virtual tours (explore from home) and live walks (GPS-triggered narrations when physically present).
Engaging Content
Create narrations that feel like a knowledgeable local guide, not generic Wikipedia summaries. Offer multiple voice styles and depth levels.
Our Solution
A native iOS app with MVVM architecture, combining GPT-5.2 for dynamic narration and OpenAI TTS for natural voice synthesis over immersive 3D maps.
Virtual Tours
Pre-built routes through major cities
Live Walks
GPS-triggered POI discovery
AI Narration
GPT-5.2 generated stories
Natural Voice
OpenAI TTS synthesis
8 Cities
NYC, Paris, London, Rome...
100+ POIs
Curated points of interest
3D Maps
Hybrid realistic elevation
Voice Styles
Multiple narrator personas
Architecture Overview
- SwiftUI Views
- MapKit 3D
- AVFoundation
- Framer Motion
- SessionManager
- LocationService
- ProfileManager
- AudioEngine
- GPT-5-mini
- OpenAI TTS
- AIClient
- Streaming
- POIRepository
- Wikipedia Images
- UserDefaults
- JSON Tours
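As an illustration of the "JSON Tours" item, a minimal sketch of decoding a pre-built route with Codable might look like the following. The schema is an assumption: the `Tour` and `POI` types, their field names, and the sample data are hypothetical, since the page does not show the app's actual tour format.

```swift
import Foundation

// Hypothetical schema: these field names are illustrative,
// not the app's actual tour format.
struct POI: Codable, Equatable {
    let id: String
    let name: String
    let latitude: Double
    let longitude: Double
}

struct Tour: Codable, Equatable {
    let city: String
    let title: String
    let stops: [POI]
}

/// Decodes a tour from raw JSON data; throws a DecodingError on malformed input.
func loadTour(from data: Data) throws -> Tour {
    try JSONDecoder().decode(Tour.self, from: data)
}

// Made-up sample data for the sketch.
let sampleJSON = """
{
  "city": "Paris",
  "title": "Sample Route",
  "stops": [
    {"id": "notre-dame", "name": "Notre-Dame", "latitude": 48.8530, "longitude": 2.3499}
  ]
}
"""
```

In a real app the data would more likely come from a bundled file than from an inline string; the decoding step would be the same.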
Two Tour Experiences
Virtual Tours
Explore cities from anywhere. Pre-built walking routes guide users through curated POIs with auto-advancing narrations and beautiful 3D map animations.
- Classic or Minimal view modes
- Auto-advance after narration
- Image carousel per POI
- Progress tracking & completion
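The auto-advance and progress-tracking behavior listed above could be modeled as a small value type. This is a sketch under assumptions: `TourProgress` and its members are hypothetical names, not the app's actual API.

```swift
import Foundation

/// Tracks the position within a tour and advances when a narration finishes.
/// Hypothetical type for illustration only.
struct TourProgress {
    let stopCount: Int
    private(set) var currentIndex = 0
    private(set) var isComplete = false
    var autoAdvance = true

    init(stopCount: Int) {
        precondition(stopCount > 0, "a tour needs at least one stop")
        self.stopCount = stopCount
    }

    /// Share of the tour that lies behind the current stop, in 0...1.
    var fractionComplete: Double {
        isComplete ? 1.0 : Double(currentIndex) / Double(stopCount)
    }

    /// Call when the current narration ends. Returns the index of the next
    /// stop to narrate, or nil if the tour is complete or auto-advance is off.
    mutating func narrationDidFinish() -> Int? {
        guard !isComplete else { return nil }
        if currentIndex == stopCount - 1 {
            isComplete = true   // the last stop's narration just ended
            return nil
        }
        guard autoAdvance else { return nil }
        currentIndex += 1
        return currentIndex
    }
}
```

A view model could call `narrationDidFinish()` from the audio player's completion callback and use the returned index to move the map camera and start the next narration.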
Live Walks
Real-time discovery using GPS. As users physically approach points of interest, the app automatically triggers relevant narrations.
- Background location tracking
- Proximity-based triggers
- NowPlaying card with controls
- Frequency customization
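One way to picture the proximity triggers and frequency setting listed above is a distance check combined with a cooldown. The sketch below uses plain Foundation and the haversine formula so it stays self-contained; an app would more likely rely on Core Location. The `ProximityTrigger` type, the 50 m radius, and the 120 s minimum interval are assumptions, not values taken from the app.

```swift
import Foundation

struct Coordinate {
    let latitude: Double
    let longitude: Double
}

/// Great-circle distance in meters, computed with the haversine formula.
func distanceMeters(_ a: Coordinate, _ b: Coordinate) -> Double {
    let earthRadius = 6_371_000.0
    let toRad = Double.pi / 180
    let dLat = (b.latitude - a.latitude) * toRad
    let dLon = (b.longitude - a.longitude) * toRad
    let h = sin(dLat / 2) * sin(dLat / 2)
        + cos(a.latitude * toRad) * cos(b.latitude * toRad)
        * sin(dLon / 2) * sin(dLon / 2)
    return 2 * earthRadius * asin(min(1, sqrt(h)))
}

/// Hypothetical trigger: fires once per POI, and no more often than minInterval.
struct ProximityTrigger {
    var radiusMeters: Double      // assumed trigger radius
    var minInterval: TimeInterval // assumed gap between narrations
    private var played: Set<String> = []
    private var lastFired: Date? = nil

    init(radiusMeters: Double = 50, minInterval: TimeInterval = 120) {
        self.radiusMeters = radiusMeters
        self.minInterval = minInterval
    }

    /// Returns true if a narration should start for this POI now.
    mutating func shouldTrigger(poiID: String, poi: Coordinate,
                                user: Coordinate, now: Date = Date()) -> Bool {
        guard !played.contains(poiID) else { return false }
        guard distanceMeters(user, poi) <= radiusMeters else { return false }
        if let last = lastFired, now.timeIntervalSince(last) < minInterval {
            return false
        }
        played.insert(poiID)
        lastFired = now
        return true
    }
}
```

Exposing `minInterval` as a user setting is one plausible reading of "frequency customization": a longer interval means fewer interruptions on a dense street.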
Technical Achievements
Voice Latency
Measured from user action to audio playback: GPT-5.2 generates the text, OpenAI TTS converts it to speech, and AudioEngine plays it, all in under 200ms of perceived latency.
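The page does not say how the low perceived latency is achieved. One common approach, shown here purely as an assumption, is to split the streamed model output into sentences and hand the first complete sentence to speech synthesis while the rest is still being generated. `SentenceChunker` is a hypothetical helper, and its punctuation-based splitting is deliberately naive (it would also split on abbreviations such as "St.").

```swift
import Foundation

/// Buffers streamed text and emits complete sentences as soon as they close.
/// Hypothetical helper for illustration only.
struct SentenceChunker {
    private var buffer = ""
    private let terminators: Set<Character> = [".", "!", "?"]

    init() {}

    /// Feed the next streamed fragment; returns any sentences it completed.
    mutating func append(_ fragment: String) -> [String] {
        var sentences: [String] = []
        for ch in fragment {
            buffer.append(ch)
            if terminators.contains(ch) {
                let sentence = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
                if !sentence.isEmpty { sentences.append(sentence) }
                buffer = ""
            }
        }
        return sentences
    }

    /// Call when the stream ends to flush any trailing text.
    mutating func finish() -> String? {
        let rest = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        buffer = ""
        return rest.isEmpty ? nil : rest
    }
}
```

Each sentence returned by `append` could be sent to the TTS request queue immediately, so playback of the first sentence can begin before the full narration has been generated.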
View Modes
Classic (full-featured, with a mini-map) and Minimal (clean and focused) views. The user's preference is persisted via ProfileManager.
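Since the architecture lists UserDefaults, persisting the view-mode preference might look roughly like the sketch below. `ViewMode`, `ViewModeStore`, and the `"viewMode"` key are hypothetical stand-ins; the real ProfileManager interface is not shown on this page.

```swift
import Foundation

enum ViewMode: String {
    case classic
    case minimal
}

/// Hypothetical stand-in for the preference storage inside ProfileManager.
struct ViewModeStore {
    private let defaults: UserDefaults
    private let key = "viewMode"   // assumed key name

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    var mode: ViewMode {
        get {
            // Fall back to classic when nothing (or an unknown value) is stored.
            defaults.string(forKey: key).flatMap(ViewMode.init(rawValue:)) ?? .classic
        }
        nonmutating set {
            defaults.set(newValue.rawValue, forKey: key)
        }
    }
}
```

Injecting the `UserDefaults` instance keeps the store testable: a test can pass a throwaway suite instead of `.standard`.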
Uptime
Robust error handling with fallbacks. Location permission flows cover the edge cases, and the share sheet has been fixed for iPad.
Need voice AI for your app?
We specialize in voice-first AI experiences—from tour guides to customer support to accessibility tools.