LioShuTan uses SIRAYA Model Router to power AI-driven language learning and speaking assessment experiences through Speech-to-Text (ASR) and Text-to-Speech (TTS) technologies.
Overview
LioShuTan is an education platform focused on language learning experiences, including AI speaking practice, listening exercises, pronunciation assessment, and interactive learning workflows.
The platform currently uses BytePlus Speech-to-Text (ASR) and Text-to-Speech (TTS) services for speaking verification and AI-assisted listening interactions. Its core workflow checks whether users can accurately pronounce and express words and sentences across different learning scenarios and age groups.
At the same time, the platform is evaluating more advanced pronunciation assessment capabilities, such as Azure Pronunciation Assessment, to enable finer-grained phonetic analysis and more detailed pronunciation feedback.
As the product approaches its full production launch in the coming months, scalability, low-latency feedback, and flexible AI orchestration have become increasingly important.
Real Challenges
In AI-powered language learning environments, LioShuTan encountered several production-level challenges:
- Different ASR models showed noticeable differences in pronunciation assessment accuracy
- Current word-level confidence scoring could not precisely identify specific pronunciation issues
- Different languages, age groups, and learning scenarios required different AI speech capabilities
- Real-time speaking assessment workloads required low latency and stable infrastructure
- The platform needed continuous evaluation and testing across multiple speech AI providers
Additionally, educational speech applications demand significantly higher standards of feedback quality, responsiveness, and user experience than standard speech AI use cases.
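To illustrate the word-level confidence limitation listed above, here is a minimal Python sketch. The `WordResult` type, the 0.0 to 1.0 confidence scale, the 0.8 threshold, and the sample values are all assumptions made for illustration; they are not taken from BytePlus, Azure, or LioShuTan's implementation.

```python
from dataclasses import dataclass


@dataclass
class WordResult:
    """Hypothetical per-word ASR output: the recognized word and a confidence score."""
    word: str
    confidence: float  # assumed to range from 0.0 to 1.0


def flag_low_confidence(words, threshold=0.8):
    """Return the words whose confidence falls below the threshold."""
    return [w.word for w in words if w.confidence < threshold]


# Illustrative values only, not real recognition results.
sample = [
    WordResult("the", 0.97),
    WordResult("three", 0.62),
    WordResult("trees", 0.91),
]

flagged = flag_low_confidence(sample)
print(flagged)
```

A check like this can say that "three" was probably mispronounced, but not which sound inside the word went wrong. That gap is what finer-grained phonetic analysis is meant to close.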
Solution
SIRAYA provided LioShuTan with a unified Model Router architecture, enabling centralized access to multiple speech AI services, dynamic routing, and flexible model evaluation.
With SIRAYA Model Router, the platform can flexibly orchestrate BytePlus ASR/TTS, Azure Pronunciation Assessment, and other speech AI services while dynamically selecting the most suitable models for different learning tasks and pronunciation evaluation scenarios.
Key capabilities include:
- Unified access and management across multiple speech AI services
- Flexible testing of different ASR and pronunciation assessment models
- Dynamic speech AI orchestration based on learning scenarios
- Improved stability for real-time speaking assessment workloads
- Reduced complexity for multi-provider speech AI integration
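Scenario-based orchestration of this kind can be pictured as a small rule table. The following Python sketch is illustrative only: the `Scenario` and `Rule` types, the task names, and the service labels are hypothetical, and it does not represent SIRAYA Model Router's actual API or configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of a learning interaction."""
    task: str        # e.g. "speaking_check", "listening", "phoneme_feedback"
    language: str    # e.g. "en"
    age_group: str   # e.g. "child", "adult"


@dataclass(frozen=True)
class Rule:
    """Routing rule; a field left as None matches any value."""
    service: str
    task: Optional[str] = None
    language: Optional[str] = None
    age_group: Optional[str] = None

    def matches(self, s: Scenario) -> bool:
        pairs = (
            (self.task, s.task),
            (self.language, s.language),
            (self.age_group, s.age_group),
        )
        return all(want is None or want == got for want, got in pairs)


# Hypothetical service labels, not real model identifiers.
RULES = [
    Rule("azure-pronunciation-assessment", task="phoneme_feedback"),
    Rule("byteplus-tts", task="listening"),
    Rule("byteplus-asr", task="speaking_check"),
]
DEFAULT_SERVICE = "byteplus-asr"


def select_service(s: Scenario, rules=RULES) -> str:
    """Return the service of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(s):
            return rule.service
    return DEFAULT_SERVICE


print(select_service(Scenario("phoneme_feedback", "en", "child")))
```

Keeping the rules as data rather than branching logic means a new provider or a language-specific override can be tested by adding one entry, which fits the flexible evaluation goal described above.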
Measurable Impact
After integrating SIRAYA Model Router, LioShuTan achieved significant improvements in both AI language learning experiences and speech AI operations:
- More stable AI speaking feedback and interactive learning experiences
- Greater flexibility in testing different ASR and pronunciation assessment capabilities
- Better adaptation across different age groups and language learning scenarios
- Reduced operational complexity for multi-provider speech AI systems
- Easier expansion into new AI language learning features and workflows
More importantly, LioShuTan established a scalable speech AI infrastructure foundation designed for long-term AI-assisted language learning environments.