Santosh Dahal

Human-Centered
Speech Intelligence

Designing speech systems that recognize, diagnose, and assist diverse speakers through efficient and adaptive AI

🏢 Former CTO · Diyo.ai · 2022–2025: led AI product & engineering
📅 5+ Years in AI & NLP: LLMs, ASR, conversational AI & production ML
🎓 M.Sc · Informatics & Intelligence Systems: NLP, machine learning & intelligent systems engineering
🚀 5+ Production AI Systems: deployed across vision, NLP, and robotics
Research

My Research Vision

Human speech is central to communication, learning, and social participation. Yet many people—language learners, speakers with strong accents, and individuals with speech impairments—face barriers when interacting with modern speech technologies.

My research explores how speech AI can move beyond simple transcription to become diagnostic, adaptive, and assistive.

🔬

Diagnostic Speech Systems

Build systems that analyze pronunciation patterns, phonetic deviations, and accent variation to help with language learning and speech therapy.

💬

Personalized Feedback

Transform diagnostic outputs into actionable guidance for pronunciation training, speech rehabilitation, and adaptive language learning.

📱

Accessible Speech AI

Make speech intelligence accessible and deployable through knowledge distillation, parameter-efficient adaptation, and on-device inference.

Experience

Research Experience

On-Device ASR with Knowledge Distillation

Speech

Built a lightweight on-device ASR system via knowledge distillation from a Fast Conformer RNN-T teacher, achieving a 32.7% relative WER improvement — the highest reported for Conformer RNN-T + KD.

ASR · Knowledge Distillation · Fast Conformer · Edge AI
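
For readers unfamiliar with knowledge distillation, the core idea can be sketched generically: a small student model is trained to match the temperature-softened output distribution of a larger teacher. The snippet below is an illustrative, textbook-style sketch only, not the training code used in this project. The function names, the temperature value, and the single-vector KL form are all assumptions; distilling an RNN-T model typically operates over its full output lattice rather than one logit vector.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; temperature=2.0 is an assumed default.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# The loss is zero when the student reproduces the teacher exactly,
# and positive when the two distributions disagree.
matching = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a training loop this term is usually combined with the ordinary supervised loss, so the student learns from both the ground-truth transcripts and the teacher's softer predictions.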

MoPE-LoRA: Mixture of Phonetic Experts for Accented ASR

Speech

Proposed MoPE-LoRA, a PEFT framework that routes Conformer encoder frames to six manner-of-articulation LoRA experts via hybrid phoneme-supervision and learned acoustic gating — no accent labels required. On L2-ARCTIC, it achieves 10.43% WER, outperforming Full FT (12.80%) and Single LoRA (11.33%), with 12.3% relative zero-shot improvement over Single LoRA across unseen accents.

ASR · MoPE-LoRA · Mixture of Experts · Conformer · Accent Adaptation · PEFT · L2-ARCTIC
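
As a quick check on how relative figures like these are derived, relative WER reduction is the absolute drop in WER divided by the baseline WER. The sketch below applies that formula to the L2-ARCTIC numbers quoted above; the function name is hypothetical. It gives roughly 18.5% relative to Full FT and 7.9% relative to Single LoRA. The 12.3% zero-shot figure is measured on unseen accents and cannot be recomputed from these three numbers.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WER values (in percent) as quoted for L2-ARCTIC.
mope_lora = 10.43
full_ft = 12.80
single_lora = 11.33

vs_full_ft = round(relative_wer_reduction(full_ft, mope_lora), 1)
vs_single_lora = round(relative_wer_reduction(single_lora, mope_lora), 1)

print(vs_full_ft)      # 18.5
print(vs_single_lora)  # 7.9
```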

Browser-Based ASR

Speech

Implemented a full in-browser ASR pipeline in JavaScript using ONNX Runtime Web, covering audio capture, log-mel spectrogram extraction, encoder-decoder inference, and text decoding — entirely client-side with no server dependency.

ASR · ONNX Runtime · JavaScript · Web ML · Client-Side Inference

LLM Evaluation for Low-Resource Nepali (SRMH)

NLP

Technical lead on a Bill & Melinda Gates Foundation-funded project. Deployed a RAG-based SRMH chatbot serving 5,000+ Nepali users and fine-tuned BLOOM 7B for low-resource Nepali generation.

LLMs · RAG · Low-Resource NLP · BLOOM 7B

Improving Government Service Delivery via NLP

NLP

Technical lead on a UNDP-funded project. Built a multilingual RAG chatbot (LangChain + FAISS) to digitize citizen charters, deployed across two municipalities after surveying 3,000+ citizens.

NLP · RAG · LangChain

SLAM-Based AR Localization

Computer Vision

Built SLAM-based localization APIs enabling persistent AR on mobile devices, implemented RGBD and Monocular Visual SLAM systems, and developed the ROS package Kachuwa during a research internship at NAAMII.

SLAM · AR · Computer Vision · ROS
Background

Academic Foundation

My academic training spans electronics engineering, machine learning, and intelligent systems — providing a strong foundation for research in speech AI and NLP.

M.Sc in Informatics and Intelligence System Engineering

Institute of Engineering, Tribhuvan University — Kathmandu, Nepal

2024 – 2026

Modern Natural Language Processing

N-gram Language Models · POS and Named Entity Recognition · Vector Semantics and Embedding · Semantic Analysis · Neural Language Models and Deep Learning Architectures

Applied Machine Learning

Regression and Classification Models · Cluster Methods and Mixture Models · Probabilistic Graphical Models and Inferences · Reinforcement Learning

Computer Vision

Multiple View Geometry · Motion Analysis · Image Classification and Object Detection

Bachelor in Electronics and Communication Engineering

Institute of Engineering, Tribhuvan University — Pokhara, Nepal

2015 – 2019

Statistics and Probability

Discrete and Normal Distributions · Sampling Distributions · Test of Hypothesis · Linear Regression

Object Oriented Programming

Objects · Classes · Templates · Operator Overloading · Inheritance · Polymorphism and Dynamic Binding

Artificial Intelligence

Fundamentals of Intelligent Systems · Search & Constraint Satisfaction · Knowledge Representation & Reasoning
Professional

Professional Background

Alongside my research focus, I have built diverse expertise as a full-stack engineer and AI practitioner. My professional work spans conversational AI, computer vision, robotics, and IoT systems, complementing my research vision with practical engineering experience.

Chief Technology Officer · Diyo.ai Technologies

2022 – 2025
NLP · ASR · LLMs · Team Leadership

Led development of Nepali Speech Recognition APIs, overseeing a team of 5+ engineers. Designed a human+AI transcription platform cutting costs by 75%. Built a versatile RAG-based chatbot platform serving multiple domains. Served as technical lead for major grants including the Bill & Melinda Gates Foundation (SRMH) and the AMPLIFY project.

Research & Development Engineer · Diyo.ai Technologies

2020 – 2022
Computer Vision · SLAM · AR · NLP

Built SLAM-based localization API services enabling persistent AR on mobile devices, in collaboration with ETH Zurich Computer Vision Lab. Developed web-based jewelry AR with NLP-enabled voice interaction, enhancing the e-commerce customer experience.

Research Intern · NAAMII

2019 – 2020
SLAM · Computer Vision · Python · C++

Conducted literature reviews and designed experiments for SLAM algorithms. Implemented RGBD SLAM and Monocular Visual SLAM systems, and developed the ROS package Kachuwa.

Full Stack Developer · Dreamsys IT Solution

2016 – 2019
Full-Stack · REST APIs · PostgreSQL

Designed databases and implemented RESTful APIs for SurveyChan, improving data retrieval speed by 50%. Collaborated with mobile app developers to integrate e-commerce and CMS systems across platforms.

Technology Stack

Python · C/C++ · PyTorch · NVIDIA NeMo · Hugging Face Transformers · LangChain · FAISS · LanceDB · PostgreSQL · React · Next.js · Vue.js · Django · FastAPI · Docker · Rasa · SLAM · ROS · Computer Vision
Contact

Let's Connect

If you are interested in collaboration, research discussions, or just want to chat about speech AI and accessibility, feel free to reach out.

Always open to interesting research conversations and collaboration opportunities.