Designing speech systems that recognize, diagnose, and assist diverse speakers through efficient and adaptive AI

Human speech is central to communication, learning, and social participation. Yet many people—language learners, speakers with strong accents, and individuals with speech impairments—face barriers when interacting with modern speech technologies.
My research explores how speech AI can move beyond simple transcription to become diagnostic, adaptive, and assistive.
Build systems that analyze pronunciation patterns, phonetic deviations, and accent variation to support language learning and speech therapy.
Transform diagnostic outputs into actionable guidance for pronunciation training, speech rehabilitation, and adaptive language learning.
Make speech intelligence accessible and deployable through knowledge distillation, parameter-efficient adaptation, and on-device inference.
Built a lightweight on-device ASR system via knowledge distillation from a Fast Conformer RNN-T teacher, achieving a 32.7% relative WER improvement, the highest reported for Conformer RNN-T with knowledge distillation.
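The distillation recipe, in outline: the student is trained on the usual transducer objective while a temperature-scaled KL term pulls its joint-network distribution toward the teacher's. Below is a minimal PyTorch sketch of that recipe; the loss weighting, temperature, and blank index are illustrative assumptions, not the exact training configuration.

import torch
import torch.nn.functional as F
import torchaudio.functional as AF

def distillation_loss(student_logits, teacher_logits, targets,
                      logit_lengths, target_lengths,
                      alpha=0.5, temperature=2.0, blank=0):
    """Combine the standard RNN-T loss with a KL term that pulls the
    student's joint-network distribution toward the teacher's.
    alpha, temperature, and blank are illustrative hyperparameters."""
    # Hard-label transducer loss on the student's (B, T, U+1, V) lattice.
    rnnt = AF.rnnt_loss(student_logits, targets,
                        logit_lengths, target_lengths, blank=blank)
    # Soft-label KL over the same lattice, temperature-scaled as in
    # standard knowledge distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean") * temperature ** 2
    return alpha * rnnt + (1.0 - alpha) * kl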
Proposed MoPE-LoRA, a parameter-efficient fine-tuning (PEFT) framework that routes Conformer encoder frames to six manner-of-articulation LoRA experts through hybrid phoneme supervision and learned acoustic gating, with no accent labels required. On L2-ARCTIC it achieves 10.43% WER, outperforming full fine-tuning (12.80%) and a single LoRA (11.33%), and yields a 12.3% relative zero-shot improvement over the single LoRA on unseen accents.
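To illustrate the routing idea, here is a sketch, not the published implementation: each frozen linear layer carries one low-rank adapter per manner-of-articulation class, a per-frame gate mixes their updates, and an auxiliary cross-entropy on the gate logits injects the phoneme-derived supervision. Dimensions, the gate design, and initialization are assumptions.

import torch
import torch.nn as nn

class MoLoRALinear(nn.Module):
    """Frozen linear layer plus a mixture of LoRA experts, one per
    manner-of-articulation class, mixed per frame by a learned gate.
    Illustrative sketch; shapes and gate design are assumptions."""

    def __init__(self, base: nn.Linear, num_experts=6, rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only adapters and the gate are trained
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.gate = nn.Linear(d_in, num_experts)  # per-frame acoustic gating
        self.scale = alpha / rank

    def forward(self, x):                  # x: (batch, frames, d_in)
        gate_logits = self.gate(x)         # (B, T, E), also used for the aux loss
        w = gate_logits.softmax(dim=-1)
        # Per-expert low-rank updates: (B, T, E, d_out)
        delta = torch.einsum("btd,edr,ero->bteo", x, self.A, self.B)
        mixed = (w.unsqueeze(-1) * delta).sum(dim=2) * self.scale
        return self.base(x) + mixed, gate_logits

def gate_supervision(gate_logits, manner_labels):
    """Auxiliary CE pushing the gate toward frame-level manner-of-articulation
    targets derived from phoneme alignments (the hybrid supervision)."""
    return nn.functional.cross_entropy(
        gate_logits.flatten(0, 1), manner_labels.flatten())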
Implemented a full in-browser ASR pipeline in JavaScript using ONNX Runtime Web, covering audio capture, log-mel spectrogram extraction, encoder-decoder inference, and text decoding, all running client-side with no server dependency.
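The deployed pipeline is JavaScript on ONNX Runtime Web; the Python analogue below, using onnxruntime, sketches the same stages for clarity. The feature settings, single-output model assumption, and CTC-style greedy decode are illustrative; the actual decoder may differ.

import numpy as np
import librosa
import onnxruntime as ort

SAMPLE_RATE = 16_000

def log_mel(wav: np.ndarray) -> np.ndarray:
    """80-bin log-mel spectrogram, matching common ASR front ends."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SAMPLE_RATE, n_fft=512, hop_length=160, n_mels=80)
    return np.log(mel + 1e-6).astype(np.float32)

def transcribe(wav_path: str, model_path: str, vocab: list[str]) -> str:
    wav, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    feats = log_mel(wav)[np.newaxis]        # (1, 80, T)
    sess = ort.InferenceSession(model_path)
    # Assumes a single logits output of shape (1, T', V).
    (logits,) = sess.run(None, {sess.get_inputs()[0].name: feats})
    ids = logits.argmax(axis=-1)[0]         # greedy per-frame labels
    # CTC-style collapse: merge repeats, drop blanks (blank id 0 assumed).
    out, prev = [], -1
    for i in ids:
        if i != prev and i != 0:
            out.append(vocab[i])
        prev = i
    return "".join(out)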
Technical lead on a Bill & Melinda Gates Foundation-funded project. Deployed a RAG-based SRMH chatbot serving 5,000+ Nepali users and fine-tuned BLOOM 7B for low-resource Nepali generation.
Technical lead on a UNDP-funded project. Built a multilingual RAG chatbot (LangChain + FAISS) to digitize citizen charters, deployed across two municipalities after surveying 3,000+ citizens.
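The retrieval core of such a chatbot can be sketched as follows. Import paths vary across LangChain versions, and the embedding model, chunk sizes, and prompt are placeholders rather than the deployed configuration.

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def build_index(charter_texts: list[str]) -> FAISS:
    """Chunk the digitized charters and index them for dense retrieval."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.create_documents(charter_texts)
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    return FAISS.from_documents(chunks, embeddings)

def answer(index: FAISS, question: str, llm) -> str:
    """Retrieve top charter passages and ground the answer in them;
    llm is any LangChain runnable exposing .invoke()."""
    docs = index.similarity_search(question, k=4)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (f"Answer using only the excerpts below.\n\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return llm.invoke(prompt)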
Built SLAM-based localization APIs enabling persistent AR on mobile devices, implemented RGBD and Monocular Visual SLAM systems, and developed the ROS package Kachuwa during a research internship at NAAMII.
My academic training spans electronics engineering, machine learning, and intelligent systems, providing a strong foundation for research in speech AI and NLP.
Institute of Engineering, Tribhuvan University — Kathmandu, Nepal
Modern Natural Language Processing
Applied Machine Learning
Computer Vision
Institute of Engineering, Tribhuvan University — Pokhara, Nepal
Statistics and Probability
Object Oriented Programming
Artificial Intelligence
Alongside my research focus, I have developed broad expertise as a full-stack engineer and AI practitioner. My professional work spans conversational AI, computer vision, robotics, and IoT systems, complementing my research vision with practical engineering experience.
Led development of Nepali Speech Recognition APIs, overseeing a team of 5+ engineers. Designed a human+AI transcription platform cutting costs by 75%. Built a versatile RAG-based chatbot platform serving multiple domains. Served as technical lead for major grants including the Bill & Melinda Gates Foundation (SRMH) and the AMPLIFY project.
Built SLAM-based localization API services enabling persistent AR on mobile devices, in collaboration with ETH Zurich Computer Vision Lab. Developed web-based jewelry AR with NLP-enabled voice interaction, enhancing the e-commerce customer experience.
Conducted literature reviews and designed experiments for SLAM algorithms. Implemented RGBD SLAM and Monocular Visual SLAM systems, and developed the ROS package Kachuwa.
Designed databases and implemented RESTful APIs for SurveyChan, improving data retrieval speed by 50%. Collaborated with mobile app developers to integrate e-commerce and CMS systems across platforms.
If you are interested in collaboration, research discussions, or just want to chat about speech AI and accessibility, feel free to reach out.
Always open to interesting research conversations and collaboration opportunities.