I build speech and language AI systems that are accurate, efficient, and inclusive for diverse speakers, with experience from research prototypes to production deployment.

Built a lightweight on-device ASR system via knowledge distillation from a Fast Conformer RNN-T teacher, achieving a 32.7% relative WER improvement, the highest reported for Conformer RNN-T + KD.
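Distillation setups like this typically minimize a temperature-scaled KL divergence between the teacher's and student's output distributions. The sketch below is illustrative only, not the project's training code (RNN-T distillation in practice works over lattice posteriors rather than a single softmax), and the function names are my own:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (the usual scaling that keeps gradient magnitudes comparable)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

The loss is zero when student and teacher agree exactly and grows as their softened distributions diverge; in full training it is usually mixed with the standard transcription loss.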
Proposed MoPE-LoRA, a PEFT framework that routes Conformer encoder frames to six manner-of-articulation LoRA experts via hybrid phoneme supervision and learned acoustic gating, with no accent labels required. On L2-ARCTIC it achieves 10.43% WER, outperforming Full FT (12.80%) and Single LoRA (11.33%), with a 12.3% relative zero-shot improvement over Single LoRA across unseen accents.
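The learned acoustic gating can be pictured as a soft mixture over per-frame expert outputs. Below is a toy sketch under assumptions of my own, with hypothetical linear gates and plain callables standing in for LoRA adapters, not the actual MoPE-LoRA implementation:

```python
import math

def softmax(logits):
    """Standard softmax over gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route_frame(frame, gate_weights, experts):
    """Softly route one encoder frame through a set of experts.
    gate_weights: one weight vector per expert (a hypothetical linear gate);
    experts: callables mapping a frame to an adapted frame."""
    logits = [sum(w * f for w, f in zip(wv, frame)) for wv in gate_weights]
    gates = softmax(logits)
    outputs = [expert(frame) for expert in experts]
    dim = len(outputs[0])
    # Gate-weighted sum of the expert outputs.
    mixed = [sum(g * out[d] for g, out in zip(gates, outputs))
             for d in range(dim)]
    return mixed, gates
```

With zeroed gate weights the gates are uniform, so every expert contributes equally; training would sharpen the gates so frames with a given manner of articulation favor the matching expert.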
Implemented a full in-browser ASR pipeline in JavaScript using ONNX Runtime Web, covering audio capture, log-mel spectrogram extraction, encoder-decoder inference, and text decoding, entirely client-side with no server dependency.
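The log-mel front end in such a pipeline maps linear frequencies onto the perceptual mel scale before pooling the spectrum into filterbank bins. A minimal sketch of the standard HTK-style conversion (a textbook formula, not code from the project; `mel_points` is my own helper name):

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse conversion, mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_points(low_hz, high_hz, n_filters):
    """Center/edge frequencies for a mel filterbank: evenly spaced in
    mel space, returned in Hz (n_filters + 2 points including edges)."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
```

Because the spacing is uniform in mel space, the resulting filters are narrow at low frequencies and wide at high frequencies, mirroring human pitch perception.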
Technical lead on a Bill & Melinda Gates Foundation-funded project. Deployed a RAG-based SRMH chatbot serving 5,000+ Nepali users and fine-tuned LLMs for low-resource Nepali generation.
Technical lead on a UNDP-funded project. Built a multilingual RAG chatbot to digitize citizen charters, deployed across two municipalities after surveying 3,000+ citizens.
Built SLAM-based localization APIs enabling persistent AR on mobile devices, implemented RGBD and Monocular Visual SLAM systems, and developed the ROS package Kachuwa during a research internship at NAAMII.
My academic training spans electronics engineering, machine learning, and intelligent systems, providing a strong foundation for research in speech AI and NLP.
Institute of Engineering, Tribhuvan University – Kathmandu, Nepal
Modern Natural Language Processing
Applied Machine Learning
Computer Vision
Institute of Engineering, Tribhuvan University – Pokhara, Nepal
Statistics and Probability
Object Oriented Programming
Artificial Intelligence
Alongside my research focus, I have built diverse expertise as a full-stack engineer and AI practitioner. My professional work spans conversational AI, computer vision, robotics, and IoT systems, complementing my research vision with practical engineering experience.
Led development of Nepali Speech Recognition APIs, overseeing a team of 5+ engineers. Designed a human-plus-AI transcription platform that cut transcription costs by 75%. Built a versatile RAG-based chatbot platform serving multiple domains. Served as technical lead for major grant-funded projects, including the Bill & Melinda Gates Foundation (SRMH) and AMPLIFY projects.
Built SLAM-based localization API services enabling persistent AR on mobile devices, in collaboration with ETH Zurich Computer Vision Lab. Developed web-based jewelry AR with NLP-enabled voice interaction, enhancing the e-commerce customer experience.
Conducted literature reviews and designed experiments for SLAM algorithms. Implemented RGBD SLAM and Monocular Visual SLAM systems, and developed the ROS package Kachuwa.
Designed databases and implemented RESTful APIs for SurveyChan, improving data retrieval speed by 50%. Collaborated with mobile app developers to integrate e-commerce and CMS systems across platforms.
If you are interested in collaboration, research discussions, or just want to chat about speech AI and accessibility, feel free to reach out.
Always open to interesting research conversations and collaboration opportunities.