Santosh Dahal

Applied AI Researcher & Engineer

I build speech and language AI systems that are accurate, efficient, and inclusive of diverse speakers, and I have taken them from research prototypes to production deployment.

๐ŸขFormerCTO ยท Diyo.ai2022โ€“2025 โ€” led AI product & engineering
๐Ÿ“…5+Years in AI & NLPLLMs, ASR, conversational AI & production ML
๐ŸŽ“M.ScInformatics & Intelligence SystemsNLP, machine learning & intelligent systems engineering
๐Ÿš€5+Production AI SystemsDeployed across vision, NLP, and robotics
Experience

Research Experience

On-Device ASR with Knowledge Distillation

Speech

Built a lightweight on-device ASR system by distilling knowledge from a Fast Conformer RNN-T teacher, achieving a 32.7% relative WER improvement, the highest reported for Conformer RNN-T with knowledge distillation (KD).

ASR · Knowledge Distillation · Fast Conformer · Edge AI
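Knowledge distillation trains a compact student to imitate a larger teacher. As a minimal sketch, assuming teacher and student emit logits over the same output vocabulary for an aligned frame, the classic temperature-scaled KL term looks like this. It does not reproduce whatever RNN-T-specific objective the project used (RNN-T outputs live on a time-by-label lattice, which complicates alignment), and the helper names, `temperature`, and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: a generic logit-matching KD term for one frame,
    not an RNN-T lattice-level objective.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl


def total_loss(hard_loss, student_logits, teacher_logits, alpha=0.5, temperature=2.0):
    """Blend the student's ordinary ASR loss with the distillation term."""
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    return (1 - alpha) * hard_loss + alpha * kd


teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))
```

Higher temperatures flatten both distributions, so the student also learns from how the teacher ranks the incorrect tokens; `alpha` trades that signal off against the student's usual ASR loss.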

MoPE-LoRA: Mixture of Phonetic Experts for Accented ASR

Speech

Proposed MoPE-LoRA, a parameter-efficient fine-tuning (PEFT) framework that routes Conformer encoder frames to six manner-of-articulation LoRA experts via a hybrid router combining phoneme supervision with learned acoustic gating, so no accent labels are required. On L2-ARCTIC, it achieves 10.43% WER, outperforming full fine-tuning (12.80%) and a single LoRA adapter (11.33%), and yields a 12.3% relative zero-shot improvement over single LoRA on unseen accents.

ASR · MoPE-LoRA · Mixture of Experts · Conformer · Accent Adaptation · PEFT · L2-ARCTIC
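To illustrate the general routing idea, the sketch below mixes several LoRA experts on a single encoder frame: a frozen base projection W plus a gate-weighted sum of low-rank updates B_k A_k x. It is not the MoPE-LoRA implementation. Only the expert count of six comes from the description above; the plain softmax gate stands in for the hybrid phoneme-supervised router, and the class name, shapes, rank, `alpha` scaling, and random weights are all hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) list-of-lists matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfLoRAExperts:
    """Hypothetical layer: y = W x + sum_k g_k(x) * (alpha / r) * B_k A_k x."""

    def __init__(self, d_in, d_out, num_experts=6, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # frozen pretrained projection
        # Per-expert down-projections A_k (rank x d_in), randomly initialised.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(num_experts)]
        # Per-expert up-projections B_k (d_out x rank), zero-initialised as in LoRA.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        # Stand-in acoustic gate: one logit per expert from a linear map of the frame.
        self.gate = rand_matrix(num_experts, d_in, rng)
        self.scaling = alpha / rank

    def forward(self, x):
        weights = softmax(matvec(self.gate, x))  # routing weights, sum to 1
        y = matvec(self.W, x)
        for g, A, B in zip(weights, self.A, self.B):
            delta = matvec(B, matvec(A, x))  # low-rank update B_k A_k x
            y = [yi + g * self.scaling * di for yi, di in zip(y, delta)]
        return y, weights


layer = MixtureOfLoRAExperts(d_in=8, d_out=8)
frame = [0.1 * i for i in range(8)]
output, routing = layer.forward(frame)
```

Zero-initialising every B_k means the layer starts out identical to the frozen base projection, so adaptation begins from the pretrained model's behavior and only the small A_k, B_k, and gate parameters need training.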

Browser Based ASR

Speech

Implemented a full in-browser ASR pipeline in JavaScript using ONNX Runtime Web, covering audio capture, log-mel spectrogram extraction, encoder-decoder inference, and text decoding. Everything runs client-side, with no server dependency.

ASR · ONNX Runtime · JavaScript · Web ML · Client-Side Inference
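The log-mel feature step of such a pipeline can be sketched as follows. This is a minimal Python illustration rather than the project's JavaScript code, assuming an HTK-style mel scale, a Hann window, and a naive DFT for readability; the sample rate, FFT size, frame length, and number of mel bands are hypothetical and would have to match whatever front end the exported encoder was trained with.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; some toolkits use the Slaney scale).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters mapping n_fft // 2 + 1 power bins to n_mels bands."""
    f_max = f_max if f_max is not None else sample_rate / 2
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    hz_points = [mel_to_hz(mel_lo + i * (mel_hi - mel_lo) / (n_mels + 1))
                 for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    filters = []
    for i in range(1, n_mels + 1):
        left, center, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def power_spectrum(frame, n_fft):
    """Hann-windowed naive DFT; returns |X[k]|^2 for k = 0..n_fft // 2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    windowed += [0.0] * (n_fft - n)  # zero-pad the frame to n_fft samples
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum


def log_mel_frame(frame, filters, n_fft, eps=1e-10):
    """Apply the filterbank to one frame's power spectrum and take the log."""
    power = power_spectrum(frame, n_fft)
    return [math.log(sum(w * p for w, p in zip(row, power)) + eps) for row in filters]


sr, n_fft = 16000, 512
filters = mel_filterbank(80, n_fft, sr)
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(400)]  # 25 ms, 1 kHz
features = log_mel_frame(tone, filters, n_fft)
```

A real browser front end would replace the O(n²) DFT with an FFT and compute features over overlapping frames (for example 25 ms windows with a 10 ms hop) before passing them to the encoder.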

LLM Evaluation for Low-Resource Nepali (SRMH)

NLP

Technical lead on a Bill & Melinda Gates Foundation-funded project. Deployed a RAG-based SRMH chatbot serving 5,000+ Nepali users and fine-tuned LLMs for low-resource Nepali generation.

LLMs · RAG · Low-Resource NLP · BLOOM 7B

Improving Government Service Delivery via NLP

NLP

Technical lead on a UNDP-funded project. Built a multilingual RAG chatbot to digitize citizen charters, deployed across two municipalities after surveying 3,000+ citizens.

NLP · RAG · LangChain
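The retrieval step at the core of a RAG chatbot can be sketched in a few lines: embed the question, rank stored passages by cosine similarity, and place the best matches into the prompt. This is a toy sketch, not the deployed system: the bag-of-words `embed` function, the sample passages, and the prompt template are hypothetical stand-ins for a multilingual embedding model, real citizen-charter content, and a vector index such as FAISS or LanceDB.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_prompt(query, passages, k=2):
    """Ground the LLM by pasting retrieved passages above the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


passages = [
    "citizen charter lists the documents required for a birth registration",
    "ward office opening hours and contact numbers",
    "fees for recommendation letters issued by the municipality",
]
print(build_prompt("what documents do I need for birth registration", passages, k=1))
```

Only the ranking and prompt-assembly logic carries over to a production system; the embedding model and the index are where real deployments differ.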

SLAM-Based AR Localization

Computer Vision

Built SLAM-based localization APIs enabling persistent AR on mobile devices, implemented RGBD and Monocular Visual SLAM systems, and developed the ROS package Kachuwa during a research internship at NAAMII.

SLAM · AR · Computer Vision · ROS
Background

Academic Foundation

My academic training spans electronics engineering, machine learning, and intelligent systems, providing a strong foundation for research in speech AI and NLP.

M.Sc in Informatics and Intelligence System Engineering

Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

2024 – 2026

Modern Natural Language Processing

N-gram Language Models · POS Tagging and Named Entity Recognition · Vector Semantics and Embeddings · Semantic Analysis · Neural Language Models and Deep Learning Architectures

Applied Machine Learning

Regression and Classification Models · Clustering Methods and Mixture Models · Probabilistic Graphical Models and Inference · Reinforcement Learning

Computer Vision

Multiple View Geometry · Motion Analysis · Image Classification and Object Detection

Bachelor in Electronics and Communication Engineering

Institute of Engineering, Tribhuvan University, Pokhara, Nepal

2015 – 2019

Statistics and Probability

Discrete and Normal Distributions · Sampling Distributions · Hypothesis Testing · Linear Regression

Object Oriented Programming

Objects · Classes · Templates · Operator Overloading · Inheritance · Polymorphism and Dynamic Binding

Artificial Intelligence

Fundamentals of Intelligent Systems · Search & Constraint Satisfaction · Knowledge Representation & Reasoning
Professional

Professional Background

Alongside my research, I have built broad expertise as a full-stack engineer and AI practitioner. My professional work spans conversational AI, computer vision, robotics, and IoT systems, grounding my research vision in practical engineering experience.

Chief Technology Officer · Diyo.ai Technologies

2022 – 2025
NLP · ASR · LLMs · Team Leadership

Led development of Nepali speech recognition APIs, overseeing a team of 5+ engineers. Designed a human-in-the-loop (human + AI) transcription platform that cut transcription costs by 75%. Built a RAG-based chatbot platform serving multiple domains. Served as technical lead for major grant-funded projects, including the Bill & Melinda Gates Foundation SRMH project and the AMPLIFY project.

Research & Development Engineer · Diyo.ai Technologies

2020 – 2022
Computer Vision · SLAM · AR · NLP

Built SLAM-based localization API services enabling persistent AR on mobile devices, in collaboration with the ETH Zurich Computer Vision Lab. Developed a web-based jewelry AR experience with NLP-enabled voice interaction, enhancing the e-commerce customer experience.

Research Intern · NAAMII

2019 – 2020
SLAM · Computer Vision · Python · C++

Conducted literature reviews and designed experiments for SLAM algorithms. Implemented RGBD SLAM and Monocular Visual SLAM systems, and developed the ROS package Kachuwa.

Full Stack Developer · Dreamsys IT Solution

2016 – 2019
Full-Stack · REST APIs · PostgreSQL

Designed databases and implemented RESTful APIs for SurveyChan, improving data retrieval speed by 50%. Collaborated with mobile app developers to integrate e-commerce platforms and content management systems (CMS) across platforms.

Technology Stack

Python · C/C++ · PyTorch · NVIDIA NeMo · Hugging Face Transformers · LangChain · FAISS · LanceDB · PostgreSQL · React · Next.js · Vue.js · Django · FastAPI · Docker · Rasa · SLAM · ROS · Computer Vision
Contact

Let's Connect

If you are interested in collaborating, discussing research, or just chatting about speech AI and accessibility, feel free to reach out.

Always open to interesting research conversations and collaboration opportunities.