Publications

Adapted XLM-R to low-resource language families, improving POS tagging & dependency parsing through targeted multilingual training strategies, and evaluated hyperparameters across 15+ languages
Identified key hyperparameters through regression analysis, establishing best practices for up-sampling low-resource languages without compromising high-resource language performance

Developed a preprocessing module for code-switched data using a hybrid approach that combined rule-based phonemic transcription with machine learning, including an LSTM-based seq2seq model, achieving 78.6% accuracy
Engineered a novel grapheme-to-phoneme (G2P) conversion technique tailored to romanized Hindi, improving the processing and analysis of code-switched social media text
Released a dataset of script-corrected Hindi-English code-switched sentences annotated for named entity recognition and part-of-speech tagging, supporting further code-switching research in NLP