Publications
Normalization and Back-transliteration for Code-Switched Text, CALCS (NAACL 2021), Dwija Parikh and Thamar Solorio
- Developed a preprocessing module for code-switched data using a hybrid approach that combines rule-based phonemic transcription with a seq2seq LSTM model, achieving an accuracy of 78.6%
- Engineered a novel grapheme-to-phoneme (G2P) conversion technique tailored to romanized Hindi, improving the processing and analysis of code-switched social media text
- Released a dataset of script-corrected Hindi-English code-switched sentences labeled for named entity recognition and part-of-speech tagging, supporting further code-switching research in NLP
Page Design