Vanya Bannihatti Kumar

I am an ML/NLP graduate student at Carnegie Mellon University, working with Prof. Shinji Watanabe on analysing speech foundation models and improving ASR in multimodal contexts. I have also worked with Prof. Graham Neubig on multilingual NLP. During the summer of 2024, I interned at Adobe Research in Dr. Aseem Agarwala's Video AI Lab. Before that, I worked as an NLP Research Engineer in Dr. Hang Li's NLP group at ByteDance AI Lab, where I built efficient machine translation and text-to-speech translation systems for TikTok. During my tenure at ByteDance, I also worked extensively on grammatical error correction and synthetic data generation. I received my bachelor's degree from the Indian Institute of Technology Madras in 2021. During my senior year of undergraduate studies, I worked at IBM Research on the explainability of Graph Neural Networks. My broad research interests are multimodal learning, natural language processing, and the interpretability/explainability of machine learning algorithms.

[ Email  /  Github  /  CV  /  Google Scholar ]

News

  • August 2023 : Starting my internship at Adobe Research in the Video AI Lab
  • August 2023 : Starting my masters at CMU
  • February 2023 : Our work on using visual information for ASR Error Correction was accepted at ICASSP 2023
  • December 2021 : Presenting our work on automated evaluation of Graph Neural Network explanations with neuro-symbolic reasoning at NeurIPS 2021
  • October 2021 : Starting my professional career as an NLP Research Engineer in Dr. Hang Li's NLP group at ByteDance AI Lab in Singapore!
  • July 2021 : Graduated from IIT Madras and won the HAL Prize for Department Rank 1

Research

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Authors: Jiatong Shi, Shih-Heng Wang*, William Chen*, Martijn Bartelds*, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

Published in: Interspeech 2024 (Oral)

Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.

Link: View Paper

Visual Information Matters for ASR Error Correction

Authors: Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang

Published in: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Link: View Paper

Automated Evaluation of GNN Explanations with Neuro Symbolic Reasoning

Authors: Vanya Bannihatti Kumar, Balaji Ganesan, Muhammed Ameen, Devbrat Sharma, Arvind Agarwal

Published in: NeurIPS 2021 Competitions and Demonstrations Track

Abstract: Explaining Graph Neural Network predictions to end users of AI applications in easily understandable terms remains an unsolved problem. In particular, we do not have well-developed methods for automatically evaluating explanations in ways that are closer to how users consume those explanations. Based on recent application trends and our own experience with real-world problems, we propose an automatic evaluation approach for GNN explanations using neuro-symbolic reasoning.

Link: View Paper

Template from this website.