Arushi Goel

According to our database, Arushi Goel authored at least 32 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Benchmarking Single-Factor Physical Video-to-Audio Generation.
CoRR, May, 2026

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music.
CoRR, April, 2026

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos.
CoRR, March, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.
CoRR, January, 2026

2025
Music Flamingo: Scaling Music Understanding in Audio Language Models.
CoRR, November, 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM.
CoRR, October, 2025

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning.
CoRR, October, 2025

Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding.
CoRR, August, 2025

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models.
CoRR, July, 2025

ETTA: Elucidating the Design Space of Text-to-Audio Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Fugatto 1: Foundational Generative Audio Transformer Opus 1.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Visually Interpretable Subtask Reasoning for Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024
OMCAT: Omni Context Aware Transformer.
CoRR, 2024

Audio Dialogues: Dialogues dataset for audio and music understanding.
CoRR, 2024

TiV-ODE: A Neural ODE-based Approach for Controllable Video Generation From Text-Image Pairs.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE.
CoRR, 2023

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Who are you referring to? Coreference resolution in image narrations.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Semi-supervised multimodal coreference resolution in image narrations.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter.
Proceedings of the Conference on Robot Learning, 2023

2022
Who are you referring to? Weakly supervised coreference resolution with multimodal grounding.
CoRR, 2022

WiCV 2022: The Tenth Women In Computer Vision Workshop.
CoRR, 2022

WiCV 2021: The Eighth Women In Computer Vision Workshop.
CoRR, 2022

PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy Labels.
CoRR, 2022

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2020
Injecting Prior Knowledge into Image Caption Generation.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

2019
Learning to Caption Images with Two-Stream Attention and Sentence Auto-Encoder.
CoRR, 2019

Cross-Domain Image Classification through Neural-Style Transfer Data Augmentation.
CoRR, 2019

A Multimodal LSTM for Predicting Listener Empathic Responses Over Time.
Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

An End-To-End Network for Generating Social Relationship Graphs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Semantic Roles in VerbNet and FrameNet: Statistical Analysis and Evaluation.
Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, 2019