Sai Rajeswar

Sagar Davasam

Seganrasan Subramanian

Vipul Mittal

Sridhar Krishna Nemala

Christopher Pal

Srinivas Sunkara

Valliappan Chidambaram Adaikkappan

CoRR, May, 2026

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning.

David Meger

Pietro Mazzaglia

CoRR, May, 2026

Therefore I am. I Think.

Esakkivel Esakkiraja

Rajagopal Venkatesaramani

Denis Akhiyarov

CoRR, April, 2026

Terminal Agents Suffice for Enterprise Automation.

Patrice Béchard

Orlando Marquez Ayala

CoRR, April, 2026

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing.

CoRR, March, 2026

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents.

CoRR, March, 2026

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings.

Jishnu Sethumadhavan Nair

Shravan Nayak

Sagar Davasam

Aman Tiwari

Sridhar Krishna Nemala

Srinivas Sunkara

CoRR, March, 2026

StarFlow: Generating Structured Workflow Outputs From Sketch Images.

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

Grammar Search for Multi-Agent Systems.

Mayank Singh

Vikas Yadav

Shravan Nayak

Eduardo Blanco

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Grounding Computer Use Agents on Human Demonstrations.

Adriana Romero-Soriano

CoRR, November, 2025

Apriel-1.5-15b-Thinker.

CoRR, October, 2025

Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval.

CoRR, October, 2025

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs.

CoRR, September, 2025

Apriel-Nemotron-15B-Thinker.

CoRR, August, 2025

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning.

CoRR, August, 2025

Rendering-Aware Reinforcement Learning for Vector Graphics Generation.

Mohammad Reza Samsami

CoRR, May, 2025

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA.

Rishabh Maheshwary

Masoud Hashemi

Khyati Mahajan

Spandana Gella

Vikas Yadav

CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

CoRR, March, 2025

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs.

Aarash Feizi

Adriana Romero-Soriano

Reihaneh Rabbany

Spandana Gella

João Monteiro

CoRR, February, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding.

CoRR, February, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

The Promise of RL for Autoregressive Image Editing.

Saba Ahmadi

Rabiul Awal

Ankur Sikarwar

Amirhossein Kazemnejad

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval.

Ahmed Masry

Megh Thakkar

Patrice Béchard

Rabiul Awal

Shambhavi Mishra

Akshay Kalkunte Suresh

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Representing Positional Information in Generative World Models for Object Manipulation.

Proceedings of the ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks.

CoRR, 2024

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.

CoRR, 2024

Multimodal foundation world models for generalist embodied agents.

CoRR, 2024

VCR: Visual Caption Restoration.

CoRR, 2024

GenRL: Multimodal-foundation world models for generalization in embodied agents.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory.

Arnab Kumar Mondal

Siba Smarak Panigrahi

Kaleem Siddiqi

Siamak Ravanbakhsh

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Capture the Flag: Uncovering Data Insights with Large Language Models.

Issam H. Laradji

Perouz Taslakian

CoRR, 2023

Equivariant Adaptation of Large Pretrained Models.

Arnab Kumar Mondal

Siba Smarak Panigrahi

Sékou-Oumar Kaba

Siamak Ravanbakhsh

CoRR, 2023

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels.

Proceedings of the International Conference on Machine Learning, 2023

Hyperbolic Deep Reinforcement Learning for Continuous Control.

Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Choreographer: Learning and Adapting Skills in Imagination.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Unsupervised Model-based Pre-training for Data-efficient Control from Pixels.

CoRR, 2022

Multi-label Iterated Learning for Image Classification with Label Ambiguity.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Consistency-CAM: Towards Improved Weakly Supervised Semantic Segmentation.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Touch-based Curiosity for Sparse-Reward Tasks.

CoRR, 2021

Haptics-based Curiosity for Sparse-reward Tasks.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020

Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation.

Fahim Mannan

Florian Golemo

Jérôme Parent-Lévesque

David Vázquez

Derek Nowrouzezahrai

Aaron C. Courville

Int. J. Comput. Vis., 2020

2019

Adversarial Computation of Optimal Transport Maps.

CoRR, 2019

2018

Hierarchical Adversarially Learned Inference.

Mohamed Ishmael Belghazi

CoRR, 2018

A Deep Reinforcement Learning Chatbot (Short Version).

Alexandre de Brébisson

CoRR, 2018

MINE: Mutual Information Neural Estimation.

CoRR, 2018

Towards Text Generation with Adversarially Learned Neural Outlines.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mutual Information Neural Estimation.

Mohamed Ishmael Belghazi

Proceedings of the 35th International Conference on Machine Learning, 2018

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data.

Proceedings of the 35th International Conference on Machine Learning, 2018

2017

Adversarial Generation of Natural Language.

Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017

2015

OCR for bilingual documents using language modeling.

Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

A hypothesize-and-verify framework for text recognition using deep recurrent neural networks.

Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Text recognition using deep BLSTM networks.

Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

2014

Scene Text Analysis using Deep Belief Networks.
