Sai Rajeswar

According to our database, Sai Rajeswar authored at least 61 papers between 2014 and 2026.

Bibliography

2026
R2V Agent: Teaching SLMs When to Ask for Help.
CoRR, May, 2026

Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics.
CoRR, May, 2026

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning.
CoRR, May, 2026

Therefore I am. I Think.
CoRR, April, 2026

Terminal Agents Suffice for Enterprise Automation.
CoRR, April, 2026

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing.
CoRR, March, 2026

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents.
CoRR, March, 2026

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings.
CoRR, March, 2026

StarFlow: Generating Structured Workflow Outputs From Sketch Images.
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

Grammar Search for Multi-Agent Systems.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025
Grounding Computer Use Agents on Human Demonstrations.
CoRR, November, 2025

Apriel-1.5-15b-Thinker.
CoRR, October, 2025

Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval.
CoRR, October, 2025

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs.
CoRR, September, 2025

Apriel-Nemotron-15B-Thinker.
CoRR, August, 2025

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning.
CoRR, August, 2025

Rendering-Aware Reinforcement Learning for Vector Graphics Generation.
CoRR, May, 2025

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA.
CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
CoRR, March, 2025

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs.
CoRR, February, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding.
CoRR, February, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

The Promise of RL for Autoregressive Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Representing Positional Information in Generative World Models for Object Manipulation.
Proceedings of the ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks.
CoRR, 2024

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.
CoRR, 2024

Multimodal foundation world models for generalist embodied agents.
CoRR, 2024

VCR: Visual Caption Restoration.
CoRR, 2024

GenRL: Multimodal-foundation world models for generalization in embodied agents.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Capture the Flag: Uncovering Data Insights with Large Language Models.
CoRR, 2023

Equivariant Adaptation of Large Pretrained Models.
CoRR, 2023

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels.
Proceedings of the International Conference on Machine Learning, 2023

Hyperbolic Deep Reinforcement Learning for Continuous Control.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Choreographer: Learning and Adapting Skills in Imagination.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Unsupervised Model-based Pre-training for Data-efficient Control from Pixels.
CoRR, 2022

Multi-label Iterated Learning for Image Classification with Label Ambiguity.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Consistency-CAM: Towards Improved Weakly Supervised Semantic Segmentation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Touch-based Curiosity for Sparse-Reward Tasks.
CoRR, 2021

Haptics-based Curiosity for Sparse-reward Tasks.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation.
Int. J. Comput. Vis., 2020

2019
Adversarial Computation of Optimal Transport Maps.
CoRR, 2019

2018
Hierarchical Adversarially Learned Inference.
CoRR, 2018

A Deep Reinforcement Learning Chatbot (Short Version).
CoRR, 2018

MINE: Mutual Information Neural Estimation.
CoRR, 2018

Towards Text Generation with Adversarially Learned Neural Outlines.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mutual Information Neural Estimation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Adversarial Generation of Natural Language.
Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017

2015
OCR for bilingual documents using language modeling.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

A hypothesize-and-verify framework for text recognition using deep recurrent neural networks.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Text recognition using deep BLSTM networks.
Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

2014
Scene Text Analysis using Deep Belief Networks.
Proceedings of the 2014 Indian Conference on Computer Vision, 2014

