Sai Rajeswar

According to our database, Sai Rajeswar authored at least 49 papers between 2014 and 2025.

Bibliography

2025
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval.
CoRR, November, 2025

Apriel-1.5-15b-Thinker.
CoRR, October, 2025

Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval.
CoRR, October, 2025

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs.
CoRR, September, 2025

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation.
CoRR, August, 2025

Apriel-Nemotron-15B-Thinker.
CoRR, August, 2025

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning.
CoRR, August, 2025

The Promise of RL for Autoregressive Image Editing.
CoRR, August, 2025

Rendering-Aware Reinforcement Learning for Vector Graphics Generation.
CoRR, May, 2025

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA.
CoRR, May, 2025

StarFlow: Generating Structured Workflow Outputs From Sketch Images.
CoRR, March, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
CoRR, March, 2025

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs.
CoRR, February, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding.
CoRR, February, 2025

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks.
CoRR, 2024

Representing Positional Information in Generative World Models for Object Manipulation.
CoRR, 2024

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation.
CoRR, 2024

Multimodal foundation world models for generalist embodied agents.
CoRR, 2024

VCR: Visual Caption Restoration.
CoRR, 2024

GenRL: Multimodal-foundation world models for generalization in embodied agents.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Capture the Flag: Uncovering Data Insights with Large Language Models.
CoRR, 2023

Equivariant Adaptation of Large Pretrained Models.
CoRR, 2023

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels.
Proceedings of the International Conference on Machine Learning, 2023

Hyperbolic Deep Reinforcement Learning for Continuous Control.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Choreographer: Learning and Adapting Skills in Imagination.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Unsupervised Model-based Pre-training for Data-efficient Control from Pixels.
CoRR, 2022

Multi-label Iterated Learning for Image Classification with Label Ambiguity.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Consistency-CAM: Towards Improved Weakly Supervised Semantic Segmentation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Touch-based Curiosity for Sparse-Reward Tasks.
CoRR, 2021

Haptics-based Curiosity for Sparse-reward Tasks.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation.
Int. J. Comput. Vis., 2020

2019
Adversarial Computation of Optimal Transport Maps.
CoRR, 2019

2018
Hierarchical Adversarially Learned Inference.
CoRR, 2018

A Deep Reinforcement Learning Chatbot (Short Version).
CoRR, 2018

MINE: Mutual Information Neural Estimation.
CoRR, 2018

Towards Text Generation with Adversarially Learned Neural Outlines.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mutual Information Neural Estimation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Adversarial Generation of Natural Language.
Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017

2015
OCR for bilingual documents using language modeling.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

A hypothesize-and-verify framework for text recognition using deep recurrent neural networks.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Text recognition using deep BLSTM networks.
Proceedings of the Eighth International Conference on Advances in Pattern Recognition, 2015

2014
Scene Text Analysis using Deep Belief Networks.
Proceedings of the 2014 Indian Conference on Computer Vision, 2014

