Tianyi Bai

Orcid: 0009-0009-5057-7100

According to our database, Tianyi Bai authored at least 27 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models.
CoRR, April, 2026

TAG: Thinking with Action Unit Grounding for Facial Expression Recognition.
CoRR, February, 2026

Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code.
CoRR, February, 2026

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks.
CoRR, February, 2026

From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning.
CoRR, January, 2026

2025
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI.
CoRR, December, 2025

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation.
CoRR, December, 2025

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images.
CoRR, November, 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models.
CoRR, November, 2025

VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL.
CoRR, November, 2025

UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models.
CoRR, October, 2025

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification.
CoRR, June, 2025

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning.
CoRR, June, 2025

TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network.
CoRR, June, 2025

Unsupervised Topic Models are Data Mixers for Pre-training Language Models.
CoRR, February, 2025

Fast, Secure, Adaptable: LionsOS Design, Implementation and Performance.
CoRR, January, 2025

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.
CoRR, 2024

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models.
CoRR, 2024

KeyVideoLLM: Towards Large-scale Video Keyframe Selection.
CoRR, 2024

A Survey of Multimodal Large Language Model from A Data-centric Perspective.
CoRR, 2024

2023
Transfer Learning for Bayesian Optimization: A Survey.
CoRR, 2023

2022
Transfer Learning based Search Space Design for Hyperparameter Tuning.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

