Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

Yifan Zhang

Proceedings of the Forty-second International Conference on Machine Learning, 2025

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Stable Segment Anything Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Towards Precise Scaling Laws for Video Diffusion Transformers.

Victor Shea-Jay Huang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StyleMaster: Stylize Your Video with Artistic Generation and Translation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SketchVideo: Sketch-based Video Generation and Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

iMOVE: Instance-Motion-Aware Video Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models.

CoRR, 2024

Owl-1: Omni World Model for Consistent Long Video Generation.

CoRR, 2024

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.

CoRR, 2024

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models.

CoRR, 2024

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing.

CoRR, 2024

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts.

CoRR, 2024

DMQR-RAG: Diverse Multi-Query Rewriting for RAG.

CoRR, 2024

Kwai-STaR: Transform LLMs into State-Transition Reasoners.

CoRR, 2024

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning.

CoRR, 2024

ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning.

CoRR, 2024

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs.

CoRR, 2024

ViMo: Generating Motions from Casual Videos.

CoRR, 2024

EVLM: An Efficient Vision-Language Model for Visual Understanding.

CoRR, 2024

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control.

CoRR, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model.

CoRR, 2024

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B.

CoRR, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.

CoRR, 2024

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance.

CoRR, 2024

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark.

CoRR, 2024

Motion Inversion for Video Customization.

CoRR, 2024

ChemLLM: A Chemical Large Language Model.

CoRR, 2024

Towards Unified 3D Hair Reconstruction from Single-View Portraits.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

DialogBench: Evaluating LLMs as Human-like Dialogue Systems.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

PlacidDreamer: Advancing Harmony in Text-to-3D Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Learning Multi-Dimensional Human Preference for Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models.

CoRR, 2023

Ask One More Time: Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios.

CoRR, 2023

DialogBench: Evaluating LLMs as Human-like Dialogue Systems.

CoRR, 2023

KwaiYiiMath: Technical Report.

CoRR, 2023

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions.

CoRR, 2023

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization.

CoRR, 2023

Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems.

Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

AMCAD: Adaptive Mixed-Curvature Representation based Advertisement Retrieval System.

Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

2021

Exploring Sparse Expert Models and Beyond.

CoRR, 2021

SMAD: Scalable Multi-view Ad Retrieval System for E-Commerce Sponsored Search.

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

Di Zhang
