Tianyu He

CoRR, June, 2025

Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration.

CoRR, June, 2025

Playing with Transformer at 30+ FPS via Next-Frame Diffusion.

CoRR, June, 2025

MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft.

CoRR, April, 2025

Fast Autoregressive Video Generation with Diagonal Decoding.

CoRR, March, 2025

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models.

CoRR, March, 2025

AR4D: Autoregressive 4D Generation from Monocular Videos.

CoRR, January, 2025

(How) Can Transformers Predict Pseudo-Random Numbers?

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos.

Dayal Singh Kalra

Maissam Barkeshli

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VidTwin: Video VAE with Decoupled Structure and Dynamics.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation.

Proceedings of AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Memories are One-to-Many Mapping Alleviators in Talking Face Generation.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Conditional Consistency Regularization for Semi-Supervised Multi-Label Image Classification.

IEEE Trans. Multim., 2024

Generative data augmentation with differential privacy for non-IID problem in decentralized clinical machine learning.

Future Gener. Comput. Syst., 2024

VidTok: A Versatile and Open-Source Video Tokenizer.

CoRR, 2024

IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI.

CoRR, 2024

A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy.

CoRR, 2024

Cheems: Wonderful Matrices More Efficient and More Effective Architecture.

CoRR, 2024

Video In-context Learning.

CoRR, 2024

GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors.

CoRR, 2024

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement.

CoRR, 2024

Grokking Modular Polynomials.

CoRR, 2024

CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency.

Hanxin Zhu

CoRR, 2024

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing.

CoRR, 2024

First-principles Based 3D Virtual Simulation Testing for Discovering SOTIF Corner Cases of Autonomous Driving.

CoRR, 2024

Compositional 3D-aware Video Generation with LLM Director.

Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks.

Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ASA: An Auditory Spatial Attention Dataset with Multiple Speaking Locations.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

GAIA: Zero-shot Talking Avatar Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

To Grok or not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

End-to-End Rate-Distortion Optimized 3D Gaussian Representation.

Computer Vision - ECCV 2024, 2024

Is Vanilla MLP in Neural Radiance Field Enough for Few-Shot View Synthesis?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Semantical video coding: Instill static-dynamic clues into structured bitstream for AI tasks.

J. Vis. Commun. Image Represent., May, 2023

HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis.

Entropy, January, 2023

Data Masking for Chinese Electronic Medical Records with Named Entity Recognition.

Intell. Autom. Soft Comput., 2023

Towards a Psychological Generalist AI: A Survey of Current Applications of Large Language Models and Future Prospects.

CoRR, 2023

Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape.

CoRR, 2023

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model.

CoRR, 2023

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications.

Darshil Doshi

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data.

CoRR, 2022

AutoInit: Automatic Initialization via Jacobian Tuning.

Darshil Doshi

CoRR, 2022

Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks.

CoRR, 2022

Meta Clustering Learning for Large-scale Unsupervised Person Re-identification.

Proceedings of MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Impedance Control of Upper Limb Rehabilitation Robot Based on Series Elastic Actuator.

Intelligent Robotics and Applications - 15th International Conference, 2022

Unleashing the Potential of Adaptation Models via Go-getting Domain Labels.

Computer Vision - ECCV 2022 Workshops, 2022

Image Coding for Machines with Omnipotent Feature Learning.

Computer Vision - ECCV 2022, 2022

Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Semantic Structured Image Coding Framework for Multiple Intelligent Applications.

Simeng Sun

IEEE Trans. Circuits Syst. Video Technol., 2021

Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm.

Darshil Doshi

CoRR, 2021

Meta Clustering Learning for Large-scale Unsupervised Person Re-identification.

CoRR, 2021

Enhance Images as You Like with Unpaired Learning.

CoRR, 2021

A method of implanting combinational hardware Trojan based on evolvable hardware.

Comput. Electr. Eng., 2021

The establishment and evaluation of the automatic crisis balance analysis model for social network users based on artificial intelligence technology.

Proceedings of ISAIMS 2021: 2nd International Symposium on Artificial Intelligence for Medicine Sciences, Beijing, China, October 29, 2021

Analysis of emotional characteristics of Weibo "tree hole" users with different suicide risk.

Proceedings of ISAIMS 2021: 2nd International Symposium on Artificial Intelligence for Medicine Sciences, Beijing, China, October 29, 2021

Enhance Image as You Like with Unpaired Learning.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Dense Interaction Learning for Video-based Person Re-identification.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Partial Person Re-Identification With Part-Part Correspondence Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Learning for Video Compression.

IEEE Trans. Circuits Syst. Video Technol., 2020

Relationship Between Brightness and Current of the Propagating Positive Leaders in Laboratory High Voltage Atmospheric Discharges.

IEEE Access, 2020

Memorize, Then Recall: A Generative Framework for Low Bit-Rate Surveillance Video Compression.

Yaojun Wu

Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

Learning to Transfer: Unsupervised Domain Translation via Meta-Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

ImmerTai: Immersive Motion Learning in VR Environments.

J. Vis. Commun. Image Represent., 2019

Learning based Facial Image Compression with semantic fidelity metric.

Neurocomputing, 2019

Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation.

Xu Tan

Tao Qin

CoRR, 2019

Language Graph Distillation for Low-Resource Machine Translation.

CoRR, 2019

Learning to Transfer: Unsupervised Meta Domain Translation.

CoRR, 2019

A Contribution to the Investigation of Leader Tortuosity in Positive Long Rod-Plane Air Discharge.

IEEE Access, 2019

Beyond Coding: Detection-driven Image Compression with Semantically Structured Bit-stream.

Proceedings of the Picture Coding Symposium, 2019

Deliberation Learning for Image-to-Image Translation.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Energy Consumption Assessment of College Tennis Players Based on Actigraph GT9X Accelerometer.

Qi Luo

Human Interaction and Emerging Technologies, 2019

Multi-Agent Dual Learning.

Proceedings of the 7th International Conference on Learning Representations, 2019

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

End-to-End Facial Image Compression with Integrated Semantic Distortion Metric.

Proceedings of the IEEE Visual Communications and Image Processing, 2018

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation.

Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Survey of Motion Capture Technology and Its Application in Sports.
