Jian Luan
ORCID: 0000-0002-2383-226X
Affiliations:
- Xiaomi, Inc
- Microsoft Asia, Xiaoice Software Technology Center (STCA)
According to our database, Jian Luan authored at least 142 papers between 2006 and 2026.
Online presence:
- on orcid.org
- on dl.acm.org
Bibliography
2026
CoRR, May, 2026
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation.
CoRR, May, 2026
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking.
CoRR, May, 2026
CoRR, May, 2026
CoRR, May, 2026
TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis.
CoRR, April, 2026
ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling.
CoRR, April, 2026
CoRR, April, 2026
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models.
CoRR, April, 2026
Iterate to Differentiate: Enhancing Discriminability and Reliability in Zero-Shot TTS Evaluation.
CoRR, March, 2026
ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding.
CoRR, March, 2026
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models.
CoRR, March, 2026
ExPosST: Explicit Positioning with Adaptive Masking for LLM-Based Simultaneous Machine Translation.
CoRR, March, 2026
CoRR, March, 2026
IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation.
CoRR, March, 2026
CoRR, March, 2026
CoRR, March, 2026
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning.
CoRR, February, 2026
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models.
CoRR, February, 2026
DashengTokenizer: One layer is enough for unified audio understanding and generation.
CoRR, February, 2026
CoRR, February, 2026
MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding.
CoRR, February, 2026
CoRR, February, 2026
Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models.
CoRR, February, 2026
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation.
CoRR, February, 2026
Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models.
CoRR, February, 2026
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation.
CoRR, February, 2026
C²-Cite: Contextual-Aware Citation Generation for Attributed Large Language Models.
CoRR, February, 2026
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment.
CoRR, January, 2026
CoRR, January, 2026
CoRR, January, 2026
C²-Cite: Contextual-Aware Citation Generation for Attributed Large Language Models.
Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, 2026
Proceedings of the 24th Annual International Conference on Mobile Systems, 2026
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
End-to-End Optimization of LLM-Driven Multi-Agent Search Systems via Heterogeneous-Group-Based Reinforcement Learning.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
CoRR, December, 2025
ICPO: Intrinsic Confidence-Driven Group Relative Preference Optimization for Efficient Reinforcement Learning.
CoRR, November, 2025
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding.
CoRR, November, 2025
CoRR, November, 2025
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding.
CoRR, November, 2025
CoRR, November, 2025
CoRR, October, 2025
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition.
CoRR, September, 2025
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle.
CoRR, August, 2025
CoRR, August, 2025
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks.
CoRR, July, 2025
CoRR, July, 2025
TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation.
CoRR, June, 2025
CoRR, June, 2025
CoRR, June, 2025
CoRR, May, 2025
CoRR, May, 2025
CoRR, May, 2025
Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents.
CoRR, May, 2025
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering.
CoRR, March, 2025
Companion Proceedings of the ACM on Web Conference 2025, 2025
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2025
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Text-Enhanced Audio Encoder for Large Language Model based Speech Recognition via Cross-Modality Pre-training with Unpaired Audio-Text Data.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
GLCLAP: A Novel Contrastive Learning Pre-trained Model for Contextual Biasing in ASR.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models.
Proceedings of the 2025 IEEE International Conference on Knowledge Graph (ICKG), 2025
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Let Your Car Listen to Your Respiration Contactlessly with Ubiquitous Acoustic Signals.
Proceedings of the Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2025
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Proceedings of the 31st International Conference on Computational Linguistics, 2025
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Global Eye: Breaking the "Fixed Thinking Pattern" during the Instruction Expansion Process.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security.
CoRR, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024
Proceedings of the ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training.
CoRR, 2023
From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models.
CoRR, 2023
UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning.
CoRR, 2023
Overview of the NLPCC 2023 Shared Task 9: User Feedback Prediction and Response Generation.
Proceedings of the Natural Language Processing and Chinese Computing, 2023
The Xiaomi AI Lab's Speech Translation Systems for IWSLT 2023 Offline Task, Simultaneous Task and Speech-to-Speech Task.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Improving Bilingual TTS Using Language And Phonology Embedding With Embedding Strength Modulator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Joint Training and Decoding for Multilingual End-to-End Simultaneous Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
J-TranPSP: A Joint Transition-based Model for Prosodic Structure Prediction, Word Segmentation and PoS Tagging.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation.
Proceedings of the IEEE International Conference on Acoustics, 2022
MSDTRON: A High-Capability Multi-Speaker Speech Synthesis System for Diverse Data Using Characteristic Information.
Proceedings of the IEEE International Conference on Acoustics, 2022
PAMA-TTS: Progression-Aware Monotonic Attention for Stable SEQ2SEQ TTS with Accurate Phoneme Duration Control.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Effective and Differentiated Use of Control Information for Multi-speaker Speech Synthesis.
CoRR, 2021
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
2020
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
2009
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
2008
Proceedings of the Blizzard Challenge 2008, 2008
2007
Codebook-Based Pseudo-Impostor Data Generation and Template Compression for Text-Dependent Speaker Verification.
IEICE Trans. Inf. Syst., 2007
2006
Template Compression and Distance Normalization for Reliable Text-dependent Speaker Verification.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006