Yifan Yang

Affiliations:
  • Xiaomi Corp., Beijing, China
  • Shanghai Jiao Tong University, X-LANCE Lab, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China


According to our database, Yifan Yang authored at least 38 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Phi-Ground Tech Report: Advancing Perception in GUI Grounding.
CoRR, July, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.
CoRR, June, 2025

ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL.
CoRR, May, 2025

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining.
CoRR, May, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling.
CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis.
CoRR, April, 2025

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs.
CoRR, March, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.
CoRR, February, 2025

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
MageBench: Bridging Large Multimodal Models to Agents.
CoRR, 2024

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents.
CoRR, 2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought.
CoRR, 2024

Exploring SSL Discrete Tokens for Multilingual ASR.
CoRR, 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.
CoRR, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.
CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Zipformer: A faster and better encoder for automatic speech recognition.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Generative Aspect-Based Sentiment Analysis with Relation-Level Supervision and Prompt.
Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
Proceedings of the IEEE International Conference on Acoustics, 2024

PromptASR for Contextualized ASR with Controllable Style.
Proceedings of the IEEE International Conference on Acoustics, 2024

Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Delay-penalized CTC Implemented Based on Finite State Transducer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Generative Model for Structured Sentiment Analysis.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2023, 2023

An Empirical Study of Sentiment-Enhanced Pre-Training for Aspect-Based Sentiment Analysis.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
HITSZ-HLT at SemEval-2022 Task 10: A Span-Relation Extraction Framework for Structured Sentiment Analysis.
Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Boundary-Driven Table-Filling for Aspect Sentiment Triplet Extraction.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

