Yifan Yang

Orcid: 0009-0003-0588-1812

Affiliations:

Xiaomi Corp., Beijing, China
Shanghai Jiao Tong University, X-LANCE Lab, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China

According to our database¹, Yifan Yang authored at least 41 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend:

Book

In proceedings

Article

PhD thesis

Dataset

Other

Links

On csauthors.net:

Bibliography

2025

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.

[BibT_eX]

[DOI]

CoRR, October, 2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding.

[BibT_eX]

[DOI]

CoRR, July, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.

[BibT_eX]

[DOI]

CoRR, June, 2025

ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL.

[BibT_eX]

[DOI]

CoRR, May, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling.

[BibT_eX]

[DOI]

CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

[BibT_eX]

[DOI]

CoRR, April, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis.

[BibT_eX]

[DOI]

CoRR, April, 2025

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs.

[BibT_eX]

[DOI]

Abdelrahman Abouelenin

CoRR, March, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.

[BibT_eX]

[DOI]

CoRR, February, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2025

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.

[BibT_eX]

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning.

[BibT_eX]

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics, 2025

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.

[BibT_eX]

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

[BibT_eX]

[DOI]

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

MageBench: Bridging Large Multimodal Models to Agents.

[BibT_eX]

[DOI]

CoRR, 2024

REDUCIO! Generating 1024⨉1024 Video within 16 Seconds using Extremely Compressed Motion Latents.

[BibT_eX]

[DOI]

CoRR, 2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought.

[BibT_eX]

[DOI]

CoRR, 2024

Exploring SSL Discrete Tokens for Multilingual ASR.

[BibT_eX]

[DOI]

CoRR, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.

[BibT_eX]

[DOI]

CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

[BibT_eX]

[DOI]

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

[BibT_eX]

[DOI]

CoRR, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.

[BibT_eX]

[DOI]

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LibriheavyMix: A 20, 000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Zipformer: A faster and better encoder for automatic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Generative Aspect-Based Sentiment Analysis with Relation-Level Supervision and Prompt.

[BibT_eX]

[DOI]

Yifan Yang

Yice Zhang

Ruifeng Xu

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

PromptASR for Contextualized ASR with Controllable Style.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Libriheavy: A 50, 000 Hours ASR Corpus with Punctuation Casing and Context.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Delay-penalized CTC Implemented Based on Finite State Transducer.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction.

[BibT_eX]

[DOI]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Generative Model for Structured Sentiment Analysis.

[BibT_eX]

[DOI]

Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2023, 2023

An Empirical Study of Sentiment-Enhanced Pre-Training for Aspect-Based Sentiment Analysis.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

HITSZ-HLT at SemEval-2022 Task 10: A Span-Relation Extraction Framework for Structured Sentiment Analysis.

[BibT_eX]

[DOI]

Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Boundary-Driven Table-Filling for Aspect Sentiment Triplet Extraction.

[BibT_eX]

[DOI]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Yifan Yang

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...