Bowen Shi

Affiliations:
  • Meta, Meta AI, Fundamental AI Research (FAIR), Audiobox Team, USA
  • Toyota Technological Institute at Chicago, IL, USA


According to our database, Bowen Shi authored at least 39 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound.
CoRR, February, 2025

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024
Scaling Speech Technology to 1,000+ Languages.
J. Mach. Learn. Res., 2024

Movie Gen: A Cast of Media Foundation Models.
CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.
CoRR, 2024

Data Efficient Reflow for Few Step Audio Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Generative Pre-training for Speech with Flow Matching.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Audiobox: Unified Audio Generation with Natural Language Prompts.
CoRR, 2023

TTIC's Submission to WMT-SLT 23.
Proceedings of the Eighth Conference on Machine Translation, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Comparative Layer-Wise Analysis of Self-Supervised Speech Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.
CoRR, 2022

A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer.
CoRR, 2022

TTIC's WMT-SLT 22 Sign Language Translation System.
Proceedings of the Seventh Conference on Machine Translation, 2022

u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Robust Self-Supervised Audio-Visual Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Open-Domain Sign Language Translation Learned from Online Video.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Searching for fingerspelled content in American Sign Language.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Fingerspelling Detection in American Sign Language.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A Cross-Task Analysis of Text Span Representations.
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Few-Shot Acoustic Event Detection Via Meta Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training.
CoRR, 2019

Compression of Acoustic Event Detection Models with Quantized Distillation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fingerspelling Recognition in the Wild With Iterative Visual Attention.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Semi-supervised Acoustic Event Detection Based on Tri-training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
American Sign Language Fingerspelling Recognition in the Wild.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

2017
Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

