Zhifeng Kong

According to our database, Zhifeng Kong authored at least 43 papers between 2017 and 2026.

Bibliography

2026
Benchmarking Single-Factor Physical Video-to-Audio Generation.
CoRR, May, 2026

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music.
CoRR, April, 2026

2025
Music Flamingo: Scaling Music Understanding in Audio Language Models.
CoRR, November, 2025

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning.
CoRR, October, 2025

Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding.
CoRR, August, 2025

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models.
CoRR, July, 2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge.
CoRR, May, 2025



A2SB: Audio-to-Audio Schrodinger Bridges.
CoRR, January, 2025

On the Reliability of Membership Inference Attacks.
Proceedings of the IEEE Conference on Secure and Trustworthy Machine Learning, 2025

ETTA: Elucidating the Design Space of Text-to-Audio Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Fugatto 1: Foundational Generative Audio Transformer Opus 1.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PDGD-NET: Polarized Dynamic Grasp Detection Network.
Proceedings of the International Conference on Control, Automation and Diagnosis, 2025

Research on Whole-Body Coordinated Motion of Humanoid Robots Based on LSTM-Integrated Reinforcement Learning.
Proceedings of the 11th International Conference on Control, 2025

2024

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization.
CoRR, 2024

A Geometry-Aware Algorithm to Learn Hierarchical Embeddings in Hyperbolic Space.
CoRR, 2024

Improving Text-To-Audio Models with Synthetic Captions.
CoRR, 2024

Audio Dialogues: Dialogues dataset for audio and music understanding.
CoRR, 2024

Data Redaction from Conditional Generative Models.
Proceedings of the IEEE Conference on Secure and Trustworthy Machine Learning, 2024

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Understanding Expressivity and Trustworthy Aspects of Deep Generative Models.
PhD thesis, 2023

Can Membership Inferencing be Refuted?
CoRR, 2023

Data Redaction from Pre-trained GANs.
Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning, 2023

CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Approximate Data Deletion in Generative Models.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

2022
Forgetting Data from Pre-trained GANs.
CoRR, 2022

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Speech Denoising in the Waveform Domain With Self-Attention.
Proceedings of the IEEE International Conference on Acoustics, 2022

Forgeability and Membership Inference Attacks.
Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security, 2022

2021
On Fast Sampling of Diffusion Probabilistic Models.
CoRR, 2021

Universal Approximation of Residual Flows in Maximum Mean Discrepancy.
CoRR, 2021

Understanding Instance-based Interpretability of Variational Auto-Encoders.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DiffWave: A Versatile Diffusion Model for Audio Synthesis.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
The Expressive Power of a Class of Normalizing Flow Models.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Fastened CROWN: Tightened Neural Network Robustness Certificates.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018
Multi-Object Tracking Using Online Metric Learning with Long Short-Term Memory.
Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

2017
Convergence Analysis of the Dynamics of a Special Kind of Two-Layered Neural Networks with ℓ₁ and ℓ₂ Regularization.
CoRR, 2017

WristAuthen: A Dynamic Time Wrapping Approach for User Authentication by Hand-Interaction through Wrist-Worn Devices.
CoRR, 2017

Generative Adversarial Networks with Inverse Transformation Unit.
CoRR, 2017

