Adam Polyak

ORCID: 0000-0003-2563-2111

According to our database, Adam Polyak authored at least 33 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Video Editing via Factorized Diffusion Distillation.
CoRR, 2024

2023
Emu Edit: Precise Image Editing via Recognition and Generation Tasks.
CoRR, 2023

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning.
CoRR, 2023

X&Fuse: Fusing Visual Information in Text-to-Image Generation.
CoRR, 2023

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Text-To-4D Dynamic Scene Generation.
Proceedings of the International Conference on Machine Learning, 2023

Make-A-Video: Text-to-Video Generation without Text-Video Data.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

kNN-Diffusion: Image Generation via Large-Scale Retrieval.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

AudioGen: Textually Guided Audio Generation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Audio Language Modeling using Perceptually-Guided Discrete Representations.
CoRR, 2022

Multilingual Text-To-Speech Training Using Cross Language Voice Conversion And Self-Supervised Learning Of Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Direct Speech-to-Speech Translation With Discrete Units.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Text-Free Prosody-Aware Generative Spoken Language Modeling.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Locally Shifted Attention With Early Global Integration.
CoRR, 2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations.
CoRR, 2021

Direct speech-to-speech translation with discrete units.
CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.
CoRR, 2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

High Fidelity Speech Regeneration with Application to Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

2020
Unsupervised Generation of Free-Form and Parameterized Avatars.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

TTS Skins: Speaker Conversion via ASR.
Proceedings of the Interspeech 2020, 2020

Unsupervised Cross-Domain Singing Voice Conversion.
Proceedings of the Interspeech 2020, 2020

2019
A Universal Music Translation Network.
Proceedings of the 7th International Conference on Learning Representations, 2019

Attention-based Wavenet Autoencoder for Universal Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
Fitting New Speakers Based on a Short Untranscribed Sample.
Proceedings of the 35th International Conference on Machine Learning, 2018

VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Voice Synthesis for in-the-Wild Speakers via a Phonological Loop.
CoRR, 2017

Unsupervised Cross-Domain Image Generation.
Proceedings of the 5th International Conference on Learning Representations, 2017

Unsupervised Creation of Parameterized Avatars.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2015
Channel-Level Acceleration of Deep Face Representations.
IEEE Access, 2015

