Ankit Gupta

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

2024

Exploring the limits of decoder-only models trained on public speech recognition corpora.

George Saon

Brian Kingsbury

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors.

Ido Amos

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Long Range Language Modeling via Gated State Spaces.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Diagonal State Space Augmented Transformers for Speech Recognition.

George Saon

Xiaodong Cui

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Analyzing Transformers in Embedding Space.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Simplifying and Understanding State Space Models with Diagonal Linear RNNs.

Harsh Mehta

CoRR, 2022

Diagonal State Spaces are as Effective as Structured State Spaces.

CoRR, 2022

On the Parameterization and Initialization of Diagonal State Space Models.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Diagonal State Spaces are as Effective as Structured State Spaces.

Albert Gu

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SCROLLS: Standardized CompaRison Over Long Language Sequences.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Memory-efficient Transformers via Top-k Attention.

Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, 2021

Value-aware Approximate Attention.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Break It Down: A Question Understanding Benchmark.

Trans. Assoc. Comput. Linguistics, 2020

GMAT: Global Memory Augmentation for Transformers.

CoRR, 2020

Injecting Numerical Reasoning Skills into Language Models.

Mor Geva

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2017

Unexpected power of low-depth arithmetic circuits.

Commun. ACM, 2017

2016

Arithmetic Circuits: A Chasm at Depth 3.

SIAM J. Comput., 2016

2014

Algebraic Geometric Techniques for Depth-4 PIT & Sylvester-Gallai Conjectures for Varieties.

Electron. Colloquium Comput. Complex., 2014

2013

Arithmetic Circuits: A Chasm at Depth Three.

Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science, 2013

Random Arithmetic Formulas Can Be Reconstructed Efficiently.

Youming Qiao

Proceedings of the 28th Conference on Computational Complexity, 2013

Approaching the Chasm at Depth Four.

Proceedings of the 28th Conference on Computational Complexity, 2013

2012

An exponential lower bound for homogeneous depth four arithmetic circuits with bounded bottom fanin.

Electron. Colloquium Comput. Complex., 2012

Reconstruction of depth-4 multilinear circuits with top fan-in 2.

Satyanarayana V. Lokam

Proceedings of the 44th Symposium on Theory of Computing Conference, 2012

2011

Reconstruction of Depth-4 Multilinear Circuits with Top fanin 2.

Satyanarayana V. Lokam

Electron. Colloquium Comput. Complex., 2011

Efficient Reconstruction of Random Multilinear Formulas.
