Dara Bahri

ORCID: 0000-0003-0144-2911

According to our database, Dara Bahri authored at least 34 papers between 2018 and 2023.

Bibliography

2023
Efficient Transformers: A Survey.
ACM Comput. Surv., 2023

Surprise: Result List Truncation via Extreme Value Theory.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Sharpness-Aware Minimization Leads to Low-Rank Features.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

UL2: Unifying Language Learning Paradigms.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Is margin all you need? An extensive empirical study of active learning on tabular data.
CoRR, 2022

Confident Adaptive Language Modeling.
CoRR, 2022

Unifying Language Learning Paradigms.
CoRR, 2022

Transformer Memory as a Differentiable Search Index.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Confident Adaptive Language Modeling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Churn Reduction via Distillation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Sharpness-Aware Minimization Improves Language Model Generalization.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Rethinking search: making domain experts out of dilettantes.
SIGIR Forum, 2021

Are Pre-trained Convolutions Better than Pre-trained Transformers?
CoRR, 2021

Rethinking Search: Making Experts out of Dilettantes.
CoRR, 2021

Locally Adaptive Label Smoothing for Predictive Churn.
CoRR, 2021

Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection.
CoRR, 2021

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study.
Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM '21), 2021

Synthesizer: Rethinking Self-Attention for Transformer Models.
Proceedings of the 38th International Conference on Machine Learning, 2021

OmniNet: Omnidirectional Representations from Transformers.
Proceedings of the 38th International Conference on Machine Learning, 2021

Locally Adaptive Label Smoothing Improves Predictive Churn.
Proceedings of the 38th International Conference on Machine Learning, 2021

HyperGrid Transformers: Towards A Single Model for Multiple Tasks.
Proceedings of the 9th International Conference on Learning Representations, 2021

Long Range Arena: A Benchmark for Efficient Transformers.
Proceedings of the 9th International Conference on Learning Representations, 2021

Are Pretrained Convolutions Better than Pretrained Transformers?
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections.
CoRR, 2020

Choppy: Cut Transformer for Ranked List Truncation.
Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020

Sparse Sinkhorn Attention.
Proceedings of the 37th International Conference on Machine Learning, 2020

Deep k-NN for Noisy Labels.
Proceedings of the 37th International Conference on Machine Learning, 2020

Reverse Engineering Configurations of Neural Text Generation Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2018
Diminishing Returns Shape Constraints for Interpretability and Regularization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
