Yulia Tsvetkov

Orcid: 0000-0002-4634-7128

Affiliations:
  • University of Washington, Paul G. Allen School of Computer Science and Engineering, USA
  • Carnegie Mellon University, Pittsburgh, PA, USA


According to our database, Yulia Tsvetkov authored at least 147 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages.
CoRR, 2024

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs.
CoRR, 2024

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers.
CoRR, 2024

Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks.
CoRR, 2024

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection.
CoRR, 2024

Do Membership Inference Attacks Work on Large Language Models?
CoRR, 2024

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection.
CoRR, 2024

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration.
CoRR, 2024

Tuning Language Models by Proxy.
CoRR, 2024

Fine-grained Hallucination Detection and Editing for Language Models.
CoRR, 2024

Mental Health Stigma across Diverse Genders in Generative Large Language Models - Abstract (abstract).
Proceedings of Machine Learning for Cognitive and Mental Health Workshop (ML4CMH 2024) Co-located with the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), 2024

2023
What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization.
CoRR, 2023

Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions.
CoRR, 2023

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory.
CoRR, 2023

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting.
CoRR, 2023

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models.
CoRR, 2023

MatFormer: Nested Transformer for Elastic Inference.
CoRR, 2023

SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation.
CoRR, 2023

Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models.
CoRR, 2023

Resolving Knowledge Conflicts in Large Language Models.
CoRR, 2023

LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud.
CoRR, 2023

BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer.
CoRR, 2023

SSD-2: Scaling and Inference-time Fusion of Diffusion Language Models.
CoRR, 2023

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding.
CoRR, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.
CoRR, 2023

TalkUp: A Novel Dataset Paving the Way for Understanding Empowering Language.
CoRR, 2023

Can Language Models Solve Graph Problems in Natural Language?
CoRR, 2023

CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge.
CoRR, 2023

Assessing Language Model Deployment with Risk Cards.
CoRR, 2023

BotPercent: Estimating Twitter Bot Populations from Groups to Crowds.
CoRR, 2023

LEXPLAIN: Improving Model Explanations via Lexicon Supervision.
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics, 2023

Can Language Models Solve Graph Problems in Natural Language?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Examining risks of racial biases in NLP tools for child protective services.
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

BotPercent: Estimating Bot Populations in Twitter Communities.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

TalkUp: Paving the Way for Understanding Empowering Language.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

On the Zero-Shot Generalization of Machine-Generated Text Detectors.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Unsupervised Keyphrase Extraction via Interpretable Neural Networks.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Understanding In-Context Learning via Supportive Pretraining Data.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data.
CoRR, 2022

Constrained Sampling from Language Models via Langevin Dynamics in Embedding Spaces.
CoRR, 2022

VoynaSlov: A Data Set of Russian Social Media Activity during the 2022 Ukraine-Russia War.
CoRR, 2022

Controlled Analyses of Social Biases in Wikipedia Bios.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022


SimVLM: Simple Visual Language Model Pretraining with Weak Supervision.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Gendered Mental Health Stigma in Masked Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Gradient-based Constrained Sampling from Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Threat Scenarios and Best Practices to Detect Neural Fake News.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs.
CoRR, 2021

Improving Span Representation for Domain-adapted Coreference Resolution.
CoRR, 2021

Simple and Efficient ways to Improve REALM.
CoRR, 2021

An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation.
Proceedings of the 2nd AfricaNLP Workshop Proceedings, AfricaNLP@EACL 2021, Virtual Event, 2021

Controlled Analyses of Social Biases in Wikipedia Bios.
CoRR, 2021

Controlled Text Generation as Continuous Optimization with Multiple Constraints.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Controlling Dialogue Generation with Semantic Exemplars.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia.
Proceedings of the Fifteenth International AAAI Conference on Web and Social Media, 2021

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models.
Proceedings of the 9th International Conference on Learning Representations, 2021

DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues.
Proceedings of the 9th International Conference on Learning Representations, 2021

Efficient Test Time Adapter Ensembling for Low-resource Language Varieties.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Evaluating the Morphosyntactic Well-formedness of Generated Texts.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Detecting Community Sensitive Norm Violations in Online Conversations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

StructSum: Summarization via Structured Representations.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Machine Translation into Low-resource Language Varieties.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

A Survey of Race, Racism, and Anti-Racism in NLP.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
A Framework for the Computational Linguistic Analysis of Dehumanization.
Frontiers Artif. Intell., 2020

Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis.
CoRR, 2020

StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization.
CoRR, 2020

Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods.
CoRR, 2020

A Computational Analysis of Polarization on Indian and Pakistani Social Media.
Proceedings of the Social Informatics - 12th International Conference, 2020

LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification.
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

Stress and burnout in open source: toward finding, understanding, and mitigating unhealthy interactions.
Proceedings of the ICSE-NIER 2020: 42nd International Conference on Software Engineering, New Ideas and Emerging Results, Seoul, South Korea, 27 June, 2020

Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History.
Proceedings of the 8th International Conference on Learning Representations, 2020

End-to-End Differentiable GANs for Text Generation.
Proceedings of the "I Can't Believe It's Not Better!" at NeurIPS Workshops, 2020

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Fortifying Toxic Speech Detectors Against Veiled Toxicity.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Unsupervised Discovery of Implicit Gender Bias.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Automatic Extraction of Rules Governing Morphological Agreement.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues.
Proceedings of the 24th Conference on Computational Natural Language Learning, 2020

A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards.
Proceedings of the Fourth Workshop on Neural Generation and Translation, 2020

Balancing Training for Multilingual Neural Machine Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Demoting Racial Bias in Hate Speech Detection.
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media, 2020

2019
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology.
CoRR, 2019

Measuring Bias in Contextualized Word Representations.
CoRR, 2019

Socially Responsible Natural Language Processing.
Proceedings of the Companion of The 2019 World Wide Web Conference, 2019

A Dynamic Strategy Coach for Effective Negotiation.
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, 2019

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories.
Proceedings of the Thirteenth International Conference on Web and Social Media, 2019

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs.
Proceedings of the 7th International Conference on Learning Representations, 2019

Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation.
Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP 2019, 2019

Topics to Avoid: Demoting Latent Confounds in Text Classification.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation.
Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP 2019, 2019

Entity-Centric Contextual Affective Analysis.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Native Language Cognate Effects on Second Language Lexical Choice.
Trans. Assoc. Comput. Linguistics, 2018

Style Transfer Through Multilingual and Feedback-Based Back-Translation.
CoRR, 2018

Socially Responsible NLP.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

RtGender: A Corpus for Studying Differential Responses to Gender.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Framing and Agenda-Setting in Russian News: a Computational Analysis of Intricate Political Strategies.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Style Transfer Through Back-Translation.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Writer Profiling Without the Writer's Text.
Proceedings of the Social Informatics, 2017

Incorporating Dialectal Variability for Socially Equitable Language Identification.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Cross-Lingual Bridges with Models of Lexical Borrowing.
J. Artif. Intell. Res., 2016

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning.
CoRR, 2016

Massively Multilingual Word Embeddings.
CoRR, 2016

Correlation-based Intrinsic Evaluation of Word Vector Representations.
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 2016

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks.
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 2016

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning.
Proceedings of the NAACL HLT 2016, 2016

Morphological Inflection Generation Using Character Sequence to Sequence Learning.
Proceedings of the NAACL HLT 2016, 2016

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
A bottom up approach to category mapping and meaning change.
Proceedings of the NetWordS Final Conference on Word Knowledge and Word Usage: Representations and Processes in the Mental Lexicon, Pisa, Italy, March 30, 2015

Constraint-Based Models of Lexical Borrowing.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Evaluation of Word Vector Representations by Subspace Alignment.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Not All Contexts Are Created Equal: Better Word Representations with Variable Attention.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Lexicon Stratification for Translating Out-of-Vocabulary Words.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Sparse Overcomplete Word Vector Representations.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources.
Comput. Linguistics, 2014

The CMU Machine Translation Systems at WMT 2014.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

Augmenting English Adjective Senses with Supersenses.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Automatic Classification of Communicative Functions of Definiteness.
Proceedings of the COLING 2014, 2014

Metaphor Detection with Cross-Lingual Model Transfer.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options.
Proceedings of the Eighth Workshop on Statistical Machine Translation, 2013

Identification and modeling of word fragments in spontaneous speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Identifying the L1 of non-native writers: the CMU-Haifa system.
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, 2013

2012
Extraction of multi-word expressions from small parallel corpora.
Nat. Lang. Eng., 2012

2011
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

2010
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

