Kimmo Kettunen

Orcid: 0000-0003-2747-1382

  • University of Eastern Finland, School of Humanities, Finnish Language and Cultural Research, Joensuu, Finland
  • University of Helsinki, DH Research, Finland
  • National Library of Finland, National Digitisation Centre Mikkeli, Helsinki, Finland
  • University of Tampere, Department of Information Studies, Finland (PhD 2007)

According to our database, Kimmo Kettunen authored at least 45 papers between 2005 and 2023.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.




Analyzing gender clues in war-time letters.
Digit. Scholarsh. Humanit., April, 2023

Optical character recognition quality affects subjective user perception of historical newspaper clippings.
J. Documentation, 2023

Optical character recognition quality affects perceived usefulness of historical newspaper clippings.
CoRR, 2022

OCR Quality Affects Perceived Usefulness of Historical Newspaper Clippings - A User Study.
Proceedings of the 18th Italian Research Conference on Digital Libraries, 2022

Geographic Space in Pentti Haanpää's Novel Korpisotaa - Where Does the War Happen?
Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022

Adding Compound Splitting and Analysis to a Semantic Tagger of Modern Standard Finnish - On the Way to FiSTComp.
Proceedings of the Human Language Technologies - The Baltic Perspective, 2020

Name the Name - Named Entity Recognition in OCRed 19th and Early 20th Century Finnish Newspaper and Journal Collection Data.
Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2020

Digging Deeper into the Finnish Parliamentary Protocols - Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (Allemansrätten).
Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2020

Clipping the Page - Automatic Article Detection and Marking Software in Production of Newspaper Clippings of a Digitized Historical Journalistic Collection.
Proceedings of the Digital Libraries for Open Knowledge, 2019

Finding Nineteenth-century Berry Spots: Recognizing and Linking Place Names in a Historical Newspaper Berry-picking Corpus.
Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, 2019

Open Source Tesseract in Re-OCR of Finnish Fraktur from 19th and Early 20th Century Newspapers and Journals - Collected Notes on Quality Improvement.
Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, 2019

Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771-1929: Early Results Using the PIVAJ Software.
Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 2019

The Challenges of Language Variation in Information Access.
Proceedings of the Information Retrieval Evaluation in a Changing World, 2019

Digitisation and Digital Library Presentation System - A Resource-Conscientious Approach.
Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, 2018

Research and Development Efforts on the Digitized Historical Newspaper and Journal Collection of The National Library of Finland.
Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, 2018

Creating and Using Ground Truth OCR Sample Data for Finnish Historical Newspapers and Journals.
Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, 2018

Old Content and Modern Tools - Searching Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910.
Digit. Humanit. Q., 2017

Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing.
Proceedings of the 21st Nordic Conference on Computational Linguistics, 2017

Tagging Named Entities in 19th Century and Modern Finnish Newspaper Material with a Finnish Semantic Tagger.
Proceedings of the 21st Nordic Conference on Computational Linguistics, 2017

How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine - Final Notes on Development and Evaluation.
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2017

Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection.
Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, 2017

Information retrieval from historical newspaper collections in highly inflectional languages: A query expansion approach.
J. Assoc. Inf. Sci. Technol., 2016

Exporting Finnish Digitized Historical Newspaper Contents for Offline Use.
D-Lib Mag., 2016

How to do lexical quality estimation of a large OCRed historical Finnish newspaper collection with scarce resources.
CoRR, 2016

Modern Tools for Old Content - in Search of Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910.
Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2016

Measuring Lexical Quality of a Historical Finnish Newspaper Collection - Analysis of Garbled OCR Data with Basic Language Technology Tools and Means.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Between Diachrony and Synchrony: Evaluation of Lexical Quality of a Digitized Historical Finnish Newspaper and Journal Collection with Morphological Analyzers.
Proceedings of the Human Language Technologies - The Baltic Perspective, 2016

Keep, Change or Delete? Setting up a Low Resource OCR Post-correction Framework for a Digitized Old Finnish Newspaper Collection.
Proceedings of the Digital Libraries on the Move, 2015

Can Type-Token Ratio be Used to Show Morphological Complexity of Languages?
J. Quant. Linguistics, 2014

Generating Variant Keyword Forms for a Morphologically Complex Language Leads to Successful Information Retrieval with Finnish.
Proceedings of the Multidisciplinary Information Retrieval, 2012

Managing Word Form Variation of Text Retrieval in Practice - why Five Character Truncation Takes it all?
Proceedings of the Human Language Technologies - The Baltic Perspective, 2012

Frequent Case Generation in Ad Hoc Retrieval of Three Indian Languages - Bengali, Gujarati and Marathi.
Proceedings of the Multilingual Information Access in South Asian Languages, 2011

Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval: An overview.
J. Documentation, 2009

Does dictionary based bilingual retrieval work in a non-normalized index?
Inf. Process. Manag., 2009

Choosing the Best MT Programs for CLIR Purposes - Can MT Metrics Be Helpful?
Proceedings of the Advances in Information Retrieval, 2009

Complexity of European Union Languages: A comparative approach.
J. Quant. Linguistics, 2008

Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval.
Proceedings of the Advances in Natural Language Processing, 2008

Reductive and Generative Approaches to Morphological Variation of Keywords in Monolingual Information Retrieval.
PhD thesis, 2007

Restricted inflectional form generation in management of morphological keyword variation.
Inf. Retr., 2007

Management of keyword variation with frequency based generation of word forms in IR.
SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007

Managing Keyword Variation with Frequency Based Generation of Word Forms in IR.
Proceedings of the 16th Nordic Conference of Computational Linguistics, 2007

Developing an automatic linguistic truncation operator for best-match retrieval of Finnish in inflected word form text database indexes.
J. Inf. Sci., 2006

Analysis of EU Languages Through Text Compression.
Proceedings of the Advances in Natural Language Processing, 2006

Is a Morphologically Complex Language Really that Complex in Full-Text Retrieval?
Proceedings of the Advances in Natural Language Processing, 2006

To stem or lemmatize a highly inflectional language in a probabilistic IR environment?
J. Documentation, 2005
