Vassilis Katsouros

ORCID: 0000-0002-4185-2344

Affiliations:
  • Athena RIC, Institute for Language & Speech Processing, Athens, Greece


According to our database, Vassilis Katsouros authored at least 58 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2024
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Meltemi: The first open Large Language Model for Greek.
CoRR, 2024

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data.
CoRR, 2024

Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 Workshops, 2024

Investigating Personalization Methods in Text to Music Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

AI-Enabled Art Education: Unleashing Creative Potential and Exploring Co-Creation Frontiers.
Proceedings of the 16th International Conference on Computer Supported Education, 2024

2023
Weakly-supervised Automated Audio Captioning via text only training.
CoRR, 2023

Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

2022
DanceConv: Dance Motion Generation With Convolutional Networks.
IEEE Access, 2022

Regotron: Regularizing the Tacotron2 Architecture Via Monotonic Alignment Loss.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards a DHH Accessible Theater: Real-Time Synchronization of Subtitles and Sign Language Videos with ASR and NLP Solutions.
Proceedings of the PETRA '22: The 15th International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 29 June 2022, 2022

SciPar: A Collection of Parallel Corpora from Scientific Abstracts.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

NLP-Theatre: Employing Speech Recognition Technologies for Improving Accessibility and Augmenting the Theatrical Experience.
Proceedings of the Intelligent Systems and Applications, 2022

A Few-Sample Strategy for Guitar Tablature Transcription Based on Inharmonicity Analysis and Playability Constraints.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Attention-based Multimodal Feature Fusion for Dance Motion Generation.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

2020
On the Adaptability of Recurrent Neural Networks for Real-Time Jazz Improvisation Accompaniment.
Frontiers Artif. Intell., 2020

Educational Technology - Introduction to the Special Theme.
ERCIM News, 2020

Music in Education through Technology.
ERCIM News, 2020

Air-Writing Recognition using Deep Convolutional and Recurrent Neural Network Architectures.
Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, 2020

2019
Enabling the human in the loop: Linked data and knowledge in industrial cyber-physical systems.
Annu. Rev. Control., 2019

Deep Convolutional and LSTM Neural Network Architectures on Leap Motion Hand Tracking Data Sequences.
Proceedings of the 27th European Signal Processing Conference, 2019

An Environment for Gestural Interaction with 3D Virtual Musical Instruments as an Educational Tool.
Proceedings of the 27th European Signal Processing Conference, 2019

2018
iMuSciCA: Interactive Music Science Collaborative Activities for STEAM Learning.
Proceedings of the Designing for the User Experience in Learning Systems, 2018

Musical track popularity mining dataset: Extension & experimentation.
Neurocomputing, 2018

A web-based 3D environment for gestural interaction with virtual music instruments as a STEAM education tool.
Proceedings of the 18th International Conference on New Interfaces for Musical Expression, 2018

A Web-based Real-Time Kinect Application for Gestural Interaction with Virtual Musical Instruments.
Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, 2018

Interactive Control of Explicit Musical Features in Generative LSTM-based Systems.
Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, 2018

2017
Convolutional Neural Networks for Real-Time Beat Tracking: A Dancing Robot Application.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

2016
Towards Multi-Purpose Spectral Rhythm Features: An Application to Dance Style, Meter and Tempo Estimation.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Musical Track Popularity Mining Dataset.
Proceedings of the Artificial Intelligence Applications and Innovations, 2016

Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs.
Proceedings of the 12th IAPR Workshop on Document Analysis Systems, 2016

2015
Recognition of online handwritten mathematical formulas using probabilistic SVMs and stochastic context free grammars.
Pattern Recognit. Lett., 2015

Recognition of historical Greek polytonic scripts using LSTM networks.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

GRPOLY-DB: An old Greek polytonic document image database.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

2014
Deploying Deep Belief Nets for content based audio music similarity.
Proceedings of the 5th International Conference on Information, 2014

Recognition of Spatial Relations in Mathematical Formulas.
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014

2013
Structural analysis of online handwritten mathematical symbols based on support vector machines.
Proceedings of the Document Recognition and Retrieval XX, 2013

2012
Mean shift algorithm for exponential families with applications to speaker clustering.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

A mean shift algorithm for manifolds of exponential families.
Proceedings of the 11th International Conference on Information Science, 2012

Reducing Tempo Octave Errors by Periodicity Vector Coding And SVM Learning.
Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012

A System for Recognition of On-Line Handwritten Mathematical Expressions.
Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, 2012

A Morphology Based Approach for Binarization of Handwritten Documents.
Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, 2012

Music tempo estimation and beat tracking by applying source separation and metrical relations.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Enhancing Handwritten Word Segmentation by Employing Local Spatial Features.
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Closed-form expressions vs. BIC: A comparison for speaker clustering.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Handwritten document image segmentation into text lines and words.
Pattern Recognit., 2010

The Segmental Bayesian Information Criterion and Its Applications to Speaker Diarization.
IEEE J. Sel. Top. Signal Process., 2010

Speaker clustering via the mean shift algorithm.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Tempo Induction Using Filterbank Analysis and Tonal Features.
Proceedings of the 11th International Society for Music Information Retrieval Conference, 2010

A Morphological Approach for Text-Line Segmentation in Handwritten Documents.
Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2010

A new penalty term for the BIC with respect to speaker diarization.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Redefining the Bayesian information criterion for speaker diarisation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008
Robust text-line and word segmentation for handwritten documents images.
Proceedings of the IEEE International Conference on Acoustics, 2008

PANOPTIS: A System for Intelligent Monitoring of the Hellenic Broadcast Sector.
Proceedings of the 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 2008

2007
A Parametric Spectral-Based Method for Verification of Text in Videos.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

Efficient combination of parametric spaces, models and metrics for speaker diarization.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

