Yi Yu

ORCID: 0000-0002-0294-6620

Affiliations:
  • Graduate School of Advanced Science and Engineering, Hiroshima University, Japan
  • National Institute of Informatics, Digital Content and Media Sciences Research, Tokyo, Japan (former)
  • National University of Singapore, School of Computing, Singapore (former)
  • University of Milan, Italy (former)
  • New Jersey Institute of Technology, Newark, NJ, USA (former)
  • Nara Women's University, Department of Advanced Information and Computer Sciences, Japan (PhD 2009)


According to our database, Yi Yu authored at least 148 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning.
IEEE Trans. Multim., 2024

Semantic dependency network for lyrics generation from melody.
Neural Comput. Appl., 2024

LM2D: Lyrics- and Music-Driven Dance Synthesis.
CoRR, 2024

Anchor-aware Deep Metric Learning for Audio-visual Retrieval.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

Syllable-level lyrics generation from melody exploiting character-level language model.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Scalable Motion Style Transfer with Constrained Diffusion Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Stripe-Transformer: deep stripe feature learning for music source separation.
EURASIP J. Audio Speech Music. Process., December, 2023

Multi-scale network with shared cross-attention for audio-visual correlation learning.
Neural Comput. Appl., September, 2023

Controllable lyrics-to-melody generation.
Neural Comput. Appl., September, 2023

Conditional hybrid GAN for melody generation from lyrics.
Neural Comput. Appl., February, 2023

Learning Explicit and Implicit Dual Common Subspaces for Audio-visual Cross-modal Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Melody Generation from Lyrics with Local Interpretability.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Variational Autoencoder with CCA for Audio-Visual Cross-modal Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2023

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation.
IEEE Trans. Multim., 2023

Context-Patch Representation Learning With Adaptive Neighbor Embedding for Robust Face Image Super-Resolution.
IEEE Trans. Multim., 2023

A neural harmonic-aware network with gated attentive fusion for singing melody extraction.
Neurocomputing, 2023

LiveChat: Video Comment Generation from Audio-Visual Multimodal Contexts.
CoRR, 2023

Music- and Lyrics-driven Dance Synthesis.
CoRR, 2023

Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism.
CoRR, 2023

Emotionally Enhanced Talking Face Generation.
CoRR, 2023

Graph-Based Video-Language Learning with Multi-Grained Audio-Visual Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Emotionally Enhanced Talking Face Generation.
Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, 2023

Detecting Dialogue Hallucination Using Graph Neural Networks.
Proceedings of the International Conference on Machine Learning and Applications, 2023

MFAE: Masked frame-level autoencoder with hybrid-supervision for low-resource music transcription.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

LC-Beating: An Online System for Beat and Downbeat Tracking using Latency-Controlled Mechanism.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Leaning compact and representative features for cross-modality person re-identification.
World Wide Web, 2022

Towards Multi-Domain Face Synthesis Via Domain-Invariant Representations and Multi-Level Feature Parts.
IEEE Trans. Multim., 2022

MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation.
IEEE Trans. Intell. Transp. Syst., 2022

Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2022

Recent Advances and Challenges in Deep Audio-Visual Correlation Learning.
CoRR, 2022

Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN.
Proceedings of the MultiMedia Modeling - 28th International Conference, 2022

Emotional Talking Faces: Making Videos More Expressive and Realistic.
Proceedings of the 4th ACM International Conference on Multimedia in Asia, 2022

Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics.
Proceedings of the IEEE International Symposium on Multimedia, 2022

Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

HarmoF0: Logarithmic Scale Dilated Convolution for Pitch Estimation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

Deepchorus: A Hybrid Model of Multi-Scale Convolution And Self-Attention for Chorus Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Feature Distillation Interaction Weighting Network for Lightweight Image Super-resolution.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Conditional LSTM-GAN for Melody Generation from Lyrics.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Correlation Discrepancy Insight Network for Video Re-identification.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Robust Facial Image Super-Resolution by Kernel Locality-Constrained Coupled-Layer Regression.
ACM Trans. Internet Techn., 2021

HANME: Hierarchical Attention Network for Singing Melody Extraction.
IEEE Signal Process. Lett., 2021

Constructing multilayer locality-constrained matrix regression framework for noise robust face super-resolution.
Pattern Recognit., 2021

LBAN-IL: A novel method of high discriminative representation for facial expression recognition.
Neurocomputing, 2021

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval.
CoRR, 2021

JDSR-GAN: Constructing A Joint and Collaborative Learning Network for Masked Face Super-Resolution.
CoRR, 2021

Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Lightweight Image Super-Resolution with Multi-Scale Feature Interaction Network.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

Singer Identification Using Deep Timbre Feature Learning with KNN-NET.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Frequency-Temporal Attention Network for Singing Melody Extraction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Interpretable Visual Understanding with Cognitive Attention Network.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2021, 2021

Long-short Term Prediction for Occluded Multiple Object Tracking.
Proceedings of the IEEE Global Communications Conference, 2021

2020
Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Ensemble Super-Resolution With a Reference Dataset.
IEEE Trans. Cybern., 2020

Context-Patch Face Hallucination Based on Thresholding Locality-Constrained Representation and Reproducing Learning.
IEEE Trans. Cybern., 2020

Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression.
Inf. Sci., 2020

SIST: Online Scale-Adaptive Object tracking with Stepwise Insight.
Neurocomputing, 2020

Image super-resolution via multi-view information fusion networks.
Neurocomputing, 2020

MTM Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio?
CoRR, 2020

Automatic Neural Lyrics and Melody Composition.
CoRR, 2020

Conditional Hybrid GAN for Sequence Generation.
CoRR, 2020

Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data.
CoRR, 2020

Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation.
CoRR, 2020

Multi-scale patch based representation feature learning for low-resolution face recognition.
Appl. Soft Comput., 2020

A Relation Learning Hierarchical Framework for Multi-label Charge Prediction.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2020

Lyrics-Conditioned Neural Melody Generation.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

C3VQG: category consistent cyclic visual question generation.
Proceedings of the MMAsia 2020: ACM Multimedia Asia, 2020

End-to-End Named Entity Recognition from English Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics.
Proceedings of the 6th IEEE International Conference on Multimedia Big Data, 2020

PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability.
Proceedings of the 6th IEEE International Conference on Multimedia Big Data, 2020

2019
Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data.
IEEE Trans. Neural Networks Learn. Syst., 2019

Incremental Re-Identification by Cross-Direction and Cross-Ranking Adaption.
IEEE Trans. Multim., 2019

Graph-Regularized Locality-Constrained Joint Dictionary and Residual Learning for Face Sketch Synthesis.
IEEE Trans. Image Process., 2019

Face hallucination through differential evolution parameter map learning with facial structure prior.
Inf. Sci., 2019

Conditional LSTM-GAN for Melody Generation from Lyrics.
CoRR, 2019

Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA.
CoRR, 2019

Personalized Music Recommendation with Triplet Network.
CoRR, 2019

Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions.
Proceedings of the Fifth IEEE International Conference on Multimedia Big Data, 2019

2018
Person Reidentification via Discrepancy Matrix and Matrix Metric.
IEEE Trans. Cybern., 2018

Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA.
Proceedings of the 2018 IEEE International Symposium on Multimedia, 2018

Deep Learning of Human Perception in Audio Event Classification.
Proceedings of the 2018 IEEE International Symposium on Multimedia, 2018

Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing.
Proceedings of the IEEE International Conference on Data Mining, 2018

Residual Learning for Face Sketch Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Video-Based Person Re-Identification via Self Paced Weighting.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
A query refinement framework for xml keyword search.
World Wide Web, 2017

Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification.
IEEE Geosci. Remote. Sens. Lett., 2017

Towards Deep Modeling of Music Semantics using EEG Regularizers.
CoRR, 2017

Statistical Inference of Gaussian-Laplace Distribution for Person Verification.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

VenueNet: Fine-Grained Venue Discovery by Deep Correlation Learning.
Proceedings of the 19th IEEE International Symposium on Multimedia, 2017

JSFox: integrating static and dynamic type analysis of JavaScript programs.
Proceedings of the 39th International Conference on Software Engineering, 2017

Context-patch based face hallucination via thresholding locality-constrained representation and reproducing learning.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Compact LBP and WLBP descriptor with magnitude and direction difference for face recognition.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Taichi distance for person re-identification.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Deep Multi-label Hashing for Large-Scale Visual Search Based on Semantic Graph.
Proceedings of the Web and Big Data - First International Joint Conference, 2017

2016
Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing.
IEEE Trans. Multim., 2016

Zero-Shot Person Re-identification via Cross-View Consistency.
IEEE Trans. Multim., 2016

Leveraging multimodal information for event summarization and concept-level sentiment analysis.
Knowl. Based Syst., 2016

NEWSMAN: Uploading Videos over Adaptive Middleboxes to News Servers in Weak Network Infrastructures.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Camera Network Based Person Re-identification by Leveraging Spatial-Temporal Constraint and Multiple Cameras Relations.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Videopedia: Lecture Video Recommendation for Educational Blogs Using Topic Modeling.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Concept-Level Multimodal Ranking of Flickr Photo Tags via Recall Based Weighting.
Proceedings of the 2016 ACM Workshop on Multimedia COMMONS, 2016

PROMPT: Personalized User Tag Recommendation for Social Media Photos Leveraging Personal and Social Contexts.
Proceedings of the IEEE International Symposium on Multimedia, 2016

Scale-Adaptive Low-Resolution Person Re-Identification via Learning a Discriminating Surface.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Using Psychoacoustic Models for Sound Analysis in Music.
Proceedings of the 8th annual meeting of the Forum on Information Retrieval Evaluation, 2016

Fuzzy clustering of lecture videos based on topic modeling.
Proceedings of the 14th International Workshop on Content-Based Multimedia Indexing, 2016

Predicting User Preference Based on Matrix Factorization by Exploiting Music Attributes.
Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, 2016

2015
Efficient Geo-Fencing via Hybrid Hashing: A Combination of Bucket Selection and In-Bucket Binary Search.
ACM Trans. Spatial Algorithms Syst., 2015

On Generating Content-Oriented Geo Features for Sensor-Rich Outdoor Video Search.
IEEE Trans. Multim., 2015

Adaptive Margin Nearest Neighbor for Person Re-Identification.
Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

Multi-Level Fusion for Person Re-identification with Incomplete Marks.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

EventBuilder: Real-time Multimedia Event Summarization by Visualizing Social Media.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

TRACE: Linguistic-Based Approach for Automatic Lecture Video Segmentation Leveraging Wikipedia Texts.
Proceedings of the 2015 IEEE International Symposium on Multimedia, 2015

Social Interactions over Location-Aware Multimedia Systems.
Proceedings of the Multimedia Data Mining and Analytics - Disruptive Innovation, 2015

2014
A Probabilistic Associative Model for Segmenting Weakly Supervised Images.
IEEE Trans. Image Process., 2014

User preference-aware music video generation based on modeling scene moods.
Proceedings of the Multimedia Systems Conference 2014, 2014

WISMM'14 - First ACM International Workshop on Internet-Scale Multimedia Management.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Empirical Observation of User Activities: Check-ins, Venue Photos and Tips in Foursquare.
Proceedings of the First International Workshop on Internet-Scale Multimedia Management, 2014

Emerging Topics on Personalized and Localized Multimedia Information Systems.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

ADVISOR: Personalized Video Soundtrack Recommendation by Late Fusion with Heuristic Rankings.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

ATLAS: Automatic Temporal Segmentation and Annotation of Lecture Videos Based on Modelling Transition Time.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Student Performance Evaluation of Multimodal Learning via a Vector Space Model.
Proceedings of the First International Workshop on Internet-Scale Multimedia Management, 2014

2013
Scalable Content-Based Music Retrieval Using Chord Progression Histogram and Tree-Structure LSH.
IEEE Trans. Multim., 2013

Query-Document-Dependent Fusion: A Case Study of Multimodal Music Retrieval.
IEEE Trans. Multim., 2013

Social interactions over geographic-aware multimedia systems.
Proceedings of the ACM Multimedia Conference, 2013

Edge-based locality sensitive hashing for efficient geo-fencing application.
Proceedings of the 21st SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2013

2012
Automatic music soundtrack generation for outdoor videos from contextual sensor information.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval.
Proceedings of the 2012 IEEE International Symposium on Multimedia, 2012

2010
Combining multi-probe histogram and order-statistics based LSH for scalable audio content retrieval.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Recommender system for MIR research community.
Proceedings of the 2010 Joint International Conference on Digital Libraries, 2010

2009
Multi-Version Music Search Using Acoustic Feature Union and Exact Soft Mapping.
Int. J. Semantic Comput., 2009

Local summarization and multi-level LSH for retrieving multi-variant audio tracks.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

2008
Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison.
IEICE Trans. Inf. Syst., 2008

COSIN: content-based retrieval system for cover songs.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Searching musical audio datasets by a batch of multi-variant tracks.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Using Exact Locality Sensitive Mapping to Group and Detect Audio-Based Cover Songs.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach.
Proceedings of the EDBT 2008, 2008

2007
Similarity Searching Techniques in Content-Based Audio Retrieval Via Hashing.
Proceedings of the Advances in Multimedia Modeling, 2007

2005
Fast Algorithm for Symbol Rate Estimation.
IEICE Trans. Commun., 2005

Towards a Fast and Efficient Match Algorithm for Content-Based Music Retrieval on Acoustic Data.
Proceedings of the ISMIR 2005, 2005