Cha Zhang

According to our database, Cha Zhang authored at least 125 papers between 2000 and 2023.

Bibliography

2023
Kosmos-2.5: A Multimodal Literate Model.
CoRR, 2023

Diffusion-Based Document Layout Generation.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Unifying Vision, Text, and Layout for Universal Document Processing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Editorial for Special Issue on Computer Vision in the Wild.
Int. J. Comput. Vis., 2022

Understanding Long Documents with Different Position-Aware Attentions.
CoRR, 2022

DiT: Self-supervised Pre-training for Document Image Transformer.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

A Simple yet Effective Learnable Positional Encoding Method for Improving Document Transformer Model.
Proceedings of the Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022

XDoc: Unified Pre-training for Cross-Format Document Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Improving Structured Text Recognition with Regular Expression Biasing.
CoRR, 2021

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models.
CoRR, 2021

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding.
CoRR, 2021

Beyond $L_{p}$ Norms: Delving Deeper into Robustness to Physical Image Transformations.
Proceedings of the 2021 IEEE Military Communications Conference, 2021

TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation.
CoRR, 2020

Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Towards Efficient Model Compression via Learned Global Ranking.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
LeGR: Filter Pruning via Learned Global Ranking.
CoRR, 2019

RePr: Improved Training of Convolutional Filters.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks.
CoRR, 2018

2017
Orthogonal and Idempotent Transformations for Learning Deep Neural Networks.
CoRR, 2017

Deep Learning for Intelligent Video Analysis.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Automatic speech emotion recognition using recurrent neural networks with local attention.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Addressing bias in machine learning algorithms: A pilot study on emotion recognition for intelligent systems.
Proceedings of the 2017 IEEE Workshop on Advanced Robotics and its Social Impacts, 2017

2016
Image Bit-Depth Enhancement via Maximum A Posteriori Estimation of AC Signal.
IEEE Trans. Image Process., 2016

Training deep networks for facial expression recognition with crowd-sourced label distribution.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Emotion recognition in the wild from videos using images.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

2015
Precision Enhancement of 3-D Surfaces from Compressed Multiview Depth Maps.
IEEE Signal Process. Lett., 2015

A survey on face detection in the wild: Past, present and future.
Comput. Vis. Image Underst., 2015

Image based Static Facial Expression Recognition with Multiple Deep Network Learning.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

2014
Rate-Constrained 3D Surface Estimation From Noise-Corrupted Multiview Depth Videos.
IEEE Trans. Image Process., 2014

A robust optical/inertial data fusion system for motion tracking of the robot manipulator.
J. Zhejiang Univ. Sci. C, 2014

Iterative transductive learning for automatic image segmentation and matting with RGB-D data.
J. Vis. Commun. Image Represent., 2014

Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps.
CoRR, 2014

Improving multiview face detection with multi-task deep convolutional neural networks.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014

Immersive 3D Communication.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Video face beautification.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Facial expression tracking from head-mounted, partially observing cameras.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Point cloud attribute compression with graph transform.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Image bit-depth enhancement via maximum-a-posteriori estimation of graph AC component.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

2013
Analyzing the Optimality of Predictive Transform Coding Using Graph-Based Models.
IEEE Signal Process. Lett., 2013

Viewport: A Distributed, Immersive Teleconferencing System with Infrared Dot Pattern.
IEEE Multim., 2013

3D Imaging Techniques and Multimedia Applications [Guest editor's introduction].
IEEE Multim., 2013

Precision enhancement of 3D surfaces from multiple quantized depth maps.
Proceedings of the 11th IVMSP Workshop: 3D Image/Video Technologies and Applications, 2013

Rate-distortion optimized 3D reconstruction from noise-corrupted multiview depth videos.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Robust part-based face matching with multiple templates.
Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013

Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling.
Proceedings of the High-Performance Graphics 2013, 2013

Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Wide-Baseline Hair Capture Using Strand-Based Refinement.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Geometrically Constrained Room Modeling With Compact Microphone Arrays.
IEEE Trans. Speech Audio Process., 2012

Automatic Real-Time Video Matting Using Time-of-Flight Camera and Multichannel Poisson Equations.
Int. J. Comput. Vis., 2012

Virtual View Reconstruction Using Temporal Information.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

See-through Image Enhancement through Sensor Fusion.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

3D scene reconstruction by multiple structured-light based commodity depth cameras.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
An Interactive 3-D Audio System With Loudspeakers.
IEEE Trans. Multim., 2011

Improving Immersive Experiences in Telecommunication with Motion Parallax [Applications Corner].
IEEE Signal Process. Mag., 2011

Low-complexity, near-lossless coding of depth maps from kinect-like depth cameras.
Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), 2011

Calibration between depth and color sensors for commodity depth cameras.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

A novel see-through screen based on weave fabrics.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

CROWDMOS: An approach for crowdsourcing mean opinion score studies.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Boosting-Based Face Detection and Adaptation
Synthesis Lectures on Computer Vision, Morgan & Claypool Publishers, ISBN: 978-3-031-01809-1, 2010

Using Reverberation to Improve Range and Elevation Discrimination for Small Array Sound Source Localization.
IEEE Trans. Speech Audio Process., 2010

Joint tracking and multiview video compression.
Proceedings of the Visual Communications and Image Processing 2010, 2010

Enhancing loudspeaker-based 3D audio with room modeling.
Proceedings of the 2010 IEEE International Workshop on Multimedia Signal Processing, 2010

Personal 3D audio system with loudspeakers.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Turning enemies into friends: Using reflections to improve sound source localization.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

L1 regularized room modeling with compact microphone arrays.
Proceedings of the IEEE International Conference on Acoustics, 2010

3D Deformable Face Tracking with a Commodity Depth Camera.
Proceedings of the Computer Vision, 2010

2009
Improving depth perception with motion parallax and its application in teleconferencing.
Proceedings of the 2009 IEEE International Workshop on Multimedia Signal Processing, 2009

ACM 2009 workshop on ambient media computing (AMC'09) overview.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Multiview video compression and streaming based on predicted viewer position.
Proceedings of the IEEE International Conference on Acoustics, 2009

Boosted multi-task learning for face verification with applications to web image and video search.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Efficient Scale-Space Spatiotemporal Saliency Tracking for Distortion-Free Video Retargeting.
Proceedings of the Computer Vision, 2009

2008
An automated end-to-end lecture capture and broadcasting system.
ACM Trans. Multim. Comput. Commun. Appl., 2008

Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos.
IEEE Trans. Multim., 2008

Maximum Likelihood Sound Source Localization and Beamforming for Directional Microphone Arrays in Distributed Meetings.
IEEE Trans. Multim., 2008

Active Multicamera Networks: From Rendering to Surveillance.
IEEE J. Sel. Top. Signal Process., 2008

Multimedia Immersive Technologies and Networking.
Adv. Multim., 2008

Semantic saliency driven camera control for personal remote collaboration.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

Requirements and recommendations for an enhanced meeting viewing experience.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Why does PHAT work well in low noise, reverberative environments?
Proceedings of the IEEE International Conference on Acoustics, 2008

Taylor expansion based classifier adaptation: Application to person detection.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Active Rearranged Capturing of Image-Based Rendering Scenes-Theory and Practice.
IEEE Trans. Multim., 2007

Multiview Imaging and 3DTV.
IEEE Signal Process. Mag., 2007

Multiple-Instance Pruning For Learning Efficient Cascade Detectors.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Learning-Based Perceptual Image Quality Improvement for Video Conferencing.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Enhanced MVDR Beamforming for Arrays of Directional Microphones.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Maximum Likelihood Sound Source Localization for Multiple Directional Microphones.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Light Field Sampling
Synthesis Lectures on Image, Video, and Multimedia Processing, Morgan & Claypool Publishers, ISBN: 978-3-031-02241-8, 2006

Boosting-Based Multimodal Speaker Detection for Distributed Meetings.
Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006

Robust Visual Tracking via Pixel Classification and Integration.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

A Three-Layer Virtual Director Model for Supporting Automated Multi-Site Distributed Education.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Light Weight Background Blurring for Video Conferencing Applications.
Proceedings of the International Conference on Image Processing, 2006

2005
On the compression and streaming of concentric mosaic data for free wandering in a realistic environment over the Internet.
IEEE Trans. Multim., 2005

Multiple Instance Boosting for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

An automated end-to-end lecture capturing and broadcasting system.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Hybrid speaker tracking in an automated lecture room.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Light field capturing with lensless cameras.
Proceedings of the 2005 International Conference on Image Processing, 2005

2004
A survey on image-based rendering - representation, sampling and compression.
Signal Process. Image Commun., 2004

A Self-Reconfigurable Camera Array.
Proceedings of the 15th Eurographics Workshop on Rendering Techniques, 2004

Non-Uniform Sampling for Image-Based Rendering: Convergence of Image, Vision, and Graphic.
Proceedings of the 10th International Multimedia Modeling Conference (MMM 2004), 2004

Distributed hosting of Web content with erasure coding and unequal weight assignment.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Semantic propagation from relevance feedbacks.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Security analysis for key generation systems using face images.
Proceedings of the 2004 International Conference on Image Processing, 2004

View-dependent non-uniform sampling for image-based rendering.
Proceedings of the 2004 International Conference on Image Processing, 2004

2003
Spectral analysis for sampling image-based rendering data.
IEEE Trans. Circuits Syst. Video Technol., 2003

Color image sharpening based on collective time-evolution of simultaneous nonlinear reaction-diffusion.
Proceedings of the Visual Communications and Image Processing 2003, 2003

Nonuniform sampling of image-based rendering data with the position-interval-error (PIE) function.
Proceedings of the Visual Communications and Image Processing 2003, 2003

A system for active image-based rendering.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Annotating retrieval database with active learning.
Proceedings of the 2003 International Conference on Image Processing, 2003

Surface plenoptic function: a tool for the sampling analysis of image-based rendering.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

On generalized sampling for image-based rendering data.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
An active learning framework for content-based information retrieval.
IEEE Trans. Multim., 2002

Smart rebinning for the compression of concentric mosaic.
IEEE Trans. Multim., 2002

Towards optimal least square filters using the eigenfilter approach.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
Interactive browsing of 3D environment over the Internet.
Proceedings of the Visual Communications and Image Processing 2001, 2001

Indexing and retrieval of 3D models aided by active learning.
Proceedings of the 9th ACM International Conference on Multimedia 2001, Ottawa, Ontario, Canada, September 30, 2001

Efficient feature extraction for 2D/3D objects in mesh representation.
Proceedings of the 2001 International Conference on Image Processing, 2001

2000
Compression and rendering of concentric mosaics with reference block codec (RBC).
Proceedings of the Visual Communications and Image Processing 2000, 2000

Smart rebinning for compression of concentric mosaics.
Proceedings of the 8th ACM International Conference on Multimedia 2000, Los Angeles, CA, USA, October 30, 2000

Compression of Lumigraph with Multiple Reference Frame (MRF) Prediction and Just-in-Time Rendering.
Proceedings of the Data Compression Conference, 2000

