Kurt Keutzer

ORCID: 0000-0003-3868-8501

Affiliations:
  • University of California, Berkeley, USA


According to our database, Kurt Keutzer authored at least 355 papers between 1987 and 2024.

Awards

IEEE Fellow

IEEE Fellow 1996, "For contributions to logic synthesis and computer-aided design; specifically for the development of algorithms for the optimization of area, delay, testability, and power of digital circuits."

Bibliography

2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement.
CoRR, 2024

AI and Memory Wall.
CoRR, 2024

RouterBench: A Benchmark for Multi-LLM Routing System.
CoRR, 2024

Q-SLAM: Quadric Representations for Monocular SLAM.
CoRR, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024

Magic-Me: Identity-Specific Video Customized Diffusion.
CoRR, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.
CoRR, 2024

Learned Best-Effort LLM Serving.
CoRR, 2024

VeCAF: VLM-empowered Collaborative Active Finetuning with Training Objective Awareness.
CoRR, 2024

Multitask Vision-Language Prompt Tuning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Quadric Representations for LiDAR Odometry, Mapping and Localization.
IEEE Robotics Autom. Lett., 2023

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation.
CoRR, 2023

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation.
CoRR, 2023

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting.
CoRR, 2023

An LLM Compiler for Parallel Function Calling.
CoRR, 2023

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration.
CoRR, 2023

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome.
CoRR, 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters.
CoRR, 2023

CVPR 2023 Text Guided Video Editing Competition.
CoRR, 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding.
CoRR, 2023

Towards Unified and Effective Domain Generalization.
CoRR, 2023

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources.
CoRR, 2023

HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption.
CoRR, 2023

Aligning Large Multimodal Models with Factually Augmented RLHF.
CoRR, 2023

SqueezeLLM: Dense-and-Sparse Quantization.
CoRR, 2023

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts.
CoRR, 2023

Full Stack Optimization of Transformer Inference: a Survey.
CoRR, 2023

Big Little Transformer Decoder.
CoRR, 2023

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Speculative Decoding with Big Little Decoder.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models are Visual Reasoning Coordinators.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Q-Diffusion: Quantizing Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Simple and Effective Input Reformulations for Translation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
A Review of Single-Source Deep Unsupervised Visual Domain Adaptation.
IEEE Trans. Neural Networks Learn. Syst., 2022

Emotional Semantics-Preserved and Feature-Aligned CycleGAN for Visual Emotion Adaptation.
IEEE Trans. Cybern., 2022

Affective Image Content Analysis: Two Decades Review and New Perspectives.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Analysis of Quantization on MLP-based Vision Models.
CoRR, 2022

Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers.
CoRR, 2022

Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning.
CoRR, 2022

The ArtBench Dataset: Benchmarking Generative Models with Artworks.
CoRR, 2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data.
CoRR, 2022

UnrealNAS: Can We Search Neural Architectures with Unreal Data?
CoRR, 2022

Cross-Domain Object Detection with Mean-Teacher Transformer.
CoRR, 2022

Hessian-Aware Pruning and Optimal Neural Implant.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Self-Supervised Pretraining Improves Self-Supervised Pretraining.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Fast Post-Training Pruning Framework for Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learned Token Pruning for Transformers.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Prototype-Voxel Contrastive Learning for LiDAR Point Cloud Panoptic Segmentation.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Staged Training for Transformer Language Models.
Proceedings of the International Conference on Machine Learning, 2022

How Much Can CLIP Benefit Vision-and-Language Tasks?
Proceedings of the Tenth International Conference on Learning Representations, 2022

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

MTTrans: Cross-domain Object Detection with Mean Teacher Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models.
Proceedings of the Computer Vision - ECCV 2022, 2022

PreTraM: Self-supervised Pre-training via Connecting Trajectory and Map.
Proceedings of the Computer Vision - ECCV 2022, 2022

Invariant Information Bottleneck for Domain Generalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Applying Text Analytics to the Mind-section Literature of the Tibetan Tradition of the Great Perfection.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2021

Emotion Recognition From Multiple Modalities: Fundamentals and methodologies.
IEEE Signal Process. Mag., 2021

MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation.
Int. J. Comput. Vis., 2021

Differentiable NAS Framework and Application to Ads CTR Prediction.
CoRR, 2021

Multi-source Few-shot Domain Adaptation.
CoRR, 2021

Learned Token Pruning for Transformers.
CoRR, 2021

Invariant Information Bottleneck for Domain Generalization.
CoRR, 2021

Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets.
CoRR, 2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models.
CoRR, 2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition.
CoRR, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference.
CoRR, 2021

Improving Context-Based Meta-Reinforcement Learning with Self-Supervised Trajectory Contrastive Learning.
CoRR, 2021

Hessian-Aware Pruning and Optimal Neural Implant.
CoRR, 2021

Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources.
Proceedings of the WWW '21: The Web Conference 2021, 2021

NovelD: A Simple yet Effective Exploration Criterion.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Annotation-Efficient Untrimmed Video Action Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Scene-aware Learning Network for Radar Object Detection.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

HAWQ-V3: Dyadic Neural Network Quantization.
Proceedings of the 38th International Conference on Machine Learning, 2021

I-BERT: Integer-only BERT Quantization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Region Similarity Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Visual Transformers: Where Do Transformers Really Belong in Vision Models?
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

What's Hidden in a One-layer Randomly Weighted Transformer?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Reservoir Transformers.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Fast LSTM by dynamic decomposition on cloud and distributed systems.
Knowl. Inf. Syst., 2020

Reservoir Transformer.
CoRR, 2020

BeBold: Exploration Beyond the Boundary of Explored Regions.
CoRR, 2020

Cross-Domain Sentiment Classification with In-Domain Contrastive Learning.
CoRR, 2020

FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge.
CoRR, 2020

HAWQV3: Dyadic Neural Network Quantization.
CoRR, 2020

Multi-Agent Collaboration via Reward Attribution Decomposition.
CoRR, 2020

Evaluating Self-Supervised Pretraining Without Using Labels.
CoRR, 2020

Rethinking Distributional Matching Based Domain Adaptation.
CoRR, 2020

CoDeNet: Algorithm-hardware Co-design for Deformable Convolution.
CoRR, 2020

Visual Transformers: Token-based Image Representation and Processing for Computer Vision.
CoRR, 2020

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.
CoRR, 2020

Rethinking Batch Normalization in Transformers.
CoRR, 2020

MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation.
CoRR, 2020

Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey.
CoRR, 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
CoRR, 2020

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis.
CoRR, 2020

Boundary thickness and robustness in learning models.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization.
Proceedings of Machine Learning and Systems 2020, 2020

PowerNorm: Rethinking Batch Normalization in Transformers.
Proceedings of the 37th International Conference on Machine Learning, 2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
Proceedings of the 37th International Conference on Machine Learning, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

ZeroQ: A Novel Zero Shot Quantization Framework.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PyHessian: Neural Networks Through the Lens of the Hessian.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

Multi-Source Distilling Domain Adaptation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Inefficiency of K-FAC for Large Batch Size Training.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Personalized Emotion Recognition by Personality-Aware High-Order Learning of Physiological Signals.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Co-design of deep neural nets and neural net accelerators for embedded vision applications.
IBM J. Res. Dev., 2019

Domain-Aware Dynamic Networks.
CoRR, 2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.
CoRR, 2019

ANODEV2: A Coupled Neural ODE Evolution Framework.
CoRR, 2019

Large-batch training for LSTM and beyond.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019

Multi-source Domain Adaptation for Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

ANODEV2: A Coupled Neural ODE Framework.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Algorithm-hardware Co-design for Deformable Convolution.
Proceedings of the Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing, 2019

PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking.
Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference, 2019

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud.
Proceedings of the International Conference on Robotics and Automation, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Trust Region Based Adversarial Attack on Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

CycleEmotionGAN: Emotional Semantic Consistency Preserved CycleGAN for Adapting Image Emotions.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Roundtable: Machine Learning for Embedded Systems: Hype or Lasting Impact?
IEEE Des. Test, 2018

Parameter Re-Initialization through Cyclical Batch Size Schedules.
CoRR, 2018

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search.
CoRR, 2018

Large batch size training of neural networks with adversarial training and second-order information.
CoRR, 2018

Unsupervised Domain Adaptation: from Simulation Engine to the Real World.
CoRR, 2018

Integrated Model, Batch, and Domain Parallelism in Training Neural Networks.
Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

EmotionGAN: Unsupervised Domain Adaptation for Learning Discrete Probability Distributions of Image Emotions.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

A Novel Domain Adaptation Framework for Medical Image Segmentation.
Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2018

Affective Image Content Analysis: A Comprehensive Survey.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Counterexample-Guided Data Augmentation.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

ImageNet Training in Minutes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Regret Minimization for Partially Observable Deep Reinforcement Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Spatially Parallel Convolutions.
Proceedings of the 6th International Conference on Learning Representations, 2018

Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

SqueezeNext: Hardware-Aware Neural Network Design.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

2017
Integrated Model and Data Parallelism in Training Neural Networks.
CoRR, 2017

Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures.
CoRR, 2017

Shallow Networks for High-accuracy Road Object-detection.
Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems, 2017

SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Small neural nets are beautiful: enabling embedded systems with small deep-neural-network architectures.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

Boda: A Holistic Approach for Implementing Neural Network Computations.
Proceedings of the Computing Frontiers Conference, 2017

2016
Technology Mapping.
Encyclopedia of Algorithms, 2016

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications.
CoRR, 2016

How to scale distributed deep learning?
CoRR, 2016

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.
CoRR, 2016

If I could only design one circuit ...: technical perspective.
Commun. ACM, 2016

Boda-RTC: Productive generation of portable, efficient code for convolutional neural networks on mobile computing platforms.
Proceedings of the 12th IEEE International Conference on Wireless and Mobile Computing, 2016

FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Convolutional Monte Carlo Rollouts in Go.
CoRR, 2015

DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer.
CoRR, 2015

Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

libHOG: Energy-Efficient Histogram of Oriented Gradient Computation.
Proceedings of the IEEE 18th International Conference on Intelligent Transportation Systems, 2015

2014
Scalable multimedia content analysis on parallel platforms using python.
ACM Trans. Multim. Comput. Commun. Appl., 2014

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids.
CoRR, 2014

2013
Hardware/software codesign for mobile speech recognition.
Proceedings of the INTERSPEECH 2013, 2013

Communication-minimizing 2D convolution in GPU registers.
Proceedings of the IEEE International Conference on Image Processing, 2013

Three Fingered Jack: Tackling Portability, Performance, and Productivity with Auto-Parallelized Python.
Proceedings of the 5th USENIX Workshop on Hot Topics in Parallelism, 2013

Measuring the gap between programmable and fixed-function accelerators: A case study on speech recognition.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

2012
Fast ℓ₁-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime.
IEEE Trans. Medical Imaging, 2012

Accelerating Value-at-Risk estimation on highly parallel architectures.
Concurr. Comput. Pract. Exp., 2012

A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs.
Proceedings of the International Conference on Supercomputing, 2012

Automatic generation of application-specific accelerators for FPGAs from python loop nests.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

2011
A Special Section on Multicore Parallel CAD: Algorithm Design and Programming.
ACM Trans. Design Autom. Electr. Syst., 2011

Guest Editors' Introduction: Parallelism on the Desktop.
IEEE Softw., 2011

A parallel region based object recognition system.
Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV 2011), 2011

Copperhead: compiling an embedded data parallel language.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Efficient Parallel CKY Parsing on GPUs.
Proceedings of the 12th International Conference on Parsing Technologies, 2011

Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients.
Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011

Communication-Avoiding QR Decomposition for GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Long term video segmentation through pixel level spectral clustering on GPUs.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

Considerations When Evaluating Microprocessor Platforms.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

Fast speaker diarization using a high-level scripting language.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

PALLAS: Mapping Applications onto Manycore.
Proceedings of the Multiprocessor System-on-Chip - Hardware Design and Tool Integration, 2011

2010
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford.
IEEE Micro, 2010

Parallel computing with patterns and frameworks.
XRDS, 2010

Efficient manycore CHMM speech recognition for audiovisual and multistream data.
Proceedings of the INTERSPEECH 2010, 2010

Exploring recognition network representations for efficient speech inference on highly parallel platforms.
Proceedings of the INTERSPEECH 2010, 2010

Parallel BFS graph traversal on images using structured grid.
Proceedings of the International Conference on Image Processing, 2010

Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow.
Proceedings of the Computer Vision - ECCV 2010, 2010

2009
ACM Transactions on Design Automation of Electronic Systems (TODAES) special section call for papers: Parallel CAD: Algorithm design and programming.
ACM Trans. Design Autom. Electr. Syst., 2009

Parallel scalability in speech recognition.
IEEE Signal Process. Mag., 2009

A view of the parallel computing landscape.
Commun. ACM, 2009

Acceleration of market value-at-risk estimation.
Proceedings of the 2nd Workshop on High Performance Computational Finance, 2009

A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit.
Proceedings of the INTERSPEECH 2009, 2009

Image feature extraction for mobile processors.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Scalable HMM based inference engine in large vocabulary continuous speech recognition.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Efficient, high-quality image contour detection.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Optimizing the use of GPU memory in applications with large data sets.
Proceedings of the 16th International Conference on High Performance Computing, 2009

2008
Technology Mapping.
Proceedings of the Encyclopedia of Algorithms - 2008 Edition, 2008

The Concurrency Challenge.
IEEE Des. Test Comput., 2008

Fast support vector machine training and classification on graphics processors.
Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008

Architecting parallel programs.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Scheduling task dependence graphs with variable task execution times onto heterogeneous multiprocessors.
Proceedings of the 8th ACM & IEEE International conference on Embedded software, 2008

Reinventing EDA with manycore processors.
Proceedings of the 45th Design Automation Conference, 2008

Parallelizing CAD: a timely research agenda for EDA.
Proceedings of the 45th Design Automation Conference, 2008

2007
Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Megatrends and EDA 2017.
Proceedings of the 44th Design Automation Conference, 2007

Closing the Power Gap between ASIC and Custom - Tools and Techniques for Low Power Design.
Springer, ISBN: 978-0-387-25763-1, 2007

2005
Linear programming for sizing, Vth and Vdd assignment.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

An FPGA-based Soft Multiprocessor System for IPv4 Packet Forwarding.
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005

Soft multiprocessor systems for network applications (abstract only).
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

Closing the power gap between ASIC and custom: an ASIC perspective.
Proceedings of the 42nd Design Automation Conference, 2005

Using minimal minterms to represent programmability.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

An automated exploration framework for FPGA-based soft multiprocessor systems.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

2004
NP-Click: A Productive Software Development Approach for Network Processors.
IEEE Micro, 2004

Developing a Flexible Interface for RapidIO, Hypertransport, and PCI-Express.
Proceedings of the 2004 International Conference on Parallel Computing in Electrical Engineering (PARELEC 2004), 2004

Is statistical timing statistically significant?
Proceedings of the 41st Design Automation Conference, 2004

EDA: this is serious business.
Proceedings of the 41st Design Automation Conference, 2004

Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Static Crosstalk-Noise Analysis - For Deep Sub-Micron Digital Designs.
Springer, ISBN: 978-1-4020-8091-3, 2004

Closing the Gap Between ASIC and Custom - Tools and Techniques for High-Performance ASIC Design.
Springer, ISBN: 978-1-4020-7113-3, 2004

2003
Minimization of dynamic and static power through joint assignment of threshold voltages and sizing optimization.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Low Power Multiplication Algorithm for Switching Activity Reduction through Operand Decomposition.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Comparing Analytical Modeling with Simulation for Network Processors: A Case Study.
Proceedings of the 2003 Design, Automation and Test in Europe Conference and Exposition, 2003

Programming challenges in network processor deployment.
Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 2003

Mapping Concurrent Applications onto Architectural Platforms.
Networks on Chip, 2003

2002
Impact of spatial intrachip gate length variability on the performance of high-speed digital circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2002

Developing Architectural Platforms: A Disciplined Approach.
IEEE Des. Test Comput., 2002

Minimum-power retiming for dual-supply CMOS circuits.
Proceedings of the 8th ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2002

From blind certainty to informed uncertainty.
Proceedings of the 8th ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2002

On convergence of switching windows computation in presence of crosstalk noise.
Proceedings of the 2002 International Symposium on Physical Design, 2002

From ASIC to ASIP: The Next Design Discontinuity.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Refining switching window by time slots for crosstalk noise calculation.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

Design Tools for Application Specific Embedded Processors.
Proceedings of the Second International Conference on Embedded Software, 2002

Unified tools for SoC embedded systems: mission critical, mission impossible or mission irrelevant?
Proceedings of the 39th Design Automation Conference, 2002

A general probabilistic framework for worst case timing analysis.
Proceedings of the 39th Design Automation Conference, 2002

2001
OCCOM-efficient computation of observability-based code coverage metrics for functional verification.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Functional vector generation for HDL models using linear programming and Boolean satisfiability.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Impact of small process geometries on microarchitectures in systems on a chip.
Proc. IEEE, 2001

Limitations and challenges of computer-aided design technology for CMOS VLSI.
Proc. IEEE, 2001

Why is Combinational ATPG Efficiently Solvable for Practical VLSI Circuits?
J. Electron. Test., 2001

Coverage Metrics for Functional Validation of Hardware Designs.
IEEE Des. Test Comput., 2001

Scripting for EDA Tools: A Case Study.
Proceedings of the 2nd International Symposium on Quality of Electronic Design (ISQED 2001), 2001

A Functional Validation Technique: Biased-Random Simulation Guided by Observability-Based Coverage.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Bus Encoding to Prevent Crosstalk Delay.
Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design, 2001

Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design.
Proceedings of the 38th Design Automation Conference, 2001

A Quick Safari Through the Reconfiguration Jungle.
Proceedings of the 38th Design Automation Conference, 2001

Achieving 550 MHz in an ASIC Methodology.
Proceedings of the 38th Design Automation Conference, 2001

2000
A global wiring paradigm for deep submicron design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2000

System-level design: orthogonalization of concerns and platform-based design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2000

Impact of Systematic Spatial Intra-Chip Gate Length Variability on Performance of High-Speed Digital Circuits.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Switching Window Computation for Static Timing Analysis in Presence of Crosstalk Noise.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Miller Factor for Gate-Level Coupling Delay Calculation.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Closing the gap between ASIC and custom: an ASIC perspective.
Proceedings of the 37th Conference on Design Automation, 2000

1999
A text-compression-based method for code size minimization in embedded systems.
ACM Trans. Design Autom. Electr. Syst., 1999

Rethinking Deep-Submicron Circuit Design.
Computer, 1999

Getting to the bottom of deep submicron II: a global wiring paradigm.
Proceedings of the 1999 International Symposium on Physical Design, 1999

The MARCO/DARPA Gigascale Silicon Research Center.
Proceedings of the IEEE International Conference on Computer Design, 1999

Towards true crosstalk noise analysis.
Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, 1999

Why is ATPG Easy?
Proceedings of the 36th Conference on Design Automation, 1999

Panel: Cell Libraries - Build vs. Buy; Static vs. Dynamic.
Proceedings of the 36th Conference on Design Automation, 1999

HW and SW in Embedded System Design: Loveboat, Shipwreck, or Ships Passing in the Night.
Proceedings of the 36th Conference on Design Automation, 1999

1998
A new viewpoint on code generation for directed acyclic graphs.
ACM Trans. Design Autom. Electr. Syst., 1998

Code density optimization for embedded DSP processors using data compression techniques.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998

Code Optimization Techniques in Embedded DSP Microprocessors.
Des. Autom. Embed. Syst., 1998

An algorithmic approach to optimizing fault coverage for BIST logic synthesis.
Proceedings of the IEEE International Test Conference 1998, 1998

Getting to the bottom of deep submicron.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

OCCOM: Efficient Computation of Observability-Based Code Coverage Metrics for Functional Verification.
Proceedings of the 35th Conference on Design Automation, 1998

Functional Vector Generation for HDL Models Using Linear Programming and 3-Satisfiability.
Proceedings of the 35th Conference on Design Automation, 1998

1997
Estimation of average switching activity in combinational logic circuits using symbolic simulation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1997

The future of logic synthesis and physical design in deep-submicron process geometries.
Proceedings of the 1997 International Symposium on Physical Design, 1997

Challenges in CAD for the One Million Gate FPGA.
Proceedings of the 1997 ACM/SIGDA Fifth International Symposium on Field Programmable Gate Arrays, 1997

1996
Storage Assignment to Decrease Code Size.
ACM Trans. Program. Lang. Syst., 1996

Addendum to "Synthesis of robust delay-fault testable circuits: Theory".
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1996

Register Transfer Level Synthesis: From Theory to Practice.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Mobile Communications: Demands on VLSI Technology, Design and CAD.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

What is the state of the art in commercial EDA tools for low power?
Proceedings of the 1996 International Symposium on Low Power Electronics and Design, 1996

An observability-based code coverage metric for functional simulation.
Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, 1996

The Need for Formal Methods for Integrated Circuit Design.
Proceedings of the First International Conference on Formal Methods in Computer-Aided Design, 1996

1995
Synthesis of hazard-free asynchronous circuits with bounded wire delays.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1995

Synthesis for testability techniques for asynchronous circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1995

Instruction selection using binate covering for code size optimization.
Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design, 1995

A Design and Validation System for Asynchronous Circuits.
Proceedings of the 32nd Conference on Design Automation, 1995

1994
Verification of asynchronous interface circuits with bounded wire delays.
J. VLSI Signal Process., 1994

Certified timing verification and the transition delay of a logic circuit.
IEEE Trans. Very Large Scale Integr. Syst., 1994

Event suppression: improving the efficiency of timing simulation for synchronous digital circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1994

Challenges in code generation for embedded processors.
Proceedings of the Dagstuhl Workshop on Code Generation for Embedded Processors (Dagstuhl, Germany), 1994

Hardware-Software Co-Design and ESDA.
Proceedings of the 31st Conference on Design Automation, 1994

1993
Analysis and Design of Regular Structures for Robust Dynamic Fault Testability.
VLSI Design, 1993

Statistical timing analysis of combinational logic circuits.
IEEE Trans. Very Large Scale Integr. Syst., 1993

Computation of floating mode delay in combinational circuits: practice and implementation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

Computation of floating mode delay in combinational circuits: theory and algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

Delay-fault test generation and synthesis for testability under a standard scan design methodology.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

Path-delay-fault testability properties of multiplexor-based networks.
Integr., 1993

Gate-Delay-Fault Testability Properties of Multiplexor-Based Networks.
Formal Methods Syst. Des., 1993

A synthesis-based test generation and compaction algorithm for multifaults.
J. Electron. Test., 1993

What is the Next Big Productivity Boost for Designers? (Panel Abstract).
Proceedings of the 30th Design Automation Conference. Dallas, 1993

1992
On properties of algebraic transformations and the synthesis of multifault-irredundant circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Estimation of power dissipation in CMOS combinational circuits using Boolean function manipulation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Validatable nonrobust delay-fault testable circuits via logic synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Synthesis of robust delay-fault-testable circuits: practice.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Synthesis of robust delay-fault-testable circuits: theory.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Necessary and sufficient conditions for hazard-free robust transistor stuck-open-fault testability in multilevel networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Statistical Timing Analysis of Combinational Circuits.
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors, 1992

On average power dissipation and random pattern testability of CMOS combinational logic networks.
Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, 1992

Estimation of Average Switching Activity in Combinational and Sequential Circuits.
Proceedings of the 29th Design Automation Conference, 1992

1991
Is redundancy necessary to reduce delay?
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1991

A unified approach to the synthesis of fully testable sequential machines.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1991

An automata-theoretic approach to behavioral equivalence.
Integr., 1991

Recent progress in synthesis for testability.
Proceedings of the 9th IEEE VLSI Test Symposium (VTS'91), 1991

The Need for Formal Verification in Hardware Design and What Formal Verification Has Not Done for Me Lately.
Proceedings of the 1991 International Workshop on the HOL Theorem Proving System and its Applications, 1991

A Partial Enhanced-Scan Approach to Robust Delay-Fault Test Generation for Sequential Circuits.
Proceedings of the IEEE International Test Conference 1991, 1991

Design Verification and Reachability Analysis Using Algebraic Manipulation.
Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computers & Processors, 1991

Delay Computation in Combinational Logic Circuits: Theory and Algorithms.
Proceedings of the 1991 IEEE/ACM International Conference on Computer-Aided Design, 1991

Algorithms for Synthesis of Hazard-Free Asynchronous Circuits.
Proceedings of the 28th Design Automation Conference, 1991

Robust Delay-Fault Test Generation and Synthesis for Testability Under A Standard Scan Design Methodology.
Proceedings of the 28th Design Automation Conference, 1991

1990
Design of integrated circuits fully testable for delay-faults and multifaults.
Proceedings of the IEEE International Test Conference 1990, 1990

Testability-Preserving Circuit Transformations.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

Impact and Evaluation of Competing Implementation Media for ASIC's (Panel Abstract).
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

Synthesis and Optimization Procedures for Robustly Delay-Fault Testable Combinational Logic Circuits.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1989
Addendum to 'A kernel-finding state assignment algorithm for multi-level logic'.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1989

On properties of algebraic transformation and the multifault testability of multilevel logic.
Proceedings of the 1989 IEEE International Conference on Computer-Aided Design, 1989

Boolean minimization and algebraic factorization procedures for fully testable sequential machines.
Proceedings of the 1989 IEEE International Conference on Computer-Aided Design, 1989

Three Competing Design Methodologies for ASIC's: Architectural Synthesis, Logic Synthesis, Logic Synthesis and Module Generation.
Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989

1988
Anatomy of a Hardware Compiler.
Proceedings of the ACM SIGPLAN'88 Conference on Programming Language Design and Implementation (PLDI), 1988

A Kernel-Finding State Assignment Algorithm for Multi-Level Logic.
Proceedings of the 25th ACM/IEEE Conference on Design Automation, 1988

1987
DAGON: Technology Binding and Local Optimization by DAG Matching.
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987
