We stand with Ukraine

Arun Kejariwal

ORCID: 0009-0006-6172-2973

According to our database¹, Arun Kejariwal authored at least 67 papers between 2003 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

BLENDER: Blended Text Embeddings and Diffusion Residuals for Intra-Class Image Synthesis in Deep Metric Learning.

Jan Niklas Kolf, …, …, …, …, Bhargav Bhushanam, …, …, …, …

CoRR, January, 2026

2024

Layer Compression of Deep Networks with Straight Flows.

…, …, Bhargav Bhushanam, …, …, Dhruv Choudhary, …, …

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

HHVM Performance Optimization for Large Scale Web Services.

…, …, …, …, …, …, …, Maximilian Balandat, …, Benjamin C. Lee

Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models.

…, …, …, Bhargav Bhushanam, …, …, …, …, …, …, …

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Characterization of Data Compression in Datacenters.

…, …, …, Abhishek Dhanotia, …, …, …, …

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

2022

Future gradient descent for adapting the temporal shifting data distribution in online recommendation systems.

…, …, …, Dhruv Choudhary, …, Bhargav Bhushanam, …, …, …

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2022

DreamShard: Generalizable Embedding Table Placement for Recommender Systems.

…, …, …, …, …, Bhargav Bhushanam, …, …, …

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Harmless Transfer Learning for Item Embeddings.

…, …, Dhruv Choudhary, Bhargav Bhushanam, …, …

Findings of the Association for Computational Linguistics: NAACL 2022, 2022

AutoShard: Automated Embedding Table Sharding for Recommender Systems.

…, …, Bhargav Bhushanam, Dhruv Choudhary, …, …, …, …, …, …

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), Washington, DC, USA, August 14, 2022

Understanding Data Compression in Warehouse-Scale Datacenter Services.

…, …, …, Abhishek Dhanotia, …, …, …, …

Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs.

…, …, Ehsan K. Ardestani, …, …, …, …, …

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

2021

Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism.

…, Dhruv Choudhary, Ping Tak Peter Tang, …, …, …, …, Kannan Ramchandran, Michael W. Mahoney

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), 2021

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems.

…, Bhargav Bhushanam, …, Dhruv Choudhary, …, …, …, …, …, …

Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

2020

Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism.

…, Dhruv Choudhary, Ping Tak Peter Tang, …, …, …, …, Kannan Ramchandran, Michael W. Mahoney

CoRR, 2020

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data.

…, Dhruv Choudhary, …, …, …, …, …, …, …

CoRR, 2020

Le Taureau: Deconstructing the Serverless Landscape & A Look Forward.

Anurag Khandelwal, …, Karthikeyan Ramasamy

Proceedings of the 2020 International Conference on Management of Data, 2020

2017

On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data.

Dhruv Choudhary, …, Francois Orsini

CoRR, 2017

Automatic Anomaly Detection in the Cloud Via Statistical Learning.

Jordan Hochenbaum, …, …

CoRR, 2017

2016

On the Definition of Real-Time: Applications and Systems.

…, Francois Orsini

Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Leveraging cloud data to mitigate user experience from 'breaking bad'.

Nicholas A. James, …, David S. Matteson

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015

Real Time Analytics: Algorithms and Systems.

…, Sanjeev Kulkarni, Karthik Ramasamy

Proc. VLDB Endow., 2015

2014

A Novel Technique for Long-Term Anomaly Detection in the Cloud.

…, Jordan Hochenbaum, …

Proceedings of the 6th USENIX Workshop on Hot Topics in Cloud Computing, 2014

2013

Chiffchaff: Observability and analytics to achieve high availability.

…, …, …

Proceedings of the IEEE Symposium on Large-Scale Data Analysis and Visualization, 2013

Techniques for Optimizing Cloud Footprint.

Proceedings of the 2013 IEEE International Conference on Cloud Engineering, 2013

A Tool for Practical Garbage Collection Analysis in the Cloud.

Proceedings of the 2013 IEEE International Conference on Cloud Engineering, 2013

Visual Analytics Framework for Cloud Infrastructure Data.

…, …, …, Jordan Hochenbaum, …

Proceedings of the 16th IEEE International Conference on Computational Science and Engineering, 2013

On the Determination of Inlining Vectors for Program Optimization.

Rosario Cammarota, Alexandru Nicolau, Alexander V. Veidenbaum, …, …, Mukund Madhugiri

Proceedings of the Compiler Construction - 22nd International Conference, 2013

2012

Trin-Trin: Who's Calling? A Pin-Based Dynamic Call Graph Extraction Framework.

…, …

Int. J. Parallel Program., 2012

Big Data Challenges: A Program Optimization Perspective.

Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

Selective search of inlining vectors for program optimization.

Rosario Cammarota, …, …, Alexandru Nicolau, Alexander V. Veidenbaum

Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011

Modulo Scheduling and Loop Pipelining.

…, Alexandru Nicolau

Encyclopedia of Parallel Computing, 2011

Pruning hardware evaluation space via correlation-driven application similarity analysis.

Rosario Cammarota, …, Paolo D'Alberto, Sapan Panigrahi, Alexander V. Veidenbaum, Alexandru Nicolau

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

On the efficacy of call graph-level thread-level speculation.

…, …, …, …, Alexandru Nicolau, Alexander V. Veidenbaum, …, Constantine D. Polychronopoulos

Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

How Many Threads to Spawn during Program Multithreading?

Alexandru Nicolau, …

Proceedings of the Languages and Compilers for Parallel Computing, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.

…, …, …, …, Alexandru Nicolau, Alexander V. Veidenbaum, …, Constantine D. Polychronopoulos

Proceedings of the 7th Conference on Computing Frontiers, 2010

2009

On the exploitation of loop-level parallelism in embedded applications.

…, Alexander V. Veidenbaum, Alexandru Nicolau, …, …, …

ACM Trans. Embed. Comput. Syst., 2009

Cache-aware partitioning of multi-dimensional iteration spaces.

…, Alexandru Nicolau, …, Alexander V. Veidenbaum, Constantine D. Polychronopoulos

Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, 2009

Performance Characterization of Itanium® 2-Based Montecito Processor.

…, Gerolf Hoflehner, …, Daniel M. Lavery, Alexandru Nicolau, Alexander V. Veidenbaum, Cameron McNairy

Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Techniques for efficient placement of synchronization primitives.

Alexandru Nicolau, …, …

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Parallelization spectroscopy: analysis of thread-level parallelism in HPC programs.

…, …

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Synchronization optimizations for efficient execution on multi-cores.

Alexandru Nicolau, …, Alexander V. Veidenbaum, …

Proceedings of the 23rd international conference on Supercomputing, 2009

Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems.

…, Alexandru Nicolau, Alexander V. Veidenbaum, …, Constantine D. Polychronopoulos

Proceedings of ICPP 2009, 2009

2008

Improving SDRAM access energy efficiency for low-power embedded systems.

Jelena Trajkovic, Alexander V. Veidenbaum, …

ACM Trans. Embed. Comput. Syst., 2008

Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor.

…, Alexander V. Veidenbaum, Alexandru Nicolau, …, …, …, …

Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Cache-aware iteration space partitioning.

…, Alexandru Nicolau, …, Alexander V. Veidenbaum, Constantine D. Polychronopoulos

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Compiler-Driven Dependence Profiling to Guide Program Parallelization.

…, …, …

Proceedings of the Languages and Compilers for Parallel Computing, 2008

2007

A predictive decode filter cache for reducing power consumption in embedded processors.

…, …, Alexander V. Veidenbaum, Alexandru Nicolau

ACM Trans. Design Autom. Electr. Syst., 2007

Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture.

…, Gerolf Hoflehner, …, Daniel M. Lavery, Alexandru Nicolau, Alexander V. Veidenbaum

Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Tight analysis of the performance potential of thread speculation using SPEC CPU2006.

…, …, …, …, Sergey Kozhukhov, …, Alexandru Nicolau, Alexander V. Veidenbaum, Constantine D. Polychronopoulos

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

2006

Energy efficient watermarking on mobile devices using proxy-based partitioning.

…, …, Alexandru Nicolau, …, Rajesh K. Gupta

IEEE Trans. Very Large Scale Integr. Syst., 2006

A general approach for partitioning N-dimensional parallel nested loops with conditionals.

…, Alexandru Nicolau, …, …, …, …, Constantine D. Polychronopoulos

Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2006), Cambridge, Massachusetts, USA, July 30, 2006

Rapid Resource-Constrained Hardware Performance Estimation.

Basant Kumar Dwivedi, …, M. Balakrishnan, …

Proceedings of the 17th IEEE International Workshop on Rapid System Prototyping (RSP 2006), 2006

On the performance potential of different types of speculative thread-level parallelism. (The DL version of this paper includes corrections that were not made available in the printed proceedings.)

…, …, …, …, Sergey Kozhukhov, …, …, Alexandru Nicolau, Alexander V. Veidenbaum, Constantine D. Polychronopoulos

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Lightweight lock-free synchronization methods for multithreading.

…, …, …, …, …, …, Alexandru Nicolau, Constantine D. Polychronopoulos

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

History-aware Self-Scheduling.

…, Alexandru Nicolau, Constantine D. Polychronopoulos

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Probabilistic Self-Scheduling.

…, …, …, …, Alexandru Nicolau, Alexander V. Veidenbaum, Constantine D. Polychronopoulos

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Challenges in exploitation of loop parallelism in embedded applications.

…, Alexander V. Veidenbaum, Alexandru Nicolau, …, …, …

Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005

A novel approach for partitioning iteration spaces with variable densities.

…, Alexandru Nicolau, …, Constantine D. Polychronopoulos

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

An Efficient Approach for Self-scheduling Parallel Loops on Multiprogrammed Parallel Computers.

…, Alexandru Nicolau, Constantine D. Polychronopoulos

Proceedings of the Languages and Compilers for Parallel Computing, 2005

An Efficient Load Balancing Scheme for Grid-based High Performance Scientific Computing.

…, Alexandru Nicolau

Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces.

…, Alexandru Nicolau, Constantine D. Polychronopoulos

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Energy Analysis of Multimedia Watermarking on Mobile Handheld Devices.

…, …, Alexandru Nicolau, …, …

Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

High performance annotation-aware JVM for Java cards.

…, …, Alexander V. Veidenbaum, Alexandru Nicolau

Proceedings of EMSOFT 2005, 2005

2004

Synthesis-driven Exploration of Pipelined Embedded Processors.

…, …, …

Proceedings of the 17th International Conference on VLSI Design (VLSI Design 2004), 2004

A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces.

…, Paolo D'Alberto, Alexandru Nicolau, Constantine D. Polychronopoulos

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Proxy-based task partitioning of watermarking algorithms for reducing energy consumption in mobile devices.

…, …, Alexandru Nicolau, …, …

Proceedings of the 41st Design Automation Conference, 2004

2003

Rapid Exploration of Pipelined Processors through Automatic Generation of Synthesizable RTL Models.

…, …, …

Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping (RSP 2003), 2003