José M. García

Orcid: 0000-0002-6388-2835

Affiliations:
  • University of Murcia, Spain


According to our database, José M. García authored at least 168 papers between 1991 and 2024.

Bibliography

2024
Code Detection for Hardware Acceleration Using Large Language Models.
IEEE Access, 2024

2023
Expanding the deep-learning model to diagnosis LVNC: Limitations and trade-offs.
CoRR, 2023

Matching Linear Algebra and Tensor Code to Specialized Hardware Accelerators.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

2022
HDNN: a cross-platform MLIR dialect for deep neural networks.
J. Supercomput., 2022

Performance portability in a real world application: PHAST applied to Caffe.
Int. J. High Perform. Comput. Appl., 2022

POAS: A high-performance scheduling framework for exploiting Accelerator Level Parallelism.
CoRR, 2022

Applying Intel's oneAPI to a machine learning case study.
Concurr. Comput. Pract. Exp., 2022

Left ventricular non-compaction cardiomyopathy automatic diagnosis using a deep learning approach.
Comput. Methods Programs Biomed., 2022

2021
Deploying deep learning approaches to left ventricular non-compaction measurement.
J. Supercomput., 2021

ACOTSP-MF: A memory-friendly and highly scalable ACOTSP approach.
Eng. Appl. Artif. Intell., 2021

2020
Re-engineering the ant colony optimization for CMP architectures.
J. Supercomput., 2020

Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance.
Int. J. High Perform. Comput. Appl., 2020

High-throughput fuzzy clustering on heterogeneous architectures.
Future Gener. Comput. Syst., 2020

Deep learning approach to left ventricular non-compaction measurement.
CoRR, 2020

A novel auction system for selecting advertisements in Real-Time bidding.
CoRR, 2020

Using PHAST to port Caffe library: First experiences and lessons learned.
CoRR, 2020

Boosting the extraction of elementary flux modes in genome-scale metabolic networks using the linear programming approach.
Bioinform., 2020

2019
Efficient, semantics-rich transformation and integration of large datasets.
Expert Syst. Appl., 2019

First Experiences on Applying Deep Learning Techniques to Prostate Cancer Detection.
Proceedings of the Parallel Computing: Technology Trends, 2019

2018
Improving the EFMs quality by augmenting their representativeness in LP methods.
BMC Syst. Biol., 2018

Application of High Performance Computing Techniques to the Semantic Data Transformation.
Proceedings of the Trends and Advances in Information Systems and Technologies, 2018

2017
A methodology based on Deep Learning for advert value calculation in CPM, CPC and CPA networks.
Soft Comput., 2017

Multi-objective evolutionary feature selection for online sales forecasting.
Neurocomputing, 2017

Code modernization strategies to 3-D Stencil-based applications on Intel Xeon Phi: KNC and KNL.
Comput. Math. Appl., 2017

Optimizing Semantic Data Transformation Using High Performance Computing Techniques.
Proceedings of the 10th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences (SWAT4LS 2017), 2017

Vectorization Strategies for Ant Colony Optimization on Intel Architectures.
Proceedings of the Parallel Computing is Everywhere, 2017

Representativeness of a Set of Metabolic Pathways.
Proceedings of the Bioinformatics and Biomedical Engineering, 2017

2016
Dynamic load balancing on heterogeneous clusters for parallel ant colony optimization.
Clust. Comput., 2016

Calculating Elementary Flux Modes with Variable Neighbourhood Search.
Proceedings of the Bioinformatics and Biomedical Engineering, 2016

2015
Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses.
IEEE Trans. Computers, 2015

ICCI: In-Cache Coherence Information.
IEEE Trans. Computers, 2015

Soft-error mitigation by means of decoupled transactional memory threads.
Distributed Comput., 2015

Evaluation of the 3-D finite difference implementation of the acoustic diffusion equation model on massively parallel architectures.
Comput. Electr. Eng., 2015

TreeEFM: calculating elementary flux modes using linear optimization in a tree-based algorithm.
Bioinform., 2015

Evaluation of 3-D Stencil Codes on the Intel Xeon Phi Coprocessor.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

A New Approach to Obtain EFMs Using Graph Methods Based on the Shortest Path between End Nodes.
Proceedings of the Bioinformatics and Biomedical Engineering, 2015

2014
ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2014

Comparative evaluation of platforms for parallel Ant Colony Optimization.
J. Supercomput., 2014

Evaluating the SAT problem on P systems for different high-performance architectures.
J. Supercomput., 2014

Bringing Networks together to Improve Advertising Performance.
Res. Comput. Sci., 2014

Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures.
Int. J. High Perform. Comput. Appl., 2014

A performance/cost model for a CUDA drug discovery application on physical and public cloud infrastructures.
Concurr. Comput. Pract. Exp., 2014

Toward energy efficiency in heterogeneous processors: findings on virtual screening methods.
Concurr. Comput. Pract. Exp., 2014

Managing resources dynamically in hybrid photonic-electronic networks-on-chip.
Concurr. Comput. Pract. Exp., 2014

Exploiting silicon photonics for energy-efficient heterogeneous parallel architectures.
Concurr. Comput. Pract. Exp., 2014

2013
Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2013

Efficient Eager Management of Conflicts for Scalable Hardware Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2013

Enhancing GPU parallelism in nature-inspired algorithms.
J. Supercomput., 2013

Modeling the impact of permanent faults in caches.
ACM Trans. Archit. Code Optim., 2013

Adaptive Neuromorphic Architecture (ANA).
Neural Networks, 2013

Enhancing data parallelism for Ant Colony Optimization on GPUs.
J. Parallel Distributed Comput., 2013

Accelerated Conformational Entropy Calculations Using Graphic Processing Units.
J. Chem. Inf. Model., 2013

A GPU based Conformational Entropy Calculation Method.
Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2013

Impact of implicit solvation models on database enrichment in GPU based blind Virtual Screening.
Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2013

Improving drug discovery using a neural networks based parallel scoring function.
Proceedings of the 2013 International Joint Conference on Neural Networks, 2013

2012
A fault-tolerant architecture for parallel applications in tiled-CMPs.
J. Supercomput., 2012

Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE.
J. Supercomput., 2012

Extending Magny-Cours Cache Coherence.
IEEE Trans. Computers, 2012

Hardware transactional memory with software-defined conflicts.
ACM Trans. Archit. Code Optim., 2012

DAPSCO: Distance-aware partially shared cache organization.
ACM Trans. Archit. Code Optim., 2012

The GPU on the simulation of cellular computing models.
Soft Comput., 2012

High-Throughput parallel blind Virtual Screening using BINDSURF.
BMC Bioinform., 2012

Accelerating Fibre Orientation Estimation from Diffusion Weighted Magnetic Resonance Imaging Using GPUs.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Parallelization of Virtual Screening in Drug Discovery on Massively Parallel Architectures.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

ASCIB: adaptive selection of cache indexing bits for removing conflict misses.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Energy Efficiency Analysis of GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

π-TM: Pessimistic invalidation for scalable lazy hardware transactional memory.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2011
Leakage-efficient design of value predictors through state and non-state preserving techniques.
J. Supercomput., 2011

Accelerating Grid Kernels for Virtual Screening on Graphics Processing Units.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Effective Parallelization of Non-bonded Interactions Kernel for Virtual Screening on GPUs.
Proceedings of the 5th International Conference on Practical Applications of Computational Biology & Bioinformatics, 2011

A Pipeline Pilot based SOAP implementation of FlexScreen for High-Throughput Virtual Screening.
Proceedings of the 3rd International Workshop on Science Gateways for Life Sciences, 2011

The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Parallelization Strategies for Ant Colony Optimisation on GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

An analytical model for the calculation of the Expected Miss Ratio in faulty caches.
Proceedings of the 17th IEEE International On-Line Testing Symposium (IOLTS 2011), 2011

ZEBRA: a data-centric, hybrid-policy hardware transactional memory design.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Eager Meets Lazy: The Impact of Write-Buffering on Hardware Transactional Memory.
Proceedings of the International Conference on Parallel Processing, 2011

Energy-Efficient Cache Coherence Protocols in Chip-Multiprocessors for Server Consolidation.
Proceedings of the International Conference on Parallel Processing, 2011

Accelerating multiple target drug screening on GPUs.
Proceedings of the Computational Methods in Systems Biology, 9th International Conference, 2011

Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
A Direct Coherence Protocol for Many-Core Chip Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2010

Dealing with Transient Faults in the Interconnection Network of CMPs at the Cache Coherence Level.
IEEE Trans. Parallel Distributed Syst., 2010

A scalable organization for distributed directories.
J. Syst. Archit., 2010

Simulating a P system based efficient solution to SAT by using GPUs.
J. Log. Algebraic Methods Program., 2010

Simulation of P systems with active membranes on CUDA.
Briefings Bioinform., 2010

Analyzing Cache Coherence Protocols for Server Consolidation.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

CUDA 2D Stencil Computations for the Jacobi Method.
Proceedings of the Applied Parallel and Scientific Computing, 2010

A log-based redundant architecture for reliable parallel computation.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

EMC²: Extending Magny-Cours coherence for large-scale servers.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

2009
A lossy 3D wavelet transform for high-quality compression of medical video.
J. Syst. Softw., 2009

The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

Implementing P Systems Parallelism by Means of GPUs.
Proceedings of the Membrane Computing, 10th International Workshop, 2009

Extending SRT for parallel applications in tiled-CMP architectures.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Speculation-based conflict resolution in hardware transactional memory.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Efficient microarchitecture policies for accurately adapting to power constraints.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Distance-aware round-robin mapping for large NUCA caches.
Proceedings of the 16th International Conference on High Performance Computing, 2009

REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs.
Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009

2008
Extending the TokenCMP Cache Coherence Protocol for Low Overhead Fault Tolerance in CMP Architectures.
IEEE Trans. Parallel Distributed Syst., 2008

Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors.
J. Parallel Distributed Comput., 2008

Characterization of Conflicts in Log-Based Transactional Memory (LogTM).
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

DiCo-CMP: Efficient cache coherency in tiled CMP architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Directory-Based Conflict Detection in Hardware Transactional Memory.
Proceedings of the High Performance Computing, 2008

Fault-Tolerant Cache Coherence Protocols for CMPs: Evaluation and Trade-Offs.
Proceedings of the High Performance Computing, 2008

A fault-tolerant directory-based cache coherence protocol for CMP architectures.
Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008

Scalable Directory Organization for Tiled CMP Architectures.
Proceedings of the 2008 International Conference on Computer Design, 2008

2007
The Design of New Journaling File Systems: The DualFS Case.
IEEE Trans. Computers, 2007

An efficient implementation of a 3D wavelet transform based encoder on hyper-threading technology.
Parallel Comput., 2007

Using AOP to Automatically Provide Distribution, Fault Tolerance, and Load Balancing to the CORBA-LC Component Model.
Proceedings of the Parallel Computing: Architectures, 2007

Aspect-Oriented Programming Techniques to support Distribution, Fault Tolerance, and Load Balancing in the CORBA-LC Component Model.
Proceedings of the Sixth IEEE International Symposium on Network Computing and Applications (NCA 2007), 12, 2007

Leakage Energy Reduction in Value Predictors through Static Decay.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Direct Coherence: Bringing Together Performance and Scalability in Shared-Memory Multiprocessors.
Proceedings of the High Performance Computing, 2007

Adaptive VP decay: making value predictors leakage-efficient designs for high performance processors.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
On the Evaluation of Dense Chip-Multiprocessor Architectures.
Proceedings of 2006 International Conference on Embedded Computer Systems: Architectures, 2006

Automatic Code Generation for Non-Functional Aspects in the CORBA-LC Component Model.
Proceedings of the I. International Conference on Ubiquitous Computing: Applications, 2006

An efficient cache design for scalable glueless shared-memory multiprocessors.
Proceedings of the Third Conference on Computing Frontiers, 2006

2005
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions.
J. VLSI Signal Process., 2005

A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2005

Evaluating IA-32 web servers through simics: a practical experience.
J. Syst. Archit., 2005

Assessing MPI Performance on QsNetII.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Optimizing a 3D-FWT Video Encoder for SMPs and HyperThreading Architectures.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Memory Subsystem Characterization in a 16-Core Snoop-Based Chip-Multiprocessor Architecture.
Proceedings of the High Performance Computing and Communications, 2005

A Novel Lightweight Directory Architecture for Scalable Shared-Memory Multiprocessors.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration.
IEEE Trans. Parallel Distributed Syst., 2004

Traditional File Systems versus DualFS: A Performance Comparison Approach.
IEICE Trans. Inf. Syst., 2004

On the Evaluation of x86 Web Servers Using Simics: Limitations and Trade-Offs.
Proceedings of the Computational Science, 2004

2003
Reducing 3D Wavelet Transform Execution Time through the Streaming SIMD Extensions.
Proceedings of the 11th Euromicro Workshop on Parallel, 2003

Grid-aware Component-based development in CORBA Lightweight Components.
Proceedings of the VIII Jornadas Ingeniería del Software y Bases de Datos (JISBD 2003), 2003

Real-Time Extraction of Colored Segments for Robot Visual Navigation.
Proceedings of the Computer Vision Systems, Third International Conference, 2003

Design and Implementation of a Grid-Enabled Component Container for CORBA Lightweight Components.
Proceedings of the Grid Computing, 2003

Congestion Control for High Performance Virtual Cut-through Networks.
Proceedings of the 21st IASTED International Multi-Conference on Applied Informatics (AI 2003), 2003

2002
MPI-Delphi: an MPI implementation for visual programming environments and heterogeneous computing.
Future Gener. Comput. Syst., 2002

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Reducing the Latency of L2 Misses in Shared-Memory Multiprocessors through On-Chip Directory Integration.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Improving the Performance of Real-Time Communication Services on High-Speed LANs under Topology Changes.
Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN 2002), 2002

A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

DualFS: a new journaling file system without meta-data duplication.
Proceedings of the 16th international conference on Supercomputing, 2002

Memory Conscious 3D Wavelet Transform.
Proceedings of the 28th EUROMICRO Conference 2002, 4-6 September 2002, Dortmund, Germany, 2002

The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Evaluating the DIPORSI Framework: Distributed Processing of Remotely Sensed Imagery.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2001

CORBA Lightweight Components: An Early Report.
Proceedings of the VI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2001), 2001

A New Approach to Provide Real-Time Services on High-Speed Local Area Networks.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Design and Implementation Requirements for CORBA Lightweight Components.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

Selective Branch Prediction Reversal By Correlating with Data Values and Control Flow.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

A New Scalable Directory Architecture for Large-Scale Multiprocessors.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Performance Evaluation of Real-Time Communication Services on High-Speed LANs under Topology Changes.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

Confidence Estimation for Branch Prediction Reversal.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

CORBA Lightweight Components: A Model for Distributed Component-Based Heterogeneous Computation.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

On Deadlock Frequency during Dynamic Reconfiguration in NOWs.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
Dynamic reconfiguration of node location in wormhole networks.
J. Syst. Archit., 2000

A Parallel Algorithm for Tracking of Segments in Noisy Edge Images.
Proceedings of the 15th International Conference on Pattern Recognition, 2000

1999
Cluster Computing Using MPI and Windows NT to Solve the Processing of Remotely Sensed Imagery.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999

The Parallel EM Algorithm and its Applications in Computer Vision.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

The MPI-Delphi Interface: A Visual Programming Environment for Clusters of Workstations.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

A Performance Evaluation of P-EDR in Different Parallel Environments.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

P-EDR: An Algorithm for Parallel Implementation of Parzen Density Estimation from Uncertain Observations.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

An Evaluation of Parallel Computing in PC Clusters with Fast Ethernet.
Proceedings of the Parallel Computation, 1999

1998
Using channel pipelining in reconfigurable interconnection networks.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

Improving the Performance of Scientific Parallel Applications in a Cluster of Workstations.
Proceedings of the Applied Parallel Computing, 1998

Reconfigurable Wormhole Networks: A Realistic Approach.
Proceedings of the Parallel and Distributed Processing, 10 IPPS/SPDP'98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, Orlando, Florida, USA, March 30, 1998

1997
Analyzing the Performance of MPI in a Cluster of Workstations Based on Fast Ethernet.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1997

1996
PEPE: A Trace-Driven Simulator to Evaluate Reconfigurable Multicomputer Architectures.
Proceedings of the Applied Parallel Computing, 1996

A Novel Approach to Improve the Performance of Interconnection Networks with Hot-Spots.
Proceedings of the 22nd EUROMICRO Conference '96, 1996

1995
The Specification of a Generic Multicomputer Using Lotos.
ACM SIGPLAN Notices, 1995

Improving the Performance of Parallel Triangularization of a Sparse Matrix Using a Reconfigurable Multicomputer.
Proceedings of the Applied Parallel Computing, 1995

1993
Dynamic reconfiguration of multicomputer networks: limitations and tradeoffs.
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993

1992
A new language for multicomputer programming.
ACM SIGPLAN Notices, 1992

1991
An algorithm for dynamic reconfiguration of a multicomputer network.
Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991

