According to our database, Marc González authored at least 45 papers between 1997 and 2016.
Coarse grain parallelization of deep neural networks.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
A Methodology to Build Models and Predict Performance-Power in CMPs.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015
User experience on heterogenous Liquid Galaxy cluster display walls.
Proceedings of the IEEE International Symposium on a World of Wireless, 2014
Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution.
Proceedings of the International Conference for High Performance Computing, 2014
A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013
Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013
DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012
Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Generation Comp. Syst., 2012
POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012
Hardware-software coherence protocol for the coexistence of caches and local memories.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012
Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011
Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distrib. Syst., 2010
Extending OpenMP to Survive the Heterogeneous Multi-Core Era.
International Journal of Parallel Programming, 2010
Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL.
Proceedings of the Languages and Compilers for Parallel Computing, 2010
Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010
DMA++: on the fly data realignment for on-chip memories.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010
Analysis of Task Offloading for Accelerators.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010
Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010
Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009
Speeding Up Distributed MapReduce Applications Using Hardware Accelerators.
Proceedings of the ICPP 2009, 2009
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Prefetching irregular references for software cache on cell.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008
Hybrid access-specific software cache techniques for the cell BE architecture.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Runtime Address Space Computation for SDSM Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2006
A Proposal for Error Handling in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006
Techniques supporting threadprivate in OpenMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Experiences Parallelizing a Web Server with OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005
Automatic thread distribution for nested parallelism in OpenMP.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005
Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004
Automatic multilevel parallelization using OpenMP.
Scientific Programming, 2003
Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002
Defining and Supporting Pipelined Executions in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001
Complex Pipelined Executions in OpenMP Parallel Applications.
Proceedings of the 2001 International Conference on Parallel Processing, 2001
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurrency - Practice and Experience, 2000
OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000
Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999
Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999
Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997