Josep Llosa

ORCID: 0000-0001-7740-3148

According to our database, Josep Llosa authored at least 51 papers between 1994 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



ECCPA: Calculation of classical and quantum cross sections for elastic collisions of charged particles with atoms.
Comput. Phys. Commun., 2022

CSMT: Simultaneous Multithreading for Clustered VLIW Processors.
IEEE Trans. Computers, 2010

A low cost split-issue technique to improve performance of SMT clustered VLIW processors.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Thread Merging Schemes for Multithreaded Clustered VLIW Processors.
Proceedings of the ICPP 2009, 2009

Hybrid multithreading for VLIW processors.
Proceedings of the 2009 International Conference on Compilers, 2009

Power-efficient VLIW design using clustering and widening.
Int. J. Embed. Syst., 2008

Cluster-level simultaneous multithreading for VLIW processors.
Proceedings of the 25th International Conference on Computer Design, 2007

Silicon Compaction/Defragmentation for Partial Runtime Reconfiguration.
Proceedings of the Tenth Euromicro Conference on Digital System Design: Architectures, 2007

Merge Logic for Clustered Multithreaded VLIW Processors.
Proceedings of the Tenth Euromicro Conference on Digital System Design: Architectures, 2007

An accurate cost model for guiding data locality transformations.
ACM Trans. Program. Lang. Syst., 2005

Register Constrained Modulo Scheduling.
IEEE Trans. Parallel Distributed Syst., 2004

A fast and accurate framework to analyze and optimize cache memory behavior.
ACM Trans. Program. Lang. Syst., 2004

A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors.
SIGARCH Comput. Archit. News, 2004

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures.
Int. J. Parallel Program., 2004

High-performance and low-power VLIW cores for numerical computations.
Int. J. High Perform. Comput. Netw., 2004

Future ILP processors.
Int. J. High Perform. Comput. Netw., 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.
Proceedings of the Computer Systems: Architectures, 2004

Out-of-Order Commit Processors.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

A Case for Resource-conscious Out-of-order Processors.
IEEE Comput. Archit. Lett., 2003

Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Kilo-instruction Processors.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Hierarchical Clustered Register File Organization for VLIW Processors.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Optimizing Program Locality Through CMEs and GAs.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

Reduced code size modulo scheduling in the absence of hardware support.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Near-Optimal Padding for Removing Conflict Misses.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

A comparative study of modulo scheduling techniques.
Proceedings of the 16th international conference on Supercomputing, 2002

Near-Optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms.
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002

Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures.
IEEE Trans. Computers, 2001

Lifetime-Sensitive Modulo Scheduling in a Production Environment.
IEEE Trans. Computers, 2001

Modulo scheduling with integrated register spilling for clustered VLIW architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

MIRS: Modulo Scheduling with Integrated Register Spilling.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Optimizing cache miss equations polyhedra.
SIGARCH Comput. Archit. News, 2000

Improved spill code generation for software pipelined loops.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Two-level hierarchical register file organization for VLIW processors.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

An efficient solver for Cache Miss Equations.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

A Fast and Accurate Approach to Analyze Cache Memory Behavior (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Distributed Modulo Scheduling.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Modulo Scheduling with Reduced Register Pressure.
IEEE Trans. Computers, 1998

Quantitative Evaluation of Register Pressure on Software Pipelined Loops.
Int. J. Parallel Program., 1998

Widening Resources: A Cost-effective Technique for Aggressive ILP Architectures.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Partitioned Schedules for Clustered VLIW Architectures.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Resource Widening Versus Replication: Limits and Performance-cost Trade-off.
Proceedings of the 12th international conference on Supercomputing, 1998

Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs.
Proceedings of the 11th international conference on Supercomputing, 1997

Allocating Lifetimes to Queues in Software Pipelined Architectures.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

Heuristics for Register-Constrained Software Pipelining.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Swing modulo scheduling: a lifetime-sensitive approach.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

Hypernode reduction modulo scheduling.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Non-Consistent Dual Register Files to Reduce Register Pressure.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Using Sacks to Organize Registers in VLIW Machines.
Proceedings of the Parallel Processing: CONPAR 94, 1994
