Tien-Fu Chen

According to our database, Tien-Fu Chen authored at least 94 papers between 1991 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Self-Supervised Learning of Disentangled Representations for Multivariate Time-Series.
CoRR, 2024

Efficient Inference of Transformers on Bare-Metal Devices with RISC-V Vector Processors.
Proceedings of the 22nd IEEE Interregional NEWCAS Conference, 2024

TimeDRL: Disentangled Representation Learning for Multivariate Time-Series.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

A 40-nm 13.88-TOPS/W FC-DNN Engine for 16-bit Intelligent Audio Processing Featuring Weight-Sharing and Approximate Computing.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024

2023
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs.
CoRR, 2023

Adaptive Similarity-Aware Hyperparameter Tuners for Classification Tasks.
IEEE Access, 2023

2017
ULV-Turbo Cache for an Instantaneous Performance Boost on Asymmetric Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Energy-Efficient TCAM Search Engine Design Using Priority-Decision in Memory Technology.
IEEE Trans. Very Large Scale Integr. Syst., 2017

A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control.
ACM Trans. Design Autom. Electr. Syst., 2017

eTag: Tag-Comparison in Memory to Achieve Direct Data Access based on eDRAM to Improve Energy Efficiency of DRAM Cache.
IEEE Trans. Circuits Syst. I Regul. Pap., 2017

A Resistance Drift Compensation Scheme to Reduce MLC PCM Raw BER by Over 100× for Storage Class Memory Applications.
IEEE J. Solid State Circuits, 2017

A 3T1R Nonvolatile TCAM Using MLC ReRAM for Frequent-Off Instant-On Filters in IoT and Big-Data Processing.
IEEE J. Solid State Circuits, 2017

2016
High-Performance Deadlock-Free ID Assignment for Advanced Interconnect Protocols.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Zero-Counting and Adaptive-Latency Cache Using a Voltage-Guardband Breakthrough for Energy-Efficient Operations.
IEEE Trans. Circuits Syst. II Express Briefs, 2016

A ReRAM-Based 4T2R Nonvolatile TCAM Using RC-Filtered Stress-Decoupled Scheme for Frequent-OFF Instant-ON Search Engines Used in IoT and Big-Data Processing.
IEEE J. Solid State Circuits, 2016

Cross-matching caches: Dynamic timing calibration and bit-level timing-failure mask caches to reduce timing discrepancies with low voltage processors.
Integr., 2016

Variable-length VLIW encoding for code size reduction in embedded processors.
Proceedings of the 29th IEEE International System-on-Chip Conference, 2016

7.4 A 256b-wordlength ReRAM-based TCAM with 1ns search-time and 14× improvement in wordlength-energy-efficiency-density product using 2.5T1R cell.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

7.3 A resistance-drift compensation scheme to reduce MLC PCM raw BER by over 100× for storage-class memory applications.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

2015
Soft-Error-Tolerant Design Methodology for Balancing Performance, Power, and Reliability.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A latency-elastic and fault-tolerant cache for improving performance and reliability on low voltage operation.
Proceedings of the VLSI Design, Automation and Test, 2015

Lifetime-aware LRU promotion policy for last-level cache.
Proceedings of the VLSI Design, Automation and Test, 2015

Adaptive granularity and coordinated management for timely prefetching in multi-core systems.
Proceedings of the VLSI Design, Automation and Test, 2015

17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time.
Proceedings of the 2015 IEEE International Solid-State Circuits Conference, 2015

Energy-efficient non-volatile TCAM search engine design using priority-decision in memory technology for DPI.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Low-cost low-power droop-voltage-aware delay-fault-prevention designs for DVS caches.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014
Reconfigurable vertical profiling framework for the android runtime system.
ACM Trans. Embed. Comput. Syst., 2014

ReRAM-based 4T2R nonvolatile TCAM with 7x NVM-stress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing.
Proceedings of the Symposium on VLSI Circuits, 2014

Leveraging Data Lifetime for Energy-Aware Last Level Non-Volatile SRAM Caches using Redundant Store Elimination.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

DAPs: Dynamic Adjustment and Partial Sampling for Multithreaded/Multicore Simulation.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
Variation-aware and adaptive-latency accesses for reliable low voltage caches.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

Cross-layer dynamic prefetching allocation strategies for high-performance multicores.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013

A configurable bus-tracer for error reproduction in post-silicon validation.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013

A 0.48V 0.57nJ/pixel video-recording SoC in 65nm CMOS.
Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

2012
A Scalable High-Performance Virus Detection Processor Against a Large Pattern Set for Embedded Network Security.
IEEE Trans. Very Large Scale Integr. Syst., 2012

NUDA: A Non-Uniform Debugging Architecture and Nonintrusive Race Detection for Many-Core Systems.
IEEE Trans. Computers, 2012

IMITATOR: A deterministic multicore replay system with refining techniques.
Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, 2012

2011
Maintaining performance on power gating of microprocessor functional units by using a predictive pre-wakeup strategy.
ACM Trans. Archit. Code Optim., 2011

Hierarchical circuit-switched NoC for multicore video processing.
Microprocess. Microsystems, 2011

Load and storage balanced posting file partitioning for parallel information retrieval.
J. Syst. Softw., 2011

2010
Adaptive Pipeline voltage Scaling in High Performance Microprocessor.
J. Circuits Syst. Comput., 2010

RunAssert: A non-intrusive run-time assertion for parallel programs debugging.
Proceedings of the Design, Automation and Test in Europe, 2010

2009
VisoMT: A Collaborative Multithreading Multicore Processor for Multimedia Applications With a Fast Data Switching Mechanism.
IEEE Trans. Circuits Syst. Video Technol., 2009

An Adaptively Dividable Dual-Port BiTCAM for Virus-Detection Processors in Mobile Devices.
IEEE J. Solid State Circuits, 2009

VeriC: A semi-hardware description language to bridge the gap between ESL design and RTL models.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

dIP: A Non-intrusive Debugging IP for Dynamic Data Race Detection in Many-Core.
Proceedings of the 10th International Symposium on Pervasive Systems, 2009

NUDA: a non-uniform debugging architecture and non-intrusive race detection for many-core.
Proceedings of the 46th Design Automation Conference, 2009

No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips.
Proceedings of the 46th Design Automation Conference, 2009

2008
Tailoring circuit-switched network-on-chip to application-specific system-on-chip by two optimization schemes.
ACM Trans. Design Autom. Electr. Syst., 2008

Low-power algorithm for automatic topology generation for application-specific networks on chips.
IET Comput. Digit. Tech., 2008

2007
Efficient segment-based video transcoding proxy for mobile multimedia services.
J. Syst. Archit., 2007

Reducing Branch Misprediction Penalties Via Adaptive Pipeline Scaling.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

An Embedded Coherent-Multithreading Multimedia Processor and Its Programming Model.
Proceedings of the 44th Design Automation Conference, 2007

2006
On a design of crossroad switches for low-power on-chip communication architectures.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Design of customized functional units for the VLIW-based multi-threading processor core targeted at multimedia applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Collaborative Multithreading: An Open Scalable Processor Architecture for Embedded Multimedia Applications.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Evaluation and design trade-offs between circuit-switched and packet-switched NOCs for application-specific SOCs.
Proceedings of the 43rd Design Automation Conference, 2006

Fast Run-Time Power Monitoring Methodology for Embedded Systems.
Proceedings of the 2006 International Conference on Embedded Systems & Applications, 2006

2005
Flexible Heterogeneous Multicore Architectures for Versatile Media Processing Via Customized Long Instruction Words.
IEEE Trans. Circuits Syst. Video Technol., 2005

Design techniques for single-low-VDD CMOS systems.
IEEE J. Solid State Circuits, 2005

Development of Architecture and Software Technologies in High-Performance Low-Power SoC Design.
Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005

A low-power crossroad switch architecture and its core placement for network-on-chip.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Efficient Segment-Based Video Transcoding Proxy for Mobile Multimedia Services.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

System-Level Power-Aware Scheduling by Operation-based Prediction.
Proceedings of the 2005 International Conference on Pervasive Systems and Computing, 2005

Crossroad System-on-Chip Communication Architecture for Low Power Embedded Systems.
Proceedings of The 2005 International Conference on Embedded Systems and Applications, 2005

2004
Branch-and-bound task allocation with task clustering-based pruning.
J. Parallel Distributed Comput., 2004

Scalable locality-aware event dispatching mechanism for network servers.
IEE Proc. Softw., 2004

A parameterized power-aware IP core generator for the 2-D 8×8 DCT/IDCT.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

A power-aware IP core generator for the one-dimensional discrete Fourier transform.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Unified bus encoding by stream reconstruction with variable strides.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

A power-aware IP core design for the variable-length DCT/IDCT targeting at MPEG4 shape-adaptive transforms.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

2003
Variable-size data item placement for load and storage balancing.
J. Syst. Softw., 2003

Energy Efficient Caching-on-Cache Architectures for Embedded Systems.
J. Inf. Sci. Eng., 2003

Inverted file compression through document identifier reassignment.
Inf. Process. Manag., 2003

Flexible Heterogeneous Multicore Architectures for Media Processing via Customized Long Instruction Words.
Proceedings of the IFIP VLSI-SoC 2003, 2003

A Tree-Based inverted File for Fast Ranked-Document Retrieval.
Proceedings of the International Conference on Information and Knowledge Engineering. IKE'03, June 23, 2003

2002
Decoupling of data and tag arrays for on-chip caches.
Microprocess. Microsystems, 2002

Posting file partitioning and parallel information retrieval.
J. Syst. Softw., 2002

Dynamic voltage leveling scheduling for real-time embedded systems on low-power variable speed processors.
Proceedings of the International Conference on Compilers, 2002

2001
Compressing inverted files in scalable information systems by binary decision diagram encoding.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

2000
Dynamic memory management for real-time embedded Java chips.
Proceedings of the 7th International Workshop on Real-Time Computing and Applications Symposium (RTCSA 2000), 2000

1999
Segmented bus design for low-power systems.
IEEE Trans. Very Large Scale Integr. Syst., 1999

SBA: a server-initiated playback scheme supporting variable bit rate control.
IEEE Trans. Consumer Electron., 1999

1998
Supporting Highly-Speculative Execution via Adaptive Branch Trees.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

1997
Reducing memory penalty by a programmable prefetch engine for on-chip caches.
Microprocess. Microsystems, 1997

1996
Techniques for The Efficient Analysis of Cache Performance.
J. Inf. Sci. Eng., 1996

Efficient trace-sampling simulation techniques for cache performance analysis.
Proceedings of the 29th Annual Simulation Symposium (SS '96), 1996

1995
Effective Hardware Based Data Prefetching for High-Performance Processors.
IEEE Trans. Computers, 1995

An effective programmable prefetch engine for on-chip caches.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

1994
A Performance Study of Software and Hardware Data Prefetching Schemes.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

An Evaluation of Hardware and Software Data Prefetching.
Proceedings of the Applications in Parallel and Distributed Computing, 1994

1992
Reducing Memory Latency via Non-blocking and Prefetching Caches.
Proceedings of ASPLOS-V, 1992

1991
An effective on-chip preloading scheme to reduce data access penalty.
Proceedings of Supercomputing '91, 1991

