Browsing by Author "Yalcin, Gulay"
CRC-based Memory Reliability for Task-parallel HPC Applications
Subasi, Omer; Unsal, Osman; Labarta, Jesus; Yalcin, Gulay; Cristal, Adrian (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale ...
Designing and Modelling Selective Replication for Fault-tolerant HPC Applications
Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2017) Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However ...
A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs
Cini, Nevin; Yalcin, Gulay (ASSOC COMPUTING MACHINERY, 2 PENN PLAZA, STE 701, NEW YORK, NY 10121-0701 USA, 2020) Today, GPUs are widely used as coprocessors/accelerators in High-Performance Heterogeneous Computing due to their many advantages. However, many studies emphasize that GPUs are not yet as reliable as desired. Despite ...
A runtime heuristic to selectively replicate tasks for application-specific reliability targets
Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require ...