Browsing by Author "Labarta, Jesus"
CRC-based Memory Reliability for Task-parallel HPC Applications
Subasi, Omer; Unsal, Osman; Labarta, Jesus; Yalcin, Gulay; Cristal, Adrian (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale ...
Designing and Modelling Selective Replication for Fault-tolerant HPC Applications
Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2017) Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However ...
A runtime heuristic to selectively replicate tasks for application-specific reliability targets
Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman; Labarta, Jesus (IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA, 2016) In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require ...