Cray Containment Domains

About Containment Domains


Containment Domains

Containment Domains (CDs) were created to allow an application to recover from various types of component failures, including hard (persistent) and soft (transient) hardware failures, transient software failures, and complete node failures. CDs can be thought of as a mechanism that supports fine-grained, hierarchical, application-directed, uncoordinated checkpoint and restore.
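The idea of hierarchical, application-directed checkpoint and restore can be illustrated with a minimal sketch. This is not the Cray implementation or its API: the names ContainmentDomain, DetectedError, run, and demo are hypothetical, state is held in a plain in-memory dictionary rather than a persistence layer, and errors are simulated by raising an exception. Each domain preserves state on entry, restores and re-executes its body when an error is detected, and escalates to the enclosing domain if local retries are exhausted.

```python
import copy


class DetectedError(Exception):
    """Hypothetical signal that an error was detected inside a domain."""


class ContainmentDomain:
    """Illustrative only: preserve on entry, restore and retry on error."""

    def __init__(self, state, max_retries=2):
        self.state = state              # mutable dict owned by the application
        self.max_retries = max_retries  # assumed retry budget per domain

    def run(self, body):
        # Preserve: snapshot the state when the domain is entered.
        snapshot = copy.deepcopy(self.state)
        for _ in range(self.max_retries + 1):
            try:
                return body(self.state)
            except DetectedError:
                # Restore: roll the state back in place, then re-execute.
                self.state.clear()
                self.state.update(copy.deepcopy(snapshot))
        # Escalate: local recovery failed, so let the parent domain handle it.
        raise DetectedError("retries exhausted; escalating to parent domain")


def demo():
    state = {"total": 0}
    faults = {"remaining": 1}  # inject a single simulated transient fault

    def child_body(s):
        s["total"] += 5
        if faults["remaining"] > 0:
            faults["remaining"] -= 1
            raise DetectedError("simulated transient fault")
        return s["total"]

    def parent_body(s):
        s["total"] += 1
        # The child domain is nested inside the parent's body.
        return ContainmentDomain(s).run(child_body)

    return ContainmentDomain(state).run(parent_body)


if __name__ == "__main__":
    print(demo())
```

In this sketch the fault is recovered entirely inside the child domain: only the child's work is rolled back and repeated, and the parent never observes the error. That locality is what "fine-grained" and "uncoordinated" refer to in the description above.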

The principles behind Containment Domains were first described by Michael Sullivan, Doe Hyun Yoon, and Mattan Erez. The original publications can be downloaded from Professor Erez's web site at the University of Texas.

The software available on these SourceForge pages implements one component of the Containment Domain model: the persistence infrastructure, as implemented by Cray Inc.

The software on this site is an open source (BSD 2.0) implementation of some of the Containment Domain concepts by Cray Inc.