XtreemOS cluster flavour

LinuxSSI

LinuxSSI is the basis of the cluster flavour of the foundation layer (XtreemOS-F) of the XtreemOS Grid operating system. LinuxSSI implements the Single System Image concept leveraging Kerrighed technology.

Objectives

One of our objectives is to design and implement a cluster flavour of the XtreemOS Grid operating system. The cluster flavour of XtreemOS is designed to execute the standard XtreemOS-G high-level services on top of a customized cluster version of the XtreemOS-F foundation layer.

The cluster flavour of XtreemOS-F is based on the Single System Image concept. A Single System Image (SSI) cluster operating system gives the illusion that a cluster is a single multiprocessor machine. With LinuxSSI, which implements the SSI concept in the XtreemOS-F layer, a cluster appears as a single powerful Grid node providing the POSIX interface. LinuxSSI leverages the existing Linux-based Kerrighed SSI technology, which resulted from research conducted at INRIA in collaboration with EDF R&D and is now being developed by a community within the Kerrighed open source project.

 

Research Topics

We investigate the following research topics in the context of the design and implementation of a full SSI cluster operating system based on Linux. Experiments are performed on Kerrighed, a Linux-based SSI cluster operating system.

  • Process checkpointing based on KDDM

XtreemOS implements a hierarchical three-level checkpointing architecture for grid environments. Together with a restart facility, this is the fault tolerance foundation for a backward error recovery scheme. The major research topics of interest are the development of new scalable checkpointing techniques and the evaluation of existing checkpointing techniques in a large-scale grid environment.

The topmost level, the grid checkpointer, implements checkpointing strategies for grid applications running on a large number of grid nodes. The system checkpointer is responsible for checkpointing single grid nodes, which may be single PCs or clusters. The cluster version of the system checkpointer itself uses distributed strategies to checkpoint an application running on a cluster. The kernel checkpointer is responsible for saving and restarting the state of a process on a single node.

LinuxSSI implements a kernel checkpointer based on the KDDM (Kerrighed Distributed Data Management) mechanism of Kerrighed. The kernel checkpointer saves the process state by saving all relevant system KDDM objects and other non-distributed system data. The restart facility allows checkpointed processes to be restarted. The next steps will be to finalize the kernel checkpointer and restart facility, and to extend the saved process state to take into account open files, streams and System V IPC shared segments.
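
The way the three levels delegate to one another can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the class names, the dictionary used as "process state" and the node names are hypothetical, and nothing here reflects the actual XtreemOS or Kerrighed interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KernelCheckpointer:
    """Saves and restores the state of one process on a single node (hypothetical model)."""
    node: str
    saved: Dict[int, dict] = field(default_factory=dict)

    def checkpoint(self, pid: int, state: dict) -> None:
        # Copy the state so later changes to the live process do not alter the checkpoint.
        self.saved[pid] = dict(state)

    def restart(self, pid: int) -> dict:
        return dict(self.saved[pid])


@dataclass
class SystemCheckpointer:
    """Checkpoints one grid node, which may be a single PC or a cluster."""
    kernels: List[KernelCheckpointer]

    def checkpoint(self, processes: Dict[str, Dict[int, dict]]) -> int:
        # processes maps node name -> {pid: state}; returns the number of processes saved.
        count = 0
        for kernel in self.kernels:
            for pid, state in processes.get(kernel.node, {}).items():
                kernel.checkpoint(pid, state)
                count += 1
        return count


@dataclass
class GridCheckpointer:
    """Topmost level: drives the system checkpointers of all grid nodes involved."""
    systems: List[SystemCheckpointer]

    def checkpoint(self, processes: Dict[str, Dict[int, dict]]) -> int:
        return sum(system.checkpoint(processes) for system in self.systems)


# One single-PC grid node and one two-node cluster (names are made up).
pc = SystemCheckpointer([KernelCheckpointer("pc0")])
cluster = SystemCheckpointer([KernelCheckpointer("c0"), KernelCheckpointer("c1")])
grid = GridCheckpointer([pc, cluster])

live = {"pc0": {10: {"counter": 1}}, "c0": {20: {"counter": 5}}, "c1": {21: {"counter": 7}}}
saved_count = grid.checkpoint(live)
live["c0"][20]["counter"] = 99              # the process keeps running after the checkpoint
restored = cluster.kernels[0].restart(20)   # backward recovery returns the saved state
```

In this model a restart of process 20 yields the state recorded at checkpoint time, not the later one, which is the essence of backward error recovery.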

  • kDFS distributed file system exploiting the cluster disks

A lot of work has been done on distributed file servers with various issues tackled, e.g. security, performance, and high availability. In a cluster context, the most common approach consists of using specific nodes to provide the distributed file system and therefore to divide the cluster into two groups: the compute nodes and the I/O nodes. The hard drives available on the compute nodes are only used for the system and temporary files, thus wasting both a lot of space and throughput.

Our goal is to provide a cluster file system distributed over all nodes of a cluster, so that the same nodes both run applications on their CPUs and contribute their hard drives to the distributed file system. From a general point of view, designing such an integrated cluster file system will lead to several contributions. Moreover, to be fully distributed, such a file system will exploit, cooperate with, and complement existing cluster system services, such as the scheduler or load-balancing strategies.

The first step is now almost complete. The new distributed file system is called kDFS (kernel Distributed File System). First, it exploits a fully distributed meta-data management policy, based on the KDDM mechanisms provided by Kerrighed, to improve efficiency. Second, cluster nodes can access kDFS files even without providing storage space. We have also started to work on the design and implementation of efficient data access mechanisms (cache, striping and I/O scheduling policies). Developing the « backbone » of the kDFS file system required considerable effort. A first version of the meta-data management mechanisms is operational.
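
As an illustration of what striping means for data placement, the sketch below maps a byte offset in a file to the node that would store it under plain round-robin striping. Round-robin placement, the `locate` function, the stripe size and the node names are all assumptions made for the example; the text above does not describe the placement policy kDFS actually uses.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, nodes: List[str]) -> Tuple[str, int]:
    """Map a byte offset in a file to (node, offset within that node's share).

    Stripes are assigned to nodes in round-robin order (an assumed policy).
    """
    stripe_index = offset // stripe_size
    node = nodes[stripe_index % len(nodes)]
    # Number of earlier stripes of this file already stored on the same node.
    local_stripe = stripe_index // len(nodes)
    return node, local_stripe * stripe_size + offset % stripe_size


nodes = ["node0", "node1", "node2"]
placement = [locate(off, 4096, nodes) for off in (0, 4096, 8192, 12288, 12300)]
```

With three nodes and 4096-byte stripes, consecutive stripes land on different disks, so a large sequential read can draw on the throughput of several compute-node drives at once.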

kernel/kerrighed Distributed File System

kDFS aims at providing an integrated cluster file system for High Performance Computing.

Based on several concepts suggested in KerFS (Kerrighed 1.02), the kernel Distributed File System has been developed from scratch. One of the main ideas is to develop a distributed file system that plugs in under the VFS and relies only on the KDDM component of Kerrighed. The KDDM features are used to build a cooperative cache for both data and meta-data using all available memory in the cluster.
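
The idea of a cooperative cache can be sketched as a three-step lookup: local memory first, then the memory of other nodes, and only then the disk. The `CooperativeCache` class below is a hypothetical user-space model written for illustration; it does not model KDDM's coherence protocol or any kDFS interface.

```python
from typing import Dict, Iterable


class CooperativeCache:
    """Toy model of a cache that uses the memory of every node in the cluster."""

    def __init__(self, node_names: Iterable[str], disk: Dict[str, bytes]):
        self.memory: Dict[str, Dict[str, bytes]] = {name: {} for name in node_names}
        self.disk = disk
        self.disk_reads = 0

    def read(self, node: str, block_id: str) -> bytes:
        local = self.memory[node]
        if block_id in local:                      # 1. hit in local memory
            return local[block_id]
        for other, cached in self.memory.items():  # 2. hit in another node's memory
            if other != node and block_id in cached:
                local[block_id] = cached[block_id]
                return local[block_id]
        self.disk_reads += 1                       # 3. miss everywhere: go to disk
        local[block_id] = self.disk[block_id]
        return local[block_id]


cache = CooperativeCache(["n0", "n1"], {"inode:7": b"meta", "block:1": b"data"})
first = cache.read("n0", "inode:7")    # read from disk, now cached on n0
second = cache.read("n1", "inode:7")   # served from n0's memory, no disk access
```

The same lookup path serves both data blocks and meta-data entries, which is why a single mechanism can back both kinds of cache.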

Further information: http://www.kerrighed.org/wiki/index.php/KernelDevelKdFS

 

  • Customizable scheduler

One of our goals in LinuxSSI is to build a customizable SSI scheduler that can adapt to the current load of the system it is running on. The scheduler should also be highly configurable: users can extend it with their own probes and scheduling policies. So far, we have successfully implemented a framework of pluggable probes and scheduling policies. It enables users to write their own probes (which monitor the usage of different resources) as well as scheduling policies (which distribute load across the cluster) and plug them into Kerrighed. Based on this framework, we have implemented probes for monitoring CPU load and memory. Both probes and scheduling policies are exposed through Linux's configfs file system, so users can add and remove scheduling policies, together with all the necessary probes, at runtime without restarting the cluster.
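
The separation between probes and policies described above can be sketched in user space as follows. This is a minimal model under stated assumptions: `SchedulerFramework`, `cpu_balance`, the threshold and the load figures are hypothetical, and the real framework lives in the kernel and is driven through configfs rather than Python calls.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Measurements = Dict[str, Dict[str, float]]          # node -> {probe name: value}
Probe = Callable[[str], float]                      # node -> measured value
Policy = Callable[[Measurements], Optional[Tuple[str, str]]]


class SchedulerFramework:
    """Registry in which probes and policies can be plugged and unplugged at runtime."""

    def __init__(self, nodes: Iterable[str]):
        self.nodes = list(nodes)
        self.probes: Dict[str, Probe] = {}
        self.policies: Dict[str, Policy] = {}

    def add_probe(self, name: str, probe: Probe) -> None:
        self.probes[name] = probe

    def remove_probe(self, name: str) -> None:
        del self.probes[name]

    def add_policy(self, name: str, policy: Policy) -> None:
        self.policies[name] = policy

    def remove_policy(self, name: str) -> None:
        del self.policies[name]

    def measure(self) -> Measurements:
        return {n: {name: p(n) for name, p in self.probes.items()} for n in self.nodes}

    def decide(self) -> Dict[str, Optional[Tuple[str, str]]]:
        measurements = self.measure()
        return {name: policy(measurements) for name, policy in self.policies.items()}


def cpu_balance(measurements: Measurements, threshold: float = 0.25):
    """Suggest a (source, destination) migration when CPU loads differ by more than threshold."""
    busiest = max(measurements, key=lambda n: measurements[n]["cpu_load"])
    idlest = min(measurements, key=lambda n: measurements[n]["cpu_load"])
    gap = measurements[busiest]["cpu_load"] - measurements[idlest]["cpu_load"]
    return (busiest, idlest) if gap > threshold else None


fake_load = {"n0": 0.9, "n1": 0.2, "n2": 0.5}       # made-up CPU loads for the example
framework = SchedulerFramework(fake_load)
framework.add_probe("cpu_load", fake_load.__getitem__)
framework.add_policy("cpu_balance", cpu_balance)
decisions = framework.decide()
framework.remove_policy("cpu_balance")              # unplugged without restarting anything
after_removal = framework.decide()
```

Because policies only see probe measurements, a new policy can be added or removed without touching the probes, and vice versa.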


Our current goal is to integrate this framework into the Kerrighed scheduler and to implement the most basic scheduling policies (e.g. a CPU load balancing policy). Once this has been done, we will investigate more sophisticated policies that take inter-resource dependencies (IPC, shared memory) into account when choosing which process to migrate. In addition, we will design an interface for linking the SSI scheduler with the XtreemOS Grid scheduler (which belongs to the application execution management service).

  • Scalability of the SSI technology

We study the scalability of the SSI technology, with the aim of developing a scalability benchmark and identifying the parameters and algorithms that limit the scalability of an SSI cluster operating system.

 

Results

  • Distributed meta-data management in the kDFS cluster file system, which exploits the compute nodes' disks (prototype not yet publicly available)
  • Process checkpointing mechanism (initial prototype available in the latest version of Kerrighed, see Software section below)
  • Infrastructure allowing hot-plug of resource probes and global scheduling policies

 

Software

XtreemOS participants contribute to the Kerrighed open source project. The latest version of Kerrighed, Kerrighed V2.1.0 based on Linux 2.6.20, was released under the GPL licence on June 4, 2007 (download). The XtreemOS consortium contributed to this new release of Kerrighed in several ways:

  • participation in the port from the Linux 2.6.11 kernel to the Linux 2.6.20 kernel (new structure of the Kerrighed patches to the Linux kernel),
  • process checkpointing mechanism exploiting KDDM (currently an experimental version),
  • testing and debugging.

XtreemOS is going to build the RPM and Debian packages of the new Kerrighed version and integrate them in the OSCAR-SSI distribution (http://ssi-oscar.gforge.inria.fr/).

 

Publications

Project deliverables

  • Specification of federation resource management mechanisms (D2.2.1) - December 2006
  • Design and implementation of scalable SSI mechanisms in LinuxSSI (D2.2.2) - December 2007
  • Design and implementation of basic checkpoint-restart mechanisms in LinuxSSI (D2.2.3) - December 2007
  • Design and implementation of basic reconfiguration mechanisms in LinuxSSI (D2.2.4) - January 2008
  • Design and implementation of high performance disk input-output operations in a cluster (D2.2.5) - December 2007
  • Design and implementation of a basic customizable scheduler (D2.2.6) - December 2007
  • Prototype of the basic version of LinuxSSI (D2.2.7) - January 2008
  • Design and Implementation of First Advanced Version of LinuxSSI (D2.2.8) - December 2008
  • First prototype of a standalone KDDM module (D2.2.9) - December 2008

 

Journals and conferences

  • Jérôme Gallard, Adrien Lebre, Christine Morin, Pascal Gallard and Geoffroy Vallée. Is Virtualization Killing Single System Image Research? Technical report, RR-INRIA 6389, November 2007 (pdf).
  • John Mehnert-Spahn, Michael Schoettner, David Margery and Christine Morin. XtreemOS grid checkpointing architecture. In IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2008), poster session, Lyon, France, May 2008.

Presentations at the second Kerrighed Summit held on February 1st, 2008 in Paris (France)

- Adrien Lebre - kDFS overview (results from T2.2.5)
- John Mehnert-Spahn - IPC SYSV Incremental Checkpointing (results from T2.2.3)
- Matthieu Fertré - IPC Checkpointing (results from T2.2.3)
- Christine Morin - XtreemOS plans for the Kerrighed project
- Erich Focht - Kerrighed and Virtualization (results from T2.2.11)
- Jérôme Gallard - Is Virtualization Killing SSI Research? (results from T2.2.11)
- Marko Novak - Implementation of DRMAA job submission interface for Kerrighed (results from T2.2.6)
- Runtime configurable scheduler framework (joint work between Marko Novak from XLAB, resulting from T2.2.6, and Louis Rilling, a key Kerrighed developer from Kerlabs)

All the slides are available at the following URL: http://www.kerrighed.org/wiki/index.php/Summit.

 

 

 

by Christine Morin last modified 2008-12-18 10:53