How to build a diskless cluster?

Pegasus IV Cluster

How to build a diskless Linux cluster?

These pages describe the Pegasus Cluster concept for building a diskless computing cluster based on Scientific Linux 7 / CentOS 7 / Red Hat Enterprise Linux 7.

Reasons why you would want your compute nodes to be diskless:

money saved on hard disks
compute nodes consume less power and thus produce less heat and noise
increased reliability as the compute nodes contain fewer parts that may fail
cluster administration is simplified (instead of updating each individual compute node you modify a single boot image)

Of course, there are some disadvantages, too:

root file system occupies part of the compute nodes' RAM and reduces the amount of RAM available for applications (this is hardly an issue: 128 MByte out of a total of 16 GByte RAM is less than 1%)
compute nodes do not have swap space (swapping over the network is obviously a bad idea)
cluster installation becomes more involved since one has to create a custom Linux kernel and root file system
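
To put the RAM cost in numbers, the following minimal sketch computes the share of node RAM taken by a RAM-resident root file system. The function name is ours and purely illustrative; the figures are the 128 MByte and 16 GByte quoted above.

```python
def root_fs_overhead(root_fs_mbyte: float, total_ram_gbyte: float) -> float:
    """Return the fraction of node RAM occupied by the root file system."""
    total_ram_mbyte = total_ram_gbyte * 1024  # 1 GByte = 1024 MByte
    return root_fs_mbyte / total_ram_mbyte


if __name__ == "__main__":
    fraction = root_fs_overhead(128, 16)
    # Format as a percentage with two decimals.
    print(f"root file system uses {fraction:.2%} of node RAM")
```

Since tmpfs only consumes memory for the data actually stored in it, the configured size is an upper bound on the overhead rather than a fixed cost.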

Root-NFS versus Root-RAM file system

Where should the root file systems of the compute nodes live: in each compute node's RAM, or on the server, mounted via NFS? We strongly prefer the Root-RAM setup, mainly for three reasons:

generates less network traffic

better scalability to many compute nodes; in the (naive) Root-NFS approach one needs a separate file system on the server for each of the nodes, while the same image can be used to populate all of the RAM file systems

compute nodes are not just diskless, they are stateless; after a reboot they are guaranteed to have the same well-defined configuration, which does not drift over time

Pegasus Cluster concept

cluster server runs a standard installation of Scientific Linux 7.1 plus some extra packages to facilitate cluster operation
diskless compute nodes are booted using the PXE network boot ROM and the pxelinux network boot loader
root file system resides in a 128 MByte tmpfs file system in the compute node's RAM
/usr and /home file systems are then mounted via NFS from the cluster server

The Pegasus Cluster concept is flexible; it supports heterogeneous clusters as long as all compute nodes can run 64-bit x86 Scientific Linux. Differences in node hardware can be dealt with by distributing different Linux kernels and root file systems to different types of nodes (see Node Installation).
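
One mechanism that pxelinux itself offers for serving different kernels and root file systems to different nodes is its per-client configuration lookup: the boot loader searches the pxelinux.cfg/ directory for a file named after the node's MAC address, then after its IP address in hexadecimal with progressively shorter prefixes, and finally falls back to a file called default. Whether the Pegasus setup relies on exactly this mechanism is an assumption on our part; the Node Installation page describes the actual procedure. The sketch below only lists the file names in search order. The function name is hypothetical, and the optional lookup by client UUID, which pxelinux tries first when the PXE stack provides one, is left out.

```python
import ipaddress
from typing import List


def pxelinux_config_candidates(mac: str, ipv4: str) -> List[str]:
    """List the file names pxelinux looks for under pxelinux.cfg/, in order."""
    # MAC-based name: ARP hardware type 01 (Ethernet), followed by the
    # six octets in lowercase hex, separated by dashes.
    octets = mac.lower().replace(":", "-").split("-")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    for octet in octets:
        int(octet, 16)  # raises ValueError if an octet is not hexadecimal
    candidates = ["01-" + "-".join(octets)]

    # IP-based names: the IPv4 address as 8 uppercase hex digits, then the
    # same string with one more trailing digit removed on each attempt.
    ip_hex = f"{int(ipaddress.IPv4Address(ipv4)):08X}"
    candidates += [ip_hex[:n] for n in range(8, 0, -1)]

    # Last resort: the shared default configuration.
    candidates.append("default")
    return candidates


if __name__ == "__main__":
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print("pxelinux.cfg/" + name)
```

Because shortened hex prefixes match whole address ranges, nodes of one hardware type can share a configuration file if they are given addresses from a common subnet, which avoids maintaining one file per node.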

We have used the Pegasus Cluster concept for clusters with almost 200 compute nodes, without notable strain on the server. We thus believe it can be scaled up beyond 200 nodes. Large clusters of several thousand nodes probably require changes to the node provisioning mechanism to avoid overloading the server.
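
To get a feeling for why provisioning becomes the bottleneck at larger scale, the following back-of-the-envelope sketch estimates how long the server needs to send one boot image to every node when all nodes boot at once. All inputs are assumptions, not measurements: we take 128 MByte as an upper bound for the image size (the size of the root tmpfs), assume a single 1 Gbit/s server link, and ignore protocol overhead, so the result is a best-case lower bound. The function name is hypothetical.

```python
def boot_storm_seconds(nodes: int, image_mbyte: float, link_mbit_per_s: float) -> float:
    """Best-case time in seconds to send one boot image to every node."""
    total_mbit = nodes * image_mbyte * 8  # 8 bits per byte
    return total_mbit / link_mbit_per_s


if __name__ == "__main__":
    for nodes in (200, 2000):
        seconds = boot_storm_seconds(nodes, image_mbyte=128, link_mbit_per_s=1000)
        print(f"{nodes} nodes: at least {seconds:.0f} s")
```

Under these assumptions the transfer time grows linearly with the node count, from a few minutes at 200 nodes to more than half an hour at 2000 nodes, which illustrates why a single server eventually needs help from a different provisioning mechanism.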