Pegasus IV Cluster
Server installation

The Pegasus cluster software is based on Scientific Linux 7.1. Using Red Hat Enterprise Linux 7 or CentOS 7 should work just as well, because Scientific Linux is a Red Hat rebuild.

Server hardware

Any reasonably powerful PC can be used as the cluster server, at least for smaller clusters (for our current server, see Hardware). The only non-standard requirement is that it needs two network interfaces: one to connect to the outside world and one for the private network between the server and the compute nodes. If necessary, simply add a PCI or PCIe network card to your machine.

Basic installation
Perform a standard installation of Scientific Linux on the server. We have manually partitioned our hard drives as shown in the table on the right. On the software selection screen, select "Development and Creative Workstation". (This is not crucial, as missing packages can always be installed later.) Set the root password and create the first user. After the installation finishes, reboot, and then run "Software update" to receive all the latest security patches and fixes. Most of the following configuration steps require root privileges.
Configure the network interface used for the private cluster network: Assign the static IP address 192.168.0.254 and the subnet mask 255.255.255.0, and set the interface to be activated on boot. To do so, edit the file "/etc/sysconfig/network-scripts/ifcfg-..." where the ellipsis stands for the interface name (enp3s0 in our case). Add or edit the lines shown on the right. The other network interface, which connects to the external world (enp1s0 in our case), should have been properly configured during the installation process. To configure the firewall, start the GUI firewall configuration tool by calling
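As a sketch, the ifcfg file for the internal interface typically contains lines like the following; only the address, netmask, and ONBOOT flag come from this page, the rest of the file is interface-specific:

```
BOOTPROTO=none
IPADDR=192.168.0.254
NETMASK=255.255.255.0
ONBOOT=yes
```

The GUI firewall tool itself is provided by the firewall-config package and is started by running "firewall-config".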
Assign the external network interface permanently to the "public" zone. Assign the internal network interface permanently to the "trusted" zone. (You should do this only if the cluster network is isolated and safe.) Alternatively, you can achieve this by typing the commands
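A sketch of the equivalent command-line version, using the interface names from this page (enp1s0 external, enp3s0 internal):

```shell
# permanently assign the external interface to the "public" zone
firewall-cmd --permanent --zone=public --change-interface=enp1s0
# permanently assign the internal interface to the "trusted" zone
firewall-cmd --permanent --zone=trusted --change-interface=enp3s0
# reload so the permanent settings take effect
firewall-cmd --reload
```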
Install Midnight Commander via the command
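Midnight Commander is packaged as "mc" in the standard repositories:

```shell
yum install mc
```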
PXE, DHCP, and TFTP The goal of this section is to provide the infrastructure for net-booting the compute nodes. We need:
Edit the file "/etc/hosts" and add lines for the cluster server and all compute nodes as shown on the right. Create or edit the file "/etc/ethers" to list the MAC addresses of all compute nodes by adding lines such as the one shown on the right. Create the folder "/tftpboot". This folder will hold all the files to be handed to the nodes during their boot process (the Linux kernel, the root file system, and the pxelinux boot loader). Copy pxelinux.0 from the syslinux package into /tftpboot
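A sketch of these steps; the host name, IP address, and MAC address below are placeholder examples, and the location of pxelinux.0 may vary between syslinux versions:

```shell
# /etc/hosts: one entry per machine on the private network, e.g.
#   192.168.0.254  pegasus4
#   192.168.0.1    node001
# /etc/ethers: one MAC-to-hostname mapping per compute node, e.g.
#   aa:bb:cc:dd:ee:01  node001

mkdir /tftpboot
cp /usr/share/syslinux/pxelinux.0 /tftpboot/
```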
DHCP service and TFTP service are both provided by dnsmasq. Install the dnsmasq package (if it is not already installed).
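On SL7 it is available from the standard repositories:

```shell
yum install dnsmasq
```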
Edit the dnsmasq configuration file "/etc/dnsmasq.conf". dnsmasq has a large number of configuration options. Edit or add the lines on the right.
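A minimal sketch of the relevant options, assuming the internal interface enp3s0 and the addressing scheme used above; adapt the values to your network:

```
# serve DHCP/TFTP only on the private cluster interface
interface=enp3s0
# read static MAC-to-hostname mappings from /etc/ethers
read-ethers
# hand out addresses on the private network
dhcp-range=192.168.0.1,192.168.0.253,12h
# enable the built-in TFTP server rooted at /tftpboot
enable-tftp
tftp-root=/tftpboot
# boot file offered to PXE clients
dhcp-boot=pxelinux.0
```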
To allow the TFTP service to access the "/tftpboot" directory under SELinux, you need to change the security context via
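One way to do this, assuming the tftpdir_t type used for TFTP content on RHEL-family systems:

```shell
chcon -R -t tftpdir_t /tftpboot
# to make the change survive a filesystem relabel, register it instead:
# semanage fcontext -a -t tftpdir_t "/tftpboot(/.*)?" && restorecon -R /tftpboot
```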
Finally, you can start dnsmasq and enable its automatic start at boot time by typing the commands
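With systemd this is:

```shell
systemctl start dnsmasq
systemctl enable dnsmasq
```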
NFS, NIS, and chrony

The goals of this section are to
To install the rpm packages necessary for the network file system (NFS), type the command
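On SL7 the NFS server is provided by the nfs-utils package:

```shell
yum install nfs-utils
```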
Now you can start the NFS server and enable its automatic start at boot by typing
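That is:

```shell
systemctl start nfs-server
systemctl enable nfs-server
```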
To test the NFS installation, type the command
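For example, listing the directories exported by the server itself:

```shell
showmount -e localhost
```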
The rpm packages required for installing NIS (Yellow Pages) can be installed by typing
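Something like the following; yp-tools is included for the tests further down, and the exact package set is an assumption:

```shell
yum install ypserv ypbind yp-tools
```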
Set the NIS domain name by typing
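For example, setting the domain both for the running system and persistently (on SL7 the NISDOMAIN variable in /etc/sysconfig/network is read at boot):

```shell
nisdomainname pegasus4
echo "NISDOMAIN=pegasus4" >> /etc/sysconfig/network
```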
Now you can start NIS and enable its automatic start at boot by typing
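Assuming the server-side daemons ypserv and yppasswdd:

```shell
systemctl start ypserv yppasswdd
systemctl enable ypserv yppasswdd
```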
Initialize the NIS maps via
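On a 64-bit system the script lives under /usr/lib64/yp; build the master maps with:

```shell
/usr/lib64/yp/ypinit -m
```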
Specify "pegasus4", then type Ctrl-D and finish. If you later wish to update the NIS maps, for example after adding a new user, cd into the directory "/var/yp" and type
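The update boils down to:

```shell
cd /var/yp
make
```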
You can test your NIS installation via
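For example, a lookup in the passwd map:

```shell
ypmatch <user> passwd
```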
where <user> is a user name that exists in the NIS map. We use chrony to allow the compute nodes to synchronize their system time with the cluster server. Install the chrony package via
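On SL7 this is simply:

```shell
yum install chrony
```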
Now you can start chronyd and enable its automatic start at boot by typing
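That is:

```shell
systemctl start chronyd
systemctl enable chronyd
```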
Passwordless rlogin and rsh between the nodes

NOTE: rlogin and rsh are insecure; they should only be used if your private cluster network is isolated and safe. Otherwise, look into ssh, which also allows passwordless logins.
If you wish to enable passwordless rlogin/rsh for root, you also need to create or edit the file "/root/.rhosts" and add the lines shown on the right for the server and all nodes. Make sure that the permissions of both files are set to 644. Install the rsh and rsh-server packages.
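A sketch of these steps; the host names are the ones used in /etc/hosts above:

```shell
yum install rsh rsh-server

# /root/.rhosts: one trusted host (and user) per line, e.g.
#   pegasus4 root
#   node001 root
chmod 644 /root/.rhosts
```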
The files "/usr/bin/rsh", "/usr/bin/rlogin", "/usr/bin/rcp", and "/usr/bin/rexec" should have the SUID bit set.
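If the bit has been lost (for example after a package update), it can be checked and restored as follows:

```shell
# an "s" in the owner-execute position indicates the SUID bit
ls -l /usr/bin/rsh /usr/bin/rlogin /usr/bin/rcp /usr/bin/rexec
chmod u+s /usr/bin/rsh /usr/bin/rlogin /usr/bin/rcp /usr/bin/rexec
```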
(Note: Linux file capabilities, which can grant certain root privileges without the SUID bit, cannot be used here because the compute nodes will mount "/usr" via NFS, which does not support extended attributes.) Edit the file "/etc/securetty" and append "rsh", "rexec", and "rlogin" at the end of the file. To allow rlogind on the cluster server to access the file "/root/.rhosts" under SELinux:
Finally, you can start rsh, rlogin and rexec and enable their automatic start at boot by typing
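Under systemd these services are socket-activated; assuming the unit names used on RHEL 7:

```shell
systemctl start rsh.socket rlogin.socket rexec.socket
systemctl enable rsh.socket rlogin.socket rexec.socket
```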
Intel Fortran and C/C++ compilers

To install Intel Parallel Studio XE 2016 for Linux, download the installation packages and the license file and copy them into a staging directory such as "/var/install". (Note: In contrast to all other software we use on Pegasus, Intel Parallel Studio is commercial software. We use it because it tends to produce faster code than gfortran, at least for our applications.) Unpack the installation package and run the installation script "install.sh". Make sure you install the compilers into a directory that is exported to the compute nodes, such as "/usr/local/intel", because the nodes will need access to the libraries. The installer nonetheless puts the license file under "/opt/intel", where the compute nodes cannot see it. Therefore, copy the license file to the folder "/usr/local/intel/compilers_and_libraries_2016.0.150/linux/licenses/" or the corresponding folder for your compiler version. If you have a floating license, you also need to install the Intel flexlm license manager. Download the installation package and copy it to the staging directory. Unpack the package and run the installation script. To start flexlm automatically, add the following line to the file "/etc/rc.d/rc.local"
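The precise line depends on the flexlm install location; a typical lmgrd start line looks like the following, where the license file name and log path are assumptions:

```
<server-install-dir>/lmgrd -c <server-install-dir>/license.lic -l /var/log/lmgrd.log
```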
where <server-install-dir> is the full path to the flexlm directory. Do not forget to make "rc.local" executable. Alternatively, you could write a proper systemd service file for flexlm. Finally, add the following line to the user's ".cshrc"
to set the path and environment variables. The corresponding line for ".bashrc" is
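For Parallel Studio XE 2016, Intel ships environment scripts that set PATH and the library variables; assuming the install prefix "/usr/local/intel" used above, the lines would be:

```
# ~/.cshrc
source /usr/local/intel/bin/compilervars.csh intel64
# ~/.bashrc
source /usr/local/intel/bin/compilervars.sh intel64
```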
Torque resource manager and Maui scheduler

Detailed installation instructions can be found in the Torque Administrator Guide (pdf version); here we just give a brief summary. Install prerequisite packages:
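The exact list depends on the Torque version; for Torque on SL7 something like the following is typically needed (the package list is partly an assumption):

```shell
yum install gcc gcc-c++ make libtool openssl-devel libxml2-devel boost-devel
```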
Download the source code of Torque from www.adaptivecomputing.com. Unpack the tar ball into a directory. In this directory, run the commands
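With default options this is the usual autotools sequence; site-specific configure switches (for example the default server name) may be desirable, see ./configure --help:

```shell
./configure
make
make install
```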
By default, the dynamic linker does not know where to find the Torque libraries. Therefore type
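One way to register the library directory with the dynamic linker, assuming the default install prefix /usr/local:

```shell
echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
ldconfig
```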
Copy the systemd service files into the directory "/usr/lib/systemd/system":
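Recent Torque source trees ship example unit files under contrib/systemd (file names may differ between versions):

```shell
cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
cp contrib/systemd/pbs_mom.service /usr/lib/systemd/system/
```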
Start the authentication daemon and enable its automatic start at boot via
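Assuming the trqauthd unit file installed above:

```shell
systemctl start trqauthd
systemctl enable trqauthd
```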
Now initialize the Torque server by executing from the build directory
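The Torque source tree contains a setup script for this; run it as root from the build directory:

```shell
./torque.setup <user>
```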
<user> becomes a manager and operator of Torque.
Start the Torque server and enable its automatic start at boot via
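That is:

```shell
systemctl start pbs_server
systemctl enable pbs_server
```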
Create and configure the desired queues, for example a queue "qsNormal".
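Queues are configured with qmgr; a minimal sketch for an execution queue named "qsNormal" (any attributes beyond enabling and starting the queue are site-specific):

```shell
qmgr -c "create queue qsNormal queue_type=execution"
qmgr -c "set queue qsNormal enabled=true"
qmgr -c "set queue qsNormal started=true"
qmgr -c "set server default_queue=qsNormal"
```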
Further configuration parameters are found in the pbs*.service files in the folder /usr/lib/systemd/system (such as the stacksize limit for processes spawned by pbs_mom). Torque will also need to be configured on the compute nodes; the required steps are discussed in "Nodes". To install the Maui scheduler, download the Maui source from adaptivecomputing.com. (You will need to fill out a free registration to get access.) Unpack the tar ball into a directory. In this directory, run the commands
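A typical Maui build, assuming Torque was installed under /usr/local as above (the configure switches are assumptions; check ./configure --help):

```shell
./configure --prefix=/usr/local/maui --with-pbs=/usr/local
make
make install
```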
Add "/usr/local/maui/bin" to the user's path (in the user's ".cshrc" and ".bashrc"). To start Maui at boot, add the line "/usr/local/maui/sbin/maui" to the file "/etc/rc.d/rc.local". (Alternatively, write a proper systemd service file for Maui.)

OpenMPI

Download the source code of OpenMPI 1.10 from http://www.open-mpi.org/software/ompi/v1.10/. Unpack the tar ball into a directory. In this directory, run the commands
Here, the first four options specify the use of the Intel compilers, the "--prefix" switch sets the installation directory, and the switch "--disable-dlopen" disables the use of modules (this reduces the file system traffic when starting large jobs).
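Putting the options described above together, the build sequence would look like:

```shell
./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/usr/local --disable-dlopen
make
make install
```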
Make sure "/usr/local/bin" is in the user's path and "/usr/local/lib" is in the environment variable LD_LIBRARY_PATH. Note: There is a conflict between Open MPI and the Intel MPI library installed as part of Parallel Studio. (Even if you do not order Intel's Cluster Edition, part of Intel's MPI software gets installed.) Rename "/usr/local/intel/compilers_and_libraries_2016.0.150/linux/mpi" to "/usr/local/intel/compilers_and_libraries_2016.0.150/linux/mpi_renamed". Otherwise the wrong libraries and the wrong "mpirun" may be called.
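The rename itself (adjust the version-specific directory to your installation):

```shell
mv /usr/local/intel/compilers_and_libraries_2016.0.150/linux/mpi \
   /usr/local/intel/compilers_and_libraries_2016.0.150/linux/mpi_renamed
```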