System Installation¶
All RPM packages are built on the tara-c-060 node, which was provisioned with the tara-build-centos7 image.
Build Order¶
- munge
- OpenUCX
- PMIx
- Slurm
- Lmod/EasyBuild
Common Tools¶
We will use rpm-build for building RPM packages from source code and wget for downloading the code.
$ yum install rpm-build wget
MUNGE¶
Building MUNGE RPMs¶
Download the latest version of MUNGE
$ wget https://github.com/dun/munge/releases/download/munge-0.5.13/munge-0.5.13.tar.xz
Note
The EPEL repository does not contain the latest version of the munge package.
Install MUNGE dependencies
$ yum install gcc bzip2-devel openssl-devel zlib-devel
Build RPM package from MUNGE source.
$ rpmbuild -tb --clean munge-0.5.13.tar.xz
Create a MUNGE directory on the parallel file system and move the RPM files there.
$ mkdir -p /utils/munge
$ mv rpmbuild/ /utils/munge/
Install and start MUNGE¶
Generate munge.key. This only needs to be done once.
$ dd if=/dev/urandom bs=1 count=1024 > /utils/munge/munge.key
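Because the key is a shared secret, it may be worth creating it with restrictive permissions from the start and refusing to overwrite an existing key. The following is a minimal sketch under those assumptions; the helper name generate_munge_key is hypothetical and not part of MUNGE.

```shell
#!/bin/sh
# Hypothetical helper: create a 1024-byte random key readable only by its owner.
generate_munge_key() {
    key_path="$1"
    # Refuse to overwrite an existing key: it only needs to be generated once.
    if [ -e "$key_path" ]; then
        echo "key already exists: $key_path" >&2
        return 1
    fi
    # Run in a subshell so the restrictive umask does not leak to the caller.
    (
        umask 077
        dd if=/dev/urandom of="$key_path" bs=1 count=1024 2>/dev/null
    ) || return 1
    chmod 400 "$key_path"
}
```

It could be called as `generate_munge_key /utils/munge/munge.key` in place of the plain dd command.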
Create munge user and group.
$ groupadd munge -g 2000
$ useradd --system munge -u 2000 -g munge -s /sbin/nologin --no-create-home
Install MUNGE from RPM.
$ rpm -ivh /utils/munge/rpmbuild/RPMS/x86_64/munge-0.5.13-1.el7.x86_64.rpm \
/utils/munge/rpmbuild/RPMS/x86_64/munge-libs-0.5.13-1.el7.x86_64.rpm \
/utils/munge/rpmbuild/RPMS/x86_64/munge-devel-0.5.13-1.el7.x86_64.rpm
Create the local MUNGE directory and copy munge.key into it. Ownership is set after the copy so that the key also belongs to the munge user.
$ mkdir -p /etc/munge/
$ cp /utils/munge/munge.key /etc/munge
$ chown -R 2000:2000 /etc/munge/
$ chmod 500 /etc/munge/
$ chmod 400 /etc/munge/munge.key
Start MUNGE service
$ systemctl enable munge
$ systemctl start munge
$ systemctl status munge
Testing MUNGE installation
$ munge -n
$ munge -n | unmunge
$ munge -n | ssh <host> unmunge
$ remunge
Note
By default the MUNGE daemon runs with two threads, but a higher thread count can improve its throughput. For high-throughput support, the MUNGE daemon should be started with ten threads.
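This page does not record how the thread count was set on this cluster. One possible way is a systemd drop-in that overrides the start command with munged's --num-threads option. The sketch below assumes the packaged unit runs /usr/sbin/munged with no other options; the helper name and the drop-in file name threads.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that starts munged with N threads.
# Assumes the packaged unit runs /usr/sbin/munged; adjust if yours differs.
write_munge_threads_dropin() {
    dropin_dir="${1:-/etc/systemd/system/munge.service.d}"
    threads="${2:-10}"
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the packaged command before setting the new one.
    cat > "$dropin_dir/threads.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/munged --num-threads=$threads
EOF
}
```

After writing the drop-in, `systemctl daemon-reload` followed by `systemctl restart munge` would be needed for it to take effect.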
OpenUCX¶
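Download a UCX release from https://github.com/openucx/ucx/releases (version 1.4.0 is used below).
Install UCX dependencies
$ yum install numactl numactl-libs numactl-devel
Add the CUDA libraries to the library path
$ export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Configure the source tree and build the RPM package with CUDA support
$ ./contrib/configure-release --prefix=$PWD/install --with-cuda=/usr/local/cuda/
$ rpmbuild -bb --define "configure_options --enable-optimizations --with-cuda=/usr/local/cuda" ucx-1.4.0/ucx.spec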
PMIx¶
Build PMIx RPM Package¶
Install PMIx dependencies
$ yum install libtool libevent-devel
Download the latest stable version of PMIx
$ wget https://github.com/pmix/pmix/releases/download/v3.0.2/pmix-3.0.2.tar.bz2
Build PMIx package from PMIx source.
$ ./configure --with-munge=/usr --with-munge-libdir=/usr
$ rpmbuild -tb --clean --define "configure_options --with-munge=/usr" pmix-3.0.2.tar.bz2
Note
The PMIx configure script appears to support C11 features, but they require GCC 4.9 or later.
Create a PMIx directory on the parallel file system and move the RPM files there.
$ mkdir -p /utils/pmix
$ mv rpmbuild/ /utils/pmix/
Install PMIx from RPM.
$ rpm -ivh /utils/pmix/rpmbuild/RPMS/x86_64/pmix-3.0.2-1.el7.x86_64.rpm
Checking PMIx installation
$ grep PMIX_VERSION /usr/include/pmix_version.h
#define PMIX_VERSION_MAJOR 3L
#define PMIX_VERSION_MINOR 0L
#define PMIX_VERSION_RELEASE 2L
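When the check is scripted, for example to compare the installed version across nodes, the three macros can be collapsed into a single version string. This is a minimal sketch that assumes the header uses the `#define NAME <number>L` layout shown above; the helper name pmix_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.RELEASE from a pmix_version.h file.
pmix_version() {
    header="${1:-/usr/include/pmix_version.h}"
    # Each macro line looks like "#define PMIX_VERSION_MAJOR 3L"; strip the L suffix.
    awk '
        $1 == "#define" && $2 == "PMIX_VERSION_MAJOR"   { sub(/L$/, "", $3); major = $3 }
        $1 == "#define" && $2 == "PMIX_VERSION_MINOR"   { sub(/L$/, "", $3); minor = $3 }
        $1 == "#define" && $2 == "PMIX_VERSION_RELEASE" { sub(/L$/, "", $3); release = $3 }
        END {
            # Fail if any of the three macros was not found.
            if (major == "" || minor == "" || release == "") exit 1
            print major "." minor "." release
        }
    ' "$header"
}
```

With the header shown above, `pmix_version` prints 3.0.2.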
Slurm¶
Build SLURM RPM Package¶
Install the dependencies of SLURM and its plugins (see Plugins Dependencies)
$ yum install readline-devel perl-ExtUtils-MakeMaker pam-devel hwloc-devel freeipmi-devel lua-devel mysql-devel libssh2-devel
Download the latest stable version of SLURM
$ wget https://download.schedmd.com/slurm/slurm-18.08.3.tar.bz2
Build SLURM package from SLURM source with PMIx.
$ rpmbuild -tb --clean slurm-18.08.3.tar.bz2
$ rpmbuild -bb --clean --define "configure_options --with-ucx" slurm.spec
Create a SLURM directory on the parallel file system and move the RPM files there.
$ mkdir -p /utils/slurm
$ mv rpmbuild/ /utils/slurm/
Install Slurm¶
Create slurm user and group.
$ groupadd slurm -g 2001
$ useradd --system slurm -u 2001 -g slurm -s /sbin/nologin --no-create-home
Install the dependencies of SLURM and its plugins (see Plugins Dependencies)
$ yum install readline-devel perl-ExtUtils-MakeMaker pam-devel hwloc-devel freeipmi-devel lua-devel mysql-devel libssh2-devel
Frontend¶
Install slurm from RPM packages.
$ rpm -ivh /utils/slurm/rpmbuild/RPMS/x86_64/slurm-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-perlapi-18.08.3-1.el7.x86_64.rpm
Set up the firewall
$ firewall-cmd --add-port 60001-63000/tcp --permanent
$ firewall-cmd --reload
$ iptables -nL
Slurmctld¶
Install slurmctld from RPM packages.
$ rpm -ivh /utils/slurm/rpmbuild/RPMS/x86_64/slurm-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-slurmctld-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-perlapi-18.08.3-1.el7.x86_64.rpm
Create the required directories
$ mkdir -p /var/log/slurm/ /var/run/slurm/ /var/spool/slurm/
$ chown slurm:slurm /var/log/slurm/
$ chown slurm:slurm /var/run/slurm/
$ chown slurm:slurm /var/spool/slurm/
Set up the firewall
$ firewall-cmd --add-port 6817/tcp --permanent
$ firewall-cmd --add-port 60001-63000/tcp --permanent
$ firewall-cmd --reload
$ iptables -nL
Edit the PIDFile setting in /usr/lib/systemd/system/slurmctld.service so that it matches the location set in slurm.conf (current setting: /var/run/slurm/slurmctld.pid). The following command can be used for the edit.
$ sed -i -e 's@PIDFile=/var/run/slurmctld.pid@PIDFile=/var/run/slurm/slurmctld.pid@g' /usr/lib/systemd/system/slurmctld.service
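The sed command above only matches while the unit file still contains the stock path. A slightly more defensive sketch replaces whatever PIDFile= value is present and then confirms the result. The helper name set_pidfile is hypothetical, and GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Hypothetical helper: point a unit file's PIDFile at a new location and confirm it.
set_pidfile() {
    unit_file="$1"
    new_pid="$2"
    # Replace whatever PIDFile= currently holds, not just one known old value.
    sed -i -e "s@^PIDFile=.*@PIDFile=$new_pid@" "$unit_file" || return 1
    # Succeed only if the expected line is now present.
    grep -qxF "PIDFile=$new_pid" "$unit_file"
}
```

For example, `set_pidfile /usr/lib/systemd/system/slurmctld.service /var/run/slurm/slurmctld.pid && systemctl daemon-reload` applies the change and reloads systemd only if the edit succeeded. The same helper would work for the slurmdbd and slurmd unit files.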
Create slurmctld.conf in /usr/lib/tmpfiles.d/ with the following content
d /var/run/slurm 0755 slurm slurm -
Start the slurmctld service
$ systemctl enable slurmctld
$ systemctl start slurmctld
$ systemctl status slurmctld
Note
slurmctld receives SIGTERM after the first setup. The problem was solved by editing the PIDFile setting in the .service file and running systemctl daemon-reload.
SlurmDBD¶
Install slurmdbd from RPM packages.
$ rpm -ivh /utils/slurm/rpmbuild/RPMS/x86_64/slurm-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-slurmdbd-18.08.3-1.el7.x86_64.rpm
Create the required directories
$ mkdir -p /var/log/slurm/ /var/run/slurm/
$ chown slurm:slurm /var/log/slurm/
$ chown slurm:slurm /var/run/slurm/
Create slurmdbd.conf in /usr/lib/tmpfiles.d/ with the following content
d /var/run/slurm 0755 slurm slurm -
Configure MySQL
The following SQL statements create the database slurm_acct_db and the user slurmdbd, and grant all privileges on that database to the slurmdbd user.
CREATE DATABASE slurm_acct_db;
create user 'slurmdbd'@'<slurmdbd_IP>' identified by '<password>';
grant all on slurm_acct_db.* TO 'slurmdbd'@'<slurmdbd_IP>';
Edit the PIDFile setting in /usr/lib/systemd/system/slurmdbd.service so that it matches the location set in slurmdbd.conf (current setting: /var/run/slurm/slurmdbd.pid). The following command can be used for the edit.
$ sed -i -e 's@PIDFile=/var/run/slurmdbd.pid@PIDFile=/var/run/slurm/slurmdbd.pid@g' /usr/lib/systemd/system/slurmdbd.service
Set up the firewall
$ firewall-cmd --add-port 6819/tcp --permanent
$ firewall-cmd --reload
Start the slurmdbd service
$ systemctl enable slurmdbd
$ systemctl start slurmdbd
$ systemctl status slurmdbd
Note
slurmdbd receives SIGTERM after the first setup. The problem was solved by editing the PIDFile setting in the .service file and running systemctl daemon-reload.
Slurmd¶
Install slurmd from RPM packages.
$ rpm -ivh /utils/slurm/rpmbuild/RPMS/x86_64/slurm-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-slurmd-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-perlapi-18.08.3-1.el7.x86_64.rpm \
/utils/slurm/rpmbuild/RPMS/x86_64/slurm-pam_slurm-18.08.3-1.el7.x86_64.rpm
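The install commands for the frontend, slurmctld, slurmdbd and slurmd nodes differ only in the sub-packages they list. As a convenience, the paths could be generated from the version and the sub-package names. This is only a sketch: the helper name slurm_rpm_paths and the SLURM_RPM_DIR override are hypothetical, and the file naming is copied from the commands in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the RPM paths for the base slurm package plus the
# named sub-packages, following the file naming used in this guide.
slurm_rpm_paths() {
    version="$1"
    shift
    rpm_dir="${SLURM_RPM_DIR:-/utils/slurm/rpmbuild/RPMS/x86_64}"
    suffix="$version-1.el7.x86_64.rpm"
    # The base package is always needed.
    printf '%s/slurm-%s\n' "$rpm_dir" "$suffix"
    # One line per requested sub-package, e.g. slurmd, perlapi, pam_slurm.
    for sub in "$@"; do
        printf '%s/slurm-%s-%s\n' "$rpm_dir" "$sub" "$suffix"
    done
}
```

On a compute node this could be used as `rpm -ivh $(slurm_rpm_paths 18.08.3 slurmd perlapi pam_slurm)`, which expands to the same four files as the command above.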
Set up the firewall
$ firewall-cmd --add-port 6818/tcp --permanent
$ firewall-cmd --add-port 60001-63000/tcp --permanent
$ firewall-cmd --reload
Create the required directories
$ mkdir -p /var/log/slurm/ /var/run/slurm/ /var/spool/slurm/
$ chown slurm:slurm /var/log/slurm/
$ chown slurm:slurm /var/run/slurm/
$ chown slurm:slurm /var/spool/slurm/
Create slurmd.conf in /usr/lib/tmpfiles.d/ with the following content
d /var/run/slurm 0755 slurm slurm -
Edit the PIDFile setting in /usr/lib/systemd/system/slurmd.service so that it matches the location set in slurm.conf (current setting: /var/run/slurm/slurmd.pid). The following command can be used for the edit.
$ sed -i -e 's@PIDFile=/var/run/slurmd.pid@PIDFile=/var/run/slurm/slurmd.pid@g' /usr/lib/systemd/system/slurmd.service
Start the slurmd service
$ systemctl enable slurmd
$ systemctl start slurmd
$ systemctl status slurmd
Note
slurmd receives SIGTERM after the first setup. The problem was solved by editing the PIDFile setting in the .service file and running systemctl daemon-reload.
Bring the nodes to the idle state using scontrol. For example,
$ scontrol update NodeName=tara-c-00[1-6] State=DOWN Reason="undraining"
$ scontrol update NodeName=tara-c-00[1-6] State=RESUME
Installing nhc¶
Download RPM package
$ wget https://github.com/mej/nhc/releases/download/1.4.2/lbnl-nhc-1.4.2-1.el7.noarch.rpm
Install nhc package
$ rpm -ivh /utils/nhc/lbnl-nhc-1.4.2-1.el7.noarch.rpm
PAM Setup¶
In /etc/pam.d/sshd, after the "password include password-auth" line, add
account sufficient pam_slurm_adopt.so
account required pam_access.so
In the pam_access configuration file (/etc/security/access.conf), add
+:root:ALL
-:ALL:ALL
To guarantee that the Slurm services start after the NFS mount is available, change the After= line in /usr/lib/systemd/system/slurmd.service from
After=munge.service network.target remote-fs.target
to
After=munge.service network.target remote-fs.target etc-slurm.mount
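The unit name etc-slurm.mount presumably refers to a file system mounted at /etc/slurm, since systemd names mount units after their mount point. The sketch below shows that mapping for simple paths only; the helper name mount_unit_name is hypothetical, and paths containing other characters need the full escaping done by `systemd-escape --path --suffix=mount`.

```shell
#!/bin/sh
# Hypothetical helper: derive the systemd mount unit name for a simple mount point.
# Only handles absolute paths made of letters, digits, "_" and "/"; anything else
# needs the escaping done by `systemd-escape --path --suffix=mount`.
mount_unit_name() {
    path="$1"
    # Reject relative paths.
    case "$path" in
        /*) ;;
        *) return 1 ;;
    esac
    # Reject characters this simplified mapping does not handle.
    case "$path" in
        *[!A-Za-z0-9_/]*) return 1 ;;
    esac
    # Drop leading and trailing slashes, then turn the remaining ones into dashes.
    trimmed=$(printf '%s' "$path" | sed -e 's@^/*@@' -e 's@/*$@@' -e 's@//*@-@g')
    # The root file system is the special case "-.mount".
    [ -n "$trimmed" ] || trimmed="-"
    printf '%s.mount\n' "$trimmed"
}
```

For the mount point assumed here, `mount_unit_name /etc/slurm` prints etc-slurm.mount, which is the name added to the After= line.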
Lmod and EasyBuild¶
EasyBuild¶
Create modules group and user with a home-directory on a shared filesystem
$ groupadd modules -g 2002
$ useradd -m -c "Modules user" -d /utils/modules -u 2002 -g modules -s /bin/bash modules
Configure the environment variable for bootstrapping EasyBuild
$ export EASYBUILD_PREFIX=/utils/modules
Download EasyBuild bootstrap script
$ wget https://raw.githubusercontent.com/easybuilders/easybuild-framework/develop/easybuild/scripts/bootstrap_eb.py
Execute bootstrap_eb.py
$ python bootstrap_eb.py $EASYBUILD_PREFIX
Update $MODULEPATH
$ export MODULEPATH="/utils/modules/modules/all:$MODULEPATH"
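Running the export more than once, or with an empty $MODULEPATH, leaves duplicate entries or a trailing colon. A minimal sketch of an idempotent variant follows; the helper name prepend_modulepath is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to MODULEPATH unless it is already listed.
prepend_modulepath() {
    dir="$1"
    # Wrap the list in colons so that every entry can be matched as ":entry:".
    case ":${MODULEPATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) MODULEPATH="$dir${MODULEPATH:+:$MODULEPATH}" ;;
    esac
    export MODULEPATH
}
```

Calling `prepend_modulepath /utils/modules/modules/all` has the same effect as the export above on the first call and does nothing on later calls.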
Test EasyBuild
$ module load EasyBuild
$ eb --version
# OPTIONAL Unittest
$ export TEST_EASYBUILD_MODULES_TOOL=Lmod
$ python -m test.framework.suite
Enable access for all users by changing the permissions of /utils/modules/
$ chmod a+rx /utils/modules
Add z01_EasyBuild.sh to /etc/profile.d/ with the following content
if [ -z "$__Init_Default_Modules" ]; then
    export __Init_Default_Modules=1
    export EASYBUILD_MODULES_TOOL=Lmod
    export EASYBUILD_PREFIX=/utils/modules
    module use $EASYBUILD_PREFIX/modules/all
else
    module refresh
fi
EasyBuild robot path.
/utils/modules/software/EasyBuild/3.7.1/lib/python2.7/site-packages/easybuild_easyconfigs-3.7.1-py2.7.egg/easybuild/easyconfigs
Setup Lmod on other nodes¶
Install Lmod
$ yum install lmod
Add z01_EasyBuild.sh to /etc/profile.d/ with the following content
if [ -z "$__Init_Default_Modules" ]; then
    export __Init_Default_Modules=1
    export EASYBUILD_MODULES_TOOL=Lmod
    export EASYBUILD_PREFIX=/utils/modules
    module use $EASYBUILD_PREFIX/modules/all
else
    module refresh
fi
Intel License Manager¶
https://software.intel.com/en-us/articles/intel-software-license-manager-users-guide