====== HPC Cluster @ DCMC ======

The cluster consists of servers and blades organized in 3 racks, physically located in the Cluster Room and the Datacenter Room at the Mancinelli site. The main hardware components come from the former Labs cluster, from new acquisitions, and from one blade of the Mathematics Dept. datacenter donated by a research group of DICA.

===== Hardware =====

^ Server or Blade ^ Nodes and processors ^ Cores per processor ^ RAM (GB) ^ Local storage ^ Network interfaces ^
| **Blade Gandalf** | 6 Dell nodes with 2 Intel Xeon | 4 | 8 | 1 HD SAS 73GB | 1Gbit Ethernet for Management and Data |
| **Blade Legolas** | 16 Hp nodes with 2 Intel Xeon | 4 | 24 | 1 HD SAS 73GB | 1Gbit Ethernet for Management and Data |
| **Blade Merlino** | 9 Dell nodes with 2 AMD Opteron | 4 | 16 | 1 HD SATA 80GB | 1Gbit Ethernet for Management, 10Gbit Ethernet for Data |
| **Blade Morgana** | 11 Dell nodes with 2 Intel Xeon | 4 | 24 | 2 HD SAS 146GB Raid0 | 1Gbit Ethernet for Management, 20Gbit Mellanox Infiniband for MPI, 10Gbit Ethernet for Data |
| **Covenant** | 1 Hp node with 4 Intel Xeon, 2 Dell nodes with 2 Intel Xeon | 10, 20 | 256, 320 | 2 HD SAS 1TB Raid0 | 1Gbit Ethernet for Management, 10Gbit Ethernet for Data |
| **Masternode** | 1 Dell node with 2 Intel Xeon | 4 | 24 | 2 HD SAS 1TB Raid1 for OS, 4 HD SAS 2TB Raid5 for Scratch, 35TB Storage for Home | 1Gbit Ethernet for Management, 1Gbit Ethernet for Frontend/login, 1Gbit Ethernet for node consoles (iDRAC, iLO), 2x 10Gbit Ethernet for Data, Fibre Channel 8Gbps for Storage, 1x 10Gbit Mellanox Infiniband for Infiniband control |
| **GPU nodes** | 5 Dell T630/T640 nodes with 2 Intel Xeon and 1 or 2 nVidia GPUs | 8 | 32/64 | 1 HD SAS 1TB | 1Gbit Ethernet for Management, 1Gbit Ethernet for Data |

The Masternode provides:
  - A file system **/opt/ohpc/pub/** on local Raid storage for installed applications, shared via NFS over the Gbit network
  - A file system **/homes** on the Dept. storage (up to 35 TB) for user homes, shared via NFS over the Gbit or 10Gbit network (Morgana nodes have 1 Gbit of guaranteed bandwidth thanks to their 10Gbit Ethernet hardware)
  - A file system **/scratch** on local Raid storage (up to 5.5 TB) for scratch space, shared via NFS over the Gbit or 10Gbit network (Morgana nodes have 1 Gbit of guaranteed bandwidth thanks to their 10Gbit Ethernet hardware); see the sketch after this list for a typical usage pattern
  - Login for users via SSH and via a Web SSH interface
  - Node provisioning software (installation of new nodes and node rebuild in less than 5 minutes)
  - PBSPro master for scheduling and resource management
  - Support for the Singularity container technology
  - Infiniband network controller (via software)
  - Application software for the compute nodes
  - Monitoring and utility software: Ganglia, Nagios
  - Documentation website: [[http://masternode.chem.polimi.it|http://masternode.chem.polimi.it]], accessible only from the wired DCMC network or via the DCMC VPN
  - Web SSH interface: [[https://masternode.chem.polimi.it/webssh|https://masternode.chem.polimi.it/webssh]], accessible only from the wired DCMC network or via the DCMC VPN
  - Web interface for PBSPro submission: planned for 2019
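As a quick illustration of how the shared file systems are meant to be used together, here is a minimal Python 3 sketch of a job that keeps large intermediate files on **/scratch** and stores only the final result in the home directory. The per-user path under ''/scratch'' and the file names are hypothetical and shown only for illustration; check the usage guides linked at the bottom of this page for the actual policies.

<code python>
import os
import shutil
import getpass

import numpy as np  # NumPy is part of the installed Python stack

# Hypothetical per-user work area under /scratch; the real convention on the
# cluster (e.g. a PBS-created job directory) may differ.
work_dir = os.path.join("/scratch", getpass.getuser(), "example_run")
os.makedirs(work_dir, exist_ok=True)

# Keep large intermediate data on the scratch file system ...
intermediate = os.path.join(work_dir, "intermediate.npy")
np.save(intermediate, np.random.rand(2000, 2000))

# ... and store only the (smaller) final result in the NFS-mounted home.
result = np.load(intermediate).mean(axis=0)
np.save(os.path.join(os.path.expanduser("~"), "example_result.npy"), result)

# Free the scratch space once the job is finished.
shutil.rmtree(work_dir)
</code>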
===== Software =====

The cluster has been installed following the guidelines of the **OpenHPC Project**, an open source project backed by Intel and many HPC software and hardware players, and supported by a strong developer community. The main reasons for this choice are the project's commitment to keep the software platform up to date as its individual components evolve (OS, compilers, MPI distributions, toolchain components, utility software, hardware support, container technology) and its independence from any specific vendor or technology.

The installed software is divided into management and operations software and application software. At the cluster startup on April 1st, 2019, the installed software is:

==== Management and operations software ====

  - **OpenHPC 1.3.6** (October 2018) for the Linux CentOS 7.5 node image
  - **PBS Pro 18.1** (Altair released PBSPro as open source in mid 2016)
  - **Compilers:** **GCC 5.4.0**, **GCC 7.3.0**, **GCC 8.2.0**, **Intel C++/Fortran/MPI 19.0.3.199** (ver. 2019 update 3)
  - **MPI:** **OpenMPI 1.10.7**, **OpenMPI 3.1.2**, **Intel MPI 2019.3**, **MPICH 3.2.1**, **MVAPICH2 2.2**, **MVAPICH2 2.3**
  - **Ganglia** for monitoring cluster performance (public access)
  - **Nagios** for monitoring node health status (restricted access)

==== Application software ====

  - **Abaqus 2017**
  - **Abaqus 2018**
  - **Abaqus 2019**
  - **Ansys 19.3 2019 R1** (Fluent and LSDYNA)
  - **Comsol 5.4**
  - **Matlab R2019a** with parallel support
  - **Python 2.x** stack with Scipy, Numpy, Mpi4Py
  - **Python 3.4** stack with Scipy, Numpy, Mpi4Py (see the MPI example at the end of this page)
  - **AdIOS**
  - **HDF5** and **pHDF5**
  - **NetCDF** and **pNetCDF**
  - **SIONLIB**
  - **R 3.5**
  - **Trilinos 12.12.1**

And many more libraries and tools for parallel and scalar scientific programming.

===== Usage guides =====

[[Queues and Resources|Queues and access to the cluster]]\\
[[pbs_jobfile_structure|PBS jobfile structure]]\\
[[Modules|Modules]]
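As referenced in the application software list above, here is a minimal sketch of an MPI-parallel Python script using the Mpi4Py stack. It is only an illustration: the script name ''hello_mpi.py'' is hypothetical, and the actual way to launch it (typically ''mpirun''/''mpiexec'' inside a PBS job, after loading the appropriate modules) depends on the chosen MPI distribution and queue; see the PBS jobfile structure and Modules pages above.

<code python>
# hello_mpi.py - illustrative only; launch with mpirun/mpiexec from a PBS job,
# e.g. "mpirun python3 hello_mpi.py" after loading the appropriate modules.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Every MPI process reports the node it runs on; rank 0 prints the summary.
names = comm.gather(MPI.Get_processor_name(), root=0)

if rank == 0:
    print("Job is running on %d MPI processes" % size)
    for r, name in enumerate(names):
        print("  rank %d -> %s" % (r, name))
</code>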