Queues and Access to the cluster

You can start the execution of a job on one or more nodes of the cluster by creating a jobfile and submitting it to the queue manager and scheduler (PBS). The jobfile is basically a shell script containing:

  1. shell commands
  2. PBS directives that tell the scheduler specific information about start/execution/stop/error behaviour of your job
  3. the command to start the specific application (one of those installed or your compiled code)

Submitting the jobfile to the PBS scheduler is simply a request to allocate the resources you need for a maximum amount of time within which your job is expected to finish. Given the requested resources and the maximum execution time, PBS may schedule your job to start at a later time, depending on the availability of resources on the cluster. In a cluster environment the resources are organized in queues, and every job submitted to a specific queue waits in that queue and starts as soon as the resources it needs become available.
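A minimal jobfile might look like the following sketch. The job name, queue, requested resources and application command are placeholders, and the nodes=...:ppn=... resource syntax assumes a Torque-style PBS; adapt it if the local scheduler expects a different syntax.

  #!/bin/bash
  #PBS -N myjob                   # job name
  #PBS -q gandalf                 # target queue (see the table below)
  #PBS -l nodes=1:ppn=8           # 1 node, 8 cores per node
  #PBS -l walltime=24:00:00       # maximum execution time
  #PBS -j oe                      # merge stdout and stderr into one file

  cd $PBS_O_WORKDIR               # move to the directory the job was submitted from
  ./my_application > output.log   # replace with your installed or compiled code

The jobfile is then submitted with:

  qsub jobfile.pbs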

Queues configured and their resources

The cluster is heterogeneous from the hardware point of view, so the queue configuration is driven by the necessity to keep similar hardware and interconnection technology together.

Queue name     | Max # of nodes                 | Max # of cores (CPU) | Max memory per node (GB) | Memory/CPU (GB/core) | MPI interconnect network | Access restriction
gandalf        | 6 (node0 to node5)             | 8                    | 8                        | 1                    | Ethernet Gbit            | none
merlino        | 9 (node18 to node26)           | 8                    | 16                       | 2                    | Ethernet Gbit            | none
default        | 15 (gandalf + merlino)         | 8                    | 8                        | 1                    | Ethernet Gbit            | none
morgana        | 11 (node6 to node16)           | 8                    | 24                       | 3                    | Infiniband               | yes (based on ACL)
covenant       | 1 (node17) + 2 (node51 to 52)  | 40/40 with HT        | 256/320                  | 6.4/8                | Shared memory            | yes (based on ACL)
polimeri       | 2 (node31 to 32)               | 24                   | 48                       | 2                    | Ethernet Gbit            | yes (based on ACL)
legolas        | 12 (node33 to 44)              | 8                    | 24                       | 1                    | Ethernet Gbit            | yes (based on ACL)
endurance      | 2 (node29, node49)             | 8                    | 32                       | 2                    | Ethernet Gbit            | yes (based on ACL)
raosq          | 5 (node45 to 48, node50)       | 8                    | 24                       | 1                    | Ethernet Gbit            | yes (based on ACL)
minervanichoid | 2 (node27, node30)             | 8                    | 32                       | 2                    | Ethernet Gbit            | yes (based on ACL)
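Which queues you can actually use depends on the access restrictions above. To target a specific queue you pass its name to the submission command; the standard PBS client commands can also be used to inspect the queues and your jobs (a quick sketch, assuming the usual PBS/Torque client tools are available on the masternode):

  qsub -q merlino jobfile.pbs   # submit the jobfile to the merlino queue
  qstat -q                      # list the queues with their limits and job counts
  qstat -u $USER                # list your own jobs and their states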

Access to the cluster and general usage terms

At the moment the masternode is the only login node.

To improve security, login is possible:

  • via SSH/SFTP client with username/private key
  • via Web interface to SSH with username/password (no file transfer, no graphic)

The private key is a file assigned by STI to the user together with the corresponding passphrase and is intended for personal use. It is possible to tunnel X11 connections to get a graphical interface for applications (not available through the Web SSH interface).

Notice for privacy: the usage of the cluster is restricted exclusively to the research and teaching Projects of the Research Groups @ DCMC, so the user's home folder is supposed NOT to contain any personal or reserved file. The user's private key will be shared between the owner and the Group Members in charge of the specific Project, to guarantee control over the resource usage and the resulting data.

Logging in to the compute nodes and running applications there directly would compromise the correct resource management and allocation by the scheduler, therefore it is inhibited by the PBS scheduler with the following rule:

Login to a compute node is possible only for a user who has a job in the running state on that node, and only as long as the job stays in the running state. This feature is needed to check the evolution of a running job through the content of the node's local scratch folder /scratch_local, usually for debugging.
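A typical check might look like the following sketch (the job id 1234 and the node name are placeholders; exec_host is the field reported by the standard qstat -f output):

  qstat -f 1234 | grep exec_host   # find out on which node(s) job 1234 is running
  ssh node23                       # allowed only while your job is running on node23
  ls -l /scratch_local             # inspect the local scratch content used by the job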

It is possible to add a user to a short list of users allowed to log in to compute nodes as an exception to the rule above, but we strongly discourage this. Please send such a request to the support email.

It's possible to activate an interactive session on a compute node via a submission command parameter.
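With the standard PBS client this is the -I flag of qsub; the queue and resource values below are placeholders to adapt to your case:

  qsub -I -q gandalf -l nodes=1:ppn=4,walltime=02:00:00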

Login to the cluster via SSH client

To log in to the cluster you should:

1- Get an account on the cluster (username/password) and the corresponding private key file and passphrase assigned by STI. Usually you obtain them after your registration to the Dept. Intranet, together with the Dept. Network account.

2- Install an SSH/SFTP client on your computer, plus an X server only if you need a graphical connection to the masternode. For Windows PCs we suggest MobaXterm, which includes both the SSH client and the X server; download the portable version (no installation needed, just download and execute) or the installer version at https://mobaxterm.mobatek.net/download-home-edition.html

3- Configure a profile on your SSH client with hostname masternode.chem.polimi.it and your username, and enable the option Tunnel X11 connections

4- If you're connected to a network other than the wired DCMC network (otherwise skip to the next point), e.g. eduroam or the polimi-protected wifi (NOT polimi), you need to access the DCMC VPN with your Dept. Network account (CHKNET account)

5- Connect with the profile you created and enter the passphrase when prompted.
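If you use a plain command-line client instead of a graphical profile, the equivalent connection might look like the sketch below (the key file name cluster_key is a placeholder for the private key file assigned by STI):

  ssh -X -i ~/.ssh/cluster_key username@masternode.chem.polimi.it    # login with X11 tunnelling
  sftp -i ~/.ssh/cluster_key username@masternode.chem.polimi.it      # interactive file transfer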

For any troubleshooting please contact STI by phone or send an email to the HPC Support clusterhpc-dcmc@polimi.it

Login to the cluster via Web interface to SSH

To log in to the cluster via the Web interface you should:

1- Get an account on the cluster (username/password) and the corresponding private key file and passphrase assigned by STI. Usually you obtain them after your registration to the Dept. Intranet, together with the Dept. Network account.

2- If you're connected to a network other than the wired DCMC network (otherwise skip to the next point), e.g. eduroam or the polimi-protected wifi (NOT polimi), you need to access the DCMC VPN with your Dept. Network account (CHKNET account)

3- Connect with a browser to the site https://masternode.chem.polimi.it/webssh and permanently accept the security exception for the self-signed certificate

4- Use your credentials (username and password) to log in at the prompt

Notice that this is a text-only command interface, so you cannot transfer files between your client and the masternode or open graphical windows. Nevertheless it is a quick and useful way to log in and check job execution and errors.
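Typical checks from the Web terminal might look like the following sketch (the job name and id are placeholders; by default PBS writes the job's stdout and stderr to files named after the job in the submission directory):

  qstat -u $USER    # list your jobs and their states (Q = queued, R = running)
  cat myjob.o1234   # standard output of job 1234
  cat myjob.e1234   # standard error of job 1234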

For any troubleshooting please contact STI by phone or send an email to the HPC Support clusterhpc-dcmc@polimi.it
