
COMSOL 5.3

Running COMSOL in parallel

COMSOL can run a job on many cores in parallel (shared-memory processing, or multithreading) and on many physical nodes (distributed computing through MPI). A good strategy is to combine both modes to maximize the benefit of parallelization: request several cores on a certain number of nodes to exploit COMSOL's parallelization features. Cluster computing requires a floating network license (provided by POLIMI).
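As a minimal sketch of the two levels (file names are placeholders, and on our cluster the node-related options are normally supplied by PBS, as explained below), -nn controls the number of nodes (distributed memory) and -np the number of cores per node (shared memory):

# Illustrative only: 4 nodes (MPI) x 8 cores per node (multithreading)
comsol batch -nn 4 -np 8 -inputfile model.mph -outputfile model_out.mph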

Four ways to run a cluster job

  1. Submit cluster-enabled batch job via PBS script - Requires completed and saved model mph file
  2. Branch off cluster-enabled batch jobs from the COMSOL GUI process started on the masternode - Allows model work in the GUI and batch job submission to PBS from within the GUI; little command-line proficiency is needed; with the Cluster Sweep feature it is possible to submit a single batch job from the COMSOL GUI and continue working in the GUI while the cluster job is computing in the background.
  3. Start a cluster-enabled COMSOL desktop GUI on the masternode and work interactively with cluster jobs
  4. Start the COMSOL Desktop GUI as a client on a local PC or Mac and connect to a cluster-enabled COMSOL server on the masternode and work interactively

Currently, only methods 1 and 2 are supported on our cluster.

Environment and Documentation

The command to set the Comsol environment is: module load comsol/5.3.0

The link to the documentation is: http://masternode.chem.polimi.it/comsol53
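A quick check that the environment is set correctly (the exact paths and output depend on the installation):

module load comsol/5.3.0
which comsol     # should point to the COMSOL 5.3 installation
comsol -h        # lists the available command-line options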

BLAS Library

A large portion of the computational engine in COMSOL Multiphysics relies on BLAS. Included with COMSOL Multiphysics is the MKL (Math Kernel Library) BLAS library, which is the default. For AMD processors, COMSOL Multiphysics also includes the ACML (AMD Core Math Library) BLAS library, optimized for AMD processors with SSE2 support, which may improve performance in some cases. On request, it is also possible to supply another BLAS library optimized for specific hardware.
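If needed, the BLAS library can also be selected explicitly on the command line. A hedged sketch (file names and paths are placeholders; check the COMSOL command-line help for the exact option syntax of your installation):

comsol batch -blas acml -inputfile model.mph -outputfile out.mph                                  # use ACML on AMD processors
comsol batch -blas path -blaspath /path/to/libblas.so -inputfile model.mph -outputfile out.mph    # use a custom BLAS library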

Comsol in Distributed Memory

This is the case when COMSOL runs on more than one compute node, relying on MPI. In COMSOL jargon, compute nodes and hosts can be different: hosts are the physical nodes on which the compute nodes run. In our cluster, compute node and host are synonyms.

The MUMPS and SPOOLES solvers are supported in distributed memory, while PARDISO is not. PARDISO is replaced by MUMPS or, if the check box Parallel Direct Sparse Solver for Clusters is selected, by the Intel MKL Parallel Direct Sparse Solver for Clusters.

An additional benefit is that the memory usage per node is lower than when COMSOL Multiphysics is run in nondistributed mode. Therefore, if you run COMSOL Multiphysics in distributed mode over several compute nodes, you can solve a larger problem than in nondistributed mode.

Comsol and MPI

COMSOL ships with its own Intel MPI library (the default) and also supports other system-installed MPI implementations based on MPICH2 through command-line options. By default COMSOL uses the Hydra process manager to initialize the MPI environment and start parallel jobs; Hydra is designed to work natively with PBS and SSH and is part of the MPICH-based MPI implementations, including Intel MPI.

See Using the Hydra Process Manager and the Hydra Process Management Framework on the MPICH website.

Hydra is more scalable than MPD, the older process manager, which can still be invoked using the option -mpd; however, MPD is not supported on our cluster.

Comsol command line

The option -clustersimple, together with the batch command, sets up the COMSOL Intel MPI environment (translating it into Hydra commands transparently) and lets Intel MPI correctly detect the environment.

The options -nn <number of nodes> (the number of compute nodes allocated for this job) and -f <filename> (the list of hostnames of the allocated nodes) are passed to COMSOL directly by PBS, as you can see in the jobfile below, so there is no need to specify them.

The options -inputfile <input-file.mph> and -outputfile <output-file.mph> are needed.

The options -mpiarg -rmk -mpiarg pbs tell the default COMSOL Intel MPI that the scheduler is PBS.

Intel MPI automatically detects the interconnection fabric between nodes (e.g. InfiniBand, Myrinet, Ethernet); the option -mpifabrics fabric1:fabric2, where fabric1 is generally shm and fabric2 is tcp or ofa, can be set to override the automatic detection.
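Putting the options together, a complete command line as it could appear inside a PBS jobfile (file names are placeholders; the -mpifabrics override is optional and shown only for illustration):

comsol -clustersimple batch -mpiarg -rmk -mpiarg pbs -mpifabrics shm:tcp \
       -inputfile model.mph -outputfile model_out.mph -batchlog model.log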

PBS jobfile

#!/bin/bash
#
# Set Job execution shell
#PBS -S /bin/bash
 
# Set Job name: <jobname>=jobcomsol 
#PBS -N jobcomsol
 
# Set the execution queue: <queue name> is one of 
# gandalf, merlino, default, morgana, covenant
#PBS -q <queue name>
 
# Set mail addresses that will receive mail from PBS about job
# Can be a list of addresses separated by commas (,)
#PBS -M <polimi.it or mail.polimi.it email address only>
 
# Set events for mail from PBS about job
#PBS -m abe
 
# Job re-run (yes or no)
#PBS -r n
 
# Set standard output file 
#PBS -o jobcomsol.out
 
# Set standard error file 
#PBS -e jobcomsol.err
 
# Set request for N nodes,C (cores),P mpi processes per node
#PBS -l select=N:ncpus=C:mpiprocs=P
 
# Pass environment to job
#PBS -V
 
# Change to submission directory
cd $PBS_O_WORKDIR
 
# Command to launch the application and its parameters
 
module load comsol/5.3.0
 
export inputfile="<name-of-model-input-mph-file>.mph"
export outputfile="<name-of-output-mph-file>.mph"
 
echo "---------------------------------------------------------------------"
echo  "---Starting job at: `date`"
echo
echo "------Current working directory is `pwd`"
np=$(wc -l < $PBS_NODEFILE)
echo "------Running on ${np} processes (cores) on the following nodes:"
cat $PBS_NODEFILE
echo "----Parallel comsol run"
comsol -clustersimple batch -mpiarg -rmk -mpiarg pbs \
       -inputfile $inputfile -outputfile $outputfile -batchlog jobcomsol.log
echo "-----job finished at `date`"
echo "---------------------------------------------------------------------"