====== COMSOL 5.3 ======

===== Running COMSOL in parallel =====

COMSOL can run a job on many cores in parallel (shared-memory processing, i.e. multithreading) and on many physical nodes (distributed computing through MPI). A good strategy is to combine both modes to get the most out of parallelization: request several cores on each of a number of nodes, so that COMSOL can use MPI across the nodes and multithreading within each node. Cluster computing requires a floating network license (provided by POLIMI).

===== Four ways to run a cluster job =====

  - **Submit a cluster-enabled batch job via a PBS script** - requires a completed and saved model (.mph file).
  - **Branch off cluster-enabled batch jobs from a COMSOL GUI process started on the masternode** - allows working on the model in the GUI and submitting batch jobs to PBS from within the GUI; only limited command-line proficiency is needed. With the Cluster Sweep feature it is possible to submit a single batch job from the COMSOL GUI and keep working in the GUI while the cluster job is computing in the background.
  - **Start a cluster-enabled COMSOL Desktop GUI on the masternode and work interactively with cluster jobs.**
  - **Start the COMSOL Desktop GUI as a client on a local PC or Mac, connect to a cluster-enabled COMSOL server on the masternode, and work interactively.**

**Currently we support only methods 1 and 2.**

===== Environment and Documentation =====

The command to set the COMSOL environment is: **module load comsol/5.3.0**

The documentation is available at [[http://masternode.chem.polimi.it/comsol53|http://masternode.chem.polimi.it/comsol53]].

**BLAS Library**

A large portion of the computational engine in COMSOL Multiphysics relies on BLAS. COMSOL Multiphysics ships with the MKL (Math Kernel Library) BLAS library, which is the default BLAS library. For AMD processors, COMSOL Multiphysics also includes the ACML (AMD Core Math Library) BLAS library, optimized for AMD processors with SSE2 support, which might improve performance in some cases. It is also possible to **supply another BLAS library** optimized for our hardware on request.

**COMSOL in Distributed Memory**

This is the case when COMSOL runs on more than one compute node, relying on MPI. In COMSOL jargon, compute nodes and hosts can be different things: hosts are the physical machines on which the compute nodes run. __On our cluster, compute node and host are synonyms.__ The MUMPS and SPOOLES solvers are supported in distributed memory, while PARDISO is not: PARDISO is replaced by MUMPS or, if the check box **Parallel Direct Sparse Solver for Clusters** is selected, by the Intel MKL Parallel Direct Sparse Solver for Clusters. An additional benefit is that the **memory usage per node is lower** than when COMSOL Multiphysics runs in non-distributed mode. Therefore, running COMSOL Multiphysics in distributed mode over several compute nodes lets you solve a larger problem than a non-distributed run can handle.
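To give a concrete idea of the two levels of parallelism, the sketch below shows how a batch solve could be launched with shared memory only and in hybrid distributed/shared mode. The file names ''mymodel.mph'', ''mymodel_out.mph'' and ''hostfile.txt'' are placeholders, and the exact option placement may differ slightly between COMSOL versions. On our cluster you normally do not type these options by hand: **-nn** and **-f** are passed by PBS through the jobfile shown further below.

<code bash>
# Shared-memory only (single node): 8 cores used via multithreading
comsol batch -np 8 -inputfile mymodel.mph -outputfile mymodel_out.mph

# Hybrid distributed + shared memory: 4 compute nodes, 8 cores per node;
# hostfile.txt lists one node hostname per line
comsol batch -nn 4 -f hostfile.txt -np 8 -inputfile mymodel.mph -outputfile mymodel_out.mph
</code>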
**COMSOL and MPI**

COMSOL ships with its own Intel MPI library (the default) and also supports other system-installed, MPICH2-based MPI implementations through command options. By default COMSOL uses the Hydra process manager to initialize the MPI environment and start parallel jobs; Hydra is designed to work natively with PBS and SSH and is part of the MPICH-based MPI implementations, including Intel MPI. See [[https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager|Using the Hydra Process Manager]] and [[https://wiki.mpich.org/mpich/index.php/Hydra_Process_Management_Framework|Hydra Process Management Framework]] on the [[http://www.mpich.org/|MPICH website]]. Hydra is more scalable than MPD, the older process manager, which can still be invoked with the option **-mpd**; note, however, that MPD is not supported on our cluster.

**COMSOL command line**

The option **-clustersimple**, used together with the **batch** command, sets up the COMSOL Intel MPI environment (translating transparently into Hydra commands) and lets Intel MPI correctly detect the execution environment. The options **-nn** (the number of compute nodes allocated for this job) and **-f** (the list of hostnames of the allocated nodes) are passed to COMSOL directly from PBS, as you can see in the jobfile below, so there is no need to specify them. The options **-inputfile** and **-outputfile** are required. The options **-mpiarg -rmk -mpiarg pbs** tell the default COMSOL Intel MPI that the resource manager (scheduler) is PBS. Intel MPI automatically detects the interconnect fabric between nodes (e.g. InfiniBand, Myrinet, Ethernet); the option **-mpifabrics fabric1:fabric2**, where fabric1 is generally shm and fabric2 is tcp or ofa, can be set to override the automatic selection.

===== PBS jobfile =====

<code bash>
#!/bin/bash
#
# Set job execution shell
#PBS -S /bin/bash
# Set the job name (here jobcomsol)
#PBS -N jobcomsol
# Set the execution queue: <queuename> is one of
# gandalf, merlino, default, morgana, covenant
#PBS -q <queuename>
# Set the mail addresses that will receive mail from PBS about the job
# (can be a list of addresses separated by commas)
#PBS -M <email_address>
# Set the events for mail from PBS about the job
#PBS -m abe
# Job re-run (yes or no)
#PBS -r n
# Set the standard output file
#PBS -o jobcomsol.out
# Set the standard error file
#PBS -e jobcomsol.err
# Request N nodes, C cores (ncpus) and P MPI processes per node
#PBS -l select=N:ncpus=C:mpiprocs=P
# Pass the environment to the job
#PBS -V

# Change to the submission directory
cd $PBS_O_WORKDIR

# Commands to launch the application and its parameters
module load comsol/5.3.0

export inputfile="<inputmodel>.mph"
export outputfile="<outputmodel>.mph"

echo "---------------------------------------------------------------------"
echo "---Starting job at: `date`"
echo
echo "------Current working directory is `pwd`"
np=$(wc -l < $PBS_NODEFILE)
echo "------Running on ${np} processes (cores) on the following nodes:"
cat $PBS_NODEFILE
echo "----Parallel comsol run"

comsol -clustersimple batch -mpiarg -rmk -mpiarg pbs -inputfile $inputfile -outputfile $outputfile -batchlog jobcomsol.log

echo "-----job finished at `date`"
echo "---------------------------------------------------------------------"
</code>
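Assuming the jobfile above is saved as ''jobcomsol.pbs'' (the file name is only an example), it can be submitted and monitored with the standard PBS commands, for instance:

<code bash>
# Submit the COMSOL jobfile to the PBS scheduler
qsub jobcomsol.pbs

# Check the state of your jobs while they run
qstat -u $USER

# When the job has finished, inspect the output, error and batch log files
less jobcomsol.out jobcomsol.err jobcomsol.log
</code>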
===== Useful links =====

Please report new interesting or broken links to [[mailto:clusterhpc-dcmc@polimi.it|clusterhpc-dcmc@polimi.it]].

Solvers integrated in COMSOL:
  * [[http://www.pardiso-project.org/|The PARDISO Solver Project]] - **PAR**allel sparse **D**irect and multi-recursive **I**terative linear **SO**lvers
  * [[http://www.netlib.org/linalg/spooles/spooles.2.2.html|SParse Object Oriented Linear Equations Solver]]
  * [[http://mumps.enseeiht.fr/|MUltifrontal Massively Parallel sparse direct Solver]]
  * [[https://software.intel.com/en-us/mkl-developer-reference-c-parallel-direct-sparse-solver-for-clusters-interface|Intel MKL Parallel Direct Sparse Solver for Clusters]]

Quick guides and papers:
  * [[https://www.carc.unm.edu/user-support/quick-byte-how-tos/25-parallel-comsol-jobs.html|A quick and clear guide on how to prepare the MPH input file for cluster processing]]
  * [[https://itpeernetwork.intel.com/hybrid-parallel-simulation-solutions-with-multiphysics-and-intel/|Hybrid parallel simulation solutions with COMSOL on Intel multicore and HPC clusters]]
  * [[https://www.semanticscholar.org/paper/Parallel-Performance-Studies-for-COMSOL-Multiphysi-Petra-Gobbert/b683ca0c8b936716ce96a90831deb56047d7dc98|Parallel Performance Studies for COMSOL Multiphysics Using Scripting and Batch Processing]]