Molcas Forum

Molcas support and discussions for users and developers


#1 2017-07-10 07:26:51

jost
Member
Registered: 2016-03-03
Posts: 2

Problem running MOLCAS on Omni-Path cluster

Dear MOLCAS-developers,
right now we are trying to set up MOLCAS 8.2 on our cluster. As the cluster is quite new, it uses Intel's Omni-Path to interconnect the nodes.

So far I have tried several compiler and MPI combinations:
Open MPI 1.8.8 and gcc 4.8.5 and cmake

Open MPI 1.8.8 & gcc 4.8.5

mpirun --version
mpirun (Open MPI) 1.8.8

gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)

gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)

Open MPI and MOLCAS were compiled with the same compiler. Everything builds and all (serial) tests pass, but when running MOLCAS in parallel I get a bunch of warnings related to Omni-Path:

--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            node008
  Device name:           hfi1_0
  Device vendor ID:      0x1175
  Device vendor part ID: 9456

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[node008:95200] 7 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[node008:95200] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

This is because Open MPI only started to support Omni-Path/PSM2 from version 1.10 upwards. I'm aware that Open MPI 1.10 isn't explicitly supported by MOLCAS.
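
For the Open MPI 1.8.8 build, the warning text itself names the MCA parameter that silences it. A minimal sketch, assuming a POSIX shell and relying on Open MPI's convention of reading MCA parameters from `OMPI_MCA_`-prefixed environment variables. This only hides the message; it does not add Omni-Path support to 1.8.8:

```shell
# Silence the "no device params found" warning, as the message suggests.
# Open MPI reads MCA parameters from environment variables named OMPI_MCA_<param>.
export OMPI_MCA_btl_openib_warn_no_device_params_found=0

# Check whether this Open MPI build lists a PSM2 component at all.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info | grep -i psm2 || echo "no PSM2 component listed by ompi_info"
else
  echo "ompi_info not found in PATH"
fi
```

The same parameter can be passed on the command line instead, as `mpirun --mca btl_openib_warn_no_device_params_found 0 ...`.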
Open MPI 1.10.7 and gcc 4.8.5 and cmake
To resolve these errors I also compiled and tested MOLCAS with Open MPI 1.10.7 (with PSM2 support), again with the same gcc 4.8.5.

mpirun --version
mpirun (Open MPI) 1.10.7

gcc 4.8.5

Now the warning shown above has disappeared and, again, all serial tests pass, but when I run in parallel several issues appear:

  1. SEWARD never finishes for calculations with ricd in the &gateway module (https://pastebin.com/LjFEaUuk). The same calculation runs fine with Open MPI 1.8.8.

  2. SEWARD without ricd in &gateway but with expert and chol in &seward runs fine (https://pastebin.com/7TJ13a2e). But a subsequent MS-CASPT2 calculation also seems never to finish (https://pastebin.com/gpfavcJ2).

Compilation with Intel Parallel Studio 2017 and setup-script
Last but not least, there is also an installation done by our administrator using Intel's ifort/icc/Intel MPI from Intel Parallel Studio 2017 (configure.log is here: https://pastebin.com/pSrtCfwS). This installation produces the most problems:

  1. SEWARD can only be run with 12 or fewer cores, otherwise it never finishes.

  2. SEWARD with expert and chol just crashes (https://pastebin.com/DCjSMsne)

     ***
     *** Error in Cholesky Core Routine
     *** Message: Severe error in CHO_GETINT
     *** Code   :   104
     ***
  3. In contrast to the Open MPI 1.10.7/gcc 4.8.5 build, SEWARD with ricd in &gateway works

  4. CASPT2 just crashes (https://pastebin.com/AS3vbSVC)

  5. Compared to the Open MPI/gcc builds, this build has much better performance

So, to formulate some specific questions:

  1. Is there some way to debug the "SEWARD-never-finishes" problem and the CASPT2-crash in the version compiled with the Intel suite?

  2. Do you have experience with Omni-Path and Open MPI 1.10.x? Could Omni-Path itself cause problems when running MOLCAS?

  3. Which versions of Intel MPI and icc/ifort are supported? Is Intel Parallel Studio 2017 too new? All in all, I'd be happiest with a MOLCAS installation using the Intel compiler, Intel MPI and MKL.

As we'd really like to use MOLCAS 8.2 I'd be happy for any feedback on this. If I have to provide any additional detail on the compilation etc. I'd gladly do so.

Best regards,
jost

Last edited by jost (2017-07-10 08:44:57)


#2 2017-07-10 08:57:31

Ignacio
Administrator
From: Uppsala
Registered: 2015-11-03
Posts: 231

Re: Problem running MOLCAS on Omni-Path cluster

I feel your frustration, I've been there quite a few times already. My experience is that parallel runs are mostly hit-and-miss, and when everything finally seems to work, some node in the system crashes. To answer your questions:

  1. Debugging in parallel is of course tricky. You could try running "mpirun -np $CPUS xterm -e gdb /path/to/seward.exe" in the scratch directory once all the input files have been created there. But I have never tried this.

  2. No experience with Omni-Path. I've tested Intel MPI (some version), and it seemed to work... Have you tried with GlobalArrays?

  3. The only Intel compiler version officially tested is 2013, but I've run with 2015 for a while too.
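
For the "never finishes" case, an alternative that needs no X display is to attach gdb to the processes that are already hanging and dump their backtraces. This is only a sketch: it assumes gdb and pgrep are available on the compute node, that you are allowed to attach to your own processes, and that the running executable is named seward.exe as in the path above. The helper name hang_backtraces is hypothetical.

```shell
# Hypothetical helper: print a backtrace of every thread in each running
# process whose name is exactly the given one (default: seward.exe).
hang_backtraces() {
  name="${1:-seward.exe}"
  # pgrep -x matches the exact process name, so mpirun and wrapper
  # scripts that merely mention the name are not picked up.
  pids=$(pgrep -x "$name") || {
    echo "no running process named '$name'" >&2
    return 1
  }
  for pid in $pids; do
    echo "=== PID $pid ==="
    # -batch runs the given command and exits, detaching from the process.
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
  done
}
```

Run `hang_backtraces seward.exe` on a node where the job is stuck. If all ranks sit in the same MPI call, that points towards the communication layer rather than the MOLCAS code itself.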

A note of caution when using CMake: make sure the commands CMake actually uses are the intended ones (make VERBOSE=1). Sometimes CMake chooses a different compiler or MPI wrapper (see e.g. https://stackoverflow.com/questions/392 … with-cmake ).
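
One way to check this after configuring is to read the compilers CMake recorded in its cache and compare them with the MPI wrappers found in PATH. A sketch, assuming the build directory contains the usual CMakeCache.txt; the helper name show_cmake_compilers is hypothetical:

```shell
# Hypothetical helper: list the compilers recorded in a build
# directory's CMakeCache.txt (default: current directory).
show_cmake_compilers() {
  cache="${1:-.}/CMakeCache.txt"
  if [ ! -f "$cache" ]; then
    echo "no CMakeCache.txt in ${1:-.}" >&2
    return 1
  fi
  grep -E '^CMAKE_(C|CXX|Fortran)_COMPILER:' "$cache"
}

# Show which MPI compiler wrappers are first in PATH, for comparison.
for wrapper in mpicc mpifort mpif90; do
  if command -v "$wrapper" >/dev/null 2>&1; then
    echo "$wrapper -> $(command -v "$wrapper")"
  fi
done
```

Call `show_cmake_compilers /path/to/build` and compare the result with the wrapper paths. To see what a wrapper really invokes, Open MPI wrappers accept `--showme` and Intel MPI wrappers accept `-show`.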
