Molcas Forum

Support and discussions for Molcas and OpenMolcas users and developers


#1 2016-01-11 12:21:59

chiteo
Member
Registered: 2016-01-08
Posts: 5

Molcas MPI Calculation Restart

Hello,

I compiled and installed the MPI version of Molcas 8.0. It works well, but when I try to restart or continue a calculation using a different number of CPUs, the program stops with an I/O error. In particular, I ran a CASSCF with 8 CPUs, then tried a CASPT2 with 2 CPUs and got:

()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()
                                 MOLCAS executing module CASPT2 with 28000 MB of memory
                                              at 12:07:26 Mon Jan 11 2016
                                Parallel run using   2 nodes, running replicate-data mode
()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()()


++ I/O STATISTICS

  I. General I/O information
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  Unit  Name          Flsize      Write/Read            MBytes           Write/Read
                      (MBytes)       Calls              In/Out           Time, sec.
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   1  RUNFILE          15.44 .      21/      66 .      0.1/      0.5 .       0/       0
   2  LUSOLV            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   3  LUSBT             0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   4  LUHLF1            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   5  LUHLF2            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   6  LUHLF3            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   7  DRARR             0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   8  DRARRT            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
   9  RHS_01            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  10  RHS_02            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  11  RHS_03            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  12  RHS_04            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  13  RHS_05            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  14  RHS_06            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  15  H0T_01            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  16  H0T_02            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  17  H0T_03            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  18  H0T_04            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  19  LUDMAT            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  20  JOBIPH            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  21  JOBMIX            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  22  LUCIEX            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  23  MOLONE            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  24  MOLINT            0.00 .       0/       0 .      0.0/      0.0 .       0/       0
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   *  TOTAL            15.44 .      21/      66 .      0.1/      0.5 .       0/       0
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  II. I/O Access Patterns
  - - - - - - - - - - - - - - - - - - - -
  Unit  Name               % of random
                         Write/Read calls
  - - - - - - - - - - - - - - - - - - - -
   1  RUNFILE             28.6/   7.6
   2  LUSOLV               0.0/   0.0
   3  LUSBT                0.0/   0.0
   4  LUHLF1               0.0/   0.0
   5  LUHLF2               0.0/   0.0
   6  LUHLF3               0.0/   0.0
   7  DRARR                0.0/   0.0
   8  DRARRT               0.0/   0.0
   9  RHS_01               0.0/   0.0
  10  RHS_02               0.0/   0.0
  11  RHS_03               0.0/   0.0
  12  RHS_04               0.0/   0.0
  13  RHS_05               0.0/   0.0
  14  RHS_06               0.0/   0.0
  15  H0T_01               0.0/   0.0
  16  H0T_02               0.0/   0.0
  17  H0T_03               0.0/   0.0
  18  H0T_04               0.0/   0.0
  19  LUDMAT               0.0/   0.0
  20  JOBIPH               0.0/   0.0
  21  JOBMIX               0.0/   0.0
  22  LUCIEX               0.0/   0.0
  23  MOLONE               0.0/   0.0
  24  MOLINT               0.0/   0.0
  - - - - - - - - - - - - - - - - - - - -
--
 ###############################################################################
 ###############################################################################
 ###                                                                         ###
 ###    Location: AixRd                                                      ###
 ###    File: JOBIPH                                                         ###
 ###                                                                         ###
 ###    Premature abort while reading buffer from disk                       ###
 ###    End of file reached                                                  ###
 ###                                                                         ###
 ###############################################################################
 ###############################################################################

This does not happen when I use the same number of CPUs as in the CASSCF calculation, i.e. 8 CPUs. How can I solve this?

Thanks

Francesco


#2 2016-01-11 12:51:57

valera
Administrator
Registered: 2015-11-03
Posts: 94

Re: Molcas MPI Calculation Restart

In a parallel run, Molcas creates temporary files in process-specific directories: $WorkDir/tmp_001, tmp_002, etc. I would say that in order to "continue" a calculation, one should have an identical directory structure: not only the same total number of directories, but also the same order.
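
For illustration, the scratch area of an 8-process run would look roughly like this (the project name "myproject" is only an example, and the exact file set depends on which modules were run):

  $WorkDir/
    myproject.RunFile        shared files at the top level
    myproject.JobIph
    tmp_001/                 per-process scratch, one directory per MPI rank
    tmp_002/
    ...
    tmp_008/

A run restarted with fewer processes only sees the first few of these directories, so data distributed over the remaining ones becomes unreachable.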


The main reason for my preference with respect to operating systems: cp is two characters, but copy is four.


#3 2016-01-11 13:04:03

Steven
Administrator
From: Lund
Registered: 2015-11-03
Posts: 95

Re: Molcas MPI Calculation Restart

In parallel, certain data is spread over different processes, which means you need the same data layout to continue a calculation.

In practice, the only data that is spread out is the integrals from seward. So if you want to run a CASSCF and a CASPT2 with a different number of processes, you would need to rerun gateway/seward, as in the sketch below.
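
For example, a restart input along these lines regenerates the integrals with the new process count before CASPT2 reads the old wavefunction. This is a minimal sketch, not tested input: the coordinate file, basis set, and root selection are placeholders, and it assumes the JobIph from the CASSCF run has been copied into the new WorkDir under the current project name:

  &GATEWAY
   coord = geom.xyz
   basis = ANO-RCC-VDZP
   group = nosym
  &SEWARD
  &CASPT2
   Multistate
    3 1 2 3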


Always check the orbitals.


#4 2016-01-11 14:40:39

chiteo
Member
Registered: 2016-01-08
Posts: 5

Re: Molcas MPI Calculation Restart

I did a test: I copied the .JobIph file from the previous CASSCF calculation into the scratch directory and reran gateway/seward, but I got the same error in the tmp_1/stderr._______1 file. Here is my input.

&GATEWAY
 BASLIB = /users/p0880/talotta/rupy4clno
 coord= geom.xyz
 Basis= VTZP,Ru.ECP.STUTTGART.8s7p6d1f.6s5p3d1f.
 Group= nosym
&SEWARD
 LOW Cholesky
&CASPT2
 Title
  Multi State CASPT2 starting from CASSCF(16,13) stv3 wavefunction
 Multistate
  3 1 2 3

Maybe I'm missing something?


#5 2016-05-02 15:56:01

Vicente
Member
Registered: 2016-05-02
Posts: 1

Re: Molcas MPI Calculation Restart

I did, not exactly the same thing, but something like this:

- First, a parallel calculation with 8 CPUs.
- Later, I restarted my calculation in serial mode just for CASPT2.

In order to restart the calculation, I copied the Work directory and used this input:

&gateway
   coord=$HomeDir/koko.xyz
   basis=ano-s-vdzp
   group=nosymm
   ricd

&seward

&caspt2
   multi=1 1
   nomulti
   ipea=0.00

And it worked fine.
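
For reference, the whole procedure amounts to something like the following. This is a minimal sketch with assumed paths; MOLCAS_NPROCS is the variable my installation uses to set the number of processes, so check how your driver selects the process count:

  # copy the WorkDir of the 8-process run so the serial restart sees every file
  cp -r /scratch/$USER/koko_par /scratch/$USER/koko_serial
  export WorkDir=/scratch/$USER/koko_serial
  # run the CASPT2 restart input with a single process
  export MOLCAS_NPROCS=1
  molcas caspt2_restart.input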

I supposed that we don't need all the files, only the JobIph, RasOrb and Cholesky ones ... but just in case I copied all of them.

C U

Vicente.

