...

  • $SCRATCH mounts a GPFS parallel filesystem, which is designed to perform well with parallel I/O.
  • In certain cases a large number of files is produced at runtime. This I/O pattern puts stress on $SCRATCH and can occasionally cause hardware failures.
  • A proper solution would require the use of external libraries such as HDF5 or ADIOS, which give some flexibility in the way data are saved and handled.
  • A workaround is to use the local filesystem $TMPDIR.
  • $TMPDIR is visible only once resources are allocated. You can try to query the value of $TMPDIR after login:
Info
iconfalse
    ssh fidis   
    [nvarini@fidis ~]$ echo $TMPDIR

    [nvarini@fidis ~]$

...



  • However, once resources are allocated (for example with Sinteract), $TMPDIR is defined:


Info
iconfalse
    [nvarini@fidis ~]$ Sinteract
    Cores:            1
    Tasks:            1
    Time:             00:30:00
    Memory:           4G
    Partition:        parallel
    Account:          scitas-ge
    Jobname:          interact
    Resource:         
    QOS:              normal
    Reservation:      

...

Info
iconfalse
    salloc: Pending job allocation 159671
    salloc: job 159671 queued and waiting for resources
    salloc: job 159671 has been allocated resources
    salloc: Granted job allocation 159671
    srun: Job step created

    [nvarini@f061 ~]$ echo $TMPDIR
    /tmp/159671

...



  • The variables $TMPDIR, $WORK and $SCRATCH are set by the SLURM prescheduler.
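
A quick way to verify this is a trivial job script that only prints these variables; this is a hedged sketch (the #SBATCH values are placeholders, not required settings):

```shell
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:05:00

# These variables are empty at login and only populated by the
# scheduler once the job's resources have been allocated.
echo "TMPDIR  = $TMPDIR"
echo "WORK    = $WORK"
echo "SCRATCH = $SCRATCH"
```

Submitted with sbatch, the job's output file shows the node-local $TMPDIR (e.g. /tmp/<jobid>) instead of an empty string.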


How to use the $TMPDIR in your simulations

  • The following example shows how to use $TMPDIR with Quantum ESPRESSO (QE).
  • QE relies on Fortran namelists to read certain parameters used during the simulation.
  • The only change needed in a standard pw.x input concerns the outdir variable in the &CONTROL namelist. For example, in the input below outdir is set to fakeoutdir:

...



Info
iconfalse
    &CONTROL
    calculation = 'scf',
    restart_mode = 'from_scratch',
    prefix = 'lgps_diel'
    tstress = .false.
    tprnfor = .false.
    outdir = 'fakeoutdir'
    pseudo_dir = '/scratch/nvarini/pseudo'
    disk_io='low'
    max_seconds=1800
    /

...



  • The submission script would look like:

...


Info
iconfalse
    #!/bin/bash
    #SBATCH --nodes 2
    #SBATCH --time=1:00:00
    #SBATCH -p debug

    module purge
    module load intel/16.0.3
    module load intelmpi/5.1.3
    module load fftw/3.3.4-mpi
    module load mkl/11.3.3

    # Replace the fakeoutdir placeholder with the job-specific $TMPDIR
    sed "s|fakeoutdir|${TMPDIR}|g" temp_pw > ${TMPDIR}/${SLURM_JOB_ID}_pw
    srun pw.x < ${TMPDIR}/${SLURM_JOB_ID}_pw > ${TMPDIR}/${SLURM_JOB_ID}.tmp.out
    # Archive the results from $TMPDIR before the job ends and it is cleaned up
    tar cvf ${SLURM_JOB_ID}.archive.tar ${TMPDIR}/*

...



  • After the sed command the &CONTROL namelist looks like:
Info
iconfalse
    &CONTROL
    calculation = 'scf',
    restart_mode = 'from_scratch',
    prefix = 'lgps_diel'
    tstress = .false.
    tprnfor = .false.
    outdir = '/tmp/1325324'
    pseudo_dir = '/scratch/marcolon/test_LGPS/pseudo'
    disk_io='low'
    max_seconds=1800
    /

...





  • Bandwidth for a single 100 GB file, all results in MB/s, reported as <write into $TMPDIR> : <copy from $TMPDIR to /scratch>:


Info
iconfalse
    deneb (E5v2):   76  : 74
    eltanin (E5v3): 109 : 103
    fidis:          529 : 498
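
Comparable numbers can be estimated with dd from inside an allocation. This is a hedged sketch, not the script used for the table above: the file size is reduced for a quick test, and the fallback paths (used when $TMPDIR or $SCRATCH are unset) are placeholders.

```shell
#!/bin/bash
# Sketch: time a write into the node-local $TMPDIR, then a copy to
# the parallel filesystem. dd reports the achieved bandwidth on stderr.
TMP=${TMPDIR:-/tmp}      # $TMPDIR is only set inside an allocation
DEST=${SCRATCH:-$TMP}    # e.g. /scratch/<user> on the clusters
SIZE_MB=64               # the table above used a single 100 GB file

# Write test: stream zeros into $TMPDIR, forcing data to disk (conv=fsync)
dd if=/dev/zero of=$TMP/iotest.tmp bs=1M count=$SIZE_MB conv=fsync
# Copy test: move the file from $TMPDIR to the parallel filesystem
dd if=$TMP/iotest.tmp of=$DEST/iotest.copy bs=1M conv=fsync
# Clean up the test files
rm -f $TMP/iotest.tmp $DEST/iotest.copy
```

Note that $TMPDIR is node-local, so each node of a multi-node job sees its own copy of it.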







