
The SCITAS machines use the SLURM workload manager to schedule users’ jobs. SLURM arbitrates contention in the job queue with a fair-share algorithm, which prioritizes jobs so that each user's usage matches their share as closely as possible. SCITAS clusters use a particular flavor of the fair-share algorithm called fair-tree, described at http://slurm.schedmd.com/fair_tree.html.

Info
On SCITAS machines, the usage decay half-life is one week.
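To illustrate what a one-week half-life means for recorded usage, here is a minimal sketch. It assumes a plain exponential decay, in which past usage counts for half as much after each half-life. The function name `decayed_usage` and its parameters are hypothetical and are not part of SLURM or of the Sshare command.

```python
def decayed_usage(raw_usage: float, age_days: float, half_life_days: float = 7.0) -> float:
    """Weight that past usage still carries after `age_days`.

    Assumption: simple exponential decay with the given half-life
    (7 days, matching the one-week half-life quoted for SCITAS machines).
    """
    return raw_usage * 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    # 1000 cpu-seconds consumed today, one week ago, and two weeks ago.
    for age in (0, 7, 14):
        print(age, decayed_usage(1000.0, age))
```

Under this assumption, usage from a week ago weighs half as much as usage from today, and usage from two weeks ago a quarter as much.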


To check your priority, use the Sshare command, which is available on every SCITAS cluster. Typical output looks as follows:


$ Sshare
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare    Level FS
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- -----------
scitas-ge                                1    0.007752        1376    0.000003      0.000005            1468.763590
scitas-ge                aubort          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge              clemenco          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                 cubuk          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                 culpo          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge              degiorgi          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                eroche          1    0.043478         344    0.000001      0.250000   0.253333    0.173913
scitas-ge               nvarini          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                 qubit          1    0.043478         351    0.000001      0.255072   0.250000    0.170455
scitas-ge              rezzonic          1    0.043478         681    0.000001      0.494928   0.246667    0.087848
scitas-ge               richart          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge               rmsilva          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                   sue          1    0.043478           0    0.000000      0.000000   0.290000         inf
scitas-ge                  topf          1    0.043478           0    0.000000      0.000000   0.290000         inf


The value used to decide the priority of a job is the "Level FS": the higher the Level FS, the higher the priority. Level FS is the ratio of "Norm Shares" to "Effectv Usage", so a Level FS below 1 indicates overconsumption and a Level FS above 1 indicates underconsumption.

In this formula, "Norm Shares" is the fraction of the cluster allocated to the account, with shares expressed in terms of cores, whereas “Effectv Usage” augments the normalized usage (the users' raw usage normalized to the total number of cpu-seconds of all jobs run) to account for usage from sibling accounts. Within a group all users have equal weight, and so have 1 share each.
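The ratio described above can be checked against the sample output with a short sketch. The function name `level_fs` and the `sample` dictionary are illustrative only; the numbers are copied from the Sshare output shown earlier, and treating zero usage as infinite Level FS mirrors the "inf" entries in that output.

```python
import math


def level_fs(norm_shares: float, effective_usage: float) -> float:
    """Level FS as the ratio of Norm Shares to Effectv Usage.

    A user with no recorded usage gets an infinite Level FS,
    as in the 'inf' rows of the sample output.
    """
    if effective_usage == 0:
        return math.inf
    return norm_shares / effective_usage


# (Norm Shares, Effectv Usage) for a few users from the sample output.
sample = {
    "aubort": (0.043478, 0.000000),
    "eroche": (0.043478, 0.250000),
    "qubit": (0.043478, 0.255072),
    "rezzonic": (0.043478, 0.494928),
}

# Highest Level FS first, i.e. highest priority first.
ranking = sorted(sample, key=lambda user: level_fs(*sample[user]), reverse=True)

if __name__ == "__main__":
    for user in ranking:
        print(f"{user:>10} {level_fs(*sample[user]):.6f}")
```

The computed values agree with the "Level FS" column up to rounding of the displayed inputs. Users with no usage rank first, and the user with the largest effective usage ranks last.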



More information about SLURM, fair-share and fair-tree can be found here:

https://slurm.schedmd.com/overview.html

https://slurm.schedmd.com/priority_multifactor.html

https://slurm.schedmd.com/fair_tree.html



