Bath HPC
Balena
User commands | SLURM
---|---
Job submission | sbatch [script_file] |
Queue list | squeue |
Queue list (by user) | squeue -u [user_name] |
Job deletion | scancel [job_id] |
Job information | scontrol show job [job_id] |
Job hold | scontrol hold [job_id] |
Job release | scontrol release [job_id] |
Node list | sinfo --Nodes --long |
Cluster status | sinfo or squeue |
GUI | sview (graphical user interface to view and modify SLURM state)
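
As a quick illustration of these commands in practice, a typical sequence on a login node might look like the following; the script name `myjob.slurm` and job ID `123456` are placeholders:

```bash
# Submit a batch script; sbatch reports the new job ID
sbatch myjob.slurm            # e.g. "Submitted batch job 123456"

# List your own jobs in the queue
squeue -u $USER

# Show detailed information about a specific job
scontrol show job 123456

# Hold and later release the job, or cancel it outright
scontrol hold 123456
scontrol release 123456
scancel 123456
```
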
Environment variable | Description
---|---
$SLURM_ARRAY_TASK_ID | Job array ID (index) number |
$SLURM_ARRAY_JOB_ID | Job array's master job ID number |
$SLURM_JOB_ID | The ID of the job allocation |
$SLURM_JOB_DEPENDENCY | Set to value of the --dependency option |
$SLURM_JOB_NAME | Name of the job |
$SLURM_JOB_NODELIST | List of nodes allocated to the job |
$SLURM_JOB_NUM_NODES | Total number of nodes in the job's resource allocation |
$SLURM_JOB_PARTITION | Name of the partition in which the job is running |
$SLURM_MEM_PER_NODE | Memory requested per node |
$SLURM_NODEID | Relative ID (index) of the current node within the job's allocation
$SLURM_NTASKS | Number of tasks requested. Same as -n, --ntasks. To be used with mpirun, e.g. mpirun -np $SLURM_NTASKS binary |
$SLURM_NTASKS_PER_NODE | Number of tasks requested per node. Only set if the --ntasks-per-node option is specified |
$SLURM_PROCID | The MPI rank (or relative process ID) of the current process |
$SLURM_RESTART_COUNT | If the job has been restarted due to system failure or has been explicitly requeued, this will be set to the number of times the job has been restarted
$SLURM_SUBMIT_DIR | The directory from which sbatch was invoked |
$SLURM_SUBMIT_HOST | The hostname of the computer from which sbatch was invoked |
$SLURM_TASKS_PER_NODE | Number of tasks to be initiated on each node. Values are comma separated and in the same order as $SLURM_JOB_NODELIST |
$SLURM_TOPOLOGY_ADDR | Set to the names of the network switches that may be involved in the job's communications, from the system's top-level switch down to the leaf switch, ending with the node name
$SLURM_TOPOLOGY_ADDR_PATTERN | Set to the component types listed in $SLURM_TOPOLOGY_ADDR; each component is identified as either "switch" or "node", and a period separates each hardware component type
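
As a minimal sketch of how these variables appear inside a batch script, the job array below uses $SLURM_ARRAY_TASK_ID to pick a different input file per task; the executable `./process`, the array range and the file naming are illustrative only:

```bash
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

echo "Array task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID running on $SLURM_JOB_NODELIST"

# Each array task reads a different input file (illustrative naming)
./process input.$SLURM_ARRAY_TASK_ID > output.$SLURM_ARRAY_TASK_ID
```
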
Job specification | SLURM
---|---
Script directive | #SBATCH |
Account to charge | --account=[account] |
Begin Time | --begin=YYYY-MM-DD[THH:MM[:SS]]
Combine stdout/stderr | (use --output without the --error) |
Copy Environment | --export=[ALL|NONE|variable] |
CPU Count | --ntasks=[count]
CPUs Per Task | --cpus-per-task=[count] |
Email Address | --mail-user=[address] |
Event Notification | --mail-type=[events] e.g. BEGIN, END, FAIL, REQUEUE, and ALL (any state change)
Generic Resources | --gres=[resource_spec] e.g. gpu:4 or mic:4
Node features | --constraint=[feature] e.g. k20x, s10k and 5110p
Job Arrays | --array=[array_spec] |
Job Dependency | --dependency=[state:job_id]
Job host preference | --nodelist=[nodes] AND/OR --exclude=[nodes] |
Job Name | --job-name=[name] |
Job Restart | --requeue OR --no-requeue |
Licenses | --licenses=[license_spec] |
Memory Size | --mem=[mem][M|G|T]
Node Count | --nodes=[min[-max]] |
Quality of Service | --qos=[name] |
Queue | --partition=[queue] |
Resource Sharing | --exclusive OR --shared |
Standard Error File | --error=[file_name] |
Standard Output File | --output=[file_name] |
Tasks Per Node | --ntasks-per-node=[count] |
Wall Clock Limit | --time=[min] OR [days-hh:mm:ss] |
Working Directory | --workdir=[dir_name] |
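
Putting several of these directives together, a batch script for a two-node MPI job might look like the sketch below; the account code, partition name, email address, task counts and executable are placeholders to be replaced with your own values:

```bash
#!/bin/bash
#SBATCH --job-name=mpi-example
#SBATCH --account=my-account           # placeholder account code
#SBATCH --partition=batch              # placeholder partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=0-02:00:00
#SBATCH --output=mpi-example.%j.out    # %j expands to the job ID
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@bath.ac.uk

# Launch one MPI process per allocated task (see $SLURM_NTASKS above)
mpirun -np $SLURM_NTASKS ./my_mpi_binary
```
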