Scripts
Job scripts are how you request resources on the HPC. In the job script you specify the resources you need and the job you would like to run. When you submit the job, it is placed in a queue and runs once the HPC has the resources available. Understanding how job scripts work is essential when using the HPC.
Example Script
Scripts can take many forms depending on the resources that are needed. Not every line is necessary, but let's go through the tutorial script line by line to explain what each one does.

Line | What it does |
---|---|
#!/bin/bash | This tells Slurm to run this job script using the bash shell. This line always needs to be at the top of every job script. |
#SBATCH --job-name HPC-Example | This assigns a name to the job. The name is only a label that appears in queue listings and does not affect how the job runs. You don't need to include it in your job script, but it is usually a good idea. |
#SBATCH -N 1 | This says our job will need 1 compute node. A node is a physical machine on the HPC, and this value is almost always 1. If you want to use more than one, you will need a program that is built to run on multiple machines at once and can communicate between them, typically using MPI. |
#SBATCH -n 1 | This says our job needs one task. This is because we are only running one program. If we had multiple independent programs working together, we would want to increase the number of tasks. |
#SBATCH --cpus-per-task 6 | This says we need 6 CPUs to run our program. This is because our program is set up to use multiple processes. You can set this value to 1 if your code doesn't use multiple processes. |
#SBATCH -t 0:10:00 | This sets the maximum run time of the job, written as hours:minutes:seconds. The tutorial program usually runs in about 75 seconds, so I set the maximum run time to 10 minutes. There is no strict rule for what you should set the maximum run time to, but you should give the code plenty of time to execute while also making sure it doesn't run forever if something goes wrong. If you don't know what to set the time to, 24 hours is usually fine. |
#SBATCH --no-requeue | This line makes sure that the job doesn't requeue if the job fails or something goes wrong. |
#SBATCH -o Tutorial.out | This gives the output file a name. An output file will be created whether you give it a name or not. Naming the output file is useful if you want to delete it after the job finishes when it turns out to be empty. You can do this with the additional code seen at the bottom of the job.script file. |
#SBATCH -e Tutorial.err | This is similar to the output file, but it collects error messages instead. You don't need to name the error file, but it is easier to find and remove this way. |
module load python3.11 | This loads the Python 3.11 module before running the program. |
srun python ./tutorial.py | This runs your program. srun launches it on the resources requested by the lines above. |
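
Putting the rows of the table together gives the complete tutorial script. As a minimal sketch, the commands below write it to a file named job.script from the terminal using a heredoc. The cleanup loop at the bottom is an assumption about what the empty-file removal might look like; the code in the original job.script may differ.

```shell
# Write the tutorial job script to job.script in the current directory.
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > job.script <<'EOF'
#!/bin/bash
#SBATCH --job-name HPC-Example
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --cpus-per-task 6
#SBATCH -t 0:10:00
#SBATCH --no-requeue
#SBATCH -o Tutorial.out
#SBATCH -e Tutorial.err

module load python3.11
srun python ./tutorial.py

# Assumed cleanup (the original job.script may do this differently):
# remove the output and error files if they ended up empty.
for f in Tutorial.out Tutorial.err; do
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        rm -- "$f"
    fi
done
EOF

# Submit it on the cluster with:
#   sbatch job.script
```

All of the #SBATCH lines sit above the first command on purpose: Slurm stops reading directives once it reaches the first line that is not a comment or blank.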