Results
Here, we demonstrate the new version of qhist using a test dataset. First, let’s remove any data and qhist installation left over from prior runs of this notebook, and avoid side effects on systems that already provide a qhist configuration.
# First, define the root path for our experiment directory
exp_root=$(pwd)
# Remove any existing files from prior notebook runs
rm -rf sample_logs local qhist
mkdir sample_logs
# Let's also make sure a system config does not interfere with our tests
unset QHIST_SERVER_CONFIG
Now, we can create the data:
# Create two "days" of data
cat > sample_logs/20250331 << "EOF"
03/31/2025 10:59:25;R;4215033.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440355 qtime=1743440355 etime=1743440355 start=1743440360 exec_host=crhtc65/34 exec_vnode=(crhtc65:ncpus=1:mem=1kb) Resource_List.mem=30b Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30:ompthreads=1 Resource_List.walltime=06:00:00 session=0 end=1743440365 Exit_status=-3 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0b resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00 eligible_time=00:00:08 run_count=1
03/31/2025 10:59:35;E;4215033.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440355 qtime=1743440355 etime=1743440355 start=1743440371 exec_host=crhtc65/34 exec_vnode=(crhtc65:ncpus=1:mem=1kb) Resource_List.mem=30b Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30:ompthreads=1 Resource_List.walltime=06:00:00 session=0 end=1743440375 Exit_status=-1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0b resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00 eligible_time=00:00:08 run_count=2
03/31/2025 11:03:12;E;4215034.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440363 qtime=1743440363 etime=1743440363 start=1743440374 exec_host=crhtc86/12 exec_vnode=(crhtc86:ncpus=1:mem=31457280kb) Resource_List.mem=30gb Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30GB:ompthreads=1 Resource_List.walltime=06:00:00 session=108694 end=1743440592 Exit_status=0 resources_used.cpupercent=2 resources_used.cput=00:00:09 resources_used.mem=756960kb resources_used.ncpus=1 resources_used.vmem=8653400kb resources_used.walltime=00:03:36 eligible_time=00:00:13 run_count=1
03/31/2025 11:34:35;E;4215265.casper-pbs;user=bneuman group=ncar account="SCSG0001" project=_pbs_project_default jobname=bneuman_matlab queue=htc ctime=1743442446 qtime=1743442446 etime=1743442446 start=1743442452 exec_host=crhtc72/0*5 exec_vnode=(crhtc72:ncpus=5:mem=10485760kb) Resource_List.mem=10gb Resource_List.mps=0 Resource_List.ncpus=5 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=5:ompthreads=5 Resource_List.walltime=00:25:00 session=95643 end=1743442475 Exit_status=0 resources_used.cpupercent=156 resources_used.cput=00:00:30 resources_used.mem=3163632kb resources_used.ncpus=5 resources_used.vmem=23445720kb resources_used.walltime=00:00:19 eligible_time=00:00:08 run_count=1
EOF
cat > sample_logs/20250401 << "EOF"
04/01/2025 13:07:35;E;4220853.casper-pbs;user=negins group=ncar account="P93300606" project=_pbs_project_default jobname=cr-jhub-batch-stable queue=htc ctime=1743527223 qtime=1743527223 etime=1743527223 start=1743527229 exec_host=crhtc65/20 exec_vnode=(crhtc65:ncpus=1:mem=4194304kb) Resource_List.mem=4gb Resource_List.mpiprocs=1 Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB Resource_List.walltime=02:00:00 session=104352 end=1743534455 Exit_status=-29 resources_used.cpupercent=8 resources_used.cput=00:03:52 resources_used.mem=261056kb resources_used.ncpus=1 resources_used.vmem=667248kb resources_used.walltime=02:00:22 eligible_time=00:00:09 run_count=1
EOF
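Each raw record above begins with three semicolon-delimited fields (timestamp, record type, job ID), followed by space-separated key=value pairs. As a quick check on that structure, the sketch below tallies record types with awk. It works on abbreviated copies of the 20250331 records written to a temporary file (most key=value pairs are omitted for brevity); demo_log and record_counts are names chosen only for this illustration.

```shell
# Abbreviated copies of the four records in sample_logs/20250331
demo_log=$(mktemp)
cat > "$demo_log" << "EOF"
03/31/2025 10:59:25;R;4215033.casper-pbs;user=vanderwb queue=htc Exit_status=-3
03/31/2025 10:59:35;E;4215033.casper-pbs;user=vanderwb queue=htc Exit_status=-1
03/31/2025 11:03:12;E;4215034.casper-pbs;user=vanderwb queue=htc Exit_status=0
03/31/2025 11:34:35;E;4215265.casper-pbs;user=bneuman queue=htc Exit_status=0
EOF

# Field 2 (semicolon-delimited) is the record type; count each type
record_counts=$(awk -F';' '{ n[$2]++ } END { for (t in n) print t, n[t] }' "$demo_log" | sort)
echo "$record_counts"

# Clean up the temporary file
rm -f "$demo_log"
```

Running the same awk command against the real sample_logs/20250331 file should give the same tally, since the abbreviation only drops key=value pairs.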
Installing qhist with the Makefile
Since we are running qhist as a command-line utility within a bash kernel, we will use the Makefile approach for installing the latest version of qhist.
# First, let's clone the qhist repository
git clone --depth 1 --branch v1.0 https://github.com/NCAR/qhist.git
# Now we install into a specified prefix, which will also install pbsparse
cd qhist
make install PREFIX="$exp_root/local"
cd "$exp_root"
Cloning into 'qhist'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 27 (delta 0), reused 22 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (27/27), 14.29 KiB | 975.00 KiB/s, done.
Note: switching to 'b93fef768c43812bf622621dc1f8548a27f043f2'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
git submodule init
Submodule 'lib/pbsparse' (git@github.com:NCAR/pbsparse.git) registered for path 'lib/pbsparse'
git submodule update
Cloning into '/glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/qhist/lib/pbsparse'...
Submodule path 'lib/pbsparse': checked out '057b0f811802308c7314e500b792a0233b7c70fd'
mkdir -p /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/bin /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/lib/qhist /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/share
sed 's|/src|/lib/qhist|' bin/qhist > /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/bin/qhist
cp -r src/qhist /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/lib/qhist
cp -r lib/pbsparse/src/pbsparse /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/lib/qhist
cp -r share /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/share
chmod +x /glade/work/vanderwb/papers/SEA-ISS-2025-Improving-PBS-for-NCAR-HPC-Users/notebooks/local/bin/qhist
Configuring qhist
qhist relies on a configuration file, which will tell the tool where to find accounting logs. There are multiple ways to specify this config file:
1. Set the environment variable QHIST_SERVER_CONFIG to the path of your configuration file.
2. Put your configuration into server.json within the cfg subdirectory of your qhist installation.
3. Create a configuration file at /etc/qhist/server.json.
For this demonstration, we will use the second approach.
cat > "$exp_root/local/lib/qhist/qhist/cfg/server.json" << EOF
{
"pbs_log_path" : "$exp_root/sample_logs"
}
EOF
# Finally, let's add qhist to our PATH
export PATH=$exp_root/local/bin:$PATH
We should now be able to use qhist!
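If qhist cannot find its logs, a useful first step is to check which of the three configuration locations exist on the system. The sketch below only reports whether each candidate is present; it makes no claim about the order in which qhist consults them. The variable names (install_cfg, config_report) are illustrative, and exp_root falls back to the current directory if it is not set.

```shell
# Path used in this notebook for the in-installation config
install_cfg="${exp_root:-$(pwd)}/local/lib/qhist/qhist/cfg/server.json"

config_report=""
for candidate in "${QHIST_SERVER_CONFIG:-}" "$install_cfg" /etc/qhist/server.json; do
    # Skip the environment-variable slot when the variable is unset or empty
    [ -n "$candidate" ] || continue
    if [ -f "$candidate" ]; then state="found"; else state="missing"; fi
    config_report="${config_report}${state}: ${candidate}
"
done

printf '%s' "$config_report"
```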
Basic queries
Since we are not using data from the current date, we will need to specify the date directly using the -p/--period argument to qhist.
qhist --period 20250401
Job ID User Queue Nodes NCPUs NGPUs End Mem(GB) CPU(%) Elap(h)
------------ ---------- -------- ----- ------ ----- ------- -------- -------- --------
4220853 negins htc 1 1 0 01-1307 0.25 8.00 2.01
By default, qhist shows tabular data in normal width and displays only jobs that have “ended”. We can change this behavior via command-line arguments. For example, suppose we want to display R (“requeue”) records in long-form list format.
Let’s run this over both days to ensure we capture the requeue record.
qhist -p 20250331-20250401 --event R --list
4215033.casper-pbs
User = vanderwb
Queue = htc
Job Submit = 2025-03-31 10:59:15
Eligible Time = 2025-03-31 10:59:15
Job Start = 2025-03-31 10:59:20
Job End = 2025-03-31 10:59:25
Used Mem(GB) = 0
Avg CPU (%) = 0.00
Waittime (h) = 0.00
Walltime (h) = 6.00
Elapsed (h) = 0.00
Job Name = STDIN
Exit Status = -3
Account = SCSG0001
Resources = 1:ncpus=1:mem=30:ompthreads=1
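The timestamps in this listing come from the epoch-second fields in the raw record. As a cross-check, the sketch below converts the record's ctime value to a readable UTC time. It assumes GNU date (the -d "@..." form); the variable names are illustrative.

```shell
# ctime from the R record for job 4215033 (seconds since the Unix epoch)
ctime=1743440355

# GNU date syntax; BSD/macOS date uses "-r <seconds>" instead of -d "@<seconds>"
submit_utc=$(date -u -d "@$ctime" '+%Y-%m-%d %H:%M:%S')
echo "$submit_utc"   # 2025-03-31 16:59:15
```

The listing shows Job Submit as 10:59:15, six hours behind this UTC value, which is consistent with output in the local time zone of a host on US Mountain Daylight Time. That reading is an inference from the numbers, not something the output states.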
Advanced queries
We can also ask qhist to compute averages for numerical fields. This is most useful when we specify a custom format containing the fields of interest. We can see most of the available fields, which can be referenced by convenient short names, using --format=help:
qhist --format=help
This option allows you to specify a custom format. This setting's behavior
depends on which mode you are using:
For default and wide behavior, enter a string containing Python's format syntax
(modern version). For list and csv modes, a comma-delimited string with field
names is the expected input.
Examples:
qhist --format="{id:9.9} {account:9.9} {reqmem:8.2f} {memory:8.2f}"
qhist --list --format="account,reqmem,memory"
The following variables are available:
account
avgcpu
cputype
elapsed
eligible
end
gputype
memory
mpiprocs
name
nodelist
numcpus
numgpus
numnodes
ompthreads
placement
queue
reqmem
resources
start
status
submit
user
walltime
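For readers more used to shell than to Python's format syntax, the specifiers in the help text map closely onto printf. In Python, {id:9.9} pads a string to width 9 and truncates it to 9 characters (strings are left-aligned by default), while {memory:8.2f} is a right-aligned float of width 8 with two decimals. The sketch below is only a printf analogy using values from this notebook's test data; it does not call qhist.

```shell
# "%-9.9s" ~ Python "{id:9.9}"      (left-aligned, width 9, at most 9 characters)
# "%8.2f"  ~ Python "{memory:8.2f}" (right-aligned float, width 8, 2 decimals)
formatted_row=$(printf '%-9.9s %8.2f' "4220853.casper-pbs" 0.24896240234375)

# Brackets make the padding visible
echo "[$formatted_row]"   # [4220853.c     0.25]
```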
Now let’s specify our custom format and compute averages. We can also filter jobs by user and name to get only STDIN (interactive) jobs by vanderwb. As we layer on options, qhist makes querying the PBS Pro accounting logs much easier than working with the raw data.
my_format="{numcpus:5d} {memory:8.2f} {reqmem:8.2f} {elapsed:8.2f}"
qhist -p 20250331 --user vanderwb --name STDIN --format "$my_format" --average
NCPUs Mem(GB) RMem(GB) Elap(h)
----- -------- -------- --------
1 0.00 0.00 0.00
1 0.72 30.00 0.06
Averages across 2 jobs:
NCPUs Mem(GB) RMem(GB) Elap(h)
----- -------- -------- --------
1.00 0.36 15.00 0.03
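The averages row is the per-column arithmetic mean of the job rows. The sketch below recomputes it with awk from the two rows printed above, as a check on the displayed values rather than a description of how qhist computes them internally; rows and column_means are illustrative names.

```shell
# The two job rows from the qhist output above
rows='1 0.00 0.00 0.00
1 0.72 30.00 0.06'

# Sum each column, then divide by the number of rows
column_means=$(printf '%s\n' "$rows" | awk '
    { nf = NF; for (i = 1; i <= NF; i++) sum[i] += $i }
    END {
        for (i = 1; i <= nf; i++) printf "%s%.2f", (i > 1 ? " " : ""), sum[i] / NR
        printf "\n"
    }')

echo "$column_means"   # 1.00 0.36 15.00 0.03
```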
We can use free-form filtering to perform more complex searches. In the following example, we include both end and requeue records and then search for all jobs using more than 1 CPU core.
qhist -p 20250331-20250401 -e ER --filter "numcpus>1"
Job ID User Queue Nodes NCPUs NGPUs End Mem(GB) CPU(%) Elap(h)
------------ ---------- -------- ----- ------ ----- ------- -------- -------- --------
4215265 bneuman htc 1 5 0 31-1134 3.02 31.20 0.01
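For comparison, here is roughly what the same search looks like against raw records without qhist. This sketch uses abbreviated copies of the sample records (only a few key=value pairs are kept), and it treats Resource_List.ncpus as the raw field behind numcpus, which is an assumption based on the values shown in this notebook.

```shell
# Abbreviated copies of records from both sample log files
demo_log=$(mktemp)
cat > "$demo_log" << "EOF"
03/31/2025 10:59:25;R;4215033.casper-pbs;user=vanderwb queue=htc Resource_List.ncpus=1
03/31/2025 11:03:12;E;4215034.casper-pbs;user=vanderwb queue=htc Resource_List.ncpus=1
03/31/2025 11:34:35;E;4215265.casper-pbs;user=bneuman queue=htc Resource_List.ncpus=5
04/01/2025 13:07:35;E;4220853.casper-pbs;user=negins queue=htc Resource_List.ncpus=1
EOF

# Keep E and R records, split the fourth field into key=value pairs,
# and print the job ID when the requested CPU count exceeds 1
multi_cpu_jobs=$(awk -F';' '
    $2 == "E" || $2 == "R" {
        n = split($4, pairs, " ")
        for (i = 1; i <= n; i++) {
            if (pairs[i] ~ /^Resource_List\.ncpus=/) {
                split(pairs[i], kv, "=")
                if (kv[2] + 0 > 1) print $3
            }
        }
    }' "$demo_log")

echo "$multi_cpu_jobs"   # 4215265.casper-pbs
rm -f "$demo_log"
```

The qhist --filter expression replaces all of this parsing with a single readable condition.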
Other output modes
Finally, let’s demonstrate other output modes. We have already seen the -l/--list output; we can also display jobs in two other modes: csv and json.
Here, we examine just a single job: 4220853.casper-pbs.
# We also disable the data label header, to facilitate machine readability
qhist -p 20250401 4220853.casper-pbs --csv --noheader
4220853.casper-pbs,negins,htc,2025-04-01 11:07:03,2025-04-01 11:07:03,2025-04-01 11:07:09,2025-04-01 13:07:35,0.24896240234375,8.0,0.0016666666666666668,2.0,2.006111111111111,cr-jhub-batch-stable,-29,P93300606,1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB
While the output looks similar to the raw log output, all fields are now consistent across records and comma-separated (whereas the raw records are semicolon- or space-delimited, depending on the record element).
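Because the fields are positional, standard tools can pick them out directly. The sketch below extracts the job ID, exit status, and elapsed hours from the CSV line above with awk. The column positions (1, 14, and 12) are inferred from that line and from the field order of the --list output, so treat them as assumptions; naive comma splitting would also break if a field ever contained a comma.

```shell
# The CSV record printed by qhist above
csv_line='4220853.casper-pbs,negins,htc,2025-04-01 11:07:03,2025-04-01 11:07:03,2025-04-01 11:07:09,2025-04-01 13:07:35,0.24896240234375,8.0,0.0016666666666666668,2.0,2.006111111111111,cr-jhub-batch-stable,-29,P93300606,1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB'

# Column 1: job ID, column 14: exit status, column 12: elapsed hours
job_summary=$(printf '%s\n' "$csv_line" |
    awk -F',' '{ printf "%s exit=%s elapsed=%.2fh\n", $1, $14, $12 }')

echo "$job_summary"   # 4220853.casper-pbs exit=-29 elapsed=2.01h
```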
qhist -p 20250401 4220853.casper-pbs --json
{
"4220853.casper-pbs": {
"time": "2025-04-01 13:07:35",
"type": "E",
"short_id": "4220853",
"user": "negins",
"group": "ncar",
"account": "P93300606",
"project": "_pbs_project_default",
"jobname": "cr-jhub-batch-stable",
"queue": "htc",
"ctime": "2025-04-01 11:07:03",
"qtime": "1743527223",
"etime": "2025-04-01 11:07:03",
"start": "2025-04-01 11:07:09",
"exec_host": "crhtc65/20",
"exec_vnode": "(crhtc65:ncpus=1:mem=4194304kb)",
"Resource_List": {
"mem": 4.0,
"mpiprocs": "1",
"mps": "0",
"ncpus": 1,
"ngpus": 0,
"nodect": 1,
"nvpus": "0",
"place": "scatter",
"select": "1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB",
"walltime": 2.0
},
"session": "104352",
"end": "2025-04-01 13:07:35",
"Exit_status": "-29",
"resources_used": {
"cpupercent": 8,
"cput": 0.06444444444444444,
"mem": 0.24896240234375,
"ncpus": 1,
"vmem": 0.6363372802734375,
"walltime": 2.006111111111111,
"avgcpu": 8.0
},
"eligible_time": 0.0025,
"run_count": 1,
"waittime": 0.0016666666666666668
}
}
While we show only a single job in these two modes for readability, they support multiple-job queries, as do all other output modes.
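Several numeric values in the JSON output can be reproduced from the raw record for job 4220853 with simple unit conversions: kilobytes to gigabytes (dividing by 1024 twice), HH:MM:SS to hours, and the difference between start and etime in hours. The sketch below performs that arithmetic with awk. It shows that the displayed values are consistent with these conversions; that qhist computes them in exactly this way is an inference.

```shell
# resources_used.mem=261056kb -> GB
mem_gb=$(awk 'BEGIN { printf "%.6f", 261056 / 1024 / 1024 }')

# resources_used.walltime=02:00:22 -> hours
walltime_h=$(printf '02:00:22\n' |
    awk -F: '{ printf "%.6f", ($1 * 3600 + $2 * 60 + $3) / 3600 }')

# start=1743527229 minus etime=1743527223 -> hours spent waiting
waittime_h=$(awk 'BEGIN { printf "%.6f", (1743527229 - 1743527223) / 3600 }')

echo "mem=$mem_gb walltime=$walltime_h waittime=$waittime_h"
# mem=0.248962 walltime=2.006111 waittime=0.001667
```

These match the "mem", "walltime", and "waittime" entries above to six decimal places.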