

8.6 The Portable Batch System (PBS) and Maui Scheduler

PBS serves as the job launcher and batch queueing system for OSCAR. OSCAR uses the open source version of PBS, OpenPBS; a commercial version (PBSPro) exists as well. OpenPBS includes a basic FIFO scheduler, but it is disabled by default in OSCAR. A more robust, open source scheduler named ``Maui'' is used instead. OSCAR 2.0b1 uses OpenPBS version 2.3p16 and Maui 3.0.7p8. PBS and Maui will continue to be updated in future OSCAR releases.

Basic PBS functionality is tested by OSCAR's test suite, and is also used to launch jobs when testing other software included in OSCAR. If the PBS test passes, PBS and Maui are up and working.

If the users of your OSCAR cluster have not used PBS before, you can expect a learning curve. The OSCAR user's documentation contains some useful information to get them started. It instructs users to ask the system administrator for the sample PBS scripts used in the OSCAR test suite. Once the OSCAR test suite has been run (step 6 in the install process), these scripts can be found in the home directory of the oscartst user.

8.6.1 Configuring PBS

By default, PBS installs without any queues or cluster-specific parameters defined. OSCAR configures PBS with sensible defaults based on what it finds in the SIS database. When the ``Complete Cluster Setup'' step is executed from the wizard, the post_install script from the PBS package in OSCAR is called. The post_install script sets only those PBS parameters that are not already defined, so as not to overwrite local customizations made by the system administrator. However, if you wish to force the default values back in place over any local customizations, the post_install script can be invoked manually with the -default option. This will revert all values to the original OSCAR settings.

qmgr can be used to configure queues and PBS server parameters. The OSCAR PBS post_install script (located off the top-level OSCAR installation directory in packages/pbs/scripts) uses qmgr behind the scenes. Man pages are available for qmgr, but reading the PBS admin guide is the best way to learn how to use it. The guide is available on OpenPBS's homepage, listed below. You will have to create an account on their site in order to download it.

8.6.2 PBS Resources

Arbitrary node properties can be set by the administrator. PBS calls these properties ``resources''. These resources can be specified on the qsub command line when a user submits a job. This allows users to restrict their jobs to run only on nodes exhibiting certain properties. If some nodes of a cluster have more memory, a different network, faster processors, etc., jobs can be submitted so that they run only on that specific subset of nodes. These properties are stored in plain text in /usr/spool/PBS/server_priv/nodes. However, if they are adjusted in the plain text file, the PBS server must be restarted in order for the changes to take effect. The more elaborate method is to use the qmgr command to modify node properties via the PBS API. OSCAR gives each node a starting property of ``all''.
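The property mechanism described above can be sketched as follows. This is an illustration under assumptions, not OSCAR's own tooling: the node names, the ``bigmem'' property, and the job script name myjob.sh are hypothetical, and the example writes to a scratch file instead of touching the real nodes file. The qsub command is printed, not executed, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: node names, the "bigmem" property, and myjob.sh are hypothetical.

# Lines in the style of server_priv/nodes: a node name followed by its
# properties. A scratch file is used so the real nodes file is not modified.
nodes_example=$(mktemp)
cat > "$nodes_example" <<'EOF'
oscarnode1 all
oscarnode2 all bigmem
oscarnode3 all bigmem
EOF

# Print the name of every node that carries the given property.
nodes_with_property() {
    # $1 = property name, $2 = nodes-style file
    awk -v prop="$1" '{ for (i = 2; i <= NF; i++) if ($i == prop) print $1 }' "$2"
}

echo "nodes with property bigmem:"
nodes_with_property bigmem "$nodes_example"

# A user would restrict a job to two nodes with that property like this.
submit_cmd='qsub -l nodes=2:bigmem myjob.sh'
echo "to submit: $submit_cmd"
```

With the sample file above, only oscarnode2 and oscarnode3 are listed, so a job submitted with nodes=2:bigmem could run only on those two nodes. To add the property through qmgr instead of editing the file, a command of the form qmgr -c "set node oscarnode2 properties += bigmem" is believed to work, but check the PBS admin guide for your version before relying on it.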

8.6.3 An FAQueue

A popular misconception about PBS queues is that they are bound to a group of nodes. This is false. If you have a four node queue defined, it is not associated with any specific nodes. You can think of a queue as a multidimensional box that a job must fit inside in order to be accepted at submission. That is, the submitted parameters must fall within certain max and min values for nodes, ppn (procs per node), walltime, etc. If a job needs to run on specific nodes, then resource attributes must be defined on those nodes.
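The ``box'' idea can be made concrete with a small model. The sketch below is only an illustration of the concept, not how PBS is implemented, and the limits (1 to 4 nodes, 24 hours of walltime) are hypothetical values chosen for the example.

```shell
#!/bin/sh
# Illustrative model only; this is not PBS code.
# Hypothetical limits for one queue: node count and walltime in seconds.
MIN_NODES=1
MAX_NODES=4
MAX_WALLTIME=$((24 * 3600))

# Print "accept" if the request fits inside the queue's limits, else "reject".
fits_queue() {
    # $1 = requested node count, $2 = requested walltime in seconds
    if [ "$1" -ge "$MIN_NODES" ] && [ "$1" -le "$MAX_NODES" ] \
        && [ "$2" -le "$MAX_WALLTIME" ]; then
        echo accept
    else
        echo reject
    fi
}

fits_queue 2 3600      # within every limit: accept
fits_queue 8 3600      # too many nodes for this queue: reject
fits_queue 2 172800    # walltime above the limit: reject
```

Note that none of the limits names a node: the check looks only at the size of the request. In a real queue, limits of this kind are held in queue attributes such as resources_max.nodect and resources_max.walltime, which appear in the output of the ``list queue'' command when they are set.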

If you would like to get a full dump of your PBS server and queue configuration, you can issue this command:

  # qmgr -c "print server"

The qmgr interface can be used to define additional queues and their parameters. You can also change the parameters on the default OSCAR queue, ``workq''. For example, to show the configuration of the workq, execute the following:

  # qmgr -c "list queue workq"

To change any of the values listed, use the following:

  # qmgr -c "set queue workq PARAMETER = VALUE"

where PARAMETER is a parameter from the ``list queue'' command, and VALUE is a valid value for that parameter. You can use the ``print server'' and/or ``list queue'' commands to verify your changes.

Be aware that if you run the post_install script with the -default option, you will lose your customizations. Also note that OSCAR's default wallclock limit on workq is 10,000 hours. Depending on the application mix that will run on your cluster, you may wish to lower this value.
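As a sketch of adjusting the wallclock limit, the script below builds the qmgr command for a new value. It assumes the limit is held in the resources_max.walltime attribute of workq; confirm the attribute name in the output of the ``list queue'' command before applying it. The 48 hour value is an arbitrary example, and the command is printed, not executed.

```shell
#!/bin/sh
# Sketch only. Assumption: the workq limit lives in resources_max.walltime;
# verify with: qmgr -c "list queue workq"

# Convert a whole number of hours to the HH:MM:SS form used for walltime.
hours_to_walltime() {
    printf '%02d:00:00\n' "$1"
}

new_limit=$(hours_to_walltime 48)   # 48 hours is an arbitrary example value
cmd="set queue workq resources_max.walltime = $new_limit"

# Print the command for review; run it as root once it looks right.
echo "qmgr -c \"$cmd\""
```

After applying the change, the ``list queue'' command shown earlier can be used to confirm the new value.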

Some useful links:

root 2002-11-08