[[PageOutline]]

= Introduction =

QCG-Computing (the successor of the OpenDSP project) is an open-source service acting as a computing provider, exposing on-demand access to computing resources and jobs over an HPC Basic Profile compliant Web Services interface. In addition, QCG-Computing offers a remote interface for Advance Reservation management. This document describes the installation of the QCG-Computing service in the PL-Grid environment.

The service should be deployed on a machine (or virtual machine) that:
 * has at least 1 GB of memory (recommended: 2 GB),
 * has 10 GB of free disk space (most of the space will be used by log files),
 * has any modern CPU (if you plan to use a virtual machine, dedicate one or two cores of the host machine to it),
 * runs Scientific Linux 5.5 (in most cases the provided RPMs should work with any operating system based on Red Hat Enterprise Linux 5.x, e.g. CentOS 5).

= Prerequisites =

We assume that you have the Torque local resource manager and the Maui scheduler already installed. This would typically be a frontend machine (i.e. the machine where the `pbs_server` and `maui` daemons are running). If you want to install the QCG-Computing service on a separate submit host, read these [[InstallationOnSeparateMachine|notes]]. Moreover, the following packages must be installed before you proceed with the QCG-Computing installation.
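Before proceeding you can quickly verify that both daemons are present on the frontend. This is only a sanity-check sketch (it assumes the daemons run on the local machine under their standard process names):

```shell
# Check that the Torque server and the Maui scheduler daemons are running
# on this machine before installing QCG-Computing.
for d in pbs_server maui; do
    if pgrep -x "$d" > /dev/null; then
        echo "$d: running"
    else
        echo "$d: NOT running"
    fi
done
```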
 * Install the database backend (PostgreSQL):
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install postgresql postgresql-server
}}}
}}}
 * unixODBC and the PostgreSQL ODBC driver:
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install unixODBC postgresql-odbc
}}}
}}}
 * The Torque devel package and the rpm-build package (needed to build DRMAA):
{{{
#!div style="font-size: 90%"
{{{#!sh
rpm -i torque-devel-your-version.rpm
yum install rpm-build
}}}
}}}
The X.509 host certificate (signed by the Polish Grid CA) and key are assumed to be already installed in the following locations:
 * `/etc/grid-security/hostcert.pem`
 * `/etc/grid-security/hostkey.pem`

Most grid services and security infrastructures are sensitive to time skew. Thus we recommend installing a Network Time Protocol daemon, or using any other solution that provides accurate clock synchronization.

= Firewall configuration =

In order to expose the !QosCosGrid services externally you need to open the following ports in the firewall:
 * 19000 (TCP) - QCG-Computing
 * 19001 (TCP) - QCG-Notification
 * 2811 (TCP) - GridFTP server
 * 9000-9500 (TCP) - GridFTP port range (if you want to use a different port range, adjust the `GLOBUS_TCP_PORT_RANGE` variable in the `/etc/xinetd.d/gsiftp` file)

= Installation using provided RPMs =

 * Create the following users:
   * `qcg-comp` - needed by the QCG-Computing service
   * `qcg-broker` - the user that the [http://apps.man.poznan.pl/trac/qcg-broker QCG-Broker] service will be mapped to
{{{
#!div style="font-size: 90%"
{{{#!sh
useradd -d /opt/plgrid/var/log/qcg-comp/ -m qcg-comp
useradd -d /opt/plgrid/var/log/qcg-broker/ -m qcg-broker
}}}
}}}
 * and the following group:
   * `qcg-dev` - this group is allowed to read the configuration and log files. Please add the QCG services' developers to this group.
{{{
#!div style="font-size: 90%"
{{{#!sh
groupadd qcg-dev
}}}
}}}
 * Install the PL-Grid (official) and QCG (testing) repositories:
   * !QosCosGrid testing repository:
{{{
#!div style="font-size: 90%"
{{{#!sh
cat > /etc/yum.repos.d/qcg.repo << EOF
[qcg]
name=QosCosGrid YUM repository
baseurl=http://fury.man.poznan.pl/qcg-packages/sl/x86_64/
enabled=1
gpgcheck=0
EOF
}}}
}}}
   * Official PL-Grid repository:
{{{
#!div style="font-size: 90%"
{{{#!sh
rpm -Uvh http://software.plgrid.pl/packages/repos/plgrid-repos-2010-2.noarch.rpm
}}}
}}}
 * Install QCG-Computing using the YUM package manager:
{{{
#!div style="font-size: 90%"
{{{#!sh
yum install qcg-comp
}}}
}}}
 * Set up the QCG-Computing database using the provided script:
{{{
#!div style="font-size: 90%"
{{{#!sh
/opt/plgrid/qcg/share/qcg-comp/tools/qcg-comp-install.sh
Welcome to qcg-comp installation script!

This script will guide you through the process of configuring a proper
environment for running the QCG-Computing service.
You have to answer a few questions regarding the parameters of your database.
If you are not sure, just press Enter and use the default values.

Use local PostgreSQL server? (y/n) [y]: y
Database [qcg-comp]:
User [qcg-comp]:
Password [qcg-comp]: MojeTajneHaslo
Create database? (y/n) [y]: y
Create user? (y/n) [y]: y

Checking for system user qcg-comp...OK
Checking whether PostgreSQL server is installed...OK
Checking whether PostgreSQL server is running...OK

Performing installation
* Creating user qcg-comp...OK
* Creating database qcg-comp...OK
* Creating database schema...OK
* Checking for ODBC data source qcg-comp...
* Installing ODBC data source...OK

Remember to add an appropriate entry to /var/lib/pgsql/data/pg_hba.conf
(as the first rule!) to allow user qcg-comp to access database qcg-comp.
For instance:

host    qcg-comp    qcg-comp    127.0.0.1/32    md5

and reload the Postgres server.
}}}
}}}
Add a new rule to the pg_hba.conf as requested:
{{{
#!div style="font-size: 90%"
{{{#!sh
vim /var/lib/pgsql/data/pg_hba.conf
/etc/init.d/postgresql reload
}}}
}}}
Install the Polish Grid and PL-Grid SimpleCA certificates:
{{{
#!div style="font-size: 90%"
{{{#!sh
wget https://dist.eugridpma.info/distribution/igtf/current/accredited/RPMS/ca_PolishGrid-1.40-1.noarch.rpm
rpm -i ca_PolishGrid-1.40-1.noarch.rpm
wget http://software.plgrid.pl/packages/general/ca_PLGRID-SimpleCA-1.0-2.noarch.rpm
rpm -i ca_PLGRID-SimpleCA-1.0-2.noarch.rpm
# install the certificate revocation list fetching utility
wget https://dist.eugridpma.info/distribution/util/fetch-crl/fetch-crl-2.8.5-1.noarch.rpm
rpm -i fetch-crl-2.8.5-1.noarch.rpm
# get fresh CRLs now
/usr/sbin/fetch-crl
# install a cron job for it
cat > /etc/cron.daily/fetch-crl.cron << EOF
#!/bin/sh
/usr/sbin/fetch-crl
EOF
chmod a+x /etc/cron.daily/fetch-crl.cron
}}}
}}}

= The Grid Mapfile =

This tutorial assumes that the QCG-Computing service is configured in such a way that every authenticated user must be authorized against the `grid-mapfile`. This file can be created manually by an administrator (if the service is run in "test mode") or generated automatically based on the LDAP directory service.

=== Manually created grid mapfile (for testing purposes only) ===

{{{
#!div style="font-size: 90%"
{{{#!default
# for test purposes only: add a mapping for your account
echo '"MyCertDN" myaccount' >> /etc/grid-security/grid-mapfile
}}}
}}}

=== LDAP-generated grid mapfile ===

{{{
#!div style="font-size: 90%"
{{{#!default
#
# 1. install the grid-mapfile generator from the PL-Grid repository
#
yum install plggridmapfilegenerator
#
# 2. install a cron job that updates the QCG gridmapfile
#
cd /etc/cron.hourly/
wget http://www.qoscosgrid.org/trac/qcg-computing/downloads/qcg-gridmapfile.cron
chmod a+x qcg-gridmapfile.cron
#
# 3.
#    configure the gridmapfilegenerator - remember to change:
#  * the url property (point it at your local LDAP replica)
#  * the search base
#  * the filter expression
#  * the security context
cat > /opt/plgrid/qcg/etc/qcg-comp/plggridmapfilegenerator.conf << EOF
[ldap]
url=ldap://your.ldap.replica
# search base
# e.g. base=ou=People,dc=pcss,dc=plgrid,dc=pl
base=ou=People,dc=your-unit,dc=plgrid,dc=pl
# filter, specifies which users should be processed
# e.g. filter=(&(plgridX509CertificateDN=*)(plgridAccessService=QCG_ACCESS_PCSS))
filter=(&(plgridX509CertificateDN=*)(plgridAccessService=QCG_ACCESS_YOUR_SUFFIX))
# timeout for execution of ldap queries
timeout=10
# optional username and password
#username=
#password=
tls=false
disable_cert_checking=false
#cacertdir=/etc/grid-security/certificates/
#cacertfile=/etc/grid-security/certificates/8a661490.0
[filter]
blacklist_file=/etc/grid-security/grid-mapfile.deny.db
[output]
format=^plgridX509CertificateDN, uid
EOF
#
# 4. run the cron job now in order to generate the gridmapfile
#
/etc/cron.hourly/qcg-gridmapfile.cron
}}}
}}}

After installing and running this tool one can find three files:
 * `/etc/grid-security/grid-mapfile.local` - here you can put a list of DNs and local Unix account names that will be merged with the data acquired from the local LDAP server
 * `/etc/grid-security/grid-mapfile.deny` - here you can put a list of DNs (only DNs!) that you want to deny access to the QCG-Computing service
 * `/etc/grid-security/grid-mapfile` - the final gridmap file, generated from the above two files and the information available in the local LDAP server. Do not edit this file, as it is generated automatically!
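For reference, a sketch of the two editable files (the DNs shown are hypothetical examples): `grid-mapfile.local` takes a quoted certificate DN followed by a local account name, one mapping per line, while `grid-mapfile.deny` takes bare DNs only:

```
# /etc/grid-security/grid-mapfile.local
"/C=PL/O=GRID/O=PSNC/CN=Jan Kowalski" jkowalski

# /etc/grid-security/grid-mapfile.deny
/C=PL/O=GRID/O=PSNC/CN=Blocked User
```

After editing either file, run `/etc/cron.hourly/qcg-gridmapfile.cron` manually to regenerate the final `grid-mapfile` instead of waiting for the hourly cron run.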
= Scheduler configuration =

Grant appropriate rights to the `qcg-comp` and `qcg-broker` users in the Maui scheduler configuration file:
{{{
#!div style="font-size: 90%"
{{{#!default
vim /var/spool/maui/maui.cfg

# primary admin must be first in list
ADMIN1                root
ADMIN2                qcg-broker
ADMIN3                qcg-comp
}}}
}}}

= Service certificates =

Copy the service certificate and key into `/opt/plgrid/qcg/etc/qcg-comp/certs/`. Remember to set appropriate permissions on the key file:
{{{
#!div style="font-size: 90%"
{{{#!default
cp /etc/grid-security/hostcert.pem /opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem
cp /etc/grid-security/hostkey.pem /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem
chown qcg-comp /opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem
chown qcg-comp /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem
chmod 0600 /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem
}}}
}}}

= DRMAA library =

The DRMAA library must be compiled from the SRC RPM:
{{{
#!div style="font-size: 90%"
{{{#!default
wget http://fury.man.poznan.pl/qcg-packages/sl/SRPMS/pbs-drmaa-1.0.8-2.src.rpm
rpmbuild --rebuild pbs-drmaa-1.0.8-2.src.rpm
cd /usr/src/redhat/RPMS/x86_64/
rpm -i pbs-drmaa-1.0.8-2.x86_64.rpm
}}}
}}}
However, if you are using it for the first time, you should compile it with logging enabled:
{{{
#!div style="font-size: 90%"
{{{#!default
wget http://fury.man.poznan.pl/qcg-packages/sl/SRPMS/pbs-drmaa-1.0.8-2.src.rpm
rpmbuild --define 'configure_options --enable-debug' --rebuild pbs-drmaa-1.0.8-2.src.rpm
cd /usr/src/redhat/RPMS/x86_64/
rpm -i pbs-drmaa-1.0.8-2.x86_64.rpm
}}}
}}}
After installation you need to do '''either''' of the following:
 * configure the DRMAA library to use the Torque logs ('''RECOMMENDED'''). A sample configuration file of the DRMAA library (`/opt/plgrid/qcg/etc/pbs_drmaa.conf`):
{{{
#!div style="font-size: 90%"
{{{#!default
# pbs_drmaa.conf - Sample pbs_drmaa configuration file.
wait_thread: 1,

pbs_home: "/var/spool/pbs",

cache_job_state: 600,
}}}
}}}
'''Note:''' Remember to mount the server log directory as described in the earlier [[InstallationOnSeparateMachine|note]].

'''or'''

 * configure Torque to keep information about completed jobs (e.g. by setting: `qmgr -c 'set server keep_completed = 300'`).

It is possible to restrict users to submitting jobs to a predefined queue by setting a default job category (in the `/opt/plgrid/qcg/etc/pbs_drmaa.conf` file):
{{{
#!div style="font-size: 90%"
{{{#!default
job_categories: {
	default: "-q plgrid",
},
}}}
}}}

= Service configuration =

Edit the preinstalled service configuration file (`/opt/plgrid/qcg/etc/qcg-comp/qcg-comp`**d**`.xml`):
{{{
#!div style="font-size: 90%"
{{{#!xml
<!-- Abridged sketch: only the element names used in the description below
     are spelled out; refer to the preinstalled qcg-compd.xml for the exact
     schema. -->
<!-- module directories: /opt/plgrid/qcg/lib/qcg-core/modules/
                         /opt/plgrid/qcg/lib/qcg-comp/modules/ -->
<!-- log file: /opt/plgrid/var/log/qcg-comp/qcg-compd.log, log level: INFO -->
<Transport>
  <Module>
    <Host>frontend.example.com</Host>
    <Port>19000</Port>
    <Authentication>
      <Module>
        <X509CertFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem</X509CertFile>
        <X509KeyFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem</X509KeyFile>
      </Module>
    </Authentication>
    <!-- authorization: grid mapfile /etc/grid-security/grid-mapfile -->
  </Module>
</Transport>
<Module type="smc:notification_wsn">
  <Module>
    <ServiceURL>http://localhost:19001/</ServiceURL>
  </Module>
</Module>
<Module type="submission_drmaa" path="..."/>
<Module type="reservation_python" path="..."/>
<!-- application map file: /opt/plgrid/qcg/etc/qcg-comp/application_mapfile -->
<Database>
  <!-- DSN, database and user: qcg-comp -->
  <Password>qcg-comp</Password>
</Database>
<FactoryAttributes>
  <CommonName>klaster.plgrid.pl</CommonName>
  <LongDescription>PL Grid cluster</LongDescription>
</FactoryAttributes>
}}}
}}}
In most cases it should be enough to change only the following elements:
 `Transport/Module/Host` :: the hostname of the machine where the service is deployed
 `Transport/Module/Authentication/Module/X509CertFile` and `Transport/Module/Authentication/Module/X509KeyFile` :: the service private key and X.509 certificate. Make sure that the key and certificate are owned by the `qcg-comp` user. If you installed the certificate and key file in the recommended location, you do not need to edit these fields.
 `Module[type="smc:notification_wsn"]/Module/ServiceURL` :: the URL of the QCG-Notification service (you can do this later, i.e.
after [http://www.qoscosgrid.org/trac/qcg-notification/wiki/installation_in_PL-Grid installing the QCG-Notification service])
 `Module[type="submission_drmaa"]/@path` :: the path to the DRMAA library (`libdrmaa.so`). If you installed the DRMAA library using the provided SRC RPM, you do not need to change this path.
 `Module[type="reservation_python"]/@path` :: the path to the reservation module. Change this if you are using a different scheduler than Maui (e.g. use `reservation_moab.py` for Moab, `reservation_pbs.py` for PBS Pro)
 `Database/Password` :: the `qcg-comp` database password
 `FactoryAttributes/CommonName` :: a common name of the cluster (e.g. reef.man.poznan.pl). You can use any name that is unique among all systems (e.g. the cluster name + the domain name of your institution)
 `FactoryAttributes/LongDescription` :: a human-readable description of the cluster

== Restricting advance reservation ==

By default the QCG-Computing service can reserve any number of hosts. One can limit this by configuring the !Maui/Moab scheduler and the QCG-Computing service appropriately:
 * In !Maui/Moab mark a subset of nodes, using the partition mechanism, as reservable for QCG-Computing:
{{{
#!div style="font-size: 90%"
{{{#!default
# all users can use both the DEFAULT and RENABLED partition
SYSCFG PLIST=DEFAULT,RENABLED
# in Moab you should use 0 instead of DEFAULT
#SYSCFG PLIST=0,RENABLED

# mark some set of the machines (e.g. 64 nodes) as reservable
NODECFG[node01] PARTITION=RENABLED
NODECFG[node02] PARTITION=RENABLED
NODECFG[node03] PARTITION=RENABLED
...
NODECFG[node64] PARTITION=RENABLED
}}}
}}}
 * Tell QCG-Computing to limit reservations to the aforementioned partition by editing the `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` configuration file:
{{{
#!div style="font-size: 90%"
{{{#!default
export QCG_AR_MAUI_PARTITION="RENABLED"
}}}
}}}

= Configuring BAT Updater =

== Installation ==

 * Install the BAT Updater using YUM:
{{{
yum install qcg-bat-updater
}}}

== Configuration ==

First you must ask the BAT administrator to provide you with all the credentials (username/password and X.509 certificate) needed to connect to BAT. Copy the received keystore to the file `/opt/plgrid/qcg/etc/qcg-bat-updater/truststore.ts` (make sure this file is readable only by root). Now you are ready to edit the QCG BAT configuration file `/opt/plgrid/qcg/etc/qcg-bat-updater/config.properties`. You should change the following properties:
 * `qcg.db.pass` - the password of the QCG-Computing database (see the `Database` section of the `qcg-compd.xml` file)
 * `qcg.bat.user` and `qcg.bat.pass` - put the values provided by the BAT administrator
 * `qcg.bat.keystore.pass` - the keystore password (provided with the key by the BAT administrator)
 * `qcg.site.name` - your site name
 * `qcg.batch.server` - the hostname where the batch server is running

== Operations ==

Usage records are sent to the main BAT server (BAT Broker) every hour by the QCG BAT Updater (acting as a BAT agent). The QCG BAT Updater is started every hour via a cron job (installed in `/etc/cron.hourly/qcg-bat-updater.cron`). The QCG BAT Updater is a Java batch application that on every run:
 * reads job accounting information from the QCG-Computing database,
 * converts it to a proper XML format,
 * sends it over an ActiveMQ channel (SSL secured) to the BAT Broker,
 * stores the id of the last record sent in the file `/opt/plgrid/var/run/qcg-bat-updater/last.id`.

The QCG BAT Updater logs can be found in `/opt/plgrid/var/log/qcg-bat-updater/qcg-bat-updater.log`.
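Putting the properties above together, `/opt/plgrid/qcg/etc/qcg-bat-updater/config.properties` would contain entries along these lines (all values below are site-specific placeholders, not real credentials):

```
qcg.db.pass=MojeTajneHaslo
qcg.bat.user=your-bat-username
qcg.bat.pass=your-bat-password
qcg.bat.keystore.pass=your-keystore-password
qcg.site.name=klaster.plgrid.pl
qcg.batch.server=frontend.example.com
```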
= Creating the applications' script space =

A common case for the QCG-Computing service is that an application is accessed using an abstract application name rather than an absolute executable path. The application name/version to executable path mappings are stored in the file `/opt/plgrid/qcg/etc/qcg-comp/application_mapfile`:
{{{
#!div style="font-size: 90%"
{{{#!default
cat /opt/plgrid/qcg/etc/qcg-comp/application_mapfile
# ApplicationName ApplicationVersion Executable
date * /bin/date
LPSolve 5.5 /usr/local/bin/lp_solve
QCG-OMPI /opt/QCG/qcg/share/qcg-comp/tools/cross-mpi.starter
}}}
}}}
It is also common to provide wrapper scripts here rather than the target executables. A wrapper script can handle such aspects of the application lifetime as: environment initialization, copying files from/to scratch storage, and application monitoring. It is recommended to create a separate directory (e.g. on the application partition) for those wrapper scripts and to grant write permission on it to the QCG developers group. This directory must be readable by all users and from every worker node (the application partition usually fulfils those requirements).
{{{
#!div style="font-size: 90%"
{{{#!default
mkdir /opt/exp_soft/plgrid/qcg-app-scripts
chown :qcg-dev /opt/exp_soft/plgrid/qcg-app-scripts
chmod g+rwx /opt/exp_soft/plgrid/qcg-app-scripts
}}}
}}}

= Note on the security model =

QCG-Computing can be configured with various authentication and authorization modules. However, in a typical deployment we assume that QCG-Computing is configured as in the above example, i.e.:
 * authentication is provided on the basis of the ''httpg'' protocol,
 * authorization is based on the local `grid-mapfile` mapfile.
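A minimal wrapper for the applications' script space described above can be sketched as follows. This is only an illustration: the `/tmp` path and the use of `/bin/echo` as the "application" are stand-ins for a real script location (e.g. `/opt/exp_soft/plgrid/qcg-app-scripts/`) and a real executable:

```shell
# Create a hypothetical wrapper script (stand-in paths; a real wrapper would
# live in the application script space and exec the real binary).
cat > /tmp/myapp-wrapper.sh << 'EOF'
#!/bin/sh
# 1. site-specific environment initialization would go here
# 2. optionally stage input files to scratch storage here
# 3. exec the real application, passing all arguments through
exec /bin/echo "myapp:" "$@"
EOF
chmod +x /tmp/myapp-wrapper.sh
/tmp/myapp-wrapper.sh --version
```

The corresponding `application_mapfile` entry would then map the abstract application name to the wrapper's path instead of the target executable.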
= Starting the service =

As root type:
{{{
#!div style="font-size: 90%"
{{{#!sh
/etc/init.d/qcg-compd start
}}}
}}}
The service logs can be found in:
{{{
#!div style="font-size: 90%"
{{{#!sh
/opt/plgrid/var/log/qcg-comp/qcg-compd.log
}}}
}}}
The service assumes that the following commands are in the standard search path:
 * `pbsnodes`
 * `showres`
 * `setres`
 * `releaseres`
 * `checknode`
If any of the above commands is not installed in a standard location (e.g. `/usr/bin`) you may need to edit the `/opt/plgrid/qcg/etc/sysconfig/qcg-compd` file and set the `PATH` variable appropriately, e.g.:
{{{
#!div style="font-size: 90%"
{{{#!sh
# INIT_WAIT=5
#
# DRM specific options
export PATH=$PATH:/opt/maui/bin
}}}
}}}
If you compiled DRMAA with logging switched on, you can also set the DRMAA logging level there:
{{{
#!div style="font-size: 90%"
{{{#!sh
# INIT_WAIT=5
#
# DRM specific options
export DRMAA_LOG_LEVEL=INFO
}}}
}}}
'''Note:''' In the current version, whenever you restart the PostgreSQL server you also need to restart the QCG-Computing and QCG-Notification services:
{{{
#!div style="font-size: 90%"
{{{#!sh
/etc/init.d/qcg-compd restart
/etc/init.d/qcg-ntfd restart
}}}
}}}

= Stopping the service =

The service can be stopped using the following command:
{{{
#!div style="font-size: 90%"
{{{#!sh
/etc/init.d/qcg-compd stop
}}}
}}}

= Verifying the installation =

 * For convenience, add `/opt/plgrid/qcg/bin` and `/opt/plgrid/qcg/dependencies/globus/bin/` to your `PATH` variable.
 * Edit the QCG-Computing client configuration file (`/opt/plgrid/qcg/etc/qcg-comp/qcg-comp.xml`):
   * set the `Host` and `Port` to reflect the changes in the service configuration file (`qcg-compd.xml`).
{{{
#!div style="font-size: 90%"
{{{#!xml
<!-- Abridged sketch - refer to the preinstalled qcg-comp.xml for the exact
     element names. -->
<!-- module directories: /opt/qcg/lib/qcg-core/modules/
                         /opt/qcg/lib/qcg-comp/modules/ -->
<!-- service address: httpg://frontend.example.com:19000/ -->
}}}
}}}
 * Initialize your credentials:
{{{
#!div style="font-size: 90%"
{{{#!sh
grid-proxy-init
Your identity: /O=Grid/OU=QosCosGrid/OU=PSNC/CN=Mariusz Mamonski
Enter GRID pass phrase for this identity:
Creating proxy .................................................................. Done
Your proxy is valid until: Wed Apr  6 05:01:02 2012
}}}
}}}
 * Query the QCG-Computing service:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -G | xmllint --format -
# the xmllint is used only to present the result in a more pleasant way
<!-- Abridged response sketch: element names follow the OGF BES
     FactoryResourceAttributesDocument; parts of the original markup were
     lost. -->
<bes:IsAcceptingNewActivities>true</bes:IsAcceptingNewActivities>
<bes:CommonName>IT cluster</bes:CommonName>
<bes:LongDescription>IT department cluster for public use</bes:LongDescription>
<bes:TotalNumberOfActivities>0</bes:TotalNumberOfActivities>
<bes:TotalNumberOfContainedResources>1</bes:TotalNumberOfContainedResources>
<bes:ContainedResource>
  <!-- worker.example.com, x86_32, 41073741824 -->
</bes:ContainedResource>
<bes:NamingProfile>http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing</bes:NamingProfile>
<bes:BESExtension>http://schemas.ogf.org/hpcp/2007/01/bp/BasicFilter</bes:BESExtension>
<bes:BESExtension>http://schemas.qoscosgrid.org/comp/2011/04</bes:BESExtension>
<bes:LocalResourceManagerType>http://example.com/SunGridEngine</bes:LocalResourceManagerType>
<!-- http://localhost:2211/ -->
}}}
}}}
 * Submit a sample job:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -c -J /opt/plgrid/qcg/share/qcg-comp/doc/examples/jsdl/sleep.xml
Activity Id: ccb6b04a-887b-4027-633f-412375559d73
}}}
}}}
 * Query its status:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Executing
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Executing
qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73
status = Finished
exit status = 0
}}}
}}}
 * Create an advance reservation:
   * copy the provided sample reservation description file (expressed in ARDL - Advance Reservation Description Language):
{{{
#!div style="font-size: 90%"
{{{#!sh
cp /opt/plgrid/qcg/share/qcg-comp/doc/examples/ardl/oneslot.xml oneslot.xml
}}}
}}}
   * edit the `oneslot.xml` and modify the `StartTime` and `EndTime` to dates in the near future,
   * create a new reservation:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -c -D oneslot.xml
Reservation Id:
aab6b04a-887b-4027-633f-412375559d7d
}}}
}}}
 * List all reservations:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -l
Reservation Id: aab6b04a-887b-4027-633f-412375559d7d
Total number of reservations: 1
}}}
}}}
 * Check which hosts were reserved:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -s -r aab6b04a-887b-4027-633f-412375559d7d
Reserved hosts:
worker.example.com[used=0,reserved=1,total=4]
}}}
}}}
 * Delete the reservation:
{{{
#!div style="font-size: 90%"
{{{#!sh
qcg-comp -t -r aab6b04a-887b-4027-633f-412375559d7d
Reservation terminated.
}}}
}}}
 * Check if GridFTP is working correctly:
{{{
#!div style="font-size: 90%"
{{{#!sh
globus-url-copy gsiftp://your.local.host.name/etc/profile profile
diff /etc/profile profile
}}}
}}}

= Maintenance =

The historic usage information is stored in two relations of the QCG-Computing database: `jobs_acc` and `reservations_acc`. You can always archive old usage data to a file and delete it from the database using the psql client:
{{{
#!div style="font-size: 90%"
{{{#!sh
psql -h localhost qcg-comp qcg-comp
Password for user qcg-comp:
Welcome to psql 8.1.23, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

qcg-comp=> \o jobs.acc
qcg-comp=> SELECT * FROM jobs_acc WHERE end_time < date '2010-01-10';
qcg-comp=> \o reservations.acc
qcg-comp=> SELECT * FROM reservations_acc WHERE end_time < date '2010-01-10';
qcg-comp=> \o
qcg-comp=> DELETE FROM jobs_acc WHERE end_time < date '2010-01-10';
qcg-comp=> DELETE FROM reservations_acc WHERE end_time < date '2010-01-10';
}}}
}}}

= PL-Grid Grants Support =

Since version 2.2.7 QCG-Computing supports the PL-Grid grants system.
The integration has two aspects:
 * QCG-Computing can accept jobs that have a grant (project) name specified in the job description, e.g.:
{{{
#!div style="font-size: 90%"
{{{#!xml
<!-- fragment of a JSDL job description; some markup was lost from the
     original example, so the surrounding elements are reconstructed from
     the JSDL specification -->
<jsdl:JobIdentification>
  <jsdl:JobName>Sample-BES-HPC</jsdl:JobName>
  <jsdl:JobProject>Manhattan</jsdl:JobProject>
</jsdl:JobIdentification>
}}}
}}}
 * QCG-Computing can provide information about the local grants to the upper layers (e.g. QCG-Broker), so they can use it for scheduling purposes. One can enable this by adding the following line to the QCG-Computing configuration file (`qcg-compd.xml`):
{{{
#!div style="font-size: 90%"
{{{#!xml
}}}
}}}