Changes between Version 170 and Version 171 of InstallingUsingRPMS

Timestamp:
07/30/13 11:26:00 (11 years ago)
Author:
mmamonski

    1 *** !!! Warning !!! *** 
    2  
    3 This page has been obsoleted by the general [[InstallationGuide|Installation Guide]]. 
    4  
    5 = Introduction = 
    6 The QCG-Computing service (the successor of the OpenDSP project) is an open-source computing provider that exposes on-demand access to computing resources and jobs over the HPC Basic Profile compliant Web Services interface. In addition, QCG-Computing offers a remote interface for Advance Reservation management.  
    7  
    8 This document describes the installation of the QCG-Computing service in the PL-Grid environment. The service should be deployed on a machine (physical or virtual) that: 
    9 * has at least 1GB of memory (recommended value: 2 GB) 
    10 * has 10 GB of free disk space (most of the space will be used by the log files) 
    11 * has any modern CPU (if you plan to use a virtual machine, you should dedicate one or two cores of the host machine to it) 
    12 * is running Scientific Linux 5.5 (in most cases the provided RPMs should work with any operating system based on Red Hat Enterprise Linux 5.x, e.g. CentOS 5) 
    13  
    14 = Prerequisites = 
    15 We assume that you have the local resource manager/scheduler already installed. This would typically be a frontend machine (i.e. the machine where the pbs_server and maui daemons are running). If you want to install the QCG-Computing service on a separate submit host, you should read these [[InstallationOnSeparateMachine| notes]].  
    16  
    17 Since version 2.4 the QCG-Computing service discovers installed applications using the [http://modules.sourceforge.net/ Environment Modules] package. For this reason you should install modules on the QCG host, mount the directories that contain all module files used at your cluster, and make sure that the user `qcg-comp` can see all modules. 
    18  
    19 The !QosCosGrid services do not require you to install any QCG component on the worker nodes; however, the application wrapper scripts need the following software to be available there: 
    20  * bash,  
    21  * rsync, 
    22  * zip/unzip, 
    23  * dos2unix, 
    24  * nc, 
    25  * python. 
    26 These tools are usually available out of the box on most HPC systems. 
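A quick way to verify this is a short shell loop run on each worker node (a sketch; dispatch it with your parallel shell of choice):

```shell
# check that each tool required by the QCG wrapper scripts is present
for tool in bash rsync zip unzip dos2unix nc python; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: OK"
    else
        echo "$tool: MISSING"
    fi
done
```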
    27 = Firewall configuration = 
    28 In order to expose the !QosCosGrid services externally you need to open the following incoming ports in the firewall: 
    29 * 19000 (TCP) - QCG-Computing 
    30 * 19001 (TCP) - QCG-Notification 
    31 * 2811 (TCP) - GridFTP server 
    32 * 20000-25000 (TCP) - GridFTP  port-range (if you want to use different port-range adjust the `GLOBUS_TCP_PORT_RANGE` variable in the `/etc/xinetd.d/gsiftp` file) 
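On a host managed with plain iptables, the incoming rules above might look as follows (a configuration sketch only, assuming a default-deny INPUT chain; adapt to your site's firewall tooling rather than applying verbatim):

```shell
# incoming ports for the QosCosGrid services (sketch, not a complete firewall)
iptables -A INPUT -p tcp --dport 19000 -j ACCEPT        # QCG-Computing
iptables -A INPUT -p tcp --dport 19001 -j ACCEPT        # QCG-Notification
iptables -A INPUT -p tcp --dport 2811  -j ACCEPT        # GridFTP control channel
iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT  # GridFTP data port-range
```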
    33  
    34 You may also want to allow SSH access from white-listed machines (for administration purpose only). 
    35  
    36 The following outgoing traffic should be allowed in general: 
    37 * NTP, DNS, HTTP, HTTPS services 
    38 * gridftp (TCP ports: 2811 and port-ranges: 20000-25000) 
    39 Also, the PL-Grid QCG-Accounting publisher plugin (BAT) needs access to the following machine and port: 
    40 *  acct.plgrid.pl 61616 (TCP) 
    41 = Related software = 
    42 * Install the database backend (PostgreSQL):   
    43 {{{ 
    44 #!div style="font-size: 90%" 
    45 {{{#!sh 
    46 yum install postgresql postgresql-server 
    47 }}} 
    48 }}} 
    49 * Install UnixODBC and the PostgreSQL ODBC driver: 
    50 {{{ 
    51 #!div style="font-size: 90%" 
    52 {{{#!sh 
    53 yum install unixODBC postgresql-odbc 
    54 }}} 
    55 }}} 
    56 The X.509 host certificate (signed by the Polish Grid CA) and key are already installed in the following locations: 
    57 * `/etc/grid-security/hostcert.pem` 
    58 * `/etc/grid-security/hostkey.pem` 
    59  
    60 Most grid services and security infrastructures are sensitive to time skew. Thus we recommend installing a Network Time Protocol daemon or using any other solution that provides accurate clock synchronization. 
    61  
    62  
    63  
    64 = Installation using provided RPMS = 
    65 * Create the following users: 
    66  * `qcg-comp` - needed by the QCG-Computing service 
    67  * `qcg-broker` - the user that the [http://apps.man.poznan.pl/trac/qcg-broker QCG-Broker] service would be mapped to  
    68 * The users must also be created (with the same uids) on the batch server machine (but not necessarily on the worker nodes). 
    69 {{{ 
    70 #!div style="font-size: 90%" 
    71 {{{#!sh 
    72 useradd -r -d /var/log/qcg/qcg-comp/  qcg-comp  
    73 useradd -r -d /var/log/qcg/qcg-broker/  qcg-broker   
    74 }}} 
    75 }}} 
    76 * and the following group: 
    77  * `qcg-dev` - this group is allowed to read the configuration and log files. Please add the qcg services' developers to this group. 
    78 {{{ 
    79 #!div style="font-size: 90%" 
    80 {{{#!sh 
    81 groupadd -r qcg-dev 
    82 }}} 
    83 }}} 
    84 * Install the !QosCosGrid repository (latest version, including new features and the latest bug fixes, but possibly unstable): 
    85 {{{ 
    86 #!div style="font-size: 90%" 
    87 {{{#!sh 
    88 cat > /etc/yum.repos.d/qcg.repo << EOF 
    89 [qcg] 
    90 name=QosCosGrid YUM repository 
    91 baseurl=http://www.qoscosgrid.org/qcg-packages/sl5/x86_64/ 
    92 #repo for SL6 baseurl=http://www.qoscosgrid.org/qcg-packages/sl6/x86_64/ 
    93 enabled=1 
    94 gpgcheck=0 
    95 EOF 
    96 }}} 
    97 }}} 
    98  
    99  
    100 * Install QCG-Computing using the YUM package manager: 
    101 {{{ 
    102 #!div style="font-size: 90%" 
    103 {{{#!sh 
    104 yum install qcg-comp qcg-comp-client qcg-comp-logrotate 
    105 }}} 
    106 }}} 
    107  
    108 * Install the GridFTP server following these [[GridFTPInstallation|instructions]]. 
    109  
    110 * Set up the QCG-Computing database using the provided script: 
    111 {{{ 
    112 #!div style="font-size: 90%" 
    113 {{{#!sh 
    114 /usr/share/qcg-comp/tools/qcg-comp-install.sh 
    115 Welcome to qcg-comp installation script! 
    116   
    117 This script will guide you through process of configuring proper environment 
    118 for running the QCG-Computing service. You have to answer few questions regarding 
    119 parameters of your database. If you are not sure just press Enter and use the 
    120 default values. 
    121    
    122 Use local PostgreSQL server? (y/n) [y]: y 
    123 Database [qcg-comp]:  
    124 User [qcg-comp]:  
    125 Password [RAND-PASSWD]: MojeTajneHaslo 
    126 Create database? (y/n) [y]: y 
    127 Create user? (y/n) [y]: y 
    128    
    129 Checking for system user qcg-comp...OK 
    130 Checking whether PostgreSQL server is installed...OK 
    131 Checking whether PostgreSQL server is running...OK 
    132    
    133 Performing installation 
    134 * Creating user qcg-comp...OK 
    135 * Creating database qcg-comp...OK 
    136 * Creating database schema...OK 
    137 * Checking for ODBC data source qcg-comp... 
    138 * Installing ODBC data source...OK 
    139      
    140 Remember to add appropriate entry to /var/lib/pgsql/data/pg_hba.conf (as the first rule!) to allow user qcg-comp to 
    141 access database qcg-comp. For instance: 
    142    
    143 host    qcg-comp       qcg-comp       127.0.0.1/32    md5 
    144    
    145 and reload Postgres server. 
    146 }}} 
    147 }}} 
    148  
    149 Add a new rule to the pg_hba.conf as requested: 
    150 {{{ 
    151 #!div style="font-size: 90%" 
    152 {{{#!sh 
    153 vim /var/lib/pgsql/data/pg_hba.conf  
    154 /etc/init.d/postgresql reload 
    155 }}} 
    156 }}} 
    157 Install the EGI accepted CA certificates (this also installs the Polish Grid CA): 
    158 {{{ 
    159 #!div style="font-size: 90%" 
    160 {{{ 
    161 cd /etc/yum.repos.d/ 
    162 wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo 
    163 yum clean all 
    164 yum install ca-policy-egi-core 
    165 }}} 
    166 }}} 
    167 The above instructions are based on this [https://wiki.egi.eu/wiki/EGI_IGTF_Release manual]. 
    168  
    169 Install the PL-Grid SimpleCA certificate (not part of IGTF): 
    170 {{{ 
    171 #!div style="font-size: 90%" 
    172 {{{#!sh 
    173 wget http://software.plgrid.pl/packages/general/ca_PLGRID-SimpleCA-1.0-2.noarch.rpm 
    174 rpm -i ca_PLGRID-SimpleCA-1.0-2.noarch.rpm  
    175 #install certificate revocation list fetching utility 
    176 wget https://dist.eugridpma.info/distribution/util/fetch-crl/fetch-crl-2.8.5-1.noarch.rpm 
    177 rpm -i fetch-crl-2.8.5-1.noarch.rpm 
    178 #get fresh CRLs now 
    179 /usr/sbin/fetch-crl  
    180 #install cron job for it 
    181 cat > /etc/cron.daily/fetch-crl.cron << EOF 
    182 #!/bin/sh 
    183 /usr/sbin/fetch-crl 
    184 EOF 
    185 chmod a+x /etc/cron.daily/fetch-crl.cron 
    186 }}} 
    187 }}} 
    188 = The Grid Mapfile  = 
    189 This tutorial assumes that the QCG-Computing service is configured in such a way that every authenticated user must be authorized against the `grid-mapfile`. This file can be created manually by an administrator (if the service is run in "test mode") or generated automatically from the LDAP directory service. 
    190 === Manually created grid mapfile (for testing purposes only) === 
    191 {{{ 
    192 #!div style="font-size: 90%" 
    193 {{{#!default 
    194 #for test purpose only add mapping for your account 
    195 echo '"MyCertDN" myaccount' >> /etc/grid-security/grid-mapfile 
    196 }}} 
    197 }}} 
    198 === LDAP generated grid mapfile (PL-Grid only) === 
    199 {{{ 
    200 #!div style="font-size: 90%" 
    201 {{{#!default 
    202 # 0. install PL-Grid repository  
    203 rpm -Uvh http://software.plgrid.pl/packages/repos/plgrid-repos-2010-2.noarch.rpm  
    204 # 
    205 # 1. install qcg grid-mapfile generator 
    206 # 
    207 yum install qcg-gridmapfilegenerator 
    208 # 
    209 # 2.  configure gridmapfilegenerator - remember to change  
    210 # * url property to your local ldap replica 
    211 # * search base 
    212 # * filter expression 
    213 # * security context 
    214 vim /etc/qcg/qcg-gridmapfile/plggridmapfilegenerator.conf 
    215 # 
    216 # 3. run the gridmapfile generator in order to generate gridmapfile now 
    217 # 
    218 /usr/sbin/qcg-gridmapfilegenerator.sh  
    219 }}} 
    220 }}} 
    221  
    222 After installing and running this tool you will find three files: 
    223  * /etc/grid-security/grid-mapfile.local - here you can put a list of DNs and local Unix account names that will be merged with the data acquired from the local LDAP server 
    224  * /etc/grid-security/grid-mapfile.deny - here you can put a list of DNs (only DNs!) that you want to deny access to the QCG-Computing service 
    225  * /etc/grid-security/grid-mapfile - the final gridmap file, generated from the above two files and the information available in the local LDAP server. Do not edit this file as it is generated automatically! 
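The way the three files are combined can be illustrated with the following sketch (an illustration of the assumed merge semantics, not the generator's actual code; the DNs and account names are made up, and everything happens in a scratch directory so the real /etc/grid-security is untouched):

```shell
# work in a scratch directory instead of /etc/grid-security
tmpd=$(mktemp -d) && cd "$tmpd"

# entries as they might come from the local LDAP replica
ldap_entries='"/C=PL/O=GRID/CN=User A" usera
"/C=PL/O=GRID/CN=User B" userb'

# manually maintained local additions (grid-mapfile.local)
cat > grid-mapfile.local <<'EOF'
"/C=PL/O=GRID/CN=Local User" localuser
EOF

# DNs that must never be mapped (grid-mapfile.deny)
cat > grid-mapfile.deny <<'EOF'
"/C=PL/O=GRID/CN=User B"
EOF

# merge LDAP and local entries, then drop the denied DNs
{ echo "$ldap_entries"; cat grid-mapfile.local; } \
    | grep -v -F -f grid-mapfile.deny > grid-mapfile

cat grid-mapfile
```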
    226  
    227 The gridmapfile generator script is run every 10 minutes. Moreover, it issues `su - $USERNAME -c 'true' > /dev/null` for every new user that does not yet have a home directory (thus triggering pam_mkhomedir, if installed). 
    228  
    229 Finally, add a mapping in `grid-mapfile.local` for the QCG-Broker certificate: 
    230 {{{ 
    231 "/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl" qcg-broker 
    232 }}} 
    233 =  Scheduler configuration = 
    234 == !Maui/Moab == 
    235 Add appropriate rights for the `qcg-comp` and `qcg-broker` users in the Maui scheduler configuration file: 
    236 {{{ 
    237 #!div style="font-size: 90%" 
    238 {{{#!default 
    239 vim /var/spool/maui/maui.cfg 
    240 # primary admin must be first in list 
    241 ADMIN1                root 
    242 ADMIN2                qcg-broker 
    243 ADMIN3                qcg-comp 
    244 }}} 
    245 }}} 
    246 == SLURM == 
    247 The QCG-Broker certificate should be mapped to the SLURM user that is authorized to create advance reservations. 
    248 = Service certificates = 
    249 Copy the service certificate and key into `/opt/plgrid/qcg/etc/qcg-comp/certs/`. Remember to set appropriate permissions on the key file. 
    250 {{{ 
    251 #!div style="font-size: 90%" 
    252 {{{#!default 
    253 cp /etc/grid-security/hostcert.pem /opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem 
    254 cp /etc/grid-security/hostkey.pem /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem 
    255 chown qcg-comp /opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem 
    256 chown qcg-comp /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem  
    257 chmod 0600 /opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem 
    258 }}} 
    259 }}} 
    260 =  DRMAA library = 
    261 == Torque/PBS Professional == 
    262 Install via YUM repository:  
    263 {{{ 
    264 #!div style="font-size: 90%" 
    265 {{{#!default 
yum install pbs-drmaa # Torque 
yum install pbspro-drmaa # PBS Professional 
    268 }}} 
    269 }}} 
    270  
    271 Alternatively compile DRMAA using source package downloaded from [http://sourceforge.net/projects/pbspro-drmaa/ SourceForge]. 
    272  
    273 After installation you must do '''either''' of the following: 
    274 * configure the DRMAA library to use the Torque logs ('''RECOMMENDED'''). A sample configuration file for the DRMAA library (`/opt/plgrid/qcg/etc/pbs_drmaa.conf`): 
    275 {{{ 
    276 #!div style="font-size: 90%" 
    277 {{{#!default 
    278 # pbs_drmaa.conf - Sample pbs_drmaa configuration file. 
    279    
    280 wait_thread: 1, 
    281    
    282 pbs_home: "/var/spool/pbs", 
    283      
    284 cache_job_state: 600, 
    285 }}} 
    286 }}} 
    287  '''Note:''' Remember to mount the server log directory as described in the earlier [[InstallationOnSeparateMachine|note]]. 
    288  
    289 '''or''' 
    290 * configure Torque to keep information about completed jobs (e.g. by setting `qmgr -c 'set server keep_completed = 300'`). If running in this configuration, try to provide more resources (e.g. two cores instead of one) to the VM that hosts the service. Moreover, tune the DRMAA configuration in order to throttle the polling rate: 
    291 {{{ 
    292 #!div style="font-size: 90%" 
    293 {{{#!default 
    294    
    295 wait_thread: 1, 
    296 cache_job_state: 60, 
    297 pool_delay: 60, 
    298    
    299 }}} 
    300 }}} 
    301    
    302 It is possible to set the default queue by setting a default job category (in the `/opt/plgrid/qcg/etc/pbs_drmaa.conf` file): 
    303 {{{ 
    304 #!div style="font-size: 90%" 
    305 {{{#!default 
    306 job_categories: { 
    307       default: "-q plgrid", 
    308 }, 
    309 }}} 
    310 }}} 
    311  
    312 == SLURM == 
    313 Install DRMAA for SLURM using the source package available at the [http://apps.man.poznan.pl/trac/slurm-drmaa SLURM DRMAA home page]. 
    314 = Service configuration  = 
    315 Edit the preinstalled service configuration file (`/opt/plgrid/qcg/etc/qcg-comp/qcg-compd.xml` - note the trailing '''d'''): 
    316 {{{ 
    317 #!div style="font-size: 90%" 
    318 {{{#!xml 
    319 <?xml version="1.0" encoding="UTF-8"?> 
    320 <sm:QCGCore 
    321         xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" 
    322         xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" 
    323         xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" 
    324         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    325          
    326         <Configuration> 
    327                 <sm:ModuleManager> 
    328                         <sm:Directory>/opt/plgrid/qcg/lib/qcg-core/modules/</sm:Directory> 
    329                         <sm:Directory>/opt/plgrid/qcg/lib/qcg-comp/modules/</sm:Directory> 
    330                 </sm:ModuleManager> 
    331    
    332                 <sm:Service xsi:type="qcg-compd" description="QCG-Computing"> 
    333                         <sm:Logger> 
    334                                 <sm:Filename>/opt/plgrid/var/log/qcg-comp/qcg-compd.log</sm:Filename> 
    335                                 <sm:Level>INFO</sm:Level> 
    336                         </sm:Logger> 
    337    
    338                         <sm:Transport> 
    339                         <sm:Module xsi:type="sm:ecm_gsoap.service"> 
    340                            <sm:Host>frontend.example.com</sm:Host> 
    341                            <sm:Port>19000</sm:Port> 
    342                            <sm:KeepAlive>false</sm:KeepAlive> 
    343                            <sm:Authentication> 
    344                                    <sm:Module xsi:type="sm:atc_transport_gsi.service"> 
    345                                            <sm:X509CertFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem</sm:X509CertFile> 
    346                                            <sm:X509KeyFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem</sm:X509KeyFile> 
    347                                    </sm:Module> 
    348                            </sm:Authentication> 
    349                            <sm:Authorization> 
    350                                    <sm:Module xsi:type="sm:atz_mapfile"> 
    351                                            <sm:Mapfile>/etc/grid-security/grid-mapfile</sm:Mapfile> 
    352                                    </sm:Module> 
    353                            </sm:Authorization> 
    354                         </sm:Module> 
    355                             <sm:Module xsi:type="smc:qcg-comp-service"/> 
    356                         </sm:Transport> 
    357                          
    358                         <sm:Module xsi:type="pbs_jsdl_filter"/> 
    359                         <sm:Module xsi:type="atz_ardl_filter"/> 
    360                         <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/monitoring.py"/> 
    361                         <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/plgrid_info.py"/> 
    362                         <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/modules_info.py"/> 
    363  
    364                         <sm:Module xsi:type="submission_drmaa" path="/opt/plgrid/qcg/lib/libdrmaa.so"/> 
    365                         <sm:Module xsi:type="reservation_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/reservation_maui.py"/> 
    366                          
    367                         <sm:Module xsi:type="notification_wsn"> 
    368                                <PublishedBrokerURL>https://frontend.example.com:19011/</PublishedBrokerURL> 
    369                                 <sm:Module xsi:type="sm:ecm_gsoap.client"> 
    370                                                 <sm:ServiceURL>http://localhost:19001/</sm:ServiceURL> 
    371                                                         <sm:Authentication> 
    372                                                                 <sm:Module xsi:type="sm:atc_transport_http.client"/> 
    373                                                         </sm:Authentication> 
    374                                                 <sm:Module xsi:type="sm:ntf_client"/> 
    375                                 </sm:Module> 
    376                         </sm:Module> 
    377                                  
    378                         <sm:Module xsi:type="application_mapper"> 
    379                                 <ApplicationMapFile>/opt/plgrid/qcg/etc/qcg-comp/application_mapfile</ApplicationMapFile> 
    380                         </sm:Module> 
    381    
    382                         <Database> 
    383                                 <DSN>qcg-comp</DSN> 
    384                                 <User>qcg-comp</User> 
    385                                 <Password>qcg-comp</Password> 
    386                         </Database> 
    387    
    388                         <UnprivilegedUser>qcg-comp</UnprivilegedUser> 
    389                         <!--UseScratch>true</UseScratch> uncomment this if scratch is the only file system shared between the worker nodes and this machine --> 
    390    
    391                         <FactoryAttributes> 
    392                                 <CommonName>klaster.plgrid.pl</CommonName> 
    393                                 <LongDescription>PL Grid cluster</LongDescription> 
    394                         </FactoryAttributes> 
    395                 </sm:Service> 
    396    
    397         </Configuration> 
    398 </sm:QCGCore> 
    399 }}} 
    400 }}} 
    401 == Common == 
    402 In most cases it should be enough to change only the following elements: 
    403  `Transport/Module/Host` :: 
    404    the hostname of the machine where the service is deployed  
    405  `Transport/Module/Authentication/Module/X509CertFile`  and  `Transport/Module/Authentication/Module/X509KeyFile` ::  
    406   the service private key and X.509 certificate. Make sure that the key and certificate are owned by the `qcg-comp` user. If you installed the certificate and key file in the recommended location you do not need to edit these fields. 
    407  `Module[type="smc:notification_wsn"]/PublishedBrokerURL` ::  
    408   the external URL of the QCG-Notification service (you can set this later, i.e. after [http://www.qoscosgrid.org/trac/qcg-notification/wiki/installation_in_PL-Grid installing the QCG-Notification service]) 
    409  `Module[type="smc:notification_wsn"]/Module/ServiceURL` ::  
    410   the localhost URL of the QCG-Notification service (again, this can be set after [http://www.qoscosgrid.org/trac/qcg-notification/wiki/installation_in_PL-Grid installing the QCG-Notification service]) 
    411  `Module[type="submission_drmaa"]/@path` :: 
    412   path to the DRMAA library (`libdrmaa.so`). If you installed the DRMAA library from the provided RPMs you do not need to change this path. 
    413  `Module[type="reservation_python"]/@path` :: 
    414   path to the reservation module. Change this if you are using a scheduler other than Maui (e.g. use `reservation_moab.py` for Moab, `reservation_pbs.py` for PBS Pro) 
    415  `Database/Password` ::  
    416   the `qcg-comp` database password   
    417  `UseScratch` :: 
    418   set this to `true` if you set QCG_SCRATCH_DIR_ROOT in `sysconfig`, so that every job is started from its scratch directory (instead of the default home directory) 
    419  `FactoryAttributes/CommonName` ::  
    420   a common name of the cluster (e.g. reef.man.poznan.pl). You can use any name that is unique among all systems (e.g. cluster name + domain name of your institution) 
    421  `FactoryAttributes/LongDescription` ::  
    422   a human readable description of the cluster 
    423 == Torque == 
    424  `Module[type="reservation_python"]/@path` :: 
    425   path to the reservation module. Change this if you are using a scheduler other than Maui (e.g. use `reservation_moab.py` for Moab) 
    426 == PBS Professional == 
    427  `Module[type="reservation_python"]/@path` :: 
    428   path to the reservation module. Change this to `reservation_pbs.py`. 
    429 == SLURM == 
    430  
    431    `Module[type="reservation_python"]/@path` :: 
    432   path to the reservation module. Change this to `reservation_slurm.py`. 
    433  
    434 and replace: 
    435 {{{ 
    436                          <sm:Module xsi:type="pbs_jsdl_filter"/> 
    437 }}} 
    438 with: 
    439 {{{ 
    440                         <sm:Module xsi:type="slurm_jsdl_filter"/> 
    441 }}} 
    442 = Restricting advance reservation = 
    443  
    444 By default the QCG-Computing service can reserve any number of hosts. One can limit this by configuring the !Maui/Moab scheduler and the QCG-Computing service appropriately: 
    445  
    446 * In !Maui/Moab, mark some subset of nodes, using the partition mechanism, as reservable for QCG-Computing: 
    447 {{{ 
    448 #!div style="font-size: 90%" 
    449 {{{#!default 
    450 # all users can use both the DEFAULT and RENABLED partition 
    451 SYSCFG           PLIST=DEFAULT,RENABLED 
    452 # in Moab you should use 0 instead of DEFAULT 
    453 #SYSCFG           PLIST=0,RENABLED 
    454    
    455 # mark some set of the machines (e.g. 64 nodes) as reservable 
    456 NODECFG[node01] PARTITION=RENABLED 
    457 NODECFG[node02] PARTITION=RENABLED 
    458 NODECFG[node03] PARTITION=RENABLED 
    459 ... 
    460 NODECFG[node64] PARTITION=RENABLED 
    461  
    462 }}} 
    463 }}} 
    464  
    465 * Tell QCG-Computing to limit reservations to the aforementioned partition by editing the `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` configuration file: 
    466  
    467 {{{ 
    468 #!div style="font-size: 90%" 
    469 {{{#!default 
    470 export QCG_AR_MAUI_PARTITION="RENABLED" 
    471 }}} 
    472 }}} 
    473  
    474 * Moreover, QCG-Computing (since version 2.4) can enforce limits on the maximal reservation duration (default: one week) and size (measured in the number of reserved slots): 
    475 {{{ 
    476 #!div style="font-size: 90%" 
    477 {{{#!default 
    478 ... 
    479                         <ReservationsPolicy> 
    480                                 <MaxDuration>24</MaxDuration> <!-- 24 hours --> 
    481                                 <MaxSlots>100</MaxSlots> 
    482                         </ReservationsPolicy> 
    483 ... 
    484 }}} 
    485 }}} 
    486  
    487 = Restricted node access (Torque/PBS-Professional only) = 
    488 Read this section only if the system is configured in such a way that not all nodes are accessible from every queue/user. In such a case you should provide a node filter expression in the sysconfig file (`/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd`). Examples: 
    489 * Provide information only about nodes that were tagged with the `qcg` property: 
    490 {{{ 
    491 export QCG_NODE_FILTER=properties:qcg 
    492 }}} 
    493 * Provide information about all nodes except those tagged as `gpgpu` 
    494 {{{ 
    495 export QCG_NODE_FILTER=properties:~gpgpu 
    496 }}} 
    497 * Provide information only about resources that have `hp` as the `epoch` value: 
    498 {{{ 
    499 export QCG_NODE_FILTER=resources_available.epoch:hp 
    500 }}} 
    501 In general, the `QCG_NODE_FILTER` value must adhere to the following syntax: 
    502 {{{ 
    503 pbsnodes-attr:regular-expression  
    504 }}} 
    505 or, if you want reversed semantics (i.e. all nodes except those matching the expression): 
    506 {{{ 
    507 pbsnodes-attr:~regular-expression  
    508 }}} 
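Since the filter is matched against `pbsnodes` attributes, you can preview which nodes a given expression would select. The snippet below stubs the `pbsnodes -a` output with a here-document so it runs anywhere; on the real system replace the stub with `pbsnodes -a`. The awk matching illustrates the assumed semantics, not the service's actual implementation:

```shell
# print the names of nodes whose `properties` attribute matches "qcg"
# (node names start in column 1, attributes are indented)
matched_nodes=$(cat <<'EOF' | awk '/^[^ ]/ { node=$1 } /properties =/ && $0 ~ /qcg/ { print node }'
node01
     state = free
     properties = qcg,infiniband
node02
     state = free
     properties = gpgpu
EOF
)
echo "$matched_nodes"
```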
    509 = Configuring QCG-Accounting = 
    510 Please use [http://www.qoscosgrid.org/trac/qcg-computing/wiki/QCG-Accounting QCG-Accounting] agent. You must enable `bat` as one of the  publisher plugins. 
    511  
    512 = Creating applications' script space = 
    513 A common use case for the QCG-Computing service is that an application is accessed using an abstract application name rather than an absolute executable path. The application name/version to executable path mappings are stored in the file `/opt/plgrid/qcg/etc/qcg-comp/application_mapfile`: 
    514  
    515 {{{ 
    516 #!div style="font-size: 90%" 
    517 {{{#!default 
    518 cat /opt/plgrid/qcg/etc/qcg-comp/application_mapfile 
    519 # ApplicationName ApplicationVersion Executable 
    520  
    521 date * /bin/date 
    522 LPSolve 5.5 /usr/local/bin/lp_solve 
    523 QCG-OMPI /opt/QCG/qcg/share/qcg-comp/tools/cross-mpi.starter 
    524 }}} 
    525 }}} 
    526  
    527 It is also common to provide wrapper scripts here rather than the target executables. A wrapper script can handle such aspects of the application lifecycle as environment initialization, copying files from/to scratch storage and application monitoring. It is recommended to create a separate directory for these wrapper scripts (e.g. on the application partition) and grant the QCG developers group (`qcg-dev`) write permission to it. This directory must be readable by all users and from every worker node (the application partition usually fulfills those requirements). For example: 
    528  
    529 {{{ 
    530 #!div style="font-size: 90%" 
    531 {{{#!default 
    532 mkdir /opt/exp_soft/plgrid/qcg-app-scripts 
    533 chown :qcg-dev /opt/exp_soft/plgrid/qcg-app-scripts 
    534 chmod g+rwx /opt/exp_soft/plgrid/qcg-app-scripts 
    535 }}} 
    536 }}} 
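A minimal wrapper script might look like the sketch below. The module name `lp_solve` and the binary path are hypothetical; in practice the script would live under the directory created above (e.g. `/opt/exp_soft/plgrid/qcg-app-scripts`), while here it is written to /tmp so the example is self-contained:

```shell
# create a hypothetical wrapper for the LPSolve entry of application_mapfile
cat > /tmp/lp_solve.wrapper <<'EOF'
#!/bin/bash
# initialize the application's environment (Environment Modules assumed)
module load lp_solve 2>/dev/null || true
# hand control to the real binary, preserving arguments and exit status
exec /usr/local/bin/lp_solve "$@"
EOF
chmod +x /tmp/lp_solve.wrapper
```

The corresponding `application_mapfile` entry would then point at the wrapper instead of the binary itself.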
    537  
    538 More on [ApplicationScripts Application Scripts]. 
    539 = Note on the security model = 
    540 QCG-Computing can be configured with various authentication and authorization modules. However, in a typical deployment we assume that QCG-Computing is configured as in the above example, i.e.: 
    541 * authentication is provided on the basis of the ''httpg'' protocol, 
    542 * authorization is based on the local `grid-mapfile`. 
    543  
    544 = Starting the service = 
    545 As root type: 
    546 {{{ 
    547 #!div style="font-size: 90%" 
    548 {{{#!sh 
    549 /etc/init.d/qcg-compd start 
    550 }}} 
    551 }}} 
    552  
    553 The service logs can be found in: 
    554 {{{ 
    555 #!div style="font-size: 90%" 
    556 {{{#!sh 
    557 /opt/plgrid/var/log/qcg-comp/qcg-compd.log 
    558 }}} 
    559 }}} 
    560  
    561 The service assumes that the following commands are in the standard search path: 
    562 * `pbsnodes` 
    563 * `showres` 
    564 * `setres` 
    565 * `releaseres` 
    566 * `checknode` 
    567 If any of the above commands is not installed in a standard location (e.g. `/usr/bin`), you may need to edit the `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` file and set the `PATH` variable appropriately, e.g.: 
    568 {{{ 
    569 #!div style="font-size: 90%" 
    570 {{{#!sh 
    571 # INIT_WAIT=5 
    572 # 
    573 # DRM specific options 
    574   
    575 export PATH=$PATH:/opt/maui/bin 
    576 }}} 
    577 }}} 
    578  
    579  
    580 If you compiled DRMAA with logging switched on, you can also set the DRMAA logging level there: 
    581 {{{ 
    582 #!div style="font-size: 90%" 
    583 {{{#!sh 
    584 # INIT_WAIT=5 
    585 # 
    586 # DRM specific options 
    587  
    588 export DRMAA_LOG_LEVEL=INFO 
    589 }}} 
    590 }}} 
    591  
    592 Also provide the location of the root scratch directory if it is accessible from the QCG-Computing machine: 
    593 {{{ 
    594 #!div style="font-size: 90%" 
    595 {{{#!sh 
    596 # INIT_WAIT=5 
    597 # 
    598  
    599 export QCG_SCRATCH_DIR_ROOT="/mnt/lustre/scratch/people/" 
    600 }}} 
    601 }}} 
    602  
    603  
    604 '''Note:''' In the current version, whenever you restart the PostgreSQL server you must also restart the QCG-Computing and QCG-Notification services: 
    605  
    606 {{{ 
    607 #!div style="font-size: 90%" 
    608 {{{#!sh 
    609 /etc/init.d/qcg-compd restart 
    610 /etc/init.d/qcg-ntfd restart 
    611 }}} 
    612 }}} 
    613  
    614 = Stopping the service = 
    615 The service can be stopped using the following command: 
    616 {{{ 
    617 #!div style="font-size: 90%" 
    618 {{{#!sh 
    619 /etc/init.d/qcg-compd stop 
    620 }}} 
    621 }}} 
    622  
    623 = Verifying the installation = 
    624  
    625 *  For convenience you can install the qcg environment module: 
    626 {{{ 
    627 #!div style="font-size: 90%" 
    628 {{{#!sh 
    629 cp /opt/plgrid/qcg/share/qcg-core/misc/qcg.module /usr/share/Modules/modulefiles/qcg 
    630 module load qcg 
    631 }}} 
    632 }}} 
    633 *  Edit the QCG-Computing client configuration file (`/opt/plgrid/qcg/etc/qcg-comp/qcg-comp.xml`): 
    634  *  set the `Host` and `Port` to reflect the changes in the service configuration file (`qcg-compd.xml`). 
    635 {{{ 
    636 #!div style="font-size: 90%" 
    637 {{{#!sh 
    638 <?xml version="1.0" encoding="UTF-8"?> 
    639 <sm:QCGCore 
    640        xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" 
    641        xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" 
    642        xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" 
    643        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    644    
    645        <Configuration> 
    646                <sm:ModuleManager> 
    647                        <sm:Directory>/opt/qcg/lib/qcg-core/modules/</sm:Directory> 
    648                        <sm:Directory>/opt/qcg/lib/qcg-comp/modules/</sm:Directory> 
    649                </sm:ModuleManager> 
    650   
    651                <sm:Client xsi:type="qcg-comp" description="QCG-Computing client"> 
    652                        <sm:Transport> 
    653                                <sm:Module xsi:type="sm:ecm_gsoap.client"> 
    654                                        <sm:ServiceURL>httpg://frontend.example.com:19000/</sm:ServiceURL> 
    655                                        <sm:Authentication> 
    656                                                <sm:Module xsi:type="sm:atc_transport_gsi.client"/> 
    657                                        </sm:Authentication> 
    658                                        <sm:Module xsi:type="smc:qcg-comp-client"/> 
    659                                </sm:Module> 
    660                        </sm:Transport> 
    661                </sm:Client> 
    662        </Configuration> 
    663 </sm:QCGCore> 
    664 }}} 
    665 }}} 
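Before using the edited file it is worth checking that it is still well-formed XML, e.g. with `xmllint` from libxml2 (a sketch; the path in the usage comment is the client configuration file above, and the wrapper name is ours):

```shell
# Exit status 0 means the file parses as well-formed XML;
# any parse error is printed to stderr.
config_wellformed() {
    xmllint --noout "$1"
}

# config_wellformed /opt/plgrid/qcg/etc/qcg-comp/qcg-comp.xml && echo "config OK"
```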
    666 * Initialize your credentials: 
    667 {{{ 
    668 #!div style="font-size: 90%" 
    669 {{{#!sh 
    670 grid-proxy-init -rfc 
    671 Your identity: /O=Grid/OU=QosCosGrid/OU=PSNC/CN=Mariusz Mamonski 
    672 Enter GRID pass phrase for this identity: 
    673 Creating proxy .................................................................. Done 
    674 Your proxy is valid until: Wed Apr  6 05:01:02 2012 
    675 }}} 
    676 }}} 
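Before querying the service you can verify the proxy has enough lifetime left; `grid-proxy-info -timeleft` (part of the same Globus tools as `grid-proxy-init`) prints the remaining seconds. A sketch, with the status command passed in as arguments so the logic itself stays generic:

```shell
# Succeeds when the command passed as arguments prints a number of
# seconds greater than one hour.
proxy_ok() {
    left=$("$@") || return 1
    [ "$left" -gt 3600 ]
}

# On a real UI node:
# proxy_ok grid-proxy-info -timeleft && echo "proxy still valid" || grid-proxy-init -rfc
```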
    677 * Query the QCG-Computing service: 
    678 {{{ 
    679 #!div style="font-size: 90%" 
    680 {{{#!sh 
    681 qcg-comp -G | xmllint --format - # xmllint is used here only to pretty-print the result 
    682    
    683 <bes-factory:FactoryResourceAttributesDocument xmlns:bes-factory="http://schemas.ggf.org/bes/2006/08/bes-factory"> 
    684     <bes-factory:IsAcceptingNewActivities>true</bes-factory:IsAcceptingNewActivities> 
    685     <bes-factory:CommonName>IT cluster</bes-factory:CommonName> 
    686     <bes-factory:LongDescription>IT department cluster for public use</bes-factory:LongDescription> 
    687     <bes-factory:TotalNumberOfActivities>0</bes-factory:TotalNumberOfActivities> 
    688     <bes-factory:TotalNumberOfContainedResources>1</bes-factory:TotalNumberOfContainedResources> 
    689     <bes-factory:ContainedResource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="bes-factory:BasicResourceAttributesDocumentType"> 
    690         <bes-factory:ResourceName>worker.example.com</bes-factory:ResourceName> 
    691         <bes-factory:CPUArchitecture> 
    692             <jsdl:CPUArchitectureName xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">x86_32</jsdl:CPUArchitectureName> 
    693         </bes-factory:CPUArchitecture> 
    694         <bes-factory:CPUCount>4</bes-factory:CPUCount><bes-factory:PhysicalMemory>1073741824</bes-factory:PhysicalMemory> 
    695     </bes-factory:ContainedResource> 
    696     <bes-factory:NamingProfile>http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing</bes-factory:NamingProfile>  
    697     <bes-factory:BESExtension>http://schemas.ogf.org/hpcp/2007/01/bp/BasicFilter</bes-factory:BESExtension> 
    698     <bes-factory:BESExtension>http://schemas.qoscosgrid.org/comp/2011/04</bes-factory:BESExtension> 
    699     <bes-factory:LocalResourceManagerType>http://example.com/SunGridEngine</bes-factory:LocalResourceManagerType> 
    700     <smcf:NotificationProviderURL xmlns:smcf="http://schemas.qoscosgrid.org/comp/2011/04/factory">http://localhost:2211/</smcf:NotificationProviderURL> 
    701 </bes-factory:FactoryResourceAttributesDocument> 
    702 }}} 
    703 }}} 
    704 * Submit a sample job: 
    705 {{{ 
    706 #!div style="font-size: 90%" 
    707 {{{#!sh 
    708 qcg-comp -c -J /opt/plgrid/qcg/share/qcg-comp/doc/examples/jsdl/date.xml 
    709 Activity Id: ccb6b04a-887b-4027-633f-412375559d73 
    710 }}} 
    711 }}} 
    712 * Query its status: 
    713 {{{ 
    714 #!div style="font-size: 90%" 
    715 {{{#!sh 
    716 qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
    717 status = Executing 
    718 qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
    719 status = Executing 
    720 qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
    721 status = Finished 
    722 exit status = 0 
    723 }}} 
    724 }}} 
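Instead of re-running `qcg-comp -s` by hand, the polling can be scripted. A sketch: the status command is passed in as a string, so any command that prints `status = ...` in the format shown above will work; the function name and retry limit are ours.

```shell
# Poll the given status command until it reports Finished or the
# retry limit is reached.
wait_for_job() {
    status_cmd="$1"
    tries="$2"
    i=0
    while [ "$i" -lt "$tries" ]; do
        status=$($status_cmd | sed -n 's/^status = //p')
        if [ "$status" = "Finished" ]; then
            echo "Finished"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "gave up, last status: $status" >&2
    return 1
}

# On a real installation:
# wait_for_job "qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73" 60
```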
    725 * Create an advance reservation: 
    726  * Copy the provided sample reservation description file (expressed in ARDL - the Advance Reservation Description Language): 
    727 {{{ 
    728 #!div style="font-size: 90%" 
    729 {{{#!sh 
    730 cp /opt/plgrid/qcg/share/qcg-comp/doc/examples/ardl/oneslot.xml oneslot.xml 
    731 }}} 
    732 }}} 
    733  * Edit `oneslot.xml` and set `StartTime` and `EndTime` to dates in the near future, 
    734  * Create a new reservation: 
    735 {{{ 
    736 #!div style="font-size: 90%" 
    737 {{{#!sh 
    738 qcg-comp -c -D oneslot.xml 
    739 Reservation Id: aab6b04a-887b-4027-633f-412375559d7d 
    740 }}} 
    741 }}} 
    742  * List all reservations: 
    743 {{{ 
    744 #!div style="font-size: 90%" 
    745 {{{#!sh 
    746 qcg-comp -l 
    747 Reservation Id: aab6b04a-887b-4027-633f-412375559d7d 
    748 Total number of reservations: 1 
    749 }}} 
    750 }}} 
    751  * Check which hosts were reserved: 
    752 {{{ 
    753 #!div style="font-size: 90%" 
    754 {{{#!sh 
    755 qcg-comp -s -r aab6b04a-887b-4027-633f-412375559d7d 
    756 Reserved hosts: 
    757 worker.example.com[used=0,reserved=1,total=4] 
    758 }}} 
    759 }}} 
    760  * Delete the reservation: 
    761 {{{ 
    762 #!div style="font-size: 90%" 
    763 {{{#!sh 
    764 qcg-comp -t -r aab6b04a-887b-4027-633f-412375559d7d 
    765 Reservation terminated. 
    766 }}} 
    767 }}} 
    768  * Check if GridFTP is working correctly: 
    769 {{{ 
    770 #!div style="font-size: 90%" 
    771 {{{#!sh 
    772 globus-url-copy gsiftp://your.local.host.name/etc/profile profile 
    773 diff /etc/profile profile 
    774 }}} 
    775 }}} 
    776  
    777 = Maintenance = 
    778 The historic usage information is stored in two relations of the QCG-Computing database: `jobs_acc` and `reservations_acc`. You can always archive old usage data to a file and delete it from the database using the psql client: 
    779 {{{ 
    780 #!div style="font-size: 90%" 
    781 {{{#!sh 
    782 psql -h localhost qcg-comp qcg-comp  
    783 Password for user qcg-comp:  
    784 Welcome to psql 8.1.23, the PostgreSQL interactive terminal. 
    785    
    786 Type:  \copyright for distribution terms 
    787      \h for help with SQL commands 
    788      \? for help with psql commands 
    789      \g or terminate with semicolon to execute query 
    790      \q to quit 
    791  
    792 qcg-comp=> \o jobs.acc 
    793 qcg-comp=> SELECT * FROM jobs_acc where end_time < date '2010-01-10'; 
    794 qcg-comp=> \o reservations.acc 
    795 qcg-comp=> SELECT * FROM reservations_acc where end_time < date '2010-01-10'; 
    796 qcg-comp=> \o 
    797 qcg-comp=> DELETE FROM jobs_acc where end_time < date '2010-01-10'; 
    798 qcg-comp=> DELETE FROM reservations_acc where end_time < date '2010-01-10'; 
    799 }}} 
    800 }}} 
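The same archiving session can be run non-interactively, which is handier for cron jobs. A sketch assuming the same database, user, and table names as above; the cutoff date is the example value and the function name is ours:

```shell
# Archive accounting rows older than CUTOFF to flat files, then
# delete them. Review carefully before running against a production
# database.
CUTOFF='2010-01-10'
archive_and_purge() {
    psql -h localhost -U qcg-comp qcg-comp <<EOF
\o jobs.acc
SELECT * FROM jobs_acc WHERE end_time < date '$CUTOFF';
\o reservations.acc
SELECT * FROM reservations_acc WHERE end_time < date '$CUTOFF';
\o
DELETE FROM jobs_acc WHERE end_time < date '$CUTOFF';
DELETE FROM reservations_acc WHERE end_time < date '$CUTOFF';
EOF
}

# archive_and_purge
```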
    801  
    802 You should also install the logrotate configuration for QCG-Computing: 
    803 {{{ 
    804 #!div style="font-size: 90%" 
    805 {{{#!sh 
    806 yum install qcg-comp-logrotate 
    807 }}} 
    808 }}} 
    809 **Important**: On any update/restart of the PostgreSQL database you must also restart the qcg-compd and qcg-ntfd services: 
    810 {{{ 
    811 /etc/init.d/qcg-compd restart 
    812 /etc/init.d/qcg-ntfd restart 
    813 }}} 
    814 During scheduled downtimes we recommend disabling job submission in the service configuration file: 
    815 {{{ 
    816 ... 
    817    <AcceptingNewActivities>false</AcceptingNewActivities> 
    818 </FactoryAttributes> 
    819 }}} 
    820 = PL-Grid Grants Support = 
    821 Since version 2.2.7 QCG-Computing is integrated with the PL-Grid grants system. The integration with the grant system has three main interaction points: 
    822 * QCG-Computing can accept jobs which have the grant id set explicitly. One must use the `<jsdl:JobProject>` element, e.g.: 
    823  
    824 {{{ 
    825 #!div style="font-size: 90%" 
    826 {{{#!sh 
    827 <?xml version="1.0" encoding="UTF-8"?> 
    828  
    829 <jsdl:JobDefinition 
    830  xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" 
    831  xmlns:jsdl-hpcpa="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa" 
    832  xmlns:jsdl-qcg-comp-factory="http://schemas.qoscosgrid.org/comp/2011/04/jsdl/factory"> 
    833    <jsdl:JobDescription> 
    834       <jsdl:JobIdentification> 
    835          <jsdl:JobProject>Manhattan</jsdl:JobProject> 
    836       </jsdl:JobIdentification> 
    837       <jsdl:Application> 
    838          <jsdl-hpcpa:HPCProfileApplication> 
    839         ... 
    840 }}} 
    841 }}} 
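If you keep a template JSDL file, the grant id can be filled in with a one-line `sed` substitution. A sketch: the function name, the `job.xml` file, and the grant name are placeholders; only the `<jsdl:JobProject>` element name comes from the example above.

```shell
# Replace the content of the <jsdl:JobProject> element in a JSDL file
# in place (GNU sed -i).
set_grant() {
    sed -i "s|<jsdl:JobProject>[^<]*</jsdl:JobProject>|<jsdl:JobProject>$2</jsdl:JobProject>|" "$1"
}

# set_grant job.xml plgrid-grant-42
```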
    842  
    843 * QCG-Computing can provide information about the local grants to the upper layers (e.g. QCG-Broker), so they can use it for scheduling purposes. This can be enabled by adding the following line to the QCG-Computing configuration file (qcg-compd.xml): 
    844 {{{ 
    845 #!div style="font-size: 90%" 
    846 {{{#!sh 
    847 </sm:Transport> 
    848 ... 
    849 <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/plgrid_info.py"/> 
    850 }}} 
    851 }}} 
    852 Please note that this module requires the [#LDAPgeneratedgridmapfile qcg-gridmapfilegenerator] to be installed. 
    853 * The grant id is provided in the resource usage record sent to the BAT accounting service. 
    854 == Configuring the PBS DRMAA submit filter == 
    855 In order to enforce the PL-Grid grant policy you must configure a PBS DRMAA submit filter by editing `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` and adding a variable pointing to the submit filter, e.g.: 
    856 {{{ 
    857 export PBSDRMAA_SUBMIT_FILTER="/software/grid/plgrid/qcg-app-scripts/app-scripts/tools/plgrid-grants/pbsdrmaa_submit_filter.py" 
    858 }}} 
    859 An example submit filter can be found in !QosCosGrid svn: 
    860 {{{ 
    861 svn co https://apps.man.poznan.pl/svn/qcg-computing/trunk/app-scripts/tools/plgrid-grants 
    862 }}} 
    863 More about PBS DRMAA submit filters can be found [[http://apps.man.poznan.pl/trac/pbs-drmaa/wiki/WikiStart#Submitfilter|here]]. 
    864 = GOCDB = 
    865 Please remember to register the QCG-Computing and QCG-Notification services in the GOCDB using the QCG.Computing and QCG.Notification service types, respectively.