93 | | |
94 | | * add the public key of the QCG repository to your trusted keys in the apt configuration: |
95 | | {{{#!sh |
96 | | wget https://apps.man.poznan.pl/trac/qcg-notification/raw-attachment/wiki/InstallingUsingDeb/qcg.pub |
97 | | apt-key add qcg.pub |
98 | | }}} |
99 | | |
100 | | * refresh the packages list: |
101 | | {{{#!sh |
102 | | apt-get update |
103 | | }}} |
104 | | |
105 | | * install QCG-Computing: |
106 | | {{{ |
107 | | #!div style="font-size: 90%" |
108 | | {{{#!sh |
109 | | apt-get install qcg-comp qcg-comp-client qcg-comp-doc |
110 | | }}} |
111 | | }}} |
112 | | |
113 | | * set up the QCG-Computing database as described [http://apps.man.poznan.pl/trac/qcg-computing/wiki/InstallingFromSources#Databasesetup here]. |
114 | | |
115 | | |
116 | | |
117 | | |
| 95 | }}} |
| 96 | |
| 97 | |
| 98 | * install QCG-Computing using YUM Package Manager: |
| 99 | {{{ |
| 100 | #!div style="font-size: 90%" |
| 101 | {{{#!sh |
| 102 | yum install qcg-comp qcg-comp-client qcg-comp-logrotate |
| 103 | }}} |
| 104 | }}} |
| 105 | |
| 106 | * install the grid-ftp server using these [[GridFTPInstallation|instructions]]. |
| 107 | |
| 108 | * set up the QCG-Computing database using the provided script: |
| 109 | {{{ |
| 110 | #!div style="font-size: 90%" |
| 111 | {{{#!sh |
| 112 | /usr/share/qcg-comp/tools/qcg-comp-install.sh |
| 113 | Welcome to qcg-comp installation script! |
| 114 | |
| 115 | This script will guide you through process of configuring proper environment |
| 116 | for running the QCG-Computing service. You have to answer few questions regarding |
| 117 | parameters of your database. If you are not sure just press Enter and use the |
| 118 | default values. |
| 119 | |
| 120 | Use local PostgreSQL server? (y/n) [y]: y |
| 121 | Database [qcg-comp]: |
| 122 | User [qcg-comp]: |
| 123 | Password [RAND-PASSWD]: MojeTajneHaslo |
| 124 | Create database? (y/n) [y]: y |
| 125 | Create user? (y/n) [y]: y |
| 126 | |
| 127 | Checking for system user qcg-comp...OK |
| 128 | Checking whether PostgreSQL server is installed...OK |
| 129 | Checking whether PostgreSQL server is running...OK |
| 130 | |
| 131 | Performing installation |
| 132 | * Creating user qcg-comp...OK |
| 133 | * Creating database qcg-comp...OK |
| 134 | * Creating database schema...OK |
| 135 | * Checking for ODBC data source qcg-comp... |
| 136 | * Installing ODBC data source...OK |
| 137 | |
| 138 | Remember to add appropriate entry to /var/lib/pgsql/data/pg_hba.conf (as the first rule!) to allow user qcg-comp to |
| 139 | access database qcg-comp. For instance: |
| 140 | |
| 141 | host qcg-comp qcg-comp 127.0.0.1/32 md5 |
| 142 | |
| 143 | and reload Postgres server. |
| 144 | }}} |
| 145 | }}} |
| 146 | |
| 147 | Add a new rule to the pg_hba.conf as requested: |
| 148 | {{{ |
| 149 | #!div style="font-size: 90%" |
| 150 | {{{#!sh |
| 151 | vim /var/lib/pgsql/data/pg_hba.conf |
| 152 | /etc/init.d/postgresql reload |
| 153 | }}} |
| 154 | }}} |
| 155 | Install EGI Accepted CA certificates (this also installs the Polish Grid CA): |
| 156 | {{{ |
| 157 | #!div style="font-size: 90%" |
| 158 | {{{ |
| 159 | cd /etc/yum.repos.d/ |
| 160 | wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo |
| 161 | yum clean all |
| 162 | yum install ca-policy-egi-core |
| 163 | }}} |
| 164 | }}} |
| 165 | The above instructions were based on this [https://wiki.egi.eu/wiki/EGI_IGTF_Release manual] |
| 166 | |
| 167 | Install PL-Grid Simple-CA certificate (not part of IGTF): |
| 168 | {{{ |
| 169 | #!div style="font-size: 90%" |
| 170 | {{{#!sh |
| 171 | wget http://software.plgrid.pl/packages/general/ca_PLGRID-SimpleCA-1.0-2.noarch.rpm |
| 172 | rpm -i ca_PLGRID-SimpleCA-1.0-2.noarch.rpm |
| 173 | #install certificate revocation list fetching utility |
| 174 | wget https://dist.eugridpma.info/distribution/util/fetch-crl/fetch-crl-2.8.5-1.noarch.rpm |
| 175 | rpm -i fetch-crl-2.8.5-1.noarch.rpm |
| 176 | #get fresh CRLs now |
| 177 | /usr/sbin/fetch-crl |
| 178 | #install cron job for it |
| 179 | cat > /etc/cron.daily/fetch-crl.cron << EOF |
| 180 | #!/bin/sh |
| 181 | /usr/sbin/fetch-crl |
| 182 | EOF |
| 183 | chmod a+x /etc/cron.daily/fetch-crl.cron |
| 184 | }}} |
| 185 | }}} |
| 186 | = The Grid Mapfile = |
| 187 | This tutorial assumes that the QCG-Computing service is configured in such a way that every authenticated user must be authorized against the `grid-mapfile`. This file can be created manually by an administrator (if the service is run in "test mode") or generated automatically based on the LDAP directory service. |
| 188 | === Manually created grid mapfile (for testing purposes only) === |
| 189 | {{{ |
| 190 | #!div style="font-size: 90%" |
| 191 | {{{#!default |
| 192 | # for testing purposes only: add a mapping for your account |
| 193 | echo '"MyCertDN" myaccount' >> /etc/grid-security/grid-mapfile |
| 194 | }}} |
| 195 | }}} |
| 196 | === LDAP generated grid mapfile (PL-Grid only) === |
| 197 | {{{ |
| 198 | #!div style="font-size: 90%" |
| 199 | {{{#!default |
| 200 | # 0. install PL-Grid repository |
| 201 | rpm -Uvh http://software.plgrid.pl/packages/repos/plgrid-repos-2010-2.noarch.rpm |
| 202 | # |
| 203 | # 1. install qcg grid-mapfile generator |
| 204 | # |
| 205 | yum install qcg-gridmapfilegenerator |
| 206 | # |
| 207 | # 2. configure gridmapfilegenerator - remember to change |
| 208 | # * url property to your local ldap replica |
| 209 | # * search base |
| 210 | # * filter expression |
| 211 | # * security context |
| 212 | vim /etc/qcg/qcg-gridmapfile/plggridmapfilegenerator.conf |
| 213 | # |
| 214 | # 3. run the gridmapfile generator in order to generate gridmapfile now |
| 215 | # |
| 216 | /usr/sbin/qcg-gridmapfilegenerator.sh |
| 217 | }}} |
| 218 | }}} |
| 219 | |
| 220 | After installing and running this tool one can find three files: |
| 221 | * /etc/grid-security/grid-mapfile.local - here you can put a list of DNs and local Unix account names that will be merged with the data acquired from the local LDAP server |
| 222 | * /etc/grid-security/grid-mapfile.deny - here you can put a list of DNs (only DNs!) that you want to deny access to the QCG-Computing service |
| 223 | * /etc/grid-security/grid-mapfile - the final gridmap file generated using the above two files and the information available in the local LDAP server. Do not edit this file as it is generated automatically! |
| 224 | |
| 225 | The gridmapfile generator script runs every 10 minutes. It also issues `su - $USERNAME -c 'true' > /dev/null` for every new user who does not yet have a home directory (thus triggering pam_mkhomedir if installed). |
| 226 | |
| 227 | Finally, add a mapping for QCG-Broker to `grid-mapfile.local`: |
| 228 | {{{ |
| 229 | "/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl" qcg-broker |
| 230 | }}} |
| 231 | = Scheduler configuration = |
| 232 | == !Maui/Moab == |
| 233 | Add appropriate rights for the `qcg-comp` and `qcg-broker` users in the Maui scheduler configuration file: |
| 234 | {{{ |
| 235 | #!div style="font-size: 90%" |
| 236 | {{{#!default |
| 237 | vim /var/spool/maui/maui.cfg |
| 238 | # primary admin must be first in list |
| 239 | ADMIN1 root |
| 240 | ADMIN2 qcg-broker |
| 241 | ADMIN3 qcg-comp |
| 242 | }}} |
| 243 | }}} |
| 244 | == SLURM == |
| 245 | The QCG-Broker certificate should be mapped to a SLURM user that is authorized to create advance reservations. |
132 | | Install DRMAA for Torque/PBS Pro using source package available at [http://apps.man.poznan.pl/trac/pbs-drmaa PBS DRMAA home page] |
133 | | |
| 260 | Install via YUM repository: |
| 261 | {{{ |
| 262 | #!div style="font-size: 90%" |
| 263 | {{{#!default |
| 264 | yum install pbs-drmaa #Torque |
| 265 | yum install pbspro-drmaa #PBS Professional |
| 266 | }}} |
| 267 | }}} |
| 268 | |
| 269 | Alternatively compile DRMAA using source package downloaded from [http://sourceforge.net/projects/pbspro-drmaa/ SourceForge]. |
| 270 | |
| 271 | After installation you need to '''either''': |
| 272 | * configure the DRMAA library to use Torque logs ('''RECOMMENDED'''). Sample configuration file of the DRMAA library (`/opt/plgrid/qcg/etc/pbs_drmaa.conf`): |
| 273 | {{{ |
| 274 | #!div style="font-size: 90%" |
| 275 | {{{#!default |
| 276 | # pbs_drmaa.conf - Sample pbs_drmaa configuration file. |
| 277 | |
| 278 | wait_thread: 1, |
| 279 | |
| 280 | pbs_home: "/var/spool/pbs", |
| 281 | |
| 282 | cache_job_state: 600, |
| 283 | }}} |
| 284 | }}} |
| 285 | '''Note:''' Remember to mount the server log directory as described in the earlier [[InstallationOnSeparateMachine|note]]. |
| 286 | |
| 287 | '''or''' |
| 288 | * configure Torque to keep information about completed jobs (e.g. by setting `qmgr -c 'set server keep_completed = 300'`). In this configuration, try to provide more resources (e.g. two cores instead of one) for the VM that hosts the service. Also tune the DRMAA configuration to throttle the polling rate: |
| 289 | {{{ |
| 290 | #!div style="font-size: 90%" |
| 291 | {{{#!default |
| 292 | |
| 293 | wait_thread: 1, |
| 294 | cache_job_state: 60, |
| 295 | pool_delay: 60, |
| 296 | |
| 297 | }}} |
| 298 | }}} |
| 299 | |
| 300 | It is possible to set the default queue by setting the default job category (in the `/opt/plgrid/qcg/etc/pbs_drmaa.conf` file): |
| 301 | {{{ |
| 302 | #!div style="font-size: 90%" |
| 303 | {{{#!default |
| 304 | job_categories: { |
| 305 | default: "-q plgrid", |
| 306 | }, |
| 307 | }}} |
| 308 | }}} |
158 | | xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" |
159 | | xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" |
160 | | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> |
161 | | |
162 | | <Configuration> |
163 | | <sm:ModuleManager> |
164 | | <sm:Directory>/usr/lib/qcg-core/modules/</sm:Directory> |
165 | | <sm:Directory>/usr/lib/qcg-comp/modules/</sm:Directory> |
166 | | </sm:ModuleManager> |
167 | | |
168 | | <sm:Service xsi:type="qcg-compd" description="QCG Computing"> |
169 | | <sm:Logger> |
170 | | <sm:Filename>/var/log/qcg-comp/qcg-compd.log</sm:Filename> |
171 | | <sm:Level>INFO</sm:Level> |
172 | | </sm:Logger> |
173 | | |
174 | | <sm:Transport> |
175 | | <sm:Module xsi:type="sm:ecm_gsoap.service"> |
176 | | <sm:Host>localhost</sm:Host> |
177 | | <sm:Port>19000</sm:Port> |
178 | | </sm:Module> |
179 | | <sm:Module xsi:type="smc:qcg-comp-service"/> |
180 | | </sm:Transport> |
181 | | |
182 | | <sm:Authentication> |
183 | | <sm:Module xsi:type="sm:atc_transport_gsi.service"> |
184 | | <sm:X509CertFile>/etc/qcg-comp/certs/qcgcert.pem</sm:X509CertFile> |
185 | | <sm:X509KeyFile>/etc/qcg-comp/certs/qcgkey.pem</sm:X509KeyFile> |
186 | | </sm:Module> |
187 | | </sm:Authentication> |
188 | | |
189 | | <sm:Authorization> |
190 | | <sm:Module xsi:type="sm:atz_mapfile"> |
191 | | <sm:Mapfile>/etc/grid-security/grid-mapfile</sm:Mapfile> |
192 | | </sm:Module> |
193 | | </sm:Authorization> |
194 | | |
195 | | |
196 | | <sm:Module xsi:type="submission_drmaa" path="/usr/local/lib/libdrmaa.so"/> |
197 | | |
198 | | <!-- The jsdl filter module - uncomment module appropriate for your batch system --> |
199 | | <!-- sm:Module xsi:type="pbs_jsdl_filter"/--> |
200 | | <!-- sm:Module xsi:type="sge_jsdl_filter"/--> |
201 | | <!-- sm:Module xsi:type="slurm_jsdl_filter"/--> |
202 | | <!-- sm:Module xsi:type="lsf_jsdl_filter"/--> |
203 | | |
204 | | <!-- The reservation module - uncomment module appropriate for your batch/scheduler system --> |
205 | | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_sge.py"/--> |
206 | | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_maui.py"/--> |
207 | | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_moab.py"/--> |
208 | | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_pbs.py"/--> |
209 | | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_slurm.py"/--> |
210 | | <sm:Module xsi:type="atz_ardl_filter"/> |
211 | | |
212 | | <sm:Module xsi:type="sm:general_python" path="/usr/lib/qcg-comp/modules/python/monitoring.py"/> |
213 | | |
214 | | <sm:Module xsi:type="notification_wsn"> |
215 | | <sm:Module xsi:type="sm:ecm_gsoap.client" > |
216 | | <sm:ServiceURL>http://localhost:19001/</sm:ServiceURL> |
217 | | <sm:Authentication> |
218 | | <sm:Module xsi:type="sm:atc_transport_http.client"/> |
219 | | </sm:Authentication> |
220 | | <sm:Module xsi:type="sm:ntf_client"/> |
221 | | </sm:Module> |
222 | | </sm:Module> |
223 | | |
224 | | <sm:Module xsi:type="application_mapper"> |
225 | | <ApplicationMapFile>/etc/qcg-comp/application_mapfile</ApplicationMapFile> |
226 | | </sm:Module> |
227 | | |
228 | | <Database> |
229 | | <DSN>qcg-comp</DSN> |
230 | | <User>qcg-comp</User> |
231 | | <Password>qcg-comp</Password> |
232 | | </Database> |
233 | | |
234 | | <UnprivilegedUser>qcg-comp</UnprivilegedUser> |
235 | | |
236 | | <FactoryAttributes> |
237 | | <CommonName>IT cluster</CommonName> |
238 | | <LongDescription>IT department cluster for public use</LongDescription> |
239 | | </FactoryAttributes> |
240 | | </sm:Service> |
241 | | |
242 | | </Configuration> |
| 320 | xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 321 | xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 322 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> |
| 323 | |
| 324 | <Configuration> |
| 325 | <sm:ModuleManager> |
| 326 | <sm:Directory>/opt/plgrid/qcg/lib/qcg-core/modules/</sm:Directory> |
| 327 | <sm:Directory>/opt/plgrid/qcg/lib/qcg-comp/modules/</sm:Directory> |
| 328 | </sm:ModuleManager> |
| 329 | |
| 330 | <sm:Service xsi:type="qcg-compd" description="QCG-Computing"> |
| 331 | <sm:Logger> |
| 332 | <sm:Filename>/opt/plgrid/var/log/qcg-comp/qcg-compd.log</sm:Filename> |
| 333 | <sm:Level>INFO</sm:Level> |
| 334 | </sm:Logger> |
| 335 | |
| 336 | <sm:Transport> |
| 337 | <sm:Module xsi:type="sm:ecm_gsoap.service"> |
| 338 | <sm:Host>frontend.example.com</sm:Host> |
| 339 | <sm:Port>19000</sm:Port> |
| 340 | <sm:KeepAlive>false</sm:KeepAlive> |
| 341 | <sm:Authentication> |
| 342 | <sm:Module xsi:type="sm:atc_transport_gsi.service"> |
| 343 | <sm:X509CertFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgcert.pem</sm:X509CertFile> |
| 344 | <sm:X509KeyFile>/opt/plgrid/qcg/etc/qcg-comp/certs/qcgkey.pem</sm:X509KeyFile> |
| 345 | </sm:Module> |
| 346 | </sm:Authentication> |
| 347 | <sm:Authorization> |
| 348 | <sm:Module xsi:type="sm:atz_mapfile"> |
| 349 | <sm:Mapfile>/etc/grid-security/grid-mapfile</sm:Mapfile> |
| 350 | </sm:Module> |
| 351 | </sm:Authorization> |
| 352 | </sm:Module> |
| 353 | <sm:Module xsi:type="smc:qcg-comp-service"/> |
| 354 | </sm:Transport> |
| 355 | |
| 356 | <sm:Module xsi:type="pbs_jsdl_filter"/> |
| 357 | <sm:Module xsi:type="atz_ardl_filter"/> |
| 358 | <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/monitoring.py"/> |
| 359 | <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/plgrid_info.py"/> |
| 360 | <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/modules_info.py"/> |
| 361 | |
| 362 | <sm:Module xsi:type="submission_drmaa" path="/opt/plgrid/qcg/lib/libdrmaa.so"/> |
| 363 | <sm:Module xsi:type="reservation_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/reservation_maui.py"/> |
| 364 | |
| 365 | <sm:Module xsi:type="notification_wsn"> |
| 366 | <PublishedBrokerURL>https://frontend.example.com:19011/</PublishedBrokerURL> |
| 367 | <sm:Module xsi:type="sm:ecm_gsoap.client"> |
| 368 | <sm:ServiceURL>http://localhost:19001/</sm:ServiceURL> |
| 369 | <sm:Authentication> |
| 370 | <sm:Module xsi:type="sm:atc_transport_http.client"/> |
| 371 | </sm:Authentication> |
| 372 | <sm:Module xsi:type="sm:ntf_client"/> |
| 373 | </sm:Module> |
| 374 | </sm:Module> |
| 375 | |
| 376 | <sm:Module xsi:type="application_mapper"> |
| 377 | <ApplicationMapFile>/opt/plgrid/qcg/etc/qcg-comp/application_mapfile</ApplicationMapFile> |
| 378 | </sm:Module> |
| 379 | |
| 380 | <Database> |
| 381 | <DSN>qcg-comp</DSN> |
| 382 | <User>qcg-comp</User> |
| 383 | <Password>qcg-comp</Password> |
| 384 | </Database> |
| 385 | |
| 386 | <UnprivilegedUser>qcg-comp</UnprivilegedUser> |
| 387 | <!--UseScratch>true</UseScratch> uncomment this if scratch is the only file system shared between the worker nodes and this machine --> |
| 388 | |
| 389 | <FactoryAttributes> |
| 390 | <CommonName>klaster.plgrid.pl</CommonName> |
| 391 | <LongDescription>PL Grid cluster</LongDescription> |
| 392 | </FactoryAttributes> |
| 393 | </sm:Service> |
| 394 | |
| 395 | </Configuration> |
264 | | |
265 | | Also remember to uncomment the `jsdl_filter` and `reservation_python` modules appropriate for your batch system. |
266 | | |
| 421 | == Torque == |
| 422 | `Module[type="reservation_python"]/@path` :: |
| 423 | path to the reservation module. Change this if you are using a different scheduler than Maui (e.g. use `reservation_moab.py` for Moab) |
| 424 | == PBS Professional == |
| 425 | `Module[type="reservation_python"]/@path` :: |
| 426 | path to the reservation module. Change this to `reservation_pbs.py`. |
| 427 | == SLURM == |
| 428 | |
| 429 | `Module[type="reservation_python"]/@path` :: |
| 430 | path to the reservation module. Change this to `reservation_slurm.py`. |
| 431 | |
| 432 | and replace: |
| 433 | {{{ |
| 434 | <sm:Module xsi:type="pbs_jsdl_filter"/> |
| 435 | }}} |
| 436 | with: |
| 437 | {{{ |
| 438 | <sm:Module xsi:type="slurm_jsdl_filter"/> |
| 439 | }}} |
| 440 | = Restricting advance reservation = |
| 441 | |
| 442 | By default the QCG-Computing service can reserve any number of hosts. One can limit it by configuring the !Maui/Moab scheduler and the QCG-Computing service properly: |
| 443 | |
| 444 | * In !Maui/Moab mark some subset of nodes, using the partition mechanism, as reservable for QCG-Computing: |
| 445 | {{{ |
| 446 | #!div style="font-size: 90%" |
| 447 | {{{#!default |
| 448 | # all users can use both the DEFAULT and RENABLED partition |
| 449 | SYSCFG PLIST=DEFAULT,RENABLED |
| 450 | #in Moab you should use 0 instead of DEFAULT |
| 451 | #SYSCFG PLIST=0,RENABLED |
| 452 | |
| 453 | # mark some set of the machines (e.g. 64 nodes) as reservable |
| 454 | NODECFG[node01] PARTITION=RENABLED |
| 455 | NODECFG[node02] PARTITION=RENABLED |
| 456 | NODECFG[node03] PARTITION=RENABLED |
| 457 | ... |
| 458 | NODECFG[node64] PARTITION=RENABLED |
| 459 | |
| 460 | }}} |
| 461 | }}} |
| 462 | |
| 463 | * Tell QCG-Computing to limit reservations to the aforementioned partition by editing the `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` configuration file: |
| 464 | |
| 465 | {{{ |
| 466 | #!div style="font-size: 90%" |
| 467 | {{{#!default |
| 468 | export QCG_AR_MAUI_PARTITION="RENABLED" |
| 469 | }}} |
| 470 | }}} |
| 471 | |
| 472 | * Moreover, QCG-Computing (since version 2.4) can enforce limits on the maximum reservation duration (default: one week) and size (measured in the number of reserved slots): |
| 473 | {{{ |
| 474 | #!div style="font-size: 90%" |
| 475 | {{{#!default |
| 476 | ... |
| 477 | <ReservationsPolicy> |
| 478 | <MaxDuration>24</MaxDuration> <!-- 24 hours --> |
| 479 | <MaxSlots>100</MaxSlots> |
| 480 | </ReservationsPolicy> |
| 481 | ... |
| 482 | }}} |
| 483 | }}} |
| 484 | |
| 485 | = Restricted node access (Torque/PBS-Professional only) = |
| 486 | Read this section only if the system is configured in such a way that not all nodes are accessible using any queue/user. In such a case you should provide a node filter expression in the sysconfig file (`/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd`). Examples: |
| 487 | * Provide information about nodes that were tagged with the `qcg` property |
| 488 | {{{ |
| 489 | export QCG_NODE_FILTER=properties:qcg |
| 490 | }}} |
| 491 | * Provide information about all nodes except those tagged as `gpgpu` |
| 492 | {{{ |
| 493 | export QCG_NODE_FILTER=properties:~gpgpu |
| 494 | }}} |
| 495 | * Provide information only about resources that have `hp` as the `epoch` value: |
| 496 | {{{ |
| 497 | export QCG_NODE_FILTER=resources_available.epoch:hp |
| 498 | }}} |
| 499 | In general the `QCG_NODE_FILTER` must adhere to the following syntax: |
| 500 | {{{ |
| 501 | pbsnodes-attr:regular-expression |
| 502 | }}} |
| 503 | or, if you want to reverse the semantics (i.e. all nodes except those matching the expression): |
| 504 | {{{ |
| 505 | pbsnodes-attr:~regular-expression |
| 506 | }}} |
| 507 | = Configuring QCG-Accounting = |
| 508 | Please use the [http://www.qoscosgrid.org/trac/qcg-computing/wiki/QCG-Accounting QCG-Accounting] agent. You must enable `bat` as one of the publisher plugins. |
307 | | {{{#!sh |
308 | | /var/log/qcg-comp/qcg-compd.log |
309 | | }}} |
310 | | |
311 | | |
312 | | |
| 552 | {{{ |
| 553 | #!div style="font-size: 90%" |
| 554 | {{{#!sh |
| 555 | /opt/plgrid/var/log/qcg-comp/qcg-compd.log |
| 556 | }}} |
| 557 | }}} |
| 558 | |
| 559 | The service assumes that the following commands are in the standard search path: |
| 560 | * `pbsnodes` |
| 561 | * `showres` |
| 562 | * `setres` |
| 563 | * `releaseres` |
| 564 | * `checknode` |
| 565 | If any of the above commands is not installed in a standard location (e.g. `/usr/bin`) you may need to edit the `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` file and set the `PATH` variable appropriately, e.g.: |
| 566 | {{{ |
| 567 | #!div style="font-size: 90%" |
| 568 | {{{#!sh |
| 569 | # INIT_WAIT=5 |
| 570 | # |
| 571 | # DRM specific options |
| 572 | |
| 573 | export PATH=$PATH:/opt/maui/bin |
| 574 | }}} |
| 575 | }}} |
| 576 | |
| 577 | |
| 578 | If you compiled DRMAA with logging switched on, you can also set the DRMAA logging level there: |
| 579 | {{{ |
| 580 | #!div style="font-size: 90%" |
| 581 | {{{#!sh |
| 582 | # INIT_WAIT=5 |
| 583 | # |
| 584 | # DRM specific options |
| 585 | |
| 586 | export DRMAA_LOG_LEVEL=INFO |
| 587 | }}} |
| 588 | }}} |
| 589 | |
| 590 | Also provide the location of the root scratch directory if it is accessible from the QCG-Computing machine: |
| 591 | {{{ |
| 592 | #!div style="font-size: 90%" |
| 593 | {{{#!sh |
| 594 | # INIT_WAIT=5 |
| 595 | # |
| 596 | |
| 597 | export QCG_SCRATCH_DIR_ROOT="/mnt/lustre/scratch/people/" |
| 598 | }}} |
| 599 | }}} |
| 600 | |
| 601 | |
| 602 | **Note:** In the current version, whenever you restart the PostgreSQL server you also need to restart the QCG-Computing and QCG-Notification services: |
| 603 | |
| 604 | {{{ |
| 605 | #!div style="font-size: 90%" |
| 606 | {{{#!sh |
| 607 | /etc/init.d/qcg-compd restart |
| 608 | /etc/init.d/qcg-ntfd restart |
| 609 | }}} |
| 610 | }}} |
| 723 | * Create an advance reservation: |
| 724 | * copy the provided sample reservation description file (expressed in ARDL - Advance Reservation Description Language) |
| 725 | {{{ |
| 726 | #!div style="font-size: 90%" |
| 727 | {{{#!sh |
| 728 | cp /opt/plgrid/qcg/share/qcg-comp/doc/examples/ardl/oneslot.xml oneslot.xml |
| 729 | }}} |
| 730 | }}} |
| 731 | * Edit the `oneslot.xml` and modify the `StartTime` and `EndTime` to dates that are in the near future, |
| 732 | * Create a new reservation: |
| 733 | {{{ |
| 734 | #!div style="font-size: 90%" |
| 735 | {{{#!sh |
| 736 | qcg-comp -c -D oneslot.xml |
| 737 | Reservation Id: aab6b04a-887b-4027-633f-412375559d7d |
| 738 | }}} |
| 739 | }}} |
| 740 | * List all reservations: |
| 741 | {{{ |
| 742 | #!div style="font-size: 90%" |
| 743 | {{{#!sh |
| 744 | qcg-comp -l |
| 745 | Reservation Id: aab6b04a-887b-4027-633f-412375559d7d |
| 746 | Total number of reservations: 1 |
| 747 | }}} |
| 748 | }}} |
| 749 | * Check which hosts were reserved: |
| 750 | {{{ |
| 751 | #!div style="font-size: 90%" |
| 752 | {{{#!sh |
| 753 | qcg-comp -s -r aab6b04a-887b-4027-633f-412375559d7d |
| 754 | Reserved hosts: |
| 755 | worker.example.com[used=0,reserved=1,total=4] |
| 756 | }}} |
| 757 | }}} |
| 758 | * Delete the reservation: |
| 759 | {{{ |
| 760 | #!div style="font-size: 90%" |
| 761 | {{{#!sh |
| 762 | qcg-comp -t -r aab6b04a-887b-4027-633f-412375559d7d |
| 763 | Reservation terminated. |
| 764 | }}} |
| 765 | }}} |
| 766 | * Check if the grid-ftp is working correctly: |
| 767 | {{{ |
| 768 | #!div style="font-size: 90%" |
| 769 | {{{#!sh |
| 770 | globus-url-copy gsiftp://your.local.host.name/etc/profile profile |
| 771 | diff /etc/profile profile |
| 772 | }}} |
| 773 | }}} |
| 774 | |
| 775 | = Maintenance = |
| 776 | The historic usage information is stored in two relations of the QCG-Computing database: `jobs_acc` and `reservations_acc`. You can always archive old usage data to a file and delete it from the database using the psql client: |
| 777 | {{{ |
| 778 | #!div style="font-size: 90%" |
| 779 | {{{#!sh |
| 780 | psql -h localhost qcg-comp qcg-comp |
| 781 | Password for user qcg-comp: |
| 782 | Welcome to psql 8.1.23, the PostgreSQL interactive terminal. |
| 783 | |
| 784 | Type: \copyright for distribution terms |
| 785 | \h for help with SQL commands |
| 786 | \? for help with psql commands |
| 787 | \g or terminate with semicolon to execute query |
| 788 | \q to quit |
| 789 | |
| 790 | qcg-comp=> \o jobs.acc |
| 791 | qcg-comp=> SELECT * FROM jobs_acc where end_time < date '2010-01-10'; |
| 792 | qcg-comp=> \o reservations.acc |
| 793 | qcg-comp=> SELECT * FROM reservations_acc where end_time < date '2010-01-10'; |
| 794 | qcg-comp=> \o |
| 795 | qcg-comp=> DELETE FROM jobs_acc where end_time < date '2010-01-10'; |
| 796 | qcg-comp=> DELETE FROM reservations_acc where end_time < date '2010-01-10'; |
| 797 | }}} |
| 798 | }}} |
| 799 | |
| 800 | You should also install the logrotate configuration for QCG-Computing: |
| 801 | {{{ |
| 802 | #!div style="font-size: 90%" |
| 803 | {{{#!sh |
| 804 | yum install qcg-comp-logrotate |
| 805 | }}} |
| 806 | }}} |
| 807 | **Important**: After any update/restart of the PostgreSQL database you must also restart the qcg-compd and qcg-ntfd services. |
| 808 | {{{ |
| 809 | /etc/init.d/qcg-compd restart |
| 810 | /etc/init.d/qcg-ntfd restart |
| 811 | }}} |
| 812 | During scheduled downtimes we recommend disabling submission in the service configuration file: |
| 813 | {{{ |
| 814 | ... |
| 815 | <AcceptingNewActivities>false</AcceptingNewActivities> |
| 816 | <FactoryAttributes> |
| 817 | }}} |
| 818 | = PL-Grid Grants Support = |
| 819 | Since version 2.2.7 QCG-Computing is integrated with the PL-Grid grants system. The integration has three main interaction points: |
| 820 | * QCG-Computing can accept jobs that have the grant id set explicitly. One must use the `<jsdl:JobProject>` element, e.g.: |
| 821 | |
| 822 | {{{ |
| 823 | #!div style="font-size: 90%" |
| 824 | {{{#!sh |
| 825 | <?xml version="1.0" encoding="UTF-8"?> |
| 826 | |
| 827 | <jsdl:JobDefinition |
| 828 | xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl" |
| 829 | xmlns:jsdl-hpcpa="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa" |
| 830 | xmlns:jsdl-qcg-comp-factory="http://schemas.qoscosgrid.org/comp/2011/04/jsdl/factory"> |
| 831 | <jsdl:JobDescription> |
| 832 | <jsdl:JobIdentification> |
| 833 | <jsdl:JobProject>Manhattan</jsdl:JobProject> |
| 834 | </jsdl:JobIdentification> |
| 835 | <jsdl:Application> |
| 836 | <jsdl-hpcpa:HPCProfileApplication> |
| 837 | ... |
| 838 | }}} |
| 839 | }}} |
| 840 | |
| 841 | * QCG-Computing can provide information about the local grants to the upper layers (e.g. QCG-Broker), so they can use it for scheduling purposes. One can enable this by adding the following line to the QCG-Computing configuration file (qcg-compd.xml): |
| 842 | {{{ |
| 843 | #!div style="font-size: 90%" |
| 844 | {{{#!sh |
| 845 | </sm:Transport> |
| 846 | ... |
| 847 | <sm:Module xsi:type="sm:general_python" path="/opt/plgrid/qcg/lib/qcg-comp/modules/python/plgrid_info.py"/> |
| 848 | }}} |
| 849 | }}} |
| 850 | Please note that this module requires the [#LDAPgeneratedgridmapfile qcg-gridmapfilegenerator] to be installed. |
| 851 | * The grant id is provided in resource usage record sent to the BAT accounting service |
| 852 | == Configuring PBS DRMAA submit filter == |
| 853 | In order to enforce the PL-Grid grant policy you must configure the PBS DRMAA submit filter by editing `/opt/plgrid/qcg/etc/qcg-comp/sysconfig/qcg-compd` and adding a variable pointing to the DRMAA submit filter, e.g.: |
| 854 | {{{ |
| 855 | export PBSDRMAA_SUBMIT_FILTER="/software/grid/plgrid/qcg-app-scripts/app-scripts/tools/plgrid-grants/pbsdrmaa_submit_filter.py" |
| 856 | }}} |
| 857 | An example submit filter can be found in !QosCosGrid svn: |
| 858 | {{{ |
| 859 | svn co https://apps.man.poznan.pl/svn/qcg-computing/trunk/app-scripts/tools/plgrid-grants |
| 860 | }}} |
| 861 | More about PBS DRMAA submit filters can be found [[http://apps.man.poznan.pl/trac/pbs-drmaa/wiki/WikiStart#Submitfilter|here]]. |
| 862 | = GOCDB = |
| 863 | Please remember to register the QCG-Computing and QCG-Notification services in the GOCDB using the QCG.Computing and QCG.Notification service types respectively. |