| 1 | [[PageOutline]] |
| 2 | |
| 3 | = Introduction = |
| 4 | The QCG-Computing service (the successor of the OpenDSP project) is an open source service that acts as a computing provider, exposing on-demand access to computing resources and jobs over an HPC Basic Profile compliant Web Services interface. In addition, QCG-Computing offers a remote interface for advance reservation management. |
| 5 | |
| 6 | This document describes the installation of the QCG-Computing service on Debian machines using binary packages. The service should be deployed on a machine (or virtual machine) that: |
| 7 | * has at least 1 GB of memory (recommended value: 2 GB) |
| 8 | * has 10 GB of free disk space (most of the space will be used by the log files) |
| 9 | * has any modern CPU (if you plan to use a virtual machine, you should dedicate one or two cores of the host machine to it) |
| 10 | * runs Debian 6.x |
| 11 | = Prerequisites = |
| 12 | We assume that you have the local batch systems already installed. |
| 13 | |
| 14 | The !QosCosGrid services do not require you to install any QCG component on the worker nodes; however, the application wrapper scripts need the following software to be available on the worker nodes: |
| 15 | * bash, |
| 16 | * rsync, |
| 17 | * zip/unzip, |
| 18 | * dos2unix, |
| 19 | * nc, |
| 20 | * python. |
| 21 | These tools are usually available out of the box on most HPC systems. |
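
The tool list above can be checked on a worker node with a short script. This is a minimal sketch, not part of the QCG distribution; the function name `missing_tools` is hypothetical, and the check only looks for each command in `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool
# that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Example call with the tools listed above (run it on a worker node):
# missing_tools bash rsync zip unzip dos2unix nc python
```

An empty result means that all listed tools were found; any printed name still needs to be installed on that node.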
| 22 | |
| 23 | == GridFTP server == |
| 24 | To be fully operable, the !QosCosGrid stack requires a GridFTP server to be installed. This requirement is usually fulfilled by most PRACE sites. If it is not, the server can be installed by issuing the following commands: |
| 25 | {{{ |
| 26 | # apt-get install xinetd globus-gridftp-server-progs |
| 27 | # cat > /etc/xinetd.d/gsiftp << EOF |
| 28 | service gsiftp |
| 29 | { |
| 30 | instances = 100 |
| 31 | socket_type = stream |
| 32 | wait = no |
| 33 | user = root |
| 34 | env += GLOBUS_TCP_PORT_RANGE=20000,25000 |
| 35 | server = /usr/sbin/globus-gridftp-server |
| 36 | server_args = -i -aa -l /var/log/globus-gridftp.log |
| 37 | server_args += -d WARN |
| 38 | log_on_success += DURATION |
| 39 | nice = 10 |
| 40 | disable = no |
| 41 | } |
| 42 | EOF |
| 43 | # /etc/init.d/xinetd reload |
| 44 | Reloading internet superserver configuration: xinetd. |
| 45 | }}} |
| 46 | = Firewall configuration = |
| 47 | In order to expose the !QosCosGrid services externally you need to open the following incoming ports in the firewall: |
| 48 | * 19000 (TCP) - QCG-Computing |
| 49 | * 19001 (TCP) - QCG-Notification |
| 50 | * 2811 (TCP) - GridFTP server |
| 51 | * 20000-25000 (TCP) - GridFTP port-range |
| 52 | |
| 53 | The following outgoing traffic should be allowed in general: |
| 54 | * NTP, DNS, HTTP, HTTPS services |
| 55 | * GridFTP (TCP ports: 2811 and port-ranges: 9000-9500, 20000-25000) |
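
If the host firewall is managed with plain `iptables` (an assumption; your site may use a different tool or a hardware firewall), the incoming ports listed above could be opened along these lines. The helper name `qcg_firewall_rules` is hypothetical, and it only prints the rules so that they can be reviewed before being applied.

```shell
#!/bin/sh
# Hypothetical helper: print (do not apply) one iptables rule per
# incoming port or port range listed in this section.
# Assumes the rules belong in the INPUT chain and can simply be appended.
qcg_firewall_rules() {
    for port in 19000 19001 2811 20000:25000; do
        printf 'iptables -A INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
    done
}

# Review the rules first:
# qcg_firewall_rules
# Then apply them as root:
# qcg_firewall_rules | sh
```

Rule order matters in `iptables`, so on a host with an existing rule set you may need `-I` at a specific position instead of `-A`.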
| 56 | |
| 57 | = Related software = |
| 58 | * Install the database backend (PostgreSQL) - optional, only if you want to host the QCG-Computing database on the same machine. |
| 59 | {{{#!sh |
| 60 | apt-get install postgresql |
| 61 | }}} |
| 62 | |
| 63 | * UnixODBC and the PostgreSQL ODBC driver: |
| 64 | {{{#!sh |
| 65 | apt-get install unixodbc odbc-postgresql |
| 66 | }}} |
| 67 | |
| 68 | We further assume that the X.509 host certificate and key are already installed in the following locations: |
| 69 | * `/etc/grid-security/hostcert.pem` |
| 70 | * `/etc/grid-security/hostkey.pem` |
| 71 | |
| 72 | Most grid services and security infrastructures are sensitive to time skew. We therefore recommend installing a Network Time Protocol daemon or using any other solution that provides accurate clock synchronization. |
| 73 | |
| 74 | = Installation = |
| 75 | To install QCG-Computing on Debian, follow these steps: |
| 76 | |
| 77 | * ensure that the qcg-comp user exists on the system; otherwise create it: |
| 78 | {{{#!sh |
| 79 | useradd -r -d /var/log/qcg-comp/ qcg-comp |
| 80 | }}} |
| 81 | |
| 82 | * ensure that the qcg-dev group exists on the system; otherwise create it: |
| 83 | {{{#!sh |
| 84 | groupadd -r qcg-dev |
| 85 | }}} |
| 86 | |
| 87 | * install the !QosCosGrid Debian repository: |
| 88 | {{{#!sh |
| 89 | cat > /etc/apt/sources.list.d/qcg.unstable.list << EOF |
| 90 | deb http://fury.man.poznan.pl/qcg-packages/debian/ unstable main |
| 91 | EOF |
| 92 | }}} |
| 93 | |
| 94 | * add the public key of the QCG repository to your trusted keys in the apt configuration: |
| 95 | {{{#!sh |
| 96 | wget https://apps.man.poznan.pl/trac/qcg-notification/raw-attachment/wiki/InstallingUsingDeb/qcg.pub |
| 97 | apt-key add qcg.pub |
| 98 | }}} |
| 99 | |
| 100 | * refresh the packages list: |
| 101 | {{{#!sh |
| 102 | apt-get update |
| 103 | }}} |
| 104 | |
| 105 | * install QCG-Computing: |
| 106 | {{{ |
| 107 | #!div style="font-size: 90%" |
| 108 | {{{#!sh |
| 109 | apt-get install qcg-comp qcg-comp-client qcg-comp-doc |
| 110 | }}} |
| 111 | }}} |
| 112 | |
| 113 | * set up the QCG-Computing database as described [http://apps.man.poznan.pl/trac/qcg-computing/wiki/InstallingFromSources#Databasesetup here]. |
| 114 | |
| 115 | |
| 116 | |
| 117 | |
| 118 | = Service certificates = |
| 119 | Copy the service certificate and key into the `/etc/qcg-comp/certs/` directory. Remember to set appropriate permissions on the key file. |
| 120 | {{{ |
| 121 | #!div style="font-size: 90%" |
| 122 | {{{#!default |
| 123 | cp /etc/grid-security/hostcert.pem /etc/qcg-comp/certs/qcgcert.pem |
| 124 | cp /etc/grid-security/hostkey.pem /etc/qcg-comp/certs/qcgkey.pem |
| 125 | chown qcg-comp /etc/qcg-comp/certs/qcgcert.pem |
| 126 | chown qcg-comp /etc/qcg-comp/certs/qcgkey.pem |
| 127 | chmod 0600 /etc/qcg-comp/certs/qcgkey.pem |
| 128 | }}} |
| 129 | }}} |
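
The permissions set above can be verified with a small check. This is a sketch that assumes GNU `stat` (as shipped with Debian coreutils); the function name `key_is_private` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists and its mode
# is 0600 or 0400. Relies on GNU stat's -c option.
key_is_private() {
    [ -f "$1" ] || return 1
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        600|400) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call:
# key_is_private /etc/qcg-comp/certs/qcgkey.pem || echo "key file is too permissive"
```

The check covers the file mode only; ownership by the `qcg-comp` user still has to be confirmed separately, for example with `ls -l`.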
| 130 | = DRMAA library = |
| 131 | == Torque/PBS Professional == |
| 132 | Install DRMAA for Torque/PBS Pro using the source package available at the [http://apps.man.poznan.pl/trac/pbs-drmaa PBS DRMAA home page]. |
| 133 | |
| 134 | |
| 135 | == SLURM == |
| 136 | Install DRMAA for SLURM using the source package available at the [http://apps.man.poznan.pl/trac/slurm-drmaa SLURM DRMAA home page]. |
| 137 | {{{ |
| 138 | # install SLURM header files |
| 139 | apt-get install libslurm21-dev |
| 140 | # get SLURM DRMAA |
| 141 | wget http://apps.man.poznan.pl/trac/slurm-drmaa/downloads/slurm-drmaa-1.0.5.tar.gz |
| 142 | tar -xzf slurm-drmaa-1.0.5.tar.gz |
| 143 | cd slurm-drmaa-1.0.5 |
| 144 | # configure, make and install (by default DRMAA should be installed into /usr/local/) |
| 145 | ./configure |
| 146 | make |
| 147 | make install |
| 148 | # test it! |
| 149 | /usr/local/bin/drmaa-run /bin/hostname |
| 150 | |
| 151 | }}} |
| 152 | = Service configuration = |
| 153 | Edit the preinstalled service configuration file (`/etc/qcg-comp/qcg-comp`**d**`.xml`): |
| 154 | {{{ |
| 155 | <?xml version="1.0" encoding="UTF-8"?> |
| 156 | <sm:QCGCore |
| 157 | xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" |
| 158 | xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 159 | xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 160 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> |
| 161 | |
| 162 | <Configuration> |
| 163 | <sm:ModuleManager> |
| 164 | <sm:Directory>/usr/lib/qcg-core/modules/</sm:Directory> |
| 165 | <sm:Directory>/usr/lib/qcg-comp/modules/</sm:Directory> |
| 166 | </sm:ModuleManager> |
| 167 | |
| 168 | <sm:Service xsi:type="qcg-compd" description="QCG Computing"> |
| 169 | <sm:Logger> |
| 170 | <sm:Filename>/var/log/qcg-comp/qcg-compd.log</sm:Filename> |
| 171 | <sm:Level>INFO</sm:Level> |
| 172 | </sm:Logger> |
| 173 | |
| 174 | <sm:Transport> |
| 175 | <sm:Module xsi:type="sm:ecm_gsoap.service"> |
| 176 | <sm:Host>localhost</sm:Host> |
| 177 | <sm:Port>19000</sm:Port> |
| 178 | </sm:Module> |
| 179 | <sm:Module xsi:type="smc:qcg-comp-service"/> |
| 180 | </sm:Transport> |
| 181 | |
| 182 | <sm:Authentication> |
| 183 | <sm:Module xsi:type="sm:atc_transport_gsi.service"> |
| 184 | <sm:X509CertFile>/etc/qcg-comp/certs/qcgcert.pem</sm:X509CertFile> |
| 185 | <sm:X509KeyFile>/etc/qcg-comp/certs/qcgkey.pem</sm:X509KeyFile> |
| 186 | </sm:Module> |
| 187 | </sm:Authentication> |
| 188 | |
| 189 | <sm:Authorization> |
| 190 | <sm:Module xsi:type="sm:atz_mapfile"> |
| 191 | <sm:Mapfile>/etc/grid-security/grid-mapfile</sm:Mapfile> |
| 192 | </sm:Module> |
| 193 | </sm:Authorization> |
| 194 | |
| 195 | |
| 196 | <sm:Module xsi:type="submission_drmaa" path="/usr/local/lib/libdrmaa.so"/> |
| 197 | |
| 198 | <!-- The jsdl filter module - uncomment module appropriate for your batch system --> |
| 199 | <!-- sm:Module xsi:type="pbs_jsdl_filter"/--> |
| 200 | <!-- sm:Module xsi:type="sge_jsdl_filter"/--> |
| 201 | <!-- sm:Module xsi:type="slurm_jsdl_filter"/--> |
| 202 | <!-- sm:Module xsi:type="lsf_jsdl_filter"/--> |
| 203 | |
| 204 | <!-- The reservation module - uncomment module appropriate for your batch/scheduler system --> |
| 205 | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_sge.py"/--> |
| 206 | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_maui.py"/--> |
| 207 | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_moab.py"/--> |
| 208 | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_pbs.py"/--> |
| 209 | <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_slurm.py"/--> |
| 210 | <sm:Module xsi:type="atz_ardl_filter"/> |
| 211 | |
| 212 | <sm:Module xsi:type="sm:general_python" path="/usr/lib/qcg-comp/modules/python/monitoring.py"/> |
| 213 | |
| 214 | <sm:Module xsi:type="notification_wsn"> |
| 215 | <sm:Module xsi:type="sm:ecm_gsoap.client" > |
| 216 | <sm:ServiceURL>http://localhost:19001/</sm:ServiceURL> |
| 217 | <sm:Authentication> |
| 218 | <sm:Module xsi:type="sm:atc_transport_http.client"/> |
| 219 | </sm:Authentication> |
| 220 | <sm:Module xsi:type="sm:ntf_client"/> |
| 221 | </sm:Module> |
| 222 | </sm:Module> |
| 223 | |
| 224 | <sm:Module xsi:type="application_mapper"> |
| 225 | <ApplicationMapFile>/etc/qcg-comp/application_mapfile</ApplicationMapFile> |
| 226 | </sm:Module> |
| 227 | |
| 228 | <Database> |
| 229 | <DSN>qcg-comp</DSN> |
| 230 | <User>qcg-comp</User> |
| 231 | <Password>qcg-comp</Password> |
| 232 | </Database> |
| 233 | |
| 234 | <UnprivilegedUser>qcg-comp</UnprivilegedUser> |
| 235 | |
| 236 | <FactoryAttributes> |
| 237 | <CommonName>IT cluster</CommonName> |
| 238 | <LongDescription>IT department cluster for public use</LongDescription> |
| 239 | </FactoryAttributes> |
| 240 | </sm:Service> |
| 241 | |
| 242 | </Configuration> |
| 243 | </sm:QCGCore> |
| 244 | }}} |
| 245 | In most cases it should be enough to change only the following elements: |
| 246 | `Transport/Module/Host` :: |
| 247 | the hostname of the machine where the service is deployed |
| 248 | `Authentication/Module/X509CertFile` and `Authentication/Module/X509KeyFile` :: |
| 249 | the service X.509 certificate and private key. Make sure that the key and certificate are owned by the `qcg-comp` user. If you installed the certificate and key files in the recommended location, you do not need to edit these fields. |
| 250 | `Module[type="smc:notification_wsn"]/PublishedBrokerURL` :: |
| 251 | the external URL of the QCG-Notification service (You can do it later, i.e. after [http://apps.man.poznan.pl/trac/qcg-notification/wiki/InstallingUsingDeb installing the QCG-Notification service]) |
| 252 | `Module[type="smc:notification_wsn"]/Module/ServiceURL` :: |
| 253 | the localhost URL of the QCG-Notification service (You can do it later, i.e. after [http://apps.man.poznan.pl/trac/qcg-notification/wiki/InstallingUsingDeb installing the QCG-Notification service]) |
| 254 | `Module[type="submission_drmaa"]/@path` :: |
| 255 | path to the DRMAA library (`libdrmaa.so`). If you installed the DRMAA library from the source package with the default prefix (`/usr/local`), you do not need to change this path. |
| 256 | `Database/Password` :: |
| 257 | the `qcg-comp` database password |
| 258 | `UseScratch` :: |
| 259 | set this to `true` if you set `QCG_SCRATCH_DIR_ROOT` in `sysconfig`, so that every job is started from the scratch directory (instead of the default home directory) |
| 260 | `FactoryAttributes/CommonName` :: |
| 261 | a common name of the cluster (e.g. reef.man.poznan.pl). You can use any name that is unique among all systems (e.g. cluster name + domain name of your institution) |
| 262 | `FactoryAttributes/LongDescription` :: |
| 263 | a human readable description of the cluster |
| 264 | |
| 265 | Also remember to uncomment the `jsdl_filter` and `reservation_python` modules appropriate for your batch system. |
| 266 | |
| 267 | |
| 268 | = Creating applications' script space = |
| 269 | A common case for the QCG-Computing service is that an application is accessed using an abstract application name rather than an absolute executable path. The mappings from application name and version to executable path are stored in the file `/etc/qcg-comp/application_mapfile`: |
| 270 | |
| 271 | {{{#!default |
| 272 | cat /etc/qcg-comp/application_mapfile |
| 273 | # ApplicationName ApplicationVersion Executable |
| 274 | |
| 275 | date * /bin/date |
| 276 | LPSolve 5.5 /usr/local/bin/lp_solve |
| 277 | }}} |
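
To illustrate how an entry in this file might be resolved, here is a minimal sketch in shell. It is not the service's actual lookup code; the function name `lookup_app` is hypothetical, and it assumes that `*` in the version column matches any version, that columns are separated by whitespace, and that executable paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the executable mapped to NAME and VERSION
# in MAPFILE. Comment lines and blank lines are skipped; a "*" in the
# version column is treated as "any version" (an assumption).
# Exits non-zero when no entry matches.
lookup_app() {
    awk -v name="$1" -v ver="$2" '
        /^[ \t]*(#|$)/ { next }
        $1 == name && ($2 == "*" || $2 == ver) { print $3; found = 1; exit }
        END { exit !found }
    ' "$3"
}

# Example call:
# lookup_app LPSolve 5.5 /etc/qcg-comp/application_mapfile
```

The first matching line wins, so in this sketch more specific entries should be placed before wildcard entries for the same application.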
| 278 | |
| 279 | |
| 280 | It is also common to provide wrapper scripts here rather than target executables. A wrapper script can handle aspects of the application lifetime such as environment initialization, copying files from/to scratch storage and application monitoring. It is recommended to create a separate directory for those wrapper scripts (e.g. on the application partition) and to give the QCG Developers group (`qcg-dev`) write permission to it. This directory must be readable by all users and from every worker node (the application partition usually fulfills those requirements). |
| 281 | |
| 282 | {{{ |
| 283 | #!div style="font-size: 90%" |
| 284 | {{{#!default |
| 285 | mkdir /opt/exp_soft/qcg-app-scripts |
| 286 | chown :qcg-dev /opt/exp_soft/qcg-app-scripts |
| 287 | chmod g+rwx /opt/exp_soft/qcg-app-scripts |
| 288 | }}} |
| 289 | }}} |
| 290 | |
| 291 | More on [ApplicationScripts Application Scripts]. |
| 292 | = Note on the security model = |
| 293 | The QCG-Computing can be configured with various authentication and authorization modules. However in the typical deployment we assume that the QCG-Computing is configured as in the above example, i.e.: |
| 294 | * authentication is based on the ''httpg'' protocol, |
| 295 | * authorization is based on the local `grid-mapfile`. |
| 296 | |
| 297 | = Starting the service = |
| 298 | As root type: |
| 299 | {{{ |
| 300 | #!div style="font-size: 90%" |
| 301 | {{{#!sh |
| 302 | /etc/init.d/qcg-comp start |
| 303 | }}} |
| 304 | }}} |
| 305 | |
| 306 | The service logs can be found in: |
| 307 | {{{#!sh |
| 308 | /var/log/qcg-comp/qcg-compd.log |
| 309 | }}} |
| 310 | |
| 311 | |
| 312 | |
| 313 | |
| 314 | = Stopping the service = |
| 315 | The service can be stopped using the following command: |
| 316 | {{{ |
| 317 | #!div style="font-size: 90%" |
| 318 | {{{#!sh |
| 319 | /etc/init.d/qcg-comp stop |
| 320 | }}} |
| 321 | }}} |
| 322 | |
| 323 | = Verifying the installation = |
| 324 | |
| 325 | * Edit the QCG-Computing client configuration file (`/etc/qcg-comp/qcg-comp.xml`): |
| 326 | * set the `Host` and `Port` to reflect the changes in the service configuration file (`qcg-compd.xml`). |
| 327 | {{{ |
| 328 | #!div style="font-size: 90%" |
| 329 | {{{#!sh |
| 330 | <?xml version="1.0" encoding="UTF-8"?> |
| 331 | <sm:QCGCore |
| 332 | xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" |
| 333 | xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 334 | xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" |
| 335 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> |
| 336 | |
| 337 | <Configuration> |
| 338 | <sm:ModuleManager> |
| 339 | <sm:Directory>/opt/qcg/lib/qcg-core/modules/</sm:Directory> |
| 340 | <sm:Directory>/opt/qcg/lib/qcg-comp/modules/</sm:Directory> |
| 341 | </sm:ModuleManager> |
| 342 | |
| 343 | <sm:Client xsi:type="qcg-comp" description="QCG-Computing client"> |
| 344 | <sm:Transport> |
| 345 | <sm:Module xsi:type="sm:ecm_gsoap.client"> |
| 346 | <sm:ServiceURL>httpg://frontend.example.com:19000/</sm:ServiceURL> |
| 347 | <sm:Authentication> |
| 348 | <sm:Module xsi:type="sm:atc_transport_gsi.client"/> |
| 349 | </sm:Authentication> |
| 350 | <sm:Module xsi:type="smc:qcg-comp-client"/> |
| 351 | </sm:Module> |
| 352 | </sm:Transport> |
| 353 | </sm:Client> |
| 354 | </Configuration> |
| 355 | </sm:QCGCore> |
| 356 | }}} |
| 357 | }}} |
| 358 | * Initialize your credentials: |
| 359 | {{{ |
| 360 | #!div style="font-size: 90%" |
| 361 | {{{#!sh |
| 362 | grid-proxy-init -rfc |
| 363 | Your identity: /O=Grid/OU=QosCosGrid/OU=PSNC/CN=Mariusz Mamonski |
| 364 | Enter GRID pass phrase for this identity: |
| 365 | Creating proxy .................................................................. Done |
| 366 | Your proxy is valid until: Wed Apr 6 05:01:02 2012 |
| 367 | }}} |
| 368 | }}} |
| 369 | * Query the QCG-Computing service: |
| 370 | {{{ |
| 371 | #!div style="font-size: 90%" |
| 372 | {{{#!sh |
| 373 | qcg-comp -G | xmllint --format - # xmllint is used only to present the result in a more readable way |
| 374 | |
| 375 | <bes-factory:FactoryResourceAttributesDocument xmlns:bes-factory="http://schemas.ggf.org/bes/2006/08/bes-factory"> |
| 376 | <bes-factory:IsAcceptingNewActivities>true</bes-factory:IsAcceptingNewActivities> |
| 377 | <bes-factory:CommonName>IT cluster</bes-factory:CommonName> |
| 378 | <bes-factory:LongDescription>IT department cluster for public use</bes-factory:LongDescription> |
| 379 | <bes-factory:TotalNumberOfActivities>0</bes-factory:TotalNumberOfActivities> |
| 380 | <bes-factory:TotalNumberOfContainedResources>1</bes-factory:TotalNumberOfContainedResources> |
| 381 | <bes-factory:ContainedResource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="bes-factory:BasicResourceAttributesDocumentType"> |
| 382 | <bes-factory:ResourceName>worker.example.com</bes-factory:ResourceName> |
| 383 | <bes-factory:CPUArchitecture> |
| 384 | <jsdl:CPUArchitectureName xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">x86_32</jsdl:CPUArchitectureName> |
| 385 | </bes-factory:CPUArchitecture> |
| 386 | <bes-factory:CPUCount>4</bes-factory:CPUCount><bes-factory:PhysicalMemory>1073741824</bes-factory:PhysicalMemory> |
| 387 | </bes-factory:ContainedResource> |
| 388 | <bes-factory:NamingProfile>http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing</bes-factory:NamingProfile> |
| 389 | <bes-factory:BESExtension>http://schemas.ogf.org/hpcp/2007/01/bp/BasicFilter</bes-factory:BESExtension> |
| 390 | <bes-factory:BESExtension>http://schemas.qoscosgrid.org/comp/2011/04</bes-factory:BESExtension> |
| 391 | <bes-factory:LocalResourceManagerType>http://example.com/SunGridEngine</bes-factory:LocalResourceManagerType> |
| 392 | <smcf:NotificationProviderURL xmlns:smcf="http://schemas.qoscosgrid.org/comp/2011/04/factory">http://localhost:2211/</smcf:NotificationProviderURL> |
| 393 | </bes-factory:FactoryResourceAttributesDocument> |
| 394 | }}} |
| 395 | }}} |
| 396 | * Submit a sample job: |
| 397 | {{{ |
| 398 | #!div style="font-size: 90%" |
| 399 | {{{#!sh |
| 400 | qcg-comp -c -J /usr/share/doc/qcg-comp-doc/examples/date.xml |
| 401 | Activity Id: ccb6b04a-887b-4027-633f-412375559d73 |
| 402 | }}} |
| 403 | }}} |
| 404 | * Query its status: |
| 405 | {{{ |
| 406 | #!div style="font-size: 90%" |
| 407 | {{{#!sh |
| 408 | qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 |
| 409 | status = Executing |
| 410 | qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 |
| 411 | status = Executing |
| 412 | qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 |
| 413 | status = Finished |
| 414 | exit status = 0 |
| 415 | }}} |
| 416 | }}} |
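
Instead of repeating the status query by hand, the polling can be scripted. The following is a minimal sketch, assuming the client prints a line containing `status = Finished` as in the sample output above; the function name `wait_for_activity`, the `QCG_COMP` variable and the default retry values are hypothetical. Other terminal states are not handled explicitly and simply run into the retry limit.

```shell
#!/bin/sh
# Client command; override QCG_COMP to point at a different binary.
QCG_COMP=${QCG_COMP:-qcg-comp}

# Hypothetical helper: poll ACTIVITY_ID until the client reports
# "status = Finished" or MAX_TRIES polls have been made.
# Usage: wait_for_activity ACTIVITY_ID [MAX_TRIES] [DELAY_SECONDS]
wait_for_activity() {
    id=$1
    max_tries=${2:-60}
    delay=${3:-5}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if "$QCG_COMP" -s -a "$id" | grep -q 'status = Finished'; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}

# Example call:
# wait_for_activity ccb6b04a-887b-4027-633f-412375559d73 && echo "job finished"
```

A return value of 1 only means that the job was not seen as finished within the retry limit; check the service log for the actual state.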