Changes between Initial Version and Version 1 of InstallationGuide

Timestamp:
05/27/13 23:14:50 (11 years ago)
Author:
mmamonski
  • InstallationGuide

    v1 v1  
     1[[PageOutline]]  
     2 
     3= Introduction = 
     4The QCG-Computing service (the successor of the OpenDSP project) is an open source service that acts as a computing provider, exposing on-demand access to computing resources and jobs over an HPC Basic Profile compliant Web Services interface. In addition, QCG-Computing offers a remote interface for Advance Reservations management.
     5 
     6This document describes the installation of the QCG-Computing service on Debian machines using binary packages. The service should be deployed on a machine (or virtual machine) that:
     7* has at least 1 GB of memory (recommended value: 2 GB)
     8* has 10 GB of free disk space (most of the space will be used by the log files)
     9* has any modern CPU (if you plan to use a virtual machine, you should dedicate one or two cores of the host machine to it)
     10* runs Debian 6.x
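
The requirements above can be checked before installation. The following is a minimal sketch, assuming a Linux host with `/proc/meminfo` and a POSIX `df`; the helper name `meets_minimum` is hypothetical, and `/var/log` is checked only because the text says most of the space goes to log files.

```shell
#!/bin/sh
# Sketch: compare this machine against the minimums listed above.
# The helper name is hypothetical; the thresholds come from the text.

# meets_minimum ACTUAL REQUIRED -> exit status 0 when ACTUAL >= REQUIRED
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total memory in kB as reported by the Linux kernel.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
# Free space in kB on the filesystem holding /var/log (POSIX df layout).
disk_kb=$(df -Pk /var/log 2>/dev/null | awk 'NR==2 {print $4}')

# MemTotal is usually slightly lower than the installed RAM,
# so the threshold is set a little under 1 GB.
if meets_minimum "${mem_kb:-0}" 1000000; then
    echo "memory: ok"
else
    echo "memory: below 1 GB"
fi

# 10 GB = 10485760 kB
if meets_minimum "${disk_kb:-0}" 10485760; then
    echo "disk: ok"
else
    echo "disk: less than 10 GB free"
fi
```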
     11= Prerequisites = 
     12We assume that the local batch system is already installed.
     13 
     14The !QosCosGrid services do not require you to install any QCG component on the worker nodes. However, the application wrapper scripts need the following software to be available on the worker nodes:
     15 * bash,  
     16 * rsync, 
     17 * zip/unzip, 
     18 * dos2unix, 
     19 * nc, 
     20 * python. 
     21These are usually available out of the box on most HPC systems.
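
To confirm that a worker node provides the tools listed above, something like the following could be run on it. This is only a sketch: the function name `missing_tools` is hypothetical, and it checks presence in `PATH`, not versions.

```shell
#!/bin/sh
# missing_tools NAME... -> prints each NAME that is not found in PATH, one per line
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool names are taken from the list above (zip/unzip checked separately).
missing=$(missing_tools bash rsync zip unzip dos2unix nc python)
if [ -z "$missing" ]; then
    echo "all required tools found"
else
    # unquoted on purpose so that the names are joined on one line
    echo "missing tools:" $missing
fi
```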
     22 
     23== GridFTP server == 
     24To be fully operable, the !QosCosGrid stack requires a GridFTP server to be installed. This requirement is usually fulfilled by most PRACE sites. If not, the server can be installed by issuing the following commands:
     25{{{ 
     26# apt-get install xinetd globus-gridftp-server-progs  
     27# cat > /etc/xinetd.d/gsiftp << EOF 
     28service gsiftp 
     29{ 
     30 instances               = 100 
     31 socket_type             = stream 
     32 wait                    = no 
     33 user                    = root 
     34 env                     += GLOBUS_TCP_PORT_RANGE=20000,25000 
     35 server = /usr/sbin/globus-gridftp-server 
     36 server_args = -i -aa -l /var/log/globus-gridftp.log 
     37 server_args += -d WARN 
     38 log_on_success          += DURATION 
     39 nice                    = 10 
     40 disable                 = no 
     41} 
     42EOF 
     43# /etc/init.d/xinetd reload 
     44Reloading internet superserver configuration: xinetd. 
     45}}} 
     46= Firewall configuration = 
     47In order to expose the !QosCosGrid services externally you need to open the following incoming ports in the firewall: 
     48* 19000 (TCP) - QCG-Computing 
     49* 19001 (TCP) - QCG-Notification 
     50* 2811 (TCP) - GridFTP server 
     51* 20000-25000 (TCP) - GridFTP  port-range  
     52 
     53The following outgoing traffic should generally be allowed:
     54* NTP, DNS, HTTP, HTTPS services
     55* GridFTP (TCP port 2811 and port ranges 9000-9500 and 20000-25000)
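
As an illustration of the incoming rules, the sketch below prints (but does not run) one `iptables` command per port. It assumes an iptables-based firewall with a plain `INPUT` chain, which the guide does not prescribe; the function name is hypothetical and the output should be adapted to the local firewall policy before use.

```shell
#!/bin/sh
# Prints iptables rules for the incoming ports listed above.
# Nothing is applied; review the output and run it as root if appropriate.
print_incoming_rules() {
    # a single port, or a range in the iptables first:last notation
    for port in 19000 19001 2811 20000:25000; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

print_incoming_rules
```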
     56 
     57= Related software = 
     58* Install the database backend (PostgreSQL). This is optional and needed only if you want to host the QCG-Computing database on the same machine.
     59{{{#!sh 
     60apt-get install postgresql 
     61}}} 
     62 
     63* Install UnixODBC and the PostgreSQL ODBC driver:
     64{{{#!sh 
     65apt-get install unixodbc odbc-postgresql 
     66}}} 
     67 
     68We further assume that the X.509 host certificate and key are already installed in the following locations:
     69* `/etc/grid-security/hostcert.pem` 
     70* `/etc/grid-security/hostkey.pem` 
     71 
     72Most grid services and security infrastructures are sensitive to time skew. We therefore recommend installing a Network Time Protocol daemon or using any other solution that provides accurate clock synchronization.
     73 
     74= Installation  = 
     75To install QCG-Computing on Debian, follow these steps:
     76 
     77* ensure that the qcg-comp user is present in the system, otherwise create it:
     78{{{#!sh 
     79useradd -r -d /var/log/qcg-comp/ qcg-comp 
     80}}} 
     81 
     82* ensure that the qcg-dev group is present in the system, otherwise create it:
     83{{{#!sh 
     84groupadd -r qcg-dev 
     85}}} 
     86 
     87* install the !QosCosGrid Debian repository: 
     88{{{#!sh 
     89cat > /etc/apt/sources.list.d/qcg.unstable.list << EOF 
     90deb http://fury.man.poznan.pl/qcg-packages/debian/ unstable main 
     91EOF 
     92}}} 
     93 
     94* add the public key of the QCG repository to your trusted keys in the apt configuration: 
     95{{{#!sh 
     96wget https://apps.man.poznan.pl/trac/qcg-notification/raw-attachment/wiki/InstallingUsingDeb/qcg.pub 
     97apt-key add qcg.pub 
     98}}} 
     99 
     100* refresh the packages list: 
     101{{{#!sh 
     102apt-get update 
     103}}} 
     104 
     105* install QCG-Computing: 
     106{{{ 
     107#!div style="font-size: 90%" 
     108{{{#!sh 
     109apt-get install qcg-comp qcg-comp-client qcg-comp-doc 
     110}}} 
     111}}} 
     112 
     113* set up the QCG-Computing database as described [http://apps.man.poznan.pl/trac/qcg-computing/wiki/InstallingFromSources#Databasesetup here].
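
The "ensure ... otherwise create" steps for the user and group can be made repeatable. A minimal sketch, assuming `getent` is available (as on Debian): the hypothetical helpers `user_cmd` and `group_cmd` print the creation command only when the account is missing, so the output can be reviewed and then run as root.

```shell
#!/bin/sh
# user_cmd NAME HOME -> prints the useradd command only if user NAME is missing
user_cmd() {
    getent passwd "$1" >/dev/null 2>&1 || echo "useradd -r -d $2 $1"
}

# group_cmd NAME -> prints the groupadd command only if group NAME is missing
group_cmd() {
    getent group "$1" >/dev/null 2>&1 || echo "groupadd -r $1"
}

# Same names and options as in the steps above; nothing is created here.
user_cmd qcg-comp /var/log/qcg-comp/
group_cmd qcg-dev
```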
     114 
     115 
     116 
     117 
     118= Service certificates = 
     119Copy the service certificate and key into `/etc/qcg-comp/certs/`. Remember to restrict the permissions of the key file.
     120{{{ 
     121#!div style="font-size: 90%" 
     122{{{#!default 
     123cp /etc/grid-security/hostcert.pem /etc/qcg-comp/certs/qcgcert.pem 
     124cp /etc/grid-security/hostkey.pem /etc/qcg-comp/certs/qcgkey.pem 
     125chown qcg-comp /etc/qcg-comp/certs/qcgcert.pem 
     126chown qcg-comp /etc/qcg-comp/certs/qcgkey.pem  
     127chmod 0600 /etc/qcg-comp/certs/qcgkey.pem 
     128}}} 
     129}}} 
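
After copying the files, the key permissions can be verified. This is a sketch that assumes a `stat` supporting `-c` (GNU coreutils, as shipped with Debian); the helper name `key_mode_ok` is hypothetical.

```shell
#!/bin/sh
# key_mode_ok FILE -> succeeds when FILE has mode 600 (owner read/write only)
key_mode_ok() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "600" ]
}

key=/etc/qcg-comp/certs/qcgkey.pem
if key_mode_ok "$key"; then
    echo "$key: permissions ok"
else
    echo "$key: missing or not mode 0600"
fi
```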
     130=  DRMAA library = 
     131== Torque/PBS Professional == 
     132Install DRMAA for Torque/PBS Pro using the source package available at the [http://apps.man.poznan.pl/trac/pbs-drmaa PBS DRMAA home page].
     133 
     134 
     135== SLURM == 
     136Install DRMAA for SLURM using the source package available at the [http://apps.man.poznan.pl/trac/slurm-drmaa SLURM DRMAA home page].
     137{{{ 
     138# install SLURM header files
     139apt-get install libslurm21-dev 
     140# get SLURM DRMAA 
     141wget http://apps.man.poznan.pl/trac/slurm-drmaa/downloads/slurm-drmaa-1.0.5.tar.gz 
     142tar -xzf slurm-drmaa-1.0.5.tar.gz 
     143cd slurm-drmaa-1.0.5 
     144# configure, make and install (by default DRMAA is installed into /usr/local/)
     145./configure 
     146make  
     147make install 
     148# test it! 
     149 /usr/local/bin/drmaa-run /bin/hostname 
     150 
     151}}} 
     152= Service configuration  = 
     153Edit the preinstalled service configuration file (`/etc/qcg-comp/qcg-comp`**d**`.xml`): 
     154{{{ 
     155<?xml version="1.0" encoding="UTF-8"?> 
     156<sm:QCGCore 
     157        xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" 
     158        xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" 
     159        xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" 
     160        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
     161         
     162        <Configuration> 
     163                <sm:ModuleManager> 
     164                        <sm:Directory>/usr/lib/qcg-core/modules/</sm:Directory> 
     165                        <sm:Directory>/usr/lib/qcg-comp/modules/</sm:Directory> 
     166                </sm:ModuleManager> 
     167 
     168                <sm:Service xsi:type="qcg-compd" description="QCG Computing"> 
     169                        <sm:Logger> 
     170                                <sm:Filename>/var/log/qcg-comp/qcg-compd.log</sm:Filename> 
     171                                <sm:Level>INFO</sm:Level> 
     172                        </sm:Logger> 
     173 
     174                        <sm:Transport> 
     175                                <sm:Module xsi:type="sm:ecm_gsoap.service"> 
     176                                        <sm:Host>localhost</sm:Host> 
     177                                        <sm:Port>19000</sm:Port> 
     178                                </sm:Module> 
     179                                <sm:Module xsi:type="smc:qcg-comp-service"/> 
     180                        </sm:Transport> 
     181                         
     182                        <sm:Authentication> 
     183        <sm:Module xsi:type="sm:atc_transport_gsi.service"> 
     184                         <sm:X509CertFile>/etc/qcg-comp/certs/qcgcert.pem</sm:X509CertFile> 
     185                         <sm:X509KeyFile>/etc/qcg-comp/certs/qcgkey.pem</sm:X509KeyFile> 
     186        </sm:Module> 
     187                        </sm:Authentication> 
     188 
     189      <sm:Authorization> 
     190        <sm:Module xsi:type="sm:atz_mapfile"> 
     191          <sm:Mapfile>/etc/grid-security/grid-mapfile</sm:Mapfile> 
     192        </sm:Module> 
     193      </sm:Authorization> 
     194 
     195 
     196                        <sm:Module xsi:type="submission_drmaa" path="/usr/local/lib/libdrmaa.so"/> 
     197 
     198      <!-- The jsdl filter module - uncomment module appropriate for your batch system --> 
     199      <!-- sm:Module xsi:type="pbs_jsdl_filter"/--> 
     200      <!-- sm:Module xsi:type="sge_jsdl_filter"/--> 
     201      <!-- sm:Module xsi:type="slurm_jsdl_filter"/--> 
     202      <!-- sm:Module xsi:type="lsf_jsdl_filter"/--> 
     203 
     204      <!-- The reservation module - uncomment module appropriate for your batch/scheduler system --> 
     205                        <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_sge.py"/--> 
     206                        <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_maui.py"/--> 
     207                        <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_moab.py"/--> 
     208                        <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_pbs.py"/--> 
     209                        <!--sm:Module xsi:type="reservation_python" path="/usr/lib/qcg-comp/modules/python/reservation_slurm.py"/--> 
     210      <sm:Module xsi:type="atz_ardl_filter"/> 
     211 
     212      <sm:Module xsi:type="sm:general_python" path="/usr/lib/qcg-comp/modules/python/monitoring.py"/> 
     213                         
     214                        <sm:Module xsi:type="notification_wsn"> 
     215                                <sm:Module xsi:type="sm:ecm_gsoap.client" > 
     216                                                <sm:ServiceURL>http://localhost:19001/</sm:ServiceURL> 
     217                                                        <sm:Authentication> 
     218                                                                <sm:Module xsi:type="sm:atc_transport_http.client"/> 
     219                                                        </sm:Authentication> 
     220                                                <sm:Module xsi:type="sm:ntf_client"/> 
     221                                </sm:Module> 
     222                        </sm:Module> 
     223                                 
     224                        <sm:Module xsi:type="application_mapper"> 
     225                                <ApplicationMapFile>/etc/qcg-comp/application_mapfile</ApplicationMapFile> 
     226                        </sm:Module> 
     227 
     228                        <Database> 
     229                                <DSN>qcg-comp</DSN> 
     230                                <User>qcg-comp</User> 
     231                                <Password>qcg-comp</Password> 
     232                        </Database> 
     233 
     234                        <UnprivilegedUser>qcg-comp</UnprivilegedUser> 
     235 
     236                        <FactoryAttributes> 
     237                                <CommonName>IT cluster</CommonName> 
     238                                <LongDescription>IT department cluster for public use</LongDescription> 
     239                        </FactoryAttributes> 
     240                </sm:Service> 
     241 
     242        </Configuration> 
     243</sm:QCGCore> 
     244}}} 
     245In most cases it should be enough to change only the following elements:
     246 `Transport/Module/Host` :: 
     247   the hostname of the machine where the service is deployed  
     248 `Transport/Module/Authentication/Module/X509CertFile`  and  `Transport/Module/Authentication/Module/X509KeyFile` ::  
     249  the service X.509 certificate and private key. Make sure that the key and certificate are owned by the `qcg-comp` user. If you installed the certificate and key files in the recommended location, you do not need to edit these fields.
     250 `Module[type="smc:notification_wsn"]/PublishedBrokerURL` ::  
     251  the external URL of the QCG-Notification service (You can do it later, i.e. after [http://apps.man.poznan.pl/trac/qcg-notification/wiki/InstallingUsingDeb installing the QCG-Notification service]) 
     252 `Module[type="smc:notification_wsn"]/Module/ServiceURL` ::  
     253  the localhost URL of the QCG-Notification service (You can do it later, i.e. after [http://apps.man.poznan.pl/trac/qcg-notification/wiki/InstallingUsingDeb installing the QCG-Notification service]) 
     254 `Module[type="submission_drmaa"]/@path` :: 
     255  path to the DRMAA library (`libdrmaa.so`). If you installed the DRMAA library from the source package into the default prefix (`/usr/local/`), you do not need to change this path.
     256 `Database/Password` ::  
     257  the `qcg-comp` database password   
     258 `UseScratch` ::
     259  set this to `true` if you set `QCG_SCRATCH_DIR_ROOT` in `sysconfig`, so that every job is started from the scratch directory (instead of the default home directory)
     260 `FactoryAttributes/CommonName` ::  
     261  a common name of the cluster (e.g. reef.man.poznan.pl). You can use any name that is unique among all systems (e.g. cluster name + domain name of your institution) 
     262 `FactoryAttributes/LongDescription` ::  
     263  a human readable description of the cluster 
     264 
     265Also remember to uncomment the `jsdl_filter` and `reservation_python` modules appropriate for your batch system.
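
Simple elements such as `Transport/Module/Host` can also be changed from a script instead of an editor. The sketch below is an assumption-laden illustration: the function name `set_service_host` is hypothetical, it relies on the `<sm:Host>` element sitting on a single line as in the listing above, and the hostname must not contain `|` or `&`.

```shell
#!/bin/sh
# set_service_host FILE HOSTNAME -> rewrites the <sm:Host> element of FILE
set_service_host() {
    # Write to a temporary file first, then copy the content back so that
    # the owner and mode of the original file are preserved.
    sed "s|<sm:Host>[^<]*</sm:Host>|<sm:Host>$2</sm:Host>|" "$1" > "$1.tmp" &&
        cat "$1.tmp" > "$1" &&
        rm -f "$1.tmp"
}

# Example (run as root, after backing up the file):
#   set_service_host /etc/qcg-comp/qcg-compd.xml "$(hostname -f)"
```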
     266 
     267 
     268= Creating applications' script space = 
     269A common case for the QCG-Computing service is that an application is accessed using an abstract application name rather than an absolute executable path. The mappings from application name and version to executable path are stored in the file `/etc/qcg-comp/application_mapfile`:
     270 
     271{{{#!default 
     272cat /etc/qcg-comp/application_mapfile 
     273# ApplicationName ApplicationVersion Executable 
     274 
     275date * /bin/date 
     276LPSolve 5.5 /usr/local/bin/lp_solve 
     277}}} 
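
To illustrate how such a mapfile can be read, here is a small lookup sketch. It is not the service's own resolution logic: it assumes whitespace-separated columns, that `*` in the version column matches any version, that the first matching line wins, and that executable paths contain no spaces. The function name `lookup_app` is hypothetical.

```shell
#!/bin/sh
# lookup_app MAPFILE NAME VERSION -> prints the executable of the first
# matching entry; exit status is non-zero when nothing matches
lookup_app() {
    awk -v name="$2" -v ver="$3" '
        /^[ \t]*#/ || NF < 3 { next }   # skip comments and blank or short lines
        $1 == name && ($2 == "*" || $2 == ver) { print $3; found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Example:
#   lookup_app /etc/qcg-comp/application_mapfile LPSolve 5.5
```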
     278 
     279 
     280It is also common to list wrapper scripts here rather than the target executables. A wrapper script can handle aspects of the application lifetime such as environment initialization, copying files from/to scratch storage, and application monitoring. It is recommended to create a separate directory for these wrapper scripts (e.g. on the application partition) and to grant write permission on it to the QCG Developers group. This directory must be readable by all users and from every worker node (the application partition usually fulfills those requirements).
     281 
     282{{{ 
     283#!div style="font-size: 90%" 
     284{{{#!default 
     285mkdir /opt/exp_soft/qcg-app-scripts 
     286chown :qcg-dev /opt/exp_soft/qcg-app-scripts 
     287chmod g+rwx /opt/exp_soft/qcg-app-scripts 
     288}}} 
     289}}} 
     290 
     291More on [ApplicationScripts Application Scripts]. 
     292= Note on the security model = 
     293QCG-Computing can be configured with various authentication and authorization modules. However, in a typical deployment we assume that QCG-Computing is configured as in the above example, i.e.:
     294* authentication is based on the ''httpg'' protocol,
     295* authorization is based on the local `grid-mapfile`.
     296 
     297= Starting the service = 
     298As root type: 
     299{{{ 
     300#!div style="font-size: 90%" 
     301{{{#!sh 
     302/etc/init.d/qcg-comp start 
     303}}} 
     304}}} 
     305 
     306The service logs can be found in: 
     307{{{#!sh 
     308/var/log/qcg-comp/qcg-compd.log 
     309}}} 
     310 
     311 
     312 
     313 
     314= Stopping the service = 
     315The service can be stopped using the following command: 
     316{{{ 
     317#!div style="font-size: 90%" 
     318{{{#!sh 
     319/etc/init.d/qcg-comp stop 
     320}}} 
     321}}} 
     322 
     323= Verifying the installation = 
     324 
     325*  Edit the QCG-Computing client configuration file (`/etc/qcg-comp/qcg-comp.xml`): 
     326 *  set the host and port in `ServiceURL` to reflect the changes made in the service configuration file (`qcg-compd.xml`).
     327{{{ 
     328#!div style="font-size: 90%" 
     329{{{#!sh 
     330<?xml version="1.0" encoding="UTF-8"?> 
     331<sm:QCGCore 
     332       xmlns:sm="http://schemas.qoscosgrid.org/core/2011/04/config" 
     333       xmlns="http://schemas.qoscosgrid.org/comp/2011/04/config" 
     334       xmlns:smc="http://schemas.qoscosgrid.org/comp/2011/04/config" 
     335       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
     336   
     337       <Configuration> 
     338               <sm:ModuleManager> 
     339                       <sm:Directory>/opt/qcg/lib/qcg-core/modules/</sm:Directory> 
     340                       <sm:Directory>/opt/qcg/lib/qcg-comp/modules/</sm:Directory> 
     341               </sm:ModuleManager> 
     342  
     343               <sm:Client xsi:type="qcg-comp" description="QCG-Computing client"> 
     344                       <sm:Transport> 
     345                               <sm:Module xsi:type="sm:ecm_gsoap.client"> 
     346                                       <sm:ServiceURL>httpg://frontend.example.com:19000/</sm:ServiceURL> 
     347                                       <sm:Authentication> 
     348                                               <sm:Module xsi:type="sm:atc_transport_gsi.client"/> 
     349                                       </sm:Authentication> 
     350                                       <sm:Module xsi:type="smc:qcg-comp-client"/> 
     351                               </sm:Module> 
     352                       </sm:Transport> 
     353               </sm:Client> 
     354       </Configuration> 
     355</sm:QCGCore>
     356}}} 
     357}}} 
     358* Initialize your credentials: 
     359{{{ 
     360#!div style="font-size: 90%" 
     361{{{#!sh 
     362grid-proxy-init -rfc 
     363Your identity: /O=Grid/OU=QosCosGrid/OU=PSNC/CN=Mariusz Mamonski 
     364Enter GRID pass phrase for this identity: 
     365Creating proxy .................................................................. Done 
     366Your proxy is valid until: Wed Apr  6 05:01:02 2012 
     367}}} 
     368}}} 
     369* Query the QCG-Computing service: 
     370{{{ 
     371#!div style="font-size: 90%" 
     372{{{#!sh 
     373qcg-comp -G | xmllint --format - # xmllint is used only to pretty-print the result
     374   
     375<bes-factory:FactoryResourceAttributesDocument xmlns:bes-factory="http://schemas.ggf.org/bes/2006/08/bes-factory"> 
     376    <bes-factory:IsAcceptingNewActivities>true</bes-factory:IsAcceptingNewActivities> 
     377    <bes-factory:CommonName>IT cluster</bes-factory:CommonName> 
     378    <bes-factory:LongDescription>IT department cluster for public use</bes-factory:LongDescription>
     379    <bes-factory:TotalNumberOfActivities>0</bes-factory:TotalNumberOfActivities> 
     380    <bes-factory:TotalNumberOfContainedResources>1</bes-factory:TotalNumberOfContainedResources> 
     381    <bes-factory:ContainedResource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="bes-factory:BasicResourceAttributesDocumentType"> 
     382        <bes-factory:ResourceName>worker.example.com</bes-factory:ResourceName> 
     383        <bes-factory:CPUArchitecture> 
     384            <jsdl:CPUArchitectureName xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">x86_32</jsdl:CPUArchitectureName> 
     385        </bes-factory:CPUArchitecture> 
     386        <bes-factory:CPUCount>4</bes-factory:CPUCount><bes-factory:PhysicalMemory>1073741824</bes-factory:PhysicalMemory> 
     387    </bes-factory:ContainedResource> 
     388    <bes-factory:NamingProfile>http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing</bes-factory:NamingProfile>  
     389    <bes-factory:BESExtension>http://schemas.ogf.org/hpcp/2007/01/bp/BasicFilter</bes-factory:BESExtension>
     390    <bes-factory:BESExtension>http://schemas.qoscosgrid.org/comp/2011/04</bes-factory:BESExtension> 
     391    <bes-factory:LocalResourceManagerType>http://example.com/SunGridEngine</bes-factory:LocalResourceManagerType> 
     392    <smcf:NotificationProviderURL xmlns:smcf="http://schemas.qoscosgrid.org/comp/2011/04/factory">http://localhost:2211/</smcf:NotificationProviderURL> 
     393</bes-factory:FactoryResourceAttributesDocument> 
     394}}} 
     395}}} 
     396* Submit a sample job: 
     397{{{ 
     398#!div style="font-size: 90%" 
     399{{{#!sh 
     400qcg-comp -c -J /usr/share/doc/qcg-comp-doc/examples/date.xml 
     401Activity Id: ccb6b04a-887b-4027-633f-412375559d73 
     402}}} 
     403}}} 
     404* Query its status:
     405{{{ 
     406#!div style="font-size: 90%" 
     407{{{#!sh 
     408qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
     409status = Executing 
     410qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
     411status = Executing 
     412qcg-comp -s -a ccb6b04a-887b-4027-633f-412375559d73 
     413status = Finished 
     414exit status = 0 
     415}}} 
     416}}}
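
Instead of repeating the status query by hand, the call can be wrapped in a polling loop. The sketch below relies only on the `qcg-comp -s -a` invocation and the `status = ...` output line shown above. The function names, the `QCG_COMP` override variable and the retry defaults are hypothetical, and only `Finished` is treated as the end state, since other terminal states are not shown in this guide.

```shell
#!/bin/sh
# QCG_COMP may point to another client binary (or a stub for testing).
QCG_COMP=${QCG_COMP:-qcg-comp}

# job_status ACTIVITY_ID -> prints the value of the "status = ..." line
job_status() {
    "$QCG_COMP" -s -a "$1" | sed -n 's/^status = //p'
}

# wait_for_job ACTIVITY_ID [TRIES] [DELAY_SECONDS]
# Polls until the status is "Finished"; fails when the tries are exhausted.
wait_for_job() {
    tries=${2:-60}
    delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        if [ "$(job_status "$1")" = "Finished" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_job ccb6b04a-887b-4027-633f-412375559d73 && echo "job finished"
```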