Version 82 (modified by mmamonski, 11 years ago)

The QCG-Accounting Agent

Architecture

QCG-Accounting internal architecture

Installation

You can install the package using the QosCosGrid yum repository:

yum install qcg-accounting qcg-accounting-logrotate

Configuration

The entire configuration of the QCG-Accounting agent is stored in a single properties file (/etc/qcg/qcg-acc/config.properties). The configuration properties are listed below.

Common

  • qcg.site.name - your GOCDB site name,
  • qcg.batch.server - hostname where the batch server is running,
  • qcg.parser.plugin - the name of the log parser plugin (e.g. pbs). Delete this property if the agent has no access to LRMS logs,
  • qcg.publishers.plugins - the comma-separated list of publisher plugins (e.g. bat,apel),
  • qcg.debug - if set to true, produce more verbose messages,
  • qcg.state.dir - local state directory (default: /var/run/qcg/qcg-acc/),
  • qcg.max.delay - maximum random delay before reporting starts; the random delay was introduced to avoid all sites sending reports at the same time,
  • qcg.default.vo - the default VO name sent when no FQAN is available, e.g. when the job was submitted using a non-VOMS proxy (default: "vo.plgrid.pl"),
  • qcg.db.pass - password of the QCG-Computing database (see the <Database> section of the qcg-compd.xml file).

If your database setup is non-standard, you may also need to configure the following properties:

  • qcg.db.host - QCG-Computing database host,
  • qcg.db.port - QCG-Computing database port,
  • qcg.db.name - QCG-Computing database name,
  • qcg.db.user - QCG-Computing database user,
  • qcg.db.max.days - limit processed job records to those that started no more than N days ago (default: 90 days).

By default the QCG-Accounting agent tries to guess the local hostname automatically. If you want jobs to be reported with a different submit host, set the following property:

  • qcg.submit.host=host.second.alias
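
Before the first run it can help to confirm that the properties you rely on are actually set. The following is a minimal sketch and not part of the QCG-Accounting package: missing_props is a hypothetical helper, and the list of keys passed to it is only an example (the agent itself decides which properties are mandatory).

```shell
# Hypothetical helper (not shipped with QCG-Accounting):
# print every given property key that is NOT set in a properties file.
# Usage: missing_props FILE KEY...
missing_props() {
    file=$1
    shift
    for key in "$@"; do
        # Escape the dots so that they match literally in the regular expression.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        # A key counts as set if an uncommented line starts with "key=" or "key:".
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*[=:]" "$file"; then
            echo "$key"
        fi
    done
}

# Example call (the choice of keys is an assumption):
# missing_props /etc/qcg/qcg-acc/config.properties \
#     qcg.site.name qcg.batch.server qcg.publishers.plugins qcg.db.pass
```

An empty output means that all listed keys were found; commented-out lines are treated as not set.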

Parser plugins

SLURM plugin - slurm

No configuration needed. The plugin assumes that the sacct command is usable on the qcg machine.
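
Whether sacct is usable can be checked from the shell before enabling the plugin. This is only a sketch: require_cmd is a hypothetical helper, and the check should be run as the same user that runs the agent (which user that is depends on your cron setup).

```shell
# Hypothetical helper: succeed if the command is found in PATH, fail otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found in PATH" >&2
        return 1
    fi
}

# Example call:
# require_cmd sacct && sacct --version
```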

PBS Pro and Torque log parser - pbs

  • qcg.pbs.home - the root of the Torque/PBS Pro spool directory (e.g. /var/torque).
  • qcg.pbs.max.days - max number of days to look back into the past (default: 7 days).
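
A quick way to check that qcg.pbs.home points at the right place is to look for recent accounting logs. The sketch below assumes the usual Torque/PBS Pro layout, in which accounting logs are kept under server_priv/accounting in the spool directory; that the plugin reads exactly these files is an assumption, and recent_accounting_logs is a hypothetical helper.

```shell
# Hypothetical helper: list accounting log files modified in the last DAYS days.
# Assumes the Torque/PBS Pro layout PBS_HOME/server_priv/accounting/.
# Usage: recent_accounting_logs PBS_HOME [DAYS]
recent_accounting_logs() {
    pbs_home=$1
    days=${2:-7}    # same default as qcg.pbs.max.days
    find "$pbs_home/server_priv/accounting" -type f -mtime "-$days" | sort
}

# Example call (the path is the example value of qcg.pbs.home given above):
# recent_accounting_logs /var/torque 7
```

If nothing is listed, either the path is wrong, the agent host has no access to the LRMS logs, or no jobs were logged in that period.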

Publisher plugins

BAT publisher (PL-Grid only) - bat

First, ask the BAT administrator for all credentials (username/password and X.509 certificate) needed to connect to BAT. Copy the received keystore to /etc/qcg/qcg-acc/truststore.ts and make sure that this file is readable only by root.

  • qcg.bat.user and qcg.bat.pass - the values provided by the BAT administrator,
  • qcg.bat.keystore.pass - keystore password (provided together with the key by the BAT administrator),
  • qcg.bat.test - enables test mode, i.e. records are not sent to the BAT broker (default: false),
  • qcg.bat.grid.only - set this to true if you do not want to report LRMS-specific job information.

APEL SSM publisher - apel

  • Install APEL SSM2 from the UMD-3 repository:
    yum install apel-ssm
    
  • Make sure that /var/spool/apel/outgoing/ exists:
    mkdir -p /var/spool/apel/outgoing/
    chmod 0700 /var/spool/apel/outgoing/
    
  • In GOCDB, add a new endpoint of the gLite-APEL service type for the QCG host.

Then configure the plugin itself:

  • qcg.ssm.msg.dir - directory for outgoing usage record messages (default: /var/spool/apel/outgoing/),
  • qcg.ssm.benchmark.type - benchmark name: either Si2k or HEPSPEC,
  • qcg.ssm.benchmark.value - benchmark value (if the cluster is composed of machines of various types, provide a weighted mean here),
  • qcg.ssm.site.name - site name as reported to APEL (optional; defaults to the value of qcg.site.name).
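
For a heterogeneous cluster, the weighted mean for qcg.ssm.benchmark.value can be computed with a few lines of shell. This is a sketch under one assumption that the text above does not state: each machine type is weighted by its number of cores. weighted_benchmark is a hypothetical helper, and the numbers in the example are made up.

```shell
# Hypothetical helper: read lines of the form "CORES BENCHMARK_PER_CORE"
# from standard input and print the core-weighted mean benchmark value.
weighted_benchmark() {
    awk 'NF >= 2 && $1 !~ /^#/ { cores += $1; sum += $1 * $2 }
         END { if (cores > 0) printf "%.2f\n", sum / cores }'
}

# Example: 100 cores rated 10 and 300 cores rated 14
# give (100*10 + 300*14) / 400 = 13.00
# printf '100 10\n300 14\n' | weighted_benchmark
```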

Known issues: QCG-Accounting must be installed on a different machine than the glite-APEL and UNICORE SSM publishers; otherwise reports may get overwritten.

Grid-SAFE publisher - gridsafe

The Grid-SAFE publisher plugin was developed within the MAPPER project to simplify gathering accounting data from many infrastructures (EGI, PRACE and campus resources). Steps needed to configure the Grid-SAFE plugin:

  • you can use the host certificate and key pair to authenticate to the Grid-SAFE RUPI service, but you must first convert it to the PKCS12 format. You must also report your host DN to the Grid-SAFE administrator.
    openssl pkcs12 -export -descert -inkey /etc/grid-security/hostkey.pem -in /etc/grid-security/hostcert.pem -out /etc/qcg/qcg-acc/hostcred.p12 -name "HOST Certificate"
    chmod 0400 /etc/qcg/qcg-acc/hostcred.p12
    
  • you can use the example configuration:
    qcg.gridsafe.url=https://gridsafe-mapper.drg.lrz.de:8443/axis2/services/RUPIService
    qcg.gridsafe.keystore=/etc/qcg/qcg-acc/hostcred.p12
    qcg.gridsafe.keystore.pass=gridsafepass
    qcg.gridsafe.truststore=/etc/qcg/qcg-acc/gridsafe-truststore.jks
    qcg.gridsafe.truststore.pass=storepass
    qcg.gridsafe.truststore.type=jks
    #send usage report only about the following users
    qcg.gridsafe.filter.userdn.file=http://gridsafe-mapper.drg.lrz.de/gridsafe/mapper.users
    
  • or configure it manually:
    • qcg.gridsafe.url - URL of the Grid-SAFE RUPI WebService (e.g. https://gridsafe-mapper.drg.lrz.de:8443/axis2/services/RUPIService),
    • qcg.gridsafe.keystore - path to the keystore file for the RUPI plugin,
    • qcg.gridsafe.keystore.pass - password to access the keystore,
    • qcg.gridsafe.keystore.type - type of the keystore: pkcs12 or jks (default is pkcs12),
    • qcg.gridsafe.truststore - path to the truststore file for the RUPI plugin,
    • qcg.gridsafe.truststore.pass - password to access the truststore,
    • qcg.gridsafe.truststore.type - type of the truststore: pkcs12 or jks (default is pkcs12).

Filters

In the property names below, replace PUBLISHER with the name of the publisher plugin (e.g. qcg.gridsafe.filter.userdn.file).

  • qcg.PUBLISHER.filter.userdn - send usage records only for jobs with the given X.509 DN,
  • qcg.PUBLISHER.filter.userdn.file - send usage records only for jobs with the X.509 DNs listed in the given file (the file location can be a URL, e.g. http://gridsafe-mapper.drg.lrz.de/gridsafe/mapper.users),
  • qcg.PUBLISHER.filter.project - send usage records only for jobs with the given project ID (grant),
  • qcg.PUBLISHER.filter.project.file - send usage records only for jobs with a project ID (grant) listed in the given file.

Troubleshooting

The QCG-Accounting Agent stores all diagnostic information in the following log file: /var/log/qcg/qcg-acc/qcg-accounting.log. You can also set the qcg.debug configuration property to true to get more verbose log messages.
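
To pick problems out of a long log, the file can be filtered by level. The sketch below is an illustration only: recent_problems is a hypothetical helper, and the level names ERROR, WARN and FATAL are assumptions modelled on the "[    INFO]" prefix that the agent prints; adjust the pattern to what your log actually contains.

```shell
# Hypothetical helper: show the last 20 log lines whose level is not INFO.
# The level names are assumptions based on the "[    INFO]" line prefix.
# Usage: recent_problems [LOGFILE]
recent_problems() {
    log=${1:-/var/log/qcg/qcg-acc/qcg-accounting.log}
    grep -E '\[ *(ERROR|WARN|FATAL)\]' "$log" | tail -n 20
}

# Example call:
# recent_problems
```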

Migration from version 2.X to 3.0

  • temporarily stop cron and make sure that no QCG-Accounting process is running:
    /etc/init.d/crond stop
    ps -AF | grep QCGAcc
    
  • back up /opt/plgrid/var/run/qcg-acc/:
    cp -r /opt/plgrid/var/run/qcg-acc/ /opt/plgrid/var/run/qcg-acc.copy
    
  • update qcg-accounting:
    yum update qcg-accounting qcg-accounting-logrotate
    
  • copy configuration files and keystores to the new conf dir:
    cp /opt/plgrid/qcg/etc/qcg-acc/config.properties.rpmsave /etc/qcg/qcg-acc/config.properties
    cp /opt/plgrid/qcg/etc/qcg-acc/keystore.ks.rpmsave /etc/qcg/qcg-acc/keystore.ks
    
  • update any paths in config.properties (e.g. /opt/plgrid/qcg/etc/qcg-acc/ to /etc/qcg/qcg-acc/)
  • IMPORTANT: copy the last reported job IDs:
    cp /opt/plgrid/var/run/qcg-acc.copy/* /var/run/qcg/qcg-acc/
    
  • run the agent once and check for any errors (you may want to temporarily set qcg.max.delay to 0):
    /usr/bin/qcg-accounting
    ...
    [    INFO] - Tue May 21 00:37:29 CEST 2013: new lastReportedID: 26957. Processing took: 0 seconds.
    
  • start cron again:
    /etc/init.d/crond start
    

License

QCG-Accounting is released under the GPL license. For QosCosGrid licensing details see:  QosCosGrid license

FAQ

  • Q: I want to republish records for all jobs that started up to N days ago. What should I do?
  • A: Delete /var/run/qcg/qcg-acc/PUBLISHER-NAME.last.id (where PUBLISHER-NAME is the name of the publisher plugin) and set qcg.db.max.days to the number of days back for which you want to publish records. Also remember to adjust qcg.pbs.max.days so that it is not smaller than qcg.db.max.days.
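
The answer above can be scripted roughly as follows. This is a sketch under stated assumptions, not an official tool: set_prop and prepare_republish are hypothetical helpers, the default paths are the ones given on this page, and sed -i assumes GNU sed. The function deletes the last-reported-ID file, so run it only when you really intend to republish.

```shell
# Hypothetical helper: set KEY=VALUE in a properties file
# (replace an existing uncommented entry, otherwise append one).
set_prop() {
    file=$1
    key=$2
    value=$3
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')   # match dots literally
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Hypothetical helper implementing the FAQ answer.
# Usage: prepare_republish PUBLISHER DAYS [STATE_DIR] [CONFIG]
prepare_republish() {
    [ $# -ge 2 ] || { echo "usage: prepare_republish PUBLISHER DAYS [STATE_DIR] [CONFIG]" >&2; return 2; }
    publisher=$1
    days=$2
    state_dir=${3:-/var/run/qcg/qcg-acc}
    config=${4:-/etc/qcg/qcg-acc/config.properties}

    # Forget the last reported job ID for this publisher.
    rm -f "$state_dir/$publisher.last.id"
    # Look back the requested number of days in the QCG-Computing database.
    set_prop "$config" qcg.db.max.days "$days"
    # If you use the pbs parser plugin, also raise qcg.pbs.max.days, e.g.:
    # set_prop "$config" qcg.pbs.max.days "$days"
}

# Example call (republish the last 30 days through the apel publisher):
# prepare_republish apel 30
```

Remember to restore the previous qcg.db.max.days value once the republished records have been sent.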

Release Notes
