Version 13 (modified by bartek, 12 years ago)

Benchmarks of QosCosGrid

QCG-Computing

The QCG-Computing tests covered job submission and job management, the tasks typical of this kind of service. Two types of tests were run, using the following metrics:

  • response time,
  • throughput.

All tests were performed with a purpose-written program based on the SAGA C++ library. Two adaptors offered by SAGA C++ were used:

  • gLite CREAM (based on glite-ce-cream-client-api-c) - gLite (CREAM-CE service),
  • OGSA BES (based on gSOAP) - UNICORE and QosCosGrid (QCG-Computing service).

Using a common access layer minimized the risk of obtaining incorrect results. For the same reason, all jobs were submitted to the same resource and did not require any data transfer.

Testbed

  • Client machine:
    • 8 cores (Intel(R) Xeon(R) CPU E5345),
    • 11 GB RAM,
    • Scientific Linux 5.3,
    • RTT from the client's machine to the cluster's frontend: about 12 ms.
  • Cluster Zeus (84th place on the TOP500 list):
    • queueing system: Torque 2.4.12 + Maui 3.3,
    • about 800 nodes,
    • about 3-4k tasks present in the system,
    • Maui "RMPOLLINTERVAL": 3.5 minutes,
    • for the purpose of the tests, a special partition (WP4) was set aside: 64 cores / 8 nodes - 64 slots,
    • test users (plgtestm01-10 and 10 users from the plgridXXX pool) were assigned on an exclusive basis to the WP4 partition.
  • Service nodes (qcg.grid.cyf-kr.edu.pl, cream.grid.cyf-kr.edu.pl, uni-ce.grid.cyf-kr.edu.pl):
    • virtual machines (Scientific Linux 5.5),
    • QCG and UNICORE: 1 virtual core, 2 GB RAM,
    • gLite CREAM: 3 virtual cores, 8 GB RAM.

Test 1 - Response Time

The main program creates N processes (each process can use a different certificate), each of which invokes the function 'sustain_thread'. It then waits for all running processes to finish.
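The fan-out described above can be sketched as follows. This is only an illustration in Python, not the original SAGA C++ program: the body of sustain_thread is a placeholder, and the names run_clients, the certificate file names and the results queue are assumptions made for this example.

```python
import multiprocessing as mp
import os


def sustain_thread(worker_id, certificate, results):
    """Placeholder for the real per-process routine; it only reports back."""
    results.put((worker_id, certificate, os.getpid()))


def run_clients(certificates):
    """Start one process per certificate and wait until all of them finish."""
    # The "fork" start method is assumed (POSIX only), so the sketch also
    # works when pasted into an interactive session.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    procs = [
        ctx.Process(target=sustain_thread, args=(i, cert, results))
        for i, cert in enumerate(certificates)
    ]
    for p in procs:
        p.start()
    # Collect one report per process before joining, so that no child
    # blocks while writing to the queue.
    reports = [results.get(timeout=30) for _ in procs]
    for p in procs:
        p.join()
    return sorted(reports)


if __name__ == "__main__":
    # Hypothetical certificate names, one per client process.
    for worker_id, cert, pid in run_clients(["user01.pem", "user02.pem"]):
        print(worker_id, cert, pid)
```

Separate processes (rather than threads) are used here because the text says each client may use a different certificate; whether the original program needed this isolation for the same reason is not stated in the source.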

The idea of the program is to keep jobs_per_thread jobs in the system for test_duration seconds while continuously querying the status of all currently running or queued jobs (the delays between calls are drawn from a defined interval).
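A minimal sketch of such a loop is given below, assuming an in-memory stand-in for the job service. FakeService, its submit and is_done methods, the delay_range parameter and the way response times are logged are all hypothetical and chosen for illustration; only jobs_per_thread and test_duration come from the description above.

```python
import random
import time


class FakeService:
    """Hypothetical in-memory stand-in for a job service (not a real API)."""

    def __init__(self, job_lifetime=0.05):
        self.job_lifetime = job_lifetime
        self._jobs = {}  # job id -> submission time
        self._next_id = 0

    def submit(self):
        self._next_id += 1
        self._jobs[self._next_id] = time.monotonic()
        return self._next_id

    def is_done(self, job_id):
        return time.monotonic() - self._jobs[job_id] >= self.job_lifetime


def sustain(service, jobs_per_thread, test_duration, delay_range=(0.001, 0.005)):
    """Keep jobs_per_thread jobs active for test_duration seconds.

    Returns the measured durations of all submit calls and all status calls.
    """
    submit_times, query_times = [], []

    def timed(call, *args, log):
        # Run one service call and record how long it took.
        start = time.perf_counter()
        result = call(*args)
        log.append(time.perf_counter() - start)
        return result

    active = set()
    deadline = time.monotonic() + test_duration
    while time.monotonic() < deadline:
        # Top up to the target number of jobs.
        while len(active) < jobs_per_thread:
            active.add(timed(service.submit, log=submit_times))
        # Query every active job, pausing for a random delay between calls.
        for job_id in list(active):
            if timed(service.is_done, job_id, log=query_times):
                active.discard(job_id)
            time.sleep(random.uniform(*delay_range))
    return submit_times, query_times


if __name__ == "__main__":
    submits, queries = sustain(FakeService(), jobs_per_thread=3, test_duration=0.3)
    print("submit calls:", len(submits))
    print("mean query time [s]:", sum(queries) / len(queries))
```

In a real run the stand-in would be replaced by calls through the common access layer, and the recorded durations would be the response times reported for the test.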

Test 2 - Throughput