Introduction

The Message Passing Interface (MPI) is the de facto standard for parallel applications that demand more computational resources than a single machine can provide. It gives end users both a programming interface consisting of simple communication primitives and an environment for spawning and monitoring MPI processes. A variety of implementations of the MPI standard are available, both commercial and open source. In QosCosGrid, the OpenMPI implementation of the MPI 2.0 standard was chosen as the basis for further enhancements. Of key importance were inter-cluster communication techniques that deal with firewalls and Network Address Translation (NAT). In addition, the mechanism for spawning new processes in OpenMPI needed to be integrated with the QosCosGrid-developed middleware. The extended version of the OpenMPI framework was named QCG-OMPI (where QCG stands for QosCosGrid). The extensions included the following:

  1. internally, QCG-OMPI improves the MPI library by featuring cross-cluster connectivity techniques to enable, when possible, direct connections between MPI ranks that are located in remote clusters potentially separated by firewalls;
  2. the MPI standard was extended to comply with the QosCosGrid semi-opportunistic approach by providing a new interface that describes the actual topology provided by the meta-scheduler.

QCG-OMPI is a QosCosGrid branch of OpenMPI 1.3.1. The branch differs from the original version only in the implementation of two MCA (Modular Component Architecture) transport plugins:

  • OOB (Out Of Band) TCP - a module responsible for exchanging control messages,
  • BTL (Byte Transfer Layer) TCP - a module responsible for exchanging actual MPI messages.

The adaptation relied on adding SOCKS-based traffic tunneling when communicating with an MPI process located on a different cluster.
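
The tunneling decision described above can be illustrated with a minimal sketch. It is not the QCG-OMPI code, which lives inside the OpenMPI TCP plugins; it only shows the general idea under stated assumptions. The text does not say which SOCKS version is used, so the sketch assumes SOCKS5 as defined in RFC 1928. The function names (needs_tunnel, socks5_greeting, socks5_connect_request) and the cluster labels are hypothetical.

```python
import ipaddress
import struct

# Constants from RFC 1928 (SOCKS5 is an assumption; the source only says "SOCKS").
SOCKS5_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03


def needs_tunnel(local_cluster: str, peer_cluster: str) -> bool:
    """Hypothetical policy: tunnel only when the peer is in another cluster."""
    return local_cluster != peer_cluster


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 0x00 = no authentication."""
    return bytes([SOCKS5_VERSION, 0x01, 0x00])


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build the SOCKS5 CONNECT request for host:port."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    try:
        # Dotted-quad input is sent as a 4-byte IPv4 address.
        address_field = bytes([ATYP_IPV4]) + ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # Anything else is sent as a length-prefixed domain name.
        name = host.encode("ascii")
        if not 0 < len(name) < 256:
            raise ValueError("host name must be 1 to 255 bytes")
        address_field = bytes([ATYP_DOMAIN, len(name)]) + name
    header = bytes([SOCKS5_VERSION, CMD_CONNECT, 0x00])  # 0x00 is the reserved byte
    return header + address_field + struct.pack("!H", port)
```

In this sketch a process would first call needs_tunnel and connect directly to peers in its own cluster; only for remote peers would it open a TCP connection to the proxy, send the greeting, and then send the CONNECT request naming the remote MPI process.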

Further reading