Introduction to the GT4 Activity Plugin

The purpose of the GT4 activity plugin is to enable the Generic Workflow Execution Service (GWES) to invoke remote command line programs via the Grid middleware Globus Toolkit 4.0.x (GT4) or 5.x (GT5) [1]. Using this plugin, you can model command line programs within GWorkflowDL workflows by associating transitions with the URL of the GT4 ManagedJobFactoryService (or the GT5 gatekeeperServiceContact) and with the wrapper script of the command line program.

This plugin uses the Java Commodity Grid Kit libraries [2] to invoke Globus Toolkit services without requiring a local installation of the Globus Toolkit on the GWES server. The technical report [4] (in German) describes, for resource providers, the requirements that the Grid nodes must fulfill.

Operation

Globus Toolkit 4.0.x

In order to invoke a remote command line program on a GT4 Grid node, the transition needs to contain an operation candidate of type "wsgram". The operation name is the path of the wrapper script for the command line program, optionally followed by the type of the local resource manager (e.g., Fork, PBS, LSF, ...), separated by a space. If the type of the local resource manager is omitted, the plugin uses "Fork" as the default.

The resource name is the URL of the Grid node's ManagedJobFactoryService. The part after the "@" denotes the identifier of the D-GRDL resource description [5] if you use the resource description database.
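The naming conventions above can be illustrated with a small Java sketch that splits an operation name and a resource name into their parts. It assumes that the script path contains no whitespace, that the D-GRDL identifier follows the first "@" in the operation name and the last "@" in the resource name, and that the resource manager type is whatever follows the first space. WsGramOperationName and WsGramResourceName are hypothetical illustrations, not classes of the plugin.

```java
import java.util.Optional;

/** Hypothetical view of a "wsgram" operationName: "<script>[@<dgrdl-id>][ <LRM type>]". */
final class WsGramOperationName {
    final String scriptPath;        // path of the wrapper script on the Grid node
    final Optional<String> dgrdlId; // D-GRDL software identifier after "@", if present
    final String lrmType;           // local resource manager type, "Fork" if omitted

    private WsGramOperationName(String scriptPath, Optional<String> dgrdlId, String lrmType) {
        this.scriptPath = scriptPath;
        this.dgrdlId = dgrdlId;
        this.lrmType = lrmType;
    }

    /** Assumes the script path itself contains no whitespace. */
    static WsGramOperationName parse(String operationName) {
        String[] parts = operationName.trim().split("\\s+", 2);
        String lrm = parts.length > 1 ? parts[1].trim() : "Fork";
        String target = parts[0];
        int at = target.indexOf('@');
        if (at < 0) {
            return new WsGramOperationName(target, Optional.empty(), lrm);
        }
        return new WsGramOperationName(target.substring(0, at), Optional.of(target.substring(at + 1)), lrm);
    }
}

/** Hypothetical view of a "wsgram" resourceName: "<ManagedJobFactoryService URL>[@<dgrdl-id>]". */
final class WsGramResourceName {
    final String serviceUrl;        // URL of the ManagedJobFactoryService
    final Optional<String> dgrdlId; // D-GRDL hardware identifier after the last "@", if present

    private WsGramResourceName(String serviceUrl, Optional<String> dgrdlId) {
        this.serviceUrl = serviceUrl;
        this.dgrdlId = dgrdlId;
    }

    static WsGramResourceName parse(String resourceName) {
        String s = resourceName.trim();
        int at = s.lastIndexOf('@');
        if (at < 0) {
            return new WsGramResourceName(s, Optional.empty());
        }
        return new WsGramResourceName(s.substring(0, at), Optional.of(s.substring(at + 1)));
    }
}
```

Applied to the example below, the operation name yields the script /usr/local/gwes/cat/cat.sh, the D-GRDL identifier software:cat-fhrg and the resource manager PBS.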

Example of a WS-GRAM operation without using the resource description database:

  <transition ID="cat">
    <description>concatenate two files</description>
    <inputPlace placeID="d1" edgeExpression="input1" />
    <inputPlace placeID="d2" edgeExpression="input2" />
    <outputPlace placeID="d3" edgeExpression="stdout" />
    <operation>
      <oc:operationClass xmlns:oc="http://www.gridworkflow.org/gworkflowdl/operationclass" name="urn:dgrdl:software:cat">
        <oc:operationCandidate type="wsgram"
                               operationName="/usr/local/gwes/cat/cat.sh@software:cat-fhrg PBS"
                               resourceName="https://grid2:8443/wsrf/services/ManagedJobFactoryService@hardware:grid2.net/PBS"
                               selected="true" />
      </oc:operationClass>
    </operation>
  </transition>

More examples are available in the examples section.

Globus Toolkit 5.x

The implementation for Globus Toolkit 5.x is still in a prototype testing phase. You may need to adjust the source code in order to get this plugin working for your Globus Toolkit 5.x-based Grid environment.

In order to invoke a remote command line program on a GT5 Grid node, the transition needs to contain an operation candidate of type "gram". The operation name is equal to the path of the wrapper script for the command line program.

The resource name is the Grid node's gatekeeperServiceContact. The part after the "@" denotes the identifier of the D-GRDL resource description [5] if you use the resource description database.
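A pre-WS GRAM gatekeeper contact usually has the form host[:port][/service], where port 2119 and the service "jobmanager" are the common defaults and the local resource manager is typically encoded in the service name (e.g., jobmanager-pbs). The following Java sketch splits a resource name such as grid2.net:2119/jobmanager-pbs@hardware:grid2.net/PBS along these lines. GatekeeperContact is a hypothetical class, not part of the plugin or of the Java CoG Kit, and it deliberately does not handle the optional trailing ":subject" (certificate DN) of a contact string.

```java
import java.util.Optional;

/** Hypothetical view of a GT5 resourceName: "host[:port][/service][@dgrdl-id]". */
final class GatekeeperContact {
    static final int DEFAULT_PORT = 2119;               // usual pre-WS GRAM gatekeeper port
    static final String DEFAULT_SERVICE = "jobmanager"; // usual default job manager service

    final String host;
    final int port;
    final String service;           // e.g. "jobmanager-pbs"
    final Optional<String> dgrdlId; // D-GRDL hardware identifier after the last "@", if any

    private GatekeeperContact(String host, int port, String service, Optional<String> dgrdlId) {
        this.host = host;
        this.port = port;
        this.service = service;
        this.dgrdlId = dgrdlId;
    }

    /** A ":subject" suffix after the service is not supported by this sketch. */
    static GatekeeperContact parse(String resourceName) {
        String contact = resourceName.trim();
        Optional<String> dgrdlId = Optional.empty();
        int at = contact.lastIndexOf('@');
        if (at >= 0) {
            dgrdlId = Optional.of(contact.substring(at + 1));
            contact = contact.substring(0, at);
        }
        String hostPort = contact;
        String service = DEFAULT_SERVICE;
        int slash = contact.indexOf('/');
        if (slash >= 0) {
            hostPort = contact.substring(0, slash);
            service = contact.substring(slash + 1);
        }
        String host = hostPort;
        int port = DEFAULT_PORT;
        int colon = hostPort.indexOf(':');
        if (colon >= 0) {
            host = hostPort.substring(0, colon);
            port = Integer.parseInt(hostPort.substring(colon + 1));
        }
        return new GatekeeperContact(host, port, service, dgrdlId);
    }

    /** Resource manager suffix of the service, e.g. "pbs" for "jobmanager-pbs"; empty otherwise. */
    Optional<String> lrmSuffix() {
        String prefix = DEFAULT_SERVICE + "-";
        return service.startsWith(prefix)
                ? Optional.of(service.substring(prefix.length()))
                : Optional.empty();
    }
}
```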

Example of a (PRE-WS) GRAM operation without using the resource description database:

  <transition ID="cat">
    <description>concatenate two files</description>
    <inputPlace placeID="d1" edgeExpression="input1" />
    <inputPlace placeID="d2" edgeExpression="input2" />
    <outputPlace placeID="d3" edgeExpression="stdout" />
    <operation>
      <oc:operationClass xmlns:oc="http://www.gridworkflow.org/gworkflowdl/operationclass" name="urn:dgrdl:software:cat">
        <oc:operationCandidate type="gram"
                               operationName="/usr/local/gwes/cat/cat.sh@software:cat-fhrg"
                               resourceName="grid2.net:2119/jobmanager-pbs@hardware:grid2.net/PBS"
                               selected="true" />
      </oc:operationClass>
    </operation>
  </transition>

Inputs

When the transition is enabled, the plugin automatically generates a Globus Toolkit 4 job description (or a Globus Toolkit 5 RSL) from the contents of the tokens on the corresponding read and input places. The edge expressions of the incoming edges map the tokens to the matching command line parameters. If the content of a token begins with "gsiftp", the GWES adds the corresponding file as a file stage-in to the job description.

Example:

   <place ID="d1">
     <token>
       <data>
         <file xmlns="">gsiftp://grid3//usr/local/gwes/cat/d1.dat</file>
       </data>
     </token>
   </place>
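For the GT5 case, the mapping from tokens to an RSL can be sketched in Java. This is a simplified illustration under explicit assumptions: the edge expressions are taken in command line order, a gsiftp token is staged in to a relative file named after its edge expression and that name is passed as the argument, and any other token content is passed through verbatim. The actual plugin may use a different executable (for example the executor script configured by gwes.gram.executor.script), naming scheme and set of RSL attributes; GramRslSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical RSL builder; not the plugin's actual job description generator. */
final class GramRslSketch {
    /** RSL string literal: wrap in double quotes and double any embedded quote. */
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** inputs: edge expression -> token content, in command line order (e.g. a LinkedHashMap). */
    static String build(String executable, Map<String, String> inputs) {
        List<String> args = new ArrayList<>();
        List<String> stageIn = new ArrayList<>();
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            String content = e.getValue().trim();
            if (content.startsWith("gsiftp")) {
                // assumed: stage the remote file to a local name equal to the edge expression
                stageIn.add("(" + quote(content) + " " + quote(e.getKey()) + ")");
                args.add(quote(e.getKey()));
            } else {
                args.add(quote(content)); // plain value passed through unchanged
            }
        }
        StringBuilder rsl = new StringBuilder("&(executable=").append(quote(executable)).append(')');
        if (!args.isEmpty()) {
            rsl.append("(arguments=").append(String.join(" ", args)).append(')');
        }
        if (!stageIn.isEmpty()) {
            rsl.append("(file_stage_in=").append(String.join(" ", stageIn)).append(')');
        }
        return rsl.toString();
    }
}
```

With input1 bound to the token shown above and input2 bound to a plain value, the sketch emits one RSL string containing an executable, an arguments and a file_stage_in relation.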

Outputs

The plugin automatically generates names for the output files of the remote command line program. After the command line program has been invoked successfully, the plugin puts these output file names as new tokens on the output places.
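As an illustration of this step, the following Java sketch turns a list of output edge expressions into output tokens in the same <file> format as the input token example. The naming scheme (one file per edge expression inside a per-activity working directory on the Grid node) and the class OutputTokenSketch are assumptions for illustration only; the plugin's actual file names may differ. The gsiftp URL is built from the host plus an absolute path, which produces the double slash seen in the input example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical output naming and token creation; not the plugin's actual scheme. */
final class OutputTokenSketch {
    /** "gsiftp://host/" followed by an absolute path gives "gsiftp://host//abs/path". */
    static String outputUrl(String host, String absoluteWorkDir, String edgeExpression) {
        return "gsiftp://" + host + "/" + absoluteWorkDir + "/" + edgeExpression;
    }

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One token per output edge expression, keyed by the edge expression. */
    static Map<String, String> tokens(String host, String absoluteWorkDir, List<String> outputEdges) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String edge : outputEdges) {
            String url = outputUrl(host, absoluteWorkDir, edge);
            result.put(edge, "<token><data><file xmlns=\"\">" + escapeXml(url) + "</file></data></token>");
        }
        return result;
    }
}
```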

Installation

  1. Install the Generic Workflow Execution Service (GWES). In the following, we assume that the GWES is version 2.1 or newer and has been installed in the directory $GWES_HOME (e.g., $GWES_HOME=~/local/apache-tomcat/webapps/gwes).
  2. Download and unpack
    gwes-plugin-gt4activity-2.1.1.rc2-bin.tar.gz
      or
    gwes-plugin-gt4activity-2.1.1.rc2-bin.zip

    from the GWES download section.

    Alternatively download the plugin sources

    gwes-plugin-gt4activity-2.1.1.rc2-src.tar.gz
      or
    gwes-plugin-gt4activity-2.1.1.rc2-src.zip

    from the GWES download section and build the Java library yourself using Maven 2 with the command

    mvn clean package

    Remark: You will need to exclude some of the JUnit tests in the file pom.xml if you do not have a valid Grid credential to access the test Grid nodes.

  3. Copy the Java library gwes-plugin-gt4activity-2.1.1.rc2.jar to the directory $GWES_HOME/WEB-INF/lib/ (e.g., ~/local/apache-tomcat/webapps/gwes/WEB-INF/lib/)
  4. Copy all additional Java libraries from lib to the directory $GWES_HOME/WEB-INF/lib:
    # from binary distribution
    cp gwes-plugin-gt4activity-2.1.1.rc2/lib/* $GWES_HOME/WEB-INF/lib
  5. Copy the Axis1 wsdd file gwes-client-config.wsdd:
    # from binary distribution:
    cp gwes-plugin-gt4activity-2.1.1.rc2/gwes-client-config.wsdd $GWES_HOME/WEB-INF/classes
    # OR from source distribution:
    cp gwes-plugin-gt4activity-2.1.1.rc2/src/main/config/gwes-client-config.wsdd $GWES_HOME/WEB-INF/classes
  6. Copy all scripts from bin to the directory $GWES_HOME/bin/:
    cp gwes-plugin-gt4activity-2.1.1.rc2/bin/* $GWES_HOME/bin/
  7. Configure your gwes.properties (for details, refer to the next section):
    # GT4.0.x
    gwes.activity.wsgram.class=de.fraunhofer.first.gwes.plugin.gt4activity.wsgram.WSGRAMActivity
    # GT2.x, GT5.x
    gwes.activity.gram.class=de.fraunhofer.first.gwes.plugin.gt4activity.gram.GRAMActivity
  8. Restart the GWES, e.g., by restarting the Tomcat container.
  9. Copy the script gwes-command-line-operation.sh to the directory specified by the property gwes.gram.home.directory on all Grid nodes, e.g., using scp or gsiscp:
    scp gwes-plugin-gt4activity-2.1.1.rc2/bin/gwes-command-line-operation.sh grid1.net:~/.gwes/
  10. Create an empty directory activity-directory-template as a subdirectory of the directory specified by the property gwes.gram.home.directory on all Grid nodes, e.g.:
    gsissh grid1.net
    mkdir ~/.gwes/activity-directory-template
  11. Configure the Java CoG Kit on the GWES server using
    $GWES_HOME/bin/gwes-cog-setup.sh
  12. Generate a Grid proxy credential on the GWES server (repeat this step whenever the credential expires)
    $GWES_HOME/bin/gwes-grid-proxy-init.sh

    For documentation of additional requirements on the Grid nodes, please refer to [4].
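A quick check that the server-side files from the installation steps are in place can save a debugging round trip before restarting the GWES. The following Java sketch is a hypothetical helper, not part of the GWES; it assumes plugin version 2.1.1.rc2 and only inspects $GWES_HOME on the server, not the Grid nodes (steps 9 and 10).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-installation check for the GWES server side. */
final class GwesInstallCheck {
    /** Files the installation steps place below $GWES_HOME. */
    static final List<String> REQUIRED = List.of(
            "WEB-INF/lib/gwes-plugin-gt4activity-2.1.1.rc2.jar", // step 3
            "WEB-INF/classes/gwes-client-config.wsdd",           // step 5
            "WEB-INF/classes/gwes.properties",                   // edited in step 7
            "bin/gwes-cog-setup.sh",                             // used in step 11
            "bin/gwes-grid-proxy-init.sh");                      // used in step 12

    /** Returns the required entries that are missing (or not regular files) below gwesHome. */
    static List<String> missing(String gwesHome) {
        Path home = Paths.get(gwesHome);
        List<String> absent = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!Files.isRegularFile(home.resolve(rel))) {
                absent.add(rel);
            }
        }
        return absent;
    }
}
```

For example, GwesInstallCheck.missing(System.getenv("GWES_HOME")) returns an empty list when everything is in place; the environment variable must be set for this call to work.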

Configuration

The plugin supports the following properties:

gwes.properties

Global GWES configuration file ($GWES_HOME/WEB-INF/classes/gwes.properties):

##################################
# GWES GT2/GT4/GT5 Activity Plugin
##################################

# Implementation of net.kwfgrid.gwes.Activity for operation type "wsgram" (GT4.0.x), e.g.,
# <oc:operationCandidate type="wsgram" operationName="software:cat" resourceName="hardware:server" selected="true" />
gwes.activity.wsgram.class=de.fraunhofer.first.gwes.plugin.gt4activity.wsgram.WSGRAMActivity

# Implementation of net.kwfgrid.gwes.Activity for operation type "gram" (PRE-WS, GT2 or GT5), e.g.,
# <oc:operationCandidate type="gram" operationName="software:cat" resourceName="hardware:server" selected="true" />
gwes.activity.gram.class=de.fraunhofer.first.gwes.plugin.gt4activity.gram.GRAMActivity

# local working directory home
# e.g. /tmp/.gwes or .gwes for ~/.gwes
# You can override this property for a specific resource by adding the property
# <simpleProperty ident="gwes.gram.home.directory" type="string" unit="">...</simpleProperty>
# to the corresponding D-GRDL database entry.
# REMARK: when using workflows with fileStageIn, the absolute path to the gwes working directory home must be the same
# on all grid nodes because there is a bug in Globus RFT when resolving ${GLOBUS_USER_HOME} for remote locations!
gwes.gram.home.directory=.gwes

# WS-GRAM batch mode
# true: use WS-GRAM in batch mode with status polling (firewall-friendly)
# false: use WS-GRAM in interactive mode with listener
gwes.gram.batch=true

# File name of the shell script to be used for executing WS-GRAM command line operations and creating
# the temporary working directory.
# Copy the script gwes/bin/gwes-command-line-operation.sh to all remote WS-GRAM resources.
# In addition you need to create an empty directory called "activity-directory-template" within the directory specified by
# the property "gwes.gram.home.directory".
# The path can be either relative to "gwes.gram.home.directory", or with leading "/" indicating an absolute path.
gwes.gram.executor.script=gwes-command-line-operation.sh

# Directory name of an (empty) directory which serves as template for working directories for activities.
# In case of a "file stage in" phase, the GWES makes a file stage in of this directory template to the working
# directory before invoking the activity. This is a workaround as Globus Toolkit does not directly support the
# creation of new directories within the job description.
# The path is relative to "gwes.gram.home.directory". Default value is "activity-directory-template".
gwes.gram.activity.directory.template=activity-directory-template

# Number of retries for refreshing the status of GRAM jobs after a Globus exception.
gwes.gram.status.retries=3

# Maximum number of retries for reliable file transfers: this is the number of times RFT retries a transfer that
# failed with a non-fatal error.
gwes.rft.maxattempts=3
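The precedence and path rules described in these comments can be summarized in a Java sketch: a D-GRDL simpleProperty for gwes.gram.home.directory overrides the global value, the executor script is resolved against the home directory unless it starts with "/", and the activity directory template is always relative to the home directory. GramSettings is a hypothetical class; apart from the documented template default, its fallback values simply mirror the sample file above and are assumptions.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical reader for the GRAM-related entries of gwes.properties. */
final class GramSettings {
    static final String HOME = "gwes.gram.home.directory";

    /** A resource-specific D-GRDL value wins over the global gwes.properties value. */
    static String homeDirectory(Properties global, Map<String, String> resourceProperties) {
        String override = resourceProperties.get(HOME);
        return override != null ? override : global.getProperty(HOME, ".gwes"); // fallback assumed
    }

    /** A leading "/" means absolute; anything else is relative to the home directory. */
    static String resolve(String home, String path) {
        if (path.startsWith("/")) {
            return path;
        }
        return home.endsWith("/") ? home + path : home + "/" + path;
    }

    static String executorScript(Properties global, Map<String, String> resourceProperties) {
        String script = global.getProperty("gwes.gram.executor.script", "gwes-command-line-operation.sh");
        return resolve(homeDirectory(global, resourceProperties), script);
    }

    /** Documented as relative to the home directory, with "activity-directory-template" as default. */
    static String activityDirectoryTemplate(Properties global, Map<String, String> resourceProperties) {
        String template = global.getProperty("gwes.gram.activity.directory.template", "activity-directory-template");
        String home = homeDirectory(global, resourceProperties);
        return home.endsWith("/") ? home + template : home + "/" + template;
    }

    static boolean batchMode(Properties global) {
        return Boolean.parseBoolean(global.getProperty("gwes.gram.batch", "true").trim()); // fallback assumed
    }

    static int statusRetries(Properties global) {
        return Integer.parseInt(global.getProperty("gwes.gram.status.retries", "3").trim()); // fallback assumed
    }
}
```

A relative home such as ".gwes" is left as-is here, since the comment above states that it refers to ~/.gwes on the Grid node rather than to a path on the GWES server.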
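The property gwes.gram.status.retries can be read as a bounded retry around each status refresh. The following Java sketch assumes that the value counts retries after the first attempt and that a failed refresh surfaces as a RuntimeException; StatusRefresher is hypothetical and does not use the Globus API.

```java
import java.util.function.Supplier;

/** Hypothetical bounded retry around a job status refresh. */
final class StatusRefresher {
    /** Calls refresh once, then retries up to `retries` more times while it throws. */
    static <T> T refreshWithRetries(Supplier<T> refresh, int retries) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= Math.max(0, retries); attempt++) {
            try {
                return refresh.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed; rethrow the most recent failure
    }
}
```

In batch mode (gwes.gram.batch=true), such a refresh would be invoked periodically while polling; in interactive mode, a listener receives status changes instead.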

Transition properties

There are no special workflow or transition properties supported by this plugin. You may use the generic properties defined for all GWES activities to adjust fault tolerance or timeouts for this plugin, such as:

  • activity.maxattempts
  • breakpoint
  • combine.data.groups
  • ignore.data.groups
  • priority
  • timeout
  • timeout.running
  • timeout.active

Please refer to the GWES documentation for details, especially the JavaDoc of class net.kwfgrid.gwes.Constants.

References

  • [1] Globus Toolkit web site: http://www.globus.org/toolkit/
  • [2] Java Commodity Grid Kit web site: http://wiki.cogkit.org/
  • [3] Gregor von Laszewski, Ian Foster, Jarek Gawor, Peter Lane: A Java Commodity Grid Kit. In: Concurrency and Computation: Practice and Experience, 13(8-9):643-662, 2001.
  • [4] Andreas Hoheisel: Anforderungen an Grid-Knoten für das GWES-Workflow-Management. Technical report, Fraunhofer FIRST, 2011.
  • [5] Armin Wolf: Spezifikation der D-Grid-Ressourcenbeschreibungssprache D-GRDL und ihrer Nutzung im Grid-Computing. Technical report, Fraunhofer FIRST, 2007.
  • [6] ResourceUpdater web site: http://www.gridworkflow.org/fhrg/resourceupdater/docs/
  • [7] GWES Tutorial (in German).
  • [8] Dagmar Krefting, Julian Bart, Kamen Beronov, Olga Dzhimova, Jurgen Falkner, Michael Hartung, Andreas Hoheisel, Tobias A. Knoch, Thomas Lingner, Yassene Mohammed, Kathrin Peter, Erhard Rahm, Ulrich Sax, Dietmar Sommerfeld, Thomas Steinke, Thomas Tolxdorff, Michal Vossberg, Fred Viezens, Anette Weisbecker: MediGRID: Towards a user friendly secured grid infrastructure. In: Future Generation Computer Systems, Volume 25, Issue 3, March 2009, Pages 326-336, ISSN 0167-739X, Elsevier, 2009.
  • [9] Andreas Hoheisel: Grid-Workflow-Management. In: Weisbecker, A.; Pfreundt, F.-J.; Linden, J.; Unger, S. (Eds.): Fraunhofer Enterprise Grids – Software. Fraunhofer IRB Verlag, Stuttgart, 2008. ISBN: 978-3-8167-7804-2.
  • [10] Falk Neubauer, Andreas Hoheisel, Joachim Geiler: Workflow-based Grid Applications. In: Future Generation Computer Systems, Volume 22, Issues 1-2, January 2006, Pages 6-15, ISSN 0167-739X, Elsevier, 2006.
  • [11] Hoheisel, A.: User Tools and Languages for Graph-based Grid Workflows. In: Special Issue of Concurrency and Computation: Practice and Experience, Wiley, 2006.
  • [12] Hoheisel, A., Der, U.: An XML-based Framework for Loosely Coupled Applications on Grid Environments. In: P.M.A. Sloot et al. (Eds.): ICCS 2003. LNCS 2657, 245–254, © Springer-Verlag Berlin Heidelberg, 2003.