

This file describes the usage of the Hybrid Performance Tools.  These
tools are designed to assist in using a number of the performance tools in a
hybrid system, that is, one that uses more than one processor architecture.
In particular, the Cell Broadband Engine (Cell B.E.) is used as an
accelerator for a host system of a different architecture.

Important Note: 
--------------
Several of the tools make use of ssh to launch work on the accelerator.
The ssh support between the host and the accelerator must be set up so the
scripts can ssh from the host to the accelerator without needing to supply
a password.
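
As a rough sketch, one common way to meet this requirement is OpenSSH
key-based login.  The function names and the host name accel01 below are
hypothetical, and your site may require a different approach (for example,
a passphrase-protected key with ssh-agent):

```shell
#!/bin/sh
# Sketch only: key-based ssh login from the host to the accelerator.
# Function names are illustrative and not part of the Hybrid Tools.

# One-time setup: create a key pair if none exists, then install the
# public key on the accelerator (asks for the password one last time).
# NOTE: -N "" creates a key without a passphrase; check local policy.
setup_accel_ssh() {
    accel="$1"
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$accel"
}

# Check: BatchMode makes ssh fail instead of prompting for a password.
check_accel_ssh() {
    accel="$1"
    if ssh -o BatchMode=yes "$accel" true; then
        echo "passwordless ssh to $accel: ok"
    else
        echo "passwordless ssh to $accel: FAILED" >&2
        return 1
    fi
}
```

Typical use would be "setup_accel_ssh accel01" once, followed by
"check_accel_ssh accel01" before running any of the tools.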

List of Tools covered:

   Tool         Host Script(s)            Description
========================================================
1. cpc          cpch                      Cell Performance Counter
2. FDPRPro      fdprproh                  Post-link optimizer
3. OProfile     oprofileh, oprofilerpt    Profiler
4. PDT          traceh                    Trace facility
5. PDTR         pdtrh                     PDT trace analyzer

General Setup Information
-------------------------

NOTE:  all of these tools assume you have a running hybrid application 
which the tool can be run against.

The Hybrid Performance Tools use common setup scripts for setting up
environment information for the tools and applications.

The perfToolHostSetup script is sourced by the host portion of the tooling
scripts.  It in turn sources the perfToolUsrEnv script, which is generally
where the environment settings that a user may want or need to change are
found.  perfToolUsrEnv is installed as a read-only file.  If the defaults do
not work for a given user, they may copy this file and modify the values of
the environment variables within.  To make perfToolHostSetup find your new
perfToolUsrEnv, export PERF_TOOLS_USR_ENV set to the full path and file name
of the new file.
(example: export PERF_TOOLS_USR_ENV=/home/usr/johndoe/bin/myPerfToolUsrEnv )
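
The copy-and-override step can be sketched as follows.  The helper name
use_private_usr_env is hypothetical, and both paths are passed in by the
caller because this file does not state where the default perfToolUsrEnv
is installed:

```shell
#!/bin/sh
# Sketch: make a private, writable copy of perfToolUsrEnv and point the
# tools at it.  The helper name is illustrative only.
use_private_usr_env() {
    default_env="$1"   # installed, read-only perfToolUsrEnv
    private_env="$2"   # full path and file name of your own copy
    cp "$default_env" "$private_env" || return 1
    # The copy inherits the read-only mode, so make it editable.
    chmod u+w "$private_env" || return 1
    export PERF_TOOLS_USR_ENV="$private_env"
}
```

After calling the helper, edit the copy to change the variable values; the
export must be in effect in the shell from which the tool scripts are run.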


The current user variables are as follows:

SDK_ROOT - points to the install location of the SDK.  This variable only
needs to be assigned if the SDK is installed in a non-default location.

SDK_PROTOTYPE_ROOT - points to the base location for SDK functionality that
is prototype for this release.  The default location is: /opt/cell/sdk/prototype

PERF_DATA_ROOT - this is the base location for all the tool output.  This needs
to be the same on both the host and the accelerator.  It is expected to be quite
common that this is an NFS mounted file system.  All output directories created
by the tools will have this as a base directory.  
Default location is:  /$SCRATCH/perfData

The following four optional environment variables identify the locations of
the host and accelerator pieces of your hybrid application so that the tools
can add them to the appropriate paths.

This allows you to point to alternative versions of your executable that you
may be using to work with the tools.  For example, if you have compiled a
version of your application with trace enabled, then when running the trace
launching tool you would want the traced version of your executable placed at
the front of the various paths.

HOST_APP_PATH             - path where the host application executable is located.
HOST_APP_LD_LIBRARY_PATH  - path for host application dependent shared libraries.
ACCEL_APP_PATH            - path where the accelerator application executable is
                             located.
ACCEL_APP_LD_LIBRARY_PATH - path for accelerator application dependent shared 
                             libraries.
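
For illustration only, the four variables might be set like this for a
trace-enabled build.  Every path below is hypothetical; substitute the
locations used by your own application:

```shell
# Hypothetical layout of a trace-enabled build; all paths are assumptions.
export HOST_APP_PATH=/home/johndoe/myApp/traced/host/bin
export HOST_APP_LD_LIBRARY_PATH=/home/johndoe/myApp/traced/host/lib
export ACCEL_APP_PATH=/home/johndoe/myApp/traced/accel/bin
export ACCEL_APP_LD_LIBRARY_PATH=/home/johndoe/myApp/traced/accel/lib
```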


The perfToolAcceleratorSetup script is source'd in the Cell B.E.
(accelerator) portion of the tools.


Output directory Structure:
--------------------------

The top part of the directory structure which is common to all the tools is as
follows:

        PERF_DATA_ROOT/userid/hostname
        
        where
        userid   - is the userid of the person running the tool command
                    (for example, traceh).
        hostname - is the hostname of the machine from which the command
                    was launched.
        
        The rest of the directory structure will be unique for a given tool.
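
A minimal sketch of how the common base directory is composed, assuming the
userid and hostname correspond to the output of "id -un" and "uname -n" (the
tools themselves may derive them differently).  The function name is
hypothetical:

```shell
#!/bin/sh
# Sketch: print PERF_DATA_ROOT/userid/hostname for the current user and
# machine.  Assumes userid = `id -un` and hostname = `uname -n`.
perf_output_base() {
    : "${PERF_DATA_ROOT:?PERF_DATA_ROOT must be set}"
    echo "$PERF_DATA_ROOT/$(id -un)/$(uname -n)"
}
```

This can be handy for locating output after a run, for example
"ls $(perf_output_base)".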

Tool Description:
-----------------

cpc
---

cpc (CellPerfCounter) is a Cell B.E. tool that allows the user to monitor
events with the hardware Performance Management Unit (PMU) built into the
Cell B.E. processor during the execution of an application.  cpc runs only
against the Cell B.E. portion of a hybrid application.

For more information on which events can be monitored by cpc,
the names of the events, command line options, and other general information
refer to the documentation for the cpc tool.

The Hybrid Tools includes the scripts cpch and cpca to assist in using
cpc in a hybrid environment.  A hybrid program can be monitored using the
cpch script.  The cpca script is used internally by cpch and is not meant
to be used by a user directly.

  cpch usage:
	cpch [options] <application name> [<application parameters>]

  Parameters for cpch:

  --runid=	optional - name of the current run.  If no runid is
		provided a timestamp will be used.

  --cell-event=	required - may be specified multiple times depending on the
		capabilities of the Cell B.E.'s PMU and cpc.
		The event(s) to be monitored on the Cell B.E. part of the
		program are specified.  e.g. --cell-event=C

  --cell-options= optional - may specify any parameters for the cpc command.
		Multiple parameter values may be specified by enclosing in quotes.

  --help,-h	Prints help information for this command.

Example:
	cpch --cell-event=C --cell-options="-i 32000000" my_application -xyz




FDPRPro
-------

The Post-link Optimization for Linux on POWER is a performance-tuning utility
for reducing the execution time and the real memory utilization of user-level
application programs. The tool optimizes the executable image of a program by
collecting information on the program's behavior under a typical workload, and
creating a new version of the program that is optimized for that workload. 
The new program generated by the post-link optimizer typically runs faster and
uses less real memory than the original program.

Note: The post-link optimizer applies advanced optimization techniques to a
program. This may result in programs that do not behave as expected. 
Programs that are optimized using this tool should be used with due caution
and should be rigorously retested, at least with the same test suite used to
test the original program, to verify expected functionality. The
optimized program is not supported as input to the optimizer.

The Hybrid Tools includes the scripts fdprproh and fdprproa to assist in using
fdprpro in a hybrid environment.  A hybrid program can be analyzed and optimized
using the fdprproh script.  The fdprproa script is used internally by fdprproh
and is not meant to be used by a user directly.

NOTE:  Currently the hybrid tooling script, fdprproh, only supports
coordinating usage of fdprpro on the accelerator (CBE).  Future versions
will include support for running fdprpro on the host (starting with Opteron).

usage: fdprproh [OPTION] ... <application name> [<application arguments>]
  NOTE 1: if application arguments contain switches (ex/ -p),
         then all the arguments must be placed in double quotes

  -l, --listenv   list the environment variables for the script processing
  -h, --help
  -o, --optimization-options      "<fdprpro optimization options>"
        NOTE 2: the optimization options must be enclosed in double quotes
        Output optimized executable will be placed in the same directory as
        the input executable.

Example:

        fdprproh --optimization-options "-O3"  /myApplicationPath/myApplication appArguments

Note:  if fdprpro fails, you will find a directory called fdpr_temp in the
same directory as the source executable you are trying to optimize.  Files
there may contain additional information useful in determining the cause of
the failure.  For example, a log file is created with the output of each
phase of running the tool:  instrumentation, executing the instrumented
code, and optimizing the code.  If the run completes without any errors, the
fdpr_temp directory is removed.
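
As a small sketch of where to look after a failed run, the helpers below
derive the fdpr_temp location from the executable path and list its
contents.  The function names are hypothetical, and no particular log file
name is assumed:

```shell
#!/bin/sh
# Sketch: locate the fdpr_temp directory left behind by a failed run.
fdpr_temp_dir() {
    # fdpr_temp sits in the same directory as the executable.
    echo "$(dirname "$1")/fdpr_temp"
}

show_fdpr_temp() {
    dir=$(fdpr_temp_dir "$1")
    if [ -d "$dir" ]; then
        ls -l "$dir"
    else
        echo "no fdpr_temp directory next to $1"
    fi
}
```

For example: show_fdpr_temp /myApplicationPath/myApplication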

OProfile
--------

OProfile is an open source profiling tool with support for a number of
different hardware platforms.  It uses the Performance Management Unit (PMU)
built into most modern processors to monitor performance related events
during the execution of an application.  Because the OProfile tool accesses
hardware register(s) it requires that the userid running this tool either
have root authority or sudo authority to run OProfile.  For more information
about which events can be monitored for each type of cpu, the event names,
and other command line options refer to the OProfile documentation.
(http://oprofile.sourceforge.net/docs/)

The Hybrid Tools includes the scripts oprofileh, oprofilea, and oprofilerpt
to assist in using OProfile in a hybrid environment.  A hybrid program can
be profiled using the oprofileh script.  The oprofilerpt script can be
used at a later time to create a report from the profile data.  The
oprofilea script is used internally by oprofileh and is not meant to be
called by a user directly.
OProfile can be run on the host part of the application, the Cell B.E.
part of the application, or both.

  oprofileh usage:
	oprofileh [options] <application name> [<application parameters>]

  Parameters for oprofileh:

  --runid=	optional - name of the current run to refer to the data 
		using the oprofilerpt command at a later time.
		If no runid is provided a timestamp will be used.

  --host-vm=	optional - (--no-vmlinux,--vmlinux=,...) common OProfile
		parameters that are provided prior to starting OProfile.
		These option(s) are used on the host OProfile.  Multiple
		parameter values may be specified by enclosing in quotes.

  --cell-vm=	optional - (--no-vmlinux,--vmlinux=,...) common OProfile
		parameters that are provided prior to starting OProfile.
		These option(s) are used on the Cell B.E. OProfile.  Multiple
		parameter values may be specified by enclosing in quotes.

  --host-event= optional - may be specified multiple times depending on the
		capabilities of the host machine's PMU and OProfile.
		The event(s) to be monitored on the host part of the program
		are specified.  e.g. --host-event=CPU_CLK_UNHALTED:500000

  --cell-event=	optional - may be specified multiple times depending on the
		capabilities of the Cell B.E.'s PMU and OProfile.
		The event(s) to be monitored on the Cell B.E. part of the
		program are specified.  e.g. --cell-event=SPU_CYCLES:200000

  --host-options= optional - may specify any parameters for the OProfile
		host "opcontrol --start" command.  Multiple parameter
		values may be specified by enclosing in quotes.

  --cell-options= optional - may specify any parameters for the OProfile
		Cell B.E. "opcontrol --start" command.  Multiple parameter
		values may be specified by enclosing in quotes.

  --help,-h	Prints help information for this command.

Example:
	oprofileh --runid=Run7 --cell-vm=--no-vmlinux --cell-event=SPU_CYCLES:200000 my_application -lx

The oprofilerpt command is used after oprofileh to create the desired
report(s).  It references the data using the runid, which is a required
parameter.  Reports are stored in the Performance Tools directory structure.
The script will print the directory name(s) where the reports may be
found as it runs the report commands.

  oprofilerpt usage:
	oprofilerpt --runid=<id> [options] opreport|opannotate|opgprof

  Parameters for oprofilerpt:

  --runid=	Name of the run that was specified or created when 
		oprofileh was previously used to profile a program.

  --host	optional - create a report on the host OProfile data.

  --cell	optional - create report(s) on the Cell B.E. OProfile data.

  --host-options= optional - may specify any parameters for the host
		opreport/opannotate/opgprof command.  Multiple parameter
		values may be specified by enclosing in quotes.

  --cell-options= optional - may specify any parameters for the Cell B.E.
		opreport/opannotate/opgprof command.  Multiple parameter
		values may be specified by enclosing in quotes.

  --delete,-d	Delete the OProfile session data (saved under runid)
		after the report has been created.

  --print,-p	Print a copy of the OProfile report(s).

  --help,-h	Prints help information for this command.

Example:
	oprofilerpt --runid=Run7 --cell --cell-options=--symbols --print opreport
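
The two commands are tied together by the runid.  The sketch below wraps
the profile step and the report step in one hypothetical helper so that the
same runid is always used for both; it uses only the options shown in the
examples above:

```shell
#!/bin/sh
# Sketch: profile the Cell B.E. part of an application, then print an
# opreport for the same run.  The helper name is illustrative only.
profile_and_report() {
    runid="$1"; shift          # remaining arguments: application + args
    oprofileh --runid="$runid" --cell-vm=--no-vmlinux \
              --cell-event=SPU_CYCLES:200000 "$@" || return 1
    oprofilerpt --runid="$runid" --cell --print opreport
}
```

For example: profile_and_report Run7 my_application -lx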



trace/PDT
---------

The PDT trace facility allows users to turn on various levels of trace for several
of the IBM-supplied Cell B.E. and Opteron libraries.  It also gives users a way to
instrument their code so that they can add trace points, which will be logged in the
same trace log.

For more information on using and configuring PDT see either the Users Guide in
/usr/share/pdt/doc or the PDT chapter in the SDK Programmers Guide.

The scripts provided here in the hybrid tooling RPMs enable easy setup, coordination,
and use of the trace facility in the hybrid environment.

These scripts assume you have enabled your application for trace (see the PDT
documentation for details).  The scripts also allow for transparent switching 
of the LD_LIBRARY_PATH to include the traced versions of the IBM provided libraries.
This is useful if you are using the shared library versions of these libraries.
If you are statically linking in libraries that are trace enabled, you will need
to modify your makefiles accordingly (see the PDT documentation).

Environment variables are also provided so that the user can point to traced versions
of their libraries and have them substituted when tracing the application (see 
the user environment variable descriptions above for ACCEL_APP_PATH and
ACCEL_APP_LD_LIBRARY_PATH).

usage: traceh [OPTION] ... <application name> [<application arguments>]
  NOTE 1: if application arguments contain switches (ex/ -p or --myOpt),
        then all the arguments must be placed in double quotes

  OPTIONS:
  -h, --help
  -l, --listenv    Prints out trace environment variable information
      --runid      Specifies a prefix to be prepended to the base trace
                    directory associated with the trace of this application.
                    Default is a date/time based directory name.
                    Note that this is used to coordinate PDTR analysis.

        Example:  
        traceh --runid myFastRun2 myHybridApp argument

setup:
     Note that PDT exists on both the host and the accelerator.  It supports the
ability to trace shipped libraries that have been enabled for trace.  It also
supports the ability for the user application to add trace points to its code
(see the PDT users guide for details).
     Both PDT on the host as well as PDT on the accelerator require a configuration
file to tell them which trace functions you want turned on for the run of your
application.  Use the following two export statements from the host environment
to point to the two appropriate configuration files:

     # Host PDT configuration file example
     export PDT_CONFIG_FILE=/usr/share/pdt/example/pdt_dacs_config_hybrid.xml

     # Accelerator PDT configuration file example
     export DACS_START_ENV_LIST="PDT_CONFIG_FILE=/usr/share/pdt/example/pdt_dacs_config_cell.xml"

Note that the example listed enables default tracing of all the DaCS code.

Output:

All trace output is rooted off of PERF_DATA_ROOT as defined above. 

     If the --runid is used the output will be put in the directory:

          <$PERF_DATA_ROOT>/trace/<runid>

     If the --runid is NOT used the output will be put in the directory:

          <$PERF_DATA_ROOT>/trace/<YYYYmmddHHMMSS>  
          (i.e. the year, month, day, hour, minute, and second at which the
           traceh script was launched)

     Individual files generated by PDT will also have the source hostname
     prepended to their names so that the point of origin can be determined.
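
The directory naming rule above can be sketched as follows.  The function
name is hypothetical; with no argument it only illustrates the timestamp
format, since the real name is fixed at the moment traceh is launched:

```shell
#!/bin/sh
# Sketch: compose the trace output directory as described above.
trace_output_dir() {
    : "${PERF_DATA_ROOT:?PERF_DATA_ROOT must be set}"
    # Use the given runid, else a YYYYmmddHHMMSS timestamp.
    runid="${1:-$(date +%Y%m%d%H%M%S)}"
    echo "$PERF_DATA_ROOT/trace/$runid"
}
```

For example: ls "$(trace_output_dir myFastRun2)"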

PDTR
----

PDTR is a command line tool that provides both viewing and post-processing of
PDT traces on the target machine.  Use of PDTR requires trace output files
from PDT.  Once the application has run and the trace output files have been
written, PDTR can be used to show the trace output and analysis.

For more information on using pdtr see the pdtr manpage or the PDT section of
the SDK Programmers Guide.

usage: pdtrh --runid=ID [options]
     --runid=ID       Required.  Must either match the runID given to traceh
                       when generating the trace, or match the default date-timestamp.
  options:
  -h,--help           print this help
  -l,--listenv        Prints out trace environment variable information 
  -o, --pdtr-options "<pdtr command line options>" 
        NOTE: if more than one option is supplied it must
        be enclosed in double quotes

Output files will be placed in the same directory as the source trace files and will
have a .pep extension.
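
Because pdtrh locates the trace by runid, it can help to pass one explicit
runid to both steps.  A minimal sketch using only the options from the
usage text above (the helper name is hypothetical):

```shell
#!/bin/sh
# Sketch: trace an application, then run the PDTR analysis on that trace.
# If the application arguments contain switches, pass them as one
# double-quoted argument, as traceh requires.
trace_and_analyze() {
    runid="$1"; shift          # remaining arguments: application + args
    traceh --runid "$runid" "$@" || return 1
    pdtrh --runid="$runid"
}
```

For example: trace_and_analyze myFastRun2 myHybridApp argument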


