DaCS debugging

This chapter describes the facilities the SDK provides for debugging a Hybrid DaCS application. The standard methods, GDB and printf, can still be used, but each has considerations specific to the hybrid environment. The Hybrid DaCS daemons, which manage the Hybrid DaCS processes, provide logs and methods of retaining runtime information, such as core dumps and the contents of the current working directory. Hybrid DaCS also provides three versions of the library. The base version is optimized for performance and provides limited error checking and no tracing. The trace version adds tracing support, and the debug version adds error checking (such as parameter verification on all the APIs).

printf considerations

The easiest and best-known way to debug a program is to add printf statements at strategic points. This method is useful when debugging a hybrid application, provided the developer understands the following considerations:
  • printf output from the host and accelerator applications may be interleaved, and may not appear in the exact time order in which the two (or more) running applications issued it;
  • DaCS buffers the stderr and stdout streams of the accelerator. The buffers are generally flushed when a newline character is written to the stream. Because of this buffering, flushing the stream directly in the accelerator application may have little or no effect on when the data is displayed.
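Both considerations can be eased by tagging every message with its origin and always ending it with a newline. The following is a minimal sketch of that idea; format_debug_line(), debug_log() and the role tags "HOST" and "ACCEL" are hypothetical names chosen for illustration and are not part of DaCS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Format one debug line as "[ROLE pid=N] message\n".
 * ROLE (for example "HOST" or "ACCEL") is a tag chosen by the caller.
 * Exactly one trailing newline is emitted, so a newline-flushed
 * stream is flushed at the end of every message.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_debug_line(char *buf, size_t size, const char *role,
                      long pid, const char *msg)
{
    size_t len;
    int n;

    assert(buf != NULL && role != NULL && msg != NULL);

    /* Drop newlines the caller already supplied; one is added below. */
    len = strlen(msg);
    while (len > 0 && msg[len - 1] == '\n')
        len--;

    n = snprintf(buf, size, "[%s pid=%ld] %.*s\n", role, pid, (int)len, msg);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Write a tagged line for the calling process to stderr. */
void debug_log(const char *role, const char *msg)
{
    char line[512];

    if (format_debug_line(line, sizeof line, role, (long)getpid(), msg) > 0)
        fputs(line, stderr);
}
```

A call such as debug_log("ACCEL", "entering main") makes it possible to tell host and accelerator lines apart when they are interleaved. The tag does not restore time ordering between the two applications.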

Debugging with GDB

Although a comprehensive debugger is not available, gdb and gdbserver can be used. To debug applications on the PPU, use the two debuggers provided by the SDK, ppu-gdb and ppu-gdbserver. They provide the same options and capabilities as the normal gdb programs but are targeted specifically at the PPU architecture.

To debug a hybrid PPU application you have a number of options.
  1. If the process is running, attach to the process and debug. The process id can be found by executing ps -ef on the command line of the PPU:
    ppu-gdb <program name> <process id>
  2. If the process is failing, use one of the following techniques to attach the debugger to the process before the program ends:
    1. Use the facilities provided by DaCS to start a debugging session with ppu-gdbserver. To do this, set an environment variable either before launching the host application or within the application.

      The DACS_START_PARENT environment variable allows you to change the program that is launched on the PPU. Substitution variables can be used within the command:
      %e
      the accelerator executable name, and
      %a
      arguments to be passed to the executable.
      For example:
      export DACS_START_PARENT="/usr/bin/ppu-gdbserver localhost:5678 %e %a"
      Once the application is started the accelerator application will wait for a ppu-gdb client to connect to it. (This assumes that the debugging is being performed on the PPU client, and that the client source code is available on the PPU.) For example:
      > ppu-gdb program
      (gdb) target remote localhost:5678
      (gdb) <debug as usual>

      If debugging remotely, for example from an x86 client, it will be necessary to find the proper levels of code and library that are installed on the PPU for proper debugging. It will be easier to start by debugging directly on the PPU. Refer to the gdb documentation for setting the shared library and source code paths.

    2. Add a sleep() call of long enough duration so that the debugger can be started up and attached to the process.
    3. Include a global variable and strategic while loop in the code to halt the program so that gdb can be attached, for example:

      Program:
      /* volatile: the loop must re-read the value after gdb changes it */
      volatile int gdbwait = 1;
      
      int main(int argc, char* argv[])
      {
          .
          .
          while(gdbwait);
          .
          .
      }
      Command line:
      > ppu-gdb program 23423
      (gdb) set gdbwait=0
      (gdb) c
    4. Include code that uses sigwait to wait for the user to attach. Either set the ACCEL_DEBUG_START environment variable for the host process and pass it to the child using either dacs_runtime_init() or dacs_de_start() and its envp parameter, or set the DACS_START_ENV_LIST environment variable
      DACS_START_ENV_LIST="ACCEL_DEBUG_START=Y"
      before running the host process. Once the remote process has started, it waits until you attach to it using a debugger, for example ppu-gdb -p <pid>. If ACCEL_DEBUG_START is not set, the process executes normally.

Example:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <syscall.h>
#include <unistd.h>
...
/*
 * In the case of ACCEL_DEBUG_START, actually wait until
 * the user *has* attached a debugger to this thread.
 * This is done here by doing a sigwait on the empty set,
 * which will return with EINTR after the debugger has attached.
 */
if (getenv("ACCEL_DEBUG_START")) {
    int my_pid = getpid();
    sigset_t set;

    /* Adjacent string literals are joined; a literal cannot span lines. */
    fprintf(stdout, "\nPPU64: ACCEL_DEBUG_START ... "
                    "attach debugger to pid %d\n", my_pid);
    fflush(stdout);
    sigemptyset(&set);
    /* Use syscall to avoid glibc looping on EINTR. */
    syscall(__NR_rt_sigtimedwait, &set, (void *) 0, (void *) 0,
            _NSIG / 8);
}
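
The %e and %a substitution used by DACS_START_PARENT in the first technique can be pictured with the following sketch. It is an illustration of the documented behavior only: expand_start_parent() is a hypothetical helper, not a DaCS function, and the way the daemon actually parses the variable may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a DACS_START_PARENT-style template into out:
 *   %e -> exe  (accelerator executable name)
 *   %a -> args (arguments passed to the executable)
 * Every other character is copied unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_start_parent(const char *tmpl, const char *exe,
                        const char *args, char *out, size_t size)
{
    size_t used = 0;

    assert(tmpl != NULL && exe != NULL && args != NULL && out != NULL);
    if (size == 0)
        return -1;

    for (; *tmpl != '\0'; tmpl++) {
        char single[2] = { *tmpl, '\0' };
        const char *piece = single;
        size_t len;

        if (tmpl[0] == '%' && tmpl[1] == 'e') {
            piece = exe;
            tmpl++;               /* skip the 'e' as well */
        } else if (tmpl[0] == '%' && tmpl[1] == 'a') {
            piece = args;
            tmpl++;               /* skip the 'a' as well */
        }

        len = strlen(piece);
        if (used + len + 1 > size)
            return -1;            /* no room for piece plus terminator */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the template from the example above, an executable /tmp/accel and the arguments "-v 3" (both made up for this illustration), the result is "/usr/bin/ppu-gdbserver localhost:5678 /tmp/accel -v 3", which is the command that would wait for the ppu-gdb client.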

Daemon Support

The Hybrid DaCS library has multiple daemons monitoring the running DaCS applications. The daemons log errors and informational messages to specific system logs. They can also capture core files generated on catastrophic failure and can retain the current working directory on the accelerator for later examination. These are the main features used when debugging applications, but the daemons support other configuration options that may be useful in debugging certain types of problems. These options are documented in the /etc/dacsd.conf file. The following sections discuss the main features listed above.

Logs

The Hybrid DaCS daemon logs may contain invaluable information for debugging problems. The logs are located by default in
  • /var/log/hdacsd.log on the host, and
  • /var/log/adacsd.log on the accelerator.
These locations may be overridden in the daemon configuration file located in /etc/dacsd.conf. For further details see Initializing the DaCS daemons.

The logs require root authority to view.

The daemons log in more detail when the environment variable DACS_HYBRID_DEBUG=Y is set when launching the application. The variable is also passed on to the accelerator daemon. It increases the log level in hdacsd and adacsd for the duration of the application, and also creates a DaCSd SPI log for the HE and AE applications in the /tmp directory on the host and the accelerator. The log file names are /tmp/dacsd_spi_<pid>.log, where <pid> is the process id of the host or accelerator application.
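
Because the SPI log name depends on the process id, it can be convenient for an application to print the name of its own log at startup. The sketch below builds the name from the pattern given above; spi_log_path() and print_own_spi_log_path() are hypothetical helpers, not part of the DaCS API.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build the SPI log file name /tmp/dacsd_spi_<pid>.log for a process id.
 * Returns the length of the name, or -1 if buf is too small.
 */
int spi_log_path(char *buf, size_t size, pid_t pid)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, "/tmp/dacsd_spi_%ld.log", (long)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Print where the SPI log for the calling process would be found. */
void print_own_spi_log_path(void)
{
    char path[64];

    if (spi_log_path(path, sizeof path, getpid()) > 0)
        fprintf(stderr, "SPI log (if DACS_HYBRID_DEBUG=Y): %s\n", path);
}
```

The log only exists when DACS_HYBRID_DEBUG=Y was set for the run, so the helper reports where to look rather than confirming that the file is present.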

Failures of a DaCS application often occur within the first few DaCS functions called. The logs may provide detailed information about the reason for the failure. Some typical errors are:
  • dacs_runtime_init() - failures during this call are usually related to incompatibilities between a Hybrid DaCS application and the daemons installed on the system. A message in the logs will indicate this failure:
    SocketSrv    init: version mismatch
  • dacs_reserve_children() - failures during this call can usually be traced back to errors in the /etc/dacs_topology.config file. The IP addresses and reservation visibility should be verified. For more information on the configuration file, refer to the installation guide shipped with the SDK.

    The actual number of accelerators allocated by this function may not match the number requested; in particular "zero" available accelerators may be returned with an empty DE list. This function does not return a failure if no accelerators are available. The user must check the return values of this function before proceeding.

  • dacs_de_start() - failures during this call are typically program and library path related issues.
    • verify that the program name being passed is a full path name to the executable, and that the executable exists on the target if the creation flag passed is DACS_PROC_REMOTE_FILE, or on the local host if it is DACS_PROC_LOCAL_FILE.
    • verify that the shared libraries can be found correctly on the accelerator. This may be done in several ways.
      • Use RPATH when linking the accelerator application, where the RPATH points to the exact location of the libraries on the accelerator.
      • Use LD_LIBRARY_PATH. Since a user's profile is not set up when the accelerator application launches you must specify the LD_LIBRARY_PATH in the DACS_START_ENV_LIST environment variable to correctly find all of the libraries.
      • Use ldconfig on the accelerator to cache the proper location of the shared libraries.
      • Pass all of the libraries down with the accelerator application into the same working directory using the DACS_PROC_LOCAL_FILE_LIST creation flag and a file list that contains the absolute path of the program and each library needed to run.
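
The first dacs_de_start() check above (full path name, executable present) can be made before the call. The following is a minimal sketch; check_program_path() is a hypothetical helper, not a DaCS function, and it only examines the file system of the machine it runs on, so it applies directly to the local-file case and would have to be run on the accelerator for the remote-file case.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Pre-flight check for a program name that will be handed to
 * dacs_de_start(): it must be a full path and must be executable
 * by the caller on the machine where this check runs.
 * Returns 0 if the path looks usable, otherwise -1 with errno set.
 */
int check_program_path(const char *path)
{
    assert(path != NULL);

    if (path[0] != '/') {
        errno = EINVAL;           /* relative name: not a full path */
        return -1;
    }
    /* access() sets errno (ENOENT, EACCES, ...) on failure. */
    return access(path, X_OK);
}
```

A failure here, with errno reported through perror(), is usually easier to interpret than a failed dacs_de_start(). Note that access() also succeeds for a searchable directory, so this is a quick sanity check rather than a full validation.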

Core files

The adacsd daemon has a configuration option to specify the generation of core files. The configuration file is found in /etc/dacsd.conf. The following is an excerpt of the relevant portion of this configuration file.
# Set curlimit on core dump resource limit for AE application.
# The curlimit is a soft limit and is less than the max limit,
# which is a hard limit.
# If a core dump is larger than the curlimit the dump will not occur.
# If child_rlimit_core=0, the current resource limit is NOT changed
#  for the AE child
# If child_rlimit_core=value>0 the current resource limit will be
# changed to min(value, hard_limit).
# If child_rlimit_core=-1 the resource limit will be set to the hard
# limit--which could be infinite

        child_rlimit_core=0

If this value is changed the daemon must re-read the configuration file as described below.
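
The rule stated in the configuration comments can be written out as a small function. This is a sketch of the documented rule only, not adacsd source code; core_limit_for() and print_core_limits() are hypothetical names. The second function uses the standard getrlimit() call to show the limits of the calling process, which can help confirm what an accelerator process actually received.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Soft core-file limit that results from a child_rlimit_core setting:
 *    0  -> keep the current soft limit
 *   >0  -> min(setting, hard limit)
 *   -1  -> the hard limit (which could be infinite)
 */
rlim_t core_limit_for(long long setting, rlim_t current_soft, rlim_t hard)
{
    assert(setting >= -1);

    if (setting == 0)
        return current_soft;
    if (setting == -1)
        return hard;
    if (hard == RLIM_INFINITY)
        return (rlim_t)setting;   /* no hard cap to apply */
    return ((rlim_t)setting < hard) ? (rlim_t)setting : hard;
}

/* Report the core limits of the calling process. Returns 0 on success. */
int print_core_limits(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    printf("core soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}
```

For example, with a hard limit of 4096 a setting of 8192 yields 4096, while a setting of 0 leaves whatever soft limit was already in place. If that soft limit is 0, no core file is produced.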

Saving to the CWD

The adacsd daemon configuration file also supports keeping the current working directory (CWD) after the process has executed on the accelerator. This can be specified in the /etc/dacsd.conf file. The relevant excerpt is shown below:
# Normally the AE Current Working Directory and its contents
# are deleted when the AE process terminates.
# Set ae_cwd_keep=true if you want to prevent the
# AE Current Working Directory from being deleted.

    ae_cwd_keep=false
If this value is changed the daemon must re-read the configuration file as described below.
To find where the core file is being written, issue the command:
cat /proc/sys/kernel/core_pattern
If the result is core, the core file is written to the current working directory. Because the current working directory is removed by default when the process terminates, core files are lost unless further changes are made. It is recommended that you change the pattern, as root, for example:
echo "/tmp/core-%t-%e.%p" > /proc/sys/kernel/core_pattern
which will write any core dumps into the /tmp directory with a name of core-<timestamp>-<executable>.<pid>.
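
The check described above can also be made from a program. The sketch below reads the pattern and classifies it; classify_core_pattern(), read_core_pattern() and the enum names are hypothetical. The pipe case (a pattern starting with '|') is standard Linux behavior on kernels that support it and is included only for completeness.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum core_dest { CORE_IN_CWD, CORE_ABSOLUTE_PATH, CORE_PIPED };

/*
 * Classify a core_pattern string:
 *   leading '|'  -> the dump is piped to a program
 *   leading '/'  -> the dump is written to an absolute path
 *   anything else (for example "core") -> written relative to the CWD
 */
enum core_dest classify_core_pattern(const char *pattern)
{
    assert(pattern != NULL);

    if (pattern[0] == '|')
        return CORE_PIPED;
    if (pattern[0] == '/')
        return CORE_ABSOLUTE_PATH;
    return CORE_IN_CWD;
}

/* Read the pattern from /proc. Returns 0 on success, -1 on error. */
int read_core_pattern(char *buf, size_t size)
{
    FILE *f;

    assert(buf != NULL && size > 1);
    f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

A result of CORE_IN_CWD on the accelerator means core files land in the working directory that is deleted at termination unless ae_cwd_keep=true is set.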

Making daemon configuration changes take effect

The adacsd daemon re-reads the configuration file on reboot, at which point the changes take effect. To make the changes effective immediately, send a SIGHUP signal to the adacsd daemon. For example, run these commands on the CBE platform command line:
> ps aux | grep dacsd  # find the process ID
> kill -s SIGHUP <dacsd_process_ID>
The process ID can also be read from the pidfile. The pidfile location is given by the --pidfile option on the ADACSD_ARGS line in dacsd.conf:
ADACSD_ARGS="--log /var/log/adacsd.log --pidfile /var/run/adacsd.pid"
With this setting, display the process ID with:
cat /var/run/adacsd.pid
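
The same steps can be carried out from a small C program, which may be convenient in a test harness. This is a sketch under the assumption that the pidfile holds the process ID as a decimal number on its first line; read_pidfile() and reload_daemon_config() are hypothetical helper names.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Read a process ID from a pidfile such as /var/run/adacsd.pid.
 * Returns the PID, or -1 if the file cannot be read or holds no
 * valid PID.
 */
pid_t read_pidfile(const char *path)
{
    FILE *f;
    long pid = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(f);
    return (pid_t)pid;
}

/*
 * Ask the daemon named by the pidfile to re-read its configuration
 * by sending it SIGHUP. The caller needs permission to signal the
 * daemon. Returns 0 on success, -1 on failure.
 */
int reload_daemon_config(const char *pidfile)
{
    pid_t pid = read_pidfile(pidfile);

    if (pid <= 0)
        return -1;
    return kill(pid, SIGHUP);
}
```

Calling reload_daemon_config("/var/run/adacsd.pid") corresponds to the kill -s SIGHUP command shown above.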

DaCS library versions

The optimized version of libdacs_hybrid is installed in /opt/cell/sdk/prototype/usr/lib64 and will normally be used in production. Two other libraries are also available for development purposes; each library provides a different set of functionality to help in analyzing an application. To use a library temporarily LD_LIBRARY_PATH can be set in the local environment, and also on the accelerator by using the DACS_START_ENV_LIST environment variable. An example of this is:
export LD_LIBRARY_PATH=/opt/cell/sdk/prototype/usr/lib64/dacs/debug
export DACS_START_ENV_LIST="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
Note: The other versions of the library must be installed on the accelerator, or the dacs_de_start() call will fail.

Error Checking Library

Hybrid DaCS provides an error checking library to enable additional error checking, such as validation of parameters on the DaCS APIs. The error checking library is found in directory /opt/cell/sdk/prototype/usr/lib64/dacs/debug.

It is recommended that this library be used when first developing a DaCS application. Once the application is running successfully, the developer can switch to the regular runtime library.

Trace enabled Library

Hybrid DaCS provides a tracing and debug library to track DaCS library calls. The trace library is found in directory /opt/cell/sdk/prototype/usr/lib64/dacs/trace.

Linking with this library instead of the production or debug library provides additional traces that show which calls are made, their arguments, and the return value of each call, which helps locate where a program is failing. Refer to the PDT User's Guide for additional capabilities of this library and environment.