This chapter explains some of the alternatives provided within the SDK to debug a Hybrid DaCS application. The standard methods of using GDB or printf can still be used, but these have some unique considerations. The Hybrid DaCS daemons, which manage the Hybrid DaCS processes, provide logs and methods of retaining runtime information, such as core dumps and the contents of the current working directory. Hybrid DaCS also provides three different versions of the library with different levels of error checking. The base version is optimized for performance and provides limited error checking and no tracing. The trace version provides tracing support and the debug version provides error checking (such as parameter verification on all the APIs).
Even though a comprehensive debugger is not available, gdb and gdbserver may be used. However, for debugging applications on the PPU the two debuggers provided by the SDK, ppu-gdb and ppu-gdbserver, should be used. These debuggers provide the same options and capabilities as the normal gdb programs but are specifically targeted for the PPU architecture.
ppu-gdb <program name> <process id>
Use the facilities provided by DaCS to start a debugging session with ppu-gdbserver. In order to do this an environment variable needs to be set either prior to launching the host application or within the application.
export DACS_START_PARENT="/usr/bin/ppu-gdbserver localhost:5678 %e %a"
> ppu-gdb program
(gdb) target remote localhost:5678
(gdb) <debug as usual>
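When several debugging sessions share a machine, the port in DACS_START_PARENT usually has to vary. The following is a minimal sketch, assuming a POSIX shell. The function name gdbserver_start_parent is hypothetical and not part of the SDK, and the %e and %a placeholders are copied verbatim from the setting shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): build a DACS_START_PARENT value
# that launches ppu-gdbserver on a chosen port.
#   $1 = TCP port (defaults to 5678, the port used in the text)
gdbserver_start_parent() {
    port=${1:-5678}
    # %%e and %%a emit the literal %e and %a placeholders from the text.
    printf '/usr/bin/ppu-gdbserver localhost:%s %%e %%a\n' "$port"
}
```

A possible use is export DACS_START_PARENT="$(gdbserver_start_parent 6000)" before launching the host application, followed by target remote localhost:6000 in ppu-gdb.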
If debugging remotely, for example from an x86 client, it will be necessary to find the proper levels of code and library that are installed on the PPU for proper debugging. It will be easier to start by debugging directly on the PPU. Refer to the gdb documentation for setting the shared library and source code paths.
Include a global variable and strategic while loop in the code to halt the program so that gdb can be attached, for example:
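volatile int gdbwait = 1;   /* volatile so the compiler cannot optimize the loop away */

int main(int argc, char* argv[])
{
    .
    .
    while (gdbwait);        /* spin until a debugger sets gdbwait to 0 */
    .
    .
}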
> ppu-gdb program 23423
(gdb) set gdbwait=0
(gdb) c
Set DACS_START_ENV_LIST="ACCEL_DEBUG_START=Y" before running the host process. Once the remote process has started, it waits until you attach to it using a debugger, for example ppu-gdb -p <pid>. If ACCEL_DEBUG_START is not set, the process executes normally.
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <syscall.h>
#include <unistd.h>
...
/* In the case of ACCEL_DEBUG_START, actually wait until the user *has*
   attached a debugger to this thread. This is done here by doing a
   sigwait on the empty set, which will return with EINTR after the
   debugger has attached. */
if (getenv("ACCEL_DEBUG_START")) {
    int my_pid = getpid();
    fprintf(stdout, "\nPPU64: ACCEL_DEBUG_START ... attach debugger to pid %d\n", my_pid);
    fflush(stdout);
    sigset_t set;
    sigemptyset(&set);
    /* Use syscall to avoid glibc looping on EINTR. */
    syscall(__NR_rt_sigtimedwait, &set, (void *) 0, (void *) 0, _NSIG / 8);
}
The Hybrid DaCS library has multiple daemons monitoring the running DaCS applications. The daemons log errors and informational messages to specific system logs. The daemons can capture core files generated on catastrophic failure and can retain the current working directory on the accelerator for later examination. These are the main features used when debugging applications, but the daemons support other configuration options that may be useful in debugging certain types of problems. These options are documented in the /etc/dacsd.conf file. The following sections discuss the main features listed above.
Logs
The logs require root authority to view.
The daemons support more detailed logging when the environment variable DACS_HYBRID_DEBUG=Y is set at application launch. This variable is also passed on to the accelerator daemon. The DACS_HYBRID_DEBUG environment variable increases the log level in hdacsd and adacsd for the duration of the application, and also creates a DaCSd SPI log for the HE and AE applications in the /tmp directory on the host and the accelerator. The log file names are /tmp/dacsd_spi_<pid>.log, where <pid> is the process id of the host or accelerator application.
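The log naming convention can be wrapped in a small helper so the right file is easy to find after a run. This is a minimal sketch, assuming a POSIX shell. The function names spi_log_path and show_spi_log are hypothetical and not part of the SDK. Only the /tmp/dacsd_spi_<pid>.log pattern comes from the text.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the SDK) built around the
# /tmp/dacsd_spi_<pid>.log naming convention described in the text.

# Print the SPI log path for a given process ID.
spi_log_path() {
    printf '/tmp/dacsd_spi_%s.log\n' "$1"
}

# Print the last lines of the SPI log for a PID, or report that it is missing.
show_spi_log() {
    log=$(spi_log_path "$1")
    if [ -r "$log" ]; then
        tail -n 20 "$log"
    else
        echo "no readable SPI log at $log" >&2
        return 1
    fi
}
```

After launching the application with DACS_HYBRID_DEBUG=Y, calling show_spi_log with the process id of the host or accelerator application prints the tail of the matching log on that machine.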
SocketSrv init: version mismatch
dacs_reserve_children() - failures during this call can usually be traced back to errors in the /etc/dacs_topology.config file. The IP addresses and reservation visibility should be verified. For more information on the configuration file, refer to the installation guide shipped with the SDK.
The actual number of accelerators allocated by this function may not match the number requested; in particular "zero" available accelerators may be returned with an empty DE list. This function does not return a failure if no accelerators are available. The user must check the return values of this function before proceeding.
Core files
# Set curlimit on core dump resource limit for AE application.
# The curlimit is a soft limit and is less than the max limit,
# which is a hard limit.
# If a core dump is larger than the curlimit the dump will not occur.
# If child_rlimit_core=0, the current resource limit is NOT changed
# for the AE child
# If child_rlimit_core=value>0 the current resource limit will be
# changed to min(value, hard_limit).
# If child_rlimit_core=-1 the resource limit will be set to the hard
# limit--which could be infinite
child_rlimit_core=0
If this value is changed the daemon must re-read the configuration file as described below.
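The three cases for child_rlimit_core can be checked with a small worked example. The sketch below, assuming a POSIX shell, mirrors the rules stated in the configuration comments. The function name effective_core_limit is hypothetical, and the function only models the documented logic; it does not read or change any real limit. It assumes the value is 0, -1, or a positive integer.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): compute the soft core limit an
# AE child would get for a given child_rlimit_core value, following the
# rules in the dacsd.conf comments.
#   $1 = child_rlimit_core value (0, -1, or a positive integer)
#   $2 = current soft limit ("unlimited" or a number)
#   $3 = hard limit ("unlimited" or a number)
effective_core_limit() {
    value=$1 soft=$2 hard=$3
    case $value in
        0)  echo "$soft" ;;    # current limit is left unchanged
        -1) echo "$hard" ;;    # raised to the hard limit
        *)  # min(value, hard_limit); an unlimited hard limit never caps it
            if [ "$hard" = "unlimited" ] || [ "$value" -le "$hard" ]; then
                echo "$value"
            else
                echo "$hard"
            fi ;;
    esac
}
```

For instance, effective_core_limit 5000 0 1000 prints 1000, because the requested value is capped at the hard limit.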
Saving to the CWD
# Normally the AE Current Working Directory and its contents
# are deleted when the AE process terminates.
# Set ae_cwd_keep=true if you want to prevent the
# AE Current Working Directory from being deleted.
ae_cwd_keep=false

If this value is changed the daemon must re-read the configuration file as described below.
cat /proc/sys/kernel/core_pattern

If the result is core, the core file is written to the current working directory. Because the current working directory is removed on termination by default, core files will be lost unless this is changed. It is recommended to change the pattern with:
echo "/tmp/core-%t-%e.%p" > /proc/sys/kernel/core_pattern

This writes any core dumps into the /tmp directory with a name of core-<timestamp>-<executable>.<pid>.
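Whether a core file survives the cleanup of the working directory depends on the form of the pattern. The sketch below, assuming a POSIX shell on Linux, classifies a core_pattern value. The function names classify_core_pattern and check_core_pattern are hypothetical. The pipe case relies on the standard Linux convention that a pattern starting with | hands the dump to a helper program.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): classify a core_pattern value.
#   absolute - dumps go to a fixed directory and survive CWD cleanup
#   pipe     - dumps are piped to a helper program (leading '|')
#   relative - dumps land in the process's current working directory
classify_core_pattern() {
    case $1 in
        /*)   echo absolute ;;
        '|'*) echo pipe ;;
        *)    echo relative ;;
    esac
}

# Classify the pattern currently configured on this machine, if readable.
check_core_pattern() {
    if [ -r /proc/sys/kernel/core_pattern ]; then
        classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
    else
        echo "core_pattern not readable" >&2
        return 1
    fi
}
```

A result of relative on the accelerator means core files will be deleted together with the working directory unless ae_cwd_keep is enabled or the pattern is changed.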
Making daemon configuration changes take effect
> ps aux | grep dacsd    # find the process ID
> kill -s SIGHUP <dacsd_process_ID>
ADACSD_ARGS="--log /var/log/adacsd.log --pidfile /var/run/adacsd.pid"

and

cat /var/run/adacsd.pid
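Reading the process id from the pidfile and sending the signal can be combined into one step. This is a minimal sketch, assuming a POSIX shell and sufficient privileges to signal the daemon. The function name reload_from_pidfile is hypothetical, and the default path is the adacsd pidfile shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): send SIGHUP to the daemon whose
# PID is stored in a pidfile, so that it re-reads its configuration.
#   $1 = pidfile path (defaults to the adacsd pidfile named in the text)
reload_from_pidfile() {
    pidfile=${1:-/var/run/adacsd.pid}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # Refuse anything that is not a plain decimal number.
    case $pid in
        ''|*[!0-9]*)
            echo "invalid PID in $pidfile" >&2
            return 1 ;;
    esac
    # HUP is the portable spelling of SIGHUP for kill -s.
    kill -s HUP "$pid"
}
```

Validating the file contents before calling kill avoids signalling an unintended process when the pidfile is empty or corrupted.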
export LD_LIBRARY_PATH=/opt/cell/sdk/prototype/usr/lib64/dacs/debug
export DACS_START_ENV="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
Error Checking Library
Hybrid DaCS provides an error checking library to enable additional error checking, such as validation of parameters on the DaCS APIs. The error checking library is found in directory /opt/cell/sdk/prototype/usr/lib64/dacs/debug.
It is recommended that this library be used when first developing a DaCS application. Once the application runs successfully, the developer can switch to the regular runtime library.
Trace enabled Library
Hybrid DaCS provides a tracing and debug library to track DaCS library calls. The trace library is found in directory /opt/cell/sdk/prototype/usr/lib64/dacs/trace.
Linking with this library instead of the production or debug library will provide additional traces that can be used to debug where a program is failing by seeing what calls are made, their arguments, and the return value associated with the call. Refer to the PDT users guide for additional capabilities of this library and environment.
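Switching between the error checking and trace libraries comes down to pointing LD_LIBRARY_PATH at a different directory. The sketch below, assuming a POSIX shell, wraps that choice. The function names dacs_lib_dir and use_dacs_lib are hypothetical. Only the debug and trace directories named in the text are handled, and DACS_START_ENV is assumed to forward the setting to the accelerator process, as the export lines earlier in this chapter suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print the library directory for
# a Hybrid DaCS library variant. Only the two directories named in the text
# are known; any other variant is rejected rather than guessed.
dacs_lib_dir() {
    case $1 in
        debug|trace) echo "/opt/cell/sdk/prototype/usr/lib64/dacs/$1" ;;
        *) echo "unknown DaCS library variant: $1" >&2; return 1 ;;
    esac
}

# Export the variables for the chosen variant. Like the export lines in the
# text, this replaces LD_LIBRARY_PATH rather than appending to it.
use_dacs_lib() {
    dir=$(dacs_lib_dir "$1") || return 1
    LD_LIBRARY_PATH=$dir
    DACS_START_ENV="LD_LIBRARY_PATH=$dir"
    export LD_LIBRARY_PATH DACS_START_ENV
}
```

Running use_dacs_lib trace in the shell that launches the host application selects the trace library for that session. Unsetting both variables returns to the default library.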