Tracing, Debugging and Profiling Mechanisms and Architecture on Many-Core Systems

This project will examine the common instrumentation and data collection needs of different tracing, debugging and profiling tools. In addition, new many-core architectures will be studied in order to propose efficient mechanisms and interactive tools to monitor these systems. This project has five research axes:

  • Research on the tracing-related architecture and performance of various heterogeneous many-core processors, such as Adapteva’s Parallella board with its 64-core Epiphany chip, and Kalray’s 256-core MPPA MANYCORE processors.

  • Examine the new Intel Processor Trace (PT) extensions, with a focus on the bare-metal platforms with thousands of processors used in telecom equipment for tasks such as packet and baseband processing. These custom heterogeneous many-core processors are crucial for achieving the required levels of performance, reliability and power efficiency. Their highly parallel architecture resembles that of GPGPUs, but they are optimized for networking rather than graphics workloads.

  • Examine the architectural peculiarities of the many-core systems studied, and propose a tool architecture suited to these large-scale parallel systems. Efficient algorithms are required both for controlling and for interacting with the instrumentation and tracing hardware of these heterogeneous many-core systems. Prototyping with CoreSet commands in GDB suggests that the whole interaction with systems containing 256 cores, and soon many more, needs to be reinvented. Similarly, improved algorithms are required for efficient, low-disturbance data collection. On many-core systems, the data communication infrastructure rapidly becomes a bottleneck; when instrumentation traffic is superimposed on the same communication channels as the application traffic, keeping the disturbance low becomes very difficult.

  • Research on the tool architecture from a different point of view: sharing data collection mechanisms across the different levels (kernel, user space, bare metal, Java Virtual Machine and Python runtime) and among monitoring applications such as tracers (LTTng, Ftrace, Perf), debuggers (GDB), profilers (Perf, OProfile) and specialized tools (Address / Thread / Heap Sanitizer). Existing mechanisms for static and dynamic instrumentation will be revisited with special emphasis on scalability. At each instrumentation point, efficient handlers must be available for verifying conditions, aggregating values and collecting data, as well as for more specialized tasks such as extracting a stack dump or scanning roots (global and local variables) to verify the reachability of memory objects. A significant challenge is to propose new low-overhead, thread-safe algorithms to perform these various tasks. Different strategies for installing efficient handlers will be explored: handlers can be provided as bytecode with just-in-time compilation, or as pre-compiled native code.

  • Setup and support of these specialized hardware and software platforms, testing of the proposed algorithms on some of the partners’ bare-metal platforms, and integration of the proposed mechanisms into GDB and LTTng. [With the help of Ericsson and EfficiOS]
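The low-disturbance data collection requirement raised in the third axis can be illustrated with a per-core buffer whose producer never waits on the shared interconnect. The Python sketch below is a simplified, single-threaded model, not a real tracer: the class `PerCoreRingBuffer`, the function `drain_round` and its per-round `budget` (a stand-in for whatever bandwidth the interconnect can spare for tracing) are hypothetical names. When a buffer is full, the traced core either discards the new event or overwrites the oldest one, and counts the loss; this mirrors the discard and overwrite (flight-recorder) channel modes that LTTng offers, although LTTng's actual lock-free buffers are far more involved.

```python
from collections import deque


class PerCoreRingBuffer:
    """Hypothetical fixed-capacity trace buffer owned by a single core.

    The producer (the traced core) never blocks. When the buffer is full it
    either drops the new event (discard mode) or replaces the oldest one
    (overwrite mode), and increments a lost-event counter in both cases.
    """

    def __init__(self, capacity, overwrite=False):
        self.capacity = capacity
        self.overwrite = overwrite
        self.events = deque()
        self.lost = 0

    def record(self, event):
        if len(self.events) < self.capacity:
            self.events.append(event)
            return
        self.lost += 1
        if self.overwrite:
            self.events.popleft()  # sacrifice the oldest event
            self.events.append(event)
        # discard mode: the new event is simply dropped

    def drain(self, max_events):
        """Consumer side: remove up to max_events, oldest first."""
        out = []
        while self.events and len(out) < max_events:
            out.append(self.events.popleft())
        return out


def drain_round(buffers, budget):
    """One collection round: take at most `budget` events from each core,
    so tracing traffic per round is bounded by budget * len(buffers)."""
    batch = []
    for core_id, buf in enumerate(buffers):
        for event in buf.drain(budget):
            batch.append((core_id, event))
    return batch
```

For example, recording events 0 to 5 into a capacity-4 buffer keeps events 0 to 3 in discard mode and events 2 to 5 in overwrite mode, with `lost == 2` in both cases. Bounding each drain round keeps instrumentation traffic on the interconnect predictable; the price is lost events, which are counted so that analysis tools can flag gaps in the trace.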
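For the instrumentation-point handlers described in the fourth axis, one option is to ship handlers as bytecode that the runtime interprets or JIT-compiles. The sketch below covers only the interpretation side, using a tiny, hypothetical stack machine: the opcodes `LOAD_ARG`, `PUSH_CONST`, `GT`, `JUMP_IF_FALSE`, `AGGREGATE`, `RETURN` and the functions `validate` and `run_handler` are invented for illustration and are not taken from GDB, LTTng or any other tool. The example handler checks a condition on a probe argument, aggregates a value, and tells the caller whether to record the event. `validate` rejects unknown opcodes and allows only forward jumps, so every accepted handler terminates. The `aggregates` dictionary is a single-threaded stand-in; a thread-safe design would more likely keep per-CPU or per-thread aggregation maps and merge them at read time.

```python
# Hypothetical opcodes for an illustrative handler stack machine.
LOAD_ARG, PUSH_CONST, GT, JUMP_IF_FALSE, AGGREGATE, RETURN = range(6)
KNOWN_OPS = (LOAD_ARG, PUSH_CONST, GT, JUMP_IF_FALSE, AGGREGATE, RETURN)


def validate(code):
    """Reject unknown opcodes and non-forward jumps before installation.

    With forward-only jumps the program counter strictly increases on every
    instruction, so any accepted handler is guaranteed to terminate.
    """
    for index, (op, operand) in enumerate(code):
        if op not in KNOWN_OPS:
            raise ValueError(f"unknown opcode {op} at {index}")
        if op == JUMP_IF_FALSE and not (index < operand <= len(code)):
            raise ValueError(f"bad jump target {operand} at {index}")


def run_handler(code, args, aggregates):
    """Interpret a validated handler at an instrumentation point.

    args: values captured at the probe site (e.g. function arguments).
    aggregates: dict updated in place by AGGREGATE.
    Returns the value passed to RETURN, or None if execution falls off the end.
    """
    stack = []
    pc = 0
    while pc < len(code):
        op, operand = code[pc]
        pc += 1
        if op == LOAD_ARG:
            stack.append(args[operand])
        elif op == PUSH_CONST:
            stack.append(operand)
        elif op == GT:
            right = stack.pop()
            left = stack.pop()
            stack.append(left > right)
        elif op == JUMP_IF_FALSE:
            if not stack.pop():
                pc = operand
        elif op == AGGREGATE:
            aggregates[operand] = aggregates.get(operand, 0) + stack.pop()
        elif op == RETURN:
            return stack.pop()
    return None


# Hypothetical handler: if arg 0 exceeds 1024, add it to "large_bytes"
# and ask for the event to be recorded; otherwise skip the event.
LARGE_WRITE_HANDLER = [
    (LOAD_ARG, 0),
    (PUSH_CONST, 1024),
    (GT, None),
    (JUMP_IF_FALSE, 8),           # jump to instruction 8 (return False)
    (LOAD_ARG, 0),
    (AGGREGATE, "large_bytes"),
    (PUSH_CONST, True),
    (RETURN, None),
    (PUSH_CONST, False),          # instruction 8
    (RETURN, None),
]
validate(LARGE_WRITE_HANDLER)
```

A pre-compiled native handler for the same task would simply be an ordinary function with the same logic. The bytecode route trades interpretation (or JIT compilation) cost for the ability to install and verify handlers at run time without rebuilding the monitored program.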