Cloud Debugging and Monitoring

This project will study multi-level execution models, both in general and specifically in the context of OpenStack with virtual machine migration. It has four research axes:

  • Research on the architecture of the OpenStack cloud framework, and proposal of data collection, analysis and monitoring activities. Some of the parameters already monitored at a high level by OpenStack’s Tomograph will be reused. The parameters to monitor include resource usage (power consumption, CPU, disk and network usage) and various latencies (query response time, contention among virtual machines, migration latency). Computing these latencies requires correlating information from several nodes. For instance, characterizing a virtual machine migration, including the bandwidth it requires and its periods of performance degradation and unavailability, requires tracing information at both user-space and kernel level on the origin and destination hosts.
  • Research on the architecture and organization of Software Defined Networks, such as those built with OpenDaylight, overlaid on top of the physical networks. The collected data will provide information about the physical and logical networks and their current usage, helping to diagnose networking problems or possible violations of Terms of Service concerning the bandwidth available to specific client virtual machines. Discussions will be held with Ericsson support engineers to learn about the most difficult problems they encounter in the field, and about the metrics and views that could be extracted from the virtual and physical network layers to facilitate the monitoring and diagnosis of such problems.
  • Design of an efficient architecture and algorithms to collect and analyse information coming from several levels of the execution model, from physical machines to virtual machines (e.g. KVM guests) and up to user space, whether on bare metal, in Java virtual machines (JVM) or in the Python runtime, extending our earlier work. The proposed architecture must efficiently support the analysis of mobile virtual machines migrating from one physical machine to another, and must scale to large clusters and clouds. Several scaling techniques will be examined, including hierarchical aggregation, dynamic selection of the level of detail, and sampling based on time (one request out of every n) or on nodes (one node out of every n similar nodes).
  • Setup and support of the OpenStack cloud environment and the OpenDaylight Software Defined Network. Ericsson will help by providing access to an internal test cloud already set up for another research project between Ericsson and Professor Mohamed Cheriet of École de technologie supérieure.
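
The first axis notes that migration latencies can only be computed by correlating trace data from the origin and destination hosts. The Python sketch below illustrates the idea under simplifying assumptions: the TraceEvent record format and the event names (migration_start, vm_paused, vm_resumed) are hypothetical, not actual OpenStack, KVM or kernel tracepoints, and the clock offset between hosts is assumed to be already known. In practice host clocks are not perfectly synchronized, so this offset would have to be estimated before the traces can be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One timestamped trace event (hypothetical record format)."""
    host: str         # host that recorded the event
    vm_id: str        # identifier of the migrating virtual machine
    name: str         # hypothetical event name, e.g. "migration_start"
    timestamp: float  # seconds, in the recording host's local clock


def migration_latencies(events, clock_offsets):
    """Correlate source- and destination-host events per virtual machine.

    clock_offsets maps a host name to the correction (in seconds) added to
    its timestamps to bring them onto a common reference clock; hosts that
    are absent from the map are assumed to need no correction.
    Returns {vm_id: (total_migration_time, downtime)} for every VM whose
    three (hypothetical) marker events were all found.
    """
    marks = {}
    for ev in events:
        corrected = ev.timestamp + clock_offsets.get(ev.host, 0.0)
        marks.setdefault(ev.vm_id, {})[ev.name] = corrected

    results = {}
    needed = ("migration_start", "vm_paused", "vm_resumed")
    for vm_id, times in marks.items():
        if all(name in times for name in needed):
            total = times["vm_resumed"] - times["migration_start"]
            downtime = times["vm_resumed"] - times["vm_paused"]
            results[vm_id] = (total, downtime)
    return results


events = [
    TraceEvent("src", "vm1", "migration_start", 10.0),  # recorded on origin host
    TraceEvent("src", "vm1", "vm_paused", 12.0),        # recorded on origin host
    TraceEvent("dst", "vm1", "vm_resumed", 12.5),       # recorded on destination host
]
# Assume the destination clock runs 0.25 s ahead of the source clock.
print(migration_latencies(events, {"dst": -0.25}))  # {'vm1': (2.25, 0.25)}
```

Without the offset correction, the downtime in this example would be overestimated by a factor of three, which is why cross-host correlation depends on synchronizing the traces first.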
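
The third axis lists techniques for scaling data collection to large clusters. The Python sketch below illustrates two of them, sampling one request out of every n and one node out of every n similar nodes, together with hierarchical aggregation over a host/rack/cloud tree. The function names, the grouping key and the topology are hypothetical, and summation is only one possible aggregation; metrics such as averages or latency histograms would need their own merge rules.

```python
from collections import defaultdict


def sample_requests(requests, n):
    """Time-based sampling: keep one request out of every n, in arrival order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [req for i, req in enumerate(requests) if i % n == 0]


def sample_nodes(nodes, group_of, n):
    """Node-based sampling: keep one node out of every n similar nodes.

    group_of is a caller-supplied (hypothetical) function mapping a node
    to a key that identifies its group of similar nodes.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    groups = defaultdict(list)
    for node in nodes:
        groups[group_of(node)].append(node)
    kept = []
    for members in groups.values():
        kept.extend(members[::n])  # first node of each run of n similar nodes
    return kept


def aggregate(children, leaf_values, root):
    """Hierarchical aggregation: each inner node's total is the sum of its
    children's totals, so each collector receives one value per child."""
    totals = {}

    def visit(node):
        kids = children.get(node, [])
        if kids:
            total = sum(visit(kid) for kid in kids)
        else:
            total = leaf_values.get(node, 0)
        totals[node] = total
        return total

    visit(root)
    return totals


print(sample_requests(range(10), 3))  # [0, 3, 6, 9]
print(sample_nodes(["a1", "a2", "a3", "b1", "b2"], lambda s: s[0], 2))  # ['a1', 'a3', 'b1']
topology = {"cloud": ["rack1", "rack2"], "rack1": ["h1", "h2"], "rack2": ["h3"]}
print(aggregate(topology, {"h1": 30, "h2": 50, "h3": 20}, "cloud")["cloud"])  # 100
```

With aggregation, the top-level collector handles one value per rack instead of one per host, which is what lets the approach grow with cluster size.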