The first step towards an effective monitoring system is knowing what is happening in the system: meaningful statistics from the platform must be available before anyone can tell that something is wrong. In OpenStack, Ceilometer gathers system statistics and makes them available to users, and to any applications running on top of OpenStack that use them for monitoring, metering or billing customers.
At present, most meters are available on a per-project basis, and each OpenStack component can specify which meters it makes available. However, in a virtualized environment most of the monitored resources are virtual. That is sufficient for billing, where customers are charged for (virtual) resource usage: service providers only need to know when a resource was created, how long it existed and perhaps how much bandwidth it used (via a virtual interface). A monitoring system, by contrast, also needs statistics about the physical resources.
Only a very limited set of statistics about the underlying hardware is available; these are not enabled by default, and they depend on particular OpenStack components being part of the cloud deployment. In a monitoring or service assurance use case, however, statistics about the host are necessary to ensure that everything is working correctly. In a nutshell, the statistics available to Ceilometer are not granular enough for these purposes.
Ceilometer supports custom meters: if suitable statistics are available, they can be submitted to Ceilometer through its REST API or the command-line client. Collecting and submitting statistics manually this way is neither practical nor scalable, but the task can be automated. Nor do we need to start from scratch to get suitable metrics into Ceilometer: we just need another data source that already produces those statistics and can be plugged into Ceilometer to upload them automatically.
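To make "submitted via its REST API" concrete, here is a minimal sketch of preparing one custom sample for Ceilometer's v2 API, which accepts a JSON list of samples POSTed to `/v2/meters/<meter_name>` with a Keystone token in the `X-Auth-Token` header. The endpoint `http://controller:8777`, the token placeholder and the meter name `host.cpu.util` are assumptions for illustration only, and the request is built but deliberately not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone


def make_sample(meter, volume, unit, resource_id, meter_type="gauge",
                timestamp=None, metadata=None):
    """Build one sample dict in the field layout used by the v2 API."""
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "counter_name": meter,
        "counter_type": meter_type,  # gauge, delta or cumulative
        "counter_unit": unit,
        "counter_volume": volume,
        "resource_id": resource_id,
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S"),
        "resource_metadata": metadata or {},
    }


def build_post(endpoint, token, meter, samples):
    """Prepare (but do not send) a POST of samples to /v2/meters/<meter>."""
    url = "%s/v2/meters/%s" % (endpoint.rstrip("/"), meter)
    body = json.dumps(samples).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "X-Auth-Token": token})


# Hypothetical usage: endpoint, token and meter name are placeholders.
sample = make_sample("host.cpu.util", 42.0, "%", "compute-01")
request = build_post("http://controller:8777", "<token>",
                     "host.cpu.util", [sample])
# urllib.request.urlopen(request) would submit it; this sketch does not.
```

Doing this by hand for every host and every metric is exactly the part that does not scale, which is why an automated source of statistics is needed.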
Collectd is a system statistics collection daemon, written in C and designed to be highly scalable. Many metrics are already available, such as CPU utilization, FSCache, IRQ handling and netstat. Its plug-in-based architecture means that only the chosen metrics need to be enabled; together, these factors make collectd very lightweight to run.
The architecture of collectd consists of plug-ins that can be enabled and disabled in a configuration file. These include read plug-ins, which collect statistics from the system, and write plug-ins, which output them in a particular format or to a particular destination (e.g. CSV files, JSON, or over HTTP). In addition to the read and write plug-ins, collectd has a set of "binding" plug-ins for languages such as Java, Perl and Python, so that plug-ins can be developed in those languages as well as in native C.
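The read/write split can be illustrated with a minimal sketch using collectd's Python binding, which exposes `collectd.register_read`, `collectd.register_write` and `collectd.Values(...).dispatch()`. The `collectd` module only exists inside the running daemon, so the sketch falls back to a hypothetical stand-in that mimics just those calls; the plug-in name `pyload` is also an assumption, and the read side uses the standard `load` type (three values) from `os.getloadavg()`, which is Unix-only.

```python
import os

try:
    import collectd  # provided by collectd's Python plug-in at run time
except ImportError:
    # Hypothetical minimal stand-in so the sketch also runs outside the
    # daemon; it mimics only the calls used below.
    class collectd(object):
        readers, writers = [], []

        class Values(object):
            def __init__(self, plugin="", type="", values=()):
                self.plugin, self.type, self.values = plugin, type, list(values)

            def dispatch(self):
                # collectd hands dispatched values to every write plug-in.
                for callback in collectd.writers:
                    callback(self)

        @staticmethod
        def register_read(callback):
            collectd.readers.append(callback)

        @staticmethod
        def register_write(callback):
            collectd.writers.append(callback)


def read_load():
    """Read plug-in: collect the 1, 5 and 15 minute load averages."""
    collectd.Values(plugin="pyload", type="load",
                    values=os.getloadavg()).dispatch()


csv_lines = []  # kept in memory for demonstration only


def write_csv(vl, data=None):
    """Write plug-in: render each dispatched value list as a CSV line."""
    csv_lines.append(",".join([vl.plugin, vl.type] +
                              ["%.2f" % v for v in vl.values]))


collectd.register_read(read_load)
collectd.register_write(write_csv)
```

Read plug-ins never need to know which write plug-ins exist, and vice versa; the daemon routes every dispatched value list to all registered writers, which is what lets a new output target be added without touching any collector.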
OpenStack is written in Python, and collectd provides Python bindings through its Python plug-in. A collectd plug-in was therefore developed in Python to interact with Ceilometer and make collectd's metrics available to it without modifying Ceilometer's code; specifically, a collectd "write" plug-in was developed that outputs data in a format Ceilometer can consume.
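The core of such a write plug-in is a translation from collectd's value lists (identified by host, plugin, plugin instance, type and type instance) into Ceilometer samples. The sketch below shows one possible translation; the meter naming scheme, the resource ID format and treating every value as a gauge are hypothetical simplifications, not necessarily what the PoC does, and the `Values` namedtuple is only a stand-in with the same attribute names as `collectd.Values`.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Stand-in with the attributes of collectd.Values (illustration only; inside
# the daemon the write callback receives the real object).
Values = namedtuple("Values", "host plugin plugin_instance type "
                              "type_instance time values")


def meter_name(vl):
    """Hypothetical naming scheme: <plugin>.<type>[.<type_instance>]."""
    parts = [vl.plugin, vl.type]
    if vl.type_instance:
        parts.append(vl.type_instance)
    return ".".join(parts)


def resource_id(vl):
    """Hypothetical: the physical host, qualified by the plugin instance."""
    if vl.plugin_instance:
        return "%s-%s" % (vl.host, vl.plugin_instance)
    return vl.host


def to_samples(vl, unit=""):
    """Convert one collectd value list into Ceilometer v2 sample dicts."""
    ts = datetime.fromtimestamp(vl.time, timezone.utc)
    base = meter_name(vl)
    samples = []
    for i, value in enumerate(vl.values):
        # Types such as "load" carry several values; suffix the index.
        name = base if len(vl.values) == 1 else "%s.%d" % (base, i)
        samples.append({
            "counter_name": name,
            "counter_type": "gauge",  # simplification: all values as gauges
            "counter_unit": unit,
            "counter_volume": value,
            "resource_id": resource_id(vl),
            "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S"),
            "resource_metadata": {"plugin": vl.plugin, "type": vl.type},
        })
    return samples


def write(vl, data=None):
    """Write callback; a real plug-in would queue or POST these samples."""
    for sample in to_samples(vl):
        print(sample["counter_name"], sample["counter_volume"])


try:
    import collectd  # only importable inside the collectd daemon
    collectd.register_write(write)
except ImportError:
    pass
```

A fuller version would consult each data source's type (GAUGE, DERIVE, COUNTER) rather than labelling everything a gauge, since Ceilometer distinguishes gauge from cumulative meters.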
This means that all the metrics collectd already provides can be made available to Ceilometer, and plug-ins such as interface, cpu and cpufreq can be used to obtain vital statistics about network load and CPU usage. This data can then be used for scheduling, for example, or as a diagnostic tool for identifying performance bottlenecks. The statistics can be used by any agent on an OpenStack cloud, and logic or intelligence can be applied to them to make better-informed decisions when provisioning resources automatically. For example, additional statistics could be used to identify potential faults in a system, so that all workloads can be evacuated to ensure continuous service.
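As a toy illustration of the kind of logic an agent could apply, the sketch below averages recent per-host CPU utilization samples, flags hosts above a threshold as evacuation candidates and picks the least-loaded remaining host as a target. The function name, the 90% threshold and the input shape (host name mapped to recent utilization values) are all hypothetical; a real agent would first fetch these samples from Ceilometer.

```python
from statistics import mean


def plan_moves(samples, threshold=90.0):
    """Return (overloaded hosts, least-loaded healthy host or None).

    `samples` maps a host name to its recent CPU utilization values (%);
    averaging over a window avoids reacting to a single spike.
    """
    averages = {host: mean(vals) for host, vals in samples.items() if vals}
    overloaded = sorted(h for h, avg in averages.items() if avg >= threshold)
    healthy = {h: avg for h, avg in averages.items() if avg < threshold}
    target = min(healthy, key=healthy.get) if healthy else None
    return overloaded, target


recent = {
    "compute-01": [95.0, 97.0, 92.0],
    "compute-02": [40.0, 35.0, 45.0],
    "compute-03": [60.0, 70.0, 65.0],
}
overloaded, target = plan_moves(recent)
```

Here `compute-01` averages above the threshold and would be flagged, while `compute-02` is the least-loaded candidate to receive its workloads.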
A Proof-of-Concept (PoC) has been developed to make collectd statistics available to OpenStack and Ceilometer. It is built on collectd's Python plug-in, which provides Python bindings for collectd and exposes the statistics in a Python environment, so that they can be passed to OpenStack clients.
The plug-in formats the collectd meters into Ceilometer's sample format and POSTs them to Ceilometer's API. In this way, data from ~90 collectd plug-ins is made available to Ceilometer. Not all of these plug-ins provide data relevant to a cloud administrator, but they can be disabled through configuration, which makes this a flexible telemetry solution that now interfaces with Ceilometer. Because collectd is designed to be lightweight and highly scalable, it can run alongside OpenStack on any telemetry-enabled node.
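Because the v2 API accepts samples per meter (`/v2/meters/<meter_name>`), a write plug-in that POSTs every value individually would generate a lot of HTTP traffic. One way to reduce it, sketched below under stated assumptions, is to buffer samples, group them by meter name and flush one request per meter once a batch size is reached. The `SampleBuffer` class, its `send` callback and the batch size are hypothetical; the lock is there because collectd may invoke write callbacks from more than one write thread.

```python
import threading
from collections import defaultdict


class SampleBuffer(object):
    """Buffer samples and flush them as one request per meter name.

    `send(meter_name, samples)` is caller-supplied, e.g. a wrapper around
    an HTTP POST to /v2/meters/<meter_name>.
    """

    def __init__(self, send, batch_size=100):
        self._send = send
        self._batch_size = batch_size
        self._pending = defaultdict(list)
        self._count = 0
        self._lock = threading.Lock()

    def add(self, sample):
        with self._lock:
            self._pending[sample["counter_name"]].append(sample)
            self._count += 1
            if self._count < self._batch_size:
                return
            batch = self._take()
        self._post(batch)  # network I/O happens outside the lock

    def flush(self):
        with self._lock:
            batch = self._take()
        self._post(batch)

    def _take(self):
        # Caller must hold the lock: swap out the pending samples.
        pending, self._pending = self._pending, defaultdict(list)
        self._count = 0
        return pending

    def _post(self, batch):
        for meter, samples in sorted(batch.items()):
            self._send(meter, samples)
```

Sending outside the lock keeps one slow request from blocking other writer threads; what to do when `send` fails (retry, drop or spool to disk) is left open here and would need deciding in a real plug-in.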
The end result is that more statistics about the physical hardware are available, from a highly configurable (and extensible) application. Because collectd is lightweight, statistics can be enabled and disabled very quickly by modifying the configuration and restarting the daemon; this typically takes less than a second, so there is minimal service interruption.
This scalable, flexible solution makes better statistics available for developing and deploying monitoring, fault tolerance, repair and load-balancing services for the OpenStack cloud. Instead of adding new features to Ceilometer, it exploits collectd's plug-in-based architecture and its many existing plug-ins to make a larger set of statistics available to Ceilometer and OpenStack. While collectd does not provide all the NFV-related statistics an OpenStack cloud needs, the process for developing new plug-ins is now better understood, and it can be applied to bring previously unavailable statistics into collectd; these are then automatically made available to Ceilometer by the plug-in.
The plug-in was tested against the Kilo release of OpenStack. Because it uses only the RESTful API and does not call any Ceilometer functions directly, it should be compatible with any release in which Ceilometer provides the same REST API.