Supervision is good, observability is better

As it does every year, Gartner presented its top 10 strategic technology trends, this time for 2023, at its IT Symposium/Xpo 2022. Observability appears in this list for the first time. While not entirely new, observability is not yet a familiar approach for the users we meet, perhaps because they don’t see how it differs from or complements monitoring. But make no mistake: observability is much more than a new buzzword for monitoring. It is a subject in its own right that deserves careful consideration.

What is the difference between monitoring and observability?

Supervision (monitoring) is the process of collecting, analyzing and using information from the physical equipment and software that make up the information system (IS), in order to track its progress towards its objectives, identify possible failures and keep it operating optimally at all times. Monitoring observes specific parameters and can provide a great deal of additional data, but it is usually considered independently of the larger systemic context.

Observability, on the other hand, is the ability to understand the internal state of a system by analyzing the data it exposes, including digital artifacts such as logs, traces, API calls, latencies, downloads and file transfers, which are generated whenever someone interacting with the system takes an action. Observability helps teams analyze what is happening so they can find and fix the root causes of problems.

To summarize, monitoring allows you to know the state of the system, while observability helps to more accurately determine what is happening and what needs to be done.

Supervision? Observability? Or both?

The question then becomes which model to choose for your environment… assuming you have to choose between the two.

Monitoring provides a limited view of system data, focusing on individual measurements, which is sufficient when the failure modes of the systems involved are well understood. By focusing on key metrics, monitoring provides information about overall system performance. But the more complex applications and hardware become, the more failure modes they have. While it is easy for a system administrator to understand which patterns might lead to a general failure (such as a spike in memory usage), it is often impossible to predict errors in distributed applications. This is the whole point of observability: by understanding the internal state of the system, it becomes possible to determine what is not working and why.
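To make the "well-understood failure mode" case concrete, here is a minimal sketch of the kind of check a monitoring tool performs: comparing one metric against a fixed threshold. The `Sample` type, the `find_memory_spikes` function, the 90 % threshold and the sample values are all illustrative assumptions, not taken from any particular product.

```python
from dataclasses import dataclass

# Hypothetical threshold: flag memory usage above 90 %.
MEMORY_ALERT_THRESHOLD = 90.0


@dataclass
class Sample:
    timestamp: int         # seconds since an arbitrary start (illustrative)
    memory_percent: float  # memory in use, from 0 to 100


def find_memory_spikes(samples, threshold=MEMORY_ALERT_THRESHOLD):
    """Return the samples whose memory usage exceeds the threshold."""
    return [s for s in samples if s.memory_percent > threshold]


# Made-up measurements, one per minute.
samples = [
    Sample(0, 41.5),
    Sample(60, 43.0),
    Sample(120, 95.2),
    Sample(180, 97.8),
]

spikes = find_memory_spikes(samples)
for s in spikes:
    print(f"t={s.timestamp}s: memory at {s.memory_percent}% exceeds threshold")
```

A check like this only says that memory is high. It does not say which request, service or deployment caused it, which is the gap observability is meant to fill.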

Be careful! Correlating a few indicators is not enough to diagnose problems in modern applications. Quite the opposite: these increasingly complex applications require greater visibility into the state of systems. To achieve this, observability must be combined with a powerful monitoring tool, which becomes an essential component.

Keys to Observability

To understand what is going on inside a system, observability depends on its logs, metrics, and traces.

  • Logs are aggregated application and system data that provide historical information about transactions and flows, including social media. Log entries describe events and the overall operation of the system, giving context to metrics at the moment they are recorded. For example, a log message may report a high error rate in an API function, but it has to be linked to dashboard measurements showing resource exhaustion before the problem can be analyzed accurately.
  • Metrics are sets of measurements taken over time, and there are several types:
      • Gauge metrics, which measure a value at a specific point in time. CPU usage at the moment of measurement is a typical example.
      • Delta metrics, in which each value measures the change since the last recording. Metrics that count the number of requests are delta metrics.
      • Cumulative metrics, in which the value steadily increases over time. A "bytes sent" metric can be cumulative, because each value reflects the total number of bytes sent by the service up to that moment.
  • Distributed tracing is the third pillar of observability and captures how a user or application interacts with a system. An application can depend on multiple services, each with its own set of metrics and logs. Distributed tracing is a diagnostic technique that follows requests as they move through distributed environments. In these complex systems, traces reveal performance issues that occur in operations between microservices.

Full observability is in line with DevOps practice and depends on more types of data than just key metrics. One of the main challenges for an IT manager who wants to “observe” their system is finding a balance between too little data and an excess of information that is scattered across storage or unusable. But if the effort succeeds, and this is one of the main stakes of observability, the company gains a more reliable IT infrastructure and better visibility into its distributed application architecture, and with them the improved security and user experience its business needs to grow.
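The three metric types described in the list above (gauge, delta and cumulative) can be illustrated with a short sketch. The function names and sample values are illustrative assumptions. The sketch also assumes that cumulative counters never reset, which real services do not guarantee.

```python
def cumulative_to_delta(cumulative):
    """Turn cumulative readings into per-interval changes.

    Assumes the counter never resets between readings.
    """
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


def delta_to_cumulative(deltas, start=0):
    """Turn per-interval changes into a running total."""
    total = start
    totals = []
    for change in deltas:
        total += change
        totals.append(total)
    return totals


# Gauge: each reading stands on its own (CPU usage in percent).
cpu_percent = [12.0, 48.5, 31.0]

# Cumulative: total bytes sent by a service up to each reading.
bytes_sent_total = [1000, 1500, 1500, 4200]
bytes_sent_per_interval = cumulative_to_delta(bytes_sent_total)

# Delta: number of requests received since the previous reading.
requests_per_interval = [3, 0, 7]
requests_total = delta_to_cumulative(requests_per_interval)

print(bytes_sent_per_interval)  # [500, 0, 2700]
print(requests_total)           # [3, 3, 10]
```

Delta and cumulative metrics carry the same information in two forms, so one can be derived from the other. A gauge cannot be summed or differenced in the same way, since each reading only describes a single moment.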
