Before we go deeper into OpenTelemetry in this blog post, let us take a moment to retrace our steps and summarize some observability practices of the past that posed significant challenges. This context will make the rest of OpenTelemetry easier to appreciate as we progress.
Different observability solutions use their own libraries, protocols, and data formats for instrumentation, forcing developers to build custom integrations and wrestle with complex configuration across systems. The result is a fragmented observability landscape held together by tool-specific glue.
Many existing monitoring and observability solutions were tightly coupled to specific vendors or platforms, making any transition or migration between systems complex and expensive. Developers had to write tool-specific instrumentation logic for every application or service, which made switching to or adopting alternative solutions even harder.
These limitations restricted flexibility and impeded innovation. Consequently, the community recognized the need for a solution that addressed these issues and provided a unified approach to observing a system.
What is the OpenTelemetry way...
OpenTelemetry (also referred to as OTel) is the outcome of a merger between two earlier projects, OpenTracing and OpenCensus. Both aimed to address the lack of a standard for instrumenting code and sending telemetry data to an observability backend, but neither fully solved the problem on its own. The merger formed OpenTelemetry, combining their strengths into a comprehensive, unified solution. It is the second most active project incubated under the Cloud Native Computing Foundation (CNCF), after Kubernetes.
Before defining what OpenTelemetry is, it is crucial to understand what it is not. OpenTelemetry should not be mistaken for a backend, a storage solution, or an observability platform. It does not provide backend functionality such as visualization, alerting, querying, or search.
Hence, OpenTelemetry can be defined as a unified, open-source observability framework: a collection of APIs, SDKs, and tools that follows a set of standards to instrument, collect, and export telemetry data to various backends for subsequent analysis in a vendor-neutral way.
The OpenTelemetry workflow...
At a high level of abstraction, OpenTelemetry can be understood as a complete pipeline that emits, receives, processes, and exports telemetry signals to any backend for analysis.
Instrument the system...
Instrumentation is the first step in the process: it enables a system to emit telemetry data such as traces, metrics, and logs. OpenTelemetry supports instrumentation across diverse systems, including microservices, virtual machines (VMs), distributed systems, cloud-based services, containerized applications, and even on-premises data centers, ensuring comprehensive observability across various system architectures.
OpenTelemetry (OTel) provides the flexibility to instrument systems in all major programming languages, such as JavaScript, Java, Go, Python, C++, Elixir, Rust, Swift, .NET, and Ruby. Initially supporting only traces, OTel has expanded to include metrics and now also supports logs. It is important to note that support varies by language, with some implementations stable, others experimental, and some still in alpha or under development. To stay updated on the latest status and releases, you can always refer here.
Some methods for instrumenting the system with OpenTelemetry include:
Automatic Instrumentation: This approach allows us to collect telemetry from an application without the need to modify its source code. It offers a seamless way to gather essential data from the system.
Manual Instrumentation: This approach lets us focus on capturing the information that matters most for performance analysis and debugging, based on the specific requirements of our business, where each component has its own performance expectations.
For example, consider an e-commerce application built on a microservices architecture that interacts with external payment gateways or third-party services. By instrumenting those API requests, we can measure important factors such as latency, success rate, error rate, and other information specific to that particular API (a minimal sketch follows this list). Manual instrumentation proves invaluable for efficiently observing and understanding the behavior of our systems.
Library Instrumentation: It provides valuable insights into library-specific operations and helps optimize and monitor their usage within the larger context of an application or system. For example, when instrumenting the MySQL library, you can capture and trace each database query made by the application, measuring the time taken for execution and any associated latency. This allows you to identify slow or inefficient queries that might impact your application's overall performance.
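To make the manual-instrumentation idea concrete, here is a minimal Python sketch for the payment-gateway call mentioned above. The service name, endpoint, and function are hypothetical, and a TracerProvider is assumed to be configured elsewhere in the application.

```python
# Minimal manual-instrumentation sketch (hypothetical names; a TracerProvider
# is assumed to be configured elsewhere in the application).
import requests
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")

def charge_customer(order_id: str, amount_cents: int) -> bool:
    # Wrap the outbound payment-gateway request in a span so its latency,
    # status code, and failures show up in the trace.
    with tracer.start_as_current_span("payment-gateway/charge") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount_cents", amount_cents)
        response = requests.post(
            "https://payments.example.com/charge",  # hypothetical endpoint
            json={"order_id": order_id, "amount": amount_cents},
            timeout=5,
        )
        span.set_attribute("http.response.status_code", response.status_code)
        return response.ok
```

The span records exactly the attributes we care about for this business flow, which is the point of manual instrumentation: we decide what is worth capturing.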
OpenTelemetry Collector...
The OpenTelemetry Collector serves as an intermediary between the instrumented application and the backend observability services: it receives, processes, and exports telemetry data in a vendor-agnostic way. Running multiple collectors offers benefits such as scalability, high availability, and load balancing. Internally, the collector is divided into distinct components based on their respective functionalities:
Receiver: This component is responsible for receiving telemetry data emitted by various instrumented applications and systems. It supports a wide range of data formats, including popular ones like OpenTelemetry Protocol (OTLP), Jaeger, Zipkin, Prometheus, Syslog, and AWS CloudWatch.
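As an illustration of how data reaches a receiver, the Python sketch below (assuming the opentelemetry-sdk and the OTLP gRPC exporter packages are installed) shows an application shipping spans to a Collector whose OTLP receiver listens on the conventional default gRPC port 4317.

```python
# Sketch: send traces from an app to a Collector's OTLP receiver (gRPC, port 4317).
# Requires opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer("demo-app").start_as_current_span("demo-operation"):
    pass  # this span is batched and shipped to the Collector's OTLP receiver
```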
Processors: These perform specific operations on telemetry data, often in batches, allowing efficient manipulation and customization based on requirements (an SDK-side sketch follows this list). The operations performed here significantly extend the scope of observability. Some of the significant operations include:
Filtering applies criteria-based inclusion or exclusion of telemetry data.
Sampling selects a required subset from large telemetry data sets before export, for example keeping only a particular set of spans relevant to the business.
Enrichment enhances telemetry data with additional context or metadata.
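These processors live in the Collector's own configuration rather than in application code; as a rough SDK-side analogue, the Python sketch below shows head-based sampling with a ratio sampler and enrichment via resource attributes (the service and environment names are illustrative).

```python
# SDK-side analogue of sampling and enrichment; the Collector's own filter,
# sampling, and attribute processors are configured in its config file instead.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

provider = TracerProvider(
    # Sampling: keep roughly 10% of new traces instead of exporting everything.
    sampler=ParentBased(TraceIdRatioBased(0.1)),
    # Enrichment: attach shared context (service name, environment) to every span.
    resource=Resource.create(
        {"service.name": "checkout", "deployment.environment": "staging"}
    ),
)
trace.set_tracer_provider(provider)
```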
Exporters: These are responsible for the smooth transmission of processed telemetry data to external systems or observability backends in the required formats. They provide integration with diverse storage, visualization, and analysis platforms, including popular choices like Prometheus, Jaeger, Honeycomb, Datadog, and New Relic, as well as open-source alternatives such as OpenSearch and SigNoz.
Extensions: In OpenTelemetry, extensions are optional components for tasks that do not involve processing telemetry data, such as health monitoring, service discovery, and data forwarding.
The Health Check extension exposes a URL for checking the status of the OpenTelemetry Collector (see the probe sketch after this list).
Zpages provides an HTTP endpoint that offers real-time data for debugging various components of the OpenTelemetry Collector.
Service discovery extension dynamically identifies instrumented applications or services, automatically configuring the collector to receive telemetry data.
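As a quick illustration, the sketch below probes two of these extensions over HTTP, assuming their conventional default ports (13133 for the health check, 55679 for zPages); both are configurable in the Collector's configuration.

```python
# Probe two Collector extensions over HTTP, assuming their conventional
# default ports; adjust if your Collector config uses different endpoints.
import urllib.request

# health_check: responds with HTTP 200 while the Collector is healthy.
with urllib.request.urlopen("http://localhost:13133/") as resp:
    print("health check:", resp.status)

# zpages: live debug pages, e.g. /debug/tracez for recently sampled spans.
with urllib.request.urlopen("http://localhost:55679/debug/tracez") as resp:
    print("tracez page size:", len(resp.read()), "bytes")
```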
Such extensions reduce manual configuration and let the collector adapt to changes in its environment, strengthening the observability infrastructure. With these features, the OTel Collector provides flexibility, scalability, and seamless integration with different systems in dynamic environments.
However, in some cases, such as development or testing environments, telemetry signals are transmitted directly to the backends because that setup is simpler to debug. In addition to the stock collector, OpenTelemetry also makes it possible to build custom collector distributions tailored to specific business requirements, so organizations can create collectors that align with their unique needs and further strengthen their observability infrastructure.
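For instance, a minimal Python sketch of this direct, collector-less setup might simply print spans to the console during local development:

```python
# Development-time shortcut: print spans to the console, bypassing the Collector.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

with trace.get_tracer("local-dev").start_as_current_span("debug-me"):
    pass  # the span is printed as JSON when this block exits
```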