OTEL: standard logs, traces and metrics in a mad world

Published Thu, October 10, 2024 ∙ Educational, Observability ∙ by Johanan Ottensooser

OpenTelemetry (OTEL) is an open-source observability framework that provides a unified set of tools, APIs, and SDKs for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs. It was created to standardize the way telemetry data is collected and transmitted, addressing the fragmentation caused by multiple, incompatible observability solutions. Maintained by the Cloud Native Computing Foundation (CNCF), OTEL is widely used by developers, DevOps engineers, and SRE teams aiming to gain deeper insights into application performance and behavior in distributed systems.

Before OTEL, two separate projects, OpenTracing and OpenCensus, offered overlapping but incompatible solutions; OTEL was formed by merging them into a single, cohesive framework that reduces complexity and improves interoperability.


Implementing OTEL in Your Organization

Technical Implementation Steps

To implement OTEL in your organization, start by instrumenting your applications using OTEL SDKs available for various programming languages. This involves integrating the OTEL libraries into your application code to capture telemetry data.

OTEL instrumentation replaces the language-specific libraries, third-party frameworks, or commercial agents that would otherwise generate this telemetry.
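To make "capturing telemetry" concrete, here is a toy tracer in plain Python that records the fields a trace span carries: a trace ID shared by every span in a request, a span ID, the parent span's ID, timings, and attributes. The `Span` and `ToyTracer` classes are hypothetical stand-ins, not the OpenTelemetry API; with the real Python SDK you would instead obtain a tracer via `trace.get_tracer(__name__)` and open spans with `tracer.start_as_current_span(...)`.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record; field names loosely follow the OTEL trace data model."""
    name: str
    trace_id: str                  # 16 random bytes as 32 hex chars, shared by the whole trace
    span_id: str                   # 8 random bytes as 16 hex chars, unique to this span
    parent_span_id: Optional[str]  # None for the root span
    start_ns: int
    end_ns: int = 0
    attributes: dict = field(default_factory=dict)


class ToyTracer:
    """Hypothetical tracer that keeps finished spans in memory instead of exporting them."""

    def __init__(self) -> None:
        self.finished: list = []
        self._active: list = []  # stack of open spans, innermost last

    @contextmanager
    def span(self, name: str, **attributes):
        parent = self._active[-1] if self._active else None
        current = Span(
            name=name,
            # Children inherit the trace ID; a root span starts a new trace.
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            parent_span_id=parent.span_id if parent else None,
            start_ns=time.time_ns(),
            attributes=dict(attributes),
        )
        self._active.append(current)
        try:
            yield current
        finally:
            current.end_ns = time.time_ns()
            self._active.pop()
            self.finished.append(current)


tracer = ToyTracer()
with tracer.span("checkout", user_tier="gold"):
    with tracer.span("charge_card"):
        pass  # application work would happen here
```

The nesting is the important part: the child span inherits the trace ID and records its parent, which is what lets a backend reassemble the request as a tree.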

Next, set up OTEL Collectors to receive, process, and export this data to your chosen backend or observability platform (e.g. Prometheus, Jaeger, Zipkin, Datadog, New Relic, Splunk, and others that support OpenTelemetry formats). Each collector is connected to a backend by configuring an exporter in the collector's configuration file, which means specifying the exporter type, the endpoint, and any necessary authentication.
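The collector itself is configured in YAML. As a sketch, the Python dict below mirrors that layout: components are declared under `receivers`, `processors`, and `exporters`, and only take effect once a pipeline under `service.pipelines` references them. The endpoint and the `api-key` header are placeholders, and `undefined_components` is a hypothetical helper that catches a common wiring mistake: a pipeline naming a component that was never declared.

```python
# Hypothetical mirror of an OpenTelemetry Collector YAML config as a Python dict.
# The endpoint and API key below are placeholders, not real services.
collector_config = {
    "receivers": {
        # OTLP over gRPC on its conventional port, 4317.
        "otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}},
    },
    "processors": {"batch": {}},
    "exporters": {
        "otlphttp": {
            "endpoint": "https://backend.example.com",
            # The collector expands ${env:...} references from environment variables.
            "headers": {"api-key": "${env:BACKEND_API_KEY}"},
        },
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlphttp"],
            },
        },
    },
}


def undefined_components(config: dict) -> list:
    """Return a message for every pipeline reference to an undeclared component."""
    problems = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            declared = config.get(kind, {})
            for component in pipeline.get(kind, []):
                if component not in declared:
                    # kind[:-1] turns "exporters" into "exporter" for the message.
                    problems.append(f"{pipeline_name}: {kind[:-1]} '{component}' is not defined")
    return problems
```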

In practice, you will likely run these components on Kubernetes or other scalable compute.

To summarize, implementing OTEL involves several technologies:

  • OTEL SDKs and APIs: For instrumenting applications.
  • OTEL Collectors: To collect and export telemetry data.
  • Backend Systems: Such as Jaeger, Prometheus, or commercial APM solutions to store and analyze data.
  • Containerization and Orchestration Tools: Like Docker and Kubernetes for scalable deployment.
  • Monitoring Tools: For visualizing and alerting, such as Grafana or Kibana.

Open-source tools like Moose can be used to stitch these components together and to ensure that OTEL and pre-OTEL systems integrate well.
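At runtime, these components form a simple flow: instrumented applications send telemetry to a collector, which buffers and batches it before exporting to a backend. The toy model below, with hypothetical names throughout, shows that batching step. The collector's real `batch` processor is configured with options such as `send_batch_size` and `timeout` and also flushes on a timer, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyCollector:
    """Hypothetical stand-in for a collector: receive records, batch them, export."""
    export: Callable[[list], None]  # where batches go, e.g. a backend client
    batch_size: int = 3
    _buffer: list = field(default_factory=list)

    def receive(self, record: dict) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the current buffer to the exporter and start a fresh one.
        if self._buffer:
            self.export(self._buffer)
            self._buffer = []


backend_storage: list = []  # stands in for Jaeger, Prometheus, etc.
collector = ToyCollector(export=backend_storage.append, batch_size=3)

for i in range(7):
    collector.receive({"service.name": "checkout", "span": f"op-{i}"})
collector.flush()  # drain the final partial batch
```

Batching trades a little latency for far fewer network calls to the backend, which matters once telemetry volume grows.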

What teams would run this process?

The platform would typically be run by a dedicated Observability or DevOps team responsible for maintaining the telemetry infrastructure. This team would manage the OTEL Collectors, configure data pipelines, and ensure data integrity and security. They would also collaborate with IT and network teams to optimize resource utilization and address any infrastructure challenges.

Your team will need skills in:

  • Programming and Application Development: To integrate OTEL into application code.
  • DevOps Practices: Including CI/CD pipelines, infrastructure as code, and automation.
  • Observability and Monitoring: Understanding of metrics, logs, and traces.
  • Cloud Computing: Experience with cloud services if using cloud-based backends.
  • Data Management: Skills in handling large volumes of telemetry data efficiently.

Integration for Product and Operations Teams

Product and operations teams need to align their workflows to leverage the new observability capabilities. This includes updating application code to include OTEL instrumentation, defining custom metrics and spans relevant to their services, and using new dashboards and alerting systems. Training sessions or workshops may be necessary to bring these teams up to speed with OTEL's features and best practices.
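Defining a custom metric usually means choosing an instrument (a counter, for example) and attaching attributes that describe each measurement. The hypothetical `ToyCounter` below shows the core behavior: measurements with the same attribute set are summed into one series. In the real Python SDK the equivalent starts from `meter.create_counter(...)` and `counter.add(amount, attributes)`.

```python
from collections import Counter
from typing import Optional


class ToyCounter:
    """Hypothetical monotonic counter that sums measurements per attribute set."""

    def __init__(self, name: str, unit: str = "1") -> None:
        self.name = name
        self.unit = unit
        self._sums: Counter = Counter()

    def add(self, amount: int, attributes: Optional[dict] = None) -> None:
        if amount < 0:
            # Counters only go up; decreasing values need a different instrument.
            raise ValueError("counter increments must be non-negative")
        # Sorting makes the key independent of attribute order, so each distinct
        # attribute combination becomes its own time series.
        key = tuple(sorted((attributes or {}).items()))
        self._sums[key] += amount

    def collect(self) -> dict:
        """Return the current total for each attribute set."""
        return dict(self._sums)


orders = ToyCounter("orders.placed")
orders.add(1, {"region": "eu", "tier": "gold"})
orders.add(2, {"tier": "gold", "region": "eu"})
orders.add(1, {"region": "us"})
```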

Moose or other open-source tools can be used to build data quality checks that verify OTEL compliance, and to extract relevant data for analysis.
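A data quality check can be as simple as validating each exported record against a few rules from the OTEL specification: a trace ID is 16 bytes (32 hex characters), a span ID is 8 bytes (16 hex characters), neither may be all zeros, and every service should set the `service.name` resource attribute (SDKs fall back to `unknown_service` when it is missing). The flat record shape and the `check_span` helper below are assumptions for illustration; real OTLP exports nest these fields more deeply.

```python
import re

TRACE_ID = re.compile(r"[0-9a-fA-F]{32}")  # 16 bytes, hex-encoded
SPAN_ID = re.compile(r"[0-9a-fA-F]{16}")   # 8 bytes, hex-encoded


def check_span(record: dict) -> list:
    """Return compliance problems for one span record (hypothetical flat shape)."""
    problems = []
    service = record.get("resource", {}).get("service.name")
    if not service or service.startswith("unknown_service"):
        problems.append("missing service.name resource attribute")
    trace_id = record.get("trace_id", "")
    # An all-zero ID is reserved as invalid.
    if not TRACE_ID.fullmatch(trace_id) or set(trace_id) == {"0"}:
        problems.append("invalid trace_id")
    span_id = record.get("span_id", "")
    if not SPAN_ID.fullmatch(span_id) or set(span_id) == {"0"}:
        problems.append("invalid span_id")
    return problems
```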

Downsides of Transitioning to OTEL

Transitioning to OTEL can present challenges such as the initial overhead of instrumenting existing applications and services. There may be compatibility issues with legacy systems, and the need for team training can consume time and resources. Additionally, managing the increased volume of telemetry data requires robust infrastructure and can lead to higher operational costs if not optimized properly.
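One common way to keep that volume in check is sampling, i.e. keeping only a fraction of traces. OTEL SDKs include a ratio-based sampler (`TraceIdRatioBased`); the function below is a simplified, hypothetical sketch of the same idea rather than the SDK's exact algorithm. Because the decision depends only on the trace ID, every service using the same ratio makes the same keep-or-drop choice for a given trace, so sampled traces stay complete.

```python
def keep_trace(trace_id_hex: str, ratio: float) -> bool:
    """Hypothetical ratio sampler: keep a trace if the low 64 bits of its ID
    fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & (2**64 - 1)
    return low_64_bits < round(ratio * 2**64)
```

With random trace IDs, a ratio of 0.25 keeps roughly a quarter of traces, cutting storage and processing costs proportionally.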

You can mitigate the initial overhead and compatibility issues by adopting OpenTelemetry gradually, starting with critical services. Whilst the benefits grow as more of your systems are brought onto OTEL, you can still reap some of them by migrating existing systems one at a time.

Ideal Organizations for OTEL Adoption

Organizations that operate large-scale, distributed systems and require deep insights into application performance would benefit most from OTEL. This includes enterprises with microservices architectures, cloud-native applications, or those undergoing digital transformation initiatives. Companies aiming for standardization in observability practices across diverse tech stacks would also find OTEL advantageous.

Whilst the costs of implementing OTEL scale with the complexity of the organization, so do the benefits.

Contrasting OTEL with Pre-OTEL Observability Paradigms

Before OTEL, organizations often relied on disparate tools and proprietary solutions for observability, leading to fragmented data and siloed teams. Each tool had its own instrumentation methods, making it challenging to achieve a unified view of system performance. Custom integrations were frequently required, increasing complexity and maintenance overhead.

OTEL addresses these issues by providing a standardized approach to telemetry data collection and transmission. It enables interoperability between different observability tools and platforms, reduces vendor lock-in, and simplifies the instrumentation process. By unifying metrics, logs, and traces under a common framework, OTEL enhances collaboration between teams and improves the efficiency of monitoring and troubleshooting efforts.
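One concrete piece of that standardization is context propagation. By default, OTEL SDKs propagate trace context in the W3C Trace Context format: a `traceparent` header of the form `version-traceid-parentid-flags`, which lets any compliant tool continue a trace that another tool started. The helpers below are a hypothetical, simplified sketch of building and parsing that header; production code should rely on the SDK's own propagators.

```python
import re
from typing import Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    # Version 00; flag bit 0x01 marks the trace as sampled.
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the propagated context, or None if the header is invalid."""
    match = TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid under W3C Trace Context.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```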

Implementing OTEL can significantly enhance an organization's observability capabilities, but it requires careful planning, the right technologies, and skilled teams to maximize its benefits.
