title: Backstage Metrics Service
status: implementable
authors: @kurtaking
owners: @kurtaking
project-areas: core-framework, observability
creation-date: 2025-06-23

Summary

Add a core MetricsService to Backstage's framework to provide a unified interface for metrics instrumentation. The service is built on OpenTelemetry and is configured through the app configuration (e.g. app-config.yaml).

This approach leverages industry standards while focusing the MetricsService on distinct Backstage concerns, following the same pattern as other core services (DatabaseService builds on Knex, LoggerService builds on Winston, HttpRouterService builds on Express, etc.).

Motivation

Backstage currently provides no guidance on metrics instrumentation. Individual plugins may implement their own metrics, but there is no standardized approach to metrics collection, naming, or configuration.

By providing a core metrics service:

  • Plugin Authors and the Community gain a straightforward way to address metrics instrumentation and can focus on business logic instead of needing to reimplement metrics plumbing.
  • Backstage Admins receive a reliable stream of metrics from the core system for monitoring, alerting, and troubleshooting.

Goals

  • Plugin-scoped metric namespacing
  • Consistent metrics patterns across all plugins
  • Aligned with OpenTelemetry industry standards
  • Provide an interface that feels familiar from other core services
  • Standardized initialization and lifecycle management

The catalog and scaffolder plugins will be updated to use the new metrics service in the initial alpha release.

Non-Goals

  • Adding metrics to plugins that don't currently have them (outside of catalog and scaffolder).
  • Tracing and other telemetry concerns are out of scope for this BEP.
  • Refactoring the existing LoggerService. Unifying observability-related concerns would be valuable future work, but it is not a goal of this BEP.

Proposal

Following the pattern of other core services, create a new RootMetricsService responsible for initializing the OpenTelemetry SDK, handling root-level concerns, and creating plugin-specific MetricsService instances. The root service delegates to the plugin-scoped MetricsService, which initializes a meter for each registered plugin based on the service.name and optional service.version provided in the app's app-config.yaml file. This approach follows the recommendation in the OpenTelemetry documentation.

Naming Conventions

All Backstage metrics follow this hierarchical pattern:

backstage.{scope}.{scope_name}.{metric_name}

Where:

  • backstage is the root namespace for all Backstage metrics
  • {scope} is the system scope (either plugin or core)
  • {scope_name} is the name of the plugin or core service (e.g., catalog, scaffolder, database)
  • {metric_name} is the hierarchical metric name as provided by the plugin author (e.g., entities.processed.total, tasks.completed.total)
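
To make the pattern concrete, the following is a minimal sketch of a helper that assembles a fully qualified name. It is not part of the proposed API: `buildMetricName` and `MetricScope` are hypothetical names, and the validation rule is an assumption that the final name should satisfy OpenTelemetry's instrument-name syntax.

```typescript
// Hypothetical helper; not part of the proposed MetricsService API.
type MetricScope = 'plugin' | 'core';

// Assumption: the final name should follow OpenTelemetry's instrument-name
// syntax (starts with a letter; then letters, digits, '_', '.', '-', '/';
// at most 255 characters in total).
const INSTRUMENT_NAME = /^[A-Za-z][A-Za-z0-9_.\/-]{0,254}$/;

function buildMetricName(
  scope: MetricScope,
  scopeName: string,
  metricName: string,
): string {
  // The scope name is a single segment, so it must not contain a dot.
  if (scopeName.length === 0 || scopeName.includes('.')) {
    throw new Error(`Invalid scope name: "${scopeName}"`);
  }
  const fullName = `backstage.${scope}.${scopeName}.${metricName}`;
  if (metricName.length === 0 || !INSTRUMENT_NAME.test(fullName)) {
    throw new Error(`Invalid metric name: "${fullName}"`);
  }
  return fullName;
}

console.log(buildMetricName('plugin', 'catalog', 'entities.processed.total'));
// → backstage.plugin.catalog.entities.processed.total
console.log(buildMetricName('core', 'database', 'connections.active'));
// → backstage.core.database.connections.active
```

Rejecting malformed names at creation time, rather than at export time, would surface naming mistakes to plugin authors early.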

Scope

The scope indicates where a metric belongs in the Backstage ecosystem.

  • plugin - A plugin-specific metric (e.g. backstage.plugin.catalog.entities.count)
  • core - A core service metric (e.g. backstage.core.database.connections.active)

Plugin-Scoped Metrics

Pattern: backstage.plugin.{pluginId}.{metric_name}

# Examples
backstage.plugin.catalog.entities.processed.total
backstage.plugin.scaffolder.tasks.completed.total
backstage.plugin.techdocs.builds.active
backstage.plugin.auth.sessions.active.total # todo: technically a core service and a backend plugin

Core Metrics

Pattern: backstage.core.{core_service}.{metric_name}

# Examples
backstage.core.database.connections.active
backstage.core.scheduler.tasks.queued.total
backstage.core.httpRouter.requests.total

Design Details

References

Integration with OpenTelemetry Auto-Instrumentation

The RootMetricsService will automatically enable instrumentation for known libraries used by the Backstage framework, such as those listed below. Inclusion and exclusion lists in the configuration control which instrumentations are enabled.

  • Express
  • Knex
  • Winston
  • etc.
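
One way the inclusion and exclusion lists could be resolved is sketched below. The names `resolveInstrumentations` and `KNOWN_INSTRUMENTATIONS` are hypothetical, the short library identifiers are an assumption (the real instrumentation identifiers may differ), and the rule that exclusions win over inclusions is a choice made for this sketch, not something this BEP specifies.

```typescript
// Hypothetical shape mirroring the `autoInstrumentation` config block.
interface AutoInstrumentationOptions {
  enabled: boolean;
  include?: string[];
  exclude?: string[];
}

// Assumption: short library names; the real identifiers may differ.
const KNOWN_INSTRUMENTATIONS = ['express', 'knex', 'winston'];

function resolveInstrumentations(
  options: AutoInstrumentationOptions,
  known: string[] = KNOWN_INSTRUMENTATIONS,
): string[] {
  if (!options.enabled) {
    return [];
  }
  // An explicit include list narrows the candidates; without one, every
  // known instrumentation is a candidate.
  const include = options.include;
  const candidates = include
    ? known.filter(name => include.includes(name))
    : known;
  // Assumption of this sketch: exclusions always win over inclusions.
  const excluded = new Set(options.exclude ?? []);
  return candidates.filter(name => !excluded.has(name));
}

// With only `express` excluded, `knex` and `winston` remain enabled.
console.log(resolveInstrumentations({ enabled: true, exclude: ['express'] }));
```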

The MetricsService complements rather than duplicates auto-instrumentation by focusing on application-level metrics that only Backstage can provide. For example, the catalog plugin may want to track the number of entities processed by the refresh operation and the kind of entity being processed.

// Auto-instrumentation provides (automatically):
// - http.server.requests.total{method="GET", route="/catalog/entities", status_code="200"}
// - http.server.request.duration{method="GET", route="/catalog/entities"}

// MetricsService provides (manually):
const entityMetrics = metricsService.createCounter('entities.processed.total');
entityMetrics.add(entities.length, { operation: 'refresh', kind: 'Component' });

Configuration

// Not final, but this is the general idea...
interface MetricsConfig {
  enabled: boolean;

  resource: {
    serviceName?: string;
    serviceVersion?: string;
    environment?: string;
  };

  collection?: {
    exportIntervalMillis?: number;
    // ...
  };

  exporters: Array<{
    type: 'prometheus' | 'otlp' | 'console' | '...';
    config?: Record<string, any>;
  }>;

  autoInstrumentation: {
    enabled: boolean;
    include?: string[];
    exclude?: string[];
  };
}

backend:
  metrics:
    enabled: true

    resource:
      serviceName: backstage
      serviceVersion: 0.0.1
      environment: production

    # Collection settings
    collection:
      exportIntervalMillis: 15000

    exporters:
      - type: prometheus
        config:
          port: 9464
      # ...
      - type: console

    autoInstrumentation:
      enabled: true
      exclude: ['express']
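
As a rough sketch of how such a block might be normalized before it reaches the SDK, the function below applies defaults and rejects unknown exporter types. Everything here is an assumption made for illustration: `resolveMetricsConfig` is a hypothetical name, metrics being disabled when `enabled` is omitted is a guess, and the 60000 ms fallback simply mirrors the usual OpenTelemetry periodic-reader default rather than anything this BEP defines.

```typescript
type ExporterType = 'prometheus' | 'otlp' | 'console';

interface ExporterConfig {
  type: ExporterType;
  config: Record<string, unknown>;
}

interface ResolvedMetricsConfig {
  enabled: boolean;
  exportIntervalMillis: number;
  exporters: ExporterConfig[];
}

// Hypothetical default, chosen for this sketch only.
const DEFAULT_EXPORT_INTERVAL_MILLIS = 60000;

function resolveMetricsConfig(raw: {
  enabled?: boolean;
  collection?: { exportIntervalMillis?: number };
  exporters?: Array<{ type: string; config?: Record<string, unknown> }>;
}): ResolvedMetricsConfig {
  const supported: string[] = ['prometheus', 'otlp', 'console'];

  // Fail fast on exporter types this sketch does not know about.
  const exporters = (raw.exporters ?? []).map(exporter => {
    if (!supported.includes(exporter.type)) {
      throw new Error(`Unsupported metrics exporter type: ${exporter.type}`);
    }
    return {
      type: exporter.type as ExporterType,
      config: exporter.config ?? {},
    };
  });

  const exportIntervalMillis =
    raw.collection?.exportIntervalMillis ?? DEFAULT_EXPORT_INTERVAL_MILLIS;
  if (!Number.isInteger(exportIntervalMillis) || exportIntervalMillis <= 0) {
    throw new Error('collection.exportIntervalMillis must be a positive integer');
  }

  // Assumption: metrics stay off unless explicitly enabled.
  return { enabled: raw.enabled ?? false, exportIntervalMillis, exporters };
}
```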

Interface

Provide a wrapper around OpenTelemetry's API while re-exporting the types from the @opentelemetry/api package. This keeps the concepts familiar to both the Backstage community and existing OpenTelemetry users.

interface MetricsService {
  // Synchronous instrumentation
  createCounter(name: string, options?: MetricOptions): Counter;
  createUpDownCounter(name: string, options?: MetricOptions): UpDownCounter;
  createHistogram(name: string, options?: MetricOptions): Histogram;
  createGauge(name: string, options?: MetricOptions): Gauge;

  // Asynchronous instrumentation
  createObservableCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableCounter;
  createObservableUpDownCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableUpDownCounter;
  createObservableGauge(name: string, options?: MetricOptions): ObservableGauge;

  // Future - add additional convenience methods as we learn more about the needs of the framework
}
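
Because plugins would depend on an interface rather than on the SDK, instrumentation could be unit-tested with an in-memory double. The sketch below assumes only a small slice of the interface; `CounterFactory`, `InMemoryMetrics`, and `processEntities` are hypothetical names, and the `Counter` and `Attributes` types are local stand-ins for the OpenTelemetry ones so that the example is self-contained.

```typescript
// Local stand-ins for the OpenTelemetry types, to keep the sketch self-contained.
type Attributes = Record<string, string | number | boolean>;

interface Counter {
  add(value: number, attributes?: Attributes): void;
}

// Only the slice of the proposed interface that this sketch exercises.
interface CounterFactory {
  createCounter(name: string): Counter;
}

// Hypothetical test double that sums counter values per metric name.
class InMemoryMetrics implements CounterFactory {
  readonly totals = new Map<string, number>();

  createCounter(name: string): Counter {
    return {
      add: (value: number) => {
        // Choice made for this double: reject negative values loudly.
        if (value < 0) {
          throw new Error('Counters only accept non-negative values');
        }
        this.totals.set(name, (this.totals.get(name) ?? 0) + value);
      },
    };
  }
}

// Code under test depends on the interface, not on the OpenTelemetry SDK.
function processEntities(metrics: CounterFactory, kinds: string[]): void {
  const processed = metrics.createCounter('entities.processed.total');
  for (const kind of kinds) {
    processed.add(1, { operation: 'refresh', kind });
  }
}

const testMetrics = new InMemoryMetrics();
processEntities(testMetrics, ['Component', 'API', 'Component']);
console.log(testMetrics.totals.get('entities.processed.total')); // → 3
```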

Root Metrics Service

The RootMetricsService initializes the OpenTelemetry SDK, provides metrics to other root services, and creates plugin-scoped metrics services. End users who want to use their own SDK are responsible for initializing it with their own configuration.

interface RootMetricsService {
  forPlugin(pluginId: string): MetricsService;
}

export const rootMetricsServiceFactory = createServiceFactory({
  // depends on as little as possible so that it can be initialized as early as possible.
  service: rootMetricsServiceRef,
  deps: {
    rootConfig: coreServices.rootConfig,
  },
  factory: ({ rootConfig }) => {
    return DefaultRootMetricsService.fromConfig(rootConfig);
  },
});

class DefaultRootMetricsService implements RootMetricsService {
  static fromConfig(config: Config): RootMetricsService {
    const metricsConfig = config.getOptionalConfig('backend.metrics');

    const sdk = new NodeSDK({
      resource: createResourceFromConfig(metricsConfig),
      instrumentations: [
        getNodeAutoInstrumentations({
          ...getAutoInstrumentationConfig(metricsConfig),
        }),
      ],
      metricReader: createMetricReadersFromConfig(metricsConfig),
    });

    // Start the SDK before any plugin requests a meter.
    sdk.start();

    return new DefaultRootMetricsService(sdk);
  }

  // The parameter property both declares and assigns the `sdk` field.
  private constructor(private readonly sdk: NodeSDK) {}

  forPlugin(pluginId: string): MetricsService {
    return new PluginMetricsService(pluginId);
  }

  // Flushes pending metrics and releases exporter resources.
  async shutdown(): Promise<void> {
    await this.sdk.shutdown();
  }
}
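
The class above exposes `shutdown()` but does not show what calls it. One possible wiring, sketched here under stated assumptions, registers it as a shutdown hook. `LifecycleLike` and `ShutdownCapable` are local stand-ins rather than real Backstage types, `registerMetricsShutdown` is a hypothetical name, and taking on a lifecycle dependency would trade against the goal of keeping the root factory's dependencies minimal.

```typescript
// Local stand-ins so the sketch runs without Backstage or OpenTelemetry packages.
interface ShutdownCapable {
  shutdown(): Promise<void>;
}

interface LifecycleLike {
  addShutdownHook(hook: () => void | Promise<void>): void;
}

// Hypothetical helper: flush metrics when the backend shuts down.
function registerMetricsShutdown(
  lifecycle: LifecycleLike,
  rootMetrics: ShutdownCapable,
): void {
  lifecycle.addShutdownHook(async () => {
    try {
      await rootMetrics.shutdown();
    } catch (error) {
      // A failed flush should not block the rest of the shutdown sequence.
      console.warn('Failed to shut down the metrics SDK', error);
    }
  });
}

// Usage with simple fakes: collect the hooks, then run them.
const hooks: Array<() => void | Promise<void>> = [];
const fakeLifecycle: LifecycleLike = {
  addShutdownHook: hook => {
    hooks.push(hook);
  },
};
let shutdownCalls = 0;
const fakeRootMetrics: ShutdownCapable = {
  shutdown: async () => {
    shutdownCalls += 1;
  },
};

registerMetricsShutdown(fakeLifecycle, fakeRootMetrics);
for (const hook of hooks) {
  void hook();
}
console.log(shutdownCalls); // → 1
```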

Plugin Metrics Service

Each plugin receives a metrics service that automatically namespaces all metrics to match the naming conventions.

const metricsServiceFactory = createServiceFactory({
  service: metricsServiceRef,
  deps: {
    rootMetrics: coreServices.rootMetrics,
    pluginMetadata: coreServices.pluginMetadata,
  },
  factory: ({ rootMetrics, pluginMetadata }) => {
    return rootMetrics.forPlugin(pluginMetadata.getId());
  },
});

class PluginMetricsService implements MetricsService {
  private readonly meter: Meter;
  // ...
  constructor(private readonly pluginId: string) {
    this.meter = metrics.getMeter(`backstage.plugin.${pluginId}`);
  }

  // ... other interface methods

  private prefixMetricName(name: string): string {
    return `backstage.plugin.${this.pluginId}.${name}`;
  }
}
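
The elided interface methods would presumably apply the prefix before delegating to the meter. The following is a minimal sketch of that delegation for counters only. `PrefixingMetricsService` and `MeterLike` are hypothetical names, and the local types stand in for the OpenTelemetry ones so that the example runs on its own.

```typescript
// Local stand-ins for the OpenTelemetry types used by this sketch.
type Attributes = Record<string, string | number | boolean>;

interface Counter {
  add(value: number, attributes?: Attributes): void;
}

interface MetricOptions {
  description?: string;
  unit?: string;
}

// Stand-in for the subset of the OpenTelemetry Meter used here.
interface MeterLike {
  createCounter(name: string, options?: MetricOptions): Counter;
}

class PrefixingMetricsService {
  private readonly pluginId: string;
  private readonly meter: MeterLike;

  constructor(pluginId: string, meter: MeterLike) {
    this.pluginId = pluginId;
    this.meter = meter;
  }

  // Prefix the caller's name, then delegate to the underlying meter.
  createCounter(name: string, options?: MetricOptions): Counter {
    return this.meter.createCounter(this.prefixMetricName(name), options);
  }

  private prefixMetricName(name: string): string {
    return `backstage.plugin.${this.pluginId}.${name}`;
  }
}

// A recording meter makes the resulting instrument names visible.
const createdNames: string[] = [];
const recordingMeter: MeterLike = {
  createCounter: name => {
    createdNames.push(name);
    return { add: () => {} };
  },
};

new PrefixingMetricsService('catalog', recordingMeter).createCounter(
  'entities.processed.total',
  { unit: '{entity}' },
);
console.log(createdNames[0]);
// → backstage.plugin.catalog.entities.processed.total
```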

Example

const entitiesProcessed = metricsService.createCounter(
  'entities.processed.total',
  {
    description: 'Total entities processed during refresh',
    unit: '{entity}',
  },
);

entitiesProcessed.add(100);

// ...
// metric is now available as `backstage.plugin.catalog.entities.processed.total`

Release Plan

  1. Create a new RootMetricsService that initializes the OpenTelemetry SDK and creates plugin-scoped metrics services.
  2. Create the plugin-scoped MetricsService that plugins use to create namespaced metrics.
  3. Create alpha-related documentation to add to existing core service docs.
  4. Release the metrics service under @alpha.
  5. Mark all existing metrics implementations as deprecated.
  6. Refactor catalog and scaffolder plugins to use the new (alpha) MetricsService.
  7. Offer a migration path for existing adopters to migrate to the new metrics service.
  8. Release the metrics service.
  9. Update all documentation to reference the new metrics service.
  10. Create follow-up action items to integrate the new metrics service into the core system.
  11. Fully deprecate all existing metrics implementations like the existing Prometheus one-off implementations.

Deprecation Plan

TBD

Dependencies

  1. The root metrics service must be initialized as early as possible so that dependents do not receive no-op meters.
  2. There are one-off implementations of metrics in the wild that may conflict with the proposed service. However, this is unlikely to be a problem, since the SDK should continue to collect the metrics they emit.

Alternatives

Plugin authors continue to implement their own metrics as they see fit.