| title | status | authors | owners | project-areas | creation-date |
|---|---|---|---|---|---|
| Backstage Metrics Service | implementable | | | | 2025-06-23 |
Table of Contents
- Summary
- Motivation
- Proposal
- Naming Conventions
- Design Details
- Release Plan
- Dependencies
- Alternatives
Summary
Add a core MetricsService to Backstage's framework to provide a unified interface for metrics instrumentation. The service builds on the industry standard, OpenTelemetry (OTEL), while keeping the MetricsService focused on Backstage-specific concerns. This follows the same pattern as other core services (DatabaseService builds on Knex, LoggerService builds on Winston, HttpRouterService builds on Express, etc.).
Motivation
While individual plugins may implement their own metrics, there is no standardized approach, which leads to inconsistent metric patterns across the ecosystem. For example, catalog_entities_count and catalog.processed.entities.count are both existing metric names. Ideally, these would be standardized to backstage.plugin.catalog.entities.count and backstage.plugin.catalog.entities.processed.total respectively.
By providing a core metrics service:
- Plugin Authors and the Community gain a straightforward way to address metrics instrumentation and can focus on business logic instead of needing to reimplement metrics plumbing.
- Backstage Admins receive a reliable stream of metrics from the core system for monitoring, alerting, and troubleshooting.
Goals
- Plugin-scoped metric namespacing
- Consistent metrics patterns across all plugins
- Aligned with OpenTelemetry industry standards
- Provide an interface consistent with other core services
The catalog and scaffolder plugins will be updated to use the new metrics service in the initial alpha release.
Non-Goals
- Providing a way to configure the OpenTelemetry SDK. This is out of scope for this BEP.
- Adding metrics to plugins that do not already have them (outside of catalog and scaffolder)
- Tracing and other telemetry concerns are out of scope for this BEP.
- Refactoring the existing LoggerService. Future work to unify observability related concerns would be ideal, but not a goal.
Proposal
Following similar patterns to other core services, create a new RootMetricsService responsible for root-level concerns and the creation of plugin-specific MetricsService instances.
Naming Conventions
All Backstage metrics follow this hierarchical pattern:
backstage.{scope}.{scope_name}.{metric_name}
Where:
- backstage is the root namespace for all Backstage metrics
- {scope} is the system scope (either plugin or core)
- {scope_name} is the name of the plugin or core service (e.g., catalog, scaffolder, database, scheduler)
- {metric_name} is the hierarchical metric name as provided by the plugin author (e.g., entity.count, tasks.completed.total)
Scope
The scope represents where a metric belongs in the Backstage ecosystem.
- plugin - A plugin-specific metric (e.g. backstage.plugin.catalog.entity.count)
- core - A metric provided by the core system (e.g. backstage.core.database.connections.active)
Plugin-Scoped Metrics
Pattern: backstage.plugin.{pluginId}.{metric_name}
# Examples
backstage.plugin.catalog.entities.processed.total
backstage.plugin.scaffolder.tasks.completed.total
backstage.plugin.techdocs.builds.active
backstage.plugin.auth.sessions.active.total # todo: technically a core service and a backend plugin
Core-Scoped Metrics
Pattern: backstage.core.{service}.{metric_name}
# Examples
backstage.core.database.connections.active
backstage.core.scheduler.tasks.queued.total
backstage.core.httpRouter.requests.total
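As a rough illustration of the convention, the helper below assembles a fully qualified metric name from its parts. `buildMetricName` and its validation are hypothetical and not part of the proposal. The character rule is an assumption, loosely modeled on OpenTelemetry's instrument-name syntax.

```typescript
type MetricScope = 'plugin' | 'core';

// Assumption: loosely modeled on the OpenTelemetry instrument-name syntax
// (starts with a letter; then letters, digits, '_', '.', '-', '/'; max 255 chars).
const NAME_PATTERN = /^[A-Za-z][A-Za-z0-9_.\-\/]*$/;

// Hypothetical helper: joins the parts as backstage.{scope}.{scope_name}.{metric_name}.
function buildMetricName(
  scope: MetricScope,
  scopeName: string,
  metricName: string,
): string {
  const fullName = ['backstage', scope, scopeName, metricName].join('.');
  // Reject empty segments such as "backstage.plugin..count".
  const hasEmptySegment = fullName
    .split('.')
    .some(segment => segment.length === 0);
  if (hasEmptySegment || fullName.length > 255 || !NAME_PATTERN.test(fullName)) {
    throw new Error(`Invalid metric name: "${fullName}"`);
  }
  return fullName;
}

console.log(buildMetricName('plugin', 'catalog', 'entities.processed.total'));
console.log(buildMetricName('core', 'httpRouter', 'requests.total'));
```

Plugin authors would only ever supply the last part. The prefix is derived from the scope and the plugin or service name.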
Design Details
References
Integration with OpenTelemetry Auto-Instrumentation
The MetricsService complements rather than duplicates auto-instrumentation by focusing on application-level metrics that only Backstage can provide. For example, the catalog plugin may want to track the number of entities processed by the refresh operation and the kind of entity being processed.
// Auto-instrumentation provides (automatically):
// - http.server.requests.total{method="GET", route="/catalog/entities", status_code="200"}
// - http.server.request.duration{method="GET", route="/catalog/entities"}
// MetricsService provides (manually):
const entityMetrics = metricsService.createCounter('entities.processed.total');
entityMetrics.add(entities.length, { operation: 'refresh', kind: 'Component' });
// Metric is now available as `backstage.plugin.catalog.entities.processed.total`
Configuration
A complication of introducing only a MetricsService is that the OTEL SDK needs all of its other configuration (resources, tracing providers, views, and more) before it is started. If Backstage were to start the SDK, the MetricsService would therefore have to support the entire OTEL Node SDK configuration. In addition, the official recommendation from the OTEL team is to not initialize and start the SDK on behalf of the user.
For these reasons, this BEP does not include any configuration. Users are responsible for initializing the SDK based on the current guidance.
Interface
The MetricsService is a thin wrapper around OpenTelemetry's API and reuses the types from the @opentelemetry/api package. Its concepts are therefore already familiar to both the Backstage community and OpenTelemetry users.
import type {
  Counter,
  Gauge,
  Histogram,
  MetricOptions,
  ObservableCounter,
  ObservableGauge,
  ObservableUpDownCounter,
  UpDownCounter,
} from '@opentelemetry/api';

interface MetricsService {
  // Synchronous instrumentation
  createCounter(name: string, options?: MetricOptions): Counter;
  createUpDownCounter(name: string, options?: MetricOptions): UpDownCounter;
  createHistogram(name: string, options?: MetricOptions): Histogram;
  createGauge(name: string, options?: MetricOptions): Gauge;

  // Asynchronous instrumentation
  createObservableCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableCounter;
  createObservableUpDownCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableUpDownCounter;
  createObservableGauge(name: string, options?: MetricOptions): ObservableGauge;

  // Future - add additional convenience methods as we learn more about the needs of the framework
}
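The difference between the synchronous and asynchronous instruments is easiest to see with a small simulation. The sketch below uses simplified stand-ins for the callback-based shape of an observable gauge in `@opentelemetry/api`. `FakeObservableGauge`, its `collect()` method, and the task queue are hypothetical and exist only so the example runs without the SDK.

```typescript
// Simplified stand-ins for shapes found in `@opentelemetry/api`.
// Assumption: this fake only mimics the callback flow; it is not the real SDK.
type MetricAttributes = Record<string, string | number | boolean>;

interface ObservableResult {
  observe(value: number, attributes?: MetricAttributes): void;
}

type ObservableCallback = (result: ObservableResult) => void;

interface GaugeObservation {
  value: number;
  attributes: MetricAttributes;
}

class FakeObservableGauge {
  private readonly callbacks: ObservableCallback[] = [];

  addCallback(callback: ObservableCallback): void {
    this.callbacks.push(callback);
  }

  // Simulates one collection cycle, which a metric reader would normally trigger.
  collect(): GaugeObservation[] {
    const observations: GaugeObservation[] = [];
    const result: ObservableResult = {
      observe: (value, attributes = {}) => {
        observations.push({ value, attributes });
      },
    };
    for (const callback of this.callbacks) {
      callback(result);
    }
    return observations;
  }
}

// Hypothetical plugin state: a queue of pending task IDs.
const pendingTasks: string[] = ['task-1', 'task-2'];

const queueDepth = new FakeObservableGauge();
queueDepth.addCallback(result => {
  // The value is read at collection time, not when the queue changes.
  result.observe(pendingTasks.length, { queue: 'default' });
});

const firstCollection = queueDepth.collect();
pendingTasks.push('task-3');
const secondCollection = queueDepth.collect();

console.log(firstCollection.map(o => o.value), secondCollection.map(o => o.value));
```

A synchronous counter, by contrast, is updated inline by the code that performs the work, as in the `add(...)` calls shown elsewhere in this document.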
Root Metrics Service
The RootMetricsService is responsible for providing metrics to other root services and creating both plugin-scoped and core-scoped MetricsService instances.
interface RootMetricsService {
  forPlugin(pluginId: string): MetricsService;
  // The final implementation will be similar to:
  forService(serviceName: string, scope: 'plugin' | 'core'): MetricsService;
}

// TypeScript interfaces cannot declare static members, so forRoot() is a
// static factory on the implementation class (DefaultRootMetricsService).
// Note: no config is provided to the root service.

export const rootMetricsServiceFactory = createServiceFactory({
  // Depends on as little as possible so that it can be initialized as early as possible.
  service: rootMetricsServiceRef,
  deps: {},
  factory: () => {
    return DefaultRootMetricsService.forRoot();
  },
});
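To make the scoping behaviour concrete, here is a minimal sketch of how a root service could hand out prefixed instances. It is not the proposed implementation. `SketchRootMetricsService` is a hypothetical name, only `createCounter` is modeled, and values are accumulated in a `Map` instead of being sent to an OTEL meter.

```typescript
type MetricAttributes = Record<string, string | number | boolean>;

interface SketchCounter {
  add(value: number, attributes?: MetricAttributes): void;
}

// Only a subset of the proposed MetricsService interface, to keep the sketch short.
interface SketchMetricsService {
  createCounter(name: string): SketchCounter;
}

type MetricScope = 'plugin' | 'core';

// Assumption: totals are kept in a Map instead of being sent to an OTEL meter.
class SketchRootMetricsService {
  private readonly totals: Map<string, number>;

  private constructor(totals: Map<string, number>) {
    this.totals = totals;
  }

  static forRoot(
    totals: Map<string, number> = new Map<string, number>(),
  ): SketchRootMetricsService {
    return new SketchRootMetricsService(totals);
  }

  forPlugin(pluginId: string): SketchMetricsService {
    return this.forService(pluginId, 'plugin');
  }

  forService(serviceName: string, scope: MetricScope): SketchMetricsService {
    // Every instrument created through this instance shares the same prefix.
    const prefix = `backstage.${scope}.${serviceName}`;
    const totals = this.totals;
    return {
      createCounter(name: string): SketchCounter {
        const fullName = `${prefix}.${name}`;
        return {
          add(value: number): void {
            totals.set(fullName, (totals.get(fullName) ?? 0) + value);
          },
        };
      },
    };
  }
}

const recorded = new Map<string, number>();
const rootMetrics = SketchRootMetricsService.forRoot(recorded);

const catalogMetrics = rootMetrics.forPlugin('catalog');
catalogMetrics.createCounter('entities.processed.total').add(100);

const databaseMetrics = rootMetrics.forService('database', 'core');
databaseMetrics.createCounter('connections.opened.total').add(1);

console.log(Array.from(recorded.keys()));
```

The callers pass only the short metric name. The scope and the plugin or service name are fixed once, when the scoped instance is created.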
Plugin Metrics Service
Each plugin receives a metrics service that automatically namespaces all metrics to match the naming conventions.
const metricsServiceFactory = createServiceFactory({
  service: metricsServiceRef,
  deps: {
    rootMetrics: coreServices.rootMetrics,
    pluginMetadata: coreServices.pluginMetadata,
  },
  factory: ({ rootMetrics, pluginMetadata }) => {
    return rootMetrics.forPlugin(pluginMetadata.getId());
  },
});
Example
const entitiesProcessed = metricsService.createCounter(
  'entities.processed.total',
  {
    description: 'Total entities processed during refresh',
    unit: '{entity}',
  },
);
entitiesProcessed.add(100);
// ...
// metric is now available as `backstage.plugin.catalog.entities.processed.total`
Release Plan
- Create the new metrics-related services.
- Create alpha-related documentation to add to existing core service docs.
- Release the metrics service under @alpha.
- Mark all existing metrics implementations as deprecated.
- Refactor catalog and scaffolder plugins to use the new (alpha) MetricsService.
- Offer a migration path for existing adopters to migrate to the new metrics service.
- Release the metrics service under @public.
- Update remaining documentation to reference the new metrics service.
- Create follow-up action items to integrate the new metrics service into the core system.
- Fully deprecate all existing metrics implementations, such as the one-off Prometheus implementations.
Deprecation Plan
- Deprecation warnings are added to all existing metrics
- New metrics will run in parallel with the deprecated ones for a period of time
- All existing metrics are removed from the codebase in the next major version
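The proposal does not prescribe how the parallel period is implemented. One possible approach, sketched here under that assumption, is a small wrapper that forwards each measurement to both the deprecated and the new instrument. `createDualWriteCounter` and `RecordingCounter` are hypothetical names used only for illustration.

```typescript
interface SketchCounter {
  add(value: number): void;
}

// Hypothetical in-memory counter, used only to make the sketch observable.
class RecordingCounter implements SketchCounter {
  readonly metricName: string;
  total = 0;

  constructor(metricName: string) {
    this.metricName = metricName;
  }

  add(value: number): void {
    this.total += value;
  }
}

// Forwards every measurement to both the deprecated and the replacement counter.
function createDualWriteCounter(
  deprecated: SketchCounter,
  replacement: SketchCounter,
): SketchCounter {
  return {
    add(value: number): void {
      deprecated.add(value);
      replacement.add(value);
    },
  };
}

const legacyCounter = new RecordingCounter('catalog.processed.entities.count');
const namespacedCounter = new RecordingCounter(
  'backstage.plugin.catalog.entities.processed.total',
);
const entitiesProcessedCounter = createDualWriteCounter(
  legacyCounter,
  namespacedCounter,
);

entitiesProcessedCounter.add(3);
entitiesProcessedCounter.add(2);

console.log(legacyCounter.total, namespacedCounter.total);
```

Once the deprecated metric is removed, call sites would keep using the same counter variable and only the wrapper would be dropped.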
Dependencies
- The OTEL SDK MUST be initialized as early as possible to prevent dependents from receiving no-op meters. We will not change the current guidance on this.
- There are one-off implementations of metrics in the wild that may conflict with the proposed service. However, this is unlikely to be a problem, as the SDK should continue to collect the metrics they emit.
Alternatives
- Plugin authors continue to implement their own metrics as they see fit.
- A combined TelemetryService that provides both metrics and tracing.