* feat: add MetricsService alpha release Introduces MetricsService as a new @alpha core service wrapping @opentelemetry/api. Includes migration of existing catalog metrics to use the new service. Signed-off-by: benjdlambert <ben@blam.sh> * chore: duplicate otel types, add plugin-scoped factory and tests Signed-off-by: benjdlambert <ben@blam.sh> * chore: update BEP-0012 metrics service design Signed-off-by: Kurt King <kurtaking@gmail.com> * chore: address PR feedback from freben and rugvip Rename instrument types with MetricsService prefix for namespace clarity, move config to backend.metrics.plugin.{pluginId}, add config.d.ts schema, and improve factory test assertions. Signed-off-by: benjdlambert <ben@blam.sh> --------- Signed-off-by: benjdlambert <ben@blam.sh> Signed-off-by: Kurt King <kurtaking@gmail.com> Co-authored-by: Kurt King <kurtaking@gmail.com>
| title | status | authors | owners | project-areas | creation-date |
|---|---|---|---|---|---|
| Backstage Metrics Service | implementable | | | | 2025-06-23 |
Table of Contents
- Summary
- Motivation
- Proposal
- Naming Conventions
- Design Details
- Release Plan
- Dependencies
- Alternatives
Summary
Add a core MetricsService to Backstage's framework to provide a unified interface for metrics instrumentation. The service builds on the industry standard (OpenTelemetry, abbreviated OTEL) while keeping the MetricsService focused on Backstage-specific concerns, following the same pattern as other core services (DatabaseService builds on Knex, LoggerService builds on Winston, HttpRouterService builds on Express, etc.).
Motivation
While individual plugins may implement their own metrics, there is no standardized approach, which leads to inconsistent metrics patterns across the ecosystem and incompatibility with OpenTelemetry semantic conventions. For example, a plugin implementing MCP functionality might incorrectly namespace metrics as backstage_mcp_client_duration when OpenTelemetry semantic conventions explicitly define mcp.client.operation.duration as the standard.
By providing a core metrics service:
- Plugin Authors and the Community gain a straightforward way to address metrics instrumentation and can focus on business logic instead of needing to reimplement metrics plumbing.
- Backstage Admins receive a reliable stream of metrics from the core system for monitoring, alerting, and troubleshooting.
Goals
- Plugin identification via OpenTelemetry Instrumentation Scope
- Consistent metrics patterns across all plugins
- Aligned with OpenTelemetry industry standards
- Provide an interface that feels familiar from other core services
The catalog and scaffolder plugins will be updated to use the new metrics service in the initial alpha release.
Non-Goals
- Providing a way to configure the OpenTelemetry SDK. This is out of scope for this BEP.
- Adding metrics to plugins missing existing metrics (outside of catalog and scaffolder)
- Tracing and other telemetry concerns are out of scope for this BEP.
- Refactoring the existing LoggerService. Future work to unify observability related concerns would be ideal, but not a goal.
Proposal
Following similar patterns to other core services, create a new RootMetricsService responsible for root-level concerns and the creation of plugin-specific MetricsService instances.
Naming Conventions
All Backstage metrics follow this hierarchical pattern:
backstage.{scope}.{scope_name}.{metric_name}
Where:
- backstage is the root namespace for all Backstage metrics
- {scope} is the system scope (either plugin or core)
- {scope_name} is the name of the plugin or core service (e.g., catalog, scaffolder, database, scheduler)
- {metric_name} is the hierarchical metric name as provided by the plugin author (e.g., entity.count, tasks.completed.total)
Scope
The scope represents where the metric belongs in the Backstage ecosystem.
- plugin - A plugin-specific metric (e.g. backstage.plugin.catalog.entity.count)
- core - A metric provided by the core system (e.g. backstage.core.database.connections.active)
Plugin-Scoped Metrics
Pattern: backstage.plugin.{pluginId}.{metric_name}
# Examples
backstage.plugin.catalog.entities.processed.total
backstage.plugin.scaffolder.tasks.completed.total
backstage.plugin.techdocs.builds.active
backstage.plugin.auth.sessions.active.total # todo: technically a core service and a backend plugin
Core-Scoped Metrics
Pattern: backstage.core.{service}.{metric_name}
# Examples
backstage.core.database.connections.active
backstage.core.scheduler.tasks.queued.total
backstage.core.httpRouter.requests.total
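For metrics that follow the Backstage hierarchy above, the pattern can be sketched as a small name builder. This is only an illustration: `buildMetricName`, `MetricScope`, and the two validation patterns are hypothetical and not part of the proposed API, and the allowed character sets are assumptions chosen to accept the examples in this section (including `httpRouter`).

```typescript
// Hypothetical helper, not part of the proposed MetricsService API.
type MetricScope = 'plugin' | 'core';

// Assumed character sets: scope names may contain letters, digits, '_' and '-';
// metric names are lowercase segments separated by dots.
const SCOPE_NAME = /^[A-Za-z][A-Za-z0-9_-]*$/;
const METRIC_NAME = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*$/;

// Builds backstage.{scope}.{scope_name}.{metric_name}, rejecting malformed parts.
function buildMetricName(
  scope: MetricScope,
  scopeName: string,
  metricName: string,
): string {
  if (!SCOPE_NAME.test(scopeName)) {
    throw new Error(`Invalid scope name: ${scopeName}`);
  }
  if (!METRIC_NAME.test(metricName)) {
    throw new Error(`Invalid metric name: ${metricName}`);
  }
  return `backstage.${scope}.${scopeName}.${metricName}`;
}

console.log(buildMetricName('plugin', 'catalog', 'entities.processed.total'));
console.log(buildMetricName('core', 'database', 'connections.active'));
```

Validating at construction time is one possible design choice; it surfaces naming mistakes when the instrument is created rather than when a dashboard query silently returns nothing.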
Design Details
References
Integration with OpenTelemetry Auto-Instrumentation
The MetricsService complements rather than duplicates auto-instrumentation by focusing on application-level metrics that only Backstage can provide. For example, the catalog plugin may want to track the number of entities processed by the refresh operation and the kind of entity being processed.
// Auto-instrumentation provides (automatically):
// - http.server.requests.total{method="GET", route="/catalog/entities", status_code="200"}
// - http.server.request.duration{method="GET", route="/catalog/entities"}
// MetricsService provides (manually):
const entityMetrics = metricsService.createCounter('entities.processed.total');
entityMetrics.add(entities.length, {
  operation: 'refresh',
  'entity.kind': 'Component',
});
// Metric is now available as `entities.processed.total`
Configuration
Introducing only a MetricsService is challenging because other OTEL-related configuration, such as resources, tracing providers, views, and more, must be collected before the SDK starts. Initializing the SDK on the user's behalf would therefore require supporting all OTEL Node SDK configuration along with the MetricsService. In addition, the official recommendation from the OTEL team is not to initialize and start the SDK on behalf of the user.
For these reasons, this BEP does not include any configuration. Users are responsible for initializing the SDK based on the current guidance.
Interface
Provide a wrapper around OpenTelemetry's API while leveraging the types from the @opentelemetry/api package. This introduces concepts already familiar to both the Backstage community and those familiar with OpenTelemetry.
interface MetricsService {
  // Synchronous instrumentation
  createCounter(name: string, options?: MetricOptions): Counter;
  createUpDownCounter(name: string, options?: MetricOptions): UpDownCounter;
  createHistogram(name: string, options?: MetricOptions): Histogram;
  createGauge(name: string, options?: MetricOptions): Gauge;

  // Asynchronous instrumentation
  createObservableCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableCounter;
  createObservableUpDownCounter(
    name: string,
    options?: MetricOptions,
  ): ObservableUpDownCounter;
  createObservableGauge(name: string, options?: MetricOptions): ObservableGauge;

  // Future - add additional convenience methods as we learn more about the needs of the framework
}
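As a rough illustration of the wrapper idea, the sketch below forwards a single `createCounter` call to an underlying meter. Everything in it is an assumption made so that it runs without installing any package: `MeterLike`, `InMemoryMeter`, and `SketchMetricsService` are hypothetical names, and the small interfaces are trimmed structural stand-ins for the corresponding `@opentelemetry/api` types rather than the real ones.

```typescript
// Trimmed stand-ins for types that would normally come from @opentelemetry/api
// (assumption: reduced to the members this sketch needs).
type Attributes = Record<string, string | number | boolean>;
interface MetricOptions {
  description?: string;
  unit?: string;
}
interface Counter {
  add(value: number, attributes?: Attributes): void;
}
interface MeterLike {
  createCounter(name: string, options?: MetricOptions): Counter;
}

// In-memory meter used only to observe what the wrapper forwards.
class InMemoryMeter implements MeterLike {
  readonly totals = new Map<string, number>();

  createCounter(name: string): Counter {
    return {
      add: (value: number) => {
        this.totals.set(name, (this.totals.get(name) ?? 0) + value);
      },
    };
  }
}

// Hypothetical wrapper: passes names and options through unchanged.
class SketchMetricsService {
  constructor(private readonly meter: MeterLike) {}

  createCounter(name: string, options?: MetricOptions): Counter {
    return this.meter.createCounter(name, options);
  }
}

const meter = new InMemoryMeter();
const service = new SketchMetricsService(meter);
const processed = service.createCounter('entities.processed.total', {
  unit: '{entity}',
});
processed.add(100, { 'entity.kind': 'Component' });
processed.add(5);
console.log(meter.totals.get('entities.processed.total'));
```

The remaining instrument types would presumably follow the same delegation shape; they are left out here to keep the sketch short.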
Plugin Metrics Service
Each plugin receives a metrics service that automatically configures the Instrumentation Scope to identify the plugin. The scope name follows the pattern backstage-plugin-{pluginId}.
export const metricsServiceFactory = createServiceFactory({
  service: coreServices.metrics,
  deps: {
    pluginMetadata: coreServices.pluginMetadata,
  },
  factory: ({ pluginMetadata }) => {
    const pluginId = pluginMetadata.getId();
    const scopeName = `backstage-plugin-${pluginId}`;
    return new DefaultMetricsService(scopeName, version, ...);
  },
});
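The relationship between a root-level service and plugin-scoped instances could look roughly like the following. This is a sketch under stated assumptions: `SketchRootMetricsService`, `forPlugin`, and `ScopedMetrics` are hypothetical names, and caching one instance per plugin id is an assumption that the proposal does not spell out. Only the `backstage-plugin-{pluginId}` scope name comes from the text above.

```typescript
// Hypothetical shape of a plugin-scoped handle; only the scope name is modeled.
interface ScopedMetrics {
  readonly scopeName: string;
}

// Hypothetical root service that hands out one scoped instance per plugin.
class SketchRootMetricsService {
  private readonly byPlugin = new Map<string, ScopedMetrics>();

  // Assumption: instances are cached so repeated lookups share one scope.
  forPlugin(pluginId: string): ScopedMetrics {
    let scoped = this.byPlugin.get(pluginId);
    if (!scoped) {
      scoped = { scopeName: `backstage-plugin-${pluginId}` };
      this.byPlugin.set(pluginId, scoped);
    }
    return scoped;
  }
}

const root = new SketchRootMetricsService();
const catalogMetrics = root.forPlugin('catalog');
console.log(catalogMetrics.scopeName);
```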
Example
const entitiesProcessed = metricsService.createCounter(
  'entities.processed.total',
  {
    description: 'Total entities processed during refresh',
    unit: '{entity}',
  },
entitiesProcessed.add(100);
// ...
// metric is now available as `backstage.plugin.catalog.entities.processed.total`
Release Plan
- Create the new metrics-related services.
- Create alpha-related documentation to add to existing core service docs.
- Release the metrics service under @alpha.
- Mark all existing metrics implementations as deprecated.
- Refactor catalog and scaffolder plugins to use the new (alpha) MetricsService.
- Offer a migration path for existing adopters to migrate to the new metrics service.
- Release the metrics service under @public
- Update remaining documentation to reference the new metrics service.
- Create follow-up action items to integrate the new metrics service into the core system.
- Fully deprecate all existing metrics implementations like the existing Prometheus one-off implementations.
Deprecation Plan
- Deprecation warnings are added to all existing metrics
- New metrics will run in parallel with the deprecated ones for a period of time
- All existing metrics are removed from the codebase in the next major version
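One way to picture the parallel-run period is a counter that forwards every increment to both the deprecated instrument and its replacement. The sketch below is hypothetical: `dualCounter`, `CounterLike`, and `RecordingCounter` are illustrative names, and the proposal does not prescribe this mechanism.

```typescript
// Minimal counter shape shared by the deprecated and the new instrument.
interface CounterLike {
  add(value: number): void;
}

// Simple recording counter so the effect of dual emission is observable.
class RecordingCounter implements CounterLike {
  total = 0;

  add(value: number): void {
    this.total += value;
  }
}

// Hypothetical helper: every increment goes to both counters, so dashboards
// built on either metric keep working during the migration window.
function dualCounter(legacy: CounterLike, next: CounterLike): CounterLike {
  return {
    add: (value: number) => {
      legacy.add(value);
      next.add(value);
    },
  };
}

const legacyCounter = new RecordingCounter();
const nextCounter = new RecordingCounter();
const tasksCompleted = dualCounter(legacyCounter, nextCounter);
tasksCompleted.add(1);
tasksCompleted.add(2);
console.log(legacyCounter.total, nextCounter.total);
```

Once the deprecated metric is removed, call sites written against `CounterLike` could switch to the new counter directly without further changes.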
Dependencies
- The otel SDK MUST BE initialized as EARLY as possible to prevent dependents from receiving no-op meters; we will not change the current guidance on this.
- There are one-off implementations of metrics in the wild that may conflict with the proposed service. However, this is unlikely to be a problem as the SDK should continue to pick things up.
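The initialization-order concern can be illustrated with a simplified model. This is not the actual `@opentelemetry/api` implementation: `getProvider`, `RecordingProvider`, and the other names are hypothetical, and the model only assumes what the bullet above states, namely that an instrument obtained before the SDK registers a provider is a no-op.

```typescript
interface CounterLike {
  add(value: number): void;
}
interface MeterProviderLike {
  createCounter(name: string): CounterLike;
}

// Fallback used while no provider has been registered; drops all values.
const NOOP_PROVIDER: MeterProviderLike = {
  createCounter: () => ({ add: () => {} }),
};

// Stand-in for a provider registered by an initialized SDK.
class RecordingProvider implements MeterProviderLike {
  readonly totals = new Map<string, number>();

  createCounter(name: string): CounterLike {
    return {
      add: (value: number) => {
        this.totals.set(name, (this.totals.get(name) ?? 0) + value);
      },
    };
  }
}

// Simplified global registry (assumption: models the fallback behavior only).
let globalProvider: MeterProviderLike | undefined;
const getProvider = (): MeterProviderLike => globalProvider ?? NOOP_PROVIDER;

// Created before the "SDK" starts: permanently bound to the no-op provider.
const early = getProvider().createCounter('tasks.completed.total');

const sdkProvider = new RecordingProvider();
globalProvider = sdkProvider; // stands in for SDK initialization

// Created after the "SDK" starts: bound to the recording provider.
const late = getProvider().createCounter('tasks.completed.total');

early.add(1); // silently dropped
late.add(1); // recorded
console.log(sdkProvider.totals.get('tasks.completed.total'));
```

In this model only the increment made through `late` is recorded, which is why the SDK setup needs to run before any plugin code creates instruments.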
Alternatives
- Plugin authors continue to implement their own metrics as they see fit.
- A combined TelemetryService that provides both metrics and tracing.
Rejected: Forced Namespace Prefixes
Prepend backstage.plugin.{pluginId}. to all metric names. This was the original proposal but conflicts with OpenTelemetry semantic conventions.
Problems:
- Makes it impossible to use standard semantic conventions like mcp.*, gen_ai.*, http.*
- Breaks compatibility with industry-standard observability tooling
- Prevents cross-service metric aggregation
- Goes against OpenTelemetry best practices and official guidance
Example of conflict:
// Plugin wants to emit: mcp.client.operation.duration
// Framework forces: backstage.plugin.mcp-actions.mcp.client.operation.duration
// This violates the semantic convention and breaks tooling