Rework catalog configuration; add integrations documentation

Signed-off-by: Tim Hansen <timbonicus@gmail.com>
Tim Hansen
2021-03-23 14:34:52 -06:00
parent 5da3fddb42
commit cc3dca079e
9 changed files with 320 additions and 106 deletions
@@ -0,0 +1,52 @@
---
id: discovery
title: GitHub Discovery
sidebar_label: Discovery
description: Documentation on GitHub organization discovery
---
The GitHub integration has a special discovery processor that finds catalog
entities within a GitHub organization. The processor crawls the organization
and registers entities matching the configured path. This can be a useful
alternative to static locations or manually adding things to the catalog.
To use the discovery processor, you'll need a GitHub integration
[set up](locations.md) with a `GITHUB_TOKEN`. Then you can add a location target
to the catalog configuration:
```yaml
catalog:
  locations:
    - type: github-discovery
      target: https://github.com/myorg/service-*/blob/main/catalog-info.yaml
```
Note the `github-discovery` type, as this is not a regular `url` processor.
The target is composed of three parts:
- The base organization URL, `https://github.com/myorg` in this case
- The repository name pattern to scan, which accepts `*` wildcard tokens. This
can simply be `*` to scan all repositories in the organization. This example
only looks for repositories prefixed with `service-`.
- The path within each repository to find the catalog YAML file. This will
usually be `/blob/main/catalog-info.yaml`, `/blob/master/catalog-info.yaml` or
a similar variation for catalog files stored in the root directory of each
repository.
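
The three parts of the target can be illustrated with a short TypeScript sketch. This is not the processor's actual implementation: `parseDiscoveryTarget` and `repoMatches` are hypothetical helpers, and the sketch assumes that `*` is the only wildcard and that it matches any run of characters in a repository name.

```typescript
// Hypothetical helpers for illustration; not part of Backstage.
interface DiscoveryTarget {
  orgUrl: string; // e.g. https://github.com/myorg
  repoPattern: string; // e.g. service-*
  catalogPath: string; // e.g. /blob/main/catalog-info.yaml
}

// Split a discovery target into organization URL, repository pattern and path.
function parseDiscoveryTarget(target: string): DiscoveryTarget {
  const match = /^(https?:\/\/[^/]+\/[^/]+)\/([^/]+)(\/.+)$/.exec(target);
  const [, orgUrl = '', repoPattern = '', catalogPath = ''] = match ?? [];
  if (!orgUrl || !repoPattern || !catalogPath) {
    throw new Error(`Expected <org URL>/<repo pattern>/<path>, got: ${target}`);
  }
  return { orgUrl, repoPattern, catalogPath };
}

// Assumption: `*` matches any run of characters; everything else is literal.
function repoMatches(repoPattern: string, repoName: string): boolean {
  const source = repoPattern
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${source}$`).test(repoName);
}
```

With the example target above, `parseDiscoveryTarget` would yield the organization URL `https://github.com/myorg`, the pattern `service-*` and the path `/blob/main/catalog-info.yaml`. Under the stated assumption, `repoMatches('service-*', 'service-api')` is true and `repoMatches('service-*', 'website')` is false.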
## GitHub API Rate Limits
GitHub
[rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting)
API requests to 5,000 per hour (or more for Enterprise accounts). The default
Backstage catalog backend refreshes data every 100 seconds, and each refresh
issues an API request for each discovered location.
At 36 refreshes per hour, this means that if you have more than ~140 catalog
entities (5,000 / 36 ≈ 139), you may get throttled by rate limiting. This
should be resolved once catalog refreshes make use of ETags; to work around
this in the meantime, you can change the refresh rate of the catalog in your
`packages/backend/src/plugins/catalog.ts` file.
This applies to any method of adding GitHub entities to the catalog, but the
limit is especially easy to hit with automatic discovery.
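
To estimate how a different refresh interval changes that threshold, the arithmetic can be sketched in TypeScript. The function name is hypothetical, and the sketch assumes exactly one API request per location per refresh and no other consumers of the same token.

```typescript
// Illustrative arithmetic only; not a Backstage API.
// Assumes one API request per location on every refresh.
function maxLocationsWithinRateLimit(
  requestsPerHour: number,
  refreshIntervalSeconds: number,
): number {
  const refreshesPerHour = 3600 / refreshIntervalSeconds;
  return Math.floor(requestsPerHour / refreshesPerHour);
}

// Default 100 second interval: 36 refreshes per hour.
const atDefaultInterval = maxLocationsWithinRateLimit(5000, 100); // 138

// A 10 minute interval: 6 refreshes per hour.
const atTenMinutes = maxLocationsWithinRateLimit(5000, 600); // 833
```

Under these assumptions, lengthening the interval from 100 seconds to 10 minutes raises the number of locations that fit within the limit from about 140 to about 830, at the cost of slower updates.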
@@ -0,0 +1,60 @@
---
id: locations
title: GitHub Locations
sidebar_label: Locations
description: Documentation on GitHub location integration
---
The GitHub integration supports loading catalog entities from github.com or
GitHub Enterprise. Components can be added to
[static catalog configuration](../../features/software-catalog/configuration.md),
registered with the
[catalog-import](https://github.com/backstage/backstage/tree/master/plugins/catalog-import)
plugin, or [discovered](discovery.md) from a GitHub organization. Users and
Groups can also be [loaded from an organization](org.md).
## Configuration
To use this integration, add configuration to your root `app-config.yaml`:
```yaml
integrations:
  github:
    - host: github.com
      token:
        $env: GITHUB_TOKEN
    - host: ghe.example.net
      apiBaseUrl: https://ghe.example.net/api/v3
      rawBaseUrl: https://ghe.example.net/raw
      token:
        $env: GHE_TOKEN
```
> Note: A public GitHub provider is added automatically at startup for
> convenience, so you only need to list it if you want to supply a
> [token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token).
Directly under the `github` key is a list of provider configurations, where you
can list the various GitHub-compatible providers you want to be able to fetch
data from. Each entry is a structure with up to four elements:
- `host` (optional): The host of the location target that you want to match on.
The default host is `github.com`.
- `token` (optional): An authentication token as expected by GitHub. If
supplied, it will be passed along with all calls to this provider, both API
and raw. If it is not supplied, anonymous access will be used.
- `apiBaseUrl` (optional): If you want to communicate using the APIv3 method
with this provider, specify the base URL for its endpoint here, with no
trailing slash. Specifically when the target is public GitHub, you can leave
it out to be inferred automatically. For a GitHub Enterprise installation, it
is commonly at `https://api.<host>` or `https://<host>/api/v3`.
- `rawBaseUrl` (optional): If you want to communicate using the raw HTTP method
with this provider, specify the base URL for its endpoint here, with no
trailing slash. Specifically when the target is public GitHub, you can leave
it out to be inferred automatically. For a GitHub Enterprise installation, it
is commonly at `https://<host>/raw`, as in the example above.
You need to supply either `apiBaseUrl` or `rawBaseUrl` or both (except for
public GitHub, for which we can infer them). The `apiBaseUrl` will always be
preferred over the other if a `token` is given, otherwise `rawBaseUrl` will be
preferred.
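
The preference rule can be sketched as follows. This is a minimal illustration rather than the integration's real code: the `GitHubProviderConfig` shape mirrors the four keys listed above, `chooseEndpoint` is a hypothetical name, and the public GitHub defaults (`https://api.github.com` and `https://raw.githubusercontent.com`) are assumptions about what gets inferred.

```typescript
// Hypothetical types and function for illustration; not the Backstage API.
interface GitHubProviderConfig {
  host?: string;
  token?: string;
  apiBaseUrl?: string;
  rawBaseUrl?: string;
}

type Endpoint = { kind: 'api' | 'raw'; baseUrl: string };

function chooseEndpoint(config: GitHubProviderConfig): Endpoint {
  const host = config.host ?? 'github.com';
  let { apiBaseUrl, rawBaseUrl } = config;

  // Assumption: these are the base URLs inferred for public GitHub.
  if (host === 'github.com') {
    apiBaseUrl = apiBaseUrl ?? 'https://api.github.com';
    rawBaseUrl = rawBaseUrl ?? 'https://raw.githubusercontent.com';
  }

  const api: Endpoint | undefined = apiBaseUrl
    ? { kind: 'api', baseUrl: apiBaseUrl }
    : undefined;
  const raw: Endpoint | undefined = rawBaseUrl
    ? { kind: 'raw', baseUrl: rawBaseUrl }
    : undefined;

  // With a token the API endpoint is preferred; without one, the raw endpoint.
  // Whichever is preferred, fall back to the other if only that one is set.
  const chosen = config.token ? (api ?? raw) : (raw ?? api);
  if (!chosen) {
    throw new Error(`Provider ${host} needs apiBaseUrl or rawBaseUrl`);
  }
  return chosen;
}
```

For the `ghe.example.net` entry in the configuration above, this sketch would pick the API endpoint because a token is present; removing the token would make it pick `https://ghe.example.net/raw` instead.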