Software Catalog Management
Maintain and evolve a Backstage software catalog based on the Backstage System Model.
This skill helps you create, update, and organize catalog entities following Backstage best practices.
Overview
The Backstage catalog models software using these core concepts:
Core Entities:
- •Components - Individual pieces of software (services, websites, libraries)
- •APIs - Boundaries between components (OpenAPI, GraphQL, gRPC, AsyncAPI)
- •Resources - Infrastructure needed to operate components (databases, queues, buckets)
Organizational Entities:
- •Users - People (employees, contractors)
- •Groups - Teams, business units, or interest groups
Ecosystem Modeling:
- •Systems - Collections of components/APIs that cooperate to perform a function
- •Domains - Business-aligned groupings of systems (bounded contexts)
When to Use This Skill
Use this skill when you need to:
- •Add a new component, API, or system to the catalog
- •Update entity metadata, ownership, or relationships
- •Reorganize the catalog structure (systems, domains)
- •Define API contracts and their providers/consumers
- •Model organizational structure (groups, teams)
- •Document dependencies and relationships
- •Validate catalog consistency
Automated Catalog Generation
IMPORTANT: Always look for opportunities to generate catalog entities automatically from existing project metadata. This ensures the catalog stays synchronized with the actual codebase and reduces manual maintenance burden.
When to Generate Catalog Entities
You should proactively suggest creating catalog generation automation when you encounter:
Dependency Management Files:
- •devbox.json - Development environment packages (existing example in this repo)
- •package.json - Node.js/npm dependencies
- •go.mod - Go module dependencies
- •requirements.txt / pyproject.toml - Python dependencies
- •Gemfile - Ruby dependencies
- •pom.xml / build.gradle - Java/Maven/Gradle dependencies
- •Cargo.toml - Rust dependencies
- •composer.json - PHP dependencies
Infrastructure as Code:
- •Terraform modules (.tf files) - Generate Resource entities
- •CloudFormation templates - Generate Resource entities
- •Kubernetes manifests - Generate Component and Resource entities
- •Helm charts - Generate System and Component entities
- •CDK constructs - Generate Resource entities from stacks
API Definitions:
- •OpenAPI/Swagger specs (openapi.yaml, swagger.json) - Generate API entities
- •GraphQL schemas (schema.graphql) - Generate API entities
- •gRPC proto files (*.proto) - Generate API entities
- •AsyncAPI specs - Generate API entities
Service Definitions:
- •Docker Compose files (docker-compose.yml) - Generate Components and Resources
- •Kubernetes Deployments - Generate Components
- •Service mesh configs (Istio, Linkerd) - Generate Components and APIs
Repository Metadata:
- •CODEOWNERS files - Infer ownership for Components
- •GitHub Actions workflows - Document CI/CD integrations
- •Monorepo structure - Generate multiple Components from subdirectories
Configuration Files:
- •Backstage entity files in subdirectories - Aggregate into main catalog
- •Microservice registry files - Generate Components
- •Service discovery configs - Generate Components and APIs
Catalog Generation Workflow
When you identify a generation opportunity, follow this pattern:
- •Detect the source
  - •Scan for dependency/config files in the project
  - •Example: find . -name "package.json" -not -path "*/node_modules/*"
- •Parse the source
  - •Extract relevant metadata (package names, versions, types)
  - •Example: jq '.dependencies | keys[]' package.json
- •Generate catalog entities
  - •Create Component entities for dependencies
  - •Create Resource entities for infrastructure
  - •Create API entities for specifications
  - •Tag with source metadata
  - •Add annotations linking back to source
- •Create automation script
  - •Add generator script to scripts/ directory
  - •Make it executable and well-documented
  - •Include error handling and validation
- •Integrate with build system
  - •Add to CI/CD pipeline or Task runner (Taskfile.yaml)
  - •Schedule regular regeneration
  - •Track changes in version control
- •Document the process
  - •Add README explaining automation
  - •Document how to regenerate manually
  - •Note source of truth (the original file)
Reference: Existing Devbox Pattern
The repository already has automation for parsing devbox.json and generating Component entities for development tools. Use this as a reference template for creating other dependency generators.
What the devbox generator does:
- •Parses devbox.json packages array
- •Creates Component entities for each development tool
- •Tags them with appropriate metadata
- •Links to upstream documentation
- •Organizes under the local-dev system
Apply this same pattern to:
- •package.json → npm dependencies
- •go.mod → Go module dependencies
- •requirements.txt → Python packages
- •Cargo.toml → Rust crates
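As an illustration of carrying the pattern over to requirements.txt, the sketch below turns requirement lines into candidate entity names. It is a rough starting point and does not come from the existing devbox generator: the helper names (requirement_name, pypi_entity_name, list_python_entities) and the "pypi-" prefix are assumptions chosen to mirror the npm- and go- prefixes used elsewhere in this document, and the parsing covers only simple lines (no URLs or editable installs).

```shell
#!/bin/bash
# Sketch: derive catalog entity names from requirements.txt lines.
# Helper names and the "pypi-" prefix are hypothetical.

# Print the package name from one requirement line; print nothing for
# comments, blank lines, and pip options such as "-r other.txt".
requirement_name() {
  local line
  line=$(printf '%s\n' "$1" | sed -E 's/#.*$//; s/^[[:space:]]+//')
  case "$line" in
    ''|-*) return 0 ;;
  esac
  # The name ends at the first extras bracket, version operator, marker, or space.
  printf '%s\n' "$line" | sed -E 's/[][<>=!~;[:space:]].*$//'
}

# Lowercase the name and collapse non-alphanumeric runs into single dashes.
pypi_entity_name() {
  printf 'pypi-%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/-+$//'
}

# Print one entity name per dependency found in the given file.
list_python_entities() {
  local file="${1:-requirements.txt}" line name
  while IFS= read -r line || [ -n "$line" ]; do
    name=$(requirement_name "$line")
    if [ -n "$name" ]; then
      pypi_entity_name "$name"
    fi
  done < "$file"
}
```

A full generator would loop over list_python_entities and write one entity file per name, in the same way the npm and Go generators in this document do.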
Example Generators
Node.js Dependencies (package.json)
When you encounter package.json, create a generator:
Common Workflows
Workflow 1: Adding a New Component
Input: Component details (name, type, owner, etc.)
Steps:
- •Gather component information
  - •Ask for: name, description, type, lifecycle, owner
  - •Identify: system membership, APIs provided/consumed, dependencies
- •Determine catalog location
  - •If component has its own repo: create catalog-info.yaml in repo root
  - •If multi-component repo: create in component subdirectory
  - •If centralized catalog: add to central catalog file
- •Create catalog entity
  apiVersion: backstage.io/v1alpha1
  kind: Component
  metadata:
    name: <kebab-case-name>
    description: <brief-description>
    tags:
      - <language>
      - <framework>
    annotations:
      github.com/project-slug: <org>/<repo>
  spec:
    type: <service|website|library>
    lifecycle: <experimental|production|deprecated>
    owner: <team-name>
    system: <system-name> # optional
    providesApis:
      - <api-name> # optional
    consumesApis:
      - <api-name> # optional
    dependsOn:
      - component:<component-name>
      - resource:<resource-name>
- •Validate relationships
  - •Ensure owner (Group) exists in catalog
  - •Ensure system exists if specified
  - •Ensure referenced APIs exist
  - •Ensure dependencies are valid entity references
- •Add well-known annotations (as applicable)
  - •github.com/project-slug - GitHub repository
  - •backstage.io/techdocs-ref - TechDocs location
  - •sonarqube.org/project-key - SonarQube project
  - •pagerduty.com/service-id - PagerDuty service
  - •sentry.io/project-slug - Sentry project
- •Commit and register
  - •Commit the catalog file
  - •Register location in Backstage (if needed)
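The checks in the "Validate relationships" step can be partly scripted before committing. The sketch below is one possible approach, not Backstage tooling: list_entities and entity_exists are hypothetical helper names, and the awk scan assumes two-space YAML indentation instead of using a real YAML parser, so treat it as a quick pre-commit aid only.

```shell
#!/bin/bash
# Hypothetical helpers: list the entities declared in catalog files and
# check whether a referenced entity exists among them.

# Print one "kind/name" line (kind lowercased) per entity document.
list_entities() {
  awk '
    function flush() {
      if (kind != "" && name != "") print tolower(kind) "/" name
      kind = ""; name = ""; in_meta = 0
    }
    FNR == 1     { flush() }          # a new file starts a new document
    /^---/       { flush(); next }    # YAML document separator
    /^kind:/     { kind = $2 }
    /^metadata:/ { in_meta = 1; next }
    /^[^ #]/     { in_meta = 0 }      # any other top-level key ends metadata
    in_meta && /^  name:/ { name = $2 }
    END          { flush() }
  ' "$@"
}

# Succeed if an entity with the given kind and name is declared in the files.
# Usage: entity_exists <kind> <name> <file>...
entity_exists() {
  local kind name
  kind=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  name="$2"
  shift 2
  list_entities "$@" | grep -qxF "$kind/$name"
}
```

For example, `entity_exists Group payments-team catalog/*.yaml || echo "owner missing"` would flag an owner that has no Group entity yet.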
Workflow 2: Defining an API
Input: API details and specification
Steps:
- •Gather API information
  - •Ask for: name, type (openapi/graphql/grpc/asyncapi), lifecycle, owner
  - •Obtain: API definition/specification
  - •Identify: which components provide/consume it
- •Create API entity
  apiVersion: backstage.io/v1alpha1
  kind: API
  metadata:
    name: <api-name>
    description: <api-description>
  spec:
    type: <openapi|graphql|grpc|asyncapi>
    lifecycle: <experimental|production|deprecated>
    owner: <team-name>
    system: <system-name> # optional
    definition: |
      <api-spec-content>
    # Or use $text substitution:
    # definition:
    #   $text: ./api-spec.yaml
- •Link to components
  - •Update components that provide this API: add to providesApis
  - •Update components that consume this API: add to consumesApis
- •Validate API spec
  - •Ensure definition is valid for the specified type
  - •Consider using $text or $yaml substitution for external files
Workflow 3: Creating a System
Input: System details and component membership
Steps:
- •Define system scope
  - •Ask for: name, description, owner, domain
  - •Identify: components that belong to this system
  - •Identify: public APIs the system exposes
- •Create system entity
  apiVersion: backstage.io/v1alpha1
  kind: System
  metadata:
    name: <system-name>
    description: <system-description>
  spec:
    owner: <team-name>
    domain: <domain-name> # optional
- •Update component memberships
  - •For each component in the system: add system: <system-name> to spec
  - •This creates partOf relations automatically
- •Document system boundaries
  - •Ensure public APIs are defined
  - •Private/internal APIs can remain implicit
Workflow 4: Organizing with Domains
Input: Domain structure and business alignment
Steps:
- •Define domain
  apiVersion: backstage.io/v1alpha1
  kind: Domain
  metadata:
    name: <domain-name>
    description: <business-area-description>
  spec:
    owner: <team-name>
- •Assign systems to domain
  - •Update systems: add domain: <domain-name> to spec
- •Create domain hierarchy (if needed)
  - •Domains can be subdomains of other domains
  - •Set subdomainOf: <parent-domain> in the child domain's spec. Entity names cannot contain "/", so a parent-domain/subdomain naming scheme does not validate; use a plain kebab-case name for the subdomain.
Workflow 5: Modeling Organizational Structure
Input: Team/group structure
Steps:
- •Create groups
  apiVersion: backstage.io/v1alpha1
  kind: Group
  metadata:
    name: <team-name>
    description: <team-description>
  spec:
    type: <team|business-unit|product-area>
    profile:
      displayName: <Human Readable Name>
      email: <team-email>
      picture: <team-avatar-url>
    parent: <parent-group> # optional
    children: []
    members:
      - <user-id>
- •Create users (if needed)
  apiVersion: backstage.io/v1alpha1
  kind: User
  metadata:
    name: <user-id>
  spec:
    profile:
      displayName: <Full Name>
      email: <email>
      picture: <avatar-url>
    memberOf:
      - <team-name>
- •Model hierarchy
  - •Use parent and children for group hierarchy
  - •Supports multi-root hierarchies
Workflow 6: Adding Resources
Input: Infrastructure resource details
Steps:
- •Create resource entity
  apiVersion: backstage.io/v1alpha1
  kind: Resource
  metadata:
    name: <resource-name>
    description: <resource-description>
  spec:
    type: <database|queue|storage|cdn|...>
    owner: <team-name>
    system: <system-name> # optional
    dependencyOf:
      - component:<component-name>
- •Link dependencies
  - •Update components that depend on this resource
  - •Add to component's dependsOn list
Catalog File Structure
Single-File Catalog
All entities in one file (good for small catalogs):
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: my-system
spec:
  owner: team-a
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
spec:
  type: service
  owner: team-a
  system: my-system
Distributed Catalog
Each component in its own repo with catalog-info.yaml:
repo-a/
  catalog-info.yaml   # Component A
repo-b/
  catalog-info.yaml   # Component B
central-catalog/
  catalog-info.yaml   # Systems, Domains, Groups
Entity Reference Format
When referencing other entities, use these formats:
- •Fully qualified: <kind>:<namespace>/<name>
- •With default namespace: <kind>:<name> (assumes default namespace)
- •With default kind: <name> (kind depends on context)
Examples:
- •component:default/my-service
- •group:platform-team
- •my-api (in context where kind is obvious)
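To make the shorthand rules concrete, the sketch below expands any of the three forms into the fully qualified one. qualify_entity_ref is a hypothetical helper, not part of Backstage, and its fallback kind of component is an assumption: in practice the default kind depends on the field holding the reference, so the caller is expected to pass it.

```shell
#!/bin/bash
# Hypothetical helper: expand a shorthand entity reference into the
# fully qualified <kind>:<namespace>/<name> form.
# Usage: qualify_entity_ref <ref> [default-kind] [default-namespace]
qualify_entity_ref() {
  local ref="$1" kind="${2:-component}" namespace="${3:-default}"
  # An explicit "<kind>:" prefix overrides the default kind.
  case "$ref" in
    *:*) kind="${ref%%:*}"; ref="${ref#*:}" ;;
  esac
  # An explicit "<namespace>/" prefix overrides the default namespace.
  case "$ref" in
    */*) namespace="${ref%%/*}"; ref="${ref#*/}" ;;
  esac
  kind=$(printf '%s' "$kind" | tr '[:upper:]' '[:lower:]')
  printf '%s:%s/%s\n' "$kind" "$namespace" "$ref"
}

qualify_entity_ref "component:default/my-service"   # component:default/my-service
qualify_entity_ref "group:platform-team"            # group:default/platform-team
qualify_entity_ref "my-api" api                     # api:default/my-api
```

Normalizing references this way before comparing them avoids treating group:platform-team and group:default/platform-team as different owners.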
Well-Known Annotations
Add these annotations for integrations:
Source Control
- •github.com/project-slug: <org>/<repo>
- •gitlab.com/project-slug: <group>/<project>
- •bitbucket.org/project-key: <project>
CI/CD
- •circleci.com/project-slug: <vcs>/<org>/<repo>
- •jenkins.io/job-full-name: <folder>/<job>
Monitoring & Alerting
- •pagerduty.com/integration-key: <key>
- •pagerduty.com/service-id: <id>
- •sentry.io/project-slug: <project>
Quality & Security
- •sonarqube.org/project-key: <key>
- •snyk.io/org-id: <org-id>
Documentation
- •backstage.io/techdocs-ref: dir:.
Other
- •backstage.io/time-saved: PT8H (for templates)
Validation Checklist
Before committing catalog changes:
- • All required fields present (apiVersion, kind, metadata.name, spec.owner, etc.)
- • Entity names follow naming rules (kebab-case, max 63 chars)
- • All entity references are valid (owner, system, dependencies)
- • Owner group/user exists in catalog
- • System exists if specified
- • API definitions are valid for their type
- • Annotations use proper format
- • Tags are lowercase and follow format rules
- • No circular dependencies
- • Relations make semantic sense
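The naming item in the checklist is easy to automate. The sketch below checks the convention as this document states it (lowercase kebab-case, at most 63 characters); is_valid_entity_name is a hypothetical helper, and it deliberately tests this document's convention, not Backstage's own validator.

```shell
#!/bin/bash
# Hypothetical pre-commit check for the naming rule in the checklist:
# lowercase kebab-case, at most 63 characters.
is_valid_entity_name() {
  local name="$1"
  [ "${#name}" -le 63 ] || return 1
  printf '%s\n' "$name" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

for candidate in payment-service Payment_Service npm--scope-pkg; do
  if is_valid_entity_name "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

Of the three sample names, only payment-service passes: the second uses uppercase letters and an underscore, and the third contains a doubled dash.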
Common Patterns
Monorepo with Multiple Components
# catalog-info.yaml in repo root
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: frontend
spec:
  type: website
  owner: team-a
  system: my-system
  consumesApis:
    - backend-api
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: backend
spec:
  type: service
  owner: team-a
  system: my-system
  providesApis:
    - backend-api
  dependsOn:
    - resource:postgres-db
API-First Design
# 1. Define API first
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payment-api
spec:
  type: openapi
  lifecycle: production
  owner: payments-team
  definition:
    $text: ./openapi.yaml
---
# 2. Define provider component
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
spec:
  type: service
  owner: payments-team
  providesApis:
    - payment-api
Hierarchical Teams
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: engineering
spec:
  type: business-unit
  children:
    - platform-team
    - product-team
---
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: platform-team
spec:
  type: team
  parent: engineering
  members:
    - alice
    - bob
Substitutions
Use substitutions to reference external files:
Text Substitution
spec:
  definition:
    $text: https://example.com/api.yaml
    # or, for a file relative to this entity file (use only one $text key):
    # $text: ./api-spec.yaml
JSON Substitution
metadata:
  annotations:
    config:
      $json: ./config.json
YAML Substitution
spec:
  definition:
    $yaml: ./definition.yaml
Note: Configure backend.reading.allow for external URLs:
backend:
  reading:
    allow:
      - host: example.com
Guardrails
- •Entity Names: Must be unique per kind within a namespace
- •Naming Convention: Use kebab-case for names
- •Ownership: Every Component, API, System must have an owner
- •Lifecycle: Use standard values: experimental, production, deprecated
- •Types: Establish organizational taxonomy for component types
- •Relations: Ensure bidirectional consistency (managed automatically by Backstage)
- •Namespaces: Use default unless you need isolation
- •File Location: Prefer catalog-info.yaml as filename
Integration with OpenSpec
When creating components from OpenSpec changes:
- •After change completion: Create/update catalog entry
- •System assignment: Map to appropriate system based on capability
- •API documentation: If change includes API, create API entity
- •Dependencies: Document in catalog based on design artifact
- •Track in td: td log "Added <name> to software catalog"
Output Examples
Success - New Component
## Component Added to Catalog

**Name:** payment-service
**Type:** service
**Owner:** payments-team
**System:** payment-processing
**Location:** `./catalog-info.yaml`

**Relations:**
- Part of system: payment-processing
- Provides API: payment-api
- Depends on: postgres-db (resource)

Commit this file and register the location in Backstage.
Success - System Created
## System Created

**Name:** payment-processing
**Domain:** finance
**Owner:** payments-team

**Components:**
- payment-service
- payment-gateway
- payment-reconciliation

**Public APIs:**
- payment-api
- webhook-api

Updated 3 components to reference this system.
Validation Error
## Validation Failed

**Entity:** payment-service (Component)

**Issues:**
- Owner "payments-team" does not exist in catalog
- System "payment-system" does not exist
- API reference "payment-api" not found

**Next Steps:**
1. Create group: payments-team
2. Create system: payment-system
3. Create API entity: payment-api
Tips for AI Agents
- •Always validate references: Check that owner, system, APIs exist before creating
- •Use consistent naming: Follow project conventions for entity names
- •Document decisions: Add descriptions and tags to aid discovery
- •Think in layers: Domain → System → Component → API
- •Model dependencies: Make relationships explicit
- •Keep it updated: Catalog reflects current state, not desired state
- •Use annotations: Connect catalog to external systems (GitHub, PagerDuty, etc.)
- •Start simple: Can always add more detail later
#!/bin/bash
# scripts/generate-npm-catalog.sh
# Generate catalog entities from package.json dependencies

PACKAGE_JSON="${1:-package.json}"
OUTPUT_DIR="${2:-./catalog/generated/npm}"

mkdir -p "$OUTPUT_DIR"

jq -r '.dependencies // {} | keys[]' "$PACKAGE_JSON" | while read -r pkg; do
  # Pass the package name with --arg so it is never spliced into the jq program
  version=$(jq -r --arg pkg "$pkg" '.dependencies[$pkg]' "$PACKAGE_JSON")
  # Collapse each run of non-alphanumerics into one "-" so scoped names such as
  # @scope/pkg become npm-scope-pkg instead of npm--scope-pkg
  entity_name=$(echo "npm-$pkg" | sed -E 's/[^a-zA-Z0-9]+/-/g; s/-+$//')

  # Annotation values are quoted: a leading "@", "^" or "*" is not a valid plain YAML scalar
  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${entity_name}
  description: NPM dependency ${pkg}
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: ${PACKAGE_JSON}
    npm.org/package: "${pkg}"
    npm.org/version: "${version}"
  tags:
    - npm
    - dependency
    - javascript
spec:
  type: library
  lifecycle: production
  owner: contributors
  system: local-dev
ENTITY
done

echo "Generated NPM catalog entities in $OUTPUT_DIR"
Add to Taskfile.yaml:
catalog:generate:npm:
  desc: Generate catalog from package.json
  cmds:
    - ./scripts/generate-npm-catalog.sh
Go Dependencies (go.mod)
When you encounter go.mod, create a generator:
#!/bin/bash
# scripts/generate-go-catalog.sh
# Generate catalog entities from go.mod dependencies

GO_MOD="${1:-go.mod}"
OUTPUT_DIR="${2:-./catalog/generated/go}"

mkdir -p "$OUTPUT_DIR"

# Read the "require ( ... )" block; each indented line is "<module> <version>"
awk '/^require /,/^\)/' "$GO_MOD" | grep -E '^\s+[a-z]' | while read -r line; do
  pkg=$(echo "$line" | awk '{print $1}')
  version=$(echo "$line" | awk '{print $2}')
  entity_name=$(echo "go-$pkg" | tr '/' '-' | tr '.' '-')

  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${entity_name}
  description: Go module ${pkg}
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: ${GO_MOD}
    go.dev/module: ${pkg}
    go.dev/version: ${version}
  tags:
    - go
    - dependency
    - module
spec:
  type: library
  lifecycle: production
  owner: contributors
  system: local-dev
ENTITY
done

echo "Generated Go module catalog entities in $OUTPUT_DIR"
Terraform Resources
When you encounter .tf files, create a generator:
#!/bin/bash
# scripts/generate-terraform-catalog.sh
# Generate Resource entities from Terraform configs

TF_DIR="${1:-.}"
OUTPUT_DIR="${2:-./catalog/generated/terraform}"

mkdir -p "$OUTPUT_DIR"

# Each matching line looks like: resource "<type>" "<name>" {
find "$TF_DIR" -name "*.tf" -not -path "*/.*" -exec grep -h "^resource" {} \; | while read -r line; do
  resource_type=$(echo "$line" | awk '{print $2}' | tr -d '"')
  resource_name=$(echo "$line" | awk '{print $3}' | tr -d '"')
  entity_name=$(echo "${resource_type}-${resource_name}" | tr '_' '-')

  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: ${entity_name}
  description: Terraform ${resource_type} resource
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: terraform/*.tf
    terraform.io/resource-type: ${resource_type}
    terraform.io/resource-name: ${resource_name}
  tags:
    - terraform
    - infrastructure
spec:
  type: infrastructure
  owner: contributors
  system: cloud-ops
ENTITY
done

echo "Generated Terraform Resource entities in $OUTPUT_DIR"
OpenAPI Specifications
When you encounter OpenAPI/Swagger specs, create a generator:
#!/bin/bash
# scripts/generate-openapi-catalog.sh
# Generate API entities from OpenAPI specifications

SPEC_FILE="$1"
OUTPUT_DIR="${2:-./catalog/generated/apis}"

[ ! -f "$SPEC_FILE" ] && echo "Error: $SPEC_FILE not found" && exit 1

mkdir -p "$OUTPUT_DIR"

# Try yq first (YAML specs), then fall back to jq (JSON specs)
api_title=$(yq eval '.info.title' "$SPEC_FILE" 2>/dev/null || jq -r '.info.title' "$SPEC_FILE")
api_version=$(yq eval '.info.version' "$SPEC_FILE" 2>/dev/null || jq -r '.info.version' "$SPEC_FILE")
api_name=$(echo "$api_title" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

cat > "$OUTPUT_DIR/api-${api_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: ${api_name}
  description: ${api_title}
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: ${SPEC_FILE}
    openapi.org/version: ${api_version}
  tags:
    - openapi
    - rest
spec:
  type: openapi
  lifecycle: production
  owner: contributors
  definition:
    \$text: ${SPEC_FILE}
ENTITY

echo "Generated API entity: ${api_name}"
Detection Pattern for AI Agents
Proactively scan for catalog opportunities during ANY task:
# Quick detection function
detect_catalog_opportunities() {
  echo "🔍 Scanning for catalog generation opportunities..."

  [ -f "package.json" ] && echo "✓ package.json → suggest npm catalog generation"
  [ -f "go.mod" ] && echo "✓ go.mod → suggest Go module catalog generation"
  [ -f "requirements.txt" ] && echo "✓ requirements.txt → suggest Python catalog"
  [ -f "pyproject.toml" ] && echo "✓ pyproject.toml → suggest Python catalog"
  [ -f "Cargo.toml" ] && echo "✓ Cargo.toml → suggest Rust catalog"
  [ -f "Gemfile" ] && echo "✓ Gemfile → suggest Ruby catalog"

  find . -name "*.tf" -not -path "*/.*" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ Terraform files → suggest Resource catalog generation"

  find . \( -name "openapi*.yaml" -o -name "swagger*.json" \) 2>/dev/null | head -1 | grep -q . && \
    echo "✓ OpenAPI specs → suggest API catalog generation"

  find . -name "docker-compose*.yml" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ Docker Compose → suggest service catalog generation"

  find . -name "*.proto" -not -path "*/.*" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ gRPC protos → suggest API catalog generation"
}

# Run this during task initialization
detect_catalog_opportunities
When to Suggest Generation
Trigger Points:
- •After creating a new service/component
  - •Check for dependency files
  - •Suggest generating dependency catalog entries
- •After adding infrastructure
  - •Scan for IaC files
  - •Generate Resource entities
- •After defining an API
  - •Check for OpenAPI/GraphQL/gRPC specs
  - •Generate API entity from specification
- •During project onboarding
  - •Scan entire repository
  - •Suggest ALL generation opportunities at once
- •When dependencies are updated
  - •Detect changes to package files
  - •Suggest regenerating catalog
- •In code review
  - •If PR adds new dependency file
  - •Suggest adding generator
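For the "dependencies are updated" and "code review" triggers, one option is to map the changed file paths to the generation tasks worth suggesting. The sketch below is only an illustration: suggest_generators is a hypothetical helper, and apart from catalog:generate:npm the task names are assumed to follow the same catalog:generate:<source> pattern and may not exist in your Taskfile.

```shell
#!/bin/bash
# Hypothetical helper: read changed file paths on stdin (one per line) and
# print the generation tasks worth suggesting, without duplicates.
suggest_generators() {
  local path
  while IFS= read -r path; do
    # Vendored dependencies are not a source of truth for the catalog.
    case "$path" in
      node_modules/*|*/node_modules/*) continue ;;
    esac
    case "${path##*/}" in
      package.json)                echo "catalog:generate:npm" ;;
      go.mod)                      echo "catalog:generate:go" ;;        # assumed task name
      *.tf)                        echo "catalog:generate:terraform" ;; # assumed task name
      openapi*.yaml|swagger*.json) echo "catalog:generate:openapi" ;;   # assumed task name
    esac
  done | sort -u
}

# Example: feed it the files touched by the last commit.
# git diff --name-only HEAD~1 | suggest_generators
```

Keeping the mapping in one function means a new generator only needs one extra case branch.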
Guardrails for Generated Entities
Always apply these standards:
- •Namespace: Use default or specific namespace like generated
- •Generated marker: Add backstage.io/generated: "true" annotation
- •Source tracking: Add backstage.io/source-file: "<path>" annotation
- •Regeneration warning: Include comment header warning about auto-generation
- •Owner assignment: Default to generic owner (e.g., contributors)
- •Lifecycle: Default to production for stable dependencies
- •System assignment: Group by source type:
  - •Dev tools → local-dev
  - •Infrastructure → cloud-ops
  - •APIs → system that owns them
Example entity header:
# WARNING: This file is auto-generated from package.json
# Do not edit manually - changes will be overwritten
# To regenerate: task catalog:generate:npm
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: package.json
# ... rest of entity
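The generated marker also lets a generator avoid clobbering hand-written entity files. The sketch below shows one way to use it; is_safe_to_overwrite is a hypothetical helper, and treating any file without the marker as hand-written is an assumption about how the catalog directory is managed.

```shell
#!/bin/bash
# Hypothetical guard for generator scripts: a target file may be written only
# if it does not exist yet or already carries the generated marker annotation.
is_safe_to_overwrite() {
  local file="$1"
  [ ! -e "$file" ] || grep -q '^ *backstage\.io/generated: *"true"' "$file"
}

# Example use inside a generator loop:
# is_safe_to_overwrite "$OUTPUT_DIR/${entity_name}.yaml" || {
#   echo "Skipping hand-edited file: ${entity_name}.yaml" >&2
#   continue
# }
```

With this check in place, removing the annotation from a file is enough to take it out of automatic regeneration.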
Task Tracking for Generation Work
When implementing catalog generation automation:
# Initialize task
td start "catalog-gen-<source-type>"

# Track progress
td log "Analyzed <source-file> structure"
td log "Created generator script: scripts/generate-<source>-catalog.sh"
td log "Added Taskfile command: catalog:generate:<source>"
td log "Tested generation with sample data"
td log --decision "Using <tool> for parsing <format> because <reason>"

# Capture state on completion
td handoff "catalog-gen-<source-type>" \
  --done "Generator script created, tested, and integrated" \
  --remaining "Add to CI/CD pipeline, update documentation" \
  --decision "Placed generated entities in catalog/generated/<source> for isolation"
Integration with OpenSpec
When creating components through OpenSpec workflow:
- •After artifact creation: Check for dependency files
- •After implementation: Scan for new APIs/infrastructure
- •Before archiving: Ensure catalog entries exist
- •Track catalog work: td log "Generated catalog entities from <source>"
Example:
# In opsx-apply after implementing a new service
if [ -f "package.json" ]; then
  echo "📦 Found package.json - generating dependency catalog"
  task catalog:generate:npm
  td log "Generated npm dependency catalog from package.json"
fi