Software Catalog Management

Maintain and evolve a Backstage software catalog based on the Backstage System Model.

This skill helps you create, update, and organize catalog entities following Backstage best practices.

Overview

The Backstage catalog models software using these core concepts:

Core Entities:

•Components - Individual pieces of software (services, websites, libraries)
•APIs - Boundaries between components (OpenAPI, GraphQL, gRPC, AsyncAPI)
•Resources - Infrastructure needed to operate components (databases, queues, buckets)

Organizational Entities:

•Users - People (employees, contractors)
•Groups - Teams, business units, or interest groups

Ecosystem Modeling:

•Systems - Collections of components/APIs that cooperate to perform a function
•Domains - Business-aligned groupings of systems (bounded contexts)

When to Use This Skill

Use this skill when you need to:

•Add a new component, API, or system to the catalog
•Update entity metadata, ownership, or relationships
•Reorganize the catalog structure (systems, domains)
•Define API contracts and their providers/consumers
•Model organizational structure (groups, teams)
•Document dependencies and relationships
•Validate catalog consistency

Automated Catalog Generation

IMPORTANT: Always look for opportunities to generate catalog entities automatically from existing project metadata. This ensures the catalog stays synchronized with the actual codebase and reduces manual maintenance burden.

When to Generate Catalog Entities

You should proactively suggest creating catalog generation automation when you encounter:

Dependency Management Files:

•devbox.json - Development environment packages (existing example in this repo)
•package.json - Node.js/npm dependencies
•go.mod - Go module dependencies
•requirements.txt / pyproject.toml - Python dependencies
•Gemfile - Ruby dependencies
•pom.xml / build.gradle - Java/Maven/Gradle dependencies
•Cargo.toml - Rust dependencies
•composer.json - PHP dependencies

Infrastructure as Code:

•Terraform modules (.tf files) - Generate Resource entities
•CloudFormation templates - Generate Resource entities
•Kubernetes manifests - Generate Component and Resource entities
•Helm charts - Generate System and Component entities
•CDK constructs - Generate Resource entities from stacks

API Definitions:

•OpenAPI/Swagger specs (openapi.yaml, swagger.json) - Generate API entities
•GraphQL schemas (schema.graphql) - Generate API entities
•gRPC proto files (*.proto) - Generate API entities
•AsyncAPI specs - Generate API entities

Service Definitions:

•Docker Compose files (docker-compose.yml) - Generate Components and Resources
•Kubernetes Deployments - Generate Components
•Service mesh configs (Istio, Linkerd) - Generate Components and APIs

Repository Metadata:

•CODEOWNERS files - Infer ownership for Components
•GitHub Actions workflows - Document CI/CD integrations
•Monorepo structure - Generate multiple Components from subdirectories

Configuration Files:

•Backstage entity files in subdirectories - Aggregate into main catalog
•Microservice registry files - Generate Components
•Service discovery configs - Generate Components and APIs
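
Here is a minimal sketch of ownership inference from a CODEOWNERS file. It assumes the common layout in which a catch-all * rule names a team as @org/team, and it deliberately ignores per-path rules. codeowners_default_owner is a hypothetical helper, not part of any Backstage tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: infer a default Backstage owner from a CODEOWNERS file.
# Simplification: only catch-all "*" rules are considered; per-path rules are ignored.
# GitHub applies the LAST matching rule, so the last "*" line wins here too.
codeowners_default_owner() {
  local file="$1" owner="" pattern first_owner rest
  [ -f "$file" ] || return 1
  # "|| [ -n ... ]" also processes a final line that lacks a trailing newline
  while read -r pattern first_owner rest || [ -n "$pattern" ]; do
    [ -z "$pattern" ] && continue
    case "$pattern" in \#*) continue ;; esac   # skip comment lines
    if [ "$pattern" = "*" ] && [ -n "$first_owner" ]; then
      owner="$first_owner"
    fi
  done < "$file"
  [ -n "$owner" ] || return 1
  # "@my-org/payments-team" -> "payments-team"; "@alice" -> "alice"
  owner="${owner#@}"
  printf '%s\n' "${owner##*/}"
}
```

The returned name still has to match a Group entity in the catalog. Check that it resolves before you use it as spec.owner.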

Catalog Generation Workflow

When you identify a generation opportunity, follow this pattern:

1. Detect the source
   - Scan for dependency/config files in the project
   - Example: find . -name "package.json" -not -path "*/node_modules/*"
2. Parse the source
   - Extract relevant metadata (package names, versions, types)
   - Example: jq '.dependencies | keys[]' package.json
3. Generate catalog entities
   - Create Component entities for dependencies
   - Create Resource entities for infrastructure
   - Create API entities for specifications
   - Tag with source metadata
   - Add annotations linking back to source
4. Create automation script
   - Add generator script to scripts/ directory
   - Make it executable and well-documented
   - Include error handling and validation
5. Integrate with build system
   - Add to CI/CD pipeline or Task runner (Taskfile.yaml)
   - Schedule regular regeneration
   - Track changes in version control
6. Document the process
   - Add README explaining automation
   - Document how to regenerate manually
   - Note source of truth (the original file)
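
Following the workflow above, most generators end up repeating the same entity-writing code. The helper below is a minimal sketch of that shared step. It assumes the defaults used elsewhere in this skill: the contributors owner and the local-dev system. emit_component and its argument order are hypothetical. Values that may contain YAML indicator characters are quoted, but a description that itself contains double quotes would still need escaping.

```bash
#!/usr/bin/env bash
# Hypothetical shared helper for catalog generator scripts.
# Usage: emit_component <output-dir> <entity-name> <description> <source-file> [tag...]
emit_component() {
  local out_dir="$1" name="$2" description="$3" source_file="$4"
  if [ -z "$out_dir" ] || [ -z "$name" ] || [ -z "$source_file" ]; then
    echo "emit_component: missing required argument" >&2
    return 1
  fi
  shift 4
  mkdir -p "$out_dir" || return 1
  {
    echo "# WARNING: This file is auto-generated from ${source_file}"
    echo "# Do not edit manually - changes will be overwritten"
    echo "apiVersion: backstage.io/v1alpha1"
    echo "kind: Component"
    echo "metadata:"
    echo "  name: ${name}"
    echo "  description: \"${description}\""
    echo "  annotations:"
    echo "    backstage.io/generated: \"true\""
    echo "    backstage.io/source-file: \"${source_file}\""
    if [ "$#" -gt 0 ]; then
      echo "  tags:"
      local tag
      for tag in "$@"; do echo "    - ${tag}"; done
    fi
    echo "spec:"
    echo "  type: library"
    echo "  lifecycle: production"
    echo "  owner: contributors"
    echo "  system: local-dev"
  } > "${out_dir}/${name}.yaml"
}
```

For example, emit_component ./catalog/generated/npm npm-lodash "NPM dependency lodash" package.json npm dependency writes ./catalog/generated/npm/npm-lodash.yaml.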

Reference: Existing Devbox Pattern

The repository already has automation for parsing devbox.json and generating Component entities for development tools. Use this as a reference template for creating other dependency generators.

What the devbox generator does:

•Parses devbox.json packages array
•Creates Component entities for each development tool
•Tags them with appropriate metadata
•Links to upstream documentation
•Organizes under the local-dev system

Apply this same pattern to:

•package.json → npm dependencies
•go.mod → Go module dependencies
•requirements.txt → Python packages
•Cargo.toml → Rust crates
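
Here is a minimal sketch of the parsing step in the devbox pattern. It assumes packages are listed as name@version strings; the exact devbox.json layout may differ between Devbox versions. split_devbox_pkg is hypothetical, and treating an unversioned entry as latest is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a devbox package entry of the form "name@version"
# into an entity name and a version, separated by a tab.
# Assumption: entries without "@" are reported with version "latest".
split_devbox_pkg() {
  local entry="$1" name version
  if [[ "$entry" == *@* ]]; then
    name="${entry%@*}"      # everything before the last "@"
    version="${entry##*@}"  # everything after the last "@"
  else
    name="$entry"
    version="latest"
  fi
  # Lowercase, then replace characters outside [a-z0-9.] with "-"
  name="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9.\n' '-')"
  printf 'devbox-%s\t%s\n' "$name" "$version"
}
```

Combined with jq: jq -r '.packages[]' devbox.json | while read -r entry; do split_devbox_pkg "$entry"; done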

Example Generators

The example generator scripts (Node.js, Go, Terraform, and OpenAPI) appear later in this skill, after Tips for AI Agents. Use them as templates when you encounter the corresponding files.

Common Workflows

Workflow 1: Adding a New Component

Input: Component details (name, type, owner, etc.)

Steps:

1. Gather component information
   - Ask for: name, description, type, lifecycle, owner
   - Identify: system membership, APIs provided/consumed, dependencies
2. Determine catalog location
   - If component has its own repo: create catalog-info.yaml in repo root
   - If multi-component repo: create in component subdirectory
   - If centralized catalog: add to central catalog file
3. Create catalog entity

yaml

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: <kebab-case-name>
  description: <brief-description>
  tags:
    - <language>
    - <framework>
  annotations:
    github.com/project-slug: <org>/<repo>
spec:
  type: <service|website|library>
  lifecycle: <experimental|production|deprecated>
  owner: <team-name>
  system: <system-name>  # optional
  providesApis:
    - <api-name>  # optional
  consumesApis:
    - <api-name>  # optional
  dependsOn:
    - component:<component-name>
    - resource:<resource-name>

4. Validate relationships
   - Ensure owner (Group) exists in catalog
   - Ensure system exists if specified
   - Ensure referenced APIs exist
   - Ensure dependencies are valid entity references
5. Add well-known annotations (as applicable)
   - github.com/project-slug - GitHub repository
   - backstage.io/techdocs-ref - TechDocs location
   - sonarqube.org/project-key - SonarQube project
   - pagerduty.com/service-id - PagerDuty service
   - sentry.io/project-slug - Sentry project
6. Commit and register
   - Commit the catalog file
   - Register location in Backstage (if needed)

Workflow 2: Defining an API

Input: API details and specification

Steps:

1. Gather API information
   - Ask for: name, type (openapi/graphql/grpc/asyncapi), lifecycle, owner
   - Obtain: API definition/specification
   - Identify: which components provide/consume it
2. Create API entity

yaml

apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: <api-name>
  description: <api-description>
spec:
  type: <openapi|graphql|grpc|asyncapi>
  lifecycle: <experimental|production|deprecated>
  owner: <team-name>
  system: <system-name>  # optional
  definition: |
    <api-spec-content>
  # Or load the spec from an external file with $text substitution:
  # definition:
  #   $text: ./api-spec.yaml

3. Link to components
   - Update components that provide this API: add to providesApis
   - Update components that consume this API: add to consumesApis
4. Validate API spec
   - Ensure definition is valid for the specified type
   - Use $text substitution to load an external spec file. spec.definition must be a string, so $json and $yaml (which embed parsed structures) are not suitable here
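
When the type of a spec file is not known up front, a heuristic check can pick the spec.type value. This is a minimal sketch: detect_api_type is hypothetical. It looks only at the file extension and at a top-level openapi, swagger, or asyncapi key, and it does not validate the spec itself; minified single-line JSON is not recognized.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the Backstage API spec.type for a definition file.
# Heuristics only: file extension first, then a top-level marker key.
detect_api_type() {
  local file="$1"
  [ -f "$file" ] || return 1
  case "$file" in
    *.proto)                    echo "grpc"; return 0 ;;
    *.graphql|*.graphqls|*.gql) echo "graphql"; return 0 ;;
  esac
  # OpenAPI 3.x starts with "openapi:", Swagger 2.0 with "swagger:", AsyncAPI with
  # "asyncapi:"; the optional quotes cover pretty-printed JSON ("openapi": "3.0.0")
  if grep -Eq '^[[:space:]]*"?asyncapi"?[[:space:]]*:' "$file"; then
    echo "asyncapi"
  elif grep -Eq '^[[:space:]]*"?(openapi|swagger)"?[[:space:]]*:' "$file"; then
    echo "openapi"
  else
    return 1
  fi
}
```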

Workflow 3: Creating a System

Input: System details and component membership

Steps:

1. Define system scope
   - Ask for: name, description, owner, domain
   - Identify: components that belong to this system
   - Identify: public APIs the system exposes
2. Create system entity

yaml

apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: <system-name>
  description: <system-description>
spec:
  owner: <team-name>
  domain: <domain-name>  # optional

3. Update component memberships
   - For each component in the system: add system: <system-name> to spec
   - This creates partOf relations automatically
4. Document system boundaries
   - Ensure public APIs are defined
   - Private/internal APIs can remain implicit
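
When systems are created in bulk, the System entity from this workflow can also be rendered from a script. Here is a minimal sketch using a hypothetical render_system helper. The domain and description arguments are optional, matching the optional domain field in the entity.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a System entity; domain and description are optional.
# Usage: render_system <name> <owner> [domain] [description]
render_system() {
  local name="$1" owner="$2" domain="${3:-}" description="${4:-}"
  if [ -z "$name" ] || [ -z "$owner" ]; then
    echo "usage: render_system <name> <owner> [domain] [description]" >&2
    return 1
  fi
  cat <<EOF
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: ${name}
EOF
  [ -n "$description" ] && printf '  description: "%s"\n' "$description"
  echo "spec:"
  echo "  owner: ${owner}"
  [ -n "$domain" ] && echo "  domain: ${domain}"
  return 0
}
```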

Workflow 4: Organizing with Domains

Input: Domain structure and business alignment

Steps:

1. Define domain

yaml

apiVersion: backstage.io/v1alpha1
kind: Domain
metadata:
  name: <domain-name>
  description: <business-area-description>
spec:
  owner: <team-name>

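2. Assign systems to domain
   - Update systems: add domain: <domain-name> to spec
3. Create domain hierarchy (if needed)
   - A domain can be a subdomain of another domain: set subdomainOf: <parent-domain> in the child domain's spec
   - Do not encode the hierarchy in the name (parent-domain/subdomain); entity names cannot contain "/"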

Workflow 5: Modeling Organizational Structure

Input: Team/group structure

Steps:

1. Create groups

yaml

apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: <team-name>
  description: <team-description>
spec:
  type: <team|business-unit|product-area>
  profile:
    displayName: <Human Readable Name>
    email: <team-email>
    picture: <team-avatar-url>
  parent: <parent-group>  # optional
  children: []
  members:
    - <user-id>

2. Create users (if needed)

yaml

apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: <user-id>
spec:
  profile:
    displayName: <Full Name>
    email: <email>
    picture: <avatar-url>
  memberOf:
    - <team-name>

3. Model hierarchy
   - Use parent and children for group hierarchy
   - Supports multi-root hierarchies
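
For a small organization, User entities can be scripted from a simple roster. This minimal sketch assumes a hypothetical comma-separated record of user-id, full name, email, and team, with no commas inside fields. For larger organizations, Backstage's org ingestion integrations (for example GitHub, LDAP, or Microsoft Graph) are usually a better fit than hand-maintained User files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn one "user-id,Full Name,email,team" record into a User entity.
# Assumes fields contain no commas; a real roster export needs a proper CSV parser.
render_user() {
  local id name email team
  IFS=, read -r id name email team <<< "$1"
  if [ -z "$id" ] || [ -z "$team" ]; then
    echo "render_user: expected user-id,Full Name,email,team" >&2
    return 1
  fi
  cat <<EOF
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: ${id}
spec:
  profile:
    displayName: "${name}"
    email: "${email}"
  memberOf:
    - ${team}
EOF
}
```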

Workflow 6: Adding Resources

Input: Infrastructure resource details

Steps:

1. Create resource entity

yaml

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: <resource-name>
  description: <resource-description>
spec:
  type: <database|queue|storage|cdn|...>
  owner: <team-name>
  system: <system-name>  # optional
  dependencyOf:
    - component:<component-name>

2. Link dependencies
   - Update components that depend on this resource
   - Add to component's dependsOn list

Catalog File Structure

Single-File Catalog

All entities in one file (good for small catalogs):

yaml

apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: my-system
spec:
  owner: team-a
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
spec:
  type: service
  lifecycle: production
  owner: team-a
  system: my-system
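
If individual entity files are generated first, they can be merged into a single-file catalog like the one above. Here is a minimal sketch. merge_entity_files is hypothetical, and it assumes the input files do not already start with their own --- separator.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge single-entity YAML files into one multi-document
# catalog file, inserting "---" between documents.
# Usage: merge_entity_files <output-file> <entity-file>...
merge_entity_files() {
  local out="$1" first=1 f
  shift
  [ "$#" -gt 0 ] || { echo "usage: merge_entity_files <out> <file>..." >&2; return 1; }
  : > "$out" || return 1            # truncate or create the output file
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    if [ "$first" -eq 0 ]; then
      echo "---" >> "$out"
    fi
    cat "$f" >> "$out"
    # Add a newline if the file lacks one, so the next "---" starts on its own line
    [ -n "$(tail -c 1 "$f")" ] && echo >> "$out"
    first=0
  done
}
```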

Distributed Catalog

Each component in its own repo with catalog-info.yaml:

code

repo-a/
  catalog-info.yaml  # Component A
repo-b/
  catalog-info.yaml  # Component B
central-catalog/
  catalog-info.yaml  # Systems, Domains, Groups
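
In the distributed layout, the central catalog needs to know where each repository's catalog-info.yaml lives. Backstage's Location kind, with its spec.targets list, serves that purpose. Here is a minimal sketch that prints such an entity. render_location and the example URLs are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Location entity whose targets point at each
# repository's catalog-info.yaml.
# Usage: render_location <name> <target-url>...
render_location() {
  local name="$1" target
  shift
  if [ -z "$name" ] || [ "$#" -eq 0 ]; then
    echo "usage: render_location <name> <target-url>..." >&2
    return 1
  fi
  cat <<EOF
apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: ${name}
spec:
  type: url
  targets:
EOF
  for target in "$@"; do
    printf '    - %s\n' "$target"
  done
}
```

For example: render_location team-repos https://github.com/acme/repo-a/blob/main/catalog-info.yaml > central-catalog/locations.yaml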

Entity Reference Format

When referencing other entities, use these formats:

•Fully qualified: <kind>:<namespace>/<name>
•With default namespace: <kind>:<name> (assumes default namespace)
•With default kind: <name> (kind depends on context)

Examples:

•component:default/my-service
•group:platform-team
•my-api (in context where kind is obvious)
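
The reference rules above can be applied mechanically, for instance when a script needs to compare references written in different forms. Here is a minimal sketch with a hypothetical to_entity_ref helper. The caller supplies the default kind because, as noted above, it depends on context.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand a possibly shortened entity reference to the
# fully qualified <kind>:<namespace>/<name> form.
# Usage: to_entity_ref <ref> <default-kind>
to_entity_ref() {
  local ref="$1" default_kind="$2" kind namespace="default" name
  if [[ "$ref" == *:* ]]; then
    kind="${ref%%:*}"       # text before the first ":"
    ref="${ref#*:}"
  else
    kind="$default_kind"
  fi
  if [[ "$ref" == */* ]]; then
    namespace="${ref%%/*}"  # text before the first "/"
    name="${ref#*/}"
  else
    name="$ref"
  fi
  [ -n "$kind" ] && [ -n "$name" ] || return 1
  # Lowercase the kind for a consistent style
  printf '%s:%s/%s\n' "$(printf '%s' "$kind" | tr '[:upper:]' '[:lower:]')" "$namespace" "$name"
}
```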

Well-Known Annotations

Add these annotations for integrations:

Source Control

•github.com/project-slug: <org>/<repo>
•gitlab.com/project-slug: <group>/<project>
•bitbucket.org/project-key: <project>

CI/CD

•circleci.com/project-slug: <vcs>/<org>/<repo>
•jenkins.io/job-full-name: <folder>/<job>

Monitoring & Alerting

•pagerduty.com/integration-key: <key>
•pagerduty.com/service-id: <id>
•sentry.io/project-slug: <project>

Quality & Security

•sonarqube.org/project-key: <key>
•snyk.io/org-id: <org-id>

Documentation

•backstage.io/techdocs-ref: dir:.

Other

•backstage.io/time-saved: PT8H (for templates)

Validation Checklist

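Before committing catalog changes, confirm that:

•Entity names are kebab-case and unique per kind within their namespace
•Required fields are present (for a Component: type, lifecycle, and owner)
•Every owner, system, and domain reference points at an existing entity
•Every providesApis, consumesApis, and dependsOn entry is a valid reference to an existing entity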
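
Here is a minimal sketch that automates one pre-commit check: entity-name validity. It encodes the Backstage naming rule: 1 to 63 characters, made of runs of letters and digits separated by single -, _ or . characters. The helper names are hypothetical. The scanner only reads unquoted name: lines directly under a top-level metadata: key, so it is not a full YAML parser.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit helpers for metadata.name values.

# Succeeds if the name follows the Backstage naming rule:
# 1-63 characters, runs of [A-Za-z0-9] separated by single "-", "_" or ".".
is_valid_entity_name() {
  local name="$1"
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 63 ] || return 1
  [[ "$name" =~ ^[A-Za-z0-9]+([-_.][A-Za-z0-9]+)*$ ]]
}

# Reports every invalid name in the given files and fails if any is found.
# Simplification: only unquoted "name: value" lines under a top-level
# "metadata:" key are checked.
check_catalog_names() {
  local file name status=0
  for file in "$@"; do
    while read -r name; do
      if ! is_valid_entity_name "$name"; then
        echo "${file}: invalid entity name '${name}'" >&2
        status=1
      fi
    done < <(awk '
      /^metadata:/             { in_meta = 1; next }
      /^[^[:space:]]/          { in_meta = 0 }
      in_meta && $1 == "name:" { print $2 }
    ' "$file")
  done
  return "$status"
}
```

Running check_catalog_names catalog-info.yaml from a pre-commit hook prints each offending name and exits non-zero.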

Common Patterns

Monorepo with Multiple Components

yaml

# catalog-info.yaml in repo root
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: frontend
spec:
  type: website
  lifecycle: production
  owner: team-a
  system: my-system
  consumesApis:
    - backend-api
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: backend
spec:
  type: service
  lifecycle: production
  owner: team-a
  system: my-system
  providesApis:
    - backend-api
  dependsOn:
    - resource:postgres-db

API-First Design

yaml

# 1. Define API first
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payment-api
spec:
  type: openapi
  lifecycle: production
  owner: payments-team
  definition:
    $text: ./openapi.yaml
---
# 2. Define provider component
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
spec:
  type: service
  lifecycle: production
  owner: payments-team
  providesApis:
    - payment-api

Hierarchical Teams

yaml

apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: engineering
spec:
  type: business-unit
  children:
    - platform-team
    - product-team
---
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: platform-team
spec:
  type: team
  parent: engineering
  children: []
  members:
    - alice
    - bob

Substitutions

Use substitutions to reference external files:

Text Substitution

yaml

spec:
  definition:
    $text: https://example.com/api.yaml
    # or, for a file relative to this catalog file:
    # $text: ./api-spec.yaml

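JSON Substitution

$json embeds the parsed JSON structure, so use it only for a field that accepts an object. Annotation values and API definitions must be strings.

yaml

spec:
  <object-field>:
    $json: ./config.json

YAML Substitution

$yaml embeds the parsed YAML structure in the same way:

yaml

spec:
  <object-field>:
    $yaml: ./data.yaml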

Note: Configure backend.reading.allow for external URLs:

yaml

backend:
  reading:
    allow:
      - host: example.com

Guardrails

•Entity Names: Must be unique per kind within a namespace
•Naming Convention: Use kebab-case for names
•Ownership: Every Component, API, Resource, System, and Domain must have an owner
•Lifecycle: Use standard values: experimental, production, deprecated
•Types: Establish organizational taxonomy for component types
•Relations: Declare each relationship on one side; Backstage generates the inverse relation automatically (for example, dependsOn on one entity yields dependencyOf on the other)
•Namespaces: Use default unless you need isolation
•File Location: Prefer catalog-info.yaml as filename
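
Generated names need to satisfy both the kebab-case convention and Backstage's name rules. Here is a minimal sketch of a normalizer; to_entity_name is hypothetical. It lowercases the input, collapses every run of other characters into a single -, trims the ends, and caps the result at 63 characters. Collisions are possible (foo.bar and foo-bar both become foo-bar), so generators should still check for duplicates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn an arbitrary package or resource identifier into a
# kebab-case Backstage entity name.
to_entity_name() {
  local name
  # Lowercase, replace each run of non-[a-z0-9] characters with one "-",
  # then strip a leading or trailing "-"
  name="$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')"
  # Backstage limits names to 63 characters; drop a "-" left at the cut point
  name="${name:0:63}"
  name="${name%-}"
  [ -n "$name" ] || return 1
  printf '%s\n' "$name"
}
```

Generators can pass raw identifiers through this helper. It also avoids names such as npm--types-node, which Backstage rejects because separators cannot repeat.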

Integration with OpenSpec

When creating components from OpenSpec changes:

•After change completion: Create/update catalog entry
•System assignment: Map to appropriate system based on capability
•API documentation: If change includes API, create API entity
•Dependencies: Document in catalog based on design artifact
•Track in td: td log "Added <name> to software catalog"

Output Examples

Success - New Component

code

## Component Added to Catalog

**Name:** payment-service
**Type:** service
**Owner:** payments-team
**System:** payment-processing

**Location:** `./catalog-info.yaml`

**Relations:**
- Part of system: payment-processing
- Provides API: payment-api
- Depends on: postgres-db (resource)

Commit this file and register the location in Backstage.

Success - System Created

code

## System Created

**Name:** payment-processing
**Domain:** finance
**Owner:** payments-team

**Components:**
- payment-service
- payment-gateway
- payment-reconciliation

**Public APIs:**
- payment-api
- webhook-api

Updated 3 components to reference this system.

Validation Error

code

## Validation Failed

**Entity:** payment-service (Component)

**Issues:**
- Owner "payments-team" does not exist in catalog
- System "payment-system" does not exist
- API reference "payment-api" not found

**Next Steps:**
1. Create group: payments-team
2. Create system: payment-system
3. Create API entity: payment-api

Tips for AI Agents

•Always validate references: Check that owner, system, APIs exist before creating
•Use consistent naming: Follow project conventions for entity names
•Document decisions: Add descriptions and tags to aid discovery
•Think in layers: Domain → System → Component → API
•Model dependencies: Make relationships explicit
•Keep it updated: Catalog reflects current state, not desired state
•Use annotations: Connect catalog to external systems (GitHub, PagerDuty, etc.)
•Start simple: Can always add more detail later

Node.js Dependencies (package.json)

When you encounter package.json, create a generator:

bash

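#!/bin/bash
# scripts/generate-npm-catalog.sh
# Generate catalog entities from package.json dependencies
set -euo pipefail

PACKAGE_JSON="${1:-package.json}"
OUTPUT_DIR="${2:-./catalog/generated/npm}"

[ -f "$PACKAGE_JSON" ] || { echo "Error: $PACKAGE_JSON not found" >&2; exit 1; }
mkdir -p "$OUTPUT_DIR"

# Emit "name<TAB>version" pairs in a single jq pass
jq -r '.dependencies // {} | to_entries[] | "\(.key)\t\(.value)"' "$PACKAGE_JSON" |
while IFS=$'\t' read -r pkg version; do
  # Drop "@" and map "/" to "-" so @types/node becomes npm-types-node
  # (Backstage rejects repeated separators such as "--")
  entity_name=$(echo "npm-$pkg" | tr -d '@' | tr '/' '-')

  # Values are quoted: scoped names start with "@" and version ranges can
  # start with ">" or "*", which YAML would otherwise misinterpret
  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
# WARNING: This file is auto-generated from ${PACKAGE_JSON}
# Do not edit manually - changes will be overwritten
# To regenerate: task catalog:generate:npm
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${entity_name}
  description: "NPM dependency ${pkg}"
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: "${PACKAGE_JSON}"
    npm.org/package: "${pkg}"
    npm.org/version: "${version}"
  tags:
    - npm
    - dependency
    - javascript
spec:
  type: library
  lifecycle: production
  owner: contributors
  system: local-dev
ENTITY
done

echo "Generated NPM catalog entities in $OUTPUT_DIR"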

Add the task to Taskfile.yaml. If the file already exists, add only the entry under its tasks: key:

yaml

version: '3'

tasks:
  catalog:generate:npm:
    desc: Generate catalog from package.json
    cmds:
      - ./scripts/generate-npm-catalog.sh

Go Dependencies (go.mod)

When you encounter go.mod, create a generator:

bash

#!/bin/bash
# scripts/generate-go-catalog.sh
# Generate catalog entities from go.mod dependencies
set -euo pipefail

GO_MOD="${1:-go.mod}"
OUTPUT_DIR="${2:-./catalog/generated/go}"

[ -f "$GO_MOD" ] || { echo "Error: $GO_MOD not found" >&2; exit 1; }
mkdir -p "$OUTPUT_DIR"

# Print "module version" pairs from both the block form, require ( ... ),
# and the single-line form, require module version; comment lines are skipped
awk '
  $1 == "require" && $2 == "(" { in_block = 1; next }
  in_block && $1 == ")"        { in_block = 0; next }
  in_block && NF >= 2 && $1 !~ /^\/\// { print $1, $2; next }
  !in_block && $1 == "require" && NF >= 3 { print $2, $3 }
' "$GO_MOD" | while read -r pkg version; do
  # e.g. github.com/BurntSushi/toml -> go-github-com-burntsushi-toml
  entity_name=$(echo "go-$pkg" | tr '[:upper:]' '[:lower:]' | tr '/.' '--')

  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${entity_name}
  description: "Go module ${pkg}"
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: "${GO_MOD}"
    go.dev/module: "${pkg}"
    go.dev/version: "${version}"
  tags:
    - go
    - dependency
    - module
spec:
  type: library
  lifecycle: production
  owner: contributors
  system: local-dev
ENTITY
done

echo "Generated Go module catalog entities in $OUTPUT_DIR"

Terraform Resources

When you encounter .tf files, create a generator:

bash

#!/bin/bash
# scripts/generate-terraform-catalog.sh
# Generate Resource entities from Terraform configs

TF_DIR="${1:-.}"
OUTPUT_DIR="${2:-./catalog/generated/terraform}"

mkdir -p "$OUTPUT_DIR"

# grep -H prefixes every match with its file name so each entity can point at
# the .tf file that declares it; hidden directories such as .terraform are skipped
find "$TF_DIR" -name "*.tf" -not -path "*/.*" -exec grep -H -E '^resource[[:space:]]' {} + |
while IFS=: read -r tf_file line; do
  resource_type=$(echo "$line" | awk '{print $2}' | tr -d '"')
  resource_name=$(echo "$line" | awk '{print $3}' | tr -d '"')
  # The same type/name pair in two modules maps to one file (the last one wins)
  entity_name=$(echo "${resource_type}-${resource_name}" | tr '_' '-')

  cat > "$OUTPUT_DIR/${entity_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: ${entity_name}
  description: Terraform ${resource_type} resource
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: "${tf_file}"
    terraform.io/resource-type: ${resource_type}
    terraform.io/resource-name: ${resource_name}
  tags:
    - terraform
    - infrastructure
spec:
  type: infrastructure
  owner: contributors
  system: cloud-ops
ENTITY
done

echo "Generated Terraform Resource entities in $OUTPUT_DIR"

OpenAPI Specifications

When you encounter OpenAPI/Swagger specs, create a generator:

bash

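#!/bin/bash
# scripts/generate-openapi-catalog.sh
# Generate API entities from OpenAPI specifications
set -euo pipefail

SPEC_FILE="${1:-}"
OUTPUT_DIR="${2:-./catalog/generated/apis}"

if [ -z "$SPEC_FILE" ] || [ ! -f "$SPEC_FILE" ]; then
  echo "Usage: $0 <spec-file> [output-dir] (spec file not found: '${SPEC_FILE}')" >&2
  exit 1
fi

mkdir -p "$OUTPUT_DIR"

# Read a field with yq (handles YAML and JSON), falling back to jq for JSON specs
read_field() {
  yq eval "$1" "$SPEC_FILE" 2>/dev/null || jq -r "$1" "$SPEC_FILE"
}

api_title=$(read_field '.info.title')
api_version=$(read_field '.info.version')
# Lowercase and collapse each run of other characters into "-" for a valid entity name
api_name=$(printf '%s' "$api_title" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')

# $text paths resolve relative to the generated entity file, so reference the
# spec relative to OUTPUT_DIR (realpath --relative-to requires GNU coreutils)
spec_ref=$(realpath --relative-to="$OUTPUT_DIR" "$SPEC_FILE")

# Values are quoted so a version such as 1.0 stays a string, not a YAML number
cat > "$OUTPUT_DIR/api-${api_name}.yaml" << ENTITY
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: ${api_name}
  description: "${api_title}"
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: "${SPEC_FILE}"
    openapi.org/version: "${api_version}"
  tags:
    - openapi
    - rest
spec:
  type: openapi
  lifecycle: production
  owner: contributors
  definition:
    \$text: ${spec_ref}
ENTITY

echo "Generated API entity: ${api_name}"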

Detection Pattern for AI Agents

Proactively scan for catalog opportunities during ANY task:

bash

# Quick detection function
detect_catalog_opportunities() {
  echo "🔍 Scanning for catalog generation opportunities..."
  
  [ -f "package.json" ] && echo "✓ package.json → suggest npm catalog generation"
  [ -f "go.mod" ] && echo "✓ go.mod → suggest Go module catalog generation"
  [ -f "requirements.txt" ] && echo "✓ requirements.txt → suggest Python catalog"
  [ -f "pyproject.toml" ] && echo "✓ pyproject.toml → suggest Python catalog"
  [ -f "Cargo.toml" ] && echo "✓ Cargo.toml → suggest Rust catalog"
  [ -f "Gemfile" ] && echo "✓ Gemfile → suggest Ruby catalog"
  
  find . -name "*.tf" -not -path "*/.*" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ Terraform files → suggest Resource catalog generation"
  
  find . \( -name "openapi*.yaml" -o -name "swagger*.json" \) 2>/dev/null | head -1 | grep -q . && \
    echo "✓ OpenAPI specs → suggest API catalog generation"
  
  find . -name "docker-compose*.yml" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ Docker Compose → suggest service catalog generation"
  
  find . -name "*.proto" -not -path "*/.*" 2>/dev/null | head -1 | grep -q . && \
    echo "✓ gRPC protos → suggest API catalog generation"
}

# Run this during task initialization
detect_catalog_opportunities

When to Suggest Generation

Trigger Points:

•After creating a new service/component
  - Check for dependency files
  - Suggest generating dependency catalog entries
•After adding infrastructure
  - Scan for IaC files
  - Generate Resource entities
•After defining an API
  - Check for OpenAPI/GraphQL/gRPC specs
  - Generate API entity from specification
•During project onboarding
  - Scan entire repository
  - Suggest ALL generation opportunities at once
•When dependencies are updated
  - Detect changes to package files
  - Suggest regenerating catalog
•In code review
  - If PR adds new dependency file
  - Suggest adding generator
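
For the "when dependencies are updated" trigger, changed paths can be mapped to regeneration tasks. Here is a minimal sketch; suggest_regeneration is hypothetical. Apart from catalog:generate:npm, which the Taskfile example defines, the task names are assumed to follow the same catalog:generate:<source> pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed file paths (one per line on stdin) and print
# each catalog regeneration task they should trigger, once.
suggest_regeneration() {
  local path
  while IFS= read -r path; do
    case "$path" in
      */node_modules/*|node_modules/*) ;;   # vendored files never trigger a task
      package.json|*/package.json)     echo "catalog:generate:npm" ;;
      go.mod|*/go.mod)                 echo "catalog:generate:go" ;;
      *.tf)                            echo "catalog:generate:terraform" ;;
      devbox.json|*/devbox.json)       echo "catalog:generate:devbox" ;;
    esac
  done | sort -u
}
```

For example: git diff --name-only HEAD~1 | suggest_regeneration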

Guardrails for Generated Entities

Always apply these standards:

•Namespace: Use the default namespace or a dedicated one such as generated
•Generated marker: Add backstage.io/generated: "true" annotation
•Source tracking: Add backstage.io/source-file: "<path>" annotation
•Regeneration warning: Include comment header warning about auto-generation
•Owner assignment: Default to generic owner (e.g., contributors)
•Lifecycle: Default to production for stable dependencies
•System assignment: Group by source type:
  - Dev tools → local-dev
  - Infrastructure → cloud-ops
  - APIs → system that owns them

Example entity header:

yaml

# WARNING: This file is auto-generated from package.json
# Do not edit manually - changes will be overwritten
# To regenerate: task catalog:generate:npm

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  annotations:
    backstage.io/generated: "true"
    backstage.io/source-file: package.json
  # ... rest of entity

Task Tracking for Generation Work

When implementing catalog generation automation:

bash

# Initialize task
td start "catalog-gen-<source-type>"

# Track progress
td log "Analyzed <source-file> structure"
td log "Created generator script: scripts/generate-<source>-catalog.sh"
td log "Added Taskfile command: catalog:generate:<source>"
td log "Tested generation with sample data"
td log --decision "Using <tool> for parsing <format> because <reason>"

# Capture state on completion
td handoff "catalog-gen-<source-type>" \
  --done "Generator script created, tested, and integrated" \
  --remaining "Add to CI/CD pipeline, update documentation" \
  --decision "Placed generated entities in catalog/generated/<source> for isolation"

Integration with OpenSpec: Catalog Generation

When creating components through OpenSpec workflow:

•After artifact creation: Check for dependency files
•After implementation: Scan for new APIs/infrastructure
•Before archiving: Ensure catalog entries exist
•Track catalog work: td log "Generated catalog entities from <source>"

Example:

bash

# In opsx-apply after implementing a new service
if [ -f "package.json" ]; then
  echo "📦 Found package.json - generating dependency catalog"
  task catalog:generate:npm
  td log "Generated npm dependency catalog from package.json"
fi