Code Property Graph

Code Property Graph (CPG) query and analysis via DuckDB, DuckPGQ, and apogee-tree tree-sitter visitors.

Synopsis

CPG tools are accessed through the MCP server:

cpg_find_callers(function_name="build_manifest", depth=3)
cpg_trace_data_flow(variable="user_input", function="parse")
cpg_temporal_diff(commit_a="abc123", commit_b="def456")
cpg_full_audit(branch_a="mainline", branch_b="fedora-ark", subsystem="crypto")

Description

The CPG subsystem provides structural code analysis through 13 MCP tools. Data is stored in DuckDB with DuckPGQ extensions for graph traversal queries (recursive CTEs, reachability, path enumeration).

apogee-tree provides tree-sitter-based symbol extraction across supported languages (C, C++, Java, Go, Rust, Python, JavaScript, TypeScript, TSX).

Query categories:

  • Graph traversal: callers, callees, impact analysis via recursive CTEs with configurable depth limits

  • Structural inspection: module structure showing classes, functions, and their relationships within a file

  • Data flow: forward and backward variable tracing within functions and, via arg_passes_to edges, across function boundaries

  • Temporal analysis: structural diff between commits showing added, removed, and modified symbols and edges

  • Semantic search: find structurally similar functions using embedding-based similarity

  • Cross-branch analysis: divergence reports, backport checks, full audits comparing two branch snapshots

  • Security scanning: compound vulnerability queries using scope_summary safety flags
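The graph traversal category can be illustrated with a minimal sketch of a depth-limited recursive CTE. This is not the CPG schema: the call_edges table and the find_callers helper are hypothetical, and Python's built-in sqlite3 stands in for DuckDB so the example runs without extra packages.

```python
import sqlite3

# Hypothetical schema: one row per call edge (caller -> callee).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_edges (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO call_edges VALUES (?, ?)",
    [
        ("main", "run_pipeline"),
        ("run_pipeline", "build_manifest"),
        ("cli_entry", "main"),
        ("test_manifest", "build_manifest"),
    ],
)

# Walk the call graph upwards from :target, stopping at :max_depth hops.
# UNION (not UNION ALL) drops duplicate rows, so cycles cannot loop forever.
CALLERS_SQL = """
WITH RECURSIVE callers(name, depth) AS (
    SELECT caller, 1 FROM call_edges WHERE callee = :target
    UNION
    SELECT e.caller, c.depth + 1
    FROM call_edges AS e
    JOIN callers AS c ON e.callee = c.name
    WHERE c.depth < :max_depth
)
SELECT name, MIN(depth) AS depth
FROM callers
GROUP BY name
ORDER BY depth, name
"""


def find_callers(target, max_depth):
    """Return (caller, shortest_depth) pairs up to max_depth hops away."""
    rows = conn.execute(CALLERS_SQL, {"target": target, "max_depth": max_depth})
    return rows.fetchall()


print(find_callers("build_manifest", 2))
```

The depth bound plays the same role as the depth parameter of cpg_find_callers: it caps how far the traversal expands, which keeps results small on densely connected code.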

Reference

| Tool | Description | Key parameters |
| --- | --- | --- |
| cpg_find_callers | Find functions calling a given function | function_name, depth, subsystem, branch |
| cpg_find_callees | Find functions called by a given function | function_name, depth, subsystem, branch |
| cpg_impact_analysis | Analyze change blast radius: callers + callees + risk | function_name, file, branch |
| cpg_show_module_structure | Show file layout: modules, classes, functions | path, subsystem, branch |
| cpg_trace_data_flow | Trace variable flow within and across functions (follows arg_passes_to edges) | variable, function, file, branch |
| cpg_temporal_diff | Compare CPG state between two commits | commit_a, commit_b, file_path, subsystem, branch |
| cpg_semantic_search | Find structurally similar functions | query_function, k, threshold, subsystem, branch |
| cpg_reconstruct_at_commit | Reconstruct file/function body at a historical commit | commit_hash, file_path, function_name |
| cpg_raw_query | Execute read-only SQL against CPG database | sql, params, max_rows |
| cpg_security_scan | Run compound security scan using scope_summary flags | subsystem, branch, min_risk |
| cpg_branch_divergence_report | Cross-branch structural divergence report | branch_a, branch_b, subsystem |
| cpg_backport_check | Assess backport safety for a function | function_name, file_path, source_branch, target_branch |
| cpg_full_audit | Combined security + divergence + backport audit | branch_a, branch_b, report_level |

Examples

# Who calls build_manifest?
cpg_find_callers(function_name="build_manifest", depth=2)

# What would break if I change compute_scores?
cpg_impact_analysis(
    function_name="compute_scores",
    file="src/apogee_engine/manifest/scoring.py"
)

# What changed between two commits?
cpg_temporal_diff(commit_a="HEAD~5", commit_b="HEAD")

Cross-branch analysis

The CPG database indexes the HEAD of each branch and remote, enabling structural comparison across kernel trees without switching worktrees. Branches are identified by short hash or ref name.

A typical cross-branch workflow:

  1. Temporal diff — identify structural changes between two branch snapshots using cpg_temporal_diff with the branch parameter.

  2. Divergence report — generate a full divergence report with cpg_branch_divergence_report to see functions only in one branch, diverged implementations, and safety flag differences.

  3. Backport check — for each candidate function, run cpg_backport_check to assess whether the function can be safely backported, checking body identity, caller/callee compatibility, and prerequisite functions.
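To make the checks in step 3 concrete, here is a minimal sketch of what a backport assessment might compare. It is an illustration only, not the logic of cpg_backport_check: FunctionInfo, backport_check, and the sample function names are hypothetical, and caller/callee compatibility is reduced to checking that every callee exists in the target branch.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    """Hypothetical per-function record from one branch snapshot."""
    body_hash: str
    callees: set = field(default_factory=set)


def backport_check(name, source, target):
    """Compare one function across two snapshots (dicts of name -> FunctionInfo)."""
    src = source[name]
    tgt = target.get(name)
    # Prerequisites: callees used in the source branch but absent from the target.
    missing = sorted(c for c in src.callees if c not in target)
    return {
        "exists_in_target": tgt is not None,
        "body_identical": tgt is not None and tgt.body_hash == src.body_hash,
        "missing_prerequisites": missing,
    }


source = {
    "aes_encrypt": FunctionInfo("h1", {"aes_expand_key", "crypto_xor"}),
    "aes_expand_key": FunctionInfo("h2"),
    "crypto_xor": FunctionInfo("h3"),
}
target = {
    "aes_encrypt": FunctionInfo("h0", {"aes_expand_key"}),
    "aes_expand_key": FunctionInfo("h2"),
}

report = backport_check("aes_encrypt", source, target)
print(report)
```

In this sample the bodies differ and crypto_xor is missing from the target, so the function would need its prerequisite backported first.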

For a complete audit combining all three steps:

cpg_full_audit(
    branch_a="mainline",
    branch_b="fedora-ark",
    subsystem="crypto",
    report_level="minimum"
)

CPG Construction Model

The CPG is built by apogee-manifest, a Rust CLI that parses source files via tree-sitter, extracts structural nodes and edges, and persists them to DuckDB.

Build configuration is controlled by BuildConfig:

BuildConfig(
    repo="~/projects/linux",
    label="current",
    max_bytes=512000,
    commit_ref="fedora-ark/os-build",
    depth=BuildDepth.Full,  # or Skeleton
    only_files=None,  # or set of changed files
    capture_bodies=True,
    skip_auto_detect=False,
)

Key terms:

Skeleton

A reduced-fidelity build (BuildDepth.Skeleton) that extracts nodes, edges, and body hashes but skips body text, embeddings, and scope summaries. Used for historical commit indexing where only structural identity and change detection are needed.

Delta

A lightweight record of what changed between two commits, stored in commit_node_deltas and commit_edge_deltas tables. Contains only identity, topology, and body_hash – no embeddings, scope summaries, or body text.

Overlay

A full CPG built from uncommitted working-directory state using WORKING_DIR_HASH as the commit hash. Purged and rebuilt on each invocation (never updated in-place). When the user commits, the builder auto-detects the new HEAD and demotes the previous HEAD to a delta.

Two-layer temporal model:

  • HEAD snapshots (storage_type='full'): complete CPG with embeddings, scope summaries, bodies. One per indexed branch. This is the queryable workbench — all 13 MCP tools operate against full snapshots.

  • Historical deltas (storage_type='delta'): lightweight records of what changed between commits. Stored in commit_node_deltas and commit_edge_deltas tables. Only identity + topology + body_hash — no embeddings, no scope summaries, no body text.
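Because deltas carry only identity, topology, and body_hash, change detection reduces to comparing hashes per symbol. The sketch below shows that idea under simplifying assumptions: node_delta and the symbol identifiers are hypothetical, and each commit is modelled as a plain dict of symbol id to body_hash rather than rows in commit_node_deltas.

```python
def node_delta(old, new):
    """Classify symbols as added, removed, or modified between two commits.

    old and new map a symbol id to its body_hash.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A symbol present in both commits is modified when its body hash changed.
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "modified": modified}


before = {"scoring.py::compute_scores": "a1", "scoring.py::normalize": "b2"}
after = {"scoring.py::compute_scores": "a9", "scoring.py::rank": "c3"}

print(node_delta(before, after))
```

No body text is needed for this comparison, which is why skeleton builds can skip it and still support temporal diffs.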

Skeleton mode (build_depth='skeleton'): parse changed files only, compute body_hash but skip body text, skip embeddings and scope summaries. Used for historical commit indexing. Produces staging data consumed by write_delta to create delta records.

Working directory overlay (--working-dir): builds the Overlay described under Key terms from the current uncommitted filesystem state.

Incremental branch loading: when indexing a second branch, the builder detects the closest existing full snapshot and SQL-copies unchanged nodes, edges, embeddings, and scope summaries. Only changed files are re-parsed. On the Linux kernel, this reduces second-branch build time from 25 minutes to ~5 minutes (4.7x faster).

Per-subsystem history (--history-depth N): indexes the last N commits that touched each top-level directory, not global merge commits. Uses git log -- <path> to find subsystem-relevant commits. Each historical commit is stored as a skeleton-depth delta.
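The path-limited git log lookup can be sketched as follows. The helper names are hypothetical and the exact options the builder passes are not documented here; -n, --format=%H, and the -- path separator are standard git log options.

```python
import subprocess


def history_command(path, depth):
    """Build a git log invocation listing the last `depth` commits touching `path`."""
    # -n caps the number of commits, --format=%H prints only full hashes,
    # and "--" separates revision arguments from the path filter.
    return ["git", "log", "-n", str(depth), "--format=%H", "--", path]


def subsystem_commits(repo, path, depth):
    """Run the command inside `repo` and return commit hashes, newest first."""
    result = subprocess.run(
        history_command(path, depth),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


print(history_command("crypto", 3))
```

Calling subsystem_commits once per top-level directory yields a separate history per subsystem, so a quiet directory still gets N commits even when most recent repository activity happened elsewhere.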

Subsystem scoping

Most CPG tools accept a subsystem parameter that restricts queries to a logical code boundary. Subsystems are auto-detected from project structure and persisted to .apogee/subsystems.toml.

Auto-detection runs these detectors in priority order (first to produce two or more subsystems wins):

  1. Explicit config (.apogee/subsystems.toml with strategy = "explicit")

  2. Linux kernel MAINTAINERS file

  3. GitHub CODEOWNERS

  4. Cargo workspace members

  5. Go modules (go.mod in subdirectories)

  6. npm workspace package.json

  7. Top-level directory heuristic (fallback)
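The selection rule above (first detector to produce two or more subsystems wins) can be sketched as a simple loop. The detector names and return shapes are hypothetical. The text does not say what happens if even the fallback finds fewer than two subsystems, so this sketch returns an empty result in that case.

```python
def detect_subsystems(detectors):
    """Run detectors in priority order and return the first usable result.

    detectors is a list of (name, callable) pairs; each callable returns a
    dict mapping subsystem name to a list of path patterns.
    """
    for name, detect in detectors:
        found = detect()
        if len(found) >= 2:  # a single subsystem is not a useful partition
            return name, found
    return None, {}  # assumption: nothing qualified


detectors = [
    ("maintainers", lambda: {}),                       # no MAINTAINERS file
    ("codeowners", lambda: {"core": ["src/core/"]}),   # only one subsystem
    ("cargo", lambda: {"engine": ["engine/"], "cli": ["cli/"]}),
    ("toplevel", lambda: {"src": ["src/"], "docs": ["docs/"]}),
]

winner, subsystems = detect_subsystems(detectors)
print(winner, sorted(subsystems))
```

Here the CODEOWNERS detector is skipped because it yields a single subsystem, and the Cargo workspace detector wins before the top-level heuristic is consulted.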

Each file is mapped to exactly one subsystem via longest-prefix matching. The mapping is stored in the subsystem column of the nodes and scope_summary tables.
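Longest-prefix matching means that when several subsystem prefixes match a path, the most specific one wins. A minimal sketch, assuming each subsystem is described by a single path prefix (assign_subsystem and the prefix table are hypothetical, and unmatched paths return None here):

```python
def assign_subsystem(path, prefixes):
    """Return the subsystem whose prefix is the longest match for `path`.

    prefixes maps a subsystem name to one path prefix.
    """
    best = None
    for name, prefix in prefixes.items():
        if path.startswith(prefix):
            if best is None or len(prefix) > len(prefixes[best]):
                best = name
    return best


prefixes = {
    "drivers": "drivers/",
    "drivers/crypto": "drivers/crypto/",
}

print(assign_subsystem("drivers/crypto/ccp/ccp-dev.c", prefixes))
print(assign_subsystem("drivers/net/dummy.c", prefixes))
```

The first path matches both prefixes and lands in drivers/crypto, while the second matches only drivers/.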

Usage in tool calls:

cpg_security_scan(subsystem="drivers/crypto", branch="mainline")
cpg_branch_divergence_report(
    branch_a="mainline", branch_b="rhel-9.5",
    subsystem="drivers/crypto"
)

The subsystem argument of a tool call is matched as a substring of the stored subsystem name: subsystem="drivers" matches drivers/crypto, drivers/net, and so on.

Manual configuration: set strategy = "explicit" in .apogee/subsystems.toml and define patterns per subsystem:

[subsystems.drivers-crypto]
patterns = ["drivers/crypto/"]
excludes = ["drivers/crypto/test/"]
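One plausible reading of this configuration is sketched below. It assumes that patterns are path prefixes and that excludes take precedence over patterns. Neither assumption is confirmed by this page, and in_subsystem is a hypothetical helper, not part of the tool.

```python
# Mirrors the TOML fragment above as a plain dict.
SUBSYSTEMS = {
    "drivers-crypto": {
        "patterns": ["drivers/crypto/"],
        "excludes": ["drivers/crypto/test/"],
    },
}


def in_subsystem(path, spec):
    """Return True when `path` matches a pattern and no exclude (assumed semantics)."""
    if any(path.startswith(prefix) for prefix in spec.get("excludes", [])):
        return False
    return any(path.startswith(prefix) for prefix in spec["patterns"])


spec = SUBSYSTEMS["drivers-crypto"]
print(in_subsystem("drivers/crypto/ccp/ccp-dev.c", spec))
print(in_subsystem("drivers/crypto/test/vectors.c", spec))
```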

See the subsystem glossary entry for the full definition.

Note

This page is also available as a man page: man apogee-cpg

See Also