Code Property Graph
Code Property Graph (CPG) query and analysis via DuckDB, DuckPGQ, and apogee-tree tree-sitter visitors.
Synopsis
CPG tools are accessed through the MCP server:
cpg_find_callers(function_name="build_manifest", depth=3)
cpg_trace_data_flow(variable="user_input", function="parse")
cpg_temporal_diff(commit_a="abc123", commit_b="def456")
cpg_full_audit(branch_a="mainline", branch_b="fedora-ark", subsystem="crypto")
Description
The CPG subsystem provides structural code analysis through 13 MCP tools. Data is stored in DuckDB with DuckPGQ extensions for graph traversal queries (recursive CTEs, reachability, path enumeration).
apogee-tree provides tree-sitter-based symbol extraction across supported languages (C, C++, Java, Go, Rust, Python, JavaScript, TypeScript, TSX).
Query categories:
Graph traversal: callers, callees, impact analysis via recursive CTEs with configurable depth limits
Structural inspection: module structure showing classes, functions, and their relationships within a file
Data flow: forward and backward variable tracing within functions
Temporal analysis: structural diff between commits showing added, removed, and modified symbols and edges
Semantic search: find structurally similar functions using embedding-based similarity
Cross-branch analysis: divergence reports, backport checks, full audits comparing two branch snapshots
Security scanning: compound vulnerability queries using scope_summary safety flags
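The graph traversal category can be illustrated with a depth-limited recursive CTE. This is a minimal sketch, not the CPG's actual query: it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and the `calls(caller, callee)` table, the `find_callers` helper, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per call edge. The real CPG tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("main", "run"),
        ("run", "build_manifest"),
        ("cli", "build_manifest"),
        ("test_main", "main"),
    ],
)

CALLERS_SQL = """
WITH RECURSIVE callers(name, depth) AS (
    -- Depth 1: direct callers of the target function.
    SELECT caller, 1 FROM calls WHERE callee = :target
    UNION
    -- Depth n+1: callers of anything found so far, up to the limit.
    SELECT c.caller, callers.depth + 1
    FROM calls AS c
    JOIN callers ON c.callee = callers.name
    WHERE callers.depth < :max_depth
)
SELECT name, MIN(depth) AS depth FROM callers GROUP BY name ORDER BY depth, name
"""


def find_callers(target, max_depth):
    """Return (caller, shortest depth) pairs within max_depth call hops."""
    return conn.execute(
        CALLERS_SQL, {"target": target, "max_depth": max_depth}
    ).fetchall()
```

The depth bound in the recursive branch is what keeps the query finite on call graphs that contain cycles; whether the `depth` parameter of `cpg_find_callers` is applied the same way is an assumption.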
Reference
| Tool | Description | Key parameters |
|---|---|---|
| cpg_find_callers | Find functions calling a given function | |
| | Find functions called by a given function | |
| cpg_impact_analysis | Analyze change blast radius: callers + callees + risk | |
| | Show file layout: modules, classes, functions | |
| cpg_trace_data_flow | Trace variable flow within and across functions (follows arg_passes_to edges) | |
| cpg_temporal_diff | Compare CPG state between two commits | |
| | Find structurally similar functions | |
| | Reconstruct file/function body at a historical commit | |
| | Execute read-only SQL against CPG database | |
| cpg_security_scan | Run compound security scan using scope_summary flags | |
| cpg_branch_divergence_report | Cross-branch structural divergence report | |
| cpg_backport_check | Assess backport safety for a function | |
| cpg_full_audit | Combined security + divergence + backport audit | |
Examples
# Who calls build_manifest?
cpg_find_callers(function_name="build_manifest", depth=2)
# What would break if I change compute_scores?
cpg_impact_analysis(
    function_name="compute_scores",
    file="src/apogee_engine/manifest/scoring.py"
)
# What changed between two commits?
cpg_temporal_diff(commit_a="HEAD~5", commit_b="HEAD")
Cross-branch analysis
The CPG database indexes the HEAD of each branch and remote, enabling structural comparison across kernel trees without switching worktrees. Branches are identified by short hash or ref name.
A typical cross-branch workflow:
1. Temporal diff: identify structural changes between two branch snapshots using cpg_temporal_diff with the branch parameter.
2. Divergence report: generate a full divergence report with cpg_branch_divergence_report to see functions only in one branch, diverged implementations, and safety flag differences.
3. Backport check: for each candidate function, run cpg_backport_check to assess whether the function can be safely backported, checking body identity, caller/callee compatibility, and prerequisite functions.
For a complete audit combining all three steps:
cpg_full_audit(
    branch_a="mainline",
    branch_b="fedora-ark",
    subsystem="crypto",
    report_level="minimum"
)
CPG Construction Model
The CPG is built by apogee-manifest, a Rust CLI that parses source
files via tree-sitter, extracts structural nodes and edges, and persists
them to DuckDB.
Build configuration is controlled by BuildConfig:
BuildConfig(
    repo="~/projects/linux",
    label="current",
    max_bytes=512000,
    commit_ref="fedora-ark/os-build",
    depth=BuildDepth.Full,  # or Skeleton
    only_files=None,  # or set of changed files
    capture_bodies=True,
    skip_auto_detect=False,
)
Key terms:
- Skeleton
  A reduced-fidelity build (BuildDepth.Skeleton) that extracts nodes, edges, and body hashes but skips body text, embeddings, and scope summaries. Used for historical commit indexing where only structural identity and change detection are needed.
- Delta
  A lightweight record of what changed between two commits, stored in the commit_node_deltas and commit_edge_deltas tables. Contains only identity, topology, and body_hash; no embeddings, scope summaries, or body text.
- Overlay
  A full CPG built from uncommitted working-directory state using WORKING_DIR_HASH as the commit hash. Purged and rebuilt on each invocation (never updated in-place). When the user commits, the builder auto-detects the new HEAD and demotes the previous HEAD to a delta.
Two-layer temporal model:
- HEAD snapshots (storage_type='full'): complete CPG with embeddings, scope summaries, bodies. One per indexed branch. This is the queryable workbench; all 13 MCP tools operate against full snapshots.
- Historical deltas (storage_type='delta'): lightweight records of what changed between commits. Stored in the commit_node_deltas and commit_edge_deltas tables. Only identity, topology, and body_hash; no embeddings, no scope summaries, no body text.
Skeleton mode (build_depth='skeleton'): parse changed files only,
compute body_hash but skip body text, skip embeddings and scope summaries.
Used for historical commit indexing. Produces staging data consumed by
write_delta to create delta records.
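Change detection from body hashes alone might look like the sketch below. It is illustrative only: the `body_hash` and `diff_snapshots` helpers are hypothetical, SHA-256 is an assumed hash (the source does not say which one is used), and the real staging data consumed by write_delta carries edges as well as nodes.

```python
import hashlib


def body_hash(body: str) -> str:
    """Hash a function body so changes can be detected without keeping the text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()  # assumed algorithm


def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two {symbol: body_hash} maps and classify every symbol."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(s for s in old.keys() & new.keys() if old[s] != new[s])
    return {"added": added, "removed": removed, "modified": modified}


# Two skeleton-style snapshots: symbol names mapped to body hashes only.
parent = {
    "parse": body_hash("return tokens"),
    "emit": body_hash("write(out)"),
    "legacy": body_hash("pass"),
}
child = {
    "parse": body_hash("return list(tokens)"),  # body changed
    "emit": body_hash("write(out)"),            # unchanged
    "validate": body_hash("check(x)"),          # new symbol
}
delta = diff_snapshots(parent, child)
```

Because only hashes are compared, the delta records that `parse` changed without needing either version of its body, which matches the stated goal of keeping historical records lightweight.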
Working directory overlay (--working-dir): builds a full CPG from
uncommitted filesystem state using WORKING_DIR_HASH as the commit
hash. Always purged and rebuilt (never updated in-place). When the user
commits, the builder auto-detects the new HEAD and demotes the previous
HEAD to a delta.
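The purge-and-rebuild behaviour can be sketched as a delete followed by a bulk insert inside one transaction. This is an assumption-laden illustration: sqlite3 stands in for DuckDB, the `nodes` table layout and the `rebuild_overlay` and `overlay_rows` helpers are hypothetical, and the sentinel value is a placeholder because the source names WORKING_DIR_HASH without giving its value.

```python
import sqlite3

# Placeholder sentinel: the source names WORKING_DIR_HASH but not its value.
WORKING_DIR_HASH = "WORKING_DIR"


def rebuild_overlay(conn, nodes):
    """Drop every overlay row, then insert the freshly parsed nodes."""
    with conn:  # one transaction, so the purge and the rebuild commit together
        conn.execute("DELETE FROM nodes WHERE commit_hash = ?", (WORKING_DIR_HASH,))
        conn.executemany(
            "INSERT INTO nodes (commit_hash, name, body_hash) VALUES (?, ?, ?)",
            [(WORKING_DIR_HASH, name, digest) for name, digest in nodes],
        )


def overlay_rows(conn):
    """Return the overlay's (name, body_hash) rows, sorted by name."""
    return conn.execute(
        "SELECT name, body_hash FROM nodes WHERE commit_hash = ? ORDER BY name",
        (WORKING_DIR_HASH,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (commit_hash TEXT, name TEXT, body_hash TEXT)")
conn.execute("INSERT INTO nodes VALUES ('abc123', 'parse', 'h0')")  # committed HEAD row

rebuild_overlay(conn, [("parse", "h1"), ("helper", "h2")])
rebuild_overlay(conn, [("parse", "h3")])  # the second run replaces the first overlay
```

Rows belonging to committed snapshots are untouched; only rows keyed by the sentinel are discarded on each run.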
Incremental branch loading: when indexing a second branch, the builder detects the closest existing full snapshot and SQL-copies unchanged nodes, edges, embeddings, and scope summaries. Only changed files are re-parsed. On the Linux kernel, this reduces second-branch build time from 25 minutes to ~5 minutes (4.7x faster).
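One way to picture the copy-versus-reparse split is the sketch below. It assumes that "closest" means the snapshot with the fewest differing files and that files are compared by a content hash; neither detail is stated in the source, and the `changed_files` and `plan_incremental_load` helpers are hypothetical.

```python
def changed_files(base: dict, target: dict) -> set:
    """Paths whose content hash differs, or that exist on only one side."""
    return {p for p in base.keys() | target.keys() if base.get(p) != target.get(p)}


def plan_incremental_load(snapshots: dict, target: dict):
    """Pick the snapshot with the fewest changed files and split the work."""
    closest = min(
        snapshots, key=lambda name: len(changed_files(snapshots[name], target))
    )
    changed = changed_files(snapshots[closest], target)
    copy = sorted(p for p in target if p not in changed)   # reuse existing rows
    reparse = sorted(p for p in target if p in changed)    # must be parsed again
    return closest, copy, reparse


# Existing full snapshots, each a {path: content_hash} map (sample data).
snapshots = {
    "mainline": {"a.c": "1", "b.c": "2", "c.c": "3"},
    "rhel": {"a.c": "9", "b.c": "8", "c.c": "3"},
}
# The branch being indexed: b.c differs from mainline and d.c is new.
target = {"a.c": "1", "b.c": "2x", "c.c": "3", "d.c": "4"}

closest, copy, reparse = plan_incremental_load(snapshots, target)
```

In this sample only two of four files need parsing; the saving grows with the share of files the two branches have in common.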
Per-subsystem history (--history-depth N): indexes the last N
commits that touched each top-level directory, not global merge commits.
Uses git log -- <path> to find subsystem-relevant commits. Each
historical commit is stored as a skeleton-depth delta.
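The per-path commit lookup could be issued as shown below. Only `git log -- <path>` comes from the text above; the `--max-count` and `--format=%H` options are standard git flags chosen here for illustration, and the `history_command` and `subsystem_history` helpers are hypothetical.

```python
import subprocess


def history_command(path: str, depth: int) -> list:
    """Build the git invocation that lists the last `depth` commits touching `path`."""
    return ["git", "log", f"--max-count={depth}", "--format=%H", "--", path]


def subsystem_history(repo: str, path: str, depth: int) -> list:
    """Run the command inside `repo` and return commit hashes, newest first."""
    result = subprocess.run(
        history_command(path, depth),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

Calling `subsystem_history("/path/to/linux", "drivers", 5)` would return up to five hashes for commits that touched `drivers/`, regardless of how many unrelated commits landed in between.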
Subsystem scoping
Most CPG tools accept a subsystem parameter that restricts
queries to a logical code boundary. Subsystems are auto-detected
from project structure and persisted to .apogee/subsystems.toml.
Auto-detection runs these detectors in priority order (first to produce two or more subsystems wins):
1. Explicit config (.apogee/subsystems.toml with strategy = "explicit")
2. Linux kernel MAINTAINERS file
3. GitHub CODEOWNERS
4. Cargo workspace members
5. Go modules (go.mod in subdirectories)
6. npm workspace package.json
7. Top-level directory heuristic (fallback)
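The "first detector to produce two or more subsystems wins" rule can be sketched as a simple loop over detectors in priority order. The `auto_detect` function, the detector names, and the stub results are hypothetical; the real detectors read MAINTAINERS, CODEOWNERS, and workspace files.

```python
def auto_detect(detectors):
    """Return (detector name, subsystems) for the first detector yielding two or more."""
    for name, detect in detectors:
        subsystems = detect()
        if len(subsystems) >= 2:
            return name, subsystems
    return None


# Stub detectors in priority order, each returning {subsystem: [path prefixes]}.
DETECTORS = [
    ("explicit", lambda: {}),                              # no explicit config
    ("maintainers", lambda: {"crypto": ["crypto/"]}),      # only one subsystem
    ("cargo", lambda: {"core": ["crates/core/"], "cli": ["crates/cli/"]}),
    ("top-level", lambda: {"src": ["src/"], "docs": ["docs/"], "tests": ["tests/"]}),
]

winner = auto_detect(DETECTORS)
```

Here the Cargo detector wins even though the top-level heuristic would find more subsystems, because an earlier detector that meets the threshold ends the search.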
Each file is mapped to exactly one subsystem via longest-prefix
matching. The mapping is stored in the subsystem column on
nodes and scope_summary tables.
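Longest-prefix matching can be sketched in a few lines. The `map_file_to_subsystem` helper, the prefix table, and the file paths are hypothetical examples, not the contents of a real subsystems.toml.

```python
def map_file_to_subsystem(path, prefixes):
    """Return the subsystem whose prefix is the longest match for path, or None."""
    best = None
    for prefix, subsystem in prefixes.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, subsystem)
    return best[1] if best else None


# Hypothetical prefix table: the more specific prefix should win.
PREFIXES = {
    "drivers/": "drivers",
    "drivers/crypto/": "drivers-crypto",
    "crypto/": "crypto",
}
```

A file under `drivers/crypto/` matches both `drivers/` and `drivers/crypto/`; choosing the longer prefix is what guarantees that each file lands in exactly one subsystem.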
Usage in tool calls:
cpg_security_scan(subsystem="drivers/crypto", branch="mainline")
cpg_branch_divergence_report(
    branch_a="mainline", branch_b="rhel-9.5",
    subsystem="drivers/crypto"
)
The subsystem parameter is matched as a substring, so subsystem="drivers"
matches drivers/crypto, drivers/net, etc.
Manual configuration: set strategy = "explicit" in
.apogee/subsystems.toml and define patterns per subsystem:
[subsystems.drivers-crypto]
patterns = ["drivers/crypto/"]
excludes = ["drivers/crypto/test/"]
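How patterns and excludes might combine is sketched below, with the configuration mirrored as a Python dict. The source does not specify the pattern syntax, so this assumes that both lists hold path prefixes and that an exclude overrides a pattern; the `matches` and `subsystem_for` helpers are hypothetical.

```python
# Mirror of the TOML fragment above as a plain dict.
CONFIG = {
    "drivers-crypto": {
        "patterns": ["drivers/crypto/"],
        "excludes": ["drivers/crypto/test/"],
    }
}


def matches(path, rule):
    """True if path falls under a pattern prefix and under no exclude prefix."""
    included = any(path.startswith(p) for p in rule["patterns"])
    excluded = any(path.startswith(e) for e in rule.get("excludes", []))
    return included and not excluded


def subsystem_for(path, config):
    """Return the first configured subsystem that claims path, or None."""
    for name, rule in config.items():
        if matches(path, rule):
            return name
    return None
```

Under these assumptions `drivers/crypto/ccp/ccp-dev.c` belongs to `drivers-crypto`, while anything under `drivers/crypto/test/` is left unassigned.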
See the subsystem glossary entry for the full definition.
Note
This page is also available as a man page: man apogee-cpg
See Also
apogee-manifest — source data for CPG construction
apogee-mcp — MCP server hosting CPG tools