apogee-tree

Clean-room Rust library that parses source code via tree-sitter and emits a Code Property Graph (CPG) — the structural foundation for all CPG query tools.

Synopsis

apogee-tree is a library crate, not a standalone CLI. It is consumed by apogee-manifest (the CLI) and the DuckDB store layer.

use apogee_tree::{CpgBuilder, CpgNode, CpgEdge};

let mut builder = CpgBuilder::new(repo_root)
    .set_capture_bodies(true);
builder.add_source(source_bytes, "main.c");
let (cpg, body_content) = builder.build();

Description

apogee-tree extracts structural information from source code using tree-sitter parsers. It produces a graph of nodes (functions, classes, variables, calls, parameters, literals) and edges (calls, contains, data_flows_to, defined_by, used_by, arg_passes_to, has_parameter) — the Code Property Graph.

Supported languages (8):

  • C, C++ (including headers)

  • Python

  • Go

  • Rust

  • Java

  • JavaScript (including JSX)

  • TypeScript (including TSX)

C/C++ parsing model:

Each file is parsed independently — apogee-tree does not follow #include directives or run the C preprocessor. Header files (.h, .hpp, .hxx, .hh) are parsed with the same visitor as their corresponding source files and produce the same node types.
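As a rough illustration of the per-file model, the sketch below picks a language from a file extension alone, without looking at any other file. The `Language` enum, the `language_for_path` function, and every extension mapping other than the four header suffixes named above are assumptions for illustration; in particular, treating `.h` as C rather than C++ is a guess, not something this page states.

```rust
use std::path::Path;

/// Hypothetical language tag; the crate's own type may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    C,
    Cpp,
    Python,
    Go,
    Rust,
    Java,
    JavaScript,
    TypeScript,
}

/// Choose a language from the file extension only. No #include is followed
/// and no other file is consulted, mirroring the per-file model.
/// The extension table is an assumption, not the crate's actual mapping.
fn language_for_path(path: &str) -> Option<Language> {
    let ext = Path::new(path).extension()?.to_str()?;
    let lang = match ext {
        "c" | "h" => Language::C,
        "cc" | "cpp" | "cxx" | "hpp" | "hxx" | "hh" => Language::Cpp,
        "py" => Language::Python,
        "go" => Language::Go,
        "rs" => Language::Rust,
        "java" => Language::Java,
        "js" | "jsx" => Language::JavaScript,
        "ts" | "tsx" => Language::TypeScript,
        _ => return None,
    };
    Some(lang)
}
```

Under these assumptions, `language_for_path("drivers/net/foo.h")` selects C and `language_for_path("README.md")` yields `None`, so unsupported files are skipped rather than guessed at.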

What is captured:

  • #include directives are emitted as import nodes (the include path is extracted, but the target file is not inlined)

  • #define constants are emitted as macro nodes with is_function_like: false

  • #define function-like macros (e.g., MIN(a, b)) are emitted as macro nodes with is_function_like: true

  • Kernel storage-class annotations (__init, __exit, __always_inline, __cold, __hot, __noinline) are stripped from function return types so they do not pollute signatures
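The macro and annotation rules above can be sketched with plain string handling. This is not the crate's tree-sitter visitor: `MacroInfo`, `classify_define`, and `strip_kernel_annotations` are hypothetical names, and splitting the return type on whitespace is a simplifying assumption. The function-like test follows the C preprocessor rule that the opening parenthesis must follow the macro name with no whitespace in between.

```rust
/// Hypothetical record for one `#define`; the field name mirrors the
/// `is_function_like` attribute described above.
#[derive(Debug, PartialEq)]
struct MacroInfo {
    name: String,
    is_function_like: bool,
}

/// Classify a single `#define` line. A macro is function-like only when
/// `(` follows the name immediately, with no whitespace between them.
fn classify_define(line: &str) -> Option<MacroInfo> {
    let rest = line.trim_start().strip_prefix('#')?.trim_start();
    let rest = rest.strip_prefix("define")?;
    // Require whitespace after `define` so that `#defined` is rejected.
    if !rest.starts_with(|c: char| c.is_whitespace()) {
        return None;
    }
    let rest = rest.trim_start();
    let name_len = rest
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if name_len == 0 {
        return None;
    }
    Some(MacroInfo {
        name: rest[..name_len].to_string(),
        is_function_like: rest[name_len..].starts_with('('),
    })
}

/// The annotations listed above.
const KERNEL_ANNOTATIONS: [&str; 6] = [
    "__init",
    "__exit",
    "__always_inline",
    "__cold",
    "__hot",
    "__noinline",
];

/// Drop kernel annotations from a return-type string (token-based sketch).
fn strip_kernel_annotations(return_type: &str) -> String {
    return_type
        .split_whitespace()
        .filter(|tok| !KERNEL_ANNOTATIONS.contains(tok))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this sketch, `#define MIN(a, b) ...` is classified as function-like, while `#define EMPTY (1)` is not, because a space separates the name from the parenthesis. A return type written as `static int __init` comes out as `static int`.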

What is not captured:

  • Macro invocations that don’t expand to parseable constructs (e.g., EXPORT_SYMBOL, MODULE_AUTHOR) are not extracted as nodes — tree-sitter sees the unexpanded source

  • Conditional compilation (#ifdef) is not evaluated; both branches are visible to the parser

Node kinds (17):

Kind        Description
module      Top-level file scope
class       Class or struct definition
function    Function or method definition
variable    Variable declaration or assignment
parameter   Function parameter
call        Function call site
argument    Argument at a call site
literal     Literal value (string, number, etc.)
return      Return statement
import      Import or include statement
branch      If/else/switch condition
loop        For/while/do loop
block       Code block scope
typedef     Type alias or typedef
macro       Macro definition
enum        Enum definition
namespace   Namespace or module scope
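For readers who want to handle the kind names in their own code, here is one way the 17 kinds could be modeled as a Rust enum with a string round trip. The enum shape, the `ALL` constant, and the `as_str` and `parse` helpers are assumptions for illustration; the `NodeKind` type in src/model.rs may be defined differently.

```rust
/// Hypothetical mirror of the 17 node kinds listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeKind {
    Module, Class, Function, Variable, Parameter, Call, Argument, Literal,
    Return, Import, Branch, Loop, Block, Typedef, Macro, Enum, Namespace,
}

impl NodeKind {
    /// Every kind, in the order used by the table.
    const ALL: [NodeKind; 17] = [
        NodeKind::Module, NodeKind::Class, NodeKind::Function,
        NodeKind::Variable, NodeKind::Parameter, NodeKind::Call,
        NodeKind::Argument, NodeKind::Literal, NodeKind::Return,
        NodeKind::Import, NodeKind::Branch, NodeKind::Loop,
        NodeKind::Block, NodeKind::Typedef, NodeKind::Macro,
        NodeKind::Enum, NodeKind::Namespace,
    ];

    /// Lowercase name as shown in the "Kind" column.
    fn as_str(self) -> &'static str {
        match self {
            NodeKind::Module => "module",
            NodeKind::Class => "class",
            NodeKind::Function => "function",
            NodeKind::Variable => "variable",
            NodeKind::Parameter => "parameter",
            NodeKind::Call => "call",
            NodeKind::Argument => "argument",
            NodeKind::Literal => "literal",
            NodeKind::Return => "return",
            NodeKind::Import => "import",
            NodeKind::Branch => "branch",
            NodeKind::Loop => "loop",
            NodeKind::Block => "block",
            NodeKind::Typedef => "typedef",
            NodeKind::Macro => "macro",
            NodeKind::Enum => "enum",
            NodeKind::Namespace => "namespace",
        }
    }

    /// Inverse of `as_str`; returns `None` for unknown names.
    fn parse(name: &str) -> Option<NodeKind> {
        NodeKind::ALL.iter().copied().find(|k| k.as_str() == name)
    }
}
```

Keeping the names in one exhaustive `match` means the compiler flags any kind that is added to the enum but not given a name.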

Edge kinds (7):

Kind            Description
contains        Parent scope contains child node
calls           Call site resolves to function definition
has_parameter   Function has a parameter
data_flows_to   Data flows from source to target
defined_by      Variable defined by an expression
used_by         Value used by a consumer
arg_passes_to   Call argument maps to callee parameter
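To make the edge directions concrete, the sketch below writes out by hand the edges one might expect for a two-function C file in which main calls add(1, 2). The `Edge` struct, the numeric node ids, the choice to hang arguments under the call node with contains edges, and the positional pairing of arguments with parameters are all assumptions for illustration, not a description of what the builder emits.

```rust
/// Hypothetical mirror of the seven edge kinds listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Contains,
    Calls,
    HasParameter,
    DataFlowsTo,
    DefinedBy,
    UsedBy,
    ArgPassesTo,
}

/// Hypothetical edge record: source node id, target node id, kind.
#[derive(Debug, PartialEq, Eq)]
struct Edge {
    src: u32,
    dst: u32,
    kind: EdgeKind,
}

fn edge(src: u32, dst: u32, kind: EdgeKind) -> Edge {
    Edge { src, dst, kind }
}

/// Hand-assigned node ids for the toy file:
///   0 module, 1 function `add`, 2 parameter `a`, 3 parameter `b`,
///   4 function `main`, 5 call `add(1, 2)`, 6 argument `1`, 7 argument `2`
fn toy_edges() -> Vec<Edge> {
    let mut edges = vec![
        edge(0, 1, EdgeKind::Contains),     // module contains add
        edge(0, 4, EdgeKind::Contains),     // module contains main
        edge(1, 2, EdgeKind::HasParameter), // add has parameter a
        edge(1, 3, EdgeKind::HasParameter), // add has parameter b
        edge(4, 5, EdgeKind::Contains),     // main contains the call site
        edge(5, 6, EdgeKind::Contains),     // call contains argument 1 (assumed)
        edge(5, 7, EdgeKind::Contains),     // call contains argument 2 (assumed)
        edge(5, 1, EdgeKind::Calls),        // call site resolves to add
    ];
    // Pair arguments with parameters by position (an assumption).
    let args = [6u32, 7];
    let params = [2u32, 3];
    for (&arg, &param) in args.iter().zip(params.iter()) {
        edges.push(edge(arg, param, EdgeKind::ArgPassesTo));
    }
    edges
}

/// Targets reachable from `src` over edges of one kind.
fn targets(edges: &[Edge], src: u32, kind: EdgeKind) -> Vec<u32> {
    edges
        .iter()
        .filter(|e| e.src == src && e.kind == kind)
        .map(|e| e.dst)
        .collect()
}
```

In this toy graph, following calls from node 5 reaches the definition of add (node 1), and following arg_passes_to from the second argument (node 7) reaches parameter b (node 3).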

Body capture: when capture_bodies is enabled, each function’s body text is hashed with BLAKE3 and the digest is stored as a body_hash attribute. The hash enables change detection across commits without storing the full body text for every historical snapshot.

Skeleton mode: when hash_only is set on visitors, body_hash is computed but the full body text is not captured. Used for historical commit indexing where only structural identity and change detection are needed.
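The following sketch shows how body hashes can drive change detection, and how a hash-only record differs from a full capture. It is a standalone model, not the crate's code: `BodyRecord`, `record_body`, `snapshot`, and `changed_functions` are hypothetical names, and a 64-bit FNV-1a hash stands in for BLAKE3 so that the example needs only the standard library. With the blake3 crate available, `blake3::hash(body.as_bytes())` would take its place.

```rust
use std::collections::BTreeMap;

/// Stand-in 64-bit FNV-1a hash (NOT BLAKE3); used only to keep this
/// sketch free of external dependencies.
fn body_hash(body: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in body.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical per-function record. `text` is `None` in hash-only mode.
#[derive(Debug, Clone, PartialEq)]
struct BodyRecord {
    body_hash: u64,
    text: Option<String>,
}

/// The hash is always computed; the text is kept only for full capture.
fn record_body(body: &str, hash_only: bool) -> BodyRecord {
    BodyRecord {
        body_hash: body_hash(body),
        text: if hash_only { None } else { Some(body.to_string()) },
    }
}

/// One snapshot: function name -> record.
type Snapshot = BTreeMap<String, BodyRecord>;

fn snapshot(functions: &[(&str, &str)], hash_only: bool) -> Snapshot {
    functions
        .iter()
        .map(|&(name, body)| (name.to_string(), record_body(body, hash_only)))
        .collect()
}

/// Names in `new` whose hash differs from `old`, or that are absent there.
fn changed_functions(old: &Snapshot, new: &Snapshot) -> Vec<String> {
    new.iter()
        .filter(|&(name, rec)| {
            old.get(name).map(|o| o.body_hash) != Some(rec.body_hash)
        })
        .map(|(name, _)| name.clone())
        .collect()
}
```

Because the comparison uses only `body_hash`, a historical snapshot built in hash-only mode can be diffed against a current snapshot built with full capture, which is the property that makes skeleton indexing of old commits cheap.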

Reference

Crate structure (manifest/tree/):

  • src/model.rs: CpgNode, CpgEdge, NodeKind, EdgeKind, SourceLocation types

  • src/builder.rs: CpgBuilder with add_source, build, build_with_sink, resolve_calls, compute_type_refs

  • src/visitors/: per-language visitor implementations (c.rs, cpp.rs, python.rs, go.rs, rust_lang.rs, java.rs, js.rs, ts.rs)

  • src/visitors/mod.rs: get_visitor_with_opts factory

Key APIs:

// Build CPG from source files
let mut builder = CpgBuilder::new(repo_root)
    .set_capture_bodies(true)
    .set_hash_only(false);
builder.add_source(bytes, "file.c");
let (cpg, bodies) = builder.build();

// Query the graph
cpg.node_count()    // total nodes
cpg.edge_count()    // total edges
cpg.nodes()         // iterator over CpgNode
cpg.edges()         // iterator over CpgEdge

Note

This page is also available as a man page: man apogee-tree

See Also