ifp

Garden Patch Home · Inquiries

Living Document Scale Limits

Question

Is there a practical scale limit for a maintained knowledge graph — a point beyond which the cost of gardening (tending, linking, reviewing, restructuring) exceeds the value of the accumulated knowledge? If so, what are the early warning signs, and does the deep context architecture’s typed structure change the calculus?

Scope

The question arises from three observations:

The evidence for scale — Jerry Michalski has maintained TheBrain continuously since 1997: 620,000+ nodes, 1.5 million+ links, hand-entered at 50-60 per day. This is the longest-running large-scale personal knowledge management system in public existence. It demonstrates that single-author knowledge graphs can scale to hundreds of thousands of nodes across decades.

The evidence against naive scale — Digital garden practitioners report structure ossification at multi-year timescales. The “Reflection on two years of writing evergreen notes” documents how note systems calcify and resist reorganization. If ossification appears at hundreds of notes, what happens at thousands?

The typed structure variable — The deep context architecture’s structural contracts, typed predicates, and growth stages may change where the limit falls. Typed structure makes maintenance more expensive per node (authoring effort) but less expensive per query (agents can triage cheaply via summary fields). The net effect on scale limits is unknown.

Approaches

Interview practitioners — Michalski’s TheBrain, long-running wiki communities, and multi-year digital garden maintainers each offer evidence about what breaks at scale. What gardening operations become prohibitively expensive? What falls behind?

Theoretical modeling — If gardening cost per node grows as O(n) (linear with link density) but retrieval value grows as O(log n) (diminishing returns), there is a crossover point. Typed structure might flatten the gardening curve (predicates make maintenance scriptable) while steepening the retrieval curve (typed traversal finds relevant nodes faster).

Empirical from this vault — Track gardening effort per session as the garden grows. At what node count does “tending existing nodes” consume more session time than “creating new nodes”?

Resolution Criteria

Sources

Relations