← Garden Patch Home · Principles
The knowledge vault needs searchable content, not file formats. The value of a stored binary is the information it contains. When that information can be extracted into a searchable, lightweight rendition, the original binary’s value drops to provenance evidence — and often a sidecar pointing to the canonical source is sufficient.
PDFs: Extract text via pdf2txt or similar tool into a rendition. If the canonical PDF is available at a stable URL, a sidecar with derived_from:: replaces local storage. Store the PDF locally only when no canonical source exists.
Apple iWork files (.key, .pages, .numbers): These change their modification date when opened without edits. In a git-tracked vault, this generates phantom diffs — files that appear changed in git status without actual content changes. iWork files should never be stored where git can see them. If the content matters, create a rendition (export to PDF, then pdf2txt). If the canonical file lives on iCloud or a shared drive, a sidecar with derived_from:: is sufficient.
Audio and video: Canonical sources (YouTube, Zoom cloud recordings) are better storage than local files. A sidecar links to the canonical URL. The transcript (a rendition of the audio) serves the vault’s needs. Store locally only when no canonical source exists AND the binary provides value beyond its transcript — for example, a recording needed for generating VTT timecodes for subtitle upload.
Whisper/VTT intermediate files: A .whisper file is a raw intermediate — reproducible by running the audio through Whisper again. A .vtt timecode file is more useful (uploadable to YouTube as subtitles) but also reproducible. Neither needs git tracking. A sidecar can note “reproducible from audio via Whisper Large v3 Turbo” for provenance.
Store a binary locally only when both conditions are met:
When either condition fails, a rendition + sidecar replaces local storage.
Extracted from the Content Over Container principle section of the compound nodes research note.