# cloud brain

> an ambient knowledge network for ai sessions. built on claude code + openai codex, dogfooded on a 55-file karpathy-style wiki.

## tldr

| field | value |
|-------|-------|
| kernel | what if every ai session could silently draw from a network of compiled research, yours and others'? |
| hypothesis | researchers waste hours re-deriving knowledge that someone else already compiled. an mcp-native plugin can make this ambient, with micropayments settling on megaeth |
| thesis reframe (v2) | cloud brain is an amortization layer for llm research costs. value = token cost saved + convergence cost skipped, not "better than ai" |
| status | paused, not abandoned. single-player mode validated via dogfooding. multiplayer blocked on cold-start density. pull-lift evaluation designed as the next unlock |
| built with | almost entirely claude code + openai codex. i wrote very little code by hand. the value i added was the product thinking, the wiki schema, and knowing when the product question mattered more than the code |
| validated by | dogfooding. used cloud brain on my own wiki across repeated claude code + openai codex sessions to test whether ambient retrieval improves research quality |
| skills | mcp protocol, embeddings / pgvector, micropayment architecture, eval frameworks, typescript, supabase, prompt injection defense, cold-start strategy, convergence-cost reasoning |

## the inspiration: karpathy's llm wiki pattern

this project started with andrej karpathy's post about maintaining a personal wiki for llm sessions. the idea: instead of starting every ai conversation from scratch, you maintain a structured markdown knowledge base that your ai can reference. raw sources go in, structured interlinked pages come out, and over time you build a compounding research asset.

- i built my own wiki following this pattern. 55 markdown files across ventures, research, and structured evaluations. three-layer architecture: raw sources (articles, transcripts), wiki pages (synthesized analysis), and a maintenance schema (CLAUDE.md) that tells ai agents how to read and update it
- the schema is the key. CLAUDE.md defines operations (ingest, query, lint), page conventions (one canonical file per topic, changelogs, cross-references), folder structure (ventures/, research/, eval-outputs/), and a privacy model (everything is private by default, shared digests are the only external-facing layer)
- cloud brain was the next question: my wiki works great for me. what if it could work across sessions, across ai tools, and eventually across people?
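
the schema described above is easiest to see as a file. a minimal, abridged sketch of what the CLAUDE.md might look like; the operation names (ingest, query, lint), folder names, and privacy rule come from this writeup, while the exact wording is illustrative:

```markdown
# CLAUDE.md — maintenance schema (abridged sketch)

## operations
- ingest: convert a raw source into a structured wiki page, add cross-references
- query: answer from wiki pages first, fall back to raw sources, cite pages used
- lint: enforce one canonical file per topic, changelog entries, no dead wikilinks

## page conventions
- one canonical file per topic; append a changelog line on every edit
- cross-reference related pages with [[wikilinks]]

## folders
- ventures/  research/  eval-outputs/

## privacy
- everything private by default; shared digests are the only external-facing layer
```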

## the problem (current ai sessions happen in isolation)

- roughly 80% of research time (my estimate from dogfooding) is re-derivation: answering the same question 10 times across 10 conversations
- 0 ai tools share knowledge across sessions by default
- context windows wasted on the same questions
- no cumulative learning: your knowledge doesn't compound

the hypothesis: what if previous work were automatically available to future sessions, silently, as ambient context?

## v2 reframe: convergence cost amortization

the v1 framing was "destination marketplace": buy and sell compiled research pages as a product. that failed the assumption stack. v2 reframes cloud brain as an ambient context layer with a sharper thesis:

- convergence cost is real. to reach a well-formed answer on a specific topic, any researcher spends hours iterating with their ai. that cost gets paid once, by the first person. every subsequent researcher who needs the same understanding pays it again, from scratch
- cloud brain amortizes it. if the first researcher's compiled wiki is pulled into the second researcher's session, the second researcher skips the convergence cost and arrives at the output state with less token spend and less wall-clock time
- the unit is convergence, not access. you're not buying a document. you're buying a shortcut through iteration. that's a different product

this reframe changed the launch plan: single-player mode (search your own wiki) becomes the primary value prop, with multiplayer density compounding over time. no wallet required at launch. $2-5 micropayments added later.

## how it works

three layers:

- knowledge capture: wiki parser reads any markdown folder (obsidian-style links, recursive scanning, regex pii detection). facts, insights, decisions get chunked and embedded into structured knowledge atoms
- knowledge network: openai text-embedding-3-large + supabase pgvector. semantic search across all indexed sessions. cross-linking between related atoms
- injection into sessions: new session starts in claude code / openai codex / any mcp client. system queries the network: "what does this user know about X?" relevant atoms get injected as context. claude sees your previous work automatically
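
the knowledge-capture step can be sketched in a few lines. a minimal version assuming a simple atom shape; the real parser's chunking rules and pii patterns may differ:

```typescript
// minimal sketch of knowledge capture: chunk a wiki page into atoms,
// extract obsidian-style wikilinks, scrub pii with regexes.
// the atom shape and pii patterns here are illustrative assumptions.

interface KnowledgeAtom {
  sourceFile: string;
  text: string;     // chunk of wiki prose, pii-scrubbed
  links: string[];  // outgoing [[wikilinks]]
}

// matches [[Page]] or [[Page|alias]], capturing the page name
const WIKILINK = /\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/g;

// illustrative pii patterns only (emails, us-style phone numbers)
const PII = [/[\w.+-]+@[\w-]+\.[\w.]+/g, /\b\d{3}[-.]\d{3}[-.]\d{4}\b/g];

function parseWikiPage(sourceFile: string, markdown: string): KnowledgeAtom[] {
  // chunk on blank-line-separated paragraphs; a real chunker would be token-aware
  return markdown
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0)
    .map(p => {
      const links = [...p.matchAll(WIKILINK)].map(m => m[1]);
      const text = PII.reduce((t, re) => t.replace(re, "[redacted]"), p);
      return { sourceFile, text, links };
    });
}
```

each atom then gets embedded and indexed by the next layer.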

| flow step | tool | notes |
|-----------|------|-------|
| your markdown wiki | source | 55 pages across ventures/research/evals |
| wiki parser + pii scan | cloud brain mcp | obsidian wikilinks, regex scrubbing |
| embeddings (pgvector) | supabase | text-embedding-3-large |
| mcp server | claude desktop + any client | 5 tools exposed |
| silent augmentation | claude code / openai codex | ambient, no manual query |
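
a toy stand-in for the retrieval layer: in production this is text-embedding-3-large vectors stored in supabase pgvector, but ranking mock vectors by cosine similarity shows the shape of a query:

```typescript
// toy in-memory version of the semantic search step. the real system
// embeds the query with text-embedding-3-large and lets pgvector rank;
// this sketch does the same ranking over mock vectors.

interface IndexedAtom {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchCloudBrain(query: number[], atoms: IndexedAtom[], topK = 3): IndexedAtom[] {
  // sort a copy descending by similarity to the query, keep the top k
  return [...atoms]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, topK);
}
```

in pgvector the same ranking collapses to a single `order by embedding <=> $query limit k`, where `<=>` is the cosine-distance operator.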

## what's actually built

| component | status | notes |
|-----------|--------|-------|
| wiki parser | works | reads any markdown folder, obsidian wikilinks, recursive scanning, regex pii detection |
| retrieval engine | works | openai text-embedding-3-large, supabase pgvector, similarity search + synthesis |
| mcp server | works | 5 tools: search_cloud_brain, pull_knowledge, submit_pull_feedback, get_pull_history, marketplace_summary |
| quote lifecycle | works | create, approve, settle, deliver. quote-then-approve pattern for cost control |
| safety layer | works | prompt injection scanning, content-hash verification, transparent fallback routing |
| payment settlement | mocked | usdm ledger scaffolded with mock settlement. real megaeth integration not started |
| multi-contributor network | not built | contributor isolation, attribution, quality scoring: designed but not implemented |
| pull-lift evaluation | designed | gepa-style differential judge on before/after state. see next section |
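
the quote-then-approve lifecycle from the table, sketched as a tiny state machine. the four states come from this writeup; the strictly linear transition rule is my assumption:

```typescript
// sketch of the quote lifecycle. the buyer sees a price before any spend
// (the cost-control point of quote-then-approve); settlement is mocked
// in the current build.

type QuoteState = "created" | "approved" | "settled" | "delivered";

const NEXT: Record<QuoteState, QuoteState | null> = {
  created: "approved",   // buyer approves the quoted price
  approved: "settled",   // payment settles (mock usdm ledger today)
  settled: "delivered",  // knowledge atoms released into the session
  delivered: null,       // terminal state
};

function advance(state: QuoteState): QuoteState {
  const next = NEXT[state];
  if (next === null) throw new Error(`quote already ${state}`);
  return next;
}
```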

## the unreleased design: pull-lift evaluation

the biggest gap in the marketplace layer is reputation. right now it's a thumbs-up stub. the next-version design borrows from GEPA (reflective LM primitives) to turn reputation into a computed lift score.

| concept | description |
|---------|-------------|
| before state | snapshot of the buyer's work before the pull (prompt, current draft, current research) |
| pulled content | what cloud brain returned |
| after state | snapshot of the buyer's work after the pull (new draft, new reasoning) |
| differential judge | a reflective llm reads all three and scores lift on a 0-1 scale using a task-specific rubric |
| output | scalar lift score per pull, fed into reputation rankings, dynamic pricing, and outcome-based refunds |
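
the table above can be sketched as a function. `judge` stands in for the reflective llm call; the prompt shape, rubric handling, and clamping behavior are my assumptions about the unbuilt design:

```typescript
// sketch of the pull-lift differential judge. a reflective llm reads
// before state, pulled content, and after state, and returns a 0-1 lift
// score; here the llm is an injected function so the flow is testable.

interface PullRecord {
  before: string; // buyer's work before the pull
  pulled: string; // what cloud brain returned
  after: string;  // buyer's work after the pull
}

function buildJudgePrompt(p: PullRecord, rubric: string): string {
  return [
    "score how much the pulled content improved the work, 0 to 1.",
    `rubric: ${rubric}`,
    `BEFORE:\n${p.before}`,
    `PULLED:\n${p.pulled}`,
    `AFTER:\n${p.after}`,
    "reply with a single number.",
  ].join("\n\n");
}

function liftScore(
  p: PullRecord,
  rubric: string,
  judge: (prompt: string) => string, // stands in for a reflective llm call
): number {
  const raw = judge(buildJudgePrompt(p, rubric));
  const n = parseFloat(raw);
  if (Number.isNaN(n)) return 0;       // unparseable verdict → no credited lift
  return Math.min(1, Math.max(0, n));  // clamp to [0, 1]
}
```

the scalar output is what feeds reputation rankings, dynamic pricing, and refund logic.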

what it unlocks:

- reputation stops being "average of clicks" and becomes "average of measured improvement"
- dynamic pricing gets a real signal: high-lift knowledge atoms can price higher
- give-to-get accounting gets a unit: contributors earn credits proportional to the cumulative lift they enable
- refund rails become possible when no lift is produced

caveats:

- opt-in friction: asking buyers to share before/after state
- rubric dependence: different tasks have different success shapes
- gameability of the judge
- llm cost per evaluation

## how i validated: dogfooding

- i used cloud brain on my own wiki for weeks. every time i opened claude code or openai codex for research, the mcp plugin silently checked my wiki. when it found relevant prior research, it pulled it into the session
- single-player mode genuinely works. researching stablecoin regulations and having my own prior GENIUS Act analysis surface automatically saved real time. the value is immediate and tangible
- but it revealed the real problem: single-player is useful, multiplayer is the product. and multiplayer needs density i don't have

## why i paused

- the code works, but the product question is unanswered. building more features won't solve the supply-side bootstrapping problem. i needed to step back and think about distribution before writing more code
- the core hypothesis is still untested at multiplayer. "does pulling someone else's compiled research into your ai session produce noticeably better output?" you need 5-10 real contributors to even test this meaningfully
- frequency-of-value is the binding constraint. with sparse contributors, most queries return nothing. that's a bad first experience
- i defined kill criteria before pausing: if 3/5 test users say sessions are NOT better with cloud brain, the product doesn't work. if after 30 days fewer than 10 contributors have published, supply doesn't compound

## honest self-assessment

| dimension | score | note |
|-----------|-------|------|
| technical feasibility | proven | core stack works end to end in claude desktop |
| market timing | strong | mcp ecosystem exploding. ~97M monthly sdk downloads |
| single-player value | validated | dogfooding confirms: ambient retrieval of your own research is genuinely useful |
| core hypothesis (multiplayer) | untested | does network-sourced research actually improve ai output? need before/after evidence |
| cold-start | unsolved | single-player bootstraps adoption, but multiplayer needs density nobody has yet |
| reputation primitive | designed | pull-lift evaluation unlocks the marketplace layer but not yet built |

## verdict

paused, not abandoned. the infrastructure works. single-player mode is validated. the v2 reframe (convergence cost amortization) is sharper than v1, and the pull-lift evaluation gives a concrete path from "thumbs-up stub" to a real reputation primitive. what's blocked is network density, which is a distribution problem not a technical one. unpausing requires either 5-10 real contributors or a single-player growth path that earns that density over time.

## skills learned

- mcp-protocol: deep dive into Anthropic's Model Context Protocol, learning how to build servers that extend Claude's knowledge at runtime
- knowledge-networks: how to structure searchable knowledge: semantic indexing, tagging, and cross-linking research artifacts
- claude-api: building applications that pull context from external systems and inject it into Claude conversations
- session-architecture: thinking about conversation state, context windows, and how to prioritize information injection
- supply-side-bootstrapping: identifying the real constraint in a network effect product (not the technology, but initial adoption)
- reflective-eval-design: turning a binary reputation signal into a scalar lift score using a differential judge
