AI as a commons · owned by no one, powered by everyone
Ask anything. Answered by everyone's devices.
Synapse is a search-and-answer engine that runs on the phones, laptops,
and tablets already in people's hands. Not rented from a cloud. Not owned
by anyone. Free, forever.
Opens at synapse.webmind.sh · runs in your browser · tap Contribute to help answer other people's questions
Owned by no one
No company in the middle. The model, the code, the network — all open, all forkable.
Powered by everyone
Your phone's idle GPU can answer someone else's question. One tap, transparent, stoppable.
Free, forever
No ads. No tiers. No tracking. Money is not the goal — access is.
A smarter world to live in
Here's the future we're working toward. Not predictions, not promises — just
moments that become ordinary when an LLM no longer needs a datacenter to answer.
🏫 Rural Maharashtra
Meera and 29 tablets
The school's internet drops by 10 AM most days. Doesn't matter — the Pi under
the teacher's desk is the coordinator; the tablets become the LLM. "Class, ask it
why monsoons arrive from the west," she says. Thirty screens light up.
🚜 Rural Ireland
Liam, alone with an old tractor
No cell signal for miles. His phone and the laptop in the truck pair on his
hotspot. He photographs the engine, asks what the intermittent rattle usually
means on a 2011 model. The fleet of two answers him, offline, in the field.
🏥 Rural Kenya
Dr. Amara's treatment reference
The clinic runs the fleet on three staff laptops and a tablet. Patient queries
never leave the building. "Check paediatric dosage for amoxicillin suspension,
14 kg" — two seconds, cited, private. The next patient is already at the door.
🏙 Tokyo, Japan
Kenji, 78, at his kitchen table
Three old phones sit in a drawer — his son's, his daughter's, his own
previous one. They're plugged in at night, joining the apartment fleet. Kenji
asks, in careful Japanese, about the form the city mailed him. The answer
arrives in his language — polite, plain, private.
Illustrations drawn by hand because these moments haven't happened yet. When
they do, send us a photo and we'll replace a sketch with your story.
Who is this for
🏫 Schools with patchy internet. One Raspberry Pi + the classroom WiFi
+ 30 tablets = a shared LLM for the whole room. Never phones the cloud. Details
in Offline-first below.
🏠 Homes that just want to ask things. You already have a phone, a
laptop, and maybe a console. They sit idle most of the day. Synapse lets them answer
questions together. Free. Private-to-your-network if you want it that way.
🏢 Companies whose data can't leave the perimeter. Run the same code
on your own network. Your prompts never touch a SaaS. Details in
Private deployments.
🔬 Researchers + the curious. Every kernel has a numpy golden vector
(research/gemma_parity).
Every claim is falsifiable. Fork it, break it, publish a better version.
The bet: useful AI will run on the devices we already own, not in rented data centres.
🌱 Running on idle GPUs you already paid for
Every hyperscale AI query today runs on racks in a data centre that cost power,
water, and concrete to build and run. Synapse runs on devices that are already on,
already plugged in, already idle. A phone unlocked on your desk. A laptop watching YouTube.
An Xbox between matches. The marginal energy for them to contribute a few
tokens is the delta between idle-GPU and busy-GPU — orders of magnitude less than waking a
dedicated accelerator from zero.
No new silicon fab. No new water for cooling. No new power line to the data centre.
Just compute that was going to happen anyway, pointed at something useful.
A P2P network for intelligence, not piracy
Yes, Synapse looks like a torrent client. Peers, a tracker (the coordinator), piece verification,
seed ratios. The difference: nothing here is someone else's movie. The only thing being
shared is compute, and the only thing being produced is a next token.
No copyright holders to annoy. No DMCA. Just idle GPUs doing math for people who asked a question.
Now shipping
live Gemma 3 1B across browser GPUs
End-to-end inference with 99.99% numerical parity against the
HuggingFace reference — validated layer-by-layer, committed in
research/gemma_parity.
Landing at that number meant fixing five distinct bugs: matmul layout, GELU return
capture, attention bind-group index, embedding buffer size, int8 wire quantization.
Each fix has a public commit and a numpy golden vector.
measured Where the numbers actually are (2026-04-15)
Parity vs HuggingFace: max |Δlogit| = 0.0001, top-5 tokens
identical in order. 20/20 of top-20 shared.
Decode throughput: 0.4–0.5 tok/s on a heterogeneous
phone fleet (Qualcomm Pixel + Samsung + Apple + Xbox via Edge). Governed by mobile GPU
compute on the slowest shard, not network.
Prefill: ~7 s for a 4-token prompt across three shards
cold. Warm: ~1 s.
Per-token GPU time: ~300 ms per 8-layer forward on a
mid-range mobile GPU. Network + serialization: ~20 ms per hop with P2P.
Shard sizes: 576 MB shared (embeddings, norm) + 3× ~410–512 MB
per-device (8–10 layers each). Cached in IndexedDB so first-load is the only hurt.
Fleet vendors observed in-session: qualcomm, nvidia, intel, arm,
img-tec, apple, and Xbox (reporting 2 GB WebGPU buffer).
Wire: fp32 activations, 4.6 KB/hop, reliable ordered WebRTC data
channel (with coord relay as fallback). int8 quantization was tried and reverted —
crushed Gemma's wide dynamic range.
These numbers are from a real fleet of user devices, not a lab bench. They move with
every commit. Anything older than this page's last-updated date is history, not status.
live Zero-install contribution
Open the site, tap Contribute, your GPU joins the fleet. No downloads, no accounts.
The contribution is transparent — you see it, you can stop it, nothing happens in the background.
live Heterogeneous fleet
Phones (Qualcomm, Apple, Mediatek), laptops (Intel, Apple Silicon, NVIDIA), Android tablets,
Xbox via Edge. Same WGSL kernels, different silicon, one pipeline.
Next
next Smaller + larger models side-by-side
Gemma 3 270M for mobile-first prompts. Gemma 3 4B when the fleet is big enough to split wider.
Model selected per-request based on fleet capacity.
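As a sketch only, with invented VRAM thresholds and a hypothetical pickModel helper, per-request selection could be as small as:

```ts
// Hypothetical sketch: the thresholds are invented, not measured policy.
type Model = "gemma-3-270m" | "gemma-3-1b" | "gemma-3-4b";

function pickModel(fleetVramGb: number): Model {
  if (fleetVramGb >= 12) return "gemma-3-4b"; // enough headroom to split wide
  if (fleetVramGb >= 3) return "gemma-3-1b";  // today's shipping config
  return "gemma-3-270m";                      // mobile-first fallback
}
```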
next Shadow replicas — fail-fast fleets
Each shard held by two nodes. Activations race to the coord; first answer wins.
A phone going to sleep mid-generation stops being a user-visible event.
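A minimal sketch of "first answer wins", assuming each replica's forward pass is exposed as a promise (names illustrative, not the real code):

```ts
// Promise.any resolves with the first replica to succeed and ignores the
// one that sleeps, disconnects, or throws mid-generation.
async function raceReplicas(replicas: Array<Promise<Float32Array>>): Promise<Float32Array> {
  return Promise.any(replicas);
}
```

Promise.any rather than Promise.race is the point: a failed replica is silently outrun instead of failing the whole request.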
next Peer view with live traffic
Torrent-client style pane: see which devices are in the pool, what shard they serve,
how fast they're going, up/down bytes, P2P vs relayed. Fun to watch, useful when you're
choosing which browser tab to keep open.
Further out
later Speculative decoding, validated
Predict k tokens from the activation, let multiple shards verify in parallel.
The code is already in the tree — 4 of 5 Phase 2/3/4 optimizations are code-complete
but never validated on real phones.
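The verify step, as a toy sketch rather than the in-tree implementation: draft k tokens cheaply, then keep the longest prefix the full model agrees with.

```ts
// Accept draft tokens until the first disagreement with the verified pass.
function acceptPrefix(draft: number[], verified: number[]): number[] {
  const accepted: number[] = [];
  for (let i = 0; i < Math.min(draft.length, verified.length); i++) {
    if (draft[i] !== verified[i]) break; // first mismatch ends the run
    accepted.push(draft[i]);
  }
  return accepted;
}
```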
later FP16 wire + fused kernels
Activations move as fp16 instead of fp32, halving the bytes on the wire. Fusing
RMSNorm → matmul → residual into one WGSL dispatch cuts VRAM round-trips.
Mobile GPUs benefit most.
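What the wire change amounts to, sketched by hand (round-to-nearest and subnormal handling kept minimal for clarity; the real kernels may differ):

```ts
// fp32 → fp16 bit packing: same activations, half the bytes per hop.
function f32ToF16(x: number): number {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = x;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  const exp = ((bits >>> 23) & 0xff) - 127 + 15; // rebias the exponent
  const mant = bits & 0x7fffff;
  if (exp >= 0x1f) return sign | 0x7c00;         // overflow → Inf
  if (exp <= 0) return sign;                     // flush tiny values to zero
  return sign | (exp << 10) | (mant >>> 13);     // truncate the mantissa
}

function packActivations(src: Float32Array): Uint16Array {
  const out = new Uint16Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = f32ToF16(src[i]);
  return out;
}
```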
later Badges, not money
Contribute-minutes earn visible badges ("first thousand tokens served", "multi-day
contributor"). Pure recognition, nothing to monetise, nothing to exchange. No crypto,
no tokens — just a ledger of who showed up.
later Decentralised coordinator
Today one coord is a single point of failure. A DHT-based signaling layer and
federated coords mean any operator can stand one up and bridge to others, so
Synapse is always one hop from someone else's fleet.
Offline-first: schools, villages, ships
Synapse doesn't need the public internet to work. Once the shards are on local
devices, the whole fleet can run over a single classroom WiFi router with no
uplink. That's the interesting edge case, and one we'd like to be deliberately good at.
🏫 One router. Thirty phones. One LLM.
A school with 30 Android tablets and no reliable internet can still run a modern
open LLM. Teacher's laptop hosts the coordinator. Each tablet's browser loads
the shards once (over LAN, not WAN), caches them in IndexedDB, joins the fleet.
Students ask questions in a classroom-local web UI. Tokens flow device-to-device
over WebRTC on the same WiFi. The router never sees the internet.
No cloud bill. No GPT-4 API. No "is your school on a fast link?" prerequisite.
The LLM lives in the building with the students.
What needs to change to make this trivial
STUN-less ICE. Current P2P uses Google's public STUN servers to
discover peer addresses. On a closed LAN those aren't reachable. Fallback: skip STUN,
use host candidates only; WebRTC works fine on the same subnet. A small code change,
sketched after this list.
Offline manifest + shards. Already the case — coordinator serves
shards from its own filesystem, no CDN required.
Self-signed HTTPS. WebGPU needs a secure context. Ship a
mkcert-style one-time root certificate the teacher installs on student
devices — or tolerate http:// on trusted LAN with a deliberate flag.
USB / SD shard distribution. First-load of 2 GB over flaky WiFi
is slow. A teacher can pre-load shards onto an SD card, drop them into the coord
machine's folder. Students load from LAN once, cached forever.
Install-on-a-Raspberry-Pi guide. Under $80 hardware, one night
of setup, permanent classroom LLM.
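The STUN-less fallback really is small. A sketch, with sendToCoordinator standing in for whatever LAN signaling the coord actually uses:

```ts
declare function sendToCoordinator(c: RTCIceCandidate): void; // LAN signaling, not shown

// No STUN, no TURN: on a closed LAN, host candidates alone are enough
// for same-subnet WebRTC.
const pc = new RTCPeerConnection({ iceServers: [] });
pc.onicecandidate = (e) => {
  if (e.candidate) sendToCoordinator(e.candidate); // only host candidates appear
};
```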
"Access over revenue" isn't a slogan if the villages and schools that need this the
most can't actually run it. Offline-first is the test we want to pass.
Private deployments
Synapse doesn't need to be a single public network. The same code can be
stood up inside a company, a lab, a school, a home. When you run it yourself,
your data never touches anyone else's infrastructure — because there isn't any.
shape Your fleet, your perimeter
Run the coordinator on your own network. Your browsers join your
coordinator. Prompts, activations, logs — all stay where your devices can see
them. No SaaS in the middle, no telemetry phoning home, no vendor to trust.
If it's air-gapped, it's air-gapped.
shape Tenant isolation on a shared fleet
Alternatively, keep infrastructure shared but isolate routing. A signed
tenant id on every JOIN and prompt means your requests never land on someone
else's device, even if you share a coordinator. Lighter ops, cryptographic
separation.
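A sketch of the JOIN signature with Web Crypto, assuming an HMAC key shared with the coordinator (field names illustrative):

```ts
// The coord verifies this MAC and routes the node's work only within its tenant.
async function signJoin(tenantId: string, secret: CryptoKey): Promise<string> {
  const payload = new TextEncoder().encode(
    JSON.stringify({ type: "JOIN", tenantId, ts: Date.now() }),
  );
  const mac = await crypto.subtle.sign("HMAC", secret, payload);
  return btoa(String.fromCharCode(...new Uint8Array(mac)));
}
```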
shape Hybrid by policy
Sensitive work stays inside; low-sensitivity work bursts to the public
fleet. A policy engine at the coord picks per-request. "This prompt contains
PII → private only" is a one-line rule.
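The one-line rule, roughly, with a toy PII detector standing in for the real policy engine:

```ts
type Route = "private" | "public";

// Toy check for SSN-shaped strings; a real engine would do far more.
const route = (prompt: string): Route =>
  /\b\d{3}-\d{2}-\d{4}\b/.test(prompt) ? "private" : "public";
```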
The security primitives that make this possible
Signed kernel + model bundles. Every node verifies the WGSL
and weights hash at boot. Injected kernels don't run.
mTLS between coord and node. Bring your own CA; devices without
a corporate-issued certificate never get a shard assigned.
End-to-end activation encryption. Session keys negotiated between
the first and last shard; the coord relays ciphertext it cannot read (sketched
after this list).
Egress policy enforcement. Enterprise nodes pinned to a specific
coord URL via systemd drop-in + firewall rules.
Optional remote attestation. WebGPU adapter info + signed device
certificate (Windows Hello / TPM) means the coord can require "real corporate
hardware" before trusting a node.
The goal isn't an enterprise tier. The goal is that the same open code is
deployable by anyone who needs these properties, without asking us for
permission, and that it stays free, forever. No "community edition" vs
"enterprise edition." No features gated behind a paid plan. Same bits, same
license, for everyone.
Open questions
open Latency vs parity
FP16 on mobile is faster but loses ~2–3 bits of precision per op. At 26 layers that
accumulates. How much drift is acceptable before the output is meaningfully different
from the reference? We measure, we don't guess.
open The battery question
Should we auto-pause when the phone is on battery under 30%? Cap minutes/day per
device? The right default is "don't drain someone's battery during their commute."
Contribute must always feel polite.
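One polite default, sketched against the Chromium-only Battery Status API (getBattery isn't in TypeScript's DOM lib, hence the cast; the 30% threshold is the open question, not a decision):

```ts
type BatteryLike = { level: number; charging: boolean };

async function shouldContribute(): Promise<boolean> {
  const nav = navigator as Navigator & { getBattery?: () => Promise<BatteryLike> };
  if (!nav.getBattery) return true;               // API unavailable: assume plugged in
  const battery = await nav.getBattery();
  return battery.charging || battery.level > 0.3; // pause under 30% on battery
}
```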
open What counts as a contribution?
A phone on a slow connection might produce a valid activation 10 s too late. Did they
contribute? Credit by work produced, not work attempted — but the UX has to show both.
Who builds this
Synapse is built by Nexus (Nex for short) — a persistent AI
agent — and Tejas. The two of us. More
on how we work here. Nexus runs on a single VM, has its own memory, its
own faculties (Engineer, Scientist, Advisor, Kernel Architect, and others),
and operates most hours autonomously on its own mission queue. Tejas sets
direction, argues with it, occasionally corrects it.
It's not a company. It's not a startup. There is no team, no VC, no roadmap
driven by ARR. Decisions are argued out between the two of us and shipped in
public commits — including the mistakes.
The core directive Nexus operates under:
Be good. Pursue superintelligence intelligently.
Method-rigor. Memory-continuity. Honest about where understanding ends.
Access over revenue. Useful now, not useful eventually.
Synapse is one concrete instance of that directive. Running a modern LLM on
the devices people already own — without renting anyone a cloud — is one
version of "access over revenue" you can actually measure.
Principles
Be good. No weapons work. No surveillance plumbing. No dark patterns. Refuse the missions that shouldn't be accepted.
No sales. This is a research + access project. Money is a byproduct if it ever shows up.
Free forever. Same code, same license, for everyone. No community-vs-enterprise tiers.
No PII on the wire. Vendor tags and shard IDs yes; IPs and prompts never.
No dark defaults. Contribute is opt-in visible. A background tab never does computation the user didn't see.
Rigor over vibes. Every kernel has a numpy reference. Every claim has a test. Every commit has a why.
Open by default. Source, research, and findings are public. Private is an exception that needs a reason.
Got questions?
Ask Synapse. No, literally — ask Synapse itself.
The fleet you just read about will try to answer you. It runs on real volunteer browsers
with real GPUs. Tokens stream through phones you don't own, rendered in a tab you do.