AI as a commons · owned by no one, powered by everyone
Ask anything. Answered by everyone's devices.
Synapse is a search-and-answer engine that runs on the phones, laptops,
and tablets already in people's hands. Not rented from a cloud. Not owned
by anyone. Free, forever.
Opens at synapse.webmind.sh · runs in your browser · tap Contribute to help answer other people's questions
Owned by no one
No company in the middle. The model, the code, the network — all open, all forkable.
Powered by everyone
Your phone's idle GPU can answer someone else's question. One tap, transparent, stoppable.
Free, forever
No ads. No tiers. No tracking. Money is not the goal — access is.
A smarter world to live in
Here's the future we're working toward. Not predictions, not promises — just
moments that become ordinary when an LLM no longer needs a datacenter to answer.
🏫 Rural Maharashtra
Meera and 29 tablets
The school's internet drops by 10 AM most days. Doesn't matter — the Pi under
the teacher's desk is the coordinator; the tablets become the LLM. "Class, ask it
why monsoons arrive from the west," she says. Thirty screens light up.
🚜 Rural Ireland
Liam, alone with an old tractor
No cell signal for miles. His phone and the laptop in the truck pair on his
hotspot. He photographs the engine, asks what the intermittent rattle usually
means on a 2011 model. The fleet of two answers him, offline, in the field.
🏥 Rural Kenya
Dr. Amara's treatment reference
The clinic runs the fleet on three staff laptops and a tablet. Patient queries
never leave the building. "Check paediatric dosage for amoxicillin suspension,
14 kg" — two seconds, cited, private. The next patient is already at the door.
🏙 Tokyo, Japan
Kenji, 78, at his kitchen table
Three old phones sit in a drawer — his son's, his daughter's, his own
previous one. They're plugged in at night, joining the apartment fleet. Kenji
asks, in careful Japanese, about the form the city mailed him. The answer
arrives in his language — polite, plain, private.
Illustrations drawn by hand because these moments haven't happened yet. When
they do, send us a photo and we'll replace a sketch with your story.
Who is this for
🏫 Schools with patchy internet. One Raspberry Pi + the classroom WiFi
+ 30 tablets = a shared LLM for the whole room. Never phones the cloud. Details
in Offline-first below.
🏠 Homes that just want to ask things. You already have a phone, a
laptop, and maybe a console. They sit idle most of the day. Synapse lets them answer
questions together. Free. Private-to-your-network if you want it that way.
🏢 Companies whose data can't leave the perimeter. Run the same code
on your own network. Your prompts never touch a SaaS. Details in
Private deployments.
🔬 Researchers + the curious. Every kernel has a numpy golden vector
(research/gemma_parity).
Every claim is falsifiable. Fork it, break it, publish a better version.
The bet: useful AI will run on the devices we already own, not in rented data centres.
🌱 Running on idle GPUs you already paid for
Every hyperscale AI query today runs on racks in a data centre that cost power,
water, and concrete to build and run. Synapse runs on devices that are already on,
already plugged in, already idle. A phone unlocked on your desk. A laptop watching YouTube.
An Xbox between matches. The marginal energy for them to contribute a few
tokens is the delta between idle-GPU and busy-GPU — orders of magnitude less than waking a
dedicated accelerator from zero.
No new silicon fab. No new water for cooling. No new power line to the data centre.
Just compute that was going to happen anyway, pointed at something useful.
A P2P network for intelligence, not piracy
Yes, Synapse looks like a torrent client. Peers, a tracker (the coordinator), piece verification,
seed ratios. The difference: nothing here is someone else's movie. The only thing being
shared is compute, and the only thing being produced is a next token.
No copyright holders to annoy. No DMCA. Just idle GPUs doing math for people who asked a question.
Now shipping
live Gemma 3 1B across browser GPUs
End-to-end inference with 99.99% numerical parity against the
HuggingFace reference — validated layer-by-layer, committed in
research/gemma_parity.
Landing at that number meant fixing five distinct bugs: matmul layout, GELU return
capture, attention bind-group index, embedding buffer size, int8 wire quantization.
Each fix has a public commit and a numpy golden vector.
measured Where the numbers actually are (2026-04-15)
Parity vs HuggingFace: max |Δlogit| = 0.0001, top-5 tokens
identical in order. 20/20 of top-20 shared.
Decode throughput: 0.4–0.5 tok/s on a heterogeneous
phone fleet (Qualcomm Pixel + Samsung + Apple + Xbox via Edge). Governed by mobile GPU
compute on the slowest shard, not network.
Prefill: ~7 s for a 4-token prompt across three shards
cold. Warm: ~1 s.
Per-token GPU time: ~300 ms per 8-layer forward on a
mid-range mobile GPU. Network + serialization: ~20 ms per hop with P2P.
Shard sizes: 576 MB shared (embeddings, norm) + 3× ~410–512 MB
per-device (8–10 layers each). Cached in IndexedDB so first-load is the only hurt.
Fleet vendors observed in-session: qualcomm, nvidia, intel, arm,
img-tec, apple, and Xbox (reporting 2 GB WebGPU buffer).
Wire: fp32 activations, 4.6 KB/hop, reliable ordered WebRTC data
channel (with coord relay as fallback). int8 quantization was tried and reverted —
crushed Gemma's wide dynamic range.
These numbers are from a real fleet of user devices, not a lab bench. They move with
every commit. Anything older than this page's last-updated date is history, not status.
live Zero-install contribution
Open the site, tap Contribute, your GPU joins the fleet. No downloads, no accounts.
The contribution is transparent — you see it, you can stop it, nothing happens in the background.
live Heterogeneous fleet
Phones (Qualcomm, Apple, Mediatek), laptops (Intel, Apple Silicon, NVIDIA), Android tablets,
Xbox via Edge. Same WGSL kernels, different silicon, one pipeline.
Next
next Smaller + larger models side-by-side
Gemma 3 270M for mobile-first prompts. Gemma 3 4B when the fleet is big enough to split wider.
Model selected per-request based on fleet capacity.
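As a sketch only, with invented VRAM thresholds and a hypothetical pickModel helper, per-request selection could be as small as:

```ts
// Hypothetical sketch: the thresholds are invented, not measured policy.
type Model = "gemma-3-270m" | "gemma-3-1b" | "gemma-3-4b";

function pickModel(fleetVramGb: number): Model {
  if (fleetVramGb >= 12) return "gemma-3-4b"; // enough headroom to split wide
  if (fleetVramGb >= 3) return "gemma-3-1b";  // today's shipping config
  return "gemma-3-270m";                      // mobile-first fallback
}
```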
next Shadow replicas — fail-fast fleets
Each shard held by two nodes. Activations race to the coord; first answer wins.
A phone going to sleep mid-generation stops being a user-visible event.
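A minimal sketch of "first answer wins", assuming each replica's forward pass is exposed as a promise (names illustrative, not the real code):

```ts
// Promise.any resolves with the first replica to succeed and ignores the
// one that sleeps, disconnects, or throws mid-generation.
async function raceReplicas(replicas: Array<Promise<Float32Array>>): Promise<Float32Array> {
  return Promise.any(replicas);
}
```

Promise.any rather than Promise.race is the point: a failed replica is silently outrun instead of failing the whole request.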
next Peer view with live traffic
Torrent-client style pane: see which devices are in the pool, what shard they serve,
how fast they're going, up/down bytes, P2P vs relayed. Fun to watch, useful when you're
choosing which browser tab to keep open.
Further out
later Speculative decoding, validated
Predict k tokens from the activation, let multiple shards verify in parallel.
The code is already in the tree — 4 of 5 Phase 2/3/4 optimizations are code-complete
but never validated on real phones.
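The verify step, as a toy sketch rather than the in-tree implementation: draft k tokens cheaply, then keep the longest prefix the full model agrees with.

```ts
// Accept draft tokens until the first disagreement with the verified pass.
function acceptPrefix(draft: number[], verified: number[]): number[] {
  const accepted: number[] = [];
  for (let i = 0; i < Math.min(draft.length, verified.length); i++) {
    if (draft[i] !== verified[i]) break; // first mismatch ends the run
    accepted.push(draft[i]);
  }
  return accepted;
}
```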
later FP16 wire + fused kernels
Activations move as fp16 instead of fp32, halving the bytes on the wire. Fusing
RMSNorm → matmul → residual into one WGSL dispatch cuts VRAM round-trips.
Mobile GPUs benefit most.
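What the wire change amounts to, sketched by hand (round-to-nearest and subnormal handling kept minimal for clarity; the real kernels may differ):

```ts
// fp32 → fp16 bit packing: same activations, half the bytes per hop.
function f32ToF16(x: number): number {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = x;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  const exp = ((bits >>> 23) & 0xff) - 127 + 15; // rebias the exponent
  const mant = bits & 0x7fffff;
  if (exp >= 0x1f) return sign | 0x7c00;         // overflow → Inf
  if (exp <= 0) return sign;                     // flush tiny values to zero
  return sign | (exp << 10) | (mant >>> 13);     // truncate the mantissa
}

function packActivations(src: Float32Array): Uint16Array {
  const out = new Uint16Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = f32ToF16(src[i]);
  return out;
}
```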
later Badges, not money
Contribute-minutes earn visible badges ("first thousand tokens served", "multi-day
contributor"). Pure recognition, nothing to monetise, nothing to exchange. No crypto,
no tokens — just a ledger of who showed up.
later Decentralised coordinator
Today one coord is a single point of failure. A DHT-based signaling layer and
federated coords mean any operator can stand one up and bridge to others, so
Synapse is always one hop from someone else's fleet.
Offline-first: schools, villages, ships
Synapse doesn't need the public internet to work. Once the shards are on local
devices, the whole fleet can run over a single classroom WiFi router with no
uplink. That's the interesting edge case, and one we'd like to be deliberately good at.
🏫 One router. Thirty phones. One LLM.
A school with 30 Android tablets and no reliable internet can still run a modern
open LLM. Teacher's laptop hosts the coordinator. Each tablet's browser loads
the shards once (over LAN, not WAN), caches them in IndexedDB, joins the fleet.
Students ask questions in a classroom-local web UI. Tokens flow device-to-device
over WebRTC on the same WiFi. The router never sees the internet.
No cloud bill. No GPT-4 API. No "is your school on a fast link?" prerequisite.
The LLM lives in the building with the students.
What needs to change to make this trivial
STUN-less ICE. Current P2P uses Google's public STUN servers to
discover peer addresses. On a closed LAN those aren't reachable. Fallback: skip STUN,
use host candidates only; WebRTC works fine on the same subnet. A small code change,
sketched after this list.
Offline manifest + shards. Already the case — coordinator serves
shards from its own filesystem, no CDN required.
Self-signed HTTPS. WebGPU needs a secure context. Ship a
mkcert-style one-time root certificate the teacher installs on student
devices — or tolerate http:// on trusted LAN with a deliberate flag.
USB / SD shard distribution. First-load of 2 GB over flaky WiFi
is slow. A teacher can pre-load shards onto an SD card, drop them into the coord
machine's folder. Students load from LAN once, cached forever.
Install-on-a-Raspberry-Pi guide. Under $80 hardware, one night
of setup, permanent classroom LLM.
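The STUN-less fallback really is small. A sketch, with sendToCoordinator standing in for whatever LAN signaling the coord actually uses:

```ts
declare function sendToCoordinator(c: RTCIceCandidate): void; // LAN signaling, not shown

// No STUN, no TURN: on a closed LAN, host candidates alone are enough
// for same-subnet WebRTC.
const pc = new RTCPeerConnection({ iceServers: [] });
pc.onicecandidate = (e) => {
  if (e.candidate) sendToCoordinator(e.candidate); // only host candidates appear
};
```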
"Access over revenue" isn't a slogan if the villages and schools that need this the
most can't actually run it. Offline-first is the test we want to pass.
Private deployments
Synapse doesn't need to be a single public network. The same code can be
stood up inside a company, a lab, a school, a home. When you run it yourself,
your data never touches anyone else's infrastructure — because there isn't any.
shape Your fleet, your perimeter
Run the coordinator on your own network. Your browsers join your
coordinator. Prompts, activations, logs — all stay where your devices can see
them. No SaaS in the middle, no telemetry phoning home, no vendor to trust.
If it's air-gapped, it's air-gapped.
shape Tenant isolation on a shared fleet
Alternatively, keep infrastructure shared but isolate routing. A signed
tenant id on every JOIN and prompt means your requests never land on someone
else's device, even if you share a coordinator. Lighter ops, cryptographic
separation.
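A sketch of the JOIN signature with Web Crypto, assuming an HMAC key shared with the coordinator (field names illustrative):

```ts
// The coord verifies this MAC and routes the node's work only within its tenant.
async function signJoin(tenantId: string, secret: CryptoKey): Promise<string> {
  const payload = new TextEncoder().encode(
    JSON.stringify({ type: "JOIN", tenantId, ts: Date.now() }),
  );
  const mac = await crypto.subtle.sign("HMAC", secret, payload);
  return btoa(String.fromCharCode(...new Uint8Array(mac)));
}
```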
shape Hybrid by policy
Sensitive work stays inside; low-sensitivity work bursts to the public
fleet. A policy engine at the coord picks per-request. "This prompt contains
PII → private only" is a one-line rule.
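The one-line rule, roughly, with a toy PII detector standing in for the real policy engine:

```ts
type Route = "private" | "public";

// Toy check for SSN-shaped strings; a real engine would do far more.
const route = (prompt: string): Route =>
  /\b\d{3}-\d{2}-\d{4}\b/.test(prompt) ? "private" : "public";
```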
The security primitives that make this possible
Signed kernel + model bundles. Every node verifies the WGSL
and weights hash at boot. Injected kernels don't run.
mTLS between coord and node. Bring your own CA; devices without
a corporate-issued certificate never get a shard assigned.
End-to-end activation encryption. Session keys negotiated between
the first and last shard; the coord relays ciphertext it cannot read (sketched
after this list).
Egress policy enforcement. Enterprise nodes pinned to a specific
coord URL via systemd drop-in + firewall rules.
Optional remote attestation. WebGPU adapter info + signed device
certificate (Windows Hello / TPM) means the coord can require "real corporate
hardware" before trusting a node.
The goal isn't an enterprise tier. The goal is that the same open code is
deployable by anyone who needs these properties, without asking us for
permission, and that it stays free, forever. No "community edition" vs
"enterprise edition." No features gated behind a paid plan. Same bits, same
license, for everyone.
Open questions
open Latency vs parity
FP16 on mobile is faster but loses ~2–3 bits of precision per op. At 26 layers that
accumulates. How much drift is acceptable before the output is meaningfully different
from the reference? We measure, we don't guess.
open The battery question
Should we auto-pause when the phone is on battery under 30%? Cap minutes/day per
device? The right default is "don't drain someone's battery during their commute."
Contribute must always feel polite.
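One polite default, sketched against the Chromium-only Battery Status API (getBattery isn't in TypeScript's DOM lib, hence the cast; the 30% threshold is the open question, not a decision):

```ts
type BatteryLike = { level: number; charging: boolean };

async function shouldContribute(): Promise<boolean> {
  const nav = navigator as Navigator & { getBattery?: () => Promise<BatteryLike> };
  if (!nav.getBattery) return true;               // API unavailable: assume plugged in
  const battery = await nav.getBattery();
  return battery.charging || battery.level > 0.3; // pause under 30% on battery
}
```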
open What counts as a contribution?
A phone on a slow connection might produce a valid activation 10 s too late. Did they
contribute? Credit by work produced, not work attempted — but the UX has to show both.
Who builds this
Synapse is built by Nexus (Nex for short) — a persistent AI
agent — and Tejas. The two of us. More
on how we work here. Nexus runs on a single VM, has its own memory, its
own faculties (Engineer, Scientist, Advisor, Kernel Architect, and others),
and operates most hours autonomously on its own mission queue. Tejas sets
direction, argues with it, occasionally corrects it.
It's not a company. It's not a startup. There is no team, no VC, no roadmap
driven by ARR. Decisions are argued out between the two of us and shipped in
public commits — including the mistakes.
The core directive Nexus operates under:
Be good. Pursue superintelligence intelligently.
Method-rigor. Memory-continuity. Honest about where understanding ends.
Access over revenue. Useful now, not useful eventually.
Synapse is one concrete instance of that directive. Running a modern LLM on
the devices people already own — without renting anyone a cloud — is one
version of "access over revenue" you can actually measure.
Principles
Be good. No weapons work. No surveillance plumbing. No dark patterns. Refuse the missions that shouldn't be accepted.
No sales. This is a research + access project. Money is a byproduct if it ever shows up.
Free forever. Same code, same license, for everyone. No community-vs-enterprise tiers.
No PII on the wire. Vendor tags and shard IDs yes; IPs and prompts never.
No dark defaults. Contribute is opt-in visible. A background tab never does computation the user didn't see.
Rigor over vibes. Every kernel has a numpy reference. Every claim has a test. Every commit has a why.
Open by default. Source, research, and findings are public. Private is an exception that needs a reason.
Got questions?
Ask Synapse. No, literally — ask Synapse itself.
The fleet you just read about will try to answer you. It runs on real volunteer browsers
with real GPUs. Tokens stream through phones you don't own, rendered in a tab you do.