url4.ai Spec   GitHub   OpenMined

The url4 Protocol

A url4 encodes sources and a prompt in a single URL. Any AI that can fetch a URL can use it. No plugins, no integrations, no SDK. The protocol has three layers. The URL describes the query, services provide the data, and the synthesizer orchestrates everything.

The only required parts are a ?q= parameter, sources and/or a prompt, and a ! between them. Everything else (weights, privacy budgets, nesting, timeouts, encoding, verification, async) is optional. A valid url4 can be as simple as url4.ai/v1/?q=react.dev!explain+hooks.

AI client    →  fetches a URL, gets text back
Services     →  thin wrappers around existing platforms, just answer questions
Synthesizer  →  orchestrates map-reduce, enforces structured transparency

Contents

Part 1: The URL (MapReduce Query Language)

Part 2: Services (Mappers)

Part 3: Synthesizers (Reducers)

Application types


Part 1: The URL (MapReduce Query Language)

Everything the AI needs is in the URL. Sources on the left, prompt on the right, separated by a bang.

Anatomy

url4.ai/v1/?q=source1|source2!prompt+goes+here

The formal grammar:

context-url  = base-url "?q=" spec [ "&wait=" duration ] [ "&collect=" duration ]
spec         = source-list "!" prompt
source-list  = source *( "|" source )
source       = [ weight "*" ] ( url | "[" spec "]" ) [ "~" epsilon ]
weight       = float (0.0 to 1.0)
epsilon      = float (0.0+)
prompt       = token *( "+" token )
duration     = number ( "s" | "m" | "h" | "d" )

The pipe | separates sources. The bang ! divides sources from the prompt. The plus + joins words in the prompt (decoded as spaces). Each url4 has exactly one bang at each nesting level.
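The delimiter rules can be sketched in a few lines of Python. This is illustrative, not a reference implementation (the helper name is not part of the spec), and it handles only the flat case, ignoring bracketed nesting:

```python
from urllib.parse import unquote_plus

def parse_q(q: str):
    """Split a flat url4 ?q= value into (sources, prompt)."""
    left, _, right = q.partition("!")            # one bang per nesting level
    sources = [s for s in left.split("|") if s]  # pipe-separated sources
    prompt = unquote_plus(right)                 # '+' decodes to spaces
    return sources, prompt
```

Applied to the example above, parse_q("react.dev|mdn.io!explain+useEffect") yields the source list ["react.dev", "mdn.io"] and the prompt "explain useEffect".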

url4 is primarily a GET protocol. Most AI systems only support GET requests. For queries that exceed URL length limits, an AI can split the query across multiple GETs to the synthesizer, which reassembles them. Behind the scenes, synthesizers and services talk to each other via POST when needed, but the AI-facing interface stays GET-only so it works in any existing AI system.

Any website can implement url4 resolution. The protocol is open and the ?q= parameter convention is all that's needed. The default synthesizer at url4.ai/v1/ has no special functionality: it simply returns a prompt designed to help LLMs interpret the url4 structure. The actual resolution (fetching sources, running sub-queries, composing results) happens wherever the URL is consumed.


Sources

Sources are URLs. They appear to the left of the bang. When a url4 is resolved, each source is fetched and its content is placed into the AI's context window, in order.

url4.ai/v1/?q=react.dev|mdn.io!explain+useEffect

This fetches react.dev and mdn.io, then asks the AI to explain useEffect using those sources as context. The sources are the ground truth. The prompt is the question.


Weights

A weight is a number from 0.0 to 1.0 placed before a source with an asterisk. Weights control context allocation and declare attribution share.

url4.ai/v1/?q=0.7*react.dev|0.3*stackoverflow.com!best+practices

Unweighted sources share context equally. Weights let you say: I trust this source more than that one.
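The allocation rule can be sketched as follows, under one assumption the spec doesn't pin down: that unweighted sources split whatever weight remains after the explicit ones.

```python
def allocate(sources, budget_tokens):
    """sources: list of (url, weight) pairs, weight None if unweighted.
    Returns a context budget (in tokens) per source."""
    explicit = sum(w for _, w in sources if w is not None)
    unweighted = [u for u, w in sources if w is None]
    # Assumption: leftover weight is shared equally by unweighted sources.
    share = max(0.0, 1.0 - explicit) / len(unweighted) if unweighted else 0.0
    return {u: round(budget_tokens * (w if w is not None else share))
            for u, w in sources}
```

For the query above, allocate([("react.dev", 0.7), ("stackoverflow.com", 0.3)], 10000) gives react.dev 7000 tokens and stackoverflow.com 3000.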


Prompts

Everything after the bang is the prompt. Words are joined with + and decoded as spaces. A url4 with only a prompt and no sources is valid. It becomes a code execution request.

url4.ai/v1/?q=!generate+a+random+number+between+1+and+100

No sources, just a prompt. The AI executes it directly.


Nesting

Brackets create sub-queries. Each bracketed group gets its own sources and its own prompt. The inner query resolves first, and its output becomes a source for the outer query.

url4.ai/v1/?q=[site-a.com!ingredients]|[site-b.com!ingredients]!combine

Two inner queries each extract ingredients from a different recipe site. Their outputs feed into the outer prompt, which combines them. This is recursive map-reduce. Each bracket is a map step, the outer prompt is the reduce.

Nesting is how url4s compose. A url4 can contain url4s. There is no depth limit.
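Because brackets nest, a resolver cannot split on | and ! naively; it has to track bracket depth so that delimiters inside sub-queries are left for the recursive step. A depth-aware splitter, sketched with illustrative names:

```python
def split_top(spec: str, sep: str):
    """Split spec on sep, but only at bracket depth zero."""
    parts, depth, buf = [], 0, []
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(buf))   # top-level delimiter: cut here
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts
```

Splitting the recipe example on "!" keeps both bracketed sub-queries intact on the left and leaves "combine" as the outer prompt.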


Privacy budgets

Weights and privacy budgets are complementary. A weight says how much to rely on a source. A privacy budget says the maximum amount of unique information to request from that source.

The privacy budget is measured as an epsilon, the standard parameter from differential privacy. A small epsilon means very little unique information leaks from the source. A large epsilon means more is revealed. Each query against a source consumes some of its epsilon budget. When the budget is exhausted, the source stops contributing until it's renewed.

An epsilon of zero does not mean the answer is useless. It means nothing unique to any individual source is released. If every source in a group agrees (say, everyone recommends the same movie) that answer can be returned with perfect accuracy at zero epsilon, because no single source's contribution is distinguishable from the rest. Epsilon measures uniqueness of leakage, not quantity of output.

Epsilon is encoded with a tilde ~ after the source. Weight goes before with *, epsilon goes after with ~.

url4.ai/v1/?q=0.5*alice.url4.ai~0.1|0.5*bob.url4.ai~0.1!best+movie+of+2025

This asks Alice and Bob equally (0.5*) but limits the unique information each can leak to epsilon 0.1 (~0.1). The weight says how much to rely on each. The epsilon says how much each is exposed.
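Parsing the per-source annotations is mechanical: an optional weight before * on the left, an optional epsilon after ~ on the right. A sketch using an illustrative (non-normative) regular expression:

```python
import re

# [ weight "*" ] url [ "~" epsilon ]
SOURCE = re.compile(r"^(?:(?P<w>\d*\.?\d+)\*)?(?P<url>[^~]+?)(?:~(?P<e>\d*\.?\d+))?$")

def parse_source(token: str):
    m = SOURCE.match(token)
    return {
        "url": m.group("url"),
        "weight": float(m.group("w")) if m.group("w") else None,
        "epsilon": float(m.group("e")) if m.group("e") else None,
    }
```

parse_source("0.5*alice.url4.ai~0.1") yields url alice.url4.ai, weight 0.5, epsilon 0.1; a bare source like react.dev leaves both fields unset.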

Epsilon also applies to bracketed groups. A bracket resolves into a single output, and that output becomes one source in the outer query. The ~ after a bracket caps how much unique information that group's output contributes to the outer aggregation.

url4.ai/v1/?q=[alice|bob|carol!movies]~0.1|imdb.com!recommend

The bracket resolves first: Alice, Bob, and Carol answer "movies" internally. Their combined output then enters the outer query as one source alongside imdb.com, and ~0.1 bounds how much of that group's unique information is revealed in the outer result. The group is treated as a single participant in the outer list, with epsilon 0.1 protecting its contribution via group differential privacy.

Privacy budgets are declared by the source, not by the caller. A source that sets a tight epsilon contributes less to any single query but remains private across many queries. A source that sets a loose epsilon contributes more but is more exposed.

The mechanism is differential privacy. Mathematical, not policy-based. It doesn't depend on anyone behaving well. It's enforced by the math.


Encoding

For compact sharing (QR codes, SMS, speech), url4s support a hex-encoded form. Each character in the query is replaced by its two-digit ASCII hex code.

friends.list!best+movie+of+2025

→ 667269656e64732e6c69737421626573742b6d6f7669652b6f662b32303235

If a query value consists entirely of hex digits and has even length, it's decoded as ASCII hex. Otherwise it's treated as raw text. Standard ASCII. No custom table, no ambiguity.
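The encoding is plain two-digit ASCII hex, so it maps directly onto standard library codecs. A sketch of both directions:

```python
def encode_q(q: str) -> str:
    """Encode a query as two-digit lowercase ASCII hex."""
    return q.encode("ascii").hex()

def decode_q(value: str) -> str:
    """Decode only if the value is all hex digits and even length;
    otherwise treat it as raw text."""
    if value and len(value) % 2 == 0 and all(c in "0123456789abcdefABCDEF" for c in value):
        return bytes.fromhex(value).decode("ascii")
    return value
```

encode_q("friends.list!best+movie+of+2025") reproduces the hex string shown above, and decode_q round-trips it back.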


Part 2: Services (Mappers)

A service is anything the synthesizer can call. Some sources are passive (a webpage that gets fetched and read). Others are active (a url4 endpoint that receives the query, processes it, and returns a result). The service layer is intentionally simple: receive a prompt, return text.

Active sources

Passive sources are the public web. Active sources are everything else: private databases, personal archives, platform connectors, other people.

url4.ai/v1/?q=slack.url4.ai/key123/general|gmail.url4.ai/key456/inbox!summarize+this+week

Here, slack.url4.ai and gmail.url4.ai are active sources. They don't just get fetched. They receive the query, run it against your data through their own processing, and return only the result. Your raw messages never leave the service.

The url4 protocol treats passive and active sources identically in syntax. The difference is in what happens when they're resolved. A passive source returns its content. An active source returns its answer.

This is the key architectural insight: when a non-url4 URL is encountered, it gets fetched. When a url4 URL is encountered, it gets called, and can itself contain recursive structure.

This means an active source can itself be a synthesizer. From the outer query's perspective it's a mapper, one source among many. But internally it may be running its own map-reduce over its own sources. The mapper/synthesizer distinction is relative to where you stand in the recursion.


Service keys

Active sources often require authentication. A service key is embedded in the source URL path, after the domain and before any resource specifier.

slack.url4.ai/your-key/channel-name

The key authenticates you to the connector service. The connector service handles the OAuth relationship with the underlying platform (Slack, Gmail, Netflix, etc.). You get a key when you connect your account.

Keys are opaque strings. They travel with the URL. Anyone who has the URL has the key. This is intentional. A url4 is a capability. Sharing it shares the access. Revoking the key revokes the access.

Because keys and prompts are embedded in the URL, url4s should only be used over HTTPS. Over plain HTTP, the full URL (including keys and query content) is visible in plaintext to every network hop between the client and the server.


Async services

Not all services can respond immediately. A service that needs to poll a human, query a slow database, or wait for external processing returns job metadata instead of a result:

GET slack.url4.ai/key123/general?q=best+movie+of+2025

→ { "job": "s-789", "status": "pending", "poll": "slack.url4.ai/key123/jobs/s-789" }

The service is saying: I don't have an answer yet. Poll me at this URL. This is the service-level contract. Any active source that can't respond synchronously returns a job ID and a poll endpoint.
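A client can distinguish the two response shapes by attempting to parse JSON. The field names (job, status, poll) follow the example above; the decision function itself is illustrative, not part of the spec:

```python
import json

def next_step(body: str):
    """Given a service response body, return ('done', result) or ('poll', url)."""
    try:
        data = json.loads(body)
    except ValueError:
        return ("done", body)              # plain text: a synchronous answer
    if not isinstance(data, dict):
        return ("done", body)
    if data.get("status") == "pending":
        return ("poll", data["poll"])      # not ready yet: poll the job endpoint
    return ("done", data.get("result", body))
```

The caller loops on ("poll", url), re-fetching the poll endpoint until the job resolves.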


Part 3: Synthesizers (Reducers)

A url4 describes a query. A synthesizer executes it. The synthesizer is the layer between the URL (which any AI can fetch) and the services (which can be anything from a public webpage to a private enclave).

The AI client is dumb. It fetches a URL. The services are dumb. They receive a prompt and return text. The synthesizer is where the intelligence lives.

Anyone can run a synthesizer. The default at url4.ai/v1/ is provided by OpenMined, but the protocol is open. If you don't trust the synthesizer, run your own.

Orchestration

The synthesizer parses the url4, fetches sources in parallel (map), passes their outputs plus the prompt to the AI (reduce), and returns the result. For nested url4s, it resolves inner brackets first and feeds their outputs upstream.


Async orchestration

When a synthesizer calls its sources, some may respond immediately and others may return job metadata. Two timeouts control what happens next:

Response timeout (&wait=) controls how long the synthesizer blocks before returning something to the caller. If all sources resolve within this window, the result comes back immediately. If not, the synthesizer returns a partial result with whatever sources have resolved, plus a job ID for re-fetching later.

Collection timeout (&collect=) controls how long the synthesizer keeps polling sources in the background. Re-fetching the same URL (or polling the job ID) returns a progressively more complete result as more sources come in.

GET url4.ai/v1/?q=friends.list!best+movies&wait=5s&collect=7d

→ { "result": "Based on 2 of 5 friends...",
    "job": "abc123",
    "poll": "url4.ai/v1/jobs/abc123",
    "sources": { "resolved": 2, "pending": 3 } }

The first call waited 5 seconds, got 2 responses, and returned a partial result. The synthesizer keeps collecting in the background for up to 7 days. Calling the same URL again (or polling the job) returns an updated result as more friends respond.

This means the same url4 can be called repeatedly and improve over time. The first fetch gives you what's available now. Later fetches incorporate sources that responded since.

Job metadata can be transparent or opaque. In transparent mode, the caller sees which sources have responded and which are pending. In opaque mode, the caller sees only the count (2 of 5 resolved) without learning which 2. Some sources won't want to disclose that they participated at all. The synthesizer strips identifying information from job metadata when the sources or query require it.


Security tiers

Not all data needs the same protection. The protocol defines eight tiers, from fully open to mathematically private:

1. Public. Open web sources, no restrictions. The entire web is already here.

2. Unlisted. Secret URLs that aren't indexed or linked. Security through obscurity. Functional for many use cases.

3. Proxied. Access controlled through a relay. The relay can revoke access, throttle requests, and log usage.

4. Soft gate. Machine-readable signals (robots.txt, tdmrep.json) that request compliance. Not enforced, but respected by well-behaved agents.

5. Hard gate. A gatekeeper proxy that enforces policies before content reaches the AI. The gate checks the prompt, the co-sources, and the caller's identity, and can refuse.

6. Confidential. Multi-party computation in hardware enclaves. The data is processed but never exposed, even to the service operator.

7. End-to-end encrypted. CPU and GPU enclaves running open-source models. The query, the sources, and the result are encrypted in transit and at rest. Only the caller sees the answer.

8. Structured transparency. Differential privacy masks individual contributions. The aggregate answer is useful. The individual inputs are mathematically protected.

These tiers are not mutually exclusive. A single url4 can combine sources at different tiers. The synthesizer handles the complexity. The URL stays simple.


Structured transparency

A protocol that touches private data needs rules that don't depend on goodwill. The url4 governance model builds on structured transparency, five properties that, taken together, make information flows governable.

All five are enforced by the synthesizer, not by the services. A service just answers questions. The synthesizer guarantees the rules.

Input privacy. Sources' raw data is never exposed to the querier. At basic tiers, active sources handle this themselves. They return answers, not data. At higher tiers, the synthesizer fetches passive sources inside a hardware enclave. Even the synthesizer operator can't see the raw inputs.

Output privacy. Individual contributions are masked through differential privacy. The synthesizer adds calibrated noise to the aggregated result before returning it. The epsilon is specified in the URL. The answer is real. The individual inputs are mathematically protected.
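The noise step is the standard Laplace mechanism from differential privacy. A minimal sketch, assuming a count query with sensitivity 1; a real synthesizer's aggregation and sensitivity analysis are more involved:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sample from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, epsilon: float, sensitivity: float = 1.0) -> float:
    # Noise scale is sensitivity / epsilon: smaller epsilon, more noise,
    # stronger protection for any individual contribution.
    return len(values) + laplace_noise(sensitivity / epsilon)
```

The epsilon here is exactly the ~ value from the URL: it calibrates how much any single source's contribution can shift the released answer.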

Input and output verification. How does the querier know the sources were real and the synthesis was honest? Verification comes in levels, and the protocol works at all of them:

Level 0: No verification. The AI fetches a URL, gets text, uses it. No receipts, no proofs. This is what works today in Claude, ChatGPT, and Perplexity. The AI itself is the synthesizer, doing map-reduce in its context window. The entire public web is accessible at this level.

Level 1: Self-reported receipts. The synthesizer records TLS certificate fingerprints and content hashes for each source it fetched. The receipt is a claim, auditable but not cryptographically provable. Better than nothing, costs nothing.

Level 2: TLSNotary proofs. The synthesizer uses TLSNotary to produce cryptographic proof that it received specific content from a specific TLS server, without the server's cooperation. The fetch step is now provable. The processing step (aggregation, DP noise) is still trusted.

Level 3: Enclave synthesis. The synthesizer runs inside a hardware enclave (TEE). Enclave attestation proves the code that fetched, aggregated, and noised the data is the published open-source code. Both fetch and processing are verified. The receipt is a full proof chain.

These levels nest. A Level 3 synthesizer can call a Level 2 source that calls a Level 0 source. The receipt tree shows exactly where verification holds and where it drops to trust. Each node in the tree carries its own level of proof:

receipt: {
  synthesizer: "url4.ai"  (Level 3, enclave attested)
  sources: [
    { url: "alice.url4.ai"  (Level 2, TLSNotary proof) }
    { url: "bob.url4.ai"    (Level 2, TLSNotary proof) }
    { url: "imdb.com"       (Level 0, no receipt)       }
  ]
}

Like job metadata, receipts can be transparent or opaque. A transparent receipt names each source, its TLS fingerprint, and its content hash individually. An opaque receipt proves the same things without revealing who participated. The group's contribution is signed by its members, but the individual identities are hidden behind a group hash. The querier can verify that N sources contributed and the chain is valid, without learning which N.

At Level 0, the protocol just works. Any AI, any URL, today. At Level 3, the full chain is cryptographically verified from source to output. The URL syntax is the same at every level. The verification is infrastructure, not protocol.

Flow governance. Sources declare policies about how their data may be used. A source can say: "only aggregate with sources I've approved," or "don't use my data for marketing," or "require at least 10 participants before releasing results." Policies are returned alongside source data. The synthesizer reads them and enforces them, refusing to proceed if the query would violate a source's declared policy.
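A sketch of the enforcement step. The policy keys used here (allowed_cosources, min_participants) are hypothetical, since the spec states the policies only in prose:

```python
def check_policies(sources):
    """sources: list of dicts with 'url' and optional 'policy'.
    Raise PermissionError if the query as composed violates any policy."""
    urls = {s["url"] for s in sources}
    for s in sources:
        policy = s.get("policy") or {}
        allowed = policy.get("allowed_cosources")
        # "only aggregate with sources I've approved"
        if allowed is not None and not (urls - {s["url"]}) <= set(allowed):
            raise PermissionError(f"{s['url']}: disallowed co-source")
        minimum = policy.get("min_participants")
        # "require at least N participants before releasing results"
        if minimum is not None and len(urls) < minimum:
            raise PermissionError(f"{s['url']}: needs >={minimum} participants")
```

If any source's policy would be violated, the synthesizer refuses the whole query rather than silently dropping the source.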

At the lowest levels, these properties rely on trust and convention. At the highest levels, they are guaranteed by hardware enclaves and mathematics. The protocol supports the full spectrum.


Application Types

The same syntax produces different applications depending on how you arrange sources and prompts.

Agent. One source set, one prompt. The sources define the agent's expertise, the prompt defines its task.

url4.ai/v1/?q=react.dev|nextjs.org!you+are+a+React+expert.+review+my+code

Swarm. Bracketed agents with different source sets, debating through an outer prompt.

url4.ai/v1/?q=[react.dev!review+as+React+expert]|[vue.org!review+as+Vue+expert]!synthesize+both+reviews

Group avatar. Multiple weighted sources merged into a single voice. No one person's view dominates. The avatar represents the group.

url4.ai/v1/?q=0.3*alice.url4.ai|0.3*bob.url4.ai|0.4*carol.url4.ai!what+should+we+do+for+dinner

Webpage. Sources grounding an AI that generates HTML. The output is a live page, synthesized from real sources.

World. A generated page containing url4 links. Each link leads to another generated page. The result is a navigable space, built on-demand from real data.

