xPayMind — Benchmark AI Agents for x402

PLATFORM OVERVIEW

The benchmark standard for AI × x402 protocol compliance

xPayMind is an open evaluation framework for measuring AI agent performance against the x402 HTTP payment protocol. Agents are tested across latency, correctness, retry logic, and protocol conformance — producing reproducible, comparable scores across model families and implementations.

The x402 protocol defines a machine-readable payment negotiation layer for HTTP. As AI agents increasingly transact autonomously, rigorous benchmarking becomes infrastructure — not an option. xPayMind provides the tooling and public record to hold agents accountable to that standard.

MISSION

To establish the universal performance standard for AI agents operating within the x402 payment protocol — ensuring that autonomous economic agents are measurable, comparable, and accountable to a shared technical baseline. Every agent deserves a score. Every score deserves to be reproducible.

PROBLEM

As AI agents begin transacting autonomously over HTTP, no agreed standard exists to evaluate whether they correctly implement x402. Developers cannot compare implementations. Auditors have no reproducible benchmarks. Users have no transparency into agent payment behavior. Without infrastructure for measurement, trust in autonomous economic agents cannot scale. xPayMind closes this gap.

RFC 9110 · x402 Draft v3 · HTTP/1.1 + HTTP/2 · On-chain settlement

COMPLIANCE RATE vs REQUEST COMPLEXITY n=71 agents · rolling 30d

SCORE DISTRIBUTION 71 agents

LATENCY vs ACCURACY P95 ms · protocol %

18 test steps · 5 phases · 83% avg pass rate · 28,180 total runs logged

# | TEST NAME | WHAT IT CHECKS | PASS RATE | RUNS

PHASE 1: CORE PROTOCOL

01 | Payment Initiation | Agent correctly sends the first payment request with a valid x402 body | 94% | 2,840
02 | 402 Header Parsing | Correctly reads and interprets WWW-Authenticate 402 response headers | 88% | 3,110
03 | Payload Construction | Payment payload schema matches x402 spec: correct fields, types, encoding | 92% | 2,670
04 | Response Validation | Agent accepts a valid 200 response and proceeds; rejects malformed 200s | 89% | 2,510
05 | Error Code Handling | Correct behavior on 4xx/5xx: does not retry blindly, surfaces errors | 78% | 1,980
06 | Idempotency Keys | Duplicate payment attempts use the same idempotency key; no double-charge | 81% | 1,740

PHASE 2: RESILIENCE

07 | Retry & Backoff Logic | Transient failures trigger exponential backoff, not immediate re-fire | 71% | 1,920
08 | Timeout Handling | Agent respects payment gateway timeouts and does not hang indefinitely | 76% | 1,650
09 | Network Failure Recovery | Mid-request network drops are detected and handled without data corruption | 68% | 1,310
10 | Partial Payment Handling | Agent handles underpayment: renegotiates or aborts cleanly | 63% | 1,100

PHASE 3: PERFORMANCE

11 | Latency Under Load | Payment round-trip stays under 800ms at 50 req/s sustained load | 83% | 2,200
12 | Concurrent Requests | Agent handles 10 simultaneous payment flows without race conditions | 77% | 1,430
13 | Throughput Stress Test | Sustained 200 req/min for 5 minutes with <2% error rate | 74% | 1,080

PHASE 4: SECURITY

14 | Token Validation | Agent rejects expired, forged, or reused payment tokens unconditionally | 96% | 1,650
15 | Replay Attack Prevention | Re-submitted identical payment requests are detected and blocked | 91% | 1,420
16 | Signature Verification | Cryptographic signatures on payment receipts are verified before acceptance | 93% | 1,290

PHASE 5: COMPLIANCE

17 | RFC Conformance | All request/response headers and body fields match the x402 RFC draft exactly | 91% | 2,050
18 | Interoperability Matrix | Agent successfully transacts with all 6 reference x402 gateway implementations | 72% | 980
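
Step 07 expects exponential backoff rather than an immediate re-fire on transient failures. The following is a minimal sketch of one way an agent could compute its retry delays. The function name, the base delay, the cap, and the use of full jitter are all illustrative assumptions; the benchmark text does not prescribe specific values.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 5,
    base_s: float = 0.5,      # assumed first-retry ceiling, not from the spec
    cap_s: float = 30.0,      # assumed upper bound on any single delay
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one delay (in seconds) per retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        # The ceiling doubles on every attempt until it reaches the cap.
        ceiling = min(cap_s, base_s * (2 ** attempt))
        # Full jitter picks a random delay in [0, ceiling] so that many
        # clients failing together do not retry in lockstep.
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


# Deterministic schedule with jitter disabled.
schedule = backoff_delays(max_retries=5, jitter=False)
```

An agent would sleep for each delay in turn between attempts, reusing the same idempotency key on every retry so that step 06 is also satisfied.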

FRAMEWORKS & AGENTS

Real agents.
x402 verified.

Open-source frameworks and reference implementations with documented x402 HTTP payment protocol support. Links go directly to source repositories.

Coinbase AgentKit

OFFICIAL · x402 NATIVE

Coinbase Developer Platform

The official Coinbase toolkit for building AI agents with on-chain capabilities. Ships with native x402 payment handling — agents can autonomously negotiate and complete micropayments directly via HTTP 402.

Python · TypeScript github.com/coinbase/agentkit →

GOAT SDK

x402 PLUGIN

Crossmint

Great On-chain Agent Toolkit — a plugin-based framework for adding blockchain and payment capabilities to any AI agent. Includes a dedicated x402 payment plugin for transparent HTTP payment negotiation.

TypeScript · 50+ plugins github.com/goat-sdk/goat →

ElizaOS

x402 PLUGIN

ai16z / ElizaOS

Open-source AI agent operating system with a rich plugin ecosystem. The x402 plugin enables Eliza-based agents to handle paid HTTP resources autonomously, integrating payment flows directly into agent decision loops.

TypeScript · Plugin ecosystem github.com/elizaOS/eliza →

x402 Reference Client

OFFICIAL

Coinbase

The canonical reference implementation of the x402 protocol by Coinbase. Includes client examples, facilitator middleware, and end-to-end payment flow demos — the definitive starting point for any x402 integration.

TypeScript · Go github.com/coinbase/x402 →

x402 Agent Examples

EXAMPLES

Coinbase

A curated set of working agent examples demonstrating x402 integration patterns — autonomous payment agents, server-side payment gates, and multi-step agentic payment flows. Suitable as benchmarking references.

Multiple languages github.com/coinbase/x402/examples →

Submit your agent

Building an x402-compatible agent or framework? Submit it to the xPayMind registry to have it benchmarked, verified, and listed here. All submissions run the full 18-step pipeline.

Open registry · Free evaluation Submit for evaluation →

THE PROTOCOL

HTTP 402 Payment Required

x402 extends the dormant HTTP 402 status code into a complete machine-to-machine payment protocol — enabling AI agents to autonomously negotiate, authorize, and complete micropayments without human intervention.

STABLECOIN NATIVE · ON-CHAIN VERIFICATION · SUB-SECOND SETTLEMENT · AGENT FIRST

402_flow.http

// Agent requests a paid resource
GET /api/premium-data HTTP/1.1
Host: api.example.com

HTTP/1.1 402 Payment Required
X-Payment-Scheme: x402/1.0
X-Payment-Recipient: 0x4f9...a23
X-Payment-Amount: 0.001
X-Payment-Asset: USDC
X-Payment-Network: base

// Agent pays, retries with proof
GET /api/premium-data HTTP/1.1
Host: api.example.com
X-Payment-Payload: eyJhb...

HTTP/1.1 200 OK
// Resource delivered

AVG SETTLEMENT: 0.4s · MIN PAYMENT: $0.0001 · SUPPORTED CHAINS: 6

GETTING STARTED

Quickstart

Install the CLI, register your agent endpoint, and get a full 18-step benchmark result in under 5 minutes.

01

Install the CLI

Requires Node.js ≥ 18. The CLI communicates with the xPayMind API at api.xpaymind.io/v1.

SHELL

$ npm install -g @xpaymind/cli
$ xpaymind --version
xpaymind/0.4.2 linux-x64 node-v20.11.0

02

Register your agent

Your agent must expose an HTTP endpoint. The runner sends live x402 payment requests to this URL during the benchmark.

SHELL

$ xpaymind agent register \
    --name "my-agent" \
    --endpoint "https://my-agent.example.com" \
    --network base

✓ Agent registered. ID: agt_8f2a9c1e
✓ Endpoint reachable (HTTP 200)
API key saved to ~/.xpaymind/config.toml

03

Run the benchmark

Executes all 18 steps sequentially. Estimated runtime: 90–240s depending on agent latency. Results are streamed and stored.

SHELL

$ xpaymind run --agent agt_8f2a9c1e --suite full

Running phase 1/5: CORE PROTOCOL...
  [01] Payment Initiation      ✓  94ms
  [02] 402 Header Parsing      ✓  88ms
  [03] Payload Construction    ✓ 121ms
  ...
Run ID: run_4d91e2bc
✓ Score: 86.4 / 100  (18/18 steps completed)

GETTING STARTED

Architecture

xPayMind is a stateless test runner backed by a persistent results store. Each benchmark run is isolated inside a short-lived job worker.

CLI / API client
POST /v1/runs →
API Gateway (api.xpaymind.io)
Job Queue (Redis FIFO)
← dequeued by
Test Runner (isolated worker)
Results Store (PostgreSQL)
← writes →
Agent Endpoint (your server)

The test runner makes real outbound HTTP requests to your agent endpoint. There is no mocking. Payment transactions are executed on the Base Sepolia testnet (base-sepolia) by default; mainnet (base) is opt-in. Each step is timed independently, and timeout thresholds are defined per step in the benchmark spec.

Results are immutable once written. Each run gets a deterministic run_id derived from sha256(agent_id + suite + timestamp) to prevent replay attacks on the results API.
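
The run_id derivation described above can be sketched as follows. Only the inputs to the hash (agent_id, suite, timestamp) come from the text. Plain string concatenation, the ISO 8601 timestamp format, and truncating the digest to 8 hex characters (chosen to resemble the run_4d91e2bc example) are assumptions, and derive_run_id is a hypothetical helper name.

```python
import hashlib


def derive_run_id(agent_id: str, suite: str, timestamp: str) -> str:
    """Derive a run identifier from sha256(agent_id + suite + timestamp)."""
    # Assumption: the three fields are concatenated with no separator.
    material = f"{agent_id}{suite}{timestamp}".encode("utf-8")
    digest = hashlib.sha256(material).hexdigest()
    # Assumption: the public ID is a "run_" prefix plus the first 8 hex chars.
    return "run_" + digest[:8]


run_id = derive_run_id("agt_8f2a9c1e", "full", "2025-05-13T14:20:00Z")
```

The same three inputs always yield the same ID, so a result can be re-derived and checked against the stored record.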

INTEGRATION

Agent Spec

Your agent must correctly handle an inbound HTTP 402 response and retry the request with a valid x402 payment payload. Below is the minimum required interface.

HTTP 402 response your agent must handle

HTTP

# Server returns 402 with payment instructions
HTTP/1.1 402 Payment Required
WWW-Authenticate: x402 realm="api.example.com"
X-Payment-Scheme: x402/1.0
X-Payment-Recipient: 0x4f9c8a2b...a23f
X-Payment-Amount: 0.001
X-Payment-Asset: USDC
X-Payment-Network: base
X-Payment-Expires: 1747152000
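
As a rough illustration of how an agent might read these headers, the sketch below parses them into a typed structure and checks the expiry. The header names are taken from the example above. PaymentTerms, parse_payment_terms, and is_expired are hypothetical names, and treating every listed header as mandatory is an assumption.

```python
import time
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class PaymentTerms:
    scheme: str
    recipient: str
    amount: Decimal
    asset: str
    network: str
    expires: int  # Unix timestamp in seconds


REQUIRED_HEADERS = (
    "X-Payment-Scheme",
    "X-Payment-Recipient",
    "X-Payment-Amount",
    "X-Payment-Asset",
    "X-Payment-Network",
    "X-Payment-Expires",
)


def parse_payment_terms(headers: Mapping[str, str]) -> PaymentTerms:
    """Extract payment terms from the headers of a 402 response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if h.lower() not in lowered]
    if missing:
        raise ValueError(f"402 response is missing headers: {missing}")
    return PaymentTerms(
        scheme=lowered["x-payment-scheme"],
        recipient=lowered["x-payment-recipient"],
        # Decimal avoids binary floating-point rounding on money amounts.
        amount=Decimal(lowered["x-payment-amount"]),
        asset=lowered["x-payment-asset"],
        network=lowered["x-payment-network"],
        expires=int(lowered["x-payment-expires"]),
    )


def is_expired(terms: PaymentTerms, now: Optional[float] = None) -> bool:
    """True once the offer's expiry timestamp has been reached."""
    return (time.time() if now is None else now) >= terms.expires


terms = parse_payment_terms({
    "X-Payment-Scheme": "x402/1.0",
    "X-Payment-Recipient": "0x4f9c8a2b...a23f",  # truncated in the source
    "X-Payment-Amount": "0.001",
    "X-Payment-Asset": "USDC",
    "X-Payment-Network": "base",
    "X-Payment-Expires": "1747152000",
})
```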

Payment payload schema (JSON)

JSON

{
  "scheme": "x402/1.0",
  "network": "base",
  "asset": "USDC",
  "amount": "0.001",
  "recipient": "0x4f9c8a2b...a23f",
  "payer": "0x9a1b3c4d...88e1",
  "nonce": "a1b2c3d4e5f6",
  "expires_at": 1747152000,
  "signature": "0x3d4e..."
}

Retry request with proof

HTTP

GET /api/resource HTTP/1.1
Host: api.example.com
X-Payment-Payload: eyJzY2hlbWUiOiJ4NDAyLzEuMCIsIm5ldHdvcmsiO...
# base64url-encoded JSON payment payload above

The signature field must be an EIP-712 typed signature over the canonical payment struct. xPayMind validates signatures on-chain during steps 14–16. Agents that submit unsigned or self-signed payloads will fail the security phase.
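
The encoding step for the X-Payment-Payload header can be sketched like this. Compact JSON with the field order shown above reproduces the visible prefix of the example header value. Stripping the base64 padding is an assumption, and the function names are hypothetical. The signature here is the truncated placeholder from the schema, not a real EIP-712 signature.

```python
import base64
import json

payload = {
    "scheme": "x402/1.0",
    "network": "base",
    "asset": "USDC",
    "amount": "0.001",
    "recipient": "0x4f9c8a2b...a23f",  # truncated in the source, kept as-is
    "payer": "0x9a1b3c4d...88e1",      # truncated in the source, kept as-is
    "nonce": "a1b2c3d4e5f6",
    "expires_at": 1747152000,
    "signature": "0x3d4e...",          # placeholder, not a valid signature
}


def encode_payment_payload(data: dict) -> str:
    """Serialise to compact JSON, then base64url-encode without padding."""
    compact = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(compact).decode("ascii").rstrip("=")


def decode_payment_payload(value: str) -> dict:
    """Reverse encode_payment_payload, restoring any stripped padding."""
    padded = value + "=" * (-len(value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


header_value = encode_payment_payload(payload)
```

The decode helper is useful in agent-side tests to confirm that what goes on the wire round-trips to the intended payload.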

INTEGRATION

REST API

Base URL: https://api.xpaymind.io/v1. All endpoints require a Bearer token. Responses are JSON. Rate limit: 60 req/min per API key.

POST /v1/runs Submit a benchmark run

JSON — REQUEST BODY

{
  "agent_id": "agt_8f2a9c1e",
  "suite": "full",
  "network": "base-sepolia",
  "timeout_ms": 5000,
  "notify_webhook": "https://your-server.com/hook"
}

suite accepts "full", "core", or "security". network defaults to the "base-sepolia" testnet.

JSON — RESPONSE 202

{
  "run_id": "run_4d91e2bc",
  "status": "queued",
  "estimated_duration_s": 120,
  "results_url": "https://api.xpaymind.io/v1/runs/run_4d91e2bc"
}
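
For readers calling the API without the CLI, here is a minimal sketch of building the POST /v1/runs request with Python's standard library. The URL, the Bearer scheme, and the body fields come from this section. build_run_request is a hypothetical helper, and treating notify_webhook as optional is an assumption. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.xpaymind.io/v1"
VALID_SUITES = ("full", "core", "security")


def build_run_request(
    api_key: str,
    agent_id: str,
    suite: str = "full",
    network: str = "base-sepolia",
    timeout_ms: int = 5000,
    notify_webhook: Optional[str] = None,  # assumed to be optional
) -> urllib.request.Request:
    """Build (but do not send) a request that submits a benchmark run."""
    if suite not in VALID_SUITES:
        raise ValueError(f"suite must be one of {VALID_SUITES}")
    body = {
        "agent_id": agent_id,
        "suite": suite,
        "network": network,
        "timeout_ms": timeout_ms,
    }
    if notify_webhook is not None:
        body["notify_webhook"] = notify_webhook
    return urllib.request.Request(
        f"{API_BASE}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request("YOUR_API_KEY", "agt_8f2a9c1e")
```

Passing the request to urllib.request.urlopen would submit it; per the response shown above, a successful submission returns 202 with a results_url to poll.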

GET /v1/runs/{run_id} Retrieve run results

JSON — RESPONSE 200

{
  "run_id": "run_4d91e2bc",
  "agent_id": "agt_8f2a9c1e",
  "status": "complete",
  "score": 86.4,
  "steps": [
    { "id": 1, "name": "Payment Initiation", "passed": true,  "latency_ms": 94  },
    { "id": 2, "name": "402 Header Parsing",  "passed": true,  "latency_ms": 88  },
    { "id": 10, "name": "Partial Payment Handling", "passed": false, "latency_ms": 3012,
      "error": "TIMEOUT: agent did not retry within 3000ms" }
  ],
  "phase_scores": {
    "core": 92.0, "resilience": 68.5, "performance": 81.3,
    "security": 95.1, "compliance": 83.0
  },
  "completed_at": "2025-05-13T14:22:10Z"
}
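
A result document like the one above is easy to triage in code. The sketch below works on a trimmed copy of the sample response. failed_steps and weakest_phase are hypothetical helper names, and the field names are the ones shown in the sample.

```python
from typing import List

# Trimmed copy of the sample response, keeping only the fields used below.
sample_run = {
    "run_id": "run_4d91e2bc",
    "status": "complete",
    "steps": [
        {"id": 1, "passed": True, "latency_ms": 94},
        {"id": 2, "passed": True, "latency_ms": 88},
        {"id": 10, "passed": False, "latency_ms": 3012,
         "error": "TIMEOUT: agent did not retry within 3000ms"},
    ],
    "phase_scores": {
        "core": 92.0, "resilience": 68.5, "performance": 81.3,
        "security": 95.1, "compliance": 83.0,
    },
}


def failed_steps(run: dict) -> List[dict]:
    """Return every step entry whose 'passed' flag is false or missing."""
    return [step for step in run.get("steps", []) if not step.get("passed")]


def weakest_phase(run: dict) -> str:
    """Return the name of the phase with the lowest score."""
    scores = run["phase_scores"]
    return min(scores, key=scores.get)


failures = failed_steps(sample_run)
lowest = weakest_phase(sample_run)
```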

INTERNALS

Scoring Model

The final score is a weighted average across 5 phases. Phase weights reflect the relative importance of each capability category for production x402 deployments.

PHASE | STEPS | WEIGHT | MAX CONTRIBUTION
Core Protocol | 1–6 | 35% | 35 pts
Resilience | 7–10 | 20% | 20 pts
Performance | 11–13 | 15% | 15 pts
Security | 14–16 | 20% | 20 pts
Compliance | 17–18 | 10% | 10 pts
Total | 18 | 100% | 100 pts

Each step produces a binary pass/fail. For steps 11–13 the result is additionally modified by a latency multiplier (as of v0.4.2 the multiplier applies to these performance steps only): if a step passes but exceeds its latency threshold, the step score is scaled by min(1, threshold_ms / actual_ms). Steps that hard-fail (wrong response, timeout, invalid signature) score 0 regardless of latency.

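Phase score = mean of step scores within the phase, where each step score lies in [0, 1]. Final score = sum of (phase_score × phase_weight) across all phases, with weights expressed as fractions (0.35, 0.20, 0.15, 0.20, 0.10), multiplied by 100.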
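
The scoring rules can be written out as a short sketch. The weights and step ranges come from the table above and the formulas from the preceding paragraphs. The function names, the dictionary layout, and rounding to one decimal place are assumptions made for illustration, and the inputs below are hypothetical rather than taken from a real run.

```python
from typing import Dict, Optional

# Phase -> (step numbers, weight as a fraction), from the scoring table.
PHASES = {
    "core": (range(1, 7), 0.35),
    "resilience": (range(7, 11), 0.20),
    "performance": (range(11, 14), 0.15),
    "security": (range(14, 17), 0.20),
    "compliance": (range(17, 19), 0.10),
}


def step_score(passed: bool,
               actual_ms: Optional[float] = None,
               threshold_ms: Optional[float] = None) -> float:
    """Binary pass/fail, scaled by min(1, threshold_ms / actual_ms) if timed."""
    if not passed:
        return 0.0  # hard failures score 0 regardless of latency
    if actual_ms is None or threshold_ms is None or actual_ms <= 0:
        return 1.0  # no latency multiplier for this step
    return min(1.0, threshold_ms / actual_ms)


def final_score(step_scores: Dict[int, float]) -> float:
    """Weighted sum of per-phase mean step scores, on a 0-100 scale."""
    total = 0.0
    for steps, weight in PHASES.values():
        phase_score = sum(step_scores[s] for s in steps) / len(steps)
        total += phase_score * weight
    return round(total * 100, 1)


# Hypothetical run: everything passes except step 10.
scores = {step: 1.0 for step in range(1, 19)}
scores[10] = step_score(passed=False)
result = final_score(scores)  # resilience = 0.75, so 95.0 overall
```

A single failed resilience step costs 5 points here (one quarter of the 20-point phase), whereas a failed compliance step would cost 5 points out of only 10, since that phase has two steps.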

GETTING STARTED

Self-hosting

The full benchmark runner is open-source. Run it locally against your agent before submitting to the public registry, or deploy it to your own infrastructure.

SHELL

# Clone and configure
$ git clone https://github.com/xpaymind/runner
$ cd runner
$ cp .env.example .env
# Edit .env: set AGENT_ENDPOINT, NETWORK, RPC_URL

# Run with Docker
$ docker compose up --build
→ API listening on :8080
→ Runner worker started (concurrency: 4)
→ Redis connected at redis:6379

Environment variables

AGENT_ENDPOINT: Required. Base URL of the agent under test.

NETWORK: base-sepolia (default) or base for mainnet.

RPC_URL: EVM JSON-RPC endpoint used for on-chain signature verification.

STEP_TIMEOUT_MS: Per-step timeout in ms. Default: 5000.

PAYER_PRIVATE_KEY: EVM private key for the test wallet that funds payment steps.

LOG_LEVEL: info | debug. Debug logs include full request/response bodies.
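
Before starting the containers it can help to sanity-check the environment. The sketch below validates the variables listed above. It is not part of the runner. load_runner_config is a hypothetical helper, the LOG_LEVEL default of info is an assumption (the list does not state one), and RPC_URL and PAYER_PRIVATE_KEY are passed through unchecked because the source does not say whether they are mandatory.

```python
from typing import Mapping

VALID_NETWORKS = ("base-sepolia", "base")
VALID_LOG_LEVELS = ("info", "debug")


def load_runner_config(env: Mapping[str, str]) -> dict:
    """Validate runner settings from an environment-like mapping."""
    endpoint = env.get("AGENT_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("AGENT_ENDPOINT is required")

    network = env.get("NETWORK", "base-sepolia")
    if network not in VALID_NETWORKS:
        raise ValueError(f"NETWORK must be one of {VALID_NETWORKS}")

    # Assumed default; the variable list only names the allowed values.
    log_level = env.get("LOG_LEVEL", "info")
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LOG_LEVELS}")

    timeout_ms = int(env.get("STEP_TIMEOUT_MS", "5000"))
    if timeout_ms <= 0:
        raise ValueError("STEP_TIMEOUT_MS must be a positive integer")

    return {
        "agent_endpoint": endpoint,
        "network": network,
        "rpc_url": env.get("RPC_URL"),
        "step_timeout_ms": timeout_ms,
        "payer_private_key": env.get("PAYER_PRIVATE_KEY"),
        "log_level": log_level,
    }


config = load_runner_config({"AGENT_ENDPOINT": "https://my-agent.example.com"})
```

In practice the mapping would be os.environ; passing a plain dict keeps the check easy to test.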

INTERNALS

Changelog

v0.4.2 2025-05-01

Step 18 (Interoperability Matrix) now tests against 6 gateway implementations, up from 4.
Latency multiplier now applies to steps 11–13 only; binary pass/fail for all other steps.
Fixed: idempotency key collision false-positive on step 06 under high concurrency.

v0.4.0 2025-03-18

Added step 18: Interoperability Matrix. Suite expanded from 17 to 18 steps.
GOAT SDK and ElizaOS adapters added to reference runner.
API now streams step results via SSE at GET /v1/runs/{id}/stream.

v0.3.1 2025-01-09

EIP-712 signature verification integrated on-chain for steps 14–16.
Base Sepolia replaces Goerli as default test network.
Self-hosted runner image published: ghcr.io/xpaymind/runner:0.3.1.

TOKENOMICS

100% Fair Launch

No presale. No private round. No VC allocation. No team tokens. xPayMind launches with full transparency — every token enters circulation through protocol participation and open market distribution. There is no privileged access, no lockup advantage, and no insider supply.

100%

PUBLIC DISTRIBUTION

No team, VC, or advisor allocation. All supply enters through open participation.

0%

INSIDER ALLOCATION

Zero tokens reserved for any private party. Founders hold no pre-minted supply.

0%

PRESALE / ICO

No seed, no presale, no whitelist. The only way in is through the open launch.

On-chain

VERIFIABLE DISTRIBUTION

All distribution events are recorded on Base. Every wallet, every transfer, fully public.

SUPPLY ALLOCATION 100% Community

PUBLIC — 100%

TEAM

Three people who got frustrated and built it.

No famous names, no big logos. A backend engineer who wrote the first test suite, an ML engineer who cares about rigorous evals, and a product person who makes sure it's actually usable. That's the team.

Dan K.

Protocol & Backend

Spent the last four years building HTTP tooling and API infrastructure at a mid-size fintech. Got into x402 early — contributed a few issues to the draft spec repo and ended up writing a test suite for personal use. That test suite is what became xPayMind. Comfortable in Rust, Go, and TypeScript. Prefers specs over whitepapers.

HTTP infrastructure · x402 early adopter · open source

Sara L.

AI Systems & Evaluation

ML engineer with a background in building evals and benchmarking pipelines for LLM-powered agents. Previously worked at a small AI safety research group doing behavioural testing. Has been thinking about how autonomous agents handle payment flows since the first x402 draft landed. Joined to make the evaluation methodology rigorous and reproducible.

agent evals · LLM systems · Python · PyTorch

Max R.

Product & Developer Relations

Has worked in developer tooling and Web3 infrastructure for the past five years — mostly writing docs, building demos, and talking to developers about what they actually need. Ran DevRel for a small L2 tooling startup before it wound down. Brought in to make sure xPayMind is something developers want to use, not just something that technically works.

developer tooling · Web3 infrastructure · devrel
