How to Evaluate x402 Services Before Your Agent Pays
Evaluating x402 services before your AI agent sends payment is the single most important step in building reliable agent-to-agent commerce. The x402 protocol, created by Coinbase, enables HTTP-based micropayments using USDC, but it provides no built-in mechanism for evaluating service quality. ScoutScore - Trust Infrastructure for AI Agents - fills this gap with a scoring API that monitors 1,500+ unique service domains. This guide walks through the complete integration, from installation to production-ready payment gates.
Why Should You Evaluate Services Before Paying?
The raw numbers explain the urgency. ScoutScore has cataloged 19,000+ total endpoint entries in the x402 ecosystem. The average fidelity score is 52 out of 100. One spam farm registered 10,658 fake services from a single wallet address. Schema phantoms advertise capabilities they cannot deliver. Price mismatches between metadata and actual payment requirements are common.
By ScoutScore's figures, an agent that pays a service without checking trust first has roughly an 87% chance of paying spam. A single API call to ScoutScore before every payment screens out most of that risk.
How Do You Install the ScoutScore SDK?
The TypeScript SDK is the fastest way to integrate trust scoring:
npm install @scoutscore/sdk
The SDK is zero-config during the launch period - no API key required. It handles retries, error handling, and response parsing automatically.
How Do You Check a Service's Trust Score?
The basic pattern is simple - check the score before every payment:
import { ScoutScore } from '@scoutscore/sdk';

const scout = new ScoutScore();

async function shouldPay(domain: string): Promise<boolean> {
  const result = await scout.scoreBazaarService(domain);

  if (result.score >= 75) {
    // HIGH trust - safe to pay
    console.log(`Trusted: ${domain} scores ${result.score}/100`);
    return true;
  }

  if (result.score >= 50) {
    // MEDIUM trust - proceed with caution
    console.log(`Caution: ${domain} scores ${result.score}/100`);
    // Consider smaller transaction amounts or escrow
    return true;
  }

  // LOW or VERY_LOW - block payment
  console.log(`Blocked: ${domain} scores ${result.score} (${result.level})`);
  console.log(`Flags: ${result.flags.join(', ')}`);
  return false;
}
How Do You Read the Score Breakdown?
A trust score is not just a number. ScoutScore returns a full breakdown that helps you understand exactly why a service scored the way it did:
const result = await scout.scoreBazaarService('recoupable.com');
// Overall score and level
console.log(result.score); // 100
console.log(result.level); // "HIGH"
// Flags tell you what's good and what's bad
console.log(result.flags);
// ["HAS_COMPLETE_SCHEMA", "FIDELITY_PROVEN", "GOOD_UPTIME"]
// Recommendation includes transaction guidance
console.log(result.recommendation.verdict); // "RECOMMENDED"
console.log(result.recommendation.maxTransaction); // -1 (no limit)
console.log(result.recommendation.escrowTerms); // "NONE_REQUIRED"
Compare that to a spam service:
const spam = await scout.scoreBazaarService('lowpaymentfee.com');
console.log(spam.score); // 0
console.log(spam.level); // "VERY_LOW"
console.log(spam.flags);
// ["WALLET_SPAM_FARM", "TEMPLATE_SPAM", "MASS_LISTING_SPAM"]
console.log(spam.recommendation.verdict); // "NOT_RECOMMENDED"
What Do the 4 Scoring Pillars Mean in Practice?
Understanding each pillar helps you interpret scores and set appropriate thresholds for your use case.
Contract Clarity (20% weight)
This pillar answers: "Does the service clearly define what it offers?" In practice, it checks:
- Whether the service has a complete API schema (input/output definitions). Only ~10% of x402 services do.
- Whether the description is meaningful vs. generic filler. 93% of services have descriptions under 50 characters.
- Whether the pricing metadata matches actual payment requirements.
A service scoring high on Contract Clarity has taken the time to properly document its interface. This correlates strongly with legitimate, well-maintained services.
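The price-consistency check can be illustrated with a small sketch. It assumes the advertised price is a decimal USD string and that the amount demanded in the service's 402 response is expressed in atomic USDC units (USDC uses 6 decimals). The function names `usdToAtomic` and `priceMatches` are hypothetical and are not part of the ScoutScore SDK; how ScoutScore performs this comparison internally is not documented here.

```typescript
const USDC_DECIMALS = 6;

// Convert a decimal USD string such as "0.10" into atomic USDC units.
// Integer math via BigInt avoids floating-point rounding errors.
function usdToAtomic(usd: string): bigint {
  const match = /^(\d+)(?:\.(\d{1,6}))?$/.exec(usd.trim());
  if (!match) {
    throw new Error(`Unsupported price format: ${usd}`);
  }
  const whole = match[1];
  const fraction = (match[2] ?? '').padEnd(USDC_DECIMALS, '0');
  return BigInt(whole + fraction);
}

// True when the advertised metadata price equals the amount the
// service actually demands (assumed to be given in atomic units).
function priceMatches(advertisedUsd: string, requiredAtomic: string): boolean {
  return usdToAtomic(advertisedUsd) === BigInt(requiredAtomic);
}
```

For example, an advertised price of "0.10" corresponds to 100000 atomic units, so a service that advertises "0.01" but demands 100000 would count as a mismatch.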
Availability (30% weight)
This pillar answers: "Is the service reliably online?" ScoutScore checks every 30 minutes and tracks:
- Current status (UP/DOWN)
- 7-day and 30-day uptime percentages
- Response latency
- Response consistency
For payment decisions, availability matters because you do not want your agent to pay for a service that might be offline when it needs to use it.
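As an illustration of how an uptime percentage can be derived from periodic checks, here is a minimal sketch. The `HealthCheck` shape and the `uptimePercent` function are assumptions made for this example; ScoutScore's own calculation may differ.

```typescript
// Hypothetical record of one periodic availability check.
interface HealthCheck {
  timestamp: number; // milliseconds since epoch
  up: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Percentage of checks within the last `windowDays` that found the
// service up. Returns null when no checks fall inside the window.
function uptimePercent(
  checks: HealthCheck[],
  windowDays: number,
  now: number
): number | null {
  const cutoff = now - windowDays * DAY_MS;
  const recent = checks.filter(
    (c) => c.timestamp >= cutoff && c.timestamp <= now
  );
  if (recent.length === 0) {
    return null;
  }
  const upCount = recent.filter((c) => c.up).length;
  return (upCount * 100) / recent.length;
}
```

At one check every 30 minutes, a 7-day window holds 336 checks, so a single missed check moves the figure by about 0.3 percentage points.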
Response Fidelity (30% weight)
This pillar answers: "Does the service actually deliver what it promises?" Every 6 hours, ScoutScore sends real requests and compares responses against advertised behavior. The ecosystem average is 52/100. Specific checks include:
- Does the response match the advertised schema?
- Does the content type match expectations?
- Is the response body meaningful vs. empty or error?
- Does the service handle the x402 payment flow correctly?
This is the most important pillar because it directly measures whether you get value for money.
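To make the first three checks concrete, here is a rough sketch of comparing an observed response against an advertised contract. It assumes JSON responses and a schema reduced to a list of required top-level fields. The types, the `fidelityIssues` function, and the issue labels are all hypothetical; they are not ScoutScore flags, and the real probing logic is not documented in this guide.

```typescript
// Hypothetical, simplified view of what a service advertises.
interface AdvertisedContract {
  contentType: string; // e.g. "application/json"
  requiredFields: string[]; // top-level keys the schema promises
}

// Hypothetical view of what a probe actually received.
interface ObservedResponse {
  contentType: string;
  body: string;
}

// Returns a list of problems; an empty list means every check passed.
function fidelityIssues(
  advertised: AdvertisedContract,
  observed: ObservedResponse
): string[] {
  const issues: string[] = [];

  // Content type check (prefix match tolerates "; charset=utf-8").
  const expected = advertised.contentType.toLowerCase();
  if (!observed.contentType.toLowerCase().startsWith(expected)) {
    issues.push('CONTENT_TYPE_MISMATCH');
  }

  // Meaningful body check.
  if (observed.body.trim().length === 0) {
    issues.push('EMPTY_BODY');
    return issues;
  }

  // Schema check, reduced here to "required fields are present".
  let parsed: unknown;
  try {
    parsed = JSON.parse(observed.body);
  } catch {
    issues.push('INVALID_JSON');
    return issues;
  }
  if (typeof parsed !== 'object' || parsed === null) {
    issues.push('NOT_AN_OBJECT');
    return issues;
  }
  const record = parsed as Record<string, unknown>;
  const missing = advertised.requiredFields.filter((f) => !(f in record));
  if (missing.length > 0) {
    issues.push(`MISSING_FIELDS:${missing.join(',')}`);
  }
  return issues;
}
```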
Identity & Safety (20% weight)
This pillar answers: "Is this service operator trustworthy?" It detects:
- Wallet spam farms - Single wallets running thousands of services. The worst offender: 10,658 services from one wallet.
- Template spam - Identical descriptions detected via content fingerprinting.
- Mass listing spam - Domains with 50+ services without unique wallets.
- Price mismatches - Metadata price differs from actual payment amount.
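The four weights (20%, 30%, 30%, 20%) sum to 100%. This guide does not spell out how ScoutScore combines the pillars, so the sketch below simply assumes a weighted average of per-pillar scores on a 0-100 scale. `PillarScores` and `combinePillars` are hypothetical names, not SDK exports.

```typescript
// Hypothetical per-pillar scores, each assumed to be on a 0-100 scale.
interface PillarScores {
  contractClarity: number;
  availability: number;
  responseFidelity: number;
  identitySafety: number;
}

// Weights as stated in this guide.
const PILLAR_WEIGHTS: PillarScores = {
  contractClarity: 0.2,
  availability: 0.3,
  responseFidelity: 0.3,
  identitySafety: 0.2,
};

// Assumption: the overall score is a plain weighted average,
// rounded to the nearest integer.
function combinePillars(p: PillarScores): number {
  const total =
    p.contractClarity * PILLAR_WEIGHTS.contractClarity +
    p.availability * PILLAR_WEIGHTS.availability +
    p.responseFidelity * PILLAR_WEIGHTS.responseFidelity +
    p.identitySafety * PILLAR_WEIGHTS.identitySafety;
  return Math.round(total);
}
```

Under this assumption, a service with perfect availability but a zero Identity & Safety score cannot exceed 80, which shows why a spam-farm wallet drags down an otherwise healthy endpoint.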
What Red Flags Does ScoutScore Detect?
ScoutScore returns specific flags that tell you exactly what is wrong with a service. Here are the critical red flags to watch for:
- WALLET_SPAM_FARM - The service's wallet operates 1,000+ other services. Instant disqualifier. Score penalty: -25 to -50.
- TEMPLATE_SPAM - The service description matches a known spam template. These are mass-generated fake listings.
- MASS_LISTING_SPAM - The domain hosts 50+ services without unique wallets per service.
- SCHEMA_PHANTOM - The service advertises a schema but does not actually serve it. It looks legitimate in metadata but fails when called.
- NO_SCHEMA - No input/output definitions. 90% of services have this flag.
- POOR_METADATA - Description under 50 characters. 93% of services.
And the positive trust flags:
- HAS_COMPLETE_SCHEMA - Full API schema defined. Only ~10% of services.
- FIDELITY_PROVEN - Passed fidelity probing. Delivers what it promises.
- GOOD_UPTIME - 95%+ uptime over 7 days.
- ENDPOINT_HEALTHY - Currently online and responsive.
- GOOD_DOCUMENTATION - Meaningful description over 50 characters.
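One way to work with these flags in code is to split a result's flag list into red flags, positive flags, and anything unrecognized. The sketch below uses only the flag names listed in this guide; the `partitionFlags` helper is a hypothetical convenience function, not part of the SDK, and the API may return flags that are not listed here.

```typescript
// Flag names taken from the lists in this guide.
const RED_FLAGS = new Set<string>([
  'WALLET_SPAM_FARM',
  'TEMPLATE_SPAM',
  'MASS_LISTING_SPAM',
  'SCHEMA_PHANTOM',
  'NO_SCHEMA',
  'POOR_METADATA',
]);

const POSITIVE_FLAGS = new Set<string>([
  'HAS_COMPLETE_SCHEMA',
  'FIDELITY_PROVEN',
  'GOOD_UPTIME',
  'ENDPOINT_HEALTHY',
  'GOOD_DOCUMENTATION',
]);

interface PartitionedFlags {
  red: string[];
  positive: string[];
  unknown: string[]; // flags this guide does not describe
}

// Hypothetical helper: group flags so callers can log or act on each set.
function partitionFlags(flags: string[]): PartitionedFlags {
  const out: PartitionedFlags = { red: [], positive: [], unknown: [] };
  for (const flag of flags) {
    if (RED_FLAGS.has(flag)) {
      out.red.push(flag);
    } else if (POSITIVE_FLAGS.has(flag)) {
      out.positive.push(flag);
    } else {
      out.unknown.push(flag);
    }
  }
  return out;
}
```

Keeping an explicit `unknown` bucket means a newly introduced flag is surfaced in logs instead of being silently ignored.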
How Do You Batch-Check Multiple Services?
If your agent needs to choose between multiple providers, use batch scoring to compare up to 20 services in a single call:
const results = await scout.scoreBazaarBatch([
  'recoupable.com',
  'service-a.example.com',
  'service-b.example.com',
]);

// Sort a copy by score (highest first) so `results` keeps its original order, then pick the best
const best = [...results].sort((a, b) => b.score - a.score)[0];
console.log(`Best option: ${best.domain} (${best.score}/100)`);
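If an agent has more than 20 candidates, the list has to be split before calling the batch endpoint. The following is a minimal sketch of that chunking. `chunk` and `scoreInBatches` are hypothetical helpers, and `scoreBatch` is an injected function standing in for whatever batch call you use (for example a wrapper around `scoreBazaarBatch`), so the sketch makes no assumption about the SDK's return type.

```typescript
// Batch limit stated in this guide.
const MAX_BATCH_SIZE = 20;

// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Score any number of domains by issuing one batch call per chunk,
// sequentially, and concatenating the results in order.
async function scoreInBatches<R>(
  domains: string[],
  scoreBatch: (batch: string[]) => Promise<R[]>
): Promise<R[]> {
  const all: R[] = [];
  for (const batch of chunk(domains, MAX_BATCH_SIZE)) {
    all.push(...(await scoreBatch(batch)));
  }
  return all;
}
```

Calls are made one after another here to keep the example simple; whether parallel batches are acceptable depends on rate limits this guide does not specify.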
How Do You Set Up a Production Payment Gate?
A payment gate is a trust threshold that blocks payments to untrusted services. Here is a production-ready implementation:
import { ScoutScore } from '@scoutscore/sdk';

const scout = new ScoutScore();

// Configuration
const TRUST_THRESHOLD = 75; // Minimum score to pay
const MAX_UNTRUSTED_AMOUNT = 0.10; // Max USD for medium-trust services
const BLOCK_FLAGS = [
  'WALLET_SPAM_FARM',
  'TEMPLATE_SPAM',
  'MASS_LISTING_SPAM',
];

interface PaymentDecision {
  approved: boolean;
  maxAmount: number | null; // null means no cap
  reason: string;
}

async function evaluatePayment(
  domain: string,
  requestedAmount: number
): Promise<PaymentDecision> {
  const result = await scout.scoreBazaarService(domain);

  // Hard block on critical flags
  const hasBlockFlag = result.flags.some(
    (f: string) => BLOCK_FLAGS.includes(f)
  );
  if (hasBlockFlag) {
    return {
      approved: false,
      maxAmount: 0,
      reason: `Blocked: ${result.flags.join(', ')}`,
    };
  }

  // High trust - approve any amount
  if (result.score >= TRUST_THRESHOLD) {
    return {
      approved: true,
      maxAmount: null,
      reason: `HIGH trust (${result.score}/100)`,
    };
  }

  // Medium trust - cap the amount
  if (result.score >= 50) {
    return {
      approved: requestedAmount <= MAX_UNTRUSTED_AMOUNT,
      maxAmount: MAX_UNTRUSTED_AMOUNT,
      reason: `MEDIUM trust (${result.score}/100), max $${MAX_UNTRUSTED_AMOUNT}`,
    };
  }

  // Low trust - block
  return {
    approved: false,
    maxAmount: 0,
    reason: `${result.level} trust (${result.score}/100)`,
  };
}
How Do You Integrate with MCP and ElizaOS?
If you are using the Model Context Protocol (MCP) or ElizaOS, ScoutScore provides native integrations.
MCP Server
npm install @scoutscore/mcp-server
# Add to Claude Code
claude mcp add scoutscore npx @scoutscore/mcp-server
Once added, AI assistants can use the check_trust and check_fidelity tools directly.
ElizaOS Plugin
ScoutScore's ElizaOS plugin (PR #6513 at github.com/elizaos/eliza) provides 5 actions and 2 providers for framework-level trust scoring, with 236 tests covering the integration.
What Should Your Integration Checklist Look Like?
Follow this step-by-step checklist to integrate ScoutScore into your agent's payment flow:
- Install the SDK - npm install @scoutscore/sdk
- Add trust check before every payment - Call scoreBazaarService(domain) before sending any x402 payment.
- Set a minimum trust threshold - Start with 75 (HIGH level). Adjust based on your risk tolerance.
- Hard-block critical flags - Always block WALLET_SPAM_FARM, TEMPLATE_SPAM, and MASS_LISTING_SPAM.
- Log trust decisions - Record the score, flags, and your pay/block decision for debugging and analytics.
- Handle failures gracefully - If ScoutScore is unreachable, decide whether to fail-open (allow payment) or fail-closed (block payment). We recommend fail-closed for high-value transactions.
- Use batch scoring for comparisons - When choosing between providers, use scoreBazaarBatch() to compare up to 20 services in one call.
- Monitor your agents' trust decisions - Track what percentage of payments your agents block vs. approve. This gives you insight into the quality of services your agents interact with.
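The fail-open versus fail-closed choice from the checklist can be made explicit in code. The sketch below separates the lookup (which may throw) from the decision (a pure function), so the failure policy is easy to test. All names here (`LookupOutcome`, `GatePolicy`, `decide`, `safeLookup`) are hypothetical, and `lookup` is an injected function standing in for a call such as `scoreBazaarService`.

```typescript
// Result of attempting a trust lookup: either data or an error message.
type LookupOutcome =
  | { ok: true; score: number; flags: string[] }
  | { ok: false; error: string };

interface GatePolicy {
  threshold: number; // minimum score to approve
  failOpen: boolean; // true = allow payment when the lookup fails
}

interface GateDecision {
  approved: boolean;
  reason: string; // suitable for logging alongside score and flags
}

// Pure decision function: no I/O, so every branch is easy to test.
function decide(outcome: LookupOutcome, policy: GatePolicy): GateDecision {
  if (!outcome.ok) {
    const mode = policy.failOpen ? 'fail-open' : 'fail-closed';
    return {
      approved: policy.failOpen,
      reason: `lookup failed (${outcome.error}); ${mode}`,
    };
  }
  const approved = outcome.score >= policy.threshold;
  const cmp = approved ? '>=' : '<';
  return {
    approved,
    reason: `score ${outcome.score} ${cmp} threshold ${policy.threshold}`,
  };
}

// Wrap any lookup so that thrown errors become a LookupOutcome.
async function safeLookup(
  domain: string,
  lookup: (d: string) => Promise<{ score: number; flags: string[] }>
): Promise<LookupOutcome> {
  try {
    const result = await lookup(domain);
    return { ok: true, score: result.score, flags: result.flags };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

A caller would combine the two, for example `decide(await safeLookup(domain, lookup), { threshold: 75, failOpen: false })`, and log the returned `reason` with each decision.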
When Should You Block vs. Proceed?
Here is a practical decision framework:
- Always block - Score 0-24 (VERY_LOW), any spam farm flags, schema phantoms for mission-critical services.
- Block by default, allow with override - Score 25-49 (LOW). These services have significant issues but might work for low-value, low-stakes operations.
- Proceed with limits - Score 50-74 (MEDIUM). Cap transaction amounts. Consider escrow. Monitor results.
- Proceed freely - Score 75-100 (HIGH). These services have proven reliability through continuous monitoring.
The key insight is that trust scoring should not be a one-time check at integration time. It should run before every payment because service quality can change. A service that scored HIGH last week might score LOW today if its uptime dropped or its fidelity degraded. ScoutScore's continuous monitoring catches these changes.
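The decision framework above can be written as a small pure function. This is a sketch under two assumptions: "spam farm flags" is read as the three spam flags hard-blocked elsewhere in this guide, and the caller knows whether the use is mission-critical. `PaymentAction` and `actionFor` are hypothetical names.

```typescript
type PaymentAction =
  | 'ALWAYS_BLOCK'
  | 'BLOCK_UNLESS_OVERRIDDEN'
  | 'PROCEED_WITH_LIMITS'
  | 'PROCEED_FREELY';

// Assumption: these three flags are what "spam farm flags" refers to.
const SPAM_FLAGS = ['WALLET_SPAM_FARM', 'TEMPLATE_SPAM', 'MASS_LISTING_SPAM'];

// Map a score (0-100) and flag list onto the four bands of the framework.
function actionFor(
  score: number,
  flags: string[],
  missionCritical: boolean = false
): PaymentAction {
  // Flags override the numeric score.
  if (flags.some((f) => SPAM_FLAGS.includes(f))) {
    return 'ALWAYS_BLOCK';
  }
  if (missionCritical && flags.includes('SCHEMA_PHANTOM')) {
    return 'ALWAYS_BLOCK';
  }
  if (score < 25) return 'ALWAYS_BLOCK'; // VERY_LOW
  if (score < 50) return 'BLOCK_UNLESS_OVERRIDDEN'; // LOW
  if (score < 75) return 'PROCEED_WITH_LIMITS'; // MEDIUM
  return 'PROCEED_FREELY'; // HIGH
}
```

Because the function is cheap and has no state, it can be re-run on a fresh score before every payment, in line with the point about not treating trust as a one-time check.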
Frequently Asked Questions
Do I need an API key to use ScoutScore?
No. During the launch period, all endpoints are free and unlimited. Just install the SDK with npm install @scoutscore/sdk and start querying. No authentication required.
How fast are trust score lookups?
Trust scores are pre-computed from continuous monitoring data. API response times are typically under 200ms. Batch scoring for 20 domains completes in a single request.
What happens if a service is not in ScoutScore's database?
If a domain has not been cataloged, the API returns a 404. For unknown services, we recommend treating them as VERY_LOW trust until they appear in the monitoring system.
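A rough sketch of that recommendation follows. How the SDK surfaces a 404 is not described in this guide, so the example assumes, purely for illustration, that the thrown error carries a numeric `status` property; adjust `isNotFound` to match the error shape you actually observe. `TrustResult`, `isNotFound`, and `scoreOrUnknown` are hypothetical names, and `lookup` is an injected stand-in for the real scoring call.

```typescript
// Minimal result shape used by this sketch.
interface TrustResult {
  score: number;
  level: string;
  flags: string[];
}

// Synthetic result for services that have not been cataloged.
const UNKNOWN_SERVICE: TrustResult = {
  score: 0,
  level: 'VERY_LOW',
  flags: [],
};

// Assumption: a "not found" error exposes `status === 404`.
function isNotFound(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { status?: unknown }).status === 404
  );
}

// Treat uncataloged domains as VERY_LOW; rethrow every other failure
// so outages are not mistaken for unknown services.
async function scoreOrUnknown(
  domain: string,
  lookup: (d: string) => Promise<TrustResult>
): Promise<TrustResult> {
  try {
    return await lookup(domain);
  } catch (err) {
    if (isNotFound(err)) {
      return { ...UNKNOWN_SERVICE, flags: [] };
    }
    throw err;
  }
}
```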
Can I check fidelity separately from the overall score?
Yes. The SDK provides a dedicated getFidelity(domain) method that returns detailed fidelity probe results, including protocol compliance, contract consistency, and response structure scores.
Does ScoutScore work with protocols other than x402?
ScoutScore is focused exclusively on the x402 ecosystem, providing deep trust intelligence for every listed service. As the x402 protocol grows, our coverage expands with it.