Scout API Interface (Advanced)

The Sentinel Scout API Interface exposes scraping, proxy, and account tools that let developers and AI agents run autonomous web scraping jobs and power ML pipelines.

Users can also refer to: https://api.scout.sentinel.co/swagger

1. API Protocols & Endpoints

REST API

  • Base URL: https://api.scout.sentinel.co

  • Interactive reference: https://api.scout.sentinel.co/swagger

gRPC API

  • Endpoint: grpc.scout.sentinel.co

  • Protocol: HTTP/2 + TLS

  • Serialization: Protobuf

  • Reflection: Enabled (auto-discovers services & methods)


2. Authentication

All API calls require authentication.

2.1 API Keys

  • Generated in the Scout Dashboard → API Section

  • Limit: 3 keys per account (free tier)

  • Lifetime: Unlimited (current design)

  • Usage: pass the key as a bearer token in the Authorization header (Authorization: Bearer <API_KEY>)

2.2 Dashboard Sessions

  • Web3 Auth (public key mapped, no password storage)

  • Kepler Key (preferred, nonce + signature challenge)

  • PAESTO Token: session token for dashboard-only ops (not scraping APIs)
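For the scraping APIs, the API key travels as a bearer token, as shown in the curl examples later in this document. A minimal Python sketch of building those headers (the helper name is illustrative, not part of the API):

```python
def auth_headers(api_key: str) -> dict:
    """Build the headers every authenticated Scout API request carries."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = auth_headers("sk_example_123")
print(headers["Authorization"])  # Bearer sk_example_123
```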


3. API Categories

| Category | Endpoint(s) | Purpose |
| --- | --- | --- |
| Scraping Jobs | POST /api/v1/probe / POST /api/v1/probe/sync | Submit async or sync scraping tasks |
| Task Status | GET /api/v1/probe/task | Check progress & download results |
| Reference Data | /api/v1/generic/countries, /cities, /isps, /probe/tags | Get supported locations, ISPs, and cleanup tags |
| User Data | GET /api/v1/user | Retrieve account info, job stats, wallet balance |
| API Keys | Dashboard → API Section | Generate/revoke keys |
| Storage (WIP) | /api/v1/storage/* | Decentralized dataset storage (coming soon) |


4. Scraping APIs

4.1 Asynchronous Scraping

POST /api/v1/probe

  • Use Case: Batch jobs with retries and fault tolerance

  • Returns: taskId for later polling

  • Retention: Results downloadable for 6 months

Request Example:

curl -X POST 'https://api.scout.sentinel.co/api/v1/probe' \
-H 'Authorization: Bearer <API_KEY>' \
-H 'Content-Type: application/json' \
-d '{
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCode": "US",
  "tagsToStripOff": ["style","script","iframe"],
  "fallBackRouting": true,
  "antiBotScrape": true,
  "outputFileExtension": "EXTENSION_HTML"
}'
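The same request can be assembled programmatically. A hedged sketch using only the Python standard library; the field names mirror the curl example above, and `build_probe_payload` / `submit_probe` are illustrative helpers, not an official client:

```python
import json
import urllib.request

API_BASE = "https://api.scout.sentinel.co"

def build_probe_payload(url, country="US",
                        strip=("style", "script", "iframe"),
                        out="EXTENSION_HTML"):
    """Assemble the async /api/v1/probe request body."""
    return {
        "url": url,
        "countryCode": country,
        "tagsToStripOff": list(strip),
        "fallBackRouting": True,
        "antiBotScrape": True,
        "outputFileExtension": out,
    }

def submit_probe(api_key, payload):
    """POST the job; the response is expected to carry a taskId for polling."""
    req = urllib.request.Request(
        f"{API_BASE}/api/v1/probe",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_probe_payload(
    "https://en.wikipedia.org/wiki/Artificial_intelligence")
```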

4.2 Synchronous Scraping

Endpoint: POST /api/v1/probe/sync

The synchronous scraping API is optimized for real-time ML pipelines where results must be returned instantly. It supports all core features of the asynchronous API but responds immediately with the scraped content.

  • Use Case: Real-time ML and AI agents

  • Response: Returns cleaned HTML or JSON directly

Request Example:

curl -X POST 'https://api.scout.sentinel.co/api/v1/probe/sync' \
  -H 'Authorization: Bearer <API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://en.wikipedia.org/wiki/Machine_learning",
    "country_code": "US",
    "fall_back_routing": true,
    "anti_bot_scrape": true,
    "output_file_extension": "EXTENSION_JSON"
  }'

Notes & Limits (Synchronous Scraping)

  • Timeouts: Sync requests are optimized for small/medium pages. Very large or JS-heavy pages may exceed the sync timeout and should be handled with the async API.

  • Use Case Fit: Sync is intended for low-latency, single-page operations where immediate results are required, such as ML inference pipelines or autonomous agents.

  • Retries: If a sync request fails due to proxy rejection, fallback routing will attempt a retry with a new node, but total retries are capped to avoid long blocking calls.

  • Resource Limits: Sync mode enforces stricter CPU/memory limits on backend nodes than async mode, ensuring stable response times.

  • Best Practice: Use sync mode for quick, real-time tasks. Use async mode for bulk jobs or where high reliability is more important than latency.
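The best practice above can be sketched as a small dispatch routine. The transport functions here are stand-ins (no real HTTP is performed), and the timeout signal is an assumption based on the limits described:

```python
# Sketch of the sync-first, async-fallback pattern described above.
# `scrape_sync` / `scrape_async` are placeholder transports, not real client calls.

class SyncTimeout(Exception):
    """Raised when a sync scrape exceeds its time budget."""

def scrape_with_fallback(url, scrape_sync, scrape_async):
    """Try the low-latency sync API; fall back to an async job on timeout."""
    try:
        return {"mode": "sync", "result": scrape_sync(url)}
    except SyncTimeout:
        task_id = scrape_async(url)   # async submission returns a taskId to poll
        return {"mode": "async", "taskId": task_id}

# Usage with stubs: a page too heavy for sync mode falls back to async.
def stub_sync(url):
    raise SyncTimeout("page too large for sync mode")

def stub_async(url):
    return "eb16f8c6-9cee-40cc-9dc0-2937636cf00c"

result = scrape_with_fallback("https://example.com", stub_sync, stub_async)
print(result)
```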


4.3 Task Status

Endpoint: GET /api/v1/probe/task?taskId=<uuid>

The Task Status API is used to check the progress and outcome of asynchronous jobs created with POST /api/v1/probe.

Query Parameters

  • taskId — string (UUID) — required, returned when the async job was submitted.

Response Fields

  • taskId — UUID of the submitted task.

  • status — One of PENDING, PROCESSING, COMPLETED, or FAILED.

  • progress — Integer percentage (0–100).

  • queuePosition — Current queue order if the task is waiting.

  • downloadLink — Available if COMPLETED, points to the result file (retained for 6 months).

  • contentType — MIME type of the result (e.g. text/html, application/json).

  • sizeKB — Size of the result file.

  • timestampSubmitted — ISO8601 submission time.

  • timestampCompleted — ISO8601 completion time, if applicable.

  • targetUrl — The original URL submitted.

  • countryCodeUsed — ISO country code of the node that processed the job.

Example Request

curl -X GET "https://api.scout.sentinel.co/api/v1/probe/task?taskId=eb16f8c6-9cee-40cc-9dc0-2937636cf00c" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "accept: application/json"

Example Response (COMPLETED)

{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "COMPLETED",
  "progress": 100,
  "downloadLink": "https://downloads.scout.sentinel.co/scraped/eb16f8c6-9cee-40cc-9dc0-2937636cf00c.html",
  "contentType": "text/html",
  "sizeKB": 250,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "timestampCompleted": "2025-06-06T12:25:30Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}

Example Response (PENDING)

{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "PENDING",
  "progress": 5,
  "queuePosition": 12,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}

5. User API

The User API (/api/v1/user) provides a centralized endpoint for retrieving comprehensive information about the authenticated user's account and their activities within the Sentinel Scout ecosystem. This allows users to programmatically monitor their usage and account status.


Endpoint

GET /api/v1/user
Authentication

  • Requires API key authentication.

  • Pass the API key in the Authorization header: Authorization: Bearer <API_KEY>


Response Fields

  • userId — string — unique identifier from the authentication system

  • userStatus — string — account status (active, inactive, suspended)

  • email — string — user’s registered email

  • creditCoins — integer — balance of Credit Coins available for scraping operations

  • goldCoins — integer — balance of Gold Coins earned as a node provider

  • jobsSummary — object — statistics on scraping jobs

    • totalCompleted — integer — number of completed jobs

    • inProgress — integer — number of ongoing jobs

    • failed — integer — number of failed jobs

  • wallet — object — wallet details

    • address — string — wallet address

    • balanceDetails — object — balances for Credit and Gold coins

  • lastLogin — string — ISO8601 datetime of last login

  • apiKeysCreated — integer — number of API keys generated


Example Request

curl -X GET "https://api.scout.sentinel.co/api/v1/user" \
-H "Authorization: Bearer <API_KEY>" \
-H "Accept: application/json"

Example Response

{
  "userId": "user_id_from_auth_system_12345",
  "userStatus": "active",
  "email": "user@example.com",
  "creditCoins": 485,
  "goldCoins": 120,
  "jobsSummary": {
    "totalCompleted": 55,
    "inProgress": 2,
    "failed": 3
  },
  "wallet": {
    "address": "0xabc123def456ghi789jkl012mno345pqr678stu901",
    "balanceDetails": {
      "credit": 485,
      "gold": 120
    }
  },
  "lastLogin": "2025-06-06T12:00:00Z",
  "apiKeysCreated": 2
}
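The programmatic monitoring described above reduces to parsing this response. A small sketch over a trimmed copy of the example payload:

```python
import json

# Trimmed version of the example response returned by GET /api/v1/user.
raw = """{
  "userId": "user_id_from_auth_system_12345",
  "userStatus": "active",
  "creditCoins": 485,
  "goldCoins": 120,
  "jobsSummary": {"totalCompleted": 55, "inProgress": 2, "failed": 3}
}"""

user = json.loads(raw)
jobs = user["jobsSummary"]
total = jobs["totalCompleted"] + jobs["inProgress"] + jobs["failed"]
print(f"{user['userStatus']}: {user['creditCoins']} credits, {total} jobs")
# active: 485 credits, 60 jobs
```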

Notes

  • This endpoint returns account-level balances and job summaries.

  • Rate limits apply; handle 429 Too Many Requests with exponential backoff.

  • Sensitive fields (wallet address) are returned only for authorized users.
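Handling 429 Too Many Requests with exponential backoff, as noted above, can look like the following; the fetcher is a stub standing in for the real HTTP call, and the retry cap is illustrative:

```python
import time

def with_backoff(fetch, retries=5, base_delay=0.5):
    """Retry a callable that signals rate limiting by returning status 429."""
    delay = base_delay
    for attempt in range(retries):
        status, body = fetch()
        if status != 429:
            return body
        time.sleep(delay)
        delay *= 2                      # exponential backoff
    raise RuntimeError(f"rate limited after {retries} attempts")

# Stub: rate-limited twice, then succeeds.
_calls = [(429, None), (429, None), (200, {"creditCoins": 485})]
body = with_backoff(lambda: _calls.pop(0), base_delay=0.01)
print(body)  # {'creditCoins': 485}
```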

6. Coin Systems in Sentinel Scout


Credit Coin (For Data Consumers)

Purpose

Credit Coins are the primary internal currency for consumers of the Sentinel Scout service. The system uses them automatically to pay for web scraping operations and other data-related services.

Acquisition Methods

  • Free Allocation & Auto Top-Up: users currently receive 100 MB worth of free scraping credits per day (limits can be raised for testers).

  • Future Exchange with Sentinel DVPN: Credit Coins will soon be purchasable in exchange for Sentinel P2P tokens, or via fiat payment methods.

Functionality

  • Used by the system to execute scraping jobs.

  • Consumed each time a user submits a job via the API.

  • Visible in the /api/v1/user endpoint under creditCoins.


7. Future Developments

Sentinel Scout is actively evolving. Several upcoming features and integrations are planned to further expand capabilities.


7.1 MCP Server (Model Context Protocol)

A dedicated MCP server will allow ML models to connect directly to the Scout backend.

  • Workflow automation: ML models can submit, track, and retrieve scraping tasks without manual coding.

  • Autonomous pipelines: Data acquisition flows will be fully managed by the AI itself.

  • Use case: Training pipelines that need constant web data without human intervention.


7.2 Jackal Web3 Storage Integration

Scout plans to integrate Jackal decentralized storage for scraped content.

  • User sovereignty: Store scraped datasets directly in your own decentralized storage.

  • Resilience: Data will remain available beyond the 6-month internal retention limit.

  • Security: Web3 storage provides tamper resistance and long-term availability.


7.3 Expanded Proxy Network & Geo-targeting

  • Adding more countries, cities, and ISPs to the proxy pool.

  • Enables fine-grained geo-targeting for scraping localized content.

  • Useful for compliance checks, market research, and region-specific ML datasets.


7.4 Markdown Output for Scraped Content

  • New outputFileExtension: EXTENSION_MARKDOWN option.

  • Allows retrieval of scraped data as clean, human-readable Markdown.

  • Ideal for ingestion into wikis, documentation systems, or lightweight analysis.


7.5 Enhanced Anti-Bot & CAPTCHA Bypass

  • Continuous R&D to counter evolving bot protections.

  • Stronger automated handling of reCAPTCHA, hCaptcha, and advanced fingerprinting.

  • Ensures Scout remains resilient against tightening anti-scraping systems.

