# Scout API Interface (Advanced)

Full interactive API reference: <https://api.scout.sentinel.co/swagger>

### 1. API Protocols & Endpoints

#### REST API

* **Base URL**: `https://api.scout.sentinel.co`
* **Format**: JSON for request/response payloads
* **Docs**: [OpenAPI / Swagger Playground](https://api.scout.sentinel.co/swagger)

#### gRPC API

* **Endpoint**: `grpc.scout.sentinel.co`
* **Protocol**: HTTP/2 + TLS
* **Serialization**: Protobuf
* **Reflection**: Enabled (auto-discovers services & methods)

***

### 2. Authentication

All API calls require authentication.

#### 2.1 API Keys

* Generated in the **Scout Dashboard → API Section**
* Limit: 3 keys per account (free tier)
* Lifetime: Unlimited (current design)
* Usage: include the key in the `Authorization: Bearer <API_KEY>` header on every API request

#### 2.2 Dashboard Sessions

* **Web3 Auth** (public key mapped, no password storage)
* **Kepler Key** (preferred, nonce + signature challenge)
* **PAESTO Token**: session token for dashboard-only ops (not scraping APIs)
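
The header format above can be sketched in Python. This is a minimal illustration, not an official client: the helper names (`auth_headers`, `submit_probe`) and the example payload are assumptions; the URL, header format, and field names come from the curl examples in this guide.

```python
# Minimal sketch: attaching a Scout API key to a REST request.
import json
import urllib.request

API_KEY = "<API_KEY>"  # generated in Scout Dashboard -> API Section
BASE_URL = "https://api.scout.sentinel.co"

def auth_headers(api_key: str) -> dict:
    """Headers used by the authenticated REST examples in this guide."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def submit_probe(payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated async probe request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/probe",
        data=json.dumps(payload).encode(),
        headers=auth_headers(API_KEY),
        method="POST",
    )
```

Sending the request is then a single `urllib.request.urlopen(submit_probe(payload))` call.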

***

### 3. API Categories

| Category       | Endpoint(s)                                                    | Purpose                                          |
| -------------- | -------------------------------------------------------------- | ------------------------------------------------ |
| Scraping Jobs  | `POST /api/v1/probe` / `POST /api/v1/probe/sync`               | Submit async or sync scraping tasks              |
| Task Status    | `GET /api/v1/probe/task`                                       | Check progress & download results                |
| Reference Data | `/api/v1/generic/countries`, `/cities`, `/isps`, `/probe/tags` | Get supported locations, ISPs, and cleanup tags  |
| User Data      | `GET /api/v1/user`                                             | Retrieve account info, job stats, wallet balance |
| API Keys       | Dashboard → API Section                                        | Generate/revoke keys                             |
| Storage (WIP)  | `/api/v1/storage/*`                                            | Decentralized dataset storage (coming soon)      |

***

### 4. Scraping APIs

#### 4.1 Asynchronous Scraping

`POST /api/v1/probe`

* **Use Case**: Batch jobs with retries and fault tolerance
* Returns: `taskId` for later polling
* Retention: Results downloadable for 6 months

**Request Example**:

```bash
curl -X POST 'https://api.scout.sentinel.co/api/v1/probe' \
-H 'Authorization: Bearer <API_KEY>' \
-H 'Content-Type: application/json' \
-d '{
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCode": "US",
  "tagsToStripOff": ["style","script","iframe"],
  "fallBackRouting": true,
  "antiBotScrape": true,
  "outputFileExtension": "EXTENSION_HTML"
}'
```

#### 4.2 Synchronous Scraping

**Endpoint:** `POST /api/v1/probe/sync`

The synchronous scraping API is optimized for real-time ML pipelines where results must be returned instantly. It supports all core features of the asynchronous API but responds immediately with the scraped content.

* **Use Case**: Real-time ML and AI agents
* **Response**: Returns cleaned HTML or JSON directly

**Request Example**:

```bash
curl -X POST 'https://api.scout.sentinel.co/api/v1/probe/sync' \
  -H 'Authorization: Bearer <API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://en.wikipedia.org/wiki/Machine_learning",
    "countryCode": "US",
    "fallBackRouting": true,
    "antiBotScrape": true,
    "outputFileExtension": "EXTENSION_JSON"
  }'
```

#### Notes & Limits (Synchronous Scraping)

* **Timeouts**: Sync requests are optimized for small/medium pages. Very large or JS-heavy pages may exceed the sync timeout and should be handled with the async API.
* **Use Case Fit**: Sync is intended for low-latency, single-page operations where immediate results are required, such as ML inference pipelines or autonomous agents.
* **Retries**: If a sync request fails due to proxy rejection, fallback routing will attempt a retry with a new node, but total retries are capped to avoid long blocking calls.
* **Resource Limits**: Sync mode enforces stricter CPU/memory limits on backend nodes than async mode, ensuring stable response times.
* **Best Practice**: Use sync mode for quick, real-time tasks. Use async mode for bulk jobs or where high reliability is more important than latency.
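
The sync/async split described above can be sketched as a small helper that tries the sync endpoint first and resubmits asynchronously on timeout. This is an illustrative sketch, not an official client: the `probe_payload` and `scrape` names and the 30-second timeout are assumptions; endpoint paths and field names follow the request examples in this section.

```python
# Sketch: low-latency sync scrape with an async fallback on timeout.
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.scout.sentinel.co"

def probe_payload(url: str, country: str = "US") -> bytes:
    """JSON body shared by the sync and async probe endpoints."""
    return json.dumps({
        "url": url,
        "countryCode": country,
        "fallBackRouting": True,
        "antiBotScrape": True,
        "outputFileExtension": "EXTENSION_JSON",
    }).encode()

def scrape(url: str, api_key: str, sync_timeout_s: float = 30.0) -> bytes:
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    body = probe_payload(url)
    try:
        req = urllib.request.Request(f"{BASE_URL}/api/v1/probe/sync",
                                     data=body, headers=headers)
        with urllib.request.urlopen(req, timeout=sync_timeout_s) as resp:
            return resp.read()  # sync mode returns the content directly
    except (urllib.error.URLError, TimeoutError):
        # Large or JS-heavy pages can exceed the sync timeout: resubmit async.
        req = urllib.request.Request(f"{BASE_URL}/api/v1/probe",
                                     data=body, headers=headers)
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # contains a taskId to poll later
```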

***

#### 4.3 Task Status

**Endpoint:** `GET /api/v1/probe/task?taskId=<uuid>`

The Task Status API is used to check the progress and outcome of asynchronous jobs created with `POST /api/v1/probe`.

**Query Parameters**

* `taskId` — string (UUID) — **required**, returned when the async job was submitted.

**Response Fields**

* `taskId` — UUID of the submitted task.
* `status` — One of `PENDING`, `PROCESSING`, `COMPLETED`, or `FAILED`.
* `progress` — Integer percentage (0–100).
* `queuePosition` — Current queue order if the task is waiting.
* `downloadLink` — Available if `COMPLETED`, points to the result file (retained for 6 months).
* `contentType` — MIME type of the result (e.g. `text/html`, `application/json`).
* `sizeKB` — Size of the result file.
* `timestampSubmitted` — ISO8601 submission time.
* `timestampCompleted` — ISO8601 completion time, if applicable.
* `targetUrl` — The original URL submitted.
* `countryCodeUsed` — ISO country code of the node that processed the job.

**Example Request**

```bash
curl -X GET "https://api.scout.sentinel.co/api/v1/probe/task?taskId=eb16f8c6-9cee-40cc-9dc0-2937636cf00c" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "accept: application/json"
```


**Example Response (COMPLETED)**

```json
{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "COMPLETED",
  "progress": 100,
  "downloadLink": "https://downloads.scout.sentinel.co/scraped/eb16f8c6-9cee-40cc-9dc0-2937636cf00c.html",
  "contentType": "text/html",
  "sizeKB": 250,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "timestampCompleted": "2025-06-06T12:25:30Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}
```

**Example Response (PENDING)**

```json
{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "PENDING",
  "progress": 5,
  "queuePosition": 12,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}
```
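
The submit-then-poll workflow can be sketched as a polling loop. This is an illustrative sketch, not an official client: the helper names and the 5-second polling interval are assumptions; the endpoint path and the status values (`PENDING`, `PROCESSING`, `COMPLETED`, `FAILED`) come from the response fields documented above.

```python
# Sketch: poll GET /api/v1/probe/task until an async job finishes.
import json
import time
import urllib.request

BASE_URL = "https://api.scout.sentinel.co"
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def is_terminal(status: str) -> bool:
    """PENDING and PROCESSING mean keep polling; COMPLETED/FAILED stop."""
    return status in TERMINAL_STATUSES

def fetch_json(url: str, api_key: str) -> dict:
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_task(task_id: str, api_key: str, interval_s: float = 5.0) -> dict:
    """Return the task record once it reaches a terminal state."""
    while True:
        task = fetch_json(
            f"{BASE_URL}/api/v1/probe/task?taskId={task_id}", api_key)
        if is_terminal(task["status"]):
            # On COMPLETED, task["downloadLink"] points to the result file.
            return task
        time.sleep(interval_s)
```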

### 5. User API

The User API (`/api/v1/user`) provides a centralized endpoint for retrieving comprehensive information about the authenticated user's account and their activities within the Sentinel Scout ecosystem. This allows users to programmatically monitor their usage and account status.

***

#### Endpoint

`GET /api/v1/user`

***

#### Authentication

* Requires API key authentication.
* Pass the API key in the `Authorization` header: `Authorization: Bearer <API_KEY>`

***

#### Response Fields

* `userId` — string — unique identifier from the authentication system
* `userStatus` — string — account status (`active`, `inactive`, `suspended`)
* `email` — string — user’s registered email
* `creditCoins` — integer — balance of Credit Coins available for scraping operations
* `goldCoins` — integer — balance of Gold Coins earned as a node provider
* `jobsSummary` — object — statistics on scraping jobs
  * `totalCompleted` — integer — number of completed jobs
  * `inProgress` — integer — number of ongoing jobs
  * `failed` — integer — number of failed jobs
* `wallet` — object — wallet details
  * `address` — string — wallet address
  * `balanceDetails` — object — balances for Credit and Gold coins
* `lastLogin` — string — ISO8601 datetime of last login
* `apiKeysCreated` — integer — number of API keys generated

***

#### Example Request

```bash
curl -X GET "https://api.scout.sentinel.co/api/v1/user" \
-H "Authorization: Bearer <API_KEY>" \
-H "Accept: application/json"
```

**Example Response**

```json
{
  "userId": "user_id_from_auth_system_12345",
  "userStatus": "active",
  "email": "user@example.com",
  "creditCoins": 485,
  "goldCoins": 120,
  "jobsSummary": {
    "totalCompleted": 55,
    "inProgress": 2,
    "failed": 3
  },
  "wallet": {
    "address": "0xabc123def456ghi789jkl012mno345pqr678stu901",
    "balanceDetails": {
      "credit": 485,
      "gold": 120
    }
  },
  "lastLogin": "2025-06-06T12:00:00Z",
  "apiKeysCreated": 2
}
```

***

#### Notes

* This endpoint returns account-level balances and job summaries.
* Rate limits apply; handle `429 Too Many Requests` with exponential backoff.
* Sensitive fields (wallet address) are returned only for authorized users.
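
The backoff advice above can be sketched as a small retry helper. This is an illustrative sketch, not an official client: the `backoff_delays` and `get_user` names and the delay parameters (1 s base, factor 2, 5 retries) are assumptions, not documented limits.

```python
# Sketch: exponential backoff for 429 Too Many Requests on /api/v1/user.
import json
import time
import urllib.error
import urllib.request

def backoff_delays(base_s: float = 1.0, factor: float = 2.0, retries: int = 5):
    """Yield 1s, 2s, 4s, ... delays for successive retries."""
    delay = base_s
    for _ in range(retries):
        yield delay
        delay *= factor

def get_user(api_key: str) -> dict:
    req = urllib.request.Request(
        "https://api.scout.sentinel.co/api/v1/user",
        headers={"Authorization": f"Bearer {api_key}",
                 "Accept": "application/json"},
    )
    for delay in backoff_delays():
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only retry on rate limiting
            time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```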

### 6. Coin Systems in Sentinel Scout

***

#### Credit Coin (For Data Consumers)

**Purpose**\
Credit Coins are the primary internal currency designed for the consumers of the Sentinel Scout service. They are automatically used by the system to facilitate and pay for web scraping operations and other data-related services.

**Acquisition Methods**

* **Free Allocation & Auto Top-Up:** Users currently receive 100 MB worth of free scraping credits per day (allocations can be increased for testers).
* **Future Exchange with Sentinel DVPN:** In the near future, Credit Coins will be purchasable in exchange for Sentinel P2P tokens, with fiat payment methods also planned.

**Functionality**

* Used by the system to execute scraping jobs.
* Consumed each time a user submits a job via the API.
* Visible in the `/api/v1/user` endpoint under `creditCoins`.
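
Because each job consumes Credit Coins, a client can pre-check the balance returned by `/api/v1/user` before submitting. This is an illustrative sketch: the `has_credits` helper and the `estimated_cost` threshold are assumptions; per-job pricing is not documented here.

```python
# Sketch: pre-flight Credit Coin check using the /api/v1/user payload.
def has_credits(user: dict, estimated_cost: int = 1) -> bool:
    """True if the account holds at least `estimated_cost` Credit Coins."""
    return user.get("creditCoins", 0) >= estimated_cost

# Example with the response shape shown in Section 5:
user = {"creditCoins": 485, "goldCoins": 120}
if not has_credits(user, estimated_cost=10):
    raise RuntimeError("insufficient Credit Coins; wait for the daily top-up")
```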

***

### 7. Future Developments

Sentinel Scout is actively evolving. Several upcoming features and integrations are planned to further expand capabilities.

***

#### 7.1 MCP Server (Machine Learning Control Plane)

A dedicated MCP server will allow **ML models to connect directly** to the Scout backend.

* **Workflow automation:** ML models can submit, track, and retrieve scraping tasks without manual coding.
* **Autonomous pipelines:** Data acquisition flows will be fully managed by the AI itself.
* **Use case:** Training pipelines that need constant web data without human intervention.

***

#### 7.2 Jackal Web3 Storage Integration

Scout plans to integrate **Jackal decentralized storage** for scraped content.

* **User sovereignty:** Store scraped datasets directly in your own decentralized storage.
* **Resilience:** Data will remain available beyond the 6-month internal retention limit.
* **Security:** Web3 storage provides tamper resistance and long-term availability.

***

#### 7.3 Expanded Proxy Network & Geo-targeting

* Adding more **countries, cities, and ISPs** to the proxy pool.
* Enables fine-grained geo-targeting for scraping localized content.
* Useful for compliance checks, market research, and region-specific ML datasets.

***

#### 7.4 Markdown Output for Scraped Content

* New `outputFileExtension: EXTENSION_MARKDOWN` option.
* Allows retrieval of scraped data as clean, human-readable Markdown.
* Ideal for ingestion into wikis, documentation systems, or lightweight analysis.

***

#### 7.5 Enhanced Anti-Bot & CAPTCHA Bypass

* Continuous R\&D to counter evolving bot protections.
* Stronger automated handling of reCAPTCHA, hCaptcha, and advanced fingerprinting.
* Ensures Scout remains resilient against tightening anti-scraping systems.

***

#### Notes

* Future updates will appear in the OpenAPI spec at:\
  <https://api.scout.sentinel.co/swagger>
* Features may roll out gradually with beta testing phases.

