# Scout API Interface (Advanced)

Full interactive API reference: <https://api.scout.sentinel.co/swagger>

### 1. API Protocols & Endpoints

#### REST API

* **Base URL**: `https://api.scout.sentinel.co`
* **Format**: JSON for request/response payloads
* **Docs**: [OpenAPI / Swagger Playground](https://api.scout.sentinel.co/swagger)

#### gRPC API

* **Endpoint**: `grpc.scout.sentinel.co`
* **Protocol**: HTTP/2 + TLS
* **Serialization**: Protobuf
* **Reflection**: Enabled (auto-discovers services & methods)

***

### 2. Authentication

All API calls require authentication.

#### 2.1 API Keys

* Generated in the **Scout Dashboard → API Section**
* Limit: 3 keys per account (free tier)
* Lifetime: Unlimited (current design)
* Usage: include the key in the `Authorization: Bearer <API_KEY>` header on every API request

#### 2.2 Dashboard Sessions

* **Web3 Auth** (public key mapped, no password storage)
* **Kepler Key** (preferred, nonce + signature challenge)
* **PAESTO Token**: session token for dashboard-only ops (not scraping APIs)
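
The header format above can be sketched in Python. This is a minimal illustration, not an official client: the helper names (`auth_headers`, `submit_probe`) and the example payload are assumptions; the URL, header format, and field names come from the curl examples in this guide.

```python
# Minimal sketch: attaching a Scout API key to a REST request.
import json
import urllib.request

API_KEY = "<API_KEY>"  # generated in Scout Dashboard -> API Section
BASE_URL = "https://api.scout.sentinel.co"

def auth_headers(api_key: str) -> dict:
    """Headers used by the authenticated REST examples in this guide."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def submit_probe(payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated async probe request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/probe",
        data=json.dumps(payload).encode(),
        headers=auth_headers(API_KEY),
        method="POST",
    )
```

Sending the request is then a single `urllib.request.urlopen(submit_probe(payload))` call.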

***

### 3. API Categories

| Category       | Endpoint(s)                                                    | Purpose                                          |
| -------------- | -------------------------------------------------------------- | ------------------------------------------------ |
| Scraping Jobs  | `POST /api/v1/probe` / `POST /api/v1/probe/sync`               | Submit async or sync scraping tasks              |
| Task Status    | `GET /api/v1/probe/task`                                       | Check progress & download results                |
| Reference Data | `/api/v1/generic/countries`, `/cities`, `/isps`, `/probe/tags` | Get supported locations, ISPs, and cleanup tags  |
| User Data      | `GET /api/v1/user`                                             | Retrieve account info, job stats, wallet balance |
| API Keys       | Dashboard → API Section                                        | Generate/revoke keys                             |
| Storage (WIP)  | `/api/v1/storage/*`                                            | Decentralized dataset storage (coming soon)      |

***

### 4. Scraping APIs

#### 4.1 Asynchronous Scraping

`POST /api/v1/probe`

* **Use Case**: Batch jobs with retries and fault tolerance
* Returns: `taskId` for later polling
* Retention: Results downloadable for 6 months

**Request Example**:

```bash
curl -X POST 'https://api.scout.sentinel.co/api/v1/probe' \
-H 'Authorization: Bearer <API_KEY>' \
-H 'Content-Type: application/json' \
-d '{
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCode": "US",
  "tagsToStripOff": ["style","script","iframe"],
  "fallBackRouting": true,
  "antiBotScrape": true,
  "outputFileExtension": "EXTENSION_HTML"
}'
```

#### 4.2 Synchronous Scraping

**Endpoint:** `POST /api/v1/probe/sync`

The synchronous scraping API is optimized for real-time ML pipelines where results must be returned instantly. It supports all core features of the asynchronous API but responds immediately with the scraped content.

* **Use Case**: Real-time ML and AI agents
* **Response**: Returns cleaned HTML or JSON directly

**Request Example**:

```bash
curl -X POST 'https://api.scout.sentinel.co/api/v1/probe/sync' \
  -H 'Authorization: Bearer <API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://en.wikipedia.org/wiki/Machine_learning",
    "countryCode": "US",
    "fallBackRouting": true,
    "antiBotScrape": true,
    "outputFileExtension": "EXTENSION_JSON"
  }'
```

#### Notes & Limits (Synchronous Scraping)

* **Timeouts**: Sync requests are optimized for small/medium pages. Very large or JS-heavy pages may exceed the sync timeout and should be handled with the async API.
* **Use Case Fit**: Sync is intended for low-latency, single-page operations where immediate results are required, such as ML inference pipelines or autonomous agents.
* **Retries**: If a sync request fails due to proxy rejection, fallback routing will attempt a retry with a new node, but total retries are capped to avoid long blocking calls.
* **Resource Limits**: Sync mode enforces stricter CPU/memory limits on backend nodes than async mode, ensuring stable response times.
* **Best Practice**: Use sync mode for quick, real-time tasks. Use async mode for bulk jobs or where high reliability is more important than latency.
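
The sync/async split described above can be sketched as a small helper that tries the sync endpoint first and resubmits asynchronously on timeout. This is an illustrative sketch, not an official client: the `probe_payload` and `scrape` names and the 30-second timeout are assumptions; endpoint paths and field names follow the request examples in this section.

```python
# Sketch: low-latency sync scrape with an async fallback on timeout.
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.scout.sentinel.co"

def probe_payload(url: str, country: str = "US") -> bytes:
    """JSON body shared by the sync and async probe endpoints."""
    return json.dumps({
        "url": url,
        "countryCode": country,
        "fallBackRouting": True,
        "antiBotScrape": True,
        "outputFileExtension": "EXTENSION_JSON",
    }).encode()

def scrape(url: str, api_key: str, sync_timeout_s: float = 30.0) -> bytes:
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    body = probe_payload(url)
    try:
        req = urllib.request.Request(f"{BASE_URL}/api/v1/probe/sync",
                                     data=body, headers=headers)
        with urllib.request.urlopen(req, timeout=sync_timeout_s) as resp:
            return resp.read()  # sync mode returns the content directly
    except (urllib.error.URLError, TimeoutError):
        # Large or JS-heavy pages can exceed the sync timeout: resubmit async.
        req = urllib.request.Request(f"{BASE_URL}/api/v1/probe",
                                     data=body, headers=headers)
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # contains a taskId to poll later
```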

***

#### 4.3 Task Status

**Endpoint:** `GET /api/v1/probe/task?taskId=<uuid>`

The Task Status API is used to check the progress and outcome of asynchronous jobs created with `POST /api/v1/probe`.

**Query Parameters**

* `taskId` — string (UUID) — **required**, returned when the async job was submitted.

**Response Fields**

* `taskId` — UUID of the submitted task.
* `status` — One of `PENDING`, `PROCESSING`, `COMPLETED`, or `FAILED`.
* `progress` — Integer percentage (0–100).
* `queuePosition` — Current queue order if the task is waiting.
* `downloadLink` — Available if `COMPLETED`, points to the result file (retained for 6 months).
* `contentType` — MIME type of the result (e.g. `text/html`, `application/json`).
* `sizeKB` — Size of the result file.
* `timestampSubmitted` — ISO8601 submission time.
* `timestampCompleted` — ISO8601 completion time, if applicable.
* `targetUrl` — The original URL submitted.
* `countryCodeUsed` — ISO country code of the node that processed the job.

**Example Request**

```bash
curl -X GET "https://api.scout.sentinel.co/api/v1/probe/task?taskId=eb16f8c6-9cee-40cc-9dc0-2937636cf00c" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "accept: application/json"
```


**Example Response (COMPLETED)**

```json
{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "COMPLETED",
  "progress": 100,
  "downloadLink": "https://downloads.scout.sentinel.co/scraped/eb16f8c6-9cee-40cc-9dc0-2937636cf00c.html",
  "contentType": "text/html",
  "sizeKB": 250,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "timestampCompleted": "2025-06-06T12:25:30Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}
```

**Example Response (PENDING)**

```json
{
  "taskId": "eb16f8c6-9cee-40cc-9dc0-2937636cf00c",
  "status": "PENDING",
  "progress": 5,
  "queuePosition": 12,
  "timestampSubmitted": "2025-06-06T12:20:00Z",
  "targetUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "countryCodeUsed": "US"
}
```
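
The submit-then-poll workflow can be sketched as a polling loop. This is an illustrative sketch, not an official client: the helper names and the 5-second polling interval are assumptions; the endpoint path and the status values (`PENDING`, `PROCESSING`, `COMPLETED`, `FAILED`) come from the response fields documented above.

```python
# Sketch: poll GET /api/v1/probe/task until an async job finishes.
import json
import time
import urllib.request

BASE_URL = "https://api.scout.sentinel.co"
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def is_terminal(status: str) -> bool:
    """PENDING and PROCESSING mean keep polling; COMPLETED/FAILED stop."""
    return status in TERMINAL_STATUSES

def fetch_json(url: str, api_key: str) -> dict:
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_task(task_id: str, api_key: str, interval_s: float = 5.0) -> dict:
    """Return the task record once it reaches a terminal state."""
    while True:
        task = fetch_json(
            f"{BASE_URL}/api/v1/probe/task?taskId={task_id}", api_key)
        if is_terminal(task["status"]):
            # On COMPLETED, task["downloadLink"] points to the result file.
            return task
        time.sleep(interval_s)
```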

### 5. User API

The User API (`/api/v1/user`) provides a centralized endpoint for retrieving comprehensive information about the authenticated user's account and their activities within the Sentinel Scout ecosystem. This allows users to programmatically monitor their usage and account status.

***

#### Endpoint

`GET /api/v1/user`

***

#### Authentication

* Requires API key authentication.
* Pass the API key in the `Authorization` header: `Authorization: Bearer <API_KEY>`

***

#### Response Fields

* `userId` — string — unique identifier from the authentication system
* `userStatus` — string — account status (`active`, `inactive`, `suspended`)
* `email` — string — user’s registered email
* `creditCoins` — integer — balance of Credit Coins available for scraping operations
* `goldCoins` — integer — balance of Gold Coins earned as a node provider
* `jobsSummary` — object — statistics on scraping jobs
  * `totalCompleted` — integer — number of completed jobs
  * `inProgress` — integer — number of ongoing jobs
  * `failed` — integer — number of failed jobs
* `wallet` — object — wallet details
  * `address` — string — wallet address
  * `balanceDetails` — object — balances for Credit and Gold coins
* `lastLogin` — string — ISO8601 datetime of last login
* `apiKeysCreated` — integer — number of API keys generated

***

#### Example Request

```bash
curl -X GET "https://api.scout.sentinel.co/api/v1/user" \
-H "Authorization: Bearer <API_KEY>" \
-H "Accept: application/json"
```

**Example Response**

```json
{
  "userId": "user_id_from_auth_system_12345",
  "userStatus": "active",
  "email": "user@example.com",
  "creditCoins": 485,
  "goldCoins": 120,
  "jobsSummary": {
    "totalCompleted": 55,
    "inProgress": 2,
    "failed": 3
  },
  "wallet": {
    "address": "0xabc123def456ghi789jkl012mno345pqr678stu901",
    "balanceDetails": {
      "credit": 485,
      "gold": 120
    }
  },
  "lastLogin": "2025-06-06T12:00:00Z",
  "apiKeysCreated": 2
}
```

***

#### Notes

* This endpoint returns account-level balances and job summaries.
* Rate limits apply; handle `429 Too Many Requests` with exponential backoff.
* Sensitive fields (wallet address) are returned only for authorized users.
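
The backoff advice above can be sketched as a small retry helper. This is an illustrative sketch, not an official client: the `backoff_delays` and `get_user` names and the delay parameters (1 s base, factor 2, 5 retries) are assumptions, not documented limits.

```python
# Sketch: exponential backoff for 429 Too Many Requests on /api/v1/user.
import json
import time
import urllib.error
import urllib.request

def backoff_delays(base_s: float = 1.0, factor: float = 2.0, retries: int = 5):
    """Yield 1s, 2s, 4s, ... delays for successive retries."""
    delay = base_s
    for _ in range(retries):
        yield delay
        delay *= factor

def get_user(api_key: str) -> dict:
    req = urllib.request.Request(
        "https://api.scout.sentinel.co/api/v1/user",
        headers={"Authorization": f"Bearer {api_key}",
                 "Accept": "application/json"},
    )
    for delay in backoff_delays():
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only retry on rate limiting
            time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```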

### 6. Coin Systems in Sentinel Scout

***

#### Credit Coin (For Data Consumers)

**Purpose**\
Credit Coins are the primary internal currency designed for the consumers of the Sentinel Scout service. They are automatically used by the system to facilitate and pay for web scraping operations and other data-related services.

**Acquisition Methods**

* **Free Allocation & Auto Top-Up:** Users currently receive 100 MB worth of free scraping credits per day (allocations can be increased for testers).
* **Future Exchange with Sentinel DVPN:** In the near future, Credit Coins will be purchasable in exchange for Sentinel P2P tokens, with fiat payment methods also planned.

**Functionality**

* Used by the system to execute scraping jobs.
* Consumed each time a user submits a job via the API.
* Visible in the `/api/v1/user` endpoint under `creditCoins`.
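
Because each job consumes Credit Coins, a client can pre-check the balance returned by `/api/v1/user` before submitting. This is an illustrative sketch: the `has_credits` helper and the `estimated_cost` threshold are assumptions; per-job pricing is not documented here.

```python
# Sketch: pre-flight Credit Coin check using the /api/v1/user payload.
def has_credits(user: dict, estimated_cost: int = 1) -> bool:
    """True if the account holds at least `estimated_cost` Credit Coins."""
    return user.get("creditCoins", 0) >= estimated_cost

# Example with the response shape shown in Section 5:
user = {"creditCoins": 485, "goldCoins": 120}
if not has_credits(user, estimated_cost=10):
    raise RuntimeError("insufficient Credit Coins; wait for the daily top-up")
```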

***

### 7. Future Developments

Sentinel Scout is actively evolving. Several upcoming features and integrations are planned to further expand capabilities.

***

#### 7.1 MCP Server (Machine Learning Control Plane)

A dedicated MCP server will allow **ML models to connect directly** to the Scout backend.

* **Workflow automation:** ML models can submit, track, and retrieve scraping tasks without manual coding.
* **Autonomous pipelines:** Data acquisition flows will be fully managed by the AI itself.
* **Use case:** Training pipelines that need constant web data without human intervention.

***

#### 7.2 Jackal Web3 Storage Integration

Scout plans to integrate **Jackal decentralized storage** for scraped content.

* **User sovereignty:** Store scraped datasets directly in your own decentralized storage.
* **Resilience:** Data will remain available beyond the 6-month internal retention limit.
* **Security:** Web3 storage provides tamper resistance and long-term availability.

***

#### 7.3 Expanded Proxy Network & Geo-targeting

* Adding more **countries, cities, and ISPs** to the proxy pool.
* Enables fine-grained geo-targeting for scraping localized content.
* Useful for compliance checks, market research, and region-specific ML datasets.

***

#### 7.4 Markdown Output for Scraped Content

* New `outputFileExtension: EXTENSION_MARKDOWN` option.
* Allows retrieval of scraped data as clean, human-readable Markdown.
* Ideal for ingestion into wikis, documentation systems, or lightweight analysis.

***

#### 7.5 Enhanced Anti-Bot & CAPTCHA Bypass

* Continuous R\&D to counter evolving bot protections.
* Stronger automated handling of reCAPTCHA, hCaptcha, and advanced fingerprinting.
* Ensures Scout remains resilient against tightening anti-scraping systems.

***

#### Notes

* Future updates will appear in the OpenAPI spec at:\
  <https://api.scout.sentinel.co/swagger>
* Features may roll out gradually with beta testing phases.

