REST: Scraping Sources

GET /api/getScrapingSourcesCount

Requires auth. Returns source count for the current user.

GET /api/getScrapingSourcesPage/{page}

Requires auth. Returns paged source summaries.
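The count and paging endpoints are typically used together: fetch the count, then request each page. A minimal sketch of deriving the number of pages, assuming a page size of 50 (the actual page size is not documented here):

```python
import math

# Hypothetical helper: given the count from /api/getScrapingSourcesCount,
# compute how many times to call /api/getScrapingSourcesPage/{page}.
# page_size=50 is an assumption, not a documented value.
def total_pages(source_count: int, page_size: int = 50) -> int:
    return math.ceil(source_count / page_size)

print(total_pages(18))   # with page_size=50 -> 1
print(total_pages(120))  # -> 3
```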

POST /api/scrapingSources

Requires auth. Uploads sources from multipart form data.

Accepted form fields:

  • file
  • scrapeSourceTextarea
  • clipboardScrapeSources
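A request body with these fields can be sketched as follows. The field name comes from the list above; the boundary and the source URL are illustrative, and the encoding is a hand-rolled multipart builder rather than any client library this service prescribes:

```python
import uuid

# Sketch of a multipart/form-data body for POST /api/scrapingSources.
# Only text fields are shown; a "file" part would also need a filename
# and Content-Type header on its part.
def build_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    content_type = f"multipart/form-data; boundary={boundary}"
    return "".join(parts).encode(), content_type

body, ctype = build_multipart({
    "scrapeSourceTextarea": "https://proxies.example/list.txt",
})
```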

Success (200):

{"sourceCount": 18}

Rejected sources return 400 with details for blocked and/or unsafe targets:

{
  "error": "One or more scrape sources are not allowed",
  "blocked_sources": ["https://blocked.example/list.txt"],
  "unsafe_sources": ["http://127.0.0.1/internal"],
  "websiteBlacklist": ["blocked.example"]
}
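A client handling this 400 response might surface the rejected targets like so. The payload below is the documented example; the field names match the response shape:

```python
import json

# Parse the documented 400 payload and collect every rejected target.
payload = json.loads("""{
  "error": "One or more scrape sources are not allowed",
  "blocked_sources": ["https://blocked.example/list.txt"],
  "unsafe_sources": ["http://127.0.0.1/internal"],
  "websiteBlacklist": ["blocked.example"]
}""")

rejected = payload["blocked_sources"] + payload["unsafe_sources"]
print(f"{payload['error']}: {len(rejected)} source(s) rejected")
```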

Notes:

  • Oversized uploads return 413.
  • If sources are saved but queueing fails, backend rolls back and returns 503.

DELETE /api/scrapingSources

Requires auth.

Request body is an array of scrape source IDs:

[12, 13, 14]

Response is a JSON-encoded string, for example: "Deleted 3 scraping sources."
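Building the DELETE request can be sketched with the standard library. The base URL is a placeholder, and the auth header is omitted since the scheme is not documented here:

```python
import json
import urllib.request

BASE_URL = "https://app.example.com"  # placeholder, not a documented host

# Sketch: DELETE /api/scrapingSources with a JSON array of IDs as the body.
def build_delete_request(ids: list[int]) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/api/scrapingSources",
        data=json.dumps(ids).encode(),
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )

req = build_delete_request([12, 13, 14])
```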

GET /api/scrapingSources/{id}

Requires auth. Returns detailed source stats.

GET /api/scrapingSources/{id}/proxies

Requires auth. Returns paged proxies associated with a source.

Query params:

  • page
  • pageSize
  • search
  • same filter params as proxy list:
    • status, protocol, country, type, anonymity, reputation, maxTimeout, maxRetries
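Assembling the query string for this endpoint is plain URL encoding. The parameter names come from the lists above; the source ID and values are illustrative:

```python
from urllib.parse import urlencode

# Sketch: paging plus two of the documented filter params for
# GET /api/scrapingSources/{id}/proxies. The id (42) is made up.
params = {
    "page": 1,
    "pageSize": 25,
    "search": "example",
    "protocol": "socks5",
    "country": "DE",
}
url = f"/api/scrapingSources/42/proxies?{urlencode(params)}"
print(url)
```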

GET /api/scrapingSources/check?url=...

Requires auth. Checks robots.txt allowance.

Response:

{
  "allowed": true,
  "robots_found": true,
  "error": ""
}
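A client would typically gate scraping on this response. The payload below is the documented example; how the API sets "allowed" when no robots.txt is found is not specified here:

```python
import json

# Interpret the documented robots-check response: report an error if
# present, otherwise act on the "allowed" flag.
resp = json.loads('{"allowed": true, "robots_found": true, "error": ""}')

if resp["error"]:
    print("check failed:", resp["error"])
elif resp["allowed"]:
    print("scraping permitted")
else:
    print("disallowed by robots.txt")
```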

GET /api/scrapingSources/respectRobots

Requires auth.

Response:

{
  "respect_robots_txt": true
}