REST: Scraping Sources
GET /api/getScrapingSourcesCount
Requires auth. Returns source count for the current user.
GET /api/getScrapingSourcesPage/{page}
Requires auth. Returns paged source summaries.
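The count and page endpoints pair naturally into a pagination loop. A minimal sketch, assuming a 0-based first page and a list-shaped page response (neither is stated above); `fetch_page` stands in for however you issue the authenticated GET:

```python
from typing import Callable, Dict, Iterator, List

def iter_sources(fetch_page: Callable[[int], List[Dict]]) -> Iterator[Dict]:
    """Yield source summaries page by page until an empty page is returned.

    `fetch_page` wraps GET /api/getScrapingSourcesPage/{page}. The 0-based
    first page and the list-shaped response body are assumptions here.
    """
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            return
        yield from batch
        page += 1
```

Driving the loop with a stub instead of a live fetcher keeps the pagination logic testable offline.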
POST /api/scrapingSources
Requires auth. Uploads sources from multipart form data.
Accepted form fields:
- file
- scrapeSourceTextarea
- clipboardScrapeSources
Success (200):
{"sourceCount": 18}
Rejected sources return 400 with details for blocked and/or unsafe targets:
{
"error": "One or more scrape sources are not allowed",
"blocked_sources": ["https://blocked.example/list.txt"],
"unsafe_sources": ["http://127.0.0.1/internal"],
"websiteBlacklist": ["blocked.example"]
}
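A client will usually want to split a 400 response into its rejection categories before reporting back to the user. A small sketch using only the field names shown in the payload above:

```python
import json

def summarize_rejection(body: str) -> dict:
    """Group rejected scrape sources from a 400 response body.

    Field names match the error payload documented for
    POST /api/scrapingSources; absent fields default to empty lists.
    """
    payload = json.loads(body)
    return {
        "blocked": payload.get("blocked_sources", []),
        "unsafe": payload.get("unsafe_sources", []),
        "blacklist": payload.get("websiteBlacklist", []),
    }
```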
Notes:
- Oversized uploads return 413.
- If sources are saved but queueing fails, the backend rolls back and returns 503.
DELETE /api/scrapingSources
Requires auth.
Request body is an array of scrape source IDs:
[12, 13, 14]
Response is a JSON-encoded string, for example: "Deleted 3 scraping sources."
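A DELETE with a JSON array body is easy to get wrong in stdlib HTTP clients, so a request-building sketch may help. The bearer-token header is an assumption; substitute whatever auth scheme your deployment uses:

```python
import json
import urllib.request

def build_delete_request(base_url: str, ids: list, token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE /api/scrapingSources request.

    `ids` is the array of scrape source IDs sent as the JSON body.
    The Authorization header shape is an assumption, not part of the spec above.
    """
    return urllib.request.Request(
        f"{base_url}/api/scrapingSources",
        data=json.dumps(ids).encode("utf-8"),
        method="DELETE",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it is then `urllib.request.urlopen(req)`; keeping construction separate makes the body and method checkable without a live server.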
GET /api/scrapingSources/{id}
Requires auth. Returns detailed source stats.
GET /api/scrapingSources/{id}/proxies
Requires auth. Returns paged proxies associated with a source.
Query params:
- page
- pageSize
- search
- the same filter params as the proxy list: status, protocol, country, type, anonymity, reputation, maxTimeout, maxRetries
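With that many optional filters, composing the query string programmatically is less error-prone than hand-building URLs. A sketch that drops unset params:

```python
from urllib.parse import urlencode

def proxies_url(base_url: str, source_id: int, **filters) -> str:
    """Compose the GET /api/scrapingSources/{id}/proxies URL.

    Any of page, pageSize, search, status, protocol, country, type,
    anonymity, reputation, maxTimeout, maxRetries may be passed as
    keyword arguments; None values are omitted from the query string.
    """
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url}/api/scrapingSources/{source_id}/proxies"
    return f"{url}?{query}" if query else url
```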
GET /api/scrapingSources/check?url=...
Requires auth. Checks robots.txt allowance.
Response:
{
"allowed": true,
"robots_found": true,
"error": ""
}
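The three response fields combine into a scrape/don't-scrape decision. A sketch of one reasonable policy; treating a missing robots.txt as permissive is a common convention, not something this API mandates, hence the flag:

```python
def may_scrape(check: dict, *, scrape_when_missing: bool = True) -> bool:
    """Decide whether to scrape, given the robots-check response above.

    Any non-empty `error` is treated as a refusal; when no robots.txt
    was found, the (assumed) `scrape_when_missing` policy applies.
    """
    if check.get("error"):
        return False
    if not check.get("robots_found"):
        return scrape_when_missing
    return bool(check.get("allowed"))
```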
GET /api/scrapingSources/respectRobots
Requires auth.
Response:
{
"respect_robots_txt": true
}