Core · Service · Active scaffold
platform-scraper-service
Core service-to-service crawler runtime that accepts crawl requests, renders pages with Playwright, stores crawl state in Redis, and delivers results through callbacks.
- TypeScript
- NestJS 11
- BullMQ
- Redis
- Playwright
- @platform/contracts-scraper
Spec sheet
- Boundary: Core / Scraping
- Runtime: NestJS 11 HTTP service plus BullMQ processor
- Default port: 3800 in local stack, 3700 standalone
- Proxy host: http://scraper.cs.lvh.me:8080
- Queue/state: BullMQ and Redis
- Security: Internal crawl API guarded by x-platform-internal-token
Responsibilities
- Accept internal crawl creation requests with callback configuration.
- Render pages through a browser runtime and extract title, text and links.
- Follow only same-origin URLs under the root path prefix.
- Persist crawl lifecycle state from QUEUED through RUNNING to a terminal status.
- Deliver completion or failure payloads to service-to-service callbacks.
- Protect the platform from unsafe local, private and reserved network targets.
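The same-origin and root-path-prefix rule above can be sketched as a small predicate. This is a minimal illustration, not the service's actual code: the function name `isInCrawlScope` is hypothetical, and it assumes the "root path prefix" is the pathname of the URL the crawl started from, matched on whole path segments.

```typescript
// Hypothetical helper, not taken from platform-scraper-service.
// Returns true when `candidateUrl` has the same origin as `rootUrl`
// and its path sits at or below the root URL's path.
export function isInCrawlScope(rootUrl: string, candidateUrl: string): boolean {
  const root = new URL(rootUrl);

  let candidate: URL;
  try {
    // Assumption: relative links are resolved against the root URL here;
    // a real crawler would resolve them against the page they were found on.
    candidate = new URL(candidateUrl, root);
  } catch {
    return false; // unparsable links are treated as out of scope
  }

  // Origin covers scheme, host and port. Non-HTTP schemes such as mailto:
  // have the origin "null" and therefore never match.
  if (candidate.origin !== root.origin) return false;

  // Compare with trailing slashes so "/docs" does not match "/docs-old".
  const withSlash = (p: string): string => (p.endsWith("/") ? p : `${p}/`);
  return withSlash(candidate.pathname).startsWith(withSlash(root.pathname));
}
```

Matching on whole segments is a design choice in this sketch; a plain string-prefix comparison would also let `/docs-old` through for a crawl rooted at `/docs`.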
Interfaces and contract surface
- GET /health
- POST /internal/scraper/crawls
- GET /internal/scraper/crawls/:crawlId
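As a rough illustration of how a backend consumer might call the crawl creation endpoint, the sketch below builds the request without sending it. The path and the `x-platform-internal-token` header come from this page; the body fields (`rootUrl`, `callback.url`) and the helper name are assumptions, since the real request shape is defined in @platform/contracts-scraper and is not shown here.

```typescript
// Hypothetical request body. The real contract lives in
// @platform/contracts-scraper and may use different field names.
interface CreateCrawlBody {
  rootUrl: string;
  callback: { url: string };
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a POST /internal/scraper/crawls request.
export function buildCreateCrawlRequest(
  baseUrl: string,
  internalToken: string,
  body: CreateCrawlBody,
): PreparedRequest {
  return {
    url: new URL("/internal/scraper/crawls", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // Header name taken from the spec sheet above.
        "x-platform-internal-token": internalToken,
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage sketch (assumes the standalone port 3700 and a token from the environment):
//   const { url, init } = buildCreateCrawlRequest("http://localhost:3700", token, {
//     rootUrl: "https://example.com/docs",
//     callback: { url: "http://my-service.internal/scraper-callback" },
//   });
//   const response = await fetch(url, init);
```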
Consumers
- platform-mcp-service
- Backend/BFF services
Dependencies and external touchpoints
- platform-local-stack
- @platform/contracts-scraper
- Redis
- Playwright Chromium runtime
Notes
- Frontend applications must not call this service directly.
- The crawler blocks localhost, private networks and reserved ranges unless a local test explicitly opts in.
- The callback receiver owns product-specific extraction and persistence.
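The target-blocking note above could look roughly like the following check. It is a simplified sketch under stated assumptions: the function name and the `allowLocal` opt-in flag are hypothetical, only `localhost` names and IPv4 literals are inspected, and the listed CIDR ranges are a common selection rather than the service's actual list. A production guard would also need to cover IPv6 and validate the addresses a hostname resolves to.

```typescript
// Common loopback, private, link-local and reserved IPv4 ranges (base, prefix bits).
// Assumption: the real service may block a different set.
const BLOCKED_V4_RANGES: Array<[string, number]> = [
  ["0.0.0.0", 8],
  ["10.0.0.0", 8],
  ["100.64.0.0", 10],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
  ["224.0.0.0", 4],
  ["240.0.0.0", 4],
];

// Parses a dotted-quad IPv4 literal into a number, or returns null.
function ipv4ToInt(host: string): number | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Hypothetical guard: true when the URL points at a blocked target.
export function isBlockedTarget(url: string, allowLocal = false): boolean {
  if (allowLocal) return false; // explicit opt-in, e.g. for a local test

  const host = new URL(url).hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const ip = ipv4ToInt(host);
  if (ip === null) return false; // not an IPv4 literal; DNS and IPv6 checks are out of scope here

  return BLOCKED_V4_RANGES.some(([base, bits]) => {
    const start = ipv4ToInt(base) as number;
    const size = 2 ** (32 - bits);
    return ip >= start && ip < start + size;
  });
}
```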
Source references
- platform-scraper-service/README.md
- platform-scraper-service/src/scraper
- platform-scraper-service/package.json