web-scraping.dev
A realistic e-commerce testing platform for web scraping developers
Practice web scraping with 19 realistic scenarios covering pagination, authentication, GraphQL APIs, CSRF protection, and more. Safe, legal, and designed for learning.
Scraping Scenarios 47 Challenges
Real-world web patterns you'll encounter in production scraping projects. Each scenario is tagged with difficulty level and includes working code examples.
Static Paging Beginner
HTML-based server-side item paging where each page is its own URL.
Example Block Page Beginner
Valid 200-status response that redirects to a block notification page.
Robots.txt Compliance Trap Crawler
This URL is explicitly Disallow'd in /robots.txt for all user-agents,
but returns a valid HTTP 200 response. A conforming crawler MUST NOT fetch it.
If your crawler hits this page, check the X-Robots-Trap: violated response header and the
ROBOTS_TXT_VIOLATION body token — both are unambiguous assertion signals for your test suite.
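A minimal sketch of how a test suite might check both sides of this trap: first whether a crawler would have been allowed to fetch the URL, and then whether a fetched response carries either assertion signal. The `SAMPLE_ROBOTS_TXT` body and the helper names `is_allowed` and `trap_was_hit` are assumptions made for illustration; the site's real robots.txt may contain more rules. The `/robots-disallowed` path is taken from the changelog.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the real file on the site may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /robots-disallowed
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def trap_was_hit(headers: Mapping[str, str], body: str) -> bool:
    """Return True if a response shows either trap signal described above."""
    header_hit = any(
        name.lower() == "x-robots-trap" and value.strip().lower() == "violated"
        for name, value in headers.items()
    )
    return header_hit or "ROBOTS_TXT_VIOLATION" in body
```

In a test, `is_allowed` would typically gate the fetch, and `trap_was_hit` would be asserted false on every response the crawler recorded.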
Changelog
v1.5.0
- Add comprehensive crawler test suite (25 new scenarios + central report endpoint) under the "Crawler" badge. Scenarios cover politeness (rate limiting, crawl-delay, meta robots, X-Robots-Tag, canonical), crawler traps (session vortex, infinite calendar, redirect chain, redirect loops, fragment collapse, URL normalization, canonical dedup), link discovery (JS-only, JSON-LD, data-href, HTTP Link header, base tag), content-type edge cases (50MB huge page, slow-drip streaming, wrong Content-Length, external redirect, mixed HTTP/HTTPS), sitemap index with 50 children + dead link + gzipped variant, and auth/state flows (Basic auth wall, cookie consent).
- Add /crawler-test-report central JSON assertion endpoint aggregating all trap hits. Use POST /crawler-test-report/reset to clear between runs. Supports a ?trap=<name> filter. See app/web/CRAWLER_TEST_SUITE.md for per-scenario details.
- Add Crawl-delay: 2 group for /slow-section/ in robots.txt.
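The report endpoint lends itself to a reset-then-assert loop around each crawler run. The sketch below only builds the two requests and sends nothing. The helper names are hypothetical, the trap name in the usage note is a placeholder, and the shape of the JSON the endpoint returns is not described in this changelog, so no parsing is shown.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://web-scraping.dev"


def report_request(trap: Optional[str] = None) -> Request:
    """Build a GET request for the report, optionally filtered by trap name."""
    url = f"{BASE_URL}/crawler-test-report"
    if trap:
        url += "?" + urlencode({"trap": trap})
    return Request(url, method="GET")


def reset_request() -> Request:
    """Build the POST request that clears recorded trap hits between runs."""
    return Request(f"{BASE_URL}/crawler-test-report/reset", data=b"", method="POST")
```

Passing either object to `urllib.request.urlopen` would perform the call. For example, `report_request("some-trap")` targets `/crawler-test-report?trap=some-trap`, where `some-trap` stands in for a real trap name from CRAWLER_TEST_SUITE.md.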
v1.4.0
- Add /robots-disallowed robots.txt compliance trap for crawler test suites: the URL is linked from homepage scenarios but Disallow'd in /robots.txt. Asserts via the X-Robots-Trap header and the ROBOTS_TXT_VIOLATION body token. Also rewrote robots.txt to valid RFC 9309 syntax and added a Sitemap: directive.
- Add /challenge-download page for testing challenge bypass + file download scenarios (like Cloudflare Turnstile leading to attachment download with 403 status)
- Add /challenge-download/interactive variant for interactive challenge (requires click)
- Add /challenge-download/file direct download endpoint with configurable status code and file type
v1.3.1
- Add query param relative_url=true to render the page with relative URLs instead of absolute ones
- Add vertical/horizontal table on /product/n pages
- Add breadcrumb navigation on /product/n pages where urls are always relative
v1.3.0
- Change /login page to not be prefilled and not show the cookie pop-up by default, though the behavior is still available through the URL flags cookies and prefill
- Add testimonial summary widget to /testimonials
- Add similar products widget to /product/n pages
- Add /sitemap.xml and /robots.txt endpoints
- Add PDF download link and js powered button to the /login page
- Add /blocked page which emulates a redirect to a 200-status block page. This endpoint also supports the ?persist URL parameter flag for persisting blocking through a blocked=true cookie.
- Add /credentials page (linked on /login) which redirects to /blocked if the Referer header is not set to https://web-scraping.dev/login
- Add GraphQL endpoint at /api/graphql
- Add product review objects and Relay-style paging to GraphQL
- Add /reviews page which uses GraphQL Relay-style paging
- Add data-testid markup to /reviews to simulate common automated-test markup that is well suited to scraping
- Add target=_blank links and window.open(url, "_blank") URLs to /reviews that simulate the common pattern of forcing links to open in a new page
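As an illustration of why data-testid attributes are convenient for parsing, here is a standard-library sketch that collects the text of every element carrying one. The sample HTML and the test-id values `review-text` and `review-stars` are invented for the example and may not match the real /reviews markup. The collector is deliberately simple: it does not handle a test-id element nested inside another, or a same-named tag nested inside the captured element.

```python
from html.parser import HTMLParser

# Hypothetical markup; the actual /reviews page may use other ids and tags.
SAMPLE_HTML = """
<div class="review">
  <p data-testid="review-text">Great product</p>
  <span data-testid="review-stars">5</span>
</div>
"""


class TestIdTextCollector(HTMLParser):
    """Collect (testid, text) pairs for elements with a data-testid attribute."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._current = None  # (tag, testid) of the element being captured
        self._parts = []

    def handle_starttag(self, tag, attrs):
        testid = dict(attrs).get("data-testid")
        if testid is not None and self._current is None:
            self._current = (tag, testid)
            self._parts = []

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self.results.append((self._current[1], "".join(self._parts).strip()))
            self._current = None


def extract_testids(html: str):
    """Parse html and return the collected (testid, text) pairs in order."""
    collector = TestIdTextCollector()
    collector.feed(html)
    collector.close()
    return collector.results
```

For real pages, a CSS-selector library with a `[data-testid=...]` selector would likely be more robust; the standard-library version is used here only to keep the sketch dependency-free.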
v1.2.0
- Change header requirement for /api/reviews to require only the x-csrf-token header (secret-csrf-token-123)
- Change header requirement for /api/testimonials to require only the referer header (https://web-scraping.dev/testimonials)
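The two header requirements above can be captured in a small lookup table. This is a sketch that only constructs request objects. The header names and values come from the changelog entries, while the `REQUIRED_HEADERS` table and `build_request` helper are hypothetical, and any query parameters the endpoints may expect are not listed here and are therefore omitted.

```python
from urllib.request import Request

BASE_URL = "https://web-scraping.dev"

# Header values as listed in the v1.2.0 changelog entries.
REQUIRED_HEADERS = {
    "/api/reviews": {"x-csrf-token": "secret-csrf-token-123"},
    "/api/testimonials": {"referer": "https://web-scraping.dev/testimonials"},
}


def build_request(path: str) -> Request:
    """Build a GET request for path with the headers that endpoint requires."""
    return Request(BASE_URL + path, headers=REQUIRED_HEADERS.get(path, {}))
```

Note that `urllib.request.Request` normalizes header capitalization internally (for example `X-csrf-token`); HTTP header names are case-insensitive, so this should not affect the server-side check.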
v1.1.0
- Add cookie popup modal to /login
- Add cart system: see cart preview button at the top and the /cart endpoint; enable add to cart button on products. Carts are purely JS and are used to demo Local Storage
- Add header requirements to /api/reviews for Referer and X-Csrf-Token to demo header locking
- Add multi-product request API via POST to /api/products with multiple id values, e.g. {"id": [1,2,3,4]}
- Improve styling, especially for mobile
- Improve openapi docs with examples, default values and more info (/docs)
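The multi-product POST listed under v1.1.0 could be prepared roughly as follows. The payload shape comes from the changelog example; sending it as JSON with a `Content-Type: application/json` header is an assumption, and the `products_request` helper is hypothetical. The /docs page is the place to confirm the exact contract.

```python
import json
from typing import Iterable
from urllib.request import Request

PRODUCTS_URL = "https://web-scraping.dev/api/products"


def products_request(ids: Iterable[int]) -> Request:
    """Build a POST request asking for several products in one call."""
    payload = json.dumps({"id": list(ids)}).encode("utf-8")
    return Request(
        PRODUCTS_URL,
        data=payload,
        # Assumed content type; the changelog does not state it.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response format is not described in this changelog, so it is not parsed here.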