web-scraping.dev

web-scraping.dev is a mock website for testing and learning about web scraping. It covers popular web patterns encountered in web scraping; see the Scenarios section for details.

This platform is used in:

Refer to ☝️ for learning web scraping.

Scenarios

web-scraping.dev implements many web patterns encountered in modern web scraping:

Changelog

v1.3.0
  • Change the /login page to neither prefill nor show the cookie popup by default; both behaviors remain available through the URL flags cookies and prefill
  • Add testimonial summary widget to /testimonials
  • Add similar products widget to /product/n pages
  • Add /sitemap.xml and /robots.txt endpoints
  • Add a PDF download link and a JS-powered button to the /login page
  • Add /blocked page which emulates a redirect to a block page that returns a 200 status. This endpoint also supports the ?persist URL parameter flag for persisting the block through a blocked=true cookie.
  • Add /credentials page (linked on /login) which redirects to /blocked if the Referer header is not set to https://web-scraping.dev/login
  • Add GraphQL endpoint at /api/graphql
  • Add product review objects and Relay-style paging to GraphQL
  • Add /reviews page which uses GraphQL Relay-style paging
  • Add data-testid markup to /reviews to simulate a common automated-testing markup that is also ideal for parsing when scraping
  • Add target=_blank links and window.open(url, "_blank") urls to /reviews that simulate a common pattern of forcing links to open in a new tab
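
The /credentials and /blocked entries above can be sketched in Python as follows. This is a minimal, offline illustration rather than the site's reference client: the helper names credentials_request and looks_blocked are hypothetical, and checking the final URL or the blocked cookie is one assumed way to spot a block page that answers with a 200 status.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit
from urllib.request import Request

BASE_URL = "https://web-scraping.dev"


def credentials_request() -> Request:
    """Build (but do not send) a GET for /credentials.

    The Referer value is the one the changelog says is required;
    without it the page redirects to /blocked.
    """
    return Request(
        f"{BASE_URL}/credentials",
        headers={"Referer": f"{BASE_URL}/login"},
    )


def looks_blocked(final_url: str, cookies: Optional[Dict[str, str]] = None) -> bool:
    """Heuristic block check (hypothetical helper).

    The block page returns a 200 status, so the status code alone
    says nothing. Instead, look at where the redirect landed and at
    the blocked=true cookie set when the ?persist flag is used.
    """
    landed_on_block_page = urlsplit(final_url).path.rstrip("/") == "/blocked"
    has_block_cookie = (cookies or {}).get("blocked") == "true"
    return landed_on_block_page or has_block_cookie
```

To send the request, pass it to urllib.request.urlopen and feed the response's final URL to looks_blocked.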
v1.2.0
  • Change header requirement for /api/reviews to require only x-csrf-token header (secret-csrf-token-123)
  • Change header requirement for /api/testimonials to require only referer header (https://web-scraping.dev/testimonials)
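
As a rough sketch of the v1.2.0 header requirements, the snippet below attaches the single required header to each endpoint. The header names and values are copied from the entries above; the REQUIRED_HEADERS table and build_request helper are hypothetical names, and any query parameters the endpoints may expect are not covered here.

```python
from urllib.request import Request

BASE_URL = "https://web-scraping.dev"

# One required header per endpoint, as listed in the v1.2.0 entries.
REQUIRED_HEADERS = {
    "/api/reviews": {"x-csrf-token": "secret-csrf-token-123"},
    "/api/testimonials": {"referer": f"{BASE_URL}/testimonials"},
}


def build_request(path: str) -> Request:
    """Build (but do not send) a GET request with the header that path requires.

    Paths without a known requirement get no extra headers.
    Note: urllib stores header names capitalized, e.g. "X-csrf-token".
    """
    return Request(f"{BASE_URL}{path}", headers=REQUIRED_HEADERS.get(path, {}))
```

Keeping the requirements in one table makes it easy to update if a later version changes which headers are locked.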
v1.1.0
  • Add cookie popup modal to /login
  • Add cart system: see cart preview button at the top and the /cart endpoint; enable add to cart button on products. Carts are purely JS and are used to demo Local Storage
  • Add header requirements to /api/reviews for Referer and X-Csrf-Token to demo header locking
  • Add a multi-product request API: POST to /api/products with multiple id values, e.g. {"id": [1,2,3,4]}
  • Improve styling, especially for mobile
  • Improve OpenAPI docs with examples, default values and more info (/docs)
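
The multi-product request from v1.1.0 might be built like this. The body shape {"id": [...]} comes from the entry above; the build_products_request name is hypothetical, and sending the body as JSON with a Content-Type of application/json is an assumption rather than something the changelog states.

```python
import json
from typing import Iterable
from urllib.request import Request

PRODUCTS_URL = "https://web-scraping.dev/api/products"


def build_products_request(product_ids: Iterable[int]) -> Request:
    """Build (but do not send) a POST asking for several products at once."""
    body = json.dumps({"id": list(product_ids)}).encode("utf-8")
    return Request(
        PRODUCTS_URL,
        data=body,
        # Assumption: the endpoint accepts a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, build_products_request([1, 2, 3, 4]) produces a request whose body matches the {"id": [1,2,3,4]} example; see /docs for the response format.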