web-scraping.dev
web-scraping.dev is a mock website for testing and learning about web scraping. It covers popular web patterns encountered in web scraping; see the Scenarios section for details.
This platform is used in:
Refer to ☝️ for learning web scraping.
Scenarios
web-scraping.dev implements many web patterns encountered in modern web scraping:
Changelog
v1.3.1
- Add query param relative_url=true to render page with relative URLs instead of absolute
- Add vertical/horizontal table on /product/n pages
- Add breadcrumb navigation on /product/n pages where urls are always relative
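When a page is rendered with relative URLs, a scraper has to resolve each link against the URL of the page it came from. A minimal sketch using the standard library's urljoin; the page URL and the hrefs below are hypothetical examples, not values taken from the site:

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve relative hrefs against the URL of the page they were found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical page URL and hrefs, for illustration only
page_url = "https://web-scraping.dev/product/1?relative_url=true"
hrefs = ["/product/2", "../products", "https://web-scraping.dev/cart"]
resolved = resolve_links(page_url, hrefs)
```

Absolute hrefs pass through urljoin unchanged, so the same helper can be applied to pages that mix both kinds of links.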
v1.3.0
- Change /login page to not prefill and not show the cookie popup by default, though the behavior is still available through the url flags cookies and prefill
- Add testimonial summary widget to /testimonials
- Add similar products widget to /product/n pages
- Add /sitemap.xml and /robots.txt endpoints
- Add PDF download link and js powered button to the /login page
- Add /blocked page which emulates redirect to 200 status block page. This endpoint also supports ?persist url parameter flag for persisting blocking through a blocked=true cookie.
- Add /credentials page (linked on /login) which redirects to /blocked if Referer header is not set to https://web-scraping.dev/login
- Add GraphQL endpoint at /api/graphql
- Add product review objects and Relay-style paging to GraphQL
- Add /reviews page which uses GraphQL Relay-style paging
- Add data-testid markup to /reviews to simulate common automated web test markup, which is ideal for parsing when scraping
- Add target=_blank links and window.open(url, "_blank") urls to /reviews that simulate a common pattern of forcing links to open in a new page
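The data-testid markup mentioned above gives a scraper stable hooks that do not depend on styling classes. A rough sketch of collecting text by data-testid with the standard library's HTMLParser; the HTML snippet and the testid values (review-text, review-stars) are hypothetical and may not match the real /reviews markup:

```python
from html.parser import HTMLParser


class TestIdCollector(HTMLParser):
    """Collect text of elements, grouped by their data-testid attribute."""

    def __init__(self):
        super().__init__()
        self._stack = []  # data-testid (or None) of each currently open element
        self.found = {}   # data-testid -> list of text chunks

    def handle_starttag(self, tag, attrs):
        # Note: unclosed void tags such as <br> would unbalance this stack;
        # the sketch assumes well-nested markup.
        self._stack.append(dict(attrs).get("data-testid"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Attribute text to the innermost open element if it has a data-testid
        if text and self._stack and self._stack[-1]:
            self.found.setdefault(self._stack[-1], []).append(text)


# Hypothetical markup, for illustration only
sample_html = (
    '<div data-testid="review">'
    '<p data-testid="review-text">Great product</p>'
    '<span data-testid="review-stars">5</span>'
    "</div>"
)
collector = TestIdCollector()
collector.feed(sample_html)
```

A dedicated HTML library with CSS or XPath selectors would usually be more convenient; the standard-library parser is used here only to keep the sketch dependency-free.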
v1.2.0
- Change header requirement for /api/reviews to require only x-csrf-token header (secret-csrf-token-123)
- Change header requirement for /api/testimonials to require only referer header (https://web-scraping.dev/testimonials)
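The header requirements above can be sketched as request objects built with the standard library. The header names and values are the ones listed in this changelog; the helper name build_request is hypothetical, and any query parameters the endpoints may expect are left out because the changelog does not describe them:

```python
from urllib.request import Request

BASE_URL = "https://web-scraping.dev"

# Header values as listed in the v1.2.0 changelog entries
REQUIRED_HEADERS = {
    "/api/reviews": {"x-csrf-token": "secret-csrf-token-123"},
    "/api/testimonials": {"referer": "https://web-scraping.dev/testimonials"},
}


def build_request(path):
    """Build a GET request carrying the header that the given endpoint requires."""
    headers = REQUIRED_HEADERS.get(path, {})
    return Request(BASE_URL + path, headers=headers)


reviews_req = build_request("/api/reviews")
testimonials_req = build_request("/api/testimonials")
```

Either request could then be passed to urllib.request.urlopen; nothing is sent over the network until that call is made.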
v1.1.0
- Add cookie popup modal to /login
- Add cart system: see cart preview button at the top and the /cart endpoint; enable add to cart button on products. Carts are purely JS and are used to demo Local Storage
- Add header requirements to /api/reviews for Referer and X-Csrf-Token to demo header locking
- Add multiple product request api through post to /api/products with multiple id values, e.g. {"id": [1,2,3,4]}
- Improve styling, especially for mobile
- Improve openapi docs with examples, default values and more info (/docs)
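The multiple product request from the v1.1.0 notes can be sketched as a POST with a JSON body. The payload shape follows the {"id": [1,2,3,4]} example above; the helper name and the Content-Type header are assumptions, and the response format is not described here, so the sketch stops at building the request:

```python
import json
from urllib.request import Request


def build_products_request(product_ids):
    """Build a POST request asking /api/products for several product ids at once."""
    payload = json.dumps({"id": list(product_ids)}).encode("utf-8")
    return Request(
        "https://web-scraping.dev/api/products",
        data=payload,
        # Assumption: the endpoint expects a JSON content type
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_products_request([1, 2, 3, 4])
```

Passing req to urllib.request.urlopen would send it; the /docs page is the place to check the expected response fields.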