The Content-Type header declares the wrong charset.
Your scraper must detect the actual encoding and decode the content correctly.
| Property | Value |
|---|---|
| Declared charset (header) | utf-8 |
| Actual encoding (body) | cp1252 |
| Scenario | Raw CP1252 bytes with smart quotes and special symbols, but the header declares charset=utf-8 |
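A minimal sketch of one detection strategy in Python. It assumes cp1252 is the only fallback worth trying, as in this scenario. The helper name `decode_body` and the order of candidates are illustrative choices, not a prescribed method. The helper first decodes strictly with the declared charset and falls back only when that fails. Bytes such as 0x93 (a left curly quote in cp1252) or 0x80 (the euro sign) are not valid as standalone UTF-8, so the strict attempt raises.

```python
def decode_body(raw: bytes, declared_charset: str = "utf-8") -> str:
    """Decode an HTTP body whose declared charset may be wrong.

    Hypothetical helper: tries the declared charset strictly, then cp1252,
    then falls back to UTF-8 with U+FFFD replacement characters.
    """
    for charset in (declared_charset, "cp1252"):
        try:
            # Strict decoding raises on any byte sequence invalid for this charset.
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers charset names Python does not recognize.
            continue
    # Last resort: never crash, but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Simulate the scenario: CP1252 bytes served with "charset=utf-8".
    body = "“Hello World” – said the ‘developer’ with a …pause.".encode("cp1252")
    print(decode_body(body, declared_charset="utf-8"))
```

Trying strict UTF-8 first is a reasonable default because cp1252 text containing these symbols is rarely valid UTF-8 by accident. The reverse order would be unsafe. cp1252 decodes almost any byte sequence without error, so genuine UTF-8 would be silently turned into mojibake (for example, "é" would become "Ã©"). For less predictable inputs, statistical detectors such as the chardet or charset-normalizer packages, or the `apparent_encoding` attribute of a requests `Response`, are alternatives.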
“Hello World” – said the ‘developer’ with a …pause.
Price: €100 – €200 (euro sign in CP1252)
Trademark™ and Copyright© symbols.
This café serves crème brûlée.
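The following Python illustration is not part of the original page. It shows why trusting the header fails for the sample text above. It prints the byte values of the sample's special characters in both encodings, then decodes the price line the way a scraper that trusts charset=utf-8 would.

```python
# Each special character is a single byte in cp1252 but a multi-byte
# sequence in UTF-8, so cp1252 bytes are not valid UTF-8.
for ch in "“”‘’–…€™©éèû":
    print(f"{ch}  cp1252={ch.encode('cp1252').hex()}  utf-8={ch.encode('utf-8').hex()}")

sample = "Price: €100 – €200"
raw = sample.encode("cp1252")                  # what the server actually sends
naive = raw.decode("utf-8", errors="replace")  # trusting the declared header
correct = raw.decode("cp1252")                 # decoding with the real encoding

print(naive)    # each stray byte (0x80, 0x96) becomes U+FFFD
print(correct)
```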