Bad Encoding Scenario

Challenge: The HTTP Content-Type header declares a wrong charset. Your scraper must detect the actual encoding and decode the content correctly.
Declared charset (header): utf-8
Actual encoding (body): utf-8
Scenario: Content with truncated or invalid UTF-8 sequences: broken multi-byte characters, lone continuation bytes, overlong encodings
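
One way a scraper might approach this scenario is to try a strict decode first and fall back to a lossy decode only when that fails. The sketch below relies solely on Python's built-in codecs; `decode_body` is a hypothetical helper name, not something defined by this page or by any library.

```python
from typing import Tuple


def decode_body(raw: bytes, declared: str = "utf-8") -> Tuple[str, bool]:
    """Return (text, had_errors) for a response body.

    Hypothetical helper: `declared` is the charset taken from the
    Content-Type header.
    """
    try:
        # Strict mode raises UnicodeDecodeError on any malformed sequence.
        return raw.decode(declared, errors="strict"), False
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an unknown charset label in the header.
        # With errors="replace", malformed sequences become U+FFFD
        # instead of aborting the whole decode.
        return raw.decode("utf-8", errors="replace"), True


text, had_errors = decode_body(b"caf\xc3 and valid text")
```

Returning the `had_errors` flag alongside the text lets the caller decide whether to log the page, retry with another encoding, or accept the replacement characters.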

Invalid UTF-8 Sequences

Truncated 2-byte: the "é" in "café" is missing its second byte.

Truncated 3-byte: the price is missing the last byte of the euro sign.

Lone continuation: a stray continuation byte sits between "hello" and "world".

Mixed valid/invalid: "Straße in München", with ß and ü sent as single CP1252 bytes inside the UTF-8 stream.

Overlong slash: the slash between "path" and "file" uses an overlong encoding.

This paragraph is completely valid UTF-8 and should survive intact.

Illegal bytes: data end.
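
To see where a body breaks rather than only replacing the damage, a scraper could collect the byte ranges that a strict UTF-8 decoder rejects. The following is a minimal sketch under that assumption; `find_invalid_spans` is a hypothetical name, and the sample byte strings are illustrative stand-ins for the cases listed above, not the page's actual bytes.

```python
from typing import List, Tuple


def find_invalid_spans(raw: bytes) -> List[Tuple[int, int, str]]:
    """List (start, end, reason) for each span strict UTF-8 rejects."""
    spans = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")  # strict by default
            break  # the remainder is valid
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into the slice,
            # so shift them back to offsets into `raw`.
            start, end = pos + exc.start, pos + exc.end
            spans.append((start, end, exc.reason))
            pos = end  # resume after the rejected bytes
    return spans


# Illustrative inputs (assumed, not taken from the page's real bytes).
samples = {
    "truncated 2-byte": b"caf\xc3",
    "lone continuation": b"hello\x80world",
    "overlong slash": b"path\xc0\xaffile",
}
for label, data in samples.items():
    print(label, find_invalid_spans(data))
```

Re-slicing the input on every error is simple but copies the tail each time; for large bodies an incremental decoder would be a better fit.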

Embedded iframe (different encoding)

This iframe is served as ISO-8859-1 bytes with header charset=utf-8 — a different mismatch than the main page.
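
For a mismatch like the iframe's, one possible approach is to trust the header only if the bytes decode strictly, and otherwise fall back to a single-byte encoding. This is a sketch, not a full detector: `decode_with_fallback` is a hypothetical helper, and choosing `cp1252` as the first fallback is an assumption (it agrees with ISO-8859-1 for printable characters and is what browsers use for the `iso-8859-1` label).

```python
from typing import Tuple


def decode_with_fallback(
    raw: bytes, declared: str = "utf-8", fallback: str = "cp1252"
) -> Tuple[str, str]:
    """Return (text, encoding_used), preferring the declared charset."""
    try:
        return raw.decode(declared), declared
    except (UnicodeDecodeError, LookupError):
        pass  # header was wrong or named an unknown charset
    try:
        return raw.decode(fallback), fallback
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes undefined;
        # latin-1 maps every byte and therefore never fails.
        return raw.decode("latin-1"), "latin-1"


# ISO-8859-1 bytes mislabeled as UTF-8, as in the iframe.
body = "Straße in München".encode("latin-1")
text, used = decode_with_fallback(body)
```

This fallback suits a body that is entirely single-byte encoded. Applied to the main page, where valid UTF-8 is mixed with broken sequences, it would turn the valid multi-byte characters into mojibake, so the two cases need different handling.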

