Bad Encoding Scenario


Challenge: The charset declared in the HTTP Content-Type header cannot be trusted to describe the body. Your scraper must detect the actual encoding and decode the content correctly.
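One way to approach the challenge is to treat the declared charset as a hint and verify it against the body: decode strictly under each candidate encoding and keep the first one that succeeds. The sketch below assumes a fallback order of declared charset, UTF-8, CP1252, then ISO-8859-1; that order and the names `declared_charset` and `decode_body` are illustrative choices, not something this page prescribes. Header parsing uses the standard library's `email.message.Message`.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(raw: bytes, content_type: str) -> tuple[str, str]:
    """Decode raw with the first candidate that succeeds strictly (hypothetical helper).

    Assumed order: declared charset, UTF-8, CP1252, then ISO-8859-1.
    """
    candidates = [declared_charset(content_type), "utf-8", "cp1252"]
    # Skip a missing charset (None); dict.fromkeys drops duplicates but keeps order.
    for enc in dict.fromkeys(c for c in candidates if c):
        try:
            return raw.decode(enc), enc  # strict: any bad byte raises
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    # ISO-8859-1 maps every byte to a code point, so this step never fails.
    return raw.decode("iso-8859-1"), "iso-8859-1"


text, used = decode_body(b"Stra\xdfe", "text/html; charset=utf-8")
```

A whole-document strict check has a limit that matters on this page: the body is mostly valid UTF-8 with a few bad sequences, so UTF-8 is rejected and the valid paragraphs get mis-decoded as CP1252. Keeping those paragraphs intact requires handling errors per sequence rather than per document.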

Declared charset (header)	`utf-8`
Actual encoding (body)	`utf-8`
Scenario	Content with truncated or invalid UTF-8 sequences: broken multi-byte characters, lone continuation bytes, and overlong encodings
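When UTF-8 is the right encoding but some sequences are broken, the remaining decision is how to handle each error. Python's `bytes.decode` takes an `errors` argument; the sketch compares four built-in handlers. The page does not show its raw bytes, so `b"caf\xc3 is"` is an assumed reconstruction of the first example below.

```python
raw = b"caf\xc3 is"  # assumed bytes: lead byte C3 with its continuation byte missing

# "strict" (the default) raises and reports where the bad sequence sits.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.end, exc.reason)  # 3 4 invalid continuation byte

# "replace" substitutes U+FFFD, the character shown as � on this page.
replaced = raw.decode("utf-8", errors="replace")

# "backslashreplace" keeps the offending byte visible as a \xNN escape.
escaped = raw.decode("utf-8", errors="backslashreplace")

# "surrogateescape" carries each bad byte through as a lone surrogate
# (0xC3 becomes U+DCC3), so the original bytes can be restored later.
smuggled = raw.decode("utf-8", errors="surrogateescape")
restored = smuggled.encode("utf-8", errors="surrogateescape")
```

`replace` produces what the page displays but loses the original bytes; `surrogateescape` is the option to reach for when a later repair pass needs to see exactly which bytes were invalid.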

Invalid UTF-8 Sequences

Truncated 2-byte: caf� is missing the second byte.

Truncated 3-byte: price � is missing the last byte of the euro sign.
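The truncated euro sign is baked into this page, but a scraper can also create the same damage itself by decoding a streamed response chunk by chunk, because a chunk boundary can fall inside a multi-byte character. An incremental decoder from the standard `codecs` module buffers the incomplete tail until the next chunk arrives. `decode_chunks` is a hypothetical helper name and the chunk split is made up for illustration.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks as one stream (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    # An incomplete multi-byte tail is buffered until the next chunk arrives.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the buffer; a tail that is still incomplete becomes U+FFFD.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Hypothetical split: the euro sign E2 82 AC straddles two chunks.
chunks = [b"price \xe2\x82", b"\xac"]
streamed = decode_chunks(chunks)
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
```

The streamed result contains the intact euro sign, while decoding each chunk separately inserts replacement characters even though the full byte stream was valid.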

Lone continuation: hello��world.

Mixed valid/invalid: Stra�e in M�nchen (CP1252 bytes in a UTF-8 stream).
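In this line the stray bytes are valid CP1252 (0xDF is ß, 0xFC is ü) inside an otherwise UTF-8 stream. Python lets you register a custom error handler with `codecs.register_error`, so the decoder can keep UTF-8 for every valid sequence and reinterpret only the bad bytes. The policy here (fall back to CP1252 only when a single byte is invalid, emit U+FFFD for longer invalid spans, which are more likely truncated UTF-8) and the handler name `cp1252_fallback` are assumptions for illustration.

```python
import codecs


def cp1252_fallback(exc: UnicodeError):
    """Hypothetical error handler: reread a single stray byte as CP1252."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if len(bad) == 1:
        # "replace" covers bytes undefined in Python's cp1252 (e.g. 0x81).
        return bad.decode("cp1252", errors="replace"), exc.end
    # Multi-byte invalid spans are treated as truncated UTF-8.
    return "\ufffd", exc.end


codecs.register_error("cp1252_fallback", cp1252_fallback)

repaired = b"Stra\xdfe in M\xfcnchen".decode("utf-8", errors="cp1252_fallback")
```

The trade-off is that any genuinely random stray byte also becomes some CP1252 character instead of U+FFFD, so the heuristic favors recovering text over flagging damage.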

Overlong slash: path��file (a `/` encoded with more bytes than necessary).
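An overlong encoding spends more bytes than needed on a code point. The page does not show its raw bytes, but the classic overlong slash is `C0 AF`; assuming that pair, naive bit-combining arithmetic yields `/` (0x2F), which is why overlong forms are a known way to sneak `/` past path filters that run before decoding. Since lead bytes C0 and C1 can only start overlong sequences, strict UTF-8 decoders reject them outright. The function names are hypothetical.

```python
def naive_decode_2byte(lead: int, cont: int) -> int:
    """Combine a 2-byte sequence WITHOUT validation (deliberately unsafe)."""
    # 110xxxxx 10yyyyyy -> xxxxxyyyyyy
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def is_overlong_2byte(lead: int, cont: int) -> bool:
    """Two-byte sequences are only legal for code points U+0080 and above."""
    return naive_decode_2byte(lead, cont) < 0x80


assumed = (0xC0, 0xAF)  # assumed bytes of the page's overlong slash
print(chr(naive_decode_2byte(*assumed)))  # /
# Python rejects C0 and AF one at a time, giving two U+FFFD as on the page.
checked = b"path\xc0\xaffile".decode("utf-8", errors="replace")
```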

This paragraph is completely valid UTF-8 and should survive intact.

Illegal bytes: data�� end.
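To report which kinds of damage a page contains, rather than silently replacing them, a scraper can scan the bytes against UTF-8's structure: each lead byte announces how many continuation bytes (0x80 to 0xBF) must follow. The classifier below is a simplified, hypothetical sketch whose labels mirror the examples above; its docstring lists the rules of a full validator it skips. Note that stray CP1252 bytes surface under different labels depending on their value (0xDF before a letter looks truncated, 0xFC is an illegal byte).

```python
def classify_invalid_utf8(raw: bytes) -> list[tuple[int, str]]:
    """Return (offset, kind) for each invalid UTF-8 sequence in raw.

    Simplified sketch: it does not reject 3- and 4-byte overlongs
    (E0 80-9F, F0 80-8F), surrogates (ED A0-BF) or code points above
    U+10FFFF (F4 90-BF), which a full validator must also catch.
    """
    def is_cont(byte: int) -> bool:
        return 0x80 <= byte <= 0xBF

    problems = []
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        if b < 0x80:  # ASCII
            i += 1
        elif is_cont(b):  # continuation byte with no lead byte
            problems.append((i, "lone continuation"))
            i += 1
        elif b in (0xC0, 0xC1):  # these leads can only start overlong forms
            problems.append((i, "overlong"))
            i += 2 if i + 1 < n and is_cont(raw[i + 1]) else 1
        elif b >= 0xF5:  # never valid anywhere in UTF-8
            problems.append((i, "illegal byte"))
            i += 1
        else:
            need = 1 if b < 0xE0 else 2 if b < 0xF0 else 3
            j = i + 1
            while j < n and j - i <= need and is_cont(raw[j]):
                j += 1
            if j - i - 1 < need:  # sequence cut short
                problems.append((i, "truncated"))
            i = j  # resume after the bytes that were consumed
    return problems


report = classify_invalid_utf8(b"Stra\xdfe in M\xfcnchen")
```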

Embedded iframe (different encoding)

This iframe is served as ISO-8859-1 bytes with a `charset=utf-8` header, a different mismatch from the one on the main page.
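The iframe is a separate HTTP response, so its body has to be decoded on its own rather than inheriting the parent page's result. Once the declared UTF-8 fails a strict decode, the scraper still has to pick a single-byte encoding. A common heuristic, sketched below with hypothetical function names, looks at bytes 0x80 to 0x9F: in ISO-8859-1 they are C1 control characters that rarely occur in real text, whereas in CP1252 they hold printable characters such as € (0x80) and curly quotes (0x93, 0x94). Browsers following the WHATWG Encoding Standard go further and decode the `iso-8859-1` label as windows-1252 in all cases.

```python
def guess_single_byte_encoding(raw: bytes) -> str:
    """Hypothetical heuristic: bytes 0x80-0x9F suggest CP1252 over ISO-8859-1."""
    if any(0x80 <= b <= 0x9F for b in raw):
        return "cp1252"
    return "iso-8859-1"


def decode_frame(raw: bytes) -> str:
    """Decode an iframe body independently of the parent page."""
    try:
        return raw.decode("utf-8")  # trust the declared charset only if it fits
    except UnicodeDecodeError:
        # Some bytes (e.g. 0x81) are undefined in Python's cp1252, hence "replace".
        return raw.decode(guess_single_byte_encoding(raw), errors="replace")


frame_text = decode_frame("München".encode("iso-8859-1"))  # bytes served as ISO-8859-1
```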
