🔤 Gibberish Text - Unicode Obfuscation
Extract clean cipher text from Unicode-obfuscated gibberish content.
🧪 Live Demo
Below you can see the raw gibberish text and the cleaned result side by side:
Raw Text (with invisible characters):
Character count: 5222 characters (includes 5188 invisible Unicode characters)
Cleaned Text (visible characters only):
Character count: 34 characters | Removed: 5188 invisible characters (99.3% of original)
📋 Technical Details
- Method: GET
- Endpoint: /api/gibberish-text
- Response Type: Plain text with Unicode obfuscation
- Challenge Type: Text cleaning, Unicode normalization
- Difficulty: Intermediate
🎯 What You Should Get
After cleaning with the code examples below, you should extract this cipher (same as shown in the cleaned text above):
Неllо, thе сiрhеr is: sсrарflу2017
Note: The extracted cipher is made of visible Cyrillic look-alike (homoglyph) letters mixed with other special Unicode characters; it contains no invisible zero-width characters itself. The challenge is removing the 5188 invisible characters to reveal this clean text.
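A quick way to confirm that those letters are Cyrillic homoglyphs rather than Latin letters is to ask Python for their Unicode names. A small illustrative sketch comparing the cipher's first letter with its Latin look-alike:

```python
import unicodedata

# Cyrillic "Н" (from the cipher) vs. Latin "H" -- visually identical, different code points
for ch in ("Н", "H"):
    print(f"{ch!r}: U+{ord(ch):04X} {unicodedata.name(ch)}")
# -> 'Н': U+041D CYRILLIC CAPITAL LETTER EN
# -> 'H': U+0048 LATIN CAPITAL LETTER H
```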
Sites use this kind of obfuscation to:
- Prevent simple copy-paste or text extraction
- Break automated parsers that don't handle Unicode properly
- Hide content fingerprints from bot detection systems
- Obfuscate email addresses or sensitive data from scrapers
💻 Code Examples
▶ Method 1: Python with Unicode Category Filtering
The most precise approach - filter out invisible Unicode characters by category.
import requests
import unicodedata
# Fetch the gibberish text
response = requests.get("https://web-scraping.dev/api/gibberish-text")
raw_text = response.text
# Define invisible Unicode categories to remove
invisible_categories = {
    'Cf',  # Format characters (includes zero-width spaces)
    'Cc',  # Control characters
    'Cs',  # Surrogate characters
    'Co',  # Private-use characters
}
# Filter out invisible characters
cleaned_text = ''.join(
    char for char in raw_text
    if unicodedata.category(char) not in invisible_categories
)
print(f"Original length: {len(raw_text)}")
print(f"Cleaned length: {len(cleaned_text)}")
print(f"Removed: {len(raw_text) - len(cleaned_text)} invisible characters")
print(f"\nCleaned text:\n{cleaned_text}")
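To see the category filter in isolation, without a network call, here is a self-contained sketch that injects zero-width spaces into a plain string and strips them back out with the same filter:

```python
import unicodedata

invisible_categories = {'Cf', 'Cc', 'Cs', 'Co'}

# Synthetic obfuscation: a zero-width space (U+200B) between every letter
obfuscated = '\u200b'.join("cipher")
cleaned = ''.join(
    ch for ch in obfuscated
    if unicodedata.category(ch) not in invisible_categories
)
print(cleaned)  # -> cipher
```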
▶ Method 2: JavaScript with Unicode Regex
Browser-based approach using regular expressions to remove invisible characters.
// Fetch the gibberish text
const response = await fetch('/api/gibberish-text');
const rawText = await response.text();
// Remove invisible Unicode characters
// This regex matches common invisible characters
const cleanedText = rawText.replace(/[\u200B-\u200D\uFEFF\u2060-\u2069\u202A-\u202E]/g, '');
console.log(`Original length: ${rawText.length}`);
console.log(`Cleaned length: ${cleanedText.length}`);
console.log(`Removed: ${rawText.length - cleanedText.length} invisible characters`);
console.log(`\nCleaned text:\n${cleanedText}`);
▶ Method 3: Python with Regex Pattern
Alternative approach using regex to match and remove invisible characters.
import requests
import re
response = requests.get("https://web-scraping.dev/api/gibberish-text")
raw_text = response.text
# Regex pattern for common invisible Unicode characters
invisible_pattern = r'[\u200B-\u200D\uFEFF\u2060-\u2069\u202A-\u202E]'
# Remove invisible characters
cleaned_text = re.sub(invisible_pattern, '', raw_text)
print(f"Cleaned text: {cleaned_text}")
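One caveat with the hardcoded ranges: they cover the common offenders but miss other Cf characters such as U+00AD (soft hyphen). A small comparison on an illustrative sample string shows where the regex and the category filter differ:

```python
import re
import unicodedata

# Illustrative sample: a soft hyphen (U+00AD) and a zero-width space (U+200B)
sample = "ci\u00adpher\u200btext"

invisible_pattern = r'[\u200B-\u200D\uFEFF\u2060-\u2069\u202A-\u202E]'
by_regex = re.sub(invisible_pattern, '', sample)
by_category = ''.join(
    ch for ch in sample if unicodedata.category(ch) != 'Cf'
)

print(len(by_regex))     # 11 -- the soft hyphen survives the regex
print(len(by_category))  # 10 -- the category filter removes both
```

For this challenge the hardcoded pattern is sufficient, but the category-based filter from Method 1 is the more robust default.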
▶ Method 4: cURL with ScrapFly API
Use ScrapFly's API to fetch the content, then clean it locally. ScrapFly handles proxies, browser rendering, and anti-bot bypassing.
# Fetch with ScrapFly API (replace YOUR_API_KEY with your actual key)
curl "https://api.scrapfly.io/scrape?key=YOUR_API_KEY&url=https://web-scraping.dev/api/gibberish-text" \
| jq -r '.result.content' \
| python3 -c "
import sys
import unicodedata
# Read from stdin
raw_text = sys.stdin.read()
# Filter out invisible Unicode characters
invisible_categories = {'Cf', 'Cc', 'Cs', 'Co'}
cleaned_text = ''.join(
    char for char in raw_text
    if unicodedata.category(char) not in invisible_categories
)
print(cleaned_text)
"
Note: Sign up at scrapfly.io to get your API key. Free tier includes 1,000 API credits/month.
▶ Method 5: ScrapFly Python SDK
Using ScrapFly Python SDK with post-processing for cleaning.
from scrapfly import ScrapflyClient, ScrapeConfig
import unicodedata
# Initialize ScrapFly client
client = ScrapflyClient(key='YOUR_API_KEY')
# Scrape the page
result = client.scrape(ScrapeConfig(
    url='https://web-scraping.dev/api/gibberish-text',
    render_js=False,  # No JavaScript needed for this endpoint
))
raw_text = result.content
# Clean the text
invisible_categories = {'Cf', 'Cc', 'Cs', 'Co'}
cleaned_text = ''.join(
    char for char in raw_text
    if unicodedata.category(char) not in invisible_categories
)
print(f"Original: {len(raw_text)} chars")
print(f"Cleaned: {len(cleaned_text)} chars")
print(f"Removed: {len(raw_text) - len(cleaned_text)} invisible chars")
print(f"\nCleaned text:\n{cleaned_text}")
Install the SDK: pip install scrapfly-sdk
📚 Common Invisible Unicode Characters
| Character | Code Point | Name | Category |
|---|---|---|---|
| \u200B | U+200B | Zero Width Space | Cf (Format) |
| \u200C | U+200C | Zero Width Non-Joiner | Cf (Format) |
| \u200D | U+200D | Zero Width Joiner | Cf (Format) |
| \uFEFF | U+FEFF | Zero Width No-Break Space | Cf (Format) |
| \u2060 | U+2060 | Word Joiner | Cf (Format) |
| \u2061-\u2064 | U+2061-U+2064 | Invisible Math Operators | Cf (Format) |
| \u2066-\u2069 | U+2066-U+2069 | Bidirectional Isolates | Cf (Format) |
| \u202A-\u202E | U+202A-U+202E | Bidirectional Text Control | Cf (Format) |
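The categories listed above can be verified directly with Python's unicodedata module, which reports each character's official name and general category:

```python
import unicodedata

# Spot-check a sample of the invisible characters from the table
for cp in (0x200B, 0x200C, 0x200D, 0xFEFF, 0x2060, 0x2061, 0x202A):
    ch = chr(cp)
    print(f"U+{cp:04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```

Every one of these prints category Cf, which is why the category-based filter in Method 1 catches them all with a single membership test.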