Encoding and Decoding
beginner
encoding
base64
url-encoding
unicode
| Encoding Type | Description | Encoding Example | Used in Offense / Defense | Security Relevance |
|---|---|---|---|---|
| UTF-8 | Variable-length encoding for Unicode (1–4 bytes). Most widely used. | "A" → 0x41, "é" → 0xC3 0xA9 | ✅ Common in XSS, LFI, filter bypass | Attackers mix character representations to evade input filters. Normalize before processing. |
| UTF-16 / UTF-32 | UTF-16 is variable-width (2 or 4 bytes); UTF-32 is fixed-width (4 bytes). Often prefixed with a Byte Order Mark (BOM). | "A" → 0x00 0x41 (UTF-16BE) | ⚠️ Used in obfuscated PowerShell, malware droppers | Can bypass detection if the BOM or wide characters are not handled. Normalize and scan memory for both variants. |
| ASCII | 7-bit encoding for basic Latin alphabet. | "A" → 0x41 | ✅ Used in shellcode, malware loaders | ASCII art payloads and shell launchers. Straightforward for basic filters. |
| ISO-8859-1 (Latin-1) | 8-bit encoding for Western European characters. | "é" → 0xE9 | ⚠️ Exploited in charset confusion attacks | Can cause XSS/XSRF if web server misinterprets encoding. Normalize charset headers. |
| Windows-1252 | Extends Latin-1 by replacing the 0x80–0x9F control range with smart quotes and symbols. | "“" → 0x93 | ⚠️ Used in phishing to replace characters deceptively | Smart quotes / symbol abuse. Leads to visual deception and filter bypass. |
| Base64 | Binary-to-text using 64-character ASCII subset. | "Hi" → SGk= | ✅ Common in malware payloads, obfuscation, phishing emails | Decode PowerShell commands, scripts, and email content to detect payloads. |
| Hexadecimal | Represents each byte in base-16. | "Hi" → 0x48 0x69 | ✅ Shellcode, encoded payloads, registry obfuscation | Underlies %XX encoding in URLs; also seen in malware configs and evasion techniques. Decode before logging. |
| Binary (ASCII) | Text encoded in 8-bit binary form. | "A" → 01000001 | ⚠️ Used in stego, low-level keyloggers | Covert channels or firmware attacks. Rare in phishing, but valuable in forensics. |
| Morse Code | Dots and dashes representing characters. | "S" → ... | 🧪 Seen in advanced phishing (e.g., Morse obfuscated JS) | Used to sneak past traditional filters. Microsoft documented a 2021 phishing campaign that hid JavaScript this way. |
| Braille (Unicode) | Unicode for raised dot system (visually impaired). | "A" → ⠁ (U+2801) | 🧪 Rare; used in Unicode stego and obfuscation | Unicode abuse in detection evasion or visual trickery (e.g., malicious PDF). |
| QR Code / Barcode | Graphical encoding of data. | "Hello" → 📷 QR Code | ✅ Used in ransomware notes, phishing attachments | Attackers embed malicious URLs. Always validate before scanning codes in SOC environments. |
| Caesar Cipher | Rotational substitution cipher (e.g., ROT3). | "ABC" → "DEF" | 🧪 Seen in CTFs, basic malware obfuscation | Easy to decode. Sometimes still used in red team tricks or forum malware. |
| ROT13 | Caesar cipher with fixed 13-letter rotation. | "HELLO" → "URYYB" | 🧪 Used for masking C2 commands or jokes | Occasionally used in old scripts or for visual bypass. Decode automatically during threat hunting. |
| URL Encoding | Encodes special characters as %XX. | " " → %20, "é" → %C3%A9 | ✅ Common in XSS, LFI, SQLi, and payload delivery | Critical vector in web attacks. Normalize URLs before validation/logging. |
| HTML Entities | Reserved HTML characters encoded for safe display. | "<" → `&lt;`, "&" → `&amp;` | ✅ Obfuscates malicious HTML in XSS payloads | Decode during sanitization to avoid injection via entity abuse. |
| Base32 | Binary-to-text using 32-character ASCII set. | "Hi" → JBUQ==== | ✅ Used in 2FA (TOTP), DNS tunneling, malware exfil | Base32 decoding in DNS logs helps detect covert channels. |
| Base58 | Base64-like scheme that omits ambiguous characters (0, O, I, l) and symbols. | 123456789 (integer) → BukQL | ✅ Used in Bitcoin, ransomware payment addresses | Seen in crypto wallet IDs and blockchain-related phishing or ransom notes. |
| Punycode | Encodes Unicode domain names into ASCII. | "münich.de" → xn--mnich-kva.de | ✅ Used in phishing via homograph domain spoofing | ɡoogle.com ≠ google.com. Enable IDN detection in email gateways and browsers. |
| Percent-Encoding | The formal (RFC 3986) name for URL encoding: reserved or non-ASCII bytes become %XX. | "@" → %40 | ✅ Used for bypass in query strings and POST bodies | Exploited in directory traversal and SQLi. Normalize percent sequences before analysis. |
| Quoted-Printable | MIME encoding for 8-bit text in emails. | "é" → =E9 (ISO-8859-1 body) | ✅ Used by Emotet, QakBot, and phishing kits | Email filters must decode this to extract URLs, attachments, or commands. |
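
Several of the examples in the table can be reproduced with the Python standard library, which is a quick way to sanity-check a decoder or a detection rule. The sketch below is illustrative only: `table_examples`, `base58_encode_int`, and `normalize_percent` are hypothetical helper names, Base58 is hand-rolled because the standard library has no Base58 codec, and the five-round cap on nested decoding is an arbitrary assumption.

```python
import base64
import codecs
import html
import quopri
import urllib.parse

# Bitcoin-style Base58 alphabet: no 0, O, I, or l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode_int(n: int) -> str:
    """Encode a non-negative integer using the Base58 alphabet above."""
    if n == 0:
        return B58_ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    return "".join(reversed(digits))


def table_examples() -> dict:
    """Recompute a handful of the table's encoding examples."""
    return {
        "utf8": "é".encode("utf-8").hex(),            # bytes C3 A9
        "utf16be": "A".encode("utf-16-be").hex(),     # bytes 00 41
        "latin1": "é".encode("latin-1").hex(),        # byte E9
        "cp1252": "“".encode("cp1252").hex(),         # byte 93
        "base64": base64.b64encode(b"Hi").decode("ascii"),
        "base32": base64.b32encode(b"Hi").decode("ascii"),
        "binary": format(ord("A"), "08b"),
        "rot13": codecs.encode("HELLO", "rot_13"),
        "url": urllib.parse.quote("é"),
        "html": html.escape("<") + html.escape("&"),
        "base58": base58_encode_int(123456789),
        "punycode": "münich.de".encode("idna").decode("ascii"),
        "quoted_printable": quopri.encodestring("é".encode("latin-1")).decode("ascii"),
    }


def normalize_percent(value: str, max_rounds: int = 5) -> str:
    """Undo nested percent-encoding until the value stops changing.

    max_rounds is an assumed safety cap, not a recommended standard.
    """
    for _ in range(max_rounds):
        decoded = urllib.parse.unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


if __name__ == "__main__":
    for name, encoded in table_examples().items():
        print(f"{name:17} {encoded}")
    # Double-encoded traversal sequence collapses to its real form.
    print(normalize_percent("%252e%252e%252fetc"))
```

Running the double-encoded input `%252e%252e%252fetc` through `normalize_percent` yields `../etc`, which is why the table advises normalizing percent sequences before validation: a filter that decodes only once would see the harmless-looking `%2e%2e%2fetc`.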