
🔢 Hashing

A fingerprint for data

The Fingerprint Analogy

A fingerprint is unique to you:

  • Can identify you among millions
  • Can't recreate your body from a fingerprint
  • Same person typically = Same fingerprint

Hashing creates a digital fingerprint of data. For a given hash function, the same input produces the same output, but you can't feasibly reverse it.


What Is Hashing?

Input of (almost) any size → Fixed-size output (hash/digest)

SHA-256 digests, truncated for display:

"Hello"              → 185f8db32271fe...
"Hello World"        → a591a6d40bf420...
A large file         → 7d865e959b2466...

Fixed size for a chosen algorithm, regardless of input size.
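
As a rough illustration of the fixed-size property, the sketch below hashes a 5-byte input and a roughly 1 MB input with SHA-256 from Python's standard hashlib module. The helper name sha256_hex is just an illustrative choice, not something from this article.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"Hello")
long_digest = sha256_hex(b"x" * 1_000_000)  # about 1 MB of input

print(short_digest)
# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))
```

Note that the function takes bytes, not text: the same string encoded differently (UTF-8 vs UTF-16, for example) produces a different digest.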

Key Properties

1. DETERMINISTIC
  Same input → Same hash (for the same algorithm and encoding)

2. ONE-WAY
  Can't get input back from the hash in any practical way (for strong cryptographic hashes)

3. FIXED SIZE
   Any input → Same output size

4. AVALANCHE EFFECT
   Tiny input change → Completely different hash

One-Way: Can't Reverse It

Forward (easy):
  "password123" → hash → "ef92b778bafe..."

Reverse (not practical):
  "ef92b778bafe..." → ??? → Can't get "password123"

For strong cryptographic hashes, reversing is designed to be computationally infeasible.
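A fixed-size digest also discards information: many different inputs map to the same hash, so the digest alone cannot identify the original.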

Why This Matters

Storing passwords:
  Store hash, not password
  Even if the database leaks, the passwords are not directly exposed!

Attacker has: "ef92b778bafe..."
Attacker wants: "password123"
Attacker method: Guess and check every possible password

Avalanche Effect

MD5 digests (used here only for illustration):

"hello" → 5d41402abc4b2a76b9719d911017c592
"Hello" → 8b1a9953c4611296a827abf8c47804d7
"HELLO" → eb61eead90e3b899c6bcbe27ac581660

One tiny change → Completely different hash!

This prevents patterns from being exploited.
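
One way to see the avalanche effect is to count how many output bits change when a single input character changes. The sketch below does this for SHA-256; differing_bits is a hypothetical helper written for this example. For a well-designed hash, roughly half of the output bits are expected to flip.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"Hello").digest()

changed = differing_bits(d1, d2)
print(f"{changed} of {len(d1) * 8} bits differ")
```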

Common Hash Functions

For General Use (Fast)

Algorithm   Digest Size                 Status
MD5         128 bits                    ⚠️ Legacy (collisions exist)
SHA-1       160 bits                    ⚠️ Legacy (collisions exist)
SHA-256     256 bits                    ✅ Common choice
SHA-3       224-512 bits (by variant)   ✅ Modern family

For Passwords (Intentionally Slow)

Algorithm   Purpose            Status
bcrypt      Password hashing   ✅ Standard
scrypt      Memory-hard        ✅ Good
Argon2      Password hashing   ✅ Recommended
PBKDF2      Key derivation     ✅ Acceptable

Why Password Hashes Are Different

Fast Hashes = Bad for Passwords

SHA-256: Can be extremely fast on GPUs

Attacker tries every password:
  "a" → hash → compare
  "b" → hash → compare
  "aa" → hash → compare
  ...
  "password123" → MATCH!

Fast hash = Fast cracking.
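
The guess-and-check loop above can be sketched in a few lines. This is a toy version with a five-word list; crack_sha256 and the word list are made up for illustration, and real attacks run the same idea on GPUs at far higher rates.

```python
import hashlib
from typing import Iterable, Optional


def crack_sha256(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one whose digest matches."""
    for guess in candidates:
        if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None


# Pretend this digest came from a leaked database of unsalted SHA-256 hashes.
leaked = hashlib.sha256(b"password123").hexdigest()
wordlist = ["a", "b", "aa", "letmein", "password123"]

print(crack_sha256(leaked, wordlist))
```

The attacker never reverses the hash; they only run it forward on guesses, which is why the speed of the hash function matters so much.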

Slow Hashes = Good for Passwords

bcrypt: Often only a few to a few dozen hashes per second per CPU core at common cost settings (intentionally slow)

Same attack:
  Billions fewer guesses possible
  Cracking takes years instead of hours

Slow down the attacker!

Work Factor

bcrypt(password, cost=12)

Approximate rates on a modern CPU (varies by hardware):
  cost=10: ~10-25 hashes/sec
  cost=12: ~2-6 hashes/sec (PHP 8.4 default)
  cost=14: <1 hash/sec

Higher cost = Slower = More resistant to guessing
Common tuning goal: on the order of a fraction of a second per verification.
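
bcrypt is not part of Python's standard library, so the sketch below uses PBKDF2 (hashlib.pbkdf2_hmac) as a stand-in to show how a work factor trades time for resistance. The iteration counts and the fixed salt are illustrative assumptions, not recommendations. One difference worth noting: bcrypt's cost is an exponent, so each +1 doubles the work, whereas PBKDF2's iteration count scales linearly.

```python
import hashlib
import time


def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 derivation."""
    start = time.perf_counter()
    # Fixed salt only so the timing runs are comparable; use a random salt in practice.
    hashlib.pbkdf2_hmac("sha256", b"password123", b"0123456789abcdef", iterations)
    return time.perf_counter() - start


timings = {n: time_pbkdf2(n) for n in (10_000, 100_000, 400_000)}
for n, seconds in timings.items():
    print(f"iterations={n:>7}: {seconds * 1000:.1f} ms per hash")
```

Running this on your own server is a reasonable way to pick a setting that lands near the fraction-of-a-second target mentioned above.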

Salting: Defeating Rainbow Tables

The Problem

Rainbow table: Pre-computed hashes (MD5 shown here)
  "password" → 5f4dcc...
  "123456" → e10adc...
  "qwerty" → d8578e...

Attacker looks up hash → Instant password!
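
The lookup idea can be sketched with a plain dictionary. This is a simplification: real rainbow tables use hash chains to save storage, but the effect on unsalted hashes is the same. The password list and variable names are illustrative.

```python
import hashlib

COMMON_PASSWORDS = ["password", "123456", "qwerty"]

# Precompute once, then reuse against every leaked unsalted MD5 hash.
lookup = {
    hashlib.md5(p.encode("utf-8")).hexdigest(): p
    for p in COMMON_PASSWORDS
}

leaked_hash = "e10adc3949ba59abbe56e057f20f883e"
print(lookup.get(leaked_hash))  # prints 123456
```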

The Solution: Salt

Salt = Random data added before hashing

User 1: hash("password" + "x7g9A2")  → [unique hash 1]
User 2: hash("password" + "k3mP9z")  → [unique hash 2]

Same password → Different hashes!
Rainbow tables useless.

Each user gets unique random salt.
Salt stored with the hash (not secret).
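
A minimal sketch of per-user salting follows. It uses plain SHA-256 only to keep the salting step visible; for real password storage the salt should be fed into a slow hash such as bcrypt or Argon2. The helper salted_sha256 is a hypothetical name.

```python
import hashlib
import secrets


def salted_sha256(password: str, salt: bytes) -> str:
    """Hash password + salt, following the order used in the example above."""
    return hashlib.sha256(password.encode("utf-8") + salt).hexdigest()


# Each user gets an independent 16-byte random salt.
salt_user1 = secrets.token_bytes(16)
salt_user2 = secrets.token_bytes(16)

hash_user1 = salted_sha256("password", salt_user1)
hash_user2 = salted_sha256("password", salt_user2)

# Same password, different salts, different stored hashes.
print(hash_user1 != hash_user2)
```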

Use Cases for Hashing

1. Password Storage

Don't store plaintext passwords.

Registration:
  hash = bcrypt(password, salt)
  store(hash)

Login:
  hash = bcrypt(entered_password, stored_salt)
  compare(hash, stored_hash)
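
The registration and login flow can be sketched with the standard library alone. Because bcrypt requires a third-party package, this example swaps in PBKDF2-HMAC-SHA256, which this article lists as acceptable; the function names and the ITERATIONS value are assumptions for illustration and should be tuned for your hardware.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Registration: generate a random salt and derive a slow hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Login: re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("password123")
print(verify_password("password123", salt, stored))
print(verify_password("wrong-guess", salt, stored))
```

Libraries for bcrypt and Argon2 typically generate the salt themselves and embed it, along with the cost, in the output string, so only one value needs to be stored.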

2. File Integrity

Download a file, verify it wasn't tampered:

Original SHA-256: a7b3c9f2e...
Your file SHA-256: a7b3c9f2e...
Match? File matches the expected hash (integrity). It is authentic only if the published hash came from a trusted source.

No match? File was modified or corrupted!
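
A download check along these lines might look like the sketch below. It reads the file in chunks so large files do not need to fit in memory; sha256_of_file and matches_published_hash are illustrative names, and the expected digest would come from the publisher's site.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Usage would be something like matches_published_hash("download.iso", published_digest), where both the file name and the digest variable are placeholders.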

3. Data Structures

Hash tables use hashes for fast lookup:
  hash(key) → bucket location
  ~O(1) average-case lookup!
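
To make the bucket idea concrete, here is a toy hash table using separate chaining. It relies on Python's built-in hash(), which is a fast non-cryptographic hash; hash tables do not need the one-way property discussed earlier. TinyHashTable is a made-up teaching class, and Python's own dict is what you would use in practice.

```python
class TinyHashTable:
    """A minimal hash table with a fixed number of buckets (no resizing)."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash(key) picks the bucket; collisions share a bucket's list.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = TinyHashTable()
table.put("alice", 1)
table.put("bob", 2)
print(table.get("alice"), table.get("carol"))
```

Lookups stay near O(1) only while buckets remain short, which is why production implementations grow the bucket array as entries are added.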

4. Digital Signatures

Hash the document, sign the hash.
Verify hash of document matches signed hash.

Common Mistakes

1. Using MD5 or SHA-1 for Security

These have known collision vulnerabilities.
Use SHA-256 or SHA-3 for new projects.

2. Using Fast Hashes for Passwords

SHA-256(password) is fast to crack.
Use bcrypt, scrypt, or Argon2 instead.

3. No Salt

Same password → Same hash
Precomputed rainbow tables work well against unsalted hashes.
Use a unique salt per password.

4. Weak Salt

Salt should be:
  - Random
  - Unique per user
  - Long enough (often 16+ bytes)

Not: username, empty, or constant.

FAQ

Q: Hashing vs Encryption?

Hashing: one-way, can't get original back
Encryption: two-way, can decrypt with key

Q: Why not encrypt passwords?

If you can decrypt, so can attackers who get the key. One-way hashing is safer.

Q: Is bcrypt still good?

Yes. bcrypt is still widely used. Argon2 is newer and often recommended for new projects.

Q: How do I verify a hashed password?

Hash the entered password with the same algorithm, salt, and cost settings, then compare the result with the stored hash.


Summary

Hashing creates one-way, fixed-size fingerprints of data - essential for password storage and data integrity.

Key Takeaways:

  • Hash = one-way function, can't reverse
  • Same input → Same output (deterministic)
  • Small change → Completely different hash
  • Use bcrypt/Argon2 for passwords (slow!)
  • Use a unique salt with passwords
  • Use SHA-256/SHA-3 for general hashing

Hashing helps protect passwords even if your database is stolen.
