Well, you could look it up in Wikipedia… But since you want an explanation, I’ll do my best here:
Hash functions provide a mapping between an arbitrary-length input and a (usually) fixed-length, smaller output. The function can be anything from a simple CRC32 to a full-blown cryptographic hash function such as MD5 or SHA-1/SHA-2 (SHA-256, SHA-512). The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions), since every function produces a smaller output than it's capable of accepting as input. If you feed every possible 1 MB file into MD5, you'll get a ton of collisions.
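As a quick illustration of the fixed-length output, here is a small sketch using Python's standard `hashlib`. The helper name `digest_lengths` and the sample inputs are just for illustration. Inputs ranging from empty to 1 MB all produce digests of the same size for a given algorithm.

```python
import hashlib

# Inputs of very different lengths, from empty up to 1 MB.
INPUTS = [b"", b"a", b"hello world", b"x" * 1_000_000]

def digest_lengths(algorithm: str, inputs: list) -> list:
    """Return the digest length in bytes for each input under the named algorithm."""
    return [len(hashlib.new(algorithm, data).digest()) for data in inputs]

print(digest_lengths("md5", INPUTS))     # [16, 16, 16, 16]
print(digest_lengths("sha256", INPUTS))  # [32, 32, 32, 32]
print(digest_lengths("sha512", INPUTS))  # [64, 64, 64, 64]
```

Since the 1 MB input space is vastly larger than 2^128, the many:1 mapping (and therefore collisions) follows directly from these fixed sizes.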
The reason they are hard (or, in practice, impossible) to reverse lies in how they work internally. Most cryptographic hash functions process the input in fixed-size chunks (the chunk size is algorithm dependent) while carrying an internal state from chunk to chunk. For each chunk, the function mixes the data into the current state, then runs that state through many rounds, feeding the result of each round back into the next (MD5 does this 64 times for each 512-bit chunk of data). The result is then combined with the state from before that chunk, and the state left after the last chunk (possibly truncated) becomes the hash.
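Python's `hashlib` hides the round function itself, but it does expose the chunked, stateful processing described above. This sketch shows that MD5 works on 64-byte (512-bit) chunks, and that feeding the data in chunk by chunk through `update()` gives the same result as hashing it all at once, because the object carries its internal state between calls.

```python
import hashlib

BLOCK = hashlib.md5().block_size   # 64 bytes = 512 bits per chunk for MD5
data = b"A" * 200                  # spans several 64-byte chunks

one_shot = hashlib.md5(data).hexdigest()

running = hashlib.md5()            # holds the internal state between chunks
for start in range(0, len(data), BLOCK):
    running.update(data[start:start + BLOCK])  # feed the next chunk; partial chunks are buffered
incremental = running.hexdigest()  # padding is applied and the final state becomes the digest

print(BLOCK)                       # 64
print(one_shot == incremental)     # True
```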
Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash back into its iterated states (one possibility for inputs smaller than a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. To see why this is VERY hard, imagine trying to deduce a and b from the formula 10 = a + b. Restricting a and b to positive integers, there are already 9 pairs that work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. If each round multiplied the number of candidates by roughly ten, 64 rounds would leave on the order of 10^64 possibilities to try. And that's just a simple addition where some state is preserved from round to round. Real hash functions do a lot more than one operation (MD5 does roughly 15 operations on 4 state variables per step). And since each round depends on the state of the previous one, and the previous state is destroyed in creating the current one, it's all but impossible to determine the input state that led to a given output state (for each round, no less). Combine that with the large number of possibilities involved, and decoding even an MD5 hash would take an impractically large (but finite) amount of resources. So many resources, in fact, that it's significantly cheaper to brute-force the hash if you have an idea of the size of the input (for small inputs) than it is to even try to decode it.
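The brute-force route mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the input is a short lowercase ASCII string. The function name `brute_force_md5` and its parameters are hypothetical. It never "decodes" anything; it just hashes every candidate and compares.

```python
import hashlib
import itertools
import string
from typing import Optional

def brute_force_md5(target_hex: str, alphabet: str = string.ascii_lowercase,
                    max_len: int = 4) -> Optional[str]:
    """Try every string over `alphabet` up to `max_len` characters, shortest first."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

target = hashlib.md5(b"abc").hexdigest()
print(brute_force_md5(target, max_len=3))  # abc
```

With 26 letters and lengths 1 to 3 this is only 18,278 candidates, but the work grows as 26^n, which is why the approach only pays off when you know the input is small.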
Encryption functions provide a 1:1 mapping between an arbitrary-length input and output, and they are always reversible. The important thing to note is that they're reversible using some method (the key), and the mapping is always 1:1 for a given key. There may still be multiple input:key pairs that generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output, which always has a consistent format (a fixed length).
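Python's standard library ships no block cipher, so the sketch below uses a toy one-time-pad-style XOR to show the two properties described above: for a fixed key the mapping is 1:1 and reversible, and the ciphertext looks like noise. It is an illustration under that assumption, not the AES-style cipher you would use in practice; real data should go through a vetted cryptography library. The helper name `xor_with_key` is hypothetical.

```python
import secrets

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """One-time-pad style XOR: the same call both encrypts and decrypts."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))    # random, used once, kept secret
ciphertext = xor_with_key(message, key)    # looks like random noise
recovered = xor_with_key(ciphertext, key)  # the key reverses the mapping

print(recovered == message)  # True
```

Because XOR with a fixed key is a bijection, two different messages can never produce the same ciphertext under the same key, which is exactly the 1:1 property.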
Use a hash function when you want to compare a value but can't store its plain representation (for any number of reasons). Passwords fit this use case very well, since you don't want to store them in plain text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 MB per music file. Instead, take the hash of the file and store that (MD5 would store 16 bytes instead of 3 MB). That way, you just hash each file and compare it against the stored database of hashes. (This doesn't work as well in practice because of re-encoding, changed file headers, etc., but it illustrates the use case.)
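A minimal sketch of the file-scanning idea, assuming the "database" is just an in-memory set of hex-encoded MD5 digests (32 hex characters per 16-byte digest). The function names and the temporary demo directory are hypothetical. Files are hashed in chunks, so even large files never need to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_known_files(root: Path, known_hashes: set) -> list:
    """Return files under root whose MD5 appears in the stored set of hashes."""
    return [p for p in sorted(Path(root).rglob("*"))
            if p.is_file() and file_md5(p) in known_hashes]

# Demo with a throwaway directory standing in for a music library.
with tempfile.TemporaryDirectory() as tmp:
    track = Path(tmp) / "track.mp3"
    track.write_bytes(b"pretend audio data" * 1000)
    (Path(tmp) / "notes.txt").write_bytes(b"not a known file")
    known = {file_md5(track)}  # the "database" holds only hashes, not the files
    demo_matches = [p.name for p in find_known_files(Path(tmp), known)]

print(demo_matches)  # ['track.mp3']
```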
Use a hash function when you're checking the validity of input data. That's what they are designed for. If you have two pieces of input and want to check whether they are the same, run both through a hash function and compare the results. The probability of an accidental collision is astronomically low for small inputs (assuming a good hash function). That's why hashing is recommended for passwords. The output spaces are large: MD5 produces 128 bits (2^128 possible values), SHA-1 produces 160 bits and SHA-512 produces 512 bits. You don't really care what the password was; you care whether it's the same as the one that was stored. That's why you should use hashes for passwords.
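The comparison described above is short in Python. This sketch, with the hypothetical helper `same_input`, also reads the output sizes straight from `hashlib`, and uses `hmac.compare_digest` so the comparison time does not depend on where the digests first differ. For stored passwords, the salting and stretching discussed later in this answer still apply.

```python
import hashlib
import hmac

# Output size in bits for each algorithm, read from hashlib itself.
OUTPUT_BITS = {name: hashlib.new(name).digest_size * 8 for name in ("md5", "sha1", "sha512")}

def same_input(a: bytes, b: bytes, algorithm: str = "sha512") -> bool:
    """Compare two inputs by their digests, in constant time."""
    return hmac.compare_digest(hashlib.new(algorithm, a).digest(),
                               hashlib.new(algorithm, b).digest())

print(OUTPUT_BITS)                                     # {'md5': 128, 'sha1': 160, 'sha512': 512}
print(same_input(b"correct horse", b"correct horse"))  # True
print(same_input(b"correct horse", b"correct hose"))   # False
```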
Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but you don't want to store them in plain text. So instead, store the encrypted version and keep the key as safe as possible.
Hash functions are also great for signing data. With HMAC, for example, you sign a piece of data by hashing it together with a secret key that both sides know but never transmit (HMAC combines the key and the data in a specific nested construction rather than simply concatenating them). You send the plain text along with the HMAC. The receiver then computes the HMAC of the received data with the same secret key and checks whether it matches the transmitted one. If it does, you know the data wasn't tampered with by a party without the secret. This is commonly used by HTTP frameworks in secure cookie systems, as well as in message transmission over HTTP where you want some assurance of the data's integrity.
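A minimal sketch of the sign/verify flow using Python's standard `hmac` module with SHA-256. The secret key, the `sign`/`verify` helper names and the cookie value are hypothetical; a real framework would load its key from configuration rather than generating it per process.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical shared secret; never sent over the wire

def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute the HMAC-SHA256 tag that is sent alongside the plain-text message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag on the receiving side and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)

cookie_value = b"user_id=42"
tag = sign(cookie_value)

print(verify(cookie_value, tag))   # True
print(verify(b"user_id=1", tag))   # False: the data was altered
```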
A key feature of cryptographic hash functions is that they should be very fast to compute and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem for passwords. If you store sha512(password), you're doing nothing to guard against rainbow tables or brute-force attacks. Remember, the hash function was designed for speed, so it's trivial for an attacker to just run a dictionary through it and test each result.
Adding a salt helps matters, since it adds unique, per-password data to the hash. Instead of looking up anything that matches md5(foo) in a table computed in advance, the attacker now needs to find something that, combined with that particular salt, produces md5(foo . salt), so precomputed tables like rainbow tables no longer help. But it still doesn't solve the speed problem: the salt is usually stored next to the hash, and once the attacker knows it, it's just a matter of running the dictionary through again.
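The difference between the unsalted and salted cases can be sketched as follows. The word list and the stolen hashes are hypothetical, and the salt is appended to the password as in md5(foo . salt) above (SHA-512 is used here instead of MD5). The precomputed table cracks the unsalted hash instantly and misses the salted one, but a fresh dictionary run with the known salt still succeeds.

```python
import hashlib
import os
from typing import Optional

WORDLIST = ["password", "letmein", "dragon", "monkey", "123456"]  # hypothetical attacker dictionary

def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()

# Unsalted: the attacker builds this table once and reuses it against every stolen hash.
precomputed = {sha512_hex(w.encode()): w for w in WORDLIST}
stolen_unsalted = sha512_hex(b"letmein")

# Salted: a random per-user salt makes the precomputed table useless...
salt = os.urandom(16)
stolen_salted = sha512_hex(b"letmein" + salt)

# ...but the salt is stored next to the hash, so the attacker simply reruns the dictionary.
def dictionary_attack(target_hex: str, salt: bytes, words: list) -> Optional[str]:
    for w in words:
        if sha512_hex(w.encode() + salt) == target_hex:
            return w
    return None

print(precomputed.get(stolen_unsalted))                  # letmein
print(precomputed.get(stolen_salted))                    # None
print(dictionary_attack(stolen_salted, salt, WORDLIST))  # letmein
```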
So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (usually thousands). This does two things. First, it slows down the hashing significantly. Second, if implemented right (passing the input and salt back in on each iteration), it keeps the output space from shrinking, so collisions don't accumulate from round to round. A trivial implementation simply loops, hashing the previous result together with the password and salt on every round.
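A minimal sketch of such a loop, assuming SHA-512 and an arbitrary, illustrative count of 5000 rounds. The function name `stretched_hash` is hypothetical, and this toy is shown only to make the idea concrete; the standardized constructions are preferable in real code.

```python
import hashlib
import os

def stretched_hash(password: bytes, salt: bytes, iterations: int = 5000) -> bytes:
    """Toy key stretching: every round re-feeds the password and salt, not just the previous digest."""
    digest = hashlib.sha512(password + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha512(digest + password + salt).digest()
    return digest

salt = os.urandom(16)                        # store this next to the resulting hash
stored = stretched_hash(b"hunter2", salt)
print(stretched_hash(b"hunter2", salt) == stored)  # True
```

Each guess now costs the attacker 5001 SHA-512 calls instead of one, and because the password and salt go back in on every round, a collision in one round doesn't carry over into the next.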
There are other, more standard implementations, such as PBKDF2 and bcrypt. But this technique is used by quite a few security-related systems (such as PGP, WPA, Apache and OpenSSL).
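PBKDF2 is available directly in Python's standard library as `hashlib.pbkdf2_hmac`. The sketch below shows storing and checking a password with it. The helper names are hypothetical, and the 210,000-iteration count is an illustrative value that you should tune to your hardware and current guidance.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 210_000  # illustrative work factor; raise it as hardware allows

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); store both, along with the iteration count."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return salt, derived

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(derived, expected)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("Tr0ub4dor&3", salt, stored))                   # False
```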
The bottom line: hash(password) is not good enough. hash(password + salt) is better, but still not good enough. Use a stretched hash mechanism to produce your password hashes.
Do not under any circumstances feed the output of one hash directly back into the hash function on its own (computing hash = sha512(hash) round after round).
The reason for this has to do with collisions. Remember that all hash functions have collisions, because the possible output space (the number of possible outputs) is smaller than the input space. To see why chaining makes this worse, let's look at what happens, assuming for demonstration purposes that there's a 0.001% chance of collision from sha1() (it's much lower in reality).
Say the first round's output, hash1, has a 0.001% chance of collision. When we compute the next round, hash2 = sha1(hash1), all collisions of hash1 automatically become collisions of hash2. So hash1's rate of 0.001% carries over, and the second sha1() call adds its own chance on top, giving hash2 a 0.002% chance of collision. That's twice as many chances! Each iteration adds another 0.001% chance of collision to the result, so with 1000 iterations the chance of collision jumps from a trivial 0.001% to 1%. The degradation is linear, and the real probabilities are far smaller, but the effect is the same. (One estimate of the chance of a single collision with MD5 is about 1/2^128, or roughly 1/(3.4×10^38). While that seems small, thanks to the birthday attack it's not really as small as it seems.)
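The arithmetic above can be written out explicitly. This is a worked sketch under the answer's simplified model, assuming each round independently adds a collision chance of p = 0.001% = 10^-5. The second line shows the exact figure under that independence assumption, which is why the linear estimate is a good approximation. The last line is the standard birthday bound: an ideal 128-bit hash reaches a 50% chance of some collision after roughly 1.18 × 2^64 outputs, far fewer than 2^128.

```latex
p = 0.001\% = 10^{-5}, \qquad n = 1000 \\
P_{\text{linear}}(n) \approx n\,p = 1000 \times 10^{-5} = 10^{-2} = 1\% \\
P_{\text{indep}}(n) = 1 - (1 - p)^{n} \approx 1 - e^{-np} = 1 - e^{-0.01} \approx 0.995\% \\
2^{128} \approx 3.4 \times 10^{38}, \qquad \sqrt{2 \ln 2 \cdot 2^{128}} \approx 1.18 \times 2^{64}
```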
Instead, by re-appending the salt and password each time, you're re-introducing data into the hash function, so a collision in one round is no longer a collision in the next. A loop where each round computes sha512(previous hash + password + salt) therefore has the same chance of collision as the native sha512 function, which is what you want. Use that instead.
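The collision argument can be observed directly with a deliberately weak hash. This sketch uses a hypothetical toy, SHA-256 truncated to 2 bytes (only 65,536 possible outputs), so that collisions show up quickly. Here `x` stands in for "password + salt". With naive chaining, the number of distinct final values can only shrink, because each round is a function of the previous hash alone. When the input is re-fed every round, the count stays near the one-round figure. With real SHA-512 the effect is astronomically smaller, but the mechanism is the same.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """Hypothetical toy hash: SHA-256 truncated to 2 bytes so collisions are common."""
    return hashlib.sha256(data).digest()[:2]

def distinct_outputs(inputs, rounds: int, refeed: bool) -> int:
    """Hash each input, iterate `rounds` more times, and count the distinct final values."""
    finals = set()
    for x in inputs:
        h = tiny_hash(x)
        for _ in range(rounds):
            # refeed=False: h = H(h), the pattern warned against above
            # refeed=True:  h = H(h + x), re-introducing the original input every round
            h = tiny_hash(h + x) if refeed else tiny_hash(h)
        finals.add(h)
    return len(finals)

INPUTS = [str(i).encode() for i in range(10_000)]
first_round = distinct_outputs(INPUTS, 0, refeed=False)
naive = distinct_outputs(INPUTS, 30, refeed=False)
refed = distinct_outputs(INPUTS, 30, refeed=True)
print(first_round, naive, refed)
```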