Why password encryption matters
In today's digital age, protecting personal and sensitive information is more important than ever. One of the most fundamental ways to secure online accounts and digital assets is to use strong passwords. However, even the strongest passwords can be compromised if they are not properly protected when stored. This is where password encryption comes into play. Strictly speaking, stored passwords are usually hashed (a one-way transformation) rather than encrypted, but the term "password encryption" is commonly used for both. So, how does password encryption work?
Password encryption can be used at different levels, such as in applications, file systems, and databases. We are particularly interested in the latter, as protecting database passwords is crucial to keep intruders from inspecting the information stored inside. An attacker who breaches the database and gains access can perform malicious actions, leading to a variety of security risks, including identity theft, fraud, and data leaks.
In this blog post, we will analyze how to choose a password encryption algorithm for your database, depending on your circumstances and the balance of security and performance you want to achieve. We will look at two opposing scenarios at a fictional company X and present BCrypt and SHA-2, two different hashing algorithms you can choose between. By the end of this article, you should have a clear idea of which password hashing algorithm best fits your company!
Strong password encryption optimizes for security
Company X uses database DB for their day-to-day operations. The industry is not known to us, but the leadership is very worried about DB’s security since they don’t know if keeping their extremely sensitive and confidential information in the DB is secure enough. They don’t have a lot of reads and writes to the database, and most of the work is done via the database administrator, who is granted visibility over the sensitive information and data.
The database offers two different password encryption algorithms, BCrypt and SHA-2. BCrypt appears to be more secure, so they opt for it.
What makes BCrypt so secure?
BCrypt is a password hashing function. It is deliberately slow and computationally expensive, which makes it resistant to brute-force attacks and other common forms of password cracking.
BCrypt takes a password as input and applies a series of transformations to it, producing a fixed-length output string known as the hash. BCrypt also mixes a random salt into every hash, so even two users with the same password end up with different stored hashes.
When a user creates a password, it is first hashed with the BCrypt hash function, and the resulting hash is stored in the database. When the user logs in, the password they enter is hashed with the same algorithm, using the salt and cost factor saved alongside the stored hash. The new hash is then compared to the stored one. If the hashes match, the user is authenticated and granted access to the system.
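The sign-up and login flow described above can be sketched in Python. Python's standard library does not include BCrypt, so this minimal sketch swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), another deliberately slow, salted password hash. The helper names `hash_password` and `verify_password`, the iteration count, and the `$`-separated storage format are illustrative assumptions, not how any particular database implements it. With the third-party `bcrypt` package, the equivalent calls would be `bcrypt.hashpw` and `bcrypt.checkpw`.

```python
import hashlib
import hmac
import os

# Illustrative work factor; real deployments tune this to their hardware.
ITERATIONS = 100_000

def hash_password(password: str) -> str:
    """Hash a new password with a random salt (PBKDF2 stand-in for BCrypt)."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Store the iteration count, salt, and digest together, as BCrypt strings do.
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("wrong guess", stored))  # False
```

Because each salt is random, hashing the same password twice produces different stored strings, which is why the login step re-hashes with the stored salt instead of looking the hash up directly.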
Blowfish password encryption as a core
At its core, BCrypt uses the Blowfish encryption algorithm to perform hashing. Blowfish is a symmetric-key block cipher that was designed by Bruce Schneier in 1993. It operates on 64-bit blocks of data and supports key sizes of up to 448 bits.
In addition to Blowfish, BCrypt incorporates a technique called key stretching, which makes the hashing function more computationally expensive and therefore more resistant to attacks. Key stretching repeats a costly cryptographic operation many times (in BCrypt, a configurable number of rounds set by its cost factor), which increases the time and resources required for every password guess an attacker makes.
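The general idea of key stretching can be illustrated with a deliberately simplified loop that re-hashes a salted password many times with SHA-256. This is only a sketch, assuming a hypothetical `stretch` helper: BCrypt's actual stretching repeats its expensive Blowfish key setup rather than looping over SHA-256, and production code should rely on a vetted library instead of a hand-rolled loop.

```python
import hashlib
import time

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Illustrative key stretching: apply SHA-256 `rounds` times (rounds >= 1)."""
    state = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        # Each round feeds the previous output back in, so rounds cannot be skipped.
        state = hashlib.sha256(state + password).digest()
    return state

for rounds in (1_000, 10_000, 100_000):
    start = time.perf_counter()
    stretch(b"hunter2", b"per-user-salt", rounds)
    elapsed = time.perf_counter() - start
    print(f"{rounds:>7} rounds: {elapsed * 1000:.1f} ms")
```

The cost grows roughly linearly with `rounds`, and it applies equally to the legitimate server and to an attacker testing guesses, which is exactly the point.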
Let’s use BCrypt for all the use cases if it's so secure
Meanwhile, at company X, penetration testing results encouraged leadership to approve BCrypt as the hashing algorithm for securing passwords in all of the company's databases. They thought all their security concerns were solved and that they no longer needed to worry about data breaches.
They were only partially correct because, at the same time, new problems emerged in one of the databases using BCrypt. The analytics department of company X was using another database, DB’, because of its optimized reads and low memory footprint. Benchmarks that had previously shown excellent performance were now timing out for an unknown reason. The prime suspect was BCrypt, which had just been rolled out to all of the databases. But why would such a seemingly harmless change make any difference?
A faster password encryption algorithm optimizes database performance (in some cases)
The benchmarking workload at company X involved an arbitrary number of clients running concurrently to execute as many queries as possible. Strangely, the results differed greatly between the community version of the database (which had the top performance) and the enterprise version (which had user authentication).
In the picture above, we can see how performance depends on the number of clients concurrently executing queries. The more clients connect to the database, the more queries time out, which points to BCrypt as the culprit. Because BCrypt is deliberately slow to thwart brute-force attacks, it can easily saturate the processor: instead of executing queries, CPU time is spent computing the password hash for each client that authenticates.
In BCrypt, the slowest and most computationally expensive part of the algorithm is its key setup, which acts as a key derivation function (KDF): it repeatedly re-keys a modified Blowfish cipher with the password and salt, and only then uses the resulting cipher state to produce the final hash.
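BCrypt exposes this cost as a single work-factor (cost) parameter: its expensive key setup runs 2^cost times, so each increment of the cost roughly doubles the hashing time. The small calculation below assumes a hypothetical baseline of 50 ms per hash at cost 10 (measure your own hardware instead) and estimates how many password checks one CPU core could then perform per second.

```python
def hash_time_ms(cost: int, base_cost: int = 10, base_ms: float = 50.0) -> float:
    """Estimate BCrypt hash time, assuming time scales with 2**cost."""
    return base_ms * 2 ** (cost - base_cost)

def logins_per_core_per_second(cost: int) -> float:
    """How many password checks one core could do per second at this cost."""
    return 1000.0 / hash_time_ms(cost)

for cost in (10, 12, 14):
    print(cost, hash_time_ms(cost), round(logins_per_core_per_second(cost), 2))
# 10 50.0 20.0
# 12 200.0 5.0
# 14 800.0 1.25
```

Under these assumptions, hundreds of benchmark clients connecting at once would need several core-seconds of hashing before any of them runs a query.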
Why is the key derivation function slow?
The KDF is designed to be slow to make attacks harder. If attackers obtain the stored hashes, they can try huge numbers of candidate passwords offline, and every extra millisecond per hash multiplies their total effort; the salt additionally prevents them from reusing tables of hash values pre-computed for common passwords. By increasing the computational cost of hashing, BCrypt makes these types of attacks much more difficult and time-consuming.
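A back-of-the-envelope calculation shows why a slow hash matters against offline guessing. The guess rates below are hypothetical round numbers chosen only for illustration, not measurements: an attacker tries every 8-character lowercase password against a fast hash at 10^9 guesses per second, and against a slow hash at 10^4 guesses per second.

```python
def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every password of exactly this length and alphabet."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second

fast = seconds_to_exhaust(26, 8, 1e9)  # hypothetical fast-hash rate
slow = seconds_to_exhaust(26, 8, 1e4)  # hypothetical slow-hash rate
print(f"fast hash: {fast:.0f} seconds")       # about 209 seconds
print(f"slow hash: {slow / 86400:.0f} days")  # about 242 days
```

The keyspace is identical in both cases (26^8, roughly 2.1 × 10^11 candidates); only the cost per guess changes, and that alone turns minutes into months.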
Unfortunately, those strong security guarantees came at the cost of performance for the analytics department. Their data was not particularly sensitive, and all they wanted was performance, so that their day-to-day operations were not delayed. However, DB’ offered another password encryption algorithm, SHA-2. After replacing BCrypt with SHA-2, user passwords were still hashed, and performance was back to normal. In the graph below, we can see a significant improvement when handling concurrent connections with SHA-2 compared to BCrypt. The benchmarks show that SHA-2 adds no significant performance overhead compared to the results without password encryption.
What makes SHA-2 faster than BCrypt?
SHA-2 is a fast, general-purpose hash function that is widely used in a variety of cryptographic applications, including digital signatures, message authentication codes, and data integrity checks. It is optimized for speed and can quickly generate a hash value for a given input. However, it is not specifically designed for password hashing and does not include built-in features like key stretching or salting that make password hashes more resistant to attacks.
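The lack of a built-in salt is easy to see with Python's `hashlib`: plain SHA-256 always maps the same password to the same digest, so identical passwords produce identical stored hashes, and precomputed lookup (rainbow) tables work against them. Prepending a random per-user salt, an application-level addition rather than part of SHA-2 itself, breaks that link, although it does nothing to slow down each individual guess.

```python
import hashlib
import os

password = b"hunter2"

# Unsalted: two users with the same password get the same digest.
alice = hashlib.sha256(password).hexdigest()
bob = hashlib.sha256(password).hexdigest()
print(alice == bob)  # True

# Salted: a random salt per user makes the stored digests differ.
salt_a, salt_b = os.urandom(16), os.urandom(16)
alice_salted = hashlib.sha256(salt_a + password).hexdigest()
bob_salted = hashlib.sha256(salt_b + password).hexdigest()
print(alice_salted == bob_salted)  # False
```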
After the SHA-1 hashing algorithm was shown to be weak, the SHA-2 family (with SHA-256, SHA-384, and SHA-512 variants) became the recommended replacement. Larger digests improve resistance to collision and preimage attacks, but they barely increase the cost of computing each hash, so they do little to slow down brute-force guessing of passwords. That is why BCrypt is still considered the safer option for passwords: it makes every guess in a brute-force attack more costly to compute.
This leaves us with a trade-off: how much security are you willing to sacrifice to gain performance in edge cases?
Achieving both performance and security with BCrypt
Although BCrypt proved too slow for this workload, you can still leverage its security and handle the authentication cost at the application level. Most applications keep a connection pool to the database, which can be tuned to the available resources and to the system hosting the database. Opening as many database connections as there are users is not always beneficial since, most likely, not everyone will be running queries in parallel. A common rule of thumb for the maximum number of connections to a database is:
connections = (core_count * 2) + effective_spindle_count
Here, core_count is the number of physical CPU cores on your system (not counting hyper-threads), and effective_spindle_count is the number of rotating hard disks on your system (effectively zero if the active data set is fully cached in memory). The theory behind the formula is that while one connection is blocked waiting on disk I/O, the database can use another connection to do useful work, so it stays busy while data is being fetched from the disk.
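The rule of thumb translates directly into a small helper function. The name `suggested_pool_size` is hypothetical, and the result is a starting point for benchmarking rather than a guaranteed optimum.

```python
def suggested_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Rule-of-thumb pool size: (core_count * 2) + effective_spindle_count."""
    if core_count < 1 or effective_spindle_count < 0:
        raise ValueError("core_count must be >= 1 and effective_spindle_count >= 0")
    return core_count * 2 + effective_spindle_count

print(suggested_pool_size(4, 1))  # 9
print(suggested_pool_size(8, 0))  # 16
```

For example, a 4-core server reading from one disk would start with a pool of 9 connections, while an 8-core server whose working set fits in memory would start with 16.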
When fewer connections are open to the database, less time is spent on BCrypt authentication, and the open connections can be kept alive and reused for subsequent queries without re-hashing the password.
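The effect of connection reuse can be sketched with a toy pool built on Python's `queue.Queue`. The `Connection` class and its `authenticate` step are hypothetical stand-ins for a real database driver; the point is only that the expensive password check runs once per pooled connection instead of once per query.

```python
import queue
import threading

class Connection:
    """Hypothetical connection whose setup includes one password check."""
    auth_count = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        self.authenticate()

    def authenticate(self) -> None:
        # Stand-in for the slow BCrypt verification performed at login.
        with Connection._lock:
            Connection.auth_count += 1

    def execute(self, query: str) -> str:
        return f"ran {query!r}"

class ConnectionPool:
    """Opens a fixed number of connections up front and hands them out."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[Connection]" = queue.Queue()
        for _ in range(size):
            self._idle.put(Connection())

    def run(self, query: str) -> str:
        conn = self._idle.get()  # blocks until a connection is free
        try:
            return conn.execute(query)
        finally:
            self._idle.put(conn)  # return it for reuse; no re-authentication

pool = ConnectionPool(size=4)
threads = [threading.Thread(target=pool.run, args=(f"query {i}",)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(Connection.auth_count)  # 4 authentications for 100 queries
```

A real pool would also handle broken connections and timeouts, but the accounting is the same: authentication cost scales with the pool size, not with the number of queries.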
The answer to the right password encryption algorithm lies in your workload
Now, let's get back to our initial question: What’s the optimal password encryption algorithm?
The answer, as always, is: it depends.
Your day-to-day workload with the database you use can help you narrow down your options for a password encryption algorithm. In the first scenario, company X prioritized the more secure BCrypt algorithm to make sure its sensitive data was protected at all costs.
In the second scenario, the company opted for the slightly less secure but more performant SHA-2, since fast analytics results mattered far more than maximum security for that data.
SHA-2 pros and cons
The bullet points below summarize some of the pros and cons of SHA-2:
- SHA-2 is faster than BCrypt
- Implementation is cheaper as less computing power is needed to compute hash values
- The algorithm was not originally designed for password hashing
- Vulnerable to common attacks such as brute-force and rainbow table attacks when used to hash passwords without salting or key stretching
BCrypt pros and cons
As for BCrypt:
- It is designed for password hashing
- Built-in salting defeats precomputed (rainbow table) attacks, and an adjustable cost factor slows down brute-force attacks
- The hashing process is slower in comparison to SHA-2
- It was not designed for hashing large amounts of data; common implementations only use the first 72 bytes of the input
Not everything is always black and white, and neither are development workloads and the operations you do on your database.
In order to make sure that you have the best password encryption algorithms for your workloads, it’s best to run benchmarks and E2E tests on your own system since datasets and system architectures can vary from company to company.
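As a starting point for such benchmarks, Python's `timeit` module can compare the per-call cost of a fast hash with a slow one on your own hardware. This sketch again uses PBKDF2 from the standard library as a stand-in for BCrypt, with an illustrative iteration count; the absolute numbers depend entirely on the machine, so no expected timings are given.

```python
import hashlib
import timeit

PASSWORD = b"benchmark-password"
SALT = b"fixed-salt-for-benchmark"

def fast_hash() -> bytes:
    return hashlib.sha256(SALT + PASSWORD).digest()

def slow_hash() -> bytes:
    # Stand-in for BCrypt: a deliberately slow, iterated password hash.
    return hashlib.pbkdf2_hmac("sha256", PASSWORD, SALT, 100_000)

def per_call_ms(func, number: int) -> float:
    """Average wall-clock milliseconds per call over `number` runs."""
    return timeit.timeit(func, number=number) / number * 1000

fast_ms = per_call_ms(fast_hash, 10_000)
slow_ms = per_call_ms(slow_hash, 10)
print(f"SHA-256: {fast_ms:.4f} ms/call")
print(f"PBKDF2:  {slow_ms:.1f} ms/call")
print(f"ratio:   {slow_ms / fast_ms:.0f}x")
```

Timing the hash alone isolates the authentication cost; an end-to-end benchmark against the database with realistic client concurrency is still needed to see the effect on your actual workload.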
At Memgraph, the first password encryption algorithm our development team supported was BCrypt, chosen to maximize security. However, to cover the edge cases that can occur in a system, we decided to support SHA-2 as well, which offers optimal performance with a minimal sacrifice in security. Now clients can use either algorithm, depending on their business needs. Giving users a choice between algorithms, combined with professional advice in under-the-hood blog posts, puts them in a great position to create their ideal environment of tools.