Your Django application may contain sensitive user information that would be very harmful if leaked. There are many kinds of data you might want to protect: imagine a financial application that stores credit card data. Leaks happen all the time, companies suffer terrible PR consequences, and customer trust is lost.
What extra precautions can you take as an app developer to prevent this?
Encrypting sensitive data is one potential solution. But first, let’s learn about what kinds of encryption exist. We’ll also touch on the difference between encryption and hashing, as it’s important to understand this.
What are the different kinds of encryption?
It’s important to be clear about what kind of encryption we’re referring to. There is “encryption at rest” and there is encryption at the application level. These solutions operate at different levels and solve specific problems.
Encryption at Rest
Encryption at rest refers to the practice of disk encryption, most often seen in a cloud provider such as AWS or Google Cloud. This usually involves the cloud provider using their own key to encrypt and decrypt your data. Some providers may allow you to supply your own encryption key. From your perspective, the data is always available and appears to be decrypted.
With encryption at rest, your data is protected against scenarios such as the physical theft of the server or disk. Someone who hacks your application, however, would probably still gain access to the data.
Application Level Encryption
Application level encryption implies that you are responsible for the encryption and decryption of the data. In other words, if you were to examine a database table, the row data would be encrypted. Code in your application would encrypt the data going into the database and do the opposite with data coming out.
With this level of encryption, you are protected against physical compromise of the server and, to a large extent, against malicious parties hacking the application. An intruder may, for instance, gain access to the database, but as long as they do not have the encryption key, the data is unusable.
Encryption vs. Hashing
We must properly distinguish between encryption and hashing.
Encryption is a reversible process, i.e. with the correct key, the data can be decrypted. Hashing, on the other hand, is irreversible.
When should you encrypt vs. hash data?
Hashing is useful when you don’t need or want to know the actual value, but you need to verify it against the real value.
The classic scenario for hashing is password storage. You don’t want to store the actual passwords, but you need to verify they are entered correctly.
PBKDF2 is an example of a strong password hashing algorithm. Its strength comes from the amount of work needed to calculate each hash, which matters if an attacker gains access to your data. When a hashing algorithm is too fast, an attacker can try millions of candidate passwords until one produces a hash that matches a stored value.
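To make the work-factor idea concrete, here is a minimal sketch of salted password hashing and verification using PBKDF2 from Python's standard library. The function names hash_password and verify_password and the iteration count are illustrative assumptions, not taken from any particular package. In a Django project, the built-in password hashers already handle this for you.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor (an assumption); higher values make each guess slower.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash) for the given password."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Only the salt and the derived hash are stored. The original password is never recovered; a login attempt is checked by hashing the submitted value again and comparing the results.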
Adding Encryption to a Django Application
We’ll use the django-encrypted-model-fields package to add field level encryption to database models.
Using Django Encrypted Model Fields
First install the django-encrypted-model-fields package using pip.
pip install django-encrypted-model-fields
Next we need to configure settings.py for usage of the package. Add encrypted_model_fields to the INSTALLED_APPS list.
INSTALLED_APPS = [
    ...
    'encrypted_model_fields',
]
We also need to provide an encryption key for the package to use.
FIELD_ENCRYPTION_KEY = os.environ.get('FIELD_ENCRYPTION_KEY', '')
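To generate a new key, you can run the following in a Python shell:
import os
import base64

# 32 random bytes, URL-safe base64 encoded
new_key = base64.urlsafe_b64encode(os.urandom(32))
# decode() prints the key as plain text rather than as a bytes literal
print(new_key.decode())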
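Because the setting above falls back to an empty string, a missing environment variable leaves you with an unusable key. One option is to check the key explicitly so that the failure message is clear. The sketch below assumes the package expects a key of 32 random bytes in URL-safe base64, which matches the generation snippet. The helper name load_field_key is hypothetical and not part of the package.

```python
import base64
import binascii
import os


def load_field_key(environ=os.environ, name="FIELD_ENCRYPTION_KEY"):
    """Return the key from the environment, or raise if missing or malformed.

    Hypothetical helper: the 32-byte length check is an assumption based on
    how the key is generated, not a documented rule of the package.
    """
    key = environ.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set")
    try:
        raw = base64.urlsafe_b64decode(key)
    except (binascii.Error, ValueError) as exc:
        raise RuntimeError(f"{name} is not valid URL-safe base64") from exc
    if len(raw) != 32:
        raise RuntimeError(f"{name} must decode to 32 bytes, got {len(raw)}")
    return key
```

In settings.py this could be used as FIELD_ENCRYPTION_KEY = load_field_key(), so that a misconfigured deployment stops immediately with a readable error.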
And finally, here is an example of using the package to mark a specific field as encrypted.
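from django.db import models
from encrypted_model_fields.fields import EncryptedCharField

class UserProfile(models.Model):
    # Stored as ciphertext in the database; decrypted when read through the model
    encrypted_name = EncryptedCharField(max_length=30)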
It’s important to note that once a field is marked as encrypted, it cannot be sorted or filtered as part of a query, because the database only ever sees the encrypted value. If you’re going to be encrypting a large amount of data, you may also want to consider offloading that work to a Celery queue so that it can be done in the background.
Securely Storing the Encryption Key
Encrypting the database fields is only half the battle when it comes to security. After all, the encrypted data is only as secure as the key. Once an intruder has acquired the encryption key, the data is up for grabs. Therefore, it’s important to think about how to securely handle the key.
If you’re deploying to a major cloud provider, there are services such as AWS KMS (Key Management Service) that can securely store your keys. There are also self-hosted options for secure key storage. The most popular among them is probably HashiCorp Vault, but be prepared for a significant learning curve.
The goal is to avoid the compromise of a single VM leading to a total lapse in security. Separating storage of the key from the app host is an essential safeguard.