To meet user data security requirements, Apache Cloudberry supports Transparent Data Encryption (TDE).
TDE is a technology used to encrypt database data files:
The key management module is the core component of TDE. It implements a two-tier key structure: a master key and a data encryption key (DEK). The master key encrypts the DEK and is stored outside the database; the DEK encrypts database data and is stored in the database as ciphertext.
Encryption algorithms are divided into the following types:
Block encryption algorithms in symmetric encryption are the mainstream choice, offering better performance than stream encryption and asymmetric encryption. Apache Cloudberry supports two block encryption algorithms: AES and SM4.
AES is an internationally standardized block encryption algorithm that supports 128-, 192-, and 256-bit keys. Common encryption modes include:
More ISO/IEC encryption algorithms include:
Before using the TDE feature, ensure the following conditions are met:
You enable TDE when deploying Apache Cloudberry; from then on, all data encryption and decryption is transparent to users. To enable TDE during database initialization, run the gpinitsystem command with the -T parameter, which selects the encryption algorithm (AES or SM4). The following examples enable TDE:
Using the AES256 encryption algorithm:
gpinitsystem -c gpinitsystem_config -T AES256
Using the SM4 encryption algorithm:
gpinitsystem -c gpinitsystem_config -T SM4
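As a guard against typos in the algorithm name, the command can be built through a small wrapper. This is a minimal sketch: tde_init_command is a hypothetical helper, not part of Apache Cloudberry, and it accepts only the two -T values shown above (your release may accept others). It prints the command instead of running it, so you can review it first.

```shell
# Hypothetical helper: print the gpinitsystem command for a chosen TDE algorithm.
# Assumption: only the two -T values shown in this document are accepted here.
tde_init_command() {
    tde_config_file="$1"
    tde_algorithm="$2"
    case "$tde_algorithm" in
        AES256|SM4)
            ;;
        *)
            echo "unsupported algorithm: $tde_algorithm" >&2
            return 1
            ;;
    esac
    echo "gpinitsystem -c $tde_config_file -T $tde_algorithm"
}

# Usage: tde_init_command gpinitsystem_config AES256
```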
TDE is invisible to users: enabling or disabling it does not change how read and write operations behave. To verify that encryption is in effect, you can simulate the loss of the key files and confirm that the database cannot start without them, as described in the following steps.
The key file is located on the Coordinator node. To locate the key file, first find the data directory of the Coordinator node. For example:
COORDINATOR_DATA_DIRECTORY=/home/gpadmin/work/data0/master/gpseg-1
Then, find the key files:
$ pwd
/home/gpadmin/work/data0/master/gpseg-1
$ ls -l pg_cryptokeys/live/
total 8
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 relation.wkey
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 wal.wkey
The relation.wkey file holds the key used to encrypt data files, and the wal.wkey file holds the key intended for encrypting the WAL. Currently, only relation.wkey is in use; the WAL is not yet encrypted.
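Before running the test, it can help to confirm that both key files are present. The following is a minimal sketch: check_tde_keys is a hypothetical helper, and it assumes the pg_cryptokeys/live layout shown in the listing above.

```shell
# Hypothetical helper: report whether both TDE key files exist under a
# Coordinator data directory. Assumes the pg_cryptokeys/live layout shown above.
check_tde_keys() {
    tde_key_dir="$1/pg_cryptokeys/live"
    tde_result=0
    for tde_key_file in relation.wkey wal.wkey; do
        if [ -f "$tde_key_dir/$tde_key_file" ]; then
            echo "found: $tde_key_file"
        else
            echo "missing: $tde_key_file"
            tde_result=1
        fi
    done
    # Non-zero exit status if any key file is missing.
    return "$tde_result"
}

# Usage: check_tde_keys "$COORDINATOR_DATA_DIRECTORY"
```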
Create a table and insert data.
Create an append-only (AO) table and insert data:
postgres=# create table ao2 (id int) with(appendonly=true);
postgres=# insert into ao2 select generate_series(1,10);
Ensure the data has been successfully inserted.
Stop the database.
gpstop -a
Simulate key file loss.
Switch to the directory where the key files are stored:
cd /home/gpadmin/work/data0/master/gpseg-1/pg_cryptokeys/
Rename the live directory to backup, so that the key files are no longer where the database expects them (this simulates key file loss):
mv live backup
Attempt to start the database.
Start the database using the gpstart command:
gpstart -a
The database will fail to start because of the missing key files. You will see an error in the database logs on the Coordinator node, similar to the following:
FATAL: cluster has no data encryption keys
This confirms that the database cannot start without the key files, ensuring data security.
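To check for this error without reading the log by hand, a small helper can search a log file for the message. This is a minimal sketch: log_reports_missing_keys is a hypothetical name, and the log file path is left to the caller because its location depends on your installation.

```shell
# Hypothetical helper: succeed if the given log file contains the
# missing-key error shown above, fail otherwise.
log_reports_missing_keys() {
    grep -q "cluster has no data encryption keys" "$1"
}

# Usage (the path is a placeholder, not a real default):
#   log_reports_missing_keys /path/to/coordinator/logfile && echo "startup blocked by missing keys"
```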
Restore the key files by renaming the backup directory back to live (run this from the same pg_cryptokeys directory):
mv backup live
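The rename used to simulate key loss and the restore shown here can be wrapped in two guarded functions. This is a minimal sketch under stated assumptions: hide_tde_keys and restore_tde_keys are hypothetical names, and the directory names live and backup follow the commands above. The guards are there because mv moves the source inside the destination when the destination directory already exists, which would leave the keys in an unexpected place.

```shell
# Hypothetical helpers for the key-loss test. Both take the Coordinator
# data directory as their only argument.

# Rename pg_cryptokeys/live to pg_cryptokeys/backup.
hide_tde_keys() {
    tde_crypto_dir="$1/pg_cryptokeys"
    if [ ! -d "$tde_crypto_dir/live" ]; then
        echo "no live key directory in $tde_crypto_dir" >&2
        return 1
    fi
    if [ -e "$tde_crypto_dir/backup" ]; then
        echo "backup already exists in $tde_crypto_dir" >&2
        return 1
    fi
    mv "$tde_crypto_dir/live" "$tde_crypto_dir/backup"
}

# Rename pg_cryptokeys/backup back to pg_cryptokeys/live.
restore_tde_keys() {
    tde_crypto_dir="$1/pg_cryptokeys"
    if [ ! -d "$tde_crypto_dir/backup" ]; then
        echo "no backup directory in $tde_crypto_dir" >&2
        return 1
    fi
    if [ -e "$tde_crypto_dir/live" ]; then
        echo "live already exists in $tde_crypto_dir" >&2
        return 1
    fi
    mv "$tde_crypto_dir/backup" "$tde_crypto_dir/live"
}

# Usage, with the database stopped:
#   hide_tde_keys "$COORDINATOR_DATA_DIRECTORY"
#   restore_tde_keys "$COORDINATOR_DATA_DIRECTORY"
```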
Restart the database and verify the data.
Start the database again using the gpstart command:
gpstart -a
Once the database has successfully started, query the ao2 table to verify the data:
postgres=# select * from ao2 order by id;
 id
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)
These steps verify that TDE is in effect: the database cannot start without the key files, which protects the data at rest.