Streamsets
This section describes how to install and configure the Streamsets plugin for Privacera Encryption and Ranger.
Prerequisites
You should already have a working Streamsets installation.
Privacera Encryption in Streamsets Data Collector (SDC)
Enable Encryption for SDC
-
Run the following command:
cd ~/privacera/privacera-manager/config
cp sample-vars/vars.crypto.streamset.yml custom-vars/vars.crypto.streamset.yml
-
Run the update.
cd ~/privacera/privacera-manager/
./privacera-manager.sh update
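Before continuing, it can help to confirm that the update produced the artifacts that the next section copies. The following is a minimal sketch, assuming the output location ~/privacera/privacera-manager/output/streamset/ and the two artifact names given in the next section. The function name check_streamset_artifacts is hypothetical and is not part of Privacera Manager.

```shell
# Hypothetical helper (not part of Privacera Manager): checks that the
# package and the crypto-config entry named in the next section exist
# in the given output directory. Returns 0 if both are present.
check_streamset_artifacts() {
  out_dir="$1"
  missing=0
  if [ ! -f "${out_dir}/privacera-streamset.tar.gz" ]; then
    echo "missing: ${out_dir}/privacera-streamset.tar.gz" >&2
    missing=1
  fi
  # -e accepts either a file or a directory named crypto-config.
  if [ ! -e "${out_dir}/crypto-config" ]; then
    echo "missing: ${out_dir}/crypto-config" >&2
    missing=1
  fi
  return "${missing}"
}

# Example call (path taken from the next section):
#   check_streamset_artifacts ~/privacera/privacera-manager/output/streamset
```

A non-zero return suggests rerunning the update before copying anything.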
Configure Encryption for SDC
-
Copy the Streamsets Privacera package.
-
If you have Streamsets and Privacera Manager running on different systems, copy the following two items from ~/privacera/privacera-manager/output/streamset/ on the Privacera Manager host machine:
- privacera-streamset.tar.gz
- crypto-config
If you have JCEKS enabled, also copy the following file from ~/privacera/privacera-manager/config/keystores/ on the Privacera Manager host machine:
- cryptoprop.jceks
-
If you have Streamsets and Privacera Manager running on the same system, do the following:
cp ~/privacera/privacera-manager/output/streamset/privacera-streamset.tar.gz ~/privacera/downloads
cp -r ~/privacera/privacera-manager/output/streamset/crypto-config ~/privacera/downloads/crypto-config
If you have JCEKS enabled, do the following:
cp ~/privacera/privacera-manager/config/keystores/cryptoprop.jceks ~/privacera/downloads/crypto-config/
-
-
Extract the Streamsets Privacera package.
cd ~/privacera/downloads
mkdir streamsets
tar xfz ~/privacera/downloads/privacera-streamset.tar.gz -C streamsets
-
Access the Streamsets installation directory as root user.
sudo su
-
Set the Streamsets installation directory.
export STREAMSET_HOME=/opt/streamset/streamsets-datacollector-3.13.0
-
Copy the Privacera library into the Streamsets Data Collector user-libs directory:
cp -r streamsets/privacera-streamset/ ${STREAMSET_HOME}/user-libs/
-
Copy the configuration files.
cp -r crypto-config ${STREAMSET_HOME}/../crypto-config
-
Define security policy.
cat << EOF >> ${STREAMSET_HOME}/etc/sdc-security.policy
grant {
  permission java.io.FilePermission "/opt/privacera/-", "read";
  permission java.io.FilePermission "/opt/streamset/-", "read,write";
  permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
};
EOF
-
Stop Streamsets.
kill -9 $(ps aux | grep 'sdc' | awk '{print $2}')
-
Restart Streamsets.
ulimit -n 32768
nohup ${STREAMSET_HOME}/bin/streamsets dc &
-
Check the logs to make sure that Streamsets is running.
tail -f ${STREAMSET_HOME}/log/sdc.log
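The "Define security policy" step above appends to sdc-security.policy with >>, so repeating the step adds a second, redundant grant block. The following is a minimal sketch of a guard, assuming the presence of the "/opt/privacera/-" permission line is a good enough marker that the block was already added. The function name append_privacera_policy is hypothetical.

```shell
# Hypothetical helper: appends the grant block from the step above only
# if the policy file does not already contain the /opt/privacera/- line.
append_privacera_policy() {
  policy_file="$1"
  if grep -qF '"/opt/privacera/-"' "${policy_file}" 2>/dev/null; then
    echo "policy already present in ${policy_file}"
    return 0
  fi
  cat << 'EOF' >> "${policy_file}"
grant {
  permission java.io.FilePermission "/opt/privacera/-", "read";
  permission java.io.FilePermission "/opt/streamset/-", "read,write";
  permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
};
EOF
}

# Example call, using the variable set earlier in this section:
#   append_privacera_policy "${STREAMSET_HOME}/etc/sdc-security.policy"
```

The heredoc delimiter is quoted ('EOF') so the shell writes the block verbatim without attempting any expansion.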
Verification
-
Configure a sample pipeline to encrypt a local file. You can use the following sample. Import this sample pipeline into Streamsets.
-
Access the Streamsets installation directory as root user.
sudo su
-
Create data directories.
DATA_DIR=/opt/streamset/
cd ${DATA_DIR}
mkdir -p customer_data/input
mkdir -p customer_data/output
mkdir -p customer_data/input_error
mkdir -p customer_data/output/encrypted_error
-
Create a sample data file:
cat << EOF > customer_data/input/customer_data_with_header.csv
id,name,ssn,email_address,amount
1,Tamara,898453744,aphillips@vang.info,162454.67
2,Richard,65511350,vreynolds@gmail.com,602.89
3,Tanya,634090950,harringtonwilliam@diaz-king.com,48712.67
4,Richard,829439881,martinvalerie@yahoo.com,5122.02
5,Raymond,227804351,sarachavez@yahoo.com,97963.857
6,Melissa,553465892,kevinwillis@gmail.com,36654.806
7,Deborah,782539839,brittney24@yahoo.com,19.231
8,Rodney,515337130,jenniferkelly@davis-bond.biz,65083.651
9,Katherine,137057143,jperkins@gmail.com,4822.343
10,David,432941241,wmccann@hotmail.com,4069.34
EOF
-
Create a metadata file to map the input dataset columns to Privacera Encryption schema columns:
cat << EOF > customer_data/customer_data.meta
COLUMN_NAME|SCHEME_NAME
id|
name|SYSTEM_PERSON_NAME
ssn|SYSTEM_SSN
email_address|SYSTEM_EMAIL
amount|
EOF
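The sample metadata file lists every column of the CSV header, leaving the scheme empty for columns that are not encrypted. Whether the plugin requires every column to be listed is not stated here, so treat the following as a sanity check under that assumption. It is a minimal sketch, and the function name check_meta_columns is hypothetical.

```shell
# Hypothetical helper: reports CSV header columns that have no row in
# the metadata file. Assumes a comma-separated CSV header line and a
# pipe-separated metadata file whose first line is its own header.
check_meta_columns() {
  csv_file="$1"
  meta_file="$2"
  status=0
  for col in $(head -n 1 "${csv_file}" | tr ',' ' '); do
    # awk exits 0 only if some metadata row (after the header) names this column.
    if ! awk -F'|' -v c="${col}" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "${meta_file}"; then
      echo "column not in metadata: ${col}" >&2
      status=1
    fi
  done
  return "${status}"
}

# Example call, run from the data directory created above:
#   check_meta_columns customer_data/input/customer_data_with_header.csv customer_data/customer_data.meta
```

The loop splits the header on commas, so it would need adjusting for column names that contain spaces or quoted commas.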
To run the sample pipeline, make sure that the Privacera user exists in Ranger and has permissions on the KMS keys whose names start with pmsk.
Ranger Configuration: Add Permission for Keys
-
Log in to the Ranger UI as an administrator and create the Privacera user. You can then grant this user permissions on keys.
-
Log in to Ranger with keyadmin credentials and click privacera_kms.
-
Create or update the policy for the Privacera user.
-
Now run the Streamsets pipeline preview and verify the encrypted values on the right side of the table, as shown in the screenshot below.