Setup for Access Management for Apache Flink on Kubernetes¶
This section outlines the steps to set up Apache Flink on a Kubernetes cluster with the Privacera plugin. Please ensure that all prerequisites are completed before beginning the setup process.
Generate the configuration files¶
Perform the following steps to configure the Apache Flink connector:
- SSH to the instance where Privacera Manager is installed.
- Run the following command to navigate to the `config/` directory.
- Run the following command to copy the Apache Flink OLAC YAML file from `sample-vars`, if it is not already present in `custom-vars`.
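As a sketch, the two steps above might look like the commands below; the `~/privacera/privacera-manager` install path is an assumption based on the output path referenced later on this page:

```shell
# Navigate to the Privacera Manager config directory (install path assumed)
cd ~/privacera/privacera-manager/config

# Copy the Flink OLAC vars file into custom-vars only if it is not already there
cp -n sample-vars/vars.flink.yml custom-vars/vars.flink.yml
```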
- Update the following properties in the `vars.flink.yml` file. Refer to the table below for the values to be provided.

  | Property | Description | Sample Value |
  | --- | --- | --- |
  | FLINK_HOME_DIR | Path where Apache Flink is installed in the Docker image of Apache Flink | /opt/flink |
  | FLINK_S3_FS_PLUGIN_DIR | Path to the folder that contains the Apache Flink plugin for S3 | /opt/flink/plugins/s3-fs-hadoop |
- Once the configuration has been done, execute the below command to complete the deployment.
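The deployment command itself was not captured above; assuming a standard Privacera Manager installation, it is typically run from the install directory (the script name is an assumption if your version differs):

```shell
# Apply the updated configuration and redeploy (standard Privacera Manager layout assumed)
cd ~/privacera/privacera-manager
./privacera-manager.sh update
```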
- Once the deployment is successful, the configuration archive and setup script, i.e. `flink_custom_conf.zip` and `privacera_flink_setup.sh`, are available in the `~/privacera/privacera-manager/output/flink` location on the Privacera Manager host. These need to be copied to the server where the Docker image of Apache Flink is built.
Create Apache Flink Docker Image Using Privacera OLAC in Session Mode¶
- Create the Privacera setup folder on the machine where the Dockerfile for building your Apache Flink Docker image exists.
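A minimal sketch of this step; the workspace path `~/flink-docker-build` is purely illustrative:

```shell
# Create a privacera subfolder next to the Dockerfile used for the Flink image
mkdir -p ~/flink-docker-build/privacera
```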
- The installation files can be found on the Privacera Manager host machine. Copy them from the Privacera Manager output folder to the machine where you will build the Docker image for Apache Flink.
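For example, using `scp`; the hostname and destination folder are placeholders:

```shell
# Copy the setup artifacts from the Privacera Manager host
# (pm-host and the local destination folder are placeholders)
scp 'pm-host:~/privacera/privacera-manager/output/flink/*' ~/flink-docker-build/privacera/
```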
- Update your Dockerfile to add Privacera's plugin for Apache Flink.

  Note

  Replace APACHE_FLINK_VERSION with the version of Apache Flink you are using. Refer to Supported Apache Flink Versions for the supported versions.
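As a rough sketch only (the actual snippet ships with the Privacera setup files; the base-image tag and the setup-script invocation here are assumptions):

```dockerfile
FROM flink:APACHE_FLINK_VERSION

# Copy the Privacera setup artifacts into the image
COPY privacera/privacera_flink_setup.sh /opt/privacera/
COPY privacera/flink_custom_conf.zip /opt/privacera/

# Run the Privacera plugin setup script (exact invocation may differ)
RUN cd /opt/privacera && bash ./privacera_flink_setup.sh
```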
- Build the Docker image. You can update the Docker image name and tag as per your naming convention.
- Push the Docker image to your Docker registry. In AWS, this could be your AWS ECR.
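For example (registry, image name, and tag are illustrative; for AWS ECR, authenticate first with `aws ecr get-login-password`):

```shell
# Build the custom Flink image from the directory containing the Dockerfile
docker build -t <your-registry>/flink-privacera:<tag> .

# Push the image to your registry
docker push <your-registry>/flink-privacera:<tag>
```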
- You will need to create a Kubernetes ConfigMap with a JWT token that the Privacera plugin uses to authenticate service and human users. You can update your existing ConfigMap to add Privacera's plugin configuration. Below is the code that needs to be embedded in your ConfigMap. Make sure to update the placeholders.
Tip
You need to generate a JWT that remains valid for as long as the container is running. Since JWTs are short-lived, you will have to generate a new one every time before you submit a job.
Here is the template for the ConfigMap that needs to be created in the Kubernetes cluster.
Note
Please replace the value of JWT_TOKEN_HERE with the actual JWT token.
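If you prefer to create the ConfigMap from the command line rather than a YAML file, one hedged sketch is below; the ConfigMap name and key are illustrative, not the names the Privacera plugin requires:

```shell
# Create or update a ConfigMap holding the JWT token (names are illustrative)
kubectl create configmap privacera-jwt \
  --from-literal=jwt-token='JWT_TOKEN_HERE' \
  --namespace <your-namespace> \
  --dry-run=client -o yaml | kubectl apply -f -
```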
- Here is a sample `jobmanager-service.yaml`. There is no Privacera-specific configuration needed in this file.

  Note

  The below file is only for reference; nothing specific to Privacera needs to be added here.
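For reference, a standard Flink session-cluster JobManager service, along the lines of the upstream Flink Kubernetes examples, looks roughly like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
    - name: rpc
      port: 6123
    - name: blob-server
      port: 6124
    - name: webui
      port: 8081
  selector:
    app: flink
    component: jobmanager
```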
- You will need to update your `jobmanager-session-deployment-non-ha.yaml` file.

  Note

  In the below file, make sure to replace the placeholder DOCKER_IMAGE_URL with the Docker URL of the custom Apache Flink image with the plugin installed.
- You will need to update your `taskmanager-session-deployment.yaml` file with the Privacera-specific configuration.

  Note

  In the below file, make sure to replace the placeholder DOCKER_IMAGE_URL with the Docker URL of the custom Apache Flink image with the plugin installed.
- Run the below command, replacing the PLEASE_CHANGE placeholder with the Kubernetes namespace where you want to deploy Apache Flink.
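A sketch of applying the manifests discussed above; the ConfigMap file name is an assumption:

```shell
NAMESPACE=PLEASE_CHANGE   # target Kubernetes namespace

kubectl apply -n "$NAMESPACE" -f flink-configuration-configmap.yaml   # name assumed
kubectl apply -n "$NAMESPACE" -f jobmanager-service.yaml
kubectl apply -n "$NAMESPACE" -f jobmanager-session-deployment-non-ha.yaml
kubectl apply -n "$NAMESPACE" -f taskmanager-session-deployment.yaml
```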
- Verify the deployment by executing the below command to get all the resources, and make sure all the pods are up and running.
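For example:

```shell
# List all resources in the namespace; jobmanager and taskmanager pods
# should show STATUS "Running"
kubectl get all -n PLEASE_CHANGE
```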
Validation¶
To validate the integration of Privacera Access Management with Apache Flink on Kubernetes, follow these steps:
Prerequisites¶
Ensure the following prerequisites are met before proceeding:
- Apache Flink is deployed on a Kubernetes cluster and secured using the steps mentioned in the previous sections.
- Create two resource-based policies in the `privacera_s3` repository in Privacera, one for the source S3 bucket and another for the destination bucket. The user should have `read` access to the source S3 object and `write` access to the folder where the output S3 object will be written. Ensure these policies align with the paths and operations used in your validation jobs. For example, you might use:
  - Source S3 Path: `s3://my-source-bucket/data/input/test_object.txt` – this path is used to read data for processing. Note: this file should be present in the specified location.
  - Destination S3 Path: `s3://my-destination-bucket/data/output/` – this path is used to write the processed output data.
Steps to Validate¶
- Connect to the Job Manager Pod

  Use `kubectl exec` to access the `job-manager` pod in your Apache Flink Kubernetes cluster. Once connected, navigate to the Flink installation directory and prepare to download the test jar for validation.

  Make sure to replace `<job-manager-pod-name>` with the actual name of your job-manager pod.
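For example (the namespace flag and the `/opt/flink` path follow the FLINK_HOME_DIR sample value used earlier on this page):

```shell
# Open a shell inside the job-manager pod
kubectl exec -it <job-manager-pod-name> -n <your-namespace> -- /bin/bash

# Inside the pod, switch to the Flink home directory
cd /opt/flink
```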
- Download the Test Jar

  Replace `<PRIVACERA_BASE_DOWNLOAD_URL>` with your Privacera Manager base download URL, then run the following commands to download and extract the test jar. The test jar contains the Flink job that reads data from an S3 source.
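A sketch of the download and extraction; the archive name is an assumption inferred from the `privacera-flink-test` directory it produces:

```shell
cd /opt/flink

# Archive name is assumed; use the artifact name from your download URL
wget <PRIVACERA_BASE_DOWNLOAD_URL>/privacera-flink-test.tar.gz
tar -xzf privacera-flink-test.tar.gz
```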
- Verify the Downloaded Files

  Check that the jar file `privacera-flink-test-1.0.0-SNAPSHOT-S3ReadWriteJob.jar` and other related files are correctly extracted in the `privacera-flink-test` directory.
Running a Flink Job to Read and Write Data in S3¶
In this use case, we will submit a Flink job that reads data from an S3 source and writes the processed data to an S3 destination.
- Submit a Flink Job to Read Data from S3 and Write to S3

  While still in the `job-manager` pod, submit the following job to read data from an S3 source and write to an S3 destination. Update the placeholders with the correct values:

  - `SOURCE_PATH_WITH_PROTOCOL`: The full S3 path (including the protocol) of the source data to read from. For example, `s3://my-source-bucket/data/input/test_object.txt`.
  - `DESTINATION_PATH_WITH_PROTOCOL`: The full S3 path (including the protocol) where the data will be written. For example, `s3://my-destination-bucket/data/output/`.
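A hedged sketch of the submission; the entry class and the way the two paths are passed are assumptions inferred from the jar name, not a documented interface:

```shell
# Entry class and argument order are assumptions; adjust to the actual test jar
./bin/flink run \
  -c com.privacera.flink.S3ReadWriteJob \
  privacera-flink-test/privacera-flink-test-1.0.0-SNAPSHOT-S3ReadWriteJob.jar \
  SOURCE_PATH_WITH_PROTOCOL DESTINATION_PATH_WITH_PROTOCOL
```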
- Verify the Job Execution

  To ensure that the job worked correctly, verify the following:

  - Check that the expected output objects are present in the destination S3 path (`s3://my-destination-bucket/data/output/`). The output should reflect the processed data.
  - Review the audit logs in Privacera to confirm that the access events for both reading from the source S3 path and writing to the destination S3 path were recorded. This validates that the data access was correctly managed and audited by Privacera.
Running a Flink Job to Consume Kafka Data and Write to S3¶
In this use case, we will submit a Flink job that reads data from a Kafka source and writes the processed data to an S3 destination.
- Submit a Flink Job to Read Data from Kafka and Write to S3

  Use the following command to submit a job that reads data from a Kafka source and writes to an S3 destination. Replace the placeholders as needed:

  - `KAFKA_IP`: The IP address of the Kafka host.
  - `KAFKA_PORT`: The port number on which Kafka is running.
  - `DESTINATION_PATH_WITH_PROTOCOL`: The full S3 path (including the protocol) where the data will be written.
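As with the S3 job, a hedged sketch; the entry class and argument order are assumptions, not the documented interface of the test jar:

```shell
# Entry class and argument order are assumptions; adjust to the actual test jar
./bin/flink run \
  -c com.privacera.flink.KafkaToS3Job \
  privacera-flink-test/privacera-flink-test-1.0.0-SNAPSHOT-S3ReadWriteJob.jar \
  KAFKA_IP KAFKA_PORT DESTINATION_PATH_WITH_PROTOCOL
```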
Verify the Job Execution
To ensure that the job worked correctly, verify the following:
- Check that the expected output objects are present in the destination S3 path (
s3://my-destination-bucket/data/output/
). The output should reflect the processed data. - Review the audit logs in Privacera to confirm that the access events for both reading from the source S3 and writing to the destination S3 were recorded. This will validate that the data access was correctly managed and audited by Privacera.