Databricks Clusters BOTO3 Use-Case¶
This section describes how to access AWS S3 file data with the AWS SDK for Python (Boto3) through a Privacera Dataserver proxy, which enforces access control on the requests.
Prerequisites¶
For Privacera Manager (self-managed):
- Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.
- Databricks should be connected to Privacera Manager.
- Obtain the Access Key and Secret Key from Privacera Manager. To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Token -> GENERATE TOKEN.
- Obtain the URL endpoint for the Privacera Manager Dataserver.
For FGAC Clusters
If your Dataserver uses an external port other than 443 (for example, port 8282), follow these steps:
- Modify the Databricks init script: add the necessary iptables configuration to allow outgoing connections on the specified port. For example, to enable boto3 access on port 8282, add a rule such as the one in the sketch after this list.
- Restart the Databricks cluster to apply the changes.
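A minimal init-script sketch, assuming the Dataserver's external port is 8282; the iptables rules already present on an FGAC cluster may differ in your environment, so adjust the rule position and port accordingly.

```bash
#!/bin/bash
# Hypothetical init-script addition: allow outbound TCP connections from the
# cluster to the Privacera Dataserver's external port (8282 in this example).
iptables -I OUTPUT -p tcp --dport 8282 -j ACCEPT
```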
For PrivaceraCloud:
- Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.
- Databricks should be connected to PrivaceraCloud.
- Obtain the Access Key and Secret Key from PrivaceraCloud using one of the following methods:
  - To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Tokens -> GENERATE TOKEN.
  - To use a valid existing token, navigate to the Privacera portal and go to Launch Pad -> Setup AWS Cli -> DOWNLOAD TOKEN.
- Obtain the URL endpoint of the PrivaceraCloud Dataserver.
Setup¶
Run the following commands in a Databricks notebook.
- Install the AWS boto3 library (see the sketch below).
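A minimal sketch of the install step; `%pip` installs a notebook-scoped library in Databricks, and boto3 may already be available on your cluster's runtime.

```python
# Install boto3 for this notebook session (notebook-scoped library).
%pip install boto3
```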
- Import the required libraries (see the sketch below).
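For this example only boto3 itself is needed:

```python
# Import the AWS SDK for Python.
import boto3
```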
- Access the S3 files (see the sketch after this list). Make sure to replace the following placeholders with your actual values:
  - your-file-path
  - your-bucket-name
  - ${privacera_access_key}
  - ${privacera_secret_key}
  - endpoint-url
  - your-bucket-region-name
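The following is a minimal sketch, assuming the Privacera Dataserver exposes an S3-compatible endpoint and accepts the Privacera Access Key and Secret Key as AWS credentials; all quoted values are placeholders from the list above.

```python
# Create an S3 client that routes requests through the Privacera Dataserver proxy.
s3_client = boto3.client(
    "s3",
    aws_access_key_id="${privacera_access_key}",
    aws_secret_access_key="${privacera_secret_key}",
    endpoint_url="endpoint-url",            # URL endpoint of the Privacera Dataserver
    region_name="your-bucket-region-name",
)

# Read a file; the Dataserver allows or denies the request based on Privacera policies.
response = s3_client.get_object(Bucket="your-bucket-name", Key="your-file-path")
print(response["Body"].read().decode("utf-8"))
```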
- Without read permission to the S3 path, the request fails with an access-denied error returned by the Dataserver.
- With read permission to the S3 path, the call succeeds and the file contents are returned.
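To tell a policy denial apart from other failures, you can catch the client error; this sketch assumes the denial surfaces as the standard S3 AccessDenied error code.

```python
import botocore.exceptions

try:
    response = s3_client.get_object(Bucket="your-bucket-name", Key="your-file-path")
    print(response["Body"].read().decode("utf-8"))
except botocore.exceptions.ClientError as err:
    # Assumption: policy denials surface as the standard S3 "AccessDenied" error code.
    if err.response["Error"]["Code"] == "AccessDenied":
        print("Read access to this S3 path is denied by the access-control policy.")
    else:
        raise
```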