Using Boto3 from Databricks Cluster with FGAC

Databricks Cluster Boto3 Use Case

This section describes how to use the AWS SDK for Python (Boto3) to access S3 file data with access control enforced through the Privacera Dataserver proxy.

Prerequisites

Before proceeding with the following steps, check with your Databricks administrator for the Privacera Dataserver port.

  • Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.

For Privacera Manager (self-managed):

  • Databricks should be connected to Privacera Manager.
  • Obtain the Access Key and Secret Key from Privacera Manager: to generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Token -> GENERATE TOKEN. (The sketch after this list shows one way to keep these credentials out of notebook code.)
  • URL endpoint of the Privacera Manager Dataserver.

For PrivaceraCloud:

  • Databricks should be connected to PrivaceraCloud.
  • Obtain the Access Key and Secret Key from PrivaceraCloud using one of the following methods:
    • To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Tokens -> GENERATE TOKEN.
    • To use a valid existing token, navigate to the Privacera portal and go to Launch Pad -> Setup AWS Cli -> DOWNLOAD TOKEN.
  • URL endpoint of the PrivaceraCloud Dataserver.
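
Rather than hard-coding the Access Key and Secret Key in a notebook, you can store them in a Databricks secret scope and read them with dbutils.secrets. This is a minimal sketch; the scope name "privacera" and the key names are illustrative assumptions, not values created for you:

Python
# Assumes a secret scope named "privacera" was created beforehand
# (e.g. with the Databricks CLI) and holds these two secrets.
access_key = dbutils.secrets.get(scope="privacera", key="privacera_access_key")
secret_key = dbutils.secrets.get(scope="privacera", key="privacera_secret_key")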

Setup

Follow the steps recommended by Databricks to install the Boto3 library in your Databricks cluster.

Here are the steps for your reference. In a Databricks notebook cell, run:
Python
%pip install boto3
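
To confirm that Boto3 is available to the notebook session, you can import it and print its version:

Python
import boto3

# Quick sanity check that the library is installed and importable
print(boto3.__version__)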

In your Databricks notebook, you can use the following code to access S3 files using Boto3:

Import the required libraries

Python
import boto3

Access the S3 files

Make sure to replace the following placeholders with your actual values.
Text Only
- `ACCESS_KEY`
- `SECRET_KEY`
- `PRIVACERA_DATASERVER_ENDPOINT_URL`
- `YOUR_BUCKET_REGION_NAME`
- `BUCKET_NAME`
- `OBJECT_PATH`

Make sure the bucket name and file path are correct and that the file exists.

Python
# Update the following values with your actual values
access_key = "<ACCESS_KEY>"
secret_key = "<SECRET_KEY>"
endpoint_url = "<PRIVACERA_DATASERVER_ENDPOINT_URL>"
region_name = "<YOUR_BUCKET_REGION_NAME>"  # e.g. us-east-1
dataserver_cert = ""  # Dataserver certificate path, if TLS verification is required (unused below)

bucket = "<BUCKET_NAME>"  # e.g. your_org_privacera
readFilePath = "<OBJECT_PATH>"  # e.g. privacera/test.csv

def check_s3_file_exists(bucket, key):
    """Read the object via the S3 resource API; returns True on success."""
    exec_status = False
    try:
        # Point the S3 resource at the Privacera Dataserver proxy endpoint
        s3 = boto3.resource(service_name='s3', aws_access_key_id=access_key,
                            aws_secret_access_key=secret_key,
                            endpoint_url=endpoint_url, region_name=region_name)
        print(s3.Object(bucket_name=bucket, key=key).get()['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

def read_s3_file(bucket, key):
    """Read the object via the lower-level S3 client API; returns True on success."""
    exec_status = False
    try:
        # Point the S3 client at the Privacera Dataserver proxy endpoint
        s3 = boto3.client(service_name='s3', aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          endpoint_url=endpoint_url, region_name=region_name)
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(obj['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

print(f"Accessing file===== {readFilePath} ============= bucket= {bucket}")
exists_status = check_s3_file_exists(bucket, readFilePath)
read_status = read_s3_file(bucket, readFilePath)

print(f"check_s3_file_exists status: {exists_status}, read_s3_file status: {read_status}")

Output

  • Without read permission on the S3 path, you get the following output (a sketch after these examples shows how to tell an access denial apart from a missing object):

    Bash
    Accessing file===== your-file-path ============= bucket= your-bucket-name
    check_s3_file_exists status: False, read_s3_file status: False
    

  • With appropriate read permission on the S3 path, the functions print the file contents followed by the following status output:

    Bash
    Accessing file===== your-file-path ============= bucket= your-bucket-name
    check_s3_file_exists status: True, read_s3_file status: True
    
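Because the examples above catch every Exception, an access denial and a missing object produce the same False status. If you need to tell them apart, you can catch botocore's ClientError and inspect its error code. This is a sketch, assuming the Dataserver returns standard S3 error codes such as AccessDenied and NoSuchKey:

Python
from botocore.exceptions import ClientError

s3 = boto3.client(service_name='s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key,
                  endpoint_url=endpoint_url, region_name=region_name)
try:
    s3.get_object(Bucket=bucket, Key=readFilePath)
    print("Read allowed")
except ClientError as e:
    # The S3-style error code distinguishes a policy denial from a missing key
    code = e.response["Error"]["Code"]
    if code == "AccessDenied":
        print("Access denied by policy")
    elif code == "NoSuchKey":
        print("Object does not exist")
    else:
        raise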
