Using Boto3 from Databricks Cluster with FGAC

Databricks Cluster Boto3 Use Case

This section describes how to use the AWS SDK for Python (Boto3) to access S3 file data with access control enforced through the Privacera Dataserver proxy.

Prerequisites

Before proceeding with the following steps, check with your Databricks administrator for the Privacera Dataserver port.

  • Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.

For Privacera Manager (self-managed):

  • Databricks should be connected to Privacera Manager.
  • Obtain the Access Key and Secret Key from Privacera Manager: to generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Token -> GENERATE TOKEN. (The sketch after this list shows one way to keep these credentials out of notebook code.)
  • URL endpoint of the Privacera Manager Dataserver.

For PrivaceraCloud:

  • Databricks should be connected to PrivaceraCloud.
  • Obtain the Access Key and Secret Key from PrivaceraCloud using one of the following methods:
    • To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Tokens -> GENERATE TOKEN.
    • To use a valid existing token, navigate to the Privacera portal and go to Launch Pad -> Setup AWS Cli -> DOWNLOAD TOKEN.
  • URL endpoint of the PrivaceraCloud Dataserver.
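
Rather than hard-coding the Access Key and Secret Key in a notebook, you can store them in a Databricks secret scope and read them with dbutils.secrets. This is a minimal sketch; the scope name "privacera" and the key names are illustrative assumptions, not values created for you:

Python
# Assumes a secret scope named "privacera" was created beforehand
# (e.g. with the Databricks CLI) and holds these two secrets.
access_key = dbutils.secrets.get(scope="privacera", key="privacera_access_key")
secret_key = dbutils.secrets.get(scope="privacera", key="privacera_secret_key")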

Setup

Follow the steps recommended by Databricks to install the Boto3 library in your Databricks cluster.

Here are the steps for your reference. In a Databricks notebook cell, run:
Python
%pip install boto3
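
To confirm that Boto3 is available to the notebook session, you can import it and print its version:

Python
import boto3

# Quick sanity check that the library is installed and importable
print(boto3.__version__)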

In your Databricks notebook, you can use the following code to access S3 files using Boto3:

Import the required libraries

Python
import boto3

Access the S3 files

Make sure to replace the following placeholders with your actual values.
Text Only
- `ACCESS_KEY`
- `SECRET_KEY`
- `PRIVACERA_DATASERVER_ENDPOINT_URL`
- `YOUR_BUCKET_REGION_NAME`
- `BUCKET_NAME`
- `OBJECT_PATH`

Make sure the bucket name and file path are correct and that the file exists.

Python
# Update the following values with your actual values
access_key = "<ACCESS_KEY>"
secret_key = "<SECRET_KEY>"
endpoint_url = "<PRIVACERA_DATASERVER_ENDPOINT_URL>"
region_name = "<YOUR_BUCKET_REGION_NAME>"  # e.g. us-east-1
dataserver_cert = ""  # Dataserver certificate path, if TLS verification is required (unused below)

bucket = "<BUCKET_NAME>"  # e.g. your_org_privacera
readFilePath = "<OBJECT_PATH>"  # e.g. privacera/test.csv

def check_s3_file_exists(bucket, key):
    """Read the object via the S3 resource API; returns True on success."""
    exec_status = False
    try:
        # Point the S3 resource at the Privacera Dataserver proxy endpoint
        s3 = boto3.resource(service_name='s3', aws_access_key_id=access_key,
                            aws_secret_access_key=secret_key,
                            endpoint_url=endpoint_url, region_name=region_name)
        print(s3.Object(bucket_name=bucket, key=key).get()['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

def read_s3_file(bucket, key):
    """Read the object via the lower-level S3 client API; returns True on success."""
    exec_status = False
    try:
        # Point the S3 client at the Privacera Dataserver proxy endpoint
        s3 = boto3.client(service_name='s3', aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          endpoint_url=endpoint_url, region_name=region_name)
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(obj['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

print(f"Accessing file===== {readFilePath} ============= bucket= {bucket}")
exists_status = check_s3_file_exists(bucket, readFilePath)
read_status = read_s3_file(bucket, readFilePath)

print(f"check_s3_file_exists status: {exists_status}, read_s3_file status: {read_status}")

Output

  • Without read permission on the S3 path, you get the following output (a sketch after these examples shows how to tell an access denial apart from a missing object):

    Bash
    Accessing file===== your-file-path ============= bucket= your-bucket-name
    check_s3_file_exists status: False, read_s3_file status: False
    

  • With appropriate read permission on the S3 path, the functions print the file contents followed by the following status output:

    Bash
    Accessing file===== your-file-path ============= bucket= your-bucket-name
    check_s3_file_exists status: True, read_s3_file status: True
    
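Because the examples above catch every Exception, an access denial and a missing object produce the same False status. If you need to tell them apart, you can catch botocore's ClientError and inspect its error code. This is a sketch, assuming the Dataserver returns standard S3 error codes such as AccessDenied and NoSuchKey:

Python
from botocore.exceptions import ClientError

s3 = boto3.client(service_name='s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key,
                  endpoint_url=endpoint_url, region_name=region_name)
try:
    s3.get_object(Bucket=bucket, Key=readFilePath)
    print("Read allowed")
except ClientError as e:
    # The S3-style error code distinguishes a policy denial from a missing key
    code = e.response["Error"]["Code"]
    if code == "AccessDenied":
        print("Access denied by policy")
    elif code == "NoSuchKey":
        print("Object does not exist")
    else:
        raise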
