Databricks Clusters BOTO3 Use-Case

This section describes how to use the AWS SDK (Boto3) to enforce access control on AWS S3 file data through a Privacera Dataserver proxy.

Prerequisites

  • Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.
  • Databricks should be connected to Privacera Manager.
  • Obtain the Access Key and Secret Key from Privacera Manager:
    • To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Token -> GENERATE TOKEN.
  • URL endpoint for the Privacera Manager Dataserver.

For FGAC Clusters

If your DataServer uses an external port other than 443 (e.g., port 8282), follow these steps:

  • Modify the Databricks init script:
    • Add the necessary iptables configuration to allow outgoing connections on the specified port.
    • Example: To enable boto3 access on port 8282, run the following command:
      Bash
      sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
      
  • Restart the Databricks cluster to apply the changes.

For PrivaceraCloud, the prerequisites are as follows:

  • Ensure that you have an existing Databricks account with login credentials that grant sufficient privileges to manage your Databricks cluster.
  • Databricks should be connected to PrivaceraCloud.
  • Obtain the Access Key and Secret Key from PrivaceraCloud using one of the following methods:
    • To generate a new Privacera token, navigate to the Privacera portal and go to Launch Pad -> Privacera Tokens -> GENERATE TOKEN.
    • To use a valid existing token, navigate to the Privacera portal and go to Launch Pad -> Setup AWS Cli -> DOWNLOAD TOKEN.
  • URL endpoint of the PrivaceraCloud Dataserver.
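
The init-script change described above (opening an extra outbound port with iptables) can be staged from a notebook by writing the rule into a script file. The sketch below is an illustration only: the file name `open-dataserver-port.sh` and the local path are placeholders, and on a real cluster you would upload the script as a cluster-scoped init script (for example via `dbutils.fs.put` or the workspace UI) rather than writing it locally.

Python
```python
# Sketch: generate an init script that allows outgoing connections
# on the DataServer port (8282 here; adjust to your deployment).
PORT = 8282

init_script = f"""#!/bin/bash
# Allow outgoing connections to the Privacera DataServer port.
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport {PORT} -j ACCEPT
"""

# Placeholder local path; on Databricks, store it where your
# cluster-scoped init scripts live and attach it to the cluster.
with open("open-dataserver-port.sh", "w") as f:
    f.write(init_script)

print(open("open-dataserver-port.sh").read())
```

After attaching the script, restart the cluster so the rule is applied at boot.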

Setup

Run the following commands in a Databricks notebook.

  • Install the AWS Boto3 library
    Python
    %pip install boto3
    
  • Import the required library
    Python
    import boto3
    
  • Access the S3 files
    Python
    def check_s3_file_exists(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
      # Note: dataserver_cert is accepted for signature compatibility but is not used here.
      exec_status = False
      try:
        s3 = boto3.resource(service_name='s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=endpoint_url, region_name=region_name)
        print(s3.Object(bucket_name=bucket, key=key).get()['Body'].read().decode('utf-8'))
        exec_status = True
      except Exception as e:
        print("Got error: {}".format(e))
      return exec_status
    
    def read_s3_file(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
      # Note: dataserver_cert is accepted for signature compatibility but is not used here.
      exec_status = False
      try:
        s3 = boto3.client(service_name='s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=endpoint_url, region_name=region_name)
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(obj['Body'].read().decode('utf-8'))
        exec_status = True
      except Exception as e:
        print("Got error: {}".format(e))
      return exec_status
    
    readFilePath = "your-file-path"
    bucket = "your-bucket-name"
    access_key = "${privacera_access_key}"
    secret_key = "${privacera_secret_key}"
    endpoint_url = "endpoint-url"
    dataserver_cert = ""
    region_name = "your-bucket-region-name"
    print(f"got file===== {readFilePath} ============= bucket= {bucket}")
    status = check_s3_file_exists(bucket, readFilePath, access_key, secret_key, endpoint_url, dataserver_cert, region_name)
    
Make sure to replace the following placeholders with your actual values:
  • your-file-path
  • your-bucket-name
  • ${privacera_access_key}
  • ${privacera_secret_key}
  • endpoint-url
  • your-bucket-region-name
Validation

  • Without read permission on the S3 path, the call fails with an error like the following:
      Bash
      Got error: An error occurred (403) when calling the GetObject operation: Forbidden
      
  • With read permission on the S3 path, the file contents are printed:
      Bash
      got file===== your-file-path ============= bucket= your-bucket-name
      file content
      
