Privacera Documentation

Other Boto3/Pandas examples to secure S3 in Databricks notebook with PrivaceraCloud

This section presents the following example programs, which use the AWS SDK for Python (Boto3) and the Python Data Analysis Library (Pandas) with PrivaceraCloud to secure S3 file data:

  • Read an S3 file

  • Read an S3 file and write it to another location

These examples are intended to be run in a Databricks notebook.

Prerequisites

Make sure you have the following ready:

  • Your Databricks datasource has been connected to PrivaceraCloud. See Connect Databricks to PrivaceraCloud.

  • Your PrivaceraCloud API key and URL endpoint for use in the programs. See API Key on PrivaceraCloud. In the programs, the key values are shown as ${privacera_access_key} and ${privacera_secret_key}, and the endpoint is shown as ${privacera_endpoint_url}.

  • At least one Privacera resource policy that you want to associate with the S3 path and enforce via this program. For a description, see Configure AWS S3 resource policies.

  • Two users: one whose policy allows access to the S3 data and one who does not have that policy. The examples show the result for each.

  • The name of the AWS region where your data is. In the programs, this is shown as ${aws_region}.

  • The path and name of a data file on S3 whose access you want to control. In the example programs, this file is called sample.csv.
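The placeholder values listed above can be gathered in one place and checked before any request is made. The following is a minimal sketch, not part of PrivaceraCloud or Boto3: the helper name privacera_s3_kwargs and the placeholder check are assumptions made for illustration. It only builds the keyword arguments that the example programs in this section pass to boto3.client or boto3.resource.

```python
import re

# Matches an unsubstituted template placeholder such as ${aws_region}.
_PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def privacera_s3_kwargs(access_key, secret_key, endpoint_url, region_name):
    """Build keyword arguments for boto3.client(...) or boto3.resource(...).

    Hypothetical helper: raises ValueError if any value is empty or still
    contains a ${...} placeholder that was never filled in.
    """
    kwargs = {
        "service_name": "s3",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "endpoint_url": endpoint_url,
        "region_name": region_name,
    }
    for name, value in kwargs.items():
        # Report only the argument name so that key material is never printed.
        if not value or _PLACEHOLDER.search(value):
            raise ValueError(f"{name} is empty or still a placeholder")
    return kwargs
```

In a notebook cell, the result could be unpacked into the constructor, for example boto3.client(**privacera_s3_kwargs(access_key, secret_key, endpoint_url, region_name)).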

Create and run programs in a Databricks notebook

Install or upgrade the AWS Boto3 library. In a Databricks notebook cell, use the %pip magic command:

%pip install --upgrade boto3

Read the S3 file using Boto3

import boto3

access_key = "${privacera_access_key}"
secret_key = "${privacera_secret_key}"

endpoint_url = "${privacera_endpoint_url}"
region_name = "${aws_region}"

# Create an S3 client that sends requests to the Privacera endpoint
s3 = boto3.client(
    service_name='s3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url=endpoint_url,
    region_name=region_name,
)

bucket = "infraqa-test-bubble"
key = "file data/data/format=csv/sample/sample.csv"

obj = s3.get_object(Bucket=bucket, Key=key)

print(obj['Body'].read().decode('utf-8'))

A user whose Privacera policy allows access successfully sees the desired data, as in this example:

EMP_SSN,CC,FIRST_NAME,LAST_NAME,ADDRESS,ZIPCODE,EMAIL,US_PHONE_FORMATTED
749-51-0571,341675287917956,Travis,Zyllah,939 Park Avenue,85297,Aliss.HENDRICK9807@gmail.com,(484) 363-0385
214-99-5552,372017749988022,CHRYSTTOPHER,zylman,8380 Andell
.
.
.

A user who does not have a Privacera policy for access sees a failure message similar to the following. Notice HTTP status code 403 Forbidden.

ClientError: An error occurred (403) when calling the GetObject operation: Forbidden
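A notebook can also detect the denial in code instead of letting the exception end the cell. The following is a minimal sketch: the helper name denial_reason is hypothetical, and the sample dict is an illustrative stand-in shaped like the failure shown above. It relies only on the response dict that botocore attaches to a ClientError as its response attribute.

```python
def denial_reason(error_response):
    """Return a short message if the response is an HTTP 403, else None.

    `error_response` is the dict that botocore attaches to a ClientError
    as its `.response` attribute.
    """
    status = error_response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    if status != 403:
        return None
    error = error_response.get("Error", {})
    return f"access denied ({error.get('Code')}): {error.get('Message')}"


# Illustrative dict shaped like the failure shown above; real responses
# carry more fields.
sample = {
    "Error": {"Code": "403", "Message": "Forbidden"},
    "ResponseMetadata": {"HTTPStatusCode": 403},
}
print(denial_reason(sample))  # access denied (403): Forbidden
```

In the read example, the call would sit in an except botocore.exceptions.ClientError as err: handler around s3.get_object, passing err.response and re-raising the exception when the helper returns None.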

Read an S3 file with Pandas

import boto3
import pandas as pd

access_key = "${privacera_access_key}"
secret_key = "${privacera_secret_key}"

endpoint_url = "${privacera_endpoint_url}"
region_name = "${aws_region}"

# Create an S3 client that sends requests to the Privacera endpoint
s3 = boto3.client(
    service_name='s3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url=endpoint_url,
    region_name=region_name,
)

bucket = "infraqa-test-bubble"
key = "file data/data/format=csv/sample/sample.csv"

# Fetch the object and parse the streamed CSV body into a DataFrame
response = s3.get_object(Bucket=bucket, Key=key)
emp_df = pd.read_csv(response['Body'])

# Show the first two rows
print(emp_df.head(2))

Write a copy of a file to a different path

import boto3
from io import StringIO

access_key = "${privacera_access_key}"
secret_key = "${privacera_secret_key}"

endpoint_url = "${privacera_endpoint_url}"
region_name = "${aws_region}"

# Create an S3 resource that sends requests to the Privacera endpoint
s3 = boto3.resource(
    service_name='s3',
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url=endpoint_url,
    region_name=region_name,
)

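# emp_df is the DataFrame created in the previous example.
# Serialize it to an in-memory buffer as pipe-delimited text without the index.
csv_buffer = StringIO()
emp_df.to_csv(csv_buffer, sep="|", index=False)

# Write the buffer contents to the target S3 object
s3.Object("infraqa-test-bubble", "file data/output/format=csv/sample/sales_data/out/sample.csv").put(Body=csv_buffer.getvalue())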

A user whose Privacera policy allows access sees a success response similar to the following. Notice HTTP status code 200 OK.

Out[6]: {'ResponseMetadata': {'RequestId': 'H40YEBKWX0QSWRC0',
  'HostId': 'Vjt46vIFn1wP4wYzKC5XveBmJKO+g1/y91T17iZdJrBwwJYyyFje04+cPhtNcB1sOmTDATDzJ3o=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Tue, 31 Jan 2023 18:24:18 GMT',
  .
  .
  .
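The status code can also be checked in code rather than by reading the cell output. The following is a minimal sketch, assuming the value returned by put() is kept in a variable; the helper name put_succeeded is hypothetical.

```python
def put_succeeded(response):
    """Return True if a put() response dict reports HTTP status 200."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200


# Trimmed copy of the output shown above.
response = {
    "ResponseMetadata": {
        "RequestId": "H40YEBKWX0QSWRC0",
        "HTTPStatusCode": 200,
    }
}
print(put_succeeded(response))  # True
```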

A user who does not have a Privacera policy for access sees a failure message similar to the following. Notice HTTP status code 403 Forbidden.

ClientError: An error occurred (403) when calling the PutObject operation: Forbidden

Audit records of access success or failure

Privacera creates an audit record for each successful or failed attempt to access the files in these example programs. See Examples of audit search.