Privacera Documentation

Secure Azure file via Azure SDK in Databricks notebook

This section describes how to use the Azure SDK with PrivaceraCloud to access file data in Azure Storage or Azure Data Lake through the Privacera DataServer proxy.

The following commands must be run in a Databricks notebook.

  1. Install the Azure SDK libraries:

    %pip install azure-storage-file-datalake
  2. Import the required libraries:

    import os, uuid, sys
    from azure.storage.filedatalake import DataLakeServiceClient
    from azure.core import MatchConditions
    from azure.storage.filedatalake import ContentSettings
  3. Initialize the storage account client using the connection string method:

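    def initialize_storage_account_connect_str(my_connection_string):
        # service_client is a module-level global that the later steps reuse.
        global service_client
        try:
            # The connection string contains credentials, so it is not printed.
            # The header sets the storage REST API version sent with each request.
            service_client = DataLakeServiceClient.from_connection_string(
                conn_str=my_connection_string,
                headers={'x-ms-version': '2020-02-10'})
        except Exception as e:
            print(e)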
  4. Prepare the connection string:

    def prepare_connect_str():
        try:
            # Replace the placeholders with your own values. AccountKey is the
            # base64-encoded value of privacera_access_key|privacera_secret_key.
            connect_str = "DefaultEndpointsProtocol=https;AccountName=${privacera_access_key}-{storage_account_name};AccountKey=${base64_encoded_value_of(privacera_access_key|privacera_secret_key)};BlobEndpoint=https://ds.privaceracloud.com;"

            # Sample value:
            #connect_str = "DefaultEndpointsProtocol=https;AccountName=MMTTU5Njg4Njk0MDAwA6amFpLnBhdGVsOjE6MTY1MTU5Njg4Njk0MDAw==-pqadatastorage;AccountKey=TVRVNUTU5Njg4Njk0MDAwTURBd01UQTZhbUZwTG5CaGRHVnNPakU2TVRZMU1URTJOVGcyTnpVMTU5Njg4Njk0MDAwVZwLzNFbXBCVEZOQWpkRUNxNmpYcjTU5Njg4Njk0MDAwR3Q4N29UNFFmZWpMOTlBN1M4RkIrSjdzSE5IMFZic0phUUcyVHTU5Njg4Njk0MDAwUxnPT0=;BlobEndpoint=https://ds.privaceracloud.com;"

            return connect_str
        except Exception as e:
            print(e)
  5. Define a sample access method that lists Azure files and directories:

    def list_directory_contents(connect_str):
        try:
            initialize_storage_account_connect_str(connect_str)

            file_system_client = service_client.get_file_system_client(file_system="{storage_container_name}")
            # Sample value:
            #file_system_client = service_client.get_file_system_client(file_system="infraqa-test")

            paths = file_system_client.get_paths(path="{directory_path}")
            # Sample value:
            #paths = file_system_client.get_paths(path="file data/data/format=csv/sample/")

            # Print the name of each file and directory under the path.
            for path in paths:
                print(path.name + '\n')

        except Exception as e:
            print(e)
  6. To verify that the proxy is functioning, call the access methods:

    connect_str = prepare_connect_str()
    list_directory_contents(connect_str)
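
Rather than editing the connection string by hand, the format from step 4 can be assembled in code. The following is a minimal sketch, not part of the Privacera SDK: the helper name `build_privacera_connect_str` and its parameters are hypothetical, and it assumes that the `AccountKey` placeholder means the base64 encoding of the access key and secret key joined by a literal `|` character.

```python
import base64


def build_privacera_connect_str(access_key, secret_key, storage_account_name,
                                endpoint="https://ds.privaceracloud.com"):
    """Assemble a connection string in the format shown in step 4.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Assumption: AccountKey is base64("<access_key>|<secret_key>").
    raw_key = f"{access_key}|{secret_key}".encode("utf-8")
    account_key = base64.b64encode(raw_key).decode("ascii")

    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={access_key}-{storage_account_name};"
        f"AccountKey={account_key};"
        f"BlobEndpoint={endpoint};"
    )
```

The returned string can be passed to `list_directory_contents` in place of the value from `prepare_connect_str()`. Keeping the access key and secret key out of the notebook source, for example by reading them from a secret store, avoids exposing them to other notebook users.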