Discovery on Databricks
This topic covers the installation of Privacera Discovery on Databricks.
Configuration
- SSH to the instance as USER.
- Run the following commands.
    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.databricks.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.databricks.yml
- If the Databricks plugin is not enabled, add the following properties to config/custom-vars/vars.discovery.databricks.yml and provide their values. To configure the Databricks plugin, see Configuration in Databricks Spark Fine-Grained Access Control Plugin (FGAC) (Python, SQL).
    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    DATABRICKS_WORKSPACES_LIST:
      - alias: DEFAULT
        databricks_host_url: "{{DATABRICKS_HOST_URL}}"
        token: "{{DATABRICKS_TOKEN}}"
- Edit the following properties. For a description of each property, see Configuration Properties below.
  AWS:
    DATABRICKS_DRIVER_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    DATABRICKS_DISCOVERY_INSTANCE_PROFILE: "arn:aws:iam::<ACCOUNT_ID>:instance-profile/<DATABRICKS_CLUSTER_IAM_ROLE>"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE: "true"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN: "arn:aws:iam::<ACCOUNT_ID>:role/<DISCOVERY_IAM_ROLE>"
  Azure:
    DATABRICKS_DRIVER_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
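Once the file has been edited, it can be useful to confirm that no angle-bracket placeholder such as <PLEASE_UPDATE> or <ACCOUNT_ID> was left behind. The following is a minimal sketch, not part of Privacera Manager; the function name check_placeholders is hypothetical, and it assumes placeholders always take the form of an upper-case token inside angle brackets.

```shell
#!/usr/bin/env bash
# Sketch only: check_placeholders is a hypothetical helper, not a Privacera tool.
# Prints every line (with its line number) that still contains a placeholder
# such as <PLEASE_UPDATE> or <ACCOUNT_ID>.
# Returns 0 if the file is clean, 1 if placeholders remain, 2 if unreadable.
check_placeholders() {
  local vars_file="$1"
  if [ ! -r "$vars_file" ]; then
    echo "cannot read: $vars_file" >&2
    return 2
  fi
  # Jinja-style references like {{DATABRICKS_TOKEN}} do not match this pattern.
  if grep -nE '<[A-Z][A-Z0-9_]*>' "$vars_file"; then
    return 1
  fi
  return 0
}

# Usage, from ~/privacera/privacera-manager:
#   check_placeholders config/custom-vars/vars.discovery.databricks.yml
```

A non-zero exit status means the file still needs attention before it is used.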
Note
PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL is no longer in use. The Discovery Databricks packages will be downloaded from PRIVACERA_BASE_DOWNLOAD_URL.
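The values supplied for DATABRICKS_HOST_URL and DATABRICKS_TOKEN can be sanity-checked before they are relied on. The sketch below assumes curl is installed and that the workspace exposes the Databricks REST API Clusters list endpoint (/api/2.0/clusters/list); the function name databricks_check_token is hypothetical, and the check is independent of Privacera Manager.

```shell
#!/usr/bin/env bash
# Sketch only: databricks_check_token is a hypothetical helper.
# Sends one authenticated GET request to the Clusters list endpoint and
# prints the HTTP status code, discarding the response body.
databricks_check_token() {
  local host_url="${1%/}"   # drop one trailing slash, if present
  local token="$2"
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${token}" \
    "${host_url}/api/2.0/clusters/list"
}

# Usage (substitute the values from vars.discovery.databricks.yml):
#   databricks_check_token "<DATABRICKS_HOST_URL>" "<DATABRICKS_TOKEN>"
```

A status of 200 suggests the host URL and token are accepted; a 401 or 403 suggests the token was rejected or lacks permission.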
Configuration Properties
| Property | Description | Example |
|---|---|---|
| DATABRICKS_DRIVER_INSTANCE_TYPE | Instance type of the driver. On AWS, it can be "m5.xlarge" or "m5.2xlarge". On Azure, it can be "Standard_DS3_v2". | m5.xlarge |
| DATABRICKS_INSTANCE_TYPE | Instance type. On AWS, it can be "m5.xlarge" or "m5.2xlarge". On Azure, it can be "Standard_DS3_v2". | m5.xlarge |
| SETUP_DATABRICKS_JAR | | |
| USE_DATABRICKS_SPARK | | |
| DATABRICKS_ELASTIC_DISK | | |
| DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT | Set to true to create the Databricks init script. | false |
| DATABRICKS_DISCOVERY_WORKERS | | |
| DATABRICKS_DISCOVERY_JOB_NAME | | |
| DATABRICKS_DISCOVERY_SPARK_VERSION | Spark version to use. | 7.3.x-scala2.12 |
| DATABRICKS_DISCOVERY_INSTANCE_PROFILE | Instance profile (instance role) for the Databricks instance node on which Discovery runs. | arn:aws:iam::1234564835:instance-profile/privacera_databricks_cluster_iam_role |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE | Grants Discovery access to AWS services so that it can perform the scanning operation. | true |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN | ARN of the AWS IAM role. | arn:aws:iam::12345671758:role/DiscoveryCrossAccAssumeRole_k |
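The two ARN properties follow different formats: DATABRICKS_DISCOVERY_INSTANCE_PROFILE expects an instance-profile ARN, while DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN expects a role ARN. A rough format check might look like the sketch below. The function name is_iam_arn is hypothetical, and the check is deliberately loose: it accepts any run of digits as the account ID (real AWS account IDs have 12 digits) and does not confirm that the resource exists.

```shell
#!/usr/bin/env bash
# Sketch only: is_iam_arn is a hypothetical helper.
# Returns 0 if the value looks like an IAM ARN of the given resource type
# ("instance-profile" or "role"), and 1 otherwise.
is_iam_arn() {
  local resource_type="$1"
  local value="$2"
  printf '%s\n' "$value" |
    grep -Eq "^arn:aws:iam::[0-9]+:${resource_type}/[A-Za-z0-9+=,.@_/-]+\$"
}

# Usage:
#   is_iam_arn instance-profile "$DATABRICKS_DISCOVERY_INSTANCE_PROFILE"
#   is_iam_arn role "$DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN"
```

Swapping the two values, for example passing a role ARN as the instance profile, makes the check fail, which is the mistake this sketch is meant to catch.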