
Privacera Platform

Appendix: AWS topics

AWS CLI
Enable AWS CLI
  1. In the Privacera Portal, click LaunchPad from the left menu.

  2. Under the AWS Services section, click the AWS CLI icon to open the AWS CLI dialog. This dialog lets you download an AWS CLI setup script specific to your installation and provides a set of usage instructions.

  3. In AWS CLI, under Configure Script, click Download Script to save the script on your local machine. If you will be running the AWS CLI on another system, such as a 'jump server', copy the script to that host.

    1. Alternatively, use 'wget' to pull this script down to your execution platform, as shown below. Substitute your installation's Privacera Platform host domain name or IPv4 address for "<PRIVACERA_PORTAL_HOST>".

      wget http://<PRIVACERA_PORTAL_HOST>:6868/api/cam/download/script -O privacera_aws.sh
      # USE THE "--no-check-certificate" option for HTTPS - and remove the # below
      # wget --no-check-certificate https://<PRIVACERA_PORTAL_HOST>:6868/api/cam/download/script -O privacera_aws.sh
      
    2. Copy the downloaded script to your home directory.

      cp privacera_aws.sh ~/
      cd ~/
      
    3. Set this file to be executable:

      chmod a+x ~/privacera_aws.sh
      
  4. Under the AWS CLI Generate Token section, generate a platform token.

    Note

    All the commands should be run with a space between the dot (.) and the script name (~/privacera_aws.sh); the dot sources the script so that it can set environment variables in your current shell.

    1. Run the following command:

      . ~/privacera_aws.sh --config-token
      
    2. Select/check Never Expired to generate a token that does not expire. Click Generate.

  5. Enable either the proxy or the endpoint by running one of the two commands shown below.

    . ~/privacera_aws.sh --enable-proxy
    

    or:

    . ~/privacera_aws.sh --enable-endpoint
    
  6. Under the Check Status section, run the command below.

    . ~/privacera_aws.sh --status
    
  7. To disable both the proxy and the endpoint, under the AWS Access section, run the commands shown below.

    . ~/privacera_aws.sh --disable-proxy
    . ~/privacera_aws.sh --disable-endpoint
    
AWS CLI Examples

Get Databases

aws glue get-databases --region ca-central-1
aws glue get-databases --region us-west-2

Get Catalog Import Status

aws glue get-catalog-import-status --region us-west-2

Create Database

aws glue create-database --cli-input-json '{"DatabaseInput":{"CreateTableDefaultPermissions": [{"Permissions": ["ALL"],"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}}],"Name":"qa_test","LocationUri": "s3://daffodil-us-west-2/privacera/hive_warehouse/qa_test.db"}}' --region us-west-2 --output json

Create Table

aws glue create-table --database-name qa_test --table-input file://tb1.json --region us-west-2

Create the tb1.json file in the directory from which the create-table command will be executed. Sample JSON file:

{
    "Name": "tb1",
    "Retention": 0,
    "StorageDescriptor": {
        "Columns": [
            {
                "Name": "CC",
                "Type": "string"
            },
            {
                "Name": "FST_NM",
                "Type": "string"
            },
            {
                "Name": "LST_NM",
                "Type": "string"
            },
            {
                "Name": "SOC_SEC_NBR",
                "Type": "string"
            }
        ],
        "Location": "s3://daffodil-us-west-2/data/sample_parquet/index.html",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "Compressed": false,
        "NumberOfBuckets": 0,
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "Parameters": {
                "serialization.format": "1"
            }
        },
        "SortColumns": [],
        "StoredAsSubDirectories": false
    },
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
        "classification": "parquet"
    }
}
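
After creating the table, you can verify it with get-table (the database and table names below match the examples above):

```shell
# Confirm the table definition is registered in the Glue catalog
aws glue get-table --database-name qa_test --name tb1 --region us-west-2
```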

Delete Table

aws glue delete-table --database-name qa_db --name test --region us-west-2
aws glue delete-table --database-name qa_db --name test --region us-east-1
aws glue delete-table --database-name qa_db --name test --region ca-central-1
aws glue delete-table --database-name qa_db --name test --region ap-south-1

Delete Database

aws glue delete-database --name qa\_test --region us-west-2
aws glue delete-database --name qa\_test --region us-east-1
aws glue delete-database --name qa\_test --region ap-south-1
aws glue delete-database --name qa\_test --region ca-central-1
AWS Kinesis - CLI Examples

Create Stream:

aws kinesis create-stream --stream-name SalesDataStream --shard-count 1 --region us-west-2

Put Record:

aws kinesis put-records --stream-name SalesDataStream --records Data=name,PartitionKey=partitionkey1 Data=sales_amount,PartitionKey=partitionkey2 --region us-west-2

Read Record:

aws kinesis list-shards --stream-name SalesDataStream --region us-west-2
#Copy Shard id from above command output.
aws kinesis get-shard-iterator --stream-name SalesDataStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --region us-west-2

#Copy Shard Iterator from above command output.

aws kinesis get-records --shard-iterator AAAAAAAAAAG13t9nwsYft2p0IDF8qJOVh/Dc69RXm5v+QEqK4AW0CUlu7YmFChiV5YtyMzqFvourqhgHdANPxa7rjduAiIOUUwgaBNjJuc67SYeqZQLMgLosfQBiF6BeRQ+WNzRkssCZJx7j3/W53kpH70GJZym+Qf73bvepFWpmflYCAlRuFUjpJ/soWUmO+2Q/R1rJCdFuyl3YvGYJYmBnuzzfDoR6cnPLI0sjycI3lDJnlzrC+A==

#Copy the Data field from the above command output.

#The Data is base64-encoded; decode it with the command below.

echo <data> | base64 --decode
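
The read sequence above can also be scripted end to end. A minimal sketch, assuming the SalesDataStream created earlier and using the AWS CLI --query option to extract fields:

```shell
STREAM=SalesDataStream
REGION=us-west-2

# Get the first shard id of the stream
SHARD_ID=$(aws kinesis list-shards --stream-name "$STREAM" --region "$REGION" \
  --query 'Shards[0].ShardId' --output text)

# Get a TRIM_HORIZON iterator for that shard
ITERATOR=$(aws kinesis get-shard-iterator --stream-name "$STREAM" --shard-id "$SHARD_ID" \
  --shard-iterator-type TRIM_HORIZON --region "$REGION" \
  --query 'ShardIterator' --output text)

# Fetch records and decode the base64-encoded data of the first one
aws kinesis get-records --shard-iterator "$ITERATOR" --region "$REGION" \
  --query 'Records[0].Data' --output text | base64 --decode
```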

Kinesis Firehose

Create Delivery Stream:

aws firehose create-delivery-stream --delivery-stream-name SalesDeliveryStream --delivery-stream-type DirectPut --extended-s3-destination-configuration "BucketARN=arn:aws:s3:::daffodil-data,RoleARN=arn:aws:iam::857494200836:role/privacera_user_role" --region us-west-2

Put Record:

aws firehose put-record --delivery-stream-name SalesDeliveryStream --record="{\"Data\":\"Sales_amount\"}" --region us-west-2

Describe Delivery Stream:

aws firehose describe-delivery-stream --delivery-stream-name SalesDeliveryStream --region us-west-2
AWS DynamoDB CLI examples

create-table

aws dynamodb create-table \
  --attribute-definitions AttributeName=id,AttributeType=N AttributeName=country,AttributeType=S \
  --table-name SalesData --key-schema AttributeName=id,KeyType=HASH AttributeName=country,KeyType=RANGE \
  --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 --region us-west-2 \
  --output json
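
Table creation is asynchronous; before putting items, you can wait for the table to become ACTIVE:

```shell
# Block until the table exists, then print its status
aws dynamodb wait table-exists --table-name SalesData --region us-west-2
aws dynamodb describe-table --table-name SalesData --region us-west-2 \
  --query 'Table.TableStatus' --output text
```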

put-item

aws dynamodb put-item --table-name SalesData \
  --item '{"id": {"N": "3"},"country": {"S": "UK"},
  "region": {"S": "EU"},"city": {"S": "Rogerville"},"name": {"S": "Nigel"},
  "sales_amount": {"S": "87567.74"}}' \
  --region us-west-2

scan

aws dynamodb scan --table-name SalesData --region us-west-2
AWS IAM

When running in AWS, the Privacera Manager host virtual machine requires privileges to complete the deployment of Privacera Platform components. Once installed, Privacera Platform components also require privileges to execute and to access the targeted data repositories. The specific access required depends on the functions requested and the scope of data coverage requested.

AWS uses a policy/role/object paradigm known as AWS Identity and Access Management (IAM) to assign and manage access and functionality rights. Roles and policies are both IAM objects, but they are created and managed somewhat independently of each other. Access rights are defined in one or more policies. Policies are then attached to roles. A role may be attached to a user account or an instance; when attached to an instance, the role is known as an instance profile.

Policies may be created using the AWS console or the AWS command line. They can be represented, stored, imported, or exported in JSON format. This document contains a library of policies. In this guide, to create a recommended policy, you will select a policy from the Privacera Manager policy library, import or copy it into the console, modify it to meet your specific enterprise requirements, and save it as a named policy.

In a subsequent step, you will attach one or more of these policies to a Role, and then to the Privacera Manager host.
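
The console procedures in the following sections can also be scripted with the AWS CLI. A minimal sketch, assuming a hypothetical local policy file privacera_s3_all_policy.json and the example policy and role names used later in this guide:

```shell
# Create the policy from a local JSON document (hypothetical file name)
aws iam create-policy --policy-name privacera_s3_all_policy \
  --policy-document file://privacera_s3_all_policy.json

# Create a role that EC2 instances can assume
aws iam create-role --role-name privacera_s3_role \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the policy to the role (substitute your account id)
aws iam attach-role-policy --role-name privacera_s3_role \
  --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/privacera_s3_all_policy
```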

AWS IAM role and attach policy(s)
  1. In the AWS Console, open IAM Services.

  2. Click Roles in the left-side navigation, and then click Create role.

  3. Create Role: Choose a use case. Select the 'EC2' use case. (This "allows EC2 instances to call AWS services on your behalf".)

    Click Next: Permissions to transition to the next wizard page.

  4. Create Role: Attach permissions policies. Using Filter policies, search for the previously created policy (e.g. 'privacera_s3_all'). Select it by clicking its checkbox.

    Click Next: Tags.

  5. Create Role: Add tags. Optionally add a tag based on your enterprise resource tag standards. Click Next: Review.

  6. Create Role: Review. Enter a Role name such as 'privacera_s3_role'. Click Create role.

  7. Confirm the Role has been created by searching for it in the Role list.

AWS IAM create and attach policy
  1. In the AWS Console, open IAM Services.

  2. Click Policies on the left side navigation and then click Create Policy.

  3. Click on the JSON tab.

  4. Select a Policy from the list below, copy and paste it into the JSON edit box in the AWS Create policy dialog. Click Review policy (at the bottom of the page).

    Full S3 Access - All Buckets

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataServerS3FullAccess",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": "*"
            }
        ]
    }
    

    Limited S3 Access - Limited Buckets

    Note

    In this example policy, accessible buckets are represented as "<PLEASE_ASSIGN_BUCKET_NAME_x>". Adjust this sample policy for your enterprise and your selected controlled S3 buckets.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DataServerS3Limited",
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObjectAcl",
            "s3:GetObject",
            "s3:ListBucket",
            "s3:DeleteObject",
            "s3:DeleteBucket",
            "s3:ListBucketMultipartUploads",
            "s3:GetBucketAcl",
            "s3:GetBucketPolicy",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload",
            "s3:GetBucketLocation",
            "s3:PutObjectAcl"
            ],
          "Resource": [
            "arn:aws:s3:::<PLEASE_ASSIGN_BUCKET_NAME_1>/*",
            "arn:aws:s3:::<PLEASE_ASSIGN_BUCKET_NAME_2>",
            "arn:aws:s3:::<PLEASE_ASSIGN_BUCKET_NAME_3>/*",
            "arn:aws:s3:::<PLEASE_ASSIGN_BUCKET_NAME_4>"
            ]
        },
        {
          "Sid": "DataServerS3ListAndCreateBucketAccess",
          "Effect": "Allow",
          "Action": [
            "s3:ListAllMyBuckets",
            "s3:HeadBucket",
            "s3:CreateBucket"
            ],
          "Resource": "*"
        }
      ]
    }
    

    DynamoDB Access

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DataserverDynamoDBAccess",
          "Effect": "Allow",
          "Action": [
              "dynamodb:Query",
              "dynamodb:PutItem",
              "dynamodb:DeleteItem",
              "dynamodb:Scan",
              "dynamodb:UpdateItem",
              "dynamodb:CreateTable",
              "dynamodb:DescribeTable",
              "dynamodb:DeleteTable",
              "dynamodb:UpdateTable",
              "dynamodb:GetItem",
              "dynamodb:CreateBackup",
              "dynamodb:BatchGetItem",
              "dynamodb:BatchWriteItem",
              "dynamodb:TagResource",
              "dynamodb:UntagResource"
          ],
          "Resource": [
              "*"
          ]
        },
        {
          "Sid": "DataserverDynamoDBAccessListing",
          "Effect": "Allow",
          "Action": [
              "dynamodb:ListTables",
              "dynamodb:ListBackups"
          ],
          "Resource": "*"
        }
      ]
    }
    

    Kinesis Access

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ManageStreams",
          "Effect": "Allow",
          "Action": [
            "kinesis:PutRecord",
            "kinesis:DeleteStream",
            "kinesis:DescribeStreamSummary",
            "kinesis:CreateStream",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords",
            "kinesis:DescribeStream",
            "kinesis:PutRecords",
            "kinesis:AddTagsToStream",
            "kinesis:DecreaseStreamRetentionPeriod",
            "kinesis:IncreaseStreamRetentionPeriod",
            "kinesis:ListTagsForStream",
            "kinesis:RemoveTagsFromStream",
            "kinesis:RegisterStreamConsumer",
            "kinesis:DeregisterStreamConsumer",
            "kinesis:DescribeStreamConsumer",
            "kinesis:ListStreamConsumers",
            "kinesis:DisableEnhancedMonitoring",
            "kinesis:EnableEnhancedMonitoring",
            "kinesis:UpdateShardCount",
            "kinesis:MergeShards",
            "kinesis:SplitShard",
            "kinesis:StartStreamEncryption",
            "kinesis:StopStreamEncryption",
            "kinesis:ListShards"
          ],
          "Resource": "*"
        },
        {
          "Sid": "KinesisListing",
          "Effect": "Allow",
          "Action": [
            "kinesis:ListStreams",
            "kinesis:DescribeLimits"
          ],
          "Resource": "*"
        }
      ]
    }
    

    Firehose Access (requires an IAM policy and a trust relationship)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DataserverKinesisFirehoseAccess",
          "Effect": "Allow",
          "Action": [
            "firehose:DescribeDeliveryStream",
            "firehose:DeleteDeliveryStream",
            "firehose:PutRecord",
            "firehose:CreateDeliveryStream",
            "firehose:UpdateDestination"
          ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverKinesisFirehoseListingAccess",
          "Effect": "Allow",
          "Action": "firehose:ListDeliveryStreams",
          "Resource": "*"
        },
        {
          "Sid": "DataserverKinesisFirehosePassRoleAccess",
          "Effect": "Allow",
          "Action": [
            "iam:GetRole",
            "iam:PassRole"
          ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverKinesisFirehoseKinesisAccess",
          "Effect": "Allow",
          "Action": [
            "kinesis:GetShardIterator",
            "kinesis:DescribeStream",
            "kinesis:GetRecords"
          ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverKinesisS3Access",
          "Effect": "Allow",
          "Action": [
              "s3:PutObject",
              "s3:CreateBucket",
              "s3:ListBucket"
          ],
          "Resource": "*"
        }]
    }
    

    Add a trust relationship for the role 'privacera-access-role':

    {
    "Sid": "DataserverKinesisAssumeRole",
    "Effect":"Allow",
    "Principal":{
        "Service":"firehose.amazonaws.com"
    },
    "Action":"sts:AssumeRole"
    }
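
The snippet above is a single statement; a complete trust policy document wraps it in a Version/Statement envelope. It can also be applied from the CLI. A sketch, assuming the role privacera-access-role and a hypothetical local file trust.json holding the full document:

```shell
# trust.json wraps the statement above:
# {"Version":"2012-10-17","Statement":[{ ...statement... }]}
aws iam update-assume-role-policy --role-name privacera-access-role \
  --policy-document file://trust.json
```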
    

    Lambda Access

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DataserverLambdaManagementAccess",
          "Effect": "Allow",
          "Action": [
              "lambda:CreateFunction",
              "lambda:InvokeFunction",
              "lambda:GetEventSourceMapping",
              "lambda:GetFunction",
              "lambda:DeleteFunction",
              "lambda:DeleteEventSourceMapping"
          ],
          "Resource": [
              "*"
          ]
        },
        {
          "Sid": "DataserverLambdaManagementListing",
          "Effect": "Allow",
          "Action": [
              "lambda:ListFunctions",
              "lambda:ListEventSourceMappings",
              "lambda:CreateEventSourceMapping"
          ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverLambdaKinesisStreamRead",
            "Effect": "Allow",
            "Action": [
                "kinesis:SubscribeToShard",
                "kinesis:DescribeStreamSummary",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:DescribeStream"
            ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverLambdaKinesisListing",
          "Effect": "Allow",
          "Action": [
              "kinesis:ListStreams",
              "kinesis:ListShards"
          ],
          "Resource": "*"
        },
        {
          "Sid": "DataserverLambdaS3BucketsListing",
          "Effect": "Allow",
          "Action": "s3:ListAllMyBuckets",
          "Resource": "*"
        }
      ]
    }
    

    Add a trust relationship for the AWS IAM role 'privacera-access-role':

    {
    "Sid": "DataserverLambdaAssumeRole",
    "Effect":"Allow",
    "Principal":{
        "Service":"lambda.amazonaws.com"
    },
    "Action":"sts:AssumeRole"
    }
    

    Athena Access

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DataServerAthenaAccess",
          "Effect": "Allow",
          "Action": [
            "athena:TagResource",
            "athena:UntagResource",
            "athena:StartQueryExecution",
            "athena:GetQueryResultsStream",
            "athena:DeleteWorkGroup",
            "athena:GetQueryResults",
            "athena:DeleteNamedQuery",
            "athena:UpdateWorkGroup",
            "athena:GetNamedQuery",
            "athena:CreateWorkGroup",
            "athena:ListTagsForResource",
            "athena:ListQueryExecutions",
            "athena:ListNamedQueries",
            "athena:GetWorkGroup",
            "athena:CreateNamedQuery",
            "athena:GetQueryExecution",
            "athena:StopQueryExecution",
            "athena:BatchGetNamedQuery",
            "athena:BatchGetQueryExecution"
          ],
          "Resource": [
            "arn:aws:athena:*:*:workgroup/primary"
          ]
        },
        {
          "Sid": "DataServerAthenaGlue",
          "Effect": "Allow",
          "Action": [
              "glue:CreateDatabase",
              "glue:DeleteDatabase",
              "glue:GetDatabase",
              "glue:GetDatabases",
              "glue:UpdateDatabase",
              "glue:CreateTable",
              "glue:DeleteTable",
              "glue:BatchDeleteTable",
              "glue:UpdateTable",
              "glue:GetTable",
              "glue:GetTables",
              "glue:BatchCreatePartition",
              "glue:CreatePartition",
              "glue:DeletePartition",
              "glue:BatchDeletePartition",
              "glue:UpdatePartition",
              "glue:GetPartition",
              "glue:GetPartitions",
              "glue:BatchGetPartition",
              "glue:GetCatalogImportStatus"
          ],
          "Resource": [
              "*"
          ]
        }
      ]
    }
    

    Glue Access

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:CreateDatabase",
            "glue:DeleteDatabase",
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:UpdateDatabase",
            "glue:CreateTable",
            "glue:DeleteTable",
            "glue:BatchDeleteTable",
            "glue:UpdateTable",
            "glue:GetTable",
            "glue:GetTables",
            "glue:BatchCreatePartition",
            "glue:CreatePartition",
            "glue:DeletePartition",
            "glue:BatchDeletePartition",
            "glue:UpdatePartition",
            "glue:GetPartition",
            "glue:GetPartitions",
            "glue:BatchGetPartition",
            "glue:GetCatalogImportStatus"
          ],
          "Resource": [
            "*"
          ]
        }
      ]
    }
    
  5. In Create policy: Review Policy, give each policy a descriptive name, such as 'privacera_s3_all_policy', or 'privacera_s3_limited_policy'. Suggested practice is to use 'privacera_' as a prefix for each policy created for Privacera Manager or Privacera Platform.

  6. Click Create policy at the bottom of the dialog.

Attach Policy to Privacera Host IAM Role

In your initial creation of the Privacera Host VM, you created a role with the suggested name "Privacera_PM_Role". Because this role is already attached to the Privacera Host virtual machine, it conveys any attached policy rights to the Privacera Host.

  1. If you are not already in the AWS console at IAM: Policies, open it now:

    1. In the AWS Console, open IAM Services.

    2. Click Policies in the left-side navigation.

  2. Locate the policy(s) to be attached by searching for each by name in the Policies list. (Use a substring such as "privacera" to find all policies with this name prefix.)

  3. Select a Policy to attach by clicking on the 'radio' button to the left of the policy name.

  4. Click the Policy actions menu at the top of this dialog. Select Attach. This will open the Attach policy dialog.

  5. Select the Privacera_PM_Role, and Attach Policy (at the bottom of the dialog). This will attach the policy to the Privacera_PM_Role, and those rights will be conveyed to the Privacera Manager Host virtual machine.

AWS IAM role and policy for Databricks
Add S3 IAM role to Databricks
  1. Login to Databricks and click the top-right menu.

  2. Click the Admin Console.

  3. Click the IAM Roles tab.

  4. Click the +Add IAM Role.

  5. Enter the Instance Profile ARN which you created in step 1, Create IAM Role and Policy to Access S3 Bucket.

    [image64.jpg]

    Databricks validates that this Instance Profile ARN is both syntactically and semantically correct. To validate semantic correctness, Databricks does a dry run by launching a cluster with this IAM role. Any failure in this dry run produces a validation error in the UI.

  6. Click Add.

  7. You can specify the users who can launch clusters with the IAM role. (Optional)

    [image65.jpg]

Launch Cluster with S3 IAM Role

  1. Login to Databricks and click Clusters in the left menu.

  2. Select or create a cluster.

  3. Expand the Advanced Options section; under the Instances tab, select the IAM role from the IAM Role drop-down list. This drop-down includes all of the IAM roles that are available for the cluster.

    [image66.jpg]

PostgreSQL PolicySync
Lambda Setup for PostgreSQL Audits

This AWS Lambda function sends the audits from AWS CloudWatch to an SQS queue.

Create an Audit policy

Create a policy to be attached while creating an AWS Lambda function (discussed below) to send audit information to the SQS Queue.

  1. Login to AWS Console and go to the Policies section from IAM Service.

  2. Click on Create Policy and go to the JSON tab.

  3. Copy the policy below and enter it in the JSON textbox.

    {
      "Version":"2012-10-17",
      "Statement":[
          {
            "Effect":"Allow",
            "Action":"logs:CreateLogGroup",
            "Resource":"arn:aws:logs:${REGION}:${ACCOUNT_ID}:*"
          },
          {
            "Effect":"Allow",
            "Action":[
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource":[
                "arn:aws:logs:${REGION}:${ACCOUNT_ID}:log-group:/aws/lambda/${LAMBDA_FUNCTION_NAME}:*"
            ]
          },
          {
            "Effect":"Allow",
            "Action":"sqs:SendMessage",
            "Resource":"arn:aws:sqs:${REGION}:${ACCOUNT_ID}:${SQS_QUEUE_NAME}"
          }
      ]
    }
    
  4. Click Review Policy.

  5. Enter a name for the policy. For example, privacera-postgres-audits-lambda-execution-policy.

  6. Click Create Policy.

Create an IAM Role

  1. In the AWS Console, go to Roles.

  2. Click Create Role.

  3. Select Lambda as the use case, and click Next: Permissions.

  4. In Attach permission policies, search for the policy created above, and select it.

  5. Click Next: Tags.

  6. Click Next: Review.

  7. Give a name for the role. For example, privacera-postgres-audits-lambda-execution-role.

  8. Click Create Role.

Create Lambda Function

  1. In the AWS Console, go to the Lambda service.

  2. Click Create Function.

  3. Configure the following in Basic Information:

    • Name: privacera-postgres-${RDS_CLUSTER_NAME}-audits

    • Runtime: Node.js 12.x

  4. In Choose or create an execution role, select Use existing role.

  5. Search for and select the IAM role created above.

  6. Click Create Function.

  7. In the Designer view, click Add Trigger.

    • Select CloudWatch Logs.

    • In the Log group, enter ${YOUR_RDS_LOG_GROUP}.

    • In the Filter name, add auditTrigger.

  8. Click Add.

  9. Access the Lambda Code Editor, and add the following code.

    // CloudWatch logs encoding
    var encoding = process.env.ENCODING || 'utf-8';  // default is utf-8
    var awsRegion = process.env.REGION || 'us-east-1';
    var sqsQueueURL = process.env.SQS_QUEUE_URL;
    var ignoreDatabase = process.env.IGNORE_DATABASE;
    var ignoreUsers = process.env.IGNORE_USERS;
    
    var ignoreDatabaseArray = ignoreDatabase.split(',');
    var ignoreUsersArray = ignoreUsers.split(',');
    
    // Import the AWS SDK and the zlib module used to gunzip the log payload
    const AWS = require('aws-sdk');
    var zlib = require('zlib');
    
    // Configure the region
    AWS.config.update({region: awsRegion});
    
    exports.handler = function (event, context, callback) {
    
        var zippedInput = Buffer.from(event.awslogs.data, 'base64');
    
        zlib.gunzip(zippedInput, function (e, buffer) {
            if (e) {
                return callback(e);
            }
    
            var awslogsData = JSON.parse(buffer.toString(encoding));
    
            // Create an SQS service object
            const sqs = new AWS.SQS({apiVersion: '2012-11-05'});
    
            console.log(awslogsData);
            if (awslogsData.messageType === 'DATA_MESSAGE') {
    
                // Chunk log events before posting
                awslogsData.logEvents.forEach(function (log) {
    
                    // Log the raw message
                    console.log(log.message);
    
                    // Checking if message falls under ignore users/database
                    var sendToSQS = true;
    
                    if(sendToSQS) {
    
                        for(var i = 0; i < ignoreDatabaseArray.length; i++) {
                          if(log.message.toLowerCase().indexOf("@" + ignoreDatabaseArray[i]) !== -1) {
                                sendToSQS = false;
                                break;
                          }
                        }
                    }
    
                    if(sendToSQS) {
    
                        for(var i = 0; i < ignoreUsersArray.length; i++) {
                          if(log.message.toLowerCase().indexOf(ignoreUsersArray[i] + "@") !== -1) {
                                sendToSQS = false;
                                break;
                          }
                        }
                    }
    
                    if(sendToSQS) {
    
                        let sqsOrderData = {
                            MessageBody: JSON.stringify(log),
                            MessageDeduplicationId: log.id,
                            MessageGroupId: "Audits",
                            QueueUrl: sqsQueueURL
                        };
    
                        // Send the order data to the SQS queue
                        let sendSqsMessage = sqs.sendMessage(sqsOrderData).promise();
    
                        sendSqsMessage.then((data) => {
                            console.log("Sent to SQS");
                        }).catch((err) => {
                            console.log("Error in Sending to SQS = " + err);
                        });
    
                    }
                });
            }
        });
    };
    
  10. In the Lambda function's configuration, go to Environment Variables > Manage Environment Variables > Add environment variables and set the following variables.

    • REGION: ${REGION}

    • SQS_QUEUE_URL: ${SQS_QUEUE_URL}

    • IGNORE_DATABASE: ${POSTGRESQL_DB}

    • IGNORE_USERS: ${POSTGRES_ADMIN_USER}

  11. Click Save to save the environment variables.

  12. In Designer view, click Save.
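
The environment variables from step 10 can also be set from the command line. A sketch, assuming the function name chosen above:

```shell
# Set the Lambda environment variables in one call
aws lambda update-function-configuration \
  --function-name privacera-postgres-${RDS_CLUSTER_NAME}-audits \
  --environment "Variables={REGION=${REGION},SQS_QUEUE_URL=${SQS_QUEUE_URL},IGNORE_DATABASE=${POSTGRESQL_DB},IGNORE_USERS=${POSTGRES_ADMIN_USER}}"
```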

IAM Role for EC2

Create the following IAM Policy and associate it with the IAM role attached to the EC2 instance where PolicySync is installed.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "sqs:DeleteMessage",
            "sqs:GetQueueUrl",
            "sqs:ListDeadLetterSourceQueues",
            "sqs:ReceiveMessage",
            "sqs:GetQueueAttributes"
         ],
         "Resource":"${SQS_QUEUE_ARN}"
      },
      {
         "Effect":"Allow",
         "Action":"sqs:ListQueues",
         "Resource":"*"
      }
   ]
}
Configure S3 for real-time scanning

To configure S3 buckets for real-time scanning, use the following steps:

  1. Configure event notifications from the S3 bucket to the SQS queue.

    1. Login to AWS console and go to S3 service.

    2. Navigate to the bucket that needs to be scanned in real time.

    3. Under Properties tab, navigate to the Event Notifications section and choose Create event notification.

    4. In the Event name, enter a name.

    5. For real-time scanning - In the Event types section, select PUT, POST, COPY, Multipart upload completed, and All object delete events. You will receive notifications for these event types.

      For AWS S3 tag sync - In the Event types --> Object Tagging section, select Object tags added and Object tags deleted. You will receive notifications for these event types.

    6. Select the Destination type as SQS Queue, and then choose the SQS queue from the dropdown list. If the SQS queue was auto-created by Privacera Manager, its name is prefixed with privacera_bucket_sqs_ followed by your environment name {{DEPLOYMENT_ENV_NAME}}.

    7. Click Save Changes.

    Related Information

    For detailed information on event notifications, refer to the AWS documentation.

  2. Apply access policy in SQS Queue to allow S3 bucket to send events.

    1. Navigate to SQS Queue and select the queue on which the access policy is to be applied.

    2. Provide the correct access policy to the SQS queue so that S3 is allowed to put events into it. Refer to the following example:

      {
        "Version": "2012-10-17",
        "Id": "arn:aws:sqs:{region_name}:{account_id}:{sqs_queue_name}/SQSDefaultPolicy",
        "Statement": [
          {
            "Sid": "AllowS3Notify1",
            "Effect": "Allow",
            "Principal": { "AWS": "*" },
            "Action": "SQS:SendMessage",
            "Resource": "arn:aws:sqs:{region_name}:{account_id}:{sqs_queue_name}",
            "Condition": {
              "ArnLike": {
                "aws:SourceArn": "arn:aws:s3:*:*:{s3_bucket_to_be_scanned}"
              }
            }
          }
        ]
      }
      

    Related Information

    Refer to the AWS documentation for detailed information on SQS access policies.
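The access policy above can also be generated programmatically before pasting it into the console. The following Python sketch is an illustrative helper, not part of Privacera; the region, account, queue, and bucket values are examples:

```python
import json

def sqs_access_policy(region, account_id, queue_name, bucket_name):
    """Build the SQS access policy that allows an S3 bucket to send
    event notifications to the queue."""
    queue_arn = f"arn:aws:sqs:{region}:{account_id}:{queue_name}"
    return {
        "Version": "2012-10-17",
        "Id": f"{queue_arn}/SQSDefaultPolicy",
        "Statement": [{
            "Sid": "AllowS3Notify1",
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "SQS:SendMessage",
            "Resource": queue_arn,
            # Restrict senders to the bucket being scanned
            "Condition": {
                "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"}
            },
        }],
    }

# Example: a queue auto-created for a "dev" environment (illustrative names)
policy = sqs_access_policy("us-east-1", "123456789012",
                           "privacera_bucket_sqs_dev", "my-scanned-bucket")
print(json.dumps(policy, indent=2))
```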

Enable AWS S3 tag sync

Install Docker and Docker compose (AWS-Linux-RHEL)
  1. Log on to your Privacera host as ec2-user or a user with 'sudo' privileges.

  2. Install Docker:

    (a) Use 'yum' to obtain Docker.

    (b) Reconfigure limits.

    (c) Start the Docker service.

    (d) Add your login user (e.g. 'ec2-user') to the docker group.

    (e) Exit.

    From the command prompt execute the following:

    sudo yum install -y docker
    sudo sed -i 's/1024:4096/1024000:1024000/g' /etc/sysconfig/docker
    sudo cat /etc/sysconfig/docker
    sudo service docker start
    sudo usermod -a -G docker ec2-user
    exit
    
  3. Log back in to the same user account as in step 1. (This makes the usermod group change take effect.)

  4. Install Docker-Compose:

    1. Set the requested Docker Compose version.

    2. Download docker-compose.

    3. Make the binary executable.

      DOCKER_COMPOSE_VERSION="1.23.2"
      sudo curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
      sudo chmod +x /usr/local/bin/docker-compose
      
AWS S3 MinIO quick setup
Docker
  1. SSH to instance as ${USER}.

  2. Download and run the MinIO setup script.

    sudo su
    mkdir -p /tmp/downloads
    cd /tmp/downloads
    wget https://privacera.s3.amazonaws.com/public/run_minio.sh -O run_minio.sh
    chmod a+x run_minio.sh
    ./run_minio.sh
    
  3. Check that the MinIO service is up and running.

    docker ps | grep minio
    exit
    
  4. Open the MinIO browser. For example, http://192.168.12.41:9000/minio/

    http://${MINIO SERVER HOST}:${MINIO SERVER PORT}/minio/

  5. Enter the login credentials. Get the Access Key and Secret Key from your System Administrator.

    login_id: ${MINIO_ACCESS_KEY} password: ${MINIO_SECRET_KEY}

  6. Click the create-bucket (+) button to create a new bucket and give it a name. For example, minio-s3-1.

    The list of buckets is displayed on the left.
Cross account IAM role for Databricks

If a Databricks instance and an AWS EC2 instance are running in two different accounts, then a cross-account role is required for the Databricks instance to access the EC2 instance and the other resources.

The following is an example of a cross account IAM role for Databricks:

{
   "Version": "2012-10-17",
   "Statement": {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/IAM_role_attached_to_EC2"
   }
}
Integrate Privacera services in separate VPC

In some network topologies, the systems that Privacera needs to work with (such as Databricks or another data source) might be in a Virtual Private Cloud (VPC) that is separate from the VPC where Privacera runs. This separate VPC might be behind a firewall that must not be changed due to security requirements.

This network configuration needs some manual steps to configure Privacera properties to use a private link between those separate VPCs and certain Privacera services. The affected Privacera services are as follows:

  • Privacera Ranger for installed plugins to retrieve policies.

  • Privacera Audit Server for installed plugins to push audits data.

  • Privacera Data Server for the Privacera Signed URL feature.

Prerequisites
  • You have already installed Privacera Manager.

  • You have identified the VPCs that must be linked.

  • The load balancer between the VPCs must be a Network Load Balancer (NLB); a classic load balancer is not sufficiently performant for this network topology.

Steps

The details here explain the manual steps needed to configure certain properties to allow a private link between Privacera and those VPC-protected systems.

Configure Privacera Ranger Load Balancer Properties
  1. Create a Ranger configuration directory:

    cd ~/privacera/privacera-manager
    mkdir -p config/custom-vars/ranger-admin
    
  2. Edit a Privacera Ranger properties configuration file to add the following lines:

    vi config/custom-vars/ranger-admin/ranger-service.yml
    
    metadata:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
        service.beta.kubernetes.io/aws-load-balancer-type: 'nlb'
    
  3. Save the file.

Configure Privacera Audit Server Load Balancer Properties
  1. Create an Audit Server configuration directory:

    cd ~/privacera/privacera-manager
    mkdir -p config/custom-vars/auditserver
    
  2. Edit a Privacera Audit Server configuration file to add the following lines:

    vi config/custom-vars/auditserver/auditserver-service.yml
    
    metadata:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
        service.beta.kubernetes.io/aws-load-balancer-type: 'nlb'
    
  3. Save the file.

Configure Privacera Data Server Load Balancer Properties
  1. Create a Data Server configuration directory:

    cd ~/privacera/privacera-manager
    mkdir -p config/custom-vars/dataserver
    
  2. Edit a Privacera Data Server configuration file to add the following lines:

    vi config/custom-vars/dataserver/dataserver-service.yml
    
    metadata:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
        service.beta.kubernetes.io/aws-load-balancer-type: 'nlb'
    
  3. Save the file.

Update Privacera Manager
cd ~/privacera/privacera-manager
./privacera-manager.sh update
Securely access S3 buckets using IAM roles
Create IAM Role and Policy to Access S3 Bucket
  1. Log in to the AWS console and go to the IAM service.

  2. Click Roles in the left menu and then click Create role.

    • Under Select type of trusted entity, select AWS service.

    • Under Choose a use case (that will use this role), select EC2.

    • Click Next: Permissions.

    • Click Next: Tags.

    • Click Next: Review.

    • Enter the Role name.

    • Click Create role. The role is created successfully, and you are navigated to the role list with the newly created role.

  3. In the role list, click the newly created role. Now, add an inline policy to the role. This policy grants access to the S3 bucket.

    • Under the Permissions tab, click + Add inline policy.

    • Click the JSON tab.

    • Copy the policy below and set ${s3_bucket_name} to the name of your bucket. Note: The policy JSON can be changed per your requirements.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::${s3_bucket_name}"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:PutObjectAcl"
          ],
          "Resource": [
            "arn:aws:s3:::${s3_bucket_name}/*"
          ]
        }
      ]
    }
  4. Click Review policy.

  5. Enter the Policy name.

  6. Click the Create policy.

  7. In the role summary, copy the Instance Profile ARN.
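If you script the policy creation, the ${s3_bucket_name} placeholder in the JSON above can be filled in before pasting the result into the console. A minimal Python sketch, where the template helper and bucket name are illustrative assumptions:

```python
import json
from string import Template

# The same policy as above, kept as a template so the bucket name can be
# substituted in one place (illustrative helper, not a Privacera tool).
POLICY_TEMPLATE = Template(json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": ["arn:aws:s3:::${s3_bucket_name}"]},
        {"Effect": "Allow",
         "Action": ["s3:PutObject", "s3:GetObject",
                    "s3:DeleteObject", "s3:PutObjectAcl"],
         "Resource": ["arn:aws:s3:::${s3_bucket_name}/*"]},
    ],
}))

# Substitute an example bucket name and parse the result back into JSON
policy = json.loads(POLICY_TEMPLATE.substitute(s3_bucket_name="my-data-bucket"))
print(json.dumps(policy, indent=2))
```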
Add S3 IAM Role to EC2 IAM Role
  1. Log in to the AWS console and go to the IAM service.

  2. Click Policies in the left menu and then click Create policy.

  3. Click the JSON tab.

    • Copy the policy below and update ${iam_role_for_s3_access} with the role that you created in the previous section (Create IAM Role and Policy to Access S3 Bucket).

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "iam:PassRole"
          ],
          "Resource": "arn:aws:iam::xxxxxxxxxxxx:role/${iam_role_for_s3_access}"
        }
      ]
    }
  4. Click Review policy.

  5. Enter the Policy name.

  6. Click Create policy. Now attach this policy to the IAM role that was used to create the Databricks instance.

Add S3 IAM Role to Databricks
  1. Log in to Databricks and click the top-right menu.

  2. Click Admin Console.

  3. Click the IAM Roles tab.

  4. Click +Add IAM Role.

  5. Enter the Instance Profile ARN that you created in the section Create IAM Role and Policy to Access S3 Bucket.

    Databricks validates that this Instance Profile ARN is both syntactically and semantically correct. To validate semantic correctness, Databricks does a dry run by launching a cluster with this IAM role. Any failure in this dry run produces a validation error in the UI.

  6. Click Add.

  7. Optionally, you can specify the users who can launch clusters with the IAM role.
Launch Cluster with S3 IAM Role
  1. Log in to Databricks and click Clusters in the left menu.

  2. Select or create a cluster.

  3. Expand the Advanced Options section; under the Instances tab, select the IAM role from the IAM Role drop-down list. This drop-down includes all of the IAM roles that are available to the cluster.
Multiple AWS account support in Dataserver using Databricks

You may want to run Spark queries in Databricks that access data in buckets spread across multiple AWS accounts.

To achieve this, configure Privacera Dataserver to maintain a security configuration JSON that maps IAM roles to bucket names.

To configure this, see the following section, Multiple AWS S3 IAM role support in Dataserver.

Then try a query from Databricks to access buckets in multiple accounts.

Multiple AWS S3 IAM role support in Dataserver

Dataserver supports configuring an IAM role that it will assume when sending requests to AWS S3, including at the bucket level.

You may want to run Spark queries in Databricks that access data in buckets spread across multiple AWS accounts. Multiple IAM role support in Dataserver solves this by mapping buckets to specific IAM roles.

For each query, Privacera Dataserver will:

  1. Extract the bucket name from the request.

  2. Find the IAM role to assume from the mapping property, DATASERVER_AWS_S3_MULTI_ACCOUNT_MAPPING.

The following are the steps to configure the IAM role mapping.

  1. SSH to EC2 instance where Privacera Dataserver is installed.

  2. Enable multi-account access in Privacera Dataserver.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataserver.aws.yml config/custom-vars/
    vi config/custom-vars/vars.dataserver.aws.yml
    

    Add the new property.

    DATASERVER_AWS_S3_MULTI_ACCOUNT_ACCESS_ENABLE: "true"
    DATASERVER_AWS_S3_MULTI_ACCOUNT_DEFAULT_IAM: "<default-role-ARN>"
    DATASERVER_AWS_S3_MULTI_ACCOUNT_MAPPING:
      - "<role-arn>|<bucketA,bucketB*>"
      - "<role-arn>|<bucketC*,bucketD>"

    The properties are described below:

    • DATASERVER_AWS_S3_MULTI_ACCOUNT_ACCESS_ENABLE: Enables or disables AWS S3 multiple IAM role support in Dataserver.

    • DATASERVER_AWS_S3_MULTI_ACCOUNT_DEFAULT_IAM: Sets the default IAM role ARN. The default role is used when no IAM role mapping is found for an S3 bucket; such a bucket can be a shared bucket containing common artifacts or resources.

    • DATASERVER_AWS_S3_MULTI_ACCOUNT_MAPPING: Defines the mapping between role ARNs and buckets. You can add comma-separated bucket names.

    Note

    • The above role-bucket mapping is applicable only to the AWS S3 service, not to other AWS services. To authenticate to other AWS services, Dataserver always uses the default role.

    • Wildcards are supported while specifying bucket names in the mapping. For example, buck*.
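    To illustrate the mapping semantics, the following Python sketch resolves a bucket name to a role ARN using the "<role-arn>|<bucket-list>" entry format from the property above. The resolution logic, role ARNs, and bucket names are illustrative assumptions, not Privacera's implementation:

    ```python
    from fnmatch import fnmatch

    # Example values mirroring the DATASERVER_AWS_S3_MULTI_ACCOUNT_* properties
    DEFAULT_IAM = "arn:aws:iam::123456654321:role/DefaultRole"
    MAPPING = [
        "arn:aws:iam::123456789012:role/RoleA|bucketA,bucketB*",
        "arn:aws:iam::987654321012:role/RoleB|bucketC*,bucketD",
    ]

    def resolve_role(bucket_name):
        """Return the IAM role mapped to a bucket, falling back to the default."""
        for entry in MAPPING:
            role_arn, patterns = entry.split("|", 1)
            # Wildcards in bucket names behave like shell globs (e.g. buck*)
            if any(fnmatch(bucket_name, p) for p in patterns.split(",")):
                return role_arn
        return DEFAULT_IAM

    print(resolve_role("bucketB-archive"))   # matches bucketB* -> RoleA
    print(resolve_role("unmapped-bucket"))   # no match -> DefaultRole
    ```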

  3. Update Privacera Manager.

    Run the following command:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
  4. Set the assume-role permission for the Dataserver instance role in the AWS console.

    • The IAM role attached to the Privacera Dataserver instance is used to assume the other roles. Hence, this instance IAM role must have permission to assume the other roles (the IAM roles configured in the security config JSON), which can be granted from the AWS console.

    • Log in to the AWS console, go to the IAM service, and then click Roles.

    • Select the Privacera Dataserver role and edit the existing policy or add a new policy.

    • Enter the following definition.

              {
                "Version": "2012-10-17",
                "Statement": [
                  {
                    "Sid": "VisualEditor0",
                    "Effect": "Allow",
                    "Action": "sts:AssumeRole",
                    "Resource": [
                      "arn:aws:iam::123456789012:role/RoleA",
                      "arn:aws:iam::987654321012:role/RoleB",
                      "arn:aws:iam::123456654321:role/DefaultRole"
                    ]
                  }
                ]
              }
  5. Each IAM role added above needs to trust the IAM role attached to your Privacera Dataserver (e.g. arn:aws:iam::999999999999:role/PRIV_DATASERVER_ROLE).

    • Go to IAM Service and click Roles.

    • Select your IAM Role and edit Trust Relationship.

    • Enter the following definition.

              {
                "Version": "2012-10-17",
                "Statement": [
                  {
                    "Effect": "Allow",
                    "Principal": {
                      "AWS": [
                        "arn:aws:iam::999999999999:role/PRIV_DATASERVER_ROLE"
                      ]
                    },
                    "Action": "sts:AssumeRole"
                  }
                ]
              }
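The trust relationship above can likewise be generated for review before pasting it into the console. A small Python sketch, illustrative only; the Dataserver role ARN is the example from this section:

```python
import json

def trust_policy(trusted_role_arns):
    """Build the trust policy each mapped IAM role needs so the
    Dataserver instance role can assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": list(trusted_role_arns)},
            "Action": "sts:AssumeRole",
        }],
    }

doc = trust_policy(["arn:aws:iam::999999999999:role/PRIV_DATASERVER_ROLE"])
print(json.dumps(doc, indent=2))
```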

Azure topics

Azure CLI
Enable Azure CLI
  1. In the Privacera Portal, click LaunchPad from the left menu.

  2. Under the Azure Services section, click the Azure CLI icon to open the Azure CLI dialog. This dialog provides the means to download an Azure CLI setup script specific to your installation. It also provides a set of usage instructions.

  3. In Azure CLI, under Configure Script, click Download Script to save the script on your local machine. If you will be running Azure CLI on another system such as a 'jump server' copy it to that host.

    1. Alternatively, use 'wget' to pull this script down to your execution platform, as shown below. Substitute your installation's Privacera Platform host domain name or IPv4 address for "<PRIVACERA_PORTAL_HOST>".

      wget http://<PRIVACERA_PORTAL_HOST>:6868/api/cam/download/script -O privacera_azure.sh
      # USE THE "--no-check-certificate" option for HTTPS - and remove the # below
      # wget --no-check-certificate https://<PRIVACERA_PORTAL_HOST>:6868/api/cam/download/script -O privacera_azure.sh
      
    2. Copy the downloaded script to home directory.

      cp privacera_azure.sh ~/
      cd ~/
      
    3. Set this file to be executable:

      chmod a+x ~/privacera_azure.sh
      
  4. Under the Azure Cli Generate Token section, first, generate a platform token.

    Note

    All the commands should be run with a space between the dot (.) and the script name (~/privacera_azure.sh).

    1. Run the following command:

      . ~/privacera_azure.sh --config-token
      
    2. Select/check Never Expired to generate a token that does not expire. Click Generate.

  5. Enable the Proxy or the endpoint and run one of the two commands shown below.

    . ~/privacera_azure.sh --enable-proxy
    

    or:

    . ~/privacera_azure.sh --enable-endpoint
    
  6. Under the Check Status section, run the command below.

    . ~/privacera_azure.sh --status
    
  7. To disable both the proxy and the endpoint, under the Azure Access section, run the commands shown below.

    . ~/privacera_azure.sh --disable-proxy
    . ~/privacera_azure.sh --disable-endpoint
    
Azure CLI Examples

List files in container

az storage blob list --container-name ${AZURE_CONTAINER_NAME} --output table

Upload a file

az storage blob upload --container-name ${AZURE_CONTAINER_NAME} --file ${FILE_TO_UPLOAD} --name ${FILE_NAME}

Download a file

az storage blob download --container-name ${AZURE_CONTAINER_NAME} --file ${FILE_TO_DOWNLOAD} --name ${FILE_NAME}
Azure Rest APIs

Azure offers REST APIs to access ADLS storage, similar to the Azure CLI. The following examples show how to access ADLS storage more securely through Privacera using the REST APIs.

Export Data Server Properties

export DATASERVER_URL=<dataserver-url>
export AZURE_ADLS_STORAGE_ACCOUNT_NAME=<azure-storage-account-name>
export AZURE_ADLS_CONTAINER_NAME=<azure-container-name>
export PRIVACERA_TOKEN="<privacera-access-token>|<privacera-secret-token>"

Download Data Server CA certificates

curl -s -k "${DATASERVER_URL}/services/certificate" -o /tmp/cacerts
chmod 400 /tmp/cacerts

List containers

curl -v -X GET "${DATASERVER_URL}/${AZURE_ADLS_STORAGE_ACCOUNT_NAME}/?comp=list" -H "Authorization: Bearer ${PRIVACERA_TOKEN}" -H "x-ms-version: 2018-11-09" --cacert /tmp/cacerts

List BLOBs

curl -v -X GET "${DATASERVER_URL}/${AZURE_ADLS_STORAGE_ACCOUNT_NAME}/${AZURE_ADLS_CONTAINER_NAME}?restype=container&comp=list" -H "Authorization: Bearer ${PRIVACERA_TOKEN}" -H "x-ms-version: 2018-11-09" --cacert /tmp/cacerts

Upload BLOB

curl -v -X PUT "${DATASERVER_URL}/${AZURE_ADLS_STORAGE_ACCOUNT_NAME}/${AZURE_ADLS_CONTAINER_NAME}/{FILE_NAME}" -H "Authorization: Bearer ${PRIVACERA_TOKEN}" -H "x-ms-version: 2018-11-09" -d'@{FILE_TO_UPLOAD}' -H "x-ms-blob-type: BlockBlob" --cacert /tmp/cacerts

Download BLOB

curl -v -X GET "${DATASERVER_URL}/${AZURE_ADLS_STORAGE_ACCOUNT_NAME}/${AZURE_ADLS_CONTAINER_NAME}/{FILE_TO_DOWNLOAD}" -H "Authorization: Bearer ${PRIVACERA_TOKEN}" -H "x-ms-version: 2018-11-09" --cacert /tmp/cacerts -o {FILE_NAME}
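The curl examples above all follow the same URL and header pattern. The following Python sketch composes the list-BLOBs request without sending it; the host, account, container, and token values are placeholders you must supply:

```python
# Compose the Data Server request that the "List BLOBs" curl example sends
# (illustrative placeholder values, not a real endpoint).
DATASERVER_URL = "https://dataserver.example.com:8181"   # <dataserver-url>
ACCOUNT = "mystorageaccount"                             # storage account name
CONTAINER = "mycontainer"                                # container name
PRIVACERA_TOKEN = "<privacera-access-token>|<privacera-secret-token>"

def list_blobs_request():
    """Return the URL and headers for listing BLOBs via the Data Server."""
    url = f"{DATASERVER_URL}/{ACCOUNT}/{CONTAINER}?restype=container&comp=list"
    headers = {
        "Authorization": f"Bearer {PRIVACERA_TOKEN}",
        "x-ms-version": "2018-11-09",
    }
    return url, headers

url, headers = list_blobs_request()
print(url)
```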
Create Azure AD application

Prerequisites

  • An Azure AD application with access to the Azure Graph API is required.

Steps

  1. Log in to the Azure portal and click Azure Active Directory in the left panel.

  2. Navigate to App registrations and click + New registration.

  3. Enter the following details as:

    • Name: Azure AD User Sync

    • Supported account types: Accounts in any organizational directory (Any Azure AD directory - Multitenant) and personal Microsoft accounts (e.g. Skype, Xbox)

    • Redirect URI: This is an optional field.

    • Click Register.

  4. After the application is created, copy and note down the Application (client) ID and Directory (tenant) ID, as these will be required later for the User Sync setup.

  5. Now, click Certificates & secrets and then click + New client secret.

  6. Enter the following details to generate client secret:

    • Description: Ranger UserSync

    • Expires: Never

  7. Click Add and copy the value shown in the Client secrets section under the Value column. This will be required for the User Sync setup.

  8. Now, go to API permissions, click + Add a permission, and select the Microsoft Graph option.

  9. Select Application permissions.

  10. Under Select permissions, select the following details as:

    • Directory: Directory.Read.All

    • User: User.Read.All

    • Group: Group.Read.All

  11. Click Add permissions. After the permission has been added, a confirmation message displays with Yes and No buttons.

  12. Click Yes. If you do not have privileges to grant consent, you can contact an Administrator to grant consent for the application.

  13. After consent is granted successfully, the screen displays all granted permissions.
Azure storage account creation
  1. Create Azure Storage Account through the Azure console (https://portal.azure.com).

  2. Note the following details for the Storage account from the Access Keys option in the left navigation:

    1. Storage Account Name

    2. Access keys Key1 and Key2

Setting up Azure application and register with Azure active directory
  1. Within Azure Active Directory, create and register a new app under App Registrations.

    1. For supported account types, select 'Accounts in this organizational directory only (XXXXXXXXX)'.

    2. In the Redirect URI drop-down, select "Public client".

    3. Value: urn:ietf:wg:oauth:2.0:oob

    4. Click Register.

  2. Once the app is registered, click Certificates & secrets.

  3. Click New client secret.

  4. Note down the generated value of the client secret, as it will not be visible later.
Set IAM Role for this application under storage account

To navigate to set IAM role, use the following steps:

  1. Go to Storage Accounts in Azure.

  2. Select Account.

  3. Access Control (IAM).

  4. Click on Add and select Role Assignments from dropdown menu.

Add the following role assignments for the application registered with Azure AD.

  1. Role Assignment 1

    1. Select Role as Owner.

    2. Assign Access to as default (no change).

    3. For “Select” use Azure application created above.

  2. Role Assignment 2

    1. Select Role as Storage Blob Data Contributor.

    2. Assign Access to as default (no change).

    3. For “Select” use Azure application created above.

Install Docker and Docker compose (Azure-Ubuntu)
  1. SSH to the VM as the administrator ( ${VM_USER} ).

  2. Install Docker on the VM.

    sudo apt install docker.io -y
    sudo service docker start
    sudo usermod -a -G docker ${VM_USER}
    exit
    
  3. Reattach to the VM (SSH to VM as ${VM_USER}).

  4. Confirm the Docker installation, then download Docker Compose.

    #confirm docker installation
    docker info
    #
    DOCKER_COMPOSE_VERSION="1.23.2"
    sudo  curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    
Get ADLS properties
  1. Open the Azure portal, go to the storage accounts, and select a storage account.

  2. In that storage account screen, select Overview to get the RESOURCE_GROUP and SUBSCRIPTION_ID values.

  3. Select Access Keys on the same screen to get STORAGE_ACCOUNT_NAME and STORAGE_SHAREDKEY.

  4. Open the Azure portal, go to App Registration, and select the application created earlier. Select Overview to get TENANT_ID and CLIENT_ID.

  5. On the same screen, select Certificates and Secrets and get the client secret key.

GCP topics

Google sink to Pub/Sub
Overview

This topic covers how to use a Sink-based approach, instead of the Cloud Logging API, to read real-time audit logs for real-time scanning in Pkafka for Discovery. The following are key advantages of the Sink-based approach:

  • All the logs will be synchronized to a Sink.

  • Sinks are exported to a destination Pub/Sub topic.

  • Pkafka subscribes to the Pub/Sub topic, reads the audit data from it, passes the data on to the Privacera topic, and triggers a real-time scan.

Summary of configuration steps

You need to create the following resources on the Google Cloud console:

  1. Destination to write logs from the Sink. The following destinations are available:

    a. Cloud Storage

    b. Pub/Sub topic

    c. BigQuery

    In this document, a Pub/Sub topic is used as the destination for the Sink.

  2. Create a Sink

Create Pub/Sub topic
  1. Log on to Google Cloud Console and navigate to Pub/Sub topics page.

  2. Click the + CREATE TOPIC.

  3. In the Create a topic dialog, enter the following details:

    • Enter the unique topic name in the Topic ID field. For example, DiscoverySinkTopic.

    • Select Add a default subscription checkbox.

  4. Click CREATE TOPIC.

    If required, you can create a subscription in a later stage, after creating the topic, by navigating to Topic > Create Subscription > Create a simple subscription.

    Note down the subscription name as it will be used inside a property in Discovery.

  5. If you created a default subscription, or created a new subscription, change the following properties:

    • Acknowledgement deadline: Set to 600 seconds.

    • Retry policy: Select Retry after exponential backoff delay and enter the following values:

      • Minimum backoff (seconds): 10

      • Maximum backoff (seconds): 600

  6. Click Update.

    Notice

    You can configure GCS lineage time using custom properties that are not readily apparent by default. See the Properties Table.
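The retry policy configured above behaves roughly like exponential backoff clamped between the minimum and maximum delays. An illustrative Python sketch; the doubling factor is an assumption, and Pub/Sub's exact backoff schedule may differ:

```python
def backoff_delay(attempt, minimum=10, maximum=600):
    """Delay in seconds before redelivery attempt N, clamped to [min, max]."""
    return min(maximum, minimum * (2 ** attempt))

# First eight retry delays with the values configured above
print([backoff_delay(a) for a in range(8)])
# -> [10, 20, 40, 80, 160, 320, 600, 600]
```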

Create a Sink
  1. Log in to the Google Cloud console and navigate to the Logs Router page. You can also perform this action from the Logs Explorer page by navigating to Actions > Create Sink.

  2. Click CREATE SINK.

  3. Enter Sink details:

    a. Sink name (Required): Enter the identifier for the Sink.

    b. Sink description (Optional): Describe the purpose, or use case for the Sink.

    c. Click NEXT.

  4. Now, enter the Sink destination:

    a. In Select Sink service, select the service where you want your logs routed. The following services and destinations are available:

    • Cloud Logging logs bucket: Select or create a Logs Bucket.

    • BigQuery: Select or create the particular dataset to receive the exported logs. You also have the option to use partitioned tables.

    • Cloud Storage: Select or create the particular Cloud Storage bucket to receive the exported logs.

    • Pub/Sub: Select or create the particular topic to receive the exported logs.

    • Splunk: Select the Pub/Sub topic for your Splunk service.

    • Select as Other Project: Enter the Google Cloud service and destination in the following format:

      SERVICE.googleapis.com/projects/PROJECT_ID/DESTINATION/DESTINATION_ID

      For example, if your export destination is a Pub/Sub topic, then the Sink destination will be as following:

      pubsub.googleapis.com/projects/google_sample_project/topics/sink_new
  5. Choose which logs to include in the Sink:

    Build an inclusion filter: Enter a filter to select the logs that you want to be routed to the Sink's destination. For example:

    (resource.type="gcs_bucket" AND
    resource.labels.bucket_name="bucket-to-be-scanned" AND
    (protoPayload.methodName="storage.objects.create" OR protoPayload.methodName="storage.objects.delete" OR
    protoPayload.methodName="storage.objects.get")) OR
    resource.type="bigquery_resource"

    Add all of the bucket names that you want to scan to the above filter as resources in Discovery. Each bucket appears as a clause of the form:

    resource.labels.bucket_name="bucket-to-be-scanned"

    In case of multiple buckets, you will need to specify them as an "OR" condition, for example:

    resource.type="gcs_bucket" AND (resource.labels.bucket_name="bucket_1" OR resource.labels.bucket_name="bucket_2" OR resource.labels.bucket_name="bucket_3")

    In the above example, three buckets are identified to be scanned: bucket_1, bucket_2, and bucket_3.

  6. Click DONE.
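For many buckets, the inclusion filter can be tedious to write by hand. The following Python sketch builds it from a list of bucket names; the helper itself is illustrative, while the filter syntax follows the examples above:

```python
def build_inclusion_filter(bucket_names):
    """Build a Logs Router inclusion filter covering the given GCS buckets
    plus BigQuery resources (illustrative helper, not a Google/Privacera tool)."""
    bucket_clause = " OR ".join(
        f'resource.labels.bucket_name="{b}"' for b in bucket_names
    )
    methods = " OR ".join(
        f'protoPayload.methodName="storage.objects.{m}"'
        for m in ("create", "delete", "get")
    )
    return (f'(resource.type="gcs_bucket" AND ({bucket_clause}) AND '
            f'({methods})) OR resource.type="bigquery_resource"')

print(build_inclusion_filter(["bucket_1", "bucket_2", "bucket_3"]))
```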

Cross Project
  • For cross-project scanning of GCS and GBQ resources, you need to create a Sink in the other project and set its destination to a Pub/Sub topic of the first project.

  • You can follow the same steps as above to create the Sink, navigating to Destination > Select as Other project, and enter the Pub/Sub topic name in the following format:

    'pubsub.googleapis.com/projects/google_sample_project/topics/sink_new'

  • To access the Sink created in another project, you need to add the Sink writer identity service account in the IAM administration page of the project where you have the Pub/Sub topic and the VM instance present.

  • To get the Sink Writer Identity, perform the following steps:

    • Go to the Logs Router page > select the Sink > select the dots icon > select Edit Sink Details. In the Writer Identity section, copy the service account.

    • Go to the IAM Administration page of the project where you have the Pub/Sub Topic and the VM instance > select Add member > Add the service account of the Writer Identity of the Sink created above.

    • Choose the roles Owner and Editor.

    • Click Save. Verify whether the service account which you added is present as a member on the IAM Administration page.

Configure properties
  • Add the following properties to the file: vars.pkafka.gcp.yml

    PKAFKA_USE_GCP_LOG_SINK_API: "true"
    PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME: ""
  • For the PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME property, set the value to the subscription name created for the Pub/Sub topic. Note that the Subscription ID can also be used as the value.
Generate audit logs using GCS lineage

To generate audit logs using GCS lineage, configure the following setting on the GCP Console to get the events in Google Logs Explorer.

  1. Log on to the Google Cloud console and navigate to IAM & Admin > Audit Logs.

  2. In the search bar, search for "Google Cloud Storage" and then select its checkbox in the grid.

  3. In the right panel, in the Google Cloud Storage dialog, select the Admin Read and Data Read checkboxes and click Save.

    Note

    Perform the above step for all the projects where you will be executing GCS realtime scan and expecting lineage to be generated.

Note

You can configure GCS lineage time using custom properties that are not included by default. See Discovery.

Kubernetes

Kubernetes RBAC

Using the RBAC method in Kubernetes, you can manage Kubernetes objects and regulate access to a Kubernetes cluster.

To change the Kubernetes objects, perform the following steps:

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.kubernetes.custom-rbac.yml config/custom-vars/
    vi config/custom-vars/vars.kubernetes.custom-rbac.yml
  3. Remove the # (hash) and edit the following properties.

    #K8S_SERVICE_ACCOUNT: "privacera-sa"
    #K8S_ROLE_NAME: "privacera-sa-role"
    #K8S_ROLE_BINDING_NAME: "privacera-sa-role-bind"
Customize deployment files

This topic shows how you can configure additional properties by merging Kubernetes configuration YAML files. When you install and deploy Privacera services, default Kubernetes configuration files for each Privacera service get created. If you want to extend the configuration of a Privacera service, you can create a new configuration file where all the new properties get defined, and then merge them together.

Configuration filenames

The following table lists the Privacera services whose configurations can be merged, the configuration files that can be created and merged for each service, and the directory where those configuration files should be stored. Refer to this table for the filename and location when creating a new configuration file.

Service Name

Custom Service Directory

Config File Names

Auditserver

~/privacera/privacera-manager/config/custom-vars/auditserver

- auditserver-service.yml

- auditserver-storageclass.yml

- auditserver-statefulset.yml

Audit-fluentd

~/privacera/privacera-manager/config/custom-vars/audit-fluentd

- audit-fluentd-service.yml

- audit-fluentd-storageclass.yml

- audit-fluentd-statefulset.yml

Access-Request-Manager

~/privacera/privacera-manager/config/custom-vars/portal

- access-request-manager-service.yml

- access-request-manager-deployment.yml

Mariadb

~/privacera/privacera-manager/config/custom-vars/mariadb

- mariadb-service.yml

- mariadb-secret.yml

- mariadb-pvc.yml

- mariadb-storageclass.yml

- mariadb-deployment.yml

Zookeeper

~/privacera/privacera-manager/config/custom-vars/zookeeper

- zookeeper-service.yml

- zookeeper-poddisruptionbudget.yml

- zookeeper-storageclass.yml

- zookeeper-statefulset.yml

Solr

~/privacera/privacera-manager/config/custom-vars/solr

- solr-service.yml

- solr-poddisruptionbudget.yml

- solr-storageclass.yml

- solr-statefulset.yml

Ranger-admin

~/privacera/privacera-manager/config/custom-vars/ranger-admin

- ranger-service.yml

- ranger-service-ingress.yml

- ranger-deployment.yml

Ranger-usersync

~/privacera/privacera-manager/config/custom-vars/ranger-usersync

- usersync-deployment.yml

Ranger-kms/crypto

~/privacera/privacera-manager/config/custom-vars/ranger-kms

- ranger-kms-service.yml

- ranger-kms-deployment.yml

Peg

~/privacera/privacera-manager/config/custom-vars/peg

- peg-service.yml

- peg-deployment.yml

- peg-hpa.yml

Portal

~/privacera/privacera-manager/config/custom-vars/portal

- portal-service.yml

- portal-deployment.yml

Dataserver

~/privacera/privacera-manager/config/custom-vars/dataserver

- dataserver-service.yml

- dataserver-service-account.yml

- dataserver-role-binding.yml

- dataserver-deployment.yml

Discovery

~/privacera/privacera-manager/config/custom-vars/discovery

- discovery-service.yml

- discovery-pvc.yml

- discovery-storageclass.yml

- discovery-deployment.yml

Policysync

~/privacera/privacera-manager/config/custom-vars/policysync

- policysync-deployment.yml

- policysync-pvc.yml

- policysync-rocksdb-pvc.yml

- policysync-storageclass.yml

Kafka

~/privacera/privacera-manager/config/custom-vars/kafka

- kafka-statefulset.yml

Pkafka

~/privacera/privacera-manager/config/custom-vars/pkafka

- pkafka-deployment.yml

Trino

~/privacera/privacera-manager/config/custom-vars/trino

- trino-deployment.yml

- trino-service.yml

- trino-worker-statefulset.yml

- trino-worker-storageclass.yml

Grafana

~/privacera/privacera-manager/config/custom-vars/grafana

- grafana-service.yml

- grafana-pvc.yml

- grafana-storageclass.yml

- grafana-deployment.yml

Graphite

~/privacera/privacera-manager/config/custom-vars/graphite

- graphite-service.yml

- graphite-pvc.yml

- graphite-storageclass.yml

- graphite-deployment.yml

Common - RBAC

~/privacera/privacera-manager/config/custom-vars/rbac

- service-account.yml

- role.yml

- role-binding.yml
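For scripting around these conventions, the table can be captured as a lookup. The sketch below is illustrative and abridged to three services; the mapping mirrors the table above.

```python
# Abridged map of service name -> (custom-vars subdirectory, config filenames),
# taken from the table above; extend with the remaining services as needed.
CUSTOM_CONFIGS = {
    "portal": ("portal", ["portal-service.yml", "portal-deployment.yml"]),
    "dataserver": ("dataserver", ["dataserver-service.yml",
                                  "dataserver-service-account.yml",
                                  "dataserver-role-binding.yml",
                                  "dataserver-deployment.yml"]),
    "kafka": ("kafka", ["kafka-statefulset.yml"]),
}

BASE = "~/privacera/privacera-manager/config/custom-vars"

def config_paths(service):
    """Return the full paths of the custom config files for a service."""
    subdir, files = CUSTOM_CONFIGS[service]
    return [f"{BASE}/{subdir}/{name}" for name in files]

for path in config_paths("portal"):
    print(path)
```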

Procedure

To merge Kubernetes configuration files, perform the following steps:

  1. Refer to the table above and choose the service whose configuration you want to merge. Get the filename of the configuration file and the directory where the file should be stored.

  2. Create the directory with the service name. Replace <SERVICE_NAME> with the name of the Privacera service whose configuration you want to merge.

    cd ~/privacera/privacera-manager/config/custom-vars
    mkdir <SERVICE_NAME>
    
  3. Create the new configuration file. Replace <CONFIG_FILENAME> with the name of the configuration file of the Privacera service.

    vi <CONFIG_FILENAME>
    
  4. Add the properties in the configuration file. The following is an example of adding a nodeselector property.

    spec:
      template:
        spec:
          nodeSelector:
            node: privacera
    
  5. Verify the deployment file by running the setup command.

    ./privacera-manager.sh setup
    

    Once the command is completed, you can find the deployment file at the following location:

    vi ~/privacera/privacera-manager/output/kubernetes/helm/portal/templates/<CONFIG_FILENAME>
    
  6. Run the update command.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Example for assigning pods to a node

If you want to assign a pod to a node for the Portal service, perform the following steps:

  1. From the table above, refer to the Portal service and get the filename, portal-deployment.yml.

  2. Create the directory with the service name as portal.

    cd ~/privacera/privacera-manager/config/custom-vars
    mkdir portal
    
  3. Create the configuration file, portal-deployment.yml.

    vi portal-deployment.yml
    
  4. Add the following property in the configuration file. Modify the <key> and <value>.

    spec:
      template:
        spec:
          nodeSelector:
            <key>: <value>
  5. Before running the install, verify the deployment file by running the setup command.

    ./privacera-manager.sh setup
    

    Once the command is completed, you can find the deployment file at the following location:

    vi ~/privacera/privacera-manager/output/kubernetes/helm/portal/templates/portal-deployment.yml
    

    The contents of the custom portal deployment file are merged with the regular portal deployment file already available in Privacera Manager using the Ansible combine filter. This merge works only with hashes/dictionaries. The new deployment file is generated in the output folder in YAML format.

    Click the tabs to display the properties of the deployment file before and after running the setup command.

    Before

    The following are the properties of the deployment file before running the setup command.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: portal
      name: portal
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: portal
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: portal
        spec:
          containers:
          - image: hub2.privacera.com/privacera:rel.latest
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              initialDelaySeconds: 400
              periodSeconds: 30
              tcpSocket:
                port: 6868
            name: portal
            ports:
            - containerPort: 6868
            readinessProbe:
              failureThreshold: 6
              initialDelaySeconds: 120
              periodSeconds: 30
              tcpSocket:
                port: 6868
            resources:
              limits:
                cpu: '0.5'
                memory: 2457M
              requests:
                cpu: '0.2'
                memory: 307M
            volumeMounts:
            - mountPath: /opt/privacera/portal/conf
              name: conf-vol
            - mountPath: /opt/privacera/portal/bin
              name: bin-vol
          imagePullSecrets:
          - name: privacera-hub
          initContainers:
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh zk-0.zkensemble:2181:2181 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-zookeeper
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh solr-service:8983 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-solr
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh mariadb:3306 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-mariadb
          - command:
            - bash
            - -c
            - cp -r /conf_ro/. /opt/privacera/portal/conf
            image: hub2.privacera.com/privacera:rel.latest
            name: copy-conf
            volumeMounts:
            - mountPath: /opt/privacera/portal/conf
              name: conf-vol
            - mountPath: /conf_ro
              name: portal-conf
          - command:
            - bash
            - -c
            - cp -r /bin_ro/. /opt/privacera/portal/bin
            image: hub2.privacera.com/privacera:rel.latest
            name: copy-bin
            volumeMounts:
            - mountPath: /opt/privacera/portal/bin
              name: bin-vol
            - mountPath: /bin_ro
              name: portal-bin
          restartPolicy: Always
          securityContext:
            fsGroup: 200
          serviceAccountName: privacera-sa
          topologySpreadConstraints:
          - labelSelector:
              matchLabels:
                app: portal-1
            maxSkew: 1
            topologyKey: zone
            whenUnsatisfiable: ScheduleAnyway
          - labelSelector:
              matchLabels:
                app: portal-1
            maxSkew: 1
            topologyKey: node
            whenUnsatisfiable: DoNotSchedule
          volumes:
          - configMap:
              name: portal-conf
            name: portal-conf
          - configMap:
              defaultMode: 493
              name: portal-bin
            name: portal-bin
          - emptyDir: {}
            name: conf-vol
          - emptyDir: {}
            name: bin-vol
    status: {}
    

    After

    The following are the properties of the deployment file after running the setup command. Two additional lines, nodeSelector: and node: privacera, are added.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: portal
      name: portal
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: portal
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: portal
        spec:
          containers:
          - image: hub2.privacera.com/privacera:rel.latest
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              initialDelaySeconds: 400
              periodSeconds: 30
              tcpSocket:
                port: 6868
            name: portal
            ports:
            - containerPort: 6868
            readinessProbe:
              failureThreshold: 6
              initialDelaySeconds: 120
              periodSeconds: 30
              tcpSocket:
                port: 6868
            resources:
              limits:
                cpu: '0.5'
                memory: 2457M
              requests:
                cpu: '0.2'
                memory: 307M
            volumeMounts:
            - mountPath: /opt/privacera/portal/conf
              name: conf-vol
            - mountPath: /opt/privacera/portal/bin
              name: bin-vol
          imagePullSecrets:
          - name: privacera-hub
          initContainers:
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh zk-0.zkensemble:2181:2181 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-zookeeper
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh solr-service:8983 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-solr
          - command:
            - bash
            - -c
            - /scripts/wait-for-it.sh mariadb:3306 -t 300 --
            image: hub2.privacera.com/privacera:rel.latest
            name: wait-for-mariadb
          - command:
            - bash
            - -c
            - cp -r /conf_ro/. /opt/privacera/portal/conf
            image: hub2.privacera.com/privacera:rel.latest
            name: copy-conf
            volumeMounts:
            - mountPath: /opt/privacera/portal/conf
              name: conf-vol
            - mountPath: /conf_ro
              name: portal-conf
          - command:
            - bash
            - -c
            - cp -r /bin_ro/. /opt/privacera/portal/bin
            image: hub2.privacera.com/privacera:rel.latest
            name: copy-bin
            volumeMounts:
            - mountPath: /opt/privacera/portal/bin
              name: bin-vol
            - mountPath: /bin_ro
              name: portal-bin
          nodeSelector:
            node: privacera
          restartPolicy: Always
          securityContext:
            fsGroup: 200
          serviceAccountName: privacera-sa
          topologySpreadConstraints:
          - labelSelector:
              matchLabels:
                app: portal-1
            maxSkew: 1
            topologyKey: zone
            whenUnsatisfiable: ScheduleAnyway
          - labelSelector:
              matchLabels:
                app: portal-1
            maxSkew: 1
            topologyKey: node
            whenUnsatisfiable: DoNotSchedule
          volumes:
          - configMap:
              name: portal-conf
            name: portal-conf
          - configMap:
              defaultMode: 493
              name: portal-bin
            name: portal-bin
          - emptyDir: {}
            name: conf-vol
          - emptyDir: {}
            name: bin-vol
    status: {}
    
  6. Run the update command.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
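The merge performed in this procedure behaves like a recursive dictionary combine: dictionary values are merged key by key, and values from the custom file override the defaults. The following Python sketch approximates that behavior; it is an illustration, not Privacera Manager's actual implementation.

```python
def combine(base, custom):
    """Recursively merge `custom` into `base`: dict values are merged
    key by key, other values are replaced. Approximates the behavior of
    Ansible's combine filter used with recursive merging."""
    merged = dict(base)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = combine(merged[key], value)
        else:
            merged[key] = value
    return merged

# Abridged default deployment and the custom override from portal-deployment.yml.
base = {"spec": {"template": {"spec": {"restartPolicy": "Always",
                                       "serviceAccountName": "privacera-sa"}}}}
custom = {"spec": {"template": {"spec": {"nodeSelector": {"node": "privacera"}}}}}

merged = combine(base, custom)
print(merged["spec"]["template"]["spec"])
```

Because only dictionaries are merged, the custom file needs to repeat the full key path (spec > template > spec) down to the property being added.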

Microsoft SQL topics

Install Microsoft SQL CLI

mssql-cli is a command line query tool for MS SQL Server. It runs on Windows, macOS, and Linux.

For more general information and detailed installation instructions see Microsoft Docs / SQL / Tools / Command prompt utilities / mssql-cli.

For macOS and Windows platforms you can generally install using pip.

$ pip install mssql-cli

On AWS/CentOS /RHEL flavored systems use sudo to first install python-pip, then use pip.

sudo yum install -y python-pip
sudo pip install mssql-cli

For Ubuntu flavor Linux, use apt-get:

# Import the public repository GPG keys
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

# Register the Microsoft Ubuntu repository
sudo apt-add-repository https://packages.microsoft.com/ubuntu/18.04/prod

# Update the list of products
sudo apt-get update

# Install mssql-cli
sudo apt-get install mssql-cli

# Install missing dependencies
sudo apt-get install -f
Microsoft SQL - Privacera data access for evaluation sequence

This topic steps through a test sequence intended to help confirm Privacera Data Access and policy-based controls for an MS SQL Server.

Install Microsoft SQL CLI

mssql-cli is a command line query tool for MS SQL Server; installation instructions are given in the preceding section, Install Microsoft SQL CLI.

Create test database and content

Log in as an administrator or as a user with sufficient privileges to create and populate a database.

mssql-cli -S ${MSSQL_SERVER_NAME}.database.windows.net -d${DATABASE} -U${ADMIN_USER}

mssql-cli

CREATE DATABASE customer;

CREATE SCHEMA customer_schema;

CREATE TABLE customer_schema.customer_data (
    id int,
    person_name varchar(100),
    email_address varchar(100),
    ssn varchar(100),
    country varchar(100),
    us_phone varchar(100),
    address varchar(100),
    account_id varchar(100),
    zipcode varchar(100));

INSERT INTO customer_schema.customer_data VALUES (1,'Nancy','nancy@yahoo.com','201-99-5532','US','856-232-9702','939 Park Avenue','159635478','33317');
INSERT INTO customer_schema.customer_data VALUES (2,'Gene','gene@google.us','202-99-5532','UK','954-583-0575','303 Johnston Blvd','236854569','95202');
INSERT INTO customer_schema.customer_data VALUES (3,'Edward','edward@facebook.com','203-99-5532','US','209-626-9041','130 Hollister','365412985','60173');
INSERT INTO customer_schema.customer_data VALUES (4,'Pearlene','pearlene@gmail.com','204-99-5532','US','708-471-6810','17 Warren Rd','452189732','90017');
INSERT INTO customer_schema.customer_data VALUES (5,'James','james@cuvox.de','205-99-5532','US','661-338-6787','898 Newport Gray Rd','517836427','94041');
INSERT INTO customer_schema.customer_data VALUES (6,'Pamela','pamela@cuvox.de','206-99-5532','UK','650-526-5259','861 Strick Rd','685231473','80214');
INSERT INTO customer_schema.customer_data VALUES (7,'Donna','donna@fleckens.hu','207-99-5532','US','303-239-4282','1784 S Shore Dr','789563258','1730');
INSERT INTO customer_schema.customer_data VALUES (8,'Amy','amy@gustr.com','208-99-5532','US','774-553-4736','9522 Apple Valley Dr','854126945','55102');
INSERT INTO customer_schema.customer_data VALUES (9,'Adam','adam@teleworm.us','209-99-5532','UK','651-297-1448','745 Old Springville Rd','965412381','43201');
INSERT INTO customer_schema.customer_data VALUES (10,'Lucille','lucille@armyspy.com','210-99-5532','US','740-320-1270','4223 Midway Road','785651236','89102');
INSERT INTO customer_schema.customer_data VALUES (11,'Edard','edu@gustr.com','211-99-5532','UK','702-257-8796','3659 Dye Street','965121354','53207');
INSERT INTO customer_schema.customer_data VALUES (12,'Nick','nick@jourrapide.com','212-99-5532','US','414-483-8638','2966 Nutters Barn Lane','563515264','72764');
INSERT INTO customer_schema.customer_data VALUES (13,'Brian','brian@einrot.com','213-99-5532','US','479-872-9783','3300 Worthington Drive','654621233','91303');
INSERT INTO customer_schema.customer_data VALUES (14,'Stella','stella@jourrapide.com','214-99-5532','US','818-596-6681','1893 Ingram Road','261613654','35816');
INSERT INTO customer_schema.customer_data VALUES (15,'Leona','leona@dayrep.com','215-99-5532','UK','256-250-5413','4244 Burnside Court','986513211','75069');

SELECT * FROM customer_schema.customer_data;

Create a client 'Users'

Log into Privacera Portal.

In Privacera Portal Access Management: Users/Groups/Roles:

  1. Create Role "Sales_Role".

  2. Create User "Emily" and make Emily part of the Sales_Role.

Test use cases

1. Confirm the ability to log on to the Customer database as user 'emily'.

mssql-cli -S ${MSSQL_SERVER_NAME}.database.windows.net -d${DATABASE} -U${USER}

# For example : mssql-cli -S test.database.windows.net -d customer -U emily

Evaluate Privacera access control

In Privacera Portal: Access Management: Resource Policies, open the privacera_mssql application (in the MSSQL System).

Confirm policy "all - database, schema, table, column" is in place and defined.

Return to your mssql client and confirm access by user emily. While logged in as 'emily', select from customer database.

select * from customer_schema.customer_data;

Return to Privacera Portal: Access Management: Resource Policies, privacera_mssql application, and open the policy 'all - database, schema, table, column' to edit it. Disable this policy.

Return to the mssql client and attempt the selection. This selection should fail.

select * from customer_schema.customer_data;

Configure Microsoft SQL server for database or Synapse audits

To configure MSSQL server for database or Synapse audits, use the following steps:

  1. Login to Azure portal.

  2. Search for SQL Servers in which you want to configure MSSQL for Azure AD users.

    searchsql.jpg
  3. Select Azure AD user, and then click Auditing.

  4. Set the Enable Azure SQL Auditing toggle to ON.

  5. Select the Storage checkbox to set audit log destination as storage account, and then select your existing storage account from the Storage Details.

  6. Click the Save button.

    auditing.jpg
  7. To open the Azure cloud shell, click the shell icon, shellicon.jpg, on the top menu bar, and then click PowerShell.

    powershell.jpg
  8. Click Show advanced settings.

    setting.jpg
  9. In the Cloud Shell region text box, enter your region.

  10. In the Storage account, select Use existing to use your existing storage account.

  11. In the File share, select Create new, and then enter a name.

  12. Click the Create storage button.

    storage.jpg

    The Cloud Shell PowerShell window appears; you can run your commands in it.

    pcommand.jpg
  13. Run the following command, if you have Azure MSSQL Database:

    Set-AzSqlServerAudit -ResourceGroupName "${RESOURCE_GROUP}" -ServerName "${MSSQL_SERVER_NAME}" -AuditActionGroup SCHEMA_OBJECT_ACCESS_GROUP,DATABASE_OBJECT_CHANGE_GROUP,SCHEMA_OBJECT_CHANGE_GROUP
    
  14. Run the following command, if you have Azure Synapse Database:

    Set-AzSqlServerAudit -ResourceGroupName "${RESOURCE_GROUP}" -ServerName "${MSSQL_SERVER_NAME}" -AuditActionGroup BATCH_COMPLETED_GROUP
    

    The above commands take one to two minutes to complete.

  15. Go to the Storage account in which you have configured MSSQL Server for auditing, and then click Containers.

    containers.jpg

    Note

    Make sure that your MSSQL Server name directory is visible inside your audit log container. It might take some time to appear inside the container.

    servername.jpg

    Now, you need to form an Audit Storage URL for your MSSQL Server.

  16. Go to Properties, and then copy the container URL.

    url.jpg

    Your audit storage URL will be ${CONTAINER_URL}/${MSSQL_SERVER_NAME}.
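Assembling the audit storage URL is a simple join of the container URL and the server name. The Python sketch below uses placeholder values for both:

```python
# Placeholders; substitute your container URL and MSSQL server name.
container_url = "https://mystorageaccount.blob.core.windows.net/sqldbauditlogs"
mssql_server_name = "test"

# Strip any trailing slash from the container URL before joining.
audit_storage_url = f"{container_url.rstrip('/')}/{mssql_server_name}"
print(audit_storage_url)
```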

Snowflake configuration for PolicySync

Before configuring Snowflake with Privacera Manager, you must first manually create the Snowflake warehouse, database, users, and roles required by PolicySync. All of this can be accomplished by manually executing SQL queries.

Note

Log in to Snowflake as a user with ACCOUNTADMIN privileges.

Creating PolicySync role

The PRIVACERA_POLICYSYNC_ROLE role, which we will create in this step, will be used in the SNOWFLAKE_ROLE_TO_USE property when configuring Snowflake with Privacera Manager.

  1. Drop the role if it already exists.

    DROP ROLE IF EXISTS "PRIVACERA_POLICYSYNC_ROLE";
    
  2. Create a role.

    CREATE ROLE IF NOT EXISTS "PRIVACERA_POLICYSYNC_ROLE";
    
  3. Grant the USERADMIN role so that PolicySync can create, update, and delete users and roles.

    GRANT ROLE USERADMIN TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
  4. Grant the SYSADMIN role so that PolicySync can grant and revoke privileges on users and roles, and create warehouses and databases on the account.

    GRANT ROLE SYSADMIN TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
  5. Grant this permission to the role so that it can manage grants for snowflake resources.

    GRANT MANAGE GRANTS ON ACCOUNT TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
  6. Grant this permission to the role so that it can create native Masking policies.

    GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
  7. Grant this permission to the role so that it can create native row filter policies.

    GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
Creating a warehouse

The PRIVACERA_POLICYSYNC_WH warehouse, which we will create in this step, will be used in the SNOWFLAKE_WAREHOUSE_TO_USE property when configuring Snowflake with Privacera Manager.

Create a warehouse for PolicySync. Change the warehouse size according to deployment.

  CREATE WAREHOUSE IF NOT EXISTS "PRIVACERA_POLICYSYNC_WH" WITH
      WAREHOUSE_SIZE = 'XSMALL'
      WAREHOUSE_TYPE = 'STANDARD'
      AUTO_SUSPEND = 600
      AUTO_RESUME = TRUE
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 1
      SCALING_POLICY = 'ECONOMY';
Granting role permission to read access audits

To get read access audit permission on the Snowflake database, follow the steps below.

  1. Grant warehouse usage access so that PolicySync can query the Snowflake database and get the Access Audits.

    GRANT USAGE ON WAREHOUSE "PRIVACERA_POLICYSYNC_WH" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
  2. Grant the PRIVACERA_POLICYSYNC_ROLE role permission to read Access Audits in the Snowflake database.

    GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
Creating database for Privacera UDFs

The database name PRIVACERA_DB will be used in the SNOWFLAKE_JDBC_DB property when configuring Snowflake with Privacera Manager.

  1. This step is optional. If you already have the database and want to use it, you can skip this step.

    CREATE DATABASE IF NOT EXISTS "PRIVACERA_DB";
    
  2. Grant the PRIVACERA_POLICYSYNC_ROLE role access to the database so that PolicySync can create UDFs in it.

    GRANT ALL ON DATABASE "PRIVACERA_DB" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
    GRANT ALL ON ALL SCHEMAS IN DATABASE "PRIVACERA_DB" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
    
Creating user

The user which we will create in this step will be used in the SNOWFLAKE_JDBC_USERNAME and SNOWFLAKE_JDBC_PASSWORD properties when configuring Snowflake with Privacera Manager.

  1. Create a user.

    CREATE USER IF NOT EXISTS "PRIVACERA_POLICYSYNC_USER"
        PASSWORD = '<PLEASE_CHANGE>'
        MUST_CHANGE_PASSWORD = FALSE
        DEFAULT_WAREHOUSE = "PRIVACERA_POLICYSYNC_WH"
        DEFAULT_ROLE = "PRIVACERA_POLICYSYNC_ROLE";
    
  2. Grant the user the PRIVACERA_POLICYSYNC_ROLE role.

    GRANT ROLE "PRIVACERA_POLICYSYNC_ROLE" TO USER "PRIVACERA_POLICYSYNC_USER";
    
Creating owner role

By configuring the following property in vars.policysync.snowflake.yml, PolicySync can take ownership of all objects managed by it. PolicySync requires this in order to create row-filtering and column-masking policies.

SNOWFLAKE_OWNER_ROLE: "PRIVACERA_POLICYSYNC_ROLE"

Note

If PolicySync is not configured to take ownership of all objects managed by PolicySync, keep the property value blank.

SNOWFLAKE_OWNER_ROLE: ""
Masking and row level filtering

To use masking and row-level filtering, the following permissions must be granted for each database managed by PolicySync. Replace <DATABASE_NAME> with the actual database name.

GRANT ALL ON DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON ALL SCHEMAS IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON FUTURE SCHEMAS IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON ALL TABLES IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON FUTURE TABLES IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON ALL VIEWS IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";

GRANT ALL ON FUTURE VIEWS IN DATABASE "<DATABASE_NAME>" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";
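When PolicySync manages several databases, the grants above must be repeated for each one. The following Python sketch templates the statements; the database names are examples:

```python
# Grant statements from the section above, templated by database name.
GRANT_TEMPLATES = [
    'GRANT ALL ON DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON ALL SCHEMAS IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON FUTURE SCHEMAS IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON ALL TABLES IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON FUTURE TABLES IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON ALL VIEWS IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
    'GRANT ALL ON FUTURE VIEWS IN DATABASE "{db}" TO ROLE "PRIVACERA_POLICYSYNC_ROLE";',
]

databases = ["SALES_DB", "HR_DB"]  # example database names

# One statement per (database, template) pair, ready to paste into a worksheet.
statements = [t.format(db=db) for db in databases for t in GRANT_TEMPLATES]
for stmt in statements:
    print(stmt)
```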
Using reduced permissions for existing PolicySync

If Privacera PolicySync is currently configured with ACCOUNTADMIN privileges, the steps below must be completed as an ACCOUNTADMIN in order for PolicySync to work with the reduced permissions specified in the previous sections.

  1. Drop UDFs.

    DROP FUNCTION IF EXISTS "<DATABASE_NAME>"."PUBLIC".ThrowColumnAccessException(string);
    

    Note

    • For PolicySync versions 4.7 or earlier,

      <DATABASE_NAME> must be replaced with the value provided in configuration jdbc.db.

    • For PolicySync versions 5.0 or later:

      <DATABASE_NAME> must be replaced with the value provided in configuration ranger.policysync.connector.snowflake.masking.functions.db.name.

  2. Drop row level filter access policies.

    DROP ROW ACCESS POLICY IF EXISTS "<DATABASE_NAME>"."<SCHEMA_NAME>"."<ROW_ACCESS_POLICY_NAME>";

    Note

    • For PolicySync version 4.7:

      Row Level Filter access policies must be deleted in all databases and schemas managed by PolicySync.

      The following is the format of a Row Level Filter access policy name:

      {database}_{schema}_{table}_row_filter_policy.

      For example, "db1_sch1_tbl1_row_filter_policy"

    • For PolicySync versions 5.0 or later:

      If PolicySync is configured to create Row Level Filter access policies in a specific database and schema (see below), Row Level Filter access policies must be deleted from the specified database and schema.

      • ranger.policysync.connector.snowflake.row.filter.policy.db.name

      • ranger.policysync.connector.snowflake.row.filter.policy.schema.name

      Or else, Row Level Filter access policies in all databases and schemas managed by PolicySync must be deleted.

      The following is the format of a Row Level Filter access policy name:

      {database}{separator}{schema}{separator}{table}.

      For example, "db1_PRIV_sch1_PRIV_tbl1".

    Use the following command to list Row Level Filter access policies:

    SHOW ROW ACCESS POLICIES;
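For the 4.7 naming scheme, the DROP statements can be generated mechanically from a list of tables. The following sketch assumes hypothetical database/schema/table names (`db1`, `sch1`, `tbl1`, `tbl2`) purely for illustration; substitute your own before running the resulting SQL in Snowflake.

```shell
# Sketch: generate DROP ROW ACCESS POLICY statements (PolicySync 4.7 naming).
# The db:schema:table triples below are placeholders - substitute your own.
TABLES="db1:sch1:tbl1
db1:sch1:tbl2"

echo "$TABLES" | while IFS=: read -r db sch tbl; do
  policy="${db}_${sch}_${tbl}_row_filter_policy"
  echo "DROP ROW ACCESS POLICY IF EXISTS \"$db\".\"$sch\".\"$policy\";"
done > drop_row_filter_policies.sql

cat drop_row_filter_policies.sql
```

The generated file can then be reviewed and executed in a Snowflake worksheet or client of your choice.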
  3. Drop masking policies.

    DROP MASKING POLICY IF EXISTS "<DATABASE_NAME>"."<SCHEMA_NAME>"."<MASKING_POLICY_NAME>";

    Note

    • For PolicySync versions 4.7 or earlier:

      Masking policies must be deleted in all databases and schemas managed by PolicySync.

      The following is the format of a Masking policy name:

      {table}{separator}{column}.

      For example, "tbl1_priv_col1"

    • For PolicySync versions 5.0 or later:

      If PolicySync is configured to create Masking policies in a specific database and schema (see below), Masking policies must be deleted from the specified database and schema.

      • ranger.policysync.connector.snowflake.masking.policy.db.name

      • ranger.policysync.connector.snowflake.masking.policy.schema.name

      Or else, Masking policies in all databases and schemas managed by PolicySync must be deleted.

      The following is the format of a Masking policy name:

      {database}{separator}{schema}{separator}{table}{separator}{column}.

      For example, "db1_PRIV_sch1_PRIV_tbl1_PRIV_col1".

    Use the following command to list all masking policies:

    SHOW MASKING POLICIES;
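For PolicySync 5.0 naming, the masking policy name is built from the database, schema, table, and column joined by the configured separator. The separator value `_PRIV_` below is inferred from the documentation example ("db1_PRIV_sch1_PRIV_tbl1_PRIV_col1"); verify it against your own configuration before relying on it.

```shell
# Sketch: construct a PolicySync 5.0-style masking policy name and the
# matching DROP statement. SEP is inferred from the documented example.
SEP="_PRIV_"
db="db1"; sch="sch1"; tbl="tbl1"; col="col1"
MASKING_POLICY_NAME="${db}${SEP}${sch}${SEP}${tbl}${SEP}${col}"
echo "DROP MASKING POLICY IF EXISTS \"$db\".\"$sch\".\"$MASKING_POLICY_NAME\";"
```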
    

Create Azure resources

Azure resources can be created using a managed identity.

Enable system-assigned managed identity

  1. Sign in to the Azure portal.

  2. Navigate to Virtual Machine > Identity.

  3. Under System assigned, set Status to On, and then click Save.

Assign a role to a managed identity

  1. Select Access control (IAM).

  2. Select + Add role assignment. In the next step, assign the following three roles to the identity:

    • Cosmos DB Operator

    • DocumentDB Account Contributor

    • Storage Account Contributor

  3. In the Add role assignment, configure the following values, and then click Save:

    • Scope: Select the scope.

    • Subscription: Select the subscription.

    • Resource Group: Select the resource group in which you want to create the resources for Discovery.

    • Role: Select the role mentioned above.

Databricks

Create an endpoint in Databricks SQL
  1. Log in to your Databricks account.

  2. In Databricks, go to SQL Analytics.

  3. Go to Endpoints and click New SQL Endpoint.

  4. Create the endpoint according to your requirements, as shown below.

    dbx_sql_analytics_endpoint.jpg
  5. After creating the endpoint, click its connection details and note down the JDBC URL for configuration with PolicySync.

    dbx_sql_analytics_jdbc.jpg
  6. Click Personal Access Token to create a token.

    dbx_sql_analytics_access_token.jpg
  7. Click on Generate New Token.

  8. Enter a name and validity period for the token, and click Generate.

  9. Copy the generated token. When connecting from PolicySync, this token is the JDBC password, and the email ID of the user is the JDBC username.

  10. Grant Admin privileges.

    1. Go to Workspace.

    2. To access the Admin Console, go to the top right of the workspace UI, click the user account icon, and select Admin Console.

    3. In the Admin column, select the checkbox for the user.

Add custom Spark configuration for Databricks
Authenticate Databricks using JWT

For information on this topic, click here.

Add extra properties to Spark configuration

To add custom properties in the Databricks cluster init script, do the following:

  1. Create a custom configuration file.

    cd ~/privacera/privacera-manager
    vi config/custom-properties/databricks-spark.conf
    
  2. Use this file to add custom Spark properties, one per line, and then save the file.

    For example, you can add the following property:

    "spark.databricks.delta.formatCheck.enabled"="false"
    

    Note

    Avoid putting comments, extra words, or blank lines in the config file.

  3. Run the following command.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
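Since the note above warns against comments, extra words, and blank lines in the config file, a quick pre-check before running the update can catch them early. This is an illustrative sketch, not part of Privacera Manager; the file name and property are taken from the example above, and the file is created locally for demonstration.

```shell
# Sketch: verify a custom Spark conf file contains no blank or comment lines
# before running privacera-manager.sh update. File content is illustrative.
CONF_FILE="databricks-spark.conf"
printf '%s\n' '"spark.databricks.delta.formatCheck.enabled"="false"' > "$CONF_FILE"

if grep -En '^[[:space:]]*(#|$)' "$CONF_FILE"; then
  echo "WARNING: blank or comment lines found in $CONF_FILE"
else
  echo "OK: $CONF_FILE looks clean"
fi
```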
    
Configure Databricks cluster policy
  1. Add the following two properties in vars.databricks.plugin.yml.

    • DATABRICKS_SQL_CLUSTER_POLICY_SPARK_CONF

    • DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF

    For example,

    DATABRICKS_SQL_CLUSTER_POLICY_SPARK_CONF:
      - Note: first spark conf
        key: "spark.hadoop.first.spark.test"
        value: "test1"
      - Note: second spark conf
        key: "spark.hadoop.first.spark.test2"
        value: "test2"
    DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF:
      - Note: first spark conf
        key: "spark.hadoop.first.spark.test"
        value: "test1"
      - Note: second spark conf
        key: "spark.hadoop.first.spark.test2"
        value: "test2"
    
  2. To add custom properties with Java agent, add the following property in vars.databricks.plugin.yml.

    • DATABRICKS_SPARK_PLUGIN_AGENT_JAR

    For example,

    DATABRICKS_SPARK_PLUGIN_AGENT_JAR: " -Dmy.custom.property=test -javaagent:/databricks/jars/privacera-agent.jar"
    
Configure service name for Databricks Spark plugin

By default in Privacera Portal, all policies are defined in privacera_hive under Access Management > Resource Policies. This page explains how to configure a custom Ranger repository.

For custom repositories, you will change the DATABRICKS_SERVICE_NAME_PREFIX property in the config/custom-vars/vars.databricks.plugin.yml file. (This property is applicable only for the Databricks FGAC plugin.)

Your service repositories will be named using this value as a prefix:

  • Hive: DATABRICKS_SERVICE_NAME_PREFIX_hive.

  • S3: DATABRICKS_SERVICE_NAME_PREFIX_s3.

  • ADLS: DATABRICKS_SERVICE_NAME_PREFIX_adls.

  • Files: DATABRICKS_SERVICE_NAME_PREFIX_files.

For example, if your DATABRICKS_SERVICE_NAME_PREFIX is dev your policies would be named the following:

dev_hive
dev_s3
dev_adls
dev_files                 

To customize a new service name:

  1. In the Privacera Portal, under Access Management > Resource Policies, create the repositories with your custom names.

    When creating the policies:

    • Make sure the Username and Password fields have valid values.

    • Make sure the Active status is enabled/on.

    • Set the Common Name for Certificate to Ranger.

    Learn more about how to configure Resource Policies

  2. Open the config/custom-vars/vars.databricks.plugin.yml file.

    Modify the DATABRICKS_SERVICE_NAME_PREFIX property to your custom service name prefix.

  3. Update Privacera Manager by running the following script:

    ./privacera-manager.sh update

    Then restart the cluster that points to the updated init script.

Override Databricks region URL mapping on AWS

You can override the default configuration that maps AWS regions to Databricks private URLs.

Prerequisites

  • You installed Privacera on AWS

Procedure

  1. Log in to the system where you installed Privacera Manager, and then enter the following command:

    cd ~/privacera/privacera-manager
  2. To copy the default AWS region mapping file into the config/custom-properties directory, enter the following command:

    cp ansible/privacera-docker/roles/templates/dataserver/common/sample.dbx-region-url-mapping-aws.json  config/custom-properties/dbx-region-url-mapping-aws.json
  3. Edit the JSON mapping file. The configuration is of the following shape:

    {
      "regionToPrivateUrlMap": {
        "<AWS_REGION>": {
          "url": "<DATABRICKS_URL>"
        }
      }
    }

    Replace <AWS_REGION> with the name of the AWS region and <DATABRICKS_URL> with the Databricks private URL for the region.

  4. To update your Privacera Manager installation, enter the following command:

    ./privacera-manager.sh update
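Because an invalid mapping file can break the Data Server configuration, it is worth validating the JSON syntax before running the update. The sketch below uses a hypothetical region entry (the URL is a placeholder, not a real Databricks endpoint) and checks it with Python's stdlib JSON tool.

```shell
# Sketch: write an example region-to-URL mapping and validate its JSON syntax.
# The region name and Databricks URL below are illustrative placeholders.
cat > dbx-region-url-mapping-aws.json <<'EOF'
{
  "regionToPrivateUrlMap": {
    "us-east-1": {
      "url": "https://example-workspace.cloud.databricks.com"
    }
  }
}
EOF

# json.tool exits nonzero on malformed JSON, so a typo is caught here.
python3 -m json.tool dbx-region-url-mapping-aws.json > /dev/null \
  && echo "mapping JSON is valid"
```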

Verification

  1. Log in to a system where you can run kubectl commands.

  2. To run a shell on a Data Server pod, enter the following command:

    kubectl exec -it <POD_NAME> -n <NAMESPACE> -- bash

    where:

    • POD_NAME: Specifies the name of a Data Server pod

    • NAMESPACE: Specifies the name of the namespace where the pod is running

  3. To confirm that your change is applied, enter the following command:

    cat privacera-dataserver/conf/dbx-region-url-mapping-aws.json
  4. To exit the shell, enter exit.

Databricks policy management
  • Databricks Integration - AWS can now be done directly using Privacera Manager. See topic: Databricks - AWS.

  • Databricks Integration - Azure can now be done directly using Privacera Manager. See topic: Databricks - Azure.

Create Policy in Portal

To create a policy in Privacera Portal, use the following steps:

  1. Login to Privacera Portal.

  2. On the Privacera home page, expand the Settings menu and click Databricks Policies in the left menu.

  3. Click +Create Policy.

    image305.jpg
  4. Enter the Policy Name. (Mandatory)

  5. Select the Users, Groups, and IAM Role from the drop-down.

    You can select multiple Users and Groups.

  6. Enter the Additional JSON, if any. This will be appended to the existing JSON fetched from the backend.

    image306.jpg
  7. Click Save.

    The policy is created successfully.

Possible permission error

By default, Admin groups have permission to all the policies. If you have not configured the Databricks properties in the Privacera Portal properties file, you will get the following error.

image307.jpg
To correct this error:
  • Generate the token from a user who is an Admin.

  • Use additional JSON such as the following to create the policy.

    {
        "autoscale.min_workers": {
            "type": "range",
            "minValue": 1,
            "hidden": false
        },
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": 2
        },
        "cluster_name": {
            "type": "fixed",
            "value": "secured"
        },
        "spark_version": {
            "type": "regex",
            "pattern": "5.5.x-scala2.11"
        },
        "spark_conf.spark.hadoop.hadoop.security.credential.provider.path": {
            "type": "fixed",
            "value": "jceks://dbfs@/${JCEKS_FILE_PATH}",
            "hidden": true
        },
        "spark_conf.spark.databricks.delta.formatCheck.enabled": {
            "type": "fixed",
            "value": "false",
            "hidden": true
        },
        "spark_conf.spark.databricks.delta.preview.enabled": {
            "type": "fixed",
            "value": "true",
            "hidden": true
        },
        "node_type_id": {
            "type": "regex",
            "pattern": "m4.*"
        },
        "autotermination_minutes": {
            "type": "unlimited",
            "defaultValue": 50
        }
    }
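Each attribute in the additional JSON declares a policy type (fixed, range, regex, or unlimited). Before pasting additional JSON into the portal, its syntax and shape can be checked locally; the file name and the two-attribute sample below are illustrative, not part of the Privacera tooling.

```shell
# Sketch: check that additional cluster-policy JSON parses and that every
# attribute declares a policy "type". File name and content are illustrative.
cat > extra-policy.json <<'EOF'
{
  "cluster_name":  { "type": "fixed", "value": "secured" },
  "spark_version": { "type": "regex", "pattern": "5.5.x-scala2.11" }
}
EOF

python3 - <<'EOF'
import json

with open("extra-policy.json") as f:
    policy = json.load(f)

# Every attribute in a cluster policy must carry a "type" key.
missing = [k for k, v in policy.items() if "type" not in v]
assert not missing, f"attributes missing a type: {missing}"
print("policy JSON OK:", len(policy), "attributes")
EOF
```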
    
    Create Cluster in Databricks

    To create a Cluster in Databricks through policy, use the following steps:

    1. Log in to Databricks.

    2. Click Clusters in the left menu.

    3. Click Create Cluster.

    4. Select the Policy from the drop-down.

    5. Enter the required details.

    6. Click Create Cluster.

    The Cluster is created successfully.

    Supported actions

    Policy

    • Create: Setting users and group permissions

    • Update: Setting users and group permissions

    • Delete:

    Form elements

    • Ranger Enabled

      • True: Compulsory JSON will be added from the backend.

      • False: Compulsory JSON will not be added from the backend.

    IAM role (Optional)

    If selected, the below JSON value will be added from the backend.

    {
       "aws_attributes.instance_profile_arn":{
          "type":"fixed",
          "value":{SELECTED_VALUE},
          "hidden":false
       }
    }
    

    Spark Plug-in

    Properties
    Spark configuration table properties
    Fine-grained access control
    Object-level access control

    Azure key vault

    Connect with a client ID and client secret

    To configure a connection to the Azure Key Vault with ID and Secret:

    Generate the Client ID
    1. Login to the Azure portal.

    2. Search for Azure Key Vault.

    3. Click +Add to create a new key vault as shown below:

      image76.jpg
      image77.jpg
    4. After the vault is created, from the left navigation, select the Overview section and note the Vault URI (AZURE_KEYVAULT_URL).

    5. To connect to the vault, create an application registration through App registrations.

    6. Register the application (e.g., rangerkmsdemo) as shown in the following example:

      image78.jpg
    7. Click on the registered application and in the left menu, navigate to the Overview section.

    8. Note the Application (client) ID, which is the AZURE_CLIENT_ID for connecting.

    Generate Client Secret
    1. In the application screen, click on Certificates & Secrets in the left menu.

      image79.jpg
    2. Create a new client secret as shown in the example below:

      image80.jpg
    3. The Client Secret is shown; the secret value is the AZURE_CLIENT_SECRET.

      image81.jpg
    4. Next, go to the key vault that was created in Step 3.

    5. Select Access Policies > +Add Access Policy.

      image82.jpg
    Add Access Policy
    1. In the Add access policy screen, set permissions to access the vault with the application that was created.

    2. Select the Key permissions (mandatory), Secret permissions (optional), and Certificate permissions (optional).

    3. For Select principal, select the application you created.

      image83.jpg

      Go to Privacera/docker/ranger/kms/install.properties and change the following values:

      AZURE_KEYVAULT_ENABLED=true
      AZURE_KEYVAULT_SSL_ENABLED=false
      AZURE_CLIENT_ID=(from step 3.3)
      AZURE_CLIENT_SECRET=(from step 3.6)
      # AZURE_AUTH_KEYVAULT_CERTIFICATE_PATH (mandatory field. Value can be None/dummy)
      AZURE_AUTH_KEYVAULT_CERTIFICATE_PATH=/home/machine/Desktop/azureAuthCertificate/keyvault-MyCert.pfx
      # Initialize the property below if your certificate file has a password
      # AZURE_AUTH_KEYVAULT_CERTIFICATE_PASSWORD (mandatory field. Value can be None/dummy)
      AZURE_AUTH_KEYVAULT_CERTIFICATE_PASSWORD=certPass
      AZURE_MASTERKEY_NAME=RangerMasterKey
      # E.g. RSA, RSA_HSM, EC, EC_HSM, OCT
      AZURE_MASTER_KEY_TYPE=RSA
      # E.g. RSA_OAEP, RSA_OAEP_256, RSA1_5, RSA_OAEP
      ZONE_KEY_ENCRYPTION_ALGO=RSA_OAEP
      AZURE_KEYVAULT_URL=(from step 4)
      

      Note

      The fields that say 'Value can be None/dummy' must have some value; they cannot be blank.

    4. Restart Ranger KMS as follows:

      cd ~/privacera/docker
      ./privacera_services restart ranger-kms
      
    5. The master key is created when Ranger KMS is restarted. Verify that the master key (with the name set in the properties) is created in the vault under Keys:

      image84.jpg

      If an error occurs in the KMS logs (~/privacera/docker/logs/ranger/kms/) after the Client ID and client certificate are added and Ranger KMS is restarted, proceed to the next step.

    6. Exit the container and restart Ranger KMS.

    Connect with a client ID and certificate

    To configure a connection to the Azure Key Vault with ID and Certificate:

    1. Follow the same steps as in Generate the Client ID in the topic Connect to Key Vault with Client ID and Secret.

    2. Go to the Key Vault generated and select the Certificates>Generate/Import.

      image85.jpg

      You have the option to generate a certificate outside the vault and import it here.

    3. Select Generate to generate a certificate.

    4. Enter the certificate details as shown below:

      image86.jpg
    5. In the example shown, a certificate 'test' is generated.

      image87.jpg
    6. If the certificate is disabled, click it and enable it.

    7. Open the certificate and download it as shown:

      image88.jpg

      Download the certificate and copy it to the SSL folder: ~/privacera/privacera-manager/config/ssl/.

    8. Open the certificate and delete the private key and save the public certificate as shown in this example:

      image89.jpg
      image90.jpg
    9. Upload the certificate to the Azure application that was created as follows:

      image91.jpg
      image92.jpg
    10. Go to the key vault that was created and click Access Policies.

    11. Follow the instructions in Add Access Policy.

      Note

      The certificate path should be exactly as it is shown in ranger/kms/install.properties and cannot be changed. Also, if you need a password for the certificate, add it in the .properties file. All fields in the .properties file are required and cannot be removed; the value can be None/dummy.

    Add custom properties

    In Privacera Manager, all the properties of the following are configurable:

    • PEG

    • Discovery

    • Dataserver

    • Ranger Admin

    • Ranger Usersync

    If you want to override any default/hard-coded property value, add the property to its corresponding YAML file and define a value for it.

    For example: When you install Privacera platform on your instance, the default username and password for the Discovery service are padmin/padmin. If you wish to override these default values and define your own username and password for the Discovery service, perform the following steps:

    1. SSH to the instance.

    2. Run the following commands according to your environment configuration.

      AWS

      Docker

      cd ~/privacera/privacera-manager
      cp config/sample-vars/vars.discovery.aws.yml config/custom-vars/
      vi config/custom-vars/vars.discovery.aws.yml
      

      Azure

      Docker

      cd ~/privacera/privacera-manager
      cp config/sample-vars/vars.discovery.azure.yml config/custom-vars/
      vi config/custom-vars/vars.discovery.azure.yml
      

      GCP

      Docker

      cd ~/privacera/privacera-manager
      cp config/sample-vars/vars.discovery.gcp.yml config/custom-vars/
      vi config/custom-vars/vars.discovery.gcp.yml
      
    3. Add the following two properties in the YAML file, and enter the username and password as per your choice.

      DISCOVERY_PORTAL_SERVICE_USERNAME: ${Username}
      DISCOVERY_PORTAL_SERVICE_PASSWORD: ${Password}
      
    4. Run the following commands.

      cd ~/privacera/privacera-manager
      ./privacera-manager.sh update
      

    Migrate Ranger KMS master key

    The following steps will migrate the master key of Ranger KMS from its database to the Azure Key Vault.

    1. Run the following commands to enter the Ranger KMS shell.

      Docker shell

      cd /home/ec2-user/privacera/docker
      ./privacera_services shell ranger-kms
      

      Kubernetes shell

      In the variable, <NAMESPACE>, provide your namespace.

      kubectl get pods -n <NAMESPACE>
      kubectl exec -it <ranger_kms_pod_name> -n <NAMESPACE> -- bash
      
    2. Run the following commands to run the migration script.

      bash DBMKTOAZUREKEYVAULT.sh <azureMasterKeyName> <azureMasterKeyType> \
      <zoneKeyEncryptionAlgo> <azureKeyVaultUrl> <azureClientId> <isSSLEnabled:true/false> \
      <clientSecret / Certificate Path>
      

      Parameters:

      • <azureMasterKeyName>: Name of the Master Key you want to migrate.

      • <azureMasterKeyType>: Type of the Master Key. For example, RSA.

      • <zoneKeyEncryptionAlgo>: Encryption algorithm used in the Master Key. For example, RSA_OAEP.

      • <azureKeyVaultUrl>: Azure Key Vault URL. To get the URL, click here.

      • <azureClientId>: Azure Client ID. To get the ID, click here.

      • <isSSLEnabled:true/false>: Enable SSL. For example, true.

      • <clientSecret / Certificate Path>: If authentication is done without SSL enabled, provide the client secret; if authentication is done with SSL enabled, provide the certificate path. For more information, click here.
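Put together, an invocation using client-secret (non-SSL) authentication might look like the following. Every value shown is an illustrative placeholder, not a real key, vault, or credential; substitute your own before running the migration.

```shell
# Sketch: assemble the migration command from placeholder values.
# All values below are illustrative - substitute your own before running.
MASTER_KEY_NAME="RangerMasterKey"
MASTER_KEY_TYPE="RSA"
ZONE_KEY_ALGO="RSA_OAEP"
KEYVAULT_URL="https://example-vault.vault.azure.net/"
CLIENT_ID="00000000-0000-0000-0000-000000000000"
SSL_ENABLED="false"
CLIENT_SECRET="example-secret"

# Echo the command instead of executing it, so it can be reviewed first.
MIGRATE_CMD="bash DBMKTOAZUREKEYVAULT.sh $MASTER_KEY_NAME $MASTER_KEY_TYPE \
$ZONE_KEY_ALGO $KEYVAULT_URL $CLIENT_ID $SSL_ENABLED $CLIENT_SECRET"
echo "$MIGRATE_CMD"
```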

    IAM policy for AWS controller

    Attach the following to your Kubernetes cluster:

    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Effect":"Allow",
             "Action":[
                "acm:DescribeCertificate",
                "acm:ListCertificates",
                "acm:GetCertificate"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:DeleteTags",
                "ec2:DeleteSecurityGroup",
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAddresses",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeTags",
                "ec2:DescribeVpcs",
                "ec2:ModifyInstanceAttribute",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:RevokeSecurityGroupIngress"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "elasticloadbalancing:AddListenerCertificates",
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:CreateRule",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:DeleteRule",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:DeregisterTargets",
                "elasticloadbalancing:DescribeListenerCertificates",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DescribeRules",
                "elasticloadbalancing:DescribeSSLPolicies",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:ModifyRule",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes",
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:RemoveTags",
                "elasticloadbalancing:SetIpAddressType",
                "elasticloadbalancing:SetSecurityGroups",
                "elasticloadbalancing:SetSubnets",
                "elasticloadbalancing:SetWebAcl"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "iam:CreateServiceLinkedRole",
                "iam:GetServerCertificate",
                "iam:ListServerCertificates"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "cognito-idp:DescribeUserPoolClient"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "waf-regional:GetWebACLForResource",
                "waf-regional:GetWebACL",
                "waf-regional:AssociateWebACL",
                "waf-regional:DisassociateWebACL"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "tag:GetResources",
                "tag:TagResources"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "waf:GetWebACL"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "wafv2:GetWebACL",
                "wafv2:GetWebACLForResource",
                "wafv2:AssociateWebACL",
                "wafv2:DisassociateWebACL"
             ],
             "Resource":"*"
          },
          {
             "Effect":"Allow",
             "Action":[
                "shield:DescribeProtection",
                "shield:GetSubscriptionState",
                "shield:DeleteProtection",
                "shield:CreateProtection",
                "shield:DescribeSubscription",
                "shield:ListProtections"
             ],
             "Resource":"*"
          }
       ]
    }

    Policy to upload ACM/IAM certificates

    If you want Privacera to manage/upload ACM/IAM certificates for your Ingress Application Load Balancers, then attach the following policy to your Privacera Manager Host:

    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"VisualEditor0",
             "Effect":"Allow",
             "Action":[
                "iam:GetServerCertificate",
                "iam:UpdateServerCertificate",
                "iam:DeleteServerCertificate",
                "iam:UploadServerCertificate"
             ],
             "Resource":"arn:aws:iam::${AWS_ACCOUNT_ID}:server-certificate/privacera/*"
          },
          {
             "Sid":"VisualEditor1",
             "Effect":"Allow",
             "Action":"iam:ListServerCertificates",
             "Resource":"*"
          }
       ]
    }

    Policy for Kubernetes cluster details

    To discover Kubernetes cluster details such as VPC Id, Subnets and Security group, attach the following policy to your Privacera Manager Host:

    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"VisualEditor0",
             "Effect":"Allow",
             "Action":"eks:DescribeCluster",
             "Resource":"arn:aws:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${EKS_CLUSTER_NAME}"
          }
       ]
    }

    Customize topic and table names

    By default, topic and table names are assigned and managed internally by Privacera Discovery. Also, the deployment environment name is attached as a suffix to the topic and table names.

    For example, the default name for a Classification Topic in Privacera Discovery is shown as below:

    CLASSIFICATION_TOPIC: "privacera_classification_info_{{DEPLOYMENT_ENV_NAME}}"
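The {{DEPLOYMENT_ENV_NAME}} placeholder is substituted at deployment time. Its effect can be sketched with a simple text substitution; the environment name `dev01` below is illustrative.

```shell
# Sketch: how the deployment-environment suffix is applied to a topic name.
# 'dev01' is an illustrative environment name.
DEPLOYMENT_ENV_NAME="dev01"
TEMPLATE='privacera_classification_info_{{DEPLOYMENT_ENV_NAME}}'
CLASSIFICATION_TOPIC=$(echo "$TEMPLATE" | sed "s/{{DEPLOYMENT_ENV_NAME}}/$DEPLOYMENT_ENV_NAME/")
echo "$CLASSIFICATION_TOPIC"
```

Setting the suffix to an empty string (as DISCOVERY_DEPLOYMENT_SUFFIX_ID does for topic/table names) would similarly strip the trailing environment name.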
    

    To customize the name of a topic or table, you can do one of the following:

    • Remove the {{DEPLOYMENT_ENV_NAME}} variable as suffix.

    • Re-define a new topic/table name.

    If you want to customize any topic or table name, refer to the corresponding property in the following table.

    • Uncomment the topic, and enter a name, along with {{DEPLOYMENT_ENV_NAME}} as the suffix.

    • To remove {{DEPLOYMENT_ENV_NAME}} as the suffix, refer to the DISCOVERY_DEPLOYMENT_SUFFIX_ID property in this table.

    • {{DEPLOYMENT_ENV_NAME}} is the name of the environment you have given in vars.privacera.yml.

    • PRIVACERA_PORTAL_TOPIC_DYNAMIC_PREFIX: Uncomment and enter a custom name to add a prefix to the real-time topic for Data Sources in Privacera Portal.

      Example: PRIVACERA_PORTAL_TOPIC_DYNAMIC_PREFIX="privacera_scan_worker"

    • CLASSIFICATION_TOPIC: Streams Privacera Discovery classification information generated after scanning, for consumers to post-process, such as writing the data to Solr.

      Example: CLASSIFICATION_TOPIC: "privacera_classification_info_{{DEPLOYMENT_ENV_NAME}}"

    • ALERT_TOPIC: Streams alert data for consumers to post-process, such as writing the data to Solr.

      Example: ALERT_TOPIC: "privacera_alerts_{{DEPLOYMENT_ENV_NAME}}"

    • SPARK_EVENT_TOPIC: Streams Spark events for debugging purposes.

      Example: SPARK_EVENT_TOPIC: "privacera_spark_events_{{DEPLOYMENT_ENV_NAME}}"

    • RESULT_TOPIC: Streams error logs for consumers to post-process, such as writing the data to Solr for display on the Privacera Portal diagnostic page.

      Example: RESULT_TOPIC: "privacera_results_{{DEPLOYMENT_ENV_NAME}}"

    • OFFLINE_SCAN_TOPIC: Streams batch file events after listing, which is consumed by Privacera Discovery to initiate scanning of batch files.

      Example: OFFLINE_SCAN_TOPIC: "privacera_offline_scan_{{DEPLOYMENT_ENV_NAME}}"

    • AUDITS_TOPIC: Streams real-time audit events consumed by Privacera Discovery for real-time scanning.

      Example: AUDITS_TOPIC: "privacera_audits_{{DEPLOYMENT_ENV_NAME}}"

    • SCAN_RESOURCE_INFO_TOPIC: Streams data for scan summary information reporting about scan request jobs.

      Example: SCAN_RESOURCE_INFO_TOPIC: "privacera_scan_resources_info_{{DEPLOYMENT_ENV_NAME}}"

    • RIGHT_TO_PRIVACY_TOPIC: Streams events for triggering the Right to Privacy compliance policy.

      Example: RIGHT_TO_PRIVACY_TOPIC: "privacera_right_to_privacy_{{DEPLOYMENT_ENV_NAME}}"

    • DELAY_QUEUE_TOPIC: Streams real-time events to HDFS for delayed processing.

      Example: DELAY_QUEUE_TOPIC: "privacera_delay_queue_{{DEPLOYMENT_ENV_NAME}}"

    • APPLY_SCHEME_TOPIC: Streams events for triggering the de-identification compliance policy.

      Example: APPLY_SCHEME_TOPIC: "privacera_apply_scheme_{{DEPLOYMENT_ENV_NAME}}"

    • ML_CLASSIFY_TAG_TOPIC: Streams events for triggering tag detection via Machine Learning models.

      Example: ML_CLASSIFY_TAG_TOPIC: "privacera_ml_classify_tag_{{DEPLOYMENT_ENV_NAME}}"

    • LINEAGE_TOPIC: Streams lineage information for consumers to write the data to Solr.

      Example: LINEAGE_TOPIC: "privacera_lineage_{{DEPLOYMENT_ENV_NAME}}"

    • RESOURCE_TABLE, ALERT_TABLE, SCAN_REQUEST_TABLE, ACTIVE_SCANS_TABLE, MLRESOURCE_TABLE, LINEAGE_TABLE, AUDIT_SUMMARY_TABLE, STATE_TABLE, SCAN_STATUS_TABLE: You can customize the table names. Uncomment the table, and enter a name, along with {{DEPLOYMENT_ENV_NAME}} as the suffix. To remove {{DEPLOYMENT_ENV_NAME}} as the suffix, refer to the DISCOVERY_DEPLOYMENT_SUFFIX_ID property in this table.

      Examples:

      RESOURCE_TABLE: "privacera_resource_v2_{{DEPLOYMENT_ENV_NAME}}"
      ALERT_TABLE: "privacera_alert_{{DEPLOYMENT_ENV_NAME}}"
      SCAN_REQUEST_TABLE: "privacera_scan_request_{{DEPLOYMENT_ENV_NAME}}"
      ACTIVE_SCANS_TABLE: "privacera_active_scans_{{DEPLOYMENT_ENV_NAME}}"
      MLRESOURCE_TABLE: "privacera_mlresource_v2_{{DEPLOYMENT_ENV_NAME}}"
      LINEAGE_TABLE: "privacera_lineage_{{DEPLOYMENT_ENV_NAME}}"
      AUDIT_SUMMARY_TABLE: "privacera_audit_summary_{{DEPLOYMENT_ENV_NAME}}"
      STATE_TABLE: "privacera_state_{{DEPLOYMENT_ENV_NAME}}"
      SCAN_STATUS_TABLE: "privacera_scan_status_{{DEPLOYMENT_ENV_NAME}}"

    • DISCOVERY_DEPLOYMENT_SUFFIX_ID: Use this property to remove the {{DEPLOYMENT_ENV_NAME}} variable as a suffix from the topic/table names.

      Note

      This is a custom property, and has to be added separately to the YAML file.

      Example: DISCOVERY_DEPLOYMENT_SUFFIX_ID: ""

    • DISCOVERY_BUCKET_SQS_NAME: You can customize the SQS bucket name. Uncomment it, and enter a name, along with {{DISCOVERY_DEPLOYMENT_SUFFIX_ID}} as the suffix.

      Example: DISCOVERY_BUCKET_SQS_NAME: "privacera_bucket_sqs_{{DISCOVERY_DEPLOYMENT_SUFFIX_ID}}"

    Configure SSL for Privacera

     

    If required, you can enable/disable SSL for the following Privacera services. Just add the SSL property of the service you want to configure to the vars.ssl.yml file, and set it to true/false.

    Note

    Support Chain SSL - Preview Functionality

    Previously, Privacera services used only one SSL certificate from the LDAP server even when a chain of certificates was available. Now, as Preview functionality, all certificates available in the certificate chain are imported into the truststore. This applies to the Privacera UserSync, Ranger UserSync, and Portal SSL certificates.

    Properties to enable SSL

    • Solr: SOLR_SSL_ENABLE:"true"

      Note

      If you are transitioning an existing, working non-SSL Privacera environment (where all the Privacera services are running) to SSL, or vice versa, the entire update process takes around 15-30 minutes longer due to the additional Solr transition process.

    • AuditServer: AUDITSERVER_SSL_ENABLE:"true"

    • Portal: PORTAL_SSL_ENABLE:"true"

    • Grafana: GRAFANA_SSL_ENABLE:"true"

    • Ranger: RANGER_SSL_ENABLE:"true"
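    Taken together, a vars.ssl.yml fragment that enables SSL for all five services would contain the following (a sketch; include only the services you actually want to configure):

    ```yaml
    SOLR_SSL_ENABLE: "true"
    AUDITSERVER_SSL_ENABLE: "true"
    PORTAL_SSL_ENABLE: "true"
    GRAFANA_SSL_ENABLE: "true"
    RANGER_SSL_ENABLE: "true"
    ```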

    DataServer service
    Enable DataServer proxy SSL
    Self-signed
    • DATASERVER_PROXY_SSL:"true"

    Signed
    1. Copy the following keys to the location ~/privacera/privacera-manager/config/ssl:

      • Signed PEM Full Chain

      • Signed PEM Private Key

    2. Add the following properties.

      DATASERVER_SSL_SELF_SIGNED:"false"
      DATASERVER_HOST_NAME:"<PLEASE_CHANGE>"
      DATASERVER_SSL_SIGNED_PEM_FULL_CHAIN:"<PLEASE_CHANGE>"
      DATASERVER_SSL_SIGNED_PEM_PRIVATE_KEY:"<PLEASE_CHANGE>"
      DATASERVER_SSL_SIGNED_CERT_FORMAT:"<PLEASE_CHANGE>"

      (Optional) In addition to the properties above, if your CA certificate was generated with a private key, copy the Signed Root CA Public Key to ~/privacera/privacera-manager/config/ssl and add the following:

      DATASERVER_SSL_SIGNED_ROOT_CA_PUBLIC_KEY:"<PLEASE_CHANGE>"
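    As an illustration, with a hypothetical host name dataserver.example.com and certificate files named dataserver_fullchain.pem and dataserver_private.key copied to ~/privacera/privacera-manager/config/ssl, the properties might be filled in as follows (all values here are examples; verify the expected format of each value against your installation):

    ```yaml
    DATASERVER_SSL_SELF_SIGNED: "false"
    DATASERVER_HOST_NAME: "dataserver.example.com"
    DATASERVER_SSL_SIGNED_PEM_FULL_CHAIN: "dataserver_fullchain.pem"
    DATASERVER_SSL_SIGNED_PEM_PRIVATE_KEY: "dataserver_private.key"
    DATASERVER_SSL_SIGNED_CERT_FORMAT: "pem"
    ```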
    Disable DataServer proxy SSL
    1. Set DATASERVER_PROXY_SSL:"false"

    2. When switching DataServer between SSL and non-SSL, or between self-signed and signed certificates, remove the previously generated DataServer SSL configuration before you run the Privacera Manager update.

      This can be done by running:

      rm -rf ~/privacera/privacera-manager/config/ssl/dataserver*

    Configure Real-time scan across projects in GCP

    Real-time scan across projects in GCP

    You can enable real-time scan for applications in different projects in GCP. An application in GCP can be Google Cloud Storage (GCS) or Google BigQuery (GBQ).

    By default, only one application of GCS is created at the time of installation. If you have multiple projects containing resources in GCP and want to scan them in real-time, then do the following:

    Prerequisites

    Ensure the following prerequisites are met:

    • Get the project IDs of each project:

      • Project where the instance is configured

      • Cross project(s) containing the resources to be scanned

    • Grant the project instance permissions to access the cross-project resources (GCS buckets, GBQ datasets).

      1. Get the service account name of the project where the instance is configured.

      2. Navigate to the cross project > IAM & Admin > IAM > Click Add.

      3. Enter the service account name, and add the following roles:

        • Editor

        • Private Logs Viewer

    Configuration

    1. Add the following property to the vars.discovery.gcp.yml file, and assign the project IDs.

      PKAFKA_CROSS_PROJECT_IDS=project_id_2,project_id_3
      
    2. Run the following commands.

      cd ~/privacera/privacera-manager
      ./privacera-manager.sh update
      
    3. After installing/updating Privacera Manager, add the GCP projects in Privacera Portal.

      1. In Privacera Portal, add new GCS and GBQ with the project ID.

        1. On the Privacera home page, expand the Settings menu and click on Data Source Registration from left menu.

        2. On the Data Source Registration page, click +Add System.

          image147.jpg

          The Add System pop-up displays.

          image148.jpg
        3. Enter System Name in the Name field. (Mandatory) Example: Azure

        4. Enter the description in the Description field. (Optional)

        5. Click Save.

        The Application page displays the newly added system.

        To add the application to the system, use the following steps:

        1. Click on the Setting icon of the system and then click +Add Application.

          image149.jpg
        2. Select the Application. Example: Google Cloud Storage

        3. Enter the Application Name, Application Code, and Project ID. (Mandatory)

        4. Click Save.

      2. After adding the application, you will be instructed to manually create a topic in the GCP Console as shown in the image below.

        discovery_create_topic_manual.jpg

        In the image, the topic name is privacera_scan_worker_gcs_11_nj. Use this name to create a topic in the GCP project where the Privacera instance is installed. For more information on creating a topic in GCP, click here.
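        If you prefer the command line over the GCP Console, the topic can also be created with the gcloud CLI. The sketch below only echoes the command as a dry run (remove the echo to actually create the topic); the topic name comes from the example above, and the project ID is a placeholder:

        ```shell
        # Dry-run sketch: build the gcloud command to create the Pub/Sub topic.
        TOPIC_NAME="privacera_scan_worker_gcs_11_nj"   # name shown in the Portal instruction
        PROJECT_ID="project_id_1"                      # placeholder for your instance project
        echo gcloud pubsub topics create "$TOPIC_NAME" --project="$PROJECT_ID"
        ```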

    Upload custom SSL certificates

    1. Copy your certificates to the instance where Privacera Manager is installed.

      Get the file path of your certificate and enter it in the following code:

      cd ~/privacera/privacera-manager
      mkdir -p config/ssl/custom_certificates
      cp ${file_path_of_your_certificate} config/ssl/custom_certificates/
    2. (Optional) Do this step if your SSL certificate is of type JKS or P12.

      You will create a file containing the passwords for the SSL certificate. The passwords are used when importing the certificates into global-truststore.jks.

      1. Create a password file named {storeType}_{storeFileName}.pwd, where {storeType} is the certificate type and {storeFileName} is the filename of the certificate you want to upload.

        For example, if your SSL truststore type is jks and the truststore filename is certificate1.jks, create a file named jks_certificate1.pwd.

      2. Open the file and add the password.

      3. Place the password file in config/ssl/custom_certificates.

        cp jks_certificate1.pwd config/ssl/custom_certificates/
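    The certificate and password-file steps above can be sketched end to end as follows. This runs in a scratch directory with a placeholder truststore and password; in a real installation you would work in ~/privacera/privacera-manager and copy your actual certificate:

    ```shell
    # Sketch: stage a JKS truststore plus its password file for Privacera Manager.
    workdir="$(mktemp -d)" && cd "$workdir"   # stand-in for ~/privacera/privacera-manager
    mkdir -p config/ssl/custom_certificates
    touch certificate1.jks                    # placeholder for your real truststore
    cp certificate1.jks config/ssl/custom_certificates/
    # Password file name pattern: {storeType}_{storeFileName}.pwd -> jks_certificate1.pwd
    printf '%s\n' 'changeit' > config/ssl/custom_certificates/jks_certificate1.pwd
    ls config/ssl/custom_certificates/
    ```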

    Deployment size

    Each size cell lists Memory, CPU, Disk (GB), and Replication Factor (where applicable).

    | Pod | Small | Medium | Large |
    | --- | --- | --- | --- |
    | Portal | 2GB, 0.5, NA, min=1 max=1 | 4GB, 2, NA | 8GB, 4, NA |
    | Maria DB | 1GB, 0.5, 12 | 4GB, 2, 12 | 8GB, 4, 12 |
    | Data Server | 2GB, 1, NA, min=1 max=1 | 8GB, 2, NA, min=2 max=4 | 8GB, 2, NA, min=3 max=20 |
    | Discovery - Driver | 2GB, 1, 32 | 8GB, 4, 32 | 16GB, 8, 32 |
    | Discovery - Executor | 2GB, 1, NA | 2GB, 2, NA | 4GB, 4, NA |
    | PolicySync | 2GB, 2, 32 | 8GB, 4, 32 | 32GB, 8, 32 |
    | Solr | 1.5GB, 1, 64, 1 | 8GB, 4, 64, 3 | 32GB, 8, 64, 3 |
    | Zookeeper | 1GB, 0.5, 32, 1 | 2GB, 1, 32, 3 | 4GB, 2, 32, 3 |
    | Atlas | | | |
    | Ranger KMS | 1GB, 0.5, 12, NA | 2GB, 2, 12, NA | 4GB, 4, 12, NA |
    | Ranger UserSync | 1GB, 0.5, 12, NA | 4GB, 2, 12, NA | 8GB, 4, 12, NA |
    | Grafana | 1GB, 0.5, 1 | 4GB, 2, 1 | 8GB, 4, 1 |
    | Graphite | 1GB, 0.5, 32 | 4GB, 2, 32 | 8GB, 4, 32 |
    | Kafka | 1GB, 0.5, 32 | 4GB, 2, 32 | 8GB, 4, 32 |
    | PEG | 1GB, 0.5, NA, min=1 max=2 | 4GB, 2, NA, min=2 max=10 | 8GB, 4, NA, min=3 max=20 |
    | pkafka | 1GB, 0.5, NA | 4GB, 2, NA | 8GB, 4, NA |
    | Ranger Admin | 2GB, 1, NA | 8GB, 4, NA, min=2 max=4 | 16GB, 8, NA, min=2 max=4 |
    | Flowable | 1GB, 0.5, NA | 4GB, 2, NA | 8GB, 4, NA |
    | Audit Server | 1GB, 1, 32 | 4GB, 2, 32 | 16GB, 8, 32 |
    | FluentD | 1GB, 1, 32 | 4GB, 2, 32 | 16GB, 8, 32 |


    Service-level system properties

    The following table lists the Privacera services whose system properties can be appended to the existing system properties. To learn how to use these properties, click here.

    In the Config File column, the file name for a given service is where the existing system properties are defined. In the Custom Properties/File column, the file name is the one you must use to create a new custom file in which all the additional system properties are configured.

    | Service | Config File | Custom Properties/File |
    | --- | --- | --- |
    | Access Request Manager | privacera-custom.properties | access-request-manager-custom.properties |
    | Auditserver | audit.properties | audit-custom.properties |
    | | run.sh | audit-run-custom.sh |
    | Crypto | crypto.properties | crypto-custom.properties |
    | Databricks | databrickscfg | databrickscfg-custom |
    | | custom_env.sh | databricks-env-custom.sh |
    | | ranger_enable.sh | databricks_ranger_enable_custom.sh |
    | | ranger_enable_scala.sh | databricks_ranger_enable_scala_custom.sh |
    | Databricks Spark Plugin | | privacera_spark_custom.properties |
    | DATASERVER | dataserver-env-custom.sh | dataserver-env-custom.sh |
    | | privacera_dataserver.properties | dataserver-custom.properties |
    | DISCOVERY | privacera_discovery_custom.properties | discovery-custom.properties |
    | DISCOVERY CONSUMER | privacera_discovery_custom.properties | discovery-consumer-custom.properties |
    | Docker | env | docker-env-custom |
    | PEG | peg.application-custom.properties | peg-custom.properties |
    | | peg-env-custom.sh | peg-env-custom.sh |
    | | peg.crypto.properties | peg-crypto-custom.properties |
    | | ranger-peg-audit.xml | ranger-peg-audit-custom.xml |
    | | ranger-peg-security.xml | ranger-peg-security-custom.xml |
    | | ranger-policymgr-ssl.xml | ranger-policymgr-ssl-custom.xml |
    | PKAFKA | pkafka_config.properties | pkafka-custom.properties |
    | | penv.sh | pkafka-penv-custom.sh |
    | | grok-patterns | pkafka-grok-patterns-custom |
    | POLICYSYNC | rangersync.properties | rangersync-custom.properties |
    | | policy-sync-env-custom.sh | policysync-env-custom.sh |
    | PORTAL | application-custom.properties | portal-custom.properties |
    | | privacera-env-custom.sh | portal-env-custom.sh |
    | | run.sh | portal-run-custom.sh |
    | RANGER-ADMIN | install.properties | ranger-admin-custom.properties |
    | | ranger-admin-env-custom.sh | ranger-admin-env-custom.sh |
    | | ranger-privacera-site.xml | ranger-privacera-site-custom.xml |
    | RANGER-USERSYNC | install.properties | ranger-usersync-custom.properties |
    | RANGER-KMS | install.properties | ranger-kms-custom.properties |
    | | ranger-kms-env-custom.sh | ranger-kms-env-custom.sh |
    | | ranger-kms-audit.xml | ranger-kms-audit-custom.xml |
    | | ranger-kms-security.xml | ranger-kms-security-custom.xml |
    | | ranger-policymgr-ssl.xml | ranger-kms-policymgr-ssl.xml |
    | Privacera UserSync | usersync.properties | usersync-custom.properties |

    PrestoSQL standalone installation

    Note

    PrestoSQL will be discontinued in future releases of Privacera. Use Privacera Trino instead. For more information, see Trino Open Source.

    Ranger PrestoSQL Plug-In

    To install the Apache Ranger PrestoSQL plug-in, use the following steps:

    Download Presto plug-in package
    1. Set the Privacera Image Tag version.

      export PRIVACERA_IMAGE_TAG=${PRIVACERA_IMAGE_TAG}
    2. Download the PrestoSQL plug-in package.

      mkdir -p ~/privacera/downloads
      cd ~/privacera/downloads
      wget https://privacera.s3.amazonaws.com/ranger/${PRIVACERA_IMAGE_TAG}/ranger-2.1.0-SNAPSHOT-presto-plugin.tar.gz -O ranger-2.1.0-SNAPSHOT-presto-plugin.tar.gz
      ls -lrth
    3. Copy the ranger-2.1.0-SNAPSHOT-presto-plugin.tar.gz file to the machine where the presto-server is running.

    Set up the environment
    1. SSH to the machine where the presto-server is running.

    2. Go to the directory where ranger-2.1.0-SNAPSHOT-presto-plugin.tar.gz has been copied.

    3. Extract the plug-in tar.gz file:

      tar xvf ranger-2.1.0-SNAPSHOT-presto-plugin.tar.gz
    4. Create a symlink:

      ln -s ranger-2.1.0-SNAPSHOT-presto-plugin ranger-presto-plugin
    Configuration
    • Edit the install.properties.

      cd ranger-presto-plugin/
      vi install.properties
    • Update the properties as per the table below:

      | Property | Default | Description |
      | --- | --- | --- |
      | POLICY_MGR_URL | NONE | The Ranger Admin URL. E.g., http://10.100.10.10:6080 |
      | REPOSITORY_NAME | privacera_presto | The Presto Ranger policy repository name. |
      | COMPONENT_INSTALL_DIR_NAME | /usr/lib/presto | The directory where the Presto server is installed. |
      | XAAUDIT.SOLR.ENABLE | false | Enables/disables Solr audit. Set to 'true' to enable. |
      | XAAUDIT.SOLR.URL | NONE | Solr audit URL or audit server URL. E.g., http://10.100.10.10:8983/solr/ranger_audits |
      | XAAUDIT.SOLR.BASIC.AUTH.ENABLED | false | Set to 'true' if Solr/AuditServer authentication is enabled. |
      | XAAUDIT.SOLR.USER | NONE | - |
      | XAAUDIT.SOLR.PASSWORD | NONE | - |
      | RANGER_POLICY_AUTHZ_SHOW_CATALOG_DISABLED | false | Set to 'true' to disable authorization for the show catalog query. |
      | HIVE_POLICY_AUTHZ_ENABLED | false | Enables/disables Hive policy authorization for Hive catalogs. Set to 'true' to use Hive policies to authorize Hive catalog queries. |
      | HIVE_POLICY_REPO_CATALOG_MAPPING | privacera_hive:hive | Hive policy repository to Hive catalog mapping. Format: <hive_policy_repo-1>:<comma_separated_hive_catalogs>;<hive_policy_repo-2>:<comma_separated_hive_catalogs>. E.g., privacera_hive:hivecatalog1,hivecatalog2;privacera_hive_1:hive3,hive4,hive5 |
      | FILE_LOCATION_AUTHZ_ENABLED | true | Enables file permission authorization, using Privacera S3, ADLS, and Files policies, for the external location in create schema and create table. |
      | REPOSITORY_NAME_S3 | privacera_s3 | The policy repository used to authorize S3 locations. |
      | REPOSITORY_NAME_ADLS | privacera_adls | The policy repository used to authorize ADLS locations. |
      | REPOSITORY_NAME_FILES | privacera_files | The policy repository used to authorize locations other than S3 and ADLS. |
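    For reference, a filled-in install.properties fragment might look like the following (a sketch; the host addresses are placeholders and only commonly changed properties are shown):

    ```properties
    POLICY_MGR_URL=http://10.100.10.10:6080
    REPOSITORY_NAME=privacera_presto
    COMPONENT_INSTALL_DIR_NAME=/usr/lib/presto
    XAAUDIT.SOLR.ENABLE=true
    XAAUDIT.SOLR.URL=http://10.100.10.10:8983/solr/ranger_audits
    ```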

    • If Apache Ranger is SSL enabled, then set the following properties:

      SSL_KEYSTORE_FILE_PATH=${Path-to-ranger-plugin-keystore-jks}
      SSL_KEYSTORE_PASSWORD=${Plugin-keystore-password}
      SSL_TRUSTSTORE_FILE_PATH=${Path-to-ranger-plugin-truststore-jks}
      SSL_TRUSTSTORE_PASSWORD=${Plugin-truststore-password}
      CREDENTIAL_PROVIDER_FILE=${Path-to-ranger-jceks}
    Installation
    • Enable the presto-plugin by running the enable-presto-plugin.sh script as the root user.

      cd ranger-presto-plugin/
      ./enable-presto-plugin.sh
      
    • Restart the Presto server.