
Prerequisites for Discovery on AWS

Note

The prerequisites for Privacera Discovery are the same for both Self-Managed and PrivaceraCloud Data Plane deployments.

The Privacera Discovery module uses AWS services such as S3, DynamoDB, and SQS for scanning data. To enable this, you need to create the necessary AWS resources, along with IAM roles that allow the Discovery and Portal pods to access them. Privacera Manager can create these resources for you, or you can create them manually and provide their ARNs during the installation of the Discovery module.

Here are the prerequisites for setting up Privacera Discovery on AWS:

| Prerequisite | Description |
| --- | --- |
| S3 bucket and path | The S3 bucket and path where the configurations and temporary files for Discovery are stored. |
| DynamoDB tables | Used to store metadata and tags. |
| SQS | Used when real-time scanning is enabled. The change events for the S3 objects are retrieved from the SQS queue. |
| IAM Role for Privacera Manager (optional) | IAM role that lets Privacera Manager create the AWS resources required by Privacera Discovery. |
| IAM Role for Discovery and Portal pods | IAM Role for Service Account (IRSA) for the Discovery driver, executor, consumer, and portal pods to access AWS resources. |
| Assign IAM roles to EKS Service Accounts | Assign the IAM roles to the EKS Service Accounts for the Discovery and Portal pods. |

AWS S3 bucket and path

An AWS S3 bucket and path are required to store the configuration for Privacera Discovery. It is recommended to create a dedicated bucket for Privacera Discovery. You can use a sub-folder in the bucket to store the configurations and temporary files for Discovery. Make sure that the IAM roles for the Discovery and Portal pods have read/write access to this bucket or, at minimum, to the sub-folder.

You will need to provide the bucket name and path to Privacera Manager during the installation configuration. You can create this bucket manually or let Privacera Manager create it for you.

Examples are (replace acme or acme-prod with your bucket name):

  1. Separate bucket for each environment:

    s3://acme-prod/privacera-discovery-config/privacera-prod

  2. Common bucket, but different folders for each environment:

    s3://acme/privacera-discovery-config/privacera-prod
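If you keep the location as a single S3 URI, it can help to split it into the bucket name and the path before entering them in the installation configuration. The following is a minimal sketch using only the Python standard library; the function name `split_s3_uri` is hypothetical and not part of any Privacera tooling.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, path), with no leading or trailing slash on the path."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


if __name__ == "__main__":
    # The two layouts from the examples above.
    for uri in (
        "s3://acme-prod/privacera-discovery-config/privacera-prod",
        "s3://acme/privacera-discovery-config/privacera-prod",
    ):
        bucket, path = split_s3_uri(uri)
        print(f"bucket={bucket} path={path}")
```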

AWS DynamoDB tables

AWS DynamoDB tables are required to store the metadata for Privacera Discovery. The recommended naming convention for these tables is privacera_*_DEPLOYMENT_ENV_NAME. You can create these tables manually or let Privacera Manager create them for you.


Table Naming Convention

It is recommended to suffix the table names with the DEPLOYMENT_ENV_NAME (e.g. privacera-prod) to avoid conflicts with other deployments.

Assuming your DEPLOYMENT_ENV_NAME is privacera-prod, the table names are suffixed with privacera-prod as shown below. If you are creating the tables manually, you can use the following naming convention and schema, replacing privacera-prod with your actual deployment environment name.

The table names and their corresponding hash keys and range keys are as follows:

| Table Name | Hash Key | Hash Key Type | Range Key | Range Key Type |
| --- | --- | --- | --- | --- |
| privacera_scan_requests_privacera-prod | scan_id | S | id | S |
| privacera_resource_v2_privacera-prod | appCode | S | id | S |
| privacera_alert_privacera-prod | id | S | id | S |
| privacera_audit_summary_privacera-prod | appCode | S | id | S |
| privacera_active_scans_privacera-prod | topicName | S | id | S |
| privacera_state_privacera-prod | id | S | | |
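If you create the tables manually, the schema above can be turned into table definitions programmatically. The sketch below only builds dictionaries shaped like the arguments of the DynamoDB CreateTable API (for example, boto3's `create_table`); it does not call AWS. The function name `table_spec` and the `PAY_PER_REQUEST` billing mode are assumptions, not values taken from this document. Note that the privacera_alert row lists `id` as both hash key and range key; DynamoDB does not accept a key schema in which both keys share a name, so the sketch refuses to build that table and its schema should be verified before use.

```python
# Key layout per table, copied from the table above: (hash key, range key or None).
# Every key attribute is of type "S" (string).
TABLE_KEYS = {
    "privacera_scan_requests": ("scan_id", "id"),
    "privacera_resource_v2": ("appCode", "id"),
    "privacera_alert": ("id", "id"),  # same attribute listed twice; verify before use
    "privacera_audit_summary": ("appCode", "id"),
    "privacera_active_scans": ("topicName", "id"),
    "privacera_state": ("id", None),
}


def table_spec(base_name: str, env_name: str) -> dict:
    """Build CreateTable-style arguments for one table."""
    hash_key, range_key = TABLE_KEYS[base_name]
    if range_key == hash_key:
        # DynamoDB rejects a key schema whose hash and range key share a name.
        raise ValueError(f"{base_name}: hash and range key are both {hash_key!r}")
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    return {
        "TableName": f"{base_name}_{env_name}",
        "KeySchema": key_schema,
        "AttributeDefinitions": [
            {"AttributeName": key["AttributeName"], "AttributeType": "S"}
            for key in key_schema
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption, not stated in this document
    }


if __name__ == "__main__":
    for base in TABLE_KEYS:
        try:
            print(table_spec(base, "privacera-prod"))
        except ValueError as err:
            print(f"skipped: {err}")
```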

AWS SQS

An AWS SQS queue is used for real-time scanning. The change events for the S3 objects are retrieved from the SQS queue and processed by the Discovery pods. Privacera Manager can create the queue for you, or you can create it manually. The recommended naming convention for the SQS queue is privacera_bucket_sqs_DEPLOYMENT_ENV_NAME.

Example (replace privacera-prod with your DEPLOYMENT_ENV_NAME):

privacera_bucket_sqs_privacera-prod
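When deriving the queue name from the deployment environment name, it is worth checking the result against the SQS naming rules for standard queues (up to 80 characters; letters, digits, hyphens, and underscores). The following is a small sketch of that check; the helper names `queue_name` and `queue_arn` are hypothetical, and the region and account ID in the usage lines are made-up example values.

```python
import re

# Standard (non-FIFO) SQS queue names: 1-80 characters from [A-Za-z0-9_-].
_QUEUE_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,80}")


def queue_name(env_name: str) -> str:
    """Return the recommended queue name for a deployment environment."""
    name = f"privacera_bucket_sqs_{env_name}"
    if not _QUEUE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid SQS queue name: {name!r}")
    return name


def queue_arn(region: str, account_id: str, env_name: str) -> str:
    """Return the ARN of the queue, as used in IAM policy Resource entries."""
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name(env_name)}"


if __name__ == "__main__":
    print(queue_name("privacera-prod"))
    print(queue_arn("us-east-1", "123456789012", "privacera-prod"))
```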

IAM Policies For Discovery

There are two sets of IAM policies required for Privacera Discovery.

  1. For Privacera Manager: Permissions that allow Privacera Manager to create the AWS resources required for Privacera Discovery. This set is optional: you can have Privacera Manager create the resources during installation, or you can create them manually and provide their ARNs during installation of the Discovery module. Attach these IAM policies to the role of the EC2 instance where Privacera Manager is running.
  2. For Discovery Services: IAM roles for the Discovery and Portal pods to access the AWS resources. This set is mandatory for scanning AWS services such as S3 and DynamoDB. These roles need to be created manually and configured in Privacera Manager during installation. You can limit access to only the resources that will be scanned by Discovery.

Step 1: IAM Role for Privacera Manager

You can skip this step if you do not want Privacera Manager to create these resources. However, you will need to create the resources manually and provide their ARNs to Privacera Manager during the configuration steps.

The following additional IAM policy must be attached to the role of the Privacera Manager EC2 instance so that Privacera Manager can create the AWS resources for Discovery, such as DynamoDB tables, S3 buckets, and SQS queues.

The policy grants create and update permissions on the following AWS resources:

Summary of IAM policies for Creating AWS resources for Discovery by Privacera Manager

"Resource":"arn:aws:dynamodb:<AWS_REGION>:<ACCOUNT_ID>:table/privacera*"
"Resource":"arn:aws:s3:::<DISCOVERY_BUCKET>"
"Resource":"arn:aws:sqs:<AWS_REGION>:<ACCOUNT_ID>:privacera*"

After you create the IAM policies, attach them to the role used by your Privacera Manager EC2 instance (e.g. privacera-manager-role-privacera-prod).

IAM policies for Creating AWS resources for Discovery by Privacera Manager


Replace the following placeholders:

AWS_REGION: The AWS region where the resources are created.

ACCOUNT_ID: The AWS account ID where the resources are created.

DISCOVERY_BUCKET: The S3 bucket name where the Privacera metadata is stored.

Table names and the SQS queue name follow the format privacera_*_DEPLOYMENT_ENV_NAME.

discovery-privacera-manager-iam-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateDynamodb",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:ListTables",
                "dynamodb:TagResource",
                "dynamodb:UntagResource",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateTableReplicaAutoScaling",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:ListTagsOfResource",
                "dynamodb:DescribeContinuousBackups"
            ],
            "Resource": "arn:aws:dynamodb:<AWS_REGION>:<ACCOUNT_ID>:table/privacera*"
        },
        {
            "Sid": "CreateS3Bucket",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<DISCOVERY_BUCKET>"
            ]
        },
        {
            "Sid": "CreateSQSMessages",
            "Effect": "Allow",
            "Action": [
                "sqs:CreateQueue",
                "sqs:ListQueues"
            ],
            "Resource": [
                "arn:aws:sqs:<AWS_REGION>:<ACCOUNT_ID>:privacera*"
            ]
        }
    ]
}
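Replacing the placeholders by hand is error-prone, so a small script can fill them in and confirm that the result is still valid JSON. The following is a minimal sketch under stated assumptions: the function name `render_policy` is hypothetical, the embedded template is only the SQS statement from the policy above, and the region and account ID are made-up example values.

```python
import json
import re

# Matches placeholders written as <NAME>, as used in the policy above.
PLACEHOLDER_RE = re.compile(r"<([A-Z0-9_]+)>")


def render_policy(template: str, values: dict[str, str]) -> dict:
    """Substitute every <NAME> placeholder and parse the result as JSON.

    Raises KeyError if the template contains a placeholder with no value.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder <{key}>")
        return values[key]

    return json.loads(PLACEHOLDER_RE.sub(substitute, template))


if __name__ == "__main__":
    template = """
    {
        "Sid": "CreateSQSMessages",
        "Effect": "Allow",
        "Action": ["sqs:CreateQueue", "sqs:ListQueues"],
        "Resource": ["arn:aws:sqs:<AWS_REGION>:<ACCOUNT_ID>:privacera*"]
    }
    """
    # Example values only; replace with your own region and account ID.
    statement = render_policy(
        template, {"AWS_REGION": "us-east-1", "ACCOUNT_ID": "123456789012"}
    )
    print(json.dumps(statement, indent=4))
```

Failing on an unknown placeholder, rather than leaving it in place, keeps a half-filled policy from being uploaded by accident.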

Step 2: IAM Role for Discovery Services

Pod-level IAM roles are supported since Privacera Platform version 9.0.0.1. In earlier versions, these IAM policies had to be granted to the nodes of the Kubernetes cluster.

Privacera Discovery runs on Apache Spark, and its pods require access to AWS resources to scan data. The IAM roles for the Discovery and Portal pods must be created manually and configured in Privacera Manager during installation.

The Discovery and Portal pods require the following IAM policies to access the AWS resources. The recommendation is to create these policies and attach them to the IAM roles for the Discovery and Portal pods.

Here are the recommended IAM Role names and the policies to be attached to them:

  1. Role Name: privacera-discovery-role-DEPLOYMENT_ENV_NAME (e.g. privacera-discovery-role-privacera-prod)
  2. Discovery Service Policies: privacera-discovery-service-policies-DEPLOYMENT_ENV_NAME (e.g. privacera-discovery-service-policies-privacera-prod)
  3. Discovery Scan Policies: privacera-discovery-scan-policies-DEPLOYMENT_ENV_NAME (e.g. privacera-discovery-scan-policies-privacera-prod)

The above role will be attached to the following pods (DEPLOYMENT_ENV_NAME will be replaced with your actual deployment environment name):

  1. Privacera Portal (In Self-Managed Deployment) or Privacera Discovery Admin Console (In PrivaceraCloud Data Plane Deployment)
  2. Discovery Driver and Executor pods
  3. Discovery pKafka pod (If real-time scanning is enabled)

graph TD

    subgraph RolesAndPolicies
        A[privacera-discovery-role]
        B[privacera-discovery-service-policies]
        C[privacera-discovery-scan-policies]
    end

    subgraph Pods
        D[Portal or\nDiscovery Admin Console]
        E[Discovery Spark]
        F[pKafka]
    end


    B --> A
    C --> A

    A --> D
    A --> E
    A --> F

a. Discovery Service Policies

Summary of IAM policies for reading and writing to S3, DynamoDB and SQS

"Resource":"arn:aws:dynamodb:<AWS_REGION>:<ACCOUNT_ID>:table/privacera*"
"Resource":"arn:aws:s3:::<DISCOVERY_BUCKET>"
"Resource":"arn:aws:sqs:<AWS_REGION>:<ACCOUNT_ID>:privacera*"

After you create the IAM policies, attach them to the role used by your Discovery and Portal pods (e.g. privacera-discovery-role-privacera-prod).

IAM Policy for Discovery Service

Replace the following placeholders:

AWS_REGION: The AWS region where the resources are created.

ACCOUNT_ID: The AWS account ID where the resources are created.

DEPLOYMENT_ENV_NAME: The Privacera deployment environment name.

DISCOVERY_BUCKET: The S3 bucket name where the Privacera metadata is stored.

PATH: The path (sub-folder) within the bucket where the Discovery configurations and temporary files are stored.

discovery-service-policies-privacera-prod.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Dynamodb",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGet*",
                "dynamodb:DescribeStream",
                "dynamodb:DescribeTable",
                "dynamodb:Get*",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:BatchWrite*",
                "dynamodb:Update*",
                "dynamodb:Put*"
            ],
            "Resource": "arn:aws:dynamodb:<AWS_REGION>:<ACCOUNT_ID>:table/privacera_*_<DEPLOYMENT_ENV_NAME>"
        },
        {
            "Sid": "S3ObjectAllpermissions",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Put*",
                "s3:Get*",
                "s3:Delete*"
            ],
            "Resource": [
                "arn:aws:s3:::<DISCOVERY_BUCKET>/<PATH>/*",
                "arn:aws:s3:::<DISCOVERY_BUCKET>"
            ]
        },
        {
            "Sid": "SQSObjectAllpermissions",
            "Effect": "Allow",
            "Action": [
                "sqs:ListQueues",
                "sqs:SendMessage",
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage"
            ],
            "Resource": [
                "arn:aws:sqs:<AWS_REGION>:<ACCOUNT_ID>:privacera_*_<DEPLOYMENT_ENV_NAME>"
            ]
        }
    ]
}
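Because the DynamoDB and SQS resources in this policy are wildcard patterns, it can be useful to confirm that the names you actually created fall under them. The sketch below approximates IAM wildcard matching with Python's `fnmatchcase`; this is an approximation, since `fnmatchcase` also gives special meaning to `[...]`, which does not occur in these names. The function name `covered_by`, the region, and the account ID are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def covered_by(pattern: str, resource_arn: str) -> bool:
    """Return True if resource_arn matches the wildcard pattern (case-sensitive)."""
    return fnmatchcase(resource_arn, pattern)


if __name__ == "__main__":
    env = "privacera-prod"
    prefix = "arn:aws:dynamodb:us-east-1:123456789012:table/"  # example values
    pattern = f"{prefix}privacera_*_{env}"

    for table in (
        f"privacera_state_{env}",
        f"privacera_resource_v2_{env}",
        "privacera_state_privacera-dev",  # belongs to a different environment
    ):
        print(table, covered_by(pattern, prefix + table))
```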

b. Discovery Scan Policies

It is highly recommended to set this up during the installation phase, but if you won't be scanning data in S3, then you can skip this step.

Summary of read-only IAM policies for scanning S3 buckets

"Resource":"arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME1>/*"
"Resource":"arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME1>"
"Resource":"arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME2>/*"
"Resource":"arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME2>"

After you create the IAM policies, attach them to the role used by your Discovery and Portal pods (e.g. privacera-discovery-role-privacera-prod).

IAM Policy for Discovery Scan

Replace the following placeholders:

AWS_REGION: The AWS region where the resources are created.

ACCOUNT_ID: The AWS account ID where the resources are created.

DEPLOYMENT_ENV_NAME: The Privacera deployment environment name.

DISCOVERY_SCAN_BUCKET_NAME1: The name of the first S3 bucket that contains data to be scanned.

DISCOVERY_SCAN_BUCKET_NAME2: The name of an additional S3 bucket that contains data to be scanned. Repeat the pair of resource entries for each further bucket.

discovery-scan-policies-privacera-prod.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3ObjectScanOnlypermissions",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:Get*"
            ],
            "Resource": [
                "arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME1>/*",
                "arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME1>",
                "arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME2>/*",
                "arn:aws:s3:::<DISCOVERY_SCAN_BUCKET_NAME2>"
            ]
        }
    ]
}
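Each bucket to be scanned needs two resource entries, one for the bucket itself and one for its objects, so the policy grows with the number of buckets. The following sketch generates the same policy structure as above from a list of bucket names; the function name `scan_policy` and the bucket names in the usage lines are hypothetical.

```python
import json


def scan_policy(bucket_names: list[str]) -> dict:
    """Build the read-only scan policy for the given S3 buckets."""
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ObjectScanOnlypermissions",
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    # Example bucket names only.
    print(json.dumps(scan_policy(["acme-sales-data", "acme-hr-data"]), indent=4))
```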

c. Assign IAM roles to EKS Service Accounts

The IAM role (privacera-discovery-role-privacera-prod) created above should be assigned to the EKS service accounts for the Discovery and Portal pods.

Service Accounts for Discovery and Portal pods

discovery-consumer-privacera-sa
discovery-privacera-sa
portal-privacera-sa
pkafka-privacera-sa

You can follow the instructions here for creating the IAM role for service accounts.
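With IAM roles for service accounts, the role's trust policy names the EKS cluster's OIDC provider and the service accounts that are allowed to assume the role. The following sketch builds such a trust policy for the four service accounts listed above. It is an illustration under stated assumptions: the function name `irsa_trust_policy` is hypothetical, and the account ID, OIDC provider, and Kubernetes namespace are placeholder values that you must replace with those of your own cluster.

```python
import json

SERVICE_ACCOUNTS = [
    "discovery-consumer-privacera-sa",
    "discovery-privacera-sa",
    "portal-privacera-sa",
    "pkafka-privacera-sa",
]


def irsa_trust_policy(
    account_id: str, oidc_provider: str, namespace: str, service_accounts: list[str]
) -> dict:
    """Build a trust policy that lets the given service accounts assume the role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    subjects = [f"system:serviceaccount:{namespace}:{sa}" for sa in service_accounts]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        f"{oidc_provider}:sub": subjects,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="123456789012",  # example value
        oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # example value
        namespace="privacera-prod",  # assumed namespace; use your own
        service_accounts=SERVICE_ACCOUNTS,
    )
    print(json.dumps(policy, indent=4))
```

Each service account is then typically linked to the role through the `eks.amazonaws.com/role-arn` annotation; how that annotation is set in a Privacera deployment is not covered by this sketch.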

Final Checklist

It is extremely important to ensure that all the prerequisites are met before proceeding with the installation of Privacera Discovery.

  • Create IAM policies and roles for Privacera Manager to create AWS resources required for Privacera Discovery (optional).
  • Create an S3 bucket and path for storing the configurations and temporary files for Privacera Discovery or let Privacera Manager create it for you.
  • Create DynamoDB tables to store metadata and tags or let Privacera Manager create them for you.
  • Create an SQS queue for real-time scanning or let Privacera Manager create it for you.
  • Create IAM policies and roles for the Discovery and Portal pods to access the AWS resources.
  • Create IAM policies for the Discovery and Portal pods to scan the S3 bucket (optional).
  • Assign the IAM roles to the EKS Service Accounts for the Discovery and Portal pods.
