
Enabling for Realtime Discovery

Discovery supports Realtime discovery, which monitors and scans data in real time. To enable Realtime discovery, you need to set up a few prerequisites and configurations.

Prerequisites

Setting up PKafka Service: This service listens to the messaging queue for audit events. The configuration for each cloud differs slightly and is outlined in the Setup section.

Even though the service is named PKafka, it supports multiple messaging services, such as AWS SQS, Azure Event Hubs, and GCP Pub/Sub.

Each cloud provider requires additional prerequisites and configurations. Follow the steps based on the cloud provider.

To configure PKafka with AWS, you need to set up an Amazon SQS queue and an IAM role. These steps are covered in the section on installing the base Privacera Discovery service. Refer to the Prerequisites -> AWS section.

AWS SQS Queue: Name of the AWS SQS queue used to fetch the change events for AWS S3 and DynamoDB.
AWS IAM Role: ARN of the AWS IAM role that has permissions on the SQS queue, e.g., privacera-discovery-role-privacera-prod.
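Before configuring PKafka, you may want to confirm that both prerequisites exist in your AWS account. A minimal sketch using the AWS CLI is shown below; the queue and role names are hypothetical examples, so substitute the ones from your own deployment.

```bash
# Verify the SQS queue exists and print its URL (hypothetical queue name).
aws sqs get-queue-url --queue-name privacera_bucket_sqs_privacera-prod

# Verify the IAM role exists and inspect its ARN (hypothetical role name).
aws iam get-role --role-name privacera-discovery-role-privacera-prod
```

The queue URL and role ARN printed by these commands are the values you will use for PKAFKA_SQS_ENDPOINT and PKAFKA_IAM_ROLE_ARN in the Setup section.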

To configure PKafka with Azure, you need to set up an Event Hub. These steps are covered in the section on installing the base Privacera Discovery service. Refer to the Prerequisites -> Azure section.

Create an Event Hub namespace, Event Hub, and consumer group: Used for real-time scanning, to capture change events for resources, and to process events in parallel. The connection string from the Event Hub is used to connect the resource with Azure Event Hubs.
Create an Event Subscription: Defines how events are routed from a source to a target.
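One possible way to create these Azure resources is with the Azure CLI, sketched below. All names (resource group, namespace, hub, consumer group) are hypothetical; the authoritative steps are in the Prerequisites -> Azure section.

```bash
# Create the Event Hub namespace, Event Hub, and consumer group (hypothetical names).
az eventhubs namespace create --name discovery-eventhub-namespace \
  --resource-group my-rg --location eastus
az eventhubs eventhub create --name discovery-eventhub \
  --namespace-name discovery-eventhub-namespace --resource-group my-rg
az eventhubs eventhub consumer-group create --name discovery-consumer-group \
  --eventhub-name discovery-eventhub \
  --namespace-name discovery-eventhub-namespace --resource-group my-rg

# Fetch the primary connection string of the RootManageSharedAccessKey policy,
# which is the value used later for PKAFKA_EVENT_HUB_CONNECTION_STRING.
az eventhubs namespace authorization-rule keys list \
  --name RootManageSharedAccessKey \
  --namespace-name discovery-eventhub-namespace --resource-group my-rg \
  --query primaryConnectionString -o tsv
```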

To configure PKafka with GCP, you need to set up a Google Logging Sink. These steps are covered in the section on installing the base Privacera Discovery service. Refer to the Prerequisites -> GCP section.

Create Google Logging Sink: Receives the logs from the GCP resources.
Create Pub/Sub topic: Receives the logs from the Google Logging Sink.
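A sketch of these GCP prerequisites with the gcloud CLI is shown below. The project, topic, subscription, and sink names are hypothetical; refer to the Prerequisites -> GCP section for the authoritative steps.

```bash
# Create the Pub/Sub topic and a subscription to consume from it (hypothetical names).
gcloud pubsub topics create discovery-logs-topic
gcloud pubsub subscriptions create discovery-logs-sub --topic discovery-logs-topic

# Create the Logging Sink that routes logs to the Pub/Sub topic.
gcloud logging sinks create discovery-sink \
  pubsub.googleapis.com/projects/my-project/topics/discovery-logs-topic
```

Note that after creating the sink, its writer identity must be granted permission to publish to the topic; see the GCP prerequisites for details.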

Setup

Copy the vars.pkafka.aws.yml from config/sample-vars to config/custom-vars and edit the file.

Bash
cd ~/privacera/privacera-manager
cp -n config/sample-vars/vars.pkafka.aws.yml config/custom-vars/
vi config/custom-vars/vars.pkafka.aws.yml

Replace the following placeholders:

PKAFKA_SQS_ENDPOINT: URL of the Amazon SQS queue. It has the following format, where DEPLOYMENT_ENV_NAME is the name of the deployment environment (e.g., privacera-prod): https://sqs.<AWS_REGION>.amazonaws.com/<ACCOUNT_ID>/privacera_bucket_sqs_<DEPLOYMENT_ENV_NAME>

PKAFKA_IAM_ROLE_ARN: ARN of the IAM role created for the Privacera Discovery service, e.g., arn:aws:iam::<ACCOUNT_ID>:role/privacera-discovery-role-privacera-prod
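Putting the two placeholders together, the sketch below shows how the values are composed from your deployment details. The account ID, region, and environment name are hypothetical examples only.

```bash
# Hypothetical values for illustration; substitute your own.
AWS_REGION="us-east-1"
ACCOUNT_ID="123456789012"
DEPLOYMENT_ENV_NAME="privacera-prod"

# Compose the two placeholder values from the deployment details above.
PKAFKA_SQS_ENDPOINT="https://sqs.${AWS_REGION}.amazonaws.com/${ACCOUNT_ID}/privacera_bucket_sqs_${DEPLOYMENT_ENV_NAME}"
PKAFKA_IAM_ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/privacera-discovery-role-${DEPLOYMENT_ENV_NAME}"

echo "$PKAFKA_SQS_ENDPOINT"
echo "$PKAFKA_IAM_ROLE_ARN"
```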

Add or edit the following variables:

Bash
PKAFKA_SQS_ENDPOINT: "<PLEASE_CHANGE>"

PKAFKA_USE_POD_IAM_ROLE: "true"
PKAFKA_IAM_ROLE_ARN: "<PLEASE_CHANGE>"

These steps configure the AWS SQS queue for real-time scanning.

Copy the vars.pkafka.azure.yml from config/sample-vars to config/custom-vars and edit the file.

Bash
cd ~/privacera/privacera-manager
cp -n config/sample-vars/vars.pkafka.azure.yml config/custom-vars/
vi config/custom-vars/vars.pkafka.azure.yml

Add or edit the following variables:

Bash
PKAFKA_EVENT_HUB: "<PLEASE_CHANGE>"
PKAFKA_EVENT_HUB_NAMESPACE: "<PLEASE_CHANGE>"
PKAFKA_EVENT_HUB_CONSUMER_GROUP: "<PLEASE_CHANGE>"
PKAFKA_EVENT_HUB_CONNECTION_STRING: "<PLEASE_CHANGE>"

Replace the following placeholders. You can get these values from the Prerequisites -> Azure section.

PKAFKA_EVENT_HUB: Name of the Event Hub created for realtime scanning to receive change events (such as object creation, deletion, or modification) from ADLS via Event Grid (example: discovery-eventhub).

PKAFKA_EVENT_HUB_NAMESPACE: Event Hub namespace created for realtime scanning (example: discovery-eventhub-namespace).

PKAFKA_EVENT_HUB_CONSUMER_GROUP: Provide $Default if you plan to use the default consumer group created by the Event Hub; otherwise, provide the unique name of a newly created consumer group.

PKAFKA_EVENT_HUB_CONNECTION_STRING: Primary connection string of the RootManageSharedAccessKey policy used to access the Event Hub.
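A filled-in sketch of the Azure variables is shown below. All values are hypothetical examples; use the names and connection string from your own Event Hub setup.

```yaml
# Hypothetical values for illustration only.
PKAFKA_EVENT_HUB: "discovery-eventhub"
PKAFKA_EVENT_HUB_NAMESPACE: "discovery-eventhub-namespace"
PKAFKA_EVENT_HUB_CONSUMER_GROUP: "$Default"
PKAFKA_EVENT_HUB_CONNECTION_STRING: "Endpoint=sb://discovery-eventhub-namespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>"
```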

Copy the vars.pkafka.gcp.yml from config/sample-vars to config/custom-vars and edit the file.

Bash
cd ~/privacera/privacera-manager
cp -n config/sample-vars/vars.pkafka.gcp.yml config/custom-vars/
vi config/custom-vars/vars.pkafka.gcp.yml

Add or edit the following variables:

Bash
PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME: "<PLEASE_CHANGE>"
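For illustration, a filled-in sketch of the GCP variable is shown below. The subscription name is a hypothetical example; use the subscription attached to the Pub/Sub topic that receives logs from your Google Logging Sink.

```yaml
# Hypothetical subscription name for illustration only.
PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME: "discovery-logs-sub"
```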

Enable Realtime Discovery

  1. Log in to Privacera:
    • For Self-Managed, log in to the Privacera Portal.
    • For Data Plane, log in to the Privacera Discovery Admin Console.
  2. Navigate to Settings > Data Source Registration.
  3. Edit the application for which you want to enable Realtime discovery.

    Note

    Realtime discovery is supported only for AWS S3, Google BigQuery, Google Cloud Storage, and Azure Data Lake Storage.

  4. Select Application Properties.

  5. Turn on the Enable Real-Time toggle.
  6. Click Save.

Restart Privacera Services

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup
./pm_with_helm.sh upgrade 
