Privacera Platform

TagSync using Apache Ranger

Privacera Discovery allows you to classify your data using tags. Tags can be used in access policies to manage access to sensitive data.

Apache Ranger needs the tag information when it applies a tag-based policy. This topic describes how to propagate tag details from Discovery to Apache Ranger.

Enable TagSync

You need to enable TagSync in the Privacera Portal by configuring the following properties in the Application Properties UI. See General Process for more information.

ranger.writer.enable=true
send.inherited.table.tags.to.ranger=true

Properties to add based on service type

Apart from the above properties, you need to add further properties based on the service type in the Application Properties UI. These properties help you verify TagSync in Apache Ranger using the Ranger utility script.

For example:

service_name=privacera_s3
cluster_name=privacera

The value of service_name depends on the application that you want to apply TagSync to. The following is a list of services and values for each application:

S3

service_name=privacera_s3
cluster_name=privacera

Redshift

service_name=privacera_redshift
cluster_name=privacera

PostgreSQL

service_name=privacera_postgres
cluster_name=privacera

Snowflake

service_name=privacera_snowflake
cluster_name=privacera

DynamoDB

service_name=privacera_dynamodb
cluster_name=privacera

MSSQL/Synapse

service_name=privacera_mssql
cluster_name=privacera

MySQL/MariaDB/AuroraDB/Databricks Spark SQL

service_name=privacera_hive
cluster_name=privacera
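
The list above can be kept as a small lookup table when the properties are generated for several applications. The following is a minimal sketch, not part of Privacera: the names SERVICE_NAMES and tagsync_properties are hypothetical, and the output is only the property lines to paste into the Application Properties UI.

```python
# Service names from the list above, keyed by application.
# SERVICE_NAMES and tagsync_properties are hypothetical names used for illustration.
SERVICE_NAMES = {
    "S3": "privacera_s3",
    "Redshift": "privacera_redshift",
    "PostgreSQL": "privacera_postgres",
    "Snowflake": "privacera_snowflake",
    "DynamoDB": "privacera_dynamodb",
    "MSSQL/Synapse": "privacera_mssql",
    "MySQL/MariaDB/AuroraDB/Databricks Spark SQL": "privacera_hive",
}


def tagsync_properties(application, cluster_name="privacera"):
    """Return the TagSync property lines for one application."""
    service_name = SERVICE_NAMES[application]
    return [
        "ranger.writer.enable=true",
        "send.inherited.table.tags.to.ranger=true",
        f"service_name={service_name}",
        f"cluster_name={cluster_name}",
    ]


print("\n".join(tagsync_properties("Snowflake")))
```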

TagSync validation scenarios

TagSync can be validated in the following scenarios:

Note

Allowed and rejected tags will not be synced to Apache Ranger.

Auto scanning

On the Classifications page, files are classified with system classified tags. After classification, all system-classified and manually accepted tags are synced to Apache Ranger.

Parent-Child Level TagSync in Apache Ranger:

The criteria for syncing parent and child tags depend on whether the resource belongs to a database application or a file system:

Database applications

Example 1. Scenario

If the resource is a database, then the database gets classified as:

  • Database, tag1, tag2, etc.

In Ranger, child entries are created as below:

  • (Database): tag1, tag2, etc.



Example 2. Scenario

If the resource is a table, the classification is as shown below:

  • (Database, table), tag1, tag2, etc.

In Ranger, the child-level entry appears as follows:

  • (Database, table): tag1, tag2, etc.



Example 3. Scenario

If the resource is a column, on the UI the classification is as shown below:

  • (Database, table, column), tag1, tag2, etc.

In Ranger, only column level tags will be synced:

  • (Database, table, column): tag1, tag2, etc.



File System

  • For a folder or file, all the tag levels are allowed.

  • For a field, only the same tag level is allowed.

Meta tagging

Meta tags are applied at the table or file level. They are also synced to Apache Ranger at the table or file level. Only system classified and manually classified tags are synced to Apache Ranger.

Post-processing tags

System classified and manually classified tags that are applied using post processing rules are synced to Apache Ranger.

Re-evaluate

In the case of re-evaluation, system classified and manually classified datazone tags are synced to Apache Ranger. Resources that are deleted through datazone policies will be removed from Apache Ranger as well.

Add or edit tags

You can add or edit tags manually on the original classified resources from the following pages:

  • Classifications: From the navigation menu, select Data Inventory > Classifications.

  • Resource Detail: From the navigation menu, select Data Inventory > Classifications. Select a resource and click Resource Detail.

  • Data Explorer: From the navigation menu, select Data Inventory > Data Explorer.

  • Data Zone Dashboard: From the navigation menu, select Compliance Workflow > Data Zone Dashboard.

When a user adds tags manually from the pages listed above, the tag status is set by default to “Accepted : Manually classified” and it will be synced to Apache Ranger.

Add a resource

You can manually add tags to unclassified resources. When you add such resources and add a tag to them, the tag status is set by default to “Accepted : Manually classified” and it will be synced to Apache Ranger.

To add a resource, select Data Inventory > Classifications from the navigation menu and click Add Resource.

Tag status changes

Tag status changes affect TagSync. Only system classified and manually accepted tags are synced to Apache Ranger. The following are a few scenarios for tag status changes:

  • If the status of a tag is changed from system classified to rejected or allowed, then the tag will be removed from Apache Ranger.

  • If the status of the tag is changed from manually accepted to allowed or rejected, then the tag will be removed from Apache Ranger.

  • If the tag status resets to system classified from rejected or allowed, then the tag will be synced to Apache Ranger.

  • If the tag status is changed to manually classified from rejected or allowed, then the tag will be synced to Apache Ranger.

  • If the tag status is changed from system classified to manually classified, then the synced tags in Apache Ranger will remain unchanged.
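
The scenarios above reduce to one rule: a tag is present in Apache Ranger only while its status is system classified or manually classified. The following is a minimal sketch of that rule. The status identifiers and the function name tagsync_action are hypothetical, and the "unchanged" result for a move between rejected and allowed is an inference from the rule rather than a documented scenario.

```python
# Statuses whose tags are synced to Apache Ranger, per the scenarios above.
# The identifier spellings are hypothetical; the product UI uses display names.
SYNCED_STATUSES = {"system_classified", "manually_classified"}


def tagsync_action(old_status, new_status):
    """Return what happens to the tag in Ranger after a status change."""
    was_synced = old_status in SYNCED_STATUSES
    is_synced = new_status in SYNCED_STATUSES
    if was_synced and not is_synced:
        return "remove"      # for example, system classified -> rejected
    if not was_synced and is_synced:
        return "sync"        # for example, allowed -> manually classified
    return "unchanged"       # for example, system classified -> manually classified


for old, new in [
    ("system_classified", "rejected"),
    ("allowed", "manually_classified"),
    ("system_classified", "manually_classified"),
]:
    print(f"{old} -> {new}: {tagsync_action(old, new)}")
```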

Remove tags

You can manually remove a tag that you added after you have rejected it. If you remove a tag from a resource using the Add/Edit option, the tag is removed from Apache Ranger as soon as you reject it.

Remove resources

If a resource is added manually and has only manually classified tags, then after you reject the last tag the resource will be removed from Apache Ranger.

If a resource has system classified tags and you reject the last tag, the resource will be removed from Apache Ranger because the last synced tag for that resource is removed.

Rescan of the same file

  • If you rescan a resource that is already synced with Apache Ranger and no changes were made to rules or datazone policies, then TagSync will remain unchanged.

  • If post-processing rules are disabled, then rescanning a file will remove post-processing tags.

  • If a datazone tag is disabled or a resource removed from a datazone, then the datazone tag will be removed from Apache Ranger upon rescan.

  • If a meta tag rule or a meta tag is disabled, then the meta tag will be removed from Apache Ranger upon rescan.

  • If a status change is applied before a file is rescanned, TagSync is also affected according to that status change.

Validate TagSync in Apache Ranger

You can view the tags that are pushed to Apache Ranger by using curl commands or the Ranger tag utility script.

Validate TagSync using curl command

curl -i -L -k -u admin:${PRIVACERA_PASSWORD} -H "Content-type: application/json" -X GET \
https://${PRIVACERA_HOST}:6182/service/tags/resources/service/privacera_postgres

The curl command above returns the list of resources that are synced to Apache Ranger, but the response is not in a readable format. Therefore, it is recommended to use the Ranger tag utility to check TagSync.
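
If the utility is not at hand, the JSON body returned by the curl command can also be condensed with a few lines of Python. The following is a minimal sketch, assuming the body is a JSON array of resources with serviceName and resourceElements fields, as in the sample response later in this topic. The function name summarize_resources is hypothetical, and the body must be passed without the HTTP headers that curl -i prints.

```python
import json


def summarize_resources(response_body):
    """Condense the JSON array of tagged resources into one line per resource."""
    lines = []
    for resource in json.loads(response_body):
        elements = resource.get("resourceElements", {})
        parts = [
            f"{name}={','.join(element.get('values', []))}"
            for name, element in sorted(elements.items())
        ]
        lines.append(f"{resource.get('serviceName')}: {' '.join(parts)}")
    return lines


# Sample body shaped like the response shown later in this topic.
sample = (
    '[{"id":7,"serviceName":"privacera_s3","resourceElements":'
    '{"bucketname":{"values":["pscanzone"],"isExcludes":false,"isRecursive":false},'
    '"objectpath":{"values":["finance/finance_us.csv"],"isExcludes":false,"isRecursive":false}}}]'
)

for line in summarize_resources(sample):
    print(line)  # privacera_s3: bucketname=pscanzone objectpath=finance/finance_us.csv
```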

Validate TagSync using the Ranger Tag Utility

The Ranger tag utility is a Python script that calls the Ranger API and returns the response in a readable format:

  • Run the following command to download the required file:

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/ranger_tag_utility.py -O ranger_tag_utility.py
    
  • On the system where you downloaded the file, run the following command to view the TagSync response.

    SSL instance

    python3 ranger_tag_utility.py --operation list_tags --host ${PRIVACERA_HOST} --port 6182 \
    --username ${RANGER_USERNAME} --password ${RANGER_PASSWORD} --servicename privacera_redshift \
    --ssl True --verifyssl False
    

    Non-SSL instance

    python3 ranger_tag_utility.py --operation list_tags --host ${PRIVACERA_HOST} --port 6080 \
    --username ${RANGER_USERNAME} --password ${RANGER_PASSWORD} --servicename privacera_maprfs \
    --ssl False --verifyssl False
    
  • (Optional) Change the value of --servicename to match your application.

    Output

    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.avro'] => tags :: ['SSN', 'PERSON_NAME', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.snappy.avro'] => tags :: ['US_ADDRESS', 'SSN', 'US_PHONE_NUMBER', 'AU_BAN', 'PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test1.avro'] => tags :: ['SSN', 'US_PHONE_NUMBER', 'PERSON_NAME', 'US_ADDRESS', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.snappy.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
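
To check for a specific tag across many paths, the utility output can be parsed into a dictionary. The following is a minimal sketch, assuming each line has exactly the form shown in the output above. The names parse_tag_lines and paths_with_tag are hypothetical and are not part of ranger_tag_utility.py.

```python
import ast
import re

# Matches one output line: a Python-style list of paths, then a list of tags.
LINE_PATTERN = re.compile(r"Received Tag Data for path : (\[.*?\]) => tags :: (\[.*\])")


def parse_tag_lines(output):
    """Map each path in the utility output to its list of tags."""
    tags_by_path = {}
    for line in output.splitlines():
        match = LINE_PATTERN.search(line)
        if not match:
            continue  # skip blank lines and anything in another format
        paths = ast.literal_eval(match.group(1))
        tags = ast.literal_eval(match.group(2))
        for path in paths:
            tags_by_path[path] = tags
    return tags_by_path


def paths_with_tag(tags_by_path, tag):
    """Return the sorted paths that carry the given tag."""
    return sorted(path for path, tags in tags_by_path.items() if tag in tags)


sample_output = """
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.avro'] => tags :: ['SSN', 'PERSON_NAME', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
"""

parsed = parse_tag_lines(sample_output)
print(paths_with_tag(parsed, "SSN"))
```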
    

Adding Tags with Ranger REST API

Prerequisite: Make sure that a tag repo is created in Ranger and that the Hive service has the same tag service selected.

To add a tag using the REST API in Ranger, use the following steps:

  1. Create privacera_tags in the Ranger Tag Based Policy.

  2. Associate the privacera_tags to Hive service.

  3. Create a JSON file where you can add tags.

    vi atlas_tag_test.json
    
  4. Edit the JSON file shown below based on your specific table/tag information.

        {
          "op": "replace",
          "serviceName": "dublin_hive",
          "tagVersion": 0,
          "tagDefinitions": {
            "0": {
              "name": "TEST_TAG",
              "source": "Atlas",
              "attributeDefs": [],
              "id": 0,
              "isEnabled": true
            }
          },
          "tags": {
            "0": {
              "type": "TEST_TAG",
              "owner": 0,
              "attributes": {},
              "id": 0,
              "isEnabled": true
            }
          },
          "serviceResources": [
            {
              "serviceName": "dublin_hive",
              "resourceElements": {
                "database": {
                  "values": [
                    "db_name"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "column": {
                  "values": [
                    "column_name"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "table": {
                  "values": [
                    "table_name"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                }
              },
              "id": 0,
              "isEnabled": true
            }
          ],
          "resourceToTagIds": {
            "0": [
              0
            ]
          }
        }
                

Update the following variables:

  • serviceName

  • tagDefinitions['0'].name

  • tags['0'].type

  • serviceResources[0].serviceName

  • serviceResources[0].resourceElements['database'].values[0]

  • serviceResources[0].resourceElements['column'].values[0]

  • serviceResources[0].resourceElements['table'].values[0]

Push the tag to Ranger:

    curl -i -L -k -u admin:${RANGER_ADMIN_PASSWORD} \
    -H "Content-type: application/json" \
    -d @atlas_tag_test.json \
    -X PUT http://<RANGER_HOST>:6080/service/tags/importservicetags
    
Wait for a couple of minutes, then run the following query:

    select * from <database_name>.<table_name>
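
Editing the JSON by hand is error-prone because the service name and the tag name each appear twice. The following is a minimal sketch that builds the same structure from a few values. The function name build_tag_payload is hypothetical, the default op of "add_or_update" follows the Hive and S3 samples in this topic (the JSON above uses "replace"), and the placeholder values are the ones from the JSON above.

```python
import json


def build_tag_payload(service_name, tag_name, resource, op="add_or_update"):
    """Build the request body for PUT /service/tags/importservicetags.

    resource maps a resource level (for example database, table, column)
    to a single value, mirroring resourceElements in the JSON above.
    """
    resource_elements = {
        level: {"values": [value], "isExcludes": False, "isRecursive": False}
        for level, value in resource.items()
    }
    return {
        "op": op,
        "serviceName": service_name,
        "tagVersion": 0,
        "tagDefinitions": {
            "0": {"name": tag_name, "source": "Atlas", "attributeDefs": [],
                  "id": 0, "isEnabled": True}
        },
        "tags": {
            "0": {"type": tag_name, "owner": 0, "attributes": {},
                  "id": 0, "isEnabled": True}
        },
        "serviceResources": [
            {"serviceName": service_name, "resourceElements": resource_elements,
             "id": 0, "isEnabled": True}
        ],
        "resourceToTagIds": {"0": [0]},
    }


payload = build_tag_payload(
    "dublin_hive",
    "TEST_TAG",
    {"database": "db_name", "table": "table_name", "column": "column_name"},
    op="replace",
)
print(json.dumps(payload, indent=2))
```

Redirect the printed JSON to a file such as atlas_tag_test.json and pass it to curl with -d @atlas_tag_test.json.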

Hive

  1. Create privacera_tags in the Ranger Tag Based Policy.

  2. Associate the privacera_tags to Hive service.

  3. Create a JSON file where you can add tags.

    vi hive_tag.json
    
  4. Edit the JSON file shown below based on your specific table/tag information.

        {
          "op": "add_or_update",
          "serviceName": "${Hive_Service_Name}",
          "tagVersion": 0,
          "tagDefinitions": {
            "0": {
              "name": "${Tag_Name}",
              "source": "Atlas",
              "attributeDefs": [],
              "id": 0,
              "isEnabled": true
            }
          },
          "tags": {
            "0": {
              "type": "${Tag_Type}",
              "owner": 0,
              "attributes": {},
              "id": 0,
              "isEnabled": true
            }
          },
          "serviceResources": [
            {
              "serviceName": "${Hive_Service_Name}",
              "resourceElements": {
                "database": {
                  "values": [
                    "${Database}"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "table": {
                  "values": [
                    "${Table}"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "column": {
                  "values": [
                    "${Column}"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                }
              },
              "id": 0,
              "isEnabled": true
            }
          ],
          "resourceToTagIds": {
            "0": [
              0
            ]
          }
        }
    

    Sample hive_tag.json

         {
          "op": "add_or_update",
          "serviceName": "privacera_hive",
          "tagVersion": 0,
          "tagDefinitions": {
            "0": {
              "name": "SSN",
              "source": "Atlas",
              "attributeDefs": [],
              "id": 0,
              "isEnabled": true
            }
          },
          "tags": {
            "0": {
              "type": "SSN",
              "owner": 0,
              "attributes": {},
              "id": 0,
              "isEnabled": true
            }
          },
          "serviceResources": [
            {
              "serviceName": "privacera_hive",
              "resourceElements": {
                "database": {
                  "values": [
                    "finance"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "table": {
                  "values": [
                    "ssn_finance_us"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                },
                "column": {
                  "values": [
                    "SocialSecurity"
                  ],
                  "isExcludes": false,
                  "isRecursive": false
                }
              },
              "id": 0,
              "isEnabled": true
            }
          ],
          "resourceToTagIds": {
            "0": [
              0
            ]
          }
        }
    
  5. Push the tag to Ranger.

Add Tag

curl -i -L -k -u admin:<RANGER_ADMIN_PASSWORD> -H "Content-type: application/json" -d @hive_tag.json -X PUT http://<RANGER_HOST>:6080/service/tags/importservicetags

Get Tagged Resource

curl -i -L -k -u admin:<RANGER_ADMIN_PASSWORD> -H "Content-type: application/json" -X GET http://<RANGER_HOST>:6080/service/tags/resources

S3

  1. Create privacera_tags in the Ranger Tag Based Policy.

  2. Associate the privacera_tags to S3 Service.

  3. Create a JSON file where you can add tags.

    vi s3_tag.json
    
    Edit the JSON shown below based on your specific bucket/tag information.

                      {"op":"add_or_update","serviceName":"${S3_Service_Name}","tagVersion":0,"tagDefinitions":{"0":{"name":"${Tag_Name}","source":"Atlas","attributeDefs":[],"id":0,"isEnabled":true}},"tags":{"0":{"type":"${Tag_Type}","owner":0,"attributes":{},"id":0,"isEnabled":true}},"serviceResources":[{"serviceName":"${S3_Service_Name}","resourceElements":{"bucketname":{"values":["${Bucket_Name}"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["${Resource_Path_Name}"],"isExcludes":false,"isRecursive":false}},"id":0,"isEnabled":true}],"resourceToTagIds":{"0":[0]}}
                   

    Sample JSON:

                      {"op":"add_or_update","serviceName":"privacera_s3","tagVersion":0,"tagDefinitions":{"0":{"name":"SSN","source":"Atlas","attributeDefs":[],"id":0,"isEnabled":true}},"tags":{"0":{"type":"SSN","owner":0,"attributes":{},"id":0,"isEnabled":true}},"serviceResources":[{"serviceName":"privacera_s3","resourceElements":{"bucketname":{"values":["pscanzone"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["finance/finance_us.csv"],"isExcludes":false,"isRecursive":false}},"id":0,"isEnabled":true}],"resourceToTagIds":{"0":[0]}}
                   
  4. Push the tag to Ranger.

                      curl -i -L -k -u admin:welcome1 -H "Content-type: application/json" -d @s3_tag.json -X PUT http://${RANGER_HOST}.privacera.com:6080/service/tags/importservicetags
    
                   

    Response:

    HTTP/1.1 204 No Content
    Set-Cookie: RANGERADMINSESSIONID=517FD2032481415D188C6925FA96E7E3; Path=/; HttpOnly
    X-Frame-Options: DENY
    X-XSS-Protection: 1; mode=block
    Strict-Transport-Security: max-age=31536000; includeSubDomains
    Content-Security-Policy: default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; connect-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline';font-src 'self'
    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Pragma: no-cache
    Expires: 0
    X-Content-Type-Options: nosniff
    Content-Type: application/json
    Date: Sun, 08 Mar 2020 18:55:44 GMT
    Server: Apache Ranger
    
                   

    To get the list of tagged resources, run the following command:

                      curl -i -L -k -u admin:welcome1 -H "Content-type: application/json" -X GET http://${RANGER_HOST}.privacera.com:6080/service/tags/resources
    
                   

    Response:

                      [{"id":5,"guid":"6b9234f1-69d9-40b0-9865-fe5bec45b469","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1581570409000,"updateTime":1581570409000,"version":2,"serviceName":"privacera_hive","resourceElements":{"database":{"values":["sales"],"isExcludes":false,"isRecursive":false},"column":{"values":["name"],"isExcludes":false,"isRecursive":false},"table":{"values":["sales_data"],"isExcludes":false,"isRecursive":false}},"resourceSignature":"82a4eb3e2148ee77686538a653dc6d8e027e9b3443b5b09494af6a38db815a64"},{"id":7,"guid":"76ef1384-8432-4ed5-9778-c305bfb6d4c0","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1583715849000,"updateTime":1583715849000,"version":2,"serviceName":"privacera_s3","resourceElements":{"bucketname":{"values":["pscanzone"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["finance/finance_us.csv"],"isExcludes":false,"isRecursive":false}},"resourceSignature":"02d7ffe3fc9065ed63c935faec14268cc6f3823aa68b2b81a030e5c93cb60843"}]
                   

Test the Tag-Based Policies for S3 with the sample given above:

  1. Create user <kate> in EC2 and grant the permissions read, metaread, write, and metawrite on the S3 bucket ${Bucket_Name} in the privacera_s3 service.

  2. Create a deny tag-based policy for user <kate> with tag = SSN, Component = S3, permissions = read, write.

  3. Try to access ${Bucket_Name} as user <kate>.

  4. A Denied audit entry with the SSN tag appears in the audits.

REST API endpoints for working with tags

Add Tag

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-d @atlas_tag_test.json \
-X PUT http://<RANGER_HOST>:6080/service/tags/importservicetags

Get Tagged Resource

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/resources

Delete Tagged Resource

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X DELETE http://<RANGER_HOST>:6080/service/tags/resource/<resourceId>

Get ALL Tags

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/tags

Get Tag by ID

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/tag/<id>

List All Tagged Resources

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/resources

List Tag-Resource Mapping

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/tagresourcemaps

Get Tagged Resources By ResourceID

curl -i -L -k -u admin:welcome1 \
-H "Content-type: application/json" \
-X GET http://<RANGER_HOST>:6080/service/tags/resource/<resourceId>
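
The read-only endpoints above can also be called from Python instead of curl. The following is a minimal sketch using only the standard library. It builds the request but does not send it. The names TAG_ENDPOINTS and build_request, the host ranger.example.com, and the credentials are hypothetical placeholders.

```python
import base64
import urllib.request

# GET endpoints listed above; {id} is filled in per call.
# TAG_ENDPOINTS and build_request are hypothetical names used for illustration.
TAG_ENDPOINTS = {
    "list_resources": "/service/tags/resources",
    "list_tags": "/service/tags/tags",
    "get_tag": "/service/tags/tag/{id}",
    "list_tag_resource_maps": "/service/tags/tagresourcemaps",
    "get_resource": "/service/tags/resource/{id}",
}


def build_request(base_url, username, password, name, **path_params):
    """Build a GET request with basic authentication for one tag endpoint."""
    url = base_url.rstrip("/") + TAG_ENDPOINTS[name].format(**path_params)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_request("http://ranger.example.com:6080", "admin", "secret", "get_tag", id=12)
print(request.get_method(), request.full_url)
# prints: GET http://ranger.example.com:6080/service/tags/tag/12

# To send the request against a reachable Ranger host:
# body = urllib.request.urlopen(request).read()
```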