
Configuring EMR to use External Hive Metastore

If you use an External Hive Metastore (EHM) with AWS EMR, follow the steps below to configure Privacera to use the External Hive Metastore in EMR.

These steps apply to AWS EMR. For AWS EMR Serverless, refer to the EMR Serverless documentation.

  1. SSH to the instance where Privacera is installed.

  2. Run the following command to navigate to the /config directory.

    Bash
    cd ~/privacera/privacera-manager/config
    

  3. Run the following command to open the vars.emr.yml file for editing.

    Bash
    vi custom-vars/vars.emr.yml
    

  4. Modify the following properties:

    | Variable | Definition |
    | --- | --- |
    | EMR_HIVE_METASTORE | Set to 'hive' to enable External Hive Metastore |
    | EMR_HIVE_METASTORE_CONNECTION_URL | Set the JDBC connection URL (ex: jdbc:mysql://:3306/?createDatabaseIfNotExist=true) |
    | EMR_HIVE_METASTORE_CONNECTION_DRIVER | Set the JDBC driver name (ex: "org.mariadb.jdbc.Driver") |
    | EMR_HIVE_METASTORE_CONNECTION_USERNAME | Set the JDBC username |
    | EMR_HIVE_METASTORE_CONNECTION_PASSWORD | Set the JDBC password |
  5. Once the properties are configured, run the following commands to update your Privacera Manager platform instance:

    Step 1 - Setup, which generates the Helm charts. This step usually takes a few minutes.

    Bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh setup
    
    Step 2 - Apply the Privacera Manager helm charts.
    Bash
    cd ~/privacera/privacera-manager
    ./pm_with_helm.sh upgrade
    
    Step 3 - Post-installation, which generates the plugin tarball, updates Route 53 DNS, and so on.

    Bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh post-install
    
  6. After the post-install step completes, create a new cluster using the newly generated emr-template.json file from the output directory.
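
Referring back to step 4, a quick check that all five variables are present before running setup can save a regeneration cycle. The following is a minimal sketch, not part of Privacera Manager: the function name `check_emr_hms_vars` is hypothetical, and it assumes the vars file holds one `KEY: "value"` entry per line.

```shell
#!/usr/bin/env bash
# Sketch: confirm the five External Hive Metastore variables are set in a vars file.
# Assumption: each variable appears on its own line as KEY: "value".

check_emr_hms_vars() {
  local vars_file="$1"
  local missing=0
  local key
  for key in \
    EMR_HIVE_METASTORE \
    EMR_HIVE_METASTORE_CONNECTION_URL \
    EMR_HIVE_METASTORE_CONNECTION_DRIVER \
    EMR_HIVE_METASTORE_CONNECTION_USERNAME \
    EMR_HIVE_METASTORE_CONNECTION_PASSWORD
  do
    # Match an uncommented "KEY:" followed by at least one non-space character.
    # Note: a quoted empty string ("") still passes this check.
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$vars_file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_emr_hms_vars custom-vars/vars.emr.yml` run from the config directory returns a non-zero status and names each variable it could not find.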

Update the hive-site configuration in the EMR template as shown below, then create the new EMR cluster from this template.

privacera-emr-hive-site
JSON
{
  "Classification": "hive-site",
  "ConfigurationProperties": {
    "javax.jdo.option.ConnectionURL": "<jdbc-connection-url>",
    "javax.jdo.option.ConnectionDriverName": "<jdbc-driver>",
    "javax.jdo.option.ConnectionUserName": "<jdbc-username>",
    "javax.jdo.option.ConnectionPassword": "<jdbc-password>",
    "hive.server2.enable.doAs": "false",
    "parquet.column.index.access": "true",
    "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
    "hive.metastore.warehouse.dir": {
      "Ref": "<hive-metastore-s3path>"
    }
  }
}
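
To avoid hand-editing mistakes, the placeholders in the block above could be filled from shell variables. This is only a sketch under stated assumptions: every value below (host name, database name, username, password, and the `Ref` target) is hypothetical, and `render_hive_site` is an illustrative helper, not a Privacera or EMR command.

```shell
#!/usr/bin/env bash
# Sketch: render the hive-site block from shell variables.
# All values are hypothetical placeholders; substitute your own metastore details.

JDBC_URL='jdbc:mysql://metastore.example.internal:3306/hive?createDatabaseIfNotExist=true'
JDBC_DRIVER='org.mariadb.jdbc.Driver'
JDBC_USERNAME='hive'
JDBC_PASSWORD='change-me'
WAREHOUSE_REF='HiveWarehouseS3Path'  # hypothetical name for the Ref target

render_hive_site() {
  # Values are inserted verbatim, so they must not contain double quotes
  # or backslashes; this sketch does no JSON escaping.
  cat <<EOF
{
  "Classification": "hive-site",
  "ConfigurationProperties": {
    "javax.jdo.option.ConnectionURL": "${JDBC_URL}",
    "javax.jdo.option.ConnectionDriverName": "${JDBC_DRIVER}",
    "javax.jdo.option.ConnectionUserName": "${JDBC_USERNAME}",
    "javax.jdo.option.ConnectionPassword": "${JDBC_PASSWORD}",
    "hive.server2.enable.doAs": "false",
    "parquet.column.index.access": "true",
    "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
    "hive.metastore.warehouse.dir": {
      "Ref": "${WAREHOUSE_REF}"
    }
  }
}
EOF
}

# Print the rendered block so it can be reviewed or pasted into the template.
render_hive_site
```

If Python 3 is available, piping the output through `python3 -m json.tool` is one way to confirm the result is still valid JSON before pasting it into the template.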
