Configuring External Hive Metastore (EHM) with AWS EMR Serverless

Setup

If you are using an External Hive Metastore (EHM) with AWS EMR Serverless and want to run jobs that need access to it, you must configure the Docker image with the required JDBC driver and connection properties.

After you have configured the Docker image with the required JDBC driver and connection properties, you can run jobs that need access to the External Hive Metastore. Refer to this section for instructions on how to submit jobs with the database credentials.

To configure the External Hive Metastore (EHM) with EMR Serverless, follow the steps below:

  1. Add the following commands to Dockerfile_Privacera_Spark_OLAC or Dockerfile_Privacera_Spark_OLAC_FGAC, depending on whether the OLAC or OLAC_FGAC plugin is to be installed, immediately after the line that copies the Privacera plugin files:

    1. Locate this line in your Dockerfile:
      Bash
      RUN cp -r /usr/lib/spark/jars/privacera-* /opt/privacera/plugin/privacera-spark-plugin/spark-plugin/
      
    2. Add the following command to download the required file:

      Note

      The download URL provided for the JDBC driver is for reference only. Replace it with the URL of the JDBC driver that you want to use.

      Bash
      RUN wget https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.2/mariadb-java-client-2.7.2.jar -O /usr/lib/spark/jars/mariadb-connector-java.jar
      

      Note

      To configure an AWS RDS instance with SSL enabled as an external Hive Metastore for an EMR Serverless application, download the SSL certificate to /home/hadoop/ so that a secure connection to the RDS instance can be established. This certificate should be specified in the JDBC URL configured in the application under spark.hadoop.javax.jdo.option.ConnectionURL. Add the following command to download the certificate:

      Bash
      RUN wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem -O /home/hadoop/global-bundle.pem
      
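
      With the certificate in place, the connection URL can reference it. The following is a minimal sketch, assuming the MariaDB Connector/J driver downloaded above; the RDS endpoint, port, and database name are placeholders that you must replace with your own values:

      ```properties
      spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver
      spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:mysql://<rds-endpoint>:3306/<metastore-db>?useSSL=true&serverSslCert=/home/hadoop/global-bundle.pem
      ```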

  2. Build the Docker image and push it to the ECR repository.
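
The build-and-push step can be sketched as follows. This is only a sketch: the account ID, region, and repository name are placeholders, and the Dockerfile name assumes the OLAC variant; substitute your own values before invoking build_and_push.

```shell
#!/bin/sh
# Placeholders: replace with your own AWS account ID, region, and ECR repository.
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-east-1
ECR_REPO=privacera-spark-plugin
REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE="${REGISTRY}/${ECR_REPO}:latest"

build_and_push() {
  # Authenticate Docker to ECR, build from the OLAC Dockerfile, and push.
  aws ecr get-login-password --region "${AWS_REGION}" |
    docker login --username AWS --password-stdin "${REGISTRY}"
  docker build -t "${IMAGE}" -f Dockerfile_Privacera_Spark_OLAC .
  docker push "${IMAGE}"
}

# Call build_and_push once the placeholders above are set to real values.
echo "Target image: ${IMAGE}"
```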

Run the job with the External Hive Metastore

Refer to this section for instructions on how to submit jobs.
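
The metastore connection properties and database credentials are typically passed as Spark configuration when the job is started. The following is a hedged sketch using the AWS CLI; the application ID, execution role ARN, script location, RDS endpoint, database name, and credentials are all placeholders, and the driver and SSL certificate path assume the Dockerfile changes above:

```bash
aws emr-serverless start-job-run \
  --application-id <application-id> \
  --execution-role-arn <execution-role-arn> \
  --job-driver '{
    "sparkSubmit": {
      "entryPoint": "s3://<bucket>/scripts/<job-script>.py",
      "sparkSubmitParameters": "--conf spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver --conf spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:mysql://<rds-endpoint>:3306/<metastore-db>?useSSL=true&serverSslCert=/home/hadoop/global-bundle.pem --conf spark.hadoop.javax.jdo.option.ConnectionUserName=<username> --conf spark.hadoop.javax.jdo.option.ConnectionPassword=<password>"
    }
  }'
```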