Configuring External Hive Metastore (EHM) with AWS EMR Serverless
Setup
If you use an External Hive Metastore (EHM) with AWS EMR Serverless and want to run jobs that need access to it, you must configure the Docker image with the required JDBC driver and connection properties.
Once the Docker image is configured, you can run jobs that need access to the External Hive Metastore. Refer to this section for instructions on how to submit jobs with the database credentials.
To configure the External Hive Metastore (EHM) with EMR Serverless, follow these steps:
- Add the following command to Dockerfile_Privacera_Spark_OLAC or Dockerfile_Privacera_Spark_OLAC_FGAC, depending on whether you are installing the OLAC or OLAC_FGAC plugin, right after the step that copies the Privacera plugin files:
  - Locate this line in your Dockerfile:
  - Add the following command to download the required file:
    Note: The download URL provided for the JDBC driver is only for reference. You can replace it with the URL of the JDBC driver that you want to use.
    Note: To configure AWS RDS with SSL enabled as an external Hive Metastore in an EMR Serverless application, download the SSL certificate to /home/hadoop/ to establish a secure connection to the RDS instance. Specify this certificate in the JDBC URL configured in the application under spark.hadoop.javax.jdo.option.ConnectionURL. Add the following command to download the required file:
-
Build the Docker image and push it to the ECR repository.
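The build-and-push step can be sketched as follows. This is a minimal illustration, not the exact Privacera procedure: the account ID, region, repository name, and tag are placeholder assumptions, and the function name build_and_push is hypothetical. It assumes the AWS CLI and Docker are installed and that the ECR repository already exists.

```bash
# Placeholder values (assumptions); replace them with your own.
AWS_ACCOUNT_ID="123456789012"
AWS_REGION="us-east-1"
ECR_REPO="privacera-spark-olac"
IMAGE_TAG="latest"
# Use Dockerfile_Privacera_Spark_OLAC_FGAC instead for the OLAC_FGAC plugin.
DOCKERFILE="Dockerfile_Privacera_Spark_OLAC"

# Derive the registry host and the full image URI from the values above.
ECR_REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE_URI="${ECR_REGISTRY}/${ECR_REPO}:${IMAGE_TAG}"

# Hypothetical helper: log in to ECR, build the image, and push it.
build_and_push() {
  aws ecr get-login-password --region "${AWS_REGION}" \
    | docker login --username AWS --password-stdin "${ECR_REGISTRY}" || return 1
  docker build -f "${DOCKERFILE}" -t "${IMAGE_URI}" . || return 1
  docker push "${IMAGE_URI}"
}

echo "Image URI: ${IMAGE_URI}"
```

Run build_and_push from the directory that contains the Dockerfile. The commands are wrapped in a function so that the variables can be reviewed before anything is built or pushed.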
Run the job with the External Hive Metastore
Refer to this section for instructions on how to submit jobs.
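As a rough illustration of the connection properties such a job carries, the sketch below assembles Spark submit parameters from the standard Hive JDO options, including the spark.hadoop.javax.jdo.option.ConnectionURL property mentioned above. Everything specific here is an assumption: the host, database name, user, driver class, and certificate file name are placeholders, and the SSL options in the URL follow MariaDB Connector/J syntax, so they will differ for other JDBC drivers.

```bash
# Placeholder connection details (assumptions); replace them with your own.
DB_HOST="my-metastore.example.us-east-1.rds.amazonaws.com"
DB_PORT="3306"
DB_NAME="hive"
DB_USER="hive_user"
DB_PASSWORD="change-me"
JDBC_DRIVER_CLASS="org.mariadb.jdbc.Driver"

# Driver-specific SSL options (MariaDB Connector/J style, an assumption).
# The certificate file name under /home/hadoop/ is hypothetical.
JDBC_URL_OPTIONS="sslMode=verify-full&serverSslCert=/home/hadoop/global-bundle.pem"
JDBC_URL="jdbc:mariadb://${DB_HOST}:${DB_PORT}/${DB_NAME}?${JDBC_URL_OPTIONS}"

# Assemble the --conf flags one property at a time.
P="spark.hadoop.javax.jdo.option"
SPARK_SUBMIT_PARAMS="--conf ${P}.ConnectionDriverName=${JDBC_DRIVER_CLASS}"
SPARK_SUBMIT_PARAMS="${SPARK_SUBMIT_PARAMS} --conf ${P}.ConnectionURL=${JDBC_URL}"
SPARK_SUBMIT_PARAMS="${SPARK_SUBMIT_PARAMS} --conf ${P}.ConnectionUserName=${DB_USER}"
SPARK_SUBMIT_PARAMS="${SPARK_SUBMIT_PARAMS} --conf ${P}.ConnectionPassword=${DB_PASSWORD}"

echo "${SPARK_SUBMIT_PARAMS}"
```

The resulting string is the kind of value passed as the Spark submit parameters of a job run. The referenced section remains the authoritative source for how credentials should be supplied.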