AWS EMR Serverless - Access - With External Hive Metastore

If you are using an External Hive Metastore with AWS EMR Serverless and you want to run jobs that need access to it, you must configure the Docker image with the required JDBC driver and connection properties. This section provides the additional steps for running jobs that need access to the External Hive Metastore.

Tip

To use an External Hive Metastore, make sure that the Docker image is already configured with the required JDBC driver and connection properties. Refer to Configuring External Hive Metastore with AWS EMR Serverless for more details.

Warning

Privacera does not provide access control for the External Hive Metastore. You must ensure that the users of this integration have the necessary permissions to read from and write to the External Hive Metastore, and that the connection properties are kept secure and accessible only to the users who need them.

  1. Update the existing Application configuration by adding the following properties to the spark-defaults classification section:

    JSON
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://<host>:3306/<database_name>",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<user_name>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<password>"
    
    To set up EMR Serverless with an External Hive Metastore on RDS with SSL enabled, update the spark.hadoop.javax.jdo.option.ConnectionURL property as shown below:
    JSON
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://<host>:3306/<database_name>?createDatabaseIfNotExist=true&useSSL=true&serverSslCert=/home/hadoop/global-bundle.pem",
    

  2. Under Additional configurations, disable the Use AWS Glue Data Catalog as metastore checkbox so that Spark uses the External Hive Metastore instead of the Glue Data Catalog.
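The same spark-defaults properties from step 1 can also be supplied per job run through the EMR Serverless StartJobRun API instead of (or in addition to) the Application configuration. A minimal sketch using boto3 is shown below; the host, database, credentials, application ID, and role ARN are all placeholder values, not names from this document:

```python
import json

def hive_metastore_overrides(host, database, user, password):
    """Build a StartJobRun configurationOverrides payload that points a
    Spark job at an external Hive Metastore via JDBC."""
    return {
        "applicationConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                    "spark.hadoop.javax.jdo.option.ConnectionURL": f"jdbc:mysql://{host}:3306/{database}",
                    "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
                    "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
                },
            }
        ]
    }

# Hypothetical connection details -- substitute your own metastore values.
overrides = hive_metastore_overrides("metastore.example.com", "hive", "hive_user", "secret")
print(json.dumps(overrides, indent=2))

# The payload would then be passed to the EMR Serverless client, e.g.:
#   boto3.client("emr-serverless").start_job_run(
#       applicationId="<application_id>",
#       executionRoleArn="<execution_role_arn>",
#       jobDriver={"sparkSubmit": {"entryPoint": "<entry_point>"}},
#       configurationOverrides=overrides,
#   )
```

Keeping the connection password out of source control (for example, by reading it from AWS Secrets Manager at submission time) is advisable given the warning above.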