AWS EMR Serverless - Access - Spark OLAC with Iceberg
If you are using Apache Iceberg with AWS EMR Serverless, you need to configure the Docker image with the required Iceberg JARs. For the Hadoop Catalog, no additional Privacera configuration is required. For the Glue Catalog, however, you need to pass an additional property.
You can configure Iceberg with either the Hadoop Catalog or the Glue Catalog by adding properties under spark-defaults in the existing Application configuration.
For the Hadoop Catalog, add the following properties to the spark-defaults section of the application.
These properties are provided for reference only. You can modify them to suit your requirements.
Application configuration for OLAC:
Add the following JSON properties to the spark-defaults classification section in the Application configuration:

    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.hadoop_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.hadoop_catalog.type": "hadoop",
    "spark.sql.catalog.hadoop_catalog.warehouse": "s3://amzn-s3-demo-bucket/example-prefix/",
    "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"
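As a minimal sketch of where these properties sit, the Python snippet below wraps them in a spark-defaults classification entry and prints the resulting JSON. The `classification` and `properties` keys follow the usual EMR Serverless application configuration shape. The helper name `spark_defaults_entry` is hypothetical, and the warehouse path is the placeholder used above, not a real bucket.

```python
import json

# Hadoop Catalog properties for OLAC, copied from the listing above.
HADOOP_OLAC_PROPERTIES = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.hadoop_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.hadoop_catalog.type": "hadoop",
    # Placeholder warehouse location; replace with your own bucket and prefix.
    "spark.sql.catalog.hadoop_catalog.warehouse": "s3://amzn-s3-demo-bucket/example-prefix/",
    "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
}


def spark_defaults_entry(properties):
    """Wrap Spark properties in a spark-defaults classification entry (hypothetical helper)."""
    return {"classification": "spark-defaults", "properties": dict(properties)}


# The application configuration is a list of classification entries.
application_configuration = [spark_defaults_entry(HADOOP_OLAC_PROPERTIES)]
print(json.dumps(application_configuration, indent=2))
```

The printed JSON can be compared against the Application configuration you already have. Only the spark-defaults entry is shown here, so any other classifications in your configuration would need to be kept alongside it.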
Application configuration for OLAC_FGAC:
Add the following JSON properties to the spark-defaults classification section in the Application configuration:

    "spark.sql.extensions": "com.privacera.spark.agent.SparkSQLExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.hadoop_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.hadoop_catalog.type": "hadoop",
    "spark.sql.catalog.hadoop_catalog.warehouse": "s3://amzn-s3-demo-bucket/example-prefix/",
    "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"
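In the Hadoop Catalog listings above, the OLAC and OLAC_FGAC configurations differ only in the value of spark.sql.extensions. The sketch below derives the OLAC_FGAC value from an OLAC value by listing the Privacera extension first, mirroring the order shown above. The function name `fgac_extensions` is hypothetical, and whether the order of extensions matters is not stated here, so treat the ordering as an assumption.

```python
PRIVACERA_EXTENSION = "com.privacera.spark.agent.SparkSQLExtension"
ICEBERG_EXTENSION = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def fgac_extensions(olac_extensions: str) -> str:
    """Return a spark.sql.extensions value with the Privacera extension listed first."""
    # Split the comma-separated value and drop empty entries.
    names = [name.strip() for name in olac_extensions.split(",") if name.strip()]
    # Remove any existing Privacera entry so it is not listed twice.
    names = [name for name in names if name != PRIVACERA_EXTENSION]
    return ",".join([PRIVACERA_EXTENSION] + names)


fgac_value = fgac_extensions(ICEBERG_EXTENSION)
print(fgac_value)
```

Calling the function on a value that already contains the Privacera extension returns the same value, so it is safe to apply more than once.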
For the Glue Catalog, add the following properties to the spark-defaults section of the application. Update the warehouse property to point to your warehouse location. For Privacera, you also need to set the property spark.sql.catalog.glue_catalog.s3.client-factory-impl.
Application configuration for OLAC:
Add the following JSON properties to the spark-defaults classification section in the Application configuration:

    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.warehouse": "s3://amzn-s3-demo-bucket/example-prefix/",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.s3.client-factory-impl": "com.privacera.iceberg.aws.s3.PrivaceraAwsClientFactory"
Application configuration for OLAC_FGAC:
Add the following JSON properties to the spark-defaults classification section in the Application configuration:

    "spark.sql.extensions": "com.privacera.spark.agent.SparkSQLExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.warehouse": "s3://amzn-s3-demo-bucket/example-prefix/",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.s3.client-factory-impl": "com.privacera.iceberg.aws.s3.PrivaceraAwsClientFactory"
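Because the warehouse location has to be updated for each environment, it can help to generate the Glue Catalog properties rather than edit them by hand. The following is a rough sketch under that assumption. The helper `glue_catalog_properties` and its `fgac` flag are hypothetical names introduced for illustration, and the property values are taken from the two Glue Catalog listings above.

```python
ICEBERG_EXTENSION = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
PRIVACERA_EXTENSION = "com.privacera.spark.agent.SparkSQLExtension"


def glue_catalog_properties(warehouse: str, fgac: bool = False) -> dict:
    """Build the Glue Catalog spark-defaults properties (hypothetical helper)."""
    extensions = ICEBERG_EXTENSION
    if fgac:
        # OLAC_FGAC lists the Privacera extension before the Iceberg one.
        extensions = f"{PRIVACERA_EXTENSION},{ICEBERG_EXTENSION}"
    prefix = "spark.sql.catalog.glue_catalog"
    return {
        "spark.sql.extensions": extensions,
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Privacera-specific S3 client factory, required in both modes.
        f"{prefix}.s3.client-factory-impl": "com.privacera.iceberg.aws.s3.PrivaceraAwsClientFactory",
    }


# Placeholder warehouse location; replace with your own bucket and prefix.
olac = glue_catalog_properties("s3://amzn-s3-demo-bucket/example-prefix/")
olac_fgac = glue_catalog_properties("s3://amzn-s3-demo-bucket/example-prefix/", fgac=True)

# List the keys whose values differ between the two modes.
changed = sorted(key for key in olac if olac[key] != olac_fgac[key])
print(changed)
```

The comparison at the end lists only spark.sql.extensions, which matches the two listings above: the catalog, warehouse, and client factory properties are the same for OLAC and OLAC_FGAC.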