Delta Lake
- SSH to the EMR master node by running the following command:
Bash |
| ssh hadoop@<emr-master-node>
- Connect to one of the Spark tools (pyspark, spark-shell, or spark-sql):
If you are using OLAC, connect to pyspark as below:
Bash |
| pyspark \
--conf "" \
--conf "" \
If you are using FGAC or OLAC_FGAC, update spark.sql.extensions as below:
Bash |
| pyspark \
--conf "spark.sql.extensions=com.privacera.spark.agent.SparkSQLExtension," \
--conf "" \
If you have enabled JWT authorization in the cluster, include the following additional configuration.
To pass the JWT token directly as a command-line argument, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token.str=<your-jwt-token>"
To use the file path containing the JWT token, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>"
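The two JWT options above can be sketched as a small helper that produces the extra `--conf` arguments. This is a minimal illustration, not part of the product: the function names `jwt_conf_args` and `launch_command` are hypothetical, and the sketch assumes the two options are alternatives, so exactly one of them is supplied. Only the two property names come from the configuration shown above.

```python
import shlex

# Property names taken from the JWT configuration shown above.
JWT_TOKEN_STR_KEY = "spark.hadoop.privacera.jwt.token.str"
JWT_TOKEN_FILE_KEY = "spark.hadoop.privacera.jwt.token"


def jwt_conf_args(token=None, token_file=None):
    """Return the extra --conf arguments for JWT authorization.

    Hypothetical helper. Assumption: the token string and the token
    file are alternatives, so exactly one of them must be given.
    """
    if (token is None) == (token_file is None):
        raise ValueError("provide exactly one of token or token_file")
    if token is not None:
        return ["--conf", f"{JWT_TOKEN_STR_KEY}={token}"]
    return ["--conf", f"{JWT_TOKEN_FILE_KEY}={token_file}"]


def launch_command(tool, extra_args):
    """Join a tool name and its arguments into a shell-safe command line."""
    return shlex.join([tool, *extra_args])
```

For example, `launch_command("pyspark", jwt_conf_args(token_file="/tmp/jwt.txt"))` builds a command line that ends with the file-based option. The other `--conf` values required by your access-control mode still have to be added separately.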
Run spark read/write
Bash |
| df = spark.read.format("delta").option("header", "true").option("inferSchema", "true").load("s3a://${S3_BUCKET}/${DELTA_FILE}")
df.write.format("delta").option("header", "true").mode("overwrite").save("s3a://${S3_BUCKET}/${DELTA_FILE}")
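Note that inside a Python string literal, `${S3_BUCKET}` and `${DELTA_FILE}` are not expanded the way they would be in a shell, so they need to be replaced with real values before the path is used. A minimal sketch of one way to do that is shown here. It assumes the values are available as environment variables named `S3_BUCKET` and `DELTA_FILE`, which the steps above do not state, and `resolve_delta_path` is a hypothetical helper name.

```python
import os
from string import Template

# Same placeholder path as in the read/write commands above.
PATH_TEMPLATE = Template("s3a://${S3_BUCKET}/${DELTA_FILE}")


def resolve_delta_path(env=None):
    """Fill ${S3_BUCKET} and ${DELTA_FILE} from a mapping.

    Assumption: when no mapping is passed, the values come from
    environment variables with the same names as the placeholders.
    """
    values = os.environ if env is None else env
    # substitute() raises KeyError if a placeholder has no value.
    return PATH_TEMPLATE.substitute(values)
```

In the pyspark shell, the resolved path could then be passed to the reader, for example `spark.read.format("delta").load(resolve_delta_path())`.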
If you are using OLAC, connect to spark-shell as below:
Bash |
| spark-shell \
--conf "" \
--conf "" \
If you are using FGAC or OLAC_FGAC, update spark.sql.extensions as below:
Bash |
| spark-shell \
--conf "spark.sql.extensions=com.privacera.spark.agent.SparkSQLExtension," \
--conf "" \
If you have enabled JWT authorization in the cluster, include the following additional configuration.
To pass the JWT token directly as a command-line argument, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token.str=<your-jwt-token>"
To use the file path containing the JWT token, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>"
Run spark read/write
Bash |
| val df = spark.read.format("delta").option("header", "true").option("inferSchema", "true").load("s3a://${S3_BUCKET}/${DELTA_FILE}")
df.write.format("delta").option("header", "true").mode("overwrite").save("s3a://${S3_BUCKET}/${DELTA_FILE}")
If you are using OLAC, connect to spark-sql as below:
Bash |
| spark-sql \
--conf "" \
--conf "" \
If you are using FGAC or OLAC_FGAC, update spark.sql.extensions as below:
Bash |
| spark-sql \
--conf "spark.sql.extensions=com.privacera.spark.agent.SparkSQLExtension," \
--conf "" \
If you have enabled JWT authorization in the cluster, include the following additional configuration.
To pass the JWT token directly as a command-line argument, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token.str=<your-jwt-token>"
To use the file path containing the JWT token, use the following configuration:
Bash |
| --conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>"
Run spark sql query
Bash |
| CREATE TABLE IF NOT EXISTS <db_name>.<table_name> (
id INT,
name STRING,
age INT,
city STRING);
SELECT * FROM <db_name>.<table_name>;