Using S3 Table in Apache Spark OLAC

If you have enabled S3 Table support (see Enable S3 Table), the required S3 Table configurations are automatically applied to spark-defaults.conf, so you can start Spark without any additional configuration.

Navigate to the ${SPARK_HOME}/bin folder and export the JWT token:
Bash
cd <SPARK_HOME>/bin
export JWT_TOKEN="<JWT_TOKEN>"
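
Before starting Spark, it can help to confirm that the exported value is at least shaped like a JWT. The following is a minimal sketch, not part of the product: it assumes the token is a standard three-segment JWT whose payload may carry an `exp` claim, and it does not verify the signature. The helper names `decode_jwt_claims` and `seconds_until_expiry` are hypothetical.

```python
import base64
import json
import os
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments in a JWT")
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the `exp` claim, or None if there is no `exp` claim."""
    claims = decode_jwt_claims(token)
    if "exp" not in claims:
        return None
    current = time.time() if now is None else now
    return claims["exp"] - current


if __name__ == "__main__":
    token = os.environ.get("JWT_TOKEN", "")
    try:
        remaining = seconds_until_expiry(token)
    except ValueError as err:
        # Covers a missing token, bad base64, and bad JSON alike.
        print(f"JWT_TOKEN does not look like a JWT: {err}")
    else:
        if remaining is None:
            print("token has no exp claim")
        else:
            print(f"token expires in {remaining:.0f} seconds")
```

A negative result from `seconds_until_expiry` would suggest that the token has already expired and should be regenerated before launching Spark.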

Start a Spark session (choose one of spark-shell, pyspark, or spark-sql)

To pass the JWT token directly as a command-line argument, use the following configuration when connecting to the cluster:
Bash
./<spark-shell | pyspark | spark-sql> \
--conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"

To pass the path of a file containing the JWT token instead, use the following configuration:

Bash
./<spark-shell | pyspark | spark-sql> \
--conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>" 
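
A token file keeps the token out of the command line, which is generally visible in shell history and process listings. The sketch below shows one way to create such a file with owner-only permissions on a POSIX system. It assumes the file should hold only the bare token string with no surrounding whitespace; that format is an assumption, not something the guide specifies, and `write_token_file` is a hypothetical helper.

```python
import os
from pathlib import Path


def write_token_file(token: str, path: Path) -> Path:
    """Write the token to `path`, readable and writable by the owner only."""
    # Create (or truncate) the file with mode 0600 from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Assumption: the file carries just the token, without a trailing newline.
        handle.write(token.strip())
    # Enforce the mode even if the file already existed with looser permissions.
    os.chmod(path, 0o600)
    return path
```

The returned path is what would be substituted for `<path-to-jwt-token-file>` in the command above.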

If you want to override the warehouse path, add the following configuration:

Bash
./<spark-shell | pyspark | spark-sql> \
--conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
--conf "spark.sql.catalog.s3tables.warehouse=arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>"
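
The three launch variants above differ only in their `--conf` arguments, so they can be assembled programmatically. The following sketch builds the argument list from the configuration keys shown in this guide. The function name `build_spark_command` is hypothetical, and the ARN pattern is a deliberately loose shape check based on the placeholder format above, not an authoritative validation of AWS naming rules.

```python
import re
from typing import List, Optional

TOOLS = ("spark-shell", "pyspark", "spark-sql")

# Loose shape check for arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>.
ARN_PATTERN = re.compile(r"arn:aws:s3tables:[a-z0-9-]+:\d{12}:bucket/[a-z0-9-]+")


def build_spark_command(
    tool: str,
    token: Optional[str] = None,
    token_file: Optional[str] = None,
    warehouse_arn: Optional[str] = None,
) -> List[str]:
    """Return the argv list for launching a Spark tool with JWT settings."""
    if tool not in TOOLS:
        raise ValueError(f"tool must be one of {TOOLS}")
    # Exactly one of the two token sources must be supplied.
    if (token is None) == (token_file is None):
        raise ValueError("provide exactly one of token or token_file")

    if token is not None:
        jwt_conf = f"spark.hadoop.privacera.jwt.token.str={token}"
    else:
        jwt_conf = f"spark.hadoop.privacera.jwt.token={token_file}"
    command = [f"./{tool}", "--conf", jwt_conf]

    if warehouse_arn is not None:
        if not ARN_PATTERN.fullmatch(warehouse_arn):
            raise ValueError("warehouse_arn does not match the expected ARN shape")
        command += ["--conf", f"spark.sql.catalog.s3tables.warehouse={warehouse_arn}"]
    return command
```

The resulting list could be handed to `subprocess.run(command, cwd=...)` with the working directory set to the Spark `bin` folder. Passing a list rather than a single string avoids shell quoting issues.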

Use S3 Table tables

The examples below cover pyspark (Python), spark-shell (Scala), and spark-sql (SQL).

Python
# List databases
spark.sql("SHOW NAMESPACES IN s3tables").show()

# Query existing table
df = spark.read.table("s3tables.s3table_db.s3table_table")
df.show()

# Create table
spark.sql("""
CREATE TABLE s3tables.s3table_db.s3table_table (
    id INT,
    product STRING,
    amount DOUBLE,
    sale_date DATE
) """)

# Read table
spark.read.table("s3tables.s3table_db.s3table_table").show()
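
Reading a table fails if it does not exist yet, so a script may want to check first. The sketch below wraps `SHOW TABLES` in a small helper. The name `table_exists` is hypothetical, and it assumes the result exposes a `tableName` column, as Spark 3.x does. Under access control, the listing may contain only the tables the current user is allowed to see.

```python
def table_exists(spark, namespace: str, table: str) -> bool:
    """Return True if `table` is listed in `namespace`, e.g. 's3tables.s3table_db'."""
    rows = spark.sql(f"SHOW TABLES IN {namespace}").collect()
    # Each row is expected to carry the table's name in the `tableName` column.
    return any(row["tableName"] == table for row in rows)
```

In a pyspark session, this could guard the read: `if table_exists(spark, "s3tables.s3table_db", "s3table_table"): spark.read.table("s3tables.s3table_db.s3table_table").show()`.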

Scala
// List databases
spark.sql("SHOW NAMESPACES IN s3tables").show()

// Query existing table
spark.table("s3tables.s3table_db.s3table_table").show()

// Create table
spark.sql("""
CREATE TABLE s3tables.s3table_db.s3table_table (
    id INT,
    product STRING,
    amount DOUBLE,
    sale_date DATE
) """)

// Read table
spark.table("s3tables.s3table_db.s3table_table").show()

SQL
-- List databases
SHOW NAMESPACES IN s3tables;

-- List tables
SHOW TABLES IN s3tables.s3table_db;

-- Query existing table
SELECT * FROM s3tables.s3table_db.s3table_table;

-- Create table
CREATE TABLE s3tables.s3table_db.s3table_table (
    id INT,
    product STRING,
    amount DOUBLE,
    sale_date DATE
);

-- Query table
SELECT * FROM s3tables.s3table_db.s3table_table;
