Using S3 Table in Apache Spark OLAC

If you have enabled S3 Table support (see Enable S3 Table), the required S3 Table configurations are automatically applied to spark-defaults.conf, so you can start Spark without any additional S3 Table configuration.
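
To confirm that the settings were applied, one option is to inspect spark-defaults.conf directly. The following is a minimal sketch, assuming the file sits at `${SPARK_HOME}/conf/spark-defaults.conf` (Spark's default location) and that the S3 Table catalog settings share the `spark.sql.catalog.s3tables` prefix used by the warehouse override later in this guide. The helper names `parse_spark_defaults` and `s3tables_settings` are illustrative and not part of any Spark or Privacera API.

```python
import os
import re
from pathlib import Path
from typing import Dict

# Assumed prefix, taken from the warehouse override key used in this guide.
CATALOG_PREFIX = "spark.sql.catalog.s3tables"


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf content into a {key: value} dict.

    Each non-comment line holds a key and a value separated by whitespace;
    an '=' separator is tolerated as well.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def s3tables_settings(settings: Dict[str, str]) -> Dict[str, str]:
    """Keep only the entries that configure the assumed s3tables catalog."""
    return {
        key: value
        for key, value in settings.items()
        if key == CATALOG_PREFIX or key.startswith(CATALOG_PREFIX + ".")
    }


if __name__ == "__main__":
    conf_path = Path(os.environ.get("SPARK_HOME", ".")) / "conf" / "spark-defaults.conf"
    if conf_path.is_file():
        found = s3tables_settings(parse_spark_defaults(conf_path.read_text()))
        for key, value in sorted(found.items()):
            print(f"{key} = {value}")
    else:
        print(f"{conf_path} not found")
```

If the script prints no `spark.sql.catalog.s3tables` entries, the automatic configuration was probably not applied to this Spark installation.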

  1. Navigate to the ${SPARK_HOME}/bin folder and export the JWT token:

    Bash
    cd <SPARK_HOME>/bin
    export JWT_TOKEN="<JWT_TOKEN>"
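
An expired token would presumably be rejected when Spark connects, so it can help to check the exported token first. The following is a minimal sketch, assuming the token is a standard signed JWT (three base64url segments) carrying the standard `exp` claim. It only decodes the payload and does not verify the signature. The helper names `jwt_payload` and `jwt_seconds_remaining` are illustrative.

```python
import base64
import json
import os
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = segments[1]
    # base64url data in JWTs omits padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    return claims


def jwt_seconds_remaining(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the 'exp' claim, or None if the claim is absent."""
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)


if __name__ == "__main__":
    # Reads the variable exported in the step above.
    token = os.environ.get("JWT_TOKEN", "")
    try:
        remaining = jwt_seconds_remaining(token)
    except ValueError as err:
        print(f"could not decode JWT_TOKEN: {err}")
    else:
        if remaining is None:
            print("token has no 'exp' claim")
        elif remaining <= 0:
            print("token has expired")
        else:
            print(f"token is valid for another {remaining:.0f} seconds")
```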
    

  2. Start a Spark session (choose one of spark-shell, pyspark, or spark-sql)

    • To pass the JWT token directly as a command-line argument, use the following configuration when connecting to the cluster:

      Bash
      ./<spark-shell | pyspark | spark-sql> \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"
      

    • To pass the path of a file containing the JWT token instead, use the following configuration:

      Bash
      ./<spark-shell | pyspark | spark-sql> \
      --conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>" 
      

    • If you want to override the warehouse path, add the following configuration:

      Bash
      ./<spark-shell | pyspark | spark-sql> \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
      --conf "spark.sql.catalog.s3tables.warehouse=arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>"
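
The variants above differ only in which `--conf` flags are passed, so a wrapper script can assemble them programmatically. The sketch below is an illustration rather than Privacera tooling: `build_spark_command` and its parameters are hypothetical names, the configuration keys are the ones shown above, and the ARN check only mirrors the `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>` shape with an assumed 12-digit account ID.

```python
import re
from typing import List, Optional

SUPPORTED_SHELLS = ("spark-shell", "pyspark", "spark-sql")

# Loose shape check for the warehouse ARN. The 12-digit account ID and the
# allowed bucket-name characters are assumptions, not an official grammar.
WAREHOUSE_ARN = re.compile(
    r"^arn:aws:s3tables:[a-z0-9-]+:\d{12}:bucket/[a-z0-9][a-z0-9-]*$"
)


def build_spark_command(
    shell: str,
    jwt_token: Optional[str] = None,
    jwt_token_file: Optional[str] = None,
    warehouse: Optional[str] = None,
) -> List[str]:
    """Return the argument list for launching one of the Spark shells."""
    if shell not in SUPPORTED_SHELLS:
        raise ValueError(f"shell must be one of {SUPPORTED_SHELLS}")
    # Exactly one token source: the token string or a file holding it.
    if (jwt_token is None) == (jwt_token_file is None):
        raise ValueError("pass exactly one of jwt_token or jwt_token_file")

    command = [f"./{shell}"]
    if jwt_token is not None:
        command += ["--conf", f"spark.hadoop.privacera.jwt.token.str={jwt_token}"]
    else:
        command += ["--conf", f"spark.hadoop.privacera.jwt.token={jwt_token_file}"]

    if warehouse is not None:
        if not WAREHOUSE_ARN.match(warehouse):
            raise ValueError(f"unexpected warehouse ARN shape: {warehouse}")
        command += ["--conf", f"spark.sql.catalog.s3tables.warehouse={warehouse}"]
    return command
```

The returned list can be handed to `subprocess.run(command, cwd=...)` with `cwd` pointing at the Spark `bin` folder, which avoids shell quoting issues around the token value.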
      

  3. Query and create tables in S3 Table

    Python
    # List namespaces (databases)
    spark.sql("SHOW NAMESPACES IN s3tables").show()
    
    # Query existing table
    df = spark.read.table("s3tables.s3table_db.s3table_table")
    df.show()
    
    # Create table (IF NOT EXISTS avoids an error if the table is already present)
    spark.sql("""
    CREATE TABLE IF NOT EXISTS s3tables.s3table_db.s3table_table (
        id INT,
        product STRING,
        amount DOUBLE,
        sale_date DATE
    ) """)
    
    # Read table
    spark.read.table("s3tables.s3table_db.s3table_table").show()
    
    Scala
    // List namespaces (databases)
    spark.sql("SHOW NAMESPACES IN s3tables").show()
    
    // Query existing table
    spark.table("s3tables.s3table_db.s3table_table").show()
    
    // Create table (IF NOT EXISTS avoids an error if the table is already present)
    spark.sql("""
    CREATE TABLE IF NOT EXISTS s3tables.s3table_db.s3table_table (
        id INT,
        product STRING,
        amount DOUBLE,
        sale_date DATE
    ) """)
    
    // Read table
    spark.table("s3tables.s3table_db.s3table_table").show()
    
    SQL
    -- List namespaces (databases)
    SHOW NAMESPACES IN s3tables;
    
    -- List tables
    SHOW TABLES IN s3tables.s3table_db;
    
    -- Query existing table
    SELECT * FROM s3tables.s3table_db.s3table_table;
    
    -- Create table (IF NOT EXISTS avoids an error if the table is already present)
    CREATE TABLE IF NOT EXISTS s3tables.s3table_db.s3table_table (
        id INT,
        product STRING,
        amount DOUBLE,
        sale_date DATE
    );
    
    -- Query table
    SELECT * FROM s3tables.s3table_db.s3table_table;
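
Once the table exists, rows can also be appended from PySpark. The following is a minimal sketch, assuming the `pyspark` shell from step 2 (which provides the `spark` session), the table created above, and an access policy that permits writes. `sample_rows` and `append_sample_rows` are illustrative names, and the values are made-up sample data.

```python
from datetime import date

TABLE = "s3tables.s3table_db.s3table_table"
# Column names and types match the CREATE TABLE statement above.
SCHEMA = "id INT, product STRING, amount DOUBLE, sale_date DATE"


def sample_rows():
    """Return made-up rows in the column order of SCHEMA."""
    return [
        (1, "widget", 19.99, date(2024, 1, 15)),
        (2, "gadget", 5.50, date(2024, 1, 16)),
    ]


def append_sample_rows(spark, table=TABLE):
    """Append the sample rows to the table and return the number of rows sent."""
    rows = sample_rows()
    df = spark.createDataFrame(rows, schema=SCHEMA)
    # DataFrameWriterV2.append() adds rows to an existing catalog table.
    df.writeTo(table).append()
    return len(rows)
```

In the `pyspark` shell, calling `append_sample_rows(spark)` and then `spark.table(TABLE).show()` should display the two new rows alongside any existing data.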