Run Spark Session with MinIO

  1. To access specific MinIO buckets alongside regular S3 buckets, configure a custom S3 endpoint for each MinIO bucket individually. Include the following properties when starting the Spark session:

    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.endpoint=https://<MINIO_HOST>:<MINIO_PORT>" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.endpoint=https://<MINIO_HOST>:<MINIO_PORT>"
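With several buckets, the three per-bucket flags repeat and are easy to mistype. As a minimal sketch, a small helper can generate them; the `minio_bucket_confs` function and the example bucket and host names below are illustrative, not part of Spark or Privacera. The property keys mirror the `spark.hadoop.fs.s3a.bucket.*` settings shown above.

```python
def minio_bucket_confs(buckets, minio_host, minio_port):
    """Return the --conf flags that point each bucket at a MinIO endpoint."""
    flags = []
    for bucket in buckets:
        prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}"
        flags += [
            f'--conf "{prefix}.path.style.access=true"',
            f'--conf "{prefix}.connection.ssl.enabled=true"',
            f'--conf "{prefix}.endpoint=https://{minio_host}:{minio_port}"',
        ]
    return flags

if __name__ == "__main__":
    # Hypothetical bucket names and MinIO address, for illustration only.
    for flag in minio_bucket_confs(["sales", "logs"], "minio.internal", 9000):
        print(flag)
```

Appending the printed flags to the `spark-shell`/`pyspark`/`spark-sql` invocation reproduces the per-bucket configuration above for any number of buckets.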
    

  2. To access only MinIO buckets, configure a single global endpoint instead. Include the following properties when starting the Spark session:

    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.fs.s3a.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.endpoint=https://<MINIO_HOST>:<MINIO_PORT>"
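If you launch the session from a script, the same global-endpoint command can be assembled programmatically. This is a sketch under stated assumptions: the `spark_minio_command` helper and its parameter names are hypothetical, while the `spark.hadoop.*` keys are exactly those in the command above.

```python
def spark_minio_command(shell, jwt_token, minio_host, minio_port):
    """Build the argv list for a Spark shell pointed at a global MinIO endpoint."""
    confs = {
        "spark.hadoop.privacera.jwt.token.str": jwt_token,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "true",
        "spark.hadoop.fs.s3a.endpoint": f"https://{minio_host}:{minio_port}",
    }
    argv = [shell]  # e.g. "pyspark", "spark-shell", or "spark-sql"
    for key, value in confs.items():
        argv += ["--conf", f"{key}={value}"]
    return argv
```

The resulting list can be passed to `subprocess.run` without shell quoting, which avoids escaping issues when the JWT token contains special characters.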
    
