About Privacera DataServer

Privacera DataServer is designed for integration with cloud storage services such as AWS S3, Azure ADLS, and GCP GCS. It enables secure data access either by generating signed URLs or by acting as a reverse proxy.

Key Features:

  • Signed URL Generation: The DataServer generates signed URLs that provide temporary, secure access to the data, ensuring that only authorized users can retrieve or update data.
  • Optimized for Cloud Storage: The signed URL mechanism is designed for cloud storage services, making it well suited to environments where data is stored in cloud-based systems.
  • Compute Integration: The signed URLs are provided to compute environments, such as Apache Spark or other Java-based applications, enabling secure and efficient data processing.
  • Enhanced Security: Because each signed URL is issued only after a policy check and expires after a set time, data access can be tightly controlled and monitored, significantly reducing the risk of unauthorized access.
  • Reverse Proxy Support: As an alternative to signed URLs, the DataServer can act as a reverse proxy. This is useful where signed URLs are not feasible, such as when accessing AWS S3 with the AWS CLI or the Python Boto library. This mode is also referred to as DataProxy.

Common Use Cases:

  • AWS S3: Managing access to object storage in Amazon Web Services.
  • Azure ADLS: Securing data in Azure Data Lake Storage.
  • GCP GCS: Enforcing policies for data stored in Google Cloud Storage.

Flow for Signed URL:

  • Policy Configuration: Access control policies are defined within the Privacera platform for specific cloud storage services.
  • Request Handling: When a compute environment (e.g., Apache Spark) needs to access data, it sends a request to the Privacera DataServer.
  • Signed URL Generation: The DataServer generates a signed URL that grants temporary, secure access to the requested data.
  • Data Access: The compute environment uses the signed URL to retrieve or update data from the cloud storage service.
  • Security and Expiration: The signed URL includes security parameters and an expiration time, ensuring controlled and temporary access.
  • Audit Logging: All access requests and actions are logged for auditing purposes, providing a comprehensive trail of data access activities.
```mermaid
sequenceDiagram
  participant User
  participant ComputeEnvironment
  participant PrivaceraDataServer
  participant CloudStorageService
  participant PrivaceraPlatform
  participant AuditLogs

  User->>ComputeEnvironment: Request to access data
  ComputeEnvironment->>PrivaceraDataServer: Request signed URL
  PrivaceraDataServer->>PrivaceraPlatform: Verify policies
  PrivaceraPlatform-->>PrivaceraDataServer: Policies verified
  Note right of PrivaceraDataServer: Within PrivaceraDataServer using Apache Ranger Plugin*
  PrivaceraDataServer->>CloudStorageService: Generate signed URL
  Note right of PrivaceraDataServer: Within PrivaceraDataServer - Generate signed URL*
  CloudStorageService-->>PrivaceraDataServer: Signed URL generated
  PrivaceraDataServer-->>ComputeEnvironment: Provide signed URL
  ComputeEnvironment->>CloudStorageService: Access data using signed URL
  CloudStorageService-->>ComputeEnvironment: Data retrieved/updated
  PrivaceraDataServer->>AuditLogs: Log access request and actions
```

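* Policy verification and signed URL generation both take place within the Privacera DataServer: policies are evaluated by the Apache Ranger plugin, and the URL is signed locally. No outbound calls are made for these steps.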
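The signed URL flow above can be illustrated with a minimal, self-contained Python sketch. It assumes a generic HMAC-SHA256 signature over the resource, operation, and expiry time; this is not Privacera's implementation, and real cloud providers use their own signing schemes (for example, AWS Signature Version 4 for S3 presigned URLs). `SIGNING_KEY`, `STORAGE_BASE`, the `POLICIES` table standing in for Ranger policies, `AUDIT_LOG`, and all function names are hypothetical. `verify_signed_url` plays the role of the cloud storage service, which checks the signature and expiration before serving data.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode

# Hypothetical values for illustration only.
SIGNING_KEY = b"demo-signing-key"
STORAGE_BASE = "https://storage.example.com"

# Hypothetical stand-in for Ranger policies: (user, resource prefix) -> operations.
POLICIES = {("alice", "sales/reports/"): {"read"}}

# In-memory audit trail; every signing request is recorded, allowed or not.
AUDIT_LOG = []


def is_allowed(user: str, resource: str, op: str) -> bool:
    """Evaluate policies locally; no outbound call is made."""
    return any(
        u == user and resource.startswith(prefix) and op in ops
        for (u, prefix), ops in POLICIES.items()
    )


def _sign(resource: str, op: str, expires: int) -> str:
    """HMAC-SHA256 over the resource, operation, and expiry timestamp."""
    message = f"{resource}|{op}|{expires}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def request_signed_url(user: str, resource: str, op: str, ttl: int = 300,
                       now: Optional[int] = None) -> Optional[str]:
    """DataServer side: check policy, audit, and return a signed URL or None."""
    now = int(time.time()) if now is None else now
    allowed = is_allowed(user, resource, op)
    AUDIT_LOG.append({"user": user, "resource": resource, "op": op,
                      "allowed": allowed, "time": now})
    if not allowed:
        return None
    expires = now + ttl
    query = urlencode({"op": op, "expires": expires,
                       "sig": _sign(resource, op, expires)})
    return f"{STORAGE_BASE}/{resource}?{query}"


def verify_signed_url(url: str, op: str, now: Optional[int] = None) -> bool:
    """Storage side (stand-in): accept only an untampered, unexpired URL."""
    now = int(time.time()) if now is None else now
    base, _, query = url.partition("?")
    prefix = STORAGE_BASE + "/"
    if not base.startswith(prefix):
        return False
    resource = base[len(prefix):]
    params = {key: values[0] for key, values in parse_qs(query).items()}
    try:
        expires = int(params["expires"])
        signature = params["sig"]
    except (KeyError, ValueError):
        return False
    if params.get("op") != op or now > expires:
        return False
    return hmac.compare_digest(signature, _sign(resource, op, expires))
```

For example, a URL requested by `alice` for `sales/reports/q1.csv` at time 1000 with the default 300-second TTL verifies at time 1100 but is rejected at 1301, or if the operation or resource in the URL is altered. A request from a user without a matching policy returns `None` and is still recorded in `AUDIT_LOG`.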

Flow for DataProxy:

In DataProxy mode, access requests are routed through the Privacera DataServer, which acts as a reverse proxy to the cloud service. This approach is particularly useful when signed URLs are not viable, such as when accessing AWS S3 through the AWS CLI or the Python Boto library.

Here is a high-level flow for DataProxy:


```mermaid
sequenceDiagram
    participant User
    participant AWS CLI/API
    participant Privacera DataServer
    participant AWS

    User->>AWS CLI/API: Configure with PTokens
    AWS CLI/API->>Privacera DataServer: Call AWS API<br> (Set endpoint to Privacera DataServer)
    Privacera DataServer->>Privacera DataServer: Validate PTokens
    Privacera DataServer->>Privacera DataServer: Authorize object and operation
    alt User has required permissions
        Privacera DataServer->>AWS: Forward request using Service User Key
    end
    AWS ->> Privacera DataServer: Response from AWS
    Privacera DataServer->>User: Return AWS response
```
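The DataProxy sequence can be modeled as a single request handler. This is a minimal sketch, not the DataServer's code: the in-memory `VALID_PTOKENS` and `POLICIES` tables, the `ProxyResponse` type, and the 401/403 status codes for rejected requests are assumptions, since the diagram does not show the denial path. The `forward` callable is a hypothetical stand-in for the call the DataServer makes to AWS with the service user's key, so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical PToken store (token -> user); real PToken validation is not shown here.
VALID_PTOKENS = {"ptoken-alice": "alice"}

# Hypothetical policies: (user, "bucket/key" prefix) -> allowed S3 operations.
POLICIES = {("alice", "sales-bucket/reports/"): {"GetObject", "ListObjects"}}


@dataclass
class ProxyResponse:
    status: int
    body: str


# Stand-in type for "forward request to AWS using the service user key".
Forwarder = Callable[[str, str], ProxyResponse]


def handle_request(ptoken: str, operation: str, resource: str,
                   forward: Forwarder) -> ProxyResponse:
    """Validate the PToken, authorize the operation, then proxy to AWS."""
    # Step 1: validate the PToken.
    user = VALID_PTOKENS.get(ptoken)
    if user is None:
        return ProxyResponse(401, "invalid PToken")  # assumed status code

    # Step 2: authorize the object and operation against the policies.
    allowed = any(
        u == user and resource.startswith(prefix) and operation in ops
        for (u, prefix), ops in POLICIES.items()
    )
    if not allowed:
        return ProxyResponse(403, "access denied")  # assumed status code

    # Step 3: forward with the service credentials and relay the response.
    return forward(operation, resource)


def fake_aws(operation: str, resource: str) -> ProxyResponse:
    """Offline stand-in for AWS, used only for demonstration."""
    return ProxyResponse(200, f"{operation} {resource}")


if __name__ == "__main__":
    print(handle_request("ptoken-alice", "GetObject",
                         "sales-bucket/reports/q1.csv", fake_aws))
```

Only requests that pass both checks reach `forward`, and each of those makes a second network hop to the cloud service, which is where the added latency of DataProxy mode comes from.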

Performance Considerations

DataProxy mode is not recommended for high-performance use cases. Because the DataServer sits between the user and the cloud service, every request and response takes an extra network hop, which adds latency. To limit the impact, deploy enough Privacera DataServer instances to handle the anticipated load.
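As a rough starting point for sizing, the instance count can be estimated from the expected peak request rate and the throughput a single DataServer instance sustains. This is a hedged sketch: the function name is hypothetical, the per-instance throughput must be measured in your own environment, and the 30% headroom and minimum of two instances (for availability) are assumed defaults, not Privacera recommendations.

```python
import math


def dataserver_instances(peak_rps: float, per_instance_rps: float,
                         headroom: float = 0.3, min_instances: int = 2) -> int:
    """Estimate how many DataServer instances a peak DataProxy load needs.

    peak_rps: expected peak requests per second through DataProxy.
    per_instance_rps: measured sustainable requests per second per instance.
    headroom: extra capacity fraction on top of the peak (assumed default).
    min_instances: lower bound kept for availability (assumed default).
    """
    if peak_rps < 0 or per_instance_rps <= 0 or headroom < 0 or min_instances < 1:
        raise ValueError("invalid sizing inputs")
    # Scale the peak by the headroom, then round up to whole instances.
    needed = math.ceil(peak_rps * (1 + headroom) / per_instance_rps)
    return max(needed, min_instances)
```

For example, a peak of 1,000 requests per second against a measured 250 per instance gives ceil(1,300 / 250) = 6 instances with the default headroom.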
