Overview of Priority based Offline Scan in Privacera Discovery

What is Priority based Offline Scan?

Priority based Offline Scan is a feature in Privacera Discovery that lets you prioritize the scan of specific resources. It is useful when you have a large number of resources and need some of them scanned ahead of the rest.

How does Priority based Offline Scan work?

- Resource Priority: Set the priority of the resources to be scanned.
- Resource Scan: Scan the resources based on the priority.
- Resource Management: Manage the scan of resources efficiently.

Workflow for Priority based Offline Scan

User Interaction:
- User logs into the portal and selects an application.
- Adds a resource and sets the scan priority (Normal or FastTrack).
- Initiates the scan.
Scan Processing:
- Normal Priority → Tasks with normal priority run on the resources allocated to the normal priority scan pools (listing_pool and offline_scan_pool).
- FastTrack Priority → Tasks with FastTrack priority run on the resources allocated to the FastTrack priority scan pools (fast_track_listing_pool and fast_track_offline_scan_pool).
Cluster Resource Management:
- If large jobs are running in normal pools, they will continue processing.
- When a FastTrack job arrives, it takes priority over normal jobs.
- Once completed, resources are redistributed back to normal pools.
- If another FastTrack job arrives, it again takes priority, repeating the cycle.
- If multiple FastTrack jobs arrive, they are processed sequentially.
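
The routing described under Scan Processing can be pictured as a simple lookup from scan priority and task type to a pool name. The sketch below is illustrative only: the pool names come from this page, while `ScanPriority`, `POOLS_BY_PRIORITY`, and `pool_for` are hypothetical names and are not part of Privacera Discovery.

```python
from enum import Enum


class ScanPriority(Enum):
    """Scan priorities a user can choose when adding a resource."""
    NORMAL = "Normal"
    FAST_TRACK = "FastTrack"


# Pool names are taken from this page; the lookup structure is a hypothetical illustration.
POOLS_BY_PRIORITY = {
    ScanPriority.NORMAL: {
        "listing": "listing_pool",
        "offline_scan": "offline_scan_pool",
    },
    ScanPriority.FAST_TRACK: {
        "listing": "fast_track_listing_pool",
        "offline_scan": "fast_track_offline_scan_pool",
    },
}


def pool_for(priority: ScanPriority, task_type: str) -> str:
    """Return the pool that a task of the given type and priority would run in."""
    return POOLS_BY_PRIORITY[priority][task_type]
```

For example, `pool_for(ScanPriority.FAST_TRACK, "listing")` returns `"fast_track_listing_pool"`.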

Understanding Pools in Spark Resource Allocation

When running scans, resources are allocated to different pools to ensure efficient job processing. Think of these pools as priority queues that determine how quickly a job receives resources in a shared environment. The higher a pool’s weight, the higher its scheduling priority. Valid weight range: 1 (lowest) to 1000 (highest). Increasing the number of available executors further improves the likelihood that higher-priority jobs start sooner and complete faster.
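
To get a feel for what a weight of 1000 against a weight of 1 means, the sketch below splits a fixed number of cores across active pools in proportion to their weights. This is a deliberately simplified model, not Spark's actual scheduling algorithm: it ignores minShare and task-level scheduling, and the function name `weighted_shares` and the 100-core cluster size are assumptions made for illustration.

```python
def weighted_shares(total_cores: float, active_pools: dict) -> dict:
    """Split total_cores across pools in proportion to their weights.

    Simplified model for illustration: ignores minShare and does not
    reproduce how Spark actually picks the next task to launch.
    """
    total_weight = sum(active_pools.values())
    return {
        name: total_cores * weight / total_weight
        for name, weight in active_pools.items()
    }


# Hypothetical 100-core cluster with one normal and one FastTrack pool active.
shares = weighted_shares(
    100,
    {"offline_scan_pool": 1, "fast_track_offline_scan_pool": 1000},
)
```

Under this model the FastTrack pool is entitled to 1000/1001 of the cores while it has active jobs, which is why a FastTrack job effectively runs ahead of normal jobs.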

Types of pools & their meaning:

Pool Name	Purpose	Weight	MinShare (CPU cores)
listing_pool	Normal priority jobs for listing tasks	1	1
offline_scan_pool	Normal priority jobs for offline scan tasks	1	1
fast_track_listing_pool	High-priority listing jobs	1000	2
fast_track_offline_scan_pool	High-priority offline scan jobs	1000	2
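
The values in the table correspond to the `weight` and `minShare` properties of pools in Spark's fair scheduler allocation file. As a rough sketch, the snippet below builds such a file from the table using Python's standard library. Whether Privacera Discovery stores its pool definitions in exactly this form is an assumption; `POOLS` and `build_allocations` are hypothetical names, and the Customizing Spark Pools documentation remains the authoritative reference.

```python
import xml.etree.ElementTree as ET

# (pool name, weight, minShare in CPU cores), copied from the table above.
POOLS = [
    ("listing_pool", 1, 1),
    ("offline_scan_pool", 1, 1),
    ("fast_track_listing_pool", 1000, 2),
    ("fast_track_offline_scan_pool", 1000, 2),
]


def build_allocations(pools):
    """Build an <allocations> element in the Spark fair scheduler file layout."""
    root = ET.Element("allocations")
    for name, weight, min_share in pools:
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return root


# Serialize to a string that could be written to an allocation file.
xml_text = ET.tostring(build_allocations(POOLS), encoding="unicode")
```

The in-pool scheduling mode is left unset here, so each pool would fall back to Spark's default.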

Note

For customizing Spark pools, refer to the Customizing Spark Pools documentation.

For more context on Spark pools, refer to the Spark Pool Documentation.

---
title: Priority based Offline Scan Workflow
---
flowchart TD

  subgraph "User Interaction"
    U1["User logs into Portal & selects application"]
    U2["Adds resource & sets Scan Priority: Normal or FastTrack"]
    U3["Starts scan"]
  end

  subgraph "Scan Processing"
    N1["Normal Priority -> listing_pool & offline_scan_pool"]
    F1["FastTrack Priority -> fast_track_listing_pool & fast_track_offline_scan_pool"]
  end

  subgraph "Cluster Resource Management"
    A["listing_pool & offline_scan_pool"]
    B["Large active jobs running"]
    C["New FastTrack job arrives"]
    D["FastTrack pools get priority"]
    E["FastTrack jobs complete"]
    F["Resources redistributed back to Normal pools"]
    G["Another FastTrack job arrives"]
    H["FastTrack pools get priority again"]
  end

  U1 --> U2 --> U3
  U3 -->|Normal| N1
  U3 -->|FastTrack| F1
  N1 --> A
  F1 --> C
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> G
  G --> H
  H --> D

For detailed information and configuration steps, refer to the Configuration Guide to Enable Priority-Based Offline Scan in Privacera Discovery documentation.
