Overview of Priority based Offline Scan in Privacera Discovery
What is Priority based Offline Scan?
Priority based Offline Scan is a feature in Privacera Discovery that lets you prioritize the scanning of specific resources. It is useful when a large number of resources are waiting to be scanned and some of them need to be processed ahead of the rest.
How does Priority based Offline Scan work?
- Resource Priority: Set the priority of the resources to be scanned.
- Resource Scan: Scan the resources based on the priority.
- Resource Management: Manage the scan of resources efficiently.
Workflow for Priority based Offline Scan
- User Interaction:
    - The user logs into the portal and selects an application.
    - The user adds a resource and sets the scan priority (Normal or FastTrack).
    - The user initiates the scan.
- Scan Processing:
    - Normal Priority → Tasks run on the resources allocated to the normal priority scan pools (listing_pool and offline_scan_pool).
    - FastTrack Priority → Tasks run on the resources allocated to the FastTrack priority scan pools (fast_track_listing_pool and fast_track_offline_scan_pool).
- Cluster Resource Management:
    - Large jobs already running in the normal pools continue processing.
    - When a FastTrack job arrives, it takes priority over normal jobs.
    - Once the FastTrack job completes, resources are redistributed back to the normal pools.
    - If another FastTrack job arrives, it again takes priority, repeating the cycle.
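The routing step in the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not Privacera Discovery's implementation: the pool names are taken from this page, while `ScanPriority`, `POOLS_BY_PRIORITY`, `pool_for`, and the task type labels `"listing"` and `"offline_scan"` are hypothetical names introduced for the example.

```python
from enum import Enum


class ScanPriority(Enum):
    """Scan priorities a user can choose when adding a resource."""
    NORMAL = "Normal"
    FAST_TRACK = "FastTrack"


# Pool names come from this page; the dictionary layout and the
# task type keys are assumptions made for this sketch.
POOLS_BY_PRIORITY = {
    ScanPriority.NORMAL: {
        "listing": "listing_pool",
        "offline_scan": "offline_scan_pool",
    },
    ScanPriority.FAST_TRACK: {
        "listing": "fast_track_listing_pool",
        "offline_scan": "fast_track_offline_scan_pool",
    },
}


def pool_for(priority: ScanPriority, task_type: str) -> str:
    """Return the pool that a task of the given type and priority would use."""
    try:
        return POOLS_BY_PRIORITY[priority][task_type]
    except KeyError:
        raise ValueError(
            f"unknown priority/task type: {priority!r}, {task_type!r}"
        ) from None


if __name__ == "__main__":
    print(pool_for(ScanPriority.NORMAL, "listing"))
    print(pool_for(ScanPriority.FAST_TRACK, "offline_scan"))
```

In a Spark application, a thread usually selects a pool for the jobs it submits by setting the `spark.scheduler.pool` local property on the SparkContext. Whether Privacera Discovery does this internally in the same way is not stated on this page.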
Understanding Pools in Spark Resource Allocation
When running scans, resources are allocated to different pools to ensure efficient job processing. Think of these pools as priority queues that determine how quickly a job receives resources in a shared environment.
Types of pools & their meaning:
Pool Name | Purpose | Weight | MinShare (CPU cores) |
---|---|---|---|
listing_pool | Normal priority jobs for listing tasks | 1 | 1 |
offline_scan_pool | Normal priority jobs for offline scan tasks | 1 | 1 |
fast_track_listing_pool | High-priority listing jobs | 1000 | 2 |
fast_track_offline_scan_pool | High-priority offline scan jobs | 1000 | 2 |
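To see why a weight of 1000 lets FastTrack pools move ahead of normal pools, the sketch below models how a Spark-style fair scheduler can rank pools. A pool running fewer tasks than its MinShare is served first. Otherwise the pool with the lower ratio of running tasks to weight is served first. This is a simplified model for illustration only, not the scheduler's actual code, and the `Pool` class, the function names, and the running task counts are hypothetical.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List


@dataclass
class Pool:
    name: str
    weight: int
    min_share: int
    running_tasks: int = 0  # hypothetical load, chosen for the example


def compare(a: Pool, b: Pool) -> int:
    """Return a negative number if pool `a` should receive resources before `b`."""
    a_needy = a.running_tasks < a.min_share
    b_needy = b.running_tasks < b.min_share
    # A pool below its minimum share always goes ahead of one that is not.
    if a_needy and not b_needy:
        return -1
    if b_needy and not a_needy:
        return 1
    if a_needy and b_needy:
        # Both below their minimum share: the one furthest below goes first.
        ra = a.running_tasks / max(a.min_share, 1)
        rb = b.running_tasks / max(b.min_share, 1)
    else:
        # Both satisfied: a higher weight lowers the ratio, so it goes first.
        ra = a.running_tasks / a.weight
        rb = b.running_tasks / b.weight
    if ra != rb:
        return -1 if ra < rb else 1
    # Break ties by name so the ordering is deterministic.
    return (a.name > b.name) - (a.name < b.name)


def scheduling_order(pools: List[Pool]) -> List[str]:
    """Return pool names in the order they would be offered resources."""
    return [p.name for p in sorted(pools, key=cmp_to_key(compare))]


if __name__ == "__main__":
    busy = [
        Pool("offline_scan_pool", weight=1, min_share=1, running_tasks=40),
        Pool("fast_track_offline_scan_pool", weight=1000, min_share=2, running_tasks=40),
    ]
    print(scheduling_order(busy))
```

Under this model, two equally loaded pools are ranked with the FastTrack pool first because 40/1000 is far smaller than 40/1. The MinShare check also means that an idle normal pool is still served before a FastTrack pool that already holds its minimum share.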
Note
For customizing Spark pools, refer to the Customizing Spark Pools documentation.
For more context on Spark pools, refer to the Spark Pool Documentation.
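As a rough illustration of what pool definitions with these settings can look like, the sketch below renders the values from the pool table into the allocation file layout used by Spark's fair scheduler: an `<allocations>` element containing `<pool>` elements with `<weight>` and `<minShare>` children. The file that Privacera Discovery actually uses, and how it is deployed, are not described on this page, so treat `POOLS` and `build_allocations` as hypothetical names and follow the Customizing Spark Pools documentation for real changes.

```python
import xml.etree.ElementTree as ET

# (name, weight, minShare) values copied from the pool table on this page.
POOLS = [
    ("listing_pool", 1, 1),
    ("offline_scan_pool", 1, 1),
    ("fast_track_listing_pool", 1000, 2),
    ("fast_track_offline_scan_pool", 1000, 2),
]


def build_allocations(pools) -> str:
    """Render pool settings as a Spark fair scheduler allocation document."""
    root = ET.Element("allocations")
    for name, weight, min_share in pools:
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    # ET.indent is available from Python 3.9; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_allocations(POOLS))
```

Spark pools also accept a `schedulingMode` element that controls ordering of jobs inside a pool. The table does not specify it, so the sketch leaves it out.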
---
title: Priority based Offline Scan Workflow
---
flowchart TD
    subgraph "User Interaction"
        U1["User logs into Portal & selects application"]
        U2["Adds resource & sets Scan Priority: Normal or FastTrack"]
        U3["Starts scan"]
    end
    subgraph "Scan Processing"
        N1["Normal Priority -> listing_pool & offline_scan_pool"]
        F1["FastTrack Priority -> fast_track_listing_pool & fast_track_offline_scan_pool"]
    end
    subgraph "Cluster Resource Management"
        A["listing_pool & offline_scan_pool"]
        B["Large active jobs running"]
        C["New FastTrack job arrives"]
        D["FastTrack pools get priority"]
        E["FastTrack jobs complete"]
        F["Resources redistributed back to Normal pools"]
        G["Another FastTrack job arrives"]
        H["FastTrack pools get priority again"]
    end
    U1 --> U2 --> U3
    U3 -->|Normal| N1
    U3 -->|FastTrack| F1
    N1 --> A
    F1 --> C
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> D
For detailed information and configuration steps, refer to the Configuration Guide to Enable Priority-Based Offline Scan in Privacera Discovery documentation.