Enable Real-time Scanning on Azure ADLS

Prerequisites

Ensure the following prerequisites are met. To configure them, see Account.

  • Select the Enable Real-Time Scanning button.

  • Configure an Event Hub for scanning.

  • Create a Consumer Group for Pkafka.

  • Configure Checkpoint Storage for Pkafka.

Create a Storage Account and Event Subscription for Scanning

  1. Log in to the Azure Portal.

  2. Use an existing storage account or create a new one. Refer to Microsoft documentation on how to create a storage account.

    Enter the name of this storage account in Storage Account Name when providing the Application Properties details for the datasource.

  3. Get Storage Account Key:

    1. Navigate to the storage account.

    2. Under Security + networking, click Access keys.

    3. Click Show Keys to display the key values.

    4. Use the appropriate key value in Storage Account Key when providing the Application Properties details for the datasource.

  4. Use an existing container or create a new one. Refer to Microsoft documentation on how to create a container.

  5. Get URL Prefix:

    1. Navigate to the container and click Properties.

      Container property details are populated on the right.

    2. Use the URL prefix in the Application Properties details for the datasource.

  6. Create an event subscription. Refer to Microsoft documentation on how to create an Event Grid subscription.

    1. Navigate to the storage account.

    2. On the left menu, select Events and click + Event Subscription.

      The Create Event Subscription page is displayed.

    3. On the Create Event Subscription page within the Basic tab, provide the following values:

      1. Enter the Event Name and Event Schema.
      2. Topic Details are auto-populated.
      3. Choose Event Type as Blob Created and Blob Deleted.
      4. Choose Endpoint type as Event Hubs.
      5. Select an Endpoint from the Select Event Hub dialog.
        1. From the Event Hub Namespace dropdown, choose the Event Hub Namespace you created.
        2. From the Event Hub dropdown, choose the Event Hub you created.
        3. Click Confirm Selection.
      6. Click Create.

    Note

    Disabling soft delete on the blob storage account is recommended, because scanning of ORC and Parquet files is not supported when soft delete is enabled.
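
The event subscription settings from step 6 can also be written down as data, which makes them easier to review. The following is a minimal sketch, not a Privacera or Azure tool: it builds a request body in the shape used by Azure Resource Manager for Event Grid subscriptions, with the Blob Created and Blob Deleted event types and an Event Hub endpoint. The function names, the placeholder IDs, and the choice of `EventGridSchema` as the event schema are assumptions for illustration; use the schema you selected in the portal.

```python
import json

# Event type identifiers that Azure Storage publishes for the
# "Blob Created" and "Blob Deleted" choices in the portal.
BLOB_EVENT_TYPES = [
    "Microsoft.Storage.BlobCreated",
    "Microsoft.Storage.BlobDeleted",
]


def event_hub_resource_id(subscription_id, resource_group, namespace, event_hub):
    """Build the ARM resource ID of an Event Hub (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.EventHub/namespaces/{namespace}"
        f"/eventhubs/{event_hub}"
    )


def build_event_subscription(hub_resource_id, schema="EventGridSchema"):
    """Return a subscription body mirroring the portal choices (hypothetical helper).

    The schema default is an assumption; the steps above do not say which
    Event Schema to pick.
    """
    return {
        "properties": {
            "destination": {
                "endpointType": "EventHub",
                "properties": {"resourceId": hub_resource_id},
            },
            "filter": {"includedEventTypes": list(BLOB_EVENT_TYPES)},
            "eventDeliverySchema": schema,
        }
    }


def is_forwarded(subscription, event):
    """Check whether an event's type passes the subscription's type filter."""
    included = subscription["properties"]["filter"]["includedEventTypes"]
    return event.get("eventType") in included


if __name__ == "__main__":
    # All four arguments are placeholders; substitute your own values.
    hub_id = event_hub_resource_id(
        "<subscription-id>", "<resource-group>", "<namespace>", "<event-hub>"
    )
    print(json.dumps(build_event_subscription(hub_id), indent=2))
```

With this filter, only blob creation and deletion events reach the Event Hub; other storage events are not forwarded.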

Connect ADLS Gen2 Application for Data Discovery

  1. Go to Settings > Applications.

  2. In the Applications screen, select ADLS Gen2.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Data Discovery for ADLS Gen2.

  5. In the BASIC tab, enter the values in the following fields:

    • JDBC URL
    • JDBC Username
    • JDBC Password
  6. In the ADVANCED tab, you can add custom properties.

  7. Using the IMPORT PROPERTIES button, you can browse and import application properties.

  8. Click the TEST CONNECTION button to check if the connection is successful, and then click Save.

  9. To add a resource to be scanned in real time, navigate to Discovery > Data Source. See Discovery.

  10. To see the scan results, navigate to Data Inventory > Classifications.


Last update: February 22, 2022