SDK API Error Rate High Alert

Root Cause

This alert is triggered when the percentage of SDK API calls that return 4xx or 5xx error responses exceeds the allowed threshold (1%) within a 5-minute window.
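
The exact evaluation depends on how the Grafana alert rule is wired to its datasource. As a rough sketch only, assuming a Prometheus-style datasource and a hypothetical request counter named sdk_api_requests_total with a status label (neither name is confirmed by this runbook), the condition being checked is roughly equivalent to:

Bash
# Sketch only: query the assumed error-rate expression directly from Prometheus.
# PROM_URL, the metric name, and the status label are assumptions, not confirmed values.
PROM_URL="http://<PROMETHEUS_HOST>:9090"
curl -s -G "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=sum(rate(sdk_api_requests_total{status=~"4..|5.."}[5m])) / sum(rate(sdk_api_requests_total[5m])) * 100'
# The alert fires when this percentage stays above 1.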

Common reasons include:

1. Communication Failures: The connector cannot successfully communicate with the backend service, resulting in a high number of failed API responses.

2. Configuration and Authentication Issues: Incorrect configuration, invalid credentials, or authentication failures result in repeated 4xx/5xx errors (see the quick log check after this list).

3. Network and Performance Problems: Network instability or performance degradation in downstream systems causes API calls to fail or time out.

4. Service Outages: Temporary service outages occur on the cloud provider or backend platform (e.g., AWS Lake Formation, GCP BigQuery, Databricks, Snowflake).

5. Load and Capacity Issues: Increased load or request spikes cause backend API endpoints to fail or reject calls.
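
For the configuration and authentication case above, a quick way to confirm whether the connector is logging repeated authentication or configuration failures (a minimal sketch; <CONNECTOR_POD> and <NAMESPACE> are placeholders, and the grep pattern is an assumption about the log format):

Bash
# Scan recent connector logs for auth/config related failures.
# Placeholders: <CONNECTOR_POD>, <NAMESPACE>. Adjust the pattern to match
# your connector's actual log format.
kubectl logs <CONNECTOR_POD> -n <NAMESPACE> --since=15m \
  | grep -Ei "401|403|unauthorized|forbidden|invalid credential" \
  | tail -n 50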

Troubleshooting Steps

The connector automatically continues making SDK API calls based on the configured sync intervals and retry logic.

Step 1: Monitor the Connector Dashboards

Check the Connector-Common or service-specific dashboards:

  • SDK API Error Rate Panel

    • Confirm whether the error rate is increasing or stable
    • Check whether the error percentage has exceeded the 1% threshold only briefly or for a sustained period
  • API Error Response Count (4xx/5xx) Panel

    • Validate the volume of failing API calls over the last few minutes
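
If the dashboards are unavailable or slow to load, a basic CLI health check on the connector pods can run in parallel (a minimal sketch; the namespace and label selector are placeholders):

Bash
# Confirm the connector pods are Running and not restarting.
# <NAMESPACE> and <CONNECTOR_APP_LABEL> are placeholders.
kubectl get pods -n <NAMESPACE> -l app=<CONNECTOR_APP_LABEL> -o wide

# Review recent events in the namespace (failed probes, evictions, etc.).
kubectl get events -n <NAMESPACE> --sort-by=.lastTimestamp | tail -n 20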

Escalation Checklist

If the issue cannot be resolved through the troubleshooting steps, escalate to Privacera support with the following details. For guidance on reaching the support team, refer to How to Contact Support.

  • Timestamp of the error: Include the exact time the alert was triggered.
  • Grafana dashboard and alert screenshots:
    1. Grafana → Dashboards → Application-Dashboards → Connector-Common → SDK API Error Rate
    2. Grafana → Alerting → Alert rules → SDK API Error Rate High Alert
  • Connector Service Logs: Include any logs showing 4xx/5xx API errors from the connector pods (see the log-filtering sketch at the end of this checklist).

    Option 1: Download Logs from the Diagnostic Portal (Recommended)

    1. Open the Diagnostic Portal and navigate to Dashboard → Pods.
    2. Select the connector pod from the available pods list.
    3. Click the Logs tab and download the logs by clicking the DOWNLOAD LOGS button.

    Option 2: Manual Log Collection (If Diagnostic Service is Not Enabled)

    Bash
    # Create log archive
    kubectl exec -it <CONNECTOR_POD> -n <NAMESPACE> -- bash -c "cd /workdir/policysync/logs/ && tar -czf connector-logs.tar.gz *.log"
    
    # Copy the archive from the pod to the local working directory
    kubectl cp <CONNECTOR_POD>:/workdir/policysync/logs/connector-logs.tar.gz ./connector-logs.tar.gz -n <NAMESPACE>
    
    # Extract logs
    tar -xzf connector-logs.tar.gz
    
  • Configuration files: A copy of applicable configuration files (e.g., authentication settings, endpoint configuration).

  • Description: Clear description of the issue and the troubleshooting steps already performed.
  • Alert details: Alert name, alert message, alert trigger timestamp, and severity level from the alert.
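
For the Connector Service Logs item above, the extracted log files can be pre-filtered so the ticket highlights the relevant 4xx/5xx lines (a minimal sketch; the grep pattern is an assumption about the log format and should be adjusted to match your connector's log layout):

Bash
# Extract lines that look like 4xx/5xx API responses from the collected logs.
# The pattern is an assumption about the log format; adjust as needed.
grep -En 'status(Code)?[=: ]+[45][0-9]{2}| [45][0-9]{2} ' *.log > sdk-api-errors.txt

# Attach sdk-api-errors.txt alongside the full connector-logs.tar.gz archive.
wc -l sdk-api-errors.txt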