Signer High Latency Alert

Root Cause

This alert fires when the Signer API response time stays above 2 seconds for 5 consecutive minutes. The Signer API is a critical component for request authentication and authorization, so high latency directly delays client requests and degrades overall system performance. Common causes include:

High Request Load: Excessive incoming request volume overwhelming the DataServer.
Resource Pressure: CPU or memory pressure on DataServer pods causing slow processing.
Backend Services: Latency in backend services such as AWS (including STS) or the JWT authentication server.
Network Latency: Network connectivity issues or high network latency between components.
Thread Blocking: Thread contention or blocking operations affecting request processing.
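The alert condition combines a threshold (2 seconds) with a duration (5 minutes), so a single slow request does not fire it. The sketch below approximates that behaviour over exported latency samples. The input format and the function name `would_fire` are hypothetical. The authoritative rule is the one configured under Grafana → Alerting, and its exact expression is not shown in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximates "latency > THRESHOLD sustained for WINDOW seconds".
# Input on stdin: one "<epoch_seconds> <latency_seconds>" sample per line, sorted by time.
# Prints FIRING if an unbroken run of breaching samples lasted at least WINDOW seconds, else OK.
would_fire() {
  local threshold="${1:-2}" window="${2:-300}"
  awk -v t="$threshold" -v w="$window" '
    $2 > t {
      if (start == "") start = $1          # first breaching sample of the current run
      if ($1 - start >= w) fired = 1       # run has lasted the full window
      next
    }
    { start = "" }                         # a sample at or below the threshold resets the run
    END { print (fired ? "FIRING" : "OK") }
  '
}
```

For example, one-minute samples that all exceed 2 seconds across a full 5-minute span print FIRING. A single sample at or below 2 seconds in between restarts the clock and prints OK.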

Troubleshooting Steps

Step 1: Review Grafana Dashboards

Navigate to Grafana → Dashboards → Application-Dashboards → dataserver → DataServer.
Review the Signer Request Latency panel:
- Check if the response time is consistently above 2 seconds.
- Identify any patterns or spikes in latency.
Review the Request Rate panel:
- Check if there is a spike in incoming requests that correlates with the high latency.

Step 2: Use Diagnostics Tool

This step is available only for self-managed deployments.

The Diagnostics Tool provides automated testing of DataServer functionality and helps identify configuration or connectivity issues.

Open the Diagnostic Portal and navigate to Dashboard → Pods.
Select the DataServer pod from the available pods list.
Under the CURRENT TEST RESULTS tab, review the PyTest Report for the following checks:
- test_heathcheck_api: Tests the health check endpoint to ensure the DataServer is responding to requests.
- test_certificate_api: Validates certificate retrieval functionality.
To check resource utilization, also review:
- test_diag_client_disk_space: Verifies that sufficient disk space is available for DataServer operations.
- test_diag_client_pod_cpu_utilization: Checks CPU usage to identify if the pod is under resource pressure.
- test_jvm_process_cpu_utilization: Monitors JVM CPU usage, which can indicate performance bottlenecks.

Step 3: Check DataServer Resource Utilization

Resource exhaustion is a common cause of high latency. Check if the DataServer has sufficient resources:

Review Pod Metrics:
- Navigate to Grafana → Dashboards → Infra-Dashboards → Pod Monitoring.
- Check pod memory and CPU usage for the DataServer pods.
- Look for high CPU utilization (>80%) or memory pressure.
Check JVM Heap Usage:
- High JVM heap usage or frequent garbage collection can cause latency spikes.
- If memory issues are detected, adjust the DataServer pod resources as described in Compute Sizing.
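If you have `kubectl` access, you can also run the CPU check from the command line. `kubectl top pod` reports absolute usage in millicores, not a percentage, and it requires metrics-server in the cluster. The sketch below therefore compares usage against a CPU limit that you supply. The function name `flag_hot_pods` is hypothetical, and the 80% default simply reuses the threshold above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flags pods whose CPU usage exceeds THRESHOLD_PCT of a given CPU limit.
# Input on stdin: `kubectl top pod --no-headers` output (NAME CPU(cores) MEMORY(bytes)).
# Usage: flag_hot_pods <CPU_LIMIT_MILLICORES> [THRESHOLD_PCT]
flag_hot_pods() {
  local limit_m="$1" threshold_pct="${2:-80}"
  awk -v limit="$limit_m" -v pct="$threshold_pct" '
    {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); cpu += 0 }  # "850m" -> 850 millicores
      else { cpu *= 1000 }                               # "1" core -> 1000 millicores
      used = 100 * cpu / limit
      if (used > pct) printf "%s %.0f%%\n", $1, used
    }
  '
}
```

Read the real limit with `kubectl get pod <POD> -n <NAMESPACE> -o jsonpath='{.spec.containers[*].resources.limits.cpu}'` and convert it to millicores (for example, `2` becomes `2000`). Then run `kubectl top pod -n <NAMESPACE> --no-headers | grep dataserver | flag_hot_pods 2000`.

For heap usage, the container image may include the JDK tools, but this guide does not confirm that. If it does, running `jstat -gcutil <PID> 1000 5` inside the pod samples heap-space occupancy and garbage-collection counts once per second, five times.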

Step 4: Review DataServer Logs

Download the DataServer logs and search for latency-related patterns. For detailed log collection steps, see the DataServer Service Logs section below.
Look for patterns such as:
- Slow operations: slow, timeout, latency
- Thread issues: thread, blocked, waiting
- Backend delays: sts, aws, jwt combined with slow or timeout
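You can run the search patterns above in one pass over the extracted archive. This sketch assumes the logs were extracted into a local directory and keep their `.log` suffix, as in the manual collection commands. The function name `scan_latency_logs` is hypothetical. The backend check uses `grep -w` so that `sts` matches only as a whole word and not inside words such as `requests`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints log lines matching the Step 4 patterns, tagged by category.
# Usage: scan_latency_logs <LOG_DIR>   (directory containing the extracted *.log files)
scan_latency_logs() {
  local dir="$1"
  # Slow operations: slow, timeout, latency (case-insensitive, so "SocketTimeoutException" matches too)
  grep -Eih 'slow|timeout|latency' "$dir"/*.log | sed 's/^/[slow] /'
  # Thread issues: thread, blocked, waiting
  grep -Eih 'thread|blocked|waiting' "$dir"/*.log | sed 's/^/[thread] /'
  # Backend delays: sts, aws or jwt as whole words (-w), combined with slow or timeout
  grep -Eihw 'sts|aws|jwt' "$dir"/*.log | grep -Ei 'slow|timeout' | sed 's/^/[backend] /'
}
```

A line can appear under more than one tag. For example, an STS timeout shows up as both `[slow]` and `[backend]`.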

Step 5: Check Backend Service Connectivity

AWS STS Service:
- Verify connectivity to AWS STS endpoints.
- Check if there are any AWS service issues in the region.
Network Latency:
- Check network latency between DataServer pods and backend services.
- Review any network policies or firewalls that might be causing delays.
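To put a number on network latency to STS, use curl's write-out timers. They split a single request into DNS lookup, TCP connect, TLS handshake, time to first byte, and total time. This sketch assumes `curl` is available where you run it. The endpoint `https://sts.<REGION>.amazonaws.com/` follows AWS's regional STS naming, and `probe_latency` is a hypothetical name. The HTTP status is ignored because only the timing matters here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints curl's per-phase timings (in seconds) for one request to URL.
# Usage: probe_latency https://sts.<REGION>.amazonaws.com/
probe_latency() {
  local url="$1"
  # -o /dev/null discards the body; -w prints the timers; --max-time caps a hung connection at 10 s.
  curl -sS -o /dev/null --max-time 10 \
    -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    "$url"
}
```

To measure from the DataServer pod's position in the network, you can ship the function into the pod. This assumes the pod image has `bash` and `curl`, which this guide does not confirm: `kubectl exec <POD> -n <NAMESPACE> -- bash -c "$(declare -f probe_latency); probe_latency https://sts.<REGION>.amazonaws.com/"`. Large `connect` or `tls` values point at the network path. A small `tls` with a large `ttfb` points at the remote service.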

Escalation Checklist

If the issue cannot be resolved with the troubleshooting steps above, escalate it to Privacera support with the following details. For guidance on reaching the support team, refer to How to Contact Support.

Timestamp of the error: Include the exact time the alert was triggered.
Grafana dashboard and alert screenshots:
1. Grafana → Dashboards → Application-Dashboards → dataserver → DataServer → Signer Request Latency
2. Grafana → Alerting → Alert rules → Signer High Latency Alert

DataServer Service Logs: Include logs showing latency issues, slow operations, or timeout errors.

Option 1: Download Log from Diagnostic Portal (Recommended)

This option is available only for self-managed deployments.

Open the Diagnostic Portal and navigate to Dashboard → Pods.
Select the DataServer pod from the available pods list.
Open the Logs tab and click DOWNLOAD LOGS.

Option 2: Manual Log Collection (if the Diagnostic service is not enabled)

```bash
# Create log archive
kubectl exec -it <POD> -n <NAMESPACE> -- bash -c "cd /workdir/privacera-dataserver/logs/dataserver/ && tar -czf dataserver-logs.tar.gz *.log"

# Copy the archive
kubectl cp <POD>:/workdir/privacera-dataserver/logs/dataserver/dataserver-logs.tar.gz ./dataserver-logs.tar.gz -n <NAMESPACE>

# Extract logs
tar -xzf dataserver-logs.tar.gz
```
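When more than one DataServer pod is running, repeat the same exec-and-copy steps for each pod. This sketch assumes pod names contain `dataserver`, so adjust the filter to your deployment. It adds the pod name and a UTC timestamp to each archive name so that copies from different pods do not overwrite each other. `archive_name` and `collect_all_dataserver_logs` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Log directory taken from the manual collection commands above.
LOG_DIR=/workdir/privacera-dataserver/logs/dataserver

# Hypothetical helper: per-pod, UTC-timestamped archive name.
archive_name() {
  printf 'dataserver-logs-%s-%s.tar.gz\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Hypothetical helper: archives and copies *.log from every pod whose name contains "dataserver".
# Usage: collect_all_dataserver_logs <NAMESPACE>
collect_all_dataserver_logs() {
  local ns="$1" pod name
  for pod in $(kubectl get pods -n "$ns" -o name | grep dataserver); do
    pod="${pod#pod/}"                      # strip the "pod/" prefix printed by -o name
    name="$(archive_name "$pod")"
    # *.log is quoted locally, so the glob expands inside the pod, not on your machine
    kubectl exec "$pod" -n "$ns" -- bash -c "cd $LOG_DIR && tar -czf $name *.log"
    kubectl cp "$pod:$LOG_DIR/$name" "./$name" -n "$ns"
  done
}
```

Each resulting archive can be extracted with `tar -xzf <ARCHIVE>` as above. The timestamp in the name also makes it easier to line the logs up with the alert time.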
