Signer Error Rate Alert
Root Cause
A Signer Error Rate alert is triggered when more than 5% of Signer API calls fail within a 5-minute period. This typically indicates that the DataServer is encountering issues while processing signer requests. Common causes include:
- Server Errors: Internal DataServer errors, such as token generation failures or STS token failures.
- Invalid Token Errors: JWT token validation failures, expired tokens, or malformed authentication tokens.
- Network Issues: Connectivity problems between the DataServer and AWS services.
- Resource Exhaustion: Insufficient memory or CPU saturation.
- Configuration Issues: An incorrect AWS IAM role or credentials, or insufficient IAM role permissions to generate STS tokens.
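To make the alert condition concrete, the following is a minimal Python sketch of "more than 5% of Signer API calls fail within a 5-minute period". It assumes each Signer call is available as a hypothetical (timestamp, status) record and that SERVER_ERROR and INVALID_TOKEN are the failing statuses, as counted by the Overall Error Request % panel. The SUCCESS status name is also hypothetical, and the real rule is evaluated by Grafana alerting, not by this code.

```python
from datetime import datetime, timedelta

# Failure statuses as described for the Overall Error Request % panel.
FAILURE_STATUSES = {"SERVER_ERROR", "INVALID_TOKEN"}
WINDOW = timedelta(minutes=5)
THRESHOLD = 0.05  # alert when MORE than 5% of calls fail


def signer_error_rate(events, now):
    """Return the failing fraction of Signer calls in the 5 minutes ending at `now`.

    `events` is an iterable of (timestamp, status) pairs; this record shape is
    a hypothetical stand-in for the DataServer's real metrics.
    """
    window = [status for ts, status in events if now - WINDOW < ts <= now]
    if not window:
        return 0.0
    failures = sum(1 for status in window if status in FAILURE_STATUSES)
    return failures / len(window)


def should_alert(events, now):
    # Strictly greater than the threshold: exactly 5% does not fire.
    return signer_error_rate(events, now) > THRESHOLD


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    # 95 successful calls and 6 server errors, all within the last 95 seconds.
    events = [(now - timedelta(seconds=i), "SUCCESS") for i in range(95)]
    events += [(now - timedelta(seconds=i), "SERVER_ERROR") for i in range(6)]
    print(signer_error_rate(events, now), should_alert(events, now))
```

Calls older than five minutes fall out of the window, so a burst of failures that has already passed does not keep the condition true.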
Troubleshooting Steps
Step 1: Review Grafana Dashboards
- Navigate to Grafana → Dashboards → Application-Dashboards → dataserver → DataServer.
- Review the Overall Error Request % panel:
- This panel shows the percentage of Signer requests that failed with response status SERVER_ERROR or INVALID_TOKEN.
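When reading this panel, it can help to know which failure status dominates. The sketch below computes the overall failure percentage and each status's share from a list of raw response statuses. The input list and the SUCCESS status are hypothetical; in practice the numbers come from the Grafana panel. A high SERVER_ERROR share points toward the server-side causes listed under Root Cause, while a high INVALID_TOKEN share points toward authentication token problems.

```python
from collections import Counter

# Statuses counted as failures by the Overall Error Request % panel.
FAILURE_STATUSES = ("SERVER_ERROR", "INVALID_TOKEN")


def failure_breakdown(statuses):
    """Return (overall failure %, {status: % of all requests}).

    `statuses` holds one response-status string per Signer request; this
    input shape is an assumption for illustration.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {s: 0.0 for s in FAILURE_STATUSES}
    per_status = {s: 100.0 * counts[s] / total for s in FAILURE_STATUSES}
    return sum(per_status.values()), per_status


if __name__ == "__main__":
    sample = ["SUCCESS"] * 90 + ["SERVER_ERROR"] * 8 + ["INVALID_TOKEN"] * 2
    overall, per_status = failure_breakdown(sample)
    print(overall, per_status)
```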
Step 2: Use Diagnostics Tool
This option is available only for self-managed deployments.
The Diagnostics Tool provides automated testing of DataServer functionality and helps identify configuration or connectivity issues.
- Open the Diagnostic Portal and navigate to Dashboard → Pods.
- Select the DataServer pod from the available pods list.
- Under the CURRENT TEST RESULTS tab, review the PyTest Report for the following checks:
- test_prop_validation: Verifies that DataServer configuration properties are correctly set and valid.
- test_heathcheck_api: Tests the health check endpoint to ensure the DataServer is responding to requests.
- test_certificate_api: Validates certificate retrieval functionality, which is critical for secure communication.
- To check resource utilization, review:
- test_diag_client_disk_space: Verifies that sufficient disk space is available for DataServer operations.
- test_diag_client_pod_cpu_utilization: Checks CPU usage to identify if the pod is under resource pressure.
- test_jvm_process_cpu_utilization: Monitors JVM CPU usage, which can indicate performance bottlenecks.
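The checks above fall into configuration, availability, certificate, and resource categories. The following sketch is one way to summarize a report, assuming the outcomes are transcribed by hand into a dictionary of test name to outcome. That input shape and the hint wording are hypothetical paraphrases of this guide, not output produced by the Diagnostics Tool.

```python
# Each Diagnostics Tool check listed above, mapped to what a failure suggests.
# The hint text paraphrases this guide; it is not produced by the tool.
CHECK_HINTS = {
    "test_prop_validation": "configuration: properties are missing or invalid",
    "test_heathcheck_api": "availability: DataServer is not responding to requests",
    "test_certificate_api": "certificates: retrieval failing, affecting secure communication",
    "test_diag_client_disk_space": "resources: insufficient disk space",
    "test_diag_client_pod_cpu_utilization": "resources: pod is under CPU pressure",
    "test_jvm_process_cpu_utilization": "resources: high JVM CPU usage",
}

FAILING_OUTCOMES = {"failed", "error"}


def triage(results):
    """Return one hint per failing check.

    `results` maps test name to outcome (for example "passed", "failed",
    or "error"); this shape is an assumption for illustration.
    """
    return [
        f"{name}: {CHECK_HINTS.get(name, 'unrecognized check')}"
        for name, outcome in results.items()
        if outcome.lower() in FAILING_OUTCOMES
    ]


if __name__ == "__main__":
    report = {"test_prop_validation": "failed", "test_heathcheck_api": "passed"}
    for hint in triage(report):
        print(hint)
```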
Step 3: Review DataServer Logs
- Download the DataServer logs and search for error patterns. For detailed log collection steps, see the DataServer Service Logs section below.
- Look for patterns such as:
- Server errors: server.error, exception, signingRequest, signingResponse
- STS token errors: Error while generating STS token, and sts or token combined with error or fail
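The patterns above can be applied with a short script. The following is a minimal Python sketch, assuming the downloaded log is a plain-text file. The script and log file names in the usage line are hypothetical, and matching sts only as a whole word is a choice made here so that words such as "requests" do not produce false hits. STS-related lines are checked first because they are the more specific pattern.

```python
import re
import sys

# Patterns from Step 3; all matching is case-insensitive.
SERVER_ERROR_PATTERN = re.compile(
    r"server\.error|exception|signingRequest|signingResponse", re.IGNORECASE
)
STS_PHRASE = re.compile(r"Error while generating STS token", re.IGNORECASE)
STS_TOPIC = re.compile(r"\bsts\b|token", re.IGNORECASE)  # sts as a whole word
FAILURE_WORD = re.compile(r"error|fail", re.IGNORECASE)


def classify_line(line):
    """Return 'sts/token', 'server', or None for one log line."""
    if STS_PHRASE.search(line) or (STS_TOPIC.search(line) and FAILURE_WORD.search(line)):
        return "sts/token"
    if SERVER_ERROR_PATTERN.search(line):
        return "server"
    return None


def scan(lines):
    """Group matching lines by category, keeping 1-based line numbers."""
    hits = {"sts/token": [], "server": []}
    for lineno, line in enumerate(lines, start=1):
        category = classify_line(line)
        if category:
            hits[category].append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for category, matches in scan(log).items():
            print(f"{category}: {len(matches)} matching lines")
            for lineno, text in matches[:20]:  # show the first 20 per category
                print(f"  {lineno}: {text}")
```

Example usage: python scan_signer_logs.py dataserver.log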
Step 4: Check AWS STS Configuration
Validate AWS Credentials:
- Review the AWS profile configuration in the DataServer configuration files.
- Configuration File Location: ~/privacera/privacera-manager/config/custom-vars/vars.dataserver.aws.yml
- Verify the following properties:
- Correct AWS profile name under the DATASERVER_AWS_PROFILE_NAMES and PROFILE_NAME properties
- Correct IAM_ARN with the necessary permissions
- Ensure the IAM roles have sufficient permissions for STS token generation. Refer to IAM Role Creation for details.
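Before checking permissions in AWS, a quick local sanity check of the vars file can catch missing properties or a malformed ARN. The following Python sketch assumes the properties appear as YAML keys written as NAME: value (possibly nested under a list item) and that IAM_ARN holds an IAM role ARN. The example layout is an assumption, and this check cannot confirm that the role actually has STS permissions.

```python
import re

# Property names from Step 4; the file layout is an assumption.
REQUIRED_KEYS = ("DATASERVER_AWS_PROFILE_NAMES", "PROFILE_NAME", "IAM_ARN")
# IAM role ARN shape: arn:<partition>:iam::<12-digit account ID>:role/<path/name>
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def check_vars(text):
    """Return a list of problems found in the text of vars.dataserver.aws.yml."""
    problems = []
    for key in REQUIRED_KEYS:
        # Key at line start, optionally as a YAML list item ("- KEY:").
        if not re.search(rf"^\s*-?\s*{key}\s*:", text, re.MULTILINE):
            problems.append(f"missing property: {key}")
    for match in re.finditer(r"^\s*-?\s*IAM_ARN\s*:\s*(.*)$", text, re.MULTILINE):
        value = match.group(1).strip().strip("\"'")
        if not ROLE_ARN.match(value):
            problems.append(f"IAM_ARN does not look like an IAM role ARN: {value!r}")
    return problems


if __name__ == "__main__":
    sample = (
        "DATASERVER_AWS_PROFILE_NAMES:\n"
        '  - PROFILE_NAME: "default"\n'
        '    IAM_ARN: "arn:aws:iam::123456789012:role/example-role"\n'
    )
    print(check_vars(sample) or "no problems found")
```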
Step 5: Check DataServer Resource Utilization
Resource exhaustion can cause signer requests to fail. Check if the DataServer has sufficient resources:
- Review Pod Metrics:
- Review the Pod Monitoring dashboard under Dashboards → Infra-Dashboards in Grafana to check pod memory and CPU usage.
- If memory issues are detected, adjust resources using Compute Sizing.
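To judge whether usage is close to the pod's limits, the figures from the Pod Monitoring dashboard (or from kubectl top pod) can be compared against the configured limits. The following sketch parses common Kubernetes quantity formats (CPU in cores or millicores, memory with binary suffixes Ki/Mi/Gi/Ti or plain bytes). The 90% threshold is an illustrative assumption, not Privacera sizing guidance; decimal memory suffixes such as M or G are not handled.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ("250m" or "2") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity with a binary suffix, or plain bytes, to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def utilization(cpu_used, cpu_limit, mem_used, mem_limit, threshold=0.9):
    """Return (cpu_fraction, mem_fraction, warnings) for one pod.

    The default threshold of 0.9 is a hypothetical cut-off for illustration.
    """
    cpu = parse_cpu(cpu_used) / parse_cpu(cpu_limit)
    mem = parse_memory(mem_used) / parse_memory(mem_limit)
    warnings = []
    if cpu >= threshold:
        warnings.append(f"CPU at {cpu:.0%} of limit")
    if mem >= threshold:
        warnings.append(f"memory at {mem:.0%} of limit")
    return cpu, mem, warnings


if __name__ == "__main__":
    print(utilization("1900m", "2", "3840Mi", "4Gi")[2])
```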
Escalation Checklist
If the issue cannot be resolved through the specific troubleshooting guides, escalate it to Privacera Support with the following details. For additional assistance, refer to How to Contact Support for guidance on reaching out to the support team.
- Timestamp of the error: Include the exact time the alert was triggered.
- Grafana dashboard and alert screenshots:
- Grafana → Dashboards → Application-Dashboards → dataserver → DataServer → Overall Signer Error Request %
- Grafana → Alerting → Alert rules → Signer Error Rate Alert
- DataServer Service Logs: Include logs showing signer errors, exceptions, or connection issues.
Option 1: Download Log from Diagnostic Portal (Recommended)
This option is available only for self-managed deployments.
- Open the Diagnostic Portal and navigate to Dashboard → Pods.
- Select the DataServer pod from the available pods list.
- Open the Logs tab and click the DOWNLOAD LOGS button to download the logs.
Option 2: Manual Log Collection (If Diagnostic service is not enabled)