Signer Error Rate Alert

Root Cause

A Signer Error Rate alert is triggered when more than 5% of Signer API calls fail within a 5-minute period. This typically indicates that the DataServer is encountering issues while processing signer requests. Common causes include:

  • Server Errors: Internal DataServer errors, such as failures during token generation or STS token failures.
  • Invalid Token Errors: JWT token validation failures, expired tokens, or malformed authentication tokens.
  • Network Issues: Connectivity problems between the DataServer and AWS services.
  • Resource Exhaustion: Insufficient memory or CPU saturation.
  • Configuration Issues: Incorrect AWS IAM role or credentials, or insufficient IAM role permissions to generate STS tokens.
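
To make the threshold concrete, the following minimal sketch computes a failure percentage from two counts and compares it with the 5% threshold described above. The function names `signer_error_rate` and `signer_alert_fires` and the sample counts are hypothetical. The real alert is evaluated by the monitoring stack, not by this script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of failed Signer API calls in a window.
# Usage: signer_error_rate <failed_calls> <total_calls>
signer_error_rate() {
  local failed="$1" total="$2"
  # Avoid dividing by zero when no calls were made in the window.
  if [ "$total" -eq 0 ]; then
    echo "0.00"
    return 0
  fi
  awk -v f="$failed" -v t="$total" 'BEGIN { printf "%.2f\n", (f / t) * 100 }'
}

# Hypothetical helper: exits 0 when the rate is above the 5% threshold.
# Usage: signer_alert_fires <failed_calls> <total_calls>
signer_alert_fires() {
  local rate
  rate="$(signer_error_rate "$1" "$2")"
  awk -v r="$rate" 'BEGIN { exit !(r > 5) }'
}
```

For example, `signer_error_rate 12 200` prints `6.00`, which is above the threshold, while 10 failures out of 200 calls (exactly 5%) would not trigger the alert because the condition is "more than 5%".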

Troubleshooting Steps

Step 1: Review Grafana Dashboards

  1. Navigate to Grafana → Dashboards → Application-Dashboards → dataserver → DataServer.
  2. Review the Overall Error Request % panel:
    • This panel shows the percentage of Signer requests that failed with response statuses SERVER_ERROR or INVALID_TOKEN.

Step 2: Use Diagnostics Tool

This option is available only for self-managed deployments.

The Diagnostics Tool provides automated testing of DataServer functionality and helps identify configuration or connectivity issues.

  1. Open the Diagnostic Portal and navigate to Dashboard → Pods.
  2. Select the DataServer pod from the available pods list.
  3. Under the CURRENT TEST RESULTS tab, review the PyTest Report for the following checks:
    • test_prop_validation: Verifies that DataServer configuration properties are correctly set and valid.
    • test_heathcheck_api: Tests the health check endpoint to ensure the DataServer is responding to requests.
    • test_certificate_api: Validates certificate retrieval functionality, which is critical for secure communication.
  4. To check resource utilization, review:
    • test_diag_client_disk_space: Verifies that sufficient disk space is available for DataServer operations.
    • test_diag_client_pod_cpu_utilization: Checks CPU usage to identify if the pod is under resource pressure.
    • test_jvm_process_cpu_utilization: Monitors JVM CPU usage, which can indicate performance bottlenecks.

Step 3: Review DataServer Logs

  1. Download the DataServer logs and search for error patterns. For detailed log collection steps, see the DataServer Service Logs section below.
  2. Look for patterns such as:
    • Server errors: server.error, exception, signingRequest, signingResponse
    • STS token errors: Error while generating STS token, or sts or token appearing on the same line as error or fail
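
The patterns above can be searched in one pass with `grep`. This is a minimal sketch that assumes the logs have already been downloaded and extracted locally. The function name `scan_signer_errors` is hypothetical, and the regular expressions are one possible reading of the patterns listed in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines matching the patterns listed above.
# Usage: scan_signer_errors <log-file>...
scan_signer_errors() {
  # -i: case-insensitive, -n: line numbers, -H: file names, -E: extended regex.
  # "sts" and "token" only count when the same line also mentions "error" or "fail".
  # These are substring matches, so expect some false positives.
  grep -inHE \
    -e 'server\.error|exception|signingRequest|signingResponse' \
    -e 'Error while generating STS token' \
    -e '(sts|token).*(error|fail)|(error|fail).*(sts|token)' \
    "$@"
}
```

A typical invocation would be `scan_signer_errors *.log` from the directory that contains the extracted log files.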

Step 4: Check AWS STS Configuration

Validate AWS Credentials:

  1. Review the AWS profile configuration in the DataServer configuration files.
  2. Open the configuration file: ~/privacera/privacera-manager/config/custom-vars/vars.dataserver.aws.yml
  3. Verify the following properties:
    • DATASERVER_AWS_PROFILE_NAMES and PROFILE_NAME contain the correct AWS profile name.
    • IAM_ARN references an IAM role that has the necessary permissions.
  4. Ensure the IAM roles have sufficient permissions for STS token generation. Refer to IAM Role Creation for details.
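
As a quick way to review these properties, the sketch below prints the matching lines of the configuration file with their line numbers. The path and property names come from the steps above. The function name `show_dataserver_aws_props` and the `DATASERVER_AWS_VARS` variable are hypothetical, and the sketch only displays the values; it does not validate them against AWS.

```shell
#!/usr/bin/env bash
# Path taken from the step above; set DATASERVER_AWS_VARS to override it.
DATASERVER_AWS_VARS="${DATASERVER_AWS_VARS:-$HOME/privacera/privacera-manager/config/custom-vars/vars.dataserver.aws.yml}"

# Hypothetical helper: list the profile and IAM properties with line numbers.
# Usage: show_dataserver_aws_props [file]
show_dataserver_aws_props() {
  local file="${1:-$DATASERVER_AWS_VARS}"
  if [ ! -r "$file" ]; then
    echo "Cannot read $file" >&2
    return 1
  fi
  # PROFILE_NAME also matches inside DATASERVER_AWS_PROFILE_NAMES, which is intended.
  grep -nE 'DATASERVER_AWS_PROFILE_NAMES|PROFILE_NAME|IAM_ARN' "$file"
}
```

Running `show_dataserver_aws_props` with no arguments reads the default path. An exit status of 1 with no output means none of the properties were found in the file.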

Step 5: Check DataServer Resource Utilization

Resource exhaustion can cause signer requests to fail. Check if the DataServer has sufficient resources:

  1. Review Pod Metrics:
    • Review the Pod Monitoring dashboard under Dashboards → Infra-Dashboards in Grafana to check pod memory and CPU usage.
    • If memory issues are detected, adjust resources using Compute Sizing.
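
Where a command-line cross-check of the dashboard is useful, the sketch below filters `kubectl top pod` output for pods above a memory limit. It assumes the cluster has metrics-server installed and that the output uses the usual three columns (NAME, CPU such as 250m, MEMORY such as 1536Mi). The function name `flag_high_memory_pods` and the 2048 Mi limit are hypothetical and should be adapted to the sizing in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl top pod --no-headers` output on stdin and
# print pods whose memory usage exceeds a limit given in Mi.
# Usage: flag_high_memory_pods <limit_mi>
flag_high_memory_pods() {
  local limit_mi="$1"
  awk -v limit="$limit_mi" '
    {
      mem = $3
      sub(/Mi$/, "", mem)   # strip the unit so the value compares numerically
      if (mem + 0 > limit + 0) print $1, $2, $3
    }'
}

# Example invocation against a live cluster:
#   kubectl top pod -n <NAMESPACE> --no-headers | flag_high_memory_pods 2048
```

Any pod printed by the filter is a candidate for the Compute Sizing adjustment mentioned above.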

Escalation Checklist

If the issue cannot be resolved using the troubleshooting steps above, escalate it to Privacera Support with the following details. For guidance on reaching the support team, refer to How to Contact Support.

  • Timestamp of the error: Include the exact time the alert was triggered.
  • Grafana dashboard and alert screenshots:
    1. Grafana → Dashboards → Application-Dashboards → dataserver → DataServer → Overall Signer Error Request %
    2. Grafana → Alerting → Alert rules → Signer Error Rate Alert
  • DataServer Service Logs: Include logs showing signer errors, exceptions, or connection issues.

    Option 1: Download Log from Diagnostic Portal (Recommended)

    This option is available only for self-managed deployments.

    1. Open the Diagnostic Portal and navigate to Dashboard → Pods.
    2. Select the DataServer pod from the available pods list.
    3. Click the Logs tab, then click the DOWNLOAD LOGS button to download the logs.

    Option 2: Manual Log Collection (If the Diagnostic service is not enabled)

    Bash
    # Create log archive
    kubectl exec -it <POD> -n <NAMESPACE> -- bash -c "cd /workdir/privacera-dataserver/logs/dataserver/ && tar -czf dataserver-logs.tar.gz *.log"
    
    # Copy the archive
    kubectl cp <POD>:/workdir/privacera-dataserver/logs/dataserver/dataserver-logs.tar.gz ./dataserver-logs.tar.gz -n <NAMESPACE>
    
    # Extract logs
    tar -xzf dataserver-logs.tar.gz