Portal - Outgoing HTTP Error Rate Alerts

This guide helps the support team diagnose and resolve outgoing HTTP request failures detected via the Outgoing HTTP Error Rate Alert in Grafana for the portal service.

Root Cause

Portal Outgoing HTTP error rate alerts occur when the portal service experiences failures in making HTTP requests to services or APIs. Common causes include:

  • Service Unavailability: Target services (Ranger API, Scheme Server, Ops-Server, Dataserver) are down or unresponsive
  • Network/Proxy Issues: Connectivity problems or proxy configuration errors preventing outbound requests
  • Resource Constraints: Memory pressure or connection pool exhaustion affecting outbound calls
  • Service Dependency Failures: Services returning errors or not responding as expected
  • Timeout Issues: Services taking too long to respond, causing request timeouts

Solution

Step 1: Identify Failing Endpoint

Use the alert metadata in Grafana (like URI, Method, Status) to identify:

  • Which outgoing endpoint is failing (for example, Ranger API (/ranger/*), Scheme Server (/peg/*), Ops-Server (/ops-server/*))
  • What request caused the issue (for example, GET /ranger/service/xusers/users returning 500 Internal Server Error)
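
Once the failing endpoint is known, the request can be replayed from inside a portal pod to confirm whether the target service itself returns the error. This is a minimal sketch; <POD>, <NAMESPACE>, and <TARGET_URL> are placeholders, and it assumes curl is available in the portal container image:

Bash
# Replay the failing outgoing request from a portal pod and print only the HTTP status code
# <POD>, <NAMESPACE>, and <TARGET_URL> are placeholders for your environment
kubectl exec -it <POD> -n <NAMESPACE> -- \
  curl -sk -o /dev/null -w "HTTP %{http_code}\n" "<TARGET_URL>"

If the replayed request returns the same status code as the alert, the failure lies with the target service rather than with the portal itself.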

Step 2: Grafana Dashboard Checks

  1. Outgoing HTTP Request Rate

    This panel shows how many requests per second Portal is making to services over the last 5 minutes.

    What It Shows:

    • Request volume to each service (Ranger, PEG, Ops-Server)
    • Breakdown by HTTP status code (200, 404, 500, etc.)
    • Breakdown by HTTP method (GET, POST, PUT, etc.)
    • Specific endpoints being called

    When to Check:

    • To see whether Portal is making an unusually high or low number of requests to services
    • To identify which specific endpoints are generating the most traffic
    • To correlate request spikes with error increases
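
    If Grafana is unavailable, the same numbers can be pulled directly from Prometheus. This is a sketch that assumes the portal exposes Micrometer-style client metrics (http_client_requests_seconds_count) and that Prometheus is reachable at <PROMETHEUS_URL>; metric and label names may differ in your deployment:

    Bash
    # Per-second outgoing request rate over the last 5 minutes, grouped by target URI and status
    # The metric name, the job label, and <PROMETHEUS_URL> are assumptions; adjust to your environment
    curl -sG "<PROMETHEUS_URL>/api/v1/query" \
      --data-urlencode 'query=sum by (uri, status) (rate(http_client_requests_seconds_count{job="portal"}[5m]))'
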
  2. Outgoing HTTP Response Time

    This panel shows how long services take to respond to Portal's requests.

    What It Shows:

    • Average response time for each service
    • Response time trends over time
    • Breakdown by endpoint and status code
    • Identifies slow-performing services

    When to Check:

    • If Portal UI feels slow or unresponsive
    • To identify which service is causing performance issues
    • To see if response times correlate with error rates
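
    A raw latency figure for the escalation ticket can be computed from the same client metrics, under the same assumptions as the request-rate sketch above (Micrometer-style metric names, <PROMETHEUS_URL> as a placeholder):

    Bash
    # Average outgoing response time in seconds per target URI over the last 5 minutes
    # Metric names, label names, and <PROMETHEUS_URL> are assumptions; adjust to your environment
    curl -sG "<PROMETHEUS_URL>/api/v1/query" \
      --data-urlencode 'query=sum by (uri) (rate(http_client_requests_seconds_sum{job="portal"}[5m])) / sum by (uri) (rate(http_client_requests_seconds_count{job="portal"}[5m]))'
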
  3. Outgoing Connection Status

    This panel displays the current proxy connection health between Portal and services.

    What It Shows:

    • Ranger Proxy: Portal → Ranger API communication status (Connected/Disconnected/NA)
    • PEG Proxy: Portal → PEG/Scheme Server communication status (Connected/Disconnected/NA)
    • Ops Server Proxy: Portal → Ops-Server communication status (Connected/Disconnected/NA)

    Status Indicators:

    • Connected (Green): Portal can successfully communicate with the service
    • Disconnected (Red): Portal cannot reach the service (network/auth/service down)
    • NA (Gray): Service not configured or monitoring not available

    When to Check:

    • If you see 404/503/504 errors in outgoing requests
    • When Portal features dependent on services aren't working
    • To verify which specific service is causing connectivity issues
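
    When a proxy shows Disconnected, a raw TCP reachability check from a portal pod helps separate network problems from application-level failures. This is a minimal sketch; all host:port values, <POD>, and <NAMESPACE> are placeholders, and it assumes bash and timeout are available in the container:

    Bash
    # Check TCP reachability from a portal pod to each backend service
    # All host:port values, <POD>, and <NAMESPACE> are placeholders for your environment
    for target in <RANGER_HOST>:<RANGER_PORT> <PEG_HOST>:<PEG_PORT> <OPS_SERVER_HOST>:<OPS_SERVER_PORT>; do
      host="${target%%:*}"; port="${target##*:}"
      if kubectl exec <POD> -n <NAMESPACE> -- timeout 5 bash -c "</dev/tcp/${host}/${port}"; then
        echo "${target} reachable"
      else
        echo "${target} unreachable"
      fi
    done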

Step 3: Apply Quick Fixes Based on Common Error Patterns

Error Code | Likely Cause                               | Quick Fix
400        | Invalid request parameters/malformed JSON  | Validate input parameters and request format
401/419    | Token expired                              | Verify service account tokens and check service health
404        | Endpoint not found/resource missing        | Verify URL configuration and resource existence
500        | Internal service error                     | Proceed to the Escalation Checklist for further investigation
503/504    | Service unavailable/gateway timeout        | Check target service health
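
For the 4xx/5xx rows above, a quick scan of the portal logs often shows which outgoing call is producing the errors. This is a sketch; it reuses the log path from the manual log collection step below, and the grep pattern is an assumption about how the portal logs response status codes:

Bash
# Count recent outgoing 4xx/5xx responses by status code in the portal logs
# <POD> and <NAMESPACE> are placeholders; the grep pattern is an assumption about the log format
kubectl exec <POD> -n <NAMESPACE> -- bash -c \
  "grep -hoE 'status(Code)?[=: ]+[45][0-9]{2}' /opt/privacera/portal/logs/*.log | sort | uniq -c | sort -rn"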

Escalation Checklist

If the issue cannot be resolved through the specific troubleshooting guides, escalate it to the appropriate team with the following details:

  • Timestamp of the error: Include the exact time the alert was triggered
  • Grafana dashboard and alert screenshots:
    • Grafana → Dashboards → Portal folder → Portal Dashboard
    • Grafana → Alerting → Alert rules → Outgoing HTTP Error Rate Alerts
  • Portal Service Logs: Include the portal service logs, along with any client-side actions or test steps that reproduce the issue

    Option 1: Download Log from Diagnostic Portal (Recommended)

    1. Open Diagnostic Portal and go to Dashboard → Services Tab
    2. Type "portal" in the service column input search box
    3. Click on the portal service to open its details page
    4. Find and click on a pod that shows "active" status
    5. Click the "Logs" tab on the pod details page
    6. Click "Download Logs" button to save the logs
    7. If you see multiple portal pods with "active" status, repeat steps 4-6 for each one

    Option 2: Manual Log Collection (if the Diagnostic Portal is not enabled)

    Bash
    # Create log archive
    kubectl exec -it <POD> -n <NAMESPACE> -- bash -c "cd /opt/privacera/portal/logs/ && tar -czf portal-logs.tar.gz *.log"
    
    # Copy the archive from the pod to your local machine
    kubectl cp <POD>:/opt/privacera/portal/logs/portal-logs.tar.gz ./portal-logs.tar.gz -n <NAMESPACE>
    
    # Extract logs
    tar -xzf portal-logs.tar.gz
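
    If the extracted archive is large, trimming it to the window around the alert keeps the escalation ticket readable. A minimal sketch; <YYYY-MM-DD HH> is a placeholder for the date and hour the alert fired, and it assumes the portal log lines begin with a timestamp in that format:

    Bash
    # Keep only log lines from the hour in which the alert fired
    # <YYYY-MM-DD HH> is a placeholder; adjust the pattern to the actual log timestamp format
    grep -h "<YYYY-MM-DD HH>" *.log > portal-logs-around-alert.log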
    
  • Current portal configuration details: Configuration settings and deployment information

  • Relevant user actions: Actions leading up to the error

For additional assistance, see How to Contact Support for detailed guidance on reaching out to the support team.
