How PingPuffin's Monitoring System Works

Version: 1.0
Last updated: November 21, 2024


Table of Contents

  1. Overview
  2. Check Intervals and Frequency
  3. Error Detection
  4. Recovery and Status Changes
  5. Manual Updates
  6. Protection Against Monitor Failures
  7. Notification System
  8. Automatic Dashboard Updates
  9. Data Collection and Storage
  10. Technical Specifications
  11. Privacy and Security
  12. System Reliability
  13. Common Scenarios and Examples

Overview

PingPuffin monitors HTTP and HTTPS endpoints 24/7 to ensure your websites are available. Our system is built with a focus on reliability, precision, and transparency.

Key Features

  • Automatic checks every 5 minutes via cron job
  • 2-check verification to avoid false alarms
  • 24/7/365 monitoring without breaks
  • Instant recovery when your site is back
  • Manual updates for quick verification
  • Protection against internal errors in the monitor system

Why Transparency Matters

We believe in openness about how our monitoring works. This document explains exactly how we detect downtime, how we avoid false alarms, and how we ensure you get notified as quickly as possible when there's a real problem.


Check Intervals and Frequency

Automatic Checks

All active monitors are checked automatically every 5 minutes.

Cron schedule:

*/5 * * * *

This means:

  • Checks run at: 00:00, 00:05, 00:10, 00:15, ...
  • No breaks, no weekends, no holidays
  • All monitors checked in parallel for efficiency
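As a rough illustration of what the */5 * * * * schedule means in practice, the sketch below computes the next scheduled run for a given moment. The function name next_check_time is hypothetical and not part of PingPuffin; it models only the minute field of the cron expression.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL_MINUTES = 5  # taken from the cron schedule */5 * * * *

def next_check_time(now: datetime) -> datetime:
    """Return the next 5-minute boundary strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    minutes_to_add = CHECK_INTERVAL_MINUTES - (base.minute % CHECK_INTERVAL_MINUTES)
    return base + timedelta(minutes=minutes_to_add)

print(next_check_time(datetime(2024, 11, 21, 0, 2, 30)))   # 2024-11-21 00:05:00
print(next_check_time(datetime(2024, 11, 21, 23, 58, 0)))  # 2024-11-22 00:00:00
```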

Manual Updates

You can always trigger an instant check via the "Update now" button in your dashboard:

  • Result shown immediately
  • Bypasses 2-check threshold for quick feedback
  • Useful after deployments or configuration changes

Coverage

  • Availability: 24/7/365
  • Parallel checks: All monitors checked simultaneously
  • Timeout: 30 seconds by default (configurable)
  • Maximum redirects: Up to 5 redirects followed

Error Detection

2-Check Verification System

To avoid false alarms, we require 2 consecutive failures before marking a site as down.

How It Works

First Failure (00:00):

  • Failure counter set to 1
  • Status remains unchanged (e.g., UP)
  • No notification sent
  • System logs failure for internal monitoring

Second Failure (00:05):

  • Failure counter updated to 2
  • Status changes to DOWN
  • Incident created automatically
  • Notification sent to all configured channels

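Total time: about 5 minutes from the first failed check to DOWN status (one check interval), or 5-10 minutes from the moment the outage begins, depending on where in the 5-minute cycle it starts.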
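The two steps above can be sketched as a small state machine. This is a minimal illustration under assumptions, not PingPuffin's actual code: the names MonitorState and apply_automatic_check and the event strings are hypothetical. The success branch reflects the instant-recovery rule described under Recovery and Status Changes.

```python
from dataclasses import dataclass
from typing import List

FAILURE_THRESHOLD = 2  # consecutive failures required before DOWN

@dataclass
class MonitorState:
    status: str = "UP"
    failure_count: int = 0

def apply_automatic_check(state: MonitorState, check_ok: bool) -> List[str]:
    """Update `state` from one automatic check and return the events it triggers."""
    events: List[str] = []
    if check_ok:
        # Any success resets the counter; recovery needs no confirmation.
        state.failure_count = 0
        if state.status == "DOWN":
            state.status = "UP"
            events.append("recovery_notification")
        return events
    state.failure_count += 1
    # Only the transition into DOWN creates an incident and notifies.
    if state.failure_count >= FAILURE_THRESHOLD and state.status != "DOWN":
        state.status = "DOWN"
        events += ["incident_created", "down_notification"]
    return events

state = MonitorState()
print(apply_automatic_check(state, False), state.status)  # first failure: no events, still UP
print(apply_automatic_check(state, False), state.status)  # second failure: incident and notification, DOWN
```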

Why 2 Checks?

Transient network problems (DNS blips, brief timeouts, temporary server errors) occur even on stable sites. By requiring 2 failures:

  • ✅ We eliminate false alarms from brief problems
  • ✅ We confirm there's a real problem
  • ✅ We improve user trust in notifications

What Counts as a Failure?

The following situations are marked as failures:

HTTP Error Codes

  • 4xx Client Errors: 400, 404, 405, etc. (unless explicitly allowed; 403 is treated as a warning, see Cloudflare-Protected Sites)
  • 5xx Server Errors: 500, 502, 503, 504, etc.

Network Errors

  • Timeout: No response within timeout period (default: 30 seconds)
  • Connection Refused: Server actively rejects connections
  • DNS Failure: Cannot resolve domain name
  • Network Unreachable: Host not available on network

SSL/TLS Errors

  • Invalid Certificate: Certificate is invalid
  • Expired Certificate: Certificate has expired
  • Untrusted Certificate: Certificate not from trusted CA
  • Hostname Mismatch: Certificate hostname doesn't match URL
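To make the network and SSL/TLS categories above concrete, here is a hedged sketch of how a checker written in Python might map low-level exceptions to them. The function classify_error and the returned labels are hypothetical; PingPuffin's own implementation is not shown in this document and may differ.

```python
import errno
import socket
import ssl

def classify_error(exc: Exception) -> str:
    """Map a Python exception to one of the failure categories listed above."""
    # All of these are OSError subclasses, so the most specific checks come first.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "ssl_certificate_error"   # invalid, expired, untrusted, or mismatched
    if isinstance(exc, socket.gaierror):
        return "dns_failure"             # domain name could not be resolved
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(exc, TimeoutError):    # socket.timeout is an alias since Python 3.10
        return "timeout"
    if isinstance(exc, OSError) and exc.errno == errno.ENETUNREACH:
        return "network_unreachable"
    return "other_error"
```

Whatever the label, each of these outcomes would count as one failed check.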

Handling Different Error Types

Important: ALL error types count toward the failure counter.

Example:

Check 1: HTTP 500 → Failure counter: 1, Status: UP (waiting for confirmation)
Check 2: Timeout  → Failure counter: 2, Status: DOWN (confirmed failure)

Rationale:

  • If a server switches between different error types, it indicates instability
  • A change in error type does not make the outage any less serious
  • Any failure means the site is not functioning correctly

Recovery and Status Changes

Instant Recovery

When your site comes back, we react immediately – no 2-check threshold for recovery.

Recovery flow:

Status: DOWN
Check 1: Site responds with HTTP 200 → Failure counter reset, Status: UP
Result: Instant recovery, notification sent

Why instant recovery?

  • ✅ Users want quick feedback when their site is back
  • ✅ No reason to wait for confirmation that something works
  • ✅ Best practice in monitoring
  • ✅ Reduces worry and waiting time

Status States

🟢 UP (Online)

Meaning:

  • Site responds with expected status code (typically 200-399)
  • Response time within acceptable range
  • No errors detected

Notifications:

  • Sent on recovery from DOWN status

🔴 DOWN (Offline)

Meaning:

  • Site failed 2+ consecutive checks
  • Incident created and tracked
  • All configured notification channels alerted

Duration:

  • Recorded from first DOWN check to recovery
  • Shown in incident log with precise duration

🟡 WARNING (Warning)

Meaning:

  • Site responds but with warnings
  • Examples: Slow response time, Cloudflare challenge detected
  • Monitoring continues normally

Notifications:

  • Can be configured per user

🔵 REDIRECT (Redirect)

Meaning:

  • Permanent redirect (301) detected to another URL
  • Site is functional but URL has changed
  • You can choose to update the URL or continue monitoring the original

Notifications:

  • Can be configured per user

Cloudflare-Protected Sites

Automatic Handling:

PingPuffin automatically handles sites behind Cloudflare based on a fundamental principle in uptime monitoring:

The Principle: If the server responds with an HTTP status code (e.g., 403), it means the server is online. Cloudflare's protection blocks our monitoring requests, but this doesn't mean your site is down.

  • HTTP 403 Forbidden: If your site returns 403 and the server actually responds (no timeout), the response is automatically treated as Cloudflare bot protection and marked as "Problematic" (warning status) instead of "Down".

  • HTTP 503 Service Unavailable: 503 is treated as "Problematic" only if Cloudflare is actually detected (via headers, body patterns, or a short response under 1200 bytes). If Cloudflare is not detected, 503 is treated as a normal failure (it may be real downtime).

Why "Problematic" instead of "Down"?

If the server responds with an HTTP status code, it is online. Uptime monitoring is about availability: a server that responds is available, even when protection is active. The "Problematic" status therefore signals active protection rather than real downtime.
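The 403 and 503 rules can be sketched as follows. This is an assumption-laden illustration: the exact headers and body patterns PingPuffin inspects are not documented here, so the checks for the cf-ray and server headers and for the word "cloudflare" in the body are stand-ins. Only the 1200-byte threshold comes from the text above, and the function names are hypothetical.

```python
from typing import Dict

CLOUDFLARE_SHORT_BODY_BYTES = 1200  # threshold quoted in the text above

def looks_like_cloudflare(headers: Dict[str, str], body: bytes) -> bool:
    """Heuristic stand-in for the undocumented Cloudflare detection."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    has_cf_header = "cf-ray" in lowered or lowered.get("server") == "cloudflare"
    has_cf_body = b"cloudflare" in body.lower()          # assumed body pattern
    is_short = len(body) < CLOUDFLARE_SHORT_BODY_BYTES
    return has_cf_header or has_cf_body or is_short

def classify_status(status_code: int, headers: Dict[str, str], body: bytes) -> str:
    """Return "OK", "WARNING" (shown as Problematic), or "FAILURE"."""
    if status_code == 403:
        return "WARNING"   # the server responded, so it is treated as protected, not down
    if status_code == 503:
        return "WARNING" if looks_like_cloudflare(headers, body) else "FAILURE"
    if 200 <= status_code < 400:
        return "OK"
    return "FAILURE"
```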

Optional: Configure Cloudflare for Better Monitoring

If you want to avoid "Problematic" status for your Cloudflare-protected site, you can:

  1. WAF Rules: Create a rule that allows requests from PingPuffin's User-Agent:

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

  2. Browser Integrity Check: Consider disabling this for monitoring IPs

  3. Rate Limiting: Adjust rate limiting so monitoring requests aren't blocked

PingPuffin's Monitoring IP: 188.245.198.146 (for whitelisting if needed)


Manual Updates

User-Initiated Checks

You can always trigger an instant check via the "Update now" button in the dashboard.

Behavior:

  • Bypass 2-check threshold: Result shown and applied immediately
  • Instant status update: If status changes, it updates immediately
  • Notification: Sent if status changes
  • Failure counter updated: Reset on a successful check, incremented on a failed one

Use Cases

  • ✅ Quick verification after deployment of fixes
  • ✅ Testing new monitor configuration
  • ✅ Instant status check without waiting for cron
  • ✅ Debugging connection problems

Example:

00:00 - Automatic check fails → Failure counter: 1, Status: UP
00:02 - You click "Update now" → Check fails → Status: DOWN instantly
Result: Manual check skips 2-check threshold for quick feedback
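A minimal sketch of the manual-check behavior, assuming a simplified model with only UP and DOWN states. The function apply_manual_check is hypothetical, and the counter handling (reset on success, increment on failure) is an assumption based on how automatic checks are described.

```python
from typing import Tuple

def apply_manual_check(status: str, failure_count: int, check_ok: bool) -> Tuple[str, int, bool]:
    """Return (new_status, new_failure_count, notify) for one "Update now" check."""
    new_status = "UP" if check_ok else "DOWN"       # result is applied immediately
    new_count = 0 if check_ok else failure_count + 1
    notify = new_status != status                   # notify only when the status changes
    return new_status, new_count, notify

# The example above: one automatic failure already recorded, then a failed manual check.
print(apply_manual_check("UP", 1, False))  # ('DOWN', 2, True)
```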

Protection Against Monitor Failures

Internal vs. External Errors

It's critical to distinguish between errors in your site and errors in our monitor system.

Site Errors (Monitored)

These errors from your site count as failures:

  • ✅ HTTP 500 from target site → Counts as failure
  • ✅ Timeout connecting to target site → Counts as failure
  • ✅ DNS error for target domain → Counts as failure

Monitor System Errors (Protected)

These errors in PingPuffin's code do NOT mark your site as down:

  • ❌ PHP exception in PingPuffin code → Does NOT mark site as down
  • ❌ Database connection error → Does NOT mark site as down
  • ❌ Internal logic error → Does NOT mark site as down

Administrator Alerting

When the monitor system fails:

Logging:

  • Critical errors logged with full stack trace
  • Timestamp and monitor ID recorded
  • All details saved for debugging

Email Alarm:

  • Email sent to system administrator
  • Contains error message, stack trace, and context
  • Rate-limited: Maximum 1 email per hour per unique error
  • Prevents email flooding during system problems
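The "maximum 1 email per hour per unique error" rule could be implemented along these lines. This is a sketch only: the class AlertRateLimiter is hypothetical, and how PingPuffin derives a "unique error" key is not documented, so the key is simply a caller-supplied string here.

```python
import time
from typing import Dict, Optional

class AlertRateLimiter:
    """Allow at most one alert per `min_interval_seconds` for each error key."""

    def __init__(self, min_interval_seconds: float = 3600.0):
        self.min_interval = min_interval_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, error_key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self._last_sent.get(error_key)
        if last is not None and now - last < self.min_interval:
            return False                 # same error was already reported within the hour
        self._last_sent[error_key] = now
        return True

limiter = AlertRateLimiter()
print(limiter.should_send("db-connection", now=0.0))     # True
print(limiter.should_send("db-connection", now=1800.0))  # False
```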

Database:

  • Status remains unchanged (no false downs)
  • No incident created
  • Users not affected

Example:

Monitor checker runs
→ Internal error detected in monitor system
→ Error logged with full context
→ Email sent to system administrator
→ Database NOT updated
→ Your site status remains unchanged
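The flow above amounts to keeping two kinds of errors apart in the checker. The sketch below shows one way to do that in Python, assuming hypothetical callables perform_check, record_result, and alert_admin and a hypothetical SiteCheckFailed exception. It illustrates the principle and is not PingPuffin's code.

```python
class SiteCheckFailed(Exception):
    """The target site failed the check (HTTP error, timeout, DNS, SSL)."""

def run_monitor(monitor_id, perform_check, record_result, alert_admin):
    """Run one check; only site failures may change the stored status."""
    try:
        try:
            perform_check(monitor_id)
            ok, error = True, None
        except SiteCheckFailed as exc:
            ok, error = False, str(exc)        # a real failure of the monitored site
        record_result(monitor_id, ok, error)   # database is updated only on this path
    except Exception as exc:
        # Anything else is an internal error: log and alert, but leave the status alone.
        alert_admin(monitor_id, exc)
```

Because record_result sits inside the outer try block, a database error while saving is also treated as an internal error rather than as site downtime.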

Notification System

When Notifications Are Sent

Automatic Checks

  • Status DOWN: After 2 consecutive failures are confirmed (5-10 minutes after the outage begins)
  • Status UP: Instantly when site comes back from DOWN
  • Other changes: Configurable per user (redirect, warning, etc.)

Manual Checks

  • Instant notification: If status changes on manual update
  • No delay: Bypasses 2-check threshold

Notification Channels

Email

  • Direct email notifications
  • Contains: Site name, status, error message, timestamp
  • Contains link to dashboard for more details

Slack

  • Message to configured channel or DM
  • Formatted with colors based on status (red=DOWN, green=UP)
  • Includes direct link to monitor

Webhook

  • POST request to custom endpoint
  • JSON payload with all details
  • Status code, response time, error message included
  • Useful for integration with other systems
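For orientation, a webhook payload might look roughly like the output of the sketch below. The field names are hypothetical, since this document does not specify the payload schema. Only the presence of a status code, response time, and error message is taken from the list above.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def build_webhook_payload(monitor_name: str, status: str, status_code: Optional[int],
                          response_time_ms: Optional[int], error_message: Optional[str],
                          checked_at: datetime) -> str:
    """Serialize one status change as a JSON string (field names are assumptions)."""
    return json.dumps({
        "monitor": monitor_name,
        "status": status,
        "status_code": status_code,
        "response_time_ms": response_time_ms,
        "error_message": error_message,
        "checked_at": checked_at.isoformat(),
    })

payload = build_webhook_payload(
    "Example site", "DOWN", 500, 812, "HTTP 500",
    datetime(2024, 11, 21, 0, 5, tzinfo=timezone.utc),
)
print(payload)
```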

Notification Snoozing

You can temporarily disable notifications (snooze) for 24 hours:

During Snooze:

  • ✅ Monitoring continues normally
  • ✅ Status updates in dashboard
  • ❌ No notifications sent to any channel
  • ⏰ Automatically un-snoozed after 24 hours
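The snooze rule reduces to a simple time comparison. In this sketch the function notifications_enabled and the snoozed_at timestamp are hypothetical names. Monitoring itself is untouched, and only the decision to send a notification depends on the snooze.

```python
from datetime import datetime, timedelta
from typing import Optional

SNOOZE_DURATION = timedelta(hours=24)

def notifications_enabled(snoozed_at: Optional[datetime], now: datetime) -> bool:
    """Return True when notifications may be sent for this monitor."""
    if snoozed_at is None:
        return True                               # never snoozed
    return now >= snoozed_at + SNOOZE_DURATION    # snooze expires automatically after 24 hours
```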

Use Cases:

  • Planned maintenance
  • Known issue during rollout
  • Temporary shutdown

Automatic Dashboard Updates

Auto-Refresh Mechanism

Your dashboard updates automatically every 30 seconds without reloading the page.

Technical:

  • Refresh interval: 30 seconds, without a page reload
  • Fetches latest data from database via secure API calls
  • Does NOT trigger new checks (read-only)
  • Lightweight calls for quick updates

What Gets Updated?

The dashboard always shows the most recently stored check result:

  • 🎨 Status indicator: Colored badge (green/red/yellow/blue)
  • Last checked: Precise timestamp of last check
  • Response time: Response time in milliseconds
  • 🔢 Failure counter: Number of consecutive failures
  • 📊 Incident info: Active incidents and duration

Important: Auto-refresh only reads stored check results and never triggers a check itself, so it cannot bypass the automatic 2-check logic.


Data Collection and Storage

Check Records

Every single check is saved in the database with the following information:

  • Unique ID for check
  • Reference to monitor
  • Timestamp (when check was performed)
  • HTTP status code or error type
  • Response time in milliseconds
  • Success/failure status
  • Error message (if relevant)
  • Redirect information (if relevant)
  • SSL error details (if relevant)

Usage:

  • Uptime percentage calculations
  • Historical graphs and reports
  • Error analysis and debugging
  • Performance tracking over time
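As an illustration of how stored check records can feed an uptime percentage, the sketch below models a record with a subset of the fields listed above and computes uptime as the share of successful checks. Both the CheckRecord class and that formula are assumptions. The document does not state how PingPuffin calculates uptime.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

@dataclass
class CheckRecord:
    monitor_id: int
    checked_at: datetime
    success: bool
    status_code: Optional[int] = None
    response_time_ms: Optional[int] = None
    error_message: Optional[str] = None

def uptime_percentage(records: Sequence[CheckRecord]) -> Optional[float]:
    """Share of successful checks in percent, or None when there is no data."""
    if not records:
        return None
    successful = sum(1 for r in records if r.success)
    return 100.0 * successful / len(records)
```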

Incident Tracking

When status changes to DOWN, an incident is created with the following data:

  • Unique ID for incident
  • Reference to monitor
  • Start timestamp
  • End timestamp (when resolved)
  • Total duration

Features:

  • Automatic creation on DOWN status
  • Continuous duration calculation
  • Automatic resolution on recovery
  • Complete history maintained
  • Exportable to CSV

Activity Log

All significant events are logged:

  • ✅ Status changes (UP → DOWN, DOWN → UP, etc.)
  • ✅ Manual checks performed by users
  • ✅ Configuration changes
  • ✅ URL updates
  • ✅ Metadata updates

Functionality:

  • Exportable to CSV format
  • Searchable and filterable
  • Shows details for each event
  • Timestamps on all entries

Technical Specifications

HTTP Request Parameters

When PingPuffin checks your site, the following request is sent:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Accept: */*
Connection: close
[Optional: Authorization header for Basic Auth]

Note: We use a standard Chrome user agent to avoid being blocked by sites that filter custom user agents.

Configuration:

  • Timeout: Can be configured per monitor (default: 30 seconds)
  • Follow Redirects: Yes, up to 5 redirects maximum
  • SSL Verification: Enabled (validates certificates)
  • Connection Reuse: Disabled (fresh connection for each check)
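If you want to reproduce a comparable request yourself, the following sketch builds one with Python's standard library using the headers and limits listed above. It is an approximation for testing purposes and not PingPuffin's checker. The helper names build_request and build_opener are hypothetical.

```python
import ssl
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
DEFAULT_TIMEOUT_SECONDS = 30
MAX_REDIRECTS = 5

class LimitedRedirectHandler(urllib.request.HTTPRedirectHandler):
    max_redirections = MAX_REDIRECTS  # urllib's own default is 10

def build_request(url: str) -> urllib.request.Request:
    """Create a GET request carrying the headers listed above."""
    headers = {"User-Agent": USER_AGENT, "Accept": "*/*", "Connection": "close"}
    return urllib.request.Request(url, headers=headers, method="GET")

def build_opener() -> urllib.request.OpenerDirector:
    """Opener with certificate verification and a 5-redirect limit."""
    context = ssl.create_default_context()  # verifies certificates and hostnames
    return urllib.request.build_opener(
        LimitedRedirectHandler(),
        urllib.request.HTTPSHandler(context=context),
    )

# Usage (performs a real network request, so it is left commented out):
# with build_opener().open(build_request("https://example.com"),
#                          timeout=DEFAULT_TIMEOUT_SECONDS) as response:
#     print(response.status)
```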

Standard Expected Status Codes

Monitors expect these codes by default (can be customized):

2xx Success:

  • 200 OK
  • 201 Created
  • 204 No Content

3xx Redirects:

  • 301 Moved Permanently
  • 302 Found
  • 307 Temporary Redirect
  • 308 Permanent Redirect

Custom:

  • You can configure which status codes are acceptable for your specific monitor
  • Example: Accept 401 for password-protected pages
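The default list and the per-monitor override can be sketched as a set lookup. The names DEFAULT_EXPECTED_CODES and is_expected are hypothetical, and treating the custom configuration as a replacement set is an assumption.

```python
from typing import AbstractSet

# Default codes listed above.
DEFAULT_EXPECTED_CODES = frozenset({200, 201, 204, 301, 302, 307, 308})

def is_expected(status_code: int,
                expected: AbstractSet[int] = DEFAULT_EXPECTED_CODES) -> bool:
    """Return True when the status code is acceptable for this monitor."""
    return status_code in expected

# A password-protected page that should answer 401:
protected_page_codes = DEFAULT_EXPECTED_CODES | {401}
print(is_expected(401))                        # False
print(is_expected(401, protected_page_codes))  # True
```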

Response Time Measurement

What's Measured:

  • DNS lookup time
  • Connection time (TCP handshake)
  • SSL handshake time (if HTTPS)
  • Time to first byte (TTFB)

What's NOT Measured:

  • Body download time (we only read headers)
  • JavaScript execution time
  • Asset loading time

Storage:

  • Measured in milliseconds
  • Saved at each check
  • Used for performance tracking
  • Shown in dashboard

Advanced Monitoring Settings

For advanced users, we offer:

HTTP Method

  • GET: Standard method
  • POST: For endpoints that require POST

Request Body

  • Send JSON or form data with POST requests
  • Useful for API endpoints that require specific data

Basic Authentication

  • Username and password for protected endpoints
  • Passwords encrypted with AES-256-CBC
  • Never stored in plain text
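For reference, the Authorization header for Basic Auth is the Base64 encoding of "username:password" prefixed with "Basic ". The sketch below shows that standard construction. The function name basic_auth_header is hypothetical, and the credentials are placeholders.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```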

Future Features

  • Custom headers
  • Request params
  • Advanced authentication (OAuth, Bearer tokens)

Server Information

Public IP Address

PingPuffin's monitoring server uses the following public IP address to perform checks:

IP Address: 188.245.198.146

If you need to whitelist PingPuffin's IP address in your firewall or server configuration, you can use this IP address.

Note: The IP address may change during server migrations or infrastructure updates. We recommend using the User-Agent header for identification instead of IP-based rules, if possible.

User-Agent Identification:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

Privacy and Security

Data Encryption

Sensitive Credentials:

  • All passwords (Basic Auth) encrypted on storage
  • Algorithm: AES-256-CBC (industry standard)
  • Key: Stored securely in environment variables
  • IV: Unique initialization vector per encryption
  • Decryption only happens in memory during checks
  • Never exposed in logs or API responses

Data Access

Access Control:

  • Check results only visible to monitor owner
  • No data shared with third parties
  • Activity log only exportable by owner
  • Secure API endpoints with authentication

Database Security:

  • Prepared statements (prevents SQL injection)
  • Session-based authentication
  • CSRF protection on all forms

HTTPS Enforcement

SSL/TLS:

  • All monitor checks support HTTPS
  • SSL certificate validation enabled
  • Warns on certificate problems
  • Detects expired certificates

Dashboard:

  • Always accessed via HTTPS
  • Secure cookies (HttpOnly, Secure flags)
  • HSTS headers recommended

System Reliability

Monitor System Health

Our monitor system monitors itself:

Error Detection:

  • Automatic detection of internal errors
  • Full logging of all exceptions
  • Stack traces for debugging

Administrator Alerts:

  • Critical errors emailed to system administrator
  • Rate-limited to avoid spam
  • Details included for quick resolution

Automatic Recovery:

  • The cron run continues when an individual monitor check errors
  • No cascade failures across monitors
  • Database transactions ensure data integrity

Uptime Goals

We aim for the following reliability:

Monitor System Uptime:

  • Goal: 99.9% (at most about 8.76 hours of downtime per year)
  • Monitored: Via cron log and system metrics
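The downtime budget behind a percentage goal is simple arithmetic, shown here for a 365-day year. The helper allowed_downtime_hours is a hypothetical name used only for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year

def allowed_downtime_hours(uptime_goal_percent: float) -> float:
    """Hours per year a system may be down while still meeting the goal."""
    return HOURS_PER_YEAR * (1 - uptime_goal_percent / 100)

print(round(allowed_downtime_hours(99.9), 2))  # 8.76
```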

Check Execution Rate:

  • Goal: 99.5% success rate for check execution
  • Monitored: Error rate in logs

Cron Reliability:

  • Monitored: Each cron execution logged
  • Alerting: On missing execution

Common Scenarios and Examples

Scenario 1: Transient Network Blip

Situation: Brief network problem, site is actually up.

00:00 - Check fails (timeout) → Failure counter: 1, Status: UP
00:05 - Check succeeds → Failure counter: 0, Status: UP

Result:
✅ No notification sent
✅ No status change
✅ No false alarm

Why it works: The 2-check threshold filters out brief problems.


Scenario 2: Real Downtime

Situation: Server is really down (e.g., hosting problem).

00:00 - Check fails (HTTP 500) → Failure counter: 1, Status: UP
      → System logs first failure for internal monitoring

00:05 - Check fails (timeout) → Failure counter: 2, Status: DOWN
      → Incident created automatically
      → Notification sent via email/Slack/webhook

00:10 - Check fails (timeout) → Failure counter: 3, Status: DOWN
      → Incident duration updated continuously

Result:
✅ DOWN status confirmed at 00:05
✅ Notification sent ~5 minutes after first failure
✅ Different error types (500 + timeout) both count

Why it works: Two consecutive failures confirm a real problem.


Scenario 3: Quick Recovery

Situation: Site down, comes back quickly.

00:00 - Second consecutive check fails → Status: DOWN, incident starts
00:05 - Check succeeds (HTTP 200) → Failure counter: 0, Status: UP
      → Incident marked as resolved automatically
      → Recovery notification sent

Result:
✅ Instant recovery on first successful check
✅ Incident duration calculated (00:00 to 00:05 = 5 min)
✅ You're informed quickly about recovery

Why it works: No 2-check threshold for recovery.


Scenario 4: Manual Refresh During First Failure

Situation: Automatic check failed once, user wants to verify.

00:00 - Automatic check fails → Failure counter: 1, Status: UP
      → Status remains UP (waiting for confirmation)

00:02 - You click "Update now" → Check fails → Status: DOWN immediately
      → Manual check bypasses 2-check threshold for quick feedback
      → Notification sent

Result:
✅ Manual check gives instant feedback
✅ Status updates without waiting for next automatic check
✅ Useful for debugging and verification

Why it works: Manual checks are designed for instant feedback.


Scenario 5: Monitor System Error

Situation: Internal error in PingPuffin's own code.

00:00 - Monitor system runs automatic check
      → Internal error detected
      → Error caught automatically

      Logging:
      → Error logged with full context for internal monitoring
      → Full technical information saved for debugging

      Administrator Alarm:
      → Email sent to system administrator (max 1 per hour)
      → Contains error details and context

      Database:
      → No update to your site status
      → Your site status remains unchanged
      → Failure counter not affected

Result:
✅ Monitor error does NOT affect your site status
✅ Administrator alerted to fix problem
✅ No false DOWN status

Why it works: Distinction between monitor errors and site errors.


Scenario 6: Different Error Types Consecutively

Situation: Server unstable, different errors each time.

00:00 - Check fails (HTTP 500) → Failure counter: 1, Status: UP
00:05 - Check fails (Connection timeout) → Failure counter: 2, Status: DOWN
00:10 - Check fails (HTTP 503) → Failure counter: 3, Status: DOWN

Result:
✅ Different error types ALL count
✅ Status DOWN after 2 failures (regardless of type)
✅ Indicates an unstable server (possibly worse than one consistent error)

Why it works: Any error means the site is not functioning correctly.


Frequently Asked Questions

How quickly do I get notified of downtime?

Automatic check: About 5 minutes after the first failed check, since 2 consecutive failures are required (5-10 minutes after the outage begins).
Manual check: Instantly, if a manual update detects the failure.

Can I get false alarms?

Very rarely. The 2-check system eliminates most brief problems. If you get an alarm, there's almost always a real problem.

What if my server is temporarily slow?

If the response time exceeds the timeout (default 30 seconds), the check counts as a failure. You can increase the timeout value for your monitor.

How is planned maintenance handled?

Use the "Snooze" function to disable notifications for 24 hours. Monitoring continues, but you get no alarms.

Can I see history for all checks?

Yes. Every check is saved as a check record, and the activity log shows status changes and manual checks. The activity log can be exported to CSV.

What happens if PingPuffin itself goes down?

Our monitors run on reliable infrastructure. On critical system errors, the administrator is alerted, but your site is NOT marked as down.


Contact & Support

Have questions about how monitoring works?

📧 Email: support@pingpuffin.com
🐛 Bug reports: Via email
📚 Documentation: See documentation section for more information


Changelog

v1.0 (November 21, 2024)

  • First version of documentation
  • 2-check verification system implemented
  • Monitor failure protection added
  • Rate-limited administrator alerts

This document is updated continuously. Check "Last updated" at the top to see if there are new versions.

📅 Last updated: January 14, 2026