Heartbeat Monitoring
Heartbeat monitoring helps you ensure that your cron jobs, scheduled tasks, and background processes are running as expected. Instead of checking whether your services are up, heartbeats wait for your jobs to “ping” them when they start, succeed, or fail.
What are Heartbeats?
Heartbeats work by providing you with unique URLs that your scripts and scheduled tasks call to signal they’re running correctly. If a heartbeat doesn’t receive a ping within its expected timeframe (based on the cron schedule and grace period), Upcron.io creates an incident and alerts you.
Heartbeats are perfect for monitoring cron jobs, data backups, batch processing, scheduled reports, and any recurring background tasks.
How Heartbeats Work
Unlike uptime monitors that actively check your services, heartbeats are passive monitors that wait for your services to check in:
- You create a heartbeat with a cron schedule (e.g., “daily at 2 AM”)
- Upcron.io generates unique URLs for your heartbeat
- Your script calls the appropriate URL when it starts, succeeds, or fails
- If no ping is received within the grace period, an incident is created
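The last step comes down to deadline arithmetic: the expected run time plus the grace period gives a deadline, and a missing ping after that deadline means an incident. The Python sketch below models that rule under a simplifying assumption (a ping counts for a run if it arrives at or after that run’s expected time). It illustrates the idea only and is not Upcron.io’s actual logic; the function name is_overdue is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_overdue(
    expected_at: datetime,
    grace_seconds: int,
    last_ping_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the run expected at `expected_at` should raise an incident.

    Simplified model (an assumption, not Upcron.io's exact rule): a ping
    counts for this run if it arrived at or after `expected_at`.
    """
    deadline = expected_at + timedelta(seconds=grace_seconds)
    if now <= deadline:
        return False  # still inside the grace period: keep waiting
    # Past the deadline: overdue unless a ping arrived for this run.
    return last_ping_at is None or last_ping_at < expected_at


if __name__ == "__main__":
    expected = datetime(2024, 1, 15, 2, 0)  # "daily at 2 AM"
    grace = 30 * 60                         # 30-minute grace period
    print(is_overdue(expected, grace, None, datetime(2024, 1, 15, 2, 20)))  # False
    print(is_overdue(expected, grace, None, datetime(2024, 1, 15, 2, 31)))  # True
    print(is_overdue(expected, grace, datetime(2024, 1, 15, 2, 5),
                     datetime(2024, 1, 15, 2, 31)))                          # False
```

A ping left over from the previous day’s run (earlier than expected_at) does not satisfy today’s run in this model.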
Creating Your First Heartbeat
1. Navigate to Heartbeats
In your Upcron.io dashboard, go to the Heartbeats section within your project.
2. Add New Heartbeat
Click “Create Heartbeat” and fill in the required information:
- Heartbeat Name: Descriptive name (e.g., “Daily Database Backup”)
- Project: Select the project this heartbeat belongs to
- Cron Schedule: When your job is expected to run (e.g., 0 2 * * * for daily at 2 AM)
- Grace Period: How long to wait after the expected time before alerting (60-86400 seconds)
3. Configure Advanced Settings
Optional configurations:
- Next Ping At: Manually set when the next ping is expected
- Description: Additional details about what this heartbeat monitors
4. Save and Get URLs
Click “Create Heartbeat” to save and receive your unique ping URLs.
Understanding Cron Schedules
Heartbeats use the cron expression format to define when your jobs are expected to run.
Cron Format Explanation
The cron format uses five fields, separated by spaces: minute hour day month weekday
- Minute: 0-59
- Hour: 0-23 (24-hour format)
- Day: 1-31 (day of the month)
- Month: 1-12
- Weekday: 0-7 (0 and 7 are Sunday)
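To see how the five fields combine, here is a small Python matcher that checks whether a given minute satisfies an expression. It is a sketch for experimenting with schedules, not Upcron.io’s parser. It supports *, single values, ranges (1-5), lists (0,30), and steps (*/15), and applies the classic cron rule that when both the day and weekday fields are restricted, a match on either one is enough. The names parse_field and cron_matches are our own.

```python
from datetime import datetime
from typing import List, Set, Tuple

# (lowest, highest) allowed value for minute, hour, day, month, weekday.
FIELD_RANGES: List[Tuple[int, int]] = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, lo: int, hi: int) -> Set[int]:
    """Expand one cron field such as '*/15', '1-5' or '0,30' into its values."""
    values: Set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'N/step' is treated here as 'N-hi/step' (an assumption; support varies).
            end = hi if step_text else start
        if step < 1 or not lo <= start <= end <= hi:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches the 5-field expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    minutes, hours, days, months, weekdays = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if 7 in weekdays:
        weekdays.add(0)  # 0 and 7 both mean Sunday
    weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    day_match = when.day in days
    weekday_match = weekday in weekdays
    # Classic cron: if both day fields are restricted, either may match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        date_ok = day_match or weekday_match
    else:
        date_ok = day_match and weekday_match
    return (
        when.minute in minutes
        and when.hour in hours
        and when.month in months
        and date_ok
    )
```

For example, cron_matches("0 2 * * *", datetime(2024, 1, 15, 2, 0)) is True, while the same expression at 2:01 is False.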
Understanding Grace Periods
The grace period determines how long Upcron.io waits after the expected ping time before creating an incident.
Short Grace Period
60-300 seconds (1-5 minutes): for critical jobs that must run on time
Medium Grace Period
300-3600 seconds (5-60 minutes): for jobs that may have variable runtimes
Long Grace Period
3600-86400 seconds (1-24 hours): for jobs with unpredictable schedules or long runtimes
Examples
- Database backup: 30 minutes
- Email reports: 1 hour
- Data sync: 15 minutes
- Log rotation: 2 hours
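Grace periods are entered in seconds, so the examples above translate to 1800 (30 minutes), 3600 (1 hour), 900 (15 minutes), and 7200 (2 hours). The helper below does that conversion, enforces the 60-86400 second range from the heartbeat form, and labels the result with the short, medium, or long band. The function names are our own, and because the bands share their endpoints (300 and 3600 seconds), assigning a boundary value to the shorter band is an arbitrary choice.

```python
MIN_GRACE_SECONDS = 60       # lower bound accepted by the form
MAX_GRACE_SECONDS = 86_400   # upper bound (24 hours)


def grace_seconds(minutes: int = 0, hours: int = 0) -> int:
    """Convert minutes/hours to seconds and check the allowed range."""
    total = minutes * 60 + hours * 3600
    if not MIN_GRACE_SECONDS <= total <= MAX_GRACE_SECONDS:
        raise ValueError(
            f"grace period {total}s is outside "
            f"{MIN_GRACE_SECONDS}-{MAX_GRACE_SECONDS}s"
        )
    return total


def grace_band(seconds: int) -> str:
    """Classify a grace period into the short / medium / long bands."""
    if seconds <= 300:
        return "short"
    if seconds <= 3600:
        return "medium"
    return "long"


if __name__ == "__main__":
    examples = {
        "Database backup": grace_seconds(minutes=30),
        "Email reports": grace_seconds(hours=1),
        "Data sync": grace_seconds(minutes=15),
        "Log rotation": grace_seconds(hours=2),
    }
    for name, seconds in examples.items():
        print(f"{name}: {seconds}s ({grace_band(seconds)})")
```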
Heartbeat URLs
Each heartbeat provides three unique URLs for different scenarios:
1. Success URL (Primary)
Call this when your job completes successfully.
2. Start URL
Call this when your job begins (optional, but recommended for long-running tasks).
3. Fail URL
Call this when your job encounters an error.
Implementing Heartbeats in Your Scripts
Basic Implementation
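Sketched in Python, a basic implementation pings the start URL, runs the job, and then pings either the success URL or the fail URL. Assumptions to note: the URLs are read from hypothetical environment variables (UPCRON_START_URL, UPCRON_SUCCESS_URL, UPCRON_FAIL_URL) that you would set to the URLs copied from your dashboard, each ping is a plain HTTP GET, and my_job stands in for your real work.

```python
import os
import sys
import urllib.request
from typing import Callable, Optional


def ping(url: Optional[str], timeout: float = 10.0) -> None:
    """GET a heartbeat URL once; log failures instead of raising."""
    if not url:
        return  # URL not configured: skip monitoring rather than crash
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except Exception as exc:  # network/HTTP errors must never break the job
        print(f"heartbeat ping to {url} failed: {exc}", file=sys.stderr)


def run_with_heartbeat(
    job: Callable[[], None],
    start_url: Optional[str],
    success_url: Optional[str],
    fail_url: Optional[str],
    send: Callable[[Optional[str]], None] = ping,
) -> None:
    """Ping start, run the job, then ping success or fail."""
    send(start_url)
    try:
        job()
    except Exception:
        send(fail_url)
        raise  # keep the non-zero exit status for cron and your logs
    send(success_url)


def my_job() -> None:
    print("doing the real work")  # placeholder for your task


if __name__ == "__main__":
    # Hypothetical variable names; set them to the URLs from your dashboard.
    run_with_heartbeat(
        my_job,
        os.environ.get("UPCRON_START_URL"),
        os.environ.get("UPCRON_SUCCESS_URL"),
        os.environ.get("UPCRON_FAIL_URL"),
    )
```

Ping errors are logged rather than raised, so a monitoring hiccup never stops the job; a ping that never arrives still surfaces as an incident once the grace period expires.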
Advanced Implementation with Run IDs
Use run IDs for jobs that may run multiple times or as overlapping instances, so the pings from each execution can be told apart.
Best Practices
1. Always Handle Failures
Ensure your scripts can ping the fail URL when something goes wrong.
2. Set Appropriate Grace Periods
Quick Jobs
Grace Period: 1-5 minutes, for jobs that should complete quickly:
- Log rotation
- Simple file operations
- Quick health checks
3. Use Descriptive Names
Choose clear names that describe what the heartbeat monitors (e.g., “Daily Database Backup”).
4. Test Your Implementation
Before deploying, test your heartbeat implementation:
1. Test Success Path
Run your script manually and verify it pings the success URL correctly.
2. Test Failure Path
Simulate a failure condition and verify it pings the fail URL.
3. Test Network Issues
Ensure your script handles network timeouts gracefully with retries.
4. Verify Scheduling
Confirm your cron schedule matches when your job actually runs.
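One way to exercise the success and failure paths without touching your real heartbeat is to point the script at a throwaway local HTTP server and check which URLs it hit. The sketch below uses Python’s standard http.server for that. The /start, /success, and /fail paths and the example_job function are placeholders for illustration, not Upcron.io’s URL format.

```python
import http.server
import threading
import urllib.request
from typing import List, Tuple

# Talk to localhost directly, ignoring any proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def start_stub_receiver() -> Tuple[http.server.ThreadingHTTPServer, str, List[str]]:
    """Start a local HTTP server that records every GET path it receives."""
    hits: List[str] = []

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            hits.append(self.path)  # record before replying
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep test output quiet

    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    return server, base_url, hits


def example_job(base_url: str, fail: bool) -> None:
    """Stand-in for your script: start ping, work, then success or fail ping."""
    _OPENER.open(base_url + "/start", timeout=5).close()
    try:
        if fail:
            raise RuntimeError("simulated failure")
        # ... real work would happen here ...
    except Exception:
        _OPENER.open(base_url + "/fail", timeout=5).close()
        raise
    _OPENER.open(base_url + "/success", timeout=5).close()


if __name__ == "__main__":
    server, base_url, hits = start_stub_receiver()
    example_job(base_url, fail=False)
    try:
        example_job(base_url, fail=True)
    except RuntimeError:
        pass
    print(hits)  # ['/start', '/success', '/start', '/fail']
    server.shutdown()
```

In your own tests, replace example_job with your script and pass it the stub’s base URL in place of the real heartbeat URLs.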
Common Use Cases
Database Backups
Log Processing
Data Synchronization
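Database backups, log processing, and data synchronization jobs are often external commands (a dump tool, a log processor, a sync script). A sketch of wrapping such a command: ping the start URL, run the command, and choose the success or fail URL from its exit code. The function name run_command_with_heartbeat is hypothetical, each ping is assumed to be a plain HTTP GET, and returning 127 for a command that cannot be started follows the shell convention.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, List


def ping(url: str, timeout: float = 10.0) -> None:
    """GET a heartbeat URL once; report but never raise on failure."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except Exception as exc:  # monitoring must not break the job
        print(f"heartbeat ping failed: {exc}", file=sys.stderr)


def run_command_with_heartbeat(
    command: List[str],
    start_url: str,
    success_url: str,
    fail_url: str,
    send: Callable[[str], None] = ping,
) -> int:
    """Run an external command and report its outcome via heartbeat pings."""
    send(start_url)
    try:
        returncode = subprocess.run(command, check=False).returncode
    except OSError:
        returncode = 127  # command could not be started (shell convention)
    send(success_url if returncode == 0 else fail_url)
    return returncode
```

For a PostgreSQL backup, a script might call sys.exit(run_command_with_heartbeat(["pg_dump", "--file=/backups/app.sql", "app"], start_url, success_url, fail_url)), where the output path and database name are placeholders. Passing the exit code through keeps cron’s own failure reporting intact.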
Troubleshooting
Heartbeat shows as failed but job is running
Possible solutions:
- Check if your script is actually calling the ping URLs
- Verify network connectivity from your server to ping.upcron.io
- Increase the grace period to accommodate longer job runtimes
- Check if your cron schedule matches the actual job execution times
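To check the network side from the server that runs the job, a short Python script can resolve ping.upcron.io and open a TCP connection to it. The sketch assumes the ping URLs use HTTPS on port 443; if your URLs show a different scheme or port, adjust the arguments. diagnose is a hypothetical helper name.

```python
import socket
from typing import Any, Dict


def diagnose(host: str = "ping.upcron.io", port: int = 443,
             timeout: float = 5.0) -> Dict[str, Any]:
    """Resolve `host` and try a TCP connection; report what worked."""
    report: Dict[str, Any] = {
        "host": host, "addresses": [], "tcp_connect": False, "error": None,
    }
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        report["error"] = f"DNS lookup failed: {exc}"
        return report
    report["addresses"] = sorted({info[4][0] for info in infos})
    try:
        # Success shows DNS works and outbound traffic to this port is allowed.
        with socket.create_connection((host, port), timeout=timeout):
            report["tcp_connect"] = True
    except OSError as exc:
        report["error"] = f"TCP connect to port {port} failed: {exc}"
    return report
```

Running print(diagnose()) on the job’s server separates DNS problems (an error mentioning the lookup) from blocked outbound traffic (a failed connect).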
Getting alerts for jobs that haven't failed
Common causes:
- Grace period is too short for the job’s normal runtime
- Network issues preventing ping requests from reaching Upcron.io
- Script is calling the wrong heartbeat URL
- Cron schedule doesn’t match actual job timing
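If you ping only the success URL, the ping arrives after the job has run, so the grace period has to cover the job’s runtime plus any delay in starting it. A rough way to size it from runtimes you have observed: take the longest one, add headroom, and clamp to the allowed 60-86400 second range. The default headroom factor of 1.5 and the function name are our own assumptions, not Upcron.io recommendations.

```python
import math
from typing import Sequence

MIN_GRACE_SECONDS = 60
MAX_GRACE_SECONDS = 86_400


def suggest_grace_seconds(runtimes_s: Sequence[float], headroom: float = 1.5) -> int:
    """Suggest a grace period from observed runtimes (in seconds)."""
    if not runtimes_s:
        raise ValueError("need at least one observed runtime")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    proposed = math.ceil(max(runtimes_s) * headroom)
    # Clamp to the range the heartbeat form accepts.
    return max(MIN_GRACE_SECONDS, min(MAX_GRACE_SECONDS, proposed))
```

For runtimes of 10, 15, and 20 minutes (600, 900, and 1200 seconds), this suggests 1800 seconds, the same 30 minutes listed for database backups.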
Ping requests timing out
Solutions:
- Add timeout and retry logic to your ping requests
- Use curl with the --max-time 10 --retry 3 options
- Check firewall settings on your server
- Verify DNS resolution for ping.upcron.io
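In Python, the closest equivalent of curl’s --max-time 10 --retry 3 is a loop: up to three retries after the first attempt, a 10-second timeout per attempt, and a growing pause between attempts. Like curl, the sketch retries only errors that are likely transient (timeouts, connection failures, and HTTP 408, 429, 500, 502, 503, 504) and gives up at once on other HTTP errors such as 404, which usually means a wrong URL. One difference: urllib’s timeout applies to each socket operation, not to the whole transfer as curl’s --max-time does. ping_with_retry is a hypothetical name, and the opener and sleep parameters exist only so it can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Any, Callable

# HTTP statuses that curl's --retry also treats as transient.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def ping_with_retry(
    url: str,
    retries: int = 3,
    timeout: float = 10.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Ping `url`, retrying transient failures; return True on success."""
    delay = 1.0
    for attempt in range(retries + 1):  # first try plus `retries` retries
        try:
            opener(url, timeout=timeout).close()
            return True
        except urllib.error.HTTPError as exc:  # must precede OSError
            if exc.code not in RETRYABLE_STATUS:
                return False  # e.g. 404: likely a wrong URL, retrying won't help
        except OSError:
            pass  # DNS failure, refused connection or timeout: try again
        if attempt < retries:
            sleep(delay)
            delay *= 2  # back off: 1 s, 2 s, 4 s, ...
    return False
```

With the defaults, a host that never answers costs four attempts and 7 seconds of pauses before the function returns False, so the job is delayed but never blocked indefinitely.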
Multiple instances of job creating duplicate pings
Solutions:
- Use run IDs to distinguish between job instances
- Implement proper locking to prevent overlapping executions
- Adjust grace period to account for potential overlaps
- Consider using process management tools like systemd
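The first two solutions can be combined: take a non-blocking lock so that a second instance exits instead of overlapping, and give each run its own ID to attach to its pings. This Python sketch uses fcntl.flock, so it is Unix-only. How Upcron.io expects a run ID to be sent is not described here; the sketch assumes a query parameter and names it rid purely as a placeholder, so check your heartbeat URLs or Upcron.io’s documentation for the actual format. LOCK_PATH and the function names are hypothetical.

```python
import fcntl
import os
import tempfile
import urllib.parse
import uuid
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str) -> Iterator[bool]:
    """Yield True if this process got the lock, False if another run holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast, never wait
        acquired = True
    except BlockingIOError:
        acquired = False  # another instance is still running
    except OSError:
        os.close(fd)
        raise
    try:
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def new_run_id() -> str:
    """A random ID that distinguishes this execution from any other."""
    return uuid.uuid4().hex


def with_run_id(url: str, run_id: str, param: str = "rid") -> str:
    """Append the run ID as a query parameter (the name is a placeholder)."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append((param, run_id))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


# Hypothetical lock file location for one specific job.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "daily-db-backup.lock")
```

A job would enter single_instance(LOCK_PATH) in a with statement, exit quietly when it yields False, and otherwise create one ID with new_run_id() and pass with_run_id(url, run_id) to its ping helper for the start, success, and fail pings of that run.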