Posted: Wed Jun 18, 2014 17:53 Post subject: cron falling asleep workaround?
What's a workaround for the infamous cron falling asleep bug at http://www.dd-wrt.com/wiki/index.php/CRON#CRON_Service_falling_asleep_in_V24 ?
I was thinking of making cron restart itself, but I don't know if that's possible. If I add a cron job that, say every hour, does stopservice cron && startservice cron, would the compound command run fully, or would it be interrupted when the stop part executes? (I assume it would run through if it's launched in a separate process, but would not if it runs within the cron process.) Any alternatives?
Occasionally my cron jobs stop running on my up-to-date stable kong build. Here's a simple cron watchdog shell script that detects the problem and restarts the cron service only when it's asleep. This solution is similar to this one but runs on dd-wrt itself and doesn't require a server with PHP.
I run the script every hour on the hour from Administration->Management->cron with this line:
Code:
0 * * * * root /opt/bin/cronwd.sh
I have a startup script that runs from Administration->Commands->Startup which does all my startup stuff. In that script, the line below launches the cron watchdog script in the background (you could directly launch cronwd.sh from startup instead):
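The startup line itself did not survive in this copy of the post. A minimal sketch of what such a line could look like, assuming the script lives at /opt/bin/cronwd.sh (the path used in the cron line above) and that -d selects daemon mode as described in the script's usage comment:

```shell
# Hypothetical startup line: launch the watchdog in daemon mode, in the background.
# The path matches the cron line above; adjust it if your script lives elsewhere.
WATCHDOG=/opt/bin/cronwd.sh
[ -x "$WATCHDOG" ] && "$WATCHDOG" -d &
```

The -x guard just keeps the startup script quiet if /opt is not mounted yet when it runs.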
#!/bin/sh
# cron watchdog script
#
# Usage: cronwd.sh [-d]
#
# without -d it writes a timestamp file and quits (run this way from cron)
# with -d it keeps running, checks that the timestamp's age never exceeds the
# limit, and restarts the cron service if it has fallen asleep, see:
# https://wiki.dd-wrt.com/wiki/index.php/CRON#CRON_Service_falling_asleep_in_V24
#
# set MAXALLOWEDMINUTES to the number of minutes old the cron timestamp must be
# before the script decides cron is asleep and restarts the cron service.
# You might want to allow an extra minute to account for cron timing variations.
MAXALLOWEDMINUTES=61 # change this value to match your cron schedule
# The two paths below are not in the post as archived. The file names are the
# ones mentioned later in this thread; the /tmp directory is an assumption.
TIMESTAMPFILE=/tmp/cronwd.txt
LOGFILE=/tmp/cronwd.log
if [ "$1" = "-d" ]; then
    # daemon mode: convert the limit to seconds and check twice per limit
    # period in case of unlucky timing.
    TIMEDIFFLIMIT=$(expr $MAXALLOWEDMINUTES \* 60)
    CHECKPERIOD=$(expr $TIMEDIFFLIMIT / 2)
else
    # cron mode: write the timestamp file and exit
    CURRENTTIMESTAMP=$(date +%s)
    echo "$CURRENTTIMESTAMP" > "$TIMESTAMPFILE"
    exit 0
fi
while true
do
    CURRENTTIMESTAMP=$(date +%s)
    # a missing timestamp file counts as "just written" (age 0)
    CRONTIMESTAMP=$CURRENTTIMESTAMP
    [ -f "$TIMESTAMPFILE" ] && CRONTIMESTAMP=$(cat "$TIMESTAMPFILE")
    TIMEDIFF=$(expr $CURRENTTIMESTAMP - $CRONTIMESTAMP)
    if [ "$TIMEDIFF" -gt "$TIMEDIFFLIMIT" ]; then
        echo "cron is dead/asleep! Delay of ${TIMEDIFF}s exceeds ${TIMEDIFFLIMIT}s. Restarting cron..." >> "$LOGFILE"
        echo "Current time: $(date)" >> "$LOGFILE"
        echo "old cron pid: $(pidof cron)" >> "$LOGFILE"
        stopservice cron && startservice cron >> "$LOGFILE"
        # remove the stale timestamp so the age starts again from zero
        rm -f "$TIMESTAMPFILE"
        echo "new cron pid: $(pidof cron)" >> "$LOGFILE"
        echo "----------------" >> "$LOGFILE"
    fi
    sleep $CHECKPERIOD
done
exit 0
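To make the timing concrete, here is a small sketch of the arithmetic the daemon loop performs, pulled out into a function so it can be tried without touching cron. The function name is_stale and the sample timestamps are made up for illustration; only the 61-minute limit comes from the script.

```shell
#!/bin/sh
# is_stale NOW STAMP LIMIT_MINUTES
# Succeeds (exit 0) when STAMP is more than LIMIT_MINUTES old relative to NOW.
# Mirrors the watchdog's test: age in seconds compared with the limit in seconds.
is_stale() {
    limit_seconds=$(( $3 * 60 ))
    age_seconds=$(( $1 - $2 ))
    [ "$age_seconds" -gt "$limit_seconds" ]
}

# With MAXALLOWEDMINUTES=61 the limit is 3660s and the daemon checks every 1830s.
# A timestamp written on the hour is 3600s old just before the next hourly run,
# which is still within the limit.
is_stale 10000 6400 61 && echo "asleep" || echo "ok"
# If the hourly runs stop, the age keeps growing; at 7200s it is well past the limit.
is_stale 13600 6400 61 && echo "asleep" || echo "ok"
```

Because the daemon only looks every 1830 seconds, a sleeping cron is noticed somewhere between 61 and roughly 92 minutes after its last successful run.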
It gets triggered about once a week so far. I'm probably running more optware stuff on my router than most, so could that be a contributing factor? The cron falling asleep bug seems to occur mostly within 24h of my weekly dd-wrt reboots.
More info on this cron falling asleep issue. This cron watchdog script has shown that cron falls asleep between 5 and 8 hours after a scheduled reboot (enabled in Administration->Keep Alive->Schedule Reboot). A random power outage a couple of days ago showed that cron does not fall asleep after a hard reset the way it does hours after a scheduled reboot.
My current test is to disable scheduled reboots and put the same command that would be in /etc/cron.d/check_schedules into Administration->Management->Cron->Additional Cron Jobs instead. If that works, great. If it doesn't work, I'll try to issue the reboot differently.
It appears that the cron falling asleep bug is actually more of a scheduled reboot bug.
I've managed to reproduce the cron falling asleep bug on demand.
1) Power off the ISP's modem (unplugging the Ethernet cable may work too)
2) Reboot dd-wrt
At this point dd-wrt runs ntpclient periodically to try to set the clock. My setup actually sets the clock at bootup with gps so it's pretty close. During this time dd-wrt and all its processes including cron run perfectly for as long as you want.
3) Power on the ISP's modem
After a few minutes the modem will get an internet connection and share it by DHCP with dd-wrt and ntpclient will succeed in setting the clock.
Result)
All cron jobs, which may have been running for hours, will never run again. The cron process is "asleep" and needs to be restarted for any cron jobs to run again.
Theory)
There's a race condition between cron and ntpclient at startup and if ntpclient comes in second, it can mess up a working cron.
Edit)
During one test, the ISP took an unusually long time to connect the modem. During this delay, dd-wrt's cron still fell asleep after receiving only a local IP address through DHCP from the modem and ntpclient was still trying/failing on dd-wrt. It would seem ntpclient is not the culprit for messing up cron but rather something dd-wrt does when receiving an IP address on the WAN port.
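For anyone repeating these steps, it helps to pin down the minute cron stops. One way, sketched here under assumptions, is a per-minute heartbeat job that appends an epoch timestamp to a log, plus a helper that lists the gaps afterwards. The log path /tmp/cron_heartbeat.log, the function name find_gaps and the 90-second threshold are all invented for this example.

```shell
#!/bin/sh
# Assumed heartbeat job in Additional Cron Jobs (one epoch timestamp per minute).
# The % is backslash-escaped because some cron implementations treat a bare %
# in a crontab line specially:
#   * * * * * root date +\%s >> /tmp/cron_heartbeat.log

# find_gaps FILE MAXGAP
# Prints "previous current gap" for each pair of consecutive timestamps in FILE
# that are more than MAXGAP seconds apart.
find_gaps() {
    awk -v max="$2" '
        NR > 1 && $1 - prev > max { print prev, $1, $1 - prev }
        { prev = $1 }
    ' "$1"
}

# Example: find_gaps /tmp/cron_heartbeat.log 90
```

If cron never resumes, there is no later timestamp to form a gap, and the last line of the log is simply the time of its final run.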
I thought I'd try the simplest approach first, so I added Cabbage's startup command. So far, the router is rebooting itself as scheduled, at 3:00 a.m.
Thanks for all the useful information in this thread. _________________ Router: Linksys WRT1900ACSv2
Modem: Verizon Fios DD-WRT v3.0-r44048 std (08/02/20)
ISP: Verizon Fios
NAS: ReadyNas314
I prefer the watchdog script workaround since it won't restart cron unless it's found to be "asleep". Don't know what might happen if a cron restart occurred in the middle of cron already running something.
I've added the replication steps from this thread to this bug. Hopefully, now that there are replication steps, BrainSlayer will solve the underlying problem and cron asleep workarounds will become moot.
That's an excellent point, and I'll likely follow your suggestion. I was mainly interested in a "Proof of Concept", that sleepy CRON was indeed the problem.
Thanks again for your help!
Posted: Wed Sep 11, 2019 9:27 Post subject: Watchdog Script
Not sure my PMs are getting sent...still have messages in my "Outbox".
The script is running as a background process. However, it seems that the cron instance doesn't run. If I run it once from the command line, the cronwd.txt file is created, and then is updated every hour on the hour. Otherwise, the cronwd.txt file never appears.
Auto-reboot is working, but no explanation for why the cron job doesn't fire off on its own.
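One possible explanation, offered as a guess rather than something confirmed in this thread: the daemon loop treats a missing timestamp file as "age zero", so if cron is already asleep when the watchdog starts and never writes the file, the age never exceeds the limit and nothing is logged. A sketch of a variant that falls back to the watchdog's own start time instead (the function name stamp_or_start and the STARTTIMESTAMP variable are hypothetical):

```shell
#!/bin/sh
# stamp_or_start FILE START
# Prints the timestamp stored in FILE, or START when FILE does not exist yet.
# START is meant to be the time the watchdog daemon was launched, so a cron
# that never runs at all still ages past the limit.
stamp_or_start() {
    if [ -f "$1" ]; then
        cat "$1"
    else
        echo "$2"
    fi
}

# In the daemon loop this would replace the "[ -f ... ] && CRONTIMESTAMP=..." line:
#   STARTTIMESTAMP=$(date +%s)        # set once, before the loop
#   CRONTIMESTAMP=$(stamp_or_start "$TIMESTAMPFILE" "$STARTTIMESTAMP")
```

Since the script removes the timestamp file after a restart, STARTTIMESTAMP would also need to be reset at that point, otherwise cron would be restarted on every check until it writes a new timestamp.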
Nothing in the cronwd.log.
Posted: Wed Sep 11, 2019 9:58 Post subject:
Well, in which build is cron not working? As far as I can see,
mine is fine, and there are services going on and off on a regular basis... _________________ Atheros
TP-Link WR740Nv1 ---DD-WRT 55630 WAP
TP-Link WR1043NDv2 -DD-WRT 55723 Gateway/DoT,Forced DNS,Ad-Block,Firewall,x4VLAN,VPN
TP-Link WR1043NDv2 -Gargoyle OS 1.15.x AP,DNS,QoS,Quotas
Qualcomm-Atheros
Netgear XR500 --DD-WRT 55779 Gateway/DoH,Forced DNS,AP Isolation,4VLAN,Ad-Block,Firewall,Vanilla
Netgear R7800 --DD-WRT 55819 Gateway/DoT,AD-Block,Forced DNS,AP&Net Isolation,x3VLAN,Firewall,Vanilla
Netgear R9000 --DD-WRT 55779 Gateway/DoT,AD-Block,AP Isolation,Firewall,Forced DNS,x2VLAN,Vanilla
Broadcom
Netgear R7000 --DD-WRT 55460 Gateway/SmartDNS/DoH,AD-Block,Firewall,Forced DNS,x3VLAN,VPN
NOT USING 5Ghz ANYWHERE
------------------------------------------------------
Stubby DNS over TLS I DNSCrypt v2 by mac913
My version is listed below.
There has been a problem on and off through many versions, where cron either dies, or falls asleep. In my situation, the auto-reboot wasn't working.
Apparently, sleepy/dying cron has been an issue for some time, at least with some routers/versions. _________________ Router: Linksys WRT1900ACSv2
Modem: Verizon Fios DD-WRT v3.0-r44048 std (08/02/20)
ISP: Verizon Fios
NAS: ReadyNas314
Last edited by giles02134 on Wed Sep 11, 2019 17:41; edited 1 time in total
Well, in which build is cron not working? As far as I can see,
mine is fine, and there are services going on and off on a regular basis...
It's not that cron never works, the problem is that it sometimes falls asleep and stops running jobs (often after a scheduled reboot).
Try the replication steps in the most recent comment on this bug.
There's likely a race condition at startup that goes the right way on 99% of setups, but the replication steps given in the bug introduce a delay that makes the race fail 100% of the time on my setup (Asus RT-AC56U: DD-WRT v3.0-r39960M kongac).
Since dd-wrt relies on cron for stability with check_ps, the underlying problem needs to be resolved instead of documenting cron and scheduled reboot failures.
The workaround shell script that yoyoma2 has shared in this thread (about which I'll post more later this week) works really well at poking cron with a stick when it dies or falls asleep.
As I noted earlier, the symptom for me was that the scheduled reboot wasn't working. Until yoyoma2 pointed out that cron was the likely culprit, I assumed the reboot issue was a bug in its own right.
Indeed, the cron issue needs to be fixed, as DD-WRT, like any Linux system, depends on it. I'd be happy to see old bugs fixed ahead of new functionality being added, but that's just my opinion.
As I am in no way a systems developer/programmer, I'll have to wait and see.
Posted a fix for this cron bug in the Contributions Upload forum here. Until the developers submit this/another fix, the watchdog script in this thread is a usable workaround.