mirrera
Joined: 28 Feb 2008 Posts: 5 Country: United States
Restart Job Usage
Hello, we are relatively new 24x7 users, and I have a question about the job restart feature.
We are running v 3.4.27 on a W2k3 server. Many of the jobs run PL/SQL procedures, and we would like some of them to restart automatically on failure.
My assumption has been that checking the "Restart this job if it fails" option on the 9th screen of the job wizard, and providing values for the retry interval and the number of retries, would be enough to enable this.
I have a test job that calls a procedure which will error out. In the log, I get "Attempt 1 of 3 to run this job failed, will retry in 60 seconds", yet the job never re-runs. The job was scheduled as a one-time run.
This seems pretty basic to me, and I've searched the site without finding any other posts about the same problem. What am I doing wrong?
Thanks.
Thu Feb 28, 2008 2:34 pm

barefootguru
Joined: 10 Aug 2007 Posts: 195
This is on my list of things to investigate as well. I have an FTP job which sometimes fails to make a connection, so I've set the job definition for 1 retry in 60 seconds.
1. I always receive the 'job error' e-mail, though I would rather get it only after the second failure. I don't want to be notified when the first attempt fails, because I've told 24x7 that this is OK.
2. Sometimes the second run never happens.
24x7 multi-platform 4.1 427
Thu Feb 28, 2008 2:48 pm

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
We are looking into this.
Thu Feb 28, 2008 3:03 pm

mirrera
Joined: 28 Feb 2008 Posts: 5 Country: United States
Has there been any progress on this?
Tue Mar 11, 2008 10:00 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
Here is what I found out. There is certainly some design inefficiency.
If the job is NOT set to run detached, then everything is OK. The scheduler restarts it properly as many times as specified in the retry property and sends an error email only after the last failed run.
If the job is set to run detached, in other words as a separate process, the scheduler spawns the process and lets it go. For script-type jobs, including SQL scripts, it doesn't know whether the spawned process failed, because the spawned process always returns 0 as its exit code. The scheduler shares the log file with all detached jobs, which is why you can see "failed" reported in the log.
On the other hand, the detached job is physically separated from the scheduler and doesn't know whether the scheduler is going to restart it after a failure. That's why it always sends an error email when it fails.
As you can see, the solution to this issue is simply to uncheck the detached property. But there is a catch: jobs running non-detached share memory with the scheduler. If they leak any system resources (virtually all DB drivers do, for example), the lost resources accumulate over time and may eventually affect the system's ability to run new jobs.
Sorry if the explanation above is too wordy.
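The exit-code problem described above is not specific to 24x7 and can be illustrated in a few lines. The sketch below uses Python's `subprocess` module as a stand-in for any scheduler that judges a spawned process only by its exit code; `run_and_check`, `FAILS_LOUDLY`, and `FAILS_SILENTLY` are hypothetical names for illustration, not part of 24x7. A child that fails with a non-zero code is detected, while a child that reports an error but still exits with 0 looks like a success, so nothing would trigger a retry.

```python
import subprocess
import sys


def run_and_check(cmd: list[str]) -> bool:
    """Run cmd as a separate process; judge success only by its exit code."""
    result = subprocess.run(cmd)
    return result.returncode == 0


# Hypothetical child that fails and says so through its exit code.
FAILS_LOUDLY = [sys.executable, "-c", "import sys; sys.exit(1)"]

# Hypothetical child that prints an error but still exits with code 0,
# mimicking a detached job whose failure never reaches the exit code.
FAILS_SILENTLY = [sys.executable, "-c", "print('procedure failed')"]

if __name__ == "__main__":
    # Only the first child could trigger a retry; the second looks like success.
    print("loud failure detected:", not run_and_check(FAILS_LOUDLY))
    print("silent failure detected:", not run_and_check(FAILS_SILENTLY))
```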
Tue Mar 11, 2008 4:39 pm

barefootguru
Joined: 10 Aug 2007 Posts: 195
Thanks for the explanation; it's not too wordy. The answer's a bit disappointing, though. I hope this is on the longer-term list to address.
(The particular job which fails here is convoluted and runs hourly, so I want it to stay detached. I'll wrap some shell scripting around it instead.)
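A wrapper like the one mentioned above can be sketched as follows. This is a minimal Python sketch rather than shell, and not a 24x7 feature; it assumes the job's real work can be started as a command whose exit code reflects success. `run_with_retries`, `retries`, `interval_s`, and `notify` are hypothetical names. It retries on a non-zero exit code, calls `notify` once and only after the final failed attempt (the notification behavior asked for earlier in the thread), and returns the last exit code so that any caller which does check it can still see the failure.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    retries: int = 1,
    interval_s: float = 60.0,
    notify: Callable[[str], None] = print,
) -> int:
    """Run cmd, retrying up to `retries` extra times on a non-zero exit code.

    `notify` is called once, and only if every attempt failed.
    Returns the exit code of the last attempt (0 on success).
    """
    attempts = retries + 1
    code = 0
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            return 0
        if attempt < attempts:
            # Wait before the next attempt, like the job's retry interval.
            time.sleep(interval_s)
    notify(f"{' '.join(cmd)} failed {attempts} time(s); last exit code {code}")
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python retry_wrapper.py <command> [args...]
    sys.exit(run_with_retries(sys.argv[1:], retries=1, interval_s=60))
```

With `retries=1` and `interval_s=60`, this mirrors the FTP job settings described earlier in the thread: at most two attempts, 60 seconds apart, and at most one notification.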
Tue Mar 11, 2008 11:22 pm

mirrera
Joined: 28 Feb 2008 Posts: 5 Country: United States
Yes, thanks for the reply. In my case, I can change them to not run detached, but I'm running 10 or so jobs a day, so it's not a preferred workaround. I, too, hope it's on the fix list.
Wed Mar 12, 2008 11:51 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
We are working on these issues and hope to have the relevant changes implemented in the next maintenance version of each product.
Wed Mar 12, 2008 2:24 pm

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
Wed Apr 02, 2008 11:33 am