SoftTree Technologies
Technical Support Forums
Scheduler Locks Up--out of space in queue
BillR69



Joined: 11 May 2007
Posts: 29
Country: United States

Fairly regularly, one of my 24x7 instances freezes up and has to be restarted. The last message in the log is:

7/6/2009 7:36:11.495000 1 0 0 24x7 Scheduler Unable to add new job to the "DataFilesJobs" job queue. Maximum queue size reached. 24x7 is waiting for free space to become available in the queue.

Is changing the queue size supposed to help this problem? Or is there something else I should be doing?
Mon Jul 06, 2009 9:16 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 6500

How large is the queue size value now?

When this message appears again, please check how many files are in the C:\Program Files\24x7 Automation 3\Queue\DataFilesJobs folder. If there are many, you have some kind of runaway queue issue. Also, please check the amount of free disk space on the same disk volume.
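
The two checks suggested above (file count in the queue folder, free space on its volume) can be scripted. What follows is a minimal Python sketch, assuming the folder path quoted in this post; the function name `queue_status` is made up for illustration and is not part of 24x7.

```python
import os
import shutil


def queue_status(queue_dir):
    """Return (file_count, free_bytes) for a queue folder.

    file_count counts regular files directly inside queue_dir;
    free_bytes is the free space on the volume that holds it.
    """
    with os.scandir(queue_dir) as entries:
        file_count = sum(1 for entry in entries if entry.is_file())
    free_bytes = shutil.disk_usage(queue_dir).free
    return file_count, free_bytes


if __name__ == "__main__":
    # Path taken from the post above; adjust for your installation.
    folder = r"C:\Program Files\24x7 Automation 3\Queue\DataFilesJobs"
    if os.path.isdir(folder):
        count, free = queue_status(folder)
        print(f"{count} files queued, {free / 2**30:.1f} GiB free")
    else:
        print(f"Queue folder not found: {folder}")
```

Running this on a timer and recording the count would show whether the folder grows steadily (a runaway queue) or only spikes now and then.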
Mon Jul 06, 2009 10:02 am

BillR69

Thanks, great information. Since the last restart at 8:10am CDT today, 132 files have accumulated in the ..Queue\DataFilesJobs queue. I do have jobs in that queue that execute every minute, so should the old ones that have run and finished be removed from the folder?

This brings up the question of whether jobs should stay running in a "loop and sleep" mode or terminate and be run again by schedule? In the past, I have used the "loop and sleep" philosophy, but quit doing that when it seemed that the scheduler was crashing more often due to these jobs--memory leaks perhaps.

(FYI--there's 67GB free on the C:\ drive.)
Mon Jul 06, 2009 10:09 am

SysOp

Yes, once completed, they should disappear from the queue. Is there a good way to find out why they are still sitting in the queue? If logging is enabled for these jobs, does schedule.log show matching pairs of started and finished messages for all of these jobs? Do you use some sort of custom logging to track step-by-step job progress and to find out at which step they die? If you find non-matching start/finish records, can you figure out which steps or activities were performed in the jobs that got stuck? I'm guessing that these every-minute jobs run different steps depending on conditions they check: sometimes they just spin and do nothing, and sometimes they do something and, in that case, get stuck.
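
Checking for matching start/finish pairs is easy to automate once the log has been parsed. The Python sketch below deliberately leaves the parsing out: the wording of the start and finish messages in schedule.log is not shown in this thread, so the input is assumed to be a sequence of (job name, event) tuples that you extract yourself. The function name and the "start"/"finish" labels are illustrative only.

```python
from collections import Counter


def unmatched_starts(events):
    """Return {job_name: count} for starts that have no matching finish.

    events is an iterable of (job_name, kind) tuples in log order,
    where kind is "start" or "finish" (labels assumed for this sketch).
    """
    open_runs = Counter()
    for job_name, kind in events:
        if kind == "start":
            open_runs[job_name] += 1
        elif kind == "finish" and open_runs[job_name] > 0:
            open_runs[job_name] -= 1
    # Keep only the jobs that still have runs without a finish record.
    return {job: n for job, n in open_runs.items() if n > 0}
```

Any job name that comes back from this function started at least once without a recorded finish, which is where step-by-step logging would be worth adding first.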
Mon Jul 06, 2009 11:49 am

BillR69

I may have found the problem! Many of the jobs in the particular queue ended with the Exit verb in the JAL code. Would the Exit verb cause the jobs to end without being removed from the queue? (I have taken out the Exits--there was no need for them--they were put in by a COBOL programmer, I think...) Thanks for your help!
Mon Jul 06, 2009 4:11 pm

BillR69

Well, taking out the "Exit" at the end of many of the jobs didn't help. The scheduler ran okay overnight, but there were 2000+ files in the queue folder. The log shows successful start/end for almost every job.

It seems that as long as I manually clean up the .q and .tmp files, the scheduler stays stable.

Any ideas on what to try next? (FYI--I am running 3.5.2 on this particular box. I have another instance of 24x7 running 3.4.26 as a service, and I never see any files accumulating in the queue folder.)
Tue Jul 07, 2009 2:01 pm

SysOp

"Exit" doesn't affect anything resource-wise; it is used for script flow control only, in case a script needs to be aborted in the middle.

So far it looks like the scheduler is unable to clear jobs from the queue, maybe because not all job-allocated resources are closed. Basically, it waits for job threads to terminate, and that doesn't seem to happen.

Please verify that the jobs are set to detached and synchronous; in other words, that they run as separate system processes. This way, the scheduler can kill them if it needs to.
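
To illustrate why a separate process matters: a parent can forcibly end a child process from outside, which it generally cannot do safely with a thread running inside its own process. The Python sketch below shows that general operating-system mechanism only. It is not how 24x7 is implemented, and `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a command as a separate process; kill it if it overruns.

    Returns the exit code, or None if the process had to be killed.
    """
    proc = subprocess.Popen(args)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # a separate process can be ended from outside
        proc.wait()  # collect its exit status so nothing is left behind
        return None


if __name__ == "__main__":
    # A stand-in "job" that hangs for a minute.
    hung_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(hung_job, timeout_seconds=1))
```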
Tue Jul 07, 2009 7:02 pm

BillR69

The jobs in the queues that seem to be having trouble are Detached and Asynchronous.

Also, now that I'm clearing out the queue folders by hand to keep the scheduler running, I eventually have problems with File Watch jobs. Even though the jobs are definitely not running, they fail to run and the following messages are written to the log:

7/8/2009 8:04:06.744000 0 94 0 FactorAckReport Semaphore file(s) \\db1\lw\DATA\PARJCT\FAC101\*.txt found. Event is ignored because the job is already queued.
7/8/2009 8:04:06.916000 0 92 0 FactorSalesExport Semaphore file(s) \\db1\lw\DATA\PARJCT\FAC001\*.txt found. Event is ignored because the job is already queued.

When I restart the scheduler, these file-watch jobs go ahead and run.

Could this problem be related? (By the way, the jobs in the log above are Detached, Synchronous.)

Thanks for continuing to investigate!
Wed Jul 08, 2009 9:21 am
seanc217



Joined: 23 May 2007
Posts: 272

This seems very similar to the issues I am having with jobs getting stuck in the queue.
I am at a loss as to what is causing this. I did find that when I run the scheduler in GUI mode (I have had it up in GUI mode since this past Saturday), the issue does not happen. If you figure anything out, please let me know.

Thanks.
Wed Jul 08, 2009 10:02 am

SysOp

I believe you are running different versions of 24x7 on different platforms. This kind of problem is common; it is just that in the queue you see clearly visible side effects of the problem, while the root cause is somewhere else. As an analogy, suppose you cannot access a website and your browser displays an error. Is it the browser that causes the error, the website, or one of the networks connecting the browser to that website? It is the same here: you see files getting stuck in queues, but why are they getting stuck?
Wed Jul 08, 2009 10:26 am

BillR69

I believe I've found a workaround for the time being. I wrote a "QueueCleanup" job that runs every 20 minutes and deletes all files older than 20 minutes from the \program files\24x7 automation 3\queue\ folders. This seems to be working well. The scheduler has stayed running flawlessly for 24 hours now. I sure would like to know why the queue files are building up, however.

Would it help if I sent you the JAL code for one of the jobs that is leaving its .q files in the queue folder? If so, to what email address should I send it?
Tue Jul 21, 2009 9:36 am

SysOp

Please email it to supportATsofttreetech.com along with a reference to this message thread.
Tue Jul 21, 2009 11:35 pm

seanc217

Hi Bill, is it possible to get a copy of the queue cleanup job you wrote?

Thanks!
Wed Jul 22, 2009 10:40 am

BillR69

seanc217--would love to send you the code, but I'm limited by our company policy on in-house developed code. Concept is really simple, however.

1. Use FileFindFirst and LoopWhile/FileFindNext to get the names of all the sub-directories in the "Queue" folder. I stored the names of the folders in a string var and delimited them with "~". (Make sure to skip the "files" named "." and ".." that come back with FileFindFirst/Next.)
2. Set up a LoopWhile to go through the list of sub-directories you created in step 1. Use an inner LoopWhile/FileFindFirst/FileFindNext to process each sub-directory (queue folder). To determine how long each file has been sitting out there, use FileDate i_PathAndFile fdate, FileTime i_PathAndFile ftime, DateTime fdate,ftime,fdatetime, and DateTimeDiff fdatetime dttimenow diff. Use FileDelete to delete the ones that are too old.
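
The two steps above can be sketched in Python rather than JAL, since the original job's code was not posted. This illustrates the same idea and is not BillR69's job: it visits each sub-directory of the queue root and removes files older than a cutoff. The function name, the `dry_run` flag, and the use of modification time as the file's age are all assumptions. Note that `os.scandir` never returns "." or "..", so the skip mentioned in step 1 is not needed here.

```python
import os
import time


def clean_queue(queue_root, max_age_minutes=20, dry_run=True, now=None):
    """Delete (or, with dry_run, just list) stale files in queue folders.

    Looks at regular files directly inside each sub-directory of
    queue_root and returns the paths older than max_age_minutes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    stale = []
    with os.scandir(queue_root) as queues:
        queue_dirs = [q.path for q in queues if q.is_dir()]
    for queue_dir in queue_dirs:
        with os.scandir(queue_dir) as entries:
            for entry in entries:
                if entry.is_file() and entry.stat().st_mtime < cutoff:
                    stale.append(entry.path)
    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except OSError:
                # File already gone or locked by another process; skip it.
                pass
    return sorted(stale)
```

Calling `clean_queue(root)` with the default `dry_run=True` only reports what would be deleted. Earlier in the thread, file-watch problems were reported after the folders were cleared by hand, so it may be worth reviewing the list before switching to `dry_run=False`.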
Thu Jul 23, 2009 11:09 am

seanc217

Excellent. Thanks for the tips!!
Fri Jul 24, 2009 1:59 pm
All times are GMT - 4 Hours
Page 1 of 2