SoftTree Technologies
Technical Support Forums
Left files in Queue directory
Redemann



Joined: 11 Jul 2007
Posts: 90
Country: Germany

I have some news regarding this issue:

Please have a look at these debug-logs taken from the remote agent:

2008-05-22 14:05:02,910 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2008-05-22 14:05:02,911 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: ftp_connect: 127.0.0.1
2008-05-22 14:05:02,933 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: root login ok
2008-05-22 14:05:02,933 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [date > /tmp/check_schedule.txt] in work directory [/]
2008-05-22 14:05:02,933 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runAs() username=root command=date,>,/tmp/check_schedule.txt workDir=/
2008-05-22 14:05:02,933 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - exec : ./runas.pl,root,date > /tmp/check_schedule.txt,/
2008-05-22 14:05:02,987 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2008-05-22 14:05:03,079 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): end
2008-05-22 14:05:03,080 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : exit code 0
2008-05-22 14:05:03,080 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-05-22 14:05:03,080 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): end


2008-05-22 14:10:04,961 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2008-05-22 14:10:04,961 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: ftp_connect: 127.0.0.1
2008-05-22 14:10:04,985 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: root login ok
2008-05-22 14:10:04,985 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [date > /tmp/check_schedule.txt] in work directory [/]
2008-05-22 14:10:04,985 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runAs() username=root command=date,>,/tmp/check_schedule.txt workDir=/
2008-05-22 14:10:04,985 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - exec : ./runas.pl,root,date > /tmp/check_schedule.txt,/
2008-05-22 14:10:05,065 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2008-05-22 14:10:05,159 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): end
2008-05-22 14:10:05,160 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : exit code 0
2008-05-22 14:10:05,160 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start



2008-05-22 13:03:05,384 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2008-05-22 13:03:05,385 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: ftp_connect: 127.0.0.1
2008-05-22 13:03:05,407 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: informix login ok
2008-05-22 13:03:05,408 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [/home/dcs/bin/check_views 2>/dev/null] in work directory [/home/informix]
2008-05-22 13:03:05,408 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runAs() username=informix command=/home/dcs/bin/check_views,2>/dev/null workDir=/home/informix
2008-05-22 13:03:05,408 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - exec : ./runas.pl,informix,/home/dcs/bin/check_views 2>/dev/null,/home/informix
2008-05-22 13:03:05,461 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2008-05-22 13:03:05,559 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): end
2008-05-22 13:03:05,559 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : exit code 0
2008-05-22 13:03:05,559 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-05-22 13:03:05,559 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): end





2008-05-22 14:10:04,962 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2008-05-22 14:10:04,962 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: ftp_connect: 127.0.0.1
2008-05-22 14:10:04,999 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: informix login ok
2008-05-22 14:10:05,002 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [/home/dcs/bin/check_views 2>/dev/null] in work directory [/home/informix]
2008-05-22 14:10:05,002 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runAs() username=informix command=/home/dcs/bin/check_views,2>/dev/null workDir=/home/informix
2008-05-22 14:10:05,002 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - exec : ./runas.pl,informix,/home/dcs/bin/check_views 2>/dev/null,/home/informix
2008-05-22 14:10:05,065 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2008-05-22 14:10:05,160 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): end
2008-05-22 14:10:05,160 [Job #69 - bamacc:check_views] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : exit code 0





First look at the debug output for job #67. The run at 14:05 finishes with "killProcess start" and then "runJob(): end". For the run at 14:10, however, the last log entry is "killProcess start", and "runJob(): end" is missing! For exactly this run the queue file isn't deleted, so 24x7 assumes that the job is still running!

At nearly the same time another job (#69) was started. This one ended normally.

#67 and #69 do *NOT* use the same queue.

So it seems that the "killProcess start" is somehow unfinished. Why? It seems that this only happens if there is another job running at the same time (IMHO).
In addition you can see that job #69 at 13:03 finished with "killProcess start" and then "runJob(): end" and the one at 14:10 ended with "isFailed(...) : exit code 0".
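
The check described above (a "runJob(): start" with no later "runJob(): end" for the same job) could probably be automated instead of reading the log by eye. This is only a minimal sketch, assuming the debug.log line format shown in this post; the function name find_unfinished_runs is made up for illustration and is not part of 24x7.

```shell
#!/bin/sh
# find_unfinished_runs LOGFILE
# Prints each job label whose most recent "runJob(): start" has no
# matching "runJob(): end" later in the log.
find_unfinished_runs() {
    awk '
        /runJob\(\): (start|end)$/ {
            # The job label sits between the first "[" and the first "]".
            lb = index($0, "[")
            rb = index($0, "]")
            if (lb == 0 || rb <= lb) next
            job = substr($0, lb + 1, rb - lb - 1)
            if ($0 ~ /start$/) pending[job] = $1 " " $2   # remember start timestamp
            else               delete pending[job]         # run closed normally
        }
        END {
            for (job in pending)
                print job " started " pending[job] " but never logged runJob(): end"
        }
    ' "$1"
}
```

Called as `find_unfinished_runs debug.log`, it lists only runs that are still open at the end of the file, so a job that is legitimately running at that moment would show up as well.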

I checked the log over and over and I'm sure I did not miss any lines. (I saved the log and can send it by mail if that would be useful for you.)

Seems we are getting closer...

Thanks in advance for your help.
Thu May 22, 2008 9:07 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Sorry for the delay with the response. The forum was closed for the weekend because of the backend database server upgrades.

Yes, your observation brings us closer to finding the root cause. We knew that the job got stuck in the queue because the job didn't complete and the agent never reported job completion. Now we know this occurs when the job clean-up process hangs. 24x7 always executes some kind of "kill process" command to terminate the external job process in case it is still running or has not completely shut down. The implementation of the "kill" differs between systems. From the trace I can see the job is running on some Unix system, so the "kill" command is just a regular Unix "kill [pid]" command.

Why could the system kill command hang on your system? Could it display an interactive prompt if it doesn't like the operation, and then wait for interactive user input?

Can you put a script file named "kill", with no extension, into the 24x7 directory and use it as a wrapper for the system kill command? If you can, please make the script invoke the system command from the /usr/bin directory (or wherever it is on your system), passing on the command line parameter. It should also log every call 24x7 makes to the script into some log file, including the command line parameter value, the return code, and the response from the system kill command. This should help us get to the bottom of the issue and find out what happens when the "kill" operation hangs.
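
To test the "waits for interactive input" theory directly, a command can be run with its input detached and a deadline, so that a hang becomes visible instead of blocking forever. The following is only a sketch under stated assumptions: the helper name run_with_deadline and the return value 124 for "still running at the deadline" are illustrative choices, not anything 24x7 does.

```shell
#!/bin/sh
# run_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND with stdin detached; returns its exit status, or 124 if it
# was still running after SECONDS and had to be terminated.
run_with_deadline() {
    deadline=$1
    shift
    "$@" < /dev/null &          # stdin is /dev/null, so a prompt gets EOF instead of waiting
    child=$!
    waited=0
    while kill -0 "$child" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            kill -KILL "$child" 2>/dev/null
            wait "$child" 2>/dev/null
            return 124          # treated as "command hung"
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$child"               # collect the real exit status
}
```

For example, `run_with_deadline 10 /usr/bin/kill -TERM 12345` (with a process id of your choice) would return 124 if the system kill had not finished after about 10 seconds.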
Mon May 26, 2008 12:56 pm
Redemann



Joined: 11 Jul 2007
Posts: 90
Country: Germany

>> Sorry for the delay with the response. The forum was closed for the weekend because of the backend database server upgrades.
No problem.

>> Can you put a script file with "kill" name and no extension into 24x7 directory and use it as a wrapper...

Sorry, I'm not sure I understand that. What should this script look like?

For instance (?):

/usr/local/24x7_Scheduler/kill (kill is an executable shell script)

kill-Script:
#!/bin/bash
/usr/bin/kill -TERM <what>
echo "log something into logfile" >> logfile


Can you provide me an example please?

PS: The remote agent is running AIX 5.3 (latest service level and latest Java 1.4.2 package from IBM)
Tue May 27, 2008 5:26 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

I think your example script is just fine. The script is going to receive one command line parameter, which is the system process id of the process it needs to terminate in case that process is still running. I would add one more echo line at the beginning so that the script prints a diagnostic message before calling /usr/bin/kill, for example:

Code:
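#!/bin/bash
# Wrapper for the system kill command; $1 is the process id passed by 24x7.
echo "-------------" >> kill.log
echo "running kill for process $1" >> kill.log
# Capture both stdout and stderr of the real kill command.
/usr/bin/kill -TERM "$1" >> kill.log 2>&1
echo "kill completed with exit code $?" >> kill.log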



Maybe also add printing of the current date and time to this script, to ease troubleshooting.
Tue May 27, 2008 9:10 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Hi. Could you please update us on the status of this issue? Did the implementation of a "custom" kill command resolve the issue?
Wed Jun 11, 2008 12:31 am
Redemann



Joined: 11 Jul 2007
Posts: 90
Country: Germany

Sorry, I was out of office for 2 weeks and had no time to get into this.
In the last 2 weeks no more problems regarding this issue occurred.

I'll keep you up to date if I find time to check this...

Thank you.
Wed Jun 11, 2008 3:29 am
Redemann



Joined: 11 Jul 2007
Posts: 90
Country: Germany

It took a long time, but I was finally able to get back to this case and continue...

I just added a kill script to the 24x7 directory on the remote agent to check out your suggestion, but the Scheduler seems to ignore its existence. I even restarted the process.

root@bam00(bam_tcp):"/acc/24x7_Scheduler"$ cat kill
#!/usr/bin/bash
echo "-------------"
echo "`date` : running kill for process $1" >> kill.log
/usr/bin/kill -TERM $1 >> kill.log
echo "`date` : kill completed for $1 with exit code $?" >> kill.log

No kill.log is created.

Any idea?
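
Independently of why the script is not called: one detail in a wrapper like the one above may be worth double-checking. In bash, when `$?` appears in the same string after a `date` command substitution, it is expanded after `date` has run, so the logged value may be the status of `date` rather than that of /usr/bin/kill. A minimal sketch that saves the status first; the names KILL_LOG, REAL_KILL and logged_kill, and the absolute log path, are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical settings: override via the environment when testing.
KILL_LOG=${KILL_LOG:-/tmp/kill_wrapper.log}
REAL_KILL=${REAL_KILL:-/usr/bin/kill}

# logged_kill PID
# Calls the real kill command and logs the call, its output and its exit code.
logged_kill() {
    echo "$(date) : running kill for process $1" >> "$KILL_LOG"
    "$REAL_KILL" -TERM "$1" >> "$KILL_LOG" 2>&1
    rc=$?                       # save immediately, before any other command runs
    echo "$(date) : kill completed for $1 with exit code $rc" >> "$KILL_LOG"
    return "$rc"
}
```

In an actual wrapper file the last line would be `logged_kill "$1"`. An absolute log path also means the log ends up in the same place regardless of the working directory the wrapper is started from.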
Thu Jul 10, 2008 5:43 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Please check debug.log in /acc/24x7_Scheduler for text fragments like "/bin/sh kill" and what happens after that. This should provide a clue as to why your kill script is not being picked up.
Thu Jul 10, 2008 6:40 pm
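
A search like that, including the lines that follow each hit, can be done without GNU grep's context options (which may not be available on every Unix). This is only a sketch; the helper name show_after is made up, and it assumes the fragment is searched as a fixed string.

```shell
#!/bin/sh
# show_after PATTERN FILE [N]
# Prints every line containing PATTERN (fixed string) plus the N lines
# after it (default 3), each prefixed with its line number.
show_after() {
    awk -v pat="$1" -v n="${3:-3}" '
        index($0, pat) { left = n + 1 }      # match: print it and the next n lines
        left > 0       { print NR ": " $0; left-- }
    ' "$2"
}
```

For example: `show_after "/bin/sh kill" /acc/24x7_Scheduler/debug.log 5`.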
Redemann



Joined: 11 Jul 2007
Posts: 90
Country: Germany

I cannot find any hints:

root@bam00(bam_tcp):"/acc/24x7_Scheduler"$ grep kill debug.log
2008-07-18 15:30:00,298 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:30:19,920 [Job #85 - bamacc:pruef_bam] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:35:00,419 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:39:03,829 [Job #85 - bamacc:pruef_bam] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:40:00,385 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:45:00,279 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:49:04,841 [Job #85 - bamacc:pruef_bam] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:50:00,340 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-07-18 15:55:00,340 [Job #67 - bamacc:check_schedule] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start

Only the "normal" kill-messages. The kill-script seems to be ignored.

The script exists (of course):

root@bam00(bam_tcp):"/acc/24x7_Scheduler"$ ls -l kill
-rwxr-xr-- 1 root system 196 Jul 10 11:08 kill
Fri Jul 18, 2008 10:01 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Sorry, I must be looking at some other version of the scheduler. I will try to find out why the output is different.
Fri Jul 18, 2008 11:36 am
seanc217



Joined: 23 May 2007
Posts: 272

I am having similar issues where my queues get locked up and then nothing gets executed. Is there any solution to this issue?

Thanks.
Mon Jul 06, 2009 12:11 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Technically this is a side effect of some other issue. The root cause is that the job does not release some resources, and the system is unable to terminate the job thread or some external processes associated with that thread. The thread then gets stuck in the queue.

Does this occur randomly, or do you see some patterns? Morning jobs? Jobs A, B, and C running concurrently? Large files and longer job runs? Something else?
Mon Jul 06, 2009 5:35 pm
seanc217



Joined: 23 May 2007
Posts: 272

I see it in the morning most of the time, when I am running jobs to check for trigger files. The job checks for a trigger file and, if it's there, kicks off a job via a notification event. If the trigger file is not there, the job exits with a status of 1; however, I do not disable the job. It's very possible that some concurrency is going on here, where multiple jobs are getting kicked off at the same time, because I check for files every 5 minutes for my high priority jobs.
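
For reference, a file checker of the kind described above might look roughly like the following if it ran as a Unix shell script and also refused to overlap with a previous check that is still running. This is a hedged sketch, not my actual job: the function name check_trigger, the lock directory, and the extra return code 2 are assumptions added for illustration.

```shell
#!/bin/sh
# check_trigger TRIGGER_FILE LOCK_DIR
# Returns 0 if the trigger file exists, 1 if it does not, and 2 if another
# check holding LOCK_DIR is still running (mkdir is atomic, so only one wins).
check_trigger() {
    if ! mkdir "$2" 2>/dev/null; then
        return 2                # a previous check has not finished yet
    fi
    if [ -e "$1" ]; then rc=0; else rc=1; fi
    rmdir "$2"                  # release the lock
    return "$rc"
}
```

A return code of 2 would at least show in the job history whether two checks really collide when the queue gets stuck.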

Let me know if I can provide more information.

Thanks.
Mon Jul 06, 2009 7:02 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7966

Can you find out whether the jobs getting stuck in queue are doing some regular stuff or doing something different when they get stuck? Is there a detailed step-by-step log of their activities available?


BTW, the next maintenance release should support email alerts for queues loaded with lots of jobs. The alert is not going to clear such jobs, but it should at least simplify management, as it would automatically notify you when attention is required, for example when a queue is filled with x-number of jobs.
Tue Jul 07, 2009 12:42 pm
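
Until such alerts are available, a similar check could be approximated from outside the scheduler by counting the files left in a queue directory. This is only a rough sketch and not the upcoming feature: the function name queue_backlog is hypothetical, and it assumes each queued job corresponds to one regular file under the directory you pass in.

```shell
#!/bin/sh
# queue_backlog QUEUE_DIR THRESHOLD
# Prints the number of regular files under QUEUE_DIR; returns 1 if that
# number has reached THRESHOLD, 0 otherwise.
queue_backlog() {
    # tr strips the padding some wc implementations put around the number
    count=$(find "$1" -type f | wc -l | tr -d ' ')
    echo "$count"
    [ "$count" -lt "$2" ]
}
```

A job could call it as `queue_backlog /path/to/queue 20` and send a notification when the return code is 1; the path and the threshold here are placeholders.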
seanc217



Joined: 23 May 2007
Posts: 272

The scripts that sometimes get stuck are simple file checker scripts that are just looking for files on the remote agent. Nothing really tricky going on here. I was hoping that someone might have figured out what's going on with this, but it appears there is no definitive answer. One thing I do notice is that I have been running in GUI mode on a Windows server since Saturday and I have not seen this issue happen. The issue seems to happen when the scheduler is in service mode. Thanks for any input you can provide.
Tue Jul 07, 2009 2:05 pm
All times are GMT - 4 Hours