SoftTree Technologies
Technical Support Forums
Job Queues still lock up
SoftTree Technologies Forum Index » 24x7 Scheduler, Event Server, Automation Suite
seanc217



Joined: 23 May 2007
Posts: 272

Hi there, I was wondering whether you have any update on this issue yet.
This continues to happen from time to time, but I have cron jobs in place that alert me when it does.

Let me know.

Thanks
Mon Aug 30, 2010 3:46 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7847

Hi,

We are expecting a new maintenance version this week. I hope to see this issue resolved in the new version.
Mon Aug 30, 2010 4:08 pm
seanc217



Joined: 23 May 2007
Posts: 272

Excellent. Can I get a copy of the new version?
Mon Aug 30, 2010 4:18 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7847

Please wait for a couple of days. It should be available soon.
Mon Aug 30, 2010 5:54 pm
seanc217



Joined: 23 May 2007
Posts: 272

Hi there,

I don't know the issue number for this.
Please let me know whether the issue in this thread was resolved in the latest release.

Thanks!
Mon Oct 11, 2010 4:06 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7847

Hi,

I believe the issue number is 24x7-10781. This is the same issue that covers thread synchronization problems when code is run on multi-processor systems.
Mon Oct 11, 2010 11:44 pm
seanc217



Joined: 23 May 2007
Posts: 272

Excellent.

Thanks!
Tue Oct 12, 2010 11:26 am
seanc217



Joined: 23 May 2007
Posts: 272

Hi,

We upgraded to the latest version of the scheduler multi-platform edition, but I am still having issues with the queues locking up. I have enabled tracing so that I can send you log files the next time one of them locks up.

One question: once a queue locks up, the only way I have found to clear it is to shut down the master scheduler, delete the .q files, and restart. Is there any other way to clear the queue without restarting? Sometimes this happens at very inconvenient times, when critical job processes are running, and I would prefer to keep the scheduler up and running.

Also, I need this to be a priority fix because it's annoying and causes a lot of problems on my production instance.

Thanks.
Mon Dec 20, 2010 5:22 pm
SysOpJ



Joined: 20 Aug 2010
Posts: 95

Could you send us a recent debug.log file after a queue lockup?
Also, can you tell me the exact type and version of the OS, and the number of processors on the system?

The queue is locked because the queue manager is waiting for the current job to exit, and that job, or a resource it is using, is stuck. Cleaning up that resource or job would release the queue.

P.S. The resource can be local or remote, depending on the job setup.
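To illustrate the mechanism described above, here is a minimal Java sketch, assuming a queue that runs one job at a time. It is not the 24x7 Scheduler's actual queue manager; `SingleSlotQueueDemo` and `finishedWithin` are hypothetical names. A job blocked on a resource that never becomes available holds the only slot, so every job behind it waits; bounding the wait with `Future.get(timeout)` and cancelling the stuck job frees the slot.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: not the 24x7 Scheduler's queue manager.
final class SingleSlotQueueDemo {

    /**
     * Submits a job to a one-slot queue and waits at most timeoutMs for it to exit.
     * Returns true if the job exited (successfully or with an error), false if it
     * was still running and had to be cancelled.
     */
    static boolean finishedWithin(ExecutorService queue, Runnable job, long timeoutMs) {
        Future<?> current = queue.submit(job);
        try {
            current.get(timeoutMs, TimeUnit.MILLISECONDS); // wait for the job to exit
            return true;
        } catch (ExecutionException e) {
            return true; // the job failed, but it exited, so the slot is free again
        } catch (TimeoutException e) {
            current.cancel(true); // interrupt the stuck job so the slot can be reused
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            current.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        // One worker thread = one job slot, so jobs run strictly one after another.
        ExecutorService queue = Executors.newSingleThreadExecutor();
        CountDownLatch resourceNeverReleased = new CountDownLatch(1);

        // A job waiting on a resource that never becomes available.
        boolean first = finishedWithin(queue, () -> {
            try {
                resourceNeverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit promptly when cancelled
            }
        }, 200);

        // Once the stuck job is cancelled, the next queued job can run.
        boolean second = finishedWithin(queue, () -> System.out.println("next job ran"), 1000);

        System.out.println("first finished: " + first + ", second finished: " + second);
        queue.shutdownNow();
    }
}
```

Cancelling only helps if the stuck job responds to interruption. A job blocked on a remote resource that ignores interrupts would keep holding the slot, which is why the job or resource itself has to be cleaned up.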
Tue Dec 21, 2010 11:16 am
seanc217



Joined: 23 May 2007
Posts: 272

All of our jobs are remote jobs.
Right now things are not locking up, so I will get you logs when they do.

Thanks.
Wed Dec 22, 2010 2:17 pm
seanc217



Joined: 23 May 2007
Posts: 272

One of my queues finally locked up. Job 1234 got stuck in the queue.

Here are pieces of the log from the agent:

2011-01-06 15:40:03,146 [Job #1548 - 010_check_ach_report_02_trigger] DEBUG com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl - Starting job=010_check_fidfm2201, runtime id=6546870
2011-01-06 15:40:03,154 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2011-01-06 15:40:03,155 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: /opt/24x7_Scheduler/auth.pl
2011-01-06 15:40:03,185 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [/home/srv_etl/file_scripts/file_checker.ksh /loads/dropoff/fidelity/FIDFM2201.TXT fidelity Y] in work directory [/home/srv_etl/file_scripts]
2011-01-06 15:40:03,185 [Thread-1538847] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): start
2011-01-06 15:40:03,186 [Thread-1538847] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): timeout check not required
2011-01-06 15:40:03,186 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - runAs() username=srv_etl command=/home/srv_etl/file_scripts/file_checker.ksh,/loads/dropoff/fidelity/FIDFM2201.TXT,fidelity,Y workDir=/home/srv_etl/file_scripts
2011-01-06 15:40:03,186 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - exec : ./runas.pl,srv_etl,/home/srv_etl/file_scripts/file_checker.ksh /loads/dropoff/fidelity/FIDFM2201.TXT fidelity Y,/home/srv_etl/file_scripts
2011-01-06 15:40:03,204 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2011-01-06 15:40:03,204 [Thread-1538848] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): start
2011-01-06 15:40:03,216 [Thread-1538851] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): start
2011-01-06 15:40:03,367 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): end
2011-01-06 15:40:03,368 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : exit code 1
2011-01-06 15:40:03,368 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner - isFailed(...) : Enumeration found [0]
2011-01-06 15:40:03,368 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2011-01-06 15:40:03,495 [Job #1234 - 010_check_fidfm2201] ERROR com.softtreetech.jscheduler.business.runner.JobExecutorImpl - Job errors: Job completed with exit code 1. This exit code does not satisfy job exit code condition. Job failed.
2011-01-06 15:40:03,496 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl - Error occurred while running job=010_check_fidfm2201, runtime id=6546870
2011-01-06 15:40:03,600 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl - Starting job=010_check_ach_report_03c_02c_trigger, runtime id=6546872



Here's the log from the master:


2011-01-06 15:40:03,485 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.runner.RemoteJobRunner - runJob
com.softtreetech.jscheduler.common.SchedException: Job completed with exit code 1. This exit code does not satisfy job exit code condition. Job failed.
at com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl.executeJob(Unknown Source)
at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:294)
at sun.rmi.transport.Transport$1.run(Transport.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:149)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:466)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:707)
at java.lang.Thread.run(Thread.java:595)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:247)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:223)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:126)
at com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl_Stub.executeJob(Unknown Source)
at com.softtreetech.jscheduler.business.runner.RemoteJobRunner.runJob(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.runJobIgnoringErrorsIfNeeded(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.startExecution(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.execute(Unknown Source)
at com.softtreetech.jscheduler.business.runner.JobExecutorImpl.execute(Unknown Source)
at com.softtreetech.jscheduler.business.runner.JobExecutorImpl$1.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)



From what I can see, it appears that the job may have been kicked off twice. Can you tell that from the logs above?
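One way to check the double-start question against the full agent trace is to look for repeated `Starting job=<name>, runtime id=<id>` entries for the same job within a short time of each other. The Java sketch below assumes every start is logged in the layout of the agent excerpt above (timestamp first, then `Starting job=`); `DuplicateStartCheck`, the log file argument, and the 60-second window are hypothetical choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds repeated starts of the same job close together in the agent trace.
final class DuplicateStartCheck {

    // Assumes the layout of the agent entries above:
    // "2011-01-06 15:40:03,146 [...] ... Starting job=<name>, runtime id=<id>"
    private static final Pattern START = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*Starting job=([^,\\s]+), runtime id=(\\d+)");
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Reports each start that follows a previous start of the same job within the window. */
    static List<String> closeRepeats(List<String> logLines, Duration window) {
        Map<String, LocalDateTime> lastStart = new HashMap<>();
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = START.matcher(line);
            if (!m.find()) {
                continue; // not a "Starting job=" entry
            }
            LocalDateTime at = LocalDateTime.parse(m.group(1), TS);
            String job = m.group(2);
            LocalDateTime previous = lastStart.put(job, at);
            if (previous != null) {
                Duration gap = Duration.between(previous, at);
                if (gap.compareTo(window) <= 0) {
                    hits.add(job + " started again after " + gap.toMillis()
                            + " ms, runtime id=" + m.group(3));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the agent debug log; the 60-second window is an assumption.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        closeRepeats(lines, Duration.ofSeconds(60)).forEach(System.out::println);
    }
}
```

Applied to the agent excerpt above, it would report nothing: 010_check_fidfm2201 has a single `Starting job=` entry there (runtime id 6546870), so a double start would only show up in the full trace.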

Other information that might be useful:
If I open the queue monitor for the queue that locked up, here's the info I see:


Queue: file_ops_2
Job#: 8207997
Time queued: 6-Jan-2011 15:40:00
Priority: Normal
Status: RUNNING
Job ID: 1234
Job Name: 010_check_fidfm2201
Time Job started: 6-Jan-2011 15:40:02
Size: 1
Size in Queue: 1
System Process ID: 8207997 file_ops_2
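The cron alerts mentioned earlier in this thread could key off the queue monitor fields shown above. A minimal Java sketch, assuming you already have the Status and Time Job started values as text (how they are read from the scheduler is not covered here): an entry still RUNNING after a chosen limit is reported as possibly stuck. `StuckQueueCheck` and the 30-minute limit are assumptions.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical watchdog check; the field values come from the queue monitor display.
final class StuckQueueCheck {

    // "Time Job started" layout as displayed above, e.g. "6-Jan-2011 15:40:02".
    static final DateTimeFormatter MONITOR_TIME =
            DateTimeFormatter.ofPattern("d-MMM-uuuu HH:mm:ss", Locale.ENGLISH);

    /** True if the entry is RUNNING and has been running longer than maxRuntime at time now. */
    static boolean looksStuck(String status, String timeStarted, Duration maxRuntime, LocalDateTime now) {
        if (!"RUNNING".equals(status)) {
            return false;
        }
        LocalDateTime started = LocalDateTime.parse(timeStarted, MONITOR_TIME);
        return Duration.between(started, now).compareTo(maxRuntime) > 0;
    }

    public static void main(String[] args) {
        // Values copied from the file_ops_2 entry above; the 30-minute limit is an assumption.
        boolean stuck = looksStuck("RUNNING", "6-Jan-2011 15:40:02",
                Duration.ofMinutes(30), LocalDateTime.now());
        if (stuck) {
            System.out.println("ALERT: queue file_ops_2 job 1234 has been RUNNING for over 30 minutes");
        }
    }
}
```

The limit should sit comfortably above the longest normal runtime of the jobs in that queue, so the alert fires only when an entry really stops moving.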
Thu Jan 06, 2011 5:17 pm
SysOpJ



Joined: 20 Aug 2010
Posts: 95

Thanks for the update. We're looking through the trace now to see if anything stands out.
Fri Jan 07, 2011 9:57 am
seanc217



Joined: 23 May 2007
Posts: 272

Thanks. If there could be an interim fix, that would be great, because it's getting annoying when they lock up.
Let me know.
Fri Jan 07, 2011 3:37 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7847

Hi,

There are a couple of suspicious lines in the posted trace from the agent. For example:

2011-01-06 15:40:03,146 [Job #1548 - 010_check_ach_report_02_trigger] DEBUG com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl - Starting job=010_check_fidfm2201, runtime id=6546870

Job 010_check_fidfm2201 is Job #1234, yet this line is logged under the thread of Job #1548 (010_check_ach_report_02_trigger).

Similarly:

2011-01-06 15:40:03,600 [Job #1234 - 010_check_fidfm2201] DEBUG com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl - Starting job=010_check_ach_report_03c_02c_trigger, runtime id=6546872

The above would make sense if it were recorded in the master scheduler trace, but not in the agent trace. All job control and chaining is supposed to be handled by the scheduler. Can you describe how these jobs relate to each other? How do you start them?

Are there any records in the agent trace indicating other activities of jobs 010_check_ach_report_02_trigger and 010_check_ach_report_03c_02c_trigger?
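To search the full agent trace for more lines like the two quoted above, a small filter can compare the job named in the thread label `[Job #N - name]` with the job named in `Starting job=...`. This Java sketch assumes every relevant line follows the layout of the quoted entries; `ThreadJobMismatch` is a hypothetical name, and a hit only marks a line for review, not a confirmed fault.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter over the agent trace; a hit marks a line for review only.
final class ThreadJobMismatch {

    // "[Job #<n> - <thread's job>] ... Starting job=<started job>, runtime id=<id>"
    private static final Pattern LINE = Pattern.compile(
            "\\[Job #(\\d+) - ([^\\]]+)\\].*Starting job=([^,\\s]+), runtime id=(\\d+)");

    /** Describes each line whose thread label names a different job than the one being started. */
    static List<String> mismatches(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && !m.group(2).equals(m.group(3))) {
                hits.add("thread Job #" + m.group(1) + " (" + m.group(2) + ") started "
                        + m.group(3) + ", runtime id=" + m.group(4));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the agent debug log (supply your own file)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        mismatches(lines).forEach(System.out::println);
    }
}
```

On the two lines quoted above, it would report thread Job #1548 (010_check_ach_report_02_trigger) starting 010_check_fidfm2201, and thread Job #1234 (010_check_fidfm2201) starting 010_check_ach_report_03c_02c_trigger.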


The current theory is that the scheduler queue is stuck because it is waiting for the remote job to terminate. Your agent trace indicates that instead of terminating, the job is triggering some chain reaction. The log fragment is too short to see what happens after that.
Sat Jan 08, 2011 1:02 pm
seanc217



Joined: 23 May 2007
Posts: 272

Hi there,

Basically, I have trigger files that get created by some event, for example when I receive a file.
These jobs check for the existence of the trigger file every 5 minutes.

When one is found, some jobs are kicked off. That's basically it.
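The polling setup described above can be sketched in Java as follows. This is a hypothetical standalone version, not how the 24x7 Scheduler implements file triggers; the trigger path and the `startDependentJobs` placeholder are assumptions. Deleting the trigger file with `Files.deleteIfExists` before starting the dependent jobs means only the check that actually removes the file proceeds, so one trigger file starts the jobs at most once even if two checks overlap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical standalone poller; not part of the 24x7 Scheduler.
final class TriggerFilePoller {

    /**
     * Claims the trigger file by deleting it. Returns true only for the caller
     * whose delete succeeded, false if the file was not there (or already taken).
     */
    static boolean consumeTrigger(Path trigger) {
        try {
            return Files.deleteIfExists(trigger);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Placeholder for whatever the trigger should start (assumption).
    static void startDependentJobs() {
        System.out.println("Trigger found; starting dependent jobs");
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass the real trigger file as the first argument.
        Path trigger = Paths.get(args.length > 0 ? args[0] : "/tmp/example.trigger");
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Check immediately, then every 5 minutes, as in the setup described above.
        timer.scheduleAtFixedRate(() -> {
            try {
                if (consumeTrigger(trigger)) {
                    startDependentJobs();
                }
            } catch (UncheckedIOException e) {
                // Log and keep polling; an uncaught exception would suppress later runs.
                System.err.println("Could not check trigger file: " + e.getCause());
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```

A plain existence check followed by a separate delete leaves a window in which two overlapping checks both see the file; folding the check and the claim into one delete call avoids that.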

I have the full debug file, but it's too big to post here. If you like, I can e-mail both the master and agent logs to you.
Just let me know where to send them.

Thanks!
Sat Jan 08, 2011 9:50 pm
All times are GMT - 4 Hours
Page 2 of 5