seanc217
Joined: 23 May 2007 Posts: 272
Killing job shuts down agent
When I kill a job from the job queue, the agent it is running on appears to shut down.
Is there a reason why this happens?
Thanks
Wed Mar 19, 2008 10:42 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
I don't think this should be happening. What kind of agent is it? What is the job doing when you abort it?
Please also let me know which versions of the scheduler and the agent you are running.
Wed Mar 19, 2008 11:12 am

seanc217
Joined: 23 May 2007 Posts: 272
The scheduler and the agent are both version 4.1 build 247. The master and the agents all run on Red Hat Linux boxes.
Wed Mar 19, 2008 11:20 am

seanc217
Joined: 23 May 2007 Posts: 272
Here are the log entries that confirm this is happening...
19-Mar-2008 11:12:44 AM 3 null 177 02_run_tsys_tx_daily_multi_loadProcess has been terminated!
19-Mar-2008 11:12:44 AM 3 null 177 02_run_tsys_tx_daily_multi_loadProcess has been terminated!
19-Mar-2008 11:12:48 AM 2 null null 24x7 Scheduler 24x7 Remote Agent terminating...
I kill the job from the master, in the GUI, using either the queue manager or the job monitor.
Wed Mar 19, 2008 11:24 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
Which log is that from, the agent's or the scheduler's?
Wed Mar 19, 2008 11:28 am

seanc217
Joined: 23 May 2007 Posts: 272
Oops, sorry. That log is from the agent where I killed the job.
Thanks
Wed Mar 19, 2008 12:01 pm

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
I have tried several things, but so far I'm unable to reproduce this issue. Please enable tracing on the agent side (Tools/Options menu; Log tab; Trace Enabled) and post the section of debug.log related to the remote job termination process.
Wed Mar 19, 2008 6:16 pm

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
Just another thought: is it possible that the job in question watches for the exit signal and then attempts to terminate the entire process tree?
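The scenario in the question above can be illustrated with a small Python sketch (assuming Linux/POSIX and Python 3.7+). If the job shares the agent's process group and reacts to the termination signal by signalling its whole group (in ksh, something like `trap 'trap - TERM; kill 0' TERM`), the agent receives that signal too. JOB_CODE, AGENT_CODE and agent_survives() are hypothetical stand-ins, not parts of 24x7 Scheduler, and nothing in this thread confirms whether a job launched through ./runas.pl ends up in the agent's process group.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical job: on SIGTERM it restores the default action and then signals
# its whole process group, much like a ksh script using
#   trap 'trap - TERM; kill 0' TERM
JOB_CODE = r"""
import os, signal, time

def on_term(signum, frame):
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # so the SIGTERM we send ourselves terminates us
    os.killpg(os.getpgrp(), signal.SIGTERM)        # reaches every process in our group

signal.signal(signal.SIGTERM, on_term)
print(os.getpid(), flush=True)  # report the PID only once the handler is installed
time.sleep(30)
"""

# Hypothetical stand-in for the agent: starts the job, forwards its PID,
# waits for it, then keeps running like a long-lived agent would.
AGENT_CODE = r"""
import subprocess, sys, time

isolate = sys.argv[1] == "isolate"
job = subprocess.Popen([sys.executable, "-c", sys.argv[2]],
                       stdout=subprocess.PIPE, text=True,
                       start_new_session=isolate)  # True: job gets its own group
print(job.stdout.readline().strip(), flush=True)
job.wait()
time.sleep(30)
"""


def agent_survives(isolate_job: bool) -> bool:
    """Send SIGTERM to the job process only, then report whether the agent is alive."""
    mode = "isolate" if isolate_job else "share"
    agent = subprocess.Popen(
        [sys.executable, "-c", AGENT_CODE, mode, JOB_CODE],
        stdout=subprocess.PIPE, text=True,
        start_new_session=True,  # keeps this harness out of the agent's group
    )
    job_pid = int(agent.stdout.readline())
    os.kill(job_pid, signal.SIGTERM)  # signal only the job process itself
    time.sleep(1.0)                   # give the signals time to land
    alive = agent.poll() is None
    agent.kill()                      # clean up the stand-in agent if still running
    agent.wait()
    agent.stdout.close()
    return alive


if __name__ == "__main__":
    print("job shares agent's group, agent survives:", agent_survives(False))
    print("job in its own session, agent survives:", agent_survives(True))
```

On a Linux box the first call should report False and the second True: with start_new_session=True (setsid) the job gets its own process group, so a group-wide kill issued by the job cannot reach the process that started it.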
Wed Mar 19, 2008 7:05 pm

seanc217
Joined: 23 May 2007 Posts: 272
Hi there, here are the entries from the debug.log file:
2008-03-20 09:25:53,996 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob(): start
2008-03-20 09:25:54,000 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.security.SecurityService - authNativeUser: /opt/24x7_Scheduler/auth.pl
2008-03-20 09:25:54,024 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - execProcess(): command line [/home/srv_etl/scripts/multi_job_run.ksh dw_s1 tsys_base_daily Y Y /home/srv_etl/par_files/dw_s1/tsys_s1/tsys_s1.par /loads/work/etl/inbound/tsys/base_daily Y /home/srv_etl/file_lister_output/tsys_base_daily_file.txt] in work directory [/home/srv_etl/scripts]
2008-03-20 09:25:54,024 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runAs() username=srv_etl command=/home/srv_etl/scripts/multi_job_run.ksh,dw_s1,tsys_base_daily,Y,Y,/home/srv_etl/par_files/dw_s1/tsys_s1/tsys_s1.par,/loads/work/etl/inbound/tsys/base_daily,Y,/home/srv_etl/file_lister_output/tsys_base_daily_file.txt workDir=/home/srv_etl/scripts
2008-03-20 09:25:54,024 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - exec : ./runas.pl,srv_etl,/home/srv_etl/scripts/multi_job_run.ksh dw_s1 tsys_base_daily Y Y /home/srv_etl/par_files/dw_s1/tsys_s1/tsys_s1.par /loads/work/etl/inbound/tsys/base_daily Y /home/srv_etl/file_lister_output/tsys_base_daily_file.txt,/home/srv_etl/scripts
2008-03-20 09:25:54,032 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - waitForProcess(): start
2008-03-20 09:25:54,032 [Thread-4] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): start
2008-03-20 09:25:54,032 [Thread-4] DEBUG com.softtreetech.jscheduler.business.runner.AbstractJobRunner$TimeoutVerifier - run(): end due to zero timeout
2008-03-20 09:26:01,356 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - runJob():
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1001)
at java.lang.Thread.join(Thread.java:1054)
at com.softtreetech.jscheduler.business.runner.ProgramJobRunner.OĆ0000(Unknown Source)
at com.softtreetech.jscheduler.business.runner.ProgramJobRunner.runJob(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.do(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.Ć00000(Unknown Source)
at com.softtreetech.jscheduler.business.runner.AbstractJobRunner.execute(Unknown Source)
at com.softtreetech.jscheduler.business.runner.JobExecutorImpl.execute(Unknown Source)
at com.softtreetech.jscheduler.business.agent.remote.RemoteAgentImpl.executeJob(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:261)
at sun.rmi.transport.Transport$1.run(Transport.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:144)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
at java.lang.Thread.run(Thread.java:534)
2008-03-20 09:26:01,358 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-03-20 09:26:01,486 [Job #159 - 01_run_tsys_base_daily_multi_load] ERROR com.softtreetech.jscheduler.business.runner.JobExecutorImpl - Job errors: Process has been terminated!
Thu Mar 20, 2008 9:30 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
2008-03-20 09:26:01,358 [Job #159 - 01_run_tsys_base_daily_multi_load] DEBUG com.softtreetech.jscheduler.business.runner.ProgramJobRunner - killProcess start
2008-03-20 09:26:01,486 [Job #159 - 01_run_tsys_base_daily_multi_load] ERROR com.softtreetech.jscheduler.business.runner.JobExecutorImpl - Job errors: Process has been terminated!
This looks OK. What did you get after the "Job errors: Process has been terminated!" line?
Thu Mar 20, 2008 10:04 am

seanc217
Joined: 23 May 2007 Posts: 272
That is all debug.log shows, but the agent still shut down.
Thanks.
Thu Mar 20, 2008 10:43 am

SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7949
We were unable to reproduce this issue in the lab.
Can you check whether that particular process, when it receives a terminate signal, attempts to terminate the entire process tree? To test that theory, try writing a simple script that starts the process and then pauses after the process completes. Kill the process using the process ID reported by the ps command, and see whether the script terminates too.
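The test described above could look roughly like this in Python. This is a minimal sketch assuming Python 3.7+ on the agent box; run_and_pause(), its on_start hook and the wrapper_test.py file name are hypothetical. It starts the job command as a child, prints the child's PID so it can be killed from another session, reports how the child ended, and then pauses. If killing the job also takes down its parent, the "wrapper still alive" line never appears.

```python
import os
import signal
import subprocess
import sys
import time


def run_and_pause(cmd, pause_seconds=60.0, on_start=None):
    """Start cmd as a child, wait for it to end, then keep this wrapper alive.

    Returns the child's return code; a negative value -N means the child was
    killed by signal N. If killing the child also kills this wrapper, the
    function never returns and "wrapper still alive" is never printed.
    """
    # Default Popen settings keep the child in this wrapper's process group,
    # as when a shell script simply runs the job in the foreground.
    child = subprocess.Popen(cmd)
    print(f"wrapper pid={os.getpid()} started job pid={child.pid}", flush=True)
    if on_start is not None:
        on_start(child.pid)  # optional hook, e.g. to kill the job automatically
    rc = child.wait()
    if rc < 0:
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = "unknown"
        print(f"job was killed by signal {-rc} ({name})", flush=True)
    else:
        print(f"job exited with status {rc}", flush=True)
    print(f"wrapper still alive; pausing {pause_seconds} s", flush=True)
    time.sleep(pause_seconds)
    return rc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 wrapper_test.py /path/to/job.ksh arg1 arg2
    # then, from a second session: ps -ef | grep job.ksh ; kill <job pid>
    run_and_pause(sys.argv[1:])
```

Killing the job with a plain `kill <pid>` should print "job was killed by signal 15 (SIGTERM)" followed by the "still alive" line; if the wrapper disappears instead, the job is most likely signalling more than just itself.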
By the way, what does the following command return on the agent side?
java -version
Wed Apr 02, 2008 11:41 am