Author |
Message |
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
debug.log will not appear when trace is enabled |
|
I'm running scheduler version 3.4.26 on our production system (Windows based), and jobs have been hanging every night. I went to Options > Log tab and checked "Trace enabled", but no debug.log was created. On our test system, which runs 3.6.10, the same steps produce debug.log as expected. Am I doing something wrong? Any help would be appreciated, because this issue has been going on for weeks.
Nick
|
|
Wed Nov 03, 2010 1:21 pm |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
Be sure to restart the scheduler after enabling the trace.
|
|
Thu Nov 04, 2010 7:17 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
I did that and the debug.log file still isn't there. We're on our 2nd night of 24x7 "hanging". Without a good log we're unable to troubleshoot.
|
|
Thu Nov 04, 2010 8:56 am |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
Ok, the trace issue is because of the age of your installation ... the trace output to debug.log was not introduced until 3.5. In 3.4 you'll find trace files by job id (e.g., <jobid>.log).
Can you describe which jobs are hanging, their types, operations, etc.? If they're script types, take a look in the corresponding <jobid>.log file and find the last executed operation.
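If the trace files are large, a small script can pull out the last few entries. This is only a sketch: it assumes the <jobid>.log files are plain text with one operation per line, which I have not verified for 3.4, and the file name in the usage comment is made up.

```python
from collections import deque
from pathlib import Path


def last_lines(log_path, count=5):
    """Return the last `count` non-empty lines of a plain-text log file."""
    # errors="replace" keeps the read from failing on odd bytes in the log.
    with Path(log_path).open("r", errors="replace") as handle:
        non_empty = (line.rstrip("\r\n") for line in handle if line.strip())
        # A bounded deque keeps only the newest `count` lines in memory.
        return list(deque(non_empty, maxlen=count))


# Hypothetical usage; replace the path with a real <jobid>.log file:
# for line in last_lines(r"C:\path\to\1234.log"):
#     print(line)
```

The final line returned is the best candidate for the last executed operation before the hang.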
|
|
Thu Nov 04, 2010 9:14 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
If I browse to \\condor\c$\Program Files (x86)\24x7 Automation 3\Performance Data all I see are files with .jpd extensions.
Roughly 75 of the jobs are hanging. Each one calls a VBS script that in turn runs some Java code that we use for our ETL data warehouse. The first jobs to kick off are the parent tables, and all the child tables are semaphore-based, waiting on the parent's completion. For the past 2 nights I have just stopped the service and restarted it, and the jobs kicked off. We have the license for 3.6.10. Do you think it could be a bug in the 3.4 version that we're running?
Thanks.
|
|
Thu Nov 04, 2010 9:25 am |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
Is condor the machine where everything runs? If not, is there any indication of a network connection issue?
|
|
Thu Nov 04, 2010 10:21 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
Yes, Condor is the machine running 24x7. In reviewing the event logs, I'm not seeing any errors or issues with network connections.
|
|
Thu Nov 04, 2010 10:33 am |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
Presumably things were running before, so it's unlikely to be a bug.
Did you make any changes recently to either scripts or the system?
Any anti-virus updates blocking VBScript processes?
Any related events in the Windows application logs?
Did you check locks in the ETL database?
Do you have all jobs assigned to a single job queue?
|
|
Thu Nov 04, 2010 10:56 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
|
|
Presumably things were running before, so it's unlikely to be a bug.
Did you make any changes recently to either scripts or the system?
Any anti-virus updates blocking VBScript processes?
Any related events in the Windows application logs?
Did you check locks in the ETL database?
Do you have all jobs assigned to a single job queue? |
1) We recently removed all database connections and made the jobs run as VBScripts. That was over a week ago and this problem started 2 days ago.
2) No.
3) Nothing.
4) Ran a script to check for blocking and found none.
5) No. We have 11 queues, not including the default.
At this point we are strongly considering upgrading to 3.6.10. It may not magically fix it, but at least we would have better logging. I also should have mentioned that we're running Windows Server 2003 x64.
|
|
Thu Nov 04, 2010 11:05 am |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
It may help the logging, yes, although the question of why you aren't seeing logs now remains. At the very least, if you have the trace turned on, you should be seeing logs for the jobs that are not hanging.
Is it possible there's another install directory elsewhere (perhaps under Program Files rather than Program Files (x86))? Can you do a search on the filesystem for any <integer>.log file names?
Or if you'd rather go ahead with the update, let me know what the log results look like afterwards.
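If the Windows search is slow or unreliable, a short script can walk the drive for you. This is a minimal sketch, assuming the trace files are named with digits only followed by .log; the function name and the root path in the usage comment are just illustrations.

```python
import os
import re

# Matches names such as 1234.log; case-insensitive because Windows file names are.
INTEGER_LOG = re.compile(r"^\d+\.log$", re.IGNORECASE)


def find_job_logs(root):
    """Return the full paths of all <integer>.log files found under `root`."""
    matches = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if INTEGER_LOG.match(name):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


# Hypothetical usage; point it at whichever drive or folder you want to scan:
# for path in find_job_logs("C:\\"):
#     print(path)
```

Any hits outside the folder you already checked would point to a second install or a different working directory.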
|
|
Fri Nov 05, 2010 7:12 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
|
|
It may help the logging, yes, although the question of why you aren't seeing logs now remains. At the very least, if you have the trace turned on, you should be seeing logs for the jobs that are not hanging.
Is it possible there's another install directory elsewhere (perhaps under Program Files rather than Program Files (x86))? Can you do a search on the filesystem for any <integer>.log file names?
Or if you'd rather go ahead with the update, let me know what the log results look like afterwards. |
We ended up upgrading on Friday 11/5 and it seemed to help out a bunch. Every job ran during the weekend without hanging. One thing I'm working on since the upgrade is a pop-up that displays every time someone logs into the server. It basically says that 24x7 scheduler is running as a Windows service. We had this in the previous version, but it only popped up when you launched the scheduler.
|
|
Mon Nov 08, 2010 1:03 pm |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
|
Make sure there is not still a shortcut in the Startup folder now that you're running it as a service.
|
|
Wed Nov 10, 2010 9:12 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
|
|
Make sure there is not still a shortcut in the Startup folder now that you're running it as a service. |
That worked. I think I'm getting closer to the underlying issue. When the errors occur at night, there are messages in the event viewer: "Application popup: cmd.exe - Application Error : The application failed to initialize properly (0xc0000142). Click on OK to terminate the application." This error appears at exactly the same time the job errors occur. There were also some errors about desktop heap.
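For anyone following along, the desktop heap sizes are configured in the SharedSection setting inside the "Windows" value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems. The sketch below only parses that string once you have copied it out of the registry. I am assuming here that the non-interactive (third) number is the one relevant to a scheduler running as a service, and the sample numbers are illustrative, not taken from our server.

```python
import re


def parse_shared_section(subsystems_value):
    """Extract the SharedSection heap sizes (in KB) from the SubSystems 'Windows' value."""
    match = re.search(r"SharedSection=(\d+),(\d+)(?:,(\d+))?", subsystems_value)
    if match is None:
        return None
    system_wide, interactive, non_interactive = match.groups()
    return {
        "system_wide_kb": int(system_wide),
        "interactive_kb": int(interactive),
        # The third number is optional in the registry string.
        "non_interactive_kb": int(non_interactive) if non_interactive else None,
    }


# Illustrative input only; paste the real registry string in its place.
sample = r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On"
print(parse_shared_section(sample))
```

Comparing the non-interactive figure against the number of cmd.exe processes the jobs launch at once may help confirm or rule out heap exhaustion.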
|
|
Fri Nov 12, 2010 10:19 am |
|
|
SysOpJ
Joined: 20 Aug 2010 Posts: 95
|
|
|
Mon Nov 15, 2010 6:31 am |
|
|
nickb
Joined: 19 Oct 2010 Posts: 35 Country: United States |
|
|
|
Now I have an application error that is causing the jobs to error out.
Faulting application 24X7.EXE, version 3.6.1.0, faulting module PBVM70.DLL, version 7.0.3.10095, fault address 0x00255516.
|
|
Wed Nov 17, 2010 11:04 am |
|
|
|