SoftTree Technologies
Technical Support Forums
job error creates many repeating jobs in queue

 
Neil Carter
Joined: 19 Mar 2002
Posts: 26

Greetings:

I've just installed Windows 2000 Professional from scratch on a PC, applied all the Microsoft patches, installed 24x7 v3.2.1, and copied over the jobs database from our old Win95 system (also running v3.2.1).

This new system seems much more stable except for one persistent problem. Our jobs are 99+% telnet connections to IBM RS/6000 servers running AIX v4.3.3. They open a telnet session, execute a script on the RS/6000, and close the telnet session. Some of the jobs also perform ftp transfers to/from the RS/6000s.

The problem:

Occasionally, one of the jobs gets an error on the telnetopen statement saying 'login failed or host not found' (I specify the exact IP address in the telnetopen statement). Funny, the host was there 10 minutes ago when the job last ran, and it is there when I retry the job. My jobs are set up to retry on failure, waiting 60 seconds and retrying twice. (Up to this point I was having the exact same problem, much more often, on the Win95 box.) On this new system, the error triggers dozens of repeat jobs that appear in succession in the job queue the original ran in. This, of course, ties up the system and nothing gets done. Why is this system creating these repeat jobs in the job queue instead of following the retry procedure set up in the job properties?

Also, does anyone have any experience using 24x7 with AIX, specifically telnet functions, that could explain why I keep getting the 'login failed or host not found' error?

Thanks!

Neil.....

Tue Mar 19, 2002 1:47 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

You are having two separate problems:

1. Host not found: Try increasing the login timeout value using the TelnetConfig statement. I think that should help. The default timeout is 10 seconds; try, for instance, 30 seconds. Please keep in mind that a call to TelnetConfig affects only the job where it appears, so you will need to add it to every job.
Example: TelnetConfig "TIMEOUT", 30

2. Jobs restart/repeat: Do you have the "ignore errors" option for these jobs turned on or off? Do you run these jobs asynchronously or synchronously? If you run them asynchronously using the "asynchronous job" option, try running them synchronously or detached so that they cannot interfere with each other.
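
To illustrate point 1, here is a minimal Python sketch (not 24x7 JAL code) of why a login timeout that is too short can fail intermittently when the host is merely slow to present its prompt. The helper names start_slow_host and wait_for_login_prompt, the local stand-in host, and the delays are all assumptions made up for this illustration; how 24x7 implements TelnetConfig "TIMEOUT" internally is not shown in this thread.

```python
import socket
import threading
import time


def start_slow_host(delay_seconds):
    """Start a local stand-in host that waits before sending its login prompt."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)

    def serve():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            with conn:
                time.sleep(delay_seconds)  # simulate a busy host
                try:
                    conn.sendall(b"login: ")
                except OSError:
                    pass  # the client gave up before the prompt arrived

    threading.Thread(target=serve, daemon=True).start()
    return server, server.getsockname()[1]


def wait_for_login_prompt(host, port, timeout_seconds):
    """Return True if a login prompt arrives within timeout_seconds."""
    try:
        # The timeout passed to create_connection also applies to recv below.
        with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
            return b"login:" in conn.recv(64)
    except OSError:  # covers timeouts and refused connections
        return False
```

Against a host started with start_slow_host(0.3), a call with timeout_seconds=0.05 gives up and returns False, while the same call with a few seconds of timeout returns True. The host was reachable both times; only the patience of the client changed.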

Tue Mar 19, 2002 2:08 pm

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: You are having two separate problems: 1. Host not found: Try increasing login
: timeout value using TelnetConfig statement. I think that should help.
: Default timeout is 10 seconds, try for instance 30 seconds. Please keep in
: mind that call to TelnetConfig affects only the job where it appears, so
: you will need to add it to every job.
: Example: TelnetConfig "TIMEOUT", 30

Okay, this is implemented. I'll let you know how this works (or doesn't).

: 2. Jobs restart/repeat: Do you have "ignore errors" option for
: these jobs turned on or off? Do you run these jobs asynchronously or
: synchronously? If you run them asynchronously using "asynchronous
: job" option try running them synchronously or detached so that they
: cannot interfere with each other.

Ignore errors is not checked. Asynchronous process is not checked. I'll schedule a couple of the more repetitive offenders for detached process and let you know.

Thanks!

Tue Mar 19, 2002 3:43 pm

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: You are having two separate problems: 1. Host not found: Try increasing login
: timeout value using TelnetConfig statement. I think that should help.
: Default timeout is 10 seconds, try for instance 30 seconds. Please keep in
: mind that call to TelnetConfig affects only the job where it appears, so
: you will need to add it to every job.
: Example: TelnetConfig "TIMEOUT", 30

This seems to have resolved this issue, THANKS!

: 2. Jobs restart/repeat: Do you have "ignore errors" option for
: these jobs turned on or off? Do you run these jobs asynchronously or
: synchronously? If you run them asynchronously using "asynchronous
: job" option try running them synchronously or detached so that they
: cannot interfere with each other.

No change noted. Any further suggestions?

Thanks!

Neil.....

Thu Mar 21, 2002 12:28 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

How frequently do these jobs run?
Do they fail while actually executing the job or while executing notification actions?

If you don't run 24x7 as a service, please check what you have on the job details page (Detail View in the Job Explorer). After a job failure, does that job appear on the details page in the Fails column as 1/3 or 2/3? Does the Fails column show the correct number of tries?

Thu Mar 21, 2002 2:12 pm

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: How frequently do these jobs run?
: Do they fail while actually executing the job or while executing notification
: actions?
I have two jobs that are quite chronic about this. Even with the telnet timeout set to 60, they will still occasionally fail with 'Login failed or host not found'. It then sends an error notification email. One runs every 10 minutes, the other every 15.

: If you don't run 24x7 as a service, please check what you have on the job
: details page (Detail View in the Job Explorer). After a job failure, does
: that job appear on the details page in the Fails column as 1/3 or 2/3? Does
: the Fails column show the correct number of tries?
24x7 is not running as a service. The Fails column is empty for all jobs. Is this a temporary entry? If so, I'll try to monitor it more closely. Sometimes the job is added maybe 10 times immediately to the job queue, sometimes closer to 100!

Thu Mar 21, 2002 5:25 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

: I have two jobs that are quite chronic about this. Even with the telnet
: timeout set to 60, they will still occasionally error with 'Login failed
: or host not found'.

How long does it take, in the worst-case scenario, to connect to your host from a standard terminal program?

: It then sends an error notification eMail. One runs
: every ten minutes, the other every 15.
: 24x7 is not running as a service. The Fails column is empty for all jobs. Is
: this a temporary entry?

Yes, sort of. It is updated after every job start. For failed jobs it should show something like x/y, where "x" is the number of failed tries and "y" is the total number of tries as specified in the job properties.
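
For readers following along, the retry behavior one would expect from these settings can be sketched in a few lines of Python. This is not 24x7 code: run_with_retries and its parameters are hypothetical names, and the sketch assumes that "y" equals the configured retries plus the initial run, which matches the 2 retries and the 1/3 value reported in this thread.

```python
import time


def run_with_retries(job, retries=2, wait_seconds=60, sleep=time.sleep):
    """Run job() once, then retry up to `retries` times after a failure.

    Returns (succeeded, fails_history), where fails_history holds the
    "x/y" string a Fails column would show after each failed try.
    """
    total_tries = retries + 1  # assumption: the first run plus the retries
    fails_history = []
    for attempt in range(1, total_tries + 1):
        try:
            job()
            return True, fails_history
        except Exception:
            fails_history.append(f"{attempt}/{total_tries}")
            if attempt < total_tries:
                sleep(wait_seconds)  # wait before the single next try
    return False, fails_history
```

Under this model a job that fails once and then succeeds runs exactly twice and shows 1/3 in between, and a job that never succeeds runs three times in total. Each failure leads to one further try at most, so dozens of queued copies would not be expected from the retry settings alone.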

: If so, I'll try to monitor it more closely.
: Sometimes the job is added maybe 10 times immediately to the job queue,
: sometimes closer to 100!

That's weird. I will investigate what could cause such an effect.

Thu Mar 21, 2002 6:23 pm

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: How long does it take, in the worst-case scenario, to connect to your host
: from a standard terminal program?
We're running 100BASE-T; it usually takes no more than 5 seconds, and that's extreme.

: Yes, sort of.. it is updated after every job start, for failed jobs it should
: show something like x/y where "x" is the number of failed tries,
: and "y" is the total number of tries as specified in the job
: property.
There are entries under Fails that say 1/3 now.

: That's weird. I will investigate what can cause such effect.
Thanks! Please advise directions and I'll send you a copy of the log file.

Fri Mar 22, 2002 12:10 pm

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: That's weird. I will investigate what can cause such effect.
New observation!
It appears that while the job is running on its first 'retry', the system pumps continuous sequential repeats of the job into the assigned job queue. This explains why some jobs create only a few duplicates while others create well over 100. I just got to observe one of the longer jobs do this. As long as the job was running, jobs were being added as fast as the system could generate them. As soon as the job finished, no more jobs were added to the job queue. All this time, the Fails column showed 1/3.

???

Fri Mar 22, 2002 1:11 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

This might be the key!!!

I think you should disable the job retry option for now. I'm still looking into this and the Telnet timeout issue to see if they are related. Currently it looks like a bug. If that is the case, we will provide you with an update that should fix the problem.

Fri Mar 22, 2002 1:43 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

I was unable to reproduce your problem. Please make sure these jobs really have 60 seconds for the retry interval. You can check it on the job's Properties page or print a job properties report using the File/Print/Job Properties menu.

I still think that you have something wrong with the job setup; maybe some properties internally have invalid values that you cannot see on the screen. If these jobs were converted from 24x7 Scheduler version 2, try recreating them: disable the current jobs that spawn many instances to the queue while waiting for a new retry, create new jobs, and copy the scripts and settings from the disabled jobs. Please don't use the "Copy properties of an existing job" blue link.

Please let me know if that helps.

Mon Mar 25, 2002 10:31 am

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: I was unable to reproduce your problem. Please make sure these jobs really
: have 60 seconds for the retry interval. You can check it on the job's
: Properties page or print a job properties report using the File/Print/Job
: Properties menu.
60 seconds is confirmed!

: I still think that you have something wrong with the job setup; maybe some
: properties internally have invalid values that you cannot see on the
: screen. If these jobs were converted from 24x7 Scheduler version 2, try
: recreating them: disable current jobs that spawn many instances to the
: queue while waiting for a new retry; create new jobs and copy scripts and
: settings from the disabled jobs; please don't use "Copy properties of
: an existing job" blue link.
The exact same problem continues to occur.

Tue Mar 26, 2002 4:56 pm

SysOp
Site Admin
Joined: 26 Nov 2006
Posts: 7948

But what about the number of retries? Does it appear in the Job Properties Wizard as 3 or as a blank?

As for the Telnet connection time, which protocol and port do you use in your interactive Telnet program?

How long does it take to connect using the W2K command-line Telnet client (run the "telnet" command from the DOS prompt)?

Wed Mar 27, 2002 9:08 am

Neil Carter
Joined: 19 Mar 2002
Posts: 26

: But what about the number of retries? Does it appear in the Job Properties
: Wizard as 3 or as a blank?
The Job Properties Wizard shows the number of retries as '2'.

: As for the Telnet connection time, which protocol and port do you use in your
: interactive Telnet program?
TCP/IP is the protocol, port 25 I believe, whatever the default is.

: How long does it take to connect using the W2K command-line Telnet client
: (run the "telnet" command from the DOS prompt)?
Roughly 5 seconds.
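
A single manual connection may not catch the worst case, so one option is to time repeated connections from the scheduler PC. The Python sketch below is only an illustration: the function names are made up, it times the TCP connection alone rather than the full Telnet login, and it assumes the standard Telnet port 23 as the default (port 25 is normally SMTP, so the port in use is worth confirming).

```python
import socket
import time


def measure_connect_seconds(host, port=23, timeout_seconds=30.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return time.monotonic() - started
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return None


def worst_case_connect_seconds(host, port=23, attempts=5, timeout_seconds=30.0):
    """Connect `attempts` times; return (slowest time or None, failure count)."""
    samples = [
        measure_connect_seconds(host, port, timeout_seconds)
        for _ in range(attempts)
    ]
    succeeded = [s for s in samples if s is not None]
    failures = len(samples) - len(succeeded)
    return (max(succeeded) if succeeded else None), failures
```

Calling worst_case_connect_seconds with the server's IP address and a larger attempts value over a busy period would show whether connection time ever approaches the configured timeout, or whether attempts fail outright.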

Mon Apr 01, 2002 3:53 pm

All times are GMT - 4 Hours