Scheduling when a job is retried

We're trying to use lock files (handled outside 24x7) to ensure that
jobs do not conflict: jobs are executed via a "lockfile" command that
attempts to create a lockfile (duh). If the lockfile is successfully created,
the actual job is then started. The actual command used looks something like
this:

lockfile c:\tmp\test.lck cmd /c pause

The lockfile command returns either 1 (meaning that it was unable to create
the lockfile) or 0 (meaning that the lockfile was created, the command
executed and the lockfile finally removed).
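
For readers who don't have such a tool, here is a minimal sketch of what a wrapper with this behaviour could look like. It is not the actual "lockfile" utility used above (its implementation isn't shown in this thread). The function name run_with_lockfile is hypothetical, and ignoring the wrapped command's own exit code is an assumption made to match the 0/1 description.

```python
import os
import subprocess


def run_with_lockfile(lock_path, command):
    """Run command only if lock_path can be created exclusively.

    Returns 1 if the lock file could not be created because it already
    exists; otherwise returns 0 once the command has run and the lock
    file has been removed.
    """
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so "check" and "create" happen as a single step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return 1
    os.close(fd)
    try:
        # Assumption: the command's own exit status is ignored.
        subprocess.run(command)
    finally:
        # Always release the lock, even if starting the command failed.
        os.remove(lock_path)
    return 0
```

Something like run_with_lockfile(r"c:\tmp\test.lck", ["cmd", "/c", "pause"]) would then roughly correspond to the command line above.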

We set the parameter RETRY_INTERVAL to 300 (seconds).

What I expect to see is that 24x7 tries to execute other jobs between retries,
but this does not appear to be the case.

We have considered using 24x7 to create the lockfiles (semaphore files), but
this is significantly more cumbersome to set up.

Yes, your assumption is correct: if the job is set to retry every 300 seconds, it will not hold up other jobs between retries. Something else is causing that effect. Take a look at how the other jobs are set up. Are they looking for the same file?

By the way, if the purpose of the locking is to have only one job instance running at a time, simply assign all jobs to the same queue and set them to run synchronously.
If the purpose is to have no more than 1 instance of each job, you can use the simple RunOnce utility, which is available in the same archive (see link in the previous message).

The purpose of the locking is to ensure that two or more jobs will not try to
update a shared resource at the same time... We have about 170 jobs at the
moment, with around 60 lock files protecting the shared resources.

Note that this locking also prevents each job from conflicting with itself
(which could otherwise be a problem with long-running processes, although I
suspect that 24x7 handles this by itself).

We are using 4 queues, with synchronous execution, to ensure that no more than
4 jobs are run in parallel (exactly 4 jobs, hopefully :-)
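
As a rough model of that setup (not 24x7's actual implementation, which this thread doesn't describe), each queue can be pictured as one worker that runs its jobs strictly one after another. The function name run_queues and the use of threads are assumptions for illustration only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_queues(queues):
    """Run each queue's jobs in order, using one worker thread per queue.

    Returns the highest number of jobs that were running at the same time,
    which can never exceed the number of queues.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            job()
        finally:
            with lock:
                running -= 1

    def drain(jobs):
        # "Synchronous" here means: the next job in a queue starts only
        # after the previous one in the same queue has finished.
        for job in jobs:
            tracked(job)

    with ThreadPoolExecutor(max_workers=len(queues)) as pool:
        list(pool.map(drain, queues))
    return peak
```

With 4 queues this gives an upper bound of 4 parallel jobs. Whether all 4 slots are actually busy at any moment depends on how the jobs are spread across the queues.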

My problem may actually be that I have set DELAY to 1200, on the understanding
that this is 1200 minutes, like the manual says. I have a suspicion that the
manual is actually wrong on this point, and that the unit for DELAY is seconds,
not minutes...

It is not the DELAY parameter. Please use RETRY_INTERVAL for this purpose.

I'm setting both DELAY and RETRY_INTERVAL... What I meant was that, if I had
the units for DELAY wrong, the job might "expire" earlier than I thought.

Just to be clear: DELAY for a late job is specified in minutes, and RETRY_INTERVAL for a failed job is specified in seconds.
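
To put numbers on the difference, here is a small worked example using the values from this thread (DELAY 1200, RETRY_INTERVAL 300). The retry counts assume that retries simply continue at a fixed interval until the DELAY window closes. That is an assumption for illustration, not something confirmed in this thread.

```python
from datetime import timedelta

DELAY = 1200          # minutes, according to the reply above
RETRY_INTERVAL = 300  # seconds

delay_window = timedelta(minutes=DELAY)
retry_gap = timedelta(seconds=RETRY_INTERVAL)

# What the window would be if DELAY were mistakenly read as seconds.
misread_window = timedelta(seconds=DELAY)

print(delay_window)                 # 20:00:00
print(misread_window)               # 0:20:00
print(delay_window // retry_gap)    # 240 retry intervals fit in 20 hours
print(misread_window // retry_gap)  # 4 retry intervals fit in 20 minutes
```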
