SoftTree Technologies
Technical Support Forums
fail-over mode with many servers

 
Keith Robichaud



Joined: 18 Nov 2005
Posts: 4

fail-over mode with many servers

We're still evaluating 24x7 and would like some more information about the
fail-over mode. Could you explain how it would work in the following example?

Say we set up 24x7 at three sites and have three schedulers:

Site A - master scheduler
Site B - standby scheduler
Site C - standby scheduler

If the network connection between site A and site B goes down, but B to C and
C to A are still up, what happens?

B can't communicate with the master A, so does B become the master? If B can't
communicate with A then presumably A would remain a master too (or would the
mastership change get routed from B to A via C)? So would jobs then get
submitted to C twice? I'd guess that a computer will only bind to one master
scheduler at a time so that jobs can't be started twice.

I read in a previous thread about a standby scheduler going into stand-alone
or auto-pilot mode when the master scheduler goes down. What does that mean?
Does it continue starting scheduled tasks locally only, or does it still try to
start tasks on remote computers too?

Regarding configuration, it looks like on the master scheduler you can only
configure one standby scheduler. To specify more standby schedulers do you
simply configure all the standby schedulers to point to the master?

Keith Robichaud

Tue Feb 07, 2006 10:01 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7955

Re: fail-over mode with many servers

Here is how it works. Every 3 minutes (or as configured in the settings) the standby scheduler attempts to connect to the master. If the connection succeeds, it downloads all updated job definitions. If the connection fails, it waits a minute and tries again. After several consecutive failed attempts (by default, 3 attempts after the first failure) the standby scheduler assumes that the master site is down and switches into standalone scheduler mode or master mode, as specified in the settings.
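
For readers who want to see the timing spelled out, here is a rough sketch in Python of the check-and-retry logic described above. It is not 24x7's actual code: the class, field, and method names are invented for illustration, and the defaults are simply the figures quoted in this post (3-minute check, 1-minute wait, 3 retries after the first failure).

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical model of a standby scheduler watching its master."""
    poll_interval_min: int = 3             # normal interval between checks
    retry_wait_min: int = 1                # wait after a failed attempt
    retries_after_first_failure: int = 3   # default quoted in the post
    takeover_mode: str = "master"          # or "standalone", per the settings
    mode: str = "standby"
    failures: int = 0

    def on_attempt(self, connected: bool) -> int:
        """Record one connection attempt.

        Returns the minutes to wait before the next attempt, or 0 once the
        standby has taken over (an assumption: no further polling is modeled).
        """
        if self.mode != "standby":
            return 0
        if connected:
            # Success: job definitions would be downloaded here.
            self.failures = 0
            return self.poll_interval_min
        self.failures += 1
        # One initial failure plus the configured retries, all consecutive.
        if self.failures > self.retries_after_first_failure:
            self.mode = self.takeover_mode
            return 0
        return self.retry_wait_min
```

With the defaults, the takeover happens on the fourth consecutive failed attempt, and any successful connection in between resets the count.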

So if the connection between A and B goes down, B will start running jobs on its own. In the meantime C will see A running and will remain in standby mode. As you can see, in this scenario it does not make sense to have 2 standby schedulers.
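
The three-site scenario can be sketched the same way. This is only an illustrative model, assuming each standby decides purely from its own direct link to the master (nothing is routed through a third site) and that the configured takeover mode is "master"; the function name and data layout are made up.

```python
def modes_after_outage(master, standbys, broken_links, takeover_mode="master"):
    """Return the mode each standby ends up in once its retries are exhausted.

    broken_links is a set of frozenset pairs naming the sites whose direct
    connection is down. Each standby only checks its own link to the master.
    """
    modes = {}
    for standby in standbys:
        link = frozenset((standby, master))
        modes[standby] = takeover_mode if link in broken_links else "standby"
    return modes


# The A-B link is down; B-C and C-A are still up.
outage = {frozenset(("A", "B"))}
modes = modes_after_outage("A", ["B", "C"], outage)
print(modes)
```

In this model A never learns that B has promoted itself, so A and B are both active at the same time.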

Regarding your second question: where jobs run, locally or remotely, is defined in the job settings. When the standby takes over, it continues running jobs according to their settings. So if a job has "local" as the host name in its settings, it will run on the standby computer; if it has "agent E" and there is a remote agent profile defined for agent E, then the standby will run this job on agent E.
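
A small sketch of that host resolution, again in Python and again hypothetical: the function name, the profile dictionary, and the example addresses are assumptions, and raising an error when no profile exists is a guess, since the post does not say what 24x7 does in that case.

```python
def resolve_host(job_host, running_on, agent_profiles):
    """Decide where a job runs, given the scheduler that is executing it.

    job_host:       host name from the job settings ("local" or an agent name)
    running_on:     the machine the active scheduler is running on
    agent_profiles: agent name -> address, as known to that scheduler
    """
    if job_host.lower() == "local":
        # "local" means wherever the active scheduler is, so after a
        # takeover this is the standby machine, not the old master.
        return running_on
    if job_host in agent_profiles:
        return agent_profiles[job_host]
    # Assumption: behavior without a matching profile is not described.
    raise LookupError(f"no remote agent profile defined for {job_host!r}")
```

For example, `resolve_host("local", "site-b", {})` gives `"site-b"` when B has taken over, while a job set to "agent E" resolves to whatever address the agent E profile holds.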

Tue Feb 07, 2006 10:14 am
Keith Robichaud



Joined: 18 Nov 2005
Posts: 4

Re: fail-over mode with many servers

Thank you for the explanation. However, I'm still not sure what will happen
for a job scheduled to run on C. We now have A and B running as masters, and
both can communicate with C, so will both A and B try to run the job on C?
Or will C only accept jobs from the scheduler that it regards as the master
and reject other requests?

If a job is configured as local on A will it become "agent A" on B, when B
becomes a master, or will B try to run it locally?

The reason I'm asking these questions is that before we purchase 24x7 (and I
hope we do!) my colleagues want assurance that if there are network problems
the jobs scheduled for each computer will still get run, i.e. they're
concerned that a single central scheduler will not be resilient enough against
network problems (we have sites in the USA and in Europe and will want to use
the scheduler to schedule jobs across all sites, and occasionally there is a
network outage between various sites). They therefore asked if we could make
every computer that needs to run some tasks a standby scheduler. Would that be
the best solution?

Keith Robichaud

Tue Feb 07, 2006 11:11 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7955

Re: fail-over mode with many servers

I hope you are not confusing remote agents and standby schedulers, are you? You first indicated that C is a standby; now you are talking about running a job on C. It cannot be both at the same time: either C is used as a standby or it is not. If C is an agent, you have a job set up to run on C, and the network between A and B goes down, both A and B will attempt to run the C job.
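
To make the double run concrete, here is a hypothetical Python sketch that counts how many times each job gets dispatched when more than one scheduler is active. It assumes, as the post implies, that an agent runs whatever it is sent and does not check which scheduler is the master; the function and job names are invented.

```python
from collections import Counter


def runs_per_job(active_schedulers, links_up, jobs):
    """Count dispatches of each (job, agent) pair across all active schedulers.

    links_up is a set of frozenset pairs for connections that are working;
    jobs is a list of (job_name, agent) tuples shared by every scheduler.
    """
    runs = Counter()
    for scheduler in active_schedulers:
        for job_name, agent in jobs:
            if frozenset((scheduler, agent)) in links_up:
                runs[(job_name, agent)] += 1
    return runs


# A and B are both active; each can still reach agent C.
links = {frozenset(("A", "C")), frozenset(("B", "C"))}
runs = runs_per_job(["A", "B"], links, [("nightly_report", "C")])
print(runs[("nightly_report", "C")])
```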

Anyway, the purpose of fail-over mode is not to protect against network failures but to allow jobs to keep running when the primary site (the master scheduler system) goes down. If you are concerned about network issues and want protection from network failures, you should consider setting up fail-over for your network servers, not for applications and utility systems.

Tue Feb 07, 2006 11:24 am
Keith Robichaud



Joined: 18 Nov 2005
Posts: 4

Re: fail-over mode with many servers

: I hope you are not confusing remote agents and standby schedulers, are you?
: You first indicated that C is a standby, now you are talking about running
: a job on C. This cannot be done at the same time. Either C is used as a
: standby or not. If C is an agent and you have a job set up to run on C and
: the network between A and B goes down both A and B will attempt to
: run the C job.

I didn't think I was getting confused, but I might have been! I was
thinking of the situation where the scheduler ran a task on C and C also happened
to be a standby scheduler (I thought that was possible). Are you saying that's
not a good idea and that a scheduler should only schedule tasks on non-scheduler
computers (master or standby) if you have standby schedulers?

: Anyway, the purpose of using fail-over mode is not to protect from network
: failures but to allow continuous job running when primary site (master
: scheduler system) goes down. If you are concerned about network issues and
: want to be protected from network failures you should consider setting up
: fail-overs for your network servers, not applications and utility systems.

The documentation says:

The 24x7 Scheduler can run on two or more machines simultaneously, eliminating
a single point-of-failure. However, only one scheduler at a time can serve as
the Master Scheduler. All other schedulers run in Standby mode. When "Fail-over
Mode" is activated, all information written to the Master Scheduler's job
database is mirrored in the Standby Scheduler's database using sophisticated
synchronization techniques. If the Master component becomes unavailable, the
24x7 Scheduler will perform an unattended rollover to the first Standby
Scheduler to respond. This Standby Scheduler then becomes the new Master
Scheduler. This architecture ensures that jobs will run on time in the event
of a machine failure and that jobs will continue processing without
interruption.

What it says about "the first Standby Scheduler to respond" made me think
that a scheduler changing from standby to master would be communicated to the
other standby schedulers, so that the other standby schedulers would all now
try to connect to the new master for downloading updated job definitions. From
what you say that is not the case.

The documentation also says:

Optionally you can setup both Master and Standby Schedulers to run on the same
machine. This will ensure continued processing should the Master Scheduler fail.

but I don't think I'll go there!

Keith Robichaud

Tue Feb 07, 2006 12:46 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7955

Re: fail-over mode with many servers

A standby scheduler runs in a special idle mode in which it can only monitor the master scheduler and synchronize job definitions and states. It cannot be used for anything else. In theory you can have several standby schedulers configured for the same master, but I cannot imagine why anyone would want to do that.

Tue Feb 07, 2006 4:50 pm
Chuck Bennard



Joined: 14 Feb 2006
Posts: 1

Re: fail-over mode with many servers

The concern Keith was asked to look into is behavior over a global WAN,
where network connectivity cannot be assured. Reading some other sources
on network schedulers indicates most of them use a master-slave setup (as you do)
rather than a distributed mechanism - which I think means replicating the job
data to distributed schedulers.

Tue Feb 14, 2006 4:03 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 7955

Re: fail-over mode with many servers

You can do it too. In that case don't use the scheduler's fail-over mode. Replace it with a simple process that periodically copies all files from the scheduler's home directory and subdirectories to your standby server. In addition you can also sync the registry settings: regedt32.exe can be run on the master with command line options to export the registry settings to a file, and then another run on the standby can load them back into the registry. As you can imagine, that replication job can also be scheduled with 24x7.
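
As a starting point, the file-copy half of that replication could look something like the Python sketch below. The function name and the directory names in the demo are placeholders, not real 24x7 paths, and the registry export/import step is left out because the relevant keys are not given here. It relies on `shutil.copytree` with `dirs_exist_ok=True`, which needs Python 3.8 or later.

```python
import shutil
import tempfile
from pathlib import Path


def replicate_home(src_home: Path, dest_home: Path) -> int:
    """Copy src_home, including subdirectories, into dest_home.

    Existing files in dest_home are overwritten; files that exist only in
    dest_home are left alone. Returns the number of files under dest_home.
    """
    shutil.copytree(src_home, dest_home, dirs_exist_ok=True)  # Python 3.8+
    return sum(1 for path in dest_home.rglob("*") if path.is_file())


if __name__ == "__main__":
    # Demo with throwaway directories standing in for the two servers.
    with tempfile.TemporaryDirectory() as tmp:
        master_home = Path(tmp) / "master_home"
        standby_home = Path(tmp) / "standby_home"
        (master_home / "jobs").mkdir(parents=True)
        (master_home / "jobs" / "nightly.dat").write_text("job definition")
        print(replicate_home(master_home, standby_home))
```

In practice dest_home would be a share on the standby server, and the script would be run on a schedule. Note that this copy never deletes anything on the standby side, so files removed on the master will linger there.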

Tue Feb 14, 2006 4:47 pm
All times are GMT - 4 Hours