SE401:Group33:ElectionsAndGroupCommunication

From Marks Wiki
Revision as of 05:21, 3 November 2008 by Mark (Sọ̀rọ̀ | contribs) (14 revision(s))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Home

14/06/08

We will be using an algorithm developed by A. Amintabar, A. Kostin, and L. Ilushechkinaand, and found here "A Leader Election Protocol for Timed Asynchronous Distributed Systems", Computer and Information Sciences - ISCIS 2006, pages 877-886. 2006. Found on the web via SpringerLink... Also: Simulation of a novel leader election protocol with the use of Petri nets, in IEEEX

The algorithm is based on reliable IP multicast, which is not supported on Amazon EC2. Therefore we will use JGroups ([1]) to implement this reliable multicast.

JGroups runs over the top of existing network protocols such as IP and TCP. We will use TCP, and modify the JGroups protocol stack so that multicast is implemented as multiple TCP unicasts.

The way JGroups implements multicast is that a unicast message is sent to every member of a 'group'. Group membership is normally determined by doing an IP multicast, but this is not available in our case. Therefore we will use the well-known hosts store the current group membership, and new nodes added to the group can query these for current group membership. This will be implemented using elastic IP addresses, 1 bound to the controlling instance and one to another 'random' instance. These instances will determine current group membership by web service calls to AWS to retrieve the internalDNS names of all the instances being run by us, that are currently in the running state. The elastic IP(s) of well-known hosts will be given to every new instance on start-up, so it knows where to query. We could have an arbitrary number of well known hosts, or even all hosts as well known hosts. This method is called TCP with TCPPING in the JGroups manual.

15/06/08

Looks like there is a problem with the above, in that it is based on a timed asynchronous distributed model which I didn't think much of in the first reading through, but I now think that the EC2 environment does not fit this model and so the algorithm may not be suitable... Doing further research to try and define the timed asynchronous distributed model: "The Timed Asynchronous Distributed System Model",Flaviu Cristian, Fellow, IEEE, and Christof Fetzer. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 6, JUNE 1999. Defined here.

In "A Leader Election Protocol for Timed Asynchronous Distributed Systems", some other work is referenced that is based on reliable multicast rather than unicast. Such as:

Donahoo, M.J., Ainapure, S.R.: Scalable multicast representative member selection. Proc. IEEE Infocom, March (2001) 259-268

Not sure why they referenced this, it has nothing to do with leader election at all

Cidon, I., Mokryn, O.: Propagation and Leader Election in a Multihop Broadcast Environment. DISC98, Distributed Computing, 12th Inter. Symp., Greece (1998) 104-118

This model is valid for a broadcast environment. The assumptions of this model are that all receivers receive the message, in some bounded time, and ordering is preserved. In addition there are no link or node failures or additions, making this not applicable to us (or anything IMO...)

Cristian, F., Fetzer C.: The timed asynchronous distributed system model, IEEE Transactions on Parallel and Distributed Systems, vol. 10, Iss 6, June (1999) 642 – 657

as referenced above. Here it is stated that deterministic leader election is achievable in this model. An example of this is (apparently) given in C. Fetzer and F. Cristian, “A Highly Available Local Leader Service,” IEEE Trans. Software Eng., to appear in 1999. -> In this paper it is stated that the model is applicable to a network of workstations conected via a LAN or WAN

References [4], [10], [14], [6], [13] from this paper cover leader election and other algorithms in this model. Most by the same author(s).


The Timed Asynchronous Distributed System Model

The timed asynchronous distributed system model (or, for short, the timed model), which we define formally in this paper, assumes that

  • 1) all services are timed: Their specification prescribes not only the outputs and state transitions that should occur in response to inputs, but also the time intervals within which a client can expect these outputs and transitions to occur
  • 2) interprocess communication is via an unreliable datagram service with omission/performance failure semantics: The only failures that messages can suffer are omission (message is dropped) and performance failures (message is delivered late),
  • 3) processes have crash/performance failure semantics: The only failures a process can suffer are crash and performance

failures,

  • 4) processes have access to hardware clocks that proceed within a linear envelope of real-time,
  • 5) no bound exists on the frequency of communication and process failures that can occur in a system. We feel this model adequately describes existing distributed systems built from networked workstations. In contrast with the time-free model, the timed model allows practically needed distributed services such as clock synchronization, membership, consensus, election, and atomic broadcast to be implemented [4], [10], [14], [6], [13]

16/06/08

Papers to look at to decide on election algorithm:

  • A leader election protocol for timed asynchronous distributed systems
  • The timed asynchronous distributed system model
  • a highly available local leader election service

58-start 1:57 – Req start 3:08 – Start runnig