PRNG

January 28, 2019

Introduction

Gossip protocol is a communication protocol in which you multicast messages (i.e. send a message to ALL nodes in the system). Inspired by Human Gossip and Epidemic theory, gossip protocol has been known to be efficient (O(logN) rounds to reach all nodes) and does not require a central server.

Trying to squash a rumor is like trying to unring a bell. -Shana Alexander

Applications of Gossip are Database replication, spread server state, failure detection etc. Cassandra uses gossip to exchange node meta information. DynamoDB uses gossip for distributed failure detection and membership protocol.

Strengths of Gossip

Scalability - O(logN) rounds are required to reach all nodes. Also each node sends a fixed number of fixed size messages irrespective of the size of the network. A node does not wait for acknowledement and does not take corrective action if some other node fails (almost stateless, so can scale easily).

Fault Tolerant - Can operate in systems with unstable connectivity (as a node shares the same information several times to different nodes, so a message takes different routes across a different set of nodes, high redundancy).

Robust - As no node is more important than any other node, failure of one node will not lead to the system failing to send / receive messages. Membership is fluid (anyone can join and leave the system at any point).

Quick Eventual Consistency - As only O(log(N)) rounds are required nodes can converge quickly to a globally consistent state.

Decentralized - no central server required at all.

Weak Points

Network Congestion - High redundancy leads to higher bandwidth requirement which can cause network congestion.

Byzantine fault tolerance - This protocol does not assume malfunctioning or malicious nodes, and robustness cannot be assumed in this scenario. For example Amazon s3 had an outage in July 2008 caused due to a single bit flip which still resulted in an intelligible message, and the gossip protocol for message sharing shared the corrupt message across all nodes, causing incorrect system state info to propogate. Amazon had to restart everything after fixing the states manually on disk.

References

Introduction to Gossip