Jump Consistant Hash

Consistent Hash

Monotonicity

The monotonicity means when we add a bucket, we need to remap all the keys. For s specify key is can only located in the origin bucket or the new bucket. It cannot reassign to another bucket.

Incorrect

Bucket1

Bucket2

Bucket3

Key1

Key2

Key3

Key4

Key5

Key6

Key7

Key8

Key9

Key10

Key11

Key12

Bucket1

Bucket2

Bucket3

Bucket4

Key2

Key3

Key1

Key5

Key6

Key9

Key10

Key4

Key7

Key8

Key11

The above example does not meet the consistency. Because when we added a new bucket Bucket4 the Key1 is moved to Bucket2, this is incorrect.

Correct

Example1

Bucket1

Bucket2

Bucket3

Key1

Key2

Key3

Key4

Key5

Key6

Key7

Key8

Key9

Key10

Key11

Key12

Bucket1

Bucket2

Bucket3

Bucket4

Key1

Key2

Key3

Key5

Key6

Key7

Key9

Key10

Key4

Key8

Key11

Key12

The above example meets the monotonicity. Because when we added a new bucket Bucket4, all the keys are either in the old bucket or the new bucket. But they did not meet the Balance.

Balance

The balance means the data can be distribute among all node with same probability

Incorrect

Bucket1

Bucket2

Bucket3

Key1

Key2

Key3

Key4

Key5

Key6

Key7

Key8

Key9

Key10

Key11

Key12

Bucket1

Bucket2

Bucket3

Bucket4

Key1

Key2

Key5

Key6

Key9

Key10

Key3

Key4

Key7

Key8

Key11

Key12

The above example meets the monotonicity but does not meet the balance.

Correct

Bucket1

Bucket2

Bucket3

Key1

Key2

Key3

Key4

Key5

Key6

Key7

Key8

Key9

Key10

Key11

Key12

Bucket1

Bucket2

Bucket3

Bucket4

Key1

Key2

Key3

Key7

Key5

Key6

Key11

Key10

Key9

Key4

Key8

Key12

This is example meets the balance and also the monotonicity

Hash Ring

Initialize Solution

Pros

1 Easy to understand

2 We can remove any bucket if want at any position

Cons

1 The distribution of data is not uniform in most case

2 If we remove a bucket all the data of current bucket will move to next bucket, In the worst it may led system avalanche.

3 The time complexity of find a bucket is O(n)

4 We need to store all the bucket info in our memory, so space complexity is also o(n)

Optimization

Pros

1 The distribution of data is more uniform compared to first one.

2 We can delete any bucket at any position.

3 We can give different weight to different bucket.

Cons

1 The time complexity of find a bucket is O(k*n)

2 We need to store all the bucket info in our memory, so space complexity is also o(k*n)

Notice : k is the virtual bucket count n the bucket size

Jump Consistent Hash

Initialize Solution

Meets Monotonicity

This algorithm meets the monotonicity

int ch(int key, int num_buckets)

{

random.seed(key);

int b = 0;

for (int i = 1; i < num_buckets; i++)

{

if (random.nextInt() %2 == 1) b = i;

}

return b;

}

Code analysis

The key is that we use the key as random seed , if you give the same key the random value list should always be the same.

Example:

Random list	3 buckets	4 buckets
3 4 1 6	Final index : 2	Final index : 2
3 4 1 5	Final index : 2	Final index : 3

From above analysis we can notice that when we added a bucket. For a specific key the index is either located in the old bucket or new bucket it can not move to emailother buckets that already exists in the buckets list. So this algorithm meets the monotonicity

Probability Analysis

Suppose we have 3 buckets in total the probability of save to each node is.

				Probability
Assign to 0	1 (odd or even)	½ (even)	½ (even)	1 * ½ * ½ = ¼
Assign to 1	1 (odd or even)	½ (odd)	½ (even)	1 * ½ * ½ = ¼
Assign to 2	1 (odd or even)	1 (odd or even)	½ (odd)	1 * 1 * ½ = ½

Form the chart we can see that the probability of save to each node is not same for a specificity key.

Complexity

Time: O(n) Space: O(1)

Meets both Monotonicity and Balance

In the above solution the probability of moving data from one bucket to next bucket is ½, so it does not meet the balance, because the probability of data distribution is not 1/n for n buckets. Here is an solution to optimized last algorithm

int ch(int key, int n)

{

random.seed(key);

int b = 0;

for (int i = 1; i < n ; i++)

{

if (random.nextDouble() < 1.0 / (i + 1)) b = i;

}

return b;

}

Probability Analysis

Suppose we have 4 buckets in total, The probability of save to each bucket are:

					Probability
Assign to 0	1(save to 0)	½ (not to 1)	(not to 2)	¾ (not to 3)	1 * ½ * * ¾ = ¼
Assign to 1	1(save or not)	½ (save to 1)	(not to 2)	¾ (not to 3)	1 * ½ * * ¾ = ¼
Assign to 2	1(save or not)	1 (save or not 1)	(save to 2)	¾ (not to 3)	1 * 1 * * ¾ = ¼
Assign to 3	1(save or not)	1(save or not)	1(save or not)	¼ (save to 3)	1 * 1 * 1 * ¼ = ¼

The above method has already met the monotonicity and balance

Complexity

Time: O(n), Space O(1)

Pros and Cons

Pros

1 The space complexity is O(1)

2 The time complexity is O(n)

Cons

1 The bucket list must have order

2 We can only delete bucket from the end of bucket list

3 All the bucket should be equal they cannot have different weight

Optimization

Meets Monotonicity

The last algorithm can both meet the monotonicity and balance but the time complexity of find an index is o(n), here we are going to optimized last algorithm from o(n) to o(logn). Because in most case ch(int key, int n) is keep unchanged( ), We can random generate a sequence to jump such as we generate a list:3,5,2 we will reassign the data in these indexes, so we can jump to these indexes directly.

Anatomy

We put all the data to bucket 0, and generate a random num. When we add bucket if bucket number is greater than current num keep the key to origin bucket otherwise assign the key to new bucket.

int ch(int key, int n)

{

random.seed(key);

int id = 0, fence = 0;

while (fence < n)

{

id = fence;

fence = fence + rand();

}

return id;

}

Code analysis

Since we use key as random seed the random sequence will always be same for a same key. Suppose the random sequence list is : 3, 5, 2

Number of buckets	Final Index	Analysis
1	0	id=0, n=1, fence=0 id=0, n=1, fence=3 Return 0
2	0	ditto
3	0	ditto
4	3	id=0, n=4, fence=0 id=0, n=4, fence=3 Id=3, n=4, fence=8 Return 3
5	3	ditto
6	3	ditto
7	3	ditto
8	3	ditto
9	8	id=0, n=9, fence=0 id=0, n=9, fence=3 id=3, n=9, fence=8 id=8, n=9, fence=10 Return 8

Complexity

Time: O(logn), Space O(1)

Probability Analysis

The fence is randomly chosen so the probability of save a specific key to one bucket is not 1/n. Thus, this method meets the Monotonicity but does not meet the balance.

Pros and Cons

Pros

1 The space complexity is O(1)

2 The time complexity is O(logn)

Cons

1 The bucket list must have order

2 We can only delete bucket from the end of bucket list

3 All the bucket should be equal they cannot have different weight

Meets both Monotonicity and Balance

Suppose current jump index is: b, then we can get

ch(key, b) ch(key, b+1)

ch(key, b+1)=b

Next jump index is: j , the we can get

ch(key, j) ch(key,j+1)

ch(key, j+1) j

ch(key, j+1) ch(key, b+1)

it mens:

ch(key, j) = ch(key, b+1)

Since the probability of move from n bucket to n+1 bucket is ,

For example, P(ch(key, 10)= ch(key,11))= , P(ch(key, 11)= ch(key,12))= P(ch(key, 10)= ch(key,12))=

More general ch(key, n) = ch(key, m) is

For i≥b P(ch(key,i)=ch(key,b+1)) =

It means: r≤ ==> i≤

Int ch(int key, int n)

{

random.seed(key);

int id = -1;

int fence = 0;

while (fence < n)

{

id = fence;

fence = floor( (id+1) / randDouble())

}

return id;

}

Code analysis

Since we use key as random seed the random sequence will always be same for a same key. Suppose the random sequence list is : 0.5, 0.3, 0.8, 0.1

Number of buckets	Final Index	Analysis
1	0	id=-1, n=1, fence=0, id=0, n=1, fence=floor(1/0.5)=5 Return 0
2	0	ditto
3	0	ditto
4	3	ditto
5	3	ditto
6	3	id=-1, n=6, fence=0, id=0, n=6, fence=floor(1/0.5)=5 id=5, n=6, fence=floor(6/0.3)=20 Return 5
7	3	ditto
8	3	ditto
9	8	ditto

Probability Analysis

Thus this algorithm both meet the Monotonicity and Balance.

Pros and Cons

Pros

1 The space complexity is O(1)

2 The time complexity is O(logn)

Cons

1 The bucket list must have order

2 We can only delete bucket from the end of bucket list

3 All the bucket should be equal they cannot have different weight

Category：SystemDesign

Labels： Other

Jump Consistant Hash

Consistent Hash

Monotonicity

Incorrect

Correct

Example1

Balance

Incorrect

Correct

Hash Ring

Initialize Solution

Optimization

Jump Consistent Hash

Initialize Solution

Meets Monotonicity

Code analysis

Probability Analysis

Complexity

Meets both Monotonicity and Balance

Probability Analysis

Complexity

Pros and Cons

Optimization

Meets Monotonicity

Anatomy

Code analysis

Complexity

Probability Analysis

Pros and Cons

Meets both Monotonicity and Balance

Code analysis

Probability Analysis

Pros and Cons

本文目录

Labels

SiteInfo