Jump Consistant Hash
Consistent Hash
Monotonicity
The monotonicity means when we add a bucket, we need to
remap all the keys. For s specify key is can only located in the origin bucket
or the new bucket. It cannot reassign to another bucket.
Incorrect
Bucket1 |
Bucket2 |
Bucket3 |
Key1 Key2 Key3 Key4 |
Key5 Key6 Key7 Key8 |
Key9 Key10 Key11 Key12 |
Bucket1 |
Bucket2 |
Bucket3 |
Bucket4 |
Key2 Key3 |
Key1 Key5 Key6 |
Key9 Key10 |
Key4 Key7 Key8 Key11 |
The above example does not meet the consistency. Because
when we added a new bucket Bucket4 the Key1 is moved to Bucket2, this is
incorrect.
Correct
Example1
Bucket1 |
Bucket2 |
Bucket3 |
Key1 Key2 Key3 Key4 |
Key5 Key6 Key7 Key8 |
Key9 Key10 Key11 Key12 |
Bucket1 |
Bucket2 |
Bucket3 |
Bucket4 |
Key1 Key2 Key3 |
Key5 Key6 Key7 |
Key9 Key10 |
Key4 Key8 Key11 Key12 |
The above example meets the monotonicity. Because when we
added a new bucket Bucket4, all the keys are either in the old bucket or the
new bucket. But they did not meet the Balance.
Balance
The balance means the data can be distribute among all node
with same probability
Incorrect
Bucket1 |
Bucket2 |
Bucket3 |
Key1 Key2 Key3 Key4 |
Key5 Key6 Key7 Key8 |
Key9 Key10 Key11 Key12 |
Bucket1 |
Bucket2 |
Bucket3 |
Bucket4 |
Key1 Key2 |
Key5 Key6 |
Key9 Key10 |
Key3 Key4 Key7 Key8 Key11 Key12 |
The above example meets the monotonicity but does not meet
the balance.
Correct
Bucket1 |
Bucket2 |
Bucket3 |
Key1 Key2 Key3 Key4 |
Key5 Key6 Key7 Key8 |
Key9 Key10 Key11 Key12 |
Bucket1 |
Bucket2 |
Bucket3 |
Bucket4 |
Key1 Key2 Key3 |
Key7 Key5 Key6 |
Key11 Key10 Key9 |
Key4 Key8 Key12 |
This is example meets the balance and also the monotonicity
Hash Ring
Initialize Solution
Pros |
1 Easy to understand 2 We can
remove any bucket if want at any position |
Cons |
1 The
distribution of data is not uniform in most case 2 If we
remove a bucket all the data of current bucket will move to next bucket, In
the worst it may led system avalanche. 3 The time complexity
of find a bucket is O(n) 4 We need to
store all the bucket info in our memory, so space complexity is also o(n) |
Optimization
Pros |
1 The
distribution of data is more uniform compared to first one. 2 We can delete
any bucket at any position. 3 We can give
different weight to different bucket. |
Cons |
1 The time
complexity of find a bucket is O(k*n) 2 We need to
store all the bucket info in our memory, so space complexity is also o(k*n) Notice : k is
the virtual bucket count n the bucket size |
Jump Consistent Hash
Initialize Solution
Meets Monotonicity
This algorithm meets the monotonicity
int ch(int
key, int num_buckets) { random.seed(key); int b = 0; for (int i = 1; i < num_buckets; i++) { if (random.nextInt() %2 == 1) b = i;
} return b; } |
Code analysis
The key is that we use the key as random seed , if you give
the same key the random value list should always be the same.
Example:
Random list |
3 buckets |
4 buckets |
3 4 1 6 |
Final index :
2 |
Final index :
2 |
3 4 1 5 |
Final index :
2 |
Final index :
3 |
From above analysis we can notice that when we added a
bucket. For a specific key the index is either located in the old bucket or new
bucket it can not move to emailother buckets that already exists in the buckets
list. So this algorithm meets the monotonicity
Probability Analysis
Suppose we have 3 buckets in total the probability of save
to each node is.
|
|
|
|
Probability |
Assign to 0 |
1 (odd or even) |
½ (even) |
½ (even) |
1 * ½ * ½ = ¼ |
Assign to 1 |
1 (odd or even) |
½ (odd) |
½ (even) |
1 * ½ * ½ = ¼ |
Assign to 2 |
1 (odd or even) |
1 (odd or even) |
½ (odd) |
1 * 1 * ½ = ½
|
Form the chart we can see that the probability of save to each
node is not same for a specificity key.
Complexity
Time: O(n) Space:
O(1)
Meets both Monotonicity and Balance
In the above solution the probability of moving data from
one bucket to next bucket is ½, so it does not meet the balance, because the
probability of data distribution is not 1/n for n buckets. Here is an solution
to optimized last algorithm
int ch(int
key, int n) { random.seed(key); int b = 0; for (int i = 1; i < n ; i++) { if (random.nextDouble() < 1.0 /
(i + 1)) b = i; } return b; } |
Probability Analysis
Suppose we have 4 buckets in total, The probability of save to each bucket are:
|
|
|
|
|
Probability |
Assign to 0 |
1(save to 0) |
½ (not to 1) |
(not to 2) |
¾ (not to 3) |
1 * ½ * * ¾ = ¼ |
Assign to 1 |
1(save or not) |
½ (save to 1) |
(not to 2) |
¾ (not to 3) |
1 * ½ * * ¾ = ¼ |
Assign to 2 |
1(save or not) |
1 (save or not 1) |
(save to 2) |
¾ (not to 3) |
1 * 1 * * ¾ = ¼ |
Assign to 3 |
1(save or not) |
1(save or not) |
1(save or not) |
¼ (save to 3) |
1 * 1 * 1 * ¼
= ¼ |
The above
method has already met the monotonicity and balance
Complexity
Time: O(n), Space O(1)
Pros and Cons
Pros |
1 The space
complexity is O(1) 2 The time
complexity is O(n) |
Cons |
1 The bucket
list must have order 2 We can only
delete bucket from the end of bucket list 3 All the bucket
should be equal they cannot have different weight |
Optimization
Meets Monotonicity
The last algorithm can both meet the monotonicity and
balance but the time complexity of find an index is o(n), here we are going to
optimized last algorithm from o(n) to o(logn). Because in most case ch(int key,
int n) is keep unchanged( ), We can random generate a
sequence to jump such as we generate a list:3,5,2 we will reassign the data in
these indexes, so we can jump to these indexes directly.
Anatomy
We put all the data to bucket 0, and generate a random num.
When we add bucket if bucket number is greater than current num keep the key to
origin bucket otherwise assign the key to new bucket.
int ch(int
key, int n) { random.seed(key); int id = 0, fence = 0; while (fence < n) { id = fence; fence = fence + rand(); } return id; } |
Code analysis
Since we use key as random seed the random sequence will
always be same for a same key. Suppose the random sequence list is : 3, 5, 2
Number of
buckets |
Final Index |
Analysis |
1 |
0 |
id=0, n=1,
fence=0 id=0, n=1,
fence=3 Return 0 |
2 |
0 |
ditto |
3 |
0 |
ditto |
4 |
3 |
id=0, n=4,
fence=0 id=0, n=4,
fence=3 Id=3, n=4, fence=8 Return 3 |
5 |
3 |
ditto |
6 |
3 |
ditto |
7 |
3 |
ditto |
8 |
3 |
ditto |
9 |
8 |
id=0, n=9, fence=0 id=0, n=9,
fence=3 id=3, n=9,
fence=8 id=8, n=9,
fence=10 Return 8 |
Complexity
Time: O(logn), Space O(1)
Probability Analysis
The fence is randomly chosen so the probability of save a
specific key to one bucket is not 1/n. Thus, this method meets the Monotonicity
but does not meet the balance.
Pros and Cons
Pros |
1 The space
complexity is O(1) 2 The time
complexity is O(logn) |
Cons |
1 The bucket
list must have order 2 We can only
delete bucket from the end of bucket list 3 All the
bucket should be equal they cannot have different weight |
Meets both Monotonicity and Balance
Suppose current jump index is: b, then we can get
ch(key, b) ch(key, b+1)
ch(key,
b+1)=b
Next jump index is: j , the we can get
ch(key, j) ch(key,j+1)
ch(key,
j+1) j
ch(key, j+1)
ch(key, b+1)
it mens:
ch(key, j)
= ch(key, b+1)
Since the probability of move from n bucket to n+1 bucket is
,
For example, P(ch(key,
10)= ch(key,11))= , P(ch(key,
11)= ch(key,12))= P(ch(key, 10)= ch(key,12))=
More general ch(key,
n) = ch(key, m) is
For i≥b P(ch(key,i)=ch(key,b+1)) =
It means: r≤
==> i≤
Int ch(int
key, int n) { random.seed(key); int id = -1; int fence = 0; while (fence < n) { id = fence; fence = floor( (id+1) / randDouble()) } return id; } |
Code analysis
Since we use key as random seed the random sequence will
always be same for a same key. Suppose the random sequence list is : 0.5, 0.3, 0.8, 0.1
Number of
buckets |
Final Index |
Analysis |
1 |
0 |
id=-1, n=1, fence=0, id=0, n=1,
fence=floor(1/0.5)=5 Return 0 |
2 |
0 |
ditto |
3 |
0 |
ditto |
4 |
3 |
ditto |
5 |
3 |
ditto |
6 |
3 |
id=-1, n=6,
fence=0, id=0, n=6,
fence=floor(1/0.5)=5 id=5, n=6,
fence=floor(6/0.3)=20 Return 5 |
7 |
3 |
ditto |
8 |
3 |
ditto |
9 |
8 |
ditto |
Probability Analysis
Thus this algorithm both meet the Monotonicity and Balance.
Pros and Cons
Pros |
1 The space
complexity is O(1) 2 The time
complexity is O(logn) |
Cons |
1 The bucket
list must have order 2 We can only
delete bucket from the end of bucket list 3 All the
bucket should be equal they cannot have different weight |
- Labels: Other