11 Sep 2021
|
DDIA
Consistency Guarantees
Most replicated databases provide at least eventual consistency: if you stop writing to the database and wait for some unspecified length of time, then eventually all read requests will return the same value. This is also called convergence, since we expect all replicas to eventually converge to the same value.
This is a weak guarantee, because it says nothing about when the replicas will converge.
Strong consistency doesn’t come for free: systems with stronger guarantees may have worse performance or be less fault-tolerant than systems with weaker guarantees.
Transaction isolation is primarily about avoiding race conditions due to concurrently executing transactions, whereas distributed consistency is mostly about coordinating the state of replicas in the face of delays and faults.
This chapter starts with one of the strongest consistency models in common use, linearizability, then examines the issue of ordering events in a distributed system, particularly around causality and total ordering. Finally, we’ll explore how to atomically commit a distributed transaction.
24 Aug 2021
|
DDIA
Faults and Partial Failures
In distributed systems, there can be partial failure: some parts of the system are broken in some unpredictable way, even though other parts are working fine. This nondeterminism and the possibility of partial failures are what make distributed systems hard to work with.
A supercomputer handles faults by simply stopping the entire cluster workload. A job typically checkpoints its state to durable storage from time to time. After the faulty node is repaired, the job resumes from the last checkpoint.
Cloud computing is different. Many internet-related applications are online services, so it is unrealistic to stop the whole system for repair. Supercomputers are typically built from specialized hardware, where each node is quite reliable and nodes communicate through shared memory and remote direct memory access. In a geographically distributed deployment, communication most likely goes over the internet, which is slow and unreliable compared to local networks. We need to build a reliable system from unreliable components.
It is important to consider a wide range of possible faults and to artificially create such situations in your test environment to see what happens. In distributed systems, suspicion, pessimism, and paranoia pay off.
24 Apr 2021
|
arrays
A typical algorithm using the fast and slow pointer technique is LeetCode 141.
Given head, the head of a linked list, determine if the linked list has a cycle in it.
Define two pointers starting from head with different speeds: the fast pointer moves 2 steps each time while the slow pointer moves 1 step. If there’s no cycle in the linked list, the fast pointer will reach null, and we return. If there is a cycle, the fast and slow pointers will eventually meet.
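The approach above can be sketched in Java; `ListNode` is the usual LeetCode-style definition, written out here so the sketch is self-contained:

```java
// Floyd's cycle detection with fast and slow pointers (a minimal sketch).
class ListNode {
    int val;
    ListNode next;
    ListNode(int val) { this.val = val; }
}

class Solution {
    public boolean hasCycle(ListNode head) {
        ListNode slow = head, fast = head;
        // Fast moves 2 steps per iteration, slow moves 1.
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true; // pointers met inside the cycle
        }
        return false; // fast reached null: no cycle
    }
}
```

If there is a cycle, the fast pointer gains one step on the slow pointer each iteration, so the gap between them shrinks to zero and they must meet.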
Why do we set the fast pointer’s speed to 2 steps and the slow pointer’s to 1 step?
06 May 2020
|
flink
What’s Apache Flink
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
=> In short, see Flink as a streaming framework.
A Brief Introduction to Streaming
Some vivid paragraphs are referenced from this great article.
Processing data in a streaming fashion is becoming more and more popular, compared to the more “traditional” way of batch-processing big data sets available as a whole. The focus in the industry has shifted: it’s no longer so important how big your data is; it’s much more important how fast you can analyse it and gain insights. That’s why some people now talk about fast data instead of the now old-school big data.
04 Mar 2020
|
java
Since Java 8, CompletableFuture has been available to provide flexible chains of non-blocking operators. CompletableFuture is an implementation of both Future and CompletionStage. Let's see how CompletionStages are chained together to get things done.
Can you tell the print result of the following code sample?
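The original post’s code sample isn’t included in this excerpt; as a stand-in, here is a minimal, hypothetical chain in the same spirit, showing how stages compose:

```java
import java.util.concurrent.CompletableFuture;

public class ChainDemo {
    static String run() {
        return CompletableFuture.supplyAsync(() -> "hello")  // starts on ForkJoinPool.commonPool()
                .thenApply(String::toUpperCase)              // transform the result when it arrives
                .thenCombine(CompletableFuture.completedFuture("!"),
                        (s, suffix) -> s + suffix)           // merge with another stage's result
                .join();                                     // block until the whole chain completes
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints HELLO!
    }
}
```

Each `then*` call registers a callback on the previous stage and returns a new CompletableFuture, so the chain runs without blocking until the final `join()`.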
04 Dec 2019
|
Haskell
Pattern Matching
The order of patterns matters in pattern matching: patterns are tried from top to bottom.
sayMe :: (Integral a) => a -> String
sayMe 1 = "One"
sayMe 2 = "Two"
sayMe 3 = "Three"
sayMe n = "Not between 1 - 3"
*Main> sayMe 3
"Three"
*Main> sayMe 5
"Not between 1 - 3"
If you reverse the order of the patterns, the result will always be “Not between 1 - 3” because the first pattern catches all inputs.
If no pattern matches, the function crashes, so we should always add a catch-all pattern at the end. This looks like Java’s switch-case syntax with a default case; the difference is that Java won’t crash when no case matches.
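For comparison, a rough Java counterpart of sayMe might look like this (a sketch I’m adding here, not from the original post); remove the default case and sayMe(5) simply wouldn’t compile rather than crash at runtime:

```java
public class SayMe {
    // Java analogue of the Haskell sayMe above; default plays the catch-all role.
    static String sayMe(int n) {
        switch (n) {
            case 1: return "One";
            case 2: return "Two";
            case 3: return "Three";
            default: return "Not between 1 - 3";
        }
    }

    public static void main(String[] args) {
        System.out.println(sayMe(3)); // Three
        System.out.println(sayMe(5)); // Not between 1 - 3
    }
}
```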
01 Nov 2019
|
Kafka
I accidentally ran into issues related to Kafka’s partition assignment algorithm when adding new consumers to a consumer group. I checked Kafka’s documentation and am logging my understanding here, hoping to help others understand the algorithm and choose the right one for their application (or write their own).
29 Oct 2019
|
Haskell
Haskell has an interesting static type system: it has type inference and can deduce types on its own. This works not only for simple types (capitalized names like Char), but also for complex types like lists and tuples.
Prelude> :t 'a'
'a' :: Char
Prelude> :t "Hello"
"Hello" :: [Char]
Prelude> :t (1, "abc")
(1, "abc") :: Num a => (a, [Char])