Streaming System
two impport dimensions that define the shape of a given dataset
cardinality and constituion
the cardinality of a dataset dicates its size
the most salient aspect of cardinality being where a given dataset is finite or infinite
coarse cardinality in a dataset
Bounded data A type of dataset that is finite in size
Unbounded data
Cardinality imposes additional burdens on consumer
Constitution dictates its physical manifestation
Table
Stream
it’s the constitution pipeline developers directly interact with in most data processing systems today(both batch and steaming)
constitution |how something is made up of different parts
the basic idea is that you run a streaming system alongside a batch system,both performing essentially the same calculation.
Unfortunnately,maintaining a Lambda system is a hassle:you neede to build,provision,and maintain two independent versions of your pipline and then alse somehow merge the results from the two piplines at the end.
hassle
a situation that is annoying because it involves doing sth difficult or complicated that needs a lot of effort 困难;麻烦
As someone who spent years working on a strongly consistent streaming engine, I also found the entire principle of the Lambda Architecture a bit unsavory
unsavory > adj unpleasant, or morally offensive
corollary > noun something that results from something else
antiquity > the distant past (= a long time ago), especially before the sixth century
cogently > in a way that is clearly expressed and is likely to persuade people
To speak cogently about unbounded data processing requires a clear understanding of the domains of time involed
This is the time at which events acutally occurred
This is the time at which evnets are observed in the system.
skew > verb to cause something to be not straight or exact; to twist or distort ||
adj not straight
In an ideal world,event time and processing time would always be equal,with events being processed immediately as they occur.Reality is not so kind,however,and the skew between event time and processing time is not only nonzero,but often a highly variable function of the characteristics of the underlying input sources,execution engine,and hardware.
congestion | a situation in which a place is too blocked or crowded,causing difficulties
plot verb to mark or draw something on a piece of paper or a map
noun the story of a book, film, play, etc.
contention | the disagreement that results from opposing arguments
underlying > real but not immediately obvious