• Streaming Systems


    1 Streaming



    What is streaming

    Streaming System
    Two important dimensions define the shape of a given dataset:

    cardinality and constitution

    The cardinality of a dataset dictates its size.
    The most salient aspect of cardinality is whether a given dataset is finite or infinite.
    Two terms describe this coarse cardinality in a dataset:
    Bounded data: a type of dataset that is finite in size
    Unbounded data: a type of dataset that is infinite in size (at least theoretically)
    The unbounded nature of a dataset imposes additional burdens on the frameworks that consume it.
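    A minimal Python sketch of the bounded/unbounded distinction, assuming a plain list stands in for bounded data and an infinite generator stands in for unbounded data (both function names are hypothetical). The bounded case yields one final answer; the unbounded case can only yield running answers over what has arrived so far.

```python
import itertools
from typing import Iterator


def bounded_dataset() -> list[int]:
    # Bounded data: finite, so a consumer can process all of it and stop.
    return [3, 1, 4, 1, 5, 9, 2, 6]


def unbounded_dataset() -> Iterator[int]:
    # Unbounded data: conceptually infinite; a consumer never sees "the end".
    n = 0
    while True:
        yield n
        n += 1


# Over bounded data, a total is one final answer.
print(sum(bounded_dataset()))  # 31

# Over unbounded data, a consumer can only report running totals for the
# finite prefix seen so far (here the first 5 elements).
print(list(itertools.accumulate(itertools.islice(unbounded_dataset(), 5))))
# [0, 1, 3, 6, 10]
```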
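    The constitution of a dataset dictates its physical manifestation. There are two:
    Table: a holistic view of a dataset at a specific point in time
    Stream: an element-by-element view of the evolution of a dataset over time
    It's the constitution that pipeline developers directly interact with in most data processing systems today (both batch and streaming).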
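    A minimal Python sketch of the two constitutions, assuming a stream is a list of hypothetical (key, value) updates and that a table keeps only the latest value per key (one possible semantics, not the only one). Folding the stream up to some point yields the table as of that point.

```python
from typing import Iterable

# A stream: element-by-element updates describing how the dataset evolves.
stream = [("alice", 1), ("bob", 5), ("alice", 3), ("carol", 2), ("bob", 7)]


def to_table(updates: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Materialize a table: a holistic view of the dataset at one point in time.

    Assumption: a later update for a key overwrites the earlier one.
    """
    table: dict[str, int] = {}
    for key, value in updates:
        table[key] = value
    return table


# The table after every update so far ...
print(to_table(stream))      # {'alice': 3, 'bob': 7, 'carol': 2}
# ... and the table as of an earlier point (after the first three updates).
print(to_table(stream[:3]))  # {'alice': 3, 'bob': 5}
```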

    constitution | how something is made up of different parts

    On the Greatly Exaggerated Limitations of Streaming

    Lambda Architecture

    The basic idea is that you run a streaming system alongside a batch system, both performing essentially the same calculation. The streaming system provides low-latency but possibly inaccurate results, and the batch system later provides the correct output.
    Unfortunately, maintaining a Lambda system is a hassle: you need to build, provision, and maintain two independent versions of your pipeline and then also somehow merge the results from the two pipelines at the end.
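    A toy Python sketch of the Lambda idea, assuming a per-key count is "the same calculation" and that the batch side covers historical events while the streaming side covers events the last batch run has not seen yet. All function names are hypothetical; the point is the two separately written implementations plus a merge step, which is exactly the maintenance burden described above.

```python
from collections import Counter
from typing import Iterable


def batch_counts(historical: Iterable[str]) -> Counter:
    # Batch side: recompute exact counts over all historical data.
    return Counter(historical)


def streaming_counts(recent: Iterable[str]) -> Counter:
    # Streaming side: incrementally count events the last batch run has not
    # covered. In a real Lambda system this is a second, independently
    # maintained codebase running on a different engine.
    counts: Counter = Counter()
    for key in recent:
        counts[key] += 1
    return counts


def merge(batch: Counter, speed: Counter) -> Counter:
    # Merge step: combine both partial views into a single answer.
    return batch + speed


historical = ["a", "b", "a"]
recent = ["b", "c"]
print(dict(merge(batch_counts(historical), streaming_counts(recent))))
# {'a': 2, 'b': 2, 'c': 1}
```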

    hassle > noun a situation that is annoying because it involves doing something difficult or complicated that needs a lot of effort (trouble; bother)

    As someone who spent years working on a strongly consistent streaming engine, I also found the entire principle of the Lambda Architecture a bit unsavory.

    unsavory > adj unpleasant, or morally offensive

    corollary > noun something that results from something else

    antiquity > the distant past (= a long time ago), especially before the sixth century

    Event Time vs. Processing Time

    cogently > in a way that is clearly expressed and is likely to persuade people

    To speak cogently about unbounded data processing requires a clear understanding of the domains of time involved.

    Event Time

    This is the time at which events actually occurred.

    Processing Time

    This is the time at which events are observed in the system.
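    To make the two time domains concrete, here is a small Python sketch with hypothetical integer timestamps: each event carries both times, and once one event is delayed, ordering by event time no longer matches ordering by processing time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    event_time: int       # when the event actually occurred (hypothetical clock)
    processing_time: int  # when the system observed it


events = [
    Event("a", event_time=1, processing_time=2),
    Event("b", event_time=2, processing_time=9),  # delayed, e.g. an offline device
    Event("c", event_time=3, processing_time=4),
    Event("d", event_time=4, processing_time=5),
]

by_event_time = [e.name for e in sorted(events, key=lambda e: e.event_time)]
by_processing_time = [e.name for e in sorted(events, key=lambda e: e.processing_time)]
print(by_event_time)       # ['a', 'b', 'c', 'd']
print(by_processing_time)  # ['a', 'c', 'd', 'b']
```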

    skew > verb to cause something to be not straight or exact; to twist or distort || adj not straight

    In an ideal world, event time and processing time would always be equal, with events being processed immediately as they occur. Reality is not so kind, however, and the skew between event time and processing time is not only nonzero, but often a highly variable function of the characteristics of the underlying input sources, execution engine, and hardware. Factors that affect the level of skew include:

    • Shared resource limitations, like network congestion, network partitions, or shared CPU in a nondedicated environment
    • Software causes, such as distributed system logic, contention, and so on
    • Features of the data themselves, like key distribution, variance in throughput, or variance in disorder
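    A small Python sketch of measuring skew, assuming the skew of a single event is simply its processing time minus its event time, and using made-up (event_time, processing_time) pairs. In the ideal world every value would be 0; in practice the values vary, and the spikes stand in for effects like the ones listed above.

```python
# Hypothetical (event_time, processing_time) pairs, in seconds.
observations = [(0, 1), (1, 2), (2, 8), (3, 4), (4, 12), (5, 6)]

# Per-event skew: how long after it occurred each event was observed.
skews = [processing - event for event, processing in observations]
print(skews)  # [1, 1, 6, 1, 8, 1]

# Skew is not a constant offset: it varies from event to event.
print(min(skews), max(skews), sum(skews) / len(skews))  # 1 8 3.0
```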

    congestion | a situation in which a place is too blocked or crowded, causing difficulties

    plot > verb to mark or draw something on a piece of paper or a map || noun the story of a book, film, play, etc.

    contention | the disagreement that results from opposing arguments; in computing, competition between processes or threads for a shared resource
    underlying > real but not immediately obvious

    Data Processing Patterns

    Bounded Data

    Unbounded Data: Batch

    Unbounded Data: Streaming
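    The three processing patterns can be sketched in Python under simplifying assumptions: the "job" is a sum, and the unbounded-batch pattern chops the input into fixed-size chunks by element count as a stand-in for fixed windows. All function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def bounded_batch(data: list[int]) -> int:
    # Bounded data: the whole finite dataset goes through one batch job.
    return sum(data)


def unbounded_as_batches(source: Iterable[int], window_size: int) -> Iterator[int]:
    # Unbounded data, batch: slice the input into fixed-size chunks
    # (assumption: count-based chunks approximate fixed windows) and run
    # the bounded batch job on each chunk independently.
    it = iter(source)
    while chunk := list(islice(it, window_size)):
        yield bounded_batch(chunk)


def unbounded_streaming(source: Iterable[int]) -> Iterator[int]:
    # Unbounded data, streaming: update state per element as it arrives
    # and emit an updated result each time.
    total = 0
    for x in source:
        total += x
        yield total


data = [1, 2, 3, 4, 5]
print(bounded_batch(data))                    # 15
print(list(unbounded_as_batches(data, 2)))    # [3, 7, 5]
print(list(unbounded_streaming(data)))        # [1, 3, 6, 10, 15]
```

    Both unbounded variants work on a generator as well as a list, since they only consume their input incrementally.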

  • Original article: https://blog.csdn.net/chixushuchu/article/details/127674714