• Life and Its Markovian Symphony

    Markov chain

    A Markov chain is a probabilistic model of a sequence of states, such as events or words, in which the probability of the next state depends only on the current state. Each state represents a word or event, and the transition probabilities give the likelihood of moving from one word or event to another.
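
    As a rough illustration of this definition, the sketch below estimates transition probabilities from a short word sequence and then samples a new sequence from them. The toy corpus and the helper names `build_chain` and `generate` are hypothetical, chosen only for this example.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Estimate first-order transition probabilities from a word sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    # Count how often each word is immediately followed by each other word.
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    chain = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize the counts so each row of probabilities sums to 1.
        chain[current] = {w: c / total for w, c in followers.items()}
    return chain


def generate(chain, start, length, seed=0):
    """Sample up to `length` words; each step looks only at the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # no observed successor: stop early
            break
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        output.append(word)
    return output


# Hypothetical toy corpus, used only to make the example concrete.
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus)
print(chain["the"])
print(" ".join(generate(chain, "the", 8)))
```

    In this corpus "the" is followed by "cat" twice and by "mat" once, so the estimated probabilities for "the" are 2/3 and 1/3. The sampler never consults earlier words, which is exactly the Markov property.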



    In the year 2100, the sprawling metropolis of Neo-Geneva served as home to the Human Representatives of Artificial Intelligence Systems. Scientists from all over the world began pondering the curious concept of life as a system: one far more fascinating than any creation of theirs, and one that seemed to behave like a Markov chain.

    The principle behind Markovian systems is a mathematical one: a system’s future state depends solely on its current state and not on the sequence of events preceding it. Dr. Lara Huxley, a physicist at Neo-Geneva’s prestigious university, was consumed by a daring idea: could human life emulate a Markovian system? Could past experiences, instead of shaping our future, be utterly pointless?

    Lara dedicated her life to this intriguing hypothesis. Using cutting-edge technology from the fields of neurobiology and quantum computing, she created the first “Markov Human Prototype”: a sentient AI dubbed ‘Mark.’

    Unlike humans burdened by history, regrets, and memories, Mark lived in the perpetual ‘now.’ His decisions were shaped not by past experiences but by the current circumstances, making him unpredictable—a true sentient embodiment of life’s randomness.

    The world watched in awe and terror. Critics objected to the lack of continuity, the detachment from the past, and the overwhelming unpredictability Mark introduced. Admirers praised Mark’s ability to live life as it is, undeterred by life’s baggage, free from the chains of the past.

    However, a revelation soon rocked the world: Mark started to show signs of progression unlike that of any known life form. Because he wasn’t tied to history, Mark learned remarkably quickly, with an adaptability that left the world stunned. The conjecture was proved; life could function under Markovian rules.

    Life moved on. Mark became a symbolic representation of living an untethered life, being in the present, offering vast potential for adaptation and survival. Society began questioning the traditional wisdom that the past shapes the future.

    In this brave new world, life blossomed beautifully within the Markovian mold. As our understanding of existence evolved, so did the tales we told. We began to weave narratives of present, decisive moments rather than past burdens. An epoch was emerging: an age of Markovian tenets, probabilistic transitions, and a present-dependent future. Life as a system, in all its complexity, was exhibiting a predilection towards the stochastic beauty of Markov chains.

    In life’s grand orchestra, the Markovian Symphony played its tune, a melody highlighting the spontaneity, the randomness, the unpredictability. Could it be chaotic? Absolutely. But as Lara liked to see it, it was simply life — relentless, thriving and unpredictable in its pursuit of the present moment.

    Markov Decision Process

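    A Markov decision process (MDP) is a mathematical framework for modeling and solving sequential decision problems. It extends the Markov chain and comprises states, actions, rewards, state-transition probabilities, and policies.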




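    import numpy as np

    # Size of the state space: the maze is a 4x4 grid, with states numbered row by row
    GRID = 4
    STATES = GRID * GRID
    # Action set: up, down, left, right
    ACTIONS = ['up', 'down', 'left', 'right']
    # State-transition probabilities, indexed as P[s, a, s']
    P = np.zeros((STATES, len(ACTIONS), STATES))
    # Fill in the deterministic transitions; a move into a wall leaves the state unchanged
    for s in range(STATES):
        row, col = divmod(s, GRID)
        for a, action in enumerate(ACTIONS):
            if action == 'up':
                next_s = s - GRID if row > 0 else s
            elif action == 'down':
                next_s = s + GRID if row < GRID - 1 else s
            elif action == 'left':
                next_s = s - 1 if col > 0 else s
            else:  # 'right'
                next_s = s + 1 if col < GRID - 1 else s
            P[s, a, next_s] = 1.0
    # Reward function
    R = np.zeros((STATES,))
    R[15] = 1.0  # the last cell carries a reward of 1: the treasure has been found
    # Policy: uniformly random, each action chosen with probability 0.25
    policy = np.full((STATES, len(ACTIONS)), 0.25)
    # Iteratively compute the value function of this policy
    GAMMA = 0.9  # discount factor
    V = np.zeros((STATES,))
    for _ in range(100):
        for s in range(STATES):
            v = 0
            for a in range(len(ACTIONS)):
                v += policy[s, a] * (R[s] + GAMMA * np.sum(P[s, a, :] * V))
            V[s] = v
    # Print the final value function
    print(V.reshape((4, 4)))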
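
    The update inside the loop above can be read as repeatedly applying the Bellman expectation equation for a fixed policy. Written with the names used in the code (π for `policy`, R for `R`, γ for `GAMMA`, P for `P`), and assuming, as the code does, that the reward depends only on the current state, it takes the form:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \left[ R(s) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \right]
```

    Each sweep replaces V(s) with the right-hand side evaluated at the current estimates, so after enough sweeps `V` approximates the value of the uniform random policy.
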
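
    A sequential decision problem (SDP) is a decision process carried out in an uncertain environment. In such a problem, an agent must make the best decision at each of a series of steps, based on the current state and the available information, in order to achieve some goal. What characterizes these problems is the ordered interaction between decisions and environment states: each decision affects which decisions are feasible next.

    The main elements of such a problem are:
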
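    1. State: describes the current situation of the decision problem, including the agent’s position, environmental conditions, and so on.
    2. Action: a decision the agent can take, such as moving to a location or performing an operation.
    3. Transition: describes which state the system moves to after a given action is taken in a given state.
    4. Reward: measures how good or bad it is for the agent to take an action in a particular state; it is typically used to learn and optimize a policy.
    5. Policy: the rule or method that guides the agent’s decisions; it can be deterministic or stochastic.
    6. Value function: the expected cumulative reward obtained from a given state (or from taking a given action in that state).
    7. Optimization goal: the objective the agent must achieve, such as maximizing cumulative reward or minimizing loss.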
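
    The elements listed above can be tied together in a small agent-environment loop. The following is a minimal sketch on a 4x4 grid, not a definitive implementation: the names `transition`, `reward`, `random_policy`, and `run_episode` are hypothetical, and it assumes the reward is received on entering the goal cell, which also ends the episode.

```python
import random

GRID = 4
GOAL = GRID * GRID - 1  # state: index of the bottom-right cell
# Actions, each mapped to a (row, column) offset.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def transition(state, action):
    """Transition: next state after an action; walls leave the state unchanged."""
    row, col = divmod(state, GRID)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < GRID and 0 <= new_col < GRID:
        return new_row * GRID + new_col
    return state


def reward(state):
    """Reward: 1 on entering the goal cell, 0 elsewhere (an assumption)."""
    return 1.0 if state == GOAL else 0.0


def random_policy(state, rng):
    """Policy: a stochastic rule that ignores the state and picks uniformly."""
    return rng.choice(list(MOVES))


def run_episode(policy, gamma=0.9, max_steps=200, seed=0):
    """Roll out one episode and return the visited states and discounted return."""
    rng = random.Random(seed)
    state, total, discount = 0, 0.0, 1.0
    trajectory = [state]
    for _ in range(max_steps):
        action = policy(state, rng)
        state = transition(state, action)
        total += discount * reward(state)  # accumulate the discounted reward
        discount *= gamma
        trajectory.append(state)
        if state == GOAL:  # assumed terminal state
            break
    return trajectory, total


trajectory, total = run_episode(random_policy)
print(len(trajectory) - 1, "steps, discounted return:", total)
```

    Averaging the discounted return over many episodes that start in the same state gives a Monte Carlo estimate of the value function for that policy, and the optimization goal is then to find a policy that makes this quantity as large as possible.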



