• Computational Physics Topic: Monte Carlo Integration in Practice


    Part 1: A Monte Carlo Integration Example

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy.stats import norm, kstest
    np.random.seed(0)

    # Monte Carlo estimate of the integral of exp(x) over [a, b] using n samples
    def integrate(a, b, n=100):
        x = np.random.uniform(a, b, n)
        total = np.sum(np.exp(x))
        return (b - a) * total / n

    Num = 10000

    # repeat the n = 100 estimate Num times and plot the distribution of the results
    F = np.zeros(Num)
    for i in range(Num):
        F[i] = integrate(0, 1)
    df = pd.DataFrame({'score': F})
    bins = np.arange(1.61, 1.83, 0.001)   # bin edges
    df['score_bin'] = pd.cut(df['score'], bins)
    count = df['score_bin'].value_counts()
    x = [(i.left + i.right) / 2 for i in count.index]
    y = count.values / sum(count.values)
    # scatter plot of the empirical distribution
    plt.scatter(x, y, s=1, c="green", label="10000 I(100) calcs")

    # same experiment with n = 500 samples per estimate
    F = np.zeros(Num)
    for i in range(Num):
        F[i] = integrate(0, 1, n=500)
    df = pd.DataFrame({'score': F})
    bins = np.arange(1.61, 1.83, 0.001)   # bin edges
    df['score_bin'] = pd.cut(df['score'], bins)
    count = df['score_bin'].value_counts()
    x = [(i.left + i.right) / 2 for i in count.index]
    y = count.values / sum(count.values)
    plt.scatter(x, y, s=1, c="red", label="10000 I(500) calcs")

    # same experiment with n = 1000 samples per estimate
    F = np.zeros(Num)
    for i in range(Num):
        F[i] = integrate(0, 1, n=1000)
    df = pd.DataFrame({'score': F})
    bins = np.arange(1.61, 1.83, 0.001)   # bin edges
    df['score_bin'] = pd.cut(df['score'], bins)
    count = df['score_bin'].value_counts()
    x = [(i.left + i.right) / 2 for i in count.index]
    y = count.values / sum(count.values)
    plt.scatter(x, y, s=1, c="blue", label="10000 I(1000) calcs")

    ## optional: 100000 repetitions of the n = 100 estimate
    ##Num = 100000
    ##F = np.zeros(Num)
    ##for i in range(Num):
    ##    F[i] = integrate(0, 1)
    ##df = pd.DataFrame({'score': F})
    ##bins = np.arange(1.61, 1.83, 0.001)   # bin edges
    ##df['score_bin'] = pd.cut(df['score'], bins)
    ##count = df['score_bin'].value_counts()
    ##x = [(i.left + i.right) / 2 for i in count.index]
    ##y = count.values / sum(count.values)
    ##plt.scatter(x, y, s=1, c="red", label="10000 I calcs")

    plt.legend()
    plt.savefig("hist-plot-4.jpg")
    plt.show()

    ##print(count)
    ## optional: dump the estimates to a CSV file
    ##import csv
    ##header = ["fit"]
    ##rows = [[i] for i in F]
    ##with open('1.csv', 'w', newline="") as file:
    ##    writer = csv.writer(file)
    ##    writer.writerow(header)
    ##    writer.writerows(rows)

    Final result: 1.718

    Exact result: e - 1 ≈ 1.71828

    Error estimate: ~10^-4

    • The error of a Monte Carlo calculation can usually be reduced by increasing the number of simulations. Concretely, after running the simulation repeatedly and collecting the outputs, we can compute their mean and standard deviation and use these to estimate the error. To shrink the error we simply run more simulations: the mean and variance estimates become more accurate, and the statistical error itself decreases, roughly as N^(-1/2) in the number of independent samples N.
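    As a concrete check of this behaviour, the following self-contained sketch (the sample sizes and the 2000-fold repetition are arbitrary illustrative choices) repeats the integrate() estimator from above for several sample sizes and measures the spread of the results:

    import numpy as np

    np.random.seed(0)

    def integrate(a, b, n=100):
        # same estimator as above: (b - a) times the mean of exp(x) over uniform samples
        x = np.random.uniform(a, b, n)
        return (b - a) * np.sum(np.exp(x)) / n

    exact = np.e - 1                 # exact value of the integral of exp(x) on [0, 1]
    for n in (100, 400, 1600):
        runs = np.array([integrate(0, 1, n) for _ in range(2000)])
        # the standard deviation of the repeated estimates is the statistical error;
        # quadrupling n should roughly halve it
        print(n, runs.mean(), runs.std(), abs(runs.mean() - exact))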

    Part 2: Importance Sampling and Stratified Sampling

        Importance sampling and stratified sampling are two commonly used techniques for obtaining more accurate results in numerical (Monte Carlo) integration.

        Importance sampling draws samples from a probability distribution different from the original one, concentrating them in the regions that contribute most to the final result. In other words, the algorithm preferentially selects samples that are more likely to contribute to the integral and reweights them accordingly. This is especially useful when the original distribution is hard to sample from, or when the region that dominates the result is small.

        For example, consider the integral of f(x) from a to b. Instead of sampling uniformly on [a, b], we can sample from a distribution proportional to |f(x)|. Regions with larger |f(x)| are then sampled more frequently, and each sample is weighted by f(x)/p(x), where p(x) is the sampling density, so that the estimator remains unbiased.
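        As an illustration of this idea (not code from the original assignment), here is a minimal sketch for the integral of exp(x) on [0, 1], assuming a linear sampling density p(x) = (1 + x) / 1.5 that grows with x roughly the way exp(x) does; samples are drawn by inverse-transform sampling and each contributes f(x)/p(x):

    import numpy as np

    np.random.seed(0)
    n = 100000

    # importance sampling for the integral of exp(x) on [0, 1]
    # sampling density p(x) = (1 + x) / 1.5; its CDF inverts to x = sqrt(1 + 3u) - 1
    u = np.random.uniform(0, 1, n)
    x = np.sqrt(1 + 3 * u) - 1
    weights = np.exp(x) / ((1 + x) / 1.5)        # f(x) / p(x)

    print("importance sampling:", weights.mean())                        # close to e - 1
    print("plain uniform MC:   ", np.exp(np.random.uniform(0, 1, n)).mean())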

        Stratified sampling, by contrast, divides the integration region into several subregions and draws samples from each subregion separately. This guarantees that every part of the region is sampled, regardless of the value of the integrand there.

        For example, consider the integral of f(x, y) over the region [a, b] x [c, d]. We can split this region into 4 smaller subregions and sample each of them separately. This guarantees an equal number of samples from every subregion, which is particularly useful when the integrand has high variance over the integration domain.
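        A minimal sketch of this 2 x 2 stratification, using for illustration the integrand f(x, y) = exp(-(x^2 + y^2)) on the unit square (an example function, not one used elsewhere in this post):

    import numpy as np

    np.random.seed(0)

    def f(x, y):
        return np.exp(-(x**2 + y**2))

    a, b, c, d = 0.0, 1.0, 0.0, 1.0
    n_per_cell = 2500                 # samples in each of the 4 subregions

    estimate = 0.0
    for (x0, x1) in [(a, (a + b) / 2), ((a + b) / 2, b)]:
        for (y0, y1) in [(c, (c + d) / 2), ((c + d) / 2, d)]:
            x = np.random.uniform(x0, x1, n_per_cell)
            y = np.random.uniform(y0, y1, n_per_cell)
            # each cell contributes (cell area) * (mean of f over the cell)
            estimate += (x1 - x0) * (y1 - y0) * f(x, y).mean()

    print("stratified estimate:", estimate)   # exact value is about 0.5577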

        Both importance sampling and stratified sampling are important techniques for numerical integration, and their applicability depends on the problem at hand. Importance sampling tends to be more effective when the integrand is strongly peaked or hard to sample, while stratified sampling is useful when the integrand varies substantially across the whole integration region.

        The results of parts (a) and (b) are summarized in the tables below.

    Num = 5e4, k = 5e3, n = 10

                              Simple MC                Stratified sampling
    Round 1                   1.3741162994281273       1.3741162994281273
    Stats over 200 rounds     (see histograms P3-2.jpg / P3-1.jpg)
      mean                    1.3873987598712592       1.387267505078644
      std                     0.022234190779851584     0.02282746853161396

    Finding the optimal n:

    Optimal     RESULT                   ERROR
    N = 18      1.3868879367871603       1e-4

    import numpy as np
    np.random.seed(0)

    # exact value: 2 * ln(2) = 1.38629...

    # plain Monte Carlo estimate of the integral of 1 / (x + sqrt(x)) on [a, b]
    def MC(a=0, b=1, n=50000):
        f = lambda x: 1 / (x + x**0.5)
        x = np.random.uniform(a, b, n)
        return sum(f(x)) * (b - a) / n

    # stratified sampling: split [a, b] into k strata and run MC with n points in each
    def SS(a=0, b=1, n=10, k=5000):
        x1 = np.linspace(a, b, k + 1)[:k]    # left edges of the strata
        x2 = np.linspace(a, b, k + 1)[1:]    # right edges of the strata
        SUM = 0
        for i in range(k):
            SUM += MC(a=x1[i], b=x2[i], n=n)
        return SUM

    import matplotlib.pyplot as plt

    ## 200 repetitions of each estimator: means, standard deviations and histograms
    ##S = np.array([SS() for i in range(200)])
    ##M = np.array([MC() for i in range(200)])
    ##
    ##print(np.mean(S))
    ##print(np.mean(M))
    ##
    ##print(np.std(S))
    ##print(np.std(M))
    ##
    ##plt.hist(S, label="S")
    ##plt.legend()
    ##plt.savefig("P3-1.jpg")
    ##plt.pause(0.01)
    ##plt.cla()
    ##plt.hist(M, label="M")
    ##plt.legend()
    ##plt.savefig("P3-2.jpg")
    ##plt.pause(0.01)

    # scan n (points per stratum) at a fixed total budget of about 50000 samples
    S = []
    for n in range(5, 20):
        S.append(SS(n=n, k=int(50000 / n)))
    S = np.array(S)
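    To go from the array S above to the "optimal n" row of the table, a minimal continuation sketch (assuming the exact value 2 ln 2 as the reference) is:

    exact = 2 * np.log(2)                    # = 1.38629...
    ns = np.arange(5, 20)
    errors = np.abs(S - exact)
    best = np.argmin(errors)
    print("best n:", ns[best], "estimate:", S[best], "error:", errors[best])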

    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(0)

    def f(x):
        return 1 / (np.sqrt(x) + x)

    # split the integration region [0, 1] into equal-width strata
    num_strata = 5000
    strata_size = 1 / num_strata

    # number of sample points per stratum
    num_samples_per_stratum = 10

    # draw the samples stratum by stratum
    samples = []
    for i in range(num_strata):
        stratum_start = i * strata_size
        stratum_end = (i + 1) * strata_size
        stratum_samples = np.random.uniform(stratum_start, stratum_end, num_samples_per_stratum)
        samples.extend(stratum_samples)

    # evaluate the integrand at every sample point
    values = f(np.array(samples))

    # approximate the integral (the strata have equal width, so a plain mean suffices)
    integral_approx = np.mean(values)
    print("Approximate integral value:", integral_approx)


    Part 3: Integration in Practice

    CHOOSE                   ERROR      N^(-1/2)   Num of Nodes
    Best when alpha = 0.5    0.00167    0.3536     8
    Best when alpha = 0.6    0.04752    0.3015     8
    Best when alpha = 0.7    0.00167    0.3536     8
    Best when alpha = 0.8    0.00167    0.3536     8


    Part 6: High-dimensional Integration

        Now we will implement a Monte Carlo method to find the volume of the N-dimensional sphere for an arbitrary dimension N and sphere radius R.

        Let us first estimate how many trials would be needed to obtain an accurate result for a 25-dimensional ball:

    1 / ((1/25)**0.5)**25 = 5**25 ≈ 2.98e+17

        So we cannot simply draw values on [0, 1] and count hits; that amount of computation is unacceptable.

        Using C, we created multiple threads and pushed the limit of the direct numerical computation to roughly dimension 13 (with a run time of about one minute).

        We also wanted to use Python so that parallel computation would be convenient: the parallel program is provided in the appendix (see main.py below). Unfortunately, this approach also only handles roughly dimension 13, far short of the required dimension 25 (again with a run time of about one minute).

        We therefore need to overcome the curse of dimensionality, and it has to be attacked from the mathematical side.

            First, we switch to an L1-distance computation, which speeds up the calculation slightly.

            Second, we use a low-discrepancy (Sobol) sequence to generate more uniformly distributed points; a sketch of this is given below.
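        A minimal sketch of how a Sobol sequence from scipy.stats.qmc can replace np.random.random in the hit-or-miss estimate. It is shown here at dim = 10, because at dim = 25 the hit probability is far too small for plain hit-or-miss (as estimated above); this illustrates the sampler only and is not the exact program behind the table below.

    import numpy as np
    from scipy.stats import qmc              # quasi-random (low-discrepancy) generators
    from scipy.special import gamma

    dim = 10
    sampler = qmc.Sobol(d=dim, scramble=True, seed=0)
    points = sampler.random_base2(m=20)      # 2**20 Sobol points in [0, 1)^dim

    # hit-or-miss in one orthant: fraction of points with squared radius <= 1, times 2**dim
    inside = np.count_nonzero(np.sum(points**2, axis=1) <= 1.0)
    volume = inside / points.shape[0] * 2**dim

    theory = np.pi**(dim / 2) / gamma(dim / 2 + 1)   # exact volume of the unit dim-ball
    print("Sobol estimate:", volume, "  theory:", theory)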

    Results

    Dim    Theory      Calculate
    17     0.1409      0.1409
    18     0.0821      0.0821
    19     0.0466      0.0466
    20     0.0258      0.0258
    21     0.0139      0.0139
    22     0.0073      0.0073
    23     0.0038      0.0038
    24     0.0019      0.0019
    25     0.0009      0.0009
    26     0.000466    0.000456
    27     0.000223    0.000198

    import numpy as np
    import scipy.special as spi
    import matplotlib.pyplot as plt
    import time
    np.random.seed(0)

    # n! by direct product
    def prod(n):
        p = 1
        for i in range(1, n + 1):
            p *= i
        return p

    # exact volume of the unit n-ball
    def f(n):
        if n % 2 == 0:
            k = int(n / 2)
            return np.pi**k / prod(k)
        elif n % 2 == 1:
            k = int((n - 1) / 2)
            return np.pi**k * prod(k) * 2**n / prod(n)

    # hit-or-miss Monte Carlo in one orthant: fraction of points inside the ball, times 2**dim
    def main(Num=1e8, dim=4):
        assert dim >= 2, "dim should be at least 2!!!"
        Sum = 0
        #start = time.time()
        for i in range(int(Num)):
            x = np.random.random(dim)
            p = sum(x**2)
            if p <= 1:
                Sum += 1
        #end = time.time()
        #print("Used:", end - start)
        #print("Volume:", Sum / Num * (2**dim))
        #print("Theory:", f(dim))
        print(Sum)
        return Sum / Num * (2**dim), f(dim)

    Dim = [20, 21, 22, 23, 24, 25, 26]
    V = [0 for i in range(len(Dim))]
    F = [0 for i in range(len(Dim))]
    for i in range(len(Dim)):
        s = time.time()
        V[i], F[i] = main(dim=Dim[i])
        e = time.time()
        print(e - s)

    plt.scatter(range(len(Dim)), V, c="red")
    plt.scatter(range(len(Dim)), F, c="blue")
    plt.plot(range(len(Dim)), V, c="red", label="volume")
    plt.plot(range(len(Dim)), F, c="blue", label="Theory")
    plt.legend()
    plt.pause(0.01)
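    The pure-Python loop in main() is the bottleneck here. As a side note, a batched NumPy variant of the same hit-or-miss count (a sketch, not the author's parallel version) evaluates the points chunk by chunk:

    import numpy as np

    def main_batched(Num=int(1e7), dim=4, chunk=int(1e6)):
        # same hit-or-miss count as main(), evaluated chunk by chunk with NumPy
        hits = 0
        done = 0
        while done < Num:
            m = min(chunk, Num - done)
            x = np.random.random((m, dim))
            hits += np.count_nonzero(np.sum(x**2, axis=1) <= 1.0)
            done += m
        return hits / Num * (2**dim)

    print(main_batched(dim=10))    # compare with the exact value f(10) = pi**5 / 120 ≈ 2.55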


    main.py 

    import datetime
    import numpy as np
    import multiprocessing as mp

    # one task: count how many of Num random points fall inside the unit dim-ball
    def final_fun(name, param):
        dim = param
        Num = int(5 * 1e4)
        Sum = np.random.random((dim, Num))**2
        Sum = np.sum(Sum, axis=0)              # squared radius of each point
        Sum = len(np.where(Sum <= 1)[0])       # number of hits
        return {name: Sum}

    if __name__ == '__main__':
        dim = 20
        start_time = datetime.datetime.now()
        num_cores = int(mp.cpu_count())
        pool = mp.Pool(num_cores)
        param_dict = {'task1': dim, 'task2': dim, 'task3': dim, 'task4': dim,
                      'task5': dim, 'task6': dim, 'task7': dim, 'task8': dim,
                      'task9': dim, 'task10': dim, 'task11': dim, 'task12': dim,
                      'task13': dim, 'task14': dim, 'task15': dim, 'task16': dim}
        results = [pool.apply_async(final_fun, args=(name, param)) for name, param in param_dict.items()]
        results = [p.get() for p in results]
        end_time = datetime.datetime.now()
        use_time = (end_time - start_time).total_seconds()
        print("Multiprocessing total time: " + "{:.2f}".format(use_time) + " s")
        print(results)

    Latin hypercube sampling is used to generate random numbers that are more uniformly distributed in high dimensions.

    The steps of Latin hypercube sampling are as follows:

        1. For each dimension, divide the sampling interval uniformly into equal subintervals. For example, to generate 100 sample points, divide the interval [0, 1] into 100 subintervals, each of length 1/100.

        2. Within each subinterval of each dimension, randomly choose one point as a sample. A uniform random number generator can be used to produce these points.

        3. Repeat step 2 until one sample point has been chosen in every subinterval of every dimension; the subintervals are then paired across dimensions in random order, so each dimension is stratified independently.

    import numpy as np
    np.random.seed(0)

    # Latin hypercube sampling: one point in each of num_samples equal-width
    # subintervals of [0, 1] for every dimension, with the subintervals paired
    # across dimensions by an independent random permutation
    def latin_hypercube_sampling(dimensions, num_samples):
        samples = np.zeros((num_samples, dimensions))
        intervals = np.linspace(0, 1, num_samples + 1)
        for i in range(dimensions):
            # one uniform draw inside each subinterval ...
            offsets = np.random.uniform(intervals[:-1], intervals[1:])
            # ... assigned to the rows in random order
            samples[:, i] = np.random.permutation(offsets)
        return samples

    if __name__ == '__main__':
        dimensions = 25
        num_samples = 1000
        # Perform Latin Hypercube Sampling
        samples = latin_hypercube_sampling(dimensions, num_samples)
        # Calculate mean and variance for each dimension
        means = np.mean(samples, axis=0)
        variances = np.var(samples, axis=0)
        # Print results
        print("Means:", means)
        print("Variances:", variances)

  • Original article: https://blog.csdn.net/Chandler_river/article/details/133111124