• Generating random data with Python and bulk-inserting it into Amazon DocumentDB (or MongoDB)


    This post generates random data with Python and bulk-inserts it into Amazon DocumentDB (or MongoDB).

    Random data is generated with Python's standard random module. For example:
    Random integer (0 - 999999)

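    import random
    # randint includes both endpoints; record_id avoids shadowing the built-in id()
    record_id = random.randint(0, 999999)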
    

    Randomly pick one item

    enum_city = ['Beijing','Shanghai','Guangzhou','Shenzhen','Hangzhou','Wuhan']
    city = random.choice(enum_city)
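
    When the generated test data has to be reproducible between runs, one option is a dedicated, seeded random.Random instance. The following is a minimal sketch; the seed value 42 and the helper name make_record are illustrative assumptions, not part of the original post.

```python
import random

ENUM_CITY = ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen', 'Hangzhou', 'Wuhan']


def make_record(rng):
    """Return one (id, city) pair drawn from the given random.Random instance."""
    record_id = rng.randint(0, 999999)  # both bounds are inclusive
    city = rng.choice(ENUM_CITY)
    return record_id, city


# Two generators created with the same seed yield the same sequence.
rng_a = random.Random(42)
rng_b = random.Random(42)
first = [make_record(rng_a) for _ in range(3)]
second = [make_record(rng_b) for _ in range(3)]
print(first == second)
```

    Passing the generator in explicitly also keeps the module-level random state untouched for other code.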
    

    Random string

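    import random
    import string
    # random.sample draws without replacement, so no character appears twice;
    # chars avoids shadowing the built-in str
    chars = random.sample(string.ascii_letters + string.digits, 16)
    print(''.join(chars))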
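
    Because random.sample picks without replacement, the string above never repeats a character and cannot be longer than the 62-character alphabet. If repeats are acceptable, random.choices is an alternative. A minimal sketch, where random_string is a hypothetical helper name:

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def random_string(length=16):
    """Build a random string; characters may repeat, unlike with random.sample."""
    return ''.join(random.choices(ALPHABET, k=length))


token = random_string()
print(token)
# Lengths beyond the alphabet size work too, since characters can repeat.
long_token = random_string(100)
```

    Neither variant is suitable for passwords or tokens; the random module is not cryptographically secure, and the standard secrets module exists for that purpose.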
    

    Generate the data in the desired format (JSON)

    import random

    enum_bool = ['true', 'false']
    enum_sexy = ['male', 'female']
    enum_city = ['Beijing','Shanghai','Guangzhou','Shenzhen','Hangzhou','Wuhan']
    enum_device = ['IOS','Android']
    random_id = random.randint(0,99999999)
    mobile = '138%s' % random_id
    email = '%s@csdn.com' % random_id
    smsConsent = random.choice(enum_bool)
    emailConsent = random.choice(enum_bool)
    sexual = random.choice(enum_sexy)
    city = random.choice(enum_city)
    device = random.choice(enum_device)
    # JSON template; the %s placeholders are filled in order by the tuple at the end
    insertdata = '''
    {
        "journeyId" : 1,
        "mobile": "%s",
        "email": "%s",
        "smsConsent": "%s",
        "emailConsent": "%s",
        "nextStepId": 1,
        "traits": [
          {"tag": "sexual", "value": "%s"},
          {"tag": "city", "value": "%s" },
          {"tag": "device", "value": "%s"}
        ]
    }
    ''' % (mobile, email, smsConsent, emailConsent, sexual, city, device)
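
    Since pymongo accepts plain Python dicts, the same document can also be built directly as a dict, which sidesteps the string template and any quoting problems. This is a sketch rather than the author's method: make_document is a hypothetical helper name, the email format follows the final code in this post, and the consent flags stay as the strings 'true' / 'false' exactly as in the template.

```python
import json
import random

ENUM_BOOL = ['true', 'false']
ENUM_SEXY = ['male', 'female']
ENUM_CITY = ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen', 'Hangzhou', 'Wuhan']
ENUM_DEVICE = ['IOS', 'Android']


def make_document():
    """Return one random document with the same fields as the JSON template."""
    random_id = random.randint(0, 99999999)
    return {
        "journeyId": 1,
        "mobile": '138%s' % random_id,
        "email": '%s@csdn.com' % random_id,
        "smsConsent": random.choice(ENUM_BOOL),
        "emailConsent": random.choice(ENUM_BOOL),
        "nextStepId": 1,
        "traits": [
            {"tag": "sexual", "value": random.choice(ENUM_SEXY)},
            {"tag": "city", "value": random.choice(ENUM_CITY)},
            {"tag": "device", "value": random.choice(ENUM_DEVICE)},
        ],
    }


doc = make_document()
# json.dumps is only needed to inspect the document as JSON text.
print(json.dumps(doc, indent=2))
```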
    

    Connect to DocumentDB and bulk-insert the data

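    import pymongo
    myclient = pymongo.MongoClient('mongodb://dbadmin:XXX@docdb.XXXXX.docdb.cn-north-1.amazonaws.com.cn:27017/?tls=true&tlsCAFile=rds-combined-ca-cn-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')
    data = [{"item1":"1"},{"item2":"2"}]  # ... more documents
    db = myclient["dbname"]
    col = db.col_test01
    col.insert_many(data)

    Run in parallel

    from multiprocessing import Pool

    if __name__ == '__main__':
        p = Pool()
        # apply_async submits a task without waiting for it;
        # p.apply would block, so the five calls would run one after another
        results = [p.apply_async(func=insert_data, args=()) for i in range(5)]
        p.close()
        p.join()
        for r in results:
            r.get()  # re-raises any exception raised in a worker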
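
    One detail worth illustrating: Pool.apply blocks until the call has returned, so calling it in a loop runs the tasks strictly one after another, whereas apply_async submits them without waiting. The sketch below uses multiprocessing.pool.ThreadPool, which exposes the same apply / apply_async interface as Pool, so that it runs anywhere without a database; fake_insert is a hypothetical stand-in for insert_data.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_insert(worker_id):
    """Hypothetical stand-in for insert_data(): sleeps briefly, returns its id."""
    time.sleep(0.05)
    return worker_id


def run_async(n_workers=5):
    """Submit all tasks first, then collect results, so the tasks overlap in time."""
    with ThreadPool(n_workers) as pool:
        handles = [pool.apply_async(fake_insert, args=(i,)) for i in range(n_workers)]
        # get() waits for the task and re-raises any exception it raised
        return [h.get() for h in handles]


results = run_async()
print(results)
```

    Calling get() on every handle matters in practice: without it, an exception inside a worker is silently discarded.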
    

    The final code, combining all of the above

    import json
    import random
    from multiprocessing import Pool

    import pymongo

    enum_bool = ['true', 'false']
    enum_sexy = ['male', 'female']
    enum_city = ['Beijing','Shanghai','Guangzhou','Shenzhen','Hangzhou','Wuhan']
    enum_device = ['IOS','Android']


    def insert_data():
        # Create the client inside the worker process; pymongo clients are not fork-safe
        myclient = pymongo.MongoClient('mongodb://dbadmin:XXX@docdb.XXXXX.docdb.cn-north-1.amazonaws.com.cn:27017/?tls=true&tlsCAFile=rds-combined-ca-cn-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')
        db = myclient["dbname"]
        col = db.col_test01
        # Each worker inserts 1000 batches of 1000 documents
        for i in range(1000):
            data = []
            for j in range(1000):
                random_id = random.randint(0,99999999)
                mobile = '138%s' % random_id
                email = '%s@csdn.com' % random_id
                smsConsent = random.choice(enum_bool)
                emailConsent = random.choice(enum_bool)
                sexual = random.choice(enum_sexy)
                city = random.choice(enum_city)
                device = random.choice(enum_device)
                insertdata = '''{
                    "Id" : 1,
                    "mobile": "%s",
                    "email": "%s",
                    "smsConsent": "%s",
                    "emailConsent": "%s",
                    "nextId": 1,
                    "traits": [
                      {"tag": "sexual", "value": "%s"},
                      {"tag": "city", "value": "%s" },
                      {"tag": "device", "value": "%s"}
                    ]
                }''' % (mobile,email,smsConsent,emailConsent,sexual,city,device)
                # Parse the filled-in template into a dict that pymongo can insert
                data.append(json.loads(insertdata))
            col.insert_many(data)


    if __name__ == '__main__':
        p = Pool()
        # apply_async submits a task without waiting for it;
        # p.apply would block, so the five calls would run one after another
        results = [p.apply_async(func=insert_data, args=()) for i in range(5)]
        p.close()
        p.join()
        for r in results:
            r.get()  # re-raises any exception raised in a worker
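
    The nested loops in insert_data can also be expressed as a generator that hands out fixed-size batches, so that only one batch is held in memory at a time. This is a sketch under assumptions: batched and insert_in_batches are hypothetical helpers, and collection is any object with an insert_many(documents) method, such as a pymongo collection. A small fake collection is used here so the example runs without a database.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, documents, batch_size=1000):
    """Call collection.insert_many once per batch; return the document count."""
    total = 0
    for batch in batched(documents, batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total


class FakeCollection:
    """Test double that records the size of each insert_many call."""

    def __init__(self):
        self.batch_sizes = []

    def insert_many(self, documents):
        self.batch_sizes.append(len(documents))


fake = FakeCollection()
docs = ({"n": i} for i in range(2500))  # lazily generated documents
inserted = insert_in_batches(fake, docs, batch_size=1000)
print(inserted, fake.batch_sizes)  # 2500 [1000, 1000, 500]
```

    With a real pymongo collection, insert_many also accepts ordered=False, which lets the server attempt every document in a batch instead of stopping at the first failure; whether that is desirable depends on the workload.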
    
  • Original article: https://blog.csdn.net/chuckchen1222/article/details/128167476