• ClickHouse中“大列”造成的JOIN的内存超限问题


    ClickHouse中“大列”造成的JOIN的内存超限问题

    “大列”是指单行数据量非常大的列,通常是100KiB以上。这样的列会导致JOIN(通常LEFT JOIN 和 INNER JOIN)出现内存超限的异常。

    常用的JOIN算法

    这里讨论的是常用的JOIN算法:partial merge join 与 hash join。Direct join算法不在本文讨论范围。

    内存超限的错误一般长这样:

    2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60}  TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
    

    测试数据

    为了分析“大列”的JOIN运算出现内存超限问题,这里准备了特定的测试数据。

    测试数据为两张表:

    table1 - 相当于N表,10W行,两列:id,fk(外键)

    table2 - 相当于1表,1W行,两列:id,value(value为超长字符串列,每个cell大小500K左右)

    创建测试数据的查询如下:

    drop database if exists `big_columns_join`;
    create database `big_columns_join`;
    use big_columns_join;
    
    create table `big_columns_join`.`big_columns_table` (
        `id` UInt64,
        `fk` UInt64
    ) ENGINE = MergeTree() order by id
    as (select 
            `number`,
            rand(4) % 10000
        from numbers(100000)
        settings max_block_size = 100
    );
    
    create table `big_columns_join`.`dim_table1` (
        `id` UInt64,
        `value` String
    ) ENGINE = MergeTree() order by id
    as (select
            `number`,
            repeat(toString(rand(1)), 100000)
        from numbers(10000));
    

    测试查询

    测试查询是简单的LEFT JOIN。

    select
        *
    from
        `big_columns_join`.`big_columns_table`
        left join `big_columns_join`.`dim_table1` on `big_columns_join`.`big_columns_table`.`fk` = `big_columns_join`.`dim_table1`.`id`
    format Null;
    

    Hash join算法

    运行测试查询之前,设定max_joined_block_size_rows阈值。

    set join_algorithm = 'hash';
    set max_joined_block_size_rows=10000;  -- 阈值设小,需要内存会变少。
    -- 运行测试查询
    

    测试结果是:

    JOIN算法为hash join时,无论如何调整max_joined_block_size_rows的阈值大小,均会内存超限。

    报错信息如下:

    0. Poco::Exception::Exception(String const&, int) @ 0x000000001504f759 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000b243077 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    2. DB::Exception::Exception>>(int, FormatStringHelperImpl::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view>&&) @ 0x000000000b255d97 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    3. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b2551b9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    4. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b254d6a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    5. CurrentMemoryTracker::alloc(long) @ 0x000000000b21b30f in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    6. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x000000000b218368 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    7. void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x0000000006343dc6 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    8. DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x0000000011c0ae2c in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    9. void DB::(anonymous namespace)::AddedColumns::appendFromBlock(DB::Block const&, unsigned long) @ 0x00000000113d24ca in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    10. DB::PODArrayfalse, false>, 63ul, 64ul> DB::(anonymous namespace)::joinRightColumns<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::ColumnsHashing::HashMethodOneNumber, DB::RowRefList const, unsigned long, false, true, false>, HashMapTable, HashTableNoState, PairNoInit>, DefaultHash, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>(std::vector, DB::RowRefList const, unsigned long, false, true, false>, std::allocator, DB::RowRefList const, unsigned long, false, true, false>>>&&, std::vector, HashTableNoState, PairNoInit>, DefaultHash, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*, std::allocator, HashTableNoState, PairNoInit>, DefaultHash, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*>> const&, DB::(anonymous namespace)::AddedColumns&, DB::JoinStuff::JoinUsedFlags&) @ 0x00000000113d602a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    11. void DB::HashJoin::joinBlockImpl<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::HashJoin::MapsTemplate>(DB::Block&, DB::Block const&, std::vector const*, std::allocator const*>> const&, bool) const @ 0x0000000011493867 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    12. DB::HashJoin::joinBlock(DB::Block&, std::shared_ptr&) @ 0x00000000113a40e9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    13. DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x0000000012a32b90 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    14. DB::JoiningTransform::transform(DB::Chunk&) @ 0x0000000012a31905 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    15. DB::JoiningTransform::work() @ 0x0000000012a313aa in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    16. DB::ExecutionThreadContext::executeTask() @ 0x0000000012810d13 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic*) @ 0x0000000012808ad0 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    18. void std::__function::__policy_invoker ()>::__call_impl::__function::__default_alloc_func::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000001280965b in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    19. ThreadPoolImplfalse>>::worker(std::__list_iteratorfalse>, void*>) @ 0x000000000b317483 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    20. ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl ThreadPoolImplfalse>>::scheduleImpl(std::function, Priority, std::optional, bool)::'lambda0'()>(void&&)::'lambda'()::operator()() @ 0x000000000b31a5bf in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    21. ThreadPoolImpl::worker(std::__list_iterator) @ 0x000000000b314b07 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    22. void* std::__thread_proxy[abi:v15000]>, void ThreadPoolImpl::scheduleImpl(std::function, Priority, std::optional, bool)::'lambda0'()>>(void*) @ 0x000000000b318a2e in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
    23. ? @ 0x00007f6db212b609 in ?
    24. ? @ 0x00007f6db2050133 in ?
    

    模板方法joinRightColumns根据join key查找对应的右边表的行,找到行对应的row_number,然后通过调用appendFromBlock(block, row_numer)完成一行的JOIN结果。

    appendFromBlock每次调用IColumn::insertFrom插入一行。这里有个问题:ColumnString的内容增长是一行一行地增长,在这个过程中涉及到多个reallocation内存重分配,并造成大量的内存之间的数据复制。

    parital merge join算法

    运行测试查询之前,设定max_joined_block_size_rows阈值。

    set join_algorithm = 'parital_merge';
    set max_joined_block_size_rows=10000;  -- 阈值设小,需要内存会变少。
    -- 第一次运行测试查询
    
    set max_joined_block_size_rows=1000000;  -- 阈值设大,需要内存可能会更多。
    -- 第二次运行测试查询
    

    测试结果是:

    JOIN算法为parital merge join时,调小max_joined_block_size_rows的阈值,内存就不会超限。

    当max_joined_block_size_rows比较大时(上例中第二次运行测试查询),会发生内存超限问题。

    出错信息:

    2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60}  TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
    
    0. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:134: std::exception::capture() @ 0x000000000b609a42 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    1. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:112: std::exception::exception[abi:v15000]() @ 0x000000000b609a0d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    2. /home/ubuntu/depot/ch-pro/base/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(String const&, int) @ 0x000000002562ed40 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    3. /home/ubuntu/depot/ch-pro/src/Common/Exception.cpp:96: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000144166ee in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    4. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:73: DB::Exception::Exception(String&&, int, bool) @ 0x000000000b5fba8a in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    5. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:101: DB::Exception::Exception>>(int, FormatStringHelperImpl::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity::type, std::type_identity>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view>&&) @ 0x0000000014435b3d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    6. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:339: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x00000000144340b0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    7. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:397: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000014434402 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    8. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:58: CurrentMemoryTracker::allocImpl(long, bool) @ 0x00000000143bb7db in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    9. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:90: CurrentMemoryTracker::alloc(long) @ 0x00000000143bb9e1 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    10. /home/ubuntu/depot/ch-pro/src/Common/Allocator.h:161: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x00000000143b5eba in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    11. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:165: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::realloc<>(unsigned long) @ 0x000000000b667142 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    12. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:0: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x000000000b666ee6 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    13. /home/ubuntu/depot/ch-pro/src/Columns/ColumnString.h:147: DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x000000001f0eaaff in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    14. /home/ubuntu/depot/ch-pro/src/Columns/IColumn.h:179: DB::IColumn::insertManyFrom(DB::IColumn const&, unsigned long, unsigned long) @ 0x000000000b60192b in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    15. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:404: DB::(anonymous namespace)::copyRightRange(DB::Block const&, DB::Block const&, std::vector::mutable_ptr::IColumn>, std::allocator::mutable_ptr::IColumn>>>&, unsigned long, unsigned long) @ 0x000000001e999757 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    16. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:432: bool DB::(anonymous namespace)::joinEquals(DB::Block const&, DB::Block const&, DB::Block const&, std::vector::mutable_ptr::IColumn>, std::allocator::mutable_ptr::IColumn>>>&, std::vector::mutable_ptr::IColumn>, std::allocator::mutable_ptr::IColumn>>>&, DB::MergeJoinEqualRange&, unsigned long) @ 0x000000001e997d86 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    17. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:892: bool DB::MergeJoin::leftJoin<true>(DB::MergeJoinCursor&, DB::Block const&, DB::MergeJoin::RightBlockInfo&, std::vector::mutable_ptr::IColumn>, std::allocator::mutable_ptr::IColumn>>>&, std::vector::mutable_ptr::IColumn>, std::allocator::mutable_ptr::IColumn>>>&, unsigned long&) @ 0x000000001e9b2a7d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    18. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:793: void DB::MergeJoin::joinSortedBlock<false, true>(DB::Block&, std::shared_ptr&) @ 0x000000001e99b738 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    19. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:733: DB::MergeJoin::joinBlock(DB::Block&, std::shared_ptr&) @ 0x000000001e997512 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    20. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:206: DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x00000000208ca1e3 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    21. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:191: DB::JoiningTransform::transform(DB::Chunk&) @ 0x00000000208c9f4e in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    22. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:125: DB::JoiningTransform::work() @ 0x00000000208c985d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    23. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::executeJob(DB::ExecutingGraph::Node*, DB::ReadProgressCallback*) @ 0x00000000203e5263 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    24. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:95: DB::ExecutionThreadContext::executeTask() @ 0x00000000203e4fa0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    25. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:251: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic*) @ 0x00000000203ca041 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    26. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:217: DB::PipelineExecutor::executeSingleThread(unsigned long) @ 0x00000000203ca357 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    27. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:341: DB::PipelineExecutor::spawnThreads()::$_0::operator()() const @ 0x00000000203cb098 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    28. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:394: decltype(std::declval::$_0&>()()) std::__invoke[abi:v15000]::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203caff5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    29. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:480: void std::__invoke_void_return_wrappertrue>::__call::PipelineExecutor::spawnThreads()::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203cafd5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    30. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:235: std::__function::__default_alloc_func::PipelineExecutor::spawnThreads()::$_0, void ()>::operator()[abi:v15000]() @ 0x00000000203cafb5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    31. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:716: void std::__function::__policy_invoker ()>::__call_impl::__function::__default_alloc_func::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000203caf80 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
    
    

    MergeJoin::leftJoin负责partial merge join的LEFT JOIN运算的实现,最终会调用::copyRightRange方法,复制右边表的内容。在此期间IColumn::insertFrom会被调用,这会涉及到调用realloc重分配内存。在这个点上可能会发生内存超限的异常。

    LEFT ANY JOIN

    LEFT ANY JOIN走不同代码分支,该代码分支忽略了setting配置max_joined_block_size_rows,因此即使调小了max_joined_block_size_rows,LEFT ANY JOIN依然会产生内存超限异常。

    “auto”的内存问题

    根据文档,设置join_algorithm='auto' 以启用自动切换JOIN算法的模式,ClickHouse先尝试用hash join算法,然后在内存超过阈值的情况下切换到partial merge join算法。阈值有两个:max_bytes_in_join和max_rows_in_join,指定hash join算法创建的hash table的最大的占用内存字节数和行数。两个阈值只要有一个超过就算是超过阈值。

    但是在实践中发现,原先的尝试的hash join所消耗的内存仍然没有释放,导致在切换到partial merge join后在原本不应该超限的情况下仍然产生内存超限的异常。参考以下示例:

    SELECT
        t1.id AS id,
        t2.value AS value
    FROM big_columns_join.big_columns_table AS t1
    LEFT JOIN big_columns_join.dim_table1 AS t2 ON t1.fk = t2.id
    SETTINGS join_algorithm = 'auto', max_bytes_in_join = '15G'
    FORMAT `Null`;
    

    运行日志如下:

    2024.01.19 11:31:40.580304 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  JoiningTransform: Before join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0)'
    2024.01.19 11:31:40.580692 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  JoiningTransform: After join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0), value String String(size = 0)'
    2024.01.19 11:31:40.680382 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 1.01 GiB.
    2024.01.19 11:31:40.753297 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 2.01 GiB.
    2024.01.19 11:31:40.833103 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 3.01 GiB.
    2024.01.19 11:31:40.905190 [ 3654854 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 4.01 GiB.
    2024.01.19 11:31:41.047932 [ 3654900 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 6.01 GiB.
    2024.01.19 11:31:41.117219 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 7.00 GiB.
    2024.01.19 11:31:41.198586 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 8.01 GiB.
    2024.01.19 11:31:41.272147 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 9.00 GiB.
    2024.01.19 11:31:41.342679 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 10.00 GiB.
    2024.01.19 11:31:41.416512 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 11.01 GiB.
    2024.01.19 11:31:41.486753 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 12.01 GiB.
    2024.01.19 11:31:41.560137 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 13.00 GiB.
    2024.01.19 11:31:41.630755 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 14.01 GiB.
    2024.01.19 11:31:41.633781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  HashJoin: (0x7f8299383918) Join data is being released, 15001390080 bytes and 8940 rows in hash table
    2024.01.19 11:31:41.666781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MergeJoin: Joining keys: left [fk], right [t2.id]
    2024.01.19 11:31:41.666943 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  HashJoin: (0x7f8299383918) Join data has been already released
    2024.01.19 11:31:41.764950 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 15.00 GiB.
    2024.01.19 11:31:42.024590 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 16.14 GiB.
    2024.01.19 11:31:42.221853 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 17.64 GiB.
    2024.01.19 11:31:42.481914 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 19.64 GiB.
    2024.01.19 11:31:43.001865 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  MemoryTracker: Current memory usage (for query): 23.64 GiB.
    2024.01.19 11:31:44.610495 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0}  executeQuery: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 31.64 GiB (attempt to allocate chunk of 17179869184 bytes), maximum: 27.94 GiB.: While executing MergeSortingTransform: While executing FillingRightJoinSide. (MEMORY_LIMIT_EXCEEDED) (version 23.8.3.1.alex-build-debug) (from [::ffff:127.0.0.1]:33076) (in query: select t1.id as id, t2.value as `value` from `big_columns_join`.`big_columns_table` as t1 left join `big_columns_join`.`dim_table1` as t2 on t1.fk = t2.id settings join_algorithm='auto',max_bytes_in_join=15000000000 format Null;), Stack trace (when copying this message, always include the lines below):
    

    从日志可以看到,首先采用了hash join,然后内存使用量一路飙升到15GiB的时候切换到了partial merge join(日志第17行),原先hash join所使用的15GiB内存将被释放(日志第16行)。最后hash join的内存被释放(日志第18行),但是实际内存使用量并没有减少,仍然从15GiB开始继续增长。留给partial merge join的可用内存不多,因此很快产生内存超限MEMORY_LIMIT_EXCEEDED异常。

    目前找到的解决办法只能是调小阈值,但是这样导致原本可以使用hash join的情况下却被迫切换到相对慢速的partial merge join。影响整体性能。

  • 相关阅读:
    python 爬取天气预报天气
    数据化运营15 活跃(上):如何通过运营手法提升⽤户活跃度?
    如何使用 ABAP 代码发送邮件到指定邮箱
    2023高教社杯 国赛数学建模C题思路 - 蔬菜类商品的自动定价与补货决策
    机组运行约束对机组节点边际电价的影响研究(Matlab代码实现)
    码农批量下岗后的全新开发模式预览
    184页10万字智慧城市公共服务中台:业务和数据中台建设方案
    Springboot 项目 org.apache.poi 版本冲突
    集成Emscripten+wasm至React项目踩坑记录
    基于ABP的AppUser对象扩展
  • 原文地址:https://www.cnblogs.com/chengxin1985/p/17978176