Presto RBO之 Sort算子优化

一. 前言

本文主要介绍在Presto中部分与Sort(order by)相关的RBO优化。

二. MergeLimitWithSort规则

MergeLimitWithSort的作用主要将Limit + OrderBy 算子转化为TopN算子，如下图的执行计划所示：

MergeLimitWithSort的实现代码也非常简单，主要是将Limit+OrderBy的Node替换成TopN的Node即可。

三. MergeLimitOverProjectWithSort规则

MergeLimitOverProjectWithSort 的作用是将Limit-Project-Sort的算子转化成Project-TopN的算子的形式，如下所示：

MergeLimitOverProjectWithSort的代码实现过程主要通过将Project的Source替换成一个TopN Node，其中N为Limit的数量，排序列为Sort的排序列。

四. RemoveRedundantSort规则

RemoveRedundantSort 主要作用是去除只有1条结果或者0 条结果的Sort。如下执行计划中，order by 1是order by就只会有一条结果，因此Sort算子可以裁剪掉。

其代码实现过程如下所示：


public Result apply(SortNode node, Captures captures, Context context)
{
    // 如果sort的source无结果的话，则将整个执行计划都裁掉
    if (isAtMost(node.getSource(), context.getLookup(), 0)) {
        return Result.ofPlanNode(new ValuesNode(node.getId(), node.getOutputSymbols(), ImmutableList.of()));
    }
    // 如果只有一条结果的话，直接裁掉Sort算子
    if (isScalar(node.getSource(), context.getLookup())) {
        return Result.ofPlanNode(node.getSource());
    }
    return Result.empty();
}

五. PruneOrderByInAggregation规则

PruneOrderByInAggregation 主要作用是裁剪掉聚合函数内部的sort算子。如下执行计划所示，sum(id order by id)中的order by无意义，因此可以裁剪掉。

其代码实现的过程如下所示：


public Result apply(AggregationNode node, Captures captures, Context context)
{
    ......
    for (Map.Entry<Symbol, Aggregation> entry : node.getAggregations().entrySet()) {
        Aggregation aggregation = entry.getValue();
        // 聚合操作中无sort操作，不做处理
        if (!aggregation.getOrderingScheme().isPresent()) {
            aggregations.put(entry);
        }
        // getAggregateFunctionImplementation can be expensive, so check it last.
        else if (metadata.getFunctionAndTypeManager().getAggregateFunctionImplementation(aggregation.getFunctionHandle()).isOrderSensitive()) {
            aggregations.put(entry);
        }
        // 聚合操作中无sort操作，裁剪掉sort算子
        else {
            anyRewritten = true;
            aggregations.put(entry.getKey(), new Aggregation(
                    aggregation.getFunctionCall(),
                    aggregation.getArguments(),
                    aggregation.isDistinct(),
                    aggregation.getFilter(),
                    Optional.empty(),   // 裁剪sort算子的地方
                    aggregation.getMask()));
        }
    }
    ......
}

相关阅读:
腾讯云服务器多少钱一年？2023年腾讯云优惠云服务器推荐
2023工博会强势回归！智微工业携八大系列重磅亮相
多表操作-外键级联操作
Tomcat 源码分析 (连接器) (六)
Nginx ngx_http_auth_request_module模块鉴权【下】携带账号密码登录
计算机毕业设计之java+ssm基于web的医药进出口交易系统
Python------列表集合字典推导式（本文以集合为主）
HTTP中的重定向和请求转发的区别
数据可视化：数据可视化的意义
Java精进-手写持久层框架

原文地址：https://blog.csdn.net/wangfeihuo/article/details/127698731