• 看看你离世界一流大厂有多远?3道Google最新SQL面试题 ⛵


    💡 作者:韩信子@ShowMeAI
    📘 数据分析◉技能提升系列https://www.showmeai.tech/tutorials/33
    📘 AI 面试题库系列https://www.showmeai.tech/tutorials/48
    📘 本文地址https://www.showmeai.tech/article-detail/297
    📢 声明:版权所有,转载请联系平台与作者并注明出处
    📢 收藏ShowMeAI查看更多精彩内容

    下面是最新的 3 道 Google SQL 面试题和参考答案。这些题目面向的 Google 职位包括:数据科学 数据分析师商业智能 工程师数据工程师商业分析师

    ShowMeAI 制作了快捷即查即用的 SQL 速查表手册,大家可以在下述位置获得:

    💡 面试题 1:墨西哥和美国第三高峰

    问题: 请完成1个 SQL 来找出每个国家第三高的山名,并按 ASC 顺序对国家/地区排序。

    Table: mountains
    +---------------------+------+-------------+
    |name                 |height|country      |
    +---------------------+------+-------------+
    |Denalli              |20310 |United States|
    |Saint Elias          |18008 |United States|
    |Foraker              |17402 |United States|
    |Pico de Orizab       |18491 |Mexico       |
    |Popocatépetl         |17820 |Mexico       |
    |Iztaccihuatl         |17160 |Mexico       |
    +---------------------+------+-------------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11

    参考答案:

    SELECT "country",
           "name"
    FROM   (SELECT "country",
                   "name",
                   Rank()
                     OVER (
                       partition BY "country"
                       ORDER BY "height" DESC) AS "rank"
            FROM   mountains) AS m
    WHERE  "rank" = 3
    ORDER  BY country ASC 
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11

    💡 面试题 2:用 latest_event 查找当前打开的页数

    问题: 给定下表,表中包含有关页面状态更改时间的信息。完成 SQL 查找当前使用 latest_event 的页面数。 注意,表中 page_flag 列将用于识别页面是『OFF』还是『ON』。

    Table: pages_info
    +-------+--------------------------------------+----------+
    |page_id|event_time                            |page_flag |
    +-------+--------------------------------------+----------+
    |1      |current_timestamp - interval '6 hours'|ON        |
    |1      |current_timestamp - interval '3 hours'|OFF       |
    |1      |current_timestamp - interval '1 hours'|ON        |
    |2      |current_timestamp - interval '3 hours'|ON        |
    |2      |current_timestamp - interval '1 hours'|OFF       |
    |3      |current_timestamp                     |ON        |
    +-------+--------------------------------------+----------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11

    参考答案:

    -- 首先,对于每个页面ID,让我们选择最新的记录(基于事件时间列)。
    SELECT page_id,
           Max(event_time) AS latest_event
    FROM   pages_info
    GROUP  BY page_id 
    
    -- 接着,我们将前面的查询与原表连接起来,并检查其中有多少人的标记页等于ON。
    WITH latest_event
         AS (SELECT page_id,
                    Max(event_time) AS latest_event
             FROM   pages_info
             GROUP  BY page_id)
    SELECT Sum(CASE
                 WHEN page_flag = 'ON' THEN 1
                 ELSE 0
               END) AS result
    FROM   pages_info pi
           JOIN latest_event le
             ON pi.page_id = le.page_id
                AND pi.event_time = le.latest_event; 
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20

    💡 面试题 3:回访用户

    问题: 在如下的数据库表中,包含有关用户访问网页的信息。 完成 SQL 返回连续访问该页面最长的 3 个用户,按长短的倒序排列 3 个用户。

    Table: visits
    +--------+----------------------------+
    |user_id |date                        | 
    +--------+----------------------------+
    |1       |current_timestamp::DATE - 0 |
    |1       |current_timestamp::DATE - 1 |
    |1       |current_timestamp::DATE - 2 |
    |1       |current_timestamp::DATE - 3 |
    |1       |current_timestamp::DATE - 4 |
    |2       |current_timestamp::DATE - 1 |
    |4       |current_timestamp::DATE - 0 |
    |4       |current_timestamp::DATE - 1 |
    |4       |current_timestamp::DATE - 3 |
    |4       |current_timestamp::DATE - 4 |
    |4       |current_timestamp::DATE - 62|   
    +--------+----------------------------+
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16

    参考答案:

    --首先,让我们添加一个新的列,其值是每个用户的下一次访问(与当前日期不同)。我们将使用lead函数来完成:
    SELECT DISTINCT user_id,
                    date,
                    Lead(date)
                      OVER (
                        partition BY user_id
                        ORDER BY date) AS next_date
    FROM   (SELECT DISTINCT *
            FROM   visits) AS t; 
    --接着,让我们创建另一个列,其目的是让我们知道访问的停止。这包括检查下一个日期是否与当前日期+1是否不同。
    WITH next_dates
         AS (SELECT DISTINCT user_id,
                             date,
                             Lead(date)
                               OVER (
                                 partition BY user_id
                                 ORDER BY date) AS next_date
             FROM   (SELECT DISTINCT *
                     FROM   visits) AS t) --去重
    SELECT user_id,
           date,
           next_date,
           CASE
             WHEN next_date IS NULL
                   OR next_date = date + 1 THEN 1
             ELSE NULL
           END AS streak
    FROM   next_dates; 
    --接着,我们将为每个用户创建一个分区,每个分区代表一个连续的访问。从概念上讲,我们要做的是,对于每个用户,取最近的记录(基于日期)并赋值为0,然后寻找下面的记录,如果访问没有停止就赋值为0,如果访问停止就赋值为1(如果连胜列为空),然后继续这样做,直到每个连续访问被一个不同的分区所代表。执行这一逻辑的代码如下。
    WITH next_dates
         AS (SELECT DISTINCT user_id,
                             date,
                             Lead(date)
                               OVER (
                                 partition BY user_id
                                 ORDER BY date) AS next_date
             FROM   (SELECT DISTINCT *
                     FROM   visits)),
         streaks
         AS (SELECT user_id,
                    date,
                    next_date,
                    CASE
                      WHEN next_date IS NULL
                            OR next_date = date + 1 THEN 1
                      ELSE NULL
                    END AS streak
             FROM   next_dates)
    SELECT *,
           Sum(CASE
                 WHEN streak IS NULL THEN 1
                 ELSE 0
               END)
             OVER (
               partition BY user_id
               ORDER BY date) AS partition
    FROM   streaks; 
    --一旦我们有了这个分区,问题就容易了,现在我们只需要计算每个用户和分区的记录数,并找到计数最多的用户。完整的查询如下
    WITH next_dates AS
    (
                    SELECT DISTINCT user_id,
                                    date,
                                    Lead(date) OVER (partition BY user_id ORDER BY date) AS next_date
                    FROM            visits ), streaks AS
    (
           SELECT user_id,
                  date,
                  next_date,
                  CASE
                         WHEN next_date IS NULL
                         OR     next_date = date + 1 THEN 1
                         ELSE NULL
                  END AS streak
           FROM   next_dates ), partitions AS
    (
             SELECT   *,
                      Sum(
                      CASE
                               WHEN streak IS NULL THEN 1
                               ELSE 0
                      END ) OVER (partition BY user_id ORDER BY date) AS partition
             FROM     streaks ), count_partitions AS
    (
             SELECT   user_id,
                      partition,
                      Count(1) AS streak_days
             FROM     partitions
             GROUP BY user_id,
                      partition )
    SELECT   user_id,
             Max(streak_days) AS longest_streak
    FROM     count_partitions
    GROUP BY user_id
    ORDER BY 2 DESC limit 3;
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94

    参考资料

  • 相关阅读:
    nmake简介
    懵了,阿里一面就被虐了,幸获内推华为技术四面,成功拿到offer
    SpringCloud搭建微服务之OAuth2实现SSO单点登录
    SpringBoot数据层解决方案
    沈阳航空航天大学计算机考研资料汇总
    Java HashMap 的扩容因子为什么是 0.75
    MySQL 表查询关键字
    1.1) Hadoop单机环境安装与部署测试服务器免密登录
    【重拾Java系列】—— 集合之Collection
    【可扩展性】谷歌可扩展和弹性应用的模式
  • 原文地址:https://blog.csdn.net/ShowMeAI/article/details/125986540