LeetCode Database: 1148. Article Views I

Load the data:

SQL Schema:


Create table If Not Exists Views (article_id int, author_id int, viewer_id int, view_date date);
Truncate table Views;
insert into Views (article_id, author_id, viewer_id, view_date) values ('1', '3', '5', '2019-08-01');
insert into Views (article_id, author_id, viewer_id, view_date) values ('1', '3', '6', '2019-08-02');
insert into Views (article_id, author_id, viewer_id, view_date) values ('2', '7', '7', '2019-08-01');
insert into Views (article_id, author_id, viewer_id, view_date) values ('2', '7', '6', '2019-08-02');
insert into Views (article_id, author_id, viewer_id, view_date) values ('4', '7', '1', '2019-07-22');
insert into Views (article_id, author_id, viewer_id, view_date) values ('3', '4', '4', '2019-07-21');
insert into Views (article_id, author_id, viewer_id, view_date) values ('3', '4', '4', '2019-07-21');

Pandas Schema:


import pandas as pd
data = [[1, 3, 5, '2019-08-01'], [1, 3, 6, '2019-08-02'], [2, 7, 7, '2019-08-01'], [2, 7, 6, '2019-08-02'], [4, 7, 1, '2019-07-22'], [3, 4, 4, '2019-07-21'], [3, 4, 4, '2019-07-21']]
Views = pd.DataFrame(data, columns=['article_id', 'author_id', 'viewer_id', 'view_date']).astype({'article_id':'Int64', 'author_id':'Int64', 'viewer_id':'Int64', 'view_date':'datetime64[ns]'})

Views table:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
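This table may contain duplicate rows. (In other words, it has no primary key in SQL.)
Each row of this table indicates that some viewer viewed an article (written by some author) on some date.
Note that equal author_id and viewer_id values indicate the same person.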

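Find all the authors who have viewed at least one of their own articles.

Return the result sorted by id in ascending order.

The result format is shown in the following example: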

Example 1:

Input:


Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

Output:


+------+
| id   |
+------+
| 4    |
| 7    |
+------+

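Approach
Filter
First select the rows that satisfy the condition, i.e. those where the author viewed their own article, keeping only the author_id column: views.loc[views['author_id']==views['viewer_id'],['author_id']]

Deduplicate
drop_duplicates() removes repeated rows, so an author who viewed their own articles more than once appears only once.

Rename the column
rename(columns={'author_id':'id'})

Sort
sort_values(['id']) sorts in ascending order by default.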
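
The four steps above can be traced one at a time on the sample data. The following is a minimal sketch; the intermediate names (filtered, deduped, renamed, result) are introduced here purely for illustration and do not appear in the solution itself.

```python
import pandas as pd

# Sample rows from Example 1
data = [[1, 3, 5, '2019-08-01'], [1, 3, 6, '2019-08-02'],
        [2, 7, 7, '2019-08-01'], [2, 7, 6, '2019-08-02'],
        [4, 7, 1, '2019-07-22'], [3, 4, 4, '2019-07-21'],
        [3, 4, 4, '2019-07-21']]
views = pd.DataFrame(
    data, columns=['article_id', 'author_id', 'viewer_id', 'view_date'])

# Step 1 (filter): rows where the author is also the viewer, author_id column only
filtered = views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
print(filtered['author_id'].tolist())   # [7, 4, 4]

# Step 2 (deduplicate): author 4 viewed the same article twice
deduped = filtered.drop_duplicates()
print(deduped['author_id'].tolist())    # [7, 4]

# Step 3 (rename): the expected output column is called id
renamed = deduped.rename(columns={'author_id': 'id'})

# Step 4 (sort): ascending order is the default
result = renamed.sort_values(['id'])
print(result['id'].tolist())            # [4, 7]
```

Selecting the column with ['author_id'] (a list) rather than 'author_id' keeps the result a DataFrame, which is what allows rename(columns=...) to be chained afterwards.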

pandas code


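import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Keep rows where the author viewed their own article, and only the author_id column
    return (views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
                 .drop_duplicates()                    # one row per author
                 .rename(columns={'author_id': 'id'})  # the output column must be named id
                 .sort_values(['id']))                 # ascending by default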
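
To check the function against Example 1, the sketch below rebuilds the sample table from the Pandas Schema and calls a copy of article_views on it. The reset_index(drop=True) call at the end is an optional extra shown here for illustration, not part of the solution above; it only tidies the row labels, which otherwise keep their positions from the input table.

```python
import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Same pipeline as the solution: filter, deduplicate, rename, sort
    return (views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
                 .drop_duplicates()
                 .rename(columns={'author_id': 'id'})
                 .sort_values(['id']))

# Sample table, built the same way as in the Pandas Schema section
data = [[1, 3, 5, '2019-08-01'], [1, 3, 6, '2019-08-02'],
        [2, 7, 7, '2019-08-01'], [2, 7, 6, '2019-08-02'],
        [4, 7, 1, '2019-07-22'], [3, 4, 4, '2019-07-21'],
        [3, 4, 4, '2019-07-21']]
Views = pd.DataFrame(
    data, columns=['article_id', 'author_id', 'viewer_id', 'view_date']
).astype({'article_id': 'Int64', 'author_id': 'Int64',
          'viewer_id': 'Int64', 'view_date': 'datetime64[ns]'})

result = article_views(Views)
print(result['id'].tolist())     # [4, 7]
print(result.index.tolist())     # [5, 2]  (row labels from the input table)

# Optional (assumption: a clean 0..n-1 index is wanted for display)
tidy = result.reset_index(drop=True)
print(tidy.index.tolist())       # [0, 1]
```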


Original article: https://blog.csdn.net/Clittle225/article/details/132602925