Import data:
SQL Schema:
- Create table If Not Exists Views (article_id int, author_id int, viewer_id int, view_date date);
- Truncate table Views;
- insert into Views (article_id, author_id, viewer_id, view_date) values ('1', '3', '5', '2019-08-01');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('1', '3', '6', '2019-08-02');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('2', '7', '7', '2019-08-01');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('2', '7', '6', '2019-08-02');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('4', '7', '1', '2019-07-22');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('3', '4', '4', '2019-07-21');
- insert into Views (article_id, author_id, viewer_id, view_date) values ('3', '4', '4', '2019-07-21');
Pandas Schema:
- data = [[1, 3, 5, '2019-08-01'], [1, 3, 6, '2019-08-02'], [2, 7, 7, '2019-08-01'], [2, 7, 6, '2019-08-02'], [4, 7, 1, '2019-07-22'], [3, 4, 4, '2019-07-21'], [3, 4, 4, '2019-07-21']]
- Views = pd.DataFrame(data, columns=['article_id', 'author_id', 'viewer_id', 'view_date']).astype({'article_id':'Int64', 'author_id':'Int64', 'viewer_id':'Int64', 'view_date':'datetime64[ns]'})
Views table:
- +---------------+---------+
- | Column Name   | Type    |
- +---------------+---------+
- | article_id    | int     |
- | author_id     | int     |
- | viewer_id     | int     |
- | view_date     | date    |
- +---------------+---------+
This table may contain duplicate rows. (In other words, in SQL this table has no primary key.)
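Each row of this table indicates that some viewer viewed an article (written by some author) on some date.
Note that equal author_id and viewer_id values refer to the same person.
Find all the authors who viewed at least one of their own articles.
Return the result sorted by id in ascending order.
The result format is shown in the following example:
Example 1:
Input:
- Views table: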
- +------------+-----------+-----------+------------+
- | article_id | author_id | viewer_id | view_date  |
- +------------+-----------+-----------+------------+
- | 1          | 3         | 5         | 2019-08-01 |
- | 1          | 3         | 6         | 2019-08-02 |
- | 2          | 7         | 7         | 2019-08-01 |
- | 2          | 7         | 6         | 2019-08-02 |
- | 4          | 7         | 1         | 2019-07-22 |
- | 3          | 4         | 4         | 2019-07-21 |
- | 3          | 4         | 4         | 2019-07-21 |
- +------------+-----------+-----------+------------+
Output:
- +------+
- | id   |
- +------+
- | 4    |
- | 7    |
- +------+
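Approach
Filter
First select the rows that satisfy the condition (the author is also the viewer), keeping only the author_id column: views.loc[views['author_id']==views['viewer_id'],['author_id']]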
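The filtering step can be sketched in isolation as follows. This is a minimal illustration, assuming a small frame built from the example rows (view_date is left out for brevity); the names `mask` and `self_views` are introduced here for illustration only.

```python
import pandas as pd

# Rows copied from the example input; view_date is omitted because
# the filter does not use it.
views = pd.DataFrame(
    {
        "article_id": [1, 1, 2, 2, 4, 3, 3],
        "author_id": [3, 3, 7, 7, 7, 4, 4],
        "viewer_id": [5, 6, 7, 6, 1, 4, 4],
    }
)

# Boolean mask: True on rows where the author viewed their own article.
mask = views["author_id"] == views["viewer_id"]

# .loc takes the row mask and a list of columns, so the result is a
# one-column DataFrame rather than a Series.
self_views = views.loc[mask, ["author_id"]]
print(self_views)
```

Passing `['author_id']` as a list keeps the result a DataFrame, which is what the later `rename(columns=...)` call expects.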
Deduplicate
Call drop_duplicates() so that an author who viewed their own articles more than once appears only once.
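A small sketch of what the deduplication does, assuming the filtered frame from the example (author 7 once, author 4 twice). The variable names are hypothetical.

```python
import pandas as pd

# Filtered rows from the example: author 4 appears twice because the
# Views table contains a duplicated row.
self_views = pd.DataFrame({"author_id": [7, 4, 4]}, index=[2, 5, 6])

# With default arguments, drop_duplicates() keeps the first occurrence
# of each distinct row and preserves the original index labels.
unique_authors = self_views.drop_duplicates()
print(unique_authors)
```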
Rename the column
Rename author_id to id to match the expected output: rename(columns={'author_id':'id'})
Sort
sort_values(['id']) sorts in ascending order by default.
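The rename and sort steps can be sketched together like this, assuming the deduplicated frame from the example as input. The variable names are illustrative only.

```python
import pandas as pd

# Deduplicated authors from the example, still under the original column name.
unique_authors = pd.DataFrame({"author_id": [7, 4]}, index=[2, 5])

result = (
    unique_authors
    .rename(columns={"author_id": "id"})  # match the expected output column
    .sort_values(["id"])                  # ascending=True is the default
)
print(result)
```

The rows keep their original index labels after sorting. If a clean 0..n-1 index is wanted, `reset_index(drop=True)` could be chained on; that is an optional extra, not part of the approach described here.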
pandas code
- import pandas as pd
-
- def article_views(views: pd.DataFrame) -> pd.DataFrame:
-     # Keep rows where the author is also the viewer, then dedupe, rename, and sort.
-     return (views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
-             .drop_duplicates()
-             .rename(columns={'author_id': 'id'})
-             .sort_values(['id']))
-
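To check the function against Example 1, one option is to rebuild the sample frame from the Pandas Schema and call it directly. The sketch below repeats the function so that it runs on its own; the variable names `Views` and `result` are just for this illustration.

```python
import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Filter self-views, dedupe, rename to id, sort ascending.
    return (views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
            .drop_duplicates()
            .rename(columns={'author_id': 'id'})
            .sort_values(['id']))

# Sample data taken from the Pandas Schema section.
data = [[1, 3, 5, '2019-08-01'], [1, 3, 6, '2019-08-02'],
        [2, 7, 7, '2019-08-01'], [2, 7, 6, '2019-08-02'],
        [4, 7, 1, '2019-07-22'], [3, 4, 4, '2019-07-21'],
        [3, 4, 4, '2019-07-21']]
Views = pd.DataFrame(
    data, columns=['article_id', 'author_id', 'viewer_id', 'view_date']
).astype({'article_id': 'Int64', 'author_id': 'Int64',
          'viewer_id': 'Int64', 'view_date': 'datetime64[ns]'})

result = article_views(Views)
print(result['id'].tolist())
```

For the sample data this should list authors 4 and 7, matching the expected output table.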