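Keyword: "horizon chart"
First, read the data from the TSV file, then clean and reshape it: filter the activities, scale each one to its own peak, smooth the curve, and shift the time axis so the day runs from 03:00 to 03:00.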
- rm(list=ls())
- pacman::p_load(tidyverse,ggalt,ggHoriPlot,hrbrthemes)
-
- sports <- read_tsv("activity.tsv")
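- sports <- sports %>%
- group_by(activity) %>%
- filter(max(p) > 3e-04, # keep activities whose maximum p exceeds 3e-04
- !grepl('n\\.e\\.c', activity)) %>% # drop activities labelled "n.e.c."
- arrange(time) %>%
- mutate(p_peak = p / max(p), # scale each activity to its own peak
- p_smooth = (lag(p_peak) + p_peak + lead(p_peak)) / 3, # 3-point moving average
- p_smooth = coalesce(p_smooth, p_peak)) %>% # endpoints: fall back to the unsmoothed value
- ungroup() %>%
- do({ # duplicate the 00:00 rows as 24:00
- rbind(.,
- filter(., time == 0) %>%
- mutate(time = 24*60))
- }) %>%
- mutate(time = ifelse(time < 3 * 60, time + 24 * 60, time)) %>% # move times before 03:00 past midnight
- mutate(activity = reorder(activity, p_peak, FUN=which.max)) %>% # order activities by the position of their peak
- arrange(activity) %>%
- mutate(activity.f = reorder(as.character(activity), desc(activity))) # reversed order for the facets
-
- sports <- mutate(sports, time2 = time/60) # minutes to hours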
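The scaling, smoothing, and time-shift steps above can be checked on a toy series. This is a minimal sketch, assuming a made-up tibble `toy` with five hourly points for a single activity (so `group_by()` is omitted); only dplyr functions already used in the pipeline appear here.

```r
library(dplyr)

# Hypothetical toy data: time in minutes, p is an arbitrary participation value
toy <- tibble(time = c(0, 60, 120, 180, 240),
              p    = c(1, 2, 4, 2, 1))

toy <- toy %>%
  mutate(p_peak   = p / max(p),                                  # peak becomes 1
         p_smooth = (lag(p_peak) + p_peak + lead(p_peak)) / 3,   # centered 3-point mean
         p_smooth = coalesce(p_smooth, p_peak),                  # first/last row have no neighbor
         time     = ifelse(time < 3 * 60, time + 24 * 60, time)) # 00:00-02:59 moves to the end

toy
```

The first and last rows keep their unsmoothed `p_peak` because `lag()` and `lead()` return `NA` there, and the three points before 03:00 end up at 1440, 1500, and 1560 minutes.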
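Next, draw a first version of the chart from the processed data, with one facet per sport showing how participation is distributed over the day.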
- ggplot(sports, aes(time2, p_smooth)) +
- geom_horizon(bandwidth=0.1) +
- facet_grid(activity.f~.) +
- scale_x_continuous(expand=c(0,0), breaks=seq(from = 3, to = 27, by = 3), labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
- viridis::scale_fill_viridis(name = "Activity relative to peak", discrete=TRUE,
- labels=scales::percent(seq(0, 1, 0.1)+0.1))
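Because times before 03:00 were shifted past midnight, the x axis runs from hour 3 to hour 27, and the label function wraps those values back onto a 24-hour clock. The snippet below isolates that function so it can be checked on its own; the name `hour_label` is only introduced here for illustration.

```r
# Same labelling function as in scale_x_continuous() above,
# stored under a hypothetical name
hour_label <- function(x) sprintf("%02d:00", as.integer(x %% 24))

breaks <- seq(from = 3, to = 27, by = 3)
hour_label(breaks)
# "03:00" "06:00" "09:00" "12:00" "15:00" "18:00" "21:00" "00:00" "03:00"
```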
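Then polish the chart to make it more attractive and easier to read: apply a theme, tighten the spacing between facets, hide the y-axis text, and rotate the x-axis labels.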
- ggplot(sports, aes(time2, p_smooth)) +
- geom_horizon(bandwidth=0.1) +
- facet_grid(activity.f~.) +
- scale_x_continuous(expand=c(0,0), breaks=seq(from = 3, to = 27, by = 3), labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
- viridis::scale_fill_viridis(name = "Activity relative to peak", discrete=TRUE,
- labels=scales::percent(seq(0, 1, 0.1)+0.1)) +
- theme_ipsum_rc(grid="") +
- theme(panel.spacing.y=unit(-0.05, "lines"),
- strip.text.y = element_text(hjust=0, angle=360),
- axis.text.y=element_blank(),
- axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
-
- ggsave('pic.png', bg = 'white', width = 8, height = 6)
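The chart shows when each sport peaks over the course of a day. Color depth encodes the level of activity relative to that sport's own peak, so the darkest band in each row marks its busiest hours. Because the rows are ordered by when each activity peaks, reading from top to bottom shows how the peak moves through the day.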
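The idea behind the color bands can be sketched in base R: a horizon chart cuts the value range into equal-width bands and gives each band its own color. The helper `band_index` below is hypothetical and only illustrates that general idea with a band width of 0.1, matching `bandwidth=0.1` in the code above; it is not necessarily how `geom_horizon` computes its bands internally.

```r
# Hypothetical helper: assign values in [0, 1] to bands of width `bw`
band_index <- function(y, bw = 0.1) {
  n_bands <- round(1 / bw)
  # 1-based band number; a value of exactly 1 is kept in the top band
  pmin(floor(y / bw), n_bands - 1) + 1
}

band_index(c(0.05, 0.25, 0.95, 1))
# 1 3 10 10
```

With ten bands, a value of 0.25 (25% of the peak) falls in the third band, while anything above 0.9 shares the top band with the peak itself.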