• [R] Graphing the relation between two variables


    The data: Students' BMI survey

    What we want to visualize

    1. Relation between BMI and GPA, BMI and homework, BMI and number of steps per day (two continuous variables)

    2. Relation between Gender and BMI, Department and BMI (one categorical, one continuous)

    3. Relation between video games and place of birth, and gender (two categorical variables)

    BMI and GPA

    In our Students' BMI data set, we want to analyze the correlation between two variables, GPA and BMI. (By contrast, BMI and weight are necessarily strongly correlated, since weight is used to calculate BMI.)

    The most common way to visualize such a correlation is a scatter plot. In R with ggplot2, this means using geom_point() as the geom layer.

    ggplot(Student_BMI_2, aes(x=BMI,y=GPA))+geom_point(size=3)
    ggplot(Student_BMI_2, aes(x=BMI,y=GPA,color=Gender,shape=Dpt))+geom_point(size=2)

    (The color is decided by gender, and the shape by department.)
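
    To put a number on what a scatter plot shows, here is a minimal sketch. It runs on simulated data: `demo_students` is a hypothetical stand-in for Student_BMI_2, and the department labels A, B, C are invented for illustration. `cor()` returns the Pearson correlation coefficient, a value between -1 and 1.

    ```r
    library(ggplot2)

    # Hypothetical stand-in for Student_BMI_2: values are simulated,
    # only the column names follow the ones used in this post.
    set.seed(1)
    demo_students <- data.frame(
      BMI    = round(rnorm(40, mean = 22, sd = 3), 1),
      GPA    = round(runif(40, min = 2, max = 4), 2),
      Gender = factor(sample(c("F", "M"), 40, replace = TRUE)),
      Dpt    = factor(sample(c("A", "B", "C"), 40, replace = TRUE))
    )

    # Pearson correlation: summarizes the strength of the linear relation
    r_bmi_gpa <- cor(demo_students$BMI, demo_students$GPA)
    print(r_bmi_gpa)

    # Same mappings as in the post: color by Gender, point shape by Dpt
    p_scatter <- ggplot(demo_students,
                        aes(x = BMI, y = GPA, color = Gender, shape = Dpt)) +
      geom_point(size = 2)
    print(p_scatter)
    ```

    The two columns are simulated independently here, so the coefficient itself carries no meaning. On the real survey data it would indicate how strong the linear relation is.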

    We suspect that physical activity might have some influence on students' BMI.

    We suspect that students' commitment to their studies influences their BMI.

    Exercise:

    1. Visualize the correlation between BMI and homework_day

    2. Visualize the correlation between BMI and walk_meter_day

    3. Redo the graphs while adding information on the students' department (Dpt)

    # Practice
    # Visualize the relation between BMI and homework_day
    ggplot(Student_BMI_2, aes(y = BMI, x = homework_day)) + geom_point(size = 2, color = "blue")
    # Visualize the relation between BMI and walk_meter_day
    ggplot(Student_BMI_2, aes(y = BMI, x = walk_meter_day)) + geom_point(size = 2, color = "blue")
    # Use color to show the difference by department
    ggplot(Student_BMI_2, aes(y = BMI, x = walk_meter_day, color = Dpt)) + geom_point(size = 2)
    ggplot(Student_BMI_2, aes(y = BMI, x = homework_day, color = Dpt)) + geom_point(size = 2)

    # Adding a curve showing the trend (loess curve or linear model)

    ggplot(Student_BMI_2, aes(x=BMI,y=weight))+geom_point()+geom_smooth(method="lm")

    ggplot(Student_BMI_2, aes(x=BMI,y=weight))+geom_point()+geom_smooth(method="loess")

    # Add a loess curve
    ggplot(Student_BMI_2, aes(x = BMI, y = homework_day)) + geom_point(size = 2) + geom_smooth(method = "loess")
    ggplot(Student_BMI_2, aes(x = BMI, y = walk_meter_day)) + geom_point(size = 2, color = "blue") + geom_smooth(method = "loess")
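
    To compare the two smoothing methods directly, the sketch below draws both on the same plot. It uses simulated data, not the survey: `demo_students` and the relation between `weight` and `BMI` are made up for illustration. The straight line from `method = "lm"` is an ordinary least-squares fit, the same one `lm()` computes, while `method = "loess"` fits a local curve that can bend.

    ```r
    library(ggplot2)

    # Simulated data (hypothetical): weight grows roughly linearly with BMI
    set.seed(2)
    demo_students <- data.frame(BMI = runif(50, min = 17, max = 30))
    demo_students$weight <- 2.9 * demo_students$BMI + rnorm(50, sd = 4)

    # The line drawn by geom_smooth(method = "lm") is this least-squares fit
    fit <- lm(weight ~ BMI, data = demo_students)
    print(coef(fit))  # intercept and slope

    # Overlay both trends; se = FALSE hides the confidence band
    p_trend <- ggplot(demo_students, aes(x = BMI, y = weight)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
      geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue")
    print(p_trend)
    ```

    When the two curves nearly coincide, a linear model is probably a reasonable description. A loess curve that departs clearly from the line suggests a non-linear relation.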

    One continuous and one categorical variable

    In the students' BMI dataset, BMI is continuous, while Gender and Department (Dpt) are discrete (categorical variables, stored as factors in R).

    # Boxplot: visualize the relation between BMI, department and gender
    ggplot(Student_BMI_2, aes(x = Gender, y = BMI, color = Gender)) + geom_boxplot()
    # Practice
    ggplot(Student_BMI_2, aes(x = Dpt, y = BMI, color = Dpt)) + geom_boxplot() + labs(title = "BMI of CUHK SZ students by department", x = "Department")
    ggplot(Student_BMI_2, aes(x = Dpt, y = BMI, color = Gender)) + geom_boxplot() + labs(title = "Boxplot of Students' BMI by Department and Gender", x = "Department")
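
    A boxplot summarizes each group by its quartiles: the box spans the 25th to the 75th percentile and the line inside marks the median. The sketch below computes those numbers next to the plot. It uses simulated data; `demo_students` and the department labels A, B, C are hypothetical.

    ```r
    library(ggplot2)

    # Simulated stand-in for the survey data (hypothetical values)
    set.seed(3)
    demo_students <- data.frame(
      Dpt = factor(rep(c("A", "B", "C"), each = 20)),
      BMI = round(rnorm(60, mean = 22, sd = 3), 1)
    )

    # Quartiles of BMI per department: one column per department
    bmi_quartiles <- sapply(split(demo_students$BMI, demo_students$Dpt),
                            quantile, probs = c(0.25, 0.5, 0.75))
    print(bmi_quartiles)

    # The box edges and middle line of each box correspond to these rows
    p_box <- ggplot(demo_students, aes(x = Dpt, y = BMI, color = Dpt)) +
      geom_boxplot()
    print(p_box)
    ```

    Reading the table alongside the plot makes it easier to state by how much the groups differ.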

    Two categorical variables

    # Graphing two categorical variables
    # Bar chart of students' origin per department
    Student_BMI_2$`place of birth` <- as.factor(Student_BMI_2$`place of birth`)
    ggplot(Student_BMI_2, aes(fill = `place of birth`, x = Dpt)) + geom_bar(position = "dodge")
    # Do even better: set the order of the factor levels explicitly
    Student_BMI_2$`place of birth` <- factor(Student_BMI_2$`place of birth`, levels = c("Guangdong", "Other_province", "International"))
    levels(Student_BMI_2$`place of birth`)
    ggplot(Student_BMI_2, aes(fill = `place of birth`, x = Dpt)) + geom_bar(position = "dodge")
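
    The reason for calling `factor()` with explicit `levels` is that ggplot2 draws bars and legend entries in level order. The small sketch below shows the difference on a made-up vector (`origin` is hypothetical; only the three labels come from the code above).

    ```r
    # Made-up vector using the three labels from the post
    origin <- c("International", "Guangdong", "Other_province", "Guangdong")

    # as.factor() sorts the levels alphabetically
    f_default <- as.factor(origin)
    print(levels(f_default))

    # factor(..., levels = ) fixes the order we want in the plot
    f_ordered <- factor(origin,
                        levels = c("Guangdong", "Other_province", "International"))
    print(levels(f_ordered))

    # Counts are reported in level order as well
    print(table(f_ordered))
    ```

    Note that any value not listed in `levels` becomes `NA`, so the labels must match the data exactly.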

    In the context of ggplot2 in R, the position = "dodge" argument is used to adjust the position of elements, such as bars in a bar plot, so that they are placed side by side rather than being stacked on top of each other.

    library(ggplot2)
    # Sample data
    data <- data.frame(
      Category = c("A", "A", "B", "B"),
      Subcategory = c("X", "Y", "X", "Y"),
      Count = c(10, 20, 15, 25)
    )
    # Bar plot with position = "dodge"
    ggplot(data, aes(x = Category, y = Count, fill = Subcategory)) +
      geom_bar(stat = "identity", position = "dodge") +
      ggtitle("Dodged Bar Plot")

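    With position = "fill", the bars are stacked and scaled to the same total height, so each segment shows a proportion rather than a count: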

    # Sample data
    data <- data.frame(
      Category = c("A", "A", "B", "B"),
      Subcategory = c("X", "Y", "X", "Y"),
      Count = c(10, 20, 15, 25)
    )
    # Bar plot with position = "fill"
    ggplot(data, aes(x = Category, y = Count, fill = Subcategory)) +
      geom_bar(stat = "identity", position = "fill") +
      ggtitle("Filled Bar Plot")
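
    With position = "fill", each bar is rescaled to a total height of 1, so the segments show shares within each category. As a rough check, the sketch below computes those shares by hand for the same sample values (`counts_demo` and `Share` are names introduced here for illustration).

    ```r
    # Same sample values as in the example above
    counts_demo <- data.frame(
      Category    = c("A", "A", "B", "B"),
      Subcategory = c("X", "Y", "X", "Y"),
      Count       = c(10, 20, 15, 25)
    )

    # Share of each subcategory within its category:
    # ave() applies the function group by group and keeps the row order
    counts_demo$Share <- ave(counts_demo$Count, counts_demo$Category,
                             FUN = function(x) x / sum(x))
    print(counts_demo)
    ```

    For category A the shares are 10/30 and 20/30, for category B 15/40 and 25/40. These are the segment heights drawn in the filled bar plot.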

     

    With position = "stack" (the default for geom_bar), the bars are stacked on top of each other:

    library(ggplot2)
    # Sample data
    data <- data.frame(
      Category = c("A", "A", "B", "B"),
      Subcategory = c("X", "Y", "X", "Y"),
      Count = c(10, 20, 15, 25)
    )
    # Bar plot with position = "stack"
    ggplot(data, aes(x = Category, y = Count, fill = Subcategory)) +
      geom_bar(stat = "identity", position = "stack") +
      ggtitle("Stacked Bar Plot")

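    With position = "jitter", each point is shifted by a small random amount to avoid overplotting. It is typically used with geom_point() rather than with bars.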
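
    Jittering matters most when many observations share the same values, for instance a categorical x and a y that takes only a few values. A minimal sketch on simulated data follows; `demo_students`, the department labels and the `homework_day` values are hypothetical.

    ```r
    library(ggplot2)

    # Simulated data (hypothetical): many points share the same (x, y) pair
    set.seed(4)
    demo_students <- data.frame(
      Dpt          = factor(sample(c("A", "B", "C"), 90, replace = TRUE)),
      homework_day = sample(0:5, 90, replace = TRUE)
    )

    # Without jitter, the 90 points collapse onto at most 18 positions.
    # position_jitter() adds a small random offset; seed makes it reproducible.
    p_jitter <- ggplot(demo_students, aes(x = Dpt, y = homework_day)) +
      geom_point(position = position_jitter(width = 0.2, height = 0.1, seed = 4),
                 alpha = 0.6)
    print(p_jitter)
    ```

    `geom_jitter()` is a shortcut for `geom_point(position = "jitter")`. Keeping `width` and `height` small avoids suggesting a precision the data does not have.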

     

  • Original article: https://blog.csdn.net/m0_74331272/article/details/136529858