A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.).
Creating a Series
s = pd.Series(data, index=index)
Access elements
- s = pd.Series(data, index=index)
- s[0] # positional access (works with the default integer index)
- s["charlie"] # dict-like access by label
Extracting indices and values
Operation on every value
- s = pd.Series(data, index=index)
- s / 100 # divide every value in s by 100 (element-wise, returns a new Series)
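The Series operations above can be sketched end to end. The data and index labels below are made up for illustration:

```python
import pandas as pd

# Hypothetical data: populations (in millions) keyed by city name.
s = pd.Series([9.0, 2.1, 8.4], index=["london", "paris", "nyc"])

first = s.iloc[0]    # positional access (preferred over s[0] when the index is labeled)
paris = s["paris"]   # dict-like access by label
scaled = s / 100     # element-wise division returns a new Series; s is unchanged
```

Note that arithmetic like `s / 100` never modifies `s` in place; it always produces a new Series aligned on the same index.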
Intro to data structures — pandas 1.4.4 documentation
https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
Just like a SQL table
Creating a dataframe
df = pd.DataFrame(data, index=index, columns=columns)
Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame
Index
a list of row labels
Note: the index labels each row and is stored alongside the row's data (it behaves like an extra label column)
Columns
a list of column labels
Projection (access column)
- df = pd.DataFrame(data, index=index, columns=columns)
- df['col_name'] # OR
- df.col_name # attribute access; only works when the column name is a valid Python identifier
- df = pd.DataFrame(data, index=index, columns=columns)
- df[['col_name_1', 'col_name_2', ...]]
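A minimal sketch of projection, using made-up country data (the column names and values are illustrative, not from the original notes):

```python
import pandas as pd

# Hypothetical data set.
df = pd.DataFrame(
    {"country": ["UK", "France"], "capital": ["London", "Paris"], "population": [67, 68]},
    index=["uk", "fr"],
)

capitals = df["capital"]              # a single column name returns a Series
subset = df[["country", "capital"]]   # a list of column names returns a DataFrame
```

The single-bracket form returns a Series; the double-bracket form returns a DataFrame even when the list contains only one column.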
Logical operations on column
- df = pd.DataFrame(data, index=index, columns=columns)
- df.col_name == value # apply the comparison to every row of the selected column; returns a boolean Series
Extract indices and values
Extract certain rows
- df = pd.DataFrame(data, index=index, columns=columns)
- df[df.capital == 'London'] # return a new DataFrame containing only the rows that satisfy this condition
- df.head() # get the top 5 rows
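Boolean filtering can be sketched with a small made-up DataFrame (the capitals and numbers are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"capital": ["London", "Paris", "Berlin"], "population": [9.0, 2.1, 3.6]})

mask = df.capital == "London"   # boolean Series, one True/False entry per row
london = df[mask]               # new DataFrame keeping only rows where mask is True
top = df.head(2)                # first 2 rows (head() defaults to 5)
```

Filtering never mutates `df`; each expression returns a new object.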
Add column
- df = pd.DataFrame(data, index=index, columns=columns)
- df['new_col'] = list # create new column in the dataframe and apply list values to each row
- df['new_col'] = df.col_1 + df.col_2 # create a new column and populate it by combining values from existing columns
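Both ways of adding a column can be sketched with hypothetical area/population data:

```python
import pandas as pd

df = pd.DataFrame({"area": [244, 551], "population": [67, 68]})

df["continent"] = ["Europe", "Europe"]    # assign a list: one value per row, in order
df["density"] = df.population / df.area   # derive a new column from existing ones
```

When assigning a list, its length must match the number of rows; deriving from existing columns is vectorized, so no loop is needed.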
Apply a function to a subset of a dataframe
Notes:
- series.map works only on a Series and has the same element-wise functionality as apply.
- df.applymap works only on DataFrames and applies a function to every element.
- df = pd.DataFrame(data, index=index, columns=columns)
- df.capital.apply(lambda x: x.upper()) # upper-case each value in the capital column
- df['new_col'] = df.apply(lambda x: f(x['col_1']), axis = 1) # axis=1 applies the function row-wise (one row at a time)
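A sketch of both apply styles, using a made-up DataFrame (column names and the label format are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({"capital": ["london", "paris"], "population": [9.0, 2.1]})

# Element-wise on a single column (a Series): the lambda receives one value at a time.
upper = df.capital.apply(lambda x: x.upper())

# Row-wise on the whole DataFrame: with axis=1 the lambda receives one row (a Series).
df["label"] = df.apply(lambda row: f"{row['capital']} ({row['population']}M)", axis=1)
```

`Series.apply` passes scalar values; `DataFrame.apply(..., axis=1)` passes whole rows, so the function can combine several columns.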
Merge()
- population = pd.DataFrame(data, index=index, columns=columns)
- countries = pd.DataFrame(data, index=index, columns=columns)
-
- pd.merge(left=population, right=countries, left_on="col_1", right_on="col_2") # default, inner merge
- pd.merge(left=population, right=countries, left_on="col_1", right_on="col_2", how="left") # left merge
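The inner vs. left merge behavior can be sketched with two small made-up tables (the country names are illustrative):

```python
import pandas as pd

population = pd.DataFrame({"country": ["UK", "France", "Spain"], "population": [67, 68, 47]})
countries = pd.DataFrame({"name": ["UK", "France", "Germany"], "continent": ["Europe"] * 3})

# Inner merge (default): keep only keys present in BOTH tables -> UK, France.
inner = pd.merge(left=population, right=countries, left_on="country", right_on="name")

# Left merge: keep every row of the left table; unmatched rows (Spain) get NaN.
left = pd.merge(left=population, right=countries, left_on="country", right_on="name", how="left")
```

With `how="left"`, Spain survives but its `name`/`continent` fields are NaN, since it has no match in `countries`.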
Groupby()
- population = pd.DataFrame(data, index=index, columns=columns)
-
- population.groupby('continent')[['area']].mean() # group by continent and compute the mean of each column; return a DataFrame with only the area column
-
- population.groupby('continent').mean()[['area']] # alternative to the above function
-
- population.groupby('continent').mean()[['area']].reset_index(drop=False) # reformatting the index column of the grouped dataframe
** Cannot be converted to dataframe before aggregating
population.groupby('continent').to_frame() # won't work — aggregate first, then convert
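The groupby pattern above can be sketched with made-up continent/area data (the numbers are invented):

```python
import pandas as pd

population = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia"],
    "area": [244, 551, 9597],
})

# Group rows by continent, then average the area column per group.
means = population.groupby("continent")[["area"]].mean()   # group labels become the index

# reset_index moves the group labels back into an ordinary column.
flat = means.reset_index(drop=False)
```

Note that a GroupBy object itself is lazy: it only produces a DataFrame once an aggregation like `mean()` runs, which is why calling `to_frame()` on it directly fails.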
Replaces NA/NaN values
missing_df.fillna(0) # replace all NaN with 0
Drop the rows or columns with NA/NaN values
dropna(axis=axis, how=how)
The axis argument determines whether rows or columns containing missing values are removed:
- axis = 0: drop rows that contain missing values.
- axis = 1: drop columns that contain missing values.
The how argument determines when a row or column is removed:
- how = 'any': drop the row or column if any NA values are present (default).
- how = 'all': drop the row or column only if all of its values are NA.
- missing_df.dropna(axis=0) # drop all the rows that have missing values
- missing_df.dropna(axis=1) # drop all the cols that have missing values
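The fillna/dropna combinations can be sketched with a small made-up DataFrame containing deliberate gaps:

```python
import pandas as pd
import numpy as np

# Hypothetical data: column "b" is entirely missing, "a" is partially missing.
missing_df = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [np.nan, np.nan],
    "c": [3.0, 4.0],
})

filled = missing_df.fillna(0)                   # replace every NaN with 0
no_rows = missing_df.dropna(axis=0)             # drop rows with ANY NaN (empty here)
no_cols = missing_df.dropna(axis=1)             # drop columns with ANY NaN -> only "c" survives
all_na = missing_df.dropna(axis=1, how="all")   # drop only fully-NaN columns -> drops "b"
```

All four calls return new objects; `missing_df` itself is untouched unless `inplace=True` is passed.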