KNN分类算法,又叫K近邻算法,它概念极其简单,但效果又很优秀。
如觉得有帮助请点赞关注收藏啦~~~
KNN算法的核心是,如果一个样本在特征空间中的K个最相似,即特征空间中最邻近的样本中的大多数属于某一个类别,则该样本也属于这个类别
1:K值
K值也就是选择几个相邻的作为测量
2:距离的度量
距离决定了哪些是邻居哪些不是,度量距离有很多种方法,常用的是欧式距离
1:查看数据 使用鸢尾花数据集 由sklearn模块导入
- from sklearn.datasets import load_iris
- from sklearn.model_selection import train_test_split
- from sklearn.neighbors import KNeighborsClassifier
- from sklearn import datasets
- import matplotlib.pyplot as plt
- import matplotlib; matplotlib.use('TkAgg')
- import pandas as pd
- print("数据集的keys",iris_dataset.keys())
- print("特征名",iris_dataset['feature_names'])
- print("数据类型",type(iris_dataset['data']))
- print("数据维度",iris_dataset['data'].shape)
- print("标记名",iris_dataset['target_names'])
2:使用散点矩阵查看数据特征关系
代码如下 绘图并且划分数据集与训练集
-
- from sklearn.datasets import load_iris
- from sklearn.model_selection import train_test_split
- from sklearn.neighbors import KNeighborsClassifier
- from sklearn import datasets
- import matplotlib.pyplot as plt
- import matplotlib; matplotlib.use('TkAgg')
- import pandas as pd
- iris_dataset=load_iris()
- train_x,test_x,train_y,test_y=train_test_split(iris_dataset['data'],iris_dataset['target'],random_state=2)
- print('trainx\n',train_x)
- print('trainy\n',train_y)
- print('testx\n',test_x)
- print('testy\n',test_y)
- print(test_x.shape)
- print(test_x.shape)
- irisdataframe=pd.DataFrame(train_x,columns=iris_dataset.feature_names)
- pd.plotting.scatter_matrix(irisdataframe,c=train_y,figsize=(15,15),marker='o',hist_kwds={'bins':20},s=60,alpha=0.8)
- plt.show()
3:建立KNN模型进行预测
python中实现KNN方法使用的是KNeighborsClassifier类
核心操作分三步
3.1:创建KNeighborsClassifier对象 并进行初始化
3.2:调用fit()方法 对数据集进行训练
fit(x,y)以x为训练集 y为测试及对模型进行训练
3.3:调用predict函数进行预测
源代码如下
- from sklearn.datasets import load_iris
- from sklearn.model_selection import train_test_split
- from sklearn.neighbors import KNeighborsClassifier
- from sklearn import datasets
- import matplotlib.pyplot as plt
- import matplotlib; matplotlib.use('TkAgg')
- import pandas as pd
- iris_dataset=load_iris()
- iris=datasets.load_iris()
- print("数据集结构",iris.data.shape)
- iris_x=iris.data
- iris_y=iris.target
- iris_train_x,iris_test_x,iris_train_y,iris_test_y=train_test_split(iris_x,iris_y,test_size=0.2,random_state=0)
- knn=KNeighborsClassifier()
- knn.fit(iris_train_x,iris_train_y)
- predictresult=knn.predict(iris_test_x)
- print("测试集大小",iris_test_x.shape)
- print("真实结果",iris_test_y)
- print("预测结果",predictresult)
- print("预测精确率",knn.score(iris_test_x,iris_test_y))