1. Background
For the sklearn library, see:
A Very Detailed Introduction to sklearn (机器学习算法那些事's blog, CSDN)
2. KNN in practice
Using the KNN algorithm to measure the accuracy of cancer diagnosis for patients
- import csv
- import random
-
- # Read the data (each row becomes a dict keyed by column name)
- with open("./Prostate_Cancer.csv", "r", newline="") as f:
-     reader = csv.DictReader(f)
-     datas = [row for row in reader]
-
- # Shuffle the data, then split it: one third for testing, the rest for training
- random.shuffle(datas)
- n = len(datas) // 3
-
- test_data = datas[0:n]
- train_data = datas[n:]
- # print(train_data[0])
- # print(train_data[0]["id"])
-
-
- # Euclidean distance between two samples over the feature columns
- def distance(x, y):
-     res = 0
-     for k in ("radius", "texture", "perimeter", "area", "smoothness", "compactness", "symmetry", "fractal_dimension"):
-         res += (float(x[k]) - float(y[k])) ** 2
-     return res ** 0.5
-
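- # Predict the label of `data` from its K nearest training samples (e.g. K=6)
- def knn(data, K):
-     # 1. Compute the distance to every training sample
-     res = [
-         {"result": train["diagnosis_result"], "distance": distance(data, train)}
-         for train in train_data
-     ]
-     # 2. Sort by distance, nearest first. sorted() returns a new list and
-     #    leaves res unchanged, so sort in place instead.
-     res.sort(key=lambda x: x["distance"])
-     # print(res)
-     # 3. Take the K nearest
-     res2 = res[0:K]
-     # 4. Distance-weighted vote
-     result = {"B": 0, "M": 0}
-     # 4.1 Total distance (named total so the built-in sum() is not shadowed)
-     total = sum(r["distance"] for r in res2)
-     # 4.2 Compute the weights: closer neighbours count more. With a single
-     #     neighbour the weight would be 0, and total == 0 would divide by
-     #     zero, so fall back to a plain vote in those cases.
-     for r in res2:
-         if len(res2) > 1 and total > 0:
-             result[r["result"]] += 1 - r["distance"] / total
-         else:
-             result[r["result"]] += 1
-
-     # 4.3 Return the label with the larger total weight
-     if result["B"] > result["M"]:
-         return "B"
-     else:
-         return "M"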
-
-
- # print(distance(train_data[0], train_data[1]))
- # Compare predictions with the true labels and compute the accuracy
- for k in range(1, 11):
-     correct = 0
-     for test in test_data:
-         result = test["diagnosis_result"]
-         result2 = knn(test, k)
-         if result == result2:
-             correct += 1
-     print("k=" + str(k) + ", accuracy {:.2f}%".format(100 * correct / len(test_data)))
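Output:
In this run, accuracy is highest at K=6. Because the data is shuffled randomly before it is split, the best K can differ from one run to the next.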
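Since a single random split can favour one K by chance, one way to make the comparison steadier is to average the accuracy over several shuffles. The following is only a minimal sketch under stated assumptions: `make_toy_data` and its two Gaussian blobs are hypothetical stand-ins for the rows of Prostate_Cancer.csv, and `knn_predict` and `mean_accuracy` are illustrative names that do not appear in the listing. It reuses the `1 - distance / total` weighting from the listing and, as its own choice, falls back to a plain vote when that weight is undefined.

```python
import random


def distance(x, y):
    """Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def knn_predict(sample, train, k):
    """Distance-weighted vote over the k nearest (features, label) pairs."""
    nearest = sorted(train, key=lambda row: distance(sample, row[0]))[:k]
    dists = [distance(sample, row[0]) for row in nearest]
    total = sum(dists)
    votes = {"B": 0.0, "M": 0.0}
    for (_, label), d in zip(nearest, dists):
        if len(nearest) > 1 and total > 0:
            # Same weighting as the listing: closer neighbours count more
            votes[label] += 1 - d / total
        else:
            # Assumption of this sketch: plain vote when the weight is undefined
            votes[label] += 1.0
    return "B" if votes["B"] > votes["M"] else "M"


def make_toy_data(rng, n_per_class=60):
    """Hypothetical data: two Gaussian blobs standing in for the CSV rows."""
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "B") for _ in range(n_per_class)]
    data += [((rng.gauss(2, 1), rng.gauss(2, 1)), "M") for _ in range(n_per_class)]
    return data


def mean_accuracy(data, k, rng, repeats=10):
    """Average test accuracy for one k over several random train/test splits."""
    scores = []
    for _ in range(repeats):
        rows = data[:]
        rng.shuffle(rows)
        n = len(rows) // 3
        test, train = rows[:n], rows[n:]
        correct = sum(knn_predict(x, train, k) == label for x, label in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy = make_toy_data(rng)
results = {k: mean_accuracy(toy, k, rng) for k in range(1, 11)}
for k, acc in results.items():
    print("k={}, mean accuracy {:.2f}%".format(k, 100 * acc))
best_k = max(results, key=results.get)
print("best k on the toy data:", best_k)
```

The numbers printed here describe the toy data only and say nothing about the cancer dataset. To apply the idea to the listing, the same loop over repeated shuffles would wrap the existing split and accuracy code.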
The images and material above come from 梅科尔工作室 and are for study purposes only; please do not repost them freely.