Experimenting with different values of k
import math
import random


def cal_list_ndcg(mids, rele_table, pred_table, n):
    """NDCG@n of the ranking induced by pred_table, scored with the true gains in rele_table."""
    # Ideal DCG (IDCG): rank the items by their true relevance.
    ideal = sorted(mids, reverse=True, key=lambda x: rele_table[x])
    idcg = 0.0
    for i, m in enumerate(ideal[:n]):
        idcg += (2 ** rele_table[m] - 1) / math.log2(i + 2)
    # Actual DCG: rank by predicted score (ties keep the ideal order),
    # but score each position with the true relevance.
    ranked = sorted(ideal, reverse=True, key=lambda x: pred_table[x])
    # print(" ".join([str(rele_table[mid]) for mid in ranked]))
    dcg = 0.0
    for i, m in enumerate(ranked[:n]):
        dcg += (2 ** rele_table[m] - 1) / math.log2(i + 2)
    # Guard against division by zero when every relevance is 0.
    return dcg / idcg if idcg > 0 else 0.0


# Ground-truth relevance and model scores for six items.
rele_table = {'kol': 0.1, 'media': 0.2, 'other': 0.3,
              'guandian': 0.4, 'taolun': 0.5, 'f012': 0.6}
pred_table = {'kol': 0.34828833, 'media': 0.31925637, 'other': 0.30245525,
              'guandian': 0.13245525, 'taolun': 0.83245525, 'f012': 0.37245525}
mids = ['kol', 'media', 'other', 'guandian', 'taolun', 'f012']

# Try different cutoffs k (the loop variable i).
for i in range(3, 7):
    # Random baseline, redrawn for every k.
    pred_rand = {m: random.random() for m in mids}
    value = cal_list_ndcg(mids, rele_table, pred_table, i)
    ndcg_rand = cal_list_ndcg(mids, rele_table, pred_rand, i)
    print(i)
    print(value, ndcg_rand)
    print(value / i, ndcg_rand / i)
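The larger the cutoff is, the larger DCG gets, since each additional position i adds one more non-negative term:
dcg += ((2**rele_table[m] - 1) / (math.log2(i+2)))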
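A minimal sketch of this, assuming the same gain (2**rel - 1) and discount (log2(i + 2)) as cal_list_ndcg; the helper name dcg_at_k is hypothetical. It prints the cumulative DCG of the six relevance values for k = 1..6, which never decreases because every term is non-negative.

```python
import math


def dcg_at_k(rels, k):
    """Hypothetical helper: DCG@k with gain 2**rel - 1 and discount log2(i + 2)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


# Relevance values of the six items from the experiment, in descending order.
rels = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
dcgs = [dcg_at_k(rels, k) for k in range(1, len(rels) + 1)]
for k, d in enumerate(dcgs, start=1):
    print(k, round(d, 4))
```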
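Output (for each k: the value of k; then NDCG of pred_table and of pred_rand; then both divided by k):
3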
1.0 0.4281434559617804
0.3333333333333333 0.14271448532059347
4
1.0 0.768846962242674
0.25 0.1922117405606685
5
1.0 0.8405557275318762
0.2 0.16811114550637524
6
1.0 0.8676354240144575
0.16666666666666666 0.14460590400240958
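When the predicted ranking is identical to the ideal ranking,
we can see that the larger k is, the smaller NDCG / k gets (NDCG itself stays at 1.0).
The reason is that
as more items are considered, the denominator (the count k) increases while the numerator (NDCG) does not change, so the ratio drops.
Why does the numerator NDCG, i.e. dcg / idcg, not change?
Because the two rankings are exactly the same, so dcg == idcg.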
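To check this reasoning, a small sketch ranks the items in exactly the ideal order. The helper ndcg_at_k is hypothetical and re-implements cal_list_ndcg with the same gain and discount. NDCG comes out as exactly 1.0 for every k, so NDCG / k is simply 1 / k.

```python
import math


def ndcg_at_k(order, rele, k):
    """Hypothetical helper: NDCG@k of `order` (item ids) against relevance dict `rele`."""
    def dcg(items):
        return sum((2 ** rele[m] - 1) / math.log2(i + 2) for i, m in enumerate(items[:k]))
    ideal = sorted(rele, key=rele.get, reverse=True)
    return dcg(order) / dcg(ideal)


rele = {"kol": 0.1, "media": 0.2, "other": 0.3, "guandian": 0.4, "taolun": 0.5, "f012": 0.6}
# A prediction whose ranking matches the ideal one exactly.
perfect = sorted(rele, key=rele.get, reverse=True)
results = {k: ndcg_at_k(perfect, rele, k) for k in range(3, 7)}
for k, v in results.items():
    print(k, v, v / k)  # NDCG stays 1.0 while NDCG / k shrinks as 1 / k
```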
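For the ideal ranking, the amount each extra position adds to IDCG keeps shrinking as k grows, because the most relevant items are already placed first.
For a fixed k, idcg is fixed too, so the larger dcg is, the larger the ratio dcg / idcg.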
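A quick sketch of how small those IDCG increments become, using the six relevance values from the experiment in ideal order and, as an assumption, the same gain and discount as cal_list_ndcg. Both factors push the same way: the gain falls with each less relevant item and the log2(i + 2) discount rises with position.

```python
import math

# Relevances of the six items in ideal (descending) order.
rels = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
# Contribution of 0-based position i to IDCG.
increments = [(2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels)]
for k, inc in enumerate(increments, start=1):
    print(f"k={k}: IDCG increment {inc:.4f}")
```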
Output of a second run, where the predicted ranking is poor:
3
0.3001289811601986 0.5473110827374296
0.1000429937200662 0.18243702757914318
4
0.40407676211684973 0.7345030054951796
0.10101919052921243 0.1836257513737949
5
0.5293144802949171 0.5967650853888128
0.10586289605898343 0.11935301707776255
6
0.68132617129277 0.888722526516354
0.11355436188212832 0.148120421086059
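We can see that the larger k is, the larger NDCG gets (this holds for the first column; pred_rand is redrawn for every k, so the second column is not comparable across k). A likely explanation is that a larger k is more tolerant of ranking errors.
So when the ranking is good, the larger k, the smaller NDCG / k (the numerator is always 1 while the denominator k keeps growing).
But when the ranking is poor, the larger k, the larger NDCG: idcg grows only a little, since its extra terms come from the least relevant items, while dcg gains more as the relevant items that were ranked too low enter the top k, so the overall value increases.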
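A sketch of the poor-ranking case, assuming the worst possible prediction (the ideal order reversed, least relevant first). ndcg_at_k is a hypothetical helper that mirrors cal_list_ndcg. Under the relevance values above its NDCG rises with k, and the values, roughly 0.3001, 0.4041, 0.5293 and 0.6813 for k = 3..6, match the first column of the second output.

```python
import math


def ndcg_at_k(order, rele, k):
    """Hypothetical helper: NDCG@k with gain 2**rel - 1 and log2(i + 2) discount."""
    def dcg(items):
        return sum((2 ** rele[m] - 1) / math.log2(i + 2) for i, m in enumerate(items[:k]))
    ideal = sorted(rele, key=rele.get, reverse=True)
    return dcg(order) / dcg(ideal)


rele = {"kol": 0.1, "media": 0.2, "other": 0.3, "guandian": 0.4, "taolun": 0.5, "f012": 0.6}
# A deliberately poor prediction: the ideal order reversed.
worst = sorted(rele, key=rele.get)
results = {k: ndcg_at_k(worst, rele, k) for k in range(3, 7)}
for k, v in results.items():
    print(k, round(v, 4), round(v / k, 4))
```

The relevant items only enter the top k at the tail, so each larger cutoff recovers more of the DCG the reversed order lost, while IDCG barely moves.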