• Item-Based Recommendations with Hadoop


    Mahout implements item-based collaborative filtering on top of MapReduce. Here I try running it.

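    1. Install Hadoop.

    2. Download Mahout and unpack it.

    3. Prepare the data
      Download the MovieLens 1M Dataset and unpack it to get ratings.dat, in which each line has the form UserID::MovieID::Rating::Timestamp. Run

      sed 's/::\([0-9]\{1,\}\)::\([0-9]\{1\}\)::[0-9]\{1,\}$/,\1,\2/' ratings.dat
      to convert it to the userID,itemID,rating format the job expects. sed writes to standard output, so redirect the result to a file.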
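    For readers who would rather not decode the sed expression, the same conversion can be sketched in Python. This is only an illustration: the function names `convert_line` and `convert_file` are made up for this example, and it assumes every line of ratings.dat has exactly four `::`-separated fields (UserID, MovieID, Rating, Timestamp).

```python
def convert_line(line: str) -> str:
    """Turn 'UserID::MovieID::Rating::Timestamp' into 'UserID,MovieID,Rating'."""
    # The timestamp is dropped; the other three fields are kept as-is.
    user_id, movie_id, rating, _timestamp = line.strip().split("::")
    return f"{user_id},{movie_id},{rating}"


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a whole ratings file, skipping blank lines."""
    with open(src_path, encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line) + "\n")


if __name__ == "__main__":
    # A line in the ratings.dat layout described above.
    print(convert_line("1::1193::5::978300760"))
```

    Calling `convert_file("ratings.dat", "ratings.csv")` (both paths are placeholders) would produce a file that can be uploaded and passed to the job with -i.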

    4. Run
      mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/desired/output -n 25
      Options:

    MAHOUT-JOB: /home/laxe/apple/mahout/mahout-examples-0.11.0-job.jar
    Job-Specific Options:
    --input (-i) input Path to job input directory.
    --output (-o) output The directory pathname for output.
    --numRecommendations (-n) numRecommendations Number of recommendations per user.
    --usersFile usersFile File of users to recommend for.
    --itemsFile itemsFile File of items to recommend for.
    --filterFile (-f) filterFile File containing comma-separated userID,itemID pairs. Used to exclude the item from the recommendations for that user(optional).
    --userItemFile (-uif) userItemFile File containing comma-separated userID,itemID pairs(optional). Used to include only these items into recommendations. Cannot be used together with usersFile or itemsFile.
    --booleanData (-b) booleanData Treat input as without prefvalues.
    --maxPrefsPerUser (-mxp) maxPrefsPerUser Maximum number of preferences considered per user in final recommendation phase.
    --minPrefsPerUser (-mp) minPrefsPerUser Ignore users with less preferences than this in the similarity computation (default: 1).
    --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem Maximum number of similarities considered per item.
    --maxPrefsInItemSimilarity (-mpiis) maxPrefsInItemSimilarity Max number of preferences to consider per user or item in the item similarity computation phase, users or items with more preferences will be sampled down(default: 500).
    --similarityClassname (-s) similarityClassname Name of distributed similarity measures class to instantiate,
    alternatively use one of the predefined similarities([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION, SIMILARITY_EUCLIDEAN_DISTANCE])
    --threshold (-tr) threshold Discard item pairs with a similarity value below this.
    --outputPathForSimilarityMatrix (-opfsm) outputPathForSimilarityMatrix Write the item similarity matrix to this path(optional).
    --randomSeed randomSeed Use this seed for sampling.
    --sequencefileOutput Write the output into a Sequence File instead of a text file.
    --help (-h) Print out help.
    --tempDir tempDir Intermediate output directory.
    --startPhase startPhase First phase to run.
    --endPhase endPhase Last phase to run.
    Specify HDFS directories while running on hadoop; else specify local file system directories.
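    To give a feel for what a similarity measure such as SIMILARITY_COOCCURRENCE captures, here is a toy single-machine sketch. It is not Mahout's implementation: the functions `cooccurrence` and `recommend`, the sample data, and the scoring rule (co-occurrence count times the user's rating, summed over the items the user has rated) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(prefs):
    """prefs: list of (user, item, rating). Returns {item: {other_item: count}},
    where count is the number of users who rated both items."""
    items_by_user = defaultdict(set)
    for user, item, _rating in prefs:
        items_by_user[user].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(prefs, user, n):
    """Score each item the user has not rated and return the top n (item, score) pairs."""
    counts = cooccurrence(prefs)
    rated = {item: rating for u, item, rating in prefs if u == user}
    scores = defaultdict(float)
    for item, rating in rated.items():
        for other, count in counts[item].items():
            if other not in rated:
                scores[other] += count * rating
    # Highest score first; ties broken by item ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up preferences in the userID,itemID,rating layout.
prefs = [
    (1, 101, 5.0), (1, 102, 3.0),
    (2, 101, 4.0), (2, 102, 2.0), (2, 103, 5.0),
    (3, 102, 4.0), (3, 103, 3.0),
]

if __name__ == "__main__":
    print(recommend(prefs, user=1, n=25))
```

    The `n` argument plays the role of -n (recommendations per user). The distributed job splits this kind of work across several MapReduce phases, and the other predefined similarities replace the raw count with a different pairwise measure.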
    

    References
    Introduction to Item-Based Recommendations with Hadoop
    Distributed Mahout: Item-based recommendation

  • Original article: https://blog.csdn.net/liuyuan185442111/article/details/49800369