• Item-Based Recommendations with Hadoop


    Mahout implements item-based collaborative filtering on top of MapReduce. Here I try running it.
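To make the idea concrete before touching Hadoop, here is a minimal single-machine sketch of item-based collaborative filtering in Python. It is not Mahout's implementation: it uses plain co-occurrence counts as the item similarity, and the function names and toy data are made up for illustration.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(prefs):
    """Count, for each ordered item pair (a, b), how many users rated both."""
    counts = defaultdict(int)
    for items in prefs.values():
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


def recommend(prefs, user, n=2):
    """Score unseen items by co-occurrence with the user's items, weighted by rating."""
    counts = cooccurrence(prefs)
    seen = prefs[user]
    all_items = {item for items in prefs.values() for item in items}
    scores = {}
    for candidate in all_items - set(seen):
        score = sum(counts.get((candidate, item), 0) * rating
                    for item, rating in seen.items())
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by item ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical toy data: user -> {item: rating}
toy_prefs = {
    1: {101: 5.0, 102: 3.0},
    2: {101: 4.0, 102: 2.0, 103: 5.0},
    3: {102: 4.0, 103: 3.0, 104: 4.0},
}
print(recommend(toy_prefs, 1))
```

User 1 has not rated items 103 and 104. Item 103 co-occurs more often with the items user 1 rated highly, so it ranks first. The distributed job does conceptually similar work, but with a configurable similarity measure and with the pair counting spread across MapReduce phases.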

    1. Install Hadoop

    2. Download Mahout and unpack it

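    3. Prepare the data
      Download the 1 Million MovieLens Dataset and unpack it to get ratings.dat, where each line has the form UserID::MovieID::Rating::Timestamp. Run

      sed 's/::\([0-9]\{1,\}\)::\([0-9]\{1\}\)::[0-9]\{1,\}$/,\1,\2/' ratings.dat
      to convert it into the userID,itemID,rating format the job expects.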
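If sed is not at hand, the same conversion can be sketched in Python. This assumes every line of ratings.dat has exactly four fields separated by `::`; the helper names `convert_line` and `convert_file` are my own, not part of Mahout or MovieLens.

```python
def convert_line(line):
    """Turn 'UserID::MovieID::Rating::Timestamp' into 'UserID,MovieID,Rating'."""
    user_id, movie_id, rating, _timestamp = line.rstrip("\n").split("::")
    return f"{user_id},{movie_id},{rating}"


def convert_file(src_path, dst_path):
    """Convert a whole ratings file; both paths are supplied by the caller."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(convert_line(line) + "\n")


print(convert_line("1::1193::5::978300760"))
```

The timestamp is dropped because the recommender only needs the user, the item and the preference value.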

    4. Run
      mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/desired/output -n 25
      Options:

    MAHOUT-JOB: /home/laxe/apple/mahout/mahout-examples-0.11.0-job.jar
    Job-Specific Options:
    --input (-i) input Path to job input directory.
    --output (-o) output The directory pathname for output.
    --numRecommendations (-n) numRecommendations Number of recommendations per user.
    --usersFile usersFile File of users to recommend for.
    --itemsFile itemsFile File of items to recommend for.
    --filterFile (-f) filterFile File containing comma-separated userID,itemID pairs. Used to exclude the item from the recommendations for that user(optional).
    --userItemFile (-uif) userItemFile File containing comma-separated userID,itemID pairs(optional). Used to include only these items into recommendations. Cannot be used together with usersFile or itemsFile.
    --booleanData (-b) booleanData Treat input as without prefvalues.
    --maxPrefsPerUser (-mxp) maxPrefsPerUser Maximum number of preferences considered per user in final recommendation phase.
    --minPrefsPerUser (-mp) minPrefsPerUser Ignore users with less preferences than this in the similarity computation (default: 1).
    --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem Maximum number of similarities considered per item.
    --maxPrefsInItemSimilarity (-mpiis) maxPrefsInItemSimilarity Max number of preferences to consider per user or item in the item similarity computation phase, users or items with more preferences will be sampled down(default: 500).
    --similarityClassname (-s) similarityClassname Name of distributed similarity measures class to instantiate,
    alternatively use one of the predefined similarities([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK, SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION, SIMILARITY_EUCLIDEAN_DISTANCE])
    --threshold (-tr) threshold Discard item pairs with a similarity value below this.
    --outputPathForSimilarityMatrix (-opfsm) outputPathForSimilarityMatrix Write the item similarity matrix to this path(optional).
    --randomSeed randomSeed Use this seed for sampling.
    --sequencefileOutput Write the output into a Sequence File instead of a text file.
    --help (-h) Print out help.
    --tempDir tempDir Intermediate output directory.
    --startPhase startPhase First phase to run.
    --endPhase endPhase Last phase to run.
    Specify HDFS directories while running on hadoop; else specify local file system directories.
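Once the job finishes, the recommendations can be read back from the output directory. The sketch below assumes the default text output, where each line looks like `userID<TAB>[itemID:score,itemID:score,...]`. That is the layout I would expect, but check it against your own part files first. The function name and the sample line are made up for illustration.

```python
def parse_recommendation_line(line):
    """Parse 'userID<TAB>[itemID:score,...]' into (user_id, [(item_id, score), ...])."""
    user_part, items_part = line.strip().split("\t", 1)
    body = items_part.strip().strip("[]")
    recs = []
    if body:  # a user may have an empty recommendation list
        for pair in body.split(","):
            item_id, score = pair.split(":")
            recs.append((int(item_id), float(score)))
    return int(user_part), recs


sample = "1\t[1566:5.0,1036:5.0,1033:4.8]"  # made-up sample line
print(parse_recommendation_line(sample))
```

With `-n 25` as in the command above, each user should get at most 25 such item:score pairs.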
    

    References
    Introduction to Item-Based Recommendations with Hadoop
    Mahout distributed: item-based recommendation (mahout分布式:Item-based推荐)

  • Original article: https://blog.csdn.net/liuyuan185442111/article/details/49800369