• A Site Search Engine Built on the Boost Libraries


    Project overview

    [Figure: screenshot of the finished project]

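    I. Background

    • Large internet companies such as Baidu, Sogou, and 360 run web-wide search engines. These are full-featured systems with a very high barrier to entry; building one ourselves is essentially out of reach.

    • What we build instead is a site search engine, like the search box on cplusplus.com where you can look up string. Site search is more vertical: the content covers a narrower range and the data volume is much smaller.
    • The official Boost website has no site search, so we build one ourselves!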

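    II. How a search engine works at a high level

    This section describes how data flows through the search engine.
    [Figure: data flow through the search engine]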

    III. Tech stack and environment

    • Tech stack: C/C++, STL, Boost (a quasi-standard library), jsoncpp, cppjieba, cpp-httplib
    • Environment: CentOS 7, g++, Makefile, vim, VS2019, VS Code
    • Other: HTML5, CSS, JavaScript, jQuery, Ajax

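    IV. Forward index and inverted index

    Target documents:

    Document 1: 雷军买了四斤小米 (Lei Jun bought four jin of millet)
    Document 2: 雷军发布了小米手机 (Lei Jun released the Xiaomi phone)

    Segment the target documents into words
    Purpose: makes it easy to build the inverted index and to search it

    雷军买了四斤小米: 雷军/买/四斤/小米
    雷军发布了小米手机: 雷军/发布/小米/小米手机

    Stop words: words such as 了, 的, 吗, 啊 are generally left out during segmentation.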
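    A minimal sketch of stop-word filtering, assuming the sentence has already been segmented (the project uses cppjieba for that step; here the tokens are written by hand). RemoveStopWords and the contents of the stop-word set are illustrative, not part of the project code.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: drop stop words from an already-segmented token list.
std::vector<std::string> RemoveStopWords(const std::vector<std::string> &tokens,
                                         const std::unordered_set<std::string> &stop_words)
{
  std::vector<std::string> kept;
  for (const std::string &token : tokens)
  {
    // Keep the token only if it is not in the stop-word set.
    if (stop_words.count(token) == 0) kept.push_back(token);
  }
  return kept;
}
```

    Called with the tokens 雷军/买/了/四斤/小米 and the stop words listed above, the function returns the four tokens 雷军, 买, 四斤, 小米.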

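    1. Forward index
    Maps a document ID to the document's content (its keywords)

    Document ID    Document content
    1              雷军买了四斤小米
    2              雷军发布了小米手机

    2. Inverted index
    Segment the document content, collect the distinct keywords, and map each keyword to the IDs of the documents that contain it

    Keyword (unique)    Document IDs (ordered by weight)
    雷军                Document 1, Document 2
    买                  Document 1
    四斤                Document 1
    小米                Document 1, Document 2
    发布                Document 2
    小米手机            Document 2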

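    Simulating one lookup:

    User input: 小米
    小米 → look it up in the inverted index → extract the document IDs (1, 2) → use the forward index → fetch each document's content → build a summary from title + content (desc) + url → assemble the response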
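    The lookup above can be sketched with two standard containers. This is an illustration only: the documents are segmented by hand, weights are ignored, document IDs start at 0 rather than 1, and AddDocument and Search are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Forward index: the vector position is the document ID (0-based here).
using ForwardIndex = std::vector<std::string>;
// Inverted index: keyword -> IDs of the documents that contain it.
using InvertedIndex = std::unordered_map<std::string, std::vector<std::uint64_t>>;

// Hypothetical helper: register one pre-segmented document in both indexes.
void AddDocument(const std::string &text, const std::vector<std::string> &words,
                 ForwardIndex *forward, InvertedIndex *inverted)
{
  const std::uint64_t doc_id = forward->size();
  forward->push_back(text);
  for (const std::string &word : words)
  {
    std::vector<std::uint64_t> &ids = (*inverted)[word];
    // Do not list the same document twice when a word repeats.
    if (ids.empty() || ids.back() != doc_id) ids.push_back(doc_id);
  }
}

// Hypothetical helper: keyword -> texts of the matching documents.
std::vector<std::string> Search(const std::string &keyword, const ForwardIndex &forward,
                                const InvertedIndex &inverted)
{
  std::vector<std::string> hits;
  auto it = inverted.find(keyword);
  if (it == inverted.end()) return hits; // keyword not indexed
  for (std::uint64_t doc_id : it->second) hits.push_back(forward[doc_id]);
  return hits;
}
```

    After adding both example documents, searching for 小米 returns both texts, while 四斤 returns only the first one.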

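    V. Writing the tag-stripping and data-cleaning module: Parser

    Data source: the Boost website. For now the project only needs the web pages under boost_1_78_0/doc/html, which are used to build the index.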

    https://www.boost.org/
    
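    After downloading the archive, upload it to the Linux cloud server (on a low-spec server the transfer may be killed partway through; just retry), extract it, and copy its HTML files into our data directory as the data source.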

    [sjj@VM-20-15-centos boost_searcher]$ cp -rf boost_1_78_0/doc/html/* data/input/
    
    [sjj@VM-20-15-centos input]$ ls -Rl | grep -E '*.html' | wc -l
    8141
    

    5.1 Stripping tags and cleaning the data

    Strip the tags from the files under input and clean the data.

    [sjj@VM-20-15-centos boost_searcher]$ touch parser.cc
    [sjj@VM-20-15-centos boost_searcher]$ ll
    total 4
    drwxrwxr-x 3 sjj sjj 4096 Jul 25 22:36 data
    -rw-rw-r-- 1 sjj sjj    0 Jul 25 22:46 parser.cc
    
    
    [Figure: HTML source of a Boost documentation page]
    Tags like these carry no meaning for search, so we filter them out. Note that tags usually come in pairs.

    [sjj@VM-20-15-centos data]$ mkdir raw_html
    [sjj@VM-20-15-centos data]$ ll
    total 20
    drwxrwxr-x 60 sjj sjj 16384 Jul 25 22:38 input
    drwxrwxr-x  2 sjj sjj  4096 Jul 25 22:53 raw_html
    [sjj@VM-20-15-centos input]$ ls -Rl | grep -E '*.html' | wc -l
    8141
    
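    input: holds the original data
    raw_html: holds the clean data after tag stripping

    End goal: strip the tags from every document and write them all into a single file. No document contains a line break, and documents are separated by \3 (a rule of thumb). \3 is a control character that is never displayed, so it will not pollute the document content.
    e.g. XXXXXXX\3YYYYYY\3ZZZZZZ\3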
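    5.2 Writing the parser code

    Step 1: collect every file name into std::vector<std::string> files_list
    Recursively save each HTML file name, together with its path, into files_list so that the files can be read one by one later.

    Step 2: read each file listed in files_list and parse it
    Every file is parsed into the following uniform format: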

    typedef struct DocInfo
    {
      std::string title;    // title of the document
      std::string content;  // body content of the document
      std::string url;      // url of this page
    }DocInfo_t;

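    Step 3: write the parsed content of every file to output, using \3 as the separator between documents

    A small convention:
    const &: input parameter
    * (pointer): output parameter
    & (reference): input/output parameter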
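    To make the convention concrete, here is a small sketch that uses all three parameter kinds in one signature. SplitWords is a hypothetical function written for this illustration; it does not appear in the project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example of the convention:
//   const std::string &text            input only
//   std::vector<std::string> *words    output only (overwritten)
//   std::size_t &total                 input and output (running total is updated)
bool SplitWords(const std::string &text, std::vector<std::string> *words, std::size_t &total)
{
  assert(words != nullptr);
  words->clear();
  std::string current;
  for (char c : text)
  {
    if (c == ' ')
    {
      // A space ends the current word, if there is one.
      if (!current.empty()) words->push_back(current);
      current.clear();
    }
    else
    {
      current.push_back(c);
    }
  }
  if (!current.empty()) words->push_back(current);
  total += words->size();
  return !words->empty();
}
```

    At the call site the pointer argument is passed as &words, which makes it visible that the function writes to it; that visibility is the main benefit of the convention.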

    Main body of parser.cc

    #include <iostream>
    #include <string>
    #include <vector>

    // Directory that holds the original HTML files
    const std::string src_path="data/input";
    // File that receives the cleaned, tag-free documents
    const std::string output="data/raw_html/raw.txt";

    typedef struct DocInfo{
      std::string title;
      std::string content;
      std::string url;
    }DocInfo_t;

    bool EnumList(const std::string &src_path,std::vector<std::string> *files_list);
    bool ParseHtml(const std::vector<std::string> &files_list,std::vector<DocInfo_t> *results);
    bool SaveHtml(const std::vector<DocInfo_t> &results,const std::string &output);
    int main()
    {
      // Step 1: collect the path of every HTML file
      std::vector<std::string> files_list;
      if(!EnumList(src_path,&files_list))
      {
        std::cerr<<"enum file name error!"<<std::endl;
        return 1;
      }

      // Step 2: read and parse every file
      std::vector<DocInfo_t> results;
      if(!ParseHtml(files_list,&results))
      {
        std::cerr<<"parse html error!"<<std::endl;
        return 2;
      }

      // Step 3: write the parsed documents to the output file
      if(!SaveHtml(results,output))
      {
        std::cerr<<"save html error!"<<std::endl;
        return 3;
      }

      return 0;
    }

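    Implementing the three functions

    1. EnumList

    Enumerate all the web page files and store their paths in std::vector<std::string> files_list.
    The C++ standard library (before C++17) offers no convenient way to walk a directory tree, so we use the filesystem component of Boost.
    Install Boost: sudo yum install -y boost-devel
    This is the Boost development package; it contains the header files this project needs.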

    // Needs #include <boost/filesystem.hpp>; link with -lboost_system -lboost_filesystem
    bool EnumList(const std::string &src_path,std::vector<std::string> *files_list)
    {
      namespace fs=boost::filesystem;
      fs::path root_path(src_path);

      // If the path does not exist there is no point in continuing
      if(!fs::exists(root_path))
      {
        std::cerr<<src_path<<" does not exist!"<<std::endl;
        return false;
      }
      // A default-constructed iterator marks the end of the recursive traversal
      fs::recursive_directory_iterator end;
      for(fs::recursive_directory_iterator iter(root_path);iter!=end;++iter)
      {
        // Skip anything that is not a regular file
        if(!fs::is_regular_file(*iter)) continue;

        // Skip files that do not end in .html
        if(iter->path().extension()!=".html") continue;
        // At this point the path is a valid .html file that we need;
        // store it in files_list as a string
        // std::cout<<"debug: "<<iter->path().string()<<std::endl;
        files_list->push_back(iter->path().string());
      }
      return true;
    }

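    Result: with the debug line uncommented, the program prints the path of every .html file it finds.

    2. ParseHtml

    Parse each file into a struct of type DocInfo_t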

    bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results)
    {
      // file is the path of one HTML file
      for (const std::string &file : files_list)
      {
        // 1. Read the file
        std::string result;
        if (!ns_util::FileUtil::ReadFile(file, &result))
        {
          continue;
        }

        // The doc object to fill in
        DocInfo_t doc;
        // 2. Extract the title
        if (!ParseTitle(result, &doc.title))
        {
          continue;
        }
        // 3. Extract the content
        if (!ParseContent(result, &doc.content))
        {
          continue;
        }
        // 4. Build the url
        if (!ParseUrl(file, &doc.url))
        {
          continue;
        }

        // Reaching this line means parsing succeeded and doc holds the document's data

        // results->push_back(doc); // a plain push_back would copy doc
        results->push_back(std::move(doc)); // move instead of copy
        // ShowDoc(doc);
        // break;
      }
      return true;
    }

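    This breaks down into four smaller steps

    1️⃣ Read the file: ReadFile() reads the file's content

    Create a separate utility collection to hold helpers like this one

    std::getline returns a reference to the stream. It works as a while(...) condition because the stream class overloads a conversion to bool.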

    #pragma once
    #include <iostream>
    #include <string>
    #include <fstream>
    namespace ns_util
    {
        class FileUtil
        {
        public:
            // Read the whole file at file_path and append it to *out
            static bool ReadFile(const std::string &file_path, std::string *out)
            {
                std::ifstream in(file_path,std::ios::in);
                if(!in.is_open())
                {
                    std::cerr<<file_path<<" open error"<<std::endl;
                    return false;
                }
                std::string line;
                // getline strips the trailing '\n', so lines are joined without a separator
                while(std::getline(in,line))
                {
                    *out+=line;
                }
                in.close();
                return true;
            }
        };

    }
    2️⃣ Parse the given file and extract the title

    Search the document for <title> and </title>; whatever lies between them is the title.

    static bool ParseTitle(const std::string &file, std::string *title)
    {
      std::size_t begin=file.find("<title>");
      if(begin==std::string::npos)
      {
        return false;
      }
      std::size_t end=file.find("</title>");
      if(end==std::string::npos)
      {
        return false;
      }
      // Both tags were found; move begin past the opening tag
      begin+=std::string("<title>").size();
      if(begin>end)
      {
        return false;
      }
      *title=file.substr(begin,end-begin);
      return true;
    }
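    <h4><a id="threecontent_315"></a>3️⃣ Parse the given file and extract the content</h4> 
    <p>In essence this is tag stripping: keep only the content of the page.<br> While scanning, a right angle bracket <code>&gt;</code> means the current tag has ended;<br> a new left angle bracket <code>&lt;</code> means a new tag has started.</p> 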
    <pre data-index="11" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">static bool ParseContent(const std::string &amp;file, std::string *content)
    {
      // Strip tags with a simple two-state machine
      enum status {
        LABEL,
        CONTENT
      };
      enum status s = LABEL;
      for (char c : file) {
        switch (s) // dispatch on the current state, not on the character
        {
        case LABEL:
          if (c == '&gt;')
            s = CONTENT;
          break;

        case CONTENT:
          if (c == '&lt;')
            s = LABEL;
          else {
            // We do not want to keep the '\n' characters of the original document
            if (c == '\n') c = ' ';
            content-&gt;push_back(c);
          }
          break;
        default:
          break;
        }
      }
      return true;
    }
    </code></pre> 
    <h4><a id="foururl_354"></a>4️⃣ Parse the given file path and build the url</h4> 
    <p>The paths in the official Boost documentation correspond directly to the paths in the copy we downloaded for this project.</p> 
    <blockquote> 
     <p><font size="3">Example URL on the official site: <code>https://www.boost.org/doc/libs/1_78_0/doc/html</code>/accumulators.html<br> The same file in the downloaded archive:<br> boost_1_78_0/doc/html/accumulators.html<br> The same file after copying it into our project:<br> data/input/<code>accumulators.html</code><br> We copied doc/html/* from the downloaded Boost archive into data/input/<br> url_head = "https://www.boost.org/doc/libs/1_78_0/doc/html";<br> url_tail = data/input (removed) /accumulators.html -&gt; url_tail = /accumulators.html<br> Concatenating the two yields the link on the official site:<br> url = url_head + url_tail</font></p> 
    </blockquote> 
    <pre data-index="12" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">static bool ParseUrl(const std::string &amp;file_path, std::string *url)
    {
      // URL prefix on the official site
      std::string url_head = "https://www.boost.org/doc/libs/1_78_0/doc/html";
      // Skip the src_path prefix and keep everything up to the end of the path
      std::string url_tail = file_path.substr(src_path.size());
      *url = url_head + url_tail;
      return true;
    }
    </code></pre> 
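    <p>The mapping can be checked in isolation. The sketch below is a slightly more defensive variant that takes the source directory as a parameter and fails when the file path does not start with it; <code>BuildUrl</code> is a hypothetical name, not a function of this project.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical defensive variant: fails unless file_path starts with src_path.
bool BuildUrl(const std::string &src_path, const std::string &file_path, std::string *url)
{
  assert(url != nullptr);
  const std::string url_head = "https://www.boost.org/doc/libs/1_78_0/doc/html";
  // compare(pos, len, str) == 0 means file_path begins with src_path
  if (file_path.size() < src_path.size() ||
      file_path.compare(0, src_path.size(), src_path) != 0)
  {
    return false;
  }
  // What remains after the prefix is the tail, e.g. "/accumulators.html"
  *url = url_head + file_path.substr(src_path.size());
  return true;
}
```

    <p>With <code>src_path = "data/input"</code>, the path <code>data/input/accumulators.html</code> maps to the official URL shown in the example above.</p>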
    <h3><a name="t12"></a><a id="3SaveHtml_381"></a>3. <strong>SaveHtml</strong></h3> 
    <p>Write the parsed content of every file to output (output = "data/raw_html/raw.txt"), using <code>\3</code> as the separator.</p> 
    <blockquote> 
     <p><font size="3"> version1: e.g. XXXXXXX\3YYYYYY\3ZZZZZZ\3<br> We simplify this and adopt the following scheme instead:<br> version2: when writing the file, keep in mind that it must also be easy to read back later!<br> Format: title \3 content \3 url <code>\n</code> title \3 content \3 url <code>\n</code> title \3 content \3 url <code>\n</code> …<br> This lets <code>getline(ifstream, line)</code> fetch one whole document at a time: title\3content\3url</font></p> 
    </blockquote> 
    <pre data-index="13" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">bool SaveHtml(const std::vector&lt;DocInfo_t&gt; &amp;results, const std::string &amp;output)
    {
    #define SEP '\3'
      // Open the output file in binary mode
      std::ofstream out(output, std::ios::out | std::ios::binary);
      if (!out.is_open())
      {
        std::cerr &lt;&lt; "open " &lt;&lt; output &lt;&lt; " error!" &lt;&lt; std::endl;
        return false;
      }
      // Write each document in the agreed format: title\3content\3url\n
      for (auto &amp;item : results)
      {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';

        // Write the record to the file
        out.write(out_string.c_str(), out_string.size());
      }

      out.close();
      return true;
    }
    </code></pre> 
    <h1><a name="t13"></a><a id="_421"></a>VI. Building the index</h1> 
    <p>The data structures we need</p> 
    <pre data-index="14" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">// One entry of the forward index
    struct DocInfo
    {
      std::string title;   // title of the document
      std::string content; // content of the document after tag stripping
      std::string url;     // url of the document
      uint64_t doc_id;     // ID of the document
    };

    // One entry of the inverted index
    struct InvertedElem
    {
      uint64_t doc_id;  // ID of the document
      std::string word; // keyword being searched
      int weight;       // weight
    };

    // Inverted list (posting list)
    typedef std::vector&lt;InvertedElem&gt; InvertedList;
    </code></pre> 
    <h2><a name="t14"></a><a id="_445"></a>Forward index</h2> 
    <p>The forward index is stored in an array; the array subscript serves as a natural document ID.</p> 
    <pre data-index="15" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// 根据文档ID获得索引值</span>
    DocInfo <span class="token operator">*</span><span class="token function">GetForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">if</span> <span class="token punctuation">(</span>doc_id <span class="token operator">>=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        std<span class="token double-colon punctuation">::</span>cerr <span class="token operator"><<</span> <span class="token string">"doc_id is out of range!"</span> <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
        <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span>
      <span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">[</span>doc_id<span class="token punctuation">]</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
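    <h3><a name="t15"></a><a id="_459"></a>Building the forward index</h3> 
    <p>The first step is to split the string.<br> Splitting strings by hand in C++ is tedious, so the ready-made split interface from the Boost library does the job.</p> 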
    <pre data-index="16" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token keyword">class</span> <span class="token class-name">StringUtil</span>
    <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">CutString</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> target<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span><span class="token operator">*</span> out<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string sep<span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
            boost<span class="token double-colon punctuation">::</span><span class="token function">split</span><span class="token punctuation">(</span><span class="token operator">*</span>out<span class="token punctuation">,</span> target<span class="token punctuation">,</span> boost<span class="token double-colon punctuation">::</span><span class="token function">is_any_of</span><span class="token punctuation">(</span>sep<span class="token punctuation">)</span><span class="token punctuation">,</span> boost<span class="token double-colon punctuation">::</span>token_compress_on<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    </code></pre> 
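    <p>To make the splitting step concrete without a Boost dependency, here is a minimal standard-library sketch. <code>SplitAnyOf</code> is a hypothetical helper, not part of the project. It splits on any character in <code>seps</code> and drops empty fields, so it may differ from <code>boost::split</code> in edge cases such as a separator at the very start or end of the line.</p> 

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the project): split `target` on any
// character found in `seps`, skipping empty fields.
std::vector<std::string> SplitAnyOf(const std::string& target,
                                    const std::string& seps)
{
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < target.size())
    {
        // Skip a run of separators; adjacent separators count as one.
        std::size_t start = target.find_first_not_of(seps, pos);
        if (start == std::string::npos)
            break;
        // The field ends at the next separator or at the end of the string.
        std::size_t end = target.find_first_of(seps, start);
        if (end == std::string::npos)
            end = target.size();
        out.push_back(target.substr(start, end - start));
        pos = end;
    }
    return out;
}
```

    <p>Calling <code>SplitAnyOf(line, "\3")</code> on a parsed line should yield the title, content and url fields, provided none of them is empty.</p> 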
    <pre data-index="17" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">DocInfo<span class="token operator">*</span> <span class="token function">BulidForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> line<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// 1. Split the line into fields</span>
        <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string sep <span class="token operator">=</span> <span class="token string">"\3"</span><span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> results<span class="token punctuation">;</span> <span class="token comment">// holds the fields split out of line</span>
        ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">StringUtil</span><span class="token double-colon punctuation">::</span><span class="token function">CutString</span><span class="token punctuation">(</span>line<span class="token punctuation">,</span> <span class="token operator">&</span>results<span class="token punctuation">,</span> sep<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>results<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">3</span><span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
            <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// 2. Fill a DocInfo with the fields</span>
        DocInfo doc<span class="token punctuation">;</span>
        doc<span class="token punctuation">.</span>title <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
        doc<span class="token punctuation">.</span>content <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
        doc<span class="token punctuation">.</span>url <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
        doc<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">// the ID is the subscript this doc will have in the vector</span>
    
        <span class="token comment">// 3. Insert into the forward index vector</span>
        forward_index<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>doc<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">.</span><span class="token function">back</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
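    <p>The two forward-index functions above can be tried out together in a small stand-alone sketch. <code>ForwardIndexSketch</code>, <code>AddLine</code> and <code>Get</code> are hypothetical names used only for illustration. The sketch splits on <code>\3</code> with the standard library instead of Boost, and it returns the new doc_id rather than a pointer, which is one way to avoid holding pointers that a later <code>push_back</code> could invalidate when the vector reallocates.</p> 

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct DocInfo
{
    std::string title;
    std::string content;
    std::string url;
    uint64_t doc_id;
};

// Hypothetical stand-alone class, for illustration only.
class ForwardIndexSketch
{
public:
    // Parse one "title\3content\3url" line and append it.
    // Returns the new doc_id, or -1 if the line does not have three fields.
    long long AddLine(const std::string& line)
    {
        std::vector<std::string> fields;
        std::size_t start = 0;
        while (true)
        {
            std::size_t end = line.find('\3', start);
            if (end == std::string::npos)
            {
                fields.push_back(line.substr(start));
                break;
            }
            fields.push_back(line.substr(start, end - start));
            start = end + 1;
        }
        if (fields.size() != 3)
            return -1;

        DocInfo doc;
        doc.title = fields[0];
        doc.content = fields[1];
        doc.url = fields[2];
        doc.doc_id = docs_.size(); // the subscript doubles as the ID
        docs_.push_back(std::move(doc));
        return static_cast<long long>(docs_.size() - 1);
    }

    // Bounds-checked lookup; nullptr when doc_id is out of range.
    const DocInfo* Get(uint64_t doc_id) const
    {
        return doc_id < docs_.size() ? &docs_[doc_id] : nullptr;
    }

private:
    std::vector<DocInfo> docs_;
};
```

    <p>Documents receive the IDs 0, 1, 2, ... in insertion order, so <code>Get(id)</code> is a plain array access.</p> 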
    <h2><a name="t16"></a><a id="_497"></a>Inverted index</h2> 
    <p>In the inverted index, each unique keyword corresponds to one (group of) <code>InvertedElem</code>, that is, a mapping from a keyword to its inverted list, so <code>unordered_map</code> is chosen as the storage structure.</p> 
    <pre data-index="18" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// Get the inverted list for a keyword</span>
    InvertedList <span class="token operator">*</span><span class="token function">GetInvertedList</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>word<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">auto</span> iter <span class="token operator">=</span> inverted_index<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">if</span> <span class="token punctuation">(</span>iter <span class="token operator">==</span> inverted_index<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        std<span class="token double-colon punctuation">::</span>cerr <span class="token operator"><<</span> word <span class="token operator"><<</span> <span class="token string">" has no InvertedList!"</span> <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
        <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span>
      <span class="token keyword">return</span> <span class="token operator">&</span><span class="token punctuation">(</span>iter<span class="token operator">-></span>second<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
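    <p>As a rough illustration of the keyword-to-inverted-list mapping, the sketch below stores postings in an <code>unordered_map</code>. The fields of <code>InvertedElem</code> shown here (doc_id, word, weight) are an assumption for the example, and <code>InvertedIndexSketch</code>, <code>Add</code> and <code>Get</code> are hypothetical names.</p> 

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed shape of one posting; the field names are illustrative.
struct InvertedElem
{
    uint64_t doc_id;
    std::string word;
    int weight;
};
typedef std::vector<InvertedElem> InvertedList;

// Hypothetical stand-alone class, for illustration only.
class InvertedIndexSketch
{
public:
    // Record that `word` occurs in document `doc_id` with the given weight.
    void Add(const std::string& word, uint64_t doc_id, int weight)
    {
        // operator[] creates an empty list the first time a word is seen.
        index_[word].push_back(InvertedElem{doc_id, word, weight});
    }

    // Returns the inverted list for `word`, or nullptr if the word is unknown.
    const InvertedList* Get(const std::string& word) const
    {
        auto it = index_.find(word);
        return it == index_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, InvertedList> index_;
};
```

    <p>Using <code>find</code> in the lookup, rather than <code>operator[]</code>, avoids inserting an empty list for every word that is searched but not indexed.</p> 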
    <h3><a name="t17"></a><a id="_514"></a>Building the inverted index</h3> 
    <p>How it works:<br> The format of the document we receive:</p> 
    <pre data-index="19" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token keyword">struct</span> <span class="token class-name">DocInfo</span>
    <span class="token punctuation">{<!-- --></span>
      std<span class="token double-colon punctuation">::</span>string title<span class="token punctuation">;</span>   <span class="token comment">// title of the document</span>
      std<span class="token double-colon punctuation">::</span>string content<span class="token punctuation">;</span> <span class="token comment">// content of the document after tag removal</span>
      std<span class="token double-colon punctuation">::</span>string url<span class="token punctuation">;</span>     <span class="token comment">// url of the document</span>
      <span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">;</span>     <span class="token comment">// ID of the document</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    </code></pre> 
    <blockquote> 
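     <p><font size="3"> Document:<br> title: 吃葡萄<br> content: 吃葡萄不吐葡萄皮<br> url: http://XXXX<br> doc_id: 123<br> From the document's content we build one or more InvertedElem entries (the elements of the inverted lists). Because documents are processed one at a time and a single document contains many "words", every one of those words must map to the current doc_id.</font></p> 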
    </blockquote> 
    <h3><a name="t18"></a><a id="1title_contentjieba_535"></a>1. Segment both title and content into words first, using jieba</h3> 
    <blockquote> 
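     <p>title: 吃/葡萄/吃葡萄 (title_word)<br> content: 吃/葡萄/不吐/葡萄皮 (content_word)<br> Relevance between a word and a document is based on term frequency: a word that appears in the title can be considered more relevant, and a word that appears only in the content less relevant.</p> 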
    </blockquote> 
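    <p>The term-frequency idea above can be sketched as follows. The input is assumed to be already segmented, so no jieba call appears here. <code>WordCount</code>, <code>CountWords</code> and <code>Weight</code> are hypothetical names, and the factors 10 (title) and 1 (content) are illustrative assumptions that only express "title hits count more than content hits".</p> 

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-word counters for a single document.
struct WordCount
{
    int title_cnt = 0;
    int content_cnt = 0;
};

// Count how often each word occurs in the segmented title and content.
std::unordered_map<std::string, WordCount> CountWords(
    const std::vector<std::string>& title_words,
    const std::vector<std::string>& content_words)
{
    std::unordered_map<std::string, WordCount> counts;
    for (const std::string& w : title_words)
        ++counts[w].title_cnt;
    for (const std::string& w : content_words)
        ++counts[w].content_cnt;
    return counts;
}

// Assumed scoring rule: a title occurrence weighs more than a content one.
int Weight(const WordCount& c, int title_factor = 10, int content_factor = 1)
{
    return title_factor * c.title_cnt + content_factor * c.content_cnt;
}
```

    <p>Each (word, weight) pair computed this way, combined with the current doc_id, would become one InvertedElem appended to that word's inverted list.</p> 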
    <h4><a id="cppjieba_540"></a>Installing and testing cppjieba</h4> 
    <p>Install: <code>git clone https://gitcode.net/mirrors/yanyiwu/cppjieba.git</code></p> 
    <p>Usage:<br> 1. Create a symbolic link to the dictionary directory</p> 
    <pre data-index="20" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ <span class="token function">ln</span> -s cppjieba/dict dict
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ ll
    total <span class="token number">108</span>
    -rwxrwxr-x <span class="token number">1</span> sjj sjj <span class="token number">96608</span> Jul <span class="token number">29</span> <span class="token number">15</span>:42 a.out
    drwxrwxr-x <span class="token number">8</span> sjj sjj  <span class="token number">4096</span> Mar  <span class="token number">3</span> <span class="token number">15</span>:07 cppjieba
    -rw-rw-r-- <span class="token number">1</span> sjj sjj  <span class="token number">2797</span> Jul <span class="token number">29</span> <span class="token number">22</span>:27 demo.cpp
    lrwxrwxrwx <span class="token number">1</span> sjj sjj    <span class="token number">13</span> Jul <span class="token number">29</span> <span class="token number">22</span>:29 dict -<span class="token operator">></span> cppjieba/dict
    -rw-rw-r-- <span class="token number">1</span> sjj sjj   <span class="token number">389</span> Jul <span class="token number">29</span> <span class="token number">15</span>:43 test.cc
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ <span class="token function">ls</span> cppjieba/dict/
    hmm_model.utf8  idf.utf8  jieba.dict.utf8  pos_dict  README.md  stop_words.utf8  user.dict.utf8
    </code></pre> 
    <p>2. Create a symbolic link to the header files (Jieba.hpp)</p> 
    <pre data-index="21" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ <span class="token function">ln</span> -s cppjieba/include/ inc
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ ll
    total <span class="token number">108</span>
    -rwxrwxr-x <span class="token number">1</span> sjj sjj <span class="token number">96608</span> Jul <span class="token number">29</span> <span class="token number">15</span>:42 a.out
    drwxrwxr-x <span class="token number">8</span> sjj sjj  <span class="token number">4096</span> Mar  <span class="token number">3</span> <span class="token number">15</span>:07 cppjieba
    -rw-rw-r-- <span class="token number">1</span> sjj sjj  <span class="token number">2797</span> Jul <span class="token number">29</span> <span class="token number">22</span>:27 demo.cpp
    lrwxrwxrwx <span class="token number">1</span> sjj sjj    <span class="token number">13</span> Jul <span class="token number">29</span> <span class="token number">22</span>:29 dict -<span class="token operator">></span> cppjieba/dict
    lrwxrwxrwx <span class="token number">1</span> sjj sjj    <span class="token number">17</span> Jul <span class="token number">29</span> <span class="token number">22</span>:32 inc -<span class="token operator">></span> cppjieba/include/
    -rw-rw-r-- <span class="token number">1</span> sjj sjj   <span class="token number">389</span> Jul <span class="token number">29</span> <span class="token number">15</span>:43 test.cc
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ <span class="token function">ls</span> cppjieba/include/
    cppjieba
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ <span class="token function">ls</span> cppjieba/include/cppjieba/
    DictTrie.hpp     HMMSegment.hpp        limonp          PosTagger.hpp     SegmentBase.hpp        Trie.hpp
    FullSegment.hpp  Jieba.hpp             MixSegment.hpp  PreFilter.hpp     SegmentTagged.hpp      Unicode.hpp
    HMMModel.hpp     KeywordExtractor.hpp  MPSegment.hpp   QuerySegment.hpp  TextRankExtractor.hpp
    </code></pre> 
    <p><mark>Note</mark>: the directory deps/limonp has to be copied by hand into include/cppjieba/, otherwise compilation may fail.</p> 
    <pre data-index="22" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos cppjieba<span class="token punctuation">]</span>$ <span class="token function">cp</span> deps/limonp include/cppjieba/ -rf
    <span class="token punctuation">[</span>sjj@VM-20-15-centos cppjieba<span class="token punctuation">]</span>$ <span class="token function">ls</span> include/cppjieba/
    DictTrie.hpp     HMMSegment.hpp        limonp          PosTagger.hpp     SegmentBase.hpp        Trie.hpp
    FullSegment.hpp  Jieba.hpp             MixSegment.hpp  PreFilter.hpp     SegmentTagged.hpp      Unicode.hpp
    HMMModel.hpp     KeywordExtractor.hpp  MPSegment.hpp   QuerySegment.hpp  TextRankExtractor.hpp
    <span class="token punctuation">[</span>sjj@VM-20-15-centos cppjieba<span class="token punctuation">]</span>$ <span class="token function">ls</span> include/cppjieba/limonp/
    ArgvContext.hpp           Closure.hpp    FileLock.hpp     Md5.hpp           StringUtil.hpp
    BlockingQueue.hpp         Colors.hpp     ForcePublic.hpp  MutexLock.hpp     Thread.hpp
    BoundedBlockingQueue.hpp  Condition.hpp  LocalVector.hpp  NonCopyable.hpp   ThreadPool.hpp
    BoundedQueue.hpp          Config.hpp     Logging.hpp      StdExtension.hpp
    
    </code></pre> 
    <p>Test example: demo.cpp</p> 
    <pre data-index="23" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"inc/cppjieba/Jieba.hpp"</span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><string></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><iostream></span></span>
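    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><vector></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><cstdlib></span> <span class="token comment">// EXIT_SUCCESS</span></span>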
    
    <span class="token keyword">using</span> <span class="token keyword">namespace</span> std<span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/jieba.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> HMM_PATH <span class="token operator">=</span> <span class="token string">"./dict/hmm_model.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> USER_DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/user.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> IDF_PATH <span class="token operator">=</span> <span class="token string">"./dict/idf.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> STOP_WORD_PATH <span class="token operator">=</span> <span class="token string">"./dict/stop_words.utf8"</span><span class="token punctuation">;</span>
    
    <span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token keyword">int</span> argc<span class="token punctuation">,</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token operator">*</span>argv<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      cppjieba<span class="token double-colon punctuation">::</span>Jieba <span class="token function">jieba</span><span class="token punctuation">(</span>DICT_PATH<span class="token punctuation">,</span>
                            HMM_PATH<span class="token punctuation">,</span>
                            USER_DICT_PATH<span class="token punctuation">,</span>
                            IDF_PATH<span class="token punctuation">,</span>
                            STOP_WORD_PATH<span class="token punctuation">)</span><span class="token punctuation">;</span>
      vector<span class="token operator"><</span>string<span class="token operator">></span> words<span class="token punctuation">;</span>
      string s<span class="token punctuation">;</span>
    
      s <span class="token operator">=</span> <span class="token string">"小明硕士毕业于中国科学院计算所,后在日本京都大学深造"</span><span class="token punctuation">;</span>
      cout <span class="token operator"><<</span> s <span class="token operator"><<</span> endl<span class="token punctuation">;</span>
      cout <span class="token operator"><<</span> <span class="token string">"[demo] CutForSearch"</span> <span class="token operator"><<</span> endl<span class="token punctuation">;</span>
      jieba<span class="token punctuation">.</span><span class="token function">CutForSearch</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> words<span class="token punctuation">)</span><span class="token punctuation">;</span>
      cout <span class="token operator"><<</span> limonp<span class="token double-colon punctuation">::</span><span class="token function">Join</span><span class="token punctuation">(</span>words<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> words<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token string">"/"</span><span class="token punctuation">)</span> <span class="token operator"><<</span> endl<span class="token punctuation">;</span>
    
      <span class="token keyword">return</span> EXIT_SUCCESS<span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
    <p>Result:</p> 
    <pre data-index="24" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ g++ demo.cpp -std<span class="token operator">=</span>c++11
    <span class="token punctuation">[</span>sjj@VM-20-15-centos test<span class="token punctuation">]</span>$ ./a.out 
    小明硕士毕业于中国科学院计算所,后在日本京都大学深造
    <span class="token punctuation">[</span>demo<span class="token punctuation">]</span> CutForSearch
    小明/硕士/毕业/于/中国/科学/学院/科学院/中国科学院/计算/计算所/,/后/在/日本/京都/大学/日本京都大学/深造
    </code></pre> 
    <h4><a id="jieba_633"></a>Bringing the jieba library into the project</h4> 
    <p>Create two symbolic links so the headers and dictionaries are easy to reach from the project directory.</p> 
    <pre data-index="25" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos boost_searcher<span class="token punctuation">]</span>$ <span class="token function">ln</span> -s ~/thirdpart/cppjieba/include/cppjieba/ cppjieba
    <span class="token punctuation">[</span>sjj@VM-20-15-centos boost_searcher<span class="token punctuation">]</span>$ <span class="token function">ln</span> -s ~/thirdpart/cppjieba/dict/ dict
    
    </code></pre> 
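    <pre data-index="26" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string&gt;</span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;vector&gt;</span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"cppjieba/Jieba.hpp"</span>         <span class="token comment">// cppjieba word segmentation</span></span>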
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/jieba.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> HMM_PATH <span class="token operator">=</span> <span class="token string">"./dict/hmm_model.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> USER_DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/user.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> IDF_PATH <span class="token operator">=</span> <span class="token string">"./dict/idf.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> STOP_WORD_PATH <span class="token operator">=</span> <span class="token string">"./dict/stop_words.utf8"</span><span class="token punctuation">;</span>
    
    <span class="token keyword">class</span> <span class="token class-name">JiebaUtil</span>
    <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">private</span><span class="token operator">:</span>
        <span class="token keyword">static</span> cppjieba<span class="token double-colon punctuation">::</span>Jieba jieba<span class="token punctuation">;</span>
    
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">CutString</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>src<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> <span class="token operator">*</span>out<span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
            jieba<span class="token punctuation">.</span><span class="token function">CutForSearch</span><span class="token punctuation">(</span>src<span class="token punctuation">,</span> <span class="token operator">*</span>out<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token comment">// A static data member must be defined outside the class</span>
    cppjieba<span class="token double-colon punctuation">::</span>Jieba <span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">jieba</span><span class="token punctuation">(</span>DICT_PATH<span class="token punctuation">,</span> HMM_PATH<span class="token punctuation">,</span> USER_DICT_PATH<span class="token punctuation">,</span> IDF_PATH<span class="token punctuation">,</span> STOP_WORD_PATH<span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code></pre> 
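    <h3><a name="t19"></a><a id="2_664"></a>2. Word frequency statistics: relevance between a word and a document</h3> 
    <p>We adopt the following convention: a word that appears in the title is more relevant to the document, and a word that appears only in the body is less relevant.</p> 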
    <pre data-index="27" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// Per-word occurrence counts for one document</span>
    <span class="token keyword">struct</span> <span class="token class-name">word_cnt</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">int</span> title_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>   <span class="token comment">// occurrences in the title</span>
        <span class="token keyword">int</span> content_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// occurrences in the body</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token comment">// Map from each word to its counts</span>
    std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator">&lt;</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span> word_cnt<span class="token operator">&gt;</span> word_cnt<span class="token punctuation">;</span>
    <span class="token comment">// Walk the segmented title and body and count</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">const</span> <span class="token keyword">auto</span> <span class="token operator">&amp;</span>word <span class="token operator">:</span> title_word<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        word_cnt<span class="token punctuation">[</span>word<span class="token punctuation">]</span><span class="token punctuation">.</span>title_cnt<span class="token operator">++</span><span class="token punctuation">;</span> <span class="token comment">// e.g. 吃(1)/葡萄(1)/吃葡萄(1)</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">const</span> <span class="token keyword">auto</span> <span class="token operator">&amp;</span>word <span class="token operator">:</span> content_word<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        word_cnt<span class="token punctuation">[</span>word<span class="token punctuation">]</span><span class="token punctuation">.</span>content_cnt<span class="token operator">++</span><span class="token punctuation">;</span> <span class="token comment">// e.g. 吃(1)/葡萄(1)/不吐(1)/葡萄皮(1)</span>
    <span class="token punctuation">}</span>
    </code></pre> 
    <p>At this point we know how many times each word occurs in the title and in the body of the document.</p> 
    <h3><a name="t20"></a><a id="3_684"></a>3. Custom relevance: ranking by weight</h3> 
    <pre data-index="28" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">const</span> <span class="token keyword">auto</span> <span class="token operator">&amp;</span>word <span class="token operator">:</span> word_cnt<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    	<span class="token comment">// Each entry ties one word to document 123. When several different words point to the same document, which result should be shown first?</span>
    	<span class="token comment">// Relevance decides.</span>
        <span class="token keyword">struct</span> <span class="token class-name">InvertedElem</span> elem<span class="token punctuation">;</span><span class="token comment">// build one posting</span>
        elem<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> <span class="token number">123</span><span class="token punctuation">;</span>
        elem<span class="token punctuation">.</span>word <span class="token operator">=</span> word<span class="token punctuation">.</span>first<span class="token punctuation">;</span>
        elem<span class="token punctuation">.</span>weight <span class="token operator">=</span> <span class="token number">10</span> <span class="token operator">*</span> word<span class="token punctuation">.</span>second<span class="token punctuation">.</span>title_cnt <span class="token operator">+</span> word<span class="token punctuation">.</span>second<span class="token punctuation">.</span>content_cnt<span class="token punctuation">;</span>  
        
        <span class="token comment">// Append the posting to this word's list; operator[] creates the list on first use</span>
        inverted_index<span class="token punctuation">[</span>word<span class="token punctuation">.</span>first<span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>elem<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
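    <p>To make the weighting rule concrete, here is a minimal self-contained sketch. It assumes the words have already been segmented, and the helper names <code>Weight</code> and <code>AddDocument</code> are hypothetical, introduced only for illustration. The factor 10 for title hits follows the snippet above; it is a tuning choice, not a fixed rule.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One posting in the inverted index (field names follow the article).
struct InvertedElem {
    int doc_id;
    std::string word;
    int weight;
};

using InvertedList = std::vector<InvertedElem>;

// A title hit counts ten times as much as a body hit.
int Weight(int title_cnt, int content_cnt)
{
    return 10 * title_cnt + content_cnt;
}

// Hypothetical helper: add one document's already-segmented words to the index.
void AddDocument(int doc_id,
                 const std::vector<std::string> &title_words,
                 const std::vector<std::string> &content_words,
                 std::unordered_map<std::string, InvertedList> *index)
{
    struct Count { int title = 0; int content = 0; };
    std::unordered_map<std::string, Count> counts;
    for (const auto &w : title_words)   counts[w].title++;    // operator[] starts new keys at zero
    for (const auto &w : content_words) counts[w].content++;

    // One posting per distinct word in this document.
    for (const auto &kv : counts) {
        InvertedElem elem{doc_id, kv.first, Weight(kv.second.title, kv.second.content)};
        (*index)[kv.first].push_back(elem);
    }
}
```

    <p>Calling <code>AddDocument</code> once per document, in increasing <code>doc_id</code> order, leaves each word's list ordered by document; ranking by weight happens later, at query time.</p>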
    <p><strong>Complete index-building code:</strong></p> 
    <pre data-index="29" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;">DocInfo <span class="token operator">*</span><span class="token function">BulidForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>line<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// 1. Split the line into fields</span>
    <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string sep<span class="token operator">=</span><span class="token string">"\3"</span><span class="token punctuation">;</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator">&lt;</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&gt;</span> results<span class="token punctuation">;</span> <span class="token comment">// receives the fields cut from line</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">StringUtil</span><span class="token double-colon punctuation">::</span><span class="token function">CutString</span><span class="token punctuation">(</span>line<span class="token punctuation">,</span><span class="token operator">&</span>results<span class="token punctuation">,</span>sep<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span><span class="token punctuation">(</span>results<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">!=</span><span class="token number">3</span><span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// 2. Fill the fields into a DocInfo</span>
    DocInfo doc<span class="token punctuation">;</span>
    doc<span class="token punctuation">.</span>title<span class="token operator">=</span>results<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
    doc<span class="token punctuation">.</span>content<span class="token operator">=</span>results<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
    doc<span class="token punctuation">.</span>url<span class="token operator">=</span>results<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
    doc<span class="token punctuation">.</span>doc_id<span class="token operator">=</span>forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">// the ID is the index this doc will occupy in the vector</span>
    
    <span class="token comment">// 3. Append to the forward-index vector</span>
    forward_index<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>doc<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">.</span><span class="token function">back</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">bool</span> <span class="token function">BulidInvertedIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> DocInfo <span class="token operator">&</span>doc<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// DocInfo doc{title,content,url,doc_id}</span>
    <span class="token comment">// Build the inverted index from a document produced by the forward index</span>
    <span class="token keyword">struct</span> <span class="token class-name">word_cnt</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">int</span> title_cnt<span class="token punctuation">;</span>
      <span class="token keyword">int</span> content_cnt<span class="token punctuation">;</span>
      <span class="token function">word_cnt</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">:</span><span class="token function">title_cnt</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token function">content_cnt</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span><span class="token punctuation">}</span>
    
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator">&lt;</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span>word_cnt<span class="token operator">&gt;</span> word_map<span class="token punctuation">;</span> <span class="token comment">// temporary word-to-count table for this document</span>
    
    <span class="token comment">// Segment the title and count word frequencies</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> title_words<span class="token punctuation">;</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">CutString</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>title<span class="token punctuation">,</span><span class="token operator">&</span>title_words<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">for</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string s<span class="token operator">:</span> title_words<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
       <span class="token comment">// Normalize to lowercase</span>
      boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span>
      word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>title_cnt<span class="token operator">++</span><span class="token punctuation">;</span><span class="token comment">// operator[] inserts a zeroed entry if the key is new, then the count is incremented</span>
    <span class="token punctuation">}</span>
    
    <span class="token comment">// Segment the body and count word frequencies</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> content_words<span class="token punctuation">;</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">CutString</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>content<span class="token punctuation">,</span><span class="token operator">&</span>content_words<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span>  s<span class="token operator">:</span> content_words<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token comment">// Normalize to lowercase</span>
      boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span>
      word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>content_cnt<span class="token operator">++</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">X</span> <span class="token expression"><span class="token number">10</span></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">Y</span> <span class="token expression"><span class="token number">1</span></span></span>
    <span class="token comment">// Build the inverted lists (posting lists)</span>
    <span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span> <span class="token operator">&</span>word_pair<span class="token operator">:</span>word_map<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
        InvertedElem item<span class="token punctuation">;</span>
        item<span class="token punctuation">.</span>doc_id<span class="token operator">=</span>doc<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>
        item<span class="token punctuation">.</span>word<span class="token operator">=</span>word_pair<span class="token punctuation">.</span>first<span class="token punctuation">;</span>
    
        <span class="token comment">// Relevance</span>
        item<span class="token punctuation">.</span>weight<span class="token operator">=</span>X<span class="token operator">*</span>word_pair<span class="token punctuation">.</span>second<span class="token punctuation">.</span>title_cnt<span class="token operator">+</span>Y<span class="token operator">*</span>word_pair<span class="token punctuation">.</span>second<span class="token punctuation">.</span>content_cnt<span class="token punctuation">;</span>
        InvertedList <span class="token operator">&</span> inverted_list<span class="token operator">=</span>inverted_index<span class="token punctuation">[</span>word_pair<span class="token punctuation">.</span>first<span class="token punctuation">]</span><span class="token punctuation">;</span>
        inverted_list<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>item<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
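    <p>The forward-index step relies on each cleaned line holding exactly three fields (title, body, URL) separated by the control character <code>\3</code>. The body of <code>ns_util::StringUtil::CutString</code> is not shown in this part, so the following is only a stand-in sketch of that kind of splitting; the name <code>SplitFields</code> is hypothetical, and unlike a real implementation it may treat empty fields differently.</p>

```cpp
#include <string>
#include <vector>

// Split `line` on every occurrence of `sep`, keeping empty fields.
// Stand-in for the article's CutString helper; not its actual implementation.
std::vector<std::string> SplitFields(const std::string &line, const std::string &sep)
{
    std::vector<std::string> fields;
    if (sep.empty()) {            // nothing to split on: return the line as one field
        fields.push_back(line);
        return fields;
    }
    std::size_t start = 0;
    while (true) {
        std::size_t pos = line.find(sep, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));   // last field
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + sep.size();
    }
    return fields;
}
```

    <p>A line that does not yield exactly three fields is malformed, which is why <code>BulidForwardIndex</code> returns <code>nullptr</code> in that case instead of indexing it.</p>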
    <h1><a name="t21"></a><a id="searcher_771"></a>7. Writing the searcher</h1> 
    <p>The index is now built. The next task is to use it to search for content.<br> <img src="https://1000bd.com/contentImg/2022/08/15/054746899.png" alt="在这里插入图片描述"></p> 
    <h2><a name="t22"></a><a id="1_774"></a>1. Word segmentation</h2> 
    <p>Segmentation has to happen before any lookup. The query the user types in is split into words in the way the searcher requires.</p> 
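    <p>Because the index-building code lower-cases every word before counting it, query words need the same normalization or they will not match. The sketch below illustrates that step only. As a loudly labeled assumption, it splits on whitespace instead of calling <code>JiebaUtil::CutString</code>, so it does not segment Chinese text; <code>SegmentQuery</code> and <code>ToLowerAscii</code> are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Lower-case ASCII letters; in the default "C" locale other bytes are left as they are.
std::string ToLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Stand-in tokenizer: whitespace splitting replaces jieba here (assumption).
std::vector<std::string> SegmentQuery(const std::string &query)
{
    std::vector<std::string> words;
    std::istringstream in(query);
    std::string w;
    while (in >> w) {
        words.push_back(ToLowerAscii(w));   // same normalization as at index time
    }
    return words;
}
```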
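    <h2><a name="t23"></a><a id="2_776"></a>2. Triggering</h2> 
    <p>Look up each word produced by segmentation in the index.<br> A shortcoming:<br> the same document may be returned more than once. This is not an error; it happens because several words from the query can all map to the same document. It does, however, make for a poor user experience.</p> 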
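    <p>One possible way to deal with the duplicates is to merge postings that share a <code>doc_id</code> and add up their weights. This is a sketch under that assumption, not necessarily how the project handles it; <code>MergedHit</code> and <code>MergeByDocId</code> are hypothetical names.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One posting returned by an index lookup (field names follow the article).
struct InvertedElem {
    int doc_id;
    std::string word;
    int weight;
};

// Hypothetical merged result: one entry per document.
struct MergedHit {
    int doc_id = 0;
    int weight = 0;                   // sum of the weights of all matching words
    std::vector<std::string> words;   // the query words that hit this document
};

// Collapse postings that refer to the same document, keeping first-seen order.
std::vector<MergedHit> MergeByDocId(const std::vector<InvertedElem> &hits)
{
    std::unordered_map<int, std::size_t> slot;   // doc_id -> position in `merged`
    std::vector<MergedHit> merged;
    for (const auto &h : hits) {
        auto it = slot.find(h.doc_id);
        if (it == slot.end()) {
            it = slot.emplace(h.doc_id, merged.size()).first;
            merged.emplace_back();
            merged.back().doc_id = h.doc_id;
        }
        MergedHit &m = merged[it->second];
        m.weight += h.weight;
        m.words.push_back(h.word);
    }
    return merged;
}
```

    <p>Summing the weights also means a document matched by several query words naturally ranks above one matched by a single word.</p>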
    <h2><a name="t24"></a><a id="3_780"></a>3. Merging and sorting</h2> 
    <p>Gather the lookup results and sort them by relevance (<code>weight</code>) in descending order.</p> 
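    <p>The descending sort can be sketched with the standard library as follows. The <code>Hit</code> struct and <code>SortByWeightDesc</code> are hypothetical, reduced to the two fields the sort needs; <code>std::stable_sort</code> is used here so that documents with equal weight keep their original relative order, which is a design choice rather than a requirement.</p>

```cpp
#include <algorithm>
#include <vector>

// Minimal search hit: which document, and how relevant it is.
struct Hit {
    int doc_id;
    int weight;
};

// Order hits from most to least relevant; ties keep their input order.
void SortByWeightDesc(std::vector<Hit> *hits)
{
    std::stable_sort(hits->begin(), hits->end(),
                     [](const Hit &a, const Hit &b) { return a.weight > b.weight; });
}
```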
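    <h2><a name="t25"></a><a id="4_782"></a>4. Building the response</h2> 
    <p>Build a JSON string from the search results using the third-party library <code>jsoncpp</code>.<br> jsoncpp handles both serialization and deserialization.</p> 
    <p>Installing jsoncpp:</p> 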
    <pre data-index="30" class="prettyprint"><code class="prism language-bash has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM-20-15-centos boost_searcher<span class="token punctuation">]</span>$ <span class="token function">sudo</span> yum <span class="token function">install</span> -y jsoncpp-devel
    
    </code></pre> 
    <p>Using jsoncpp:</p> 
    <pre data-index="31" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token punctuation">[</span>sjj@VM<span class="token operator">-</span><span class="token number">20</span><span class="token operator">-</span><span class="token number">15</span><span class="token operator">-</span>centos test<span class="token punctuation">]</span>$ cat test<span class="token punctuation">.</span>cc 
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><iostream></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><string></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><jsoncpp/json/json.h></span></span>
    
    <span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      Json<span class="token double-colon punctuation">::</span>Value root<span class="token punctuation">;</span>
      Json<span class="token double-colon punctuation">::</span>Value item1<span class="token punctuation">;</span>
      item1<span class="token punctuation">[</span><span class="token string">"key1"</span><span class="token punctuation">]</span><span class="token operator">=</span><span class="token string">"value111"</span><span class="token punctuation">;</span>
      item1<span class="token punctuation">[</span><span class="token string">"key2"</span><span class="token punctuation">]</span><span class="token operator">=</span><span class="token string">"value222"</span><span class="token punctuation">;</span>
    
      Json<span class="token double-colon punctuation">::</span>Value item2<span class="token punctuation">;</span>
      item2<span class="token punctuation">[</span><span class="token string">"key1"</span><span class="token punctuation">]</span><span class="token operator">=</span><span class="token string">"value1"</span><span class="token punctuation">;</span>
      item2<span class="token punctuation">[</span><span class="token string">"key2"</span><span class="token punctuation">]</span><span class="token operator">=</span><span class="token string">"value1"</span><span class="token punctuation">;</span>
    
      <span class="token comment">// append() adds each item to root, which becomes a JSON array</span>
      root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>item1<span class="token punctuation">)</span><span class="token punctuation">;</span>
      root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>item2<span class="token punctuation">)</span><span class="token punctuation">;</span>
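      <span class="token comment">// Json::FastWriter writer;  // compact single-line alternative</span>
      Json<span class="token double-colon punctuation">::</span>StyledWriter writer<span class="token punctuation">;</span> <span class="token comment">// indented output, matching the listing that follows</span>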
      std<span class="token double-colon punctuation">::</span>string s<span class="token operator">=</span>writer<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>root<span class="token punctuation">)</span><span class="token punctuation">;</span>
      std<span class="token double-colon punctuation">::</span>cout<span class="token operator"><<</span>s<span class="token operator"><<</span>std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
      <span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token punctuation">[</span>sjj@VM<span class="token operator">-</span><span class="token number">20</span><span class="token operator">-</span><span class="token number">15</span><span class="token operator">-</span>centos test<span class="token punctuation">]</span>$ g<span class="token operator">++</span> test<span class="token punctuation">.</span>cc <span class="token operator">-</span>std<span class="token operator">=</span>c<span class="token operator">++</span><span class="token number">11</span> <span class="token operator">-</span>ljsoncpp
    <span class="token punctuation">[</span>sjj@VM<span class="token operator">-</span><span class="token number">20</span><span class="token operator">-</span><span class="token number">15</span><span class="token operator">-</span>centos test<span class="token punctuation">]</span>$ <span class="token punctuation">.</span><span class="token operator">/</span>a<span class="token punctuation">.</span>out 
    <span class="token punctuation">[</span>
       <span class="token punctuation">{<!-- --></span>
          <span class="token string">"key1"</span> <span class="token operator">:</span> <span class="token string">"value111"</span><span class="token punctuation">,</span>
          <span class="token string">"key2"</span> <span class="token operator">:</span> <span class="token string">"value222"</span>
       <span class="token punctuation">}</span><span class="token punctuation">,</span>
       <span class="token punctuation">{<!-- --></span>
          <span class="token string">"key1"</span> <span class="token operator">:</span> <span class="token string">"value1"</span><span class="token punctuation">,</span>
          <span class="token string">"key2"</span> <span class="token operator">:</span> <span class="token string">"value1"</span>
       <span class="token punctuation">}</span>
    <span class="token punctuation">]</span>
    
    <span class="token punctuation">[</span>sjj@VM<span class="token operator">-</span><span class="token number">20</span><span class="token operator">-</span><span class="token number">15</span><span class="token operator">-</span>centos test<span class="token punctuation">]</span>$ g<span class="token operator">++</span> test<span class="token punctuation">.</span>cc <span class="token operator">-</span>std<span class="token operator">=</span>c<span class="token operator">++</span><span class="token number">11</span> <span class="token operator">-</span>ljsoncpp
    <span class="token punctuation">[</span>sjj@VM<span class="token operator">-</span><span class="token number">20</span><span class="token operator">-</span><span class="token number">15</span><span class="token operator">-</span>centos test<span class="token punctuation">]</span>$ <span class="token punctuation">.</span><span class="token operator">/</span>a<span class="token punctuation">.</span>out 
    <span class="token punctuation">[</span><span class="token punctuation">{<!-- --></span><span class="token string">"key1"</span><span class="token operator">:</span><span class="token string">"value111"</span><span class="token punctuation">,</span><span class="token string">"key2"</span><span class="token operator">:</span><span class="token string">"value222"</span><span class="token punctuation">}</span><span class="token punctuation">,</span><span class="token punctuation">{<!-- --></span><span class="token string">"key1"</span><span class="token operator">:</span><span class="token string">"value1"</span><span class="token punctuation">,</span><span class="token string">"key2"</span><span class="token operator">:</span><span class="token string">"value1"</span><span class="token punctuation">}</span><span class="token punctuation">]</span>
    
    </code></pre> 
    <h3><a name="t26"></a><a id="_838"></a>Extracting the summary</h3> 
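    <p>Find the first occurrence of <code>word</code> in <code>html_content</code>, then take the 50 bytes before it (or start from the beginning if fewer than 50 bytes precede it) and the 100 bytes after it (or stop at the end if fewer than 100 bytes follow). The extracted span is the summary snippet that normally appears under each result on a search page.<br> <img src="https://1000bd.com/contentImg/2022/08/15/054747033.png" alt="Image"></p> 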
    <pre data-index="32" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// Build the summary snippet</span>
    std<span class="token double-colon punctuation">::</span>string <span class="token function">GetDesc</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>html_content<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>word<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>size_t prev_step <span class="token operator">=</span> <span class="token number">50</span><span class="token punctuation">;</span>
      <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>size_t next_step <span class="token operator">=</span> <span class="token number">100</span><span class="token punctuation">;</span>
      <span class="token comment">// 1. Find the first occurrence of the word</span>
      std<span class="token double-colon punctuation">::</span>size_t pos <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">==</span> std<span class="token double-colon punctuation">::</span>string<span class="token double-colon punctuation">::</span>npos<span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token string">"None"</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span>
      <span class="token comment">// 2. Compute start and end</span>
      std<span class="token double-colon punctuation">::</span>size_t start <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
      std<span class="token double-colon punctuation">::</span>size_t end <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span>
      <span class="token comment">// If more than 50 characters precede pos, move start forward</span>
      <span class="token keyword">if</span><span class="token punctuation">(</span>pos<span class="token operator">-</span>prev_step<span class="token operator">></span>start<span class="token punctuation">)</span> start<span class="token operator">=</span>pos <span class="token operator">-</span> prev_step<span class="token punctuation">;</span>
      <span class="token keyword">if</span><span class="token punctuation">(</span>pos<span class="token operator">+</span>next_step<span class="token operator"><</span>end<span class="token punctuation">)</span> end<span class="token operator">=</span>pos <span class="token operator">+</span>next_step<span class="token punctuation">;</span>
      <span class="token comment">// 3. Extract the substring</span>
      <span class="token keyword">if</span><span class="token punctuation">(</span>start<span class="token operator">>=</span>end<span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token string">"None"</span><span class="token punctuation">;</span>
      <span class="token keyword">return</span> html_content<span class="token punctuation">.</span><span class="token function">substr</span><span class="token punctuation">(</span>start<span class="token punctuation">,</span>end<span class="token operator">-</span>start<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
    <h1><a name="t27"></a><a id="_866"></a>8. Integrated Debugging</h1> 
    <p><img src="https://1000bd.com/contentImg/2022/08/15/054747216.png" alt="Image"><br> In search.hpp<br> <mark>bug1</mark></p> 
    <pre data-index="33" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// 2. Compute start and end</span>
    std<span class="token double-colon punctuation">::</span>size_t start <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
    std<span class="token double-colon punctuation">::</span>size_t end <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">;</span>
    <span class="token comment">// If more than 50 characters precede pos, move start forward</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">-</span> prev_step <span class="token operator">></span> start<span class="token punctuation">)</span>
      start <span class="token operator">=</span> pos <span class="token operator">-</span> prev_step<span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">+</span> next_step <span class="token operator"><</span> end<span class="token punctuation">)</span>
      end <span class="token operator">=</span> pos <span class="token operator">+</span> next_step<span class="token punctuation">;</span>
    </code></pre> 
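    <p><code>size_t</code> is an unsigned integer type. When <code>pos</code> is smaller than <code>prev_step</code>, <code>pos - prev_step</code> would be negative mathematically, but as an unsigned value it wraps around to a very large positive number. The <code>if</code> condition then holds when it should not, and <code>start</code> is set to that bogus value, so this is a bug.<br> The fix:</p> 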
    <pre data-index="34" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// 2. Compute start and end</span>
    <span class="token comment">// size_t is an unsigned integer type</span>
    std<span class="token double-colon punctuation">::</span>size_t start <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
    std<span class="token double-colon punctuation">::</span>size_t end <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">;</span>
    <span class="token comment">// If more than 50 characters precede pos, move start forward</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">></span> start <span class="token operator">+</span> prev_step<span class="token punctuation">)</span>
      start <span class="token operator">=</span> pos <span class="token operator">-</span> prev_step<span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token keyword">int</span><span class="token punctuation">)</span>pos <span class="token operator"><</span> <span class="token punctuation">(</span><span class="token keyword">int</span><span class="token punctuation">)</span><span class="token punctuation">(</span>end <span class="token operator">-</span> next_step<span class="token punctuation">)</span><span class="token punctuation">)</span>
      end <span class="token operator">=</span> pos <span class="token operator">+</span> next_step<span class="token punctuation">;</span>
    <span class="token comment">// 3. Extract the substring</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>start <span class="token operator">>=</span> end<span class="token punctuation">)</span>
      <span class="token keyword">return</span> <span class="token string">"None2"</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> html_content<span class="token punctuation">.</span><span class="token function">substr</span><span class="token punctuation">(</span>start<span class="token punctuation">,</span> end <span class="token operator">-</span> start<span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code></pre> 
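    <p>The same window can also be computed without any signed casts, by comparing before subtracting. The following is only a minimal sketch under stated assumptions: <code>SummaryWindow</code> is a hypothetical helper name that does not exist in the project, and unlike the code above it treats <code>end</code> as one past the last kept byte, so the final byte of the content is not dropped.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original project): returns up to
// prev_step bytes before pos and up to next_step bytes starting at pos.
std::string SummaryWindow(const std::string &content, std::size_t pos,
                          std::size_t prev_step = 50,
                          std::size_t next_step = 100)
{
    if (pos >= content.size())
        return "";

    // Compare first, subtract second: the unsigned value never wraps around.
    const std::size_t start = (pos > prev_step) ? pos - prev_step : 0;

    // pos < content.size() holds here, so content.size() - pos cannot wrap.
    const std::size_t len_after = std::min(next_step, content.size() - pos);
    const std::size_t end = pos + len_after; // one past the last kept byte

    return content.substr(start, end - start);
}
```

    <p>For a 200-byte string and <code>pos = 10</code>, the window starts at offset 0 instead of at a wrapped-around offset, which is exactly the case that triggered bug1.</p>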
    <p><mark>bug2</mark></p> 
    <pre data-index="35" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// 1. Find the first occurrence of the word</span>
    std<span class="token double-colon punctuation">::</span>size_t pos <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">==</span> std<span class="token double-colon punctuation">::</span>string<span class="token double-colon punctuation">::</span>npos<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">return</span> <span class="token string">"None1"</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
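    <p>When searching, we convert the query words to lowercase, but the source documents contain both upper- and lower-case text, so an exact <code>find</code> can miss the word. The lookup has to ignore case instead.</p> 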
    <pre data-index="36" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// 1. Find the first occurrence of the word, ignoring case</span>
    <span class="token keyword">auto</span> iter <span class="token operator">=</span> std<span class="token double-colon punctuation">::</span><span class="token function">search</span><span class="token punctuation">(</span>html_content<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> html_content<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> word<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> word<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">(</span><span class="token keyword">int</span> x<span class="token punctuation">,</span> <span class="token keyword">int</span> y<span class="token punctuation">)</span>
                            <span class="token punctuation">{<!-- --></span> <span class="token keyword">return</span> <span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">tolower</span><span class="token punctuation">(</span>x<span class="token punctuation">)</span> <span class="token operator">==</span> std<span class="token double-colon punctuation">::</span><span class="token function">tolower</span><span class="token punctuation">(</span>y<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    
    <span class="token keyword">if</span><span class="token punctuation">(</span>iter<span class="token operator">==</span>html_content<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token keyword">return</span> <span class="token string">"None1"</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">int</span> pos <span class="token operator">=</span> std<span class="token double-colon punctuation">::</span><span class="token function">distance</span><span class="token punctuation">(</span>html_content<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>iter<span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code></pre> 
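    <p>As a standalone illustration, the case-insensitive lookup can be wrapped in a small function. This is a sketch rather than the project's code: <code>FindIgnoreCase</code> is a hypothetical name, and the lambda takes <code>unsigned char</code> because passing a negative <code>char</code> value to <code>std::tolower</code> is undefined behavior. It returns a <code>std::size_t</code> offset, so the result can be used directly in the unsigned arithmetic that computes <code>start</code> and <code>end</code>.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper (not part of the original project): offset of the first
// case-insensitive match of word in text, or std::string::npos if none.
std::size_t FindIgnoreCase(const std::string &text, const std::string &word)
{
    auto iter = std::search(text.begin(), text.end(),
                            word.begin(), word.end(),
                            [](unsigned char x, unsigned char y) {
                                return std::tolower(x) == std::tolower(y);
                            });
    if (iter == text.end())
        return std::string::npos;
    return static_cast<std::size_t>(std::distance(text.begin(), iter));
}
```

    <p>Note that <code>std::tolower</code> only handles single-byte characters, so this affects ASCII letters and leaves multi-byte UTF-8 text compared byte for byte.</p>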
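    <p><mark>bug3</mark><br> What I most want to verify is whether the documents really are sorted by weight in descending order, so the id and weight are added to the output for debugging.</p> 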
    <pre data-index="37" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// for debug</span>
    elem<span class="token punctuation">[</span><span class="token string">"id"</span><span class="token punctuation">]</span><span class="token operator">=</span><span class="token punctuation">(</span><span class="token keyword">int</span><span class="token punctuation">)</span>item<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>
    elem<span class="token punctuation">[</span><span class="token string">"weight"</span><span class="token punctuation">]</span><span class="token operator">=</span>item<span class="token punctuation">.</span>weight<span class="token punctuation">;</span>
    root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>elem<span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code></pre> 
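    <p><img src="https://1000bd.com/contentImg/2022/08/15/054747572.png" alt="Image"><br> <mark>bug4</mark><br> The same document can show up more than once in the results. This is not an error in itself: because the query is segmented, several of its words may map to the same document. The duplicates still hurt the user experience.<br> <code>eg:</code> the search query is 你是一个好人 ("you are a good person")<br> After segmentation: 你/是/一个/好人<br> These four words can hit four separate key-value entries in the inverted index that all point to the same document, while we want that document to appear only once, so the results need to be deduplicated.<br> <img src="https://1000bd.com/contentImg/2022/08/15/054747870.png" alt="Image"><br> <mark>Approach</mark>: merge all entries that share a document id and sum their weights.</p> 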
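    <p>The merge step can be sketched in isolation before it is wired into <code>Search</code>. The types <code>Hit</code> and <code>MergedHit</code> and the function <code>MergeByDocId</code> are hypothetical, simplified stand-ins chosen for this illustration and are not the project's own definitions. The sketch groups hits by document id, sums the weights, and sorts the merged entries by total weight in descending order.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Hit {
    std::uint64_t doc_id;
    std::string word;
    int weight;
};

struct MergedHit {
    std::uint64_t doc_id = 0;
    int weight = 0;
    std::vector<std::string> words; // every query word that matched this doc
};

std::vector<MergedHit> MergeByDocId(const std::vector<Hit> &hits)
{
    std::unordered_map<std::uint64_t, MergedHit> merged;
    for (const auto &hit : hits) {
        // operator[] returns the existing entry or default-constructs one.
        MergedHit &item = merged[hit.doc_id];
        item.doc_id = hit.doc_id;
        item.weight += hit.weight;
        item.words.push_back(hit.word);
    }

    std::vector<MergedHit> result;
    result.reserve(merged.size());
    for (auto &kv : merged) // non-const reference so std::move really moves
        result.push_back(std::move(kv.second));

    std::sort(result.begin(), result.end(),
              [](const MergedHit &a, const MergedHit &b) {
                  return a.weight > b.weight;
              });
    return result;
}
```

    <p>With hits for documents 1, 2 and 1 again, the result contains two entries, and document 1 carries the sum of both of its weights.</p>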
    <pre data-index="38" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token comment">// New node type: a deduplicated inverted-list entry used when building the output</span>
    <span class="token keyword">struct</span> <span class="token class-name">InvertedElemPrint</span><span class="token punctuation">{<!-- --></span>
        <span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">;</span>
        <span class="token keyword">int</span> weight<span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> words<span class="token punctuation">;</span>
        <span class="token function">InvertedElemPrint</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">:</span><span class="token function">doc_id</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token function">weight</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span><span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
     <span class="token keyword">void</span> <span class="token function">Search</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>query<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">*</span>json_string<span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      <span class="token comment">// 1. Segment the query into words</span>
      std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> words<span class="token punctuation">;</span>
      ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">CutString</span><span class="token punctuation">(</span>query<span class="token punctuation">,</span> <span class="token operator">&</span>words<span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token comment">// 2. Trigger: look up each word in the index</span>
      <span class="token comment">// ns_index::InvertedList inverted_list_all; // typedef std::vector<InvertedElem> InvertedList;</span>
      std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>InvertedElemPrint<span class="token operator">></span> inverted_list_all<span class="token punctuation">;</span>
      std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span><span class="token keyword">uint64_t</span><span class="token punctuation">,</span> InvertedElemPrint<span class="token operator">></span> tokens_map<span class="token punctuation">;</span>
      <span class="token keyword">for</span> <span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string word <span class="token operator">:</span> words<span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// First fetch the inverted list for this word</span>
        ns_index<span class="token double-colon punctuation">::</span>InvertedList <span class="token operator">*</span>inverted_list <span class="token operator">=</span> index<span class="token operator">-></span><span class="token function">GetInvertedList</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">nullptr</span> <span class="token operator">==</span> inverted_list<span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
          <span class="token keyword">continue</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">//inverted_list_all.insert(inverted_list_all.end(), inverted_list->begin(), inverted_list->end());</span>
        <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">const</span> <span class="token keyword">auto</span> <span class="token operator">&</span>elem <span class="token operator">:</span> <span class="token operator">*</span>inverted_list<span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
          <span class="token keyword">auto</span> <span class="token operator">&</span>item <span class="token operator">=</span> tokens_map<span class="token punctuation">[</span>elem<span class="token punctuation">.</span>doc_id<span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token comment">// operator[]: returns the existing entry, or creates a new one</span>
          <span class="token comment">// item is always the print node for this doc_id</span>
          item<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> elem<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>
          item<span class="token punctuation">.</span>weight <span class="token operator">+=</span> elem<span class="token punctuation">.</span>weight<span class="token punctuation">;</span>
          item<span class="token punctuation">.</span>words<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>elem<span class="token punctuation">.</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
      <span class="token punctuation">}</span>
    
      <span class="token comment">// Non-const reference: std::move on a const element would silently copy</span>
      <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span> <span class="token operator">&</span>item <span class="token operator">:</span> tokens_map<span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        inverted_list_all<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>item<span class="token punctuation">.</span>second<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span>
      <span class="token comment">// 3. Merge and sort: order by weight, descending</span>
      <span class="token comment">// std::sort(inverted_list_all.begin(), inverted_list_all.end(),</span>
      <span class="token comment">//           [](const ns_index::InvertedElem &e1, const ns_index::InvertedElem &e2)</span>
      <span class="token comment">//           {<!-- --></span>
      <span class="token comment">//             return e1.weight > e2.weight;</span>
      <span class="token comment">//           });</span>
    
      std<span class="token double-colon punctuation">::</span><span class="token function">sort</span><span class="token punctuation">(</span>inverted_list_all<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> inverted_list_all<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
                <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">(</span><span class="token keyword">const</span> InvertedElemPrint <span class="token operator">&</span>e1<span class="token punctuation">,</span> <span class="token keyword">const</span> InvertedElemPrint <span class="token operator">&</span>e2<span class="token punctuation">)</span>
                <span class="token punctuation">{<!-- --></span>
                  <span class="token keyword">return</span> e1<span class="token punctuation">.</span>weight <span class="token operator">></span> e2<span class="token punctuation">.</span>weight<span class="token punctuation">;</span>
                <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    
      <span class="token comment">// 4. Build the JSON string</span>
      Json<span class="token double-colon punctuation">::</span>Value root<span class="token punctuation">;</span>
      <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span> <span class="token operator">&</span>item <span class="token operator">:</span> inverted_list_all<span class="token punctuation">)</span>
      <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// Look up the document in the forward index</span>
        ns_index<span class="token double-colon punctuation">::</span>DocInfo <span class="token operator">*</span>doc <span class="token operator">=</span> index<span class="token operator">-></span><span class="token function">GetForwardIndex</span><span class="token punctuation">(</span>item<span class="token punctuation">.</span>doc_id<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">nullptr</span> <span class="token operator">==</span> doc<span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
          <span class="token keyword">continue</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        Json<span class="token double-colon punctuation">::</span>Value elem<span class="token punctuation">;</span>
        elem<span class="token punctuation">[</span><span class="token string">"title"</span><span class="token punctuation">]</span> <span class="token operator">=</span> doc<span class="token operator">-></span>title<span class="token punctuation">;</span>
        elem<span class="token punctuation">[</span><span class="token string">"desc"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">GetDesc</span><span class="token punctuation">(</span>doc<span class="token operator">-></span>content<span class="token punctuation">,</span> item<span class="token punctuation">.</span>words<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// We only need a snippet of the content; words[0] is the keyword used to locate it</span>
        elem<span class="token punctuation">[</span><span class="token string">"url"</span><span class="token punctuation">]</span> <span class="token operator">=</span> doc<span class="token operator">-></span>url<span class="token punctuation">;</span>
    
        <span class="token comment">// for debug</span>
        elem<span class="token punctuation">[</span><span class="token string">"id"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token keyword">int</span><span class="token punctuation">)</span>item<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>
        elem<span class="token punctuation">[</span><span class="token string">"weight"</span><span class="token punctuation">]</span> <span class="token operator">=</span> item<span class="token punctuation">.</span>weight<span class="token punctuation">;</span>
        root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>elem<span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span>
      <span class="token comment">// Json::StyledWriter writer;</span>
      Json<span class="token double-colon punctuation">::</span>FastWriter writer<span class="token punctuation">;</span>
      <span class="token operator">*</span>json_string <span class="token operator">=</span> writer<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>root<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
    <p>After the change:<br> <img src="https://1000bd.com/contentImg/2022/08/15/054748045.png" alt="image"></p> 
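    <p>The Search code above merges postings that share a doc_id before ranking, so a document matched by several query words appears only once, with its weights summed. Below is a minimal, self-contained sketch of just that merge-and-rank step. The names Posting, MergedHit and MergeAndRank are hypothetical stand-ins for illustration (they mirror the fields used above, but are not the project's ns_index types), and the inverted lists are passed in directly instead of being looked up in an index.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of an inverted list.
struct Posting {
    uint64_t doc_id;
    std::string word;
    int weight;
};

// Hypothetical stand-in for the merged, per-document result node.
struct MergedHit {
    uint64_t doc_id = 0;
    int weight = 0;                  // sum of the weights of all matched words
    std::vector<std::string> words;  // every query word that hit this document
};

// Merge the inverted lists of all query words by doc_id,
// then rank the merged hits by total weight, descending.
std::vector<MergedHit> MergeAndRank(const std::vector<std::vector<Posting>> &lists) {
    std::unordered_map<uint64_t, MergedHit> by_doc;
    for (const auto &list : lists) {
        for (const auto &p : list) {
            MergedHit &hit = by_doc[p.doc_id];  // creates the entry on first sight
            hit.doc_id = p.doc_id;
            hit.weight += p.weight;
            hit.words.push_back(p.word);
        }
    }

    std::vector<MergedHit> ranked;
    ranked.reserve(by_doc.size());
    for (auto &kv : by_doc) {  // non-const, so the move below really moves
        ranked.push_back(std::move(kv.second));
    }

    std::sort(ranked.begin(), ranked.end(),
              [](const MergedHit &a, const MergedHit &b) { return a.weight > b.weight; });
    return ranked;
}
```

    <p>Keying the map by doc_id is what removes the duplicates: each document gets exactly one node no matter how many query words match it, and summing the weights lets documents that match more words rank higher.</p>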
    <h1><a name="t28"></a><a id="http_server_1021"></a>9. Writing http_server</h1> 
    <pre data-index="39" class="set-code-hide prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"searcher.hpp"</span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"cpp-httplib/httplib.h"</span></span>
    <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string root_path<span class="token operator">=</span><span class="token string">"./wwwroot"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string input<span class="token operator">=</span><span class="token string">"data/raw_html/raw.txt"</span><span class="token punctuation">;</span>
    
    <span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token punctuation">{<!-- --></span>
      ns_searcher<span class="token double-colon punctuation">::</span>Searcher search<span class="token punctuation">;</span>
      search<span class="token punctuation">.</span><span class="token function">InitSearcher</span><span class="token punctuation">(</span>input<span class="token punctuation">)</span><span class="token punctuation">;</span>
      httplib<span class="token double-colon punctuation">::</span>Server svr<span class="token punctuation">;</span>
      svr<span class="token punctuation">.</span><span class="token function">set_base_dir</span><span class="token punctuation">(</span>root_path<span class="token punctuation">.</span><span class="token function">c_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      svr<span class="token punctuation">.</span><span class="token function">Get</span><span class="token punctuation">(</span><span class="token string">"/s"</span><span class="token punctuation">,</span><span class="token punctuation">[</span><span class="token operator">&</span>search<span class="token punctuation">]</span><span class="token punctuation">(</span><span class="token keyword">const</span> httplib<span class="token double-colon punctuation">::</span>Request <span class="token operator">&</span>req <span class="token punctuation">,</span>httplib<span class="token double-colon punctuation">::</span>Response <span class="token operator">&</span>rsp<span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span>
        <span class="token keyword">if</span><span class="token punctuation">(</span><span class="token operator">!</span>req<span class="token punctuation">.</span><span class="token function">has_param</span><span class="token punctuation">(</span><span class="token string">"word"</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
        <span class="token punctuation">{<!-- --></span>
          rsp<span class="token punctuation">.</span><span class="token function">set_content</span><span class="token punctuation">(</span><span class="token string">"A search keyword is required!"</span><span class="token punctuation">,</span> <span class="token string">"text/plain; charset=utf-8"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
          <span class="token keyword">return</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// rsp.set_content("hhh","text/plain; charset=utf-8");</span>
        std<span class="token double-colon punctuation">::</span>string word<span class="token operator">=</span>req<span class="token punctuation">.</span><span class="token function">get_param_value</span><span class="token punctuation">(</span><span class="token string">"word"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>cout<span class="token operator"><<</span><span class="token string">"User is searching for: "</span><span class="token operator"><<</span>word<span class="token operator"><<</span>std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>string json_string<span class="token punctuation">;</span>
        search<span class="token punctuation">.</span><span class="token function">Search</span><span class="token punctuation">(</span>word<span class="token punctuation">,</span><span class="token operator">&</span>json_string<span class="token punctuation">)</span><span class="token punctuation">;</span>
        rsp<span class="token punctuation">.</span><span class="token function">set_content</span><span class="token punctuation">(</span>json_string<span class="token punctuation">,</span><span class="token string">"application/json"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      svr<span class="token punctuation">.</span><span class="token function">listen</span><span class="token punctuation">(</span><span class="token string">"0.0.0.0"</span><span class="token punctuation">,</span><span class="token number">8081</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
    <h1><a name="t29"></a><a id="_1053"></a>10. Front-End Display</h1> 
    <pre data-index="40" class="set-code-hide prettyprint"><code class="prism language-html has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token doctype"><span class="token punctuation"><!</span><span class="token doctype-tag">DOCTYPE</span> <span class="token name">html</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">charset</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>UTF-8<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">http-equiv</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>X-UA-Compatible<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>IE=edge<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">name</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>viewport<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>width=device-width, initial-scale=1.0<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>script</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>http://code.jquery.com/jquery-2.1.1.min.js<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token script"></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>script</span><span class="token punctuation">></span></span>
    
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>Boost Search Engine<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>style</span><span class="token punctuation">></span></span><span class="token style"><span class="token language-css">
            <span class="token comment">/* Remove all default margins and padding on the page (HTML box model) */</span>
            <span class="token selector">*</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Set the margin */</span>
                <span class="token property">margin</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span>
                <span class="token comment">/* Set the padding */</span>
                <span class="token property">padding</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token comment">/* Make html and body fill 100% of the viewport height */</span>
            <span class="token selector">html,
            body</span> <span class="token punctuation">{<!-- --></span>
                <span class="token property">height</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token comment">/* Class selector: .container */</span>
            <span class="token selector">.container</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Set the div width */</span>
                <span class="token property">width</span><span class="token punctuation">:</span> 800px<span class="token punctuation">;</span>
                <span class="token comment">/* Center horizontally via auto left/right margins */</span>
                <span class="token property">margin</span><span class="token punctuation">:</span> 0px auto<span class="token punctuation">;</span>
                <span class="token comment">/* Top margin: keep a gap between the element and the top of the page */</span>
                <span class="token property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token comment">/* Descendant selector: .search inside .container */</span>
            <span class="token selector">.container .search</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Same width as the parent element */</span>
                <span class="token property">width</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span>
                <span class="token comment">/* Height: 52px */</span>
                <span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token comment">/* Select the input element first, then set its properties; input is a type (tag) selector */</span>
            <span class="token comment">/* The height set on input does not include its border: 50px + 2 x 1px border = 52px */</span>
            <span class="token selector">.container .search input</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Float left */</span>
                <span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span>
                <span class="token property">width</span><span class="token punctuation">:</span> 600px<span class="token punctuation">;</span>
                <span class="token property">height</span><span class="token punctuation">:</span> 50px<span class="token punctuation">;</span>
                <span class="token comment">/* Border: width, style, color */</span>
                <span class="token property">border</span><span class="token punctuation">:</span> 1px solid black<span class="token punctuation">;</span>
                <span class="token comment">/* Remove the input box's right border */</span>
                <span class="token property">border-right</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span>
                <span class="token comment">/* Left padding so the text does not touch the left border */</span>
                <span class="token property">padding-left</span><span class="token punctuation">:</span> 10px<span class="token punctuation">;</span>
                <span class="token comment">/* Color and size of the text inside the input */</span>
                <span class="token property">color</span><span class="token punctuation">:</span> #CCC<span class="token punctuation">;</span>
                <span class="token property">font-size</span><span class="token punctuation">:</span> 14px<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token comment">/* Select the button element first, then set its properties; button is a type (tag) selector */</span>
            <span class="token selector">.container .search button</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Float left */</span>
                <span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span>
                <span class="token property">width</span><span class="token punctuation">:</span> 150px<span class="token punctuation">;</span>
                <span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span>
                <span class="token comment">/* Button background color: #4e6ef2 */</span>
                <span class="token property">background-color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span>
                <span class="token comment">/* Button text color */</span>
                <span class="token property">color</span><span class="token punctuation">:</span> #FFF<span class="token punctuation">;</span>
                <span class="token comment">/* Font size */</span>
                <span class="token property">font-size</span><span class="token punctuation">:</span> 19px<span class="token punctuation">;</span>
                <span class="token property">font-family</span><span class="token punctuation">:</span>Georgia<span class="token punctuation">,</span> <span class="token string">'Times New Roman'</span><span class="token punctuation">,</span> Times<span class="token punctuation">,</span> serif<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token selector">.container .result</span> <span class="token punctuation">{<!-- --></span>
                <span class="token property">width</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token selector">.container .result .item</span> <span class="token punctuation">{<!-- --></span>
                <span class="token property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
    
            <span class="token selector">.container .result .item a</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Make it a block-level element so it occupies its own line */</span>
                <span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span>
                <span class="token comment">/* Remove the underline from the link */</span>
                <span class="token property">text-decoration</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span>
                <span class="token comment">/* Font size of the link text */</span>
                <span class="token property">font-size</span><span class="token punctuation">:</span> 20px<span class="token punctuation">;</span>
                <span class="token comment">/* Text color */</span>
                <span class="token property">color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token selector">.container .result .item a:hover</span> <span class="token punctuation">{<!-- --></span>
                <span class="token property">text-decoration</span><span class="token punctuation">:</span> underline<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token selector">.container .result .item p</span> <span class="token punctuation">{<!-- --></span>
                <span class="token property">margin-top</span><span class="token punctuation">:</span> 5px<span class="token punctuation">;</span>
                <span class="token property">font-size</span><span class="token punctuation">:</span> 16px<span class="token punctuation">;</span>
                <span class="token property">font-family</span><span class="token punctuation">:</span><span class="token string">'Lucida Sans'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Sans Regular'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Grande'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Sans Unicode'</span><span class="token punctuation">,</span> Geneva<span class="token punctuation">,</span> Verdana<span class="token punctuation">,</span> sans-serif<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
    
            <span class="token selector">.container .result .item i</span><span class="token punctuation">{<!-- --></span>
                <span class="token comment">/* Render as a block-level element so it sits on its own line */</span>
                <span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span>
                <span class="token comment">/* Disable the default italic style */</span>
                <span class="token property">font-style</span><span class="token punctuation">:</span> normal<span class="token punctuation">;</span>
                <span class="token property">color</span><span class="token punctuation">:</span> green<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
        </span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>style</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>container<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
            <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>search<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
                <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">type</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>text<span class="token punctuation">"</span></span> <span class="token attr-name">value</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>Enter search keywords<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
                <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span> <span class="token special-attr"><span class="token attr-name">onclick</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value javascript language-javascript"><span class="token function">Search</span><span class="token punctuation">(</span><span class="token punctuation">)</span></span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>Search<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span>
            <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
            <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>result<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
                <span class="token comment"><!-- Result items are generated dynamically --></span>
                <span class="token comment"><!-- Static mock-up of a single result item:
                <div class="item">
                    <a href="#">Title goes here</a>
                    <p>Summary goes here</p>
                    <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
                </div> --></span>
            <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
        <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>script</span><span class="token punctuation">></span></span><span class="token script"><span class="token language-javascript">
            <span class="token keyword">function</span> <span class="token function">Search</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span>
                <span class="token comment">// alert() opens a browser popup dialog</span>
                <span class="token comment">// alert("hello js!");</span>
                <span class="token comment">// 1. Read the query text; $ is an alias for jQuery</span>
                <span class="token keyword">let</span> query <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">".container .search input"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">val</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span><span class="token string">"query = "</span> <span class="token operator">+</span> query<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// console is the browser's developer console, handy for inspecting JS data</span>
    
                <span class="token comment">// 2. Send the HTTP request; $.ajax is jQuery's function for exchanging data with the backend</span>
                $<span class="token punctuation">.</span><span class="token function">ajax</span><span class="token punctuation">(</span><span class="token punctuation">{<!-- --></span>
                    <span class="token literal-property property">type</span><span class="token operator">:</span> <span class="token string">"GET"</span><span class="token punctuation">,</span>
                    <span class="token literal-property property">url</span><span class="token operator">:</span> <span class="token string">"/s?word="</span> <span class="token operator">+</span> <span class="token function">encodeURIComponent</span><span class="token punctuation">(</span>query<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token comment">// percent-encode spaces, &, and non-ASCII characters</span>
                    <span class="token function-variable function">success</span><span class="token operator">:</span> <span class="token keyword">function</span><span class="token punctuation">(</span><span class="token parameter">data</span><span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span>
                        console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span>data<span class="token punctuation">)</span><span class="token punctuation">;</span>
                        <span class="token function">BuildHtml</span><span class="token punctuation">(</span>data<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    <span class="token punctuation">}</span>
                <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
    
            <span class="token keyword">function</span> <span class="token function">BuildHtml</span><span class="token punctuation">(</span><span class="token parameter">data</span><span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span>
                <span class="token comment">// Select the result container in the page</span>
                <span class="token keyword">let</span> result_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">".container .result"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">// Clear the results of the previous search</span>
                result_lable<span class="token punctuation">.</span><span class="token function">empty</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    
                <span class="token keyword">for</span><span class="token punctuation">(</span> <span class="token keyword">let</span> elem <span class="token keyword">of</span> data<span class="token punctuation">)</span><span class="token punctuation">{<!-- --></span>
                    <span class="token comment">// console.log(elem.title);</span>
                    <span class="token comment">// console.log(elem.url);</span>
                    <span class="token keyword">let</span> a_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"<a>"</span><span class="token punctuation">,</span> <span class="token punctuation">{<!-- --></span>
                        <span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>title<span class="token punctuation">,</span>
                        <span class="token literal-property property">href</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>url<span class="token punctuation">,</span>
                        <span class="token comment">// Open the link in a new page</span>
                        <span class="token literal-property property">target</span><span class="token operator">:</span> <span class="token string">"_blank"</span>
                    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                    <span class="token keyword">let</span> p_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"<p>"</span><span class="token punctuation">,</span> <span class="token punctuation">{<!-- --></span>
                        <span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>desc
                    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                    <span class="token keyword">let</span> i_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"<i>"</span><span class="token punctuation">,</span> <span class="token punctuation">{<!-- --></span>
                        <span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>url
                    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                    <span class="token keyword">let</span> div_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"<div>"</span><span class="token punctuation">,</span> <span class="token punctuation">{<!-- --></span>
                        <span class="token keyword">class</span><span class="token operator">:</span> <span class="token string">"item"</span>
                    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                    a_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    p_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    i_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    div_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>result_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
            <span class="token punctuation">}</span>
        </span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>script</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span>
    </code></pre> 
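    <p>The <code>BuildHtml</code> function above iterates over <code>data</code> and reads <code>title</code>, <code>desc</code>, and <code>url</code> from each element, so the <code>/s</code> endpoint has to answer with a JSON array of objects carrying those three keys. Below is a minimal sketch of that response shape. It assumes a hypothetical <code>SearchResult</code> struct and hand-rolled <code>EscapeJson</code>/<code>ToJson</code> helpers that stand in for jsoncpp so the example needs only the standard library; none of these names come from the project itself.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record; the field names mirror what BuildHtml reads.
struct SearchResult {
    std::string title;
    std::string desc;
    std::string url;
};

// Escape the characters that would break a JSON string literal.
// Bytes >= 0x80 (for example UTF-8 text) are passed through unchanged.
std::string EscapeJson(const std::string &in) {
    static const char kHex[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    out += "\\u00";
                    out += kHex[c >> 4];
                    out += kHex[c & 0x0f];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Serialize the results as [{"title":...,"desc":...,"url":...}, ...].
std::string ToJson(const std::vector<SearchResult> &results) {
    std::string json = "[";
    for (std::size_t i = 0; i < results.size(); ++i) {
        if (i > 0) json += ",";
        json += "{\"title\":\"" + EscapeJson(results[i].title) + "\",";
        json += "\"desc\":\"" + EscapeJson(results[i].desc) + "\",";
        json += "\"url\":\"" + EscapeJson(results[i].url) + "\"}";
    }
    json += "]";
    return json;
}
```

    <p>An empty result list serializes to <code>[]</code>, which the <code>for ... of</code> loop in <code>BuildHtml</code> handles by rendering nothing. In the real server a JSON library is the safer choice; the manual escaping here only illustrates the format.</p>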
    <h1><a name="t30"></a><a id="_1248"></a>11. Adding Log Messages</h1> 
    <pre data-index="41" class="prettyprint"><code class="prism language-cpp has-numbering" onclick="mdcp.signin(event)" style="position: unset;"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><iostream></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><string></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"><ctime></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">NORMAL</span>  <span class="token expression"><span class="token number">1</span></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">WARNING</span> <span class="token expression"><span class="token number">2</span></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">DEBUG</span>   <span class="token expression"><span class="token number">3</span></span></span>
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">FATAL</span>   <span class="token expression"><span class="token number">4</span></span></span>
    
    <span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name function">LOG</span><span class="token expression"><span class="token punctuation">(</span>LEVEL<span class="token punctuation">,</span> MESSAGE<span class="token punctuation">)</span> <span class="token function">log</span><span class="token punctuation">(</span>#LEVEL<span class="token punctuation">,</span> MESSAGE<span class="token punctuation">,</span> <span class="token constant">__FILE__</span><span class="token punctuation">,</span> <span class="token constant">__LINE__</span><span class="token punctuation">)</span></span></span>
    
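    <span class="token comment">// inline: the function is defined in a header, so it may be included from several .cc files</span>
    <span class="token keyword">inline</span> <span class="token keyword">void</span> <span class="token function">log</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>level<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>message<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string <span class="token operator">&</span>file<span class="token punctuation">,</span> <span class="token keyword">int</span> line<span class="token punctuation">)</span>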
    <span class="token punctuation">{<!-- --></span>
        std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> level <span class="token operator"><<</span> <span class="token string">"]"</span> <span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> <span class="token function">time</span><span class="token punctuation">(</span><span class="token keyword">nullptr</span><span class="token punctuation">)</span> <span class="token operator"><<</span> <span class="token string">"]"</span> <span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> message <span class="token operator"><<</span> <span class="token string">"]"</span> <span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> file <span class="token operator"><<</span> <span class="token string">" : "</span> <span class="token operator"><<</span> line <span class="token operator"><<</span> <span class="token string">"]"</span> <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    </code></pre> 
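    <p>The <code>LOG</code> macro above depends on two preprocessor behaviours: the <code>#</code> operator turns the level token into a string without expanding it first (so <code>LOG(WARNING, ...)</code> prints <code>WARNING</code>, not <code>2</code>), and <code>__FILE__</code>/<code>__LINE__</code> are expanded at the call site. The sketch below shows the same idea in testable form. It assumes a hypothetical <code>FormatLog</code> helper and <code>FORMAT_LOG</code> macro that return the line as a string instead of printing it, and it leaves out the timestamp so the result is deterministic.</p>

```cpp
#include <string>

#define WARNING 2  // same style of level constant as in the header above

// Hypothetical helper: builds "[LEVEL][message][file : line]".
inline std::string FormatLog(const std::string &level, const std::string &message,
                             const std::string &file, int line) {
    return "[" + level + "][" + message + "][" + file + " : " +
           std::to_string(line) + "]";
}

// #LEVEL stringizes the argument exactly as written at the call site,
// so the numeric value behind WARNING never appears in the output.
#define FORMAT_LOG(LEVEL, MESSAGE) FormatLog(#LEVEL, MESSAGE, __FILE__, __LINE__)
```

    <p>For example, <code>FORMAT_LOG(WARNING, "index built")</code> yields a string that starts with <code>[WARNING][index built][</code>, followed by the file name and line number of the call.</p>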
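    <h1><a name="t31"></a><a id="Linux_1266"></a>12. Deploying to a Linux Server</h1> 
    <p>The <code>nohup</code> command<br> <code>nohup</code> keeps the program running after the terminal session ends, and the trailing <code>&</code> puts it in the background. By default <code>nohup</code> writes the program's output to <code>nohup.out</code>; the command below instead redirects both stdout and stderr to <code>log/log.txt</code>, which makes the logs easy to inspect.<br> <code>nohup ./http_server > log/log.txt 2>&1 &</code><br> <img src="https://1000bd.com/contentImg/2022/08/15/054748301.png" alt=""></p> 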
    <h1><a name="t32"></a><a id="_1271"></a>13. Summary and Possible Improvements</h1> 
    <blockquote> 
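     <p><font size="3">1. Build search over the whole site.<br> 2. Design an online update scheme (signals, a crawler) to complete the overall server design.<br> 3. Instead of relying on third-party components, design and implement the corresponding pieces yourself (if you have the time and energy).<br> 4. Add paid (bid-based) ranking to the search engine (strongly recommended).<br> 5. Hot-word statistics and smart display of search keywords (trie, priority queue) (recommended).<br> 6. Add login and registration, bringing in MySQL (recommended).</font></p> 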
    </blockquote>
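    <p>Item 5 names a trie and a priority queue without further detail. One possible reading, sketched under assumptions: a hypothetical <code>HotWords</code> class counts every submitted query in a trie, and <code>Suggest</code> walks the subtree under a prefix while a min-heap keeps only the <code>k</code> most frequent words. The class and its method names are illustrative and not part of the project.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

class HotWords {
public:
    // Record one search for `word` (bytes are used as trie edges,
    // so UTF-8 text works as long as prefixes end on a character boundary).
    void Add(const std::string &word) {
        Node *cur = &root_;
        for (char c : word) {
            std::unique_ptr<Node> &next = cur->children[c];
            if (!next) next.reset(new Node());
            cur = next.get();
        }
        ++cur->count;
    }

    // Return up to k recorded words that start with `prefix`,
    // most frequent first; ties are ordered alphabetically.
    std::vector<std::string> Suggest(const std::string &prefix, std::size_t k) const {
        const Node *cur = &root_;
        for (char c : prefix) {
            auto it = cur->children.find(c);
            if (it == cur->children.end()) return {};
            cur = it->second.get();
        }
        Heap heap;
        std::string word = prefix;
        Collect(cur, word, k, heap);
        std::vector<std::string> out;
        while (!heap.empty()) {  // pops weakest first
            out.push_back(heap.top().word);
            heap.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    struct Node {
        int count = 0;  // how many times the word ending here was searched
        std::map<char, std::unique_ptr<Node>> children;
    };
    struct Entry {
        int count;
        std::string word;
    };
    // Returns true when a ranks above b; used as the heap's ordering,
    // this puts the weakest kept entry on top (a min-heap by rank).
    struct Better {
        bool operator()(const Entry &a, const Entry &b) const {
            if (a.count != b.count) return a.count > b.count;
            return a.word < b.word;
        }
    };
    using Heap = std::priority_queue<Entry, std::vector<Entry>, Better>;

    // Depth-first walk that keeps at most k entries in the heap.
    static void Collect(const Node *node, std::string &word, std::size_t k, Heap &heap) {
        if (node->count > 0 && k > 0) {
            heap.push(Entry{node->count, word});
            if (heap.size() > k) heap.pop();  // drop the weakest candidate
        }
        for (const auto &kv : node->children) {
            word.push_back(kv.first);
            Collect(kv.second.get(), word, k, heap);
            word.pop_back();
        }
    }

    Node root_;
};
```

    <p>Bounding the heap at <code>k</code> entries keeps memory proportional to the number of suggestions shown rather than to the number of words under the prefix. A production version would also need locking or per-thread counters, since the HTTP server handles requests concurrently.</p>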
                    </div>
                        </div>
                    </li>
    
                    <li class="list-group-item from-a mb-2">
                    Original article: https://blog.csdn.net/weixin_57675461/article/details/125982513
                    </li>
    
                </ul>
            </div>
    
            <div class="col-lg-4 col-sm-12">
    
                <ul class="list-group pt-2" style="word-break:break-all;">
                    <li class="list-group-item ul-li-bg" aria-current="true">
                        热门文章
                    </li>
                    <li class="list-group-item ul-li">
                        <nobr>
    <a href="/Article/Index/888177">十款代码表白小特效 一个比一个浪漫 赶紧收藏起来吧!!!</a>                            <br />
    <a href="/Article/Index/797680">奉劝各位学弟学妹们,该打造你的技术影响力了!</a>                            <br />
    <a href="/Article/Index/888183">五年了,我在 CSDN 的两个一百万。</a>                            <br />
    <a href="/Article/Index/888179">Java俄罗斯方块,老程序员花了一个周末,连接中学年代!</a>                            <br />
    <a href="/Article/Index/797730">面试官都震惊,你这网络基础可以啊!</a>                            <br />
    <a href="/Article/Index/797725">你真的会用百度吗?我不信 — 那些不为人知的搜索引擎语法</a>                            <br />
    <a href="/Article/Index/797702">心情不好的时候,用 Python 画棵樱花树送给自己吧</a>                            <br />
    <a href="/Article/Index/797709">通宵一晚做出来的一款类似CS的第一人称射击游戏Demo!原来做游戏也不是很难,连憨憨学妹都学会了!</a>                            <br />
    <a href="/Article/Index/797716">13 万字 C 语言从入门到精通保姆级教程2021 年版</a>                            <br />
    <a href="/Article/Index/888192">10行代码集2000张美女图,Python爬虫120例,再上征途</a>                            <br />
                        </nobr>
                    </li>
                </ul>
    
            </div>
        </div>
    </div>
    
        <script src="/js/plugins/jquery/jquery.js"></script>
        <script src="/js/bootstrap.min.js"></script>
    
    
    </body>
    </html>