• Blocking spider crawling on Apache, IIS6, and IIS7 dedicated-IP hosts (applies to VPS and cloud servers)


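    If the visits come from legitimate search engine spiders, blocking them is not recommended: the site would lose its indexing and ranking in Baidu and other search engines, which in turn costs you customers. Consider first upgrading the virtual host plan to get more traffic, or moving to a cloud server (unmetered traffic). For more details, see: http://www.west.cn/faq/list.asp?unid=626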

      

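    1. Website Management Assistant environment: follow http://www.west.cn/faq/list.asp?unid=650 to enable the pseudo-static (URL rewrite) component

    2. Windows 2003 + IIS manually built site environment: follow http://www.west.cn/faq/list.asp?unid=639 to load the pseudo-static (URL rewrite) component

    3. Then add the rules for your system to the configuration file, as shown below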

    On Linux, the rule file is .htaccess (create the .htaccess file manually in the site root directory)

    <IfModule mod_rewrite.c>
    RewriteEngine On
    #Block spider
    RewriteCond %{HTTP_USER_AGENT}   "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu"   [NC]
    RewriteRule !(^robots\.txt$) - [F]
    </IfModule>
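
To make the matching behavior of the .htaccess rule above easier to reason about, here is a minimal Python sketch of the same decision. It is not part of the server configuration. The function name `is_blocked` is a hypothetical helper introduced for illustration, and the sketch assumes that Python's `re` module treats this simple alternation the same way Apache's regex engine does.

```python
import re

# Same alternation as the RewriteCond above; re.IGNORECASE mirrors the [NC] flag.
BLOCKED_AGENTS = re.compile(
    "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|"
    "jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|"
    "Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|"
    "perl|Python|Wget|Xenu|ZmEu",
    re.IGNORECASE,
)


def is_blocked(user_agent: str, path: str) -> bool:
    """Return True when the rule would answer with 403 Forbidden ([F])."""
    # The negated RewriteRule pattern exempts robots.txt from the block.
    # In .htaccess context the path is matched without a leading slash.
    if path == "robots.txt":
        return False
    # RewriteCond does an unanchored search, so a substring match is enough.
    return BLOCKED_AGENTS.search(user_agent) is not None


if __name__ == "__main__":
    for agent in ("curl/7.68.0", "Baiduspider/2.0"):
        print(agent, is_blocked(agent, "index.html"))
```

Because the match is an unanchored, case-insensitive substring search, any User-Agent that merely contains one of the listed names is rejected, while robots.txt stays reachable for every client.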

    On Windows 2003, the rule file is httpd.conf

    #Block spider
    RewriteCond %{HTTP_USER_AGENT}   (SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu)   [NC]
    RewriteRule !(^/robots\.txt$) - [F]

    On Windows 2008, the rule file is web.config

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <rewrite>
          <rules>
            <rule name="Block spider">
              <match url="(^robots\.txt$)" ignoreCase="false" negate="true" />
              <conditions>
                <add input="{HTTP_USER_AGENT}" pattern="SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true" />
              </conditions>
              <action type="AbortRequest" />
            </rule>
          </rules>
        </rewrite>
      </system.webServer>
    </configuration>

    Equivalent blocking rule for Nginx

    Add the code inside the server block of the corresponding site's configuration file

    if ($http_user_agent ~ "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"   )
    {
      return 444;
    }
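
The Nginx rule differs from the Apache and IIS rules in two ways that are easy to miss: the `~` operator matches case-sensitively, and the trailing `^$` alternative also catches requests with an empty User-Agent. The Python sketch below models only that matching step. `nginx_would_drop` is a hypothetical helper name, and the sketch assumes Python's `re` behaves like Nginx's PCRE for this simple alternation.

```python
import re

# Same alternation as the Nginx "if" above. No IGNORECASE flag, because
# the "~" operator is case-sensitive ("~*" would be the insensitive form).
NGINX_PATTERN = re.compile(
    "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|"
    "Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|"
    "WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|"
    "TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
)


def nginx_would_drop(user_agent: str) -> bool:
    """Return True when the rule would reach 'return 444'."""
    # Pass "" for a request that sends no User-Agent header at all.
    return NGINX_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for agent in ("", "Python-urllib/3.10", "python-requests/2.31.0"):
        print(repr(agent), nginx_would_drop(agent))
```

Under these assumptions, `Python-urllib/3.10` is dropped but the lowercase `python-requests/2.31.0` is not. If case-insensitive matching is wanted, as in the `[NC]` and `ignoreCase="true"` variants, Nginx provides the `~*` operator. Status 444 is an Nginx-specific code that closes the connection without sending a response.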

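    Note: by default the rules block a number of unidentified spiders; to block other spiders, add their names to the rule following the same pattern.

    Original link: https://www.west.cn/faq/list.asp?unid=820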
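
As a rough illustration of adding spiders to a rule, the following Python sketch appends names to one of the `|`-separated patterns. `add_spiders` and `ExampleBot` are hypothetical names used only for this example. The sketch assumes the pattern is a flat alternation like the ones in this article and escapes regex metacharacters in the new names.

```python
import re


def add_spiders(pattern: str, *names: str) -> str:
    """Append spider names to a flat 'A|B|C' alternation, skipping duplicates."""
    # Splitting on "|" is only safe because the patterns here contain
    # no groups or nested alternations.
    entries = pattern.split("|")
    for name in names:
        # Escape characters such as "." so they match literally.
        escaped = re.escape(name)
        if escaped not in entries:
            entries.append(escaped)
    return "|".join(entries)


if __name__ == "__main__":
    print(add_spiders("AhrefsBot|MJ12bot", "ExampleBot"))
```

The resulting string can be pasted in place of the existing pattern in whichever rule file applies. It is worth re-testing the site afterwards with a normal browser to confirm that ordinary visitors are not caught by the new entry.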

  • Source: https://blog.csdn.net/wwwwestcn/article/details/125440924