• Blocking spider crawling on Apache, IIS6, and IIS7 dedicated-IP hosts (for VPS, cloud hosts, and servers)


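    If the visits come from legitimate search engine spiders, blocking them is not recommended: the site would lose its indexing and ranking in Baidu and other search engines, which in turn costs you customers. Consider first upgrading the virtual hosting plan to get more traffic, or moving to a cloud server (unmetered traffic). For details, see: http://www.west.cn/faq/list.asp?unid=626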

      

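    1. Website management assistant environment: follow http://www.west.cn/faq/list.asp?unid=650 to enable the URL rewrite (pseudo-static) component.

    2. Windows 2003 + IIS manually built site environment: follow http://www.west.cn/faq/list.asp?unid=639 to load the URL rewrite (pseudo-static) component.

    3. Then add the rules below to the configuration file that matches your system.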

    Linux: rule file .htaccess (create the .htaccess file manually in the site root directory)

    <IfModule mod_rewrite.c>
    RewriteEngine On
    #Block spider
    RewriteCond %{HTTP_USER_AGENT}   "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu"   [NC]
    RewriteRule !(^robots\.txt$) - [F]
    </IfModule>
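
To make the logic of the rule above easier to check before deploying it, here is a minimal Python sketch of what the two directives do: the condition searches the User-Agent case-insensitively ([NC]), and the rule answers 403 Forbidden ([F]) for every path except robots.txt. The function name `apache_rule_status` and the shortened pattern are illustrative assumptions, not part of Apache, and the value 200 only stands for "the request passes through".

```python
import re

# Shortened, illustrative subset of the pattern used in the rule above.
# re.IGNORECASE plays the role of the [NC] flag.
BLOCKED_UA = re.compile(r"SemrushBot|AhrefsBot|MJ12bot|YandexBot|curl|Wget", re.IGNORECASE)


def apache_rule_status(path: str, user_agent: str) -> int:
    """Return 403 if this sketch would block the request, otherwise 200."""
    # In .htaccess context the pattern is matched against the path
    # without its leading slash, so drop one here.
    rel_path = path[1:] if path.startswith("/") else path
    is_robots = re.search(r"^robots\.txt$", rel_path) is not None
    # RewriteCond must match AND the negated RewriteRule pattern must apply.
    if BLOCKED_UA.search(user_agent) and not is_robots:
        return 403
    return 200
```

Calling `apache_rule_status("/index.html", "Mozilla/5.0 (compatible; AhrefsBot/7.0)")` yields 403, while the same User-Agent requesting `/robots.txt` yields 200, so blocked spiders can still read the robots file.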

    Windows 2003: rule file httpd.conf

    #Block spider
    RewriteCond %{HTTP_USER_AGENT}   (SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu)   [NC]
    RewriteRule !(^/robots\.txt$) - [F]

    Windows 2008: web.config

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <rewrite>
          <rules>
            <rule name="Block spider">
              <match url="(^robots\.txt$)" ignoreCase="false" negate="true" />
              <conditions>
                <add input="{HTTP_USER_AGENT}" pattern="SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true" />
              </conditions>
              <action type="AbortRequest" />
            </rule>
          </rules>
        </rewrite>
      </system.webServer>
    </configuration>

    Equivalent blocking rule for Nginx

    Add the code inside the server block of the corresponding site's configuration file.

    if ($http_user_agent ~ "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"   )
    {
      return 444;
    }
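
The Nginx rule behaves a little differently from a case-insensitive match, which the following Python sketch tries to illustrate: the `~` operator is a case-sensitive regex match, the trailing `^$` alternative also catches requests with an empty User-Agent, and there is no exemption for robots.txt. Status 444 is Nginx's non-standard code for closing the connection without sending a response. The function name and the shortened pattern are assumptions made for this example.

```python
import re

# Shortened, illustrative subset of the Nginx pattern.
# No re.IGNORECASE: the "~" operator in Nginx is case-sensitive.
NGINX_BLOCKED_UA = re.compile(r"Bytespider|Scrapy|AhrefsBot|Python|Wget|^$")


def nginx_rule_closes_connection(user_agent: str) -> bool:
    """Return True if this sketch would answer with 444 (connection closed)."""
    return NGINX_BLOCKED_UA.search(user_agent) is not None
```

With this pattern, `"Python-urllib/3.11"` and an empty User-Agent are rejected, whereas lowercase `"python-requests/2.31"` passes. If case-insensitive matching is wanted, Nginx provides the `~*` operator for that purpose.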

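    Note: the rules block a number of unidentified spiders by default; to block other spiders, add them to the pattern in the same way.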
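
When adding spiders to the pattern, names that contain regex metacharacters (such as the dot in `mail.RU`) are safer when escaped. The following is a small sketch, assuming you keep the spider names in a Python list and paste the generated string into the rule; `build_ua_pattern` and `ExampleBot` are hypothetical names used only for illustration.

```python
import re


def build_ua_pattern(names):
    """Join spider names into one alternation, escaping regex metacharacters."""
    return "|".join(re.escape(name) for name in names)


# "ExampleBot" is a made-up spider name standing in for the one you want to add.
pattern = build_ua_pattern(["SemrushBot", "mail.RU", "ExampleBot"])
print(pattern)
```

The resulting string can replace the alternation inside the quotes or parentheses of the rule for your server.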

    Original link: https://www.west.cn/faq/list.asp?unid=820

  • Original address: https://blog.csdn.net/wwwwestcn/article/details/125440924