• A Haskell crawler program using an HTTP proxy IP


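    Below is a simple example of a crawler written in Haskell. It sends its requests through an HTTP proxy IP in order to crawl Baidu Images. Note that this program is only a basic example; a real crawler needs to handle more details, such as error handling and data cleaning.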

    {-# LANGUAGE OverloadedStrings #-}

    import Network.HTTP.Client
    import Network.HTTP.Types.Header (RequestHeaders, hReferer, hUserAgent)
    import Network.HTTP.Types.Status (statusCode)
    import System.Random (randomRIO)
    import Text.Read (readMaybe)
    import qualified Data.ByteString.Char8 as BS
    import qualified Data.ByteString.Lazy.Char8 as LBS

    -- Start URL: a Baidu search for "图片" (images); the query is percent-encoded UTF-8
    startUrl :: String
    startUrl = "http://www.baidu.com/s?wd=%E5%9B%BE%E7%89%87"

    -- Candidate User-Agent strings; one is picked at random on each run
    userAgents :: [BS.ByteString]
    userAgents =
      [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
      , "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0"
      ]

    -- Build the request headers with a randomly chosen User-Agent.
    -- Host and Connection are set by http-client itself; add a Cookie
    -- header here only if you have a real one for the target site.
    randomHeaders :: IO RequestHeaders
    randomHeaders = do
      i <- randomRIO (0, length userAgents - 1)
      return [ (hUserAgent, userAgents !! i)
             , (hReferer, BS.pack startUrl)
             , ("Upgrade-Insecure-Requests", "1")
             ]

    -- Prompt until a valid port number (1-65535) is entered
    readPort :: IO Int
    readPort = do
      putStrLn "Enter the proxy port:"
      input <- getLine
      case readMaybe input of
        Just p | p >= 1 && p <= 65535 -> return p
        _ -> putStrLn "Invalid port, please try again." >> readPort

    main :: IO ()
    main = do
      -- Set the proxy information
      port <- readPort
      let settings = managerSetProxy (useProxy (Proxy "www.duoip.cn" port)) defaultManagerSettings
      manager <- newManager settings

      -- Send the request to the start URL through the proxy
      headers <- randomHeaders
      request <- parseRequest startUrl
      response <- httpLbs request { requestHeaders = headers } manager

      -- Print the status code and the first 500 bytes of the body
      putStrLn $ "Status: " ++ show (statusCode (responseStatus response))
      LBS.putStrLn (LBS.take 500 (responseBody response))
    
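    The introduction names data cleaning as one of the details a real crawler has to handle. The following is a minimal sketch of such a step, assuming the fetched page is plain HTML and the image links sit in src="..." attributes. The function names extractSrcs and cleanUrls and the sample HTML are hypothetical and not part of the original program. The string matching is deliberately naive (it also matches attributes such as data-src), so a real crawler would more likely use an HTML parsing library.

    ```haskell
    import Data.List (isPrefixOf, nub, tails)

    -- Collect the value of every src="..." attribute in the given HTML text.
    -- Naive scan: looks at every suffix of the input that starts with src="
    -- and takes the characters up to the closing quote.
    extractSrcs :: String -> [String]
    extractSrcs html =
      [ takeWhile (/= '"') (drop (length marker) t)
      | t <- tails html
      , marker `isPrefixOf` t
      ]
      where
        marker = "src=\""

    -- Cleaning step: keep only absolute http(s) URLs and drop duplicates.
    cleanUrls :: [String] -> [String]
    cleanUrls = nub . filter isAbsolute
      where
        isAbsolute u = "http://" `isPrefixOf` u || "https://" `isPrefixOf` u

    main :: IO ()
    main = do
      -- Hypothetical sample input standing in for a downloaded page
      let sample = "<img src=\"http://example.com/a.jpg\">"
                ++ "<img src=\"/local.png\">"
                ++ "<img src=\"http://example.com/a.jpg\">"
      mapM_ putStrLn (cleanUrls (extractSrcs sample))
    ```

    In the crawler above, the same two functions could be applied to the response body after converting it to a String, for example with LBS.unpack.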
  • Original article: https://blog.csdn.net/weixin_44617651/article/details/134372470