• Troubleshooting unexpected 502 errors when nginx is configured with proxy_next_upstream


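    nginx is proxying several gateway instances.
    When a GET endpoint on the requested service fails in one of the ways proxy_next_upstream can match (error, timeout, invalid_header, http_500, http_502, http_503, http_504),
    nginx may respond with a 502 status code.

    I had assumed that nginx simply relays the backend's response and normally does not change the status code.

    The nginx configuration is as follows: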

    worker_processes  1;
    daemon off;
    master_process off; 
    error_log  logs/error.log  debug; 
    events {
        worker_connections  1024;
    }
    http {
        include       mime.types;
        default_type  application/octet-stream;
         log_format apm '[$time_local]\tclient=$remote_addr\t'
                   'upstream_addr=$upstream_addr\t'
                   'upstream_status=$upstream_status\t'
                   'document_root="$document_root"\t'
                   'fastcgi_script_name="$fastcgi_script_name"\t'
                   'request_filename="$request_filename"\t'
                   'request_time=$request_time\t'
                   'upstream_response_time=$upstream_response_time\t'
                   'upstream_connect_time=$upstream_connect_time\t'
                   'upstream_header_time=$upstream_header_time\t';
        access_log  logs/access.log  apm;
        sendfile        on; 
        keepalive_timeout  65;
        upstream gateway {
            server 192.168.2.102:12012;
            server 192.168.2.102:12011;
        }
        server {
            listen       80;
            server_name  localhost; 
            location / {
                root   html;
                index  index.html index.htm;
            }
            location /api/ {
                proxy_pass http://gateway/;
                proxy_next_upstream error http_503 http_502;
            } 
            error_page   500 502 503 504  /50x.html;
            location = /50x.html {
                root   html;
            } 
        }
    }
    
    

    Sample test code:

        // Test endpoint that always answers 503 after a short delay.
        @GetMapping("/excep503")
        public ResponseEntity<String> excep503(HttpServletRequest request, Integer times) throws InterruptedException {
            Thread.sleep(200);
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body("Service unavailable");
        }
    

    Test method:

    Send repeated GET requests to the failing endpoint.

    Symptom:

    The response is sometimes 502 and sometimes 503.


    When 503 is returned

    upstream_addr in access_log lists two addresses: 192.168.2.102:12012, 192.168.2.102:12011
    error_log shows requests to both gateways in turn.
    First, connect to 192.168.2.102:12011;
    102:12011 returns 503 Service Unavailable,
    and the following error is logged:

    upstream server temporarily disabled while reading response header from upstream
    

    Then nginx moves on to the next server: connect to 192.168.2.102:12012;
    102:12012 also returns 503 Service Unavailable.

    When 502 is returned

    upstream_addr in access_log has a single entry: upstream_addr=192.168.2.102:12011

    error_log shows only one request to a gateway:
    connect to 192.168.2.102:12011;
    102:12011 returns 503 Service Unavailable,
    and the following errors are logged:

    upstream server temporarily disabled while reading response header from upstream
    no live upstreams while connecting to upstream
    

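    Why 502 is returned

    After looking through the relevant material:

    the ft_type passed in is 0x40000000 (NGX_HTTP_UPSTREAM_FT_NOLIVE), which matches none of the explicit cases and falls through to default, so the final status is NGX_HTTP_BAD_GATEWAY, i.e. 502.

    nginx-1.24.0\src\http\ngx_http_upstream.c (ngx_http_upstream_next), line 4370: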

        switch (ft_type) {
    
        case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
        case NGX_HTTP_UPSTREAM_FT_HTTP_504:
            status = NGX_HTTP_GATEWAY_TIME_OUT;
            break;
    
        case NGX_HTTP_UPSTREAM_FT_HTTP_500:
            status = NGX_HTTP_INTERNAL_SERVER_ERROR;
            break;
    
        case NGX_HTTP_UPSTREAM_FT_HTTP_503:
            status = NGX_HTTP_SERVICE_UNAVAILABLE;
            break;
    
        /*
         * NGX_HTTP_UPSTREAM_FT_BUSY_LOCK and NGX_HTTP_UPSTREAM_FT_MAX_WAITING
         * never reach here
         */
    
        default:
            status = NGX_HTTP_BAD_GATEWAY;
        }
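
    To make the fall-through concrete, here is a minimal standalone sketch of the same mapping. The FT_* names and status_for_failure are hypothetical stand-ins for this illustration, and the bit values are assumptions recalled from ngx_http_upstream.h, so check them against your own nginx source tree.

```c
#include <assert.h>

/* Failure-type bits, named after the NGX_HTTP_UPSTREAM_FT_* macros.
 * The numeric values are assumptions, not copied from this article. */
enum {
    FT_ERROR    = 0x00000002,
    FT_TIMEOUT  = 0x00000004,
    FT_HTTP_500 = 0x00000010,
    FT_HTTP_502 = 0x00000020,
    FT_HTTP_503 = 0x00000040,
    FT_HTTP_504 = 0x00000080,
    FT_NOLIVE   = 0x40000000   /* "no live upstreams" */
};

/* Mirrors the switch quoted above: failure type in, client status out. */
static int status_for_failure(unsigned int ft_type)
{
    /* The sketch expects exactly one failure bit to be set. */
    assert(ft_type != 0 && (ft_type & (ft_type - 1)) == 0);

    switch (ft_type) {

    case FT_TIMEOUT:
    case FT_HTTP_504:
        return 504;

    case FT_HTTP_500:
        return 500;

    case FT_HTTP_503:
        return 503;

    default:
        /* FT_ERROR, FT_HTTP_502, FT_NOLIVE and anything else end up here. */
        return 502;
    }
}
```

    Calling status_for_failure(FT_HTTP_503) yields 503, while status_for_failure(FT_NOLIVE) yields 502. That is the difference seen in the test: the backend's own 503 is passed on only when the last failure recorded for the request is the 503 itself, not "no live upstreams".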
    

    Where the 502 and 503 paths diverge:

    nginx-1.24.0\src\http\ngx_http_upstream_round_robin.c (ngx_http_upstream_get_round_robin_peer), line 449:

        peers = rrp->peers;
        ngx_http_upstream_rr_peers_wlock(peers);
    
        if (peers->single) {
            peer = peers->peer;
    
            if (peer->down) {
                goto failed;
            }
    
            if (peer->max_conns && peer->conns >= peer->max_conns) {
                goto failed;
            }
    
            rrp->current = peer;
    
        } else {
    
            peer = ngx_http_upstream_get_peer(rrp);
    
            if (peer == NULL) {
                goto failed;
            }
    
            ngx_log_debug2(NGX_LOG_DEBUG_HTTP, pc->log, 0,
                           "get rr peer, current: %p %i",
                           peer, peer->current_weight);
        }
    

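    The single flag indicates whether the upstream group has only one member. With two servers configured, as here, the else branch runs, and the request goes to failed (and ends as 502) when ngx_http_upstream_get_peer returns NULL, that is, when no server is currently eligible.

    So the question now is:

    why does upstream_addr sometimes contain two addresses and sometimes only one?

    Debugging the nginx source

    At startup nginx gives every backend server a default fail_timeout of 10s (together with a default max_fails of 1).

    When a request to a server fails, the server is marked unavailable, and the selection loop skips it until fail_timeout has elapsed:

    nginx-1.24.0/src/http/ngx_http_upstream_round_robin.c (ngx_http_upstream_get_peer), line 522: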

        for (peer = rrp->peers->peer, i = 0;
             peer;
             peer = peer->next, i++)
        {
            n = i / (8 * sizeof(uintptr_t));
            m = (uintptr_t) 1 << i % (8 * sizeof(uintptr_t));
    
            if (rrp->tried[n] & m) {
                continue;
            }
    
            if (peer->down) {
                continue;
            }
    
            if (peer->max_fails
                && peer->fails >= peer->max_fails
                && now - peer->checked <= peer->fail_timeout)
            {
                continue;
            }
    
            if (peer->max_conns && peer->conns >= peer->max_conns) {
                continue;
            }
    
            peer->current_weight += peer->effective_weight;
            total += peer->effective_weight;
    
            if (peer->effective_weight < peer->weight) {
                peer->effective_weight++;
            }
    
            if (best == NULL || peer->current_weight > best->current_weight) {
                best = peer;
                p = i;
            }
        }
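
    The effect of the max_fails / fail_timeout check can be reproduced with a small simulation. This is a simplified sketch under stated assumptions, not nginx's implementation: peer_t, peer_is_disabled and simulate_request are hypothetical names, weights are ignored (the first eligible peer is picked), every backend always answers 503, and http_503 is assumed to be enabled in proxy_next_upstream.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified peer state. Field names follow the nginx round-robin peer,
 * but this struct is a stand-in for illustration only. */
typedef struct {
    int  max_fails;     /* nginx default: 1 */
    long fail_timeout;  /* nginx default: 10 seconds */
    int  fails;         /* failures recorded so far */
    long checked;       /* time of the last recorded failure */
} peer_t;

/* Same condition as the max_fails check in the loop above. */
static int peer_is_disabled(const peer_t *p, long now)
{
    return p->max_fails
           && p->fails >= p->max_fails
           && now - p->checked <= p->fail_timeout;
}

/* Simulate one request at time `now` (in seconds) against `n` peers that
 * always answer 503. Returns the status the client would see. */
static int simulate_request(peer_t *peers, size_t n, long now)
{
    unsigned long tried = 0;   /* bitmap of peers already tried, like rrp->tried */
    size_t        tries = n;   /* one attempt per configured peer */
    size_t        i;

    assert(n > 0 && n <= 8 * sizeof(tried));

    while (tries > 0) {
        peer_t *best = NULL;

        for (i = 0; i < n; i++) {
            if (tried & (1UL << i)) {
                continue;
            }
            if (peer_is_disabled(&peers[i], now)) {
                continue;
            }
            best = &peers[i];
            tried |= 1UL << i;
            break;
        }

        if (best == NULL) {
            return 502;        /* no live upstreams: falls to the default case */
        }

        /* The peer answered 503: record the failure and try the next one. */
        best->fails++;
        best->checked = now;
        tries--;
    }

    return 503;                /* every peer was tried, the last answer was 503 */
}
```

    With two fresh peers, a request at t=100 tries both and returns 503; requests at t=105 and t=110 find both peers disabled and return 502; a request at t=111 tries both again and returns 503. If the two peers were disabled one second apart (checked = 100 and 101), a request at t=111 reaches only the first peer before running out of live peers and returns 502, which corresponds to the single upstream_addr entry seen in the 502 case.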
    

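    Verification

    Requesting the endpoint continuously shows the 503 error coming back roughly every 10 seconds, which matches the hypothesis.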


  • Original article: https://www.cnblogs.com/mysgk/p/17814143.html