• [Advanced Programming] Week 2-2: HTTP and Web


    I. Basics

    1. HTTP

    Name: HyperText Transfer Protocol
    Concept: an application-layer protocol used to transfer web pages from a server to a client.

    2. Web pages(document)

    Composition: objects such as an HTML file, a JPEG image, an audio clip or an applet, each addressed by a single URL
    A web page consists of a base HTML document and several other referenced objects

    3. Client - Server

    Server vs. client

    Concept
    - server: a piece of software that exposes a listening socket and waits for requests; it services requests and provides a response
    - client: a piece of software that initiates a request; it receives and processes the response

    Accessing a web page
    - server: sends a response containing the HTML document and the other referenced resources
    - client: sends a request for the resource/document (web page) addressed by ONE URL

    Processing user data
    - server: extracts the user data, performs the task and sends a response
    - client: marshals the user information, sends a request containing it to the server-side program, and parses the response

    II. Placing HTTP within context

    1. Context of HTTP

    Background

    - HTTP is an application-layer protocol

    - It relies on many other protocols:

      - TCP (provides reliable, in-order delivery)

      - IP (delivers data packets between the end hosts)

      - Layer 2 (deals with the individual networks, e.g. MAC addressing)

    OSI reference model

    - Application: delivers services and applications; deals with the advanced application-specific functionality. Examples: HTTP, FTP, SMTP

    - Presentation. Examples: JPEG, GIF, MPEG

    - Session. Examples: AppleTalk, Winsock

    - Transport: delivers data segments between processes; deals with things like reliability and flow control. Examples: TCP, UDP, SPX

    - Network: delivers data packets between hosts; gives access to multiple networks and handles routing information. Examples: IP, ICMP, IPX

    - Data link. Examples: Ethernet, ATM

    - Physical. Examples: Ethernet, Token Ring

    2. HTTP and TCP

    Relationship

    HTTP is built over TCP: Socket socket = new Socket("10.0.0.1", 80);

    TCP makes life easier for HTTP

    - as seen by HTTP: no packet loss, no out-of-order delivery, congestion/flow control automatically handled

    TCP allows HTTP to focus on its own functionality
    TCP full name: Transmission Control Protocol

    Characteristics:

    - connection-oriented protocol

    - provides a reliable, unicast (one sender, one receiver), end-to-end byte stream over an unreliable internetwork

    Connection-oriented

    Meaning: before any data transfer, TCP establishes a connection

    - one TCP entity is waiting for a connection ("server"), the other TCP entity ("client") contacts the server

    When it happens: when you create a new Socket
    Reliable

    The byte stream is broken up into chunks called segments:

    - the receiver sends ACKs for segments;

    - TCP maintains a timer; a segment is retransmitted if an ACK is not received in time.

    Detecting errors:

    - TCP has a checksum covering header and data. Segments with invalid checksums are discarded.

    - Each byte that is transmitted has a sequence number.
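
    The checksum step above can be sketched as follows. This is a minimal illustration of the 16-bit one's-complement checksum style that TCP uses, not a full TCP implementation: a real TCP checksum also covers a pseudo-header with the IP addresses, which is left out here. The class and method names are hypothetical.

```java
class InternetChecksum {
    /** 16-bit one's-complement checksum over the given bytes (pseudo-header omitted). */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            // An odd-length input is padded with a zero byte.
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (hi << 8) | lo;
        }
        // Fold any carries above 16 bits back into the low 16 bits.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) (~sum & 0xFFFF);
    }

    /** Receiver side: recompute and compare; a mismatch means the segment is discarded. */
    static boolean isValid(byte[] data, int receivedChecksum) {
        return checksum(data) == receivedChecksum;
    }

    public static void main(String[] args) {
        byte[] segment = {0x00, 0x01, (byte) 0xF2, 0x03,
                          (byte) 0xF4, (byte) 0xF5, (byte) 0xF6, (byte) 0xF7};
        int sent = checksum(segment);
        System.out.printf("checksum = 0x%04x%n", sent);
        segment[0] ^= 0x40; // simulate a bit flipped in transit
        System.out.println("valid after corruption: " + isValid(segment, sent));
    }
}
```

    A segment that fails this check is simply dropped; the sender's retransmission timer then takes care of resending it.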

    Byte stream service

    TCP deals with segments: send a segment, resend the segment if it's lost

    To higher layers, TCP exposes a byte stream service

    TCP format

    3. URL: uniform resource locator

    Definition: a way of locating a resource on the Internet
    Components

    protocol://hostname[:port]/path/filename#section

    protocol: used to access the server

    hostname[:port]: the name of the server(if no port, automatically uses the default for protocol)

    /path/filename#section: the location of a file on the server

    filename:

    - points to a file in the directory specified by path

    - If omitted, it is left to the server to decide which file to send. (It may send an index of the directory, often in a file called index.html, the index file.)

    #section:

    - Meaning: a named anchor (fragment/ref) in an HTML document, i.e. a way of marking or linking to a target position

    - Usage: create using a tag ...
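
    As an illustration of the URL components above, here is a small sketch using the standard java.net.URI class. The URL itself and the helper names (defaultPort, effectivePort) are made up for the example; only http and https defaults are covered.

```java
import java.net.URI;

class UrlParts {
    /** Default port for the protocols used in these notes; -1 if not known here. */
    static int defaultPort(String protocol) {
        switch (protocol) {
            case "http":  return 80;
            case "https": return 443;
            default:      return -1;
        }
    }

    /** Port a client would connect to: the explicit one if given, else the protocol default. */
    static int effectivePort(URI url) {
        return url.getPort() != -1 ? url.getPort() : defaultPort(url.getScheme());
    }

    public static void main(String[] args) {
        // Hypothetical URL of the form protocol://hostname[:port]/path/filename#section
        URI url = URI.create("http://www.example.com/docs/index.html#intro");
        System.out.println("protocol = " + url.getScheme());
        System.out.println("hostname = " + url.getHost());
        System.out.println("port     = " + effectivePort(url));
        System.out.println("path     = " + url.getPath());
        System.out.println("section  = " + url.getFragment());
    }
}
```

    Note that URI.getPort() returns -1 when the URL has no explicit port, which is why the default has to be looked up from the protocol.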

    other possible protocols to use with URLs

    III. HTTP protocol details

    1. HTTP: standard protocol between browsers & web servers

    Purpose: HTTP specifies

    - how a client & server establish a connection

    - how the client requests data from the server

    - how the server responds to requests

    - how a connection is closed

    Characteristics:

    stateless

    - the server doesn't remember anything about previous connections, which keeps the protocol simple and robust.

    - can lead to inefficiencies

    Initialising the connection

    • HTTP/1.0, 1.1 and 2.0 use TCP as the underlying transport protocol
    • Browser and server use TCP socket interfaces
      • Client writes HTTP request messages into its socket and reads responses from the socket; server reads HTTP requests from its socket and writes its responses to it.
    • The connection is established with the server on the port specified in the URL (80 by default)
    HTTP step by step
    • Set up a TCP connection from client to server
    • HTTP client sends a message to the server over the TCP connection, requesting the page at the specified URL (the request includes the path name)
    • HTTP server receives the message via the connection socket
      • retrieves the requested object from its data storage
      • encapsulates the object in an HTTP response message
      • sends the response back via the connection socket
    • HTTP server tells TCP to close the connection; TCP does not actually close it until the client has successfully received the response
    • Client receives the message, and the TCP connection terminates
    • The message tells the client that the response object is an HTML file
      • Client extracts the file from the response message, parses the HTML file and finds references to other referenced objects.
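
    The steps above can be sketched as a tiny server and client running in one process over the loopback interface. This is a minimal sketch under simplifying assumptions (one request, fixed response body, no error handling beyond rethrowing); the class name, the path /index.html and the response body are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackHttpDemo {
    static final String BODY = "<html><body><h1>Hello</h1></body></html>";

    /** Server: accept one connection, read the request head, send one response, close. */
    static void serveOnce(ServerSocket listener) {
        try (Socket conn = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream(), StandardCharsets.US_ASCII));
             OutputStream out = conn.getOutputStream()) {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // Request line and headers are read and ignored; a blank line ends them.
            }
            byte[] body = BODY.getBytes(StandardCharsets.US_ASCII);
            String head = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/html\r\n"
                    + "Content-Length: " + body.length + "\r\n"
                    + "Connection: close\r\n\r\n";
            out.write(head.getBytes(StandardCharsets.US_ASCII));
            out.write(body);
            out.flush();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Client: set up the TCP connection, write the request, read until the server closes. */
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            String request = "GET " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + "\r\n"
                    + "Connection: close\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return new String(socket.getInputStream().readAllBytes(), StandardCharsets.US_ASCII);
        }
    }

    /** Runs both sides once; port 0 lets the operating system pick a free port. */
    static String runOnce() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            String response = fetch(listener.getInetAddress().getHostAddress(),
                                    listener.getLocalPort(), "/index.html");
            server.join();
            return response;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

    The client only learns that the response is complete because the server closes the connection, which is the non-persistent behaviour described in the steps.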

    2. HTTP methods: GET and POST

    GET

    - Query string incorporated in the request URL

    - Idempotent: multiple requests have the same effect as a single one

    - Cacheable

    POST

    - Query string placed in the body of the HTTP request

    - Non-idempotent

    - Used when you want to alter data on the server side
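
    To make the difference concrete, the sketch below builds the same form submission once as a GET (query string in the URL) and once as a POST (query string in the body). The host, path and form fields are hypothetical, and the headers are the minimum needed for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormRequests {
    /** Encodes name-value pairs as name1=value1&name2=value2. */
    static String queryString(Map<String, String> form) {
        return form.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** GET: the query string becomes part of the request URL; there is no body. */
    static String get(String host, String path, Map<String, String> form) {
        return "GET " + path + "?" + queryString(form) + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n\r\n";
    }

    /** POST: the same pairs travel in the entity body, described by two extra headers. */
    static String post(String host, String path, Map<String, String> form) {
        String body = queryString(form);
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("q", "hello world");
        form.put("lang", "fr");
        System.out.println(get("www.example.com", "/search", form));
        System.out.println(post("www.example.com", "/search", form));
    }
}
```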
    An HTTP GET request message

    GET /somedir/index.html HTTP/1.1
    -- requests the file index.html under /somedir from the server, using HTTP version 1.1
    Host: www.qmul.ac.uk
    -- the host name (and optionally the port) of the target server (www.qmul.ac.uk)
    Connection: close
    -- whether to close the connection to the server once the request completes; here the client doesn't want to use persistent connections
    User-agent: Mozilla/5.0
    -- the application type or operating system of the client sending the request (Mozilla/5.0); a Netscape browser
    Accept-language: fr
    -- the language the client prefers (fr); it would like to receive a French version if such a version exists
    (extra carriage return and line feed)
    -- marks the end of the request headers

    The HTTP response message

    HTTP/1.1 200 OK
    -- status code 200 means the request succeeded and the information is in the response
    Connection: close
    -- whether the server closes the connection to the client once the response completes; here the server is going to close the TCP connection
    Date: Fri, 10 Nov 2000 12:01:14 GMT
    -- the time at which the response message was generated
    Server: Apache/1.3.0 (Unix)
    -- the server software that produced the response (Apache/1.3.0 (Unix))
    Last-Modified: Mon, 20 Jul 1999 08:44:01 GMT
    -- the date and time at which the requested resource was last modified
    Content-Length: 5993
    -- the length of the response body (5993 bytes)
    Content-Type: text/html
    -- the data type of the response body (text/html)
    (data data data ...)
    -- the response body, abbreviated here as "(data data data ...)"; this is where the actual response data goes

    General format of a request message:

    method (GET/POST) | path | version (HTTP/1.1)

    header field name: value

    ...

    entity body (form name-value pairs if POST, not used if GET)

    General format of a response message:

    version (HTTP/1.1) | status code (200/400) | phrase (OK/Bad Request)

    header field name: value

    ...

    entity body (the requested object, e.g. the HTML document)
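
    The message layout above can be read back mechanically. The following sketch splits a response into status line, header fields and entity body; it is a simplified illustration (it assumes the whole message is already in one string and ignores details such as repeated headers), and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ParsedResponse {
    final String version;
    final int statusCode;
    final String phrase;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    ParsedResponse(String message) {
        // A blank line (CRLF CRLF) separates the header section from the entity body.
        int headEnd = message.indexOf("\r\n\r\n");
        if (headEnd < 0) {
            throw new IllegalArgumentException("no blank line after the headers");
        }
        String[] lines = message.substring(0, headEnd).split("\r\n");

        // Status line: version | status code | phrase
        String[] status = lines[0].split(" ", 3);
        version = status[0];
        statusCode = Integer.parseInt(status[1]);
        phrase = status.length > 2 ? status[2] : "";

        // Header lines: "field name: value"
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue; // skip anything that is not a header line
            }
            // Field names are case-insensitive, so they are stored in lower case.
            headers.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        lines[i].substring(colon + 1).trim());
        }
        body = message.substring(headEnd + 4);
    }

    public static void main(String[] args) {
        ParsedResponse r = new ParsedResponse(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello");
        System.out.println(r.version + " | " + r.statusCode + " | " + r.phrase);
        System.out.println(r.headers);
        System.out.println(r.body);
    }
}
```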

    3. Requesting multiple resources

    Premise: in the basic (non-persistent) model, each resource (on a server) requires a separate TCP session
    Problem: a page with several referenced objects then pays the TCP setup cost once per object

    There are also persistent connections:

    - Server leaves a TCP connection open (for some time) after sending a response.

    Solution: subsequent requests and responses between the same client and server can be made over the same connection.
    Modes

    Without pipelining:

    – Usually multiple resources are obtained over parallel TCP connections

    – This speeds up downloads for complex web pages

    With pipelining:

    – All the requests for referenced objects are sent back-to-back, so all the referenced objects take only about one round trip.
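
    A rough back-of-the-envelope model shows why this matters. The sketch below counts round-trip times (RTTs) for a base page plus n referenced objects, assuming one RTT for TCP setup, one RTT per request/response, and ignoring transmission time and parallel connections. These are simplifying assumptions for illustration only, and the names are hypothetical.

```java
class RoundTripModel {
    /** Separate TCP session per object, one after another: setup + request for each of 1 + n objects. */
    static int nonPersistentSerial(int referencedObjects) {
        return 2 * (1 + referencedObjects);
    }

    /** One persistent connection, each request sent only after the previous response arrives. */
    static int persistentOneAtATime(int referencedObjects) {
        return 1 + (1 + referencedObjects); // one setup, then one RTT per object
    }

    /** One persistent connection, requests for the referenced objects sent back-to-back. */
    static int persistentBackToBack(int referencedObjects) {
        int forReferenced = referencedObjects > 0 ? 1 : 0; // about one RTT for all of them
        return 1 + 1 + forReferenced; // setup + base page + referenced objects
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println("separate sessions:      " + nonPersistentSerial(n) + " RTTs");
        System.out.println("persistent, one by one: " + persistentOneAtATime(n) + " RTTs");
        System.out.println("persistent, pipelined:  " + persistentBackToBack(n) + " RTTs");
    }
}
```

    For a page with 10 referenced objects this simple model gives 22, 12 and 3 RTTs respectively.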

    4. HTTP methods: example messages

    • Client request:
    GET /path/file.html HTTP/1.1
    Connection: keep-alive
    User-Agent: HTTPTool/1.1
    • Server response:
    HTTP/1.1 200 OK
    Date: Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type: text/html
    Content-Length: 1354

    <html>
    <body>
    <h1>Happy New Year!</h1>
    (more file contents) . . .
    </body>
    </html>

    • Client request:
    GET / HTTP/1.1
    Host: www.google.com
    (Followed by a new line, in the form of a carriage return followed by a line feed.)
    • Server response:
    HTTP/1.1 200 OK
    Content-Length: 3059
    Server: GWS/2.0
    Date: Sat, 11 Jan 2003 02:44:04 GMT
    Content-Type: text/html
    Cache-control: private
    Set-Cookie: PREF=ID=73d4aef52e57bae9:TM=1042253044:LM=1042253044:S=SMCc_HRPCQiqyX9j; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    Connection: keep-alive
    (Followed by a blank line and HTML text comprising the Google home page.)
    <HTML><body> ...

    IV. Important headers

    1. Caching headers

    Cache-Control: holds instructions for caching in both requests and responses
    ETag: an identifier for a specific version of a resource
    Vary: determines whether a cached response may be returned for a subsequent request
    Date: shows the timestamp of when the response was generated
    Expires: shows the time at which the resource expires
    Pragma: similar to Cache-Control (e.g. often used to disable caching)
    Content-Length: shows the length of the resource in bytes
    Content-Encoding: describes how the content is encoded, e.g. gzip
    Content-Type: MIME type of the object, e.g. text/html
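
    As an example of how a cache might read the Cache-Control instructions, the sketch below splits the header value into directives and makes two simple decisions from them. It covers only the private, no-store and max-age directives and ignores the rest of the caching rules; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CacheControl {
    /** Splits e.g. "private, max-age=600" into {private="", max-age="600"}. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String directive = part.trim();
            if (directive.isEmpty()) {
                continue;
            }
            int eq = directive.indexOf('=');
            String name = (eq < 0 ? directive : directive.substring(0, eq))
                    .trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : directive.substring(eq + 1).trim();
            directives.put(name, value);
        }
        return directives;
    }

    /** A shared cache (e.g. a proxy) must not store responses marked private or no-store. */
    static boolean storableBySharedCache(String headerValue) {
        Map<String, String> d = parse(headerValue);
        return !d.containsKey("private") && !d.containsKey("no-store");
    }

    /** With max-age, a stored response counts as fresh while its age is below that limit. */
    static boolean isFresh(String headerValue, long ageSeconds) {
        String maxAge = parse(headerValue).get("max-age");
        return maxAge != null && ageSeconds < Long.parseLong(maxAge);
    }

    public static void main(String[] args) {
        System.out.println(parse("private, max-age=600"));
        System.out.println(storableBySharedCache("private"));
        System.out.println(isFresh("max-age=600", 30));
    }
}
```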

    2. Set-Cookie header

    4 components to consider:
    - cookie header line of the HTTP response message (Set-Cookie)
    - cookie header line in the HTTP request message (Cookie)
    - cookie file kept on the user's host, managed by the user's browser
    - backend database at the website
    Example

    – Susan always accesses the internet from her PC.

    – Assume that she visits a specific e-commerce site for the first time.

    – When the initial HTTP request arrives at the site, the site creates: (i) a unique ID; (ii) an entry in the backend database for that ID
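
    The server side of this example can be sketched as follows. An in-memory map stands in for the backend database, the cookie name "id" is an arbitrary choice for the illustration, and the class is hypothetical; a real site would also set attributes such as an expiry date and path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class CookieShop {
    /** Stands in for the backend database: unique ID -> number of visits. */
    private final Map<String, Integer> database = new HashMap<>();

    /**
     * Handles one request. cookieHeader is the value of the request's Cookie header,
     * or null on a first visit. Returns the Set-Cookie value to send back, or null
     * if the browser already presented a known ID.
     */
    String handle(String cookieHeader) {
        String id = extractId(cookieHeader);
        if (id != null && database.containsKey(id)) {
            database.merge(id, 1, Integer::sum); // returning visitor: update the entry
            return null;
        }
        String newId = UUID.randomUUID().toString(); // first visit: create a unique ID
        database.put(newId, 1);                      // and an entry in the database
        return "id=" + newId;
    }

    int visits(String id) {
        return database.getOrDefault(id, 0);
    }

    /** Picks the value of the "id" cookie out of a header such as "a=1; id=xyz". */
    static String extractId(String cookieHeader) {
        if (cookieHeader == null) {
            return null;
        }
        for (String pair : cookieHeader.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals("id")) {
                return kv[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CookieShop shop = new CookieShop();
        String setCookie = shop.handle(null);   // first request carries no cookie
        System.out.println("Set-Cookie: " + setCookie);
        shop.handle(setCookie);                 // browser sends the cookie back
        System.out.println("visits: " + shop.visits(extractId(setCookie)));
    }
}
```

    Because HTTP itself is stateless, the ID carried in the cookie is what lets the site tie the second request to the first.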

    V. HTTPS & HTTP 2.0

    1. HTTPS

    Introduction: full name HTTP Secure
    vs. HTTP: the same as HTTP but runs over TLS (Transport Layer Security), on port 443 by default
    Characteristics

    - All traffic (headers and payloads) is encrypted

    - Authenticates the server

    - Prevents others from sniffing traffic

    HTTPS URLs

    - Near-identical to HTTP URLs

    - The protocol changes: from http:// to https://

    HTTPS Issues

    - Adds extra overhead

      - you should only use HTTPS when necessary

      - Some organisations are pushing for all HTTP to be encrypted

    - Increases connection setup time

      - Requires TCP setup + TLS handshake

      - Thus, increases page load time

      - Not good for requesting small resources

    How Secure is HTTPS

    - The security of HTTPS depends on that of the underlying TLS protocol

    - A website that uses mixed protocols (e.g., images served via HTTP, login info via HTTPS) can still make the user vulnerable to attacks/surveillance
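
    The mixed-protocol problem can be checked mechanically. The sketch below flags resources that an HTTPS page would load over plain HTTP; the URLs and names are invented for the example, and relative URLs are treated as inheriting the page's protocol.

```java
import java.net.URI;
import java.util.List;
import java.util.stream.Collectors;

class MixedContentCheck {
    /** Returns the resources that an https:// page would fetch over unencrypted http://. */
    static List<String> insecureResources(String pageUrl, List<String> resourceUrls) {
        if (!"https".equalsIgnoreCase(URI.create(pageUrl).getScheme())) {
            return List.of(); // the page itself is not HTTPS, so nothing is "mixed"
        }
        return resourceUrls.stream()
                // A relative URL has no scheme of its own and is not flagged.
                .filter(u -> "http".equalsIgnoreCase(URI.create(u).getScheme()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> resources = List.of(
                "https://shop.example.com/login.js",
                "http://cdn.example.com/banner.jpg",
                "/css/site.css");
        System.out.println(insecureResources("https://shop.example.com/", resources));
    }
}
```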

    2. HTTP 2.0

    Introduction: HTTP 2.0 focuses on reducing page load times
    SPDY: HTTP 2.0 is based on Google's SPDY

    - Protocol developed by Google

    - Deployed on Google servers

    - Also supported by Twitter, Facebook, imgur, Blogspot

    - Taken up by the IETF, with its principles pushed into HTTP 2.0

    SPDY and HTTP 2.0

    Multiplexing

    - Multiple resources can be requested and fetched in parallel

    - Prevents "head of line" blocking

    Universal encryption

    - All traffic is encrypted by default

    - Equivalent of running everything over HTTPS

    Server push/hint

    - Server can push resources before being requested

    - Server can "hint" that clients fetch resources (e.g. if the server knows the client will need something in the future)

    Content prioritisation

    - Specify the preferred order and priority in which the server transfers resources to the client
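
    From the client side, asking for HTTP 2.0 can look like the sketch below, which uses the standard java.net.http.HttpClient available since Java 11. The helper names are hypothetical, and negotiatedVersion needs network access and a URL of your choosing. The client falls back to HTTP/1.1 if the server does not support HTTP 2.0, so the version actually used is read from the response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class Http2Probe {
    /** A client that prefers HTTP 2.0 for its requests. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
    }

    /** A plain GET request for the given URL. */
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    /** Sends the request and reports which protocol version was actually negotiated. */
    static HttpClient.Version negotiatedVersion(String url) {
        try {
            HttpResponse<Void> response =
                    newClient().send(get(url), HttpResponse.BodyHandlers.discarding());
            return response.version();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a URL on the command line, e.g. an https:// site you control.
        if (args.length > 0) {
            System.out.println(negotiatedVersion(args[0]));
        }
    }
}
```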


  • Original article: https://blog.csdn.net/weixin_62403234/article/details/133977009