HTTP

The most popular protocol for transferring documents over the WWW is the Hypertext Transfer Protocol, HTTP. Typically the browser connects to the web server, sends a request for a URL, gets a response from the server, and then the connection is closed. This means that the browser has to connect to the server for everything it downloads; e.g. if an HTML page has 40 images, the browser needs to make 1+40 separate requests to the server. Since there is no persistent connection between the browser and the server, the server has no way to know whether the user is still looking at the web page just sent to her or has moved on to web pages from another server.
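The 1+40 arithmetic above can be illustrated with a minimal sketch that counts how many requests a page would trigger: one for the HTML itself plus one per distinct image it refers to. The names ImageCounter and requests_needed are made up for this illustration, and the sketch assumes that the browser fetches each distinct image URL once and that nothing else (style sheets, scripts) is downloaded.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def requests_needed(html_text):
    """Return 1 (the page itself) plus one request per distinct image URL."""
    counter = ImageCounter()
    counter.feed(html_text)
    return 1 + len(set(counter.sources))


if __name__ == "__main__":
    # A made-up page with 40 different images.
    images = "".join('<img src="/img/%d.gif">' % i for i in range(40))
    page = "<html><body>" + images + "</body></html>"
    print(requests_needed(page))
```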

HTTP was originally intended to be a "stateless" protocol, i.e. all requests to the server are independent from the server's point of view. In practice that means that each server response should rely only on the information given in that very request. The benefit of this approach is that it becomes easier to build an efficient server implementation that can serve web pages to any number of users, since no information about the users or the requests needs to be stored once a response has been transmitted.
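As a rough illustration of what "stateless" means for a server, the sketch below computes a response from the contents of the request alone and stores nothing between calls. The function name handle_request, the dictionary layout of the request, and the two example pages are all assumptions made for this example; they do not describe how any particular server is implemented.

```python
def handle_request(request):
    """Return (status, headers, body) using only what the request contains.

    Nothing is remembered between calls, so the same request always
    produces the same response, regardless of what was requested before.
    """
    # Hypothetical site content, fixed for the sake of the example.
    pages = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}

    body = pages.get(request["path"])
    if body is None:
        return 404, {"Content-Type": "text/plain"}, "Not found"
    return 200, {"Content-Type": "text/html"}, body


if __name__ == "__main__":
    print(handle_request({"method": "GET", "path": "/about"}))
    print(handle_request({"method": "GET", "path": "/missing"}))
```

Because the function keeps no record of earlier requests, any number of copies of it could answer requests in any order, which is the efficiency benefit described above.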

Authentication

A stateless protocol has some interesting implications for authentication. A web server can respond with an "authentication required" response (status code 401), telling the browser that it has to provide a user name and a password in order to get the contents at the given URL. This is often referred to as a login request. If the web server accepts the user name and the password, the browser remembers the pair as valid authentication for that server and sends it with all subsequent requests to the server. (This is an oversimplification; read about authentication realms in RFC 2617.) As a consequence there is, contrary to common belief, no logout mechanism. Logout is often emulated by temporarily rejecting the valid user name/password pair, which causes the browser to drop it as valid authentication. The drawback is that the user will be presented with a new login request that she has to cancel.
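The exchange described above can be sketched for the Basic scheme of RFC 2617, where the browser sends the user name and password, joined by a colon and base64-encoded, in an Authorization header on every request. The function names, the realm name "example" and the plain dictionary used for headers are assumptions made for this illustration; a real server would also treat header names case-insensitively and would not keep passwords in clear text.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header value a browser sends for Basic auth."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_request(headers, valid_credentials):
    """Return (status, response_headers) for a request.

    headers is a hypothetical dict of request headers and
    valid_credentials a (user, password) tuple. The check is repeated for
    every request, since the server remembers nothing between requests.
    """
    expected = basic_auth_header(*valid_credentials)
    if headers.get("Authorization") == expected:
        return 200, {}
    # Ask the browser to prompt for a user name and password.
    return 401, {"WWW-Authenticate": 'Basic realm="example"'}


if __name__ == "__main__":
    credentials = ("Aladdin", "open sesame")
    print(check_request({}, credentials))
    header = basic_auth_header(*credentials)
    print(check_request({"Authorization": header}, credentials))
```

Emulating a logout then amounts to answering 401 once more even though the credentials are valid, so that the browser forgets them.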

Cookies

Another way of overcoming the drawbacks of the stateless protocol model is cookies. A cookie is a browser variable that can be set and altered by the server. Once a cookie is set, the browser includes it in all requests to the server, so large cookies "waste" a lot of bandwidth. When the server sets a new value for a cookie it also gets to decide the scope of the cookie, i.e. to which URLs the browser should send it, and an expiration date, at which time the browser automatically removes the cookie. There is no dedicated operation for removing a cookie, so removal is often simulated by setting the cookie value to an empty string and setting the expiration time to a date that has already occurred. Some browsers remove the cookie at once while others wait until the next time they are restarted. Read more about cookies in RFC 2109.
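Setting and "removing" a cookie as described above can be sketched with the http.cookies module from the Python standard library, which produces the value of a Set-Cookie response header. The function names, the cookie name "session" and the choice of path are assumptions made for this example; the removal function simply applies the empty-value, past-expiry trick from the text.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, path="/", max_age=None):
    """Return a Set-Cookie header value that sets name to value."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path          # URLs the cookie is sent to
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds
    return cookie[name].OutputString()


def delete_cookie_header(name, path="/"):
    """Return a Set-Cookie header value that simulates removing a cookie."""
    cookie = SimpleCookie()
    cookie[name] = ""                    # empty value
    cookie[name]["path"] = path
    # An expiration date that has already occurred.
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    return cookie[name].OutputString()


if __name__ == "__main__":
    print("Set-Cookie: " + set_cookie_header("session", "abc123", max_age=3600))
    print("Set-Cookie: " + delete_cookie_header("session"))

    # The browser sends the cookie back in a Cookie request header,
    # which the server can parse again.
    received = SimpleCookie()
    received.load("session=abc123")
    print(received["session"].value)
```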

Content-Type

The URL of a resource doesn't necessarily reveal the type of its contents, and URLs were never intended to be used for that either. Hence every response from the web server contains a Content-Type header giving the MIME type of the contents, e.g. text/html for HTML files or image/gif for GIF files. In Roxen WebServer these MIME types are usually derived from the file extension by the content type module, but scripts and other modules may choose another MIME type. MIME types are registered with IANA, and a complete list of all official MIME types can be found at www.iana.org.
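Deriving a MIME type from a file extension can be sketched with the mimetypes module from the Python standard library. This only illustrates the general idea; it is not the Roxen content type module, and the function name content_type_for and the fallback type application/octet-stream are assumptions made for this example.

```python
import mimetypes


def content_type_for(path, default="application/octet-stream"):
    """Guess the Content-Type header value from the file extension of path."""
    mime_type, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or default


if __name__ == "__main__":
    for name in ("index.html", "logo.gif", "data.zzzunknown"):
        print(name, content_type_for(name))
```

A script or module that knows better than the file extension would skip the guess and set the header value itself.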