The most popular protocol for transferring documents over the WWW is the Hypertext
Transfer Protocol, HTTP. Typically the browser connects to the web server,
sends a request for a URL, gets a response from the server, and then the
connection is closed. This means that the browser has to connect to the server for
everything it downloads, e.g. if an HTML page has 40 images the browser needs
to make 1+40 separate requests to the server. Since there is no persistent
connection between the browser and the server there is no way to know if the user
is still looking at the web page just sent to her, or if she has moved on to
web pages from another server.
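
The request count above can be sketched in a few lines of Python. This is only an
illustration: the host name, the paths and the helper names build_request and
requests_for_page are assumptions made for the example, and no network connection
is opened.

```python
def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request for a single resource."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # optional in HTTP/1.0, but commonly sent
        "",               # empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def requests_for_page(host: str, page: str, images: list[str]) -> list[bytes]:
    """One request for the page itself plus one for every embedded image."""
    return [build_request(host, path) for path in [page, *images]]


# Hypothetical page with 40 images, each fetched over its own connection.
images = [f"/img/{n}.gif" for n in range(40)]
all_requests = requests_for_page("www.example.com", "/index.html", images)
print(len(all_requests))  # 41
```

Each element of all_requests would be sent on a separate connection that the
server closes after responding.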
The original intent of HTTP was to make it a "stateless" protocol, i.e. that all
requests to the server are independent, from the server's point of view. In practice
that means that each server response should only rely on the information given in
that very request. The benefit of this approach is that it becomes easier to
make an efficient server implementation that can serve web pages to any number of
users, since no information about the users or the requests needs to be stored once
a response has been transmitted.
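
As a rough illustration of statelessness, the handler below computes its response
from the request alone and keeps nothing between calls. The DOCUMENTS table and the
handle function are hypothetical names invented for this sketch; they do not
describe how any particular server is implemented.

```python
# Hypothetical, read-only document store used only for this example.
DOCUMENTS = {"/index.html": "<h1>Hello</h1>"}


def handle(request_line: str) -> tuple[int, str]:
    """Return (status code, body) using only what the request itself contains."""
    method, path, _version = request_line.split()
    if method != "GET":
        return 405, "Method Not Allowed"
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, "Not Found"
    return 200, body


# The same request gives the same answer no matter what was requested before.
first = handle("GET /index.html HTTP/1.0")
handle("GET /missing.html HTTP/1.0")
second = handle("GET /index.html HTTP/1.0")
print(first == second)  # True
```

Because handle stores nothing about earlier requests, any number of copies of it
could answer requests from any number of users interchangeably.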
Authentication
A stateless protocol has some interesting implications for authentication. As you
perhaps know, it is possible for a web server to respond with an "authentication
required" response, telling the browser that it has to provide a user name and a
password in order to get the contents at the given URL. This is often referred to
as a login request. If the web server accepts the user name and the password,
the browser remembers the user name/password pair as a valid authentication for that
server, and will use them for all subsequent requests to the server. (This is an
oversimplification; read about authentication realms in RFC 2617.) As a consequence
there is, contrary to common belief, no logout mechanism. Logout is often emulated
by temporarily rejecting the valid user name/password, causing the browser to drop
the user name/password pair as valid authentication. The drawback is that the user
will be presented with a new login request that she has to cancel.
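
The exchange described above can be sketched with the Basic scheme from RFC 2617,
where the browser sends the user name and password base64-encoded in an
Authorization header with every request. The USERS table, the REALM value and the
function names are assumptions made for this example, and a real server would not
keep passwords in plain text.

```python
import base64

USERS = {"Aladdin": "open sesame"}  # hypothetical credential store
REALM = "example"                   # hypothetical realm name


def basic_credentials(user: str, password: str) -> str:
    """Value the browser puts in the Authorization header (Basic scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def authenticate(headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, extra response headers) for one independent request."""
    challenge = {"WWW-Authenticate": f'Basic realm="{REALM}"'}
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Basic":
        return 401, challenge  # "authentication required"
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:  # malformed base64 or undecodable bytes
        return 401, challenge
    user, _, password = decoded.partition(":")
    if USERS.get(user) == password:
        return 200, {}
    return 401, challenge


print(authenticate({}))  # first request carries no credentials
print(authenticate({"Authorization": basic_credentials("Aladdin", "open sesame")}))
```

The server checks the header on every request; it is the browser, not the server,
that remembers the accepted pair and keeps sending it.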
Cookies
Another way of overcoming the drawbacks of the stateless protocol model is cookies.
A cookie is a browser variable that can be set and altered by the server. Once a
cookie is set, the browser includes it in all requests to the server, so large
cookies "waste" a lot of bandwidth. When the server assigns a new value to a
cookie it also gets to decide the realm of the cookie, i.e. to which URLs the
browser should send the cookie, and an expiration date, at which time the browser
automatically removes the cookie. There is no defined method to remove a cookie,
so that operation is often simulated by setting the cookie value to an empty
string and setting the expiration time to a date that has already occurred. Some
browsers remove the cookie at once while others wait until the next time they are
restarted. Read more about cookies in RFC 2109.
Content-Type
The URL of a resource doesn't necessarily give away the type of its contents, and
URLs were never intended to be used for that either. Hence every response from
the web server contains a Content-Type header stating the content's "MIME type",
e.g. text/html for HTML files or image/gif for GIF files. In Roxen WebServer these
MIME types are usually derived from the file extension by the content type module,
but scripts and other modules may choose another MIME type. MIME types are registered
with IANA, and a complete list of all official MIME types can be found at www.iana.org.
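
Deriving a MIME type from a file extension can be sketched with Python's standard
mimetypes module. This only illustrates the idea; it is not the Roxen content type
module, and the function name and the fallback type are assumptions made for the
example.

```python
import mimetypes

# Assumed fallback for files whose extension is not recognized.
DEFAULT_TYPE = "application/octet-stream"


def content_type_header(filename: str) -> str:
    """Pick a MIME type from the file extension and format the header line."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return f"Content-Type: {mime_type or DEFAULT_TYPE}"


print(content_type_header("index.html"))  # Content-Type: text/html
print(content_type_header("logo.gif"))    # Content-Type: image/gif
```

A script that produces its output dynamically would set the header itself instead
of relying on the extension.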