Wednesday, May 12, 2021

HTTP Headers for Dummies

Whether you're a programmer or not, you have seen HTTP everywhere on the web. Even your first Hello World PHP script sent HTTP headers without you realizing it. In this article we are going to learn the basics of HTTP headers and how we can use them in our web applications.

What are HTTP Headers?

HTTP stands for "Hypertext Transfer Protocol". The entire World Wide Web uses this protocol. It was established in the early 1990s. Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened this article page, your browser probably sent over 40 HTTP requests and received an HTTP response for each.

HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.

Example

When you type a URL in your address bar, your browser sends an HTTP request and it may look like this:

The first line is the "Request Line", which contains some basic info on the request. The rest are the HTTP headers.

After that request, your browser receives an HTTP response that may look like this:

The first line is the "Status Line", followed by "HTTP Headers", until the blank line. After that, the "content" starts (in this case, HTML output).

When you look at the source code of a web page in your browser, you will only see the HTML portion and not the HTTP headers, even though they actually have been transmitted together as you see above.

These HTTP requests are also sent and received for other things, such as images, CSS files, JavaScript files, etc. That is why I said earlier that your browser sent over 40 HTTP requests when you loaded just this article page.

Now, let's start reviewing the structure in more detail.

How to See HTTP Headers

I used Firebug for Firefox to analyze HTTP headers, but you can use the Developer Tools in Firefox, Chrome or any modern web browser to view HTTP headers.

In PHP:

Further in the article, we will see some code examples in PHP.

HTTP Request Structure

The first line of the HTTP request is called the request line and consists of 3 parts:

  • The "method" indicates what kind of request this is. The most common methods are GET, POST and HEAD.
  • The "path" is generally the part of the URL that comes after the host (domain). For example, when requesting "https://code.tutsplus.com/tutorials/other/top-20-mysql-best-practices/" , the path portion is "/tutorials/other/top-20-mysql-best-practices/".
  • The protocol part contains HTTP and the version, which is usually 1.1 in modern browsers.
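As a rough illustration of those three parts, the request line can be split on its two spaces. This is only a sketch: parseRequestLine() is a hypothetical helper name, not a PHP built-in, and it assumes a well-formed line.

```php
<?php
// Split an HTTP request line into method, path and protocol.
// parseRequestLine() is a hypothetical helper, not part of PHP.
function parseRequestLine(string $line): array
{
    // A well-formed request line has exactly three space-separated parts.
    $parts = explode(' ', trim($line), 3);
    if (count($parts) !== 3) {
        throw new InvalidArgumentException("Malformed request line: $line");
    }
    [$method, $path, $protocol] = $parts;
    return ['method' => $method, 'path' => $path, 'protocol' => $protocol];
}

$parsed = parseRequestLine('GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1');
echo $parsed['method'], "\n";
echo $parsed['path'], "\n";
echo $parsed['protocol'], "\n";
```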

The remainder of the request contains HTTP headers as Name: Value pairs on each line. These contain various information about the HTTP request and your browser. For example, the User-Agent line provides information on the browser version and the Operating System you are using. Accept-Encoding tells the server if your browser can accept compressed output like gzip.

You may have noticed that the cookie data is also transmitted inside an HTTP header. And if there was a referring URL, that would have been in the header too.

Most of these headers are optional. This HTTP request could have been as small as this:

And you would still get a valid response from the web server.
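To make the idea of a minimal request concrete, here is a sketch that builds one as a string. It assumes the smallest useful request is a request line plus a Host header (HTTP/1.1 requires Host), and buildMinimalRequest() is a hypothetical helper name.

```php
<?php
// Build the smallest HTTP/1.1 request: a request line, a Host header,
// and the blank line that ends the header section.
// buildMinimalRequest() is a hypothetical helper, not part of PHP.
function buildMinimalRequest(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

$request = buildMinimalRequest('code.tutsplus.com', '/tutorials/other/top-20-mysql-best-practices/');
echo $request;
```

Each line ends with a carriage return and line feed, and the empty line tells the server that the headers are finished.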

Request Methods

The three most commonly used request methods are GET, POST and HEAD. You're probably already familiar with the first two from writing HTML forms.

GET: Retrieve a Document

This is the main method used for retrieving HTML, images, JavaScript, CSS, etc. Most data that loads in your browser was requested using this method.

For example, when loading a Nettuts+ article, the very first line of the HTTP request looks like so:

Once the HTML loads, the browser will start sending GET requests for images, which may look like this:

Web forms can be set to use the method GET. Here is an example.

When that form is submitted, the HTTP request begins like this:

You can see that each form input was added into the query string.

POST: Send Data to the Server

Even though you can send data to the server using GET and the query string, in many cases POST will be preferable. Sending large amounts of data using GET is not practical and has limitations.

POST requests are most commonly sent by web forms. Let's change the previous form example to a POST method.

Submitting that form creates an HTTP request like this:

There are three important things to note here:

  • The path in the first line is simply /foo.php and there is no query string anymore.
  • Content-Type and Content-Length headers have been added, which provide information about the data being sent.
  • All the data is now sent after the headers, in the same format as the query string.
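Those three points can be seen by assembling such a request by hand. This is a sketch under assumptions: the field names first_name and last_name and the host localhost are made up for illustration, and only /foo.php comes from the text above.

```php
<?php
// Hypothetical form fields, used only for illustration.
$fields = ['first_name' => 'John', 'last_name' => 'Doe'];

// http_build_query() encodes the fields in query-string format.
$body = http_build_query($fields);

// Assemble the request: no query string in the path, and the
// Content-Type and Content-Length headers describe the body.
$request = "POST /foo.php HTTP/1.1\r\n"
         . "Host: localhost\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($body) . "\r\n"
         . "\r\n"
         . $body;

echo $request, "\n";
```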

POST method requests can also be made via AJAX, applications, cURL, etc. And all file upload forms are required to use the POST method.

HEAD: Retrieve Header Information

HEAD is identical to GET, except the server does not return the content in the HTTP response. When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself.

With this method the browser can check if a document has been modified, for caching purposes. It can also check if the document exists at all.

For example, if you have a lot of links on your website, you can periodically send HEAD requests to all of them to check for broken links. This will work much faster than using GET.

HTTP Response Structure

After the browser sends the HTTP request, the server responds with an HTTP response. Excluding the content, it looks like this:

The first piece of data is the protocol. This is again usually HTTP/1.x or HTTP/1.1 on modern servers.

The next part is the status code followed by a short message. Code 200 means that our GET request was successful and the server will return the contents of the requested document, right after the headers.

We have all seen 404 pages. This number actually comes from the status code part of the HTTP response. If a GET request were made for a path that the server cannot find, it would respond with 404 instead of 200.

The rest of the response contains headers, just like the HTTP request. These values can contain information about the server software, when the page or file was last modified, the MIME type, etc.

Again, most of those headers are actually optional.

HTTP Status Codes

  • 200's are used for successful requests.
  • 300's are for redirections.
  • 400's are used if there was a problem with the request.
  • 500's are used if there was a problem with the server.
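The grouping above depends only on the first digit of the code, which a small sketch can show. statusClass() is a hypothetical helper name, and the labels it returns are informal descriptions rather than official terms.

```php
<?php
// Map a status code to the broad group described above.
// statusClass() is a hypothetical helper, not part of PHP.
function statusClass(int $code): string
{
    switch (intdiv($code, 100)) {
        case 2:  return 'success';
        case 3:  return 'redirection';
        case 4:  return 'problem with the request';
        case 5:  return 'problem with the server';
        default: return 'other';
    }
}

foreach ([200, 301, 404, 500] as $code) {
    echo $code, ' => ', statusClass($code), "\n";
}
```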

200 OK

As mentioned before, this status code is sent in response to a successful request.

206 Partial Content

If an application requests only a range of the requested file, the 206 code is returned.

It's most commonly used with download managers that can stop and resume a download, or split the download into pieces.

404 Not Found

When the requested page or file was not found, a 404 response code is sent by the server.

401 Unauthorized

Password-protected web pages send this code. If you don't enter a login correctly, you may see the following in your browser.

Note that this only applies to HTTP password-protected pages that pop up login prompts like this:

403 Forbidden

If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a URL for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.

For example, on my local server I created an images folder. Inside this folder I put an .htaccess file with this line: "Options -Indexes". Now when I try to open http://localhost/images/ I see this:

There are other ways in which access can be blocked, and 403 can be sent. For example, you can block by IP address, with the help of some htaccess directives.

302 (or 307) Moved Temporarily & 301 Moved Permanently

These two codes are used for redirecting a browser. For example, when you use a URL shortening service, such as bit.ly, that's exactly how they forward the people who click on their links.

Browsers handle 302 and 301 very similarly, but the two codes can mean different things to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302, and the spider will keep checking your page later. But if you redirect using 301, it tells the spider that your website has moved to that location permanently. For example, https://net.tutsplus.com redirects to https://code.tutsplus.com, which is the new canonical URL.

500 Internal Server Error

This code is usually seen when a web script crashes. Unlike PHP, most CGI scripts do not output errors directly to the browser. If there are any fatal errors, they will just send a 500 status code, and the programmer then needs to search the server error logs to find the error messages.

Complete List

You can find the complete list of HTTP status codes with their explanations here.

HTTP Headers in HTTP Requests

Now, we'll review some of the most common HTTP headers found in HTTP requests.

Almost all of these headers can be found in the $_SERVER array in PHP. You can also use the getallheaders() function to retrieve all headers at once.
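The naming rule behind those $_SERVER keys can be sketched as follows. serverKeyForHeader() and requestHeader() are hypothetical helper names, and the sample array stands in for a real $_SERVER so the snippet also runs from the command line.

```php
<?php
// Convert a header name such as "User-Agent" to its $_SERVER key.
// Most request headers follow this pattern; Content-Type and
// Content-Length are exceptions (CONTENT_TYPE, CONTENT_LENGTH).
function serverKeyForHeader(string $headerName): string
{
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}

// Look a header up in a $_SERVER-style array; null if it was not sent.
function requestHeader(array $server, string $headerName): ?string
{
    $key = serverKeyForHeader($headerName);
    return isset($server[$key]) ? $server[$key] : null;
}

// Made-up values standing in for $_SERVER in a real request.
$sampleServer = [
    'HTTP_USER_AGENT'      => 'ExampleBrowser/1.0',
    'HTTP_ACCEPT_ENCODING' => 'gzip',
];

echo serverKeyForHeader('Accept-Language'), "\n";
echo requestHeader($sampleServer, 'User-Agent'), "\n";
```

In a web request you would pass $_SERVER itself as the first argument.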

Host

An HTTP request is sent to a specific IP address. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for.

This is basically the host name, including the domain and the subdomain.

In PHP, it can be found as $_SERVER['HTTP_HOST'] or $_SERVER['SERVER_NAME'].

User-Agent

This header can carry several pieces of information such as:

  • Browser name and version.
  • Operating System name and version.
  • Default language.

This is how websites can collect certain general information about their surfers' systems. For example, they can detect if the surfer is using a cell phone browser and redirect them to a mobile version of their website which works better with low resolutions.

In PHP, it can be found with: $_SERVER['HTTP_USER_AGENT'].

Accept-Language

This header indicates the default language setting of the user. If a website has different language versions, it can redirect a new surfer based on this data.

It can carry multiple languages, separated by commas. The first one is the preferred language, and each other listed language can carry a "q" value, which is an estimate of the user's preference for the language (min. 0 max. 1).
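As an illustration of how the "q" values might be read, here is a simplified sketch. parseAcceptLanguage() is a hypothetical helper, the sample header value is made up, and the parser ignores edge cases such as wildcards.

```php
<?php
// Parse an Accept-Language value into [language => q], best match first.
// parseAcceptLanguage() is a hypothetical helper, not part of PHP.
function parseAcceptLanguage(string $header): array
{
    $languages = [];
    foreach (explode(',', $header) as $item) {
        $pieces = explode(';', trim($item));
        $tag = trim($pieces[0]);
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // a language listed without q counts as fully preferred
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $languages[$tag] = $q;
    }
    arsort($languages); // sort by q, highest first, keeping the keys
    return $languages;
}

print_r(parseAcceptLanguage('en-US,en;q=0.8,tr;q=0.5'));
```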

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_LANGUAGE"].

Accept-Encoding

Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time.

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_ENCODING"]. However, when you use the ob_gzhandler() callback function, it will check this value automatically, so you don't need to.

If-Modified-Since

If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated by sending this:

If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content—and the browser will load the content from the cache.
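The server-side decision can be sketched roughly like this. httpDate() and isNotModified() are hypothetical helper names, and the timestamp is an arbitrary example; a real script would take the header from $_SERVER['HTTP_IF_MODIFIED_SINCE'] and the timestamp from something like filemtime().

```php
<?php
// Format a Unix timestamp the way HTTP date headers expect (GMT).
function httpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// True when the cached copy is still current, i.e. a 304 can be sent.
function isNotModified(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null) {
        return false; // the browser sent no conditional header
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return false; // unparseable date: fall back to a full response
    }
    return $lastModified <= $since;
}

// Arbitrary example timestamp for the document's last change.
$lastModified = gmmktime(6, 38, 19, 11, 28, 2009);
$header = httpDate($lastModified);

var_dump(isNotModified($header, $lastModified));      // unchanged: bool(true)
var_dump(isNotModified($header, $lastModified + 60)); // changed later: bool(false)
```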

In PHP, it can be found as: $_SERVER['HTTP_IF_MODIFIED_SINCE'].

There is also an HTTP header named ETag, which can be used to make sure the cache is current. We'll talk about this shortly.

Cookie

As the name suggests, this sends the cookies stored in your browser for that domain.

These are name=value pairs separated by semicolons. Cookies can also contain the session id.
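To show the format, here is a sketch that splits a Cookie header value into its pairs, roughly what PHP does for you when it fills $_COOKIE. parseCookieHeader() is a hypothetical helper and the cookie names and values are made up.

```php
<?php
// Split a Cookie header value into a [name => value] array.
// parseCookieHeader() is a hypothetical helper, not part of PHP.
function parseCookieHeader(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // skip empty or malformed pieces
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = urldecode(trim($value));
    }
    return $cookies;
}

// Made-up header value, including a session id cookie.
$cookies = parseCookieHeader('PHPSESSID=abc123; theme=dark; greeting=hello%20world');
print_r($cookies);
```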

In PHP, individual cookies can be accessed with the $_COOKIE array. You can directly access the session variables using the $_SESSION array, and if you need the session id, you can use the session_id() function instead of the cookie.

Referer

As the name suggests, this HTTP header contains the referring URL.

For example, if I visit the Envato Tuts+ Code homepage and click on an article link, my browser sends this header to the server:

In PHP, it can be found as $_SERVER['HTTP_REFERER'].

You may have noticed the word "referrer" is misspelled as "referer". Unfortunately, it made it into the official HTTP specifications like that and stuck.

Authorization

When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.

The data inside the header is base64 encoded. For example, base64_decode('bXl1c2VyOm15cGFzcw==') would return 'myuser:mypass'.
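A short sketch of both directions, assuming the Basic scheme and the same example credentials as above. basicAuthHeader() and decodeBasicAuth() are hypothetical helper names.

```php
<?php
// Build the header a browser sends for HTTP Basic authentication.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode("$user:$password");
}

// Decode the header value back into user and password; null if invalid.
function decodeBasicAuth(string $headerValue): ?array
{
    if (stripos($headerValue, 'Basic ') !== 0) {
        return null; // some other authentication scheme
    }
    $decoded = base64_decode(substr($headerValue, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null;
    }
    [$user, $password] = explode(':', $decoded, 2);
    return ['user' => $user, 'password' => $password];
}

echo basicAuthHeader('myuser', 'mypass'), "\n";
print_r(decodeBasicAuth('Basic bXl1c2VyOm15cGFzcw=='));
```

Base64 is an encoding, not encryption, so these credentials are only as private as the connection that carries them.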

In PHP, these values can be found as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'].

More on this when we talk about the WWW-Authenticate header.

HTTP Headers in HTTP Responses

Now we are going to look at some of the most common HTTP headers found in HTTP responses.

In PHP, you can set response headers using the header() function. PHP already sends certain headers automatically, for loading the content and setting cookies etc. You can see the headers that are sent, or will be sent, with the headers_list() function. You can check if the headers have been sent already, with the headers_sent() function.

Cache-Control

Definition from w3.org:

The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain.

These "caching mechanisms" include gateways and proxies that your ISP may be using.

Example:

public means that the response may be cached by anyone. max-age indicates how many seconds the cache is valid for. Allowing your website to be cached can reduce server load and bandwidth, and also improve load times at the browser.

Caching can also be prevented by using the no-cache directive.

For more detailed info, see w3.org.

Content-Type

This header indicates the "mime type" of the document. The browser then decides how to interpret the contents based on this. For example, an HTML page (or a PHP script with HTML output) may return this:

text is the type and html is the subtype of the document. The header can also contain more info such as charset.

For a GIF image, this may be sent:

The browser can decide to use an external application or browser extension based on the mime type. For example this will cause the Adobe Reader or browser built-in PDF reader to be loaded:

When a file is loaded directly, Apache can usually detect the MIME type of the document and send the appropriate header. Most browsers also have some amount of fault tolerance and auto-detection of MIME types, in case the headers are wrong or not present.

You can find a list of common mime types here.

In PHP, you can use the finfo_file() function to detect the mime type of a file.

Content-Disposition

This header instructs the browser to open a file download box, instead of trying to parse the content. Example:

That will cause the browser to do this:

Note that the appropriate Content-Type header should also be sent along with this:
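A sketch of how the two headers might be prepared together. downloadHeaders() is a hypothetical helper, and the file name and MIME type are made-up examples.

```php
<?php
// Return the header lines for offering a file as a download.
// downloadHeaders() is a hypothetical helper, not part of PHP.
function downloadHeaders(string $filename, string $mimeType): array
{
    // Remove characters that would break the quoted file name
    // or smuggle an extra header line into the response.
    $safeName = str_replace(['"', "\r", "\n"], '', basename($filename));
    return [
        "Content-Type: $mimeType",
        "Content-Disposition: attachment; filename=\"$safeName\"",
    ];
}

foreach (downloadHeaders('report.pdf', 'application/pdf') as $line) {
    echo $line, "\n";
}
```

In a web request, each of these lines would be passed to header() before any output is sent.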

Content-Length

When content is going to be transmitted to the browser, the server can indicate the size of it (in bytes) using this header.

This is especially useful for file downloads. That's how the browser can determine the progress of the download.

For example, here is a dummy script I wrote, which simulates a large download.

The result is:

Now I am going to comment out the Content-Length header.

Now the result is:

The browser can only tell you how many bytes have been downloaded, but it does not know the total amount. And the progress bar is not showing the progress.

ETag

This is another header that is used for caching purposes. It looks like this:

The web server may send this header with every document it serves. The value can be based on the last modified date, the file size or even the checksum of a file. The browser then saves this value as it caches the document. The next time the browser requests the same file, it sends this in the HTTP request:

If the ETag value of the document matches that, the server will send a 304 code instead of 200, and no content. The browser will load the contents from its cache.
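The comparison can be sketched as follows. This assumes a checksum-based value (MD5 of the content) and a browser that sends the saved value back in an If-None-Match request header; makeEtag() and statusForRequest() are hypothetical helper names.

```php
<?php
// Derive an ETag from the content; MD5 is just one possible choice.
function makeEtag(string $content): string
{
    return '"' . md5($content) . '"';
}

// Decide between 304 and 200 by comparing the value the browser sent
// with the document's current ETag.
function statusForRequest(?string $ifNoneMatch, string $currentEtag): int
{
    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $currentEtag) {
        return 304; // cached copy is current, send no content
    }
    return 200;     // send the full document
}

$etag = makeEtag('<html>Hello</html>');
echo statusForRequest($etag, $etag), "\n";       // 304
echo statusForRequest('"stale"', $etag), "\n";   // 200
echo statusForRequest(null, $etag), "\n";        // 200
```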

Last-Modified

As the name suggests, this header indicates the last modified date of the document, in GMT format:

It offers another way for the browser to cache a document. The browser may send this in the HTTP request:

We already talked about this earlier in the If-Modified-Since section.

Location

This header is used for redirections. If the response code is 301 or 302, the server must also send this header. For example, when you go to http://net.tutsplus.com your browser will receive this:

In PHP, you can redirect a surfer like so:

By default, that will send a 302 response code. If you want to send 301 instead:
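Both cases can be sketched with one small helper. redirectStatus() and redirectTo() are hypothetical names; the sketch relies on header() accepting a response code as its third argument.

```php
<?php
// Pick the status code for a temporary or a permanent redirect.
function redirectStatus(bool $permanent): int
{
    return $permanent ? 301 : 302;
}

// Send a Location header with the matching status code.
// Returns false when output has already started and headers
// can no longer be changed.
function redirectTo(string $url, bool $permanent = false): bool
{
    if (headers_sent()) {
        return false;
    }
    header('Location: ' . $url, true, redirectStatus($permanent));
    return true;
}

echo redirectStatus(false), "\n"; // 302
echo redirectStatus(true), "\n";  // 301
```

A script would normally stop right after a successful redirectTo() call so that no page content follows the redirect.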

Set-Cookie

When a website wants to set or update a cookie in your browser, it will use this header.

Each cookie is sent as a separate header. Note that the cookies set via JavaScript do not go through HTTP headers.

In PHP, you can set cookies using the setcookie() function, and PHP sends the appropriate HTTP headers.

Which causes this header to be sent:

If the expiration date is not specified, the cookie is deleted when the browser window is closed.

WWW-Authenticate

A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.

Which looks like this:

There is a section in the PHP manual that has code samples on how to do this in PHP.

Content-Encoding

This header is usually set when the returned content is compressed.

In PHP, if you use the ob_gzhandler() callback function, it will be set automatically for you.

How to Send HTTP Headers

After reading the tutorial up to this point, you should have a good idea of what HTTP headers are and what their different values mean. Some headers are sent and received automatically when you make a request to a server and get a response back.

However, there will be situations where you would want to send your own custom headers besides the ones sent by the client or server.

One of the most common ways of sending your own headers in a request is by using the cURL library in PHP. The library comes with a bunch of functions to handle all your needs. There are four basic steps involved:

  1. You use curl_init() to start your cURL session. You can pass it the URL which you want to request.
  2. The curl_setopt() function is used to configure the request according to your needs. This is where you can set your own headers by using the CURLOPT_HTTPHEADER option.
  3. After you have set all the options, you can execute the request by calling curl_exec().
  4. Finally, you can close the session by calling the curl_close() function.

Here is a basic example that sends a request to https://code.tutsplus.com/tutorials.
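A minimal sketch of the four steps might look like this. fetchWithHeaders() and formatHeaders() are hypothetical helper names, and X-Example-Header is a made-up custom header; the request itself is left commented out because it needs the cURL extension and network access.

```php
<?php
// Turn ['Name' => 'value'] into the "Name: value" strings that
// CURLOPT_HTTPHEADER expects.
function formatHeaders(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Request a URL with custom headers; returns the body or false on failure.
function fetchWithHeaders(string $url, array $headers)
{
    $ch = curl_init($url);                                          // 1. start the session
    curl_setopt($ch, CURLOPT_HTTPHEADER, formatHeaders($headers));  // 2. set our headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);                 //    return body as a string
    $response = curl_exec($ch);                                     // 3. execute the request
    curl_close($ch);                                                // 4. close (no effect since PHP 8.0)
    return $response;
}

print_r(formatHeaders(['X-Example-Header' => 'demo', 'Accept' => 'text/html']));

// Usage, with the made-up header above:
// $html = fetchWithHeaders('https://code.tutsplus.com/tutorials', ['X-Example-Header' => 'demo']);
```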

You can learn more about cURL by reading these two tutorials. They cover all the basics of the library to help you get started.

If you want to send response headers in PHP, then you should use the header() function. Among other things, one of its common uses is redirecting visitors to other pages. This can be done by using the Location header. Here is an example:

You have to remember to call the header() function before any kind of output, whether it comes from HTML or from PHP. Even blank output is not permitted. Otherwise, you will get the "Headers already sent" error.

Conclusion

Thanks for reading. I hope this article was a good starting point to learn about HTTP Headers. Please leave your comments and questions below, and I will try to respond as much as I can.

If you want to take your web development further, check out some of the popular files on CodeCanyon. These scripts, apps, templates and plugins can save you precious development time and help you add new features quickly and easily. 

This post has been updated with contributions from Monty Shokeen. Monty is a full-stack developer who also loves to write tutorials, and to learn about new JavaScript libraries.
