Friday, August 20, 2021

Using PHP urlencode and urldecode

Every now and then, you will have to pass URLs between different web pages and services. It sounds like an easy task since URLs are just text strings, but things can get complicated. For example, sometimes you need to embed one URL inside another, such as in the query string of a GET request.

In this tutorial, you'll learn why you need to encode or decode your URLs and how to use the built-in PHP functions to do so.

The Need for Encoding and Decoding URLs

Perhaps you want to pass a URL as a query parameter to a web service or another web page. Say, for example, that you want to send the following data to a website as a query string:

Key       Data
redirect  https://code.tutsplus.com
author    monty shokeen
page      2

That information would be encoded in the following query string:
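As a minimal sketch, such a query string can be built by encoding each key and value with rawurlencode(). The endpoint https://www.example.com/ and the variable names are assumptions made for illustration; only the host example.com and the three key/value pairs come from the text.

```php
<?php
// Hypothetical endpoint: the text only names the host example.com.
$base = 'https://www.example.com/';

// The data from the table above.
$params = [
    'redirect' => 'https://code.tutsplus.com',
    'author'   => 'monty shokeen',
    'page'     => 2,
];

// Encode every key and value separately, then join the pairs with &.
$pairs = [];
foreach ($params as $key => $value) {
    $pairs[] = rawurlencode($key) . '=' . rawurlencode((string) $value);
}

$url = $base . '?' . implode('&', $pairs);

echo $url, PHP_EOL;
// https://www.example.com/?redirect=https%3A%2F%2Fcode.tutsplus.com&author=monty%20shokeen&page=2
```

Had urlencode() been used instead, the only difference in this case would be author=monty+shokeen.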

Notice that the special characters like : and / in the "redirect" URL have been encoded as %3A and %2F to avoid interfering with the structure of the overall URL. This is called escaping and that's where the encoding functions come in.

The server at example.com will receive that encoded URL in the query string and will probably need to decode it later. That's where URL decoding is important.

Encoding URLs with urlencode() and rawurlencode()

There are two different functions in PHP for encoding URLs. These are called urlencode() and rawurlencode(). The major difference between these two is the set of characters they encode and how they handle spaces.

With urlencode(), every non-alphanumeric character except -, _ and . is replaced with a percent sign followed by two hex digits, and spaces are replaced with the + character. The rawurlencode() function, on the other hand, replaces every non-alphanumeric character except -, _, . and ~ with a percent sign followed by two hex digits. That includes spaces, which become %20.
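The two character sets can be checked directly. This is a small sketch using a made-up sample string ($sample is a hypothetical name) that contains each of the unreserved punctuation characters plus a space:

```php
<?php
// Made-up sample: contains -, _, ., ~ and one space.
$sample = 'a-b_c.d~e f';

echo urlencode($sample), PHP_EOL;     // a-b_c.d%7Ee+f
echo rawurlencode($sample), PHP_EOL;  // a-b_c.d~e%20f
```

Only the tilde and the space are treated differently by the two functions.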

The following example should clear things up a bit for you.
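A minimal sketch of that comparison is shown here. The input string is an assumption chosen for illustration; the variable names $urlencoded_string and $rawurlencoded_string follow the ones used in the text.

```php
<?php
// Hypothetical input: a URL with a query string that contains a space.
$string = 'https://code.tutsplus.com/?q=php basics';

$urlencoded_string    = urlencode($string);
$rawurlencoded_string = rawurlencode($string);

echo $urlencoded_string, PHP_EOL;
// https%3A%2F%2Fcode.tutsplus.com%2F%3Fq%3Dphp+basics

echo $rawurlencoded_string, PHP_EOL;
// https%3A%2F%2Fcode.tutsplus.com%2F%3Fq%3Dphp%20basics
```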

As you can see, non-alphanumeric characters like ?, / and = were encoded in the same manner by both functions. However, the urlencode() function changed php basics to php+basics, while rawurlencode() changed it to php%20basics.

In general, it is a good idea to use rawurlencode() to encode all your URLs. There are a couple of reasons for that. First, the rawurlencode() function encodes URLs based on the more modern RFC 3986 scheme. Second, it provides better compatibility if the URLs you encode have to be decoded later in JavaScript, whose decodeURIComponent() function does not convert + back to a space.

Decoding URLs with urldecode() and rawurldecode()

The functions urldecode() and rawurldecode() are used to roll back the changes made by the corresponding urlencode() and rawurlencode() functions.

This means that every sequence consisting of a percent sign followed by two hex digits is replaced with the character it represents. The urldecode() function also changes the + symbol to a space character, while the rawurldecode() function leaves it unchanged.

Here is an example to illustrate the difference.
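A minimal sketch of such an example follows. The input string is an assumption chosen for illustration; the variable names $urlencoded_string and $rawurlencoded_string are the ones used in the text.

```php
<?php
// Hypothetical input: a URL with a query string that contains a space.
$string = 'https://code.tutsplus.com/?q=php basics';

$urlencoded_string    = urlencode($string);     // space becomes +
$rawurlencoded_string = rawurlencode($string);  // space becomes %20

echo urldecode($urlencoded_string), PHP_EOL;
// https://code.tutsplus.com/?q=php basics

echo rawurldecode($urlencoded_string), PHP_EOL;
// https://code.tutsplus.com/?q=php+basics

echo urldecode($rawurlencoded_string), PHP_EOL;
// https://code.tutsplus.com/?q=php basics

echo rawurldecode($rawurlencoded_string), PHP_EOL;
// https://code.tutsplus.com/?q=php basics
```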

In the above example, we have used the decoding functions to decode the URLs we encoded in the previous example. The $urlencoded_string variable has the space character changed to +. Using the urldecode() function on this string changes it back to a space character. However, the rawurldecode() function leaves the + untouched.

The $rawurlencoded_string variable has the space character converted to %20 which is handled the same way by both urldecode() and rawurldecode().

It is important to be careful about the function that you use for decoding an encoded URL because the final result may vary depending on what you used to originally encode it.
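One way to see the risk is a round-trip check on a value that contains both spaces and a literal plus sign. This is only a sketch, and the sample value and variable names are made up for illustration:

```php
<?php
// Made-up value containing spaces and a literal + sign.
$original = '1 + 1 = 2';

$encoded = urlencode($original);
echo $encoded, PHP_EOL;                // 1+%2B+1+%3D+2

// Matching decoder: the original value comes back.
$matched = urldecode($encoded);
echo $matched, PHP_EOL;                // 1 + 1 = 2

// Mismatched decoder: every + that stood for a space is left in place.
$mismatched = rawurldecode($encoded);
echo $mismatched, PHP_EOL;             // 1+++1+=+2

var_dump($matched === $original);      // bool(true)
var_dump($mismatched === $original);   // bool(false)
```

The opposite pairing, rawurlencode() followed by urldecode(), still round-trips here, because rawurlencode() writes a literal + as %2B and never emits a bare +.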

Final Thoughts

Our focus in this tutorial was to teach you how to encode and decode URLs in PHP. We started with the need to encode and decode URLs and then provided a brief overview of four different functions to do it.

As I mentioned earlier, the safest bet is to use rawurlencode() and rawurldecode() everywhere. This will ensure consistency in your own code as well as compatibility with other languages that use the new scheme of encoding and decoding URLs.
