HTACCESS - A mod_rewrite Tutorial For Beginners
Posted by The Curious Website Designer | Posted on Fri 2 Jul 2021
Every now and again, I find myself having to revisit the process of redirecting webpages for any number of reasons. This is usually done by editing (or creating) a file named .htaccess. The rewriting instructions in .htaccess are interpreted and executed by an Apache module called mod_rewrite.
Because I'm an infrequent visitor to this subject, I tend to forget what I learned the last time and inevitably end up copying, pasting and editing what I've done before. This rarely goes smoothly and I often end up spending several days trying to work out where I've gone wrong and sorting out the mess I've made!
This article is for those like me who have a little knowledge, but would like to refresh their memory or learn a little more.
What is mod_rewrite?
As I mentioned earlier, mod_rewrite is an Apache module for rewriting / manipulating URLs so the user gets to see the actual page you want them to. You get it to do this by creating a number of rules.
You would use these rules when you want to make sure:
- your visitors always reach the ssl version of your site (ie. https://domain.com). Or vice versa.
- your visitors always reach the non-www (or www) version of your site.
- a page (or folder or site) is permanently or temporarily redirected to a different location.
- others can't use your resources (like images or videos) by directly linking to them from their site.
- any URLs that don't point to an actual page are redirected to a custom designed page on your site.
Before we get into the nitty gritty of writing some of the most commonly used rules though, it's worth having a look at and understanding how mod_rewrite and the Rewrite Rules work.
Anatomy of a URL
A url is made up of a number of components. Take the following url as an example:
https://www.domain.com/folder/subfolder/index.php?item=deckchair&colour=blue
The four main components are:
Protocol: This indicates whether the protocol is ssl or not. ie http:// or https://
Domain: This is the full domain name. It may contain a subdomain element and it may or may not include the www. part. eg. www.domain.com or blog.domain.com etc.
Path: This holds the filename including the path from the domain root.
eg /folder/subfolder/index.php
Query string: As its name implies, this is the query string. It DOES NOT include the initial question mark (?).
eg item=deckchair&colour=blue
In order to assist with rewriting and redirecting URLs, mod_rewrite provides a number of variables which can be used in the Rewrite Rules to make life easier for the person creating them. These are the main ones we will be using in this article (there are more):
%{HTTPS} Relates to the Protocol above and has the value of either 'on' or 'off'.
%{HTTP_HOST} This is the Domain eg. blog.domain.com etc.
%{REQUEST_FILENAME} The full path on the server's file system of the file or script that the request maps to, eg /var/www/html/folder/subfolder/index.php (the /var/www/html part is only an example; it depends on where your host keeps your site). It is mostly used to test whether a file or folder actually exists.
%{REQUEST_URI} This is the Path of the requested URL, including the leading forward slash. It DOES NOT include the query string, which is held separately in %{QUERY_STRING}. In an .htaccess file, the Rewrite Rule pattern is compared against this path with the part belonging to the .htaccess file's own folder (including the leading forward slash) removed (more on this a little further on).
eg /folder/subfolder/index.php
$1, $2, %1, %2 These variables are generated by the rewrite rules and will be explained in the following examples.
Switching On mod_rewrite
The first thing you need to do is make sure there is a file named .htaccess in the root folder of your domain. If the file already exists that's fine, if not then you must create it.
To turn on mod_rewrite, you need to insert the following line (if it's not there already):
- RewriteEngine on
Anatomy Of A Rewrite Rule
There are often (but not always) two elements to a Rewrite Rule:
- Rewrite Condition (RewriteCond)
- Rewrite Rule (RewriteRule).
If a Rewrite Rule is preceded by an optional Rewrite Condition, then the Rule is only processed if the condition is true.
The Rewrite Rule may or may not change the URL.
There can be more than one condition for any particular rule; but let's not get ahead of ourselves just yet!
The format for the Rewrite Rule is:
- RewriteRule Pattern Substitution [Optional Flags]
- Pattern is a regular expression pattern. If the URL path matches this pattern, the rule is processed and the path is replaced by the Substitution string. Otherwise, the rule is skipped.
- Substitution can be made up from a string, variables or more commonly, a mixture of both.
- [Optional Flags] are one or more flags that let you alter the behaviour of the rule. More on flags later.
A Simple Example
In its simplest form a Rewrite Rule might look like this:
- RewriteRule motors.php cars.php
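By default, the string that the pattern is tested against is the Path only; the query string is not part of it. In an .htaccess file the path is also taken relative to the folder that holds the file, so there is no leading forward slash. For example, if the url entered in the browser address bar was
https://domain.com/folder/subfolder/motors.php?make=ford&model=escort,
the string being tested against (for an .htaccess file in the root folder) would be:
folder/subfolder/motors.php
The pattern 'motors.php' appears in the comparison string, so the rule matches. The Substitution then replaces the whole of the tested path, not just the part that matched, so the path becomes 'cars.php'. The query string is carried over automatically and the request is served as:
https://domain.com/cars.php?make=ford&model=escort
Note: Because the pattern is not anchored, it would match motors.php in any folder (and names such as oldmotors.php too). To keep the rest of the path you have to capture it in the pattern and put it back in the substitution, as the later examples show. Also, without the R flag (explained further on) this is an internal rewrite: the server delivers cars.php but the address bar still shows the original url.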
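To make the matching behaviour a little more concrete, here is a minimal sketch of two tighter versions of the same rule. It assumes the .htaccess file sits in the root folder of the site; the file names are simply the ones from the example above, and you would normally use one rule or the other, not both.

```apache
RewriteEngine on

# Option 1: anchored at both ends, so only motors.php in the root folder
# matches. The dot is escaped so that it means a literal dot.
RewriteRule ^motors\.php$ cars.php [L]

# Option 2: matches motors.php in any folder. The optional group captures
# the folder part (eg folder/subfolder/) into $1 so it can be put back.
RewriteRule ^(.*/)?motors\.php$ $1cars.php [L]
```

With option 2, a request for folder/subfolder/motors.php is served from folder/subfolder/cars.php, and any query string is carried over untouched.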
mod_rewrite can deal with more complex substitutions than this though by using the power of regular expressions. Although I touch on some of the more common regex rules in this article, it is not intended to provide a comprehensive overview. If you want to learn more, this is a good place to start: https://www.regular-expressions.info/tutorial.html. For testing your regular expressions, this is a good resource: https://regexr.com/.
So let's look at some more common (and more complex) examples of the Rewrite Rules in action.
Introducing The Rewrite Condition
We've seen that the Rewrite Rule compares a Pattern with the Path and if there's a match, it makes a Substitution.
A Rewrite Condition is an optional test that provides a true or false result. If the condition is true, then the Rule following the condition will be applied (which may or may not make a substitution depending on whether or not there is a pattern match).
The Rewrite Condition can test against much more than just the Path (the domain, the protocol, the query string, whether a file exists and so on) and this makes it extremely useful for targeting Rewrite Rules exactly where they are needed.
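As an illustration of a condition that tests something other than the url itself, the sketch below uses %{REQUEST_FILENAME} to send requests that don't point to an actual file or folder to a custom page. The page name not-found.php is hypothetical; substitute whatever page your own site uses, and make sure it really exists or the rule will keep matching its own result.

```apache
RewriteEngine on

# True when the request does not map to an existing file ...
RewriteCond %{REQUEST_FILENAME} !-f
# ... and does not map to an existing folder either
RewriteCond %{REQUEST_FILENAME} !-d
# Serve the custom page instead (not-found.php is a made-up name)
RewriteRule ^ /not-found.php [L]
```

This is an internal rewrite, so the custom page is returned with a normal status code unless the page itself sends a 404 header.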
It is possible to have more than one Rewrite Condition preceding a Rewrite Rule.
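A sketch of two conditions guarding a single rule is shown here, using the hotlinking case from the list at the start of the article. The domain name and the list of file extensions are assumptions for the sake of the example. Conditions are combined with AND by default, so the rule only fires when both are true.

```apache
RewriteEngine on

# Condition 1: a referrer was sent (typing the address directly sends none)
RewriteCond %{HTTP_REFERER} !^$
# Condition 2: the referrer is not a page on our own site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
# Both true: refuse image and video requests with 403 Forbidden.
# The dash means "leave the url alone"; the F flag does the refusing.
RewriteRule \.(jpe?g|png|gif|mp4)$ - [F,NC]
```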
OK, to the examples . . .
Forcing https://
Possibly one of the most common reasons for using mod_rewrite is to ensure that the visitor is always directed to the https:// version of the page. Here is the code to use:
- # FORCING HTTP PROTOCOL TO HTTPS://
- RewriteEngine on
- RewriteCond %{HTTPS} off
- RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=302,L]
Let's have a look at that in a little more detail. We'll consider the url http://domain.com/index.php.
RewriteEngine on. This simply switches mod_rewrite on. It should only appear once in the file and before any Rewrite Rules.
RewriteCond %{HTTPS} off. Checks whether HTTPS is 'switched off' for the request. It is in our case, so the condition is true.
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=302,L]. The above condition is true, so the rewrite rule is processed.
The first part (the pattern) ^(.*)$ matches the whole of the url path after the domain name, not including the leading forward slash (/) or the query string.
- The caret (^) anchors the match to the start of the string.
- The dollar sign ($) anchors the match to the end of the string.
- The fullstop (.) means any character.
- The asterisk (*) means zero or more of whatever comes before it. So (.*) means any string of any length. In our example, this will be index.php.
- The fact that the fullstop and asterisk have been enclosed in brackets means that this string will be added into a variable. As it's the first set of brackets in the pattern, it gets assigned to $1. We will come back to that in a moment.
As the pattern matches the string, index.php will be replaced with the substitution part as follows:
- https://. The substitution string will start with https://.
- %{HTTP_HOST}. The domain name - I tend to use this variable because it makes it easier to copy/paste to other websites without having to make any edits. It would have been equally valid to have used the actual domain name instead (domain.com).
- /$1. Because we aren't changing the url other than the protocol we can use this variable which was assigned as a result of the part of the pattern enclosed in brackets (.*) earlier. There is more information about these 'back-references' here: https://httpd.apache.org/docs/2.4/rewrite/intro.html#InternalBackRefs.
We must precede the variable with a forward slash (/) because that element is not included as part of the pattern.
- [R=302,L]. Optional flags:
- R=302. This forces the browser to redirect to the revised URL. Code 302 means the redirect is only temporary. Normally you would want to use Code 301 - Permanent, but it's good practice to start with R=302 to test the redirects fully before updating it to a permanent change.
If you get unexpected results with your redirects having flagged them as permanent, that will cause problems for those users who visited the site, as the redirect will be held in their browser's cache as a permanent change.
- L. Last rule. Instructs mod_rewrite to stop processing any more rules for this request.
IMPORTANT: Don't leave a space after the comma in the optional flags group!
Putting all this together: the substitution is a complete address, including the Protocol and Domain, rather than just a path. If it were simply fitted in after the domain like a path, we would end up with the nonsensical http://domain.com/https://domain.com/index.php.
Fortunately mod_rewrite recognises that our replacement string includes a Protocol and Domain, so it uses it as the whole destination and redirects the browser to https://domain.com/index.php.
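Once the temporary redirect has been tested, the permanent version might look like the sketch below. It is a commonly seen variant rather than a rule taken from the walkthrough above: instead of capturing the path in $1 it uses the %{REQUEST_URI} variable, which already begins with a forward slash, so none is added after the domain.

```apache
RewriteEngine on

# Only act on plain HTTP requests
RewriteCond %{HTTPS} off
# ^ matches every path; nothing needs to be captured
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Browsers cache a 301 for a long time, so it is worth switching to it only after the 302 version behaves exactly as expected.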
www to non-www.
Another popular rewrite.
Google views www.domain.com and domain.com as two separate domains. If your site allows either, this can affect your page rankings as Google will see these as duplicate pages. It seems most people prefer a non-www version of their site, but it doesn't really matter either way. I will show you both.
So to change www.domain.com to domain.com we would use:
- # USING NON-WWW VERSION OF A SITE
- RewriteCond %{HTTP_HOST} ^www\.(.*)
- RewriteRule ^(.*)$ https://%1/$1 [R=302,L]
RewriteCond %{HTTP_HOST} ^www\.(.*). This time we are checking if the domain name begins with www. Note that the check pattern starts with a caret (^) and that the dot after www is escaped with a backslash. THIS IS IMPORTANT; in a regular expression, a dot (.) means 'any character' which is why it must be escaped!
Also, the pattern concludes with (.*). This adds the remainder of the domain name string to a variable (we will need it shortly). Because it was captured in the Rewrite Condition rather than in the Rewrite Rule, the variable starts with a percent sign (%1).
There is no need to mark the end of the test string with the dollar sign ($), because (.*) already runs to the end of the domain name.
For www.domain.com the test is a match and therefore returns 'true'. If we didn't need to capture the rest of the domain name, the test pattern could simply have been ^www\.
RewriteRule ^(.*)$ https://%1/$1 [R=302,L]. The same pattern as before ^(.*)$.
Breakdown of the substitution part of the rule:
- https://. As before, we're changing the url anyway so let's make sure it's https://.
- %1. This is the remainder of the string from the Rewrite Condition test (domain.com).
- /$1. The string captured by the Rewrite Rule pattern (everything after the domain name except the first forward slash and the query string), which is why we include the forward slash before the $1.
- [R=302,L]. As before, temporary redirect, Last Rule.
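Domain names are not case sensitive, so a visitor could arrive at WWW.domain.com. A small variation on the rule, sketched here, adds the NC (no case) flag to the condition so those requests are caught as well. Using (.+) instead of (.*) is simply a way of insisting that something follows the www. part.

```apache
RewriteEngine on

# [NC] makes the domain test case-insensitive
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# %1 is the domain without www., $1 is the path without its leading slash
RewriteRule ^(.*)$ https://%1/$1 [R=302,L]
```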
non-www to www
- # USING WWW VERSION OF A SITE
- RewriteCond %{HTTP_HOST} !^www\.
- RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=302,L]
RewriteCond %{HTTP_HOST} !^www\. Note the exclamation mark at the beginning of the pattern here (!^www\.). This signifies NOT; the domain name does NOT begin with www. (remember, a dot that is meant literally must be escaped in a regular expression pattern).
Also, there is no need to capture the rest of the domain name because we will be using all of it in the rewrite (ie. no need for (.*) at the end of the pattern). In any case, a negated pattern cannot capture anything, because it is only true when nothing matched.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=302,L]. Again we are using the whole of the path, hence ^(.*)$.
- https://www. This time we're prefixing the url with https:// and the www. part of the domain name.
- %{HTTP_HOST}. The domain name as requested, which at this point has no www. part.
- /$1. The folder path and filename. The query string is not part of $1; mod_rewrite adds it back to the redirected url automatically.
- [R=302,L]. The redirect instruction, as before.
Redirect from one folder to another.
There are a number of possible scenarios here. Let's use an example URL of https://domain.com/folder/subfolder/index.html.
1. First Folder in Path. Imagine we have renamed 'folder' (the first folder in the path) to 'new_folder' but the rest of the directory tree remains unchanged.
- # REDIRECT FROM ONE FOLDER TO ANOTHER
- # First Folder in The Path
- RewriteRule ^folder\/(.*)$ new_folder/$1
There is no need for a Rewrite Condition as the folder pattern can be directly tested for in the Rewrite Rule.
Also, as we are only changing the path element of the URL, we don't need to include the HTTP protocol or the domain in the Substitution element; mod_rewrite keeps them as they are unless we add them to the substitution string.
The rule can be broken down as follows:
- ^folder\/(.*)$. This is the pattern. If the url string starts with 'folder/', then perform the substitution. The forward slash [/] is escaped with a backslash [\] here; that is harmless, but not actually required in a mod_rewrite pattern. The remainder of the url is added to the variable $1 because we included the (.*) at the end of the pattern. Because the pattern starts with a circumflex (^) and ends with a dollar sign ($), the pattern must match the whole url string.
- new_folder/$1. Replace the whole url string with 'new_folder/' plus the contents of (.*). Thus we get:
https://domain.com/new_folder/subfolder/index.html.
2. Remove First Folder From Path. Similar to above, but instead of renaming the folder, we have moved the site 'up' one level, so the url now needs to be changed to: https://domain.com/subfolder/index.html.
- # REDIRECT FROM ONE FOLDER TO ANOTHER
- # Remove First Folder from The Path
- RewriteRule ^folder\/(.*)$ /$1
Again, no Rewrite Condition required. The pattern captures everything after 'folder/' using (.*) and the substitution /$1 puts it back directly under the domain root.
3. Second Folder In Path. This time we will rename the second folder in the path (subfolder) to new_subfolder.
- # REDIRECT FROM SECOND FOLDER TO ANOTHER
- # Second Folder in The Path
- RewriteRule ^(.*\/)subfolder\/(.*)$ /$1new_subfolder/$2
The circumflex and dollar sign are there because we want to capture the complete string and put back the parts that don't change. The content of the first set of brackets (everything in front of 'subfolder/', which is 'folder/' in our example) gets allocated to $1, and the second (.*) gets allocated to $2.
Note the forward slash at the end of the first set of brackets. It makes sure the folder being matched is named exactly 'subfolder'. Without it, the pattern would also match the 'subfolder/' at the end of 'new_subfolder/'. Because the rules in an .htaccess file are run again after each rewrite, the rule would then keep rewriting its own output (new_new_subfolder/ and so on) until Apache gives up with a server error.
4. Remove a Middle Folder From The Path. In this final example, we are going to remove the subfolder from the path. This is similar to example 2 above, but when the pages are deeper in the folder structure, and you have moved them one level closer to the domain root.
- # REDIRECT FROM ONE FOLDER TO ANOTHER
- # Remove Second Folder from The Path
- RewriteRule ^(.*\/)subfolder\/(.*)$ /$1$2
So why capture the rest of the path when a shorter rule looks more straightforward?
That's a valid question. It is tempting to write something like RewriteRule ^folder\/ new_folder/ and expect only the matching part of the path to be swapped. It doesn't work that way: the Substitution always replaces the whole of the tested path, not just the part that matched the pattern, so the shorter rule would send every request under 'folder/' to 'new_folder/' and lose 'subfolder/index.html' along the way. Anything you want to keep must be captured with brackets and put back with $1, $2 and so on.
The rules above leave the Protocol and Domain alone, and as written they are internal rewrites: the server delivers the page from its new location while the address bar continues to show the old url. If you want the browser itself to be sent to the new address, or the substitution needs to change the Protocol or Domain (eg https:// or www / non-www), provide the full route in the substitution (eg https://domain.com/new_folder/$1) and add the R flag.
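For completeness, here is a sketch of the first folder example written as a redirect that the browser can see. The folder names are the ones used above. The substitution starts with a forward slash so that the new address is built from the domain root.

```apache
RewriteEngine on

# Temporary, visible redirect from /folder/... to /new_folder/...
# $1 carries the rest of the path across unchanged
RewriteRule ^folder/(.*)$ /new_folder/$1 [R=302,L]
```

After this rule fires, the address bar shows https://domain.com/new_folder/subfolder/index.html, and any query string on the original request is appended automatically.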
Organising Your .htaccess File.
So we've seen a number of examples of how we can redirect the URL:
- Forcing https://
- Forcing non-www (or the opposite)
- Redirecting the user to a different folder.
Now we could simply put those instructions into an .htaccess file and let it redirect away. For example, let's take the url: http://www.domain.com/folder/index.html and redirect it to https://domain.com/alternative/index.html. Our .htaccess file might look like this:
- RewriteEngine on
- # Force HTTPS
- RewriteCond %{HTTPS} off
- RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=302,L]
- # Force non-www
- RewriteCond %{HTTP_HOST} ^www\.(.*)
- RewriteRule ^(.*)$ https://%1/$1 [R=302,L]
- # Change folder where file is located
- RewriteRule ^folder\/(.*)$ /alternative/$1 [R=302,L]
This will work, but the browser will be redirected three times to get to the final destination: once to switch to https://, once to drop the www. and once to change the folder. This is not good practice and the .htaccess file should be rewritten so that every url combination is redirected on the first pass.
Here is a better, more efficient solution:
- RewriteEngine on
- # Redirects www to HTTPS non-www and changes route when the path starts with 'folder/'.
- RewriteCond %{HTTP_HOST} ^www\.(.*)
- RewriteRule ^folder\/(.*)$ https://%1/alternative/$1 [R=302,L]
- # Redirects www to HTTPS non-www for other urls when the path DOES NOT start with 'folder/'
- RewriteCond %{HTTP_HOST} ^www\.(.*)
- RewriteRule ^(.*)$ https://%1/$1 [R=302,L]
- # Changes route for non-www url's when the path starts with 'folder/'
- RewriteRule ^folder\/(.*)$ https://%{HTTP_HOST}/alternative/$1 [R=302,L]
- # Redirects HTTP to HTTPS for any urls not covered above
- RewriteCond %{HTTPS} off
- RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=302,L]
The first Rewrite Condition checks if the domain is of the 'www.' variety. If so, the Rewrite Rule that follows it is applied.
That first Rewrite Rule only matches if the path starts with 'folder/'. In this case, HTTPS is applied to the redirected URL, the www. part of the domain is removed, and the folder name is changed. ie the browser is redirected to https://domain.com/alternative/index.html.
Any www.domain.com urls that do not start with 'folder/' remain unchanged and will progress to the next rule.
The second Rewrite Condition also checks if the domain is of the 'www.' variety. If so, the second Rewrite Rule is applied. This will only capture those urls where the start of the path IS NOT 'folder/' as they will have already been dealt with by the first rule.
The match pattern of the second Rewrite Rule is the whole of the path, so there will always be a match and the substitution made. Once again HTTPS is applied to the URL whether it was initially presented as HTTP or HTTPS, the www. part of the domain is removed by using the variable (%1) captured by the condition just above it, and the path is appended unchanged (any query string is carried over automatically). The browser is then redirected to the appropriate url.
At this point all www. type urls have been captured and rewritten to the correct address, so from here on we only need to consider non-www domains in the next rules.
The third Rewrite Rule has no condition. It checks if the path in the url starts with 'folder/' and if so, applies HTTPS and substitutes 'folder/' with 'alternative/'. A url that started out as a www. type with a path starting 'folder/' never reaches this rule: the first rule has already redirected it, and the redirected request no longer starts with 'folder/'.
The final Rewrite Condition tests for any other urls that have not been changed by the earlier rules and are using the HTTP protocol. If so, the final Rewrite Rule is applied.
That final rule simply applies the HTTPS protocol, but doesn't otherwise change the url.
So in a single pass, every permutation of url is tested and redirected (if necessary) to the correct location without having to be redirected again.
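If the folder change is not needed, the https:// and non-www redirects on their own are often folded into a single rule. The sketch below is one widely used way of doing that, not the approach taken in the walkthrough above. It relies on the [OR] flag to join the first two conditions, and on a third condition that always matches purely to capture the domain without its www. part.

```apache
RewriteEngine on

# True if the request came in over plain HTTP ...
RewriteCond %{HTTPS} off [OR]
# ... or if the domain starts with www.
RewriteCond %{HTTP_HOST} ^www\. [NC]
# Always matches; stores the domain minus any leading www. in %1
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# One redirect to the https://, non-www address, path unchanged
RewriteRule ^ https://%1%{REQUEST_URI} [R=302,L]
```

A request that is already https:// and non-www fails both of the first two conditions, so it is left alone and no redirect loop can occur.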
Testing, Testing, Testing
Although Apache can be configured to provide a log of mod_rewrite events, it is not a straightforward task to decipher. Also many people don't have access to the logfile. So realistically, it is difficult to get any feedback about where to start if things don't go to plan and your redirects aren't working as they should.
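For those who do have access to the server configuration, the sketch below shows how rewrite logging is typically switched on. It assumes Apache 2.4 and belongs in the main server or virtual host configuration, not in .htaccess. The trace level shown is just a reasonable starting point; higher numbers produce more detail.

```apache
# Keep general logging at 'alert' but trace mod_rewrite's decisions.
# The output is written to the normal error log.
LogLevel alert rewrite:trace3
```

Remember to remove or lower the setting once the problem is found, as trace logging is verbose and slows the server down.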
Luckily, the madewithlove.be team have developed a great tool for testing your Rewrite Rules / .htaccess file. Even better they have provided an API, so I (and others) can harness the power of their tool while providing you with our interpretation of the best design / layout.
Tags: htaccess, mod_rewrite, rewrite rule, rewriterule, rewrite rules, rewritecond, rewrite condition