A trailing slash is the very last slash (i.e. “/”) at the end of a URI.
Example: https://daxondata.com/about/
Key Tips for Redirecting with Regular Expressions (i.e. Regex):
Regular expressions are wonderfully powerful, flexible, and extremely dangerous. We use them when we need to match a pattern of characters, words, or numbers. I love them in Google Sheets and Google Data Studio for removing URL parameters or standardizing text dimensions so that things sum as intended.
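As a small illustration of the “remove URL parameters” use, here is a minimal Python sketch. The helper name `strip_params` is hypothetical. In Google Sheets the same idea would be something like `REGEXREPLACE(A2, "\?.*", "")`, with A2 standing in for whatever cell holds the URL.

```python
import re

def strip_params(url):
    """Drop everything from the first "?" onward so /page/ and /page/?utm_source=x sum as one page."""
    return re.sub(r"\?.*$", "", url)

print(strip_params("https://daxondata.com/about/?utm_source=newsletter"))  # https://daxondata.com/about/
```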
In redirects, for example, a regular expression can replace a word, redirect a set of pages, or move all the pages in a category to a new category. If you have 20 pages in that category, you don’t have to set up a redirect for each one. A single regex can properly 301 redirect them all to their respective new pages (as long as the rest of the URI stays the same, of course).
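Here is a minimal sketch of that category move, written with Python’s `re` module rather than the plugin itself. The slugs `old-category` and `new-category` and the helper `move_category` are hypothetical. Where Redirection’s target field would say `/new-category/$1`, Python spells the captured group as `\g<1>`.

```python
import re

# Hypothetical slugs: every post under /old-category/ moves to /new-category/.
SOURCE = r"^/old-category/(.*)$"
TARGET = r"/new-category/\g<1>"  # the Redirection target would be /new-category/$1

def move_category(uri):
    """Return the new address for a URI in the old category, or None if the rule does not apply."""
    if re.match(SOURCE, uri) is None:
        return None
    return re.sub(SOURCE, TARGET, uri)

for uri in ["/old-category/post-one/", "/old-category/post-two/", "/about/"]:
    print(uri, "->", move_category(uri))
```

One rule covers every post in the category, and anything outside it (like `/about/`) is left alone.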
However, keep in mind that a careless redirect can shut down your site, keep images from loading, or create a ton of 404 errors you would never know about until it’s too late. But when done correctly, regex redirects can make your world much easier.
Read John Godley’s post on using regex in his Redirection plugin. He has great tips on avoiding loops and other issues.
I’m writing this post because I could not find any articles on the web about how to use regex to add a trailing slash on WordPress sites. There are plenty of posts for other situations: working with data, JavaScript or other programming languages, .htaccess files, and many more. But finding one solution that worked for every kind of URI was nearly impossible. Here’s how I did it, after much trial and error.
I use the Redirection plugin for my WordPress sites (please donate, it’s an amazing plugin). I’ll use its terminology, but that should translate to your other situations. In that context, as in an .htaccess file, redirects are processed from top to bottom. When anyone visits a page, that web address is checked against the redirects one at a time. As soon as one of them matches, it is applied and the rest of the list is skipped. If the redirects aren’t ordered properly, a regex can apply where you don’t intend it to, redirects can loop (shutting down your site), or you can end up with a very long redirect chain (five or more redirects in sequence) that Google won’t follow.
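To make the top-to-bottom, first-match behavior concrete, here is a minimal Python sketch. It assumes, as described above, that only the first matching rule is applied per request; it is a simulation, not the plugin’s code. The rules `MOVE_CATEGORY` and `STRIP_SLASH`, the helpers `next_hop` and `follow_redirects`, and the `max_hops=5` cutoff are all hypothetical choices for illustration.

```python
import re

# (source regex, target) pairs, checked from top to bottom like the plugin's list.
ADD_SLASH = (r"^/([^?.]+)([^/])(\?.*)?$", r"/\g<1>\g<2>/\g<3>")
MOVE_CATEGORY = (r"^/old-category/(.*)$", r"/new-category/\g<1>")  # hypothetical rule
STRIP_SLASH = (r"^/(.+)/$", r"/\g<1>")  # hypothetical rule that conflicts with ADD_SLASH

def next_hop(uri, rules):
    """Apply the first rule whose source matches; return None if no rule matches."""
    for pattern, target in rules:
        if re.search(pattern, uri):
            return re.sub(pattern, target, uri, count=1)  # later rules are never checked
    return None

def follow_redirects(uri, rules, max_hops=5):
    """Follow redirects from uri and return every address visited, in order."""
    visited = [uri]
    while True:
        target = next_hop(visited[-1], rules)
        if target is None:
            return visited  # no rule matched: this is the final destination
        if target in visited:
            raise RuntimeError("redirect loop: " + " -> ".join(visited + [target]))
        visited.append(target)
        if len(visited) - 1 > max_hops:
            raise RuntimeError("redirect chain longer than %d hops" % max_hops)

print(follow_redirects("/old-category/my-post", [MOVE_CATEGORY, ADD_SLASH]))
```

With a sensible order, `/old-category/my-post` takes two hops and stops at `/new-category/my-post/`. Put `STRIP_SLASH` ahead of `ADD_SLASH` and `/about/` bounces to `/about` and back again, which the sketch reports as a loop.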
The Regex:
Source Regex: ^/([^?.]+)([^/])(\?.*)?$
Target: /$1$2/$3
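To try the pair outside WordPress, here is a minimal Python sketch. The plugin runs on PHP’s regex engine; I’m assuming Python’s `re` module treats a pattern this simple the same way. Python writes `$1` as `\g<1>`, and since Python 3.5 an unmatched group (such as a missing `$3`) is replaced with an empty string. The sample URIs are made up.

```python
import re

SOURCE = r"^/([^?.]+)([^/])(\?.*)?$"
TARGET = r"/\g<1>\g<2>/\g<3>"  # Python's spelling of /$1$2/$3

cases = [
    "/about",
    "/about/",
    "/blog/my-post?utm_source=newsletter",
    "/wp-content/uploads/logo.png",
    "/",
]
for uri in cases:
    # re.sub returns the URI unchanged when the source pattern does not match.
    print(f"{uri:40} -> {re.sub(SOURCE, TARGET, uri)}")
```

`/about` becomes `/about/`, the parameters on `/blog/my-post` are carried over after the new slash, and the already-slashed URI, the image, and the bare `/` are left alone. Tracing the pattern also turns up two edge cases worth testing on your own site: a one-character path such as `/a` is never redirected, because the pattern needs at least two characters after the leading slash, and a URI ending in a bare `?` (for example `/about?`) becomes `/about?/`, because `[^/]` takes the question mark before the parameter group gets a chance.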
Source Regex Explanation:
^/ | The hat (^) anchors the match to the very beginning of the URI, and the slash after it requires that first character to be a slash. Since slashes can be anywhere in a URI, this helps set the stage for whatever is next. |
() | Parentheses store (capture) the match so we can pass it along to the “target.” |
[^?.]+ | [^x]: The combination of brackets and a hat (^) means “any character EXCEPT the ones listed after the hat.” The whole bracket portion represents ONE character, not a word. In other words, it matches every possible character BUT the ones following the hat. You can list multiple characters, as I do here (i.e. “?” and “.”). Inside the brackets they are all taken literally (i.e. they don’t need to be escaped as they would elsewhere), and there is no limit. I use a question mark to keep this portion from capturing URL parameters. I also add a period to keep images from having a slash added to their web addresses. That simple issue creates 404 errors. I’m always amazed how such a tiny little character has the power to shut down a site, whether using regex or programming a site. IMPORTANT: No matter how many characters you add, the brackets still evaluate a single character in your pattern. It’s easy to get confused. The plus sign after the brackets means “one or more,” so the rule is applied to every character in that run, not just one. Because this run has to cover everything between the first slash and the last character before any parameters, a period anywhere in that part of the URI keeps the whole regex from matching. |
[^/] | Adding this portion AFTER the prior section means it must match the character that comes right after that run. I know, it’s logical. However, when it follows a plus sign (+) or asterisk (*), it can produce unexpected results. Those are greedy: [^?.]+ first grabs as much as it can (including any slashes in the path) and then gives characters back one at a time until [^/] can match. Without something after it to pin down the position, the match can stop short of the end of the URI (for /about/ it happily matches just /about), so it wouldn’t grab the whole URI like we need. So, it works best when followed by a dollar sign ($) or another clear marker that wouldn’t show up anywhere else in the path (e.g. a question mark). This combination is vital for our scenario: it makes sure that when a URI already ends with a slash, this redirect is not applied. Putting this portion in parentheses allows you to pass the matched character on to the target URI. |
(\?.*)? | This portion captures URL parameters (commonly used for campaign tracking or user click tracking in Google AdWords, display ad links, cross-domain tracking, and many others). We want those parameters passed to the final destination to keep the marketing source reported in Google Analytics. (Tangent: This may change, but when using regex in the Redirection plugin, the “ignore slash” setting doesn’t apply even if checked.) In regex, the question mark (“?”) usually means the character before it is optional. In our case, we need it to be literal, or “escaped,” and the preceding backslash does just that: it tells the regex to treat the question mark as an ordinary question mark character. But you’ll see that at the end of the parentheses, the question mark is used for pattern logic. It means a URI doesn’t have to contain the portion before it. A question mark usually applies only to the single character just before it. But when you put a group of characters in parentheses, whether you plan to pass the match on or not, they are grouped together, so a question mark after the group makes the whole set optional. In our situation, this combination effectively means the regex applies to regular URIs AND to those with parameters. Next, the period and asterisk (.*) mean “match any characters, or none.” Whatever is after the question mark, whether something is there or not, is stored and passed to the target. |
$ | Lastly, this means that whatever is before it must be at the end of the URI. Without it, the pattern could match just the front part of the URI and ignore the rest. In our case, we really need it after the “[^/]” so that the last character of the path is checked. But since some URIs have URL parameters, the optional parameter group sits between the two. |
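To see each piece of the source regex in action, here is a minimal sketch using Python’s `re` module instead of the plugin’s PHP engine (I’m assuming the two behave the same for a pattern this simple). The sample URIs are made up, and `UNANCHORED` is a deliberately broken, hypothetical variant with the dollar sign removed.

```python
import re

SOURCE = re.compile(r"^/([^?.]+)([^/])(\?.*)?$")

# [^?.] is ONE character position; "+" repeats it across a run.
print(bool(re.fullmatch(r"[^?.]", "ab")))          # False
print(bool(re.fullmatch(r"[^?.]+", "blog/post")))  # True (slashes are allowed)
print(bool(re.fullmatch(r"[^?.]+", "logo.png")))   # False (periods are not)

# The greedy [^?.]+ gives back one character so that [^/] gets the last one.
m = SOURCE.match("/blog/my-post")
print(m.group(1), m.group(2))                        # blog/my-pos t

# A trailing slash leaves nothing for [^/] to take; a period blocks the match.
print(SOURCE.match("/blog/my-post/"))                # None
print(SOURCE.match("/wp-content/uploads/logo.png"))  # None

# \? is a literal question mark; the ? after the group makes the group optional.
print(SOURCE.match("/about").group(3))                   # None
print(SOURCE.match("/about?utm_source=news").group(3))   # ?utm_source=news

# Drop the $ and the pattern matches just the front of a URI that already ends in "/".
UNANCHORED = re.compile(r"^/([^?.]+)([^/])(\?.*)?")
print(UNANCHORED.match("/about/").group(0))              # /about
print(UNANCHORED.sub(r"/\g<1>\g<2>/\g<3>", "/about/"))   # /about//
```

The last line shows why the anchor matters: without it, a URI that is already correct still gets rewritten, and every rewrite would match again.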
Target Regex Explanation:
/$1 | To keep things clear and make sure the right parts of a URI are passed on, we didn’t store the first slash of the URI, so we add it back here. $1 is whatever the first set of parentheses in the regex matched. Based on how we’ve built it, that is everything after the first slash up to, but not including, the last character before any URL parameters, as long as the URI doesn’t end with a slash. |
$2/ | This portion is the ending character match, which is never a slash. The first group doesn’t capture this last character because [^/], followed by the optional parameters and the dollar sign, claims it. We then add the ending slash we want the URIs to have. |
$3 | Lastly, we add back in the URL tracking parameters — if there is a question mark in the URI. There won’t be any error if parameters don’t exist. It’ll just be blank. Easy peasy. |
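Here is a minimal Python sketch that assembles the target by hand from the three captured groups, to show how `/$1`, `$2/`, and `$3` fit together. The helper name `add_trailing_slash` is hypothetical. It also checks the property that keeps this rule from looping: the new address should not match the source regex again.

```python
import re

SOURCE = re.compile(r"^/([^?.]+)([^/])(\?.*)?$")

def add_trailing_slash(uri):
    """Build "/" + $1 + $2 + "/" + $3 by hand; return None when the source regex does not apply."""
    m = SOURCE.match(uri)
    if m is None:
        return None  # no redirect: already ends in a slash, has a period, etc.
    query = m.group(3) or ""  # an unmatched $3 contributes nothing
    return "/" + m.group(1) + m.group(2) + "/" + query

for uri in ["/about", "/blog/my-post?utm_source=news"]:
    target = add_trailing_slash(uri)
    # A safe target must not match the source again, or it would redirect forever.
    print(uri, "->", target, "| redirects again:", SOURCE.match(target) is not None)
```

Both targets come back with “redirects again: False”, which is what you want before saving a rule like this.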
Please leave a comment if you think I got something wrong or know of a URI that would break when this is applied. This is all very technical, and it’s easy to miss use cases.
I hope this helps. Always glad to share and keep others from spending hours reinventing the wheel!