Friday, May 26, 2017

301 Redirect all subdirectory URLS to a 404 and clean query strings

We are removing two sections from our site.

/warehouse/
/clothing/

I'd like to send all the URLs beneath these two to a single (404) landing page saying the item has been removed. I'd like to clean up the query strings too, if possible.

Where do I start?

2 Answers

Answers 1

If you're using nginx, you can just add a pair of location sections. They'll apply as long as no more specific location (a longer prefix or a matching regex location) takes the request first. Check out the documentation for more detail.

location /warehouse/ {
    return 410;
}
location /clothing/ {
    return 410;
}

If there are many locations, listing them separately gets cumbersome, so you can use a (case-insensitive) regex like this:

location ~* ^/(warehouse|clothing|something-else)/ {
    return 410;
}

If you want a customized 410 page, add configuration like this in your server block:

error_page 410 /410.html;
location = /410.html {
    root /var/www/error/;    # Put a file /var/www/error/410.html
    internal;
}

Replace 410 with 404 if you want to return that status code. I believe 410 "Gone" is the more appropriate answer, but YMMV.

I'd suggest doing this in whatever is closest to the client, so if nginx is in front of Apache, do it with nginx. This way you have fewer round trips.

If you want to do this in Apache, you can do it with RedirectMatch:

# The trailing `.*$` is not needed: the regex is not anchored at the end.
# With the `gone` status, RedirectMatch takes no target URL;
# ErrorDocument supplies the custom page instead.
RedirectMatch gone "^/(warehouse|clothing)/"
ErrorDocument 410 /410.html

Alternatively, I'd suggest using mod_rewrite, which is somewhat more flexible:

RewriteEngine on
RewriteRule ^/(warehouse|clothing)/ - [G,L]
ErrorDocument 410 /410.html

Here [G] means "gone" (410 status code). If you want a 404 response, do this instead:

RewriteEngine on
RewriteRule ^/(warehouse|clothing)/ - [R=404,L]

Note that you need ^/ in your regexes to require that the path starts with /warehouse/ or /clothing/ rather than merely containing them. Otherwise you'll get unwanted responses on addresses like /about/clothing/. The trailing .*$ is not needed, because these patterns are not anchored at the end and so match any path with the given prefix. One caveat: inside a .htaccess file, mod_rewrite strips the leading slash before matching, so the RewriteRule pattern there should be ^(warehouse|clothing)/ instead.
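To see the effect of the ^/ anchor without a server at hand, here is a small sketch in Python. It assumes Python's re module treats this simple pattern the same way the web servers' regex engines do, which holds for the constructs used here. The function name is_removed is made up for the illustration.

```python
import re

# The pattern from the rules above, with and without the leading anchor.
ANCHORED = re.compile(r"^/(warehouse|clothing)/")
UNANCHORED = re.compile(r"/(warehouse|clothing)/")
# The variant with the trailing `.*$`, to show it changes nothing.
WITH_TAIL = re.compile(r"^/(warehouse|clothing)/.*$")


def is_removed(path, pattern=ANCHORED):
    """Return True if the pattern is found in the path (search semantics)."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ["/warehouse/shelf-12", "/clothing/", "/about/clothing/", "/clothing"]:
        print(path, is_removed(path), is_removed(path, UNANCHORED), is_removed(path, WITH_TAIL))
```

The anchored and trailing-.*$ columns agree for every path, while the unanchored pattern also catches /about/clothing/.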

Or you can handle the logic in your application, which may be the only way if your base layout contains something user-dependent and you want consistency. A specific answer can't be written without knowing what language/framework/stack you use.
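Purely as an illustration of the application-level approach, here is a minimal sketch assuming a Python WSGI stack, which the question does not state. The names removed_sections, site and GONE_BODY are hypothetical, and the page body is a placeholder.

```python
# Sections named in the question.
REMOVED_PREFIXES = ("/warehouse/", "/clothing/")

# Placeholder content for the "item has been removed" page.
GONE_BODY = b"<h1>This item has been removed.</h1>"


def removed_sections(app):
    """Wrap a WSGI app so that the removed sections answer 410 Gone."""
    def middleware(environ, start_response):
        # PATH_INFO excludes the query string, so any ?foo=bar is ignored here.
        path = environ.get("PATH_INFO", "")
        if path.startswith(REMOVED_PREFIXES):
            start_response("410 Gone", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(GONE_BODY))),
            ])
            return [GONE_BODY]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    body = b"ok"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = removed_sections(site)
```

Any WSGI server can serve application. In a real site the 410 body would be rendered through the same layout as every other page, which is the reason for handling this in the application at all.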

Answers 2

First, I'd recommend returning a 410 (Gone) rather than a 404, to acknowledge that the resource once existed.

In Apache, a permanent (301) redirect to a landing page would look something like the following. Refer to this page for more information.

RedirectMatch permanent "^/(warehouse|clothing)/?.*" "http://www.example.com/404" 

In IIS, your web.config would look something like the following. Note that IIS matches the pattern against the URL path only, without the leading slash and without the query string, so the pattern starts with ^ rather than ^/. Setting appendQueryString to "false" drops the query string from the redirect target. Refer to this page for more information.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="404 Redirect" stopProcessing="true">
          <!-- No leading slash: IIS matches against the path without it -->
          <match url="^(warehouse|clothing)/" />
          <conditions trackAllCaptures="true"></conditions>
          <action type="Redirect" url="404" appendQueryString="false" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
    <httpProtocol allowKeepAlive="false" />
    <caching enabled="false" />
    <urlCompression doDynamicCompression="true" />
  </system.webServer>
</configuration>

Updated to include ^/ at the beginning of the Apache regex based on drdaeman's comment.
