Saturday, June 17, 2017

How do I correct my htaccess for proxying search engine crawl requests?

I have built a website with React at the front end and WordPress as the backend. For search engine crawlers to see my site, I have set up prerendering at the server side, and am trying to set up htaccess to proxy requests coming from search engines so that they are served pre-rendered pages.

For testing, I am using the "Fetch as Google" tool in Google Webmasters.

Here is my attempt:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
        RewriteCond %{REQUEST_FILENAME} -f [OR]
        RewriteCond %{REQUEST_FILENAME} -d
        RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_
        # Proxy the request ... works for inner pages only
        RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L]
    </IfModule>
</IfModule>
# BEGIN WordPress
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</IfModule>
# END WordPress

My problem is that this directive doesn't work for my home page, and works only for inner pages (http://example.com/inner-page/):

RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L] 

When I change this line to the following, the home page request is proxied correctly, but the inner pages stop working:

RewriteRule ^(index\.php)?(.*) http://example.com:3000/https://example.com/$1 [P,L] 

Could you help me fix the rewrite rule so that my home page is also proxied correctly for the googlebot?

2 Answers

Answer 1

First, avoid the repetition by merging the two mod_rewrite blocks into one:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
        RewriteCond %{REQUEST_FILENAME} -f [OR]
        RewriteCond %{REQUEST_FILENAME} -d
        RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_
        # Proxy the request ... works for inner pages only
        RewriteRule ^(?!.*?)$ http://example.com:3000/https://example.com/$1 [P,L]
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
    </IfModule>
</IfModule>

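Then change ^(?!.*?)$ to ^(.*)$, or to a stricter pattern such as ^([a-zA-Z0-9/.-]*)$. The original pattern contains only a negative lookahead and no capture group, so $1 never receives the requested path. Keep the zero-or-more quantifier (*) so that the home page, whose path is empty in a .htaccess context, still matches, and keep the parentheses so that $1 is populated.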

The corrected code is:

<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
        RewriteCond %{REQUEST_FILENAME} -f [OR]
        RewriteCond %{REQUEST_FILENAME} -d
        RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_
        # Proxy the request ... works for inner pages only
        RewriteRule ^(.*)$ http://example.com:3000/https://example.com/$1 [P,L]
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
    </IfModule>
</IfModule>
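
The three patterns discussed so far can be compared outside Apache. The sketch below is only an approximation: it uses Python's re module rather than the PCRE engine Apache uses (the two should agree on patterns this simple), and it assumes the per-directory behaviour of mod_rewrite, where the leading slash is stripped so that the home page arrives as an empty string. The helper name first_backreference and the sample path inner-page/ are illustrative, not part of any Apache API.

```python
import re

# Request paths as mod_rewrite sees them in a per-directory (.htaccess)
# context, where the leading slash is stripped: "" is the home page.
PATHS = ["", "inner-page/"]

# The three RewriteRule patterns that appear in this thread.
PATTERNS = {
    "question, first attempt": r"^(?!.*?)$",
    "question, second attempt": r"^(index\.php)?(.*)",
    "answer 1": r"^(.*)$",
}


def first_backreference(pattern, path):
    """Return the text $1 would carry, or None if the pattern does not match."""
    match = re.search(pattern, path)
    if match is None:
        return None
    if match.re.groups == 0:
        return ""  # no capture group at all, so $1 is empty
    return match.group(1) or ""  # an unmatched optional group is also empty


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        for path in PATHS:
            result = first_backreference(pattern, path)
            print(f"{label:26} {path!r:14} -> {result!r}")
```

Under these assumptions, only the Answer 1 pattern hands inner-page/ to $1. The second attempt matches, but the path lands in the second group, so $1 stays empty and every request would be proxied to the home page URL. The first attempt does not match either path in Python's engine, because the lookahead (?!.*?) rejects every position.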

Answer 2

Change the RewriteRule to:

RewriteRule ^(.*)/?$ http://example.com:3000/https://example.com/$1 [P,L] 
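
As a rough check of what this rule would send to the prerender server, the pattern can be tried in Python. This is a sketch under assumptions: Python's re module stands in for Apache's regex engine, the paths are given as .htaccess rules see them (no leading slash, empty string for the home page), and the helper names captured_path and proxy_target are made up for illustration.

```python
import re

# The pattern from the RewriteRule above.
RULE_PATTERN = re.compile(r"^(.*)/?$")

# The substitution prefix used throughout the thread.
PROXY_PREFIX = "http://example.com:3000/https://example.com/"


def captured_path(path):
    """Return what $1 would hold for this request path, or None on no match."""
    match = RULE_PATTERN.match(path)
    return match.group(1) if match else None


def proxy_target(path):
    """Build the URL the rule would proxy to, mimicking the $1 substitution."""
    captured = captured_path(path)
    return None if captured is None else PROXY_PREFIX + captured


if __name__ == "__main__":
    for path in ["", "inner-page", "inner-page/"]:
        print(repr(path), "->", proxy_target(path))
```

Because .* is greedy, it consumes a trailing slash before the optional /? gets a chance to, so for these inputs the capture is the same as with ^(.*)$: the home page yields an empty $1 and inner-page/ keeps its slash.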