Remove special characters in a URL

I need help preventing special characters in the URL: if anyone deliberately enters such characters, the request should go to a 404 error page. I tried a couple of ways through .htaccess but none of them worked.

Expected URL:

http://www.example.com/ar/shop-listing/health-beauty/

but I can still see the page if I enter something like:

http://www.example.com/ar/shop-listing/health-beauty~@/#$&&&%%$%3Cscript%3E/

Can anyone suggest how to approach this issue?

1 Answer

http://www.example.com/ar/shop-listing/health-beauty~@/#$&&&%%$%3Cscript%3E/

In this particular URL, everything after the # (hash) is the fragment identifier, which the browser does not send to the server, so it cannot be blocked server-side in .htaccess.

The server only sees:

/ar/shop-listing/health-beauty~@/
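A quick way to see this split is Python's standard `urllib.parse.urlsplit`. The sketch below assumes the browser sends only the path plus any query string in the request line; `request_target` is a hypothetical helper for illustration, not part of WordPress or Apache.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Hypothetical helper: the part of a URL that goes into the HTTP request line.

    The fragment (everything after the first '#') stays in the browser,
    so only the path and, if present, the query string are sent.
    """
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")

url = "http://www.example.com/ar/shop-listing/health-beauty~@/#$&&&%%$%3Cscript%3E/"
print(request_target(url))     # /ar/shop-listing/health-beauty~@/
print(urlsplit(url).fragment)  # $&&&%%$%3Cscript%3E/  (never reaches the server)
```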

If the # were removed from this URL, it would be a wholly invalid URL (because of the improperly encoded % characters) and the server would respond with a “400 Bad Request”.
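To see why: a % in a URL must begin a %XX escape, i.e. be followed by exactly two hex digits. The sketch below flags any bare %; `has_invalid_percent_escape` is a hypothetical helper that only illustrates the encoding rule, not how Apache itself validates the path.

```python
import re

# A '%' that is NOT followed by two hexadecimal digits is a malformed escape.
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_invalid_percent_escape(path: str) -> bool:
    """Hypothetical check: True if any '%' is not a well-formed %XX escape."""
    return BAD_ESCAPE.search(path) is not None

# The URL-path from the question with the '#' removed: '%%$' is malformed.
print(has_invalid_percent_escape("/ar/shop-listing/health-beauty~@/$&&&%%$%3Cscript%3E/"))  # True
# Properly encoded '<' and '>' (%3C, %3E) are well-formed escapes.
print(has_invalid_percent_escape("/ar/shop-listing/health-beauty/%3Cscript%3E/"))  # False
```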

Aside: I'm not sure why WordPress still resolves this URL to the “correct” page, since it is technically a different URL.

Rather than blocking special characters, it’s probably much easier to whitelist the characters allowed in the URL-path. Most URL-paths will only consist of lowercase a-z, - (hyphen) and / (slash – path separator), so we could simply block the request if any other character is present in the URL-path. This could be implemented using the following mod_rewrite directives at the top of your .htaccess file, before the existing WordPress directives:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule [^a-z/-] - [R=404]

If the URL-path contains any characters other than those mentioned then it will trigger a 404. The RewriteCond skips requests that map to an existing file: static assets (whose names contain a . and often digits) and the internal rewrite to index.php made by the WordPress directives would otherwise also match the pattern and be blocked.

Note that this only checks the URL-path, not the query string part of the URL.
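The whitelist logic can be approximated in Python to test candidate URLs before deploying the rule. This is a sketch under stated assumptions: it assumes mod_rewrite matches the pattern against the %-decoded URL-path (so %3C is seen as <), and `would_404` is a hypothetical helper, not an Apache API. It ignores the RewriteCond file check.

```python
import re
from urllib.parse import unquote, urlsplit

# Same character class as the RewriteRule pattern: anything that is NOT
# a lowercase letter, a slash or a hyphen. (In .htaccess the leading slash
# is stripped before matching, which makes no difference since '/' is allowed.)
DISALLOWED = re.compile(r"[^a-z/-]")

def would_404(url: str) -> bool:
    """Hypothetical helper: approximate whether the whitelist rule would fire."""
    path = urlsplit(url).path  # fragment and query string are not checked
    decoded = unquote(path)    # assumed: the rule sees the %-decoded URL-path
    return DISALLOWED.search(decoded) is not None

base = "http://www.example.com/ar/shop-listing/health-beauty"
print(would_404(base + "/"))                      # False: allowed characters only
print(would_404(base + "~@/#$&&&%%$%3Cscript%3E/"))  # True: '~' and '@' in the path
print(would_404(base + "/%3Cscript%3E/"))         # True: decodes to '<script>'
print(would_404(base + "/?q=<x>"))                # False: query string not checked
```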
