Migrating content from an old solution. How to easily change hardcoded hyperlinks in legacy content

I’m not entirely sure where I can even ask this question as it pertains to both HTML and WP.

Anyways, our old intranet solution basically held a bunch of documents that we wrote in Word and then saved as .htm files.

We have switched to a WP solution and have begun uploading these .htm docs as media as it is the easiest solution for the 1700 docs we have. The issue is, these .htm docs contain hyperlinks inside of them that link to other .htm docs on our old website. This means that once we decommission our old server, all of those hyperlinks will be dead. I’m talking about thousands of useless links inside of the .htm docs.

How can I solve this problem? Ideally, we don’t want to individually edit each .htm doc and change the links inside of them. We have also tried the html to post converter plugin but it makes the formatting all wonky. We don’t want to, but should we just keep the old server as a file server of sorts so the links still work?

URL structure

OLD: http://websrv04/DIL%20Manuals/Employee%20Manual/Drayden%20Core%20Values.htm
NEW: droogle.dil/wp-content/uploads/2017/08/Core-Values.htm

href example from an old .htm doc


</span></span><span lang=EN-CA style="font-family:
"Arial","sans-serif";color:green;mso-ansi-language:EN-CA">Refer to </span><a
href="http://websrv04/DIL%20Manuals/Policy%20Works%20Accessing%20Policy%20Works.htm"><span
lang=EN-CA style="font-family:"Arial","sans-serif";mso-ansi-language:EN-CA">Policy
Works Accessing Policy Works</span></a><span lang=EN-CA style="font-family:
"Arial","sans-serif";color:green;mso-ansi-language:EN-CA"><o:p></o:p></span></p>

Final product

Refer to Policy
Works Accessing Policy Works

2 Answers

If the links in the old .htm files follow a common pattern (as in www.oldsite.com/page100.htm needs to become www.newsite.com/page100.htm), then you could load the site into an editor and do a mass search/replace of ‘oldsite.com’ to ‘newsite.com’. (I would use my copy of Dreamweaver, which has a search/replace across files in a folder.)

But that would require putting the page100.htm in a non-WP location.
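If you don't have an editor with folder-wide search/replace, the same idea can be sketched in a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes the file names stay the same after the move, so that only the host/prefix part of each link changes. It works on raw bytes so that whatever encoding Word used when saving the files is left alone.

```python
from pathlib import Path


def rewrite_links(html: bytes, old_prefix: str, new_prefix: str) -> bytes:
    """Replace every occurrence of old_prefix with new_prefix (hypothetical helper)."""
    return html.replace(old_prefix.encode("ascii"), new_prefix.encode("ascii"))


def rewrite_folder(folder: str, old_prefix: str, new_prefix: str) -> int:
    """Rewrite all .htm files under folder in place; return how many files changed."""
    changed = 0
    for path in Path(folder).rglob("*.htm"):
        original = path.read_bytes()
        updated = rewrite_links(original, old_prefix, new_prefix)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

Run it against a copy of the folder first, e.g. `rewrite_folder("docs-copy", "www.oldsite.com", "www.newsite.com")`, since it overwrites files in place. If the file names also change during the upload, a plain prefix swap like this will not be enough.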

You could create a template that did a ‘list files’ of a folder and convert those to HREFs. But that would only get those HTMLs on the site.
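To give a rough idea of what such a ‘list files’ step does, here is a small Python sketch that turns the .htm files in one folder into an HTML list of links. It is not a WP template (that would be written in PHP); the function name and the base URL you pass in are assumptions for illustration only.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_link_list(folder: str, base_url: str) -> str:
    """Return an HTML <ul> with one link per .htm file in folder (hypothetical helper)."""
    items = []
    for path in sorted(Path(folder).glob("*.htm")):
        # Percent-encode spaces and other unsafe characters in the file name.
        url = base_url.rstrip("/") + "/" + quote(path.name)
        items.append(f'<li><a href="{escape(url)}">{escape(path.stem)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

As noted, this only covers files that are actually sitting in that folder; it does nothing for links inside the documents themselves.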

If you need to get the old HTML content into WP, you could write a manual conversion program that reads each HTML file and creates a post with that content. That doesn’t take care of links in the content though. For that, you’d need to write a 404-not-found process that would help the user find the content via a search function (“We didn’t find that page, but here are some pages with similar content”).
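As a rough illustration of the "similar pages" idea, the sketch below takes the URL that was not found and suggests known pages whose file names look alike. It is a simplification: it compares file names only (using Python's standard `difflib`), not page content, and the function name, the cutoff of 0.5 and the list of known pages are all assumptions.

```python
from difflib import get_close_matches
from urllib.parse import unquote, urlsplit


def suggest_pages(requested_url: str, known_pages: list[str], limit: int = 3) -> list[str]:
    """Suggest known file names that resemble the one in requested_url (hypothetical helper)."""
    # Take the last path segment, decode %20 etc., and compare case-insensitively.
    name = unquote(urlsplit(requested_url).path).rsplit("/", 1)[-1].lower()
    by_lower = {page.lower(): page for page in known_pages}
    matches = get_close_matches(name, list(by_lower), n=limit, cutoff=0.5)
    return [by_lower[m] for m in matches]
```

For example, calling it with the old URL from the question and a list containing `Core-Values.htm` would offer that file as a suggestion, even though the name changed during the upload. A real 404 handler in WP would more likely lean on the site's own search.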

I suspect this will take a lot of custom programming and labor, although more details of the problem (with a few examples) might help. Interesting question, though: I face a possibly similar problem on a site that I manage, so I look forward to others’ thoughts.
