Too many 404s

I use a custom 404 page on this site which emails me a notification whenever a bad request for a page is made. Over the last five days I’ve received in excess of 15,000 of these 404 emails. Oh dear!

This system has been pretty useful for spotting mistakes, mine and other people’s. If I include a wrong link in a post, I soon start getting 404 emails and can quickly correct the issue. If someone else has a bad link to me, the email includes the referer (sic) field, so I can quickly trace the problem. Great stuff! It’s been working fine for ages.
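For anyone curious what such a notification might carry, here is a minimal sketch in Python. It is not the code this site runs; the function name, the addresses and the choice of fields are all assumptions made for illustration. It only builds the message and leaves sending to whatever mail setup you have.

```python
from email.message import EmailMessage


def build_404_notification(path, referer, user_agent,
                           to_addr="webmaster@example.com"):
    """Build (but do not send) a notification email for one bad request.

    All names and addresses here are hypothetical placeholders.
    """
    msg = EmailMessage()
    msg["Subject"] = f"404: {path}"
    msg["From"] = "noreply@example.com"
    msg["To"] = to_addr
    # The Referer header is what lets you trace who links to the bad URL.
    msg.set_content(
        f"Requested path: {path}\n"
        f"Referer: {referer or '(none)'}\n"
        f"User-Agent: {user_agent or '(none)'}\n"
    )
    return msg
```

Putting the requested path in the subject line makes a flood of identical-looking requests easy to spot in a mail folder.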

Then, a few days ago, I noticed a lot (a few hundred) of 404 emails in the Gmail folder they are automatically shunted to. I glanced through them and noticed that they looked like permanent links to my old blog URL, .../b2/archives/p/1234..., that had somehow been corrupted into .../journalized//p/1234/.... I also noticed that the requests came from Yahoo’s search engine web crawler. I moved one back to my Gmail inbox and popped a star on it to remind me to look into it.

So here I am today, looking into it. I still hadn’t realized there were more than a few hundred! It was only when I fired up Thunderbird to clear out my POP3 mailboxes that I saw some 20,000 emails waiting to download!

A fairly quick investigation revealed that my old b2 redirect script was still in place. But when I changed some code around and added some debug output to it, I got nothing, so the requests clearly weren’t reaching it. Ah ha! I vaguely remembered fiddling with redirects in my .htaccess file the other day. I quickly spotted the culprit and commented out the line. Yay! Instantly fixed.

I’d been trying to short-circuit the PHP redirect code with the quicker Apache redirect for the simplest case, using the following line:

Redirect Permanent /b2/archives http://zed1.com/journalized/

There are so many regular expression RedirectMatch lines in there that I forgot that a plain Redirect line retains the rest of the URL when redirecting. You can even see where the extra slash came from: the target ends in a slash, and the retained remainder of the path begins with one!
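To make the mechanism concrete, here is a rough Python model of how a prefix-style Redirect builds its target. It is a simplified sketch, not Apache’s actual matching code, and the function name is made up for illustration. The behaviour it imitates is the documented one: any path information beyond the matched prefix is appended to the target URL.

```python
def prefix_redirect(request_path, url_path, target):
    """Return the redirect location, or None if the prefix does not match.

    Simplified model (an assumption, not Apache's real code): match on
    whole path segments, then append whatever follows the prefix.
    """
    prefix = url_path.rstrip("/")
    if request_path == prefix or request_path.startswith(prefix + "/"):
        remainder = request_path[len(prefix):]  # "" or starts with "/"
        return target + remainder
    return None


# The old permalink keeps its tail, and the tail's leading slash lands
# right after the target's trailing slash.
print(prefix_redirect("/b2/archives/p/1234/",
                      "/b2/archives",
                      "http://zed1.com/journalized/"))
```

If the aim is to redirect only the bare archive path, one option (untested here) would be an anchored RedirectMatch pattern, which leaves the longer permalink URLs for the PHP script to handle.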

Lesson learned: when making a change like this, don’t just check that it works; check that the other stuff isn’t broken!