Too many 404’s

I use a custom 404 page on this site that emails me a notification whenever someone requests a page that doesn't exist. Over the last five days I've received more than 15,000 of these 404 emails. Oh dear!

This system has been pretty useful for spotting mistakes, both mine and other people's. If I include a wrong link in a post, I soon start getting 404 emails and can quickly correct the issue. If someone else has a bad link to me, the email includes the referer (sic) field, so I can quickly trace the problem. Great stuff! It's been working fine for ages.
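For anyone curious how a notifier like this might be put together, here is a minimal sketch in Python. The post doesn't show the site's actual 404 page, so the helper name `build_404_notification`, the WSGI-style `environ` keys and the placeholder recipient address are all assumptions. It only builds the message; actually sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_404_notification(environ, to_addr="webmaster@example.com"):
    """Build (but don't send) an email describing a request that hit the 404 page.

    Hypothetical helper: `environ` is a WSGI-style dict of request data and
    `to_addr` is a placeholder address.
    """
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if query:
        path = f"{path}?{query}"
    # The Referer header is what lets you trace a bad inbound link to its source;
    # the User-Agent shows whether the request came from a person or a crawler.
    referer = environ.get("HTTP_REFERER", "(none)")
    agent = environ.get("HTTP_USER_AGENT", "(unknown)")

    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"404: {path}"
    msg.set_content(
        f"Requested: {path}\n"
        f"Referer: {referer}\n"
        f"User-Agent: {agent}\n"
    )
    return msg


if __name__ == "__main__":
    example = build_404_notification({
        "PATH_INFO": "/no-such-page/",
        "HTTP_REFERER": "http://example.com/some-post",
        "HTTP_USER_AGENT": "ExampleBot/1.0",
    })
    print(example["Subject"])
    print(example.get_content())
```

Including the User-Agent alongside the Referer is what makes a flood like the one described below easy to attribute to a single crawler.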

Then, a few days ago, I noticed a lot (a few hundred) of 404 emails in the Gmail folder they are automatically shunted to. I glanced through them and saw that they looked like permanent links to my old blog URL .../b2/archives/p/1234... that had somehow been corrupted into .../journalized//p/1234/.... I also noticed that the requester was Yahoo's search engine web crawler. I moved one back to my Gmail inbox and popped a star on it to remind me to look into it.

So here I am today looking into it. I still hadn't realized there were more than a few hundred! It was only when I fired up Thunderbird to clear out my POP3 mailboxes that I saw some 20,000 emails waiting to download!

A fairly quick investigation revealed that my old b2 redirect script was still in place. But when I changed some code around and added some debug output to it, I got nothing, which suggested the script wasn't being run at all. Ah ha! I vaguely remembered fiddling with redirects in my .htaccess file the other day. I quickly spotted the culprit and commented out the line. Yay! Instantly fixed.

I'd been trying to short-circuit the PHP redirect code with a quicker Apache redirect for the simplest case, using the following line:

    Redirect Permanent /b2/archives http://zed1.com/journalized/

There are so many regular-expression RedirectMatch lines in there that I forgot a plain Redirect line keeps the rest of the URL when redirecting: it matches the start of the path and appends whatever follows /b2/archives to the target URL. You can even see where the extra slash came from: the leftover /p/1234 part brings its own leading slash, and the target already ends in one.
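The prefix behaviour can be modelled in a few lines of Python. This is a simplified sketch, not Apache's actual code: `redirect_prefix` and `redirect_match` are hypothetical helpers that ignore query strings, status codes and `$N` back-references, and the `/b2/archives/p/1234` request path is illustrative. The anchored pattern `^/b2/archives/?$` is one possible way to catch only the bare archive URL, for example with `RedirectMatch permanent ^/b2/archives/?$ http://zed1.com/journalized/`, assuming the PHP script is then left to handle the deeper permalinks.

```python
import re
from typing import Optional


def redirect_prefix(source: str, target: str, path: str) -> Optional[str]:
    """Simplified model of mod_alias `Redirect`.

    Matches `source` as a path prefix on segment boundaries and appends
    whatever is left of the request path to `target`.
    """
    if source.endswith("/"):
        matched = path.startswith(source)
    else:
        matched = path == source or path.startswith(source + "/")
    if not matched:
        return None
    return target + path[len(source):]


def redirect_match(pattern: str, target: str, path: str) -> Optional[str]:
    """Simplified model of `RedirectMatch` without back-references:
    redirect to `target` only if the regex matches the request path."""
    return target if re.search(pattern, path) else None


TARGET = "http://zed1.com/journalized/"

# The original directive: the leftover "/p/1234" is appended after the
# target's trailing slash, producing the double slash.
print(redirect_prefix("/b2/archives", TARGET, "/b2/archives/p/1234"))
# -> http://zed1.com/journalized//p/1234

# An anchored RedirectMatch only catches the bare archive URL and
# leaves deeper permalinks alone.
print(redirect_match(r"^/b2/archives/?$", TARGET, "/b2/archives"))
# -> http://zed1.com/journalized/
print(redirect_match(r"^/b2/archives/?$", TARGET, "/b2/archives/p/1234"))
# -> None
```

The anchors (`^` and `$`) are what stop the regex version from swallowing longer paths; without them, `RedirectMatch` would match anywhere in the URL.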

Lesson learned: when making a change like this, don't just check that it works; check that the other stuff isn't broken!

11 thoughts on “Too many 404’s”

  1. Don’cha hate when that happens? I did the exact same thing when I moved servers, so it was doing some strange 301-ing.

    Chalk it up to the webmaster’s laziness. Or rather, the miscommunication between the brain and the dimwit fingers :P

  2. Pingback: Is there a PC Doctor in the house?

  3. Richard has a great idea … you could also probably hack together a database tool that would let you get a handle on the sources of the 404s … let some scripts do the work for you rather than using brainpower to follow the rabbit trail. :)

  4. Hey Mike, just a quick note to say thanks loads for the great theme. I am using it for my internet radio show (podcast) FirstPersonShow.net

    I’ve enjoyed having a poke around your website, and I’ll be back again to check out more later.

    Thanks,
    Kevin
    “celebrating the uncelebrated”

  5. Oh, OK, I just saw that the calendar you have is related to your archive.

    Oops!!

    Anyway, what about the pic? I want to have a pic next to my nickname each time I upload a post. How can I do that?

  6. Oh, you mean you don’t like it when everyone puts in a wrong URL just so you can get e-mail? =)

    I usually just go over my stats at the end of every week and see where everyone is going wrong and fix it from there.
