Sciencetext Tips & Tricks

Blogging tips, browsing tricks and computing hacks

Creating Dangerous Random Blog Content

January 11th, 2008 · by David Bradley


Some time ago, Rob Watts over on YackYack created a dynamic blog post, one that uses a rand function so that it appears to be discussing any given popular topic. A page refresh brings up a new subject. It might be “Epilepsy”, “Missouri western state college”, “erectile dysfunction”, or any of countless other keywords. Cleverly, he also renders a Google News feed using the same randomly generated keyword, so that the post contains headlines that are genuinely on-topic, and he displays a thumbnail from a Flickr search.
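
I haven't seen Rob's code, so what follows is only a rough Python sketch of the general idea: pick a keyword at random, then build a news-feed search URL from it. The keyword list, the function names and the feed URL pattern are all my own assumptions for illustration, not what YackYack actually runs.

```python
import random
from urllib.parse import urlencode

# Hypothetical keyword pool; a real site would presumably use a far longer list.
KEYWORDS = ["Epilepsy", "Missouri western state college", "German beer steins"]


def pick_keyword(keywords, rng=random):
    """Return one keyword chosen at random, so each page load gets a new topic."""
    return rng.choice(keywords)


def news_feed_url(keyword, base="https://news.google.com/rss/search"):
    """Build a feed search URL for the keyword (the URL pattern is an assumption)."""
    return f"{base}?{urlencode({'q': keyword})}"


if __name__ == "__main__":
    topic = pick_keyword(KEYWORDS)
    print(topic, news_feed_url(topic))
```

Because the same keyword drives both the post text and the feed query, the headlines always match whatever the page claims to be about.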

But, I think he’s missing a trick.

He could scrape headlines from a blended RSS feed based on the rand keyword, not just Google News, and render them inline within the dynamic post. Then, rather than letting the page drift off into the ether, he could use some clever server-side trick to capture each generated page, add it to his blog database and archive it in a sub-folder. A bit of coding could generate a dynamic sitemap that adds each page to that sub-folder’s index file as it’s created, and he could then link to that index from his homepage.
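
To make the capture-and-archive idea a little more concrete, here is a minimal Python sketch. It assumes the generated page is already available as a string of HTML, and it writes flat files rather than touching a blog database. The function names, the filename scheme and the bare-bones index page are hypothetical choices of mine.

```python
import re
from datetime import date
from pathlib import Path


def slugify(text):
    """Turn a keyword into a filename-safe, lower-case, hyphenated slug."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archive_page(html, keyword, archive_dir, day=None):
    """Save one generated page into the archive sub-folder and return its path."""
    day = day or date.today()
    folder = Path(archive_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{day.isoformat()}-{slugify(keyword)}.html"
    path.write_text(html, encoding="utf-8")
    return path


def rebuild_index(archive_dir):
    """Rewrite the sub-folder's index file so it links to every archived page."""
    folder = Path(archive_dir)
    pages = sorted(p.name for p in folder.glob("*.html") if p.name != "index.html")
    items = "\n".join(f'<li><a href="{name}">{name}</a></li>' for name in pages)
    (folder / "index.html").write_text(f"<ul>\n{items}\n</ul>\n", encoding="utf-8")
    return pages
```

Calling archive_page and then rebuild_index after each page view would keep the index current, and the homepage would only ever need a single link to that index file.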

It could get very big very quickly. Each post would capture the zeitgeist of the day it was created as users hit the site. If he gets lots of traffic, it could escalate quickly, producing (mostly) unique content pages, perhaps for SEO purposes, but more importantly generating useful content on popular keywords for his readers. A bit more tweaking would ensure titles, filenames, and meta tags were suitably SEO’ed too.

He’d have to watch out for nasties and check that the dynamic post’s content was truly unique to preclude Google penalties. Even then, however, I guess a site built like this would be considered nothing more than a sophisticated scraper and is probably best avoided. It might shine briefly like a shooting star in the SERPs but would quickly burn out as search engine users flag it up as spam.
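
As a rough illustration of what a uniqueness check might look like, here is a small Python sketch that compares a new page's text against previously archived text using word shingles and Jaccard similarity. That technique is my own choice for the example; it is not a description of how Google detects duplicate content, and the 0.8 threshold is an arbitrary assumption.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two texts' shingle sets: 0.0 is disjoint, 1.0 is identical."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(candidate, archived_texts, threshold=0.8):
    """True if the candidate is at least `threshold` similar to any archived text."""
    return any(jaccard(candidate, old) >= threshold for old in archived_texts)
```

A page that comes back as a near duplicate could simply be skipped rather than archived, though a check like this says nothing about whether the content is any good.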

The one-off dynamic post on Rob’s site, though, is great fun. Last time I checked it was “about” “German beer steins”! Check it out; you never know what he might be talking about when you visit that page.

4 responses so far ↓

  • rob // Jan 11, 2008 at 6:04 pm

    Hello David

    I do a similar thing (but I dropped the rss) on my linky love pages.

    >Even then, however, I guess a site built like this would be considered nothing more than a sophisticated scraper and is probably best avoided. It might shine briefly like a shooting star in the SERPs but would quickly burn out as search engine users flag it up as spam

    Exactly, I couldn’t agree more. Things like that end up getting sites dropped from search indices too. You will see that I “noindex, follow” those pages, as ultimately the content created wasn’t really up to much and looked a little madlibish.

    My motto with this webstuff is that unless you are a search engine, or are doing something really really cool with keyword insertions, then it’s best left well alone. Especially if you value your search traffic :D
    rob’s last blog post..Really really bad blackhat SEO

  • David Bradley // Jan 11, 2008 at 7:05 pm

    Hi Rob

    Nice to see your blog back up to full speed! Thanks for the comment. You might be interested in an anti-scraper post that’s coming up soon. If you look at the Sig Figs newsfeed (foot of each item), you’ll get an idea of what the post will be about. Grab the feed now to see the post when it appears or, if you cannot wait, Google “RSS Footer plugin” and follow the link to Joost de Valk’s blog.

    db

  • rob // Jan 12, 2008 at 7:55 am

    Hi David

    Yes… the my.cnf file for MySQL was the cause. I was using the new default that came with the install and, well, it just kept crashing. Out of desperation I went back to the old one and it’s performed a whole lot better since.

    I like Joost’s idea, although it won’t stop a determined scraper. There are lots of ways of parsing things; HTML can be stripped right out using a PHP function. Without looking at it, I don’t see how this add-on will prevent that.

    Still, it’s a good attempt at fixing a problem that plagues many.


  • David Bradley // Jan 12, 2008 at 8:38 am

    Rob, there’s always one little rogue file somewhere. I installed a plugin that screwed up cron and meant that some posts I’d backdated were suddenly appearing on the day for which they’d originally been set. Uninstalled the plugin and back to normal within seconds.

    db
