Dynamic Quality Control for the Web

February 23rd, 2009 by David Bradley >> 3 Comments

At the end of January, Google flagged every website in its search results as containing malware. Apparently, it was human error: someone mistakenly entered a stray slash into the database, and it crippled the search engine’s output. I reported on the issue from Sciencetext and via Twitter until the problem was resolved and Google admitted culpability.

That was a “one-off”, although it brought the whole world to a grinding halt, at least if you were trying to visit a page from that search engine. In fact, the grinding halt to which it brought the internet is on a par with the one to which the UK has been brought (at the time of writing) by a few snow flurries. Check out the Twitter hashtag #uksnow to see what I mean. This topic was “trending” on Twitter higher than the Super Bowl and Groundhog Day… on Groundhog Day, while the Super Bowl was under way!

Anyway, the debacle highlights an issue the vast majority of internet users face time and again: How can you tell if a web page you’re visiting is of good quality or not?

Obviously, you can use your own judgment and decide whether the content and opinions expressed are valid, evidence-based and justifiable. But what if you are reading outside your field of expertise, or are a student investigating a topic for the first time? Most worrying is the question of medical guidance on the web. If you’re a patient desperate for information about your condition and looking for effective treatments, how can you tell that the site you find at the top of the Google search results is not a peddler of snake oil?

Search engine overload

Moreover, search engines usually provide huge numbers of results. These are often ranked by relevance and link popularity, but that does not necessarily reflect quality. A lot of people might simply link to garbage for whatever reason.

“To evaluate the quality of these results, one must manually check each web document and apply some heuristics to determine what the quality of the information is,” Surya Yadav at Texas Tech University, in Lubbock, Texas, explains. However, that simply brings the argument full circle: how does a lay person evaluate seemingly expert information? Moreover, this task would be extremely difficult even if only a few results were returned, and often there are thousands.

The internet is extremely useful in that it gives people all over the world access to all sorts of information. Much of this information is valuable, but much of it is also useless. While search engines have become very effective at retrieving relevant information, it is currently the responsibility of the user to wade through the results to evaluate them with respect to quality.

Quality criteria

Yadav, who is the James & Elizabeth Sowell Professor of Telecom Technology at the Rawls College of Business at Texas Tech, has pondered such issues at length. Now, he and his colleagues have developed a series of criteria for determining web page quality, which they say can be implemented in a flexible and automated system to calculate a quality score for each page. They have demonstrated proof of principle with a medical search.

“Quality criteria consists of measures for evaluating websites as well as their web pages,” Yadav told Sciencetext. “Website-evaluation criteria include Source, Credential, Conflict of Interest, and Bias etc. Web page-evaluation criteria include Relevance, Accuracy, Cohesiveness, Currency, Information Context, and Evidence etc.”
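
To make the idea of a per-page quality score a little more concrete, here is a minimal sketch in Python of how ratings against those criteria might be rolled up into a single number. It is not Yadav’s method: the weights, the 0-to-1 rating scale, the 50/50 split between site and page, and the function names are all assumptions made up for illustration. Only the criterion names come from the quote above.

```python
# Hypothetical integer weights; the paper's real weighting is not given here.
SITE_WEIGHTS = {
    "source": 3,
    "credential": 3,
    "conflict_of_interest": 2,
    "bias": 2,
}
PAGE_WEIGHTS = {
    "relevance": 5,
    "accuracy": 5,
    "cohesiveness": 2,
    "currency": 3,
    "information_context": 2,
    "evidence": 3,
}


def weighted_score(ratings, weights):
    """Weighted average of ratings in [0, 1]; an unrated criterion counts as 0."""
    total = sum(weights.values())
    return sum(w * ratings.get(name, 0.0) for name, w in weights.items()) / total


def quality_score(site_ratings, page_ratings, site_share=0.5):
    """Blend the website-level and page-level scores (site_share is assumed)."""
    site = weighted_score(site_ratings, SITE_WEIGHTS)
    page = weighted_score(page_ratings, PAGE_WEIGHTS)
    return site_share * site + (1 - site_share) * page


if __name__ == "__main__":
    site = {"source": 1.0, "credential": 0.5, "conflict_of_interest": 1.0, "bias": 0.5}
    page = {"relevance": 1.0, "accuracy": 0.5, "evidence": 0.0}
    print(round(quality_score(site, page), 3))
```

The hard part, of course, is not the arithmetic but producing trustworthy ratings for each criterion in the first place, which this sketch simply takes as given.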

Needless to say, should such a system be implemented it would in practice work well only for a short time without ongoing human checking. As with Google search engine results pages (SERPs), those seeking to game the system and get their page to the top of the results or spoof the quality of their site will quickly reverse engineer the criteria being applied and find so-called black hat (spammy) methods to circumvent them.

The upshot would be that the pages rated as good quality would quickly be contaminated by spam, just as the SERPs of most of the major search engines are. Users would then have to rely on their own expertise and knowledge to sift the wheat from the chaff, as they currently do. We would return once again to our initial question: How can you tell if a web page you’re visiting is of good quality or not?

The answer lies in the quality rating system being dynamic and checking web page content and sources under the website criteria, explains Yadav. “A system flexible enough to change dynamically and adjust the quality criteria and also evaluate the website credibility will mean spammers will have a tough time circumventing the system. A quality rating system must be dynamic and flexible where criteria set can easily be changed,” he says.
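
As a rough illustration of what “criteria set can easily be changed” could mean in practice, here is a small Python sketch of a rater whose criteria and weights can be adjusted while it is running. The class name, method names, and weights are hypothetical, invented for this example; nothing here is taken from the paper itself.

```python
class QualityRater:
    """Toy rater whose criteria set can be changed at run time (illustrative only)."""

    def __init__(self, weights):
        # Copy so later changes do not affect the caller's dict.
        self.weights = dict(weights)

    def set_weight(self, criterion, weight):
        """Add or re-weight a criterion; a weight of 0 or less removes it."""
        if weight <= 0:
            self.weights.pop(criterion, None)
        else:
            self.weights[criterion] = weight

    def score(self, ratings):
        """Weighted average of ratings in [0, 1]; unrated criteria count as 0."""
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        return sum(w * ratings.get(c, 0.0) for c, w in self.weights.items()) / total


if __name__ == "__main__":
    rater = QualityRater({"relevance": 1, "evidence": 1})
    page = {"relevance": 1.0, "evidence": 0.0}
    print(rater.score(page))          # equal weights
    rater.set_weight("evidence", 3)   # make evidence count for more
    print(rater.score(page))          # same page, stricter criteria
```

The same page scores lower once evidence is weighted more heavily, which is the sort of moving target that would, in principle, make life harder for anyone trying to reverse engineer the criteria.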

Surya B. Yadav (2008). Automation of webpage quality determination. Int. J. Information Quality, 2 (2), 152-176



  • Hersh Bhardwaj // Feb 24, 2009 at 12:45 pm

    Hi David,
    Insightful post, thanks. I have read Prof. Yadav’s writings in the past. Everyone from a Matt Cutts to an ordinary searcher has always wondered if such a system can be developed where we get the most relevant information every time. Theoretically, a dynamic and flexible system can be developed, but again, websites are not designed to an absolute standard in science labs. Even the most perfect system will have problems rating the same ten sites in the same ranking every time. It’s not the spammers we should worry about, it’s the basic relevancy issues of SEs that first need a lot of attention. What do you think?

  • David Bradley // Feb 24, 2009 at 2:56 pm

    Yeah, I think you’re probably right. I reported on this because I thought it was an interesting paper, but I don’t feel they actually have an answer.

  • Hersh Bhardwaj // Feb 24, 2009 at 4:48 pm

    Exactly! I am glad you thought the same. It’s easy to theorize and list problems, as we do in most university research departments! (Sorry to sound anti-academic.)