Dynamic Quality Control for the Web
February 23rd, 2009 by David Bradley
At the end of January, Google flagged every website on the internet as containing malware. Apparently, it was human error: someone added a stray slash to the blacklist database, and since every URL matches a “/”, the warning swamped the search engine’s entire output. I reported on the issue on Sciencetext and via Twitter until the problem was resolved and Google admitted culpability.
That was a “one-off”, although it brought the whole world to a grinding halt, at least if you were trying to visit a page via that search engine. In fact, the grinding halt to which it brought the internet is on a par with the grinding halt to which the UK has been brought (at the time of writing) by a few snow flurries. Check out the Twitter hashtag #uksnow to see what I mean. The topic was “trending” on Twitter higher than the Super Bowl and Groundhog Day…on Groundhog Day, while the Super Bowl was under way!
Anyway, the debacle highlights an issue the vast majority of internet users face time and again: How can you tell if a web page you’re visiting is of good quality or not?
Obviously, you can use your own judgment and decide whether the content and opinions expressed are valid, evidence based and justifiable. But what if you are reading outside your field of expertise, or are a student investigating a topic for the first time? Most worrying is the question of medical guidance on the web. If you’re a patient desperate for information about your condition, looking for effective treatments, how can you tell that the site at the top of the Google search results is not a peddler of snake oil?
Search engine overload
Moreover, search engines usually provide huge numbers of results. These are often ranked by relevance and link popularity, but that does not necessarily reflect quality. A lot of people might simply link to garbage for whatever reason.
“To evaluate the quality of these results, one must manually check each web document and apply some heuristics to determine what the quality of the information is,” explains Surya Yadav of Texas Tech University in Lubbock, Texas. However, that simply brings the argument full circle: how does a lay person evaluate seemingly expert information? And while the task would be difficult enough with only a handful of results to check, there are often thousands.
The internet is extremely useful in that it gives people all over the world access to all sorts of information. Much of this information is valuable, but much of it is also useless. While search engines have become very effective at retrieving relevant information, it is currently the responsibility of the user to wade through the results to evaluate them with respect to quality.
Yadav, who is the James & Elizabeth Sowell Professor of Telecom Technology at the Rawls College of Business at Texas Tech, has pondered such issues at length. Now, he and his colleagues have developed a series of criteria for determining web page quality, which they say can be implemented in a flexible, automated system that calculates a quality score for each page. They have demonstrated proof of principle with a medical search.
“Quality criteria consist of measures for evaluating websites as well as their web pages,” Yadav told Sciencetext. “Website-evaluation criteria include Source, Credential, Conflict of Interest, and Bias, etc. Web page-evaluation criteria include Relevance, Accuracy, Cohesiveness, Currency, Information Context, and Evidence, etc.”
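To make that a little more concrete, here is a minimal sketch, assuming a simple weighted-sum scheme, of how criteria like those Yadav lists might be combined into a single score per page. The weights, the 0–1 scoring scale and the function names are my own illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of a criteria-based quality score (not Yadav's actual code).
    # Each criterion is scored 0.0-1.0 by some evaluator; the weights below are
    # illustrative assumptions, not values from the paper.
    SITE_CRITERIA = {"source": 0.3, "credential": 0.3,
                     "conflict_of_interest": 0.2, "bias": 0.2}
    PAGE_CRITERIA = {"relevance": 0.25, "accuracy": 0.25, "cohesiveness": 0.10,
                     "currency": 0.10, "information_context": 0.10, "evidence": 0.20}

    def quality_score(site_scores, page_scores, site_weight=0.5):
        """Combine per-criterion scores (0-1) into one page-quality score."""
        def weighted(scores, weights):
            total = sum(weights.values())
            return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total
        return (site_weight * weighted(site_scores, SITE_CRITERIA)
                + (1 - site_weight) * weighted(page_scores, PAGE_CRITERIA))

    # Example: a page on a well-credentialed site that offers rather thin evidence.
    print(quality_score(
        {"source": 0.9, "credential": 0.8, "conflict_of_interest": 0.7, "bias": 0.6},
        {"relevance": 0.9, "accuracy": 0.7, "cohesiveness": 0.8,
         "currency": 0.5, "information_context": 0.6, "evidence": 0.3}))

The hard work, of course, sits inside the evaluator for each criterion (checking credentials, spotting conflicts of interest and so on); the combination step itself is the easy part.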
Needless to say, should such a system be implemented, in practice it would work well only for a short time without ongoing human checking. As with Google search engine results pages (SERPs), those seeking to game the system, pushing their page to the top of the results or spoofing the quality of their site, will quickly reverse-engineer the criteria being applied and find so-called black hat (spammy) methods to circumvent them.
The upshot would be that pages returned as being of good quality would quickly be contaminated by spam, just as the SERPs of most of the major search engines are. Users would then have to rely on their own expertise and knowledge to sift the wheat from the chaff, as they currently do. We would return once again to our initial question: how can you tell if a web page you’re visiting is of good quality or not?
The answer lies in the quality rating system being dynamic and checking web page content and sources under the website criteria, explains Yadav. “A system flexible enough to change dynamically and adjust the quality criteria and also evaluate the website credibility will mean spammers will have a tough time circumventing the system. A quality rating system must be dynamic and flexible where criteria set can easily be changed,” he says.
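In other words, the criteria set is treated as data rather than as hard-wired code: individual criteria, their weights and the rules behind them can be added, dropped or re-tuned without rebuilding the whole system. As a rough illustration (again my own sketch under that assumption, not the authors’ implementation), a dynamic rater might keep its criteria in a registry that can be updated at runtime:

    # Hypothetical illustration of a "dynamic" criteria set: evaluators and weights
    # live in a registry that can be changed at runtime, so the scoring rules can
    # be adjusted as spammers adapt. All names and numbers here are assumptions.
    class DynamicQualityRater:
        def __init__(self):
            self.criteria = {}  # criterion name -> (weight, evaluator function)

        def set_criterion(self, name, weight, evaluator):
            """Add or re-tune a criterion without touching the rest of the system."""
            self.criteria[name] = (weight, evaluator)

        def drop_criterion(self, name):
            self.criteria.pop(name, None)

        def score(self, page_text):
            total = sum(w for w, _ in self.criteria.values())
            if total == 0:
                return 0.0
            return sum(w * f(page_text) for w, f in self.criteria.values()) / total

    rater = DynamicQualityRater()
    rater.set_criterion("evidence", 0.4, lambda page: 1.0 if "references" in page else 0.2)
    rater.set_criterion("currency", 0.2, lambda page: 0.8)  # e.g. derived from a last-updated date
    # Later, once spammers learn to fake a "references" section, the rule can be swapped out:
    rater.set_criterion("evidence", 0.5, lambda page: 0.9 if "doi.org" in page else 0.1)
    print(rater.score("A page that cites its sources via doi.org links"))

The point of the design is that the rules spammers would have to reverse-engineer are themselves moving targets.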
Surya B. Yadav (2008). Automation of webpage quality determination. Int. J. Information Quality, 2(2), 152–176.