How to get the most from Q&A sites
May 31st, 2011 by David Bradley
Hundreds of millions of people answer questions online: to assist others in need of information, to show off their intellectual prowess, sometimes to point questioners to their own products and resources, and for many other reasons. Sites such as Yahoo Answers, Quora, Aardvark, Answerbag, eHow, Mahalo Answers, Yedda, wikiHow and many other Q&A sites have sprung up over the years to meet the questioning and answering needs of users.
Unfortunately, the multitude of Q&A sites out there use a multitude of formats and vary widely in quality and, indeed, in quality control. There is therefore a need to extract standardized information from such sites if we are to make the most of the whole loose endeavor that we might refer to as social computing. Computer and information scientists in the Republic of Korea have now devised a framework that they believe could be used to extract useful and validated information from Q&A sites.
Won Kim, a former senior adviser at Samsung who is now at Kyungwon University, and colleagues there and at Sungkyunkwan University have created what they refer to as a Q&A thesaurus to organize and hold collective intelligence. They tested their approach on Yahoo Answers and Yedda, both of which produce XML output and have an API (application programming interface) that allowed the team to hook into the Q&As on both sites and extract data. Fields within the Q&As are mapped so that, for example, the title and subject fields from each site are made equivalent. The team then knocks out the stop, or noise, words (the “it”, “so”, “the” etc). They also apply a standard stemming algorithm so that words sharing a common root are treated as equivalent, e.g. find, finds, and finding are equated. Synonyms are tied together in the same way. A thesaurus is then compiled for the Q&As using Bayesian statistics as an automated means of classifying the data.
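The preprocessing steps described above (field mapping, stop-word removal, stemming) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the field names in FIELD_MAP, the stop-word list, and the crude_stem helper are all hypothetical, and the suffix stripping is a rough stand-in for a standard stemmer such as Porter's.

```python
import re

# Illustrative stop-word list (assumption: the list used in the paper is not given here).
STOP_WORDS = {"it", "so", "the", "a", "an", "to", "is", "and", "of", "for"}

# Hypothetical field names: each site's own field is mapped to a shared name.
FIELD_MAP = {
    "yahoo": {"Subject": "title", "Content": "body"},
    "yedda": {"Title": "title", "Text": "body"},
}

def normalize_fields(site, record):
    """Rename site-specific fields to the shared schema, dropping unmapped ones."""
    mapping = FIELD_MAP[site]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

def crude_stem(word):
    """Rough suffix stripping; a real pipeline would use a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[: -len(suffix)]
    return word

def extract_terms(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]

# Made-up record, for illustration only.
record = normalize_fields("yedda", {"Title": "Finding the jeans", "Text": "It finds so many"})
print(extract_terms(record["title"] + " " + record["body"]))
# prints ['find', 'jean', 'find', 'many']
```

Note how "Finding" and "finds" both collapse to the same term, which is the equivalence the stemming step is meant to provide.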
For their test data, the team collected almost 1000 Q&A sets from the Beauty and Style/Fashion and Accessories category of the Yahoo Answers site, and manually classified them into eight categories: Shirts, Shirt/blouses, Sweater, Dresses, Suits, Jeans, Pants and Accessories. They extracted a total of 6238 terms. They also collected 500 Q&A sets from the Beauty and Style/Fashion and Accessories category of the Yedda site. These choices are useful in demonstrating how their approach might work in extracting recommendation data for an e-commerce site or fashion discussion group, for instance.
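To give a feel for how manually classified Q&A sets could drive automated classification, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. The article says only that the team used "Bayesian statistics", so naive Bayes is an assumption on my part, and the tiny training set below is invented purely for illustration; it is not the team's data.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs is a list of (terms, category) pairs; returns the counts needed to classify."""
    doc_counts = Counter()              # number of Q&A sets per category
    term_counts = defaultdict(Counter)  # term frequencies per category
    vocab = set()
    for terms, category in labeled_docs:
        doc_counts[category] += 1
        term_counts[category].update(terms)
        vocab.update(terms)
    return doc_counts, term_counts, vocab

def classify(terms, model):
    """Return the category with the highest log posterior for the given terms."""
    doc_counts, term_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)  # log prior
        total_terms = sum(term_counts[category].values())
        for term in terms:
            if term not in vocab:
                continue  # ignore terms never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((term_counts[category][term] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Invented toy data, already reduced to stemmed terms.
training = [
    (["blue", "denim", "jean"], "Jeans"),
    (["skinny", "jean", "fit"], "Jeans"),
    (["summer", "dress", "floral"], "Dresses"),
    (["black", "dress", "party"], "Dresses"),
]
model = train(training)
print(classify(["denim", "jean"], model))    # prints Jeans
print(classify(["floral", "dress"], model))  # prints Dresses
```

In a setting like the one described, the eight fashion categories would play the role of the labels, and the terms extracted from each Q&A set would be the features.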
Jehwan Oh, Ok-Ran Jeong, Eunsoek Lee, & Won Kim (2011). A framework for collective intelligence from internet Q&A documents. Int. J. Web and Grid Services, 7 (2), 134-146.


