
The IF, THEN and WHY of Web use

February 3rd, 2011 by David Bradley

Analyzing user statistics across Web sites often comes under scrutiny from privacy advocates worried that marketing companies are exploiting personal data to track users' behavior and target them with advertising. The issue is, of course, a double-edged sword. Many of us would prefer that our online behavior not be monitored in any way, but if advertising is inevitable, then allowing marketers to display pertinent advertisements could be less annoying than seeing the random offerings that everyone else sees.

But Web usage data can be put to much more positive use than propping up the profit lines of marketing companies. Google Flu Trends has become an essential global tool for monitoring outbreaks of influenza. It works by spotting unusual search activity centered on topics related to the disease. Similar search tracking could readily be used to spot other trends or the emergence of new diseases. Tracking and analysis technology similar to that used by marketing teams could also be put to good use in spotting important global, national and regional trends. It could equally be put to negative use in detecting and stifling uprisings and militant activity, which could be a serious problem in particular parts of the world.

Regardless of whether you perceive it as a good or a bad thing, tracking is happening across the globe. Its impact on your life, and on the lives of your family and friends, will depend on whether you live in a so-called free or totalitarian state. You might think that cookie-control plugins and next-generation browsers that claim to prevent tracking will help, but if “they” want to track you, “they” will.

Meanwhile, Dong (Haoyuan) Li of the University of Tours and colleagues Anne Laurent and Pascal Poncelet at the University of Montpellier, France, point out that much of the research into tracking and Web usage analysis has focused on gathering data on common behavior. But data on unusual behavior, the odd activity that lies to the left or right of the norm, would be of far more interest.

The team has now analyzed patterns of Web usage and discovered that various rules can be extracted that allow unexpected Web usage to be predicted. They call their approach WebUser (Web Unexpected SEquence Rules) and describe it as “a belief-driven framework for mining unexpected Web usage in session sequence databases.”

“A rule in the context of mining unexpected Web usage is an IF-THEN implication such as ‘IF a user visits science news THEN he/she will visit technology news later’ that might be discovered from Web usage data,” explains Li. “A belief is an extension of a rule such as ‘IF a user visits the science news THEN he/she will visit technology news later, HOWEVER he/she will not visit politics related content.’” In this case, visiting science news and then politics news corresponds to unexpected behavior, and Li and colleagues anticipate their system will discover WHY.
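To make the rule-versus-belief distinction concrete, here is a minimal sketch in Python. It is not the WebUser algorithm from the paper, only an illustration of how a single session could be checked against one belief. The function names, the page labels and the four outcome labels are all hypothetical.

```python
from typing import List, Optional


def first_index(session: List[str], page: str, start: int = 0) -> Optional[int]:
    """Return the index of the first visit to `page` at or after `start`, else None."""
    for i in range(start, len(session)):
        if session[i] == page:
            return i
    return None


def classify_session(session: List[str], premise: str,
                     expected: str, contradicting: str) -> str:
    """Check a session against a belief of the form:
    IF premise THEN expected later, HOWEVER not contradicting.

    Hypothetical labels, not taken from the paper:
      "not applicable" - the premise page was never visited
      "unexpected"     - the contradicting page appears after the premise
      "expected"       - the expected page appears after the premise
      "undetermined"   - neither page appears after the premise
    """
    p = first_index(session, premise)
    if p is None:
        return "not applicable"
    if first_index(session, contradicting, p + 1) is not None:
        return "unexpected"
    if first_index(session, expected, p + 1) is not None:
        return "expected"
    return "undetermined"
```

With the belief from Li's example, a session such as science then politics would be flagged as unexpected, while science then technology would match the rule. A real system would mine both the rules and the violations from the data rather than take the belief as a hand-written input.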

They suggest that such rules could be applied for “Web content personalization and recommendation, site structure optimization and critical event prediction.” That hints at marketing again, but it could equally allow a company to offer a better service (or a net neutrality skewed service if you like). It could also be used to spot when something is about to go viral, whether that’s the latest pop video or a popular revolution.

To evaluate their approach, the team performed a series of experiments on three Web access log files: a very large log file of a BSD UNIX online discussion forum, a large log file of a customer support forum of an online game provider, and a small log file of a university library Web portal. All log files were converted to session sequence databases and anonymized. They then applied their WebUser algorithms to the data to extract the rules for each data sequence.
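The article does not say how the logs were turned into session sequences or how they were anonymized, so the following Python sketch shows only one common way it might be done. The 30-minute inactivity gap, the truncated SHA-256 hash used as a stand-in for anonymization, the record layout and all function names are assumptions for illustration, not details from the paper.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed inactivity timeout that separates two sessions (not from the paper).
SESSION_GAP_SECONDS = 30 * 60


def anonymize(user_id: str) -> str:
    """Replace a user identifier with a short hash (a simple stand-in, assumed here)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:12]


def build_sessions(records: Iterable[Tuple[str, int, str]],
                   gap: int = SESSION_GAP_SECONDS) -> Dict[str, List[List[str]]]:
    """Group (user_id, timestamp_seconds, page) records into per-user sessions.

    Returns a mapping from anonymized user id to a list of sessions,
    where each session is the ordered list of pages visited.
    """
    by_user = defaultdict(list)
    for user_id, ts, page in records:
        by_user[anonymize(user_id)].append((ts, page))

    sessions: Dict[str, List[List[str]]] = {}
    for anon_id, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order each user's hits by time
        user_sessions: List[List[str]] = []
        last_ts = None
        for ts, page in hits:
            # Start a new session on the first hit or after a long pause.
            if last_ts is None or ts - last_ts > gap:
                user_sessions.append([])
            user_sessions[-1].append(page)
            last_ts = ts
        sessions[anon_id] = user_sessions
    return sessions
```

Each resulting session is an ordered list of pages, which is the kind of sequence that IF-THEN rules and beliefs can then be checked against.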

Dong Li, Anne Laurent, & Pascal Poncelet (2011). WebUser: mining unexpected web usage. Int. J. Business Intelligence and Data Mining, 6 (1), 90-111

