Petar Maymounkov

On the edge of noise: Recommending without adding value

Wed, May 22, 2013

In layman's terms, the Internet is overflowing with information these days. This has created huge opportunities (e.g. a definitive increase in the velocity of science), as well as huge problems (e.g. the emergence of powerful detrimental teenage subcultures) for society. Why this is the case and whether we could decouple the good and the bad of the Internet is a topic that many ponder day in and day out.

Attacking this question head-on is very laborious because we are only able to observe the high-level (written or video-taped, perhaps) interactions of a large set of highly complicated humans (themselves not well understood), influenced by many potentially unobserved factors, interacting via a global storage and connectivity system.

Alternatively, one can ask a more myopic question: What changed between the olden times and now? I find that this question has a somewhat convincing answer, which can definitely be used to turn the dial down on the information noise. The answer is:

It used to be required to add value before you could recommend.

So let me build up to this assertion. I arrived at this statement by starting from the commonly held belief about what is going wrong on the Internet today. The short answer: there is too much noise. By noise I mean too much information presented to the user, requiring "manual" (really, mental) work to drill down into a desired topic.

One of the Internet's two big features is its high connectivity (the other is its big memory), which has enabled pushing pre-existing information to new users (or to new logical domains, to be precise and abstract). Pushing information to a user can be both good and bad. More information means more knowledge and therefore more creative potential. More information also means more time spent sorting it out, and therefore less time for creativity.

Nature is fraught with this phenomenon, which carries the well-deserved technical name internal conflict. It would appear that the present Internet is over-utilizing its connectivity to push more information to users, to the point where the manual cost of sorting out the information exceeds the value of the relevant new information.

Conjecturing that this is the problem (and I am sure this is an over-simplification, as discussed below), I chose to make the following subjective verification. I divide the information age into three stages: pre-Internet, non-social Internet and social Internet.

Now notice this. In the pre-Internet stage, recommendation existed in the form of person-to-person sharing: verbal, written on paper, or audio/video. In all cases, it was impossible to make a recommendation to another person without having that recommendation be part of a context, like a person-to-person conversation or a newspaper article. In other words, you needed to have invested quite some time interacting contextually and defining a topic before you could offer your opinion and say "You should really read that book."

Much of the same applies to the early non-social Internet. The form of recommendation there was a link on a web page. Interestingly, links were never created at the speed at which retweets and reblogs happen these days, because you needed to make a web page first, before you had the privilege of being able to put a link within it.

The main difference that the social Internet brought, in my subjective opinion, was that people became able to recommend (by retweeting, reblogging, resharing, etc.) without having to invest any mental labor during the act of recommending. E.g. when you retweet, you are not required to add your own take on the subject, and thus retweeting becomes as easy as a click of a mouse. The recommender never has to invest in convincing the listener that the recommendation is good or is on an interesting topic.

I think this clear difference is something worth pondering. And I think that if you believe it, it gives a clear prescription as to how to design "sharing" interfaces so as to slow down the injection of noise.

While the argument so far ties a nice simple story together, I want to make it clear that it certainly doesn't capture the depth of the problem. The next section gives an example of one other human phenomenon that is influenced by Internet dynamics.

Another dimension: Exploration vs exploitation

Admittedly, the manual labor costs of sorting out information are far too simplistic a model, in general, for what creates the "bad" on the Internet. The conflict of attention dissipation vs information utilization is another concurrent phenomenon.

So, in the case of humans, there is a secondary cost in sorting out information: attention dissipation. One stumbles upon something interesting and puts the ongoing pursuit on hold. Attention dissipation is not always bad, however. Sometimes one finds new "unsolicited" information that furthers their pursuits in unexpected ways. I have. Many times.

Since each human has a fixed number of hours per day when they are able to exercise intent focus, the main internal conflict elucidated by attention dissipation is that of exploration vs exploitation. In other words, what part of the day will one spend exploring noisy information, hoping to find something unexpected, and what part will one spend creating new information by recombining and deriving from the old?

This conflict suggests that in general the Internet noise problem is quite a bit more subtle. Sometimes more information is OK, as long as it is of a certain type that has a higher potential for exploration rewards. E.g. a search query for "mathematics" that returns 10 random Mathematics books is more useful than a search query for "mathematics" that returns 2 random books.
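To make the trade-off concrete, here is a minimal sketch of it in Python. Everything in it is my own assumption for the sake of illustration, not a measured model: each random result is useful independently with some probability p, finding at least one useful result is worth a fixed reward, and every result costs a fixed amount of attention to sort through. The function names and the numbers are hypothetical.

```python
def hit_probability(p: float, k: int) -> float:
    """Chance that at least one of k independent random results is useful,
    assuming each one is useful with probability p."""
    return 1.0 - (1.0 - p) ** k


def net_value(p: float, k: int, reward: float, cost_per_item: float) -> float:
    """Expected exploration reward minus the attention spent sorting k results.
    Both reward and cost_per_item are hypothetical units."""
    return hit_probability(p, k) * reward - k * cost_per_item


if __name__ == "__main__":
    # Hypothetical numbers: 10% of random books are useful to the reader.
    for k in (2, 10, 100):
        print(k, hit_probability(0.1, k), net_value(0.1, k, 10.0, 0.2))
```

Under these assumptions, 10 random books beat 2 because the chance of stumbling on something useful grows faster than the sorting cost, while 100 books do worse than either because the sorting cost keeps growing after the chance of a hit has leveled off.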

The issue with unconditional listening

Comments coming in have reminded me of another aspect I neglected to mention. While it is true that "recommendation with vs without added value" describes very crisply the difference between before and now, one can give just as crisp a description in other terms.

Notice that all systems that allow no-value recommendations (retweet, reblog, etc.) also include a subscription model where one user unconditionally subscribes to all future posts of another. The presence of this subscription model is nearly 100% correlated with the presence of no-value recommendation mechanisms.

For one, the model of promising to "listen to someone else unconditionally (regardless of topic)" in the future is problematic simply because, in general, promises about the future made with only partial information today are always problematic: one needs to monitor and make an exit when needed. This is akin to the concerns that arise when investing in equity. Except, of course, Internet users are not nearly as attentive to the freshness of their subscriptions as day traders are.

There is a secondary axis of evil too, however. The notion of "subscribing to a person" is somewhat ill-defined, because each person has a set of interests, not all of which align with the subscriber. This problem is weakly addressed by Google+ in the notion of circles. Unfortunately, it is not solved because the circles and how they can be used to filter are on "the wrong side of the publisher/subscriber equation."

The subscription model may thus well be another culprit for the state of affairs. That said, this is not the model in some niche cases, like Hacker News, where the original noise problem (also referred to as "lack of focus") persists despite the lack of a publisher/subscriber model.

Comments? Please join the Google+ discussion. A read-only copy follows: