From Watson to Big Data: Content Intelligence is all around us.

What is Content Intelligence?

The most common initial question about Intelligent Content may be, “So why not just make computers smarter?”

Actually, developers have been working on this problem for decades, with continual but slow progress. Recently, these efforts have resulted in products that offer Content Intelligence: the ability to comprehend “unstructured data.” As with any technology, understanding its capabilities and limitations is key to success, and these might not always match the marketing claims.

Much of the current interest in Content Intelligence is driven by the very hot topic of “Big Data,” a term used to describe the 70-80 percent or more of organizational information that exists in an unstructured format, with a strong focus on social media, email, and other short textual exchanges. Often missing from the discussion of Big Data is what we like to call “Big Content”: the segment of Big Data that is created as knowledge workers go about their daily business and that contains some of the most valuable information assets of any organization. This includes nearly everything created in Microsoft Office and in the new Enterprise Social applications that organizations are just starting to use.

Content Intelligence is currently focused on social media for two reasons. First, marketing and other business strategists are fascinated with the prospect of actually knowing what everyone is saying on Twitter, Facebook, and the other sites that capture millions of consumer impressions every day. This represents a quantum leap in the reach and effectiveness of market research. Second, these short conversational bursts of text present a challenge that is more easily addressed with current technology. Understanding tweets is not easy; however, it is much easier than replacing the human mind, which is what is being asked when the purpose of the analysis is to automate the creation of Intelligent Content.

Some Content Intelligence vendors do talk about replacing the human effort required to create Intelligent Content with Content Intelligence features such as “automated tagging” and “entity extraction.” This is usually presented in an all-or-nothing fashion. While Intelligent Content and Content Intelligence are obviously two sides of the same problem, their respective technologies have been developed and marketed almost completely in isolation. They are treated as so mutually exclusive today that their respective vendors normally compete for the same customer business, forcing organizations to decide whether to approach the problem through manual effort or through automation.

As consultants, we would suggest that both the tools for creating Intelligent Content and the tools for Content Intelligence have room for improvement. Creating Intelligent Content is often tedious and time-consuming for users, even when the tags and other technical aspects of XML are hidden. Automation that can arrive at the same output achieved with manual effort is well beyond the current technical ability of Content Intelligence, marketing claims aside. The “aha moment” we are promoting is the vision of these two technologies working together instead of competing against each other.

There is no reason why Content Intelligence vendors should assume that we must place the burden completely on technologies that mine unstructured data. This assumption makes the problem far more complex than it needs to be and, in most cases, too complex to be solved by currently available technology.

In the same way, Intelligent Content by itself assumes that we must place the burden completely on the users who create the content. This results in usability and change management problems that are far more disruptive than they need to be and that can lead to solutions that sound good but do not work in practice.

When the technologies are combined, the user becomes responsible for a much smaller amount of “data improvement.” At the same time, mining technologies do not have to be as sophisticated, because even a small amount of data improvement provided by the user dramatically reduces the technical challenge of automating the rest.
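
To make the point above concrete, here is a minimal sketch of how one small piece of user-supplied information can shrink the automation problem. Everything in it is an assumption made for illustration: the SENSES lookup table, the candidate_tags function, and the "domain" hint are hypothetical and do not describe any vendor's product or algorithm. Real entity extraction is far more involved; the sketch only shows the general effect of a hint.

```python
# Hypothetical sense inventory (an assumption for this sketch): each
# ambiguous term maps subject domains to the meaning used in that domain.
SENSES = {
    "mercury": {
        "astronomy": "planet",
        "chemistry": "chemical element",
        "automotive": "car brand",
    },
    "jaguar": {
        "zoology": "big cat",
        "automotive": "car brand",
    },
}


def candidate_tags(text, domain=None):
    """Return {term: [possible meanings]} for each known term in text.

    If the author supplied a domain hint that matches one of the term's
    known domains, only that meaning is kept. Otherwise every known
    meaning remains a candidate that the software would have to resolve
    on its own.
    """
    results = {}
    for word in text.lower().split():
        term = word.strip(".,;:!?\"'()")
        if term in SENSES:
            senses = SENSES[term]
            if domain is not None and domain in senses:
                results[term] = [senses[domain]]
            else:
                results[term] = sorted(senses.values())
    return results


text = "The new Mercury outsold the Jaguar last quarter."

# Automation alone: every meaning is still in play.
unaided = candidate_tags(text)

# One field filled in by the author removes the ambiguity.
hinted = candidate_tags(text, domain="automotive")

print(unaided)
print(hinted)
```

Without the hint, the software is left with three candidate meanings for "Mercury" and two for "Jaguar." With a single author-supplied field, each term has exactly one. The user did very little, and the automation had far less to figure out.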