
The information that can be extracted from web documents includes customer and reviewer sentiment, brand associations (which terms a brand is associated with), and topic associations (which topics a document is associated with).
It is also possible to extract mentions of named entities (people, places, products, etc.) from news documents and to determine whether each mention is positive or negative.
Typical web text analysis tasks include:
Identifying relevant web pages.
Finding answers to factoid questions.
Determining whether a mention endorses the object of interest or expresses negative sentiment about it.
Identifying similar documents or passages.
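
As a rough illustration of the last task, similarity between two documents can be approximated by comparing their word-count vectors. The sketch below uses cosine similarity over raw term counts; the function names (tokenize, cosine_similarity, most_similar) are illustrative assumptions, and a practical system would likely add term weighting and stop-word handling.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return (index, score) of the document closest to the query."""
    scores = [cosine_similarity(query, d) for d in documents]
    best = max(range(len(documents)), key=scores.__getitem__)
    return best, scores[best]
```

Scores range from 0 (no shared words) to 1 (identical word distributions). Because raw counts ignore word order and meaning, this is only a baseline for the similarity task.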
Raw text obtained from the internet can be used to build text corpora, which can then be cleaned and processed to extract some kinds of structure.
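
One possible first cleaning step, sketched here under the assumption that the raw input is HTML, is to strip markup and keep only the visible text. The example uses Python's standard html.parser module; the names TextExtractor and html_to_text are hypothetical helpers, not part of any library.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps adjacent elements apart, at the cost of
    # splitting a word that has inline markup in the middle of it.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

The output of such a step is still unstructured text; extracting structure (entities, topics, sentiment) would require further processing on top of it.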
Some of the tasks above are still open research areas and are not yet ready for use in products.