DigiBored: December 2011

“The future success of companies and organizations will increasingly be based on their ability to unlock hidden intelligence and value from unstructured data, and text in particular”

(The 451 Group, 2005)

The Information Age has made organisations data rich: rather than being starved of information, they now have large volumes of data available to them. This presents a challenge, because organisations often lack the resources to convert that data into useful intelligence.

Amongst this data is qualitative data, i.e. a person’s opinion of a product or service, which can be far more useful than hard facts and figures for developing insight. However, translating qualitative data into knowledge is hard work. This information is usually collated as text made up of observations and quotes, and turning it into useful intelligence takes considerable organisational resources, including time, people and technology.

One method available to organisations for developing this knowledge is text mining. Text mining has traditionally been associated with the academic and research fields, but it is now becoming more widespread as commercial organisations recognise that the unstructured data they hold is as valuable as their structured data, since it offers a more holistic view of the organisation. As a result, many companies now offer text analytics, the business equivalent of text mining, as a solution for analysing this data (Feldman, 2004).

So what is text mining?

Text mining is “the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources… linking together the extracted information together to form new facts or new hypotheses…” (Hearst, 2003).

The key tasks of text mining include:

· Classification - The process of tagging new data based on its features.

· Clustering - The process of grouping documents based on similarities in their content.

· Association - The process of organising information into hierarchical networks. (Leong et al., 2004)
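To make the first two tasks more concrete, here is a minimal Python sketch. It is illustrative only and rests on my own assumptions: classification is done with hand-written keyword rules (the `LABEL_KEYWORDS` labels are hypothetical), whereas production systems usually train a statistical classifier, and clustering uses a simple greedy grouping by Jaccard word overlap rather than an algorithm such as k-means. Association, as Leong et al. define it, is not shown.

```python
import re

# Hypothetical keyword rules: a label is assigned if any of its keywords appears.
LABEL_KEYWORDS = {
    "complaint": {"broken", "refund", "late", "disappointed"},
    "praise": {"great", "love", "excellent", "recommend"},
}


def tokenize(text):
    """Lower-case the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text):
    """Classification: tag a new document based on its features (its words)."""
    words = tokenize(text)
    labels = [label for label, keys in LABEL_KEYWORDS.items() if words & keys]
    return labels or ["other"]


def jaccard(a, b):
    """Similarity of two word sets: shared words divided by all distinct words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.3):
    """Clustering: greedily group documents with similar content.

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    word_sets = [tokenize(d) for d in docs]
    clusters = []
    for i, words in enumerate(word_sets):
        for group in clusters:
            if jaccard(words, word_sets[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

For example, `classify("The delivery was late and the box was broken")` tags the comment as a complaint, and two near-identical delivery complaints land in the same cluster while a positive review starts its own. The 0.3 threshold is an arbitrary choice; raising it produces more, tighter clusters.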

Transforming unstructured text into intelligence

Qualitative data can be taken from many different sources, including emails from customers, employees and suppliers, and research reports. It can also be published information about competitors, such as promotional materials and customer comments, as well as information on legislative and regulatory changes (Leong et al., 2004).

This data is then turned into a structured, relevant form. The “mining process” is made up of three elements:

1. “information selection and preprocessing;

2. patterns analysis, recognition and visualization; and

3. validation and interpretation” (Zhang and Segall, 2010, p.625).
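The three elements can be sketched as a small Python pipeline. This is a simplified illustration under stated assumptions, not Zhang and Segall’s method: the stop-word list is a hypothetical toy, “patterns” are reduced to document frequencies, visualisation is left out, and “validation” is just a hypothetical rule that keeps terms appearing in at least half of the documents.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "i", "this"}


def preprocess(doc):
    """Stage 1, selection and preprocessing: lower-case, tokenise, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOP_WORDS]


def find_patterns(token_lists):
    """Stage 2, pattern analysis: count in how many documents each term occurs."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # each term counted once per document
    return doc_freq


def validate(doc_freq, n_docs, min_share=0.5):
    """Stage 3, validation (assumed rule): keep terms found in at least min_share of documents."""
    return {term: n for term, n in doc_freq.items() if n / n_docs >= min_share}


def mine(docs, min_share=0.5):
    """Run the three stages in order and return the surviving terms with their counts."""
    token_lists = [preprocess(d) for d in docs]
    return validate(find_patterns(token_lists), len(docs), min_share)
```

Run on three customer comments, “The delivery was slow”, “Slow delivery and poor packaging” and “Great price”, `mine` keeps only `delivery` and `slow`, the terms shared by two of the three comments. That is the kind of recurring theme an analyst would then interpret.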

Mining this information enables organisations to enhance their business intelligence (BI). Wang and Wang (2008, p.623) state that “the central theme of BI is to fully utilize massive data to help organizations gain competitive advantages”. BI helps organisations to discover the knowledge hidden in their data.

Text mining (TM) helps organisations to develop knowledge using “various algorithms and tools to extract metadata or high-level information and/or to discover patterns and relationship within the extracted information” (Choudhary et al., 2009, p.730). The resulting knowledge can help decision makers make more informed decisions.
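As a rough illustration of the two halves of that definition, the Python sketch below first extracts simple metadata from each document (a word count and which entities it mentions) and then looks for relationships by counting which entities are mentioned together. The `ENTITIES` vocabulary is hypothetical, and co-occurrence counting is only one simple way of finding relationships, chosen here for brevity.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of entities of interest (e.g. product features).
ENTITIES = {"battery", "screen", "camera", "price"}


def extract_metadata(doc):
    """Extract simple high-level information about one document."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {"word_count": len(words), "entities": sorted(ENTITIES.intersection(words))}


def co_occurrences(docs):
    """Discover relationships: count how often two entities appear in the same document."""
    pairs = Counter()
    for doc in docs:
        found = extract_metadata(doc)["entities"]  # sorted, so each pair has one fixed order
        pairs.update(combinations(found, 2))
    return pairs
```

With the reviews “The battery dies fast and the screen is dim”, “Great screen, weak battery” and “Camera is fine for the price”, the pair `("battery", "screen")` is counted twice and `("camera", "price")` once, hinting that battery and screen comments tend to appear together.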

The difference between data, text and web mining

Zhang and Segall (2010, p.625) highlight the differences between data, text and web mining, stating that “Data mining primarily deals with structured data organized in a database. Text mining mostly handles unstructured data/text. Web mining lies in between and copes with semi-structured data and/or unstructured data”.
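The distinction can be illustrated by how a program gets at each kind of data. In this hypothetical Python sketch the sample records are invented: the structured CSV can be queried by field name, the semi-structured HTML uses its markup to locate the text, and the unstructured comment offers nothing but words to search.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Structured: fixed fields, ready to query (data mining's territory).
STRUCTURED = "customer,product,rating\nC1,kettle,2\nC2,toaster,5\n"

# Semi-structured: markup wraps free text in partial structure (web mining).
SEMI_STRUCTURED = '<div class="review"><h2>Kettle</h2><p>Leaks after a week.</p></div>'

# Unstructured: plain text with no fields at all (text mining).
UNSTRUCTURED = "Bought the kettle last month and it already leaks. Not impressed."


def query_structured(raw):
    """Select low-rated rows by field name; no reading of language needed."""
    return [row for row in csv.DictReader(io.StringIO(raw)) if int(row["rating"]) < 3]


class TextCollector(HTMLParser):
    """Use the markup to collect the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def parse_semi_structured(raw):
    parser = TextCollector()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    return parser.parts


def scan_unstructured(raw, term):
    """With no fields to rely on, the only option is to search the words themselves."""
    return term in re.findall(r"[a-z]+", raw.lower())
```

Here `query_structured` returns the low-rated kettle row without interpreting any language, while the same complaint (“leaks”) can only be found in the free text by scanning its words, which is where text mining techniques come in.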

OK, so that’s Part 1. In Part 2, I’ll outline the ways in which commercial organisations are using text mining to move their business forward. See you next time!

Note: I developed this post a while ago, initially as part of an essay for the MMU Information and Communications department.

References

CHOUDHARY, A.K., OLUIKPE, P.I., HARDING, J.A. and CARILLO, P.M., 2009. The needs and benefits of text mining applications on Post-Project Reviews. Computers in Industry [online]. 60 [cited 12 March 2011] pp. 728-740.

FELDMAN, R., 2004. Text Analytics: Theory and Practice, ClearForest Corporation. [online] Available at: [cited 27 March 2011].

HEARST, M., 2003. What is text mining? Berkeley, [online] Available at: [cited 2 March 2011].

LEONG, E.K.F., EWING, M.T. and PITT, L.F., 2004. Analysing competitors’ online persuasive themes with text mining. Marketing Intelligence & Planning [online]. 22 (2) [cited 14 March 2011] pp. 187-200.

THE 451 GROUP, 2005. Text-aware Applications: The Endgame for Unstructured Data Analysis. cited by CLARABRIDGE, 2008. Text Mining’s Moment: The three trends triggering commercial adaptation. clarabridge.com, [online] Available at: [cited 1 March 2011].

WANG, H. and WANG, S., 2008. A knowledge management approach to data mining process for business intelligence. Industrial Management & Data Systems [online]. 108 (5) [cited 12 March 2011] pp. 622-634.

ZHANG, Q. and SEGALL, R.S., 2010. Review of data, text and web mining software. Kybernetes [online]. 39 (4) [cited 1 March 2011] pp. 625-655.


Friday, 2 December 2011

Mining our knowledge Part 1