Data mining typically refers to the statistical analysis of large volumes of data to identify potentially useful patterns and relationships in these data. In particular, data mining practitioners seek undiscovered, unexpected, or previously unsubstantiated patterns and relationships within the databases they are analyzing. Assisted by computers, these practitioners look for indicators they can use to generate accurate forecasts of future conditions and adaptive responses to current or future conditions. Although most data mining projects follow a similar process, this process is applied to a wide range of decision-making contexts, including health care management, strategic communication, and epidemiological research, to name only a few.

Databases used in data mining often involve information about individual patients, consumers, citizens, or students, depending on the type of database. For this reason, corporations, governments, and other organizations that use data mining are often asked to take steps to address concerns for individuals' privacy.

The term data mining highlights the central concept of its definition: In the same way that miners searching for buried minerals dig through vast quantities of less valuable materials to extract what they seek, data miners evaluate large quantities of data to identify indicators that can be used to enhance subsequent decision making. Like traditional forms of mining, data mining usually begins as an exploratory process, with data miners willing to look for insights in whatever areas of the database prove promising. For this reason, data mining generally relies on technology to sift through large databases, with data miners' statistical expertise and topical knowledge (e.g., about the health care industry) guiding the automated data processing.

Typical data mining projects aim to maximize positive outcomes for a particular organization while minimizing its costs. Common goals include retaining current customers, employees, or donors; identifying new and potentially lucrative opportunities for the organization's products or services; or customizing the organization's marketing communications for specific audiences. The data used in such projects can come from an almost limitless number of sources, often incorporating Internet browser tracking data; customer transaction histories; medical, government, and academic records; and global positioning system (GPS), text, audio, or image data. Because data mining is an exploratory process, its practitioners may use a range of data sources and analysis techniques, with the aim of generating useful recommendations based on past events, establishing relationships between important events across time, or developing indicators that can be used to predict future events.

Data Mining Process

Data mining generally follows a process of constructing statistical models that describe important known outcomes in the available data and then using these models to predict or improve outcomes in scenarios where the outcome is not yet known. For example, in a health communication context, data mining might be used to construct a model of risk factors for a particular disease, using a large database of hospital admission records to extract patterns of demographic and behavioral characteristics of patients diagnosed with the disease. These patterns of characteristics could then be used to more efficiently target disease-related communications (e.g., about behaviors relevant to disease prevention) to the healthy audiences most at risk for developing the disease.
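
The health communication example can be illustrated with a minimal Python sketch. It is not a method prescribed by this entry: a simple frequency table stands in for a statistical model, the records are made up for illustration, and the field names (age_group, smoker, diagnosed) and both function names are hypothetical. The sketch estimates a diagnosis rate for each patient profile from records with known outcomes, then ranks people whose outcome is not yet known by that estimated rate.

```python
from collections import defaultdict

# Made-up illustrative records with known outcomes (assumption, not real data):
# (age_group, smoker, diagnosed)
records = [
    ("60+", True, True),
    ("60+", True, True),
    ("60+", True, False),
    ("60+", False, True),
    ("60+", False, False),
    ("under 60", True, True),
    ("under 60", True, False),
    ("under 60", False, False),
    ("under 60", False, False),
    ("under 60", False, False),
]


def fit_risk_table(rows):
    """Estimate the diagnosis rate for each (age_group, smoker) profile."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [diagnosed, total]
    for age_group, smoker, diagnosed in rows:
        profile = (age_group, smoker)
        counts[profile][1] += 1
        if diagnosed:
            counts[profile][0] += 1
    return {profile: d / n for profile, (d, n) in counts.items()}


def rank_by_risk(risk_table, people, default=0.0):
    """Sort (name, age_group, smoker) tuples from highest to lowest estimated risk.

    Profiles never seen in the records fall back to `default`.
    """
    return sorted(
        people,
        key=lambda p: risk_table.get((p[1], p[2]), default),
        reverse=True,
    )


# Step 1: describe known outcomes.
risk_table = fit_risk_table(records)

# Step 2: apply the pattern to people whose outcome is not yet known.
audience = [
    ("A", "under 60", False),
    ("B", "60+", True),
    ("C", "under 60", True),
]
ranked = rank_by_risk(risk_table, audience)
```

In practice a project of this kind would likely use a proper statistical model and far more records, but the two steps, describing known outcomes and then scoring unknown cases, are the same.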

...
