What is the difference between data mining, statistics, machine learning and AI?

There is considerable overlap among these, but some distinctions can be made. Of necessity, I will have to over-simplify some things or give short shrift to others, but I will do my best to give some sense of these areas.

Firstly, Artificial Intelligence is fairly distinct from the rest. AI is the study of how to create intelligent agents. In practice, it is how to program a computer to behave and perform a task as an intelligent agent (say, a person) would. This does not have to involve learning or induction at all; it can just be a way to ‘build a better mousetrap’. For example, AI applications have included programs to monitor and control ongoing processes (e.g., increase aspect A if it seems too low). Notice that AI can include darn near anything that a machine does, so long as it doesn’t do it ‘stupidly’.
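To make the ‘no learning required’ point concrete, here is a minimal sketch of such a rule-based monitoring program in Python. Everything here is invented for illustration: the quantity being monitored, the target, and the gain are all hypothetical.

```python
# A hypothetical rule-based controller: monitors a process and reacts
# to each reading with a fixed, hand-written rule. No learning involved.

def control_step(aspect_a: float, target: float = 10.0, gain: float = 0.5) -> float:
    """Return an adjustment for aspect A based on a fixed rule."""
    if aspect_a < target:
        return gain * (target - aspect_a)  # increase aspect A if it seems too low
    return 0.0                             # otherwise leave it alone

# The agent reacts sensibly to each reading without ever inducing
# anything new from past data.
for reading in [7.0, 9.5, 12.0]:
    print(reading, "->", control_step(reading))
```

The program behaves sensibly on each input, but nothing in it is induced from data; every rule was written by hand.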

In practice, however, most tasks that require intelligence require an ability to induce new knowledge from experience. Thus, a large area within AI is machine learning. This involves the study of algorithms that can extract information automatically (i.e., without on-line human guidance). Some of these procedures take ideas directly from, or are inspired by, classical statistics, but that is not a requirement. As with AI, machine learning is very broad and can include almost everything, so long as there is some inductive component to it. An example of a machine learning algorithm might be a Kalman filter.
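As a concrete (and deliberately minimal) illustration, here is a bare-bones one-dimensional Kalman filter in Python. It assumes the simplest possible setup, a random-walk state observed with noise, and the noise variances Q and R are made up for the example:

```python
# A minimal 1-D Kalman filter sketch: random-walk state, noisy measurements.
# Q (process noise) and R (measurement noise) are illustrative values only.

def kalman_1d(measurements, Q=1e-3, R=0.25, x0=0.0, P0=1.0):
    x, P = x0, P0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to follow a random walk.
        P = P + Q
        # Update: blend the prediction with the new measurement.
        K = P / (P + R)        # Kalman gain
        x = x + K * (z - x)
        P = (1.0 - K) * P
        estimates.append(x)
    return estimates

print(kalman_1d([1.1, 0.9, 1.2, 1.0, 0.95]))
```

The inductive component is visible in the update step: each new estimate is revised in light of incoming data, rather than following a fixed hand-written rule.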

Data mining is an area that has taken much of its inspiration and techniques from machine learning (and some, also, from statistics), but is put to different ends. Data mining is carried out by a person, in a specific situation, on a particular data set, with a goal in mind. Typically, this person wants to leverage the power of the various pattern recognition techniques that have been developed in machine learning. Quite often, the data set is massive, complicated, and/or may have special problems (such as having more variables than observations). Usually, the goal is either to discover or generate some preliminary insights in an area where there really was little knowledge beforehand, or to be able to predict future observations accurately. Moreover, data mining procedures can be either ‘unsupervised’ (we don’t know the answer; discovery) or ‘supervised’ (we know the answer; prediction). Note that the goal is generally not to develop a more sophisticated understanding of the underlying data generating process. Common data mining techniques include cluster analyses, classification and regression trees, and neural networks.
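Here is a small sketch contrasting the two modes, using scikit-learn (assumed to be installed). The two-cluster toy data and all parameter choices are invented for the example:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Toy data: two well-separated blobs of points, invented for this demo.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# Unsupervised (discovery): we don't know the answer; look for structure.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised (prediction): we know the answer for past cases; learn to predict.
y = np.array([0] * 20 + [1] * 20)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

print(labels[:5], tree.predict([[0.0, 0.0], [5.0, 5.0]]))
```

Note how neither step says anything about the process that generated the data; both are aimed squarely at finding or exploiting patterns.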

I suppose I needn’t say much to explain what statistics is on this site, but perhaps I can say a few things. Classical statistics (here I mean both frequentist and Bayesian) is a sub-topic within mathematics. I think of it as largely the intersection of what we know about probability and what we know about optimization. Although mathematical statistics can be studied as simply a Platonic object of inquiry, it is mostly understood as more practical and applied in character than other, more rarefied areas of mathematics. As such (and notably in contrast to data mining above), it is mostly employed towards better understanding some particular data generating process. Thus, it usually starts with a formally specified model, and from this are derived procedures to accurately extract that model from noisy instances (i.e., estimation, by optimizing some loss function) and to distinguish it from other possibilities (i.e., inference based on known properties of sampling distributions). The prototypical statistical technique is regression.
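A minimal sketch of that workflow, using only NumPy: specify a simple linear model, estimate it by minimizing squared loss (ordinary least squares), and compute standard errors from the known sampling distribution. The true coefficients (2.0 and 0.5) are chosen just for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)  # the formally specified model, plus noise

# Estimation: minimize squared loss, i.e., ordinary least squares.
X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept + slope
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Inference: standard errors from the sampling distribution of the estimator.
resid = y - X @ beta_hat
sigma2 = resid @ resid / (x.size - 2)         # unbiased estimate of noise variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(beta_hat, se)  # estimates should be near [2.0, 0.5]
```

In contrast to the data mining sketch above, the whole exercise is organized around recovering and reasoning about the data generating process itself.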

Source: http://stats.stackexchange.com/questions/5026/what-is-the-difference-between-data-mining-statistics-machine-learning-and-ai