Machine learning is a multidisciplinary field focused
on algorithms that learn using concepts from statistics, artificial
intelligence, cognitive science, and many other disciplines (Qiu, Wu, Ding, Xu, & Feng, 2016). Supervised learning utilizes a training set
containing inputs and desired outputs.
Supervised learning is typically concerned with classification,
regression, and estimation.
Supervised learning can be applied to large data
sets in which all elements are statistically significant, because it is not
concerned with the relationships among the elements but rather with fitting a curve (in the case of
regression) in a way that minimizes error.
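As a minimal sketch of what "fitting a curve in a way that minimizes error" can mean, the following fits a straight line by ordinary least squares. The function names and the sample numbers are made up for illustration, and a straight line is only one of many possible curve shapes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the observed outputs and the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training set: inputs and desired outputs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, mean_squared_error(xs, ys, slope, intercept))
```

The fit looks only at the gap between predicted and desired outputs; it makes no attempt to explain why the inputs and outputs are related.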
There are many supervised machine learning algorithms (Jordan & Mitchell, 2015). In my professional experience, finding the
algorithm that works best for a given dataset is often an exercise in trial and error.
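The trial-and-error loop can be sketched roughly as follows: fit several candidate algorithms on the same training data and keep the one with the lowest error on data held back for testing. The three candidates, the helper names, and the data points are all hypothetical stand-ins chosen to keep the example dependency-free; in practice the candidates would usually come from a machine learning library.

```python
def fit_mean(train):
    """Baseline: always predict the mean of the training outputs."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train):
    """Least-squares straight line through the training points."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(train):
    """1-nearest-neighbor: predict the output of the closest training input."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared error of a fitted model on (input, output) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def best_algorithm(candidates, train, test):
    """Fit every candidate on train and return the name with the lowest test error."""
    scores = {name: mse(fit(train), test) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

# Made-up data for illustration only.
train = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.9)]
test = [(0.5, 1.0), (2.5, 5.0), (3.5, 7.0)]

winner, scores = best_algorithm(CANDIDATES, train, test)
print(winner, scores)
```

Which candidate wins depends entirely on the dataset, which is why the comparison has to be rerun for each new problem.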
Machine learning, like most approaches based on the scientific method, requires a clear
hypothesis or null hypothesis. In my
professional experience, many organizations think they can learn from their
data without having outlined a clear idea of what they hope to learn. Our team hosts workshops with the explicit
goal of determining which questions require an answer. For example, does increasing advertising
spend correlate with an increase in sales?
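Once a question like this is written down, it can be checked directly. The sketch below computes the Pearson correlation coefficient between advertising spend and sales; the figures are invented for illustration, and the function name is my own. The matching null hypothesis would be that the two are uncorrelated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly figures, not real data.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 108, 121, 135, 140, 162]

r = pearson_r(ad_spend, sales)
print(r)  # close to +1 means spend and sales rise together
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little or none. A strong correlation alone does not show that the advertising caused the sales.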
Creating a good sample is important in both statistics and machine learning. The approach I have seen most often is to create two sets from a labeled data set. The first set is used to train the model, while the second is used to test the accuracy of the trained model. I have not seen a hard and fast recommendation on how big each should be, but I have often seen 80% designated for training and 20% for testing. The motivation behind this is to guard against overtraining (overfitting) the model. The basic idea is that an algorithm might be very good at classifying the data it has seen before, but very poor when it sees new data, which would result in very poor performance in real use. The selection of the items for each group is typically random.
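A random 80/20 split can be sketched in a few lines. The function name, the `seed` parameter, and the toy labeled data are assumptions made for this example. The `memorizer` model is a deliberately overtrained stand-in: it simply looks up the labels it saw during training, so it is perfect on the training set and falls back to a fixed guess on anything new.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=None):
    """Randomly split items into a training list and a testing list."""
    rng = random.Random(seed)      # a seed makes the split repeatable
    shuffled = list(items)         # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def memorizer(train):
    """Overtrained toy model: recalls seen labels, guesses 0 for unseen inputs."""
    table = dict(train)
    return lambda x: table.get(x, 0)


def accuracy(model, data):
    """Fraction of (input, label) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical labeled data set: (input, label) pairs.
labeled = [(i, i % 2) for i in range(100)]

train, test = train_test_split(labeled, train_fraction=0.8, seed=42)
model = memorizer(train)

print(len(train), len(test))
print(accuracy(model, train), accuracy(model, test))
```

The gap between the two accuracy figures is what the held-back test set is there to reveal: scoring the model only on data it has already seen would hide the problem.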
Machine learning is a powerful way of interacting with
data. Because it leverages many
approaches, including statistics, it facilitates experimentation to reach the
best results. While machine learning is
a powerful tool, a solution that leverages it needs clear goals and a clear purpose to succeed.
References
Jordan, M., & Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
Qiu, J., Wu, Q., Ding, G., Xu, Y., & Feng, S. (2016). A survey of machine learning for big data processing. EURASIP Journal on Advances in Signal Processing, 2016(1), 67. doi:10.1186/s13634-016-0355-x