Abstract:
Over the past two decades, Data-Intensive Analysis has emerged not only as a basis for
the Fourth Paradigm of engineering and scientific discovery but as a basis for discovery
in most human endeavors for which data is available. Although it originated in the 1960s, its recent
emergence, driven by Big Data and massive computing power, is leading to widespread deployment;
yet it remains in its infancy in its application and in our understanding of it, and hence in its development.
Given the potential risks and rewards of Data-Intensive Analysis and its breadth of application,
it is imperative that we get this right.
The perspective taken here is, first, that the objective of this emerging Fourth Paradigm,
like that of its predecessor, the Scientific Method, is more than merely acquiring data and extracting knowledge;
it is to investigate phenomena by acquiring new knowledge, correcting it, and integrating it
with previous knowledge; and, second, that Data Science is a body of principles and techniques
with which to measure and improve the correctness, completeness, and efficiency of Data-Intensive Analysis.
It is now time to identify and understand the fundamentals. This perspective is used to analyze
more than 30 very large-scale use cases in order to understand current practice, gain insight
into the fundamentals, and address the fourth “V” of Big Data: veracity. Developing these fundamentals may take decades.