The focus of this article is not to define what data science is; coming up with a correct definition that can be widely accepted is simply out of scope for us today (we will, however, include what an «accepted» definition of data science is). What we want is to focus on the science part of data science.
We hold science in high esteem. It is fair to say that there is a widely held belief that there is something special about science and its «methods». If we attach the «scientific» tag to some claim, idea, or line of reasoning, or we say that some research is «scientific», we unconsciously assume that it carries some sort of merit, some kind of special reliability. But is there anything really special about science? What is this «scientific method» that supposedly leads us to such meritorious and reliable results?
We can find plenty of evidence that science is held in high regard. I like a video in which Richard Dawkins makes an argument for the use of science (especially his concluding remark): «Planes fly, cars drive, computers compute. If you base medicine on science, you cure people. If you base the design of planes on science, they fly. If you base the design of rockets on science, they reach the moon…» he says. Back in Mexico, there was a plethora of advertisements asserting that a particular product had been scientifically shown to be more potent, more efficient, more sexually appealing or in some way «scientifically proven» to be superior to its competitors.
Not everything is milk and honey with the «science» tag. There can also be a certain disenchantment with science because of some, perhaps unconscious, associations with bad parts of our history: hydrogen bombs, pollution, etc. When it comes to data science, there is also some disenchantment (as well as a very high level of skepticism, due to all the buzz around it). An example that comes to mind is the book «Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy» by Cathy O’Neil.
It is important to keep a critical mindset, I dare say, especially in a business setting where you run the risk of being seduced by the power and buzz of terms such as data science. You need to ask yourself what the basis is for the authority and trust we put in this data science thing. Take the academic world, for example: many areas of study are now tagged as sciences by their proponents, and we can presume that this is an effort to imply that their methods are as firmly based and as potentially fruitful as those of a STEM discipline such as physics. Examples of this «tagging» are:
- Political «sciences»
- Social «sciences»
- Library «science»
- Administrative «science»
- Speech «science»
- Forest «science»
- Dairy «science»
- Meat and animal «science»
- Mortuary «science»
- Creation «science»
- Etc. «science»
Please consider the following argument:
Let us assume that the undoubted success of physics over the last three hundred years is attributable to the application of a special method: «the scientific method». Then, if data science is to emulate the success of physics, it must do so by first understanding what this method is and then applying it to data science.
At least two fundamental questions are raised by this argument:
- What the %#@! is this scientific method thing that is argued to be the key to the success of physics?
- Is it a legitimate goal to try to transfer the method from physics and apply it to data science in a business context?
Well, answering these questions is not trivial. A first attempt to capture our intuitions about what the answers could be is the following:
The knowledge that (data) science offers is derived from the facts, from the data, rather than being based on personal opinion.
By this we mean that, while our personal opinions may differ over the quality of the compositions of A$AP Rocky and Frédéric Chopin, there is no room for such a difference of opinion on the relative merits of Galileo’s and Einstein’s theories of relativity. It is the facts that determine, presumably, the superiority of Einstein’s ideas over previous concepts of relativity… anyone who fails to appreciate this is «just wrong».
In future articles we will present reasons for doubting that data, or facts, acquired through different «observation methods» and experiments are as straightforward and secure as is traditionally assumed. This is especially true in the enterprise context, where it is hard to find a data scientist who claims to spend less than 75% of their time on data cleaning, and where data cleaning is the lesser evil when it comes to trusting data in industry.
Even in the hard sciences, one can find strong evidence to back the claim that scientific knowledge can neither be conclusively proved nor conclusively disproved by reference to data, or facts, even if we take the quality of the data collection for granted. Some arguments supporting this skepticism are based on an analysis of the nature of observation. Others are based on the nature of logical reasoning and its limitations. When we look at the history of science, an embarrassing result is that the episodes regarded as most characteristic of major scientific advances (e.g. the innovations of Galileo, Newton, Darwin or Einstein) do not match what standard accounts of science say they should be like.
One reaction to the situation presented here is a line of thought holding that, since scientific theories cannot be conclusively proved or disproved, and since the philosophers’ constructions of what the scientific method is bear little resemblance to what actually goes on in the real world, one should give up the idea that science is a rational activity operating according to a special (scientific) method. According to this anarchistic current of thought, science has no special features that make it intrinsically superior to other kinds of knowledge such as mythology or witchcraft. Its proponents see the high regard for science as a modern religion, playing a role similar to that of Christianity in Europe in earlier times. This line of thought suggests that choosing between scientific theories boils down to the subjective values and ideals of individuals.
We will resist adopting this kind of anarchistic response to the difficulties with traditional accounts of science and the scientific method in our subsequent explorations of the meaning of science in data science.
In future entries, we will adopt a pragmatic point of view and accept what is valid in the challenges that the anarchists of science present. We will try to give an account of science that captures what is truly different, special and core in the features of the scientific method, in a way that stands up to the anarchistic point of view. After all, we truly believe that there is real power in the methods driving the advances of a science such as physics, and even though it is not a trivial task, we will argue that it is possible to transfer, if not all, at least the core features of scientific thinking into data science.
The digitalization agenda is very high among the Norwegian government’s priorities (yes, I have to mention it: in Norway, amazingly enough, the first minister of digitalization comes from the Conservative Party). Perhaps this race for digitalization is one of the reasons why data tends to get more weight than science in data science (big data!). Some reasons to question this weighting are:
- John Tukey’s quote: “The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” You may have 100 GB of data, of which only 3 kB are useful for answering the real question you care about.
- When you start with the question you often discover that you need to collect new data or design an experiment to confirm you are getting the right answer.
- It is easy to discover “structure” or “networks” in a data set. There will always be correlations for a thousand reasons if you collect enough data. Understanding whether these correlations matter for specific, interesting questions is much harder.
- Often the structure you find on the first pass is due to phenomena (measurement error, artifacts, data processing) that don’t answer an interesting question.
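The point that enough data always yields correlations can be made concrete with a small simulation. What follows is a minimal sketch in Python, assuming purely synthetic data: every column is independent Gaussian noise, so any correlation between two columns is spurious by construction. The function names and the table size (20 rows, 200 columns, hence 19,900 column pairs) are illustrative choices of ours, not something taken from the sources quoted above.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_rows=20, n_columns=200, seed=0):
    """Hypothetical helper: build a table of pure noise and return the
    strongest correlation found between any two of its columns.

    Every column is drawn independently, so the true correlation of every
    pair is zero; whatever we find is an artifact of searching many pairs.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)]
               for _ in range(n_columns)]
    # Search all n_columns * (n_columns - 1) / 2 pairs for the largest |r|.
    pairs = itertools.combinations(range(n_columns), 2)
    i, j = max(pairs, key=lambda p: abs(pearson(columns[p[0]], columns[p[1]])))
    return pearson(columns[i], columns[j]), (i, j)


if __name__ == "__main__":
    r, (i, j) = strongest_spurious_correlation()
    print(f"columns {i} and {j} correlate with r = {r:.2f}, yet both are noise")
```

With only 20 observations per column, the strongest pair typically shows an absolute correlation well above 0.5, a structure that answers no question at all. Adding more columns makes the best spurious correlation stronger, while adding more rows weakens it; deciding whether a correlation matters still requires a question and, often, new data or an experiment.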
We will try to convince you that:
The keyword in data science is not data, it is science
About the writer
Arturo is a technologist born in Veracruz, Mexico and has been living in Norway since 2010. He completed his BSc and MSc in Mexico and his PhD in Trondheim, Norway. He is taking the last semester of his Executive MBA at BI in Oslo, to be concluded in September 2020.
During his experience as a scientific researcher, which started with the publication of a couple of papers in international journals after his BSc thesis, he has participated in several research projects. The results of these projects can be found in the 6 research papers shared below. The topics of these projects range from theoretical physics, mathematics and statistics to applications of Big Data technologies to understand human behavior. Part of Arturo’s training as an academic includes the use of different programming languages and scientific tools such as Mathematica, Matlab and C++, with Python being one of his most common go-to tools.
After concluding his PhD work at NTNU, Arturo made the transition into the private sector in 2015. There he led the development of the Mobility Analytics service in the business division of Telenor Norway, from the strategic aspects all the way to the technical ones. Arturo not only worked on tasks such as developing go-to-market strategies, but also on the hands-on development of GIS data visualizations and of the code base powering the Mobility Analytics service.
His extensive experience as a leader and communicator in both the scientific and private sectors has made Arturo a firm believer that leaders and executives rarely make decisions based on hard, cold numbers alone, and that the results of analytic work must be communicated through great storytelling and an understanding of the business mechanisms through which value can be created. This realization led Arturo to start his EMBA program in 2019.
During his EMBA studies, which specialize in managing and developing digital enterprises and include a leadership program, Arturo has expanded his business knowledge to include fields such as digitalization strategies, innovation frameworks (such as design thinking, lean and agile), entrepreneurship, HR management, and value creation and capture.
As his professional track record shows, Arturo is always eager to learn and apply his knowledge. He has taken several courses and certifications in Machine Learning and Artificial Intelligence and has virtually never stopped studying, even after concluding his doctoral degree.
Arturo enjoys mentoring and helping colleagues and people in general. He is always happy to grab a coffee, a slice of pizza or a pint of beer, so… feel free to send him a message and connect with him, even if you haven’t met him in person.