Technology is advancing, and it is advancing fast! These advances bring changes that affect not only our daily lives but also industry, and in a very dramatic way. A universe of new data sources and new content is emerging all around us: call it IoT in an industrial context, mobile applications, or social network data. There is great potential to create value through these data: innovative insights, a better understanding of problems in industries or even in societies, and even opportunities to predict and perhaps shape the future.
How can we tap into this promising potential? Well, we can argue in favor of data science. We can go as far as saying that data science is the principal means to discover and tap into this potential to create value. Data science, when done properly, can enable an organization to deal with Big Data: see underlying patterns, discover relationships between dynamic variables, detect anomalies, and make sense of unstructured data such as images and text.
However, even with the great promise that data science holds for industry, things are not so straightforward. Data science in the industrial context can be a dangerous thing if not handled with care! When top management sees the leading technology companies embrace data science, machine learning, artificial intelligence, and many other exciting digital technologies, they feel pressured to do the same. So top management commits to the AI wave with large investments, which, to their dismay, can end up producing disappointing results.
How can an organization operationalize quickly and effectively to take advantage of the use of data science? That will perhaps be the topic of future posts! In this entry, we will examine causes that can make data science initiatives go «off track»… Ten digital sins, if you wish.
1. Lack of understanding of data science makes it hard to define a vision for an analytics initiative
Well, let’s face it: not everyone has studied statistical analysis at a deep level. Not everyone holds a PhD in a STEM field, and even though it is said to be one of the sexiest jobs of the 21st century, not everyone is a data scientist, yet.
In our experience as data scientists, we often see that executives lack a good understanding of the different types of analytics (descriptive, diagnostic, predictive, prescriptive, and cognitive analytics). So management doesn’t understand what analytics is at a fundamental level. What is the problem? Well, without a solid understanding of analytics, they might struggle to define problems whose solution will generate value. If they define projects to attack the «wrong problems», they will end up building a team and a set of skills that turn out to be redundant. As a result of this mismatch, the data science pilots or PoCs are almost guaranteed to fail to get traction. This is why we often see data science projects end up as a sort of «curiosity» in the organization, and failing to deliver value after the investments are made can make skepticism about data science grow.
2. Not determining the value that can be delivered or the attainability of the project
As fancy as it might sound, data science, machine learning, and artificial intelligence are not what sci-fi can make us believe. Data scientists come with fancy titles included (they can help with the making of really cool article headlines such as «astroparticle-physicists now are leading machine learning in banking»), and often companies that offer data science services can over-promise.
It is very important to assess how feasible a project is. What are the time horizons? When introducing data science in an enterprise, it is important to do it strategically, ensuring quick wins in the short term while working on larger initiatives (larger in that they bring more value and might take longer) in parallel tracks. The balance between these parallel tracks should be determined after careful consideration of the company’s internal processes and «culture conditions».
Perhaps I have been working as a consultant for too long and I am brainwashed, but I really love matrices like the one shown below. Creating a list of potential use cases and placing them in such a matrix can help management prioritize the data science projects to pursue.
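For readers who like to see this in code, here is a minimal sketch of such a prioritization exercise. It assumes a simple two-axis matrix (value versus feasibility) scored from 1 to 5; the quadrant names, the threshold, the use cases, and their scores are all made up for illustration and are not taken from any specific framework.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    value: int        # estimated business value, 1 (low) to 5 (high)
    feasibility: int  # estimated attainability, 1 (hard) to 5 (easy)


def quadrant(case: UseCase, threshold: int = 3) -> str:
    """Place a use case in one of four cells of a value/feasibility matrix."""
    high_value = case.value >= threshold
    feasible = case.feasibility >= threshold
    if high_value and feasible:
        return "quick win"
    if high_value:
        return "larger initiative"
    if feasible:
        return "fill-in"
    return "avoid for now"


def prioritize(cases: list[UseCase]) -> dict[str, list[str]]:
    """Group use case names by quadrant, highest value * feasibility first."""
    ranked = sorted(cases, key=lambda c: c.value * c.feasibility, reverse=True)
    matrix: dict[str, list[str]] = {}
    for case in ranked:
        matrix.setdefault(quadrant(case), []).append(case.name)
    return matrix


# Made-up candidates and scores, for illustration only.
candidates = [
    UseCase("Churn prediction", value=4, feasibility=4),
    UseCase("Predictive maintenance", value=5, feasibility=2),
    UseCase("Report automation", value=2, feasibility=5),
    UseCase("Company-wide data cleansing", value=2, feasibility=1),
]

for cell, names in prioritize(candidates).items():
    print(f"{cell}: {', '.join(names)}")
```

The scores themselves are the hard part and should come out of a discussion between management, domain experts, and data scientists; the code only makes the resulting ranking explicit.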
3. Lack of strategy (good strategy/bad strategy)
So, the executives decided on a few use cases to prioritize in order to obtain the much-desired quick wins and deliver on the value promise. They ran very ad-hoc projects and delivered successfully. So everybody packs their bags, goes home, and lives happily ever after. Well, not quite…
There must be a good strategy for how to generate value with analytics past the quick wins. AI should deliver sustainable competitive advantages. Describing what makes a good strategy is beyond the scope of this article. However, as I am about to finish «Good Strategy Bad Strategy: The Difference and Why It Matters» by Richard Rumelt, I want to mention that, according to this book, the «kernel» of a good strategy should contain:
- A diagnosis that defines or explains the nature of the challenge (adapted to this: what are we trying to solve that is feasible to solve by data science)
- A guiding policy for dealing with the challenge
- A set of coherent actions that are designed to carry out the guiding policy
4. Lack of clearly defined roles in an analytics project
If the executives in a company do not have a solid understanding of what data science is, it is extremely difficult for them to understand the different roles and skill sets that are necessary to make a data science project successful. There is no general recipe; each data science project can have different skill needs. What is very important is that these needs are defined properly and that a set of roles is agreed upon.
Due to the state of the job market in Norway, it is sometimes difficult to define roles. I have interviewed people with years of experience as «data scientists», yet when I asked them to describe the analytics work they had performed, they talked about getting data they could analyze in Excel. There is nothing wrong with Excel, don’t get me wrong! But having a team of 10 such data scientists covering all the needs in a data science project is not really ideal.
5. Lack of analytics «bridges» or «translators»
A trend we have seen is a cultural divide between decision-makers in the enterprise and the data scientists «crunching the data». Within performance management practice, there is a consistent disconnect between the data scientists and the decision-makers they support. Many executive decision-makers (line managers, heads of «x», CXOs, etc.) seem to have a somewhat dismissive attitude toward both the data itself and those responsible for delivering it, an attitude that is perhaps caused by a lack of understanding of each other’s «worlds».
Evidence suggests that the origin of the problem is bad communication. During my MBA studies, and also from a few customers, I have heard phrases such as «Data scientists are too arrogant, they don’t see a need to explain or talk about the implications of their findings». What can be done about this gap between two worlds, this «interpretation gap»?
Enter the «data bridge» or «data translator». Even though there are strong arguments for a data scientist to take this role, it seems that in many cases the role is best filled by a domain expert. A data translator is an expert who can bridge the business and analytics worlds by identifying high-value use cases, communicating business needs to data scientists, and generating engagement with the data-science-enabled product.
6. Isolation (Do not put the data scientist in the corner!)
There is more than one way of organizing data science teams within a company. A great way to do it is by building what is called a «center of excellence». In this approach, the data scientists are supposed to work as some sort of internal consultant, being sent to different analytics projects within the organization.
However, over-centralization can create bottlenecks and lead to a lack of business buy-in. It can result in poorly coordinated silos rather than an organization in which analytics and business experts work closely together.
A definite sign that the current model of organization might not be working as desired is the complaint from a data scientist that her work has little or no impact and that the business keeps doing what it was doing before the data science projects existed. Data scientists are a difficult talent to attract and to retain, so we recommend that managers keep an ear to the ground for those kinds of complaints.
7. Tidying of data all over the place
Very often, executives seem to have the idea that the first step after adopting a «data science first» mindset should be a massive data cleansing project, one that aims to tidy and clean all the data in the company. This might seem like the only sensible option, but consider this: what actual value is created by cleaning all the data in the organization? If there is any, it is difficult to measure.
Data cleaning and data consolidation should instead be aligned with the most valuable (and feasible) use cases: the ones in the grey zone in our prioritization matrix above.
8. Let’s build the biggest data lake, my data lake is bigger than yours…
Commitment in the data science realm is great; commitment in the data science realm without a clear strategy can be suicidal. Building a full-blown analytics platform before identifying the right business cases to tackle, or working out an architecture such as a data lake without actually knowing what it will be needed for, is something I have actually experienced in Norway. These platforms and new architectures sometimes even need to connect to legacy systems, because you know, let’s have all the data ready for when we know what to do with it… This does not work and is one of the greatest dangers of adopting data science without a clear strategy.
9. Lack of clear metrics of success
Do not become one more in the list of companies that are spending millions of NOK on advanced analytics and digitization but are not able to attribute any real impact to these investments.
The bottom-line impact of analytics projects should be quantified within a good performance management framework with clear pre-defined metrics for tracking each project.
One way to measure the success of a data science project is through the reward function. Any AI model you build or incorporate into your organization is guided by a reward function, also called an “objective function”, or “loss function.” This is a mathematical formula, or set of formulas, that the AI model uses to determine “right” vs. “wrong” predictions. It determines the action or behavior your system will try to optimize for and will be a major driver of the final user experience.
When designing your reward function, you must make a few key decisions that will dramatically affect the final experience for your users and the value that the data science initiative can bring. Designing your reward function should be a collaborative process across disciplines. Your conversations should include UX, Product, and Engineering perspectives at the minimum. Throughout the process, spend time thinking about the possible outcomes, and bounce your ideas off other people. That will help reveal pitfalls where the reward function could optimize for the wrong outcomes.
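To make this concrete, here is a deliberately simplified sketch of how the choice of objective changes behavior. It assumes a binary classifier that outputs scores between 0 and 1, and a hypothetical cost function in which false positives and false negatives can be priced differently. The labels, scores, thresholds, and costs are made up for illustration; real models optimize their objective during training rather than by picking a threshold afterwards.

```python
def expected_cost(y_true, scores, threshold, fp_cost=1.0, fn_cost=1.0):
    """Average cost of thresholded predictions under asymmetric error costs."""
    total = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            total += fp_cost   # false positive
        elif predicted == 0 and label == 1:
            total += fn_cost   # false negative
    return total / len(y_true)


def best_threshold(y_true, scores, thresholds, fp_cost=1.0, fn_cost=1.0):
    """Pick the candidate threshold with the lowest average cost."""
    return min(
        thresholds,
        key=lambda t: expected_cost(y_true, scores, t, fp_cost, fn_cost),
    )


# Made-up labels and model scores, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.45, 0.4, 0.6, 0.8, 0.9]
thresholds = [0.3, 0.5, 0.7]

# Both kinds of error cost the same.
print(best_threshold(y_true, scores, thresholds))

# Missing a positive case is five times worse than a false alarm.
print(best_threshold(y_true, scores, thresholds, fn_cost=5.0))
```

With equal costs the middle threshold wins; once a missed positive is priced five times higher, the lowest threshold wins and the system raises more false alarms in exchange for fewer misses. Which of the two behaviors is «right» is a business decision, which is exactly why the design should not be left to the data scientists alone.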
10. Don’t be creepy, don’t violate GDPR
Data science, machine learning, artificial intelligence, and technology, in general, should be about people. It is paramount to keep ethical, social, and regulatory implications always in mind. Otherwise, you risk being vulnerable to missteps when it comes to data acquisition and use, algorithmic bias, and other risks.
If ethics do not strike a nerve, think about the fines that GDPR has brought to the data game.
Technology and its adoption in the enterprise in Norway move fast, and there is no time to waste! But hey, let’s try to get value creation through data science right. Do not let your company get caught up in the hype: even though there is no time to waste, you should not rush into initiatives that can cost large amounts of money and time and return no value. (Perhaps a topic for another post: once value is created through data science, how do we ensure we capture it?)
Try to keep these ten deadly sins in mind so you can avoid them in your data science project.
About the writer
Arturo is a technologist born in Veracruz, Mexico and has been living in Norway since 2010. He completed his BSc and MSc in Mexico and his PhD in Trondheim, Norway. He is taking the last semester of his Executive MBA at BI in Oslo, to be concluded in September 2020.
During his time as a scientific researcher, which started with the publication of a couple of papers in international journals after his BSc thesis, he has participated in several research projects. The results of these projects can be found in the 6 research papers shared below. The topics of these projects range from theoretical physics, mathematics, and statistics to applications of Big Data technologies to understand human behavior. Part of Arturo’s training as an academic includes the use of different programming languages and scientific tools such as Mathematica, Matlab, and C++, with Python being one of his most common go-to tools.
After concluding his PhD work at NTNU, Arturo made a transition into the private sector in 2015. He started by leading the development of the Mobility Analytics service in the business division of Telenor Norway, from the strategic aspects all the way to the technical ones. He worked not only on tasks such as developing go-to-market strategies but also on hands-on development of GIS data visualizations and of the code base that powers the Mobility Analytics service.
The extensive experience as a leader and communicator in the scientific and private sector fields has made Arturo a firm believer that it is not often that leaders and executives make decisions based on hard, cold numbers, but that it is necessary to communicate the results from the analytic work via great story-telling and understanding of the business mechanisms in which value can be created. This realization made Arturo take the decision to start his EMBA program in 2019.
During his EMBA studies, which have a specialization in managing and developing digital enterprises and include a leadership program, Arturo has expanded his business knowledge to include fields such as digitalization strategies, innovation frameworks (such as design thinking, lean, and agile), entrepreneurship, HR management, and value creation and capture.
As his professional track shows, Arturo has a mind eager to always learn and apply his knowledge. He has taken several courses and certifications within Machine Learning and Artificial Intelligence and has virtually never stopped studying, even after concluding his doctoral degree.
Arturo enjoys mentoring and helping colleagues and people in general. He is always open to grabbing a coffee, a slice of pizza, or a pint of beer, so… feel free to send him a message and connect with him, even if you haven’t met him in person.
2 comments on “Ten sins in data science projects”
Great article, Arturo! Interesting structure and prioritisation of the “sins” which is very thought provoking – agree or disagree, reading through it brings value. I think you’ve nailed it with the understanding of what data science can give you and the business strategy.
I am glad you enjoyed it, Radu! I will be publishing some more strategy related content soon…