As business leaders strive to get the most out of their analytics investments, democratized data science often seems to offer the perfect solution. Analytics software with no-code and low-code tools makes data science techniques accessible to virtually anyone. In the best-case scenario, especially as the demand for data scientists far outstrips the supply, this will lead to better decision-making and greater autonomy and self-service in data analysis. Add to this the savings in labor costs (less reliance on costly data scientists) and more scalable customization that tailors analytics to a specific business need and context.
But in all the debate about whether and how to democratize data science and analytics, an important point is being overlooked. The conversation needs to define when to democratize data and analytics, and even to redefine what democratization should mean.
Fully democratized data science and analytics comes with many risks. As Reid Blackman and Tamara Sipes wrote in a recent article, data science is hard, and even with good software it is hard to believe that untrained “experts” can solve difficult problems. Just because you can click a button and get results does not guarantee that the answer is good. In fact, the answer could be badly flawed, and only a trained data scientist would know.
It’s a matter of time
But even with these reservations, the democratization of data science will continue, as the proliferation of software and analysis tools suggests. Thomas Redman and Thomas Davenport are among those who have advocated for the development of “citizen data scientists,” and even for screening candidates for every position for basic data science skills and aptitudes.
However, the democratization of data science should not be taken to extremes. Analytics does not have to be at everyone’s fingertips for an organization to thrive. How many highly talented people would not be hired simply because they lack “basic data science skills”? Such a requirement is unrealistic and overly restrictive.
As business leaders seek to democratize data and analytics within their organizations, the real question they should ask is when it makes the most sense. This starts with recognizing that not every “citizen” within an organization has the skills to be a citizen data scientist, a point that Nick Elprin, CEO and co-founder of Domino Data Labs, which provides organizations with data science and machine learning tools, raised with me in a recent conversation.
Data democratization challenges
Consider a grocery chain that recently used advanced forecasting techniques to optimize its demand planning, in order to avoid having too much inventory (leading to spoilage) or too little (leading to lost sales). Losses from spoilage and out-of-stocks were modest, but the problem of reducing them was very difficult to solve given all the variables of demand, seasonality, and consumer behavior. The complexity of the problem meant that the grocery chain could not leave the solution to citizen data scientists; instead, it relied on a team of well-trained, conscientious data scientists.
As Elprin and I discussed, data citizenship requires “representative democracy.” Just as Americans elect politicians to represent them in Congress (presumably to act in their best interests on legislative matters), organizations need proper representation by data scientists and analysts on issues that others simply lack the expertise to deal with.
That means knowing when and how much to democratize your data. I propose the following five criteria.
Think about the skill level of “citizens.” Citizen data scientists, in one form or another, are here to stay. As mentioned earlier, data scientists are simply in short supply, and relying on this scarce workforce to address every data problem is unsustainable. Furthermore, democratizing data is key to instilling analytical thinking throughout the organization. A well-known example is Coca-Cola, which has developed a digital academy to train managers and team leaders; graduates of the program have produced approximately 20 digital, automation, and analytics initiatives at several sites in the company’s manufacturing operations.
However, it is important to consider the skill level of the “citizen” when tackling predictive modeling and advanced data analytics that can fundamentally change the way a company operates. Sophisticated tools in the hands of data scientists are additive and valuable. When the same tools are used by people who are simply “playing with the data,” the result can be errors, false assumptions, questionable results, and misinterpreted conclusions.
Measure the importance of the issue. The more important the issue is to the company, the more essential it is to have an expert in charge of data analysis. For example, generating a simple graphic of historical purchasing trends could probably be accomplished by someone using a dashboard that presents the data in a visually appealing format. But strategic decisions that have the greatest impact on how a company operates require expertise and dependable precision. For example, how much an insurance company should charge for a policy is so fundamental to the business model itself that it would be unwise to leave this task to non-professionals.
Determine the complexity of the problem. Solving complex problems is beyond the capabilities of the average citizen data scientist. Consider the difference between comparing customer satisfaction scores across customer segments (simple, well-defined metrics, low risk) and using deep learning to detect cancer in patients (complex, high risk). Such complexity cannot be left to non-experts, whose ill-advised decisions may in some cases be simply wrong. Democratizing data makes sense when complexity and risk are low.
As an example, a Fortune 500 company I work for uses data throughout its operations. A few years ago, I ran a training program in which over 4,500 managers were divided into small teams and asked to identify key business problems that analytics could solve. The teams were empowered to solve simple problems using available software tools, but most problems surfaced precisely because they were difficult to solve. Importantly, these managers were not responsible for actually solving those hard problems; rather, they worked with the data science team. Remarkably, these 1,000 teams identified 1,000 business opportunities and 1,000 ways analytics could help their organization.
Empower those with domain expertise. If a company is looking for some kind of “directional” insight (customer X is more likely to buy a product than customer Y), data democratization and low-level citizen data science will probably suffice. In fact, this kind of low-level analysis can be a great way to empower those with domain expertise (that is, those closest to the customer) with some simplified data tools. Greater precision, such as that demanded by complex, high-stakes problems, requires expertise.
Precision matters most when a high-stakes decision hinges on a threshold. For example, if an aggressive cancer treatment plan with serious side effects is undertaken whenever the probability of cancer is greater than 30%, it becomes important to distinguish between 29.9% and 30.1%. Precision is critical in medical, clinical, and technical operations, and for financial institutions navigating markets and risk, which often earn very small margins at massive scale.
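To make the threshold problem concrete, here is a minimal Python sketch. It is an illustration only: the `RiskEstimate` class, the `decide` function, and the margin of uncertainty are hypothetical names and values introduced here, not part of any real clinical tool. The only figure taken from the example above is the 30% cutoff. The idea is that an estimate whose uncertainty range straddles the threshold should be escalated to an expert rather than acted on by whoever clicked the button.

```python
from dataclasses import dataclass

TREATMENT_THRESHOLD = 0.30  # the 30% cutoff from the example in the text


@dataclass
class RiskEstimate:
    """Hypothetical model output: a point estimate plus a +/- margin."""
    probability: float  # estimated probability of cancer
    margin: float       # assumed uncertainty around that estimate


def decide(estimate: RiskEstimate, threshold: float = TREATMENT_THRESHOLD) -> str:
    """Return a decision, escalating when the estimate is too close to call."""
    low = estimate.probability - estimate.margin
    high = estimate.probability + estimate.margin
    if low > threshold:
        return "treat"             # the whole range is above the cutoff
    if high <= threshold:
        return "do not treat"      # the whole range is at or below the cutoff
    return "refer to expert"       # the range straddles the cutoff


for p in (0.10, 0.299, 0.301, 0.45):
    print(p, decide(RiskEstimate(probability=p, margin=0.02)))
```

Under these assumptions, 29.9% and 30.1% both land in the "refer to expert" band, because a two-point margin cannot separate them from the cutoff. Choosing that margin, and knowing how to estimate it, is itself a job for a trained data scientist.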
Ask a professional to investigate for bias. Advanced analytics and AI can easily lead to decisions that are seen as “biased.” This is difficult in part because the point of analytics is to discriminate, that is, to base choices and decisions on certain variables. (Send this offer to this older man, but not to this younger woman, because we think they will exhibit different buying behavior in response.) So the big question is when such discrimination is actually acceptable, even a good thing, and when it is inherently problematic, unfair, and dangerous to the company’s reputation.
Consider the example of Goldman Sachs, which was accused of discriminating by offering less credit on the Apple credit card to women than to men. In response, Goldman Sachs said its model does not use gender, only factors such as credit history and income. However, some may argue that credit history and income are correlated with gender, and that using these variables penalizes women, who have lower average earnings and who historically have tended to have fewer opportunities to build credit. When using output that discriminates, decision makers and data professionals alike need to understand how the data is generated, how it is interconnected, and how to measure things like differential treatment. Companies should not jeopardize their reputations by allowing citizen data scientists alone to determine whether a model is biased.
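The proxy effect described above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the applicant records are made-up illustrative numbers, and `credit_limit` is a toy rule invented for this example, not Goldman Sachs’s model. It shows only that a model which never sees gender can still produce different average outcomes by gender when one of its inputs (here, income) is correlated with gender.

```python
from statistics import mean

# Hypothetical applicants as (gender, annual income in dollars).
# These figures are invented for illustration, not real data.
applicants = [
    ("F", 48_000), ("F", 52_000), ("F", 61_000), ("F", 45_000),
    ("M", 58_000), ("M", 75_000), ("M", 66_000), ("M", 50_000),
]


def credit_limit(income: int) -> int:
    """Toy model: gender is never an input; the limit is one fifth of income."""
    return income // 5


def mean_limit_by_group(rows):
    """Average credit limit per gender, computed only to audit the outcome."""
    limits = {}
    for gender, income in rows:
        limits.setdefault(gender, []).append(credit_limit(income))
    return {gender: mean(values) for gender, values in limits.items()}


by_group = mean_limit_by_group(applicants)
ratio = by_group["F"] / by_group["M"]  # below 1.0 means women receive less on average
print(by_group, round(ratio, 3))
```

A gap in this ratio does not by itself settle whether the model is unfair. Interpreting it requires understanding how the input data was generated, which is the kind of judgment that calls for a professional.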
Democratizing data has its benefits, but it also comes with challenges. Giving everyone the keys doesn’t make them experts, and gathering the wrong insights can be catastrophic. New software tools make data available to everyone, but don’t mistake that broad access for real expertise.