The essential checklist for effective data democratization

Feature
20 Jan 2023 | 10 mins
CIO, Compliance, Data and Information Security

To become data-driven, companies need a democratization strategy that’s equal parts disciplined and diverse. Data collection, selecting a platform and employee training are just the beginning.

[Image: Shared responsibility concept, hands taking equal shares of a pie chart. Credit: Griboedov / Shutterstock]

Truly data-driven companies see significantly better business outcomes than those that aren’t. According to a recent IDC whitepaper, leaders saw on average two and a half times better results than other organizations in many business metrics. In particular, companies that were leaders at using data and analytics had three times higher improvement in revenues, were nearly three times more likely to report shorter times to market for new products and services, and were over twice as likely to report improvement in customer satisfaction, profits, and operational efficiency.

But to get maximum value out of data and analytics, companies need to have a data-driven culture permeating the entire organization, one in which every business unit gets full access to the data it needs in the way it needs it.

This is called data democratization. Doing it right requires thoughtful data collection, careful selection of a data platform that allows holistic and secure access to the data, and training and empowering employees to have a data-first mindset. Security and compliance risks also loom.

Starting on a solid data foundation

Before choosing a platform for sharing data, an organization needs to understand what data it already has and strip it of errors and duplicates.

A big part of preparing data to be shared is an exercise in data normalization, says Juan Orlandini, chief architect and distinguished engineer at Insight Enterprises.

Data formats and data architectures are often inconsistent, and data might even be incomplete. “All of a sudden, you’re trying to give this data to somebody who’s not a data person,” he says, “and it’s really easy for them to draw erroneous or misleading insights from that data.”

Organizations often turn to outside help with data normalization because, if it is done incorrectly, a business might still be left with data quality issues and be unable to get as much use out of its data as intended.

As more companies use the cloud and cloud-native development, normalizing data has become more complicated.

“It might be in a NoSQL database, a graph database, or in all these other types of databases now available, and making those consistent becomes really challenging,” Orlandini says.
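As a rough illustration of what normalization involves, the sketch below brings a few records with inconsistent formatting into one shape, skips incomplete ones, and removes duplicates. The record fields, the two date formats, and the choice of email as the duplicate key are assumptions made for this example, not details from any of the organizations quoted here.

```python
from datetime import datetime

# Hypothetical raw customer records pulled from different systems.
RAW_RECORDS = [
    {"email": " Ana@Example.com ", "signup": "2023-01-20", "country": "us"},
    {"email": "ana@example.com", "signup": "20/01/2023", "country": "US"},
    {"email": "bo@example.com", "signup": "2022-11-05", "country": "de"},
    {"email": "", "signup": "2022-12-01", "country": "FR"},
]

# Date formats we assume the source systems use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize(record):
    """Bring one record into a single consistent shape."""
    return {
        "email": record["email"].strip().lower(),
        "signup": parse_date(record["signup"]),
        "country": record["country"].strip().upper(),
    }


def clean(records):
    """Normalize records, drop incomplete ones, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in map(normalize, records):
        if not record["email"] or record["signup"] is None:
            continue  # incomplete record: skip rather than guess
        if record["email"] in seen:
            continue  # duplicate, keyed on the normalized email
        seen.add(record["email"])
        cleaned.append(record)
    return cleaned


CLEANED = clean(RAW_RECORDS)
```

Real pipelines usually need fuzzier matching and a record of what was dropped and why, which is part of the reason the work is often handed to specialists.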

Exercising tactful platform selection

In many cases, only IT has access to data and data intelligence tools in organizations that don’t practice data democratization. So in order to make data accessible to all, new tools and technologies are required.

Of course, cost is a big consideration, says Orlandini, as is deciding where to host the data and how to make it available in a fiscally responsible way. An organization might also question whether the data should be kept on-premises because of security concerns about the public cloud.

But Kevin Young, senior data and analytics consultant at consulting firm SPR, says organizations can first share data by creating a data lake on a cloud storage service like Amazon S3 or Google Cloud Storage. “Members across the organization can add their data to the lake for all departments to consume,” says Young.

But without proper care, a data lake can end up disorganized and cluttered with unusable data. Most organizations don’t end up with data lakes, says Orlandini. “They have data swamps,” he says.

But data lakes aren’t the only option for creating a centralized data repository.

Another is through a data fabric, an architecture and set of data services that provide a unified view of an organization’s data, and enable integration from various sources on-premises, in the cloud and on edge devices.

A data fabric allows datasets to be combined, without the need to make copies, and can make silos less likely.

There are many data fabric offerings, like IBM Cloud Pak for Data and SAP Data Intelligence, both of which were named leaders in Forrester’s Enterprise Data Fabric Q2 2022 report. But with many available options, it can be difficult to know which to choose.

The most important thing is to analyze and monitor data, says Amaresh Tripathy, global analytics leader at professional services firm Genpact.

“Many platforms are out there,” he says. “Choose any platform that works for you, but it should be automated and visible.” Also, the data should be easily accessible from a self-service platform that makes data analysis and reporting easy, even for people with no technical experience. It should be “like a portal where people can see all the data, what it means, what the metrics are, and where it’s coming from,” says Tripathy.

There’s no perfect tool, and there’s often a trade-off between how well a tool handles data lineage, data cataloging, and data quality. “Most organizations are trying to solve all three problems together,” Tripathy adds. “Sometimes you over-index on one and don’t get a very good value on another.” So an organization should decide what’s most important, he says. “They should know why they’re doing it, which tool gives them the best bang for their buck on those three dimensions, and then make the appropriate decision.”
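To make cataloging and lineage more concrete, here is a minimal sketch of a catalog that records what each dataset means, who owns it, which metrics it carries, and where it comes from. The `CatalogEntry` class, the dataset names, and the `lineage` helper are all hypothetical; commercial platforms expose far richer models.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset as a self-service portal might describe it (hypothetical)."""
    name: str
    description: str
    owner: str
    metrics: list = field(default_factory=list)
    sources: list = field(default_factory=list)  # names of upstream datasets


# Hypothetical catalog contents, keyed by dataset name.
CATALOG = {
    "orders_raw": CatalogEntry(
        "orders_raw", "Orders as exported from the sales system", "sales-ops"),
    "orders_clean": CatalogEntry(
        "orders_clean", "Orders after cleanup and deduplication", "data-team",
        sources=["orders_raw"]),
    "monthly_revenue": CatalogEntry(
        "monthly_revenue", "Revenue aggregated by calendar month", "finance",
        metrics=["total_revenue"], sources=["orders_clean"]),
}


def lineage(name, catalog):
    """Return every upstream dataset that feeds `name`, nearest first."""
    upstream = []
    queue = list(catalog[name].sources)
    while queue:
        current = queue.pop(0)
        if current in upstream:
            continue  # already visited; also guards against cycles
        upstream.append(current)
        if current in catalog:
            queue.extend(catalog[current].sources)
    return upstream


REVENUE_LINEAGE = lineage("monthly_revenue", CATALOG)
```

Here `REVENUE_LINEAGE` lists `orders_clean` and then `orders_raw`, which is the "where it's coming from" answer a non-technical user would want from the portal.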

When thinking about how to share data, an organization can also consider implementing a data mesh, which takes the opposite approach to data fabric. While data fabric manages multiple data sources from a single virtual centralized system, a data mesh is a form of enterprise data architecture that takes a decentralized approach and creates multiple domain-specific systems.

With a data mesh, organizations can help ensure data is properly handled by putting it in the hands of those who best understand it, says Chris McLellan, director of operations at Data Collaboration Alliance, a global nonprofit that helps people and organizations get full control of their data. The owner could be a single person, such as the head of finance, or a group of people acting as data stewards.

“At its core, it’s got this concept of data as a product,” he says. “And a data product is something that can be owned and curated by someone with domain expertise.”

Implementing a data mesh architecture allows an organization to put specific data sets in the hands of subject matter experts. “These people are closer to the regulations, the customer, and the end users,” McLellan says. “They’re closer to everything about that specific domain of information.”

Data mesh isn’t linked to any specific tools, so individual teams can choose whichever ones best fit their needs, and there isn’t the bottleneck of everything having to go through a central data team.

“You’re seeing a decentralization not just of IT or app delivery, but also of data management and data governance,” says McLellan, “which are good things because marketers know the laws around consumer protection better than the IT team, and finance knows finance regulations better than IT.”

While there are many vendors selling data mesh, it’s still a shiny new object, Forrester warns, and it has its challenges, including conflicts in how it’s defined, the technologies it uses, and its value.

Training and change management

Once an architecture for data democratization is established, employees need to understand how to work with the new data processes. People can be given the right data, but even if they’re trained as administrators or accountants, they’re not necessarily going to understand what to do with it, says Insight’s Orlandini. Data access is not sufficient in itself to make an organization data-driven. “You have to do some training,” he says. “If you don’t do it properly, you’re going to have mixed success at best, or it might be a failure.”

Some organizations have started their own in-house training programs to ensure employees understand how to interpret and properly handle data.

Genpact, for instance, introduced what it calls its DataBridge initiative last year to increase data literacy across the organization.

“Our intention was not to make 100,000 people citizen data scientists,” says Tripathy. “We provide the awareness in the context of how they do their work.” For example, an employee doing claims analysis doesn’t need to learn all about anomaly detection — what they need to understand is what anomaly detection means for them. “You may or may not have all the skill sets to look at the data yourself, but you should be able to raise a question and seek help — and being able to ask that question in the right manner is the data-aware aspect of it,” he adds.

Laying the security and compliance groundwork

Proper data governance needs to be implemented from the start to maintain the integrity of data and avoid costly penalties.

Along with IT leaders, security and compliance teams need to be part of the initial conversation, says Insight’s Orlandini. “It’s a big challenge, and a lot of organizations struggle with this,” he says, adding that it’s a prerequisite that a company’s leadership understand exactly what it is offering to share and make sure it’s being offered to the right people.

“We live in a highly regulated world where we have to be super careful,” he says, “especially in industries like healthcare and finance where there are laws that have severe consequences if you let the wrong person have access to the wrong data.”

There are also tools that help organizations with data masking and data obfuscation to avoid revealing personally identifiable information. “You can start getting insights without revealing PII data, HIPAA records, or any of those regulatory requirements that are out there,” he continues. “There are also tools with attribute-based access controls where you actually tag data with very specific kinds of attributes — this has PII or HIPAA, whatever your attributes are — and then you only have access to the data with the right kind of attributes associated with it.”

In this way, access rules are enforced automatically based on how the data is tagged, whether the data sits in a public cloud, in a hybrid environment spanning multiple locations, or in a private environment with strict compliance controls.
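The tagging approach Orlandini describes can be sketched in a few lines. In this simplified example, each column carries a set of sensitivity tags, each user holds a set of attributes, and any field whose tags the user does not fully hold is masked. The column names, tag names, and the `view_for` helper are assumptions for illustration only, not the behavior of any particular product.

```python
# Hypothetical column tags: the sensitivity attributes each field carries.
COLUMN_TAGS = {
    "patient_id": {"pii"},
    "name": {"pii"},
    "diagnosis": {"hipaa"},
    "visit_cost": set(),  # untagged: visible to everyone
}

# Columns missing from COLUMN_TAGS get a tag that no user holds,
# so unclassified data is hidden by default.
UNCLASSIFIED = {"unclassified"}

MASK = "***"


def view_for(row, user_attributes):
    """Return a copy of `row` with every field masked unless the user
    holds all of the tags attached to that field's column."""
    visible = {}
    for column, value in row.items():
        required = COLUMN_TAGS.get(column, UNCLASSIFIED)
        # A user may see the value only if required tags are a subset
        # of the attributes they hold.
        visible[column] = value if required <= user_attributes else MASK
    return visible


ROW = {
    "patient_id": "P-1001",
    "name": "Ana Ruiz",
    "diagnosis": "asthma",
    "visit_cost": 120.0,
}

ANALYST_VIEW = view_for(ROW, set())
CLINICIAN_VIEW = view_for(ROW, {"pii", "hipaa"})
```

An analyst with no special attributes can still work with `visit_cost`, while a user holding both `pii` and `hipaa` sees the full row. Hiding unclassified columns by default is a deliberate design choice in this sketch, so that newly added fields are not exposed before someone has tagged them.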

Long-term benefits

Not only can data democratization help an enterprise speed up its data pipelines, it can empower people to find new ways to solve problems through a better awareness of how to analyze and work with data.

Gartner says that by adopting data democratization, organizations can solve resource shortages, decrease bottlenecks, and enable business units to handle their own data requests more easily. By democratizing data, organizations can improve their decision-making by allowing more people to contribute to the analysis and interpretation of data; increase collaboration across teams within an organization; and enhance transparency, since more people have access to information, and can see how data-driven decisions are made.