Martin De Saulles
Contributing writer

5 hidden costs of working with alt data

Analysis
19 May 2022
Big Data, Data Management, Data Quality

Alt data offers enterprises the opportunity to gain competitive advantage, but the costs of integrating it into business workflows may be more than you think.

Alternative data sources are now embedded in the business processes of enterprises across a range of sectors. According to a 2022 survey by law firm Lowenstein Sandler, 92% of investment organizations, from hedge funds and private equity to venture capital, are using alt data to a moderate or significant extent to inform decision making. Respondents also expect their use of alt data to increase through 2022. Typically, this data comes from the exhaust of other business processes such as social media activity, satellite imagery, location tracking data, credit card transactions, and web scraping. 

While alt data may be used across an organization, from marketing and sales to finance and strategy functions, IT departments are often responsible for management and ownership of third-party data. In 2019, Forrester Research found that 56% of alt data acquisitions were managed by CIOs and CDOs working within IT.

Sourcing, storing, and managing alt data creates new challenges for IT managers and may carry significant and unnecessary costs. Here are five such challenges and ways to mitigate their impact.

Vendor selection costs

According to Lowenstein’s survey, vendor selection costs are the single biggest worry for alt data users, with 61% calling them a major concern. These costs arise from the time-consuming process of vetting alt data providers and then ensuring the data they supply is of sufficient quality. This is particularly important when the data will be a core element of a business process and is not easily replaceable. In these situations, it is vital that purchasers have confidence the vendor will continue to offer this data into the foreseeable future.

One way to mitigate these risks is to look to industry consortia to identify reliable data sources. It is likely that other firms operating in the same sector will have similar needs and may be able to share ideas and best practices.

Finding appropriately skilled staff

According to a survey from Quanthub, there was a shortage of 250,000 data scientists in 2020. As of late April 2022, the job listing site Indeed.com was listing 2,700 data scientist vacancies in the UK alone. This shortage of appropriately skilled professionals is forcing salaries up and making it more difficult to retain staff. And data scientists are not the only staff needed to integrate alt data into a business. Forrester Research recommends firms employ the services of “data hunters” whose role is to track down viable alt data and validate these sources for accuracy and integrity. European reinsurance provider Munich Re employs a team of 20 data hunters for this very purpose. 

Potential solutions to this skills shortage include training up existing staff whose knowledge of the business and its needs gives them a head start over new hires. Forging links with colleges and universities offering data science courses and exploring possibilities for student placements and graduate training programs is another way to build a skills pipeline.

Ascertaining data ownership

The nature of alt data and its origins in non-traditional sources can make validating data ownership more difficult than with data provided by established and trusted vendors. This is especially true when multiple data sources have been combined prior to purchase and where untangling their origins can prove complex. Difficulties may arise around licensing, intellectual property laws and data protection regulations. 

Problems can be mitigated through the selection of trusted vendors that offer customers a degree of transparency in their data sourcing methods. Of course, using internal data where possible is another way of reducing risk.

Updating models to process alt data

Maintaining data models to ensure consistency and dealing with errors as they occur is a significant cost that many businesses underestimate. Idera calculates that maintenance generally accounts for 50-80% of development budgets. Adding new sources of data into models can also add significant costs to stretched budgets. 

Careful data modelling at the beginning and incorporating a degree of flexibility into model designs can smooth this process.

Appropriate tools to store alt data

A quarter of respondents to Lowenstein’s survey cited the lack of tools and techniques to store alt data as a serious concern. Part of the problem lies in a lack of consistency between sources in terms of update frequency, APIs, and data formats. Cleaning up data so that models run smoothly and produce consistent, reliable results can be a significant cost. Choosing among the ever-increasing options for storage, from on-prem systems to cloud and hybrid solutions, and making sure they work efficiently for the ingest requirements of data models, adds another layer of complexity and cost.
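To make the format-consistency problem concrete, here is a minimal Python sketch of one common approach: a small adapter per source that maps each feed onto a single shared record shape, with bad rows set aside rather than passed on to models. The feed names (`card_feed`, `footfall_feed`), field names, and date formats are invented for illustration and do not come from any real vendor or from the survey.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Observation:
    """One cleaned data point in a shared schema (hypothetical)."""
    source: str
    day: date
    metric: str
    value: float


def from_card_feed(row: dict) -> Observation:
    # Assumed shape: {"txn_date": "2022-05-19", "spend_usd": "1,234.50"}
    return Observation(
        source="card_feed",
        day=datetime.strptime(row["txn_date"], "%Y-%m-%d").date(),
        metric="spend_usd",
        value=float(row["spend_usd"].replace(",", "")),
    )


def from_footfall_feed(row: dict) -> Observation:
    # Assumed shape: {"week_ending": "19/05/2022", "visits": 830}
    return Observation(
        source="footfall_feed",
        day=datetime.strptime(row["week_ending"], "%d/%m/%Y").date(),
        metric="visits",
        value=float(row["visits"]),
    )


# One adapter per source; adding a new feed means adding one entry here.
ADAPTERS = {"card_feed": from_card_feed, "footfall_feed": from_footfall_feed}


def normalize(source: str, rows: list) -> tuple:
    """Return (clean observations, rejected rows paired with the error)."""
    adapter = ADAPTERS[source]
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(adapter(row))
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            rejected.append((row, repr(exc)))
    return clean, rejected
```

Keeping the rejected rows alongside the error that caused them gives a rough measure of how much cleanup each source needs, which is one input into the cost of that source.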

As data continues to provide a source of competitive advantage for firms able to leverage its commercial potential, alt data will grow in importance. While many alt data sources may cost little or nothing to access, there may be other, sometimes substantial, costs involved in making them fit for purpose and integrating them into established workflows.

Dr. Martin De Saulles is a writer and academic specializing in researching and writing about data-driven innovation and the Internet of Things. He has a Ph.D. in Innovation Studies from the University of Sussex in the UK and worked as a technology analyst and entrepreneur before his current role as a Principal Lecturer at the University of Brighton.