IT Strategy

How to get your legacy data AI-ready (and fast!)

Start small, consultants tell IT Brew.

5 min read

Execs want AI ASAP, but sometimes the necessary data quality is TBD.

The essential data supporting AI platforms may be stuck in a mainframe, scribbled on paper, or placed inconsistently within a database. For some legacy companies, decades’ worth of data might be scattered across multiple storage devices in multiple offices.

A Harvard Business Review survey cited in a report by AI platform Cloudera found that just 7% of respondents (decision-makers considering AI for business purposes) said their data was “completely ready” for AI adoption.

We spoke with IT pros about how to get data ready as fast as possible.

What does “not ready” look like? CJ Combs, senior AI consultant at AI and data solutions firm Columbus, has seen plenty of data sets that need cleaning up before they hit the large language model. Here are some common issues:

  • Inconsistent, incomplete formats. Maybe an LLM has to pull in data related to US states, Combs said, but that exists in different forms: “Massachusetts” in one instance and “MA” in another, for example. When attributes like size, location, ingredients, or materials are missing or labeled differently, AI systems pull from imperfect data to present their outputs.
  • Tough to access. Data may be locked in a closed system, like a mainframe, which lacks easy connections to cloud environments.
  • Unnecessary data. Certain data fields can be redundant, outdated, and trivial.
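Combs's first point, inconsistent formats like "Massachusetts" versus "MA", is typically handled with a normalization pass before data reaches the model. A minimal sketch of that idea (the mapping, field names, and sample records here are illustrative assumptions, not from any firm mentioned in the article):

```python
# Hypothetical sketch: normalizing inconsistent US-state values before
# records feed an AI pipeline. The lookup table would be complete in practice.
STATE_ABBREVIATIONS = {
    "massachusetts": "MA",
    "california": "CA",
    "new york": "NY",
}

def normalize_state(value: str) -> str:
    """Map full state names to two-letter codes; pass existing codes through."""
    cleaned = value.strip()
    if len(cleaned) == 2 and cleaned.isalpha():
        return cleaned.upper()  # already an abbreviation like "MA" or "ma"
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)

records = [{"state": "Massachusetts"}, {"state": "MA"}, {"state": "ca"}]
normalized = [{**r, "state": normalize_state(r["state"])} for r in records]
# normalized -> [{"state": "MA"}, {"state": "MA"}, {"state": "CA"}]
```

The same pattern generalizes to the other attributes Combs flags (size, location, ingredients, materials): agree on one canonical form, then map every variant to it before the data is indexed or retrieved.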

“None of this data may be ‘bad’—it was just built for siloed operational purposes, not to power intelligent systems,” Nicholas Latwis, director of innovation at global data standards organization GS1 US, wrote to us in an email. “That history is precisely why so many organizations discover that having a lot of data is not the same as having good data.”

So, you need to move fast? Start small. For the CEO who wants to move quickly on AI, Combs recommends that IT leaders pick “three to five” AI cases that leadership wants and “work backward” to determine required data sets: “That’s going to help us have a good structure on what we need to pull. Instead of just saying: ‘Dump all of it here.’”

Map what your AI use-case requires against your existing data for completeness, consistency, and accuracy, Latwis recommended. “That analysis can tell you whether you have six weeks of work ahead of you or six months,” he wrote, as organizations focus on the effort needed to establish consistent identifiers and definitions across platforms.
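Latwis's mapping exercise can be made concrete with a simple readiness audit: list the fields a chosen use case requires, then measure how many existing records actually carry them. A minimal sketch, with field names and sample data as assumptions:

```python
# Hypothetical sketch: auditing existing records against the fields a chosen
# AI use case needs, per the "work backward" advice above.
def audit_readiness(records, required_fields):
    """Return the share of records that carry every required field."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)

records = [
    {"sku": "A1", "supplier": "Acme", "state": "MA"},
    {"sku": "A2", "supplier": "", "state": "NY"},      # missing supplier
    {"sku": "A3", "supplier": "Baxter", "state": None}, # missing state
]
score = audit_readiness(records, ["sku", "supplier", "state"])
# score -> 0.333... (only one of three records is complete)
```

A number like this, computed per use case, is what turns "six weeks or six months" from a guess into an estimate.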

Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

By subscribing, you accept our Terms & Privacy Policy.

Three of a kind. A new report from global IT services and technology provider NTT Data found that 50% of surveyed organizations say “the need to modernize applications and data platforms is holding them back from cloud-related innovation.”

Aishwarya Singh, executive managing director within the cloud and security practice at NTT Data, sees a lot of AI beginners at the enterprise level who think their infrastructure is ready for automation, only to discover that their data is siloed, unstructured (sitting in PDFs and emails, say), or lacks documentation on where the data lives and who manages it.

When orgs want to move fast, Singh recommends three tracks of AI:

  • Pick and choose. Rather than trying to clean up an ocean of data, pick a high-value AI case that already has “good” curated data—governed data, stored deliberately in a location, and “treated like a system of record.”
  • See where you can add logic. APIs and data lake architectures, for example, allow IT pros to connect their data sources from disparate locations without having to modify information.
  • Think big. And while those two steps are happening, you can start the longer-term project of getting larger datasets in order.
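Singh's second track, adding logic that connects disparate sources without modifying the underlying data, can be sketched as a thin access layer that joins records at query time. The source names, keys, and fields below are assumptions for illustration:

```python
# Hypothetical sketch: joining a mainframe-style order store with a
# cloud-style customer store at read time, leaving both stores untouched.
def query_order(order_id, mainframe_orders, cloud_customers):
    """Combine an order with its customer record without copying either store."""
    order = mainframe_orders.get(order_id)
    if order is None:
        return None
    customer = cloud_customers.get(order["customer_id"], {})
    return {**order, "customer_name": customer.get("name")}

mainframe_orders = {"O-100": {"customer_id": "C-7", "total": 42.50}}
cloud_customers = {"C-7": {"name": "Acme Corp"}}
result = query_order("O-100", mainframe_orders, cloud_customers)
# result -> {"customer_id": "C-7", "total": 42.5, "customer_name": "Acme Corp"}
```

In production, the two dictionaries would be API calls or data-lake queries, but the principle is the same: the join lives in the access layer, not in a rewritten copy of the data.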

Pick up the pace?! If your boss wants the AI yesterday, be clear on which tradeoffs come with speed, Latwis warned in his email—incorrect recommendations, failed transactions, or operational delays, for example. (“Framing it that way also keeps the conversation grounded in business impact rather than technical complexity,” he wrote.)

And if that still doesn’t slow expectations, start with one high-value dataset “and make that usable first, before trying to fix everything at once,” Latwis continued. “Look at prioritizing data that drives core workflows, like product or supplier data.”

Narrowing down the answer. At experience strategy agency AnswerLab, the team chose to focus its AI effort on a subset of roughly 20 years of unstructured research data: a collection of interview transcripts, surveys, and reports.

The company’s AI-powered intelligence tool now pulls from recent, research-only data, supported by new “pipelines” that deliver consistently formatted, structured transcripts.
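A transcript pipeline like the one described typically parses raw text into consistent speaker turns. A minimal sketch of that step; the function, regex, and fields are assumptions, not AnswerLab's actual code:

```python
import re

# Hypothetical sketch: splitting a raw "Speaker: text" transcript into
# consistently structured turns that downstream AI tooling can index.
def normalize_transcript(raw: str) -> list[dict]:
    """Parse each 'Speaker: text' line into a {speaker, text} record."""
    turns = []
    for line in raw.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9 .'-]+):\s*(.+)$", line)
        if match:
            speaker, text = match.groups()
            turns.append({"speaker": speaker.strip(), "text": text.strip()})
    return turns

raw = "Moderator: How do you use the product?\nP1: Mostly on mobile."
turns = normalize_transcript(raw)
# turns -> [{"speaker": "Moderator", "text": "How do you use the product?"},
#           {"speaker": "P1", "text": "Mostly on mobile."}]
```

Lines that don't fit the expected pattern are simply skipped here; a real pipeline would route them to review rather than drop them.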

“We’re only focused on research data,” Dan Hou, fractional head of AI at AnswerLab and founder of AI advisory Eskridge, said. Hou’s team is helping AnswerLab ignore any noise in a corpus of data, like financial info or sales data. “We’re just focused on the heart of the heart of what we need.”

About the author

Billy Hurley

Billy Hurley has been a reporter with IT Brew since 2022. He writes stories about cybersecurity threats, AI developments, and IT strategies.
