Why Protolabs calls for ‘data stewards’ to set rules for agentic AI

With 15 million CAD files, an agent needs at least a few guidelines…

Before rapid manufacturing company Protolabs lets a large language model run through 15 million computer-aided design (CAD) files, the LLM needs to know the rules, including which files to stay away from because of restrictions based on geography and sensitivity (e.g., a device for defense versus commercial use).

Protolabs’s Chief Technology and AI Officer Marc Kermisch needs to bring order to those millions of blueprints—and that means non-negotiable data standards.

Thread time. Protolabs has what it calls a full “digital thread”: Engineers upload a CAD file, and that virtual representation helps drive the quick production of a component. Over the last 36 months, the team has woven AI into that workflow.

Pulling from millions of the company’s historical quotes and CAD files, AI tools evaluate a design for requirements and manufacturing specs. But reaching that level of automation required major cleanup.

“That raw data wasn’t tagged. We didn’t know what the value was going to be, we just stored it, but we had to start going back and tagging that data, categorizing it, and creating some metadata around it to make it usable by AI and the algorithms that we’re starting to evolve,” Kermisch told IT Brew, referring to CAD file elements like materials and manufacturing process.

Where do you even start? First, they set up clear rules for those 15 million CAD files.

International Traffic in Arms Regulations (ITAR), for example, impose export controls, and Protolabs must ensure that non-US employees don’t have access to any information for ITAR-classified projects. That means separate databases for separate companies, and that data can’t show up as AI-powered feedback for non-US customers.

Similarly, some designs may be deemed suited for “dual use” by both consumers and defense orgs; those are subject to the EU’s dual-use restrictions and have their own access and governance considerations, Kermisch told us.
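To make the access rules above concrete, here is a minimal sketch of a gate that decides whether a record may appear in AI-generated feedback. It is not Protolabs's implementation. The labels `itar`, `dual_use`, and `none`, the `Viewer` fields, and both function names are hypothetical, and real export-control determinations involve far more than two boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CadRecord:
    file_id: str
    control: str  # hypothetical labels: "itar", "dual_use", or "none"


@dataclass(frozen=True)
class Viewer:
    is_us_person: bool      # simplified stand-in for an ITAR eligibility check
    dual_use_cleared: bool  # hypothetical flag for dual-use access approval


def may_surface(record: CadRecord, viewer: Viewer) -> bool:
    """Return True only if the record may appear in AI feedback for this viewer."""
    if record.control == "itar":
        return viewer.is_us_person
    if record.control == "dual_use":
        return viewer.dual_use_cleared
    if record.control == "none":
        return True
    # Unknown or missing label: fail closed rather than guess.
    return False


def filter_for_viewer(records: list[CadRecord], viewer: Viewer) -> list[CadRecord]:
    """Drop every record the viewer is not allowed to see before an agent uses it."""
    return [r for r in records if may_surface(r, viewer)]
```

The design choice worth noting is the final `return False`: a record with an unrecognized label is withheld, so a tagging mistake results in less exposure rather than more.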

Domain event. CAD files, which exist in a centralized platform, surface revealing attributes like customer association, material, and manufacturing process that can inform and speed up large sets of classification decisions. Every CAD file needs to be classified as sensitive or non-sensitive, which then leads to specific access and storage rules. The process is not exactly a manual review of 15 million CAD files, Kermisch wrote in a follow-up email to IT Brew, but “a process driven through structured data and metadata.”
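A classification "driven through structured data and metadata" could look something like the sketch below. The article does not describe the actual rules, so everything here is an assumption for illustration: the field names, the `DEFENSE_CUSTOMERS` set, the storage location names, and the choice to treat incomplete metadata as sensitive.

```python
# Hypothetical metadata fields, based on the attributes the article mentions.
REQUIRED_FIELDS = ("customer_id", "material", "process")

# Hypothetical customer IDs associated with controlled work.
DEFENSE_CUSTOMERS = {"cust-001"}

# Hypothetical storage locations for each classification.
STORAGE_BY_CLASS = {
    "sensitive": "restricted-store",
    "non-sensitive": "general-store",
}


def classify(meta: dict) -> str:
    """Classify one CAD file from its metadata, without opening the file itself."""
    # Incomplete metadata cannot be trusted, so treat it as sensitive.
    if any(not meta.get(field) for field in REQUIRED_FIELDS):
        return "sensitive"
    if meta["customer_id"] in DEFENSE_CUSTOMERS:
        return "sensitive"
    return "non-sensitive"


def storage_for(meta: dict) -> str:
    """Map the classification to the storage rule that follows from it."""
    return STORAGE_BY_CLASS[classify(meta)]
```

Because the rules run over metadata rather than file contents, one rule change (adding a customer to the set, for instance) reclassifies every matching file at once.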

With core fields required for CAD files (and for customer data, too), Kermisch divided the company’s giant data set into six or seven categories, with groupings like customer data, employee data, quotes, and CAD files. Each area had its own assigned data steward, Kermisch said.

A data steward—someone already employed and familiar with the workflows associated with the dataset—defines required fields and rule sets that connect to data pipelines; an HR pro, for example, might be the right data steward for employee data. Then, they test the data quality, or at least a subset of it.

“We wouldn’t necessarily do the whole data set to inspect, but you would sample data and say, ‘Hey, I’m going to look at 10 CAD files this quarter and determine if they are all tagged correctly.’ If I see an issue, then I might have to go back and do some further investigation,” he said.
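The spot check Kermisch describes, where a steward samples a handful of records and verifies the tags, can be sketched as follows. The required field names, the record layout, and the `audit_sample` function are hypothetical; the default sample size of 10 simply mirrors the number in his example.

```python
import random

# Hypothetical required fields a data steward might define for CAD records.
REQUIRED_FIELDS = ("material", "process", "sensitivity")


def missing_fields(record: dict) -> list[str]:
    """List the required fields that are absent or empty on one record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def audit_sample(records: list[dict], sample_size: int = 10, seed=None) -> dict:
    """Check a random sample and return {file_id: missing fields} for failures.

    An empty result means every sampled record passed; any entry is a cue
    for the steward to investigate further.
    """
    rng = random.Random(seed)  # a seed makes the quarterly sample reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    problems = {}
    for record in sample:
        gaps = missing_fields(record)
        if gaps:
            problems[record["file_id"]] = gaps
    return problems
```

A sample this small will not catch rare tagging errors across millions of files, which is why the quote treats a failed check as a trigger for deeper investigation rather than a final measurement.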

Without understanding the locations of sensitive data, AI agents could surface ITAR-controlled designs to the wrong people, or pull PII or confidential data out of a misconfigured share site.

“The last thing you’d want is somebody using the agentic tool that was able to accidentally find salary information, or customer confidentiality, or PII information, or HIPAA information, because you had a misconfigured SharePoint setting,” he said.

Sue Bergamo, general partner and CISO at AI risk and security company Cyber Scale, spoke with IT Brew about recommended non-negotiable data standards, including the assignment of data management owners, clear structure, access privileges, lineage documentation, and retention policies.

“Does every organization have that list now? Unfortunately, they don’t,” Bergamo said.

IT Brew’s State of the Industry Report found that more than one in three respondents (36%) cited data quality, silos, or accessibility issues as a primary challenge when implementing AI and automation.

Kermisch hopes to add AI to data governance. For example, his team is currently trying to determine whether they can replace the human data steward with an agent.

“The data steward would really be shifted to being more of the master data owner,” he said. “So, they define the definition of that master data, they train the agent on what that definition is, and the agent monitors the quality.”

About the author

Billy Hurley

Billy Hurley has been a reporter with IT Brew since 2022. He writes stories about cybersecurity threats, AI developments, and IT strategies.