Generative AI data leaks are a serious problem, experts say

Source code, meeting notes, and even senior management comms are all being pasted into ChatGPT prompts.

Generative AI tools like OpenAI’s ChatGPT are everywhere—and so are users unknowingly leaking sensitive data while querying them for answers.

In March, according to Economist Korea, Samsung's Korea-based semiconductor business experienced several incidents in which employees pasted proprietary code into ChatGPT, just weeks after the company rescinded a division-wide ban on the AI chatbot. Bloomberg reported that on May 1, Samsung notified staff of a ban on ChatGPT and similar tools.

Despite warnings to employees not to include sensitive internal information in ChatGPT prompts, two staffers reportedly uploaded segments of proprietary code for bug-fixing purposes, while a third uploaded a recording of a meeting via a personal assistant app. Immediately after discovering the incidents in March, Economist Korea reported, Samsung capped all future uploads to ChatGPT at just 1,024 bytes. According to Bloomberg, Samsung then conducted an internal survey in which 65% of respondents said they believed generative AI posed a security risk. In the May 1 memo, the company warned that consequences for violations could include termination of employment.

“Because it’s such a powerful productivity tool, we’re seeing all kinds of activity that can be deemed very risky and dangerous, like CEOs uploading sensitive emails to their board of directors to get it rewritten,” Howard Ting, CEO of data protection firm Cyberhaven, told IT Brew.

It’s not clear whether the Samsung employees in question were using the paid API version of ChatGPT—which OpenAI says does not contribute submitted data to the AI’s training set—or the free version, which OpenAI says in its terms of service is used to further train ChatGPT. An internal Samsung memo obtained by Economist Korea didn’t draw a distinction, noting that violations of policy occurred as soon as proprietary data left Samsung’s control.

OpenAI does allow free ChatGPT users to submit an external opt-out form requesting that their data be deleted after 30 days and not used for future training. In late April, the company announced it would add a similar option within the app itself.

Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

Ting said the risk of proprietary data leakage via tools like ChatGPT has precedent, such as a past bug in the Grammarly plugin. But he illustrated the scale of the problem with Cyberhaven data showing that on April 12 alone, the firm detected 6,352 attempts to paste corporate data into ChatGPT for every 100,000 employees of its customers.

“If I wanted to find out something about you, and I’m launching a targeted attack against you, I can go ask OpenAI questions about what it knows about you,” Ting added. “Some of those data elements might come back to me in the response, because they’re using all this data that’s been uploaded to train their model.”

Uri Gal, a professor who researches business information systems at the University of Sydney Business School, told IT Brew he found the risk of data submitted to generative AI tools like ChatGPT ending up in future outputs “really concerning, given the nature of the tool and how seemingly omnipotent it is.”

“People might use it for very different purposes, from writing music to summarizing meeting notes, and there’s all sorts of data that people are going to be tempted or are willing to put in there,” Gal added. “It requires thinking around how to use that tool in a secure as possible manner.”

“Organizations need to be very clear on what they should allow, and what they don’t allow their employees to do,” Jonathan Dambrot, co-founder and CEO of AI cybersecurity firm Cranium, told IT Brew. “And if they want to shut that down, that’s kind of an easier binary exercise.”

Where it gets more complicated is how organizations looking to use AI models can “integrate this into [their] business process,” Dambrot added, which requires multiple layers of security and new types of policies, procedures, and audits.

An OpenAI spokesperson directed us to the company’s FAQ and privacy policy, while Samsung didn’t respond to IT Brew’s request for comment.—TM
