Employees are using generative AI tools like ChatGPT, Claude, and Gemini to summarize notes, debug code, draft client emails, and speed up daily work. The productivity upside is clear; according to a recent report from Superhuman, employees are saving up to a day per week.
But as excitement builds, so does unease. Many organizations are either holding back from fully adopting GenAI or feeling like they’ve lost visibility into how these tools are being used. The concern is real: what if sensitive company data is leaking into GenAI platforms without anyone realizing?
This isn’t just a compliance issue. It’s a real business risk. And if you haven’t asked yourself where your company’s sensitive data is going, or what GenAI tools your employees are using, now is the time.
AI tools don’t just forget your data
A common myth about generative AI tools is that they act like calculators: you input a prompt, get a response, and the data disappears. Sometimes it does, but not always.
In reality, many popular GenAI applications store user prompts, save conversation histories, and may use that data to train future models.
That means confidential business data shared with AI tools could be:
- Logged and stored on third-party servers;
- Seen by developers or internal reviewers at the vendor;
- Used to train public AI models;
- Exposed in future breaches;
- Or even accidentally shown to other users.
If your team is copying in source code, customer PII, internal reports, or product roadmaps, you may be unknowingly handing over your company’s most valuable assets.
1. Free AI tools often train on your data
Some companies assume they’re safe because enterprise plans don’t train on user input. That’s generally true for paid enterprise AI accounts, but most employees are, in fact, using free GenAI tools or personal accounts. In our own research, for example, almost 64% of ChatGPT users are on the free version.
Worse, many are testing niche AI tools or browser plugins that aren’t reviewed by security or IT. These tools often state in their privacy policies that inputs may be stored, logged, or used to train models.
Real-world incidents make the risk clear. Samsung employees pasted internal code and meeting notes into ChatGPT to save time. Amazon noticed that ChatGPT responses closely resembled internal documents, raising alarms about data exposure. Even OpenAI’s own policies confirm that unless you disable training, inputs may be retained.
So yes, secure enterprise AI tools exist, but if you don’t have enforcement in place, most of your company’s GenAI activity is likely happening elsewhere.
2. Some GenAI tools don’t protect data properly
Not all AI tools are created equal. While larger providers have improved their controls, lesser-known tools often lag behind on basic security practices.
A recent example is OmniGPT, a third-party aggregator that let users access ChatGPT, Claude, and other AI models in one interface. It was popular with employees who wanted to avoid juggling subscriptions or corporate restrictions.
Then it got hacked.
Hackers leaked more than 34 million chat records, including full conversations, login credentials, API keys, and uploaded files with financials and customer data. An absolute treasure trove for attackers. Many users didn’t know how much data was being stored. Most companies didn’t even know their employees were using it.
This is the hidden danger of shadow AI: tools employees bring in without IT approval. These unvetted GenAI services often log everything, and when they fail, your data is the collateral.
3. Some AI tools are hosted in countries with different data laws
Another growing concern is where AI tools are hosted. Many generative AI chatbots and platforms are based in countries with very different views on data privacy. China, in particular, poses unique risks.
Tools like DeepSeek, Manus, Qwen Chat and ERNIE Bot are hosted on servers in China. Under Chinese law, these companies must give the government access to any stored user data, including foreign business content.
DeepSeek’s policy confirms that user chats and uploaded documents are stored indefinitely on Chinese servers. Even deleting a chat locally doesn’t remove it from their system.
If an employee enters confidential information into one of these AI platforms (such as client records, deal terms, or product details) it could be legally accessed by foreign state authorities. That’s not just a security issue. It could be a violation of GDPR, HIPAA, or US state privacy laws.
“But the chances of data being exposed are low, right?”
Sure, the chance of a specific prompt showing up in someone else’s chat is low. But is that a bet you really want to take with your source code, client contracts, or future product plans?
Amazon, Samsung, and other global companies already learned the hard way that even low-probability risks become real when you can’t see how tools are being used.
GenAI tools are opaque by design. You often don’t get audit logs or visibility into how data is stored, trained, or reused unless you’re using the most advanced enterprise options. And even then, it requires configuration.
How to protect your company from GenAI data risks
The answer isn’t banning AI. All that will achieve is pushing users underground to devices where you have no control at all.
It’s building a real policy and enforcing it with the right guardrails.
Step 1: Define what not to share
Identify the types of information that should never go into AI tools: source code, personal data (PII), financials, legal documents, health information, and confidential deal terms. Document these clearly in your GenAI usage policy; it isn’t enough to just say “don’t share sensitive data”. Provide training specific to users’ roles and make it clear what they should and should not do.
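A policy like this only works if it can be checked automatically. As a minimal sketch, the rules above can be expressed as a pre-send scan over a prompt; the patterns below are illustrative assumptions, not a complete DLP ruleset:

```python
import re

# Illustrative patterns only -- a real DLP policy would cover far more
# categories (source code, deal terms, health info) with far better rules.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)[-_][A-Za-z0-9]{20,}\b"),
}

def find_sensitive(prompt: str) -> list[str]:
    """Return the names of the policy categories a prompt violates."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

violations = find_sensitive("Contact jane.doe@acme.com, SSN 123-45-6789")
print(violations)  # ['email', 'us_ssn']
```

In practice this kind of check runs inside a browser guardrail or proxy, but the principle is the same: turn the written policy into rules a machine can evaluate before the prompt leaves the company.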
Step 2: Approve safe GenAI tools
List tools that meet your security and privacy requirements, such as ChatGPT Enterprise or Google Gemini Advanced. Be clear about what’s not approved. This could be AI tools hosted in high-risk regions, personal accounts, or tools that train on user inputs by default.
Step 3: Monitor and enforce GenAI usage in real time
Most companies have written policies but no enforcement. To close the gap, deploy tooling that works where GenAI tools are accessed (usually in the browser). Real-time monitoring and inline intervention are essential to prevent accidental data leaks before they happen.
GenAI is a huge productivity unlock, but only if it is used safely
AI is not the problem. When used responsibly, it’s a competitive advantage. But right now, too many companies are relying on trust and good intentions instead of actual controls. Or worse, they don’t feel like they can use it because they don’t have the controls.
Sensitive data is being shared with AI tools every day. Some of those tools log everything. Some are hosted in countries with little to no data protection law. Others are breached without warning.
The risk may be low in any single case. But if it happens even once, the impact could be significant.
This is why GenAI needs the same level of attention as cloud storage or third-party SaaS. It’s a new kind of data exposure vector, and it needs proper security policies and tools to match.
Where to go from here
Start with visibility. Ask these questions across your organization:
- What GenAI tools are employees using?
- What kind of data is being entered into these tools?
- What files are my employees uploading?
- Are these tools hosted in high-risk regions?
- Are we able to block risky activity before it happens?
If you don’t know the answers, now is the time to close the gap.