Every AI tool you use is reading your data. The question isn't whether they are — it's what they're doing with it, how long they keep it, and whether your conversations are being used to train the next version of the model.
I spent two weeks reading the privacy policies, terms of service, and data processing agreements of the 12 most popular AI tools. Here's what I found, broken down so you don't have to read 400 pages of legal text yourself.
ChatGPT (OpenAI)
Does it train on your data? Yes, by default. If you use the free tier or Plus without opting out, your conversations can be used to improve their models. You can disable this in Settings → Data Controls → "Improve the model for everyone."
Enterprise/Team plans: Your data is NOT used for training. This is explicitly stated in their business terms. If you're using ChatGPT for work, this distinction matters enormously.
Data retention: Conversations are retained for 30 days for abuse monitoring, even with training opt-out enabled. After 30 days, they're deleted unless flagged.
The fine print: OpenAI's API has a separate data policy. Data sent through the API is NOT used for training by default, which is why most businesses build on the API rather than the consumer product.
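If you're weighing the API route, the mechanics are simple. Here's a minimal sketch using OpenAI's official Python SDK; the model name and prompt are placeholders, and the key is read from your environment:

```python
# Minimal sketch: calling the OpenAI API, where data is NOT used for
# training by default (unlike the consumer ChatGPT product).
# Assumes the official `openai` SDK and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your plan covers
    messages=[{"role": "user", "content": "Summarize our Q3 numbers."}],
)
print(response.choices[0].message.content)
```

Same capability, different contract. That's the whole trade.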
Claude (Anthropic)
Does it train on your data? Free tier conversations may be used for training, but Anthropic states they apply safety filters and remove personal information before any training use. Pro and Team plan data is NOT used for training.
Data retention: Anthropic doesn't commit to a hard number here; conversations are kept for a limited period for trust and safety review. Even so, Anthropic is notably more conservative with data than most competitors.
The fine print: Anthropic's Constitutional AI approach leans on AI-generated feedback rather than large volumes of human conversation data, which lends their privacy claims extra credibility. Their API data is never used for training.
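Same story on Anthropic's side: the consumer app and the API are different privacy regimes. A minimal sketch with the official `anthropic` Python SDK (model ID and prompt are placeholders):

```python
# Minimal sketch: calling the Anthropic API, which per their terms is
# never used for training. Assumes the official `anthropic` SDK and an
# ANTHROPIC_API_KEY env var.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use a current model ID
    max_tokens=1024,                   # Anthropic requires an explicit cap
    messages=[{"role": "user", "content": "Summarize our Q3 numbers."}],
)
print(message.content[0].text)
```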
Google Gemini
Does it train on your data? Yes, for the free consumer version. Google explicitly states that human reviewers may read your Gemini conversations to improve the product. Workspace plans have different terms.
Data retention: This is where it gets complicated. Gemini conversations linked to your Google account are retained for up to 18 months by default. You can shorten this to 3 months or turn saving off entirely via the Gemini Apps Activity controls in your Google account.
The fine print: Google's data practices are intertwined with their broader advertising ecosystem. While they state Gemini data isn't used for ad targeting, your Google account activity as a whole still informs ad personalization.
Microsoft Copilot
Does it train on your data? For enterprise customers using Copilot for Microsoft 365: no. Microsoft has been very clear that enterprise data stays within your tenant boundary and is not used for model training.
Consumer Copilot: Different story. Free Copilot conversations may be reviewed by humans and used for product improvement. The terms are similar to ChatGPT's free tier.
The fine print: Microsoft's commercial data protection promise is one of the strongest in the industry, but it only applies to paid enterprise products. The free consumer experience has significantly weaker protections.
Midjourney
Does it train on your data? Your generated images and prompts are used to improve the service. There is currently no opt-out for free or basic tier users.
Privacy concern: All generations on the free and basic tiers are public by default — visible to anyone browsing the Midjourney gallery. You need the Pro plan ($60/month) or higher for Stealth Mode, which hides your generations from public view.
The fine print: If you're generating images for a client or for business use, be aware that your creative process is essentially public unless you're paying for privacy.
The Pattern Nobody Talks About
After reading all 12 policies, one pattern is unmistakable: free tiers are the product. Every major AI company uses free-tier data for training. Every single one. The "free" experience is subsidized by your data.
Paid tiers — especially business and enterprise plans — consistently offer stronger protections. This isn't a coincidence. Companies know that businesses will pay for privacy, and consumers generally won't.
What This Means for You
If you're an individual user:
- Assume everything you type into a free AI tool will be read by someone and possibly used for training
- Never share sensitive personal information, passwords, financial details, or medical information in AI conversations
- Use the opt-out settings where available (ChatGPT, Gemini Activity controls)
- Consider paying for a Pro/Plus plan if you regularly discuss sensitive topics
If you're a business:
- Use enterprise or API plans exclusively — never route business data through consumer AI products
- Get your legal team to review the DPA (Data Processing Agreement) before signing
- Implement an AI usage policy for employees that specifies which tools are approved and what data can be shared (see the filter sketched after this list)
- Consider self-hosted or on-premise AI solutions for highly sensitive workflows
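One concrete way to back that policy up in code: screen text before it ever leaves for a third-party API. Below is an illustrative sketch of a pre-send filter. The patterns and the `redact` helper are examples I wrote for this post, not a complete data-loss-prevention tool; a real deployment would use a proper DLP library and tuned rules.

```python
# Illustrative sketch: a pre-send filter that redacts obviously sensitive
# strings before a prompt reaches any third-party AI API. The regexes
# below are examples only, not a complete DLP ruleset.
import re

SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Replace sensitive matches with placeholders; report what was found."""
    found = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, found

prompt = "Customer jane@example.com (SSN 123-45-6789) wants a refund."
clean, flags = redact(prompt)
if flags:
    print(f"Redacted before sending: {', '.join(flags)}")
print(clean)  # safe to forward to an approved AI tool
```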
The Bottom Line
AI privacy isn't a binary "safe or unsafe" question. It's a spectrum that depends on which tool you use, which plan you're on, and what settings you've configured. The companies aren't hiding this information — it's in their terms of service. But they're counting on the fact that nobody reads those.
Now you don't have to.