The DeMicco Digest
Grab your headphones and enjoy a mini-podcast version of this blog. Sit back and listen while we walk you through the key points!
A note before we dive in
I used AI tools to help research and compile the technical details in this piece. That’s partly the point – I practice what I preach. But I also want you to know that every recommendation here reflects my 30+ years of helping businesses navigate technology transitions, and the strategic perspective is entirely my own.
This post goes deep into platform-specific settings and technical controls. If you’re a business leader without time to wade through the specifics, here’s what you need to know:
The TLDR:
- The default settings on every major AI platform expose your data. ChatGPT, Claude, Gemini, and others use your conversations for training unless you actively change settings. Most users never do.
- Consumer AI and enterprise AI are fundamentally different products. Samsung’s engineers didn’t make a stupid mistake – they used a consumer tool without understanding it operates under completely different privacy rules than enterprise versions. Your organization needs clear policies on which tier is acceptable for which use cases.
- Eight categories of data should never touch AI systems regardless of settings: government IDs, financial credentials, medical records, passwords/API keys, proprietary source code, client privileged information, unreleased business plans, and infrastructure details.
- Three actions provide 80% of the protection: Turn off training data sharing on every platform (specific steps below). Block free-tier AI services and mandate enterprise accounts. Deploy automated tools that catch sensitive data before it reaches AI systems.
- Regulation is coming fast. Colorado’s AI Act hits in 2026. The EU AI Act is already in effect. Organizations that don’t build governance now will be scrambling later.
If those five points are enough for you, you’ve got the essentials. Forward this to your IT lead or operations manager for the implementation details :)
If you want to understand the why behind each recommendation and get the specific settings for each platform, keep reading. The technical sections are thorough because I know some of you will need to actually execute this, and I’d rather give you too much detail than leave you guessing.
AI tools offer significant opportunities and efficiencies, but every prompt you type is a potential data leak. The Samsung incident of 2023 – where engineers pasted proprietary semiconductor code into ChatGPT, inadvertently adding trade secrets to OpenAI’s training data – wasn’t an anomaly. Cyberhaven detected 6,352 attempts to paste corporate data into ChatGPT per 100,000 employees in a single study period. Most AI platforms use your inputs for model training by default, which means confidential information can theoretically influence outputs visible to competitors. The good news: every major AI platform now offers settings to prevent this, but the protections are off by default and hidden in menus most users never open.
This post outlines key settings, step-by-step procedures, and implementation frameworks your organization can use to safely deploy AI tools while maintaining compliance with evolving regulations.
ChatGPT requires three separate settings changes to protect your data
OpenAI’s privacy controls have expanded significantly, but protection requires configuring multiple independent settings. The most critical is the training opt-out, found in Settings → Data Controls → “Improve the model for everyone” – toggle this OFF. This single action prevents your conversations from training future models, though past conversations already used cannot be removed.
The Memory feature introduces an additional risk. ChatGPT can store approximately 1,200-1,400 words of information about you across conversations, and deleting a chat does NOT delete associated memories. Navigate to Settings → Personalization → Memory to disable this entirely, or click “Manage Memory” to delete specific stored information. For sensitive one-off queries, enable Temporary Chat mode via the model dropdown – these conversations are excluded from your history and deleted within 30 days.
ChatGPT Team and Enterprise tiers offer fundamentally different protections: data is never used for training by default, custom retention windows are available, and SOC 2 Type 2 certification provides audit assurance. For organizations handling sensitive data, these enterprise tiers aren’t premium features; they’re baseline requirements.
Claude changed its privacy defaults in 2025, and most users missed it
Anthropic made a significant policy change in August 2025 that caught many users off guard: consumer Claude data is now used for training by default unless you actively opt out. If you haven’t checked your settings recently, your conversations may be retained for up to five years for model training purposes.
To opt out, navigate to Settings → Privacy → toggle OFF “Help improve Claude”. Once disabled, data retention drops from five years to 30 days. Critically, Claude for Work (Team/Enterprise), Claude Gov, Claude for Education, and all API usage through Amazon Bedrock or Google Vertex AI are automatically exempt from training; your data is never used, regardless of settings.
One important nuance: reopening old conversations makes them “resumed sessions” subject to your current privacy settings. If you opted into training in the past, consider deleting sensitive conversations before they’re accessed again.
Microsoft Copilot and Google Gemini require different approaches for consumer versus enterprise users
Microsoft’s ecosystem creates important distinctions. Consumer Copilot (copilot.microsoft.com) may use data for training by default. Disable this via Profile → Settings → Privacy, toggling off “Model training on text” and “Model training on voice.” Microsoft 365 Copilot for enterprise operates under completely different rules: prompts and responses are never used to train foundation models, data inherits your existing Microsoft 365 security controls, and everything flows through Microsoft Purview for compliance.
Google Gemini’s controls are buried in “Gemini Apps Activity” settings (being renamed to “Keep Activity” in 2025). Turn this OFF if you want to prevent conversations from being saved and used for AI improvement. For enterprise users, Workspace Gemini editions provide guarantees that prompts are not used for training and are never human-reviewed. A newer Temporary Chats feature automatically deletes conversations after 72 hours.
Perplexity AI takes yet another approach: the “AI Data Retention” toggle in Profile → Settings controls whether your data trains models, but it’s enabled by default for free and Pro tiers. Enterprise plans are automatically exempt. One critical warning: if you use Perplexity without logging in, your data is collected for training with no opt-out available.
The Samsung incident revealed how quickly AI privacy failures cascade
In late March 2023, Samsung lifted an internal ban on ChatGPT. Within 20 days, three separate incidents occurred. An engineer pasted faulty source code from a proprietary measurement database while seeking help with a bug fix. Another employee entered test sequences for chip defect identification, requesting optimization suggestions. A third recorded a company meeting, transcribed it using NAVER CLOVA, then pasted the entire transcription into ChatGPT to generate meeting notes.
At the time, ChatGPT used all consumer inputs for training. Samsung’s semiconductor manufacturing data, source code, and internal meeting content became part of OpenAI’s training corpus – impossible to retrieve or delete. Samsung implemented an emergency limit of 1024 bytes per prompt, then banned all generative AI tools entirely on company devices in May 2023. The incident rippled across industries: major banks, including Goldman Sachs, JPMorgan, and Deutsche Bank, immediately restricted ChatGPT access.
The lesson isn’t that AI tools are inherently dangerous; it’s that consumer interfaces and enterprise solutions operate under fundamentally different privacy guarantees. Samsung employees used a consumer product for enterprise-sensitive work without understanding the data flow implications.
API access provides privacy protections that consumer interfaces cannot match
The distinction between consumer web interfaces and API access represents one of the most important but least understood aspects of AI privacy. For both OpenAI and Anthropic, API data is not used for training by default – a stark contrast to their consumer products.
| Access Method | Training Data Use | Data Retention | Human Review | Compliance Tools |
| --- | --- | --- | --- | --- |
| Consumer Web/App | Yes (opt-out available) | Varies; chat history stored | Possible for moderation | Limited |
| API Access | No (by default) | 30 days default; ZDR available | Limited; excluded with ZDR | Full audit trails, DPAs, BAAs |
Zero Data Retention (ZDR) agreements represent the gold standard for sensitive applications. With ZDR, logs are processed only for real-time abuse detection and immediately discarded. OpenAI offers ZDR for qualifying use cases; Anthropic provides it for Enterprise API customers. Note that ZDR typically excludes Files API uploads and certain beta features – verify coverage for your specific use case.
For regulated industries or organizations processing highly confidential information, API access with ZDR and appropriate Data Processing Agreements should be the minimum acceptable configuration.
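To make the distinction concrete, here’s a minimal sketch of routing a query through the API instead of the consumer web interface, using the official OpenAI Python SDK (the Anthropic SDK follows the same pattern). Treat this as an illustration, not a reference implementation: the model name is a placeholder, and it assumes the `openai` package is installed with an `OPENAI_API_KEY` in your environment.

```python
# Minimal sketch: routing a query through the API rather than the consumer
# web interface. Per the providers' published policies, API inputs are not
# used for model training by default. Assumes the `openai` package is
# installed and OPENAI_API_KEY is set; the model name is illustrative --
# substitute whatever model your DPA/ZDR agreement actually covers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative placeholder
    messages=[
        {"role": "system", "content": "You are a careful business-writing assistant."},
        {"role": "user", "content": "Draft a one-paragraph project status template."},
    ],
)
print(response.choices[0].message.content)
```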
Eight categories of information should never enter AI systems, regardless of settings
Even with optimal privacy settings, certain data types carry too much risk. The following categories should be treated as absolute prohibitions for AI input:
- Direct identifiers: Full names with SSNs, passport numbers, driver’s licenses, or any government ID combination
- Financial credentials: Credit card numbers, CVVs, bank account details, or complete tax documents
- Medical records: Diagnoses, treatment histories, prescriptions, mental health notes, or genetic information (consumer AI tools are not HIPAA-compliant)
- Authentication data: Passwords, API keys, encryption keys, or security configurations
- Proprietary source code: Production codebases, algorithms, or vulnerability findings
- Client privileged information: Legal correspondence, contracts under NDA, or third-party confidential data
- Unreleased business intelligence: M&A plans, unannounced financials, or strategic initiatives
- Infrastructure details: Server configurations, network architecture, or penetration test results
The risk isn’t limited to training data exposure. Even with training disabled, data may be accessed by AI company personnel for safety review, stored in logs subject to legal discovery, or exposed through platform vulnerabilities – like the March 2023 ChatGPT bug that exposed payment information for 1.2% of Plus subscribers.
Privacy-preserving prompts extract value without exposing sensitive data
The most effective privacy protection is preventing sensitive data from reaching AI systems in the first place. This requires deliberate prompt engineering.
A problematic prompt might read:
“Draft a contract termination letter for John Smith at Acme Corp, employee ID 45892, who has been underperforming since his $85,000 salary increase in March.” This exposes names, company identification, employee IDs, salary figures, and dates – all unnecessarily.
A privacy-preserving alternative:
“Draft a contract termination letter template for an employee with performance issues. Include placeholders for [EMPLOYEE_NAME], [COMPANY], [SPECIFIC_PERFORMANCE_ISSUES], and [DATE].” You receive the same value while keeping sensitive details on your side of the interaction.
For financial analysis, avoid: “Our Q4 revenue was $4.2M, down from $5.1M. Customer XYZ Corp owes us $250K. What’s our strategy?” Instead: “Revenue dropped approximately 18% quarter-over-quarter. A major customer has significant outstanding receivables. What general financial strategies could address declining revenue with accounts receivable issues?”
The template-first workflow proves particularly effective: ask AI to create a framework, methodology, or structure first, then apply it yourself with real data locally. This captures AI’s pattern-recognition and synthesis capabilities while keeping confidential details entirely offline.
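Here’s what that template-first workflow can look like in practice: a minimal Python sketch where the AI only ever sees the placeholder template, and the real values are substituted locally. The placeholder names and example values are illustrative (they reuse the hypothetical John Smith scenario above).

```python
# Template-first workflow sketch: the AI produces a generic template with
# placeholders; the sensitive values are filled in locally and never appear
# in any prompt. Names and values below are illustrative only.

# Template as returned by the AI (contains no real data)
template = (
    "Dear [EMPLOYEE_NAME],\n\n"
    "This letter confirms that [COMPANY] is terminating your contract "
    "effective [DATE], due to [SPECIFIC_PERFORMANCE_ISSUES].\n"
)

# Real values stay on your machine; this dict never goes into a prompt
local_values = {
    "[EMPLOYEE_NAME]": "John Smith",
    "[COMPANY]": "Acme Corp",
    "[DATE]": "June 14, 2025",
    "[SPECIFIC_PERFORMANCE_ISSUES]": "repeated missed delivery deadlines",
}

letter = template
for placeholder, value in local_values.items():
    letter = letter.replace(placeholder, value)

print(letter)
```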
Automated redaction tools intercept sensitive data before it reaches AI systems
For organizations processing high volumes of AI interactions, manual redaction isn’t scalable. Several enterprise tools now offer real-time PII detection and masking:
Microsoft Azure PII Detection achieves 98% accuracy across multiple redaction policies, including synthetic replacement that maintains data utility while removing identifiers. Strac DLP provides real-time redaction specifically for ChatGPT interactions with GDPR/CCPA compliance built in. Cloudflare AI Prompt Protection supports ChatGPT, Gemini, Claude, and Perplexity with pre-transmission masking. Private AI handles over 50 languages across multiple document formats with Docker/Kubernetes deployment options.
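If you want a feel for what these products do under the hood, here’s a deliberately simplified concept sketch of pre-transmission masking in Python. The regex patterns are naive and would miss many real-world formats – real DLP tools use far more robust detection and validation – so treat this strictly as an illustration of the idea, not a substitute for the tools above.

```python
# Concept sketch of pre-transmission masking: scan an outbound prompt for
# obvious identifier patterns and replace them with labeled tokens before
# anything reaches an AI service. Patterns are deliberately simplified;
# illustration only, not a production DLP control.
import re

PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def mask_prompt(prompt: str) -> str:
    """Replace matches of each pattern with a labeled placeholder token."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED_{label}]", prompt)
    return prompt

raw = "Invoice query: card 4111 1111 1111 1111, contact jane.doe@acme.example, SSN 123-45-6789."
print(mask_prompt(raw))
# -> Invoice query: card [REDACTED_CARD], contact [REDACTED_EMAIL], SSN [REDACTED_SSN].
```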
For synthetic data generation – creating realistic fake datasets that maintain statistical properties without exposing real information – tools like Faker (Python library), K2view, and MOSTLY AI can generate compliant test data. This approach works well for AI-assisted analysis of patterns and trends where specific individual records aren’t necessary.
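As a quick illustration of that approach, here’s a minimal Faker sketch that generates a fake customer table you could safely paste into an AI prompt for format or pattern work. The field choices are illustrative; the seed just makes the output reproducible for test fixtures.

```python
# Synthetic test data sketch with Faker: realistic-but-fake records that can
# be shared with an AI tool without exposing any real customer. Field choices
# are illustrative.
from faker import Faker

fake = Faker()
Faker.seed(42)  # reproducible output for test fixtures

records = [
    {
        "name": fake.name(),
        "company": fake.company(),
        "email": fake.email(),
        "balance_due": round(fake.pyfloat(min_value=100, max_value=50_000), 2),
    }
    for _ in range(5)
]

for row in records:
    print(row)
```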
One critical warning from ISACA research: simple anonymization often fails against AI re-identification attacks. Combining just three data points – gender, date of birth, and ZIP code – can identify over 50% of the US population. True synthetic data generation provides stronger protection than anonymization of real records.
Local AI deployment eliminates cloud data transfer entirely
For organizations with stringent confidentiality requirements, local AI deployment offers complete data isolation. Open-source models like Llama 3.2 and Mistral can run entirely on-premises using platforms such as Ollama (easiest setup), vLLM (enterprise production with 8x throughput), or LM Studio (graphical interface for non-technical users).
Hardware requirements have become increasingly accessible. Running a 7B parameter model requires approximately 8GB VRAM (achievable with an RTX 3060), 16GB RAM, and SSD storage. Larger 13B-70B models need 24GB+ VRAM (RTX 4090 or A100 class GPUs) and 32GB+ RAM. For enterprise production, A100 80GB or H100 GPUs with container orchestration provide the necessary scale.
The trade-off is capability: local models generally lag cloud offerings like GPT-4 or Claude 3.5 Sonnet in reasoning quality and knowledge breadth. A hybrid approach often proves optimal. Route sensitive prompts to local models while using cloud AI (with enterprise agreements) for general queries where cutting-edge capability matters more than data isolation.
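Here’s a minimal sketch of that hybrid routing idea: prompts flagged as sensitive go to a local model through Ollama’s default localhost API, while everything else goes to a cloud API under an enterprise agreement. The keyword check is deliberately naive – in production you’d put a real DLP classifier here – and it assumes Ollama is running locally with the llama3.2 model pulled, plus the `requests` and `openai` packages installed.

```python
# Hybrid routing sketch: sensitive prompts go to a local model via Ollama's
# default HTTP API (localhost:11434); everything else goes to a cloud API.
# The keyword-based sensitivity check is deliberately naive -- a real
# deployment would use a DLP classifier. Assumes `ollama serve` is running
# with llama3.2 pulled, and OPENAI_API_KEY set for the cloud path.
import requests
from openai import OpenAI

SENSITIVE_MARKERS = ("salary", "ssn", "api key", "source code", "diagnosis")

def looks_sensitive(prompt: str) -> bool:
    return any(marker in prompt.lower() for marker in SENSITIVE_MARKERS)

def ask_local(prompt: str) -> str:
    """Query a local model; the prompt never leaves the machine."""
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama3.2",
              "messages": [{"role": "user", "content": prompt}],
              "stream": False},
        timeout=120,
    )
    return resp.json()["message"]["content"]

def ask_cloud(prompt: str) -> str:
    """Query a cloud model under an enterprise/API agreement."""
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask(prompt: str) -> str:
    return ask_local(prompt) if looks_sensitive(prompt) else ask_cloud(prompt)
```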
Browser extensions and third-party AI tools create hidden exposure risks
The December 2024 Chrome extension compromise affected 3.7 million users across 35 extensions, including Cyberhaven’s own security product. Browser-based AI tools present unique risks: many request access to all browsing data, cookies, and session tokens. Gartner’s 2024 advisory recommended organizations “block all AI browsers for the foreseeable future” due to cybersecurity concerns.
Red flags in AI tool privacy policies include vague language like “reasonable means to preserve privacy,” user responsibility clauses for content compliance, data retention without deletion timelines, training on user data without opt-out, and missing security certifications. The absence of SOC 2 or ISO 27001 certification should disqualify enterprise consideration.
Effective controls include maintaining an allowlist of approved extensions only (don’t rely on user judgment), blocking free-tier AI services organization-wide, deploying browser management solutions, and conducting regular permission audits. Many organizations underestimate shadow AI – Harmonic Security found 8.5% of employee prompts contain sensitive data, with 63.8% of ChatGPT usage occurring on free tiers outside IT visibility.
Regulatory requirements are accelerating across US states and the EU
The Colorado AI Act takes effect June 30, 2026, requiring risk management programs aligned with NIST AI RMF, annual impact assessments, and consumer notice before AI-driven consequential decisions affecting employment, healthcare, financial services, and more. California’s CCPA amendments on Automated Decision-Making Technology finalized in May 2025 grant consumers the right to opt out of AI in “significant decisions.”
The EU AI Act began enforcement in February 2025 with prohibited AI practices (social scoring, certain biometric systems), with full high-risk system requirements applying August 2026. Any AI system affecting EU residents must comply regardless of where the company is headquartered. Penalties reach €35 million or 7% of global turnover for prohibited practices.
Industry-specific requirements layer additional obligations. Healthcare organizations must recognize that consumer AI tools are not HIPAA-compliant – OpenAI does not offer Business Associate Agreements for ChatGPT. Financial services face SEC examination priorities specifically flagging AI controls, fraud prevention, and accuracy of AI-related disclosures. The FTC has pursued enforcement for “AI washing” (false capability claims) and can require algorithmic disgorgement – deletion of AI models trained on unlawfully obtained data.
For GDPR’s right to erasure, the challenge is technical: once data is “learned” by AI models, it becomes embedded across billions of parameters and cannot be selectively deleted without retraining. Organizations can delete conversation logs and account data, but the model’s “knowledge” persists. Prevention – using proper privacy settings before sharing data – remains far more effective than attempted deletion afterward.
A practical implementation roadmap for your organization
Weeks 1-2 should focus on governance foundation: establish an AI steering committee spanning IT, Legal, HR, and business leadership; audit current AI tool usage including shadow IT; and draft an AI acceptable use policy specifying prohibited data categories and approved platforms.
Weeks 3-4 shift to technical controls: deploy enterprise AI accounts (eliminating free-tier usage), configure privacy settings across all platforms using this guide’s specifications, implement DLP tools with AI prompt scanning, and create an approved tool allowlist blocking unauthorized browser extensions.
Weeks 5-8 address training and policy enforcement: train all employees on the eight prohibited data categories, safe prompting techniques, and incident reporting procedures. Deploy monitoring for AI API traffic and establish monthly audit cycles.
The vendor due diligence checklist for any new AI tool should verify: training data opt-out availability, security certifications (minimum SOC 2), data residency compliance, incident notification SLAs, and contractual no-training guarantees. For high-risk applications, require Zero Data Retention agreements and conduct on-site security assessments.
Conclusion
AI privacy protection isn’t a single setting – it’s an architecture of technical controls, governance policies, and employee practices. The organizations experiencing data exposure incidents share common patterns: using consumer-tier products for enterprise work, assuming default settings are protective, and lacking visibility into shadow AI usage.
The most important insight from this research is the magnitude of the gap between consumer and enterprise AI products. Samsung’s engineers didn’t make an obvious blunder; they used a widely available tool without understanding that consumer ChatGPT and enterprise ChatGPT operate under completely different privacy guarantees.
Every organization needs clear policies specifying which tier is acceptable for which use cases.
Three actions provide the highest immediate impact:
First, configure training opt-outs on every AI platform your organization uses (the specific steps are in this guide).
Second, block free-tier AI services and mandate enterprise accounts with no-training guarantees.
Third, deploy automated PII detection to intercept sensitive data before it reaches AI systems.
These three controls address the vast majority of AI privacy risk while your organization builds out comprehensive governance.
Joseph DeMicco brings over 30 years of experience to his roles as founder and CEO of Amplify Industrial Marketing + Guidance, founder of Industrial Web Search, and instructor for the Goldman Sachs 10,000 Small Businesses program, specializing in data-driven marketing strategies.
FAQ
How do I stop ChatGPT from using my data for training?
Go to Settings → Data Controls and toggle OFF “Improve the model for everyone.” This prevents new conversations from training future models, though conversations already used cannot be removed. Also review Settings → Personalization → Memory, since deleting a chat does not delete its associated memories, and use Temporary Chat mode for sensitive one-off queries.
What's the difference between consumer and enterprise AI privacy?
Consumer AI tiers (including paid subscriptions like ChatGPT Plus and Claude Pro) may use your conversations for model training unless you opt out. Enterprise tiers such as ChatGPT Enterprise, Claude for Work, and Microsoft 365 Copilot contractually guarantee your data is never used for training, offer custom data retention policies, and provide compliance certifications like SOC 2. For organizations handling sensitive information, enterprise tiers are the minimum acceptable configuration.
What information should I never put into AI tools?
Eight categories of data should never enter AI systems regardless of privacy settings: government IDs and personal identifiers, financial credentials and bank details, medical records and health information, passwords and authentication data, proprietary source code, client-privileged or NDA-protected information, unreleased business intelligence like M&A plans, and infrastructure details like server configurations. Even with training disabled, this data may be accessed by platform personnel, stored in logs subject to legal discovery, or exposed through security vulnerabilities.
Does Claude use my conversations for AI training?
Yes, by default. Anthropic changed its policy in August 2025 so consumer Claude data is used for training unless you opt out. To disable this, go to Settings → Privacy → toggle OFF “Help improve Claude.” With training enabled, data may be retained for up to five years; opting out reduces retention to 30 days. Claude for Work, Claude Gov, Claude for Education, and all API access are automatically exempt from training.
How can I use AI for sensitive business tasks without exposing data?
Keep sensitive details on your side of the interaction: ask for templates with placeholders instead of pasting real names and figures, abstract specifics into general patterns (percentages rather than dollar amounts), and use a template-first workflow where AI builds the framework and you apply real data locally. For higher volumes, deploy automated redaction tools that mask PII before transmission, and for the strictest confidentiality requirements, run open-source models locally so data never leaves your infrastructure.
What AI privacy regulations should businesses prepare for?
The Colorado AI Act takes effect June 30, 2026, requiring risk management programs and annual impact assessments for AI systems making consequential decisions. California’s CCPA amendments on Automated Decision-Making Technology give consumers the right to opt out of AI in significant decisions. The EU AI Act began enforcement in February 2025, with penalties reaching €35 million or 7% of global turnover for violations. Healthcare organizations must also recognize that consumer AI tools are not HIPAA-compliant, and financial services face SEC examination priorities specifically targeting AI controls.