In April 2023, Samsung engineers did something thousands of employees do every day. They pasted code into ChatGPT. One was debugging a bug. Another transcribed a meeting and fed it to the AI for summary notes. A third optimized a test sequence.
Within weeks, Samsung banned ChatGPT company-wide.
The code contained proprietary semiconductor information. Once submitted, it became part of OpenAI’s training data. Samsung couldn’t retrieve it. The damage was permanent, invisible, and entirely preventable.
This incident captures everything complicated about AI security. The tool worked exactly as designed. Employees used it to solve real problems. Nobody intended harm. Yet sensitive data left the building and can never come back.
What Data Exposure Actually Looks Like
The Samsung leak is famous because it involved a major company. But research from late 2025 found that 34.8% of all employee ChatGPT inputs contain sensitive data. Over a third of every query. That’s not an edge case. That’s the baseline.
Shadow AI drives much of this exposure. When IT doesn’t provide approved AI tools, employees find their own. A recent survey found more than 60% of workers rely on personal, unmanaged AI tools rather than enterprise-approved alternatives. These tools operate outside corporate monitoring. No logs. No oversight. No data loss prevention controls.
The data that leaks follows predictable patterns:
Source code and configuration files. Developers paste code for debugging help. Those files often contain API keys, database credentials, and internal system details.
Customer information. Support teams summarize tickets. Sales reps analyze prospect data. Marketing generates personalized content. All of it may contain names, emails, purchase histories, or financial details.
Internal communications. Meeting transcripts, Slack messages, and email threads get fed to AI for summarization. Those conversations reveal strategy, personnel issues, and competitive intelligence.
Financial data. Spreadsheets, forecasts, and transaction records get analyzed by AI tools that were never vetted for compliance.
Most employees have no idea this is risky. They see a helpful tool, not a data exfiltration vector. That perception gap is where security programs fail.
Prompt Injection: The Attack Vector Nobody Prepared For
Traditional security focuses on network perimeters and access controls. AI introduces something different. Prompt injection attacks manipulate AI systems by hiding instructions in the data they process.
The attack works because AI can’t reliably distinguish between legitimate instructions and malicious ones embedded in content. An attacker embeds hidden text in a document, website, or email. When the AI processes that content, it executes the attacker’s instructions instead of, or alongside, the user’s commands.
Security researcher simonw, writing on Hacker News, identified the core problem: “for a security issue like this we need a 100% reliable solution, or people WILL figure out how to exploit it.”
That reliable solution doesn’t exist yet.
In the same discussion, user bawolff framed the architecture problem bluntly: “Black box we don’t really understand executing shell scripts in response to untrusted user input.”
When you connect AI to tools that can send emails, modify databases, or access file systems, you’re giving those capabilities to anyone who can influence what the AI reads. A malicious PDF, a compromised webpage, even a carefully crafted email in someone’s inbox. All become potential attack vectors.
Real exploits have already emerged. Researchers demonstrated prompt injection against Bing Chat within weeks of launch. GitHub Actions using AI agents proved vulnerable to attacks hidden in pull request descriptions. MCP (Model Context Protocol) servers, which extend AI capabilities, have shown critical vulnerabilities in roughly 10% of scanned implementations.
User noodletheworld captured the current state on Hacker News: “there is no solution currently, other than only use trusted sources…this idea of arbitrary content going into your prompt…can’t be safe. It’s flat out impossible.”
This isn’t fear-mongering. It’s accurate technical assessment. Current AI architectures lack the ability to enforce strict boundaries between instructions and data. Until that changes fundamentally, any AI system with access to untrusted content and powerful tools represents a security risk.
Evaluating AI Vendor Security
Not all AI services carry equal risk. The difference between consumer and enterprise offerings is substantial, but marketing often obscures real distinctions.
Start with training data policies. By default, OpenAI uses conversations from free ChatGPT to improve their models. Their enterprise tier explicitly does not. “By default, we do not use your business data for training our models,” states their enterprise privacy documentation. Other vendors vary. Always get written confirmation.
Encryption matters, but details matter more. Look for AES-256 encryption at rest and TLS 1.2+ in transit. More importantly, understand who holds the keys. Vendor-managed encryption is standard. Customer-managed keys provide more control but increase operational complexity.
Data residency becomes critical for regulated industries. Where is your data stored? Can it cross borders? GDPR requires adequate protection for data leaving the EU. Healthcare and financial services face additional geographic restrictions.
Examine retention policies carefully. How long does the vendor keep your prompts and responses? Can you delete them? What happens to conversation history when you close an account? Some vendors retain data indefinitely for “safety research.” That may conflict with your data minimization requirements.
Third-party integrations multiply risk. The November 2025 breach affecting OpenAI users came through Mixpanel, a third-party analytics vendor. The breach exposed user names, email addresses, and usage data. Your security is only as strong as your vendor’s weakest integration partner.
Audit rights and compliance certifications provide partial assurance. SOC 2 Type II reports, ISO 27001 certification, and GDPR compliance attestations indicate baseline security practices. They don’t guarantee those practices work. Request the actual reports, not just the marketing claims.
Personal vs. Enterprise Security: Real Differences
The gap between consumer and business AI tiers matters more than most people realize.
Consumer tiers (free or low-cost subscriptions) typically:
- Use your conversations for model training unless you specifically opt out
- Store conversation history indefinitely
- Provide no enterprise authentication options
- Lack administrative controls and audit logging
- Offer limited or no compliance certifications
- Route data through shared infrastructure without isolation
Enterprise tiers generally:
- Exclude business data from model training by default
- Provide configurable data retention with deletion capabilities
- Support SSO, SCIM, and enterprise identity management
- Include admin dashboards, usage analytics, and audit logs
- Maintain compliance certifications (SOC 2, ISO 27001, HIPAA BAA where applicable)
- Offer dedicated infrastructure or logical isolation
The practical difference shows up in liability. When an employee pastes customer data into free ChatGPT and that data influences future model outputs, your legal exposure is substantial. That same data submitted through a properly configured enterprise tier stays isolated, logged, and deletable.
Cost follows capability. Enterprise AI subscriptions run $20-60 per user monthly for basic tiers. Advanced features like dedicated instances, custom models, or enhanced compliance controls push pricing higher. Compare that cost against the regulatory fines and breach response expenses you’re avoiding.
Building Practical Security Controls
Policies alone don’t prevent data exposure. Samsung had policies. They also had engineers under deadline pressure using the fastest tool available.
Technical controls create friction that policies can’t match.
Deploy enterprise AI tools proactively. If you don’t provide approved alternatives, employees will find unapproved ones. The shadow AI problem is fundamentally a supply problem. Solve it with better supply.
Implement DLP for AI workflows. Data loss prevention tools can monitor and block sensitive content before it reaches AI services. Modern DLP solutions recognize common AI tool traffic and can apply specific policies.
Use browser isolation for AI access. Enterprise browser solutions can route AI tool access through controlled environments where corporate data can’t reach the clipboard.
Create allowlists, not blocklists. Blocking ChatGPT by domain is trivial to circumvent. Approving specific AI tools with proper configuration is more defensible.
Log everything. Enterprise AI tiers provide usage logs. Collect them. Integrate them with your SIEM. Establish baselines. Investigate anomalies.
Train specifically on AI risks. Generic security awareness doesn’t cover the unique risks of AI tools. Show employees what data exposure looks like. Demonstrate prompt injection. Make the risks concrete.
Establish incident response procedures for AI exposure. If sensitive data reaches an AI tool, what’s your response? Who investigates? What gets reported? Define this before you need it.
The Limits of Current Solutions
Honest assessment requires acknowledging what we can’t yet solve.
Prompt injection has no complete fix. Mitigation helps. Careful architecture helps more. But as user TeMPOraL noted on Hacker News: “Prompt injection, being equivalent to social engineering, will always be a problem.” The fundamental architecture of current language models makes reliable separation of instructions and data extremely difficult.
AI output verification remains largely manual. When AI generates code, contracts, or customer communications, humans must verify accuracy and appropriateness. Automation of that verification is nascent at best.
Data removal from trained models is impractical. If your data contributed to model training, extracting it completely may be technically impossible. The Samsung engineers’ code will influence model behavior indefinitely.
Compliance frameworks lag technology. GDPR wasn’t written with large language models in mind. Neither was HIPAA. Regulators are catching up, but ambiguity persists. “Reasonable security measures” in a 2016 regulation meant something different than it does now.
What This Means Going Forward
Security teams face a genuine dilemma. AI tools provide legitimate productivity benefits. Blocking them entirely pushes employees toward uncontrolled alternatives. Permitting them without controls creates data exposure. Neither extreme works.
The pragmatic path runs through controlled adoption. Provide secure AI tools. Configure them correctly. Monitor their use. Train employees on risks. Accept that some residual risk remains.
Watch the regulatory landscape. The EU AI Act’s compliance deadlines approach. California’s new automated decision-making rules took effect in 2026. More jurisdictions will follow. Security programs that account for AI-specific requirements now will adapt more easily as requirements evolve.
Build relationships with your AI vendors. Security is not a feature you purchase once. It’s an ongoing conversation about configurations, policies, and incident response. Vendors who won’t engage on security details probably can’t provide enterprise-grade protection.
The organizations that navigate this well won’t be the ones who avoided AI. They’ll be the ones who integrated it deliberately, with clear policies, appropriate controls, and honest recognition of what they could and couldn’t prevent.
That recognition might be the hardest part. We’re accustomed to security problems with solutions. Patches. Configurations. Training. The AI security landscape includes risks without current fixes. Living with that uncertainty while still moving forward requires a different kind of security thinking.
Perhaps that’s the real shift. Not just new tools requiring new policies, but new categories of risk requiring new comfort with ambiguity.