PHI vs PII: What Actually Counts as Protected Data When You Use AI in 2026
A clear walkthrough of how PHI (HIPAA), PII (state privacy law), and personal data (GDPR/CCPA) overlap and differ — and how each interacts with AI tools. For healthcare compliance, legal, and operations leaders who need a working framework, not a glossary.
"PHI" and "PII" get used interchangeably in workforce conversations about AI and privacy, but they're not the same category. They overlap in the middle, they're governed by different laws with different obligations, and the wrong frame leads to either (a) over-restricting AI use by treating ordinary business data as PHI, or (b) under-restricting it by treating regulated health data as ordinary PII. Either failure mode is expensive.
This is a working framework — not a glossary — for what counts as what, how each category interacts with AI tools, and where the most common misclassifications happen.
The three categories, in plain language
PHI (Protected Health Information) is the HIPAA term. Under the HIPAA Privacy Rule (45 CFR § 160.103 and § 164.514), PHI is individually identifiable health information held or transmitted by a covered entity (health plans, healthcare clearinghouses, and providers conducting electronic transactions) or its business associate. "Individually identifiable" plus "health" is the test — both have to be true. Diagnoses, medications, lab values, treatment notes, billing information, appointment records, mental health records — if they're identifiable to a patient, they're PHI.
PII (Personally Identifiable Information) is a U.S. state-privacy and federal-sector term. There's no single federal definition; state laws (CCPA/CPRA in California, the SHIELD Act in New York, TDPSA in Texas, VCDPA in Virginia, CPA in Colorado, and many more) each define it slightly differently. Generally: information that identifies, relates to, or could reasonably be linked to an individual. Names, emails, IP addresses, device IDs, government IDs, financial account info, employment records, geolocation, biometrics, and more. Identifiable health data generally meets the definition of PII as well as PHI, but most PII is not health-related.
Personal data (GDPR / EU AI Act) is the EU equivalent of PII, defined in GDPR Article 4(1). It is broader than U.S. PII in important ways: it includes pseudonymous identifiers (cookie IDs, online identifiers), and it applies extraterritorially to controllers and processors outside the EU that offer goods or services to, or monitor the behavior of, individuals in the EU (Article 3). Special category data (Article 9) includes health, biometric, genetic, racial/ethnic origin, religious belief, sexual orientation, trade union membership, and political opinion data; these get heightened protection.
The overlap: A patient's name, address, and diagnosis is both PHI (under HIPAA) and PII (under state privacy law) and personal data (under GDPR if the patient is in the EU). The non-overlap: An ordinary B2B contact's name and work email is PII (under state privacy law) and personal data (under GDPR) but not PHI. A de-identified clinical research dataset may be neither PHI (after HIPAA-compliant de-identification) nor personal data (after GDPR-compliant anonymization), though re-identification risk has to be assessed.
The HIPAA Privacy Rule's 18 Safe Harbor identifiers
When the question is "is this PHI," the practical screen most healthcare teams use is the Safe Harbor de-identification list. Under 45 CFR § 164.514(b)(2), information is considered de-identified (and therefore not PHI) if these 18 identifiers — relating to the individual, their relatives, employers, or household — are removed AND the covered entity has no actual knowledge that the remaining information could identify the individual:
- Names
- Geographic subdivisions smaller than a state (street, city, county, ZIP), though the first three digits of a ZIP code may be kept when the area they cover contains more than 20,000 people
- All elements of dates (except year) directly related to the individual — birth date, admission date, discharge date, death date — and all ages over 89
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate / license numbers
- Vehicle identifiers and serial numbers (including license plates)
- Device identifiers and serial numbers
- URLs
- IP addresses
- Biometric identifiers (fingerprints, voiceprints)
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
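If all 18 are removed and the covered entity has no actual knowledge that what remains could identify the individual, the data is no longer PHI for HIPAA purposes, even if it's still health data. The alternative is the Expert Determination Method (§ 164.514(b)(1)), where a person with appropriate statistical and scientific expertise determines that the re-identification risk is very small and documents the analysis. Most practical clinical workflows use Safe Harbor.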
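Three of the identifiers are generalized rather than deleted outright: ZIP codes, dates, and ages. A minimal Python sketch of those rules follows. The function names are hypothetical, and the set of low-population ZIP prefixes is an illustrative placeholder, not an authoritative list; a real implementation would load it from current Census data.

```python
from datetime import date

# Illustrative placeholder only (assumption): 3-digit ZIP prefixes whose
# combined area contains 20,000 or fewer people. Load the real list from
# current Census data before relying on this.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code, low_population_zip3=LOW_POPULATION_ZIP3):
    """Keep the first three digits, or return '000' for low-population prefixes."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"  # malformed input: fall back to the most generalized value
    return "000" if prefix in low_population_zip3 else prefix


def generalize_date(d):
    """Reduce a date tied to the individual to its year only."""
    return d.year


def generalize_age(age):
    """Report ages over 89 as a single '90+' category."""
    return "90+" if age > 89 else str(age)


print(generalize_zip("94110"))              # first three digits kept
print(generalize_zip("10211"))              # prefix is in the placeholder set
print(generalize_date(date(2025, 3, 14)))   # year only
print(generalize_age(92))                   # aggregated category
```

These helpers cover only three of the 18 identifiers; the rest still have to be removed before a record can pass Safe Harbor.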
A clinical note that reads "65-year-old female with type 2 diabetes, A1C 8.2%, on metformin, presents with neuropathy" — with no name, MRN, dates other than year, or other identifiers — is not PHI under Safe Harbor. The same note with the patient's name attached is PHI.
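For structured records, the Safe Harbor screen can be approximated as a field-level checklist. The sketch below assumes a hypothetical record schema (`name`, `mrn`, `admission_date`, and so on) and maps only a handful of fields; a real mapping would need to cover all 18 categories, and free-text fields need separate review.

```python
# Hypothetical field names (assumption) mapped to the Safe Harbor category
# each one falls under. Only a subset of the 18 categories is shown.
SAFE_HARBOR_FIELDS = {
    "name": "Names",
    "zip_code": "Geographic subdivisions smaller than a state",
    "admission_date": "Elements of dates other than year",
    "mrn": "Medical record numbers",
    "phone": "Telephone numbers",
    "email": "Email addresses",
    "ssn": "Social Security numbers",
}


def remaining_identifiers(record):
    """Return the Safe Harbor categories that are still populated in a record."""
    return sorted({
        category
        for field, category in SAFE_HARBOR_FIELDS.items()
        if record.get(field) not in (None, "")
    })


# Name removed, but MRN, admission date, and ZIP code are still present.
note = {
    "name": None,
    "mrn": "00482913",
    "admission_date": "2026-02-11",
    "zip_code": "98101",
    "diagnosis": "type 2 diabetes",
}

for category in remaining_identifiers(note):
    print("Still present:", category)
```

An empty result from a check like this is necessary but not sufficient: it says nothing about identifiers buried in free text, and it does not address the actual-knowledge condition.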
How each category interacts with AI
PHI + AI: A HIPAA-covered AI workflow needs a BAA with the AI vendor, the right vendor tier (sales-managed enterprise tiers; consumer tiers are out), the right feature scope (some BAA-covered tiers exclude specific features), and minimum-necessary controls at the prompt layer. The de-identification-first strategy — strip PHI before the AI sees it, re-attach identifiers after — is the most common safe pattern for everyday clinical documentation work, because it sidesteps the BAA-scope question on a per-interaction basis. See our HIPAA-compliant AI guide and AI BAA vendor guide for more.
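The strip-then-re-attach pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a compliant de-identifier: the regular expressions and the `MRN` number format are hypothetical, names are only caught if they are passed in explicitly, and pattern matching alone will miss identifiers in real clinical text. The key idea is that the token-to-value mapping never leaves your environment.

```python
import re

# Hypothetical patterns for illustration only (assumption); real notes need
# far broader coverage of the Safe Harbor list.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def strip_identifiers(text, known_names=()):
    """Replace matched identifiers with placeholder tokens.

    Returns the redacted text and a token-to-value mapping that stays local.
    """
    mapping = {}
    counter = 0

    def substitute(label, value):
        nonlocal counter
        counter += 1
        token = f"[{label}_{counter}]"
        mapping[token] = value
        return token

    for name in known_names:
        text = re.sub(re.escape(name), lambda m: substitute("NAME", m.group(0)), text)
    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, label=label: substitute(label, m.group(0)), text)
    return text, mapping


def reattach_identifiers(text, mapping):
    """Put the original values back into text returned by the AI tool."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


note = "Jane Doe (MRN: 12345678) called from 555-123-4567 about metformin."
redacted, mapping = strip_identifiers(note, known_names=["Jane Doe"])
print(redacted)  # only this string would be sent to the AI tool
print(reattach_identifiers(redacted, mapping))
```

Keeping the clinical content ("metformin") while replacing identifiers with stable tokens lets the AI output be mapped back to the right patient without the vendor ever receiving the identifiers.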
PII + AI (state privacy law): Most U.S. state privacy laws (CCPA, VCDPA, CPA, etc.) require notice and choice for personal information processing, including processing by service providers. If you're using an AI tool to process PII, you generally need to: (a) cover the AI vendor as a service provider / processor in your privacy policy, (b) put a data processing agreement (DPA) or equivalent contractual language in place with the vendor, and (c) account for consumer rights (access, deletion, opt-out of sale/share). Consumer tiers of AI tools that train on inputs by default are problematic for PII because they conflict with "purpose limitation" and "no sale/share" expectations.
Personal data + AI (GDPR): GDPR requires a lawful basis for processing (Article 6), specific protections for special category data (Article 9), data processing agreements with processors (Article 28), and, for AI processing likely to result in high risk to individuals, a Data Protection Impact Assessment (DPIA, Article 35). Cross-border transfers from the EU to AI vendors in the U.S. require Standard Contractual Clauses (SCCs) or another valid transfer mechanism. The EU AI Act adds risk-classified obligations on top of GDPR for AI systems deployed in the EU.
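The per-category requirements above lend themselves to a simple pre-flight gap check before a workflow goes live. The sketch below is one hypothetical way to encode them; the control names are invented labels for illustration, and conditional items such as a DPIA or Article 9 safeguards are left out because whether they apply depends on the specific processing.

```python
# Hypothetical control labels (assumption), grouped by data category.
# Conditional requirements (DPIA, Article 9 safeguards) are intentionally
# omitted because they depend on the specific processing.
REQUIRED_CONTROLS = {
    "PHI": {"baa", "enterprise_tier", "feature_in_baa_scope", "minimum_necessary"},
    "PII": {"vendor_in_privacy_policy", "dpa", "consumer_rights_process"},
    "personal_data": {"lawful_basis", "article_28_dpa", "transfer_mechanism"},
}


def missing_controls(categories, controls_in_place):
    """Return, per category, the required controls that are not yet in place."""
    gaps = {}
    for category in categories:
        missing = REQUIRED_CONTROLS[category] - set(controls_in_place)
        if missing:
            gaps[category] = sorted(missing)
    return gaps


gaps = missing_controls(
    categories=["PHI", "PII"],
    controls_in_place={"baa", "enterprise_tier", "dpa"},
)
for category, missing in gaps.items():
    print(category, "is missing:", ", ".join(missing))
```

A check like this only tracks whether a control has been recorded as present; it cannot verify that a BAA or DPA actually covers the feature being used.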
Common misclassifications
"It's de-identified, so it's not PHI." False unless the data has been checked against the 18 identifiers. A note that has the patient's name removed but keeps the MRN, admission date, and ZIP code fails Safe Harbor and is still PHI. De-identification is a checklist, not a feel.
"It's a B2B contact, so it's not PII." Under U.S. state law, often true; under GDPR, false. GDPR applies to EU-resident natural persons in any capacity, including in their professional role.
"It's a clinician's name, so it's not PHI." False if the clinician's name is associated with a specific patient's care. The Safe Harbor identifiers cover not only the patient but also the patient's relatives, employers, and household members, and a provider's name tied to a specific patient encounter is part of that patient's PHI. Provider directory information is generally not PHI on its own.
"The patient consented, so HIPAA doesn't apply." Consent under HIPAA is narrower than under GDPR. HIPAA authorization (the formal written authorization) has specific content requirements. A general consent or terms-of-service acceptance is not a HIPAA authorization. And even with authorization, the BAA requirement on the vendor side doesn't disappear.
"It's research data, so HIPAA doesn't apply." Often false. Research uses of PHI have their own framework under HIPAA (waivers, limited data sets with data use agreements, authorization). Don't treat research as automatically out-of-scope.
"We're not a covered entity, so HIPAA doesn't apply." Often partially false. Business associates (vendors that touch PHI for covered entities) are directly liable under HIPAA. And many non-HIPAA-covered workflows touch health-adjacent data that's still regulated under state health-privacy laws (Washington My Health My Data Act, Nevada SB 370, California CMIA) and consumer health-privacy laws.
A working decision framework
For each piece of data you might put into an AI tool, ask in order:
- Is the data identifiable to an individual? If clearly no, it's not PII / PHI / personal data. (But check re-identification risk on quasi-identifiers.)
- Is it health information? If yes and the holder is a HIPAA-covered entity or business associate, it's PHI. HIPAA workflow applies (BAA, de-identification, minimum necessary).
- Is the individual in the EU, or is the data processed in connection with EU operations? If yes, GDPR applies: lawful basis, processor agreement, transfer mechanism, possibly DPIA.
- Does a U.S. state privacy law apply? Most likely yes if the data subject is a state resident in a state with a comprehensive privacy law. CCPA, CPRA, VCDPA, CPA, TDPSA, etc. each add requirements.
- Does a sectoral law apply on top? GLBA (financial), FERPA (education), COPPA (children), Washington MHMDA (consumer health), etc.
The framework isn't "is this PHI or PII" — it's "which categories apply, and what does each require." Most regulated workflows fall into multiple categories at once.
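The ordered questions above can be sketched as a small function that returns every regime that applies rather than a single label. The `DataItem` fields and regime strings are hypothetical names chosen for illustration, and each boolean stands in for a judgment call that a privacy officer or counsel would actually make.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """Hypothetical answers (assumption) to the framework's questions for one piece of data."""
    identifiable: bool
    health_related: bool
    holder_is_covered_entity_or_ba: bool
    eu_nexus: bool
    state_privacy_law_applies: bool
    sectoral_laws: tuple = ()  # e.g. ("GLBA",) or ("FERPA",)


def applicable_regimes(item):
    """Return every regime that applies, in the order the questions are asked."""
    if not item.identifiable:
        return []  # still check re-identification risk on quasi-identifiers
    regimes = []
    if item.health_related and item.holder_is_covered_entity_or_ba:
        regimes.append("HIPAA (PHI)")
    if item.eu_nexus:
        regimes.append("GDPR (personal data)")
    if item.state_privacy_law_applies:
        regimes.append("U.S. state privacy law (PII)")
    regimes.extend(item.sectoral_laws)
    return regimes


# A patient record held by a provider, where the patient is in the EU.
patient_record = DataItem(
    identifiable=True,
    health_related=True,
    holder_is_covered_entity_or_ba=True,
    eu_nexus=True,
    state_privacy_law_applies=True,
)
print(applicable_regimes(patient_record))
```

Returning a list rather than a single category mirrors the point of the framework: the useful output is the full set of obligations, not one label.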
What this guide is — and what it isn't
This is a working framework for compliance, privacy, and AI deployment teams. It is not legal advice. The classification of specific data and the application of specific privacy laws depends on the data, the holder, the jurisdiction, the data subject, and the use. For decisions about your organization's specific data and AI workflows, consult your privacy officer and counsel.
Related reading: the HIPAA-compliant AI guide, the AI BAA vendor guide, and the healthcare compliance officer profession hub.
This article is general data-protection orientation as of May 2026. PHI is governed by HIPAA (45 CFR Parts 160 and 164); PII definitions vary by U.S. state; personal data is governed by GDPR (EU 2016/679) and the EU AI Act (EU 2024/1689). Definitions, identifier lists, and obligations can change. This article does not constitute legal advice. Sources: HHS OCR HIPAA Privacy Rule (45 CFR § 164.514), state attorney general privacy guidance, EU GDPR text, EU AI Act text, vendor compliance documentation.