Purview can generate a significant number of false positives from its built-in Sensitive Information Types. Part of my job when running datasecurity projects is to help resolve these and keep them from popping up again in the future. Some of this may be common sense, and some of it may be new to you...either way, let's dive in:
Analytics & Initial Assessment
One of the more recent options for reporting on DLP is Data Loss Prevention analytics. I highly, highly recommend you enable this if you haven't already:

Enabling Data Loss Prevention Analytics in Microsoft Purview
If you already have DLP policies configured in your environment, you should review specific DLP rule match hits in the Activity Explorer.
Data Loss Prevention Analytics can take a few days to start showing data, so this is a good intermediate step to start getting a handle on your sensitive data.
Confidence-Level Triage
Each SIT has multiple confidence levels (low, medium, and high) that are essentially different levels of accuracy. Filtering your reports by Confidence Level gives you a more actionable idea of what you're dealing with.
It's worth noting that if you have, say, 1000 hits on Low Confidence Social Security Numbers, you might not necessarily have that many Social Security Numbers floating around.

Spot check a few hits under each confidence level to get an idea of what you're dealing with.
Customizing SITs and Managing Exceptions
If you're dealing with Low Confidence Social Security Number hits that are actually unformatted 9 digit numbers that, for example, represent your customer account numbers, you have a few options:
Copy the built-in Social Security Number SIT and build exceptions for your customer account numbers, or:
Use the built-in Social Security Number SIT in your DLP policies, and build exceptions into the DLP policies themselves.
Of course, one of these scales better than the other, but it's worth knowing all of your options.
DLP Policy & Rule Structure
You probably won't ever get your SITs 100% accurate (if you do, let me know. I'd love to build a case study). One method that I use to account for this is to build (3) rules per policy:
Low Confidence Rule
This rule will trigger on low confidence SSN hits and generate a policy tip with "non-aggressive" text, something to the effect of:
"Potentially sensitive PII data may exist in this email. If this is an error, you can ignore this and continue."

Medium Confidence Rule
This rule will trigger on medium confidence SSN hits and generate a policy tip formatted something like:
"Sensitive PII data may exist in this email. Please consider a more secure format when working with PII. If this is an error, you can provide a justification and override the policy."
I would then setup a "block only people outside your organization" action with an option for the user to override it with a business justification.

High Confidence Rule
This rule will trigger on high confidence SSN hits and generate a policy tip similar to:
"Sensitive PII data has been detected in this email. This message will be blocked. If this is an error, please contact IT Security."
Then setup an action of "block only people outside your organization" with no option configured to override.
Optionally, you could also configure an administrator alert for this.


Maybe I'll come up with some automation for this :)
What About...?
Exact Data Match (EDM)
If your organization has sensitive structured data (like customer accounts), consider using an EDM Classifier to further reduce false positives by matching data in a more scalable fashion.
This is especially great for financial services orgs where you're constantly adding and removing customers from your records.
Advanced Conditions
You can also use additional DLP Rule conditions (e.g., character proximity, document property, etc.) to further refine your SIT detection.
Just be careful modifying the default character proximity to anything above 300. The larger the number, the more likely you are to encounter DLP engine timeouts.
Be especially cautious when using the "Anywhere in the document" option. Honestly, I'd just never consider using it.

I know the checkbox is enticing, just...don't.
Other Notes
Review User Overrides
If you allow user overrides at the medium confidence level, track how often these are actually used.
If you notice frequent overrides, this might mean there are false positives that need to be addressed or that your users need more training.
Communicate!
As you update your SITs and DLP policies, make sure you keep your users informed so they understand new notifications and rule actions.
This ensures you reduce help desk calls, giving you more time for more important things (like meetings that could have been emails).
One More Thing...
Most of the built-in SITs have an entity definition page that explains how they work. Here's one for US Social Security Number:
U.S. social security number (SSN) entity definition | Microsoft Learn
If you're having issues with your Purview Data Security efforts, let's connect and see where I can help!
This post originally appeared on Rubix - Solving for the Modern Workplace.