This service matches and consolidates records across sources, producing cleaner source data free of duplicate records. Where duplicate records exist, CUD reports the multiple matches so that the records can be merged or de-merged. CUD also provides consolidated data about a person that would otherwise require multiple queries to multiple systems, making the desired results easier to obtain.
A globally unique identifier is required to support identity and access management, reporting, and auditing. It must be possible to reference information about a person from more than one data source and be assured that it refers to the same person.
Data matching is the process of matching records from multiple sources. A match may result when some or all attributes of a record in one primary data system (PDS) are identical to those of a record in another PDS.
Once person records are uniquely matched, a globally unique identifier can be assigned with confidence.
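The assignment step can be sketched as follows, assuming the group of uniquely matched records has already been determined. The `PersonRecord` type, its fields, and the use of random UUIDs are illustrative assumptions; this section does not specify CUD's identifier format.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonRecord:
    pds: str                         # hypothetical: name of the source PDS
    local_id: str                    # hypothetical: record key within that PDS
    global_id: Optional[str] = None  # globally unique identifier, once assigned


def assign_global_id(matched: List[PersonRecord]) -> str:
    """Give every record in a uniquely matched group the same global identifier."""
    existing = {r.global_id for r in matched if r.global_id is not None}
    if len(existing) > 1:
        # Conflicting identifiers mean the group needs a merge/de-merge decision first.
        raise ValueError("records already carry different global identifiers")
    # Reuse the identifier the group already holds, otherwise mint a new one.
    gid = existing.pop() if existing else str(uuid.uuid4())
    for record in matched:
        record.global_id = gid
    return gid
```

Reusing an existing identifier keeps it stable when a record from a new PDS is matched to a person who is already known.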
The process of matching occurs in the following scenarios:
- Data from a new PDS is available for CUD provisioning
- Data is entered into an existing PDS
- Data is changed significantly within an existing PDS
Matching also runs dynamically whenever new or significantly changed data becomes available to CUD.
A full data set can also be compared against CUD data, in which case the matching process compares every PDS record with every record stored in CUD. This can produce multiple matches, indicating duplicate records within the PDS.
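A full comparison of this kind can be sketched as a nested loop over both data sets, collecting, for each CUD record, the PDS records that match it. A CUD record matched by more than one PDS record then points to duplicates within the PDS. The match rule `same_email` and the dictionary record layout are hypothetical stand-ins for whatever strategy CUD applies.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]


def full_comparison(pds_records: List[Record], cud_records: List[Record],
                    is_match: Callable[[Record, Record], bool]) -> Dict[int, List[int]]:
    """Compare every PDS record with every CUD record.

    Returns a mapping from CUD record index to the indices of matching PDS records.
    """
    matches: Dict[int, List[int]] = defaultdict(list)
    for i, pds in enumerate(pds_records):
        for j, cud in enumerate(cud_records):
            if is_match(pds, cud):
                matches[j].append(i)
    return dict(matches)


def pds_duplicates(matches: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """A CUD record matched by more than one PDS record suggests duplicates in the PDS."""
    return {j: idxs for j, idxs in matches.items() if len(idxs) > 1}


def same_email(a: Record, b: Record) -> bool:
    # Hypothetical match rule used only for this example.
    return a["email"].lower() == b["email"].lower()
```

The nested loop is quadratic in the number of records. Large comparisons are commonly narrowed first with a candidate-selection step (often called blocking), although this section does not say whether CUD does so.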
Matching Strategies
CUD implements several matching strategies, each applying different test conditions to the records being matched. The strategies produce matches with varying levels of confidence. Matches with a high measure of confidence (exact matches) are accepted without further processing.
Other high-confidence matches are made where one or more unique attributes match between systems; for example, where the email addresses of entities in two or more systems are the same.
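The two high-confidence strategies described above can be sketched as predicates over a pair of records. The attribute names, the choice of email as the only unique attribute, and the case-insensitive email comparison are assumptions for illustration; the real attribute set is defined separately.

```python
from typing import Dict, Optional

Record = Dict[str, str]

# Hypothetical attribute names; CUD's actual attribute set is documented separately.
COMPARED_ATTRIBUTES = ("given_name", "family_name", "date_of_birth", "email")
UNIQUE_ATTRIBUTES = ("email",)


def exact_match(a: Record, b: Record) -> bool:
    """Every compared attribute is present in both records with an identical value."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in COMPARED_ATTRIBUTES)


def _normalise(value: str) -> str:
    return value.strip().lower()


def unique_attribute_match(a: Record, b: Record) -> bool:
    """At least one unique attribute has the same (normalised) value in both records."""
    return any(
        bool(a.get(k)) and bool(b.get(k)) and _normalise(a[k]) == _normalise(b[k])
        for k in UNIQUE_ATTRIBUTES
    )


def high_confidence_strategy(a: Record, b: Record) -> Optional[str]:
    """Name the high-confidence strategy that matched, or None if neither did."""
    if exact_match(a, b):
        return "exact"
    if unique_attribute_match(a, b):
        return "unique-attribute"
    return None
```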
One of the low-confidence matching strategies handles unclear, or fuzzy, matches. Here CUD tests two attribute values for character similarity, deliberately tolerating typographical errors; for example, Ann Smith may be treated as the same person as Anne Smith.
Low-confidence matches must be confirmed manually by Data Controllers.
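A fuzzy name comparison in this spirit can be sketched with Python's standard `difflib.SequenceMatcher`, which scores character-level similarity between 0 and 1. This is a stand-in technique: the section does not say which similarity measure CUD uses, and the 0.85 threshold and the `FuzzyMatch` type are hypothetical. Every result is left pending, reflecting the rule that low-confidence matches need a Data Controller's confirmation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

FUZZY_THRESHOLD = 0.85  # hypothetical cut-off; not specified by CUD


@dataclass
class FuzzyMatch:
    left: str
    right: str
    score: float
    status: str = "pending"  # low confidence: awaits Data Controller confirmation


def name_similarity(x: str, y: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, x.casefold(), y.casefold()).ratio()


def fuzzy_match(x: str, y: str, threshold: float = FUZZY_THRESHOLD) -> Optional[FuzzyMatch]:
    """Propose a low-confidence match when two values are similar enough."""
    score = name_similarity(x, y)
    return FuzzyMatch(x, y, score) if score >= threshold else None
```

With these settings, "Ann Smith" and "Anne Smith" score about 0.95 and are proposed for review, whereas clearly different names such as "Bob Jones" fall well below the threshold.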
Guidance for matching
A key function of CUD is to match data received from different systems. The following types of matches are defined:
- Definite one-to-one match - This is triggered automatically
- Possible one-to-one match - This requires human confirmation
- Possible one-to-many match - This requires human confirmation
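One way to map matching results onto these three types is sketched below. The decision rule is an assumption: a single high-confidence candidate is treated as definite, a single low-confidence candidate as a possible one-to-one match, and more than one candidate as a possible one-to-many match. The section defines the types but not the exact rule.

```python
from enum import Enum
from typing import List, Optional


class MatchType(Enum):
    DEFINITE_ONE_TO_ONE = "definite one-to-one"    # applied automatically
    POSSIBLE_ONE_TO_ONE = "possible one-to-one"    # requires human confirmation
    POSSIBLE_ONE_TO_MANY = "possible one-to-many"  # requires human confirmation


def classify(candidate_confidences: List[str]) -> Optional[MatchType]:
    """Classify one incoming record from the confidences ("high"/"low") of its candidates."""
    if not candidate_confidences:
        return None  # no candidate found; an authorised user may still match manually
    if len(candidate_confidences) > 1:
        return MatchType.POSSIBLE_ONE_TO_MANY
    if candidate_confidences[0] == "high":
        return MatchType.DEFINITE_ONE_TO_ONE
    return MatchType.POSSIBLE_ONE_TO_ONE


def needs_confirmation(match_type: MatchType) -> bool:
    return match_type is not MatchType.DEFINITE_ONE_TO_ONE
```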
The UI provides a means for authorised users to:
- view all matches that are made automatically.
- view and confirm or reject possible matches.
- reject existing matches.
- manually make a match where no possible match has been found by the system.
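The review actions listed above can be modelled as transitions on a match's status. The `Match` and `ReviewQueue` names and the status values are hypothetical, and the check that the user is authorised is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Match:
    pds_record: str
    cud_record: str
    status: str  # "automatic", "pending", "confirmed" or "rejected"


@dataclass
class ReviewQueue:
    matches: List[Match] = field(default_factory=list)

    def automatic(self) -> List[Match]:
        """Matches made automatically by the system."""
        return [m for m in self.matches if m.status == "automatic"]

    def pending(self) -> List[Match]:
        """Possible matches awaiting confirmation or rejection."""
        return [m for m in self.matches if m.status == "pending"]

    def confirm(self, match: Match) -> None:
        if match.status != "pending":
            raise ValueError("only possible (pending) matches can be confirmed")
        match.status = "confirmed"

    def reject(self, match: Match) -> None:
        # Possible matches and existing (automatic or confirmed) matches can be rejected.
        if match.status == "rejected":
            raise ValueError("match is already rejected")
        match.status = "rejected"

    def manual_match(self, pds_record: str, cud_record: str) -> Match:
        """Record a match the system did not propose."""
        match = Match(pds_record, cud_record, "confirmed")
        self.matches.append(match)
        return match
```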
The CUD Attribute Set
Details of the full CUD attribute set are available. Although it would be technically possible to represent every data item from every data source in CUD, that is not the function of CUD; it is the responsibility of the data warehouse. CUD provides a set of commonly used, defined and agreed identity attributes. There are use cases for similar systems that would allow equivalent functions and services to be applied to other data, but these are out of scope for CUD.