With these functions, the columns to be used for the deduplication and the criteria to be applied can be defined freely. It can be determined for each column whether it should be included in the comparison or not.
If a data field is to be included in the comparison, the following information must be entered:
- Field contents: Type of content of the data field. The selection made here should describe the contents of the data field as precisely as possible, so that the program can handle the data appropriately during the comparison. For a data field that contains a postal code, you should also select 'postal code' as field content.
- Confidence score: Minimum confidence score, in percent, that the column pair or group in question must reach.
If several data fields contain the same field contents, they can be combined into a group. This either combines their contents for comparison or compares each data field in one group individually with each data field in the other group.
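The two ways of comparing groups can be illustrated with a minimal sketch. It is not the program's actual implementation: the function names `group_combined` and `group_pairwise` are hypothetical, and `difflib.SequenceMatcher` from the Python standard library merely stands in for the program's own similarity measure, which is not described here. Taking the best pair as the group result in the pairwise mode is likewise an assumption.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in percent (0 to 100); not the program's own measure."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_combined(group_a: list[str], group_b: list[str]) -> float:
    """Join the non-empty contents of each group and compare the joined strings."""
    text_a = " ".join(value for value in group_a if value)
    text_b = " ".join(value for value in group_b if value)
    return similarity(text_a, text_b)


def group_pairwise(group_a: list[str], group_b: list[str]) -> float:
    """Compare every field of one group with every field of the other group.

    Assumption: the best-scoring pair decides the result for the group.
    """
    pairs = [(a, b) for a, b in product(group_a, group_b) if a and b]
    return max((similarity(a, b) for a, b in pairs), default=0.0)


# Two telephone columns per record; the shared number sits in different columns.
phones_a = ["030 1234567", "0171 9876543"]
phones_b = ["0171 9876543", ""]

print(group_pairwise(phones_a, phones_b))  # the identical number is found
print(group_combined(phones_a, phones_b))  # lower, the joined strings differ
```

In this sketch, the pairwise mode suits fields whose order is arbitrary, such as several telephone numbers, while the combined mode suits contents that are split across fields, such as a name spread over two columns.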
A separate threshold value can optionally be defined for the confidence score calculated for the entire record.
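How the threshold values per data field and the optional threshold value for the entire record interact can be sketched as follows. This is only an illustration under stated assumptions: `FieldRule` and `is_hit` are hypothetical names, `difflib.SequenceMatcher` stands in for the program's own similarity measure, and the score for the entire record is assumed to be the plain average of the field scores.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in percent (0 to 100); not the program's own measure."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class FieldRule:
    column: str       # name of the data field to compare
    content: str      # field contents, e.g. "postal code" (informational here)
    threshold: float  # minimum confidence score in percent for this field


def is_hit(rec_a: dict, rec_b: dict, rules: list[FieldRule],
           record_threshold: Optional[float] = None) -> bool:
    """Return True if every field reaches its threshold and, if one is set,
    the record score reaches the record threshold."""
    scores = []
    for rule in rules:
        score = similarity(rec_a[rule.column], rec_b[rule.column])
        if score < rule.threshold:
            return False  # one field below its threshold rules out a hit
        scores.append(score)
    if record_threshold is not None and scores:
        # Assumption: the record score is the average of the field scores.
        return sum(scores) / len(scores) >= record_threshold
    return True


rules = [
    FieldRule("name", "last name", threshold=75),
    FieldRule("zip", "postal code", threshold=100),
]
a = {"name": "Meyer", "zip": "10115"}
b = {"name": "Meier", "zip": "10115"}

print(is_hit(a, b, rules))                       # field thresholds only
print(is_hit(a, b, rules, record_threshold=95))  # stricter record threshold
```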
In addition, the following options can be used if required:
- Multiple matching definitions: This allows you to define several different comparison criteria that are then processed one after the other. These could be, for example, the telephone number, the email address and the postal address, similar to the All-in-One Deduplication.
- Weighting: By reducing the weight of less important data fields, their influence on the confidence score calculated for the entire record can be reduced.
- Skip record if data field is empty: This allows you to exclude incomplete data records from the comparison.
- Condition which may not apply: In this case, a hit results only if the confidence score stays below the threshold value instead of reaching it. For example, data records can be found in which the first name matches, but the personal form of address does not. It can also be ensured that, when two tables are compared, two data records are not treated as a hit if their IDs are identical.
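The options listed above can be sketched together in a few lines. All names (`FieldRule`, `matches`, `matches_any`) are hypothetical, `difflib.SequenceMatcher` stands in for the program's own similarity measure, and two details are assumptions: the record score is taken as the weighted average of the field scores, and fields with a condition which may not apply do not contribute to it.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in percent; two empty strings count as 100 here."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class FieldRule:
    column: str
    threshold: float             # confidence score threshold in percent
    weight: float = 1.0          # lower values reduce the field's influence
    skip_if_empty: bool = False  # exclude records where this field is empty
    negate: bool = False         # "condition which may not apply"


def matches(rec_a: dict, rec_b: dict, rules: list[FieldRule],
            record_threshold: Optional[float] = None) -> bool:
    """Evaluate one matching definition for a pair of records."""
    weighted_sum = 0.0
    total_weight = 0.0
    for rule in rules:
        a = rec_a.get(rule.column, "")
        b = rec_b.get(rule.column, "")
        if rule.skip_if_empty and (not a or not b):
            return False  # incomplete record is excluded from the comparison
        score = similarity(a, b)
        if rule.negate:
            if score >= rule.threshold:
                return False  # threshold must be undershot for a hit
            continue  # assumption: negated fields do not enter the record score
        if score < rule.threshold:
            return False
        weighted_sum += rule.weight * score
        total_weight += rule.weight
    if record_threshold is not None and total_weight > 0:
        # Assumption: record score is the weighted average of the field scores.
        return weighted_sum / total_weight >= record_threshold
    return True


def matches_any(rec_a: dict, rec_b: dict,
                definitions: list[list[FieldRule]]) -> bool:
    """Process several matching definitions one after the other."""
    return any(matches(rec_a, rec_b, rules) for rules in definitions)


# First name must match, personal form of address must not.
rules = [FieldRule("first_name", 90), FieldRule("salutation", 90, negate=True)]
a = {"first_name": "Kim", "salutation": "Mr"}
b = {"first_name": "Kim", "salutation": "Ms"}
print(matches(a, b, rules))  # hit: same first name, different salutation
print(matches(a, a, rules))  # no hit: salutation matches as well
```

Because `any` stops at the first definition that yields a hit, the order of the definitions in `matches_any` determines which criterion is tried first.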
Because the comparison criteria can be chosen freely with this function, a wide variety of applications are conceivable. For example, deduplication can be based on birth dates, bank accounts or credit card numbers. Deduplication is also possible between tables that contain information other than address data, such as item descriptions, book titles or comments.