Processing of the Results (DataQualityTools)

Manual Post-Editing

Manual post-editing serves to verify and, if necessary, correct the results of a function. For matching functions, the addresses recognised as duplicates are presented in duplicate groups. Besides the data relevant to the comparison, the program also displays the contents of all other columns of the table that are not involved in the comparison. Based on this information, the user can decide which of the addresses recognised as duplicates should actually be deleted.
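DataQualityTools carries out manual post-editing in its user interface. The following Python sketch only models the idea behind it, assuming a hypothetical `DuplicateGroup` class that holds the address numbers of one group and the set of records currently marked with a red cross. It is not the program's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class DuplicateGroup:
    """Hypothetical model of one duplicate group shown in manual post-editing."""
    number: int                    # duplicate number of the group
    record_ids: list[int]          # address numbers of all records in the group
    marked_for_deletion: set[int] = field(default_factory=set)  # records with a "red cross"

    def keep(self) -> list[int]:
        """Records that survive: every record in the group that is not marked."""
        return [rid for rid in self.record_ids if rid not in self.marked_for_deletion]

    def toggle(self, record_id: int) -> None:
        """The reviewer corrects the automatic result by adding or removing a mark."""
        if record_id not in self.record_ids:
            raise ValueError(f"record {record_id} is not in group {self.number}")
        if record_id in self.marked_for_deletion:
            self.marked_for_deletion.remove(record_id)
        else:
            self.marked_for_deletion.add(record_id)
```

For example, if the matching function marked records 102 and 103 but the reviewer decides that 103 is not a duplicate after all, toggling its mark leaves 101 and 103 to be kept.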

Deletion Log

The deletion log is the printable version of the duplicate records. It is available in two forms: a short, compact one and a significantly longer one that is easier to read. In both forms, the output can be saved in various file formats: besides PDF, you can also select HTML, text, CSV, MHT, Excel, Rich Text (RTF) and graphics formats. The output can also be printed, or sent by email as an attachment in any of these formats.

Results Log

The results log is the printable summary of the matching results. The options for printing, saving and sending it by email are similar to those of the deletion log.

Deleting in the Source Table

For all functions that can delete records, the results can also be applied directly to the source table. The records marked with a red cross in manual post-editing are then deleted from the source table.
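For a database source table, applying the result amounts to deleting the marked records by their key. The sketch below shows this with Python's built-in `sqlite3` module; the function name `delete_marked` and the table and column names are hypothetical, and this is not how DataQualityTools itself accesses the table.

```python
import sqlite3


def delete_marked(conn: sqlite3.Connection, table: str, key_column: str,
                  marked_ids: list[int]) -> int:
    """Delete the records marked for deletion; return the number of rows removed."""
    if not marked_ids:
        return 0
    placeholders = ", ".join("?" for _ in marked_ids)
    # Table and column names cannot be bound as parameters, so they are assumed
    # to come from trusted configuration, never from user input.
    sql = f'DELETE FROM "{table}" WHERE "{key_column}" IN ({placeholders})'
    with conn:  # commits on success, rolls back if the statement fails
        cur = conn.execute(sql, marked_ids)
    return cur.rowcount
```

Running the deletion in one transaction means either all marked records are removed or none are. SQLite limits the number of bound parameters per statement, so very long lists of marked records would need to be split into batches.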

Exporting Results

When the results are exported, the records of the source table, with all their columns, are written to a new file according to the result of the function. This file can be either a CSV or an Excel file.

As a rule, all records of the original table are written to the new file. For matching functions, i.e. all functions whose result can be used to delete records, the following variants can be selected:

The (cleaned) remainder: all records are written except those marked with a deletion flag.
Only the hits from the result: only the records marked with a deletion flag are written.
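The export variants differ only in which records are selected before writing. The following Python sketch illustrates this with the standard `csv` module, assuming records are dictionaries keyed by a hypothetical `id` column; the variant names `"all"`, `"remainder"` and `"hits"` are labels chosen for this example, not program settings.

```python
import csv
import io


def select_for_export(records: list[dict], marked_ids: set, key: str = "id",
                      variant: str = "all") -> list[dict]:
    """Pick the records to export according to the chosen (hypothetical) variant."""
    if variant == "all":           # default: every record of the original table
        return list(records)
    if variant == "remainder":     # the (cleaned) remainder: everything not flagged
        return [r for r in records if r[key] not in marked_ids]
    if variant == "hits":          # only the hits: just the flagged records
        return [r for r in records if r[key] in marked_ids]
    raise ValueError(f"unknown variant: {variant!r}")


def to_csv_text(records: list[dict], columns: list[str]) -> str:
    """Render the selected records, with all source columns, as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

The returned text can then be written to a file opened with `newline=""`, as the `csv` module recommends.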

If two tables are compared with each other, you can also select which of the two is used as the data source for the new file.

Duplicates File

The duplicates file stores the results in a separate file, which can be saved in CSV or Excel format. This file can serve as the basis for more advanced queries, e.g. to delete or transfer contact persons in a second table. The data can be stored in one of two forms. In the first, each row contains only the duplicate number and the address number of a single record. In the second, each row contains the duplicate number together with the address number of the record to be deleted and that of the record to be retained.
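The two layouts of the duplicates file can be illustrated by deriving both from the same duplicate groups. This Python sketch assumes a hypothetical `Group` tuple and, for the second form, assumes that each group retains exactly one record; the exact column layout of the files DataQualityTools writes may differ.

```python
from typing import NamedTuple


class Group(NamedTuple):
    number: int            # duplicate number
    record_ids: list[int]  # address numbers of all records in the group
    deleted: set[int]      # address numbers marked for deletion


def rows_per_record(groups: list[Group]) -> list[tuple[int, int]]:
    """First form: each row holds the duplicate number and one address number."""
    return [(g.number, rid) for g in groups for rid in g.record_ids]


def rows_per_pair(groups: list[Group]) -> list[tuple[int, int, int]]:
    """Second form: duplicate number, record to delete, record to retain.

    Assumption for this sketch: every group retains exactly one record.
    """
    rows = []
    for g in groups:
        kept = [rid for rid in g.record_ids if rid not in g.deleted]
        if len(kept) != 1:
            raise ValueError(f"group {g.number} must retain exactly one record")
        rows.extend((g.number, rid, kept[0]) for rid in g.record_ids if rid in g.deleted)
    return rows
```

The second form is convenient for follow-up queries, because each row already links a deleted address to the address that replaces it, e.g. for reassigning contact persons in a second table.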

Flagging Records

The records recognised as duplicates can also be flagged in the table, either with the duplicate number or with a simple deletion flag. Here, too, this information can be used to configure more advanced queries.
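As an illustration of the two flagging options, the following sketch adds a column to an SQLite table and fills it. It assumes, as an interpretation for this example only, that the duplicate number is written to every record of a group while the simple deletion flag (value 1) is written only to the records marked for deletion. The function and column names are hypothetical.

```python
import sqlite3


def flag_duplicates(conn: sqlite3.Connection, table: str, key: str,
                    groups: list[tuple[int, list[int], set[int]]],
                    mode: str = "number", column: str = "dup_flag") -> None:
    """Write the duplicate number or a deletion flag into a new column.

    groups: (duplicate number, address numbers in the group, address numbers to delete)
    """
    if mode not in ("number", "delete"):
        raise ValueError(f"unknown mode: {mode!r}")
    updates = []
    for number, record_ids, deleted_ids in groups:
        if mode == "number":
            updates.extend((number, rid) for rid in record_ids)  # whole group gets its number
        else:
            updates.extend((1, rid) for rid in deleted_ids)      # only marked records get 1
    # Identifiers are assumed to come from trusted configuration.
    with conn:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" INTEGER')
        conn.executemany(f'UPDATE "{table}" SET "{column}" = ? WHERE "{key}" = ?', updates)
```

A later query can then select, for example, all rows where the flag column is 1 to drive a secondary deletion.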

Processing Using a Stored Procedure

The results can also be processed further by a specific stored procedure in the database. This is only possible with database servers. This option, too, can be used to add further actions to the deletion process, for example a secondary deletion procedure or the transfer of information from the address to be deleted to the address to be retained.

Enriching Data

The results can also be used to transfer information from one table to another. The user specifies which data fields are used for this. For example, the field containing the telephone number can be specified as the source field and a previously empty field in the other table as the target field; if necessary, existing contents of the target field can also be overwritten with data from the source table. If the target table is an Excel file, the target can be either an existing field or a new field that the program creates. With all other data formats, the target field must already exist in the table. In other words, the results of the function establish a relationship between individual records of the two tables, and this relationship is then used to transfer the contents of specific fields from one table to the other.

Optionally, records enriched with additional data can be given a note indicating the source of that data. Such a note is needed to fully comply with the right to information regarding personal data, and thus to meet the requirements of data protection laws such as the GDPR (General Data Protection Regulation).
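The enrichment step can be sketched as follows: the matching result supplies pairs of (source record, target record), and one field is copied across, optionally overwriting existing content and optionally recording a source note. Everything here is hypothetical for illustration: records are dictionaries with an `id` key, and `create_field=False` mimics the rule that non-Excel targets require the field to exist already.

```python
def enrich(source: list[dict], target: list[dict], pairs: list[tuple[str, str]],
           src_field: str, dst_field: str, overwrite: bool = False,
           create_field: bool = False, note_field: str = "", note: str = "") -> int:
    """Copy src_field from matched source records into dst_field of target records.

    pairs: (source id, target id) relationships established by the matching function.
    Returns the number of target records that were changed.
    """
    src_by_id = {r["id"]: r for r in source}
    tgt_by_id = {r["id"]: r for r in target}
    changed = 0
    for src_id, tgt_id in pairs:
        rec = tgt_by_id[tgt_id]
        # Mirrors the rule for non-Excel targets: the target field must already exist.
        if dst_field not in rec and not create_field:
            raise KeyError(f"target field {dst_field!r} does not exist")
        value = src_by_id[src_id].get(src_field)
        if not value:
            continue                  # nothing to transfer
        if rec.get(dst_field) and not overwrite:
            continue                  # keep existing content unless overwriting is allowed
        rec[dst_field] = value
        if note_field:
            rec[note_field] = note    # record where the data came from
        changed += 1
    return changed
```

Keeping the source note in a dedicated field makes it straightforward to answer later requests about where a person's data was obtained.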

Completing Records

Completing records means transferring data, based on the result of the function, from one record to another in order to complete the latter.
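A minimal sketch of completing one record from another, assuming for this example that "completing" means filling only empty fields and never overwriting existing values. The function name and field names are hypothetical.

```python
def complete_record(target: dict, donor: dict, fields: list[str]) -> list[str]:
    """Fill empty fields of target with values from donor; return the filled fields."""
    filled = []
    for name in fields:
        # Only gaps in the target are filled; existing values are left untouched.
        if not target.get(name) and donor.get(name):
            target[name] = donor[name]
            filled.append(name)
    return filled
```

Returning the list of filled fields makes it easy to log which values were taken from the other record.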

Merging Records

Merging records involves the following two steps:

The records marked with a red cross in manual post-editing are deleted.
In addition, selected columns in the remaining records are supplemented with data from the deleted records.
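The two merge steps can be sketched for a single duplicate group as shown below. This is an illustration under stated assumptions, not the program's merge logic: records are dictionaries with a hypothetical `id` key, only empty columns of the remaining records are filled, and when several deleted records could supply a value, the first non-empty one wins.

```python
def merge_group(records: list[dict], deleted_ids: set, columns: list[str],
                key: str = "id") -> list[dict]:
    """Drop the marked records and fill gaps in the survivors from the deleted ones.

    The surviving records are modified in place and returned.
    """
    deleted = [r for r in records if r[key] in deleted_ids]
    survivors = [r for r in records if r[key] not in deleted_ids]
    for rec in survivors:
        for col in columns:
            if rec.get(col):
                continue              # already has a value, keep it
            for donor in deleted:     # first non-empty value among deleted records wins
                if donor.get(col):
                    rec[col] = donor[col]
                    break
    return survivors
```

Applied to a group where the retained record lacks an email address, the address is taken from the first deleted record that has one.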