Remove Duplicates in LibreOffice Calc

Duplicates in address lists intended for marketing campaigns increase the costs for the campaign and reduce their success. Thereby, it is actually not hard to find and delete duplicates in address lists.

In the article 'Duplicate Addresses', you can read about how duplicates occur and how they may appear. As mentioned in this article, it doesn’t make much sense to use simple tools for list cleansing, or even to do so manually. Because only a powerful program, such as DedupeWizard, really delivers viable results. And with DedupeWizard, it is even a piece of cake to find and delete multiple entries from address lists.

To do so, follow the steps below:

  1. If you have not already done so, download DedupeWizard free of charge here. Install the program and request a trial activation. Then you can work with the program for one whole week without any restrictions.
  2. The required function is 'Deduplication in one file'.

    Duplicate search in a table

    After starting this function, you first have to select the matching criteria. The duplicate detection can be carried out based on the postal address, the telephone number or the email address. Let’s choose the postal address.

  3. Matching criterion

  4. After clicking on the 'Next' button, the next step is to open the LibreOffice file (ods) to be processed.

    File selection

    If you want to use a worksheet other than the first worksheet in this LibreOffice Calc file, then you can still select it later in the dialogue where you do the field assignment.
  5. After opening of the LibreOffice file (ods), the program automatically leads you to the next step, the field assignment. There, the program has to be told in which columns it can find what information in the table, i.e., which column contains the street or name of the city. To do so, you have to select each data field from the table from the selection list with the column headings that fits best with the designation on the left.

    Field assignment

    The program automatically carries out a default field assignment using the column headings. Since we want to search for duplicates based on the postal address, we also have to indicate the respective columns from the table to be processed that contain the information for all of the components of the postal address. The results of the field assignment can be verified by using 'Verify field assignment’, which can be found on the right half of the screen.
  6. With the 'Next' button, we come to the dialogue where the actual function can be configured. Here, the most important step is to set the threshold for the maximum allowed discrepancy between two addresses.

    Confidence score

    Furthermore, individual components of the postal address can be excluded from the comparison. In doing so, a column from the table to be processed has to be indicated, during the field assignment in the previous step, for each component of the postal address that should be included in the comparison.
  7. A click on the 'Next' button starts the matching. In no time at all, a summary of the results is displayed.

    Matching results

    If the program has found duplicates in the processed table, a click on the 'OK' button takes you to the manual post-editing. Otherwise, reduce the threshold for the degree of matching and the re-start the comparison.
  8. The ‘manual post-editing’ presents the results of the comparison in table form. Those records that should be deleted are marked with a red X, which can be removed if necessary or moved to a different record from the deuplicate group.

    Matching Results en

  9. Another click on the ‘Next’ button leads us to the dialogue with the functions used to transform the results. For example, we could delete the records marked with a deletion flag directly of the source file. All you need to do is click on the 'Delete in source file’ button.

    Deleting in the source table

You can also use DataQualityTools to carry out duplicates within one table. This program offers much more possibilities than DedupeWizard, not only to find duplicates.

Note: There is a video tutorial available for DedupeWizard, introducing the user to the operation of the program based on the example of a duplicate detection within one table.

You can read about how to use DedupeWizard to search for duplicates between two tables in the article 'Find Duplicates between two Tables in Excel'.