Remove Duplicates in Access
Duplicates in address lists intended for marketing campaigns increase the costs for the campaign and reduce their success. Thereby, it is actually not hard to find and delete duplicates in address lists.
In the article 'Duplicate Addresses', you can read about how duplicates occur and how they may appear. As mentioned in this article, it doesn’t make much sense to use simple tools for list cleansing, or even to do so manually. Because only a powerful program, such as DataQualityTools, really delivers viable results. And with DataQualityTools, it is even a piece of cake to find and delete multiple entries from address lists.
To do so, follow the steps below:
- If you have not already done so, download DataQualityTools free of charge here. Install the program and request a trial activation. Then you can work with the program for one whole week without any restrictions.
- The functions we need can be found in the menu of the 'Matching within a table' block. Let’s choose 'Matching using the postal address'.
- After starting this function, the project management appears. Here, you create a new project with a project name of your choice and then click on the 'Next' button.
- In the next step, we start by opening the Access file with the data to be processed by clicking on the 'Open file' button. With database servers (MS SQL Server, MySQL, Oracle, IBM DB2 or PostgreSQL), we have to select the corresponding database server instead, in the 'Format / Access to' selection list. After that, we enter the name of the database server. After clicking on the 'Connect to server' button, the access data have to be entered. Finally, the desired database containing the table can be selected in the corresponding selection lists.
- Afterwards, the program has to be told in which columns it can find what information in the table, i.e., which column contains the street or name of the city. To do so, you have to select each data field from the table from the selection list with the column headings that fits best with the designation on the left. The program automatically carries out a default field assignment using the column headings. Since we want to search for duplicates based on the postal address, we also have to indicate the respective columns from the table to be processed that contain the information for all of the components of the postal address. The results of the field assignment can be verified by using 'Verify field assignment’, which can be found on the right half of the screen.
- With the 'Next' button, we come to the dialogue where the actual function can be configured. Here, the most important step is to set the threshold for the maximum allowed discrepancy between two addresses. Furthermore, individual components of the postal address can be excluded from the comparison. In doing so, a column from the table to be processed has to be indicated, during the field assignment in the previous step, for each component of the postal address that should be included in the comparison.
- A click on the 'Next' button starts the matching. In no time at all, a summary of the results is displayed. If the program has found duplicates in the processed table, a click on the 'OK' button takes you to the dialogue with the functions for transforming the results. Otherwise, reduce the threshold for the degree of matching and the re-start the comparison.In the dialogue with the functions for transforming the results, there is a button at the very top with the caption 'Manual post-editing'. Here, the results of the comparison is presented in table form. Those records that should be deleted are marked with a red X, which can be removed if necessary.
- Finally, you must transform the results. For example, we could delete the records marked with a deletion flag directly of the source file. All you need to do is click on the 'Delete in source file’ button.
You can use DeduplicationWizard as well to carry out duplicates within one table. It can only process Excel files and comprises only the most important functions of DataQualityTools.
Note: There is a video tutorial available for DataQualityTools, introducing the user to the operation of the program based on the example of a duplicate detection within one table.
You can read about how to use DataQualityTools to search for duplicates between two tables in the article 'Find Duplicates between two Tables in Access'. Moreover, in the article 'Scheduled Duplicate Detection with BatchDeduplicator', you can read about how to automate the search for duplicates.