Find Duplicates between two Tables in Access
Black lists are an important component of marketing. It is not difficult to maintain them. And with the help of the DataQualityTools, black lists can also easily be processed.
Black lists are a good method of respecting the wishes of individual addressees to not receive any advertising, and even save money in doing so. Furthermore, creating your own black lists allows you to exclude certain recipients from a particular marketing campaign, such as competitors or your own employees, and for campaigns designed to recruit new customers, you can exclude your existing customers.
To process black lists, you need a program that is able to carry out an error-tolerant duplicate detection between two tables, such as the DataQualityTools. By doing a search for duplicates between two tables, you can also synchronise address lists, for example to remove addresses that were already solicited with a previous marketing campaign so as not to solicit them again, or to transfer information, such as the telephone or customer number, from one address list to another based on the results of the comparison. There are many ways to make use of a duplicate detection between two tables. And with the DataQualityTools, a search for duplicates between two tables is done in a snap.
To do so, follow the steps below:
- If you have not already done so, download DataQualityTools free of charge here. Install the program and request a trial activation. Then you can work with the program for one whole week without any restrictions.
- The functions we need can be found in the menu of the 'Matching between two tables' block. Let’s choose 'Matching using the postal address'.
- After starting this function, the project management appears. Here, you create a new project with a project name of your choice and then click on the 'Next' button.
- In the next step, we start by opening the Access file with the data to be processed by clicking on the 'Open file' button. With database servers (MS SQL Server, MySQL, Oracle, IBM DB2 or PostgreSQL), we have to select the corresponding database server instead, in the 'Format / Access to' selection list. After that, we enter the name of the database server. After clicking on the 'Connect to server' button, the access data have to be entered. Finally, the desired database containing the table can be selected in the corresponding selection lists.
- Afterwards, the program has to be told in which columns it can find what information in the table, i.e., which column contains the street or name of the city. To do so, you have to select each data field from the table from the selection list with the column headings that fits best with the designation on the left. The program automatically carries out a default field assignment using the column headings. Since we want to search for duplicates based on the postal address, we also have to indicate the respective columns from the table to be processed that contain the information for all of the components of the postal address. The results of the field assignment can be verified by using 'Verify field assignment’, which can be found on the right half of the screen.
- The table we have just indicated in the program is the table in which the deduplication will be carried out. If duplicates are then deleted in the source table at a later time, they will then also be deleted in this table. Now we also have to specify a second table in the program, the black list. Records are never deleted from this table. It is write-protected. To select the second table, click on the 'Next' button. The dialogue that appears looks exactly like the previous dialogue and is also used in the same way. Indicate the table with the black list and carry out the field assignment the same way as for the first table.Note: The two tables used for the duplicate detection may have completely different structures. For example, the house number could be written in the same column as the street in one table, and in the other, this information could be indicated in two separate columns. The important thing is that all three columns are indicated in the field assignment for both these tables.
- With the 'Next' button, we come to the dialogue where the actual function can be configured. Here, the most important step is to set the threshold for the maximum allowed discrepancy between two addresses. Furthermore, individual components of the postal address can be excluded from the comparison. In doing so, a column from the table to be processed has to be indicated, during the field assignment in the previous step, for each component of the postal address that should be included in the comparison.
- A click on the 'Next' button starts the matching. In no time at all, a summary of the results is displayed. If the program has found duplicates between the processed tables, a click on the 'OK' button takes you to the dialogue with the functions for transforming the results. Otherwise, reduce the threshold for the degree of matching and the re-start the comparison.In the dialogue with the functions for transforming the results, there is a button at the very top with the caption 'Manual post-editing'. Here, the results of the comparison is presented in table form. Those records that should be deleted are marked with a red X, which can be removed if necessary.
- Finally, you must transform the results. For example, we could delete the records marked with a deletion flag directly of the source file. All you need to do is click on the 'Delete in source file’ button.
You can use DeduplicationWizard as well to execute a search for duplicates between two tables. It can only process Excel files and comprises only the most important functions of DataQualityTools.
Note: There is a video tutorial available for DataQualityTools, introducing the user to the operation of the program based on the example of a duplicate detection within one table.
You can read about how to use DataQualityTools to search for duplicates within an address list in the article 'Remove Duplicates in Access'. And in the article 'Scheduled Duplicate Detection with BatchDeduplicator', you can read about how the duplicate detection can be automated.