Imported Duplicated Email Entries, How to Clean Up the DB?

bobby · Post by **bobby** » Tue Mar 10, 2009 8:22 pm

We upgraded to OpenEMM 5.5.1 successfully. In doing so, we preserved our current configuration, i.e. all contacts in the OpenEMM DB.

We regularly import contacts from Outlook vcards, exported as CSV files into OpenEMM via the GUI. This use to work fine in our previous version of OpenEMM (don't recall version number), however in our new import, we noticed that duplicate entries were actually imported, even though we specifically said to use the "email" field as the primary key.

This is really bad because people that already unsubscribed, likely got another email. Also, people got multiple copies. In addition, I noted that there were about 2000 bounced emails that came into my inbox, but the stats in OpenEMM show only 612.
i.e.

mysql> select * from customer_1_tbl where email="bobby@abc.com";
+-------------+-------------------------------+--------+----------+-----------+----------+---------------------+---------------------+-------------------------+---------------+
| customer_id | email | gender | mailtype | firstname | lastname | creation_date | change_date | title | datasource_id |
+-------------+-------------------------------+--------+----------+-----------+----------+---------------------+---------------------+-------------------------+---------------+
| 202 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2007-10-24 14:27:58 | 2007-10-24 14:27:58 | Consultant | 7 |
| 2844 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2009-01-20 10:40:49 | 2009-01-20 10:40:49 | NULL | 11 |
| 10409 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2009-01-20 11:32:35 | 2009-01-20 11:32:35 | Managing Director | 29 |

We have over 2000 duplicate entries and I was wondering what is the best way to clean up the database? I was hoping I could just undo the last few imports, or should I just use a SQL statement to delete them. If the latter, what SQL statements and what tables should I run them on? Any other way to clean up.

Is the database schema posted anywhere?

thanks,
bobby

pheelix · Post by **pheelix** » Wed Mar 11, 2009 8:56 am

i got the official handbook and there are all db-tables explained.

but for a graphical overview of the connections between the tables, you could some reverse-engineering (good application => http://schemaspy.sourceforge.net/).

or you have a look at the sourcecode and try to find the function for deleting recipients and run it in a loop for all your duplicate entries.

greetz

mmenke · Post by **mmenke** » Fri Feb 12, 2010 4:15 pm

Even though I think you already solved your problem by now, maybe the following helps someone else.

A possible solution is to delete all entries from the bad import(s) at first and to start all over again.

Could be done by a simple MySQL query like this:

Code: Select all

delete from customer_1_tbl where creation_date like '2010-01-01 16%';

This would delete all entries that were created on January 1st, 2010 after 16:00. You need to know the exact times of the imports, which can be done by selecting/showing the entries in the DB first.

posbis · Post by **posbis** » Mon May 03, 2010 1:53 pm

mmenke wrote:Even though I think you already solved your problem by now, maybe the following helps someone else.

A possible solution is to delete all entries from the bad import(s) at first and to start all over again.

Could be done by a simple MySQL query like this:
Code: Select all
delete from customer_1_tbl where creation_date like '2010-01-01 16%';
This would delete all entries that were created on January 1st, 2010 after 16:00. You need to know the exact times of the imports, which can be done by selecting/showing the entries in the DB first.

I have the same problem. By accident I imported the same Excel-CSV twice. If I know the datasource_id could I use it for deleting the duplicates ? Which would be the exact statement ?

Code: Select all

delete from customer_1_tbl where datasource_id =  '5';

Does deleting records from "customer_1_tbl" table have any side effects apart from loosing these records ?

Any help appreciated....

Regards,
Oliver