Imported Duplicated Email Entries, How to Clean Up the DB?

Use this forum for questions regarding adoption and functionality of OpenEMM

Moderator: moderator

bobby
Posts: 7
Joined: Thu Oct 18, 2007 10:21 pm

Imported Duplicated Email Entries, How to Clean Up the DB?

Post by bobby »

We upgraded to OpenEMM 5.5.1 successfully. In doing so, we preserved our current configuration, i.e. all contacts in the OpenEMM DB.

We regularly import contacts from Outlook vcards, exported as CSV files into OpenEMM via the GUI. This use to work fine in our previous version of OpenEMM (don't recall version number), however in our new import, we noticed that duplicate entries were actually imported, even though we specifically said to use the "email" field as the primary key.

This is really bad because people that already unsubscribed, likely got another email. Also, people got multiple copies. In addition, I noted that there were about 2000 bounced emails that came into my inbox, but the stats in OpenEMM show only 612.
i.e.

mysql> select * from customer_1_tbl where email="bobby@abc.com";
+-------------+-------------------------------+--------+----------+-----------+----------+---------------------+---------------------+-------------------------+---------------+
| customer_id | email | gender | mailtype | firstname | lastname | creation_date | change_date | title | datasource_id |
+-------------+-------------------------------+--------+----------+-----------+----------+---------------------+---------------------+-------------------------+---------------+
| 202 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2007-10-24 14:27:58 | 2007-10-24 14:27:58 | Consultant | 7 |
| 2844 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2009-01-20 10:40:49 | 2009-01-20 10:40:49 | NULL | 11 |
| 10409 | bobby@abc.com | 2 | 1 | Bobby | SSS | 2009-01-20 11:32:35 | 2009-01-20 11:32:35 | Managing Director | 29 |

We have over 2000 duplicate entries and I was wondering what is the best way to clean up the database? I was hoping I could just undo the last few imports, or should I just use a SQL statement to delete them. If the latter, what SQL statements and what tables should I run them on? Any other way to clean up.

Is the database schema posted anywhere?

thanks,
bobby
pheelix
Posts: 101
Joined: Thu Nov 20, 2008 4:34 pm
Location: Dresden, Germany
Contact:

Post by pheelix »

i got the official handbook and there are all db-tables explained.

but for a graphical overview of the connections between the tables, you could some reverse-engineering (good application => http://schemaspy.sourceforge.net/).

or you have a look at the sourcecode and try to find the function for deleting recipients and run it in a loop for all your duplicate entries.

greetz
mmenke
Posts: 30
Joined: Fri Jul 24, 2009 12:14 pm
Location: Germany
Contact:

Post by mmenke »

Even though I think you already solved your problem by now, maybe the following helps someone else.

A possible solution is to delete all entries from the bad import(s) at first and to start all over again.

Could be done by a simple MySQL query like this:

Code: Select all

delete from customer_1_tbl where creation_date like '2010-01-01 16%';
This would delete all entries that were created on January 1st, 2010 after 16:00. You need to know the exact times of the imports, which can be done by selecting/showing the entries in the DB first.
posbis
Posts: 17
Joined: Thu May 28, 2009 8:06 am

Post by posbis »

mmenke wrote:Even though I think you already solved your problem by now, maybe the following helps someone else.

A possible solution is to delete all entries from the bad import(s) at first and to start all over again.

Could be done by a simple MySQL query like this:

Code: Select all

delete from customer_1_tbl where creation_date like '2010-01-01 16%';
This would delete all entries that were created on January 1st, 2010 after 16:00. You need to know the exact times of the imports, which can be done by selecting/showing the entries in the DB first.
I have the same problem. By accident I imported the same Excel-CSV twice. If I know the datasource_id could I use it for deleting the duplicates ? Which would be the exact statement ?

Code: Select all

delete from customer_1_tbl where datasource_id =  '5';
Does deleting records from "customer_1_tbl" table have any side effects apart from loosing these records ?

Any help appreciated....

Regards,
Oliver
Post Reply