Very slow Recipient Import

Use this forum for questions regarding adoption and functionality of OpenEMM

Moderator: moderator

quasimotoca
Posts: 8
Joined: Thu Nov 02, 2006 10:08 pm
Location: Toronto

Very slow Recipient Import

Post by quasimotoca »

Hi:
The import of recipient lists is very very slow. Even on moderately sized lists of around 5K. I tried a list of 185K and I got a heap error. Is this something that you are aware of and are looking into? Is there a quick and dirty way I can bulk load a .csv file into the database? Maybe a simple SQL query from the command-line? Other than that, killer system folks, well done!!
Cheers,
Dave Cook
ramnel
Posts: 1
Joined: Fri May 18, 2007 4:08 pm

Index

Post by ramnel »

I was having the same problem and a friend found that the file was not index. It now will upload 5,000 emails in 4 seconds. I didn't do it myself but I know any "MYSQl how to book" will tell you how to index a file.
carlshark
Posts: 3
Joined: Wed Aug 29, 2007 6:09 am
Contact:

Post by carlshark »

Same problem here.

Yes, but the databases have default indexes already. customer_1_tbl has customer_id as index. To what column did you change your index to?
gcarr
Posts: 1
Joined: Fri Nov 02, 2007 2:33 pm

yes.

Post by gcarr »

please share your solutions as it has been frustrating for a non-uber-techie - but I have gotten this far...
thanks for your help
carlshark
Posts: 3
Joined: Wed Aug 29, 2007 6:09 am
Contact:

Post by carlshark »

Aside from logging in through SSH, starting top, and renice-ing the mysqld process, I don't have any fast solution.

To optimize MySQL settings using mysqladmin, I had to look up the man page and play around with max settings.

Even when optimized, I still cannot reliably upload more than 6000+ email addresses (minimum of 5 fields only, CSV). It takes me about an hour to upload a list of this size, and you have to babysit it.
maschoff
Site Admin
Posts: 2608
Joined: Thu Aug 03, 2006 10:20 am
Location: Munich, Germany
Contact:

import performance

Post by maschoff »

Could you please post the OpenEMM and the MySQL version you are using?
OpenEMM Maintainer
austin
Posts: 2
Joined: Wed Nov 28, 2007 1:59 am

Post by austin »

I have seen this problem and the single issue is that none of the tables have more than one index on them (primary key) which causes everything to be very slow.

Rule of thumb: If you are looking up a field (column) in WHERE or ORDER BY that field needs to have an index. Yes you can end up with a lot of extra space used but not having any keys is absolutely stupid. Whoever came up with the schema for these tables needs to get a book on SQL and read the sections on Indexing and Optimizations.

I will hopefully post some sql for you guys in a few days with proper indexes created on them.

Example: customer_1_tbl has no keys other than primary.
Add indexes on the following: email, mailtype, datasourrce_id, and any custom defined fields. When you run an import and verify email address against all records it will take 1 day to complete if you have several thousand records.

Does any one have a list of most of the sql queries used in the import and sending sections? I am not good with Java so it will take me a lot longer to find the queries than for someone just to list them all out.

Whoever is in charge of the SQL please, please go read a manual then go back an optimize the table Schemas to add the proper indexes.

-Austin
austin
Posts: 2
Joined: Wed Nov 28, 2007 1:59 am

Post by austin »

On second glance it also appears that there is no indexing on the tmp tables created during the import process which slows things considerably as well.

-Austin
maschoff
Site Admin
Posts: 2608
Joined: Thu Aug 03, 2006 10:20 am
Location: Munich, Germany
Contact:

Post by maschoff »

Field email is indexed. If you plan to use a different field (like customer ID) for check of duplicates or synchronization you should put an index on this field.

If a MySQL expert can provide an improvement for our code we are happy to accept it since we are more familiar with Oracle.
OpenEMM Maintainer
yiyihui
Posts: 1
Joined: Thu Dec 06, 2007 3:43 am

Post by yiyihui »

I have same problem. I try to monitor mysql server and find the following sql query could cause this problem:
select count(*) from ( SELECT distinct cust.customer_id, cust.gender, cust.firstname, cust.lastname, cust.email FROM customer_1_tbl cust WHERE 1 ) as tmp_tbl

I don't know why using this query to count the total number of email. when I run this query, it needs more 4 minutes to get the result (I have 462464 emails), but if I use "select count(*) from customer_1_tbl", it will get same result and just spend less than 1 second.
duke
Posts: 8
Joined: Tue Nov 27, 2007 4:35 pm

Post by duke »

I've had the same problem. I tried to import ~50k user and it was awfully slow. After 1/2 an hour I stopped the process and looked deeper into the database. Here is my solution:

- connect to the mysql-server an change to the openemm-db
- add indexes to 'email', 'gender', 'firstname' and 'lastname' within the customer_1_tbl-table:

Code: Select all

mysql> ALTER TABLE customer_1_tbl ADD INDEX(email);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(gender);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(firstname);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(lastname);
After that I was able to import the ~50k users within ~30 seconds.

I added some more indexes to customer_1_tbl and customer_1_binding_tbl which will hopefully speed up mail creation, too (well, I actually can't compare ... but indexes are usefull most of the time ;))
duke
Posts: 8
Joined: Tue Nov 27, 2007 4:35 pm

Post by duke »

Just for the record:
with growing addresses the import slows down again. I needed ~1 minute for the last 9k users of my ~340k list.

The recipient overview slows down, too. I think the problem is the creation of temp-tables, I don't know why they are used (this needs a lot of diskspace, too ... ~600MB in my case).
maschoff
Site Admin
Posts: 2608
Joined: Thu Aug 03, 2006 10:20 am
Location: Munich, Germany
Contact:

Post by maschoff »

From release 5.3.2 on field email is indexed by default. Thanks for your input!
OpenEMM Maintainer
Post Reply