Discussion:
This just seems to slow
(too old to reply)
Jerry Schwartz
2011-01-03 01:51:48 UTC
Permalink
I'm trying to load data into a simple table, and it is taking many hours (and
still not done). I know hardware, etc., can have a big effect, but NOTHING
should have this big an effect.

=====
us-gii >show create table t_dmu_history\G
*************************** 1. row ***************************
Table: t_dmu_history
Create Table: CREATE TABLE `t_dmu_history` (
`t_dmu_history_id` int(11) NOT NULL AUTO_INCREMENT,
`DM_History_DM_ID` int(11) DEFAULT NULL,
`DM_History_Customer_ID` int(11) DEFAULT NULL,
PRIMARY KEY (`t_dmu_history_id`),
KEY `DM_History_DM_ID` (`DM_History_DM_ID`),
KEY `DM_History_Customer_ID` (`DM_History_Customer_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1446317 DEFAULT CHARSET=utf8
=====

Here's a snip of what the input file looks like:
=====
SET autocommit=1;

#
# Dumping data for table 'T_DMU_History'
#

INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299519);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299520);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299521);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299522);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299524);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299526);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299527);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299528);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299529);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299531);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299532);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299533);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299534);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299535);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298880);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298881);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298882);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298883);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298884);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298885);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298886);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298887);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298889);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298890);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298891);
=====

There are about 870000 records.

I realize that using one INSERT per row is going to hurt, but I don't control
the format of the incoming data.

Besides, I'd have thought this would be pretty quick regardless of how clumsy
the method was.

Is that "autocommit" a problem? This is a bulk load into an empty table, so
I'm not worried about ACID.

Any suggestions?





Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: ***@gii.co.jp
Web site: www.the-infoshop.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
mos
2011-01-03 04:41:42 UTC
Permalink
Jerry,
Use "Load Data Infile" when loading a lot of data. Whoever is giving
you the data should be able to dump it to a CSV file. Your imports will be
much faster.

Mike
Post by Jerry Schwartz
I'm trying to load data into a simple table, and it is taking many hours (and
still not done). I know hardware, etc., can have a big effect, but NOTHING
should have this big an effect.
=====
us-gii >show create table t_dmu_history\G
*************************** 1. row ***************************
Table: t_dmu_history
Create Table: CREATE TABLE `t_dmu_history` (
`t_dmu_history_id` int(11) NOT NULL AUTO_INCREMENT,
`DM_History_DM_ID` int(11) DEFAULT NULL,
`DM_History_Customer_ID` int(11) DEFAULT NULL,
PRIMARY KEY (`t_dmu_history_id`),
KEY `DM_History_DM_ID` (`DM_History_DM_ID`),
KEY `DM_History_Customer_ID` (`DM_History_Customer_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1446317 DEFAULT CHARSET=utf8
=====
=====
SET autocommit=1;
#
# Dumping data for table 'T_DMU_History'
#
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299519);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299520);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299521);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299522);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299524);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299526);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299527);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299528);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299529);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299531);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299532);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299533);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299534);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13071, 299535);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298880);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298881);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298882);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298883);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298884);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298885);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298886);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298887);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298889);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298890);
INSERT INTO `T_DMU_History` (`DM_History_DM_ID`, `DM_History_Customer_ID`)
VALUES (13073, 298891);
=====
There are about 870000 records.
I realize that using one INSERT per row is going to hurt, but I don't control
the format of the incoming data.
Besides, I'd have thought this would be pretty quick regardless of how clumsy
the method was.
Is that "autocommit" a problem? This is a bulk load into an empty table, so
I'm not worried about ACID.
Any suggestions?
Regards,
Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
Web site: www.the-infoshop.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
Shawn Green (MySQL)
2011-01-03 15:51:52 UTC
Permalink
-----Original Message-----
Sent: Sunday, January 02, 2011 11:49 PM
...
Also delete your INDEX / KEYs and add them at the very end instead.
[JS] Wouldn't it take as long to build the indices? I guess it probably
wouldn't.
It will not. MySQL does not "grow" or "edit" its index files
incrementally, it computes a fresh on-disk index image for every change.

Right now, you are doing a complete index rebuild for every row you add.
If you add up the total work you are saving (index 121000 rows, index
121001 rows, index 121002 rows,...) then you can see a big improvement
by waiting to put the indexes on the table at the very end of the process.

http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

Yours,
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
Carsten Pedersen
2011-01-03 18:48:12 UTC
Permalink
It's been a long time sine I used mysqlimport, but you might want to try:

- using "--fields-terminated-by" rather than "--fields-terminated"
- losing (or escaping) the backticks in --columns=

- checking my.cnf to see if the client settings are the same for mysql>
and mysqlimport
- checking user privileges (are you using the same account in both
instances?)
- checking the line delimiter and --lines-terminated-by

FWIW, I always prefer tab-delimited files over comma-separated ones.
This gets around a lot of i18n issues.

/ Carsten
<sigh>
localhost>TRUNCATE t_dmu_history;
Query OK, 0 rows affected (0.41 sec)
localhost>LOAD DATA LOCAL INFILE 't_dmu_history.txt' INTO TABLE t_dmu_history
FIELDS TERMINATED BY "," (`dm_history_dm_id`,`dm_history_customer_id`);
Query OK, 876211 rows affected (25.16 sec)
Records: 876211 Deleted: 0 Skipped: 0 Warnings: 0
localhost>SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | 299519 |
| 2 | 13071 | 299520 |
| 3 | 13071 | 299521 |
| 4 | 13071 | 299522 |
+------------------+------------------+------------------------+
4 rows in set (0.03 sec)
============================
localhost>TRUNCATE t_dmu_history;
localhost>quit
C:\Users\Jerry\Documents\Access MySQL
Production>mysqlimport --columns=`dm_history_dm_id`,`dm_history_customer_id` --fields-terminated=','
--local --password=xxx --pipe --user=access --verbose maintable_usa
t_dmu_history.txt
Connecting to localhost
Selecting database maintable_usa
Loading data from LOCAL file: C:/Users/Jerry/Documents/Access MySQL
Production/t_dmu_history.txt into t_dmu_history
maintable_usa.t_dmu_history: Records: 876211 Deleted: 0 Skipped: 0
Warnings: 1752422
Disconnecting from localhost
localhost>SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | NULL |
| 2 | 13071 | NULL |
| 3 | 13071 | NULL |
| 4 | 13071 | NULL |
+------------------+------------------+------------------------+
4 rows in set (0.00 sec)
=========================
Before you ask, the mysql CLI is also using a named pipe.
Windows Vista 32-bit
MySQL version 5.1.31-community
Mysqlimport Ver 3.7 Distrib 5.1.31, for Win32 (ia32)
What am I missing?
Regards,
Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
Web site: www.the-infoshop.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
Jerry Schwartz
2011-01-03 20:35:47 UTC
Permalink
-----Original Message-----
Sent: Monday, January 03, 2011 1:48 PM
To: Jerry Schwartz
Subject: Re: mysqlimport doesn't work for me
- using "--fields-terminated-by" rather than "--fields-terminated"
[JS] Good catch! Unfortunately, it didn't fix the problem:

Connecting to localhost
Selecting database maintable_usa
Deleting the old data from table t_dmu_history
Loading data from LOCAL file: C:/Users/Jerry/Documents/Access_MySQL
Tests/t_dmu_history.txt into t_dmu_history
maintable_usa.t_dmu_history: Records: 1 Deleted: 0 Skipped: 0 Warnings: 2
Disconnecting from localhost

I don't know if there's any way to find out what those warnings are.
- losing (or escaping) the backticks in --columns=
[JS] The loons who designed this system were fond of putting spaces in the
table names. I've tried this particular table with and without the back-ticks.

Many of the table and field names are in Japanese, too. I shudder to think how
that will work out.
- checking my.cnf to see if the client settings are the same for mysql>
and mysqlimport
[JS] Good thought.
- checking user privileges (are you using the same account in both
instances?)
[JS] Yes.
- checking the line delimiter and --lines-terminated-by
FWIW, I always prefer tab-delimited files over comma-separated ones.
This gets around a lot of i18n issues.
[JS] No doubt.
/ Carsten
<sigh>
localhost>TRUNCATE t_dmu_history;
Query OK, 0 rows affected (0.41 sec)
localhost>LOAD DATA LOCAL INFILE 't_dmu_history.txt' INTO TABLE t_dmu_history
FIELDS TERMINATED BY "," (`dm_history_dm_id`,`dm_history_customer_id`);
Query OK, 876211 rows affected (25.16 sec)
Records: 876211 Deleted: 0 Skipped: 0 Warnings: 0
localhost>SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | 299519 |
| 2 | 13071 | 299520 |
| 3 | 13071 | 299521 |
| 4 | 13071 | 299522 |
+------------------+------------------+------------------------+
4 rows in set (0.03 sec)
============================
localhost>TRUNCATE t_dmu_history;
localhost>quit
C:\Users\Jerry\Documents\Access MySQL
Production>mysqlimport --columns=`dm_history_dm_id`,`dm_history_customer_id`
--fields-terminated=','
--local --password=xxx --pipe --user=access --verbose maintable_usa
t_dmu_history.txt
Connecting to localhost
Selecting database maintable_usa
Loading data from LOCAL file: C:/Users/Jerry/Documents/Access MySQL
Production/t_dmu_history.txt into t_dmu_history
maintable_usa.t_dmu_history: Records: 876211 Deleted: 0 Skipped: 0
Warnings: 1752422
Disconnecting from localhost
localhost>SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | NULL |
| 2 | 13071 | NULL |
| 3 | 13071 | NULL |
| 4 | 13071 | NULL |
+------------------+------------------+------------------------+
4 rows in set (0.00 sec)
=========================
Before you ask, the mysql CLI is also using a named pipe.
Windows Vista 32-bit
MySQL version 5.1.31-community
Mysqlimport Ver 3.7 Distrib 5.1.31, for Win32 (ia32)
What am I missing?
Regards,
Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
Web site: www.the-infoshop.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
Jerry Schwartz
2011-01-03 18:33:11 UTC
Permalink
<sigh>

This works:

localhost >TRUNCATE t_dmu_history;
Query OK, 0 rows affected (0.41 sec)

localhost >LOAD DATA LOCAL INFILE 't_dmu_history.txt' INTO TABLE t_dmu_history
FIELDS TERMINATED BY "," (`dm_history_dm_id`,`dm_history_customer_id`);

Query OK, 876211 rows affected (25.16 sec)
Records: 876211 Deleted: 0 Skipped: 0 Warnings: 0

localhost >SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | 299519 |
| 2 | 13071 | 299520 |
| 3 | 13071 | 299521 |
| 4 | 13071 | 299522 |
+------------------+------------------+------------------------+
4 rows in set (0.03 sec)
============================
This does not work:

localhost >TRUNCATE t_dmu_history;
localhost >quit

C:\Users\Jerry\Documents\Access MySQL
Production>mysqlimport --columns=`dm_history_dm_id`,`dm_history_customer_id` --fields-terminated=','
--local --password=xxx --pipe --user=access --verbose maintable_usa
t_dmu_history.txt
Connecting to localhost
Selecting database maintable_usa
Loading data from LOCAL file: C:/Users/Jerry/Documents/Access MySQL
Production/t_dmu_history.txt into t_dmu_history
maintable_usa.t_dmu_history: Records: 876211 Deleted: 0 Skipped: 0
Warnings: 1752422
Disconnecting from localhost

localhost >SELECT * FROM t_dmu_history LIMIT 4;
+------------------+------------------+------------------------+
| t_dmu_history_id | DM_History_DM_ID | DM_History_Customer_ID |
+------------------+------------------+------------------------+
| 1 | 13071 | NULL |
| 2 | 13071 | NULL |
| 3 | 13071 | NULL |
| 4 | 13071 | NULL |
+------------------+------------------+------------------------+
4 rows in set (0.00 sec)
=========================

Before you ask, the mysql CLI is also using a named pipe.

Windows Vista 32-bit
MySQL version 5.1.31-community
Mysqlimport Ver 3.7 Distrib 5.1.31, for Win32 (ia32)

What am I missing?

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: ***@gii.co.jp
Web site: www.the-infoshop.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-***@m.gmane.org
Loading...