I have a table with ~300GB of data in MySQL and want to add a new column to it. When I alter the table using the ALTER command, it never completes and the process dies. So I was planning to write a Sqoop job to pull the data from the table and dump it into HDFS, then create a new DB in MySQL, create the table with the additional column, and re-import the data from HDFS again.
Table structure:

CREATE TABLE `nodes` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `type` int(11) NOT NULL,
  `name` varchar(50) NOT NULL,
  `postcode` varchar(4) DEFAULT NULL,
  `updated` datetime DEFAULT NULL,
  `db_updated` datetime DEFAULT CURRENT_TIMESTAMP,
  `useragent` bigint(20) DEFAULT NULL,
  `last_seen` date DEFAULT NULL,  -- newly added column
  PRIMARY KEY (`id`),
  UNIQUE KEY `akaid_index` (`type`,`name`),
  KEY `useragent_idx` (`useragent`),
  KEY `type` (`type`),
  CONSTRAINT `useragentfk` FOREIGN KEY (`useragent`) REFERENCES `useragents` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1091725696 DEFAULT CHARSET=latin1;
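For reference, the statement that never completes is a plain column addition along these lines (a sketch based on the table definition above, where last_seen is the newly added column):

ALTER TABLE `nodes` ADD COLUMN `last_seen` date DEFAULT NULL;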
Sqoop command:
sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ','
I'm getting the below error:
Error: java.io.IOException: Can't export data, please check failed map task logs
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 'null'
	at nodes.__loadFromFields(nodes.java:470)
	at nodes.parse(nodes.java:388)
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
	... 10 more
Caused by: java.lang.NumberFormatException: For input string: "null"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.valueOf(Long.java:540)
	at nodes.__loadFromFields(nodes.java:467)
	... 12 more
The file in HDFS contains the following records:
1289603991,1,fee1cee723bdb0bc499c443765b40e3d,,2016-04-13 10:19:59,2016-04-14 03:44:55,5296252
1289603992,1,edf65c2e7b89388fe9068cc3a898a3fd,,2016-04-13 10:20:00,2016-04-14 03:44:55,5411481
1289603993,1,5760fd1cca92a65ce6f2db43853fc118,,2016-04-13 10:19:59,2016-04-14 03:44:55,4441745
1289603994,1,65dd92c80df5581f55bc60f3e997ec05,,2016-04-13 10:19:59,2016-04-14 03:44:55,5332084
1289603995,1,7654a84428f3064828f5972cfce5f8e6,,2016-04-13 10:20:00,2016-04-14 03:44:55,5202243
1289603996,1,84c270212fe5f3a52cb2bd75403da058,,2016-04-13 10:20:00,2016-04-14 03:44:55,5398729
1289603997,1,a486382c4fc296a5e8d3c0491568c22c,,2016-04-13 10:19:57,2016-04-14 03:44:55,5289170
1289603998,112,2_3nns7yxpms_xv3imjbiw04bqf1snc2tjrtfj5tcx98,,2016-04-13 10:20:00,2016-04-14 03:44:55,null
1289603999,1,a3607df77e025b12c1728f62589857fa,,2016-04-13 10:19:59,2016-04-14 03:44:55,12
1289604000,113,570e1d4e6372cd9c,,2016-04-13 10:19:59,2016-04-14 03:44:55,null
1289604001,113,57023dd016258fbf,,2016-04-13 10:20:00,2016-04-14 03:44:55,null
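The literal string 'null' in the last field (the useragent column) is what the mapper cannot parse as a bigint. The revised command with the null-handling arguments presumably looked roughly like this (a sketch, assuming 'null' is the exact token written to the HDFS files):

sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ',' --input-null-string 'null' --input-null-non-string 'null'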
After adding --input-null-string and --input-null-non-string, the Sqoop job fails for the following reason:
2016-06-07 18:11:37,750 ERROR [Thread-9] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '1289603991' for key 'PRIMARY'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
	at com.mysql.jdbc.Util.getInstance(Util.java:387)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3966)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3902)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2526)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2673)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1192)
	at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
Since the table has an AUTO_INCREMENT column and already contains these rows, I'm getting the above error. Is there another way to alter the table that is faster? As per this thread (Changing large MySQL InnoDB tables), altering large InnoDB tables seems to take a very long time, even days.

If you can suggest any alternative approach, that would be good.
Your data violates the primary key constraint:

"If a table has constraints (e.g., a primary key column whose values must be unique) and already contains data, you must take care to avoid inserting records that violate these constraints."

You need to truncate the table before executing the export command, or update the existing data using the argument:
--update-key id
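A minimal sketch of the two options, reusing the connection settings from your export command (truncate only if you really want to replace every row, and only if no other table references nodes):

TRUNCATE TABLE nodes;

sqoop export --connect jdbc:mysql://localhost:3306/graph1 --table nodes --username root --password password --export-dir <dir-path> --input-fields-terminated-by ',' --input-null-string 'null' --input-null-non-string 'null' --update-key id

Note that with --update-key alone Sqoop generates UPDATE statements only and will not insert rows that are missing from the table; adding --update-mode allowinsert makes it insert those rows as well.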