- Used to import data from a traditional RDBMS into HDFS/Hive/HBase etc., and vice-versa (export back to the RDBMS)
- Best approach for filtering:
- Run the query in the RDBMS -> create a temp table there -> import that temp table using Sqoop (sketch below).
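A minimal sketch of that flow, assuming a MySQL database salesdb and a temp table filtered_orders that was already created with the filtering query (connection string, user and names are placeholders):

```
# Import the pre-filtered temp table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop -P \
  --table filtered_orders \
  --target-dir /data/filtered_orders \
  --num-mappers 4
```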
- Password passed in the Sqoop query?
- Use -P : prompts the user to enter the password.
- Save the password in a file -> in the query mention: --password-file
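Hedged examples of both options (connection details and the file path are assumptions):

```
# Option 1: prompt for the password interactively
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders

# Option 2: read the password from a file (restrict its permissions, e.g. 400)
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop \
  --password-file /user/sqoop/.db_password \
  --table orders --target-dir /data/orders
```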
- By default, Sqoop outputs CSV (text) files in HDFS after import:
- Avro support : --as-avrodatafile
- SequenceFile : --as-sequencefile
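For example, importing the same table as Avro instead of the default text files (table and paths are placeholders):

```
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders_avro \
  --as-avrodatafile
# or: --as-sequencefile
```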
- Compression support:
- --compress --compression-codec ...
- Splittable: Bzip2, LZO
- Not splittable: Gzip, Snappy
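A sketch with a concrete codec class (Snappy shown; swap in org.apache.hadoop.io.compress.BZip2Codec if splittable output matters; names are placeholders):

```
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders_compressed \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```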
- For faster transfer:
- --direct : supported for MySQL, PostgreSQL
- --map-column-java col1=String,col2=Float (change column type while importing from the RDBMS)
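Two hedged sketches, one per option (connection string and column names are assumptions):

```
# Direct mode: for MySQL this uses the native mysqldump path instead of plain JDBC
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders_direct --direct

# Override the Java type used for specific columns during a regular JDBC import
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders_typed \
  --map-column-java order_id=Integer,amount=Float
```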
- The CSV output file does not handle NULL values well, so (sketch below):
- If colType = VARCHAR, CHAR, NCHAR, TEXT:
- --null-string '\\N'
- If any other colType:
- --null-non-string '\\N'
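A sketch that writes the Hive-friendly \N marker for both string and non-string columns (table and paths are placeholders):

```
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders_nulls \
  --null-string '\\N' \
  --null-non-string '\\N'
```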
- Import all tables from a DB?
- sqoop import-all-tables
- Tables are imported in sequential order.
- Option of --exclude-tables
- Cannot use --target-dir; --warehouse-dir is fine instead.
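For instance, importing everything except a couple of large tables (database and table names are assumptions):

```
sqoop import-all-tables --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --warehouse-dir /data/salesdb \
  --exclude-tables audit_log,tmp_staging
```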
- Incremental imports in Sqoop:
- When new rows are added and existing rows are not changed:
- Use --incremental append --check-column id --last-value 0
- When existing data is changed:
- Use --incremental lastmodified --check-column <timestamp-column> --last-value <last-imported-value>
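Two hedged sketches, one per mode (column names, timestamps and paths are placeholders):

```
# New rows only: append everything with id greater than the last imported value
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders \
  --incremental append --check-column id --last-value 0

# Updated rows too: pick up anything modified after the last run,
# merging changed rows on the primary key
sqoop import --connect jdbc:mysql://dbhost/salesdb --username sqoop -P \
  --table orders --target-dir /data/orders \
  --incremental lastmodified --check-column last_update_ts \
  --last-value "2015-12-17 00:00:00" --merge-key id
```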
- Create a Sqoop job for automatic pickup of the last-value:
- sqoop job --create name_of_job -- import --connect ...
- sqoop job --list
- sqoop job --exec name_of_job
- Sqoop will serialize the last imported value back to the metastore after each successful incremental job.
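A fuller sketch of the job lifecycle (job name, connection and column details are assumptions); note the space after the standalone -- before the tool name:

```
# Create the job once; Sqoop stores the evolving --last-value in its metastore
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://dbhost/salesdb --username sqoop \
  --password-file /user/sqoop/.db_password \
  --table orders --target-dir /data/orders \
  --incremental append --check-column id --last-value 0

sqoop job --list                      # see saved jobs
sqoop job --show orders_incremental   # inspect stored parameters
sqoop job --exec orders_incremental   # each run resumes from the saved last-value
```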
- Use a boundary query for optimization:
- --username sqoop --password sqoop --query 'SELECT normcities.id, countries.country, normcities.city FROM normcities JOIN countries USING(country_id) WHERE $CONDITIONS' --split-by id --target-dir cities --boundary-query "select min(id), max(id) from normcities"
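The same options as a complete command (connection string and mapper count are assumptions); the cheap MIN/MAX boundary query saves Sqoop from running the full join just to compute split ranges:

```
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop --password sqoop \
  --query 'SELECT normcities.id, countries.country, normcities.city
           FROM normcities JOIN countries USING(country_id)
           WHERE $CONDITIONS' \
  --split-by id \
  --target-dir cities \
  --boundary-query "SELECT MIN(id), MAX(id) FROM normcities" \
  --num-mappers 4
```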