Monday, 26 October 2015

HCatalog Basics


  • HCatalog is an extension of Hive that exposes Hive metadata to other tools and frameworks.
  • To define an HCatalog schema, one simply defines a table in Hive.
  • HCatalog is useful when the schema needs to be exposed outside Hive, i.e. to other frameworks such as Pig.
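  • To load a table named student that is managed by HCatalog:
    • stu_table = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();
      • The schema of stu_table is whatever the schema of student is, so no AS clause is needed.
  • Similarly, to store a relation into an HCatalog-managed table:
    • STORE stu_table INTO 'student' USING org.apache.hcatalog.pig.HCatStorer();
  • Note: since HCatalog was merged into Hive, newer Hive releases ship these classes under the org.apache.hive.hcatalog.pig package instead.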
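
The load and store steps above can be combined into one short script. This is only a minimal sketch, not a definitive recipe. It assumes that student has a numeric column called gpa and that a table called honor_students with the same schema already exists in Hive. Both names are hypothetical and do not come from the original post. The class names are the ones used in this post.

```pig
-- Start Pig with HCatalog support, e.g.: pig -useHCatalog honors.pig
-- (honors.pig is a hypothetical file name)

-- Load the Hive table; the schema comes from the Hive metastore.
stu_table = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();

-- Print the schema Pig received from HCatalog.
DESCRIBE stu_table;

-- Assumption: 'student' has a numeric column named gpa.
honors = FILTER stu_table BY gpa >= 3.5;

-- Assumption: 'honor_students' already exists in Hive with a matching schema;
-- HCatStorer writes into an existing table and does not create one.
STORE honors INTO 'honor_students' USING org.apache.hcatalog.pig.HCatStorer();
```

Because the schema travels with the table, the script never declares column names or types; a change to the table in Hive shows up the next time the script runs.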


  • From the Pig (Grunt) shell, we can also run Hive DDL commands by prefixing them with sql:
    • grunt> sql create table movies (
                 title string,
                 rating string,
                 length double)
               partitioned by (genre string)
               stored as ORC;
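
Once the movies table exists, it can be read back from Pig like any other HCatalog table. The sketch below assumes that a partition with genre 'comedy' has been loaded; that value is hypothetical. The partition column shows up as an ordinary field in the loaded relation, and a filter on it placed directly after the LOAD can typically be pushed down so that only the matching partitions are scanned.

```pig
-- Start Pig with HCatalog support, e.g.: pig -useHCatalog

-- Load the partitioned table created by the DDL above.
movies = LOAD 'movies' USING org.apache.hcatalog.pig.HCatLoader();

-- genre is the partition column; 'comedy' is an assumed partition value.
comedies = FILTER movies BY genre == 'comedy';

-- Keep only the columns of interest and print them.
titles = FOREACH comedies GENERATE title, rating;
DUMP titles;
```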
