Tuesday, August 2, 2016

It's All About Apache Sqoop ... Continued...



It’s all about Apache Sqoop… contd...

In the previous post, we have seen Sqoop Imports, Sqoop Exports and Sqoop Jobs. We will cover the remaining topics in this post.

Sqoop-Codegen – Purpose:

The codegen tool generates Java classes which encapsulate and interpret imported records. The Java definition of a record is instantiated as part of the import process, but can also be performed separately. For example, if Java source is lost, it can be recreated. New versions of a class can be created which use different delimiters between fields, and so on.
In a short, this tool helps us to generate the Data Access Object related to a RDBMS table along with getter and setter methods in it. All of the database table columns will be derived in that. The output of the codegen will be a jar file which contains the java file and compiled .class file.

Syntax will be

$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)

The following is some of the Code Generation Arguments:

--bindir <dir>
Output directory for compiled objects
--class-name <name>
Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>
Disable code generation; use specified jar
--outdir <dir>
Output directory for generated code
--package-name <name>
Put auto-generated classes in this package


Invocation: The following is the syntax for the invocation of codegen.

$ sqoop codegen --connect jdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>
--table <table_name>

Sqoop-Create-Hive_Tables – Purpose:

The create-hive-table tool populates a Hive metastore with a definition for a table based on a database table previously imported to HDFS, or one planned to be imported. This effectively performs the "--hive-import" step of sqoop-import without running the preceding import.

If data was already loaded to HDFS, you can use this tool to finish the pipeline of importing the data to Hive. You can also create Hive tables with this tool; data then can be imported and populated into the target after a preprocessing step run by the user.

Syntax will be

$ sqoop create-hive-table (generic-args) (create-hive-table-args)
$ sqoop-create-hive-table (generic-args) (create-hive-table-args)

Hive Aruguments: The following are some of the Hive arguments used to import

--hive-home <dir>
Override $HIVE_HOME
--hive-overwrite
Overwrite existing data in the Hive table.
--create-hive-table
If set, then the job will fail if the target hive table exits. By default, this property is false.
--hive-table <table-name>
Sets the table name to use when importing to Hive.
--table
The database table to read the definition from.

The following is the example snippet to import hive tables.

$ sqoop create-hive-table --connect jdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>
--table <table_name>
--hive-table <hive_table_name>

Sqoop-Eval – Purpose:

The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.

The eval tool is provided for evaluation purpose only. You can use it to verify database connection from within the Sqoop or to test simple queries. It’s not supposed to be used in production workflows.

Syntax will be

$ sqoop eval (generic-args) (create-hive-table-args)
$ sqoop-eval (generic-args) (create-hive-table-args)

The SQL Evaluation arguments will be

-e,--query <statement>         Execute statement in SQL.

Invocations Examples of Eval would be,

$ sqoop eval --connect jjdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>
--query "<<SELECT QUERY>>"

$ sqoop eval --connect jjdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>
-e "<<INSERT QUERY>>"

Sqoop-List-Databases – Purpose:

List database schemas present on a server.

Syntax will be

$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)

Invocation example:

$ sqoop list-databases --connect jdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>

Sqoop-List-Tables – Purpose:

List all tables present in a database

Syntax will be

$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)

Invocation example:

$ sqoop list-tables --connect jdbc:mysql://<db_host>/<db_name>
--username <<user> --password <password>
--schema <schema_name>

Other Tools:


$ sqoop help
List all available tools in Sqoop
$ sqoop help import
Displaying usage instructions for the Sqoop Import tool
$ sqoop version
Displaying version information for Sqoop



The above are only syntax and usages; we will provide details on Sqoop concepts with examples in next posts.


We are data integration experts at Pacific Big Dataproviding solutions to fit your needs.

 Contact us at: