Category Archives: mySQL

Understanding Connectors and Drivers in the World of Sqoop

Sqoop is a tool for efficient and large loads/extracts between RDMS and Hadoop.

This ecosystem has enough made up words that it’s important to get the commonplace industry standard words correct — “JDBC Driver” and “JDBC Connector”.

  • Driver is a JDBC driver.
  • Connector could be generic or vendor specific
    • Sqoop’s Generic JDBC connector is always available as part of the standard distribution.
    • Also includes connectors for MySQL, PostgreSQL, Oracle, MS SQL, IBM DB2, and Neteza. However, the DB vendors (or someone else) might have customized/optimized connectors.
    • If the programmer doesn’t select a connector, or if the data source is not known until runtime, Sqoop can try to figure out what the appropriate connector is. Sometimes this is easy, such as if the url to access the data is something like jdbc::myslq//…


Article on IBM DeveloperWorks

Open Source Big Data for the Impatient, Part 1: Hadoop tutorial: Hello World with Java, Pig, Hive, Flume, Fuse, Oozie, and Sqoop with Informix, DB2, and MySQL