Cài đặt Apache Hive & Apache Sqoop
Bạn click vào title để download!
- Chuẩn bị:
- Cấu hình Hadoop Cluster xem ở bài trước.
- apache-hive-2.1.0
- apache-sqoop-1.4.7
- mysql-connector-java-8
Apache Hive
Bước 1: Giải nén Hive.
tar -xvf apache-hive-2.1.0-bin
Bước 2: cấu hình file .bashrc
cd ~
vim .bashrc
------------------------
# set hive
export HIVE_HOME=$HOME/apache-hive-2.1.0-bin
export PATH=$PATH:$HOME/apache-hive-2.1.0-bin/bin
------------------------
source .bashrc
Bước 3: Kiểm tra hive version
hive --version
-------------------------
...
Hive 2.1.0
Subversion git://jcamachguezrMBP/Users/jcamachorodriguez/src/workspaces/hive/HIVE-release2/hive -r 9265bc24d75ac945bde9ce1a0999fddd8f2aae29
Compiled by jcamachorodriguez on Fri Jun 17 01:03:25 BST 2016
...
-------------------------
Bước 4: Khởi tạo thư mục lưu trữ dữ liệu của Hive trên HDFS
# tạo thư mục trên HDFS
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir /tmp
# set quyền read/write
hdfs dfs -chmod g+w /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
Bước 5: Cấu hình file hive-env.sh
cd $HOME/apache-hive-2.1.0-bin/conf
vim hive-env.sh
---------------------------
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/home/sherlock/hadoop-2.7.7
export HIVE_CONF_DIR=/home/sherlock/apache-hive-2.1.0-bin/conf
---------------------------
Bước 6: Cấu hình file hive-site.xml
cd $HOME/apache-hive-2.1.0-bin/conf
vim hive-site.xml
---------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=//home/sherlock/apache-hive-2.1.0-bin/metastore_db;create=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value/>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>
</configuration>
---------------------------------
Bước 7: Hive sử dụng Derby database, nên khởi tạo derby
cd $HOME/apache-hive-2.1.0-bin
bin/schematool -initSchema -dbType derby
...
Initialization script completed
schemaTool completed
Bước 8: Run Hive
cd ~
hive
# Hoặc sử dụng
cd $HOME/apache-hive-2.1.0-bin/bin/
hive
...
hive>
hive> show databases;
OK
default
Time taken: 1.889 seconds, Fetched: 1 row(s)
hive>
hive> exit; # thoát khỏi hive
Apache Sqoop
Bước 1: Giải nén sqoop và mysql
tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar -xvf mysql-connector-java-8.0.15.tar.gz
Bước 2: Cấu hình file .bashrc
cd ~
vim .bashrc
---------------------------
# set sqoop
export SQOOP_HOME=$HOME/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
---------------------------
source .bashrc
Bước 3: Chuyển thư viện mysql-connector-java-8.0.15 vào sqoop-1.4.7/lib
cd ~
mv mysql-connector-java-8.0.15/mysql-connector-java-8.0.15.jar sqoop-1.4.7.bin__hadoop-2.6.0/lib
Bước 4 : Kiểm tra sqoop version
cd ~
sqoop version
...
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
Bài tiếp theo chúng ta sẽ làm một số ví dụ về hive và sqoop nhé!