Configuring Hook Connections for Hive High Availability
Describes how to configure the EzHiveServer2Hook, the EzHiveCliHook, and the EzHiveMetastoreHook to connect to Hive with High Availability (HA) enabled.
EEP 9.2.1 and later include hook connection support in Airflow to connect to HiveServer2 with High Availability (HA) enabled. When any of the hooks are configured, if one of the HS2 servers is unreachable, Airflow connects to another server in the list of hosts that you specify.
Configuring the EzHiveServer2Hook for Hive HA
The EzHiveServer2Hook supports a pyhive connection to HiveServer2 HA. To configure the pyhive connection with HiveServer2 HA:
- Add the hive_ha property to the extra section of the connection configuration. For example:
  { "authMechanism": "MAPRSASL", "ssl": "true", "hive_ha": "true" }
- Add the list of your active HS2 instances in the host section using this format:
  <hs2_hostname1>:<port1>,<hs2_hostname2>:<port2>,<hs2_hostname3>:<port3>…
  For example:
  myhost-48-n2.storage.mycorp.net:10000,myhost-23-n2.storage.mycorp.net:10000
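The two settings above can also be assembled programmatically. The following is a minimal Python sketch, using only the standard library, that builds the host string and the extra JSON for such a connection. The helper name build_hs2_ha_fields is hypothetical and is not part of Airflow or the hook; the property names and values are taken from the example above.

```python
import json


def build_hs2_ha_fields(instances):
    """Return (host, extra) strings for an HS2 HA connection.

    `instances` is a list of (hostname, port) pairs. This helper is
    hypothetical; it only formats the values described above.
    """
    if not instances:
        raise ValueError("at least one HS2 instance is required")
    # Host section: <hs2_hostname1>:<port1>,<hs2_hostname2>:<port2>,...
    host = ",".join(f"{name}:{int(port)}" for name, port in instances)
    # Extra section: the properties from the example, with hive_ha enabled.
    extra = json.dumps(
        {"authMechanism": "MAPRSASL", "ssl": "true", "hive_ha": "true"}
    )
    return host, extra


host, extra = build_hs2_ha_fields(
    [
        ("myhost-48-n2.storage.mycorp.net", 10000),
        ("myhost-23-n2.storage.mycorp.net", 10000),
    ]
)
print(host)
print(extra)
```

The two printed strings correspond to the host and extra sections of the connection configuration.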
In the following example, one of the HS2 servers is unreachable, so Airflow connects to another server:
{ezhive.py:196} INFO - Trying to connect to myhost-23-n2.storage.mycorp.net:10000
{TSocket.py:142} INFO - Could not connect to ('<ip_address>', 10000)
Traceback (most recent call last):
  File "/opt/mapr/airflow/airflow-2.7.3/build/env/lib/python3.9/site-packages/thrift/transport/TSocket.py", line 137, in open
    handle.connect(sockaddr)
  File "/opt/mapr/airflow/airflow-2.7.3/build/python/lib/python3.9/ssl.py", line 1343, in connect
    self._real_connect(addr, False)
  File "/opt/mapr/airflow/airflow-2.7.3/build/python/lib/python3.9/ssl.py", line 1330, in _real_connect
    super().connect(addr)
ConnectionRefusedError: [Errno 111] Connection refused
{TSocket.py:145} ERROR - Could not connect to any of [('<ip_address>', 10000)]
[2023-12-15, 09:06:32 UTC] {ezhive.py:210} WARNING - Failed to connect to myhost-23-n2.storage.mycorp.net:10000
{ezhive.py:196} INFO - Trying to connect to myhost-48-n2.storage.mycorp.net:10000
{hive.py:475} INFO - USE 'default'
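The log above reflects a try-each-host pattern. As an illustration only, the following Python sketch shows that pattern with the standard library. It is not the code in ezhive.py, the function names parse_hosts and connect_first_available are hypothetical, and it opens a plain TCP socket rather than a pyhive session.

```python
import socket


def parse_hosts(host_field):
    """Split 'host1:port1,host2:port2' into [(host, port), ...]."""
    pairs = []
    for entry in host_field.split(","):
        # Split on the last colon so the port is always the final component.
        name, _, port = entry.strip().rpartition(":")
        pairs.append((name, int(port)))
    return pairs


def connect_first_available(host_field, connect=socket.create_connection, timeout=5):
    """Try each host in order; return (host, port, connection) for the first success."""
    last_error = None
    for name, port in parse_hosts(host_field):
        try:
            conn = connect((name, port), timeout)
            return name, port, conn
        except OSError as err:
            # ConnectionRefusedError and socket timeouts are OSError subclasses.
            last_error = err
    raise ConnectionError(f"no host reachable in {host_field!r}") from last_error


def fake_connect(address, timeout):
    """Stand-in for a real connection attempt: the first host refuses."""
    if address[0] == "myhost-23-n2.storage.mycorp.net":
        raise ConnectionRefusedError(111, "Connection refused")
    return object()


name, port, _ = connect_first_available(
    "myhost-23-n2.storage.mycorp.net:10000,myhost-48-n2.storage.mycorp.net:10000",
    connect=fake_connect,
)
print(name, port)
```

Passing the connect function as a parameter keeps the sketch testable without a live HS2 server. Omitting it makes the function attempt real TCP connections.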
Configuring the EzHiveCliHook for Hive HA
The EzHiveCliHook supports a beeline connection to HiveServer2 HA. To configure the beeline connection with HiveServer2 HA:
- Add the following properties to the extra section of the connection configuration:
  { "use_beeline": true, "ssl": "true", "hive_ha": "true", "serviceDiscoveryMode": "zooKeeper", "zooKeeperNamespace": "hiveserver2" }
- Add the list of your active ZooKeeper instances in the host section using this format:
  <ZK_FQDN1>:5181,<ZK_FQDN2>:5181,<ZK_FQDN3>:5181
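For orientation, the Hive JDBC driver accepts a ZooKeeper quorum in the URL when service discovery is enabled. The following Python sketch formats such a URL from the host and extra values above. Whether the hook builds exactly this string is an assumption, the helper name zk_discovery_url is hypothetical, and the zk1/zk2 host names are placeholders.

```python
def zk_discovery_url(zk_hosts, namespace="hiveserver2", ssl=True):
    """Format a Hive JDBC URL that uses ZooKeeper service discovery.

    `zk_hosts` is the comma-separated host section, for example
    'zk1.example.com:5181,zk2.example.com:5181' (placeholder names).
    """
    quorum = ",".join(h.strip() for h in zk_hosts.split(",") if h.strip())
    if not quorum:
        raise ValueError("at least one ZooKeeper host is required")
    # Session parameters mirror the properties in the extra section.
    params = ["serviceDiscoveryMode=zooKeeper", f"zooKeeperNamespace={namespace}"]
    if ssl:
        params.append("ssl=true")
    return f"jdbc:hive2://{quorum}/;" + ";".join(params)


url = zk_discovery_url("zk1.example.com:5181, zk2.example.com:5181")
print(url)
```

With this form the client asks ZooKeeper for a registered HS2 instance, which is why the host section lists ZooKeeper nodes rather than HS2 servers.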
Configuring the EzHiveMetastoreHook for Hive HA
The EzHiveMetastoreHook supports an hmsclient connection to the Hive Metastore with HA enabled. To configure the hmsclient connection with Hive Metastore HA:
- Configure Hive Metastore HA as described in Enabling High Availability for Hive Metastore.
- Add the following property to the extra section of the connection configuration:
  { "authMechanism": "MAPRSASL" }
- In the host section, specify the list of active Hive metastore hosts using the following format:
  <hive_metastore1>,<hive_metastore2>,<hive_metastore3>
In the following example, the first metastore host is unreachable, so Airflow connects to the next host in the list:
[2023-12-15, 12:57:54 UTC] {base.py:73} INFO - Using connection ID 'metastore_default' for task execution.
[2023-12-15, 12:57:54 UTC] {hive.py:576} INFO - Trying to connect to myhost-23-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:582} ERROR - Could not connect to myhost-23-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:576} INFO - Trying to connect to myhost-48-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:578} INFO - Connected to myhost-48-n2.storage.mycorp.net:9083
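Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a JSON document. The following Python sketch builds such a value for the metastore_default connection ID seen in the log above. The conn_type value hive_metastore and the placement of 9083 in the port field are assumptions based on the stock Airflow Hive provider and the log output, so check them against your EEP installation.

```python
import json
import os

# Metastore hosts in the order Airflow should try them (from the log above).
metastore_hosts = [
    "myhost-23-n2.storage.mycorp.net",
    "myhost-48-n2.storage.mycorp.net",
]

connection = {
    "conn_type": "hive_metastore",  # assumption: stock Airflow Hive provider type
    "host": ",".join(metastore_hosts),  # <hive_metastore1>,<hive_metastore2>,...
    "port": 9083,  # assumption: the port shown in the log output
    "extra": {"authMechanism": "MAPRSASL"},
}

# Setting the variable here only affects the current Python process.
os.environ["AIRFLOW_CONN_METASTORE_DEFAULT"] = json.dumps(connection)
print(os.environ["AIRFLOW_CONN_METASTORE_DEFAULT"])
```

In practice the variable would be exported in the environment of the Airflow scheduler and workers rather than set from a script.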