Configuring Hook Connections for Hive High Availability
Describes how to configure the EZHiveServer2Hook, the EzHiveCLIHook, and the EzHiveMetastoreHook to connect to Hive with High Availability (HA) enabled.
EEP 9.2.1 and later include hook connection support in Airflow to connect to HiveServer2 with High Availability (HA) enabled. When any of the hooks are configured, if one of the HS2 servers is unreachable, Airflow connects to another server in the list of hosts that you specify.
Configuring the EzHiveServer2Hook for Hive HA
The EZHiveServer2Hook supports a pyhive connection to HiveServer 2 HA. To configure the pyhive connection with HiveServer 2 HA:
- Add the hive_ha property to the extra section of the connection configuration. For example:
  { "authMechanism": "MAPRSASL", "ssl": "true", "hive_ha": "true" }
- Add the list of your active HS2 instances in the host section using this format:
  <hs2_hostname1>:<port1>,<hs2_hostname2>:<port2>,<hs2_hostname3>:<port3>…
  For example:
  myhost-48-n2.storage.mycorp.net:10000,myhost-23-n2.storage.mycorp.net:10000
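As a minimal sketch, the host and extra values described above could be assembled in Python before you enter them in the Airflow connection. The helper name build_hs2_ha_settings is hypothetical and is not part of Airflow or the EZHiveServer2Hook; the host names are the example values from this page.

```python
import json


def build_hs2_ha_settings(instances):
    """Return (host, extra) strings for an HA-enabled HiveServer2 connection.

    `instances` is a list of (hostname, port) tuples for the active HS2 servers.
    Hypothetical helper, for illustration only.
    """
    if not instances:
        raise ValueError("at least one HS2 instance is required")
    # Host field: comma-separated <hostname>:<port> pairs with no spaces.
    host = ",".join(f"{name}:{port}" for name, port in instances)
    # Extra field: the JSON shown above, with hive_ha switched on.
    extra = json.dumps(
        {"authMechanism": "MAPRSASL", "ssl": "true", "hive_ha": "true"}
    )
    return host, extra


host, extra = build_hs2_ha_settings([
    ("myhost-48-n2.storage.mycorp.net", 10000),
    ("myhost-23-n2.storage.mycorp.net", 10000),
])
print(host)
print(extra)
```

Generating the host string from a list keeps the format free of stray spaces, which are easy to introduce when typing a long comma-separated value by hand.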
In the following example, one of the HS2 servers is unreachable, so Airflow connects to another server in the list:
{ezhive.py:196} INFO - Trying to connect to myhost-23-n2.storage.mycorp.net:10000
{TSocket.py:142} INFO - Could not connect to ('<ip_address>', 10000)
Traceback (most recent call last):
  File "/opt/mapr/airflow/airflow-2.7.3/build/env/lib/python3.9/site-packages/thrift/transport/TSocket.py", line 137, in open
    handle.connect(sockaddr)
  File "/opt/mapr/airflow/airflow-2.7.3/build/python/lib/python3.9/ssl.py", line 1343, in connect
    self._real_connect(addr, False)
  File "/opt/mapr/airflow/airflow-2.7.3/build/python/lib/python3.9/ssl.py", line 1330, in _real_connect
    super().connect(addr)
ConnectionRefusedError: [Errno 111] Connection refused
{TSocket.py:145} ERROR - Could not connect to any of [('<ip_address>', 10000)]
[2023-12-15, 09:06:32 UTC] {ezhive.py:210} WARNING - Failed to connect to myhost-23-n2.storage.mycorp.net:10000
{ezhive.py:196} INFO - Trying to connect to myhost-48-n2.storage.mycorp.net:10000
{hive.py:475} INFO - USE 'default'
Configuring the EzHiveCliHook for Hive HA
The EZHiveCliHook supports a beeline connection to HiveServer 2 HA. To configure the beeline connection with HiveServer 2 HA:
- Add the following properties to the extra section of the connection configuration:
  { "use_beeline": true, "ssl": "true", "hive_ha": "true", "serviceDiscoveryMode": "zooKeeper", "zooKeeperNamespace": "hiveserver2" }
- Add the list of your active ZooKeeper instances in the host section using this format:
  <ZK_FQDN1>:5181,<ZK_FQDN2>:5181,<ZK_FQDN3>:5181
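For orientation, the serviceDiscoveryMode and zooKeeperNamespace properties above match the standard Hive JDBC URL parameters for ZooKeeper service discovery. The sketch below builds such a URL from the host and extra values. The function name zk_discovery_url and the zk*.example.com host names are hypothetical, and whether the hook assembles its beeline URL in exactly this way is an assumption.

```python
def zk_discovery_url(zk_hosts, extra, database="default"):
    """Build a Hive JDBC URL that uses ZooKeeper service discovery.

    `zk_hosts` is the comma-separated ZooKeeper list from the host field and
    `extra` is the parsed extra JSON. Hypothetical helper, for illustration.
    """
    params = [
        f"serviceDiscoveryMode={extra['serviceDiscoveryMode']}",
        f"zooKeeperNamespace={extra['zooKeeperNamespace']}",
    ]
    # The extra field stores ssl as the string "true", so compare as text.
    if str(extra.get("ssl", "false")).lower() == "true":
        params.append("ssl=true")
    return f"jdbc:hive2://{zk_hosts}/{database};" + ";".join(params)


extra = {
    "use_beeline": True,
    "ssl": "true",
    "hive_ha": "true",
    "serviceDiscoveryMode": "zooKeeper",
    "zooKeeperNamespace": "hiveserver2",
}
# Placeholder ZooKeeper hosts; substitute your own FQDNs.
zk_hosts = "zk1.example.com:5181,zk2.example.com:5181,zk3.example.com:5181"
url = zk_discovery_url(zk_hosts, extra)
print(url)
```

With service discovery, the client asks ZooKeeper for a live HS2 instance registered under the given namespace, which is why the host section lists ZooKeeper nodes rather than HS2 servers.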
Configuring the EzHiveMetastoreHook for Hive HA
The EZHiveMetastoreHook supports an hmsclient connection to HiveServer 2 HA. To configure the hmsclient connection with HiveServer 2 HA:
- Configure Hive Metastore HA as described in Enabling High Availability for Hive Metastore.
- Add the following properties to the extra section of the connection configuration:
  { "authMechanism": "MAPRSASL" }
- In the host section, specify the list of active Hive metastore hosts using the following format:
  <hive_metastore1>,<hive_metastore2>,<hive_metastore3>
In the following example, one of the Hive metastore hosts is unreachable, so Airflow connects to the next host in the list:
[2023-12-15, 12:57:54 UTC] {base.py:73} INFO - Using connection ID 'metastore_default' for task execution.
[2023-12-15, 12:57:54 UTC] {hive.py:576} INFO - Trying to connect to myhost-23-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:582} ERROR - Could not connect to myhost-23-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:576} INFO - Trying to connect to myhost-48-n2.storage.mycorp.net:9083
[2023-12-15, 12:57:54 UTC] {hive.py:578} INFO - Connected to myhost-48-n2.storage.mycorp.net:9083
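The failover pattern visible in the logs above (try each listed host in order and use the first one that accepts a connection) can be sketched in plain Python. This is not the hooks' actual implementation: the function name connect_first_available is hypothetical, and the default port 9083 is an assumption taken from the metastore log lines.

```python
import socket


def connect_first_available(host_field, default_port=9083, timeout=5.0,
                            connect=socket.create_connection):
    """Try each host in a comma-separated list and return the first success.

    Returns (hostname, port, connection). Entries may be "host" or
    "host:port". Hypothetical helper, for illustration only.
    """
    errors = []
    for entry in host_field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, port_text = entry.partition(":")
        port = int(port_text) if port_text else default_port
        try:
            conn = connect((name, port), timeout)
        except OSError as exc:
            # Connection refused, timeout, or DNS failure: try the next host.
            errors.append(f"{name}:{port}: {exc}")
            continue
        return name, port, conn
    raise ConnectionError("could not connect to any host: " + "; ".join(errors))


def fake_connect(address, timeout):
    # Stand-in for a real socket: the first example host refuses connections.
    if address[0] == "myhost-23-n2.storage.mycorp.net":
        raise ConnectionRefusedError(111, "Connection refused")
    return object()


name, port, _ = connect_first_available(
    "myhost-23-n2.storage.mycorp.net,myhost-48-n2.storage.mycorp.net",
    connect=fake_connect,
)
print(f"Connected to {name}:{port}")
```

The connect parameter is injectable so that the ordering logic can be exercised without a live cluster; omit it to attempt real TCP connections.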