Getting Started with Iceberg
Summarizes what you need to know to begin using Iceberg with HPE Ezmeral Data Fabric release 7.6.x.
Version Support
HPE Ezmeral Data Fabric 7.6.x has been tested with:
Other data-processing engines, such as open-source Spark, PrestoDB, and Flink, and other data-processing technologies, such as Snowflake, have not been tested.
Catalog Support
Catalogs manage the metadata for datasets and tables in Iceberg. You must specify the catalog when interacting with Iceberg tables through Spark. The following built-in catalogs have been tested for use with Data Fabric 7.6.x:
- HiveCatalog
- HadoopCatalog
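For reference, a catalog is selected through Spark catalog properties. The following sketch shows one way each tested catalog type might be configured using Iceberg's SparkCatalog implementation; <catalog_name> and <path_to_your_warehouse> are placeholders, and HiveCatalog also depends on your Hive Metastore configuration:
# HiveCatalog: table metadata tracked in the Hive Metastore
spark.sql.catalog.<catalog_name>=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.<catalog_name>.type=hive
# HadoopCatalog: table metadata tracked directly in the file system
spark.sql.catalog.<catalog_name>=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.<catalog_name>.type=hadoop
spark.sql.catalog.<catalog_name>.warehouse=<path_to_your_warehouse>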
Spark Setup for Iceberg
Setting up Spark to use Iceberg is a two-step process:
- Add the org.apache.iceberg:iceberg-spark-runtime-<spark.version>_<scala.version>:<iceberg.version> JAR file to your application classpath. You can add the runtime JAR to the jars folder in your Spark directory, or add it directly to the application classpath by using the --packages or --jars option, as illustrated in the example below.
- Configure a catalog. For information about using catalogs with Iceberg, see Catalogs.
For examples, see the Spark and Iceberg Quickstart.
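To illustrate the first step, the runtime JAR can be pulled in at launch time with the --packages option. The coordinates below keep the version placeholders from the step above; as an assumed example, an installation running Spark 3.5 with Scala 2.12 and Iceberg 1.4.3 would use iceberg-spark-runtime-3.5_2.12:1.4.3:
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-<spark.version>_<scala.version>:<iceberg.version>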
Configuring Your Spark Application
Consider adding the following parameters to your Spark application:
spark.sql.catalog.<catalog_name>.type=hive
spark.sql.catalog.<catalog_name>.warehouse=<path_to_your_warehouse>
spark.sql.catalog.<catalog_name>=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.legacy.pathOptionBehavior.enabled=true
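As a minimal sketch, the same parameters can also be set programmatically when building a SparkSession. The example below uses PySpark; the catalog name spark_catalog (required when wrapping the built-in session catalog with SparkSessionCatalog), the application name, the warehouse path, and the table name are illustrative assumptions, and the Iceberg runtime JAR from the setup steps above must already be on the classpath:
from pyspark.sql import SparkSession

# Build a session with the Iceberg catalog settings shown above.
# Replace <path_to_your_warehouse> with a real warehouse location.
spark = (
    SparkSession.builder
    .appName("iceberg-getting-started")
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.sql.catalog.spark_catalog.warehouse", "<path_to_your_warehouse>")
    .config("spark.sql.legacy.pathOptionBehavior.enabled", "true")
    .getOrCreate()
)

# Create, write to, and read from an Iceberg table (illustrative table name).
spark.sql("CREATE TABLE IF NOT EXISTS default.iceberg_sample (id BIGINT, data STRING) USING iceberg")
spark.sql("INSERT INTO default.iceberg_sample VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT * FROM default.iceberg_sample").show()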