What is Hivesterix?
Hivesterix is open-source structured-data warehouse software built on top of the UC Irvine Algebricks framework and the Hyracks data-parallel engine. It scales to very large clusters and very large datasets. Users write HiveQL (a SQL-like query language) to express their data analytics jobs. Internally, we apply many compiler and runtime optimizations so that Hivesterix can run much faster than Hive-on-Hadoop. Hivesterix benefits from the Algebricks optimizer, which takes care of parallel query optimization, and from the Hyracks operators, which are designed to use available main memory efficiently to produce results quickly. At the same time, the operators gracefully spill data to disk when memory is constrained. This property allows Hivesterix to process very large amounts of data regardless of the size of the cluster being used.
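To make the spill-to-disk idea concrete, here is a minimal Python sketch of an external sort that keeps only a bounded number of records in memory and writes sorted runs to temporary files when that bound is reached. It is an illustration of the general technique only, not the Hyracks implementation; the function names (`external_sort`, `_spill`, `_read_run`) and the `max_in_memory` parameter are hypothetical, and records are assumed to be integers for simplicity.

```python
import heapq
import os
import tempfile


def _spill(sorted_records):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for rec in sorted_records:
            f.write(f"{rec}\n")
    return path


def _read_run(path):
    """Stream integers back from a spilled run, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(records, max_in_memory=1000):
    """Yield integers in sorted order, buffering at most max_in_memory of them.

    Illustrative only: when the buffer fills up, it is sorted and spilled
    to disk as a run; at the end, all runs are merged in streaming fashion.
    """
    run_paths = []
    buffer = []
    try:
        for rec in records:
            buffer.append(rec)
            if len(buffer) >= max_in_memory:
                run_paths.append(_spill(sorted(buffer)))
                buffer = []
        buffer.sort()
        # If nothing was spilled, this is a plain in-memory sort.
        runs = [_read_run(p) for p in run_paths]
        yield from heapq.merge(buffer, *runs)
    finally:
        # Remove the temporary run files once the merge is done.
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3)))
```

With a small `max_in_memory`, the same function handles inputs larger than the buffer by trading memory for disk I/O, which is the behavior the paragraph above describes at a much larger scale.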
-- the query q9_product_type_profit
insert overwrite table q9_product_type_profit
select
  nation, o_year, sum(amount) as sum_profit
from
  (
    select
      n_name as nation, year(o_orderdate) as o_year,
      l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
      (select l_extendedprice, l_discount, l_quantity, l_orderkey, n_name, ps_supplycost
       from part p join
         (select l_extendedprice, l_discount, l_quantity, l_partkey, l_orderkey,
                 n_name, ps_supplycost
          from partsupp ps join
            (select l_suppkey, l_extendedprice, l_discount, l_quantity, l_partkey,
                    l_orderkey, n_name
             from
               (select s_suppkey, n_name
                from nation n join supplier s on n.n_nationkey = s.s_nationkey
               ) s1 join lineitem l on s1.s_suppkey = l.l_suppkey
            ) l1 on ps.ps_suppkey = l1.l_suppkey and ps.ps_partkey = l1.l_partkey
         ) l2 on p.p_name like '%green%' and p.p_partkey = l2.l_partkey
      ) l3 join orders o on o.o_orderkey = l3.l_orderkey
  ) profit
group by nation, o_year
order by nation, o_year desc;
The HiveQL statement above is query Q9 of the TPC-H benchmark.
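As a rough guide to what the query computes, the following Python sketch applies the same per-row profit formula, grouping, and ordering to rows that are assumed to be already joined and filtered (part name containing "green"). The function name `q9_profit` and the sample rows are invented for illustration; they are not TPC-H data, and this is not how Hivesterix executes the query.

```python
from collections import defaultdict


def q9_profit(joined_rows):
    """Aggregate profit per (nation, year) from pre-joined rows.

    Each row is a dict holding the columns the query's inner select uses.
    """
    totals = defaultdict(float)
    for r in joined_rows:
        # Same expression as in the query:
        # l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity
        amount = (r["l_extendedprice"] * (1 - r["l_discount"])
                  - r["ps_supplycost"] * r["l_quantity"])
        totals[(r["n_name"], r["o_year"])] += amount
    # order by nation ascending, then o_year descending
    return sorted(
        ((nation, year, total) for (nation, year), total in totals.items()),
        key=lambda row: (row[0], -row[1]),
    )


# Hypothetical sample rows, chosen so the arithmetic is exact.
sample_rows = [
    {"n_name": "FRANCE", "o_year": 1995, "l_extendedprice": 100.0,
     "l_discount": 0.5, "ps_supplycost": 2.0, "l_quantity": 10},
    {"n_name": "FRANCE", "o_year": 1995, "l_extendedprice": 200.0,
     "l_discount": 0.25, "ps_supplycost": 5.0, "l_quantity": 4},
    {"n_name": "FRANCE", "o_year": 1996, "l_extendedprice": 80.0,
     "l_discount": 0.0, "ps_supplycost": 1.0, "l_quantity": 30},
    {"n_name": "CHINA", "o_year": 1995, "l_extendedprice": 50.0,
     "l_discount": 0.5, "ps_supplycost": 1.0, "l_quantity": 5},
]

if __name__ == "__main__":
    for row in q9_profit(sample_rows):
        print(row)
```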