Hivesterix

What is Hivesterix?

Hivesterix is open-source structured-data warehouse software built on top of the UC Irvine Algebricks framework and the Hyracks data-parallel engine, which scales to very large clusters and very large datasets. Users write HiveQL (a SQL-like query language) to express their data analytics jobs. Internally, Hivesterix applies compiler and runtime optimizations intended to make it much faster than Hive-on-Hadoop. It benefits from the Algebricks optimizer, which takes care of parallel query optimization, and from Hyracks operators, which are designed to make efficient use of available main memory to produce results quickly. At the same time, the operators gracefully spill data to disk when running in memory-constrained situations. This property allows Hivesterix to perform computations on very large amounts of data regardless of the size of the cluster being used.


Quick example:

-- the query q9_product_type_profit
insert overwrite table q9_product_type_profit
select nation, o_year, sum(amount) as sum_profit
from 
  (
  select n_name as nation, year(o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
  from
      (select l_extendedprice, l_discount, l_quantity, l_orderkey, n_name, ps_supplycost 
       from part p join
         (select l_extendedprice, l_discount, l_quantity, l_partkey, l_orderkey, n_name, ps_supplycost 
          from partsupp ps join
            (select l_suppkey, l_extendedprice, l_discount, l_quantity, l_partkey, l_orderkey, n_name 
             from
               (select s_suppkey, n_name from nation n join supplier s on n.n_nationkey = s.s_nationkey
               ) s1 join lineitem l on s1.s_suppkey = l.l_suppkey
            ) l1 on ps.ps_suppkey = l1.l_suppkey and ps.ps_partkey = l1.l_partkey
         ) l2 on p.p_name like '%green%' and p.p_partkey = l2.l_partkey
     ) l3 join orders o on o.o_orderkey = l3.l_orderkey
  ) profit
group by nation, o_year
order by nation, o_year desc;

The HiveQL above is query Q9 (Product Type Profit Measure) of the TPC-H benchmark.
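
The query writes its result into a table named q9_product_type_profit, so that table has to exist before the insert runs. A minimal sketch of a matching definition follows. The column types are an assumption inferred from the select list (n_name as a string, year() as an integer, the summed amount as a double) and are not taken from the Hivesterix distribution:

-- hypothetical result table for the query above; column types are assumed
create table q9_product_type_profit (
  nation     string,  -- n_name of the supplier's nation
  o_year     int,     -- year(o_orderdate)
  sum_profit double   -- sum of the per-line-item profit
);

The source tables (nation, supplier, lineitem, partsupp, part, orders) would likewise need to be declared over the TPC-H data files before the query is run.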

Performance:

Releases:

Contact:

Hivesterix Talks and Tutorials:

Sponsors: