Tuesday, December 23, 2014

Complex Data Types in HIVE

There are three complex types in hive.

Arrays: It is an ordered collection of elements.The elements in the array must be of the same type.

Map: It is an unordered collection of key-value pairs.Keys must be of primitive types.Values can be of any type.

Struct: It is a collection of elements of different types.


Examples:

Struct:

If the data pattern is like 100,John$Martin$Doe (customerID,Customer First middle and lastname seperated by '$')

create table cust_struct(
custid int,
name struct<fname:string,mname:string,lname:string>
)
row format delimited
fields terminated by ','
collection items terminated by '$';

Loading Data

!hdfs dfs -copyFromLocal cust-struct.dat /user/hive/warehouse/cust_struct;

Extracting data
select name.fname from cust_struct;


Map:

Data pattern:

100, John$Martin$Doe, home#01234$office#00000$mobile#9999

(last field is a map but map is also a collection terminated by '$' to identify the various key-value pairs )

create table cust_struct_map (
custid int,
name struct<fname:string,mname:string,lname:string>,
phone_nos map<string,int>
)
row format delimited
fields terminated by ','
collection items terminated by '$'
map keys terminated by '#';

Loading Data

!hdfs dfs -copyFromLocal cust-struct-map.dat /user/hive/warehouse/cust_struct_map;

Extracting data:

select name.fname,phone_nos["home"] from cust_struct_map;


Arrays:

Data pattern:

100,John$Martin$Doe,home#01234$office#00000$mobile#9999,abc@yahoo.com$doe@ooo.com

create table cust_struct_map_array(
custid int,
name struct<fname:string,mname:string,lname:string>,
phone_nos map<string,int>,
emails array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '$'
map keys terminated by '#';

Loading Data

!hdfs dfs -copyFromLocal cust-struct-map-array.dat /user/hive/warehouse/cust_struct_map_array;

Extracting Data

select custid,emails from cust_struct_map_array;

No comments:

Post a Comment