Hive resembles a
traditional database by supporting SQL interface but it is not a full database.
Hive can be better called
as data warehouse instead of database.
Hive enforces schema on read
time whereas RDBMS enforces
schema on write time.
In RDBMS, a table’s schema
is enforced at data load time, If the data being loaded doesn’t conform to the
schema, then it is rejected. This design is called schema on write.But Hive
doesn’t verify the data when it is loaded, but rather when a it is retrieved.
This is called schema on read.
Schema on read makes for a
very fast initial load, since the data does not have to be read, parsed, and
serialized to disk in the database’s internal format. The load operation is
just a file copy or move.
Schema on write makes
query time performance faster, since the database can index columns and perform
compression on the data but it takes longer to load data into the database.
Hive is based on the
notion of Write once, Read many times but RDBMS is designed for Read and Write
many times.
In RDBMS, record level
updates, insertions and deletes, transactions and indexes are possible. Whereas
these are not allowed in Hive because Hive was built to operate over HDFS data using MapReduce,
where fulltable scans are the norm and a table update is achieved by transforming
the data into a new table.
In RDBMS, maximum data
size allowed will be in 10’s of Terabytes but whereas Hive can 100’s Petabytes
very easily.
As Hadoop is a
batchoriented system, Hive doesn’t support OLTP (Online Transaction Processing)
but it is closer to OLAP (Online Analytical Processing) but not ideal since
there is significant latency between issuing a query and receiving a reply, due
to the overhead of Mapreduce jobs and due to the size of the data sets Hadoop
was designed to serve.
RDBMS is best suited for
dynamic data analysis and where fast responses are expected but Hive is suited
for data warehouse applications, where relatively static data is analyzed, fast
response times are not
required, and when the data is not changing rapidly.
To overcome the
limitations of Hive, HBase is being integrated with Hive to support record
level operations and OLAP.
Hive is very easily
scalable at low cost but RDBMS is not that much scalable that too it is very costly
scale up.
source:http://hadooptutorial.info/
source:http://hadooptutorial.info/
Great article,keep sharing more posts with us.
ReplyDeletethank you..
big data and hadoop training
hadoop administration certification