HBase Region Split

- Split Policy: ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy (default from 0.94 through 1.x; 2.0 defaults to SteppingSplitPolicy), KeyPrefixRegionSplitPolicy
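
To make the "increasing to upper bound" idea concrete, here is a rough sketch of how the split threshold grows with the number of regions of the same table on one region server. It assumes the commonly documented rule min(max file size, 2 * flush size * R^3) and defaults of 128 MB flush size and 10 GB max file size; the function name `split_threshold` is made up for illustration, and the real policy class has more special cases.

```python
# Sketch only: approximates the threshold rule of
# IncreasingToUpperBoundRegionSplitPolicy under assumed default settings.
MB = 1024 * 1024
GB = 1024 * MB


def split_threshold(region_count, flush_size=128 * MB, max_file_size=10 * GB):
    """Return the store size (bytes) above which a region is a split candidate.

    region_count: regions of this table hosted on the same region server (R).
    """
    if region_count == 0:
        # No regions counted yet: fall back to the upper bound.
        return max_file_size
    initial_size = 2 * flush_size  # assumed default initial size
    return min(max_file_size, initial_size * region_count ** 3)


if __name__ == "__main__":
    for r in range(1, 5):
        print(r, split_threshold(r) // MB, "MB")
```

With these assumed defaults the threshold is 256 MB for one region, 2 GB for two, and it reaches the 10 GB cap at four regions, after which the policy behaves like ConstantSizeRegionSplitPolicy.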

- Split Point: the first row key of the middle block of the largest file in the store
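
The split point rule can be sketched as follows. This is a simplified model, not HBase code: `Block`, `StoreFile` and `split_point` are hypothetical names, and a file is reduced to its size plus the first row key of each data block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    first_row: bytes  # first row key stored in this data block


@dataclass
class StoreFile:
    name: str
    size: int            # file size in bytes
    blocks: List[Block]  # data blocks in row-key order


def split_point(files: List[StoreFile]) -> Optional[bytes]:
    """Pick the first row of the middle block of the largest file."""
    if not files:
        return None
    largest = max(files, key=lambda f: f.size)
    if not largest.blocks:
        return None
    middle = largest.blocks[len(largest.blocks) // 2]
    return middle.first_row


files = [
    StoreFile("f1", 100, [Block(b"a"), Block(b"c")]),
    StoreFile("f2", 900, [Block(b"b"), Block(b"h"), Block(b"m"),
                          Block(b"t"), Block(b"x")]),
]
print(split_point(files))
```

Here `f2` is the largest file and its third of five blocks starts at row `m`, so `m` becomes the split point.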

- Split Workflow (prepare -> execute -> (rollback)): designed to behave like a transaction

  • prepare: create the two sub regions in memory (table name, region name, start key and end key)
  • execute

  1. Set the region state to 'SPLITTING' in ZooKeeper under /hbase/region-in-transition (RIT).
  2. The master is notified of the state change and updates the region's state in memory.
  3. Create a temporary folder, .splits, under the parent region's directory.
  4. Close the parent region and flush its data to disk.
  5. Create sub folders for the two sub regions and write reference files into them (name: parent hfile name + parent region name; content: split point + true/false, indicating which half of the parent file is referenced).
  6. Copy the sub folders to the HBase table directory.
  7. Mark the parent region offline in the META table and add the two sub regions to it. The parent's data is kept as long as anything still references it; major compactions of the sub regions rewrite the data and remove the references. The master periodically checks split parent regions and removes one once nothing references it.
  8. The two new regions open on the same region server as the parent, since they still reference the parent region's data.
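
The reference files from step 5 can be pictured with a small model. It is only a sketch: the `Reference` class and `read_through` helper are hypothetical, and treating true as "top half" (rows at or above the split point) is an assumption about what the true/false flag encodes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Reference:
    parent_hfile: str   # name of the parent's hfile
    parent_region: str  # name of the parent region
    split_key: bytes    # split point stored in the reference file
    top: bool           # assumed: True = rows >= split_key, False = rows below

    @property
    def file_name(self) -> str:
        # Name built from hfile + region, as described in step 5.
        return f"{self.parent_hfile}.{self.parent_region}"

    def covers(self, row: bytes) -> bool:
        return row >= self.split_key if self.top else row < self.split_key


def read_through(ref: Reference, parent_rows: Iterable[bytes]) -> List[bytes]:
    """Rows a sub region sees when it reads the parent file via its reference."""
    return [row for row in sorted(parent_rows) if ref.covers(row)]


parent_rows = [b"a", b"f", b"m", b"z"]
bottom = Reference("hfile1", "parentregion", b"m", top=False)
top = Reference("hfile1", "parentregion", b"m", top=True)
print(bottom.file_name, read_through(bottom, parent_rows))
print(top.file_name, read_through(top, parent_rows))
```

No data is copied at split time: each sub region reads its half of the parent file through the reference until a major compaction writes real files for it.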

  • rollback: undo the steps already completed if execute fails (version 2.0 records the transaction in HLog, so regions almost never stay stuck in RIT)
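
The prepare -> execute -> (rollback) pattern can be sketched as a journal of steps with matching undo actions. This is a generic illustration of the idea, not the HBase implementation: `run_split`, `SplitFailed` and the step names are all hypothetical, and each step is journaled before it runs so that a half-finished step is also undone.

```python
class SplitFailed(Exception):
    """Raised after a failed split has been rolled back."""


def run_split(steps):
    """Run (name, do, undo) steps in order; undo in reverse order on failure."""
    journal = []
    try:
        for name, do, undo in steps:
            journal.append((name, undo))  # journal first, then act
            do()
    except Exception as err:
        for _, undo in reversed(journal):
            undo()  # undo actions must tolerate a step that never finished
        failed = journal[-1][0]
        raise SplitFailed(f"split rolled back at step {failed!r}") from err
    return [name for name, _ in journal]


state = set()


def fail():
    raise IOError("simulated failure")


steps = [
    ("set SPLITTING", lambda: state.add("SPLITTING"),
     lambda: state.discard("SPLITTING")),
    ("create temp dir", lambda: state.add("tempdir"),
     lambda: state.discard("tempdir")),
    ("close parent", fail, lambda: state.discard("parent closed")),
]

try:
    run_split(steps)
except SplitFailed as e:
    print(e)
print(state)
```

After the simulated failure in the third step, the first two steps are undone and `state` is empty again, which is the property the rollback phase is meant to guarantee.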

- show region info

scan 'hbase:meta',{FILTER=>"PrefixFilter('table_name')"}

references: http://hbasefly.com/2017/08/27/hbase-split/, https://zh.hortonworks.com/blog/apache-hbase-region-splitting-and-merging/