Pre-Splitting an HBase Table

Requirement 

To distribute load evenly across the cluster, an HBase table should be pre-split at the time it is created.

Solution

Pre-splitting lets you keep roughly the same amount of data in each region of an HBase table. It is helpful when you know the row keys in advance and want to distribute the same number of records to every region, which makes reads from the table much faster.

This practice is especially useful when you read an HBase table in Spark and then process it.

When every region holds the same number of records, no task takes much longer to finish than the others, so you get the full advantage of parallelism.
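To see why balance matters: a Spark stage finishes only when its slowest task does, so stage time is bounded by the largest partition. A toy sketch (the sizes are made up for illustration):

```scala
// Illustration: with the same total number of records, the largest
// partition determines how long the whole stage takes.
object StragglerDemo {
  // Stage time is the time of the slowest task, i.e. the biggest partition.
  def stageTime(partitionSizes: Seq[Int]): Int = partitionSizes.max

  def main(args: Array[String]): Unit = {
    val skewed   = Seq(70, 10, 10, 10) // one hot region
    val balanced = Seq(25, 25, 25, 25) // pre-split evenly
    println(s"skewed stage time:   ${stageTime(skewed)}")   // 70
    println(s"balanced stage time: ${stageTime(balanced)}") // 25
  }
}
```

Both layouts hold 100 records in total, but the balanced one finishes in roughly a third of the time.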

Below is Scala code that identifies the split points for the given data.

Let’s have a look.

Step 1 : Imports

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._ // for the $"..." column syntax used below

Step 2 : The Regions 

You need to define the number of regions you want for the HBase table.

val regions = 200

Step 3 : Loading the row keys and identification of split points 

Now load the row keys into a DataFrame to identify the split points. Once that is done, use the row_number window function to find the exact rows where the table should be split.

Let’s say X1, X2 … Xn are the keys printed by the code below.

val data_df = spark.table("dev.sample_dta")
  .select($"id".cast("string").as("id"))
  .withColumn("rank", row_number().over(Window.orderBy("id")))
  .cache()
val records_per_region = (data_df.count() / regions).toInt

data_df.filter($"rank" === 1 || $"rank" % records_per_region === 0).show(regions, false)
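The selection logic itself is simple: with the keys sorted, every Nth key becomes a region boundary. Here is a minimal plain-Scala sketch of the same idea, using made-up row keys and no cluster. Unlike the show() above, it keeps only interior boundaries, since the first and last keys are the table edges rather than split points:

```scala
object SplitPoints {
  // Given sorted row keys, take a boundary key every `recordsPerRegion`
  // rows; these become the SPLITS passed to the HBase shell.
  def splitPoints(sortedKeys: Seq[String], regions: Int): Seq[String] = {
    require(regions > 0 && sortedKeys.length >= regions)
    val recordsPerRegion = sortedKeys.length / regions
    sortedKeys.zipWithIndex
      .collect { case (k, i) if (i + 1) % recordsPerRegion == 0 => k }
      .dropRight(1) // the last boundary is the end of the table, not a split
  }

  def main(args: Array[String]): Unit = {
    val keys = (1 to 100).map(i => f"row$i%03d").toList // row001 .. row100
    println(splitPoints(keys, 4)) // List(row025, row050, row075)
  }
}
```

Note that 4 regions need only 3 split points, which is why the final boundary is dropped.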

 

Step 4 : Creating the HBase Table

Now open the HBase shell and run the command below after replacing the placeholders with the actual values.

 create 'sample_table',{NAME => 'CF', VERSIONS => 2, COMPRESSION => 'SNAPPY'},{SPLITS => ['X1','X2','X3','Xn']}
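If you script the table creation, the shell statement can be rendered directly from the computed split points. A hypothetical helper (the names are illustrative, not part of any HBase API):

```scala
object CreateStatement {
  // Render the hbase-shell `create` command from a list of split-point
  // keys, quoting each key for the SPLITS array.
  def createCommand(table: String, cf: String, splits: Seq[String]): String = {
    val quoted = splits.map(s => s"'$s'").mkString(",")
    s"create '$table',{NAME => '$cf', VERSIONS => 2, COMPRESSION => 'SNAPPY'},{SPLITS => [$quoted]}"
  }

  def main(args: Array[String]): Unit =
    println(createCommand("sample_table", "CF", Seq("X1", "X2", "X3")))
}
```

The printed line can be pasted into the HBase shell, or the split points could equally be written to a file and passed with the shell's SPLITS_FILE option.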

 

Step 5 : Load and Truncate 

Now you can load data into the HBase table, and the writes will be spread across all the regions as long as row keys from every range are present. If you later need to delete the data, use truncate_preserve instead of truncate: truncate drops the region boundaries, while truncate_preserve keeps the pre-split regions intact.

 
