Requirement
Suppose, you have one table in hive with one column and you want to split this column into multiple columns and then store the results into another Hive table.
Solution
Assume the name of hive table is “transact_tbl” and it has one column named as “connections”, and values in connections column are comma separated and total two commas are present in each value.
Step 1: Create Hive Table
Create an input table transact_tbl in bdp schema using below command.
CREATE TABLE bdp.transact_tbl(connections STRING);
Step 2: Insert Data in Hive
Insert data into connections columns, String should be comma separated. For e.g. “PSTP, POST, FRDE” is one value of connections column.
INSERT INTO TABLE transact_tbl VALUES('PSTP,POST,FRDE'), ('LEFD,HSYT,YDTH'), ('AQSV,NESA,LRAJ'), ('KSAI,SJAA,PSOA'), ('SHAG,JSGS,AGAQ'), ('AQDS,KAHY,AGQG'), ('BAGW,AHQG,HAFY');
Step 3: Show Hive data
Let’s see how data looks in a transact_tbl table using below command:
SELECT connections FROM bdp.transact_tbl;
It should show output as below:-
Step 4: Create Output hive table
Create an output table where you want to store split values. Run below command in the hive:
CREATE TABLE optrans_tbl(con1 STRING,con2 STRING,con3 STRING);
Step 5: Use Hive function
There is a built-in function SPLIT in the hive which expects two arguments, the first argument is a string and the second argument is the pattern by which string should separate.
It will convert String into an array, and desired value can be fetched using the right index of an array.
Use below query to store split records in the hive table:-
INSERT INTO TABLE bdp.optrans_tbl SELECT CONCTNS.splitted_cnctns[0] AS con1, CONCTNS.splitted_cnctns[1] AS con2, CONCTNS.splitted_cnctns[2] AS con3 FROM (SELECT split(connections,',') AS splitted_cnctns FROM bdp.transact_tbl)CONCTNS;
In the above query, you can see that splitted_cnctns is an array with three values in it, which can be extracted using the proper index as con1, con2, and con3.
Inner query is used to get the array of split values and the outer query is used to assign each value to a separate column.
Step 6: Show output
Let’s see how data looks like in an optrans_tbl table using below command:
SELECT con1,con2,con3 FROM bdp.optrans_tbl;
It should show output as below:-
Keep Learning 🙂
Don’t miss the tutorial on Top Big data courses on Udemy you should Buy
what if i want one more column other than the split String, say transact_tbl table have one more column amount(Every connections have different amount) and i want amount column as well in the optrans_tbl in order to use JOIN. What should i do for that?
I have solved it, thank you 🙂