Requirement
In this recipe, we are going to read and load TSV file data into a Spark DataFrame. A TSV (tab-separated values) file stores data with a tab character as the field separator.
Sample Data
Let’s use the same sample data:
empno | ename | designation | manager | hire_date | sal | deptno | location |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | BANGALORE |
9499 | ALLEN | SALESMAN | 7698 | 2/20/1981 | 1600 | 30 | HYDERABAD |
9521 | WARD | SALESMAN | 7698 | 2/22/1981 | 1250 | 30 | PUNE |
9566 | TURNER | MANAGER | 7839 | 4/2/1981 | 2975 | 20 | MUMBAI |
9654 | MARTIN | SALESMAN | 7698 | 9/28/1981 | 1250 | 30 | CHENNAI |
9369 | SMITH | CLERK | 7902 | 12/17/1980 | 800 | 20 | KOLKATA |
Solution
Read TSV in dataframe
We will load the TSV file into a Spark DataFrame. Find the snippet below for reference.
%scala
val tsvFilePath = "/FileStore/tables/emp_data1.tsv"

val tsvDf = spark.read.format("csv")
  .option("header", "true")
  .option("sep", "\t")
  .load(tsvFilePath)

display(tsvDf)
Here, we have used CSV as the format but changed the separator to a tab ("\t"). By default, CSV files use a comma (",") as the separator.
Save as Delta table
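The source does not include a snippet for this step; a minimal sketch of saving the DataFrame as a Delta table could look like the following, assuming a Databricks environment with Delta Lake available and using a hypothetical table name, emp_data_tsv:

```scala
%scala
// Write the DataFrame in Delta format and register it as a managed table.
// "emp_data_tsv" is a hypothetical table name used for illustration.
tsvDf.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("emp_data_tsv")
```

The overwrite mode replaces the table contents if it already exists, which is convenient when re-running the notebook.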
Validate data in Table
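No validation snippet appears in the source; a simple check, assuming the data was saved to a Delta table with the hypothetical name emp_data_tsv, is to query the table back and compare it against the sample data:

```scala
%scala
// Read the saved table back and inspect the rows.
val resultDf = spark.sql("SELECT * FROM emp_data_tsv")
display(resultDf)

// A quick sanity check on the row count (6 rows in the sample data).
println(resultDf.count())
```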
Wrapping Up
In this post, we have seen how to read TSV file data into a Spark DataFrame. TSV is similar to the CSV file format, but with a tab as the separator.