Requirement
In this recipe, we are going to read and load TSV file data into a Spark dataframe. A TSV file stores data with a tab character as the field separator.
Sample Data
Let’s use the same sample data:
| empno | ename  | designation | manager | hire_date  | sal  | deptno | location  |
|-------|--------|-------------|---------|------------|------|--------|-----------|
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE |
| 9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD |
| 9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE      |
| 9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI    |
| 9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI   |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA   |
Solution
Read TSV in dataframe
We will load the TSV file into a Spark dataframe. Find the code snippet below for reference.
%scala
val tsvFilePath = "/FileStore/tables/emp_data1.tsv"
val tsvDf = spark.read.format("csv")
.option("header", "true")
.option("sep", "\t")
.load(tsvFilePath)
display(tsvDf)

Here, we have used CSV as the format but changed the separator to a tab ("\t"). By default, a CSV file uses a comma (",") as the separator.
Save as Delta table
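The original screenshot for this step is not available. As a minimal sketch, writing the dataframe out in Delta format and registering it as a table might look like the following; the table name `emp_data` is an assumed example, not taken from the original post.

```scala
%scala
// Write the TSV dataframe out in Delta format and register it as a managed table.
// The table name "emp_data" is an assumption for illustration.
tsvDf.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("emp_data")
```

Using `mode("overwrite")` lets the cell be re-run without failing if the table already exists.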
Validate data in Table
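The validation screenshot is also missing. A simple way to sketch this check is to query the saved table and display the result; the table name `emp_data` here is an assumption carried over from the save step.

```scala
%scala
// Query the Delta table to confirm the data was loaded and saved correctly.
// "emp_data" is an assumed table name for illustration.
val resultDf = spark.sql("SELECT * FROM emp_data")
display(resultDf)
```

If the load worked, the output should show the same six rows as the sample data above.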
Wrapping Up
In this post, we have seen how to read TSV file data into a Spark dataframe. TSV is similar to the CSV file format, but with a tab separator instead of a comma.