Create Delta table from TSV File in Databricks

Requirement

In this post, we are going to read TSV file data into a Spark dataframe in Databricks and save it as a Delta table. A TSV file contains data separated by a tab character.

Sample Data

Let’s use the same sample data:

empno  ename   designation  manager  hire_date   sal   deptno  location
9369   SMITH   CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN   SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD    SALESMAN     7698     2/22/1981   1250  30      PUNE
9566   TURNER  MANAGER      7839     4/2/1981    2975  20      MUMBAI
9654   MARTIN  SALESMAN     7698     9/28/1981   1250  30      CHENNAI
9369   SMITH   CLERK        7902     12/17/1980  800   20      KOLKATA

Solution

Read TSV in dataframe

We will load the TSV file into a Spark dataframe. The code snippet below shows how.

%scala
val tsvFilePath = "/FileStore/tables/emp_data1.tsv"

val tsvDf = spark.read.format("csv")
                      .option("header", "true")
                      .option("sep", "\t")
                      .load(tsvFilePath)

display(tsvDf)

Here, we have used CSV as the format but changed the separator to a tab ("\t"). By default, CSV files use a comma (",") as the separator.
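Note that without schema inference, every column is read as a string. As an optional check, you can print the schema and, if you prefer typed columns, re-read the file with the inferSchema option. A minimal sketch of that optional step:

%scala
// All columns load as strings by default; print the schema to confirm.
tsvDf.printSchema()

// Optional: re-read the file with schema inference so numeric columns
// (empno, manager, sal, deptno) come through as numbers instead of strings.
val tsvDfTyped = spark.read.format("csv")
                           .option("header", "true")
                           .option("sep", "\t")
                           .option("inferSchema", "true")
                           .load(tsvFilePath)

tsvDfTyped.printSchema()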

Save as Delta table
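To persist the dataframe, we can write it out in Delta format and register it as a table. The snippet below is a minimal sketch; the output path /FileStore/tables/delta/emp_data1 and the table name emp_data1 are placeholders you can adjust.

%scala
val deltaTablePath = "/FileStore/tables/delta/emp_data1"

// Write the dataframe in Delta format (overwrite if it already exists).
tsvDf.write.format("delta")
           .mode("overwrite")
           .save(deltaTablePath)

// Register the Delta location as a table so it can be queried with SQL.
spark.sql(s"CREATE TABLE IF NOT EXISTS emp_data1 USING DELTA LOCATION '$deltaTablePath'")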

Validate data in Table
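Once the table is created, we can read it back and confirm that the records loaded correctly. A minimal sketch, assuming the table name emp_data1 used above:

%scala
// Read the Delta table back through the metastore and display it.
val deltaDf = spark.table("emp_data1")
display(deltaDf)

// Quick row-count check against the source dataframe.
println(s"Rows in Delta table: ${deltaDf.count()}, rows in source dataframe: ${tsvDf.count()}")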

Wrapping Up

In this post, we have seen how to read TSV file data into a Spark dataframe and save it as a Delta table. TSV is similar to the CSV file format, just with a tab separator.
