Requirement
In this recipe, we are going to read and load TSV file data into a Spark dataframe. A TSV file stores data with a tab character as the field separator.
Sample Data
Let’s use the same sample data:
| empno | ename  | designation | manager | hire_date  | sal  | deptno | location  |
|-------|--------|-------------|---------|------------|------|--------|-----------|
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE |
| 9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD |
| 9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE      |
| 9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI    |
| 9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI   |
| 9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA   |
Solution
Read TSV in dataframe
We will load the TSV file into a Spark dataframe. Find the code snippet below for reference.
%scala
val tsvFilePath = "/FileStore/tables/emp_data1.tsv"
val tsvDf = spark.read.format("csv")
.option("header", "true")
.option("sep", "\t")
.load(tsvFilePath)
display(tsvDf)

Here, we have used CSV as the format but changed the separator to a tab ("\t"). By default, a CSV file uses a comma (",") as the separator.
Save as Delta table
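The original screenshot for this step is not available. As a minimal sketch, writing the dataframe out in Delta format and registering it as a table might look like the following; the table name `emp_data` is an assumed example, not taken from the original post.

```scala
%scala
// Write the TSV dataframe out in Delta format and register it as a managed table.
// The table name "emp_data" is an assumption for illustration.
tsvDf.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("emp_data")
```

Using `mode("overwrite")` lets the cell be re-run without failing if the table already exists.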
Validate data in Table
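The validation screenshot is also missing. A simple way to sketch this check is to query the saved table and display the result; the table name `emp_data` here is an assumption carried over from the save step.

```scala
%scala
// Query the Delta table to confirm the data was loaded and saved correctly.
// "emp_data" is an assumed table name for illustration.
val resultDf = spark.sql("SELECT * FROM emp_data")
display(resultDf)
```

If the load worked, the output should show the same six rows as the sample data above.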
Wrapping Up
In this post, we have seen how to read TSV file data into a Spark dataframe. TSV is similar to the CSV file format, but with a tab separator instead of a comma.