Load TSV file in Spark

Requirement

Let’s say we have a data file with a .tsv extension. It is structured just like a CSV file.

What is the difference between CSV and TSV?

The difference lies in how the values in the file are separated. A CSV file stores values separated by a comma (“,”), whereas a TSV file stores values separated by a tab.
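To make the difference concrete, here is a minimal Scala sketch that writes one employee record (the PUNE row used later in this post) once with commas and once with tabs, then splits each line on its separator. The object `SeparatorDemo` and the helper `fields` are hypothetical, for illustration only. The naive split ignores quoting, so it is only safe for simple values; quoted values that contain separators or line breaks need a real CSV parser such as Spark's.

```scala
object SeparatorDemo {
  // The same record, written once as CSV and once as TSV.
  val csvLine: String = "9521,WARD,SALESMAN,7698,2/22/1981,1250,30,PUNE"
  val tsvLine: String = "9521\tWARD\tSALESMAN\t7698\t2/22/1981\t1250\t30\tPUNE"

  // Hypothetical helper: naive split on a single separator character.
  // It does not understand quotes, so it only works for simple values.
  def fields(line: String, sep: Char): Vector[String] = line.split(sep).toVector

  def main(args: Array[String]): Unit = {
    val fromCsv = fields(csvLine, ',')
    val fromTsv = fields(tsvLine, '\t')
    println(fromCsv == fromTsv) // true: same fields, different separator
    println(fromTsv.mkString(" | "))
  }
}
```

Only the separator character differs; the fields themselves are identical.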

In this post, we will load a TSV file into a Spark DataFrame.

Sample Data

Let’s take some dummy data for the exercise:

empno        ename        designation        manager        hire_date        sal        deptno        location
9369        SMITH        CLERK        7902        12/17/1980        800        20        "1A
BANGALORE"
9499        ALLEN        SALESMAN        7698        2/20/1981        1600        30        "2B
HYDERABAD"
9521        WARD        SALESMAN        7698        2/22/1981        1250        30        PUNE
9566        TURNER        MANAGER        7839        04/02/81        2975        20        MUMBAI
9654        MARTIN        SALESMAN        7698        9/28/1981        1250        30        CHENNAI
9369        SMITH        CLERK        7902        12/17/1980        800        20        "5E
KOLKATA"

Solution

We have a TSV file that contains the above data, with the fields separated by tabs. The file is stored at the path used in the code below.

The following code snippet loads the TSV file into a Spark DataFrame:

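val df1 = spark.read.option("header","true")  // first line holds the column names
                    .option("sep", "\t")                       // fields are separated by a tab
                    .option("multiLine", "true")               // quoted values may span several lines
                    .option("quote","\"")                      // values may be wrapped in double quotes
                    .option("escape","\"")                     // "" inside a quoted value is a literal quote
                    .option("ignoreTrailingWhiteSpace", true)  // trim trailing whitespace from values
                    .csv("/Users/dipak_shaw/bdp/data/emp_data1.tsv")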

Here, we have used options such as header, sep, multiLine, quote, and escape. We have already covered these options in detail in a previous post.
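Outside spark-shell, where a `spark` session already exists, the snippet above can be wrapped in a standalone application. The sketch below is one possible layout, not the author's code: it assumes the spark-sql library is on the classpath and runs Spark in local mode. The object `LoadTsvExample`, the helper `loadTsv`, and the temporary file (standing in for the path used in the post) are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadTsvExample {
  // Hypothetical sample: the header plus two rows from this post, tab-separated.
  // The first location value is quoted and spans two lines.
  val sampleTsv: String =
    "empno\tename\tdesignation\tmanager\thire_date\tsal\tdeptno\tlocation\n" +
      "9369\tSMITH\tCLERK\t7902\t12/17/1980\t800\t20\t\"1A\nBANGALORE\"\n" +
      "9521\tWARD\tSALESMAN\t7698\t2/22/1981\t1250\t30\tPUNE\n"

  // Same reader options as the snippet above, wrapped in a reusable function.
  def loadTsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("sep", "\t")
      .option("multiLine", "true")
      .option("quote", "\"")
      .option("escape", "\"")
      .option("ignoreTrailingWhiteSpace", true)
      .csv(path)

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; drop .master(...) when submitting to a cluster.
    val spark = SparkSession.builder()
      .appName("LoadTsvExample")
      .master("local[*]")
      .getOrCreate()

    // Write the sample to a temporary .tsv file instead of a fixed path.
    val file = Files.createTempFile("emp_data", ".tsv")
    Files.write(file, sampleTsv.getBytes(StandardCharsets.UTF_8))

    val df = loadTsv(spark, file.toString)
    df.printSchema()
    df.show(false) // false: print full values without truncation
    spark.stop()
  }
}
```

Because neither a schema nor inferSchema is given, every column is read as a string; the quoted two-line location is kept as a single value instead of being split into an extra row.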

Wrapping Up

As you can see, we use Spark's built-in CSV reader to read the data from the TSV file and load it into a DataFrame. The only change is the separator character, which we set with the sep option.
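To illustrate that only the separator changes, here is a hedged sketch with a hypothetical helper `readDelimited` (not part of Spark's API) that passes the separator through to the same CSV reader. It writes the same records, taken from the sample data, once as CSV and once as TSV into temporary files and compares the parsed rows. It assumes spark-sql is on the classpath and a local Spark session.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object DelimitedReader {
  // Hypothetical helper: one reader for CSV and TSV; only `sep` differs.
  def readDelimited(spark: SparkSession, path: String, sep: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("sep", sep)
      .csv(path)

  // Writes `content` to a temporary file with the given suffix and returns its path.
  def writeTemp(content: String, suffix: String): Path = {
    val file = Files.createTempFile("emp_data", suffix)
    Files.write(file, content.getBytes(StandardCharsets.UTF_8))
    file
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DelimitedReader")
      .master("local[*]")
      .getOrCreate()

    val csvPath = writeTemp("empno,ename,sal\n9521,WARD,1250\n9654,MARTIN,1250\n", ".csv")
    val tsvPath = writeTemp("empno\tename\tsal\n9521\tWARD\t1250\n9654\tMARTIN\t1250\n", ".tsv")

    val fromCsv = readDelimited(spark, csvPath.toString, ",")
    val fromTsv = readDelimited(spark, tsvPath.toString, "\t")

    // Compare the parsed rows as sorted strings; both files hold the same records.
    val csvRows = fromCsv.collect().map(_.mkString("|")).sorted.toSeq
    val tsvRows = fromTsv.collect().map(_.mkString("|")).sorted.toSeq
    println(s"Same records: ${csvRows == tsvRows}")
    spark.stop()
  }
}
```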
