Create Parquet Table from CSV File in Databricks

Requirement

In the last post, we have imported the CSV file and created a table using the UI interface in Databricks. In this post, we are going to create a delta table from a CSV file using Spark in databricks.

Solution

Let’s use the same sample data:

empno

ename

designation

manager

hire_date

sal

deptno

location

9369

SMITH

CLERK

7902

12/17/1980

800

20

BANGALORE

9499

ALLEN

SALESMAN

7698

2/20/1981

1600

30

HYDERABAD

9521

WARD

SALESMAN

7698

2/22/1981

1250

30

PUNE

9566

TURNER

MANAGER

7839

4/2/1981

2975

20

MUMBAI

9654

MARTIN

SALESMAN

7698

9/28/1981

1250

30

CHENNAI

9369

SMITH

CLERK

7902

12/17/1980

800

20

KOLKATA

Code

Here, we will use the same template of code from Create table with notebook option.

%python
# File location and type
file_location = "/FileStore/tables/emp_data1-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)


# Create a view or table

temp_table_name = "emp_data13_csv"

df.createOrReplaceTempView(temp_table_name)


%sql

/* Query the created temp table in a SQL cell */

select * from `emp_data13_csv`


permanent_table_name = "emp_data13_csv"

# df.write.format("parquet").saveAsTable(permanent_table_name)

Here, I have just changed the first_row_is_header to true instead of the default value.

 Next, with the below code, you can create a temp table:

 # Create a view or table

temp_table_name = "emp_data13_csv"
df.createOrReplaceTempView(temp_table_name)

And last, you can create the actual table with the below command:

permanent_table_name = "testdb.emp_data13_csv"
df.write.format("parquet").saveAsTable(permanent_table_name)

Here, I have defined the table under a database testdb. It will create this table under testdb.

This will create a Parquet format table as mentioned in the format.

Wrapping Up

In this post, we have just used the available notebook to create the table using parquet format. Here, you can customize the code based on your requirement like table name, DB name, the filter of the data based on any logic, etc.

Sharing is caring!

Subscribe to our newsletter
Loading

Leave a Reply