Create Delta Table from CSV File in Databricks

Requirement

In the last post, we imported a CSV file and created a table using the UI in Databricks. In this post, we are going to create a Delta table from a CSV file using Spark in Databricks.

Solution

Let’s use the same sample data:

empno  ename   designation  manager  hire_date   sal   deptno  location
9369   SMITH   CLERK        7902     12/17/1980  800   20      BANGALORE
9499   ALLEN   SALESMAN     7698     2/20/1981   1600  30      HYDERABAD
9521   WARD    SALESMAN     7698     2/22/1981   1250  30      PUNE
9566   TURNER  MANAGER      7839     4/2/1981    2975  20      MUMBAI
9654   MARTIN  SALESMAN     7698     9/28/1981   1250  30      CHENNAI
9369   SMITH   CLERK        7902     12/17/1980  800   20      KOLKATA

Code

Here, we will use the same code template as in the Create Table with Notebook option.

%python
# File location and type
file_location = "/FileStore/tables/emp_data1-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)


# Create a view or table

temp_table_name = "emp_data13_csv"

df.createOrReplaceTempView(temp_table_name)


%sql

/* Query the created temp table in a SQL cell */

select * from `emp_data13_csv`


permanent_table_name = "emp_data13_csv"

# df.write.format("parquet").saveAsTable(permanent_table_name)

Here, I have just changed first_row_is_header to "true" instead of the default value "false", so the first row of the CSV file is treated as the column headers.

Next, with the below code, you can create a temp table:

# Create a view or table

temp_table_name = "emp_data13_csv"
df.createOrReplaceTempView(temp_table_name)

Create DELTA Table

And last, you can create the actual Delta table with the command below:

permanent_table_name = "testdb.emp_data13_csv"
df.write.format("delta").saveAsTable(permanent_table_name)

Here, I have qualified the table name with the database testdb, so the table will be created under that database.
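Note that saveAsTable will fail if the testdb database does not already exist, so you may need to create it first. A minimal SQL cell for this (assuming default settings) would be:

```sql
%sql
CREATE DATABASE IF NOT EXISTS testdb;
```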

Because the write format is set to "delta", this creates a table in DELTA format.
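To confirm that the table really was created in Delta format, you can run Delta Lake's DESCRIBE DETAIL command in a SQL cell; its format column should report delta:

```sql
%sql
DESCRIBE DETAIL testdb.emp_data13_csv;
```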

Same Code in Scala

%scala
val file_location = "/FileStore/tables/emp_data1-3.csv"
val file_type = "csv"

val infer_schema = "false"
val first_row_is_header = "true"
val delimiter = ","

val df = spark.read.format(file_type)
  .option("inferSchema", infer_schema)
  .option("header", first_row_is_header)
  .option("sep", delimiter)
  .load(file_location)

display(df)

val permanent_table_name = "testdb.emp_data13_csv"
df.write.format("delta").saveAsTable(permanent_table_name)

Wrapping Up

In this post, we used the available notebook template to create a table in Delta format. You can customize the code based on your requirements: the table name, the database name, filtering the data based on any logic, etc.
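As one hypothetical customization, you could filter the DataFrame before writing it out as a separate Delta table. This sketch assumes the header option was set to "true" (so the column is named designation); the table name emp_salesmen is illustrative, not from the original post:

```python
%python
# Hypothetical example: write only the SALESMAN rows to their own Delta table.
# "designation" assumes first_row_is_header = "true"; emp_salesmen is an
# illustrative table name.
salesmen_df = df.filter(df.designation == "SALESMAN")
salesmen_df.write.format("delta").saveAsTable("testdb.emp_salesmen")
```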

Sharing is caring!
