Requirement
In the last post, we imported a CSV file and created a table using the UI in Databricks. In this post, we are going to create a Delta table from a CSV file using Spark in Databricks.
Solution
Let’s use the same sample data:
empno | ename  | designation | manager | hire_date  | sal  | deptno | location
------|--------|-------------|---------|------------|------|--------|----------
9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | BANGALORE
9499  | ALLEN  | SALESMAN    | 7698    | 2/20/1981  | 1600 | 30     | HYDERABAD
9521  | WARD   | SALESMAN    | 7698    | 2/22/1981  | 1250 | 30     | PUNE
9566  | TURNER | MANAGER     | 7839    | 4/2/1981   | 2975 | 20     | MUMBAI
9654  | MARTIN | SALESMAN    | 7698    | 9/28/1981  | 1250 | 30     | CHENNAI
9369  | SMITH  | CLERK       | 7902    | 12/17/1980 | 800  | 20     | KOLKATA
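If you want to follow along outside Databricks, the sample rows above can be written to a local CSV file with plain Python. The local filename below is an assumption for illustration; in Databricks the file was uploaded to /FileStore/tables instead.

```python
import csv

# Column names and sample employee rows from the table above
header = ["empno", "ename", "designation", "manager",
          "hire_date", "sal", "deptno", "location"]
rows = [
    [9369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20, "BANGALORE"],
    [9499, "ALLEN", "SALESMAN", 7698, "2/20/1981", 1600, 30, "HYDERABAD"],
    [9521, "WARD", "SALESMAN", 7698, "2/22/1981", 1250, 30, "PUNE"],
    [9566, "TURNER", "MANAGER", 7839, "4/2/1981", 2975, 20, "MUMBAI"],
    [9654, "MARTIN", "SALESMAN", 7698, "9/28/1981", 1250, 30, "CHENNAI"],
    [9369, "SMITH", "CLERK", 7902, "12/17/1980", 800, 20, "KOLKATA"],
]

# Write the header row followed by the data rows (hypothetical local path)
with open("emp_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
```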
Code
Here, we will reuse the code template that Databricks generates from the Create table with notebook option.
%python

# File location and type
file_location = "/FileStore/tables/emp_data1-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

# Create a view or table
temp_table_name = "emp_data13_csv"
df.createOrReplaceTempView(temp_table_name)

%sql

/* Query the created temp table in a SQL cell */
select * from `emp_data13_csv`

permanent_table_name = "emp_data13_csv"
# df.write.format("parquet").saveAsTable(permanent_table_name)
Here, I have just changed first_row_is_header from its default value of false to true, since the first row of our file contains the column names.
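To see what the header option actually changes, here is a plain-Python sketch using the standard csv module (outside Databricks). With the first row treated as a header, each record can be addressed by column name; without it, the header row comes back as just another data row.

```python
import csv
import io

# A tiny CSV sample with a header row
sample = "empno,ename,sal\n9369,SMITH,800\n9499,ALLEN,1600\n"

# Equivalent of header = "true": the first row names the columns
with_header = list(csv.DictReader(io.StringIO(sample)))
print(with_header[0]["ename"])   # SMITH

# Equivalent of header = "false": the first row is ordinary data
without_header = list(csv.reader(io.StringIO(sample)))
print(without_header[0])         # ['empno', 'ename', 'sal']
```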
Next, with the below code, you can create a temporary view:
# Create a view or table
temp_table_name = "emp_data13_csv"
df.createOrReplaceTempView(temp_table_name)
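Conceptually, createOrReplaceTempView registers the DataFrame under a name so that SQL cells can query it. Since Spark isn't available outside Databricks, the sketch below uses sqlite3 as a stand-in to illustrate the same idea: load rows under a table name, then query that name with SQL.

```python
import sqlite3

# A couple of sample rows, standing in for the DataFrame's contents
rows = [
    (9369, "SMITH", "CLERK", 20),
    (9499, "ALLEN", "SALESMAN", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TEMP TABLE emp_data13_csv "
    "(empno INT, ename TEXT, designation TEXT, deptno INT)"
)
conn.executemany("INSERT INTO emp_data13_csv VALUES (?, ?, ?, ?)", rows)

# Query the registered name with SQL, as the %sql cell does in the notebook
result = conn.execute("select * from `emp_data13_csv`").fetchall()
print(result)
```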
Finally, you can create the actual (permanent) table with the below command:
permanent_table_name = "testdb.emp_data13_csv"
df.write.format("parquet").saveAsTable(permanent_table_name)
Here, I have qualified the table name with the database testdb, so the table will be created under testdb.
This will create a table in Parquet format, as specified in format("parquet"). Note that to create the Delta table described in the requirement, you would pass format("delta") instead.
Wrapping Up
In this post, we reused the code template generated by the notebook to create the table in Parquet format. You can customize the code to your requirements: the table name, the database name, filtering the data based on any logic, and so on.
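As an example of the filtering mentioned above, here is a plain-Python sketch over the sample rows. In Databricks, you would typically apply the same condition with the DataFrame's filter method before calling saveAsTable; the tuples below are an illustrative subset of the sample data.

```python
# Sample (empno, ename, sal, deptno) tuples taken from the data above
employees = [
    (9369, "SMITH", 800, 20),
    (9499, "ALLEN", 1600, 30),
    (9521, "WARD", 1250, 30),
    (9566, "TURNER", 2975, 20),
]

# Keep only department 30 rows before writing them out
dept30 = [e for e in employees if e[3] == 30]
print(dept30)
```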