How to delete columns in a Spark dataframe

Requirement:

You have a sample dataframe and you want to delete some columns from it.

Solution:

Step 1: Sample Dataframe 

Start the Spark shell with the command below:

 spark-shell

Note: I am using Spark version 2.3.

To create a sample dataframe, please refer to Create-a-spark-dataframe-from-sample-data.

After following that post, you will have a dataframe named students. You can use this dataframe for the operations below.
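If you do not have the referenced post at hand, the following is a minimal sketch of a comparable students dataframe. The column names id, name and percentage and the sample rows are assumptions made for illustration; the real data comes from the referenced post. Inside spark-shell you can skip the SparkSession lines, because a session named spark is already defined.

```scala
import org.apache.spark.sql.SparkSession

// Only needed outside spark-shell; getOrCreate() reuses a running session.
val spark = SparkSession.builder()
  .appName("drop-columns-demo")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample rows standing in for the data of the referenced post.
val students = spark.createDataFrame(Seq(
  (1, "Alice", 82.5),
  (2, "Bob", 74.0),
  (3, "Carol", 91.2)
)).toDF("id", "name", "percentage")
```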

Use the command below to see the contents of the dataframe:

 students.show()

Step 2: Deletion of columns

To delete some columns, refer to the code below.

Here we use the drop function, which takes the names of the columns we want to delete and returns a new dataframe without them.

val updated_df = students.drop("percentage", "name")
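A few properties of drop are worth seeing in a small sketch: the column names can come from a collection, a name that is not in the schema is ignored, and the original dataframe is left untouched. The students data and the names unwanted, trimmed, unchanged and no_such_column below are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Only needed outside spark-shell; getOrCreate() reuses a running session.
val spark = SparkSession.builder()
  .appName("drop-columns-demo")
  .master("local[*]")
  .getOrCreate()

// Hypothetical stand-in for the students dataframe.
val students = spark.createDataFrame(Seq(
  (1, "Alice", 82.5),
  (2, "Bob", 74.0)
)).toDF("id", "name", "percentage")

// Column names held in a collection can be expanded into the varargs overload.
val unwanted = Seq("percentage", "name")
val trimmed = students.drop(unwanted: _*)

// A name that is not in the schema is ignored instead of raising an error.
val unchanged = students.drop("no_such_column")

// drop returns a new dataframe; students itself keeps all of its columns.
println(trimmed.columns.mkString(", "))
println(unchanged.columns.mkString(", "))
println(students.columns.mkString(", "))
```

Because a misspelled column name is silently ignored, it is a good idea to check the resulting columns after calling drop.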

Step 3: Check the columns in the new dataframe

You can check the remaining columns using the command below:

updated_df.columns
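Since columns returns an array of names, the same check can be extended to count the columns and to confirm that the dropped ones are gone. This is a minimal sketch under the assumption that students has the hypothetical columns id, name and percentage; stillPresent is a name introduced here for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Only needed outside spark-shell; getOrCreate() reuses a running session.
val spark = SparkSession.builder()
  .appName("check-columns-demo")
  .master("local[*]")
  .getOrCreate()

// Hypothetical stand-in for the students dataframe.
val students = spark.createDataFrame(Seq(
  (1, "Alice", 82.5),
  (2, "Bob", 74.0)
)).toDF("id", "name", "percentage")

val updated_df = students.drop("percentage", "name")

// Number of columns before and after the drop.
println(s"before: ${students.columns.length}, after: ${updated_df.columns.length}")

// Confirm that none of the dropped names is still in the schema.
val stillPresent = Seq("percentage", "name").filter(c => updated_df.columns.contains(c))
println(s"dropped columns still present: ${stillPresent.size}")

// printSchema also shows the data type of each remaining column.
updated_df.printSchema()
```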

Wrapping up:

After joining dataframes and applying filters, we often no longer need some of the columns in a Spark dataframe. To reduce memory pressure and save processing time, it is best to drop unwanted columns as early as possible.
