Change column type in Spark Dataframe

Requirement

In this post, we will see how to convert a column's type in a Spark DataFrame. Consider a common scenario: we receive a CSV file from a source system. Since a CSV file carries no type information, most fields are read in as String, so we may need to change the data type before processing the data.

Solution

Let’s create a DataFrame with some dummy data:

 val df = spark.createDataFrame(Seq(
   ("1100", "Person1", "Location1", null),
   ("1200", "Person2", "Location2", "Contact2"),
   ("1300", "Person3", "Location3", null),
   ("1400", "Person4", null, "Contact4"),
   ("1500", "Person5", "Location4", null)
 )).toDF("id", "name", "location", "contact")
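We can confirm the inferred types with printSchema (a quick sketch, assuming the df created above and a running SparkSession named spark):

```scala
// Every column built from string literals is inferred as StringType
df.printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- name: string (nullable = true)
//  |-- location: string (nullable = true)
//  |-- contact: string (nullable = true)
```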

Here, all the columns have the String data type. Let’s change the id column’s data type from String to Int.

 Change column type

 val df2 = df.withColumn("id", df("id").cast("int"))
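The same cast can be written in a couple of equivalent ways (a sketch; withColumn replaces the existing id column in place):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Cast using the DataType object instead of the type name string
val df2a = df.withColumn("id", col("id").cast(IntegerType))

// Or cast with a SQL expression
val df2b = df.selectExpr("cast(id as int) as id", "name", "location", "contact")

// id is now integer; the other columns are unchanged
df2a.printSchema()
```

Note that cast is lenient: a value that cannot be parsed as an integer (for example, "abc") becomes null rather than raising an error, so it is worth checking for unexpected nulls after casting.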

Wrapping Up

In this post, we have learned to change the column type of a Spark DataFrame using cast, converting the id column from String to Int. Similarly, you can cast to other data types such as long, double, or date.
