Requirement
In this post, we will see how to convert a column's data type in a Spark DataFrame. Consider a scenario where we receive a CSV file from a source system. Since CSV stores every value as text, most fields are read in with the String data type, so we may need to change the data type before processing the data.
Solution
Let's create a DataFrame with some dummy data. Assume the following DataFrame has been created from that dummy data:
val df = spark.createDataFrame(Seq(
  ("1100", "Person1", "Location1", null),
  ("1200", "Person2", "Location2", "Contact2"),
  ("1300", "Person3", "Location3", null),
  ("1400", "Person4", null, "Contact4"),
  ("1500", "Person5", "Location4", null)
)).toDF("id", "name", "location", "contact")
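You can confirm the inferred types by printing the schema. The snippet below is a self-contained sketch: it builds its own local SparkSession (in spark-shell, `spark` already exists) and recreates a smaller version of the DataFrame above.

```scala
import org.apache.spark.sql.SparkSession

// Local SparkSession for demonstration; in spark-shell this already exists as `spark`
val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()

val df = spark.createDataFrame(Seq(
  ("1100", "Person1", "Location1", null),
  ("1200", "Person2", "Location2", "Contact2")
)).toDF("id", "name", "location", "contact")

// Every column is inferred as String, because the tuple fields are Strings
df.printSchema()
```

The schema shows each column as `string (nullable = true)`, which is why an explicit cast is needed before numeric processing.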
Here, you can see that all the columns have the String data type. Let's change the id column's data type from String to Int.
Change column type
val df2 = df.withColumn("id", df("id").cast("int"))
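The same cast can be expressed in a few equivalent ways: by type name string, by a `DataType` object, or inside a SQL expression. The sketch below is self-contained (the SparkSession setup and column names mirror the example above).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
val df = spark.createDataFrame(Seq(
  ("1100", "Person1", "Location1", null),
  ("1200", "Person2", "Location2", "Contact2")
)).toDF("id", "name", "location", "contact")

// Cast by type name string (as in the post) or by DataType object
val byName = df.withColumn("id", df("id").cast("int"))
val byType = df.withColumn("id", df("id").cast(IntegerType))

// A SQL-expression variant; the column list matches the example above
val bySql = df.selectExpr("cast(id as int) as id", "name", "location", "contact")
```

Note that `withColumn` with an existing column name replaces that column in place, so the result keeps the original column order. If a value cannot be parsed as an Int, the cast produces null rather than failing.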
Wrapping Up
In this post, we have learned how to change a column's data type in a Spark DataFrame using cast. We converted the id column from String to Int; similarly, you can cast columns to other data types.
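As a quick illustration of other target types, the same pattern works with any type name that cast accepts. This is a sketch using the column names from the example above; the `id_double` column name is hypothetical, added only for demonstration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
val df = spark.createDataFrame(Seq(
  ("1100", "Person1", "Location1", null),
  ("1200", "Person2", "Location2", "Contact2")
)).toDF("id", "name", "location", "contact")

val df3 = df
  .withColumn("id", col("id").cast("long"))            // String -> Long
  .withColumn("id_double", col("id").cast("double"))   // hypothetical extra column, Long -> Double
```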