Read data from Cosmos DB using Spark in Databricks

In this post, I will show you how to read data from Cosmos DB using Spark in Databricks. This is a common scenario for many data pipelines that need to process data from a NoSQL database like Cosmos DB.

Requirement

Let’s say you have a Cosmos DB account with a container that stores some JSON documents. You want to read these documents into a Spark dataframe in Databricks and perform some transformations and analysis on them. How can you do that?

Solution

Azure provides a Spark connector for Cosmos DB that you can install on your Databricks cluster as a library. It allows you to connect to your Cosmos DB account using the account endpoint and key and query the data using Spark SQL.
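
The connector is published to Maven Central, so the usual approach is to attach it to the cluster as a Maven library. The exact artifact depends on your Spark and Scala versions; as an illustrative example, a coordinate of the form `com.azure.cosmos.spark:azure-cosmos-spark_3-4_2-12:4.x.x` targets Spark 3.4 with Scala 2.12 (check the connector documentation for the release that matches your Databricks runtime).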

To use the connector, you need to create a configuration object with your account endpoint, account key, and other parameters, such as the database name, container name, preferred regions, etc. Then you can use the `spark.read.format("cosmos.oltp")` method to create a dataframe from your Cosmos DB container. For example:

# No import is needed in PySpark; the connector is loaded from the cluster library.

# Create a configuration object (replace the placeholders with your own values)
config = {
    "spark.cosmos.accountEndpoint": "https://<your-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<your-account-key>",
    "spark.cosmos.database": "<your-database>",
    "spark.cosmos.container": "<your-container>"
}

# Create a dataframe from the Cosmos DB container
df = spark.read.format("cosmos.oltp").options(**config).load()

# Display the dataframe
display(df)
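
Depending on the connector version, schema inference may need to be enabled explicitly so that the dataframe columns match the fields of your JSON documents rather than a raw-JSON representation. A minimal sketch, reusing the same `config` dictionary from above:

# Ask the connector to sample the container and infer column types
config["spark.cosmos.read.inferSchema.enabled"] = "true"
df = spark.read.format("cosmos.oltp").options(**config).load()
df.printSchema()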

The advantage of using this connector is that it is easy to set up and use: you can query your data using Spark SQL and apply any transformations or actions you want. The disadvantage is that it may incur high latency and RU charges, since every read goes over the network to your Cosmos DB account and consumes request units from the transactional store. The connector does support reading the change feed, but it cannot reach the analytical store, which is only exposed through Azure Synapse Link.
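
To make the Spark SQL and change feed points concrete, here is a minimal sketch. The view name `cosmos_docs` and the `category` column are hypothetical placeholders for whatever fields your documents actually contain, and the change feed options reuse the `config` dictionary defined earlier:

# Query the dataframe with Spark SQL through a temporary view
df.createOrReplaceTempView("cosmos_docs")
spark.sql("SELECT category, COUNT(*) AS doc_count FROM cosmos_docs GROUP BY category").show()

# Read the container's change feed instead of its current state
changefeed_config = {
    **config,
    "spark.cosmos.changeFeed.mode": "Incremental",
    "spark.cosmos.changeFeed.startFrom": "Beginning"
}
df_changes = spark.read.format("cosmos.oltp.changeFeed").options(**changefeed_config).load()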

Conclusion

In this post, I have shown you how to read data from Cosmos DB into a Spark dataframe in Databricks using the Azure Cosmos DB Spark connector. I hope you found this post useful and learned something new today. Thanks for reading!

