What makes spark so powerful (Part 1)


In this blog, we are going to look at some of the core components of Spark that make it so powerful and easy to use.

This series has three parts, covering the following topics:

What makes spark so powerful (Part 1) – Storage layer
- Birth of Spark
- Storage
What makes spark so powerful (Part 2) – Resource management
- Cluster management
- Worker
What makes spark so powerful (Part 3) – Engine & Ecosystem and APIs
- Spark SQL
- MLlib
- Structured Streaming

Birth of Spark

Most of us have heard about the popularity of Spark, which has been adopted by major players such as Amazon, eBay, and Yahoo!
But have you ever wondered why we need Spark, and why it was created in the first place?

OK, let me tell you!
Spark was created to address the limitations of the Hadoop MapReduce framework. Although Hadoop MapReduce was groundbreaking for big data processing, in practice it was slow, largely because it writes intermediate results to disk between its map and reduce stages. Spark introduced in-memory computation, which made it up to 100 times faster than Hadoop MapReduce for some workloads. Since then, Spark adoption for big data applications has grown steadily across the globe.

Apache Spark vs Hadoop MapReduce

- Apache Spark is fast: for workloads that fit in memory, it can run up to 100 times faster than Hadoop MapReduce.
- If you need real-time decisions or business insights, Spark and its in-memory processing are the better choice.
- Spark ships with built-in libraries, such as MLlib for machine learning, whereas Hadoop MapReduce relies on third-party projects for this functionality.
- Because Spark is so fast, it can also generate all combinations of the data more quickly.

Spark Architecture

There are five core components that make Spark so powerful and easy to use. The core architecture of Spark consists of the following layers:

Storage
Resource Management
Engine
Ecosystem
APIs

In this part of the blog, we will talk only about the Storage layer; the next parts of the series cover the remaining layers of Spark.

Storage

Before Spark can process anything, the data must be made available to it, and that data can live in almost any kind of database. For large-scale processing, Spark offers many options for connecting to different types of data sources: it works with traditional relational databases as well as modern NoSQL stores such as Cassandra and MongoDB.

Additionally, it can read from most popular file systems and storage systems, such as HDFS, Cassandra, Hive, HBase, and SQL Server.

That's all for this blog. I hope you enjoyed it and found it helpful! Stay connected for future posts. Thank you!

Stay tuned, and happy learning.

Follow MachineX Intelligence for more.