In today's internet age, the nature and size of data, and the query response expected of it, change dynamically. Heavy user load makes data grow at an unmanageable rate, for example:
Wal-Mart: 1 million transactions per hour
Twitter: 250 million tweets a day
Facebook: 1 billion messages a day
Relational database management systems (RDBMS) are not designed to handle this volume of data, since they are generally designed to scale on a single server. To meet this requirement, users need to buy a bigger machine, which adds a huge cost.
This data is now accessed from everything from mobile devices to desktops. (Data is distributed & urgent)
$300 billion potential annual value to US health care. (Data is valued)
To meet the scalability, performance and consistency needs of this web era, NoSQL databases are schema free, offer easy replication support and sharding, follow BASE** instead of ACID**, and are designed around the CAP** theorem.
** ACID is Atomicity, Consistency, Isolation, Durability. BASE is Basically Available, Soft State, Eventually Consistent. CAP is Consistency, Availability, Partition tolerance.
Schema free: There is no predefined schema for NoSQL databases.
Easy Replication Support:
NoSQL databases commonly employ asynchronous replication, which allows writes to complete more quickly since they do not wait on extra network traffic.
Horizontal Scaling: NoSQL databases split tables across different servers, with only one instance of the same data.
Sharding: NoSQL databases can perform sharding, in which records (rows) are distributed across many database servers.
BASE instead of ACID: NoSQL databases emphasize performance and availability.
Basic Availability:
NoSQL databases use replication to reduce the likelihood of data unavailability, and use sharding, or partitioning the data among many different storage servers, to make any remaining failure partial. The result is a system that stays available as a whole, even if subsets of the data become unavailable for short periods of time.
Soft State: NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers.
Eventual Consistency:
Although applications must cope with the lack of instantaneous consistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state. Consistency is guaranteed only at some undefined future time.
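The BASE ideas above (sharding, asynchronous replication, soft state and eventual consistency) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not how any particular NoSQL product works: the names Shard, ShardedStore, get_from_replica and replicate_all are hypothetical, each shard is a plain dictionary, and replication is triggered by hand instead of running in the background.

```python
import zlib
from collections import deque


class Shard:
    """One storage server: a primary copy plus one lagging replica (hypothetical model)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it;
        # the replica is updated later (asynchronous replication).
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes in order so the replica converges on the primary.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


class ShardedStore:
    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash keeps each key on the same shard across runs.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).write(key, value)

    def get_from_replica(self, key):
        # May return stale data (or None) until replication catches up.
        return self.shard_for(key).replica.get(key)

    def replicate_all(self):
        for shard in self.shards:
            shard.replicate()


store = ShardedStore(shard_count=3)
store.put("user:1", "Alice")
stale = store.get_from_replica("user:1")  # replica has not seen the write yet
store.replicate_all()
fresh = store.get_from_replica("user:1")  # replica has converged
print(stale, fresh)  # prints: None Alice
```

Between the write and the call to replicate_all, the replica is in the "soft state" described above: the application reading from it has to tolerate a missing or outdated value.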
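An important theorem for distributed computer systems is the CAP theorem, also called Brewer’s theorem, which shapes the design of NoSQL databases. It states that a distributed system can guarantee at most two of the following three properties at the same time.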
Consistency: All database clients see the same data, even with concurrent updates.
Availability: All database clients are able to access some version of the data.
Partition Tolerance:
The database can be split over multiple servers, and the data must remain available even if one or more nodes or communication links fail.
NoSQL databases come in many forms to meet application requirements. The major forms are the key-value store, column family store, document store and graph store.
Key-Value Store
The main idea is a hash table in which each unique key points to a particular item of data. These mappings are usually accompanied by cache mechanisms to maximize performance.
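As a rough illustration of the key-value idea, the sketch below puts a small least-recently-used cache in front of a dictionary that stands in for the persistent hash table. The class name KeyValueStore, the cache_size parameter and the cache_hits counter are assumptions made for this example only.

```python
from collections import OrderedDict


class KeyValueStore:
    def __init__(self, cache_size=2):
        self.data = {}              # stands in for the persistent hash table
        self.cache = OrderedDict()  # small LRU cache in front of it
        self.cache_size = cache_size
        self.cache_hits = 0

    def put(self, key, value):
        self.data[key] = value
        self.cache.pop(key, None)   # drop any stale cached copy

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            self.cache_hits += 1
            return self.cache[key]
        value = self.data[key]              # raises KeyError for unknown keys
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


kv = KeyValueStore()
kv.put("session:42", {"user": "alice"})
kv.get("session:42")  # first read goes to the backing table
kv.get("session:42")  # second read is served from the cache
print(kv.cache_hits)  # prints: 1
```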
Column Family Store
This model was created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns.
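One way to picture "keys that point to multiple columns" is a nested mapping from row key to column family to column. The following is a minimal sketch with made-up data; the helper names put and get_family are hypothetical and do not correspond to any real database API.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    # Return a copy of all columns in one family for one row,
    # without creating empty entries for unknown rows or families.
    return dict(table.get(row_key, {}).get(family, {}))


put("user:1", "profile", "name", "Alice")
put("user:1", "profile", "city", "Pune")
put("user:1", "activity", "last_login", "2013-01-15")
put("user:2", "profile", "name", "Bob")  # rows need not share the same columns

print(get_family("user:1", "profile"))  # prints: {'name': 'Alice', 'city': 'Pune'}
```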
Document Store
The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats such as XML, JSON (JavaScript Object Notation), YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) and BSON (Binary JSON).
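To make the document model concrete, the sketch below stores JSON documents of different shapes side by side and queries them by field, which also shows what "schema free" means in practice. The helpers save and find are hypothetical, the data is invented, and document versioning is left out to keep the example short.

```python
import json

documents = {}  # document id -> JSON text


def save(doc_id, document):
    documents[doc_id] = json.dumps(document)


def find(field, value):
    # Scan every document; a document without the field simply does not match.
    matches = []
    for doc_id, text in documents.items():
        if json.loads(text).get(field) == value:
            matches.append(doc_id)
    return matches


# Documents nest other key-value collections and need not share a schema.
save("p1", {"type": "post", "title": "Hello", "tags": ["intro"],
            "author": {"name": "Alice"}})
save("p2", {"type": "post", "title": "No tags here"})
save("c1", {"type": "comment", "post": "p1", "text": "Nice"})

print(find("type", "post"))  # prints: ['p1', 'p2']
```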
Graph Store
This model is built with nodes, relationships between nodes, and the properties of nodes. Instead of tables of rows and columns and the rigid structure of SQL, a flexible graph model is used, which can scale across many machines.
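A graph store can be sketched as a set of nodes with properties plus labelled relationships between them. The example below is a single-machine toy with invented data; add_node, relate and reachable are hypothetical helper names, and real graph databases expose their own query languages for this kind of traversal.

```python
from collections import deque

nodes = {}  # node id -> properties
edges = {}  # node id -> list of (relationship, target id)


def add_node(node_id, **properties):
    nodes[node_id] = properties
    edges.setdefault(node_id, [])


def relate(source, relationship, target):
    edges[source].append((relationship, target))


def reachable(start, relationship):
    # Breadth-first traversal that follows only one relationship type.
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for rel, target in edges[current]:
            if rel == relationship and target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return sorted(seen)


add_node("alice", kind="person")
add_node("bob", kind="person")
add_node("carol", kind="person")
relate("alice", "FOLLOWS", "bob")
relate("bob", "FOLLOWS", "carol")

print(reachable("alice", "FOLLOWS"))  # prints: ['bob', 'carol']
```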