Notice

This is my personal blog and my posts here have nothing to do with my employers or any other association I may have. It is my personal blog for my personal experience, ideas and notes.

Monday, July 1, 2013

NoSQL Overview

In today's internet age, the nature, size and query patterns of data change dynamically, and heavy user load makes data grow to an unmanageable size. For example:
Wal-Mart: 1 million transactions per hour
Twitter: 250 million tweets a day
Facebook: 1 billion messages a day


RDBMSs are not designed to handle this volume of data, since they are generally designed to scale on a single server. To meet the requirement, users need to buy a bigger machine, which adds a large cost.


This data is now accessed from every kind of device, from mobile phones to desktops. (Data is distributed and urgent.)
It is also valuable: an estimated $300 billion in potential annual value to US health care. (Data is valued.)


To meet the scalability, performance and consistency needs of this web era, NoSQL databases are schema free, offer easy replication support and sharding, follow BASE** instead of ACID**, and are designed around the CAP** theorem.


** ACID is Atomicity, Consistency, Isolation, Durability. BASE is Basically Available, Soft State, Eventually Consistent. CAP is Consistency, Availability, Partition Tolerance.


Schema free: There is no predefined schema for NoSQL databases.


Easy Replication Support: Many NoSQL databases employ asynchronous replication, which allows writes to complete more quickly because they don’t wait for the extra network round trip to the replicas.
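
As a rough illustration of asynchronous replication, here is a minimal sketch in Python. The `Primary` class and its `write` and `flush` methods are hypothetical names invented for this example, not the API of any real database. The point is only that the write is acknowledged before the replica has the data.

```python
from collections import deque


class Primary:
    """Hypothetical primary node that acknowledges writes before replicating."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica      # a plain dict standing in for a replica node
        self.pending = deque()      # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"                # returned without waiting for the replica

    def flush(self):
        # A real system would do this in the background; here it is called by hand.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


replica = {}
primary = Primary(replica)
status = primary.write("user:1", "Alice")
stale_read = replica.get("user:1")  # the replica has not caught up yet
primary.flush()
fresh_read = replica.get("user:1")  # now the replica has the value
```

The window between `write` and `flush` is where a reader of the replica can see old data, which is the trade-off for the faster write.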


Horizontal Scaling: NoSQL databases split tables across different servers, with only one instance of each piece of data.


Sharding: NoSQL databases can perform sharding, in which records (rows) are distributed across many database servers.
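
One common way to decide which server holds a record is to hash its key. The sketch below assumes four shards and uses plain dicts in place of real servers; `shard_for`, `put` and `get` are illustrative names, and real systems often use more elaborate schemes (range-based sharding, consistent hashing).

```python
import hashlib

NUM_SHARDS = 4                                   # assumed number of servers
shards = [dict() for _ in range(NUM_SHARDS)]     # each dict stands in for one server


def shard_for(key: str) -> int:
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key, record):
    shards[shard_for(key)][key] = record


def get(key):
    return shards[shard_for(key)].get(key)


# Distribute 100 records; each one lives on exactly one shard.
for i in range(100):
    put(f"user:{i}", {"id": i})
```

Because the hash is deterministic, a read for `user:42` goes straight to the shard that stored it, without asking the others.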


BASE instead of ACID: NoSQL databases emphasize performance and availability.
Basic Availability: NoSQL databases use replication to reduce the likelihood of data unavailability, and use sharding (partitioning the data among many different storage servers) to make any remaining failure partial. The result is a system that stays available as a whole, even if subsets of the data become unavailable for short periods of time.
Soft State: NoSQL systems allow data to be inconsistent and leave designing around such inconsistencies to application developers.
Eventual Consistency: Although applications must deal with momentary inconsistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state. Consistency is guaranteed only at some undefined future time.
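
To make eventual consistency concrete, here is a small sketch in which two replicas disagree for a while and then converge. It assumes a "last write wins" rule based on timestamps, which is only one of several possible conflict-resolution strategies; the `Replica` class and the `sync` function are hypothetical.

```python
class Replica:
    """Hypothetical replica that keeps the newest value per key."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        # Accept the write only if it is newer than what we already hold.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions; the newer timestamp wins."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)
r2.write("cart", ["book", "pen"], timestamp=2)

before = (r1.read("cart"), r2.read("cart"))  # the replicas disagree
sync(r1, r2)
after = (r1.read("cart"), r2.read("cart"))   # both hold the newer value
```

Between the two writes and the call to `sync`, a client can get different answers depending on which replica it asks. That gap is the soft state the application has to tolerate.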


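An important theorem for distributed computer systems is the CAP theorem, or Brewer’s theorem, which NoSQL databases are designed around. It states that a distributed system can guarantee at most two of the following three properties at the same time: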
Consistency: All database clients see the same data, even with concurrent updates.
Availability: All database clients are able to access some version of the data.
Partition Tolerance: The database can be split over multiple servers, and it keeps working even if some nodes or communication links fail.


NoSQL databases come in many forms to meet application requirements. The major forms are the key-value store, column family store, document store and graph store.


The main idea of a key-value store is a hash table in which each unique key points to a particular item of data. These mappings are usually accompanied by cache mechanisms to maximize performance.
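
In its simplest form, the key-value idea can be sketched with a Python dict, which is itself a hash table. The `KeyValueStore` class below is a toy written for this post, not the interface of any particular product, and it leaves out caching, persistence and networking.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a hash table (a dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value          # overwrites any existing value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("session:abc", {"user": "alice", "ttl": 3600})
session = store.get("session:abc")
removed = store.delete("session:abc")
missing = store.get("session:abc")
```

The store does not look inside the value. It only knows the key, which is why lookups are fast and why queries on the value's contents are not supported in this model.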


A column family store is designed to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns.
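
A rough way to picture this layout is a nested map: row key, then column family, then column name. The sketch below uses nested dicts, and the family names `profile` and `activity` are made up for illustration. Real column family stores add on-disk sorting, timestamps and distribution on top of this idea.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

table["user:1"]["profile"]["name"] = "Alice"
table["user:1"]["profile"]["city"] = "Pune"
table["user:1"]["activity"]["last_login"] = "2013-07-01"
table["user:2"]["profile"]["name"] = "Bob"   # rows need not share the same columns


def read_family(row_key, family):
    """Return all columns of one family for one row, or {} if absent."""
    return dict(table.get(row_key, {}).get(family, {}))


alice_profile = read_family("user:1", "profile")
bob_activity = read_family("user:2", "activity")
```

Reading one family touches only the columns grouped under it, which is what makes this model convenient for wide, sparse rows.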


The document store model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like XML, JSON (JavaScript Object Notation), YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) and BSON (Binary JSON).
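
Here is a small sketch of what such documents might look like, using JSON and plain Python. The `_id` field and the `find` helper are assumptions made for this example and do not describe any specific database. Note that the two documents have different fields, which is the schema-free property in practice.

```python
import json

# Two documents in the same collection with different shapes.
collection = [
    {"_id": 1, "name": "Alice", "emails": ["alice@example.com"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Pune", "zip": "411001"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields match all the criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


serialized = json.dumps(collection[1], sort_keys=True)  # document -> JSON text
restored = json.loads(serialized)                        # JSON text -> document
matches = find(collection, name="Bob")
```

Unlike a key-value store, a document store can look inside the value, so a query on `name` works without knowing the key.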


A graph store is built from nodes, relationships between nodes, and the properties of nodes. Instead of the tables of rows and columns and the rigid structure of SQL, it uses a flexible graph model that can scale across many machines.
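
The graph model can be sketched with a dict of nodes and a list of typed relationships. The names below (`FOLLOWS`, `neighbors`, `reachable`) are invented for illustration. Real graph databases provide their own query languages and indexes for this kind of traversal.

```python
from collections import deque

# Nodes with properties, and directed relationships between them.
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 35},
}
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
]


def neighbors(node, rel_type):
    """Nodes directly connected from `node` by a relationship of `rel_type`."""
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]


def reachable(start, rel_type):
    """Breadth-first traversal: every node reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


direct = neighbors("alice", "FOLLOWS")
indirect = reachable("alice", "FOLLOWS")
```

A question like "who can Alice reach through follows?" is a short traversal here, whereas in a relational schema it would usually need repeated self-joins.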