Big Data

Today it is possible to store, organize, and analyze large amounts of data with relative ease. That, briefly, is the idea behind Big Data: you have a huge amount of data, and you can organize and analyze it to obtain the information that matters in useful time.
The main features of Big Data [Big Data in 2013] are volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources), known as the 3 V's of Big Data. Volume refers to large amounts of data that keep growing exponentially. Velocity refers to the speed at which the data must be handled: the time spent working on the data must be kept short so that information is obtained in useful time. Variety is related to data formats: the data can come in various formats, from binary files to XML structures, JSON, etc.
But there are two important aspects implicit in the concept of Big Data that cannot be neglected: veracity and value. Veracity is important because volume, velocity, and variety matter little if the data is not reliable. Value is the aspect that justifies all the work and investment in Big Data: the value of the information obtained from large data stores.

Relational databases have little flexibility and are therefore not well suited to working with Big Data. This is where NoSQL (Not Only SQL) comes in, since it can work with a large and growing volume of information.
NoSQL solutions can be:

  • Key/value storage, which can be represented as a hash map or an associative array.
  • Column (super column) storage, where the information is stored in the form of columns without relationships.
  • Document storage, which works similarly to key/value storage, but the value is a document in a format such as BSON or JSON.
  • Graph storage, where data is represented by nodes and the arcs that relate them.
  • Object-oriented storage, where the data is stored as objects.
The way information is accessed and manipulated varies from solution to solution. Some NoSQL databases are MongoDB, Cassandra, Redis, HBase, and Bigtable, among others.
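
To make the difference between key/value and document storage more concrete, here is a minimal sketch in Python. It does not use any real NoSQL product: plain dictionaries stand in for the stores, and the names kv_store, doc_store, and get_field are hypothetical, chosen only for illustration.

```python
import json

# Key/value storage: the value is opaque and is only looked up by its key.
# A plain dict (a hash map) stands in for the store here.
kv_store = {}
kv_store["session:42"] = "alice"

# Document storage: the value is a structured document (JSON in this sketch),
# so individual fields inside it can be read.
doc_store = {}
doc_store["user:1"] = json.dumps({"name": "Alice", "tags": ["admin", "dev"]})


def get_field(store, key, field):
    """Decode the JSON document stored under `key` and return one field."""
    return json.loads(store[key])[field]


print(kv_store["session:42"])
print(get_field(doc_store, "user:1", "name"))
```

In both cases the record is found by its key; the document store simply knows enough about the value to look inside it.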

However, some relational databases, such as MySQL and PostgreSQL, already incorporate NoSQL features, thereby exploring the best of both worlds. A relational database that supports semi-structured data lets the user work with NoSQL-style document storage while keeping the ACID properties (atomicity, consistency, isolation, and durability) that make database transactions reliable.
Two very important concepts that have to be addressed when discussing this subject are MapReduce and sharding.
MapReduce is, basically, a design pattern in which a map step processes the input data and produces a list of key/value pairs, and a reduce step then combines the values that share the same key.
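
As an illustration of the pattern, the classic word-count example can be sketched in a few lines of Python. This is a single-process sketch, not how a real MapReduce framework distributes work across machines; the function names map_phase, shuffle, reduce_phase, and word_count are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input text."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all the values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is valuable"])
print(counts)
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one key, both steps could in principle run in parallel on different machines.
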
The process of storing data records across multiple machines is called sharding. It manages storage so that data is spread uniformly across the machines of a multi-machine system. Sharding supports data growth and the demands of read and write operations.
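
One simple way to decide which machine holds a record is to hash its key. The sketch below assumes this hash-based strategy, which is only one of several possible sharding schemes, and simulates three machines with in-memory dictionaries. The names shard_for, put, and get are hypothetical and do not belong to any particular database.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Three dictionaries stand in for three separate machines.
shards = [dict() for _ in range(3)]


def put(key, record):
    """Store the record on the shard chosen for its key."""
    shards[shard_for(key, len(shards))][key] = record


def get(key):
    """Read the record back from the same shard (None if absent)."""
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(10):
    put(f"user:{user_id}", {"id": user_id})

print([len(s) for s in shards])
print(get("user:7"))
```

A stable hash is used instead of Python's built-in hash() so that the same key maps to the same shard on every run. Note that with this naive modulo scheme, changing the number of shards moves most keys to a different shard.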