Sunday, December 25, 2016 8:16:00 PM
- There is a large volume of structured and unstructured data sitting out there. How can companies analyze these large amounts of data to gain actionable insights for better strategic decision making? That's what Big Data helps achieve.
- Terms commonly heard: Hadoop, MapReduce, Cassandra, Spark, Kafka, ZooKeeper, NoSQL, MongoDB, R, MATLAB, data mining, high-performance computing, analytics (descriptive, predictive, in-memory), grid computing, etc.
- MATLAB and R are popular for statistical programming, while Hadoop is an open-source framework that includes an implementation of MapReduce and, as expected, does much of its computation in parallel across many machines.
- Spark is a newer Apache project that can process data faster than traditional disk-based MapReduce, largely by keeping working data in memory. Cassandra, developed at Facebook, is an open-source distributed database that stores and serves very large amounts of data across clusters of less expensive, low-end servers.
- Some major categories of algorithms used in Big Data analytics include classification, clustering, and regression. In the early 2000s, Doug Laney characterized Big Data by three V’s: Volume + Velocity + Variety.
- Where does this large volume of data come from? Streaming, social media and publicly available data (data.gov, CIA World Factbook, European Union Open Data Portal etc.) are the main sources.
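
To make the MapReduce idea mentioned above concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a local thread pool stands in for the cluster nodes that a real framework would use.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool; a stand-in for cluster nodes


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for word, count in chunk:
            groups[word].append(count)
    return groups


def reduce_phase(item):
    """Reduce: sum the counts collected for one word."""
    word, counts = item
    return word, sum(counts)


def word_count(lines, workers=4):
    """Run map and reduce steps over the input, each step in parallel."""
    with Pool(workers) as pool:
        mapped = pool.map(map_phase, lines)
        reduced = pool.map(reduce_phase, list(shuffle(mapped).items()))
    return dict(reduced)


if __name__ == "__main__":
    sample = ["big data big insights", "data at rest and data in motion"]
    print(word_count(sample))
```

Each input line is mapped independently and each word is reduced independently, which is what lets a framework like Hadoop spread the same two steps over many machines.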