What Is Big Data, and My Learning Path
As developers, at some point we all ask the same question: how big is production data, really? Most of the time an RDBMS is enough for handling a client's data, but there are exceptions. Suppose you get a client who handles a very large amount of data that keeps growing exponentially with time. What would you suggest for handling that data? Before answering, there are a lot of other questions to work through.
Experts say an RDBMS has limitations, and you can't use one for every database need. What happens if we go past those limits? How do we turn that data into something useful? How do Facebook and YouTube handle their data? I had all of those questions, and I finally understood that I needed to learn big data to answer them. Still, even when we see a huge amount of data, we can't automatically call it big data.
There are three things you need to think about before considering data to be big data.
Volume - The data is very large in size.
Variety - The data comes in all types of formats.
Velocity - The data is generated at a very fast rate.
We call these the 3Vs. The data itself can fall into three categories: structured data, unstructured data, and semi-structured data.
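To make the three categories a bit more concrete, here is a small Python sketch. The sample records are made up for illustration, and this is only one rough way to picture the difference, not a formal definition.

```python
import csv
import io
import json

# Structured: fixed columns, fits neatly into a relational table.
# (Hypothetical sample rows.)
structured_csv = "id,name,age\n1,Alice,30\n2,Bob,25\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing, but the fields can differ per record.
semi_structured = (
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "bob@example.com"}]'
)
records = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure has to be extracted.
unstructured = "Bob called support on Monday and asked about his invoice."
words = unstructured.split()

print(rows[0]["name"])            # every row has the same columns
print(sorted(records[1].keys()))  # this record has fields the first one lacks
print(len(words))                 # all we know up front is that it is text
```

The structured rows can go straight into a table, the JSON records need some handling for missing or extra fields, and the free text needs processing before it can be queried at all.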
When I started to learn big data, I didn't have much idea of where to start. I went through articles and blog posts, and I finally came up with this list.
1. Programming language skills - When you handle big data, the basic job is to get data from one source and put it somewhere else, effectively and efficiently. This involves applications and scripts. You should be good at at least one programming language, like Java or Python, and a scripting language like Bash.
2. Database - When you work with databases, you need to understand when and where to use SQL and NoSQL, how data warehouses work, and how to model data. Above all, you should be proficient with SQL, especially its data manipulation (DML) statements.
3. BI workflow - Business intelligence workflow. You need to understand this kind of workflow very well.
4. Big Data - First you need to understand the value of the 3Vs and become familiar with the Hadoop ecosystem (Hive, Sqoop, Pig, HBase) and Spark. "Hadoop" is often used interchangeably with "big data", but Hadoop is a framework for working with big data. It is part of the big data ecosystem, which consists of much more than Hadoop itself.
These steps are not a strict step-by-step approach to learning big data, but I think the best way is to start with the first two steps and work on the rest as you need them.
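As a small practice exercise for the first two steps, here is a minimal sketch of the "get data from one source and put it somewhere else" job, using only Python's standard library. The sample CSV, the `payments` table, and the `extract`, `transform`, and `load` function names are all assumptions made up for this example; a real pipeline would read from files, APIs, or queues and would likely use heavier tools.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, an API, or a queue.
SOURCE_CSV = "user_id,amount\n1,10.5\n2,4.0\n1,2.5\n"


def extract(text):
    """Read raw rows (all values are strings) from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert the string fields into proper numeric types."""
    return [(int(r["user_id"]), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Create the target table if needed and insert the rows with SQL DML."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO payments (user_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


# An in-memory SQLite database stands in for the destination system.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# Query the loaded data: total amount per user.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions makes it easier to swap the source or the destination later, which is roughly what the bigger tools in the Hadoop ecosystem do at a much larger scale.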