Wednesday, December 25, 2013

What, where, why and how to study Data Science

I just jumped into the water. Just like that, without thinking much about it. I knew some basics of Big Data, but nothing about Data Science. That changed when I went through an introduction to a commercial enterprise Big Data solution. It also included a description of the Data Science role and its contribution to the development process.

The next step was reading many web discussions about it. Then I finally read a really good article, Fake Data Science by Vincent Granville, published in February 2013. It shaped my opinion and turned me in a slightly different direction from where I started: simply put, from a more technical point of view to a more business one. And it makes sense once you look at Data Science from the right angle. What result do you expect (what are the business needs), and what path do you need to take to achieve it?


What?

So, what to study? This is only my opinion, and we can discuss it:

  • Data analysis. The kind of data doesn't matter: you can start with your personal data, small volume or large. What matters is that you do analysis, practice it and think about it. Why are you doing what you are doing, what is your target, what are the results, and what are the expectations? Even the tool is irrelevant; at the beginning you can use any.
  • Business intelligence. Why does the business need this analysis, why do you need to analyse this kind of data, and what are you looking for? Do the defined business needs correlate with the analysed data? This covers not only the existing metrics for specific businesses but also new ones that you define yourself and that make sense. By practising and studying the business behind the data and its existing KPIs, you will gain the experience to define your own measures.
  • Data transformation, processing and cleansing. Good knowledge of ETL or data integration should be enough, including large-scale data processing, a.k.a. Big Data. Knowledge of statistics, data mining and machine learning is also useful, and so could be architecture and data storage in SQL and NoSQL databases, and so on.
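
To make the first and third points a little more concrete, here is a minimal sketch of a cleansing step followed by a very simple analysis, using only the Python standard library. The expense rows, the category names and both function names are made up for illustration; any small personal data set and any tool would do just as well.

```python
from collections import defaultdict

# Hypothetical personal data: (category, amount) rows as they might come
# out of a hand-kept expense sheet, including a few messy entries.
raw_rows = [
    ("Food ", "12.50"),
    ("food", "7.25"),
    ("Transport", "30"),
    ("transport", "n/a"),  # malformed amount, dropped during cleansing
    ("", "5.00"),          # missing category, dropped during cleansing
]


def clean(rows):
    """Cleansing step: normalise categories, parse amounts, drop bad rows."""
    cleaned = []
    for category, amount in rows:
        category = category.strip().lower()
        try:
            value = float(amount)
        except ValueError:
            continue  # skip rows whose amount is not a number
        if category:
            cleaned.append((category, value))
    return cleaned


def total_per_category(rows):
    """Analysis step: one simple measure, total spend per category."""
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value
    return dict(totals)


totals = total_per_category(clean(raw_rows))
print(totals)
```

The interesting part is not the code but the questions around it: why this measure, what you expected to see, and what you would do with the result.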

As a quick overview, it is good to know the following (don't take it as a complete list):

  • Data analysis techniques
  • Business analysis (specific businesses like telecommunications, banking, insurance, retail, etc.).
  • Java or Python or any suitable programming language
  • NoSQL databases (and at least the basics of SQL)
  • Statistics and tools for statistics
  • Data mining techniques
  • Machine learning
  • Predictive modelling
  • Maybe basic or better knowledge of Hadoop, MapReduce, Hive, Pig, HBase, etc.

Where?

You can start for free, as I did. A good overview was written, again by Vincent Granville, in the article Data Science programs and training currently available, published in October 2013.

Universities

A list of available programs is in the article mentioned above. There is also an article about Harvard classes on data science by Vincent Granville, published in December 2013.

Online courses

I recommend that you go to mooc-list.com and find suitable courses for yourself, not only from Coursera but also from Open2Study and elsewhere.

For example, these are interesting for me:

I have chosen and bought a few books from the Data Science Kit list, and there are also many free ones on Amazon about Big Data basics. You can also find books on machine learning, statistics and other topics. I'll follow up with articles reviewing the books I have studied from, so for now you can choose what you want and then maybe follow me in the near future.

Why?

My motivation is clear: I want to grow professionally. I am really keen to study and learn something with a purpose, something useful that has value for me.

How?

Here is my straightforward plan:
  • 1 book in 2013 (in progress; I'm now reading Big Data Analytics).
  • 12 books in 2014 (Intro to Big Data, Data mining, Machine Learning, businesses and analysis).
  • 1 online course in 2013 (already done: Open2Study Big Data for Better Performance).
  • 12 online courses in 2014 (Open2Study, Mongo, Coursera, Caltech, Harvard, etc.).
  • When a book is theoretical, the course must be practical, with hands-on technology, and vice versa.
  • I will write a book review on this blog for each book I finish.
  • Complete each online course with a score of 80% or better.
  • Hands-on experience in my own way, with a blog post about it. I want to do Social Network Analysis, FX financial analysis, etc.

That's all. You can just watch and read about my progress. You can join and follow me, or just read and discuss. Give me some advice and see if I take it or not :-). I'll try to publish at least 4 times per month, every week if possible, but I do not promise!
