Wednesday, November 26, 2014

Killing me softly

The title says everything. This killing combination comes from a project in Sweden together with my previously started PhD study and my spare-time study during evenings and sleepless nights, but let's take it from the beginning.

At the beginning there was a sustainable project with a daily routine, and spring, the part of the year with the most energy. So I applied and got into the PhD study. Euphoria was the mood I had right when I got the acceptance letter.

Then I started studying and did not realize how many things I needed to go through, so I started with two subjects in one semester. So far so good. A lot of study materials, I can't say no. But really interesting topics (I will blog about them sometime later).

"Winter is coming."

- the motto of House Stark

And then I started to work on the project in Sweden, and this is where it got more interesting. I am not able to study on a daily basis, so I am a little bit behind schedule, and I am really tired from the winter and the dark (at 7 AM there is no daylight yet and at 2 PM there is no longer daylight).

So even when I put great discipline and a lot of effort into it, I still have more things to do. I hope I will handle it at least partially or with some postponements, and finish the first semester in some reasonable way.

Saturday, October 4, 2014

Introduction to fitness band experiment

This started as a stupid idea. It was influenced by someone's project in a Coursera course (Data Analysis and Statistical Inference), where I did volunteer evaluation. The other student analyzed his own data from a Fitbit Flex. I almost forgot about it. Later, in another Coursera course (Getting and Cleaning Data from the Data Science Specialization), we did data processing in R, and the data also contained accelerometer information from an experiment with a number of human subjects (see Human Activity Recognition Using Smartphones Data Set).

It then continued when I was discussing my dissertation topic with my supervisor and we arrived at an experiment with fitness bands and sentiment written on Twitter.

Monday, September 29, 2014

Big Data and Data Science study - presentation

During our company meeting on Friday, September 26, I gave a presentation on the topic in the title. I think this presentation could be useful not only for the audience that was there but for anyone.

Here is a description of what the presentation is about:
Big Data and Data Science study, with the subtitle "study materials and online courses", is a presentation of a little over 40 slides about 10 domains of Data Science covered by free and paid online MOOC courses, study materials and free books.

Based on almost a year of studying, investigating and collecting materials, tutorials, courses, books, links, etc., I have prepared a distillation of the best in this short presentation.

Of course the list is not complete, because there is always something new, undiscovered and better than before. But it contains the most important information for those who want to start, or who have already begun and don't know exactly where to continue.

Go to the Speaker Deck site, where you can download the presentation as a PDF, which is quite useful given the working links throughout the presentation.


Sunday, August 31, 2014

Data Science and PhD study

When I was applying for the PhD study, I thought of it as an opportunity to use all the techniques, programming languages, methods, processes and technologies in a practical way and deliver something reasonable which supports my professional growth and development. It could also stand as proof that I am really keen to learn and improve in this field and that I have a passion for Data Science.

I thought about it again when I read the Wired article The Modern Data Nerd Isn’t as Nerdy as You Think, about Data Nerds who are currently in charge of Data Science departments or in similar roles and apply their experience and knowledge, and usually they do not have a PhD. Moreover, they do not have a master's degree, just a bachelor's degree. Or, if they do have some kind of degree, it is not from a field close to Data Science.

Thursday, August 7, 2014

Interviews for the Data Scientist role

After almost a year of self-study of Data Science courses via Coursera, the DSE program and other MOOC sites, I had the chance to go to interviews for a Data Scientist role. So far two.

The first one was a couple of weeks ago, an interview for a Researcher / Data Scientist position. It was mainly about prediction models, math, statistics and data mining/machine learning, mostly in R connected with Hadoop and some other mathematical tools. This was my first interview for such a position.

The second one was a couple of days ago, an interview for a purely Data Scientist position, covering all data science work from many different sources with a project-driven or data-driven approach, depending on the case. This was my most recent experience and I hope not the last one.

Monday, June 30, 2014

My study list for Summer time 2014

I was thinking about what to study during the summer. I need to improve my insight into Big Data, especially Hadoop and its fundamentals like HDFS, HBase and MapReduce, at least from a non-Java developer's point of view. But I would also like to start with the Java development (or another programming language) perspective; some intro would be good.

My study plans also include two courses about Machine Learning and Data Science on Coursera which I have already signed up for, while I am currently taking a break from the courses in the Data Science Specialization.

And last but not least I need to finish the mini project in R for DSE 501 and follow up with the next one, DSE 502.

If anyone wants to join me, for the Coursera courses for example, sign up (links to the courses and other stuff are below) and let me know in the comments. We can make a study group.

Saturday, June 28, 2014

Course: DSE 501 - Machine Learning with R

After I successfully went through DSE 400, which is described in detail here, I followed up with DSE 501 Machine Learning with R. I had been really eager to learn Machine Learning since the beginning of my Data Science study, so here was an opportunity and I took it.

First of all I need to say, I really enjoyed it, even though I haven't finished it yet. The last thing still on my table, even though I am working on it really hard, is the final project, the 6th assignment. You choose it yourself and define the proposal yourself, so it's up to you how difficult it will be. But I am going backwards. Let's start from the beginning.

Thursday, June 12, 2014

Reading: Free online resources June 2014

I need to say, my latest obsession is NLP. After I got through the basics of ML/DM techniques, I challenged myself in an NLP contest on Kaggle.com about sentiment analysis. So, if you are interested in free (as usual, of course) basic sources for this topic, be my guest:

Unfortunately I have found that the 1st one is not completely available, just the first 4 chapters, but as an intro it is good enough.

Monday, May 12, 2014

Course: DSE 400 - Fast Track to Data Science

After describing the whole DSE Program itself here, I would like to continue with the specific parts I have gone through. Currently I am studying the 2nd course, DSE 501 - Machine Learning with R, which I will describe in upcoming blog posts, but let's start with the initial course, DSE 400 - Fast Track to Data Science.

As I already wrote in my previous blog post, the Data Science Enablement Program is really different from other courses you know from Coursera or other MOOC websites. Instead of watching video lectures and doing quizzes and a course project, you need to be personally involved, discuss the weekly topics and also do (at least) weekly assignments which correspond to the current week's topic. Each week you get a lot of study materials which can help you complete the assignment and gain reasonable knowledge of the topic, and also a lot of follow-up practice suggestions with sensible questions or tasks which move you forward through the topic.

Tuesday, April 29, 2014

Reading: Free online resources April 2014

So, more statistics and what else I found interesting (I really like D3!), here is the list:
As I wrote in my previous blog post, my intention is to have more fun with Data Science than just being overloaded by homework, assignments and quizzes.

Monday, April 28, 2014

Too much, too little

Interesting, I was waiting for when it would happen. It happened, after 3 months of intensive study. The combination of daily routine work, caring for family and studying brought me to an intense and continuous tiredness. I just realized that studying the Data Science field has stopped being fun and started to be an obstacle.

I know what I am doing wrong and I need to get rid of it. First of all, it is studying many different MOOC courses in parallel and still taking on new ones, which leads to two things: I am overloaded, and I am not able to finish anything on time and/or with quality. So I need to strictly reduce it to at most two in parallel, sometimes only one when it is too difficult (who would have known; Data Analysis and Statistical Inference taught me a lesson). And start to enjoy it again. And also sleep at night, at least sometimes ;-).

At this time I would like to finish everything I have in progress, and not go for new courses until my queue is empty or down to just one running course. I will maybe keep one and in parallel return to reading some book.

Anyway, it doesn't mean I am going to stop writing these notes on my blog or even close it, not at all. I just need to slow down and keep to the track I have planned. It's interesting how little it takes to realize that it is too much :-).

Sunday, March 30, 2014

Gartner's Magic Quadrants as a source for what technologies to study

As a Data Scientist you need to learn a lot. Even if you know a lot, you need to learn more: new things, and never stop learning, because you want to be at the top. And people at the top at least know about trends, if not follow them or create them (that's the best choice :-)).

So the idea is simple: follow Gartner's Magic Quadrants, which show all the relevant players in a specific field, and look at what is necessary to know, what is nice to have, and what is optional or could even be omitted. Also, we should not forget to see who is a new player and who has been dropped.

Update 4/22/2014: Included pictures of all Gartner's Magic Quadrants mentioned in this post.

Thursday, March 20, 2014

Reading: Free online resources March 2014

Last time I promised Hadoop books, but the thing is, there are other books out there (on the internet) making a bigger buzz:

So, next time maybe a little bit more statistics, and we will see what else I find interesting...

Tuesday, March 18, 2014

Course: Data Scientist Enablement Program (DSE)

The Data Scientist Enablement Program (DSE) is something I found by accident. At the beginning of the year I was looking for new courses or some comprehensive study on Data Science and Big Data, and I was really keen to learn something new. Then I found a form which gave me access to SONO to DSE.

This program and all 4 of its modules (so far I have done the 1st one, so I am just assuming based on my current experience) are not a classic course with video tutorials, quizzes and labs with question forms and scores. It is almost like school: you can participate or not, you can read and watch the materials you get, but nobody checks whether you have done so. The only thing you need to do is submit the assignments. And even this is not time limited, so there is no penalty for late submissions, only a recommendation to do it ASAP.

Moreover, you will get a certificate corresponding to your maturity level in this study, based on your composite score, which is computed from your social engagement, activities, projects, collaboration, etc. in all 4 modules.

Monday, February 24, 2014

Reading: Free online resources February 2014

A new package of brain food for February. Last time it was about an intro to Data Science, Statistics, some ML and Analysis. This time it is again about Statistics, ML and Analysis, and of course available for free and online:

Next time Hadoop and related stuff, but man does not live by Hadoop alone ;-).

Saturday, February 8, 2014

Study: Apply to Berkeley School of Information MIDS Program

It was a somewhat easygoing decision: when I saw the offer to apply to the Berkeley School of Information Master of Information and Data Science (MIDS) Program, I tried it. It is a completely online study. Here is my experience of how it went. Up front, I need to say that I haven't finished it and I won't. The main reason is that it is impossible for me to finance it. Before you start, please have a look at the financial part too. You can find the tuition fee described in detail in a separate article. I hadn't done that; I expected it to be expensive, but not that much (of course, it is Berkeley :-)), so I was surprised in the middle of the application.

Tuesday, February 4, 2014

Course: M101P - MongoDB for Developers

The M101P: MongoDB for Developers course, taught in Python, was a really good experience with hands-on practice and a lot of programming stuff around it. I need to thank Andrew Erlichson for the really big effort he put into this course at MongoDB University. I've learned a lot and I have started to like NoSQL databases, at least this one.

Saturday, February 1, 2014

Reading: Free online resources January 2014

This is just a simple list of sources which I have found in different places and which are available free and online:

I will post a new list when I have something new, if possible next month.

Thursday, January 23, 2014

Course: Data Science Specialization on Coursera

A surprise came to my mailbox this morning. Coursera announced a new type of study: Specializations.

Specializations are new multi-course programs. In the case of the Data Science Specialization, you take 9 separate courses on the Signature Track for $49 each and earn a certificate. You can complete the track with a capstone project, which is another $49. So, if you want to be certified for the Specialization, you need to finish 9 certified courses and do the final project. In total you will pay $490, but in 10 separate payments. A great thing is that all of these courses are under the patronage of Johns Hopkins University, which I hope ensures quality. I need to say that, based on my experience with the Computing for Data Analysis course, which is run by this university as well, it has really good quality!
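
For anyone who wants to double-check the total, here is a quick sketch in Python. The prices and counts are the ones quoted above; the variable names are just mine for illustration.

```python
# Figures as quoted in the post; names are illustrative only.
COURSE_FEE_USD = 49      # Signature Track fee per course
NUM_COURSES = 9          # courses in the Data Science Specialization
CAPSTONE_FEE_USD = 49    # fee for the capstone project

# One payment per course plus one for the capstone.
payments = [COURSE_FEE_USD] * NUM_COURSES + [CAPSTONE_FEE_USD]
total_usd = sum(payments)

print(f"{len(payments)} payments, ${total_usd} in total")
```

So it is 10 payments of $49, which gives the $490 mentioned above.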

Monday, January 20, 2014

Course: Computing for Data Analysis

When I signed up for the Computing for Data Analysis course on Coursera (since the middle of the 2nd week I have been on the Signature Track, which is more challenging), I did not expect it to be so time-consuming and so tough. It is. But this strictness doesn't allow me to just skim the surface; I need to go deep and really learn the R language. Which is actually good. I am lazy by nature, but who isn't :-).

Monday, January 13, 2014

Looking for first hands on experience

Last week I was quite busy with the MongoDB course. And even though I still have plenty of work with the final exam, which contains 10 separate tasks, I am a little bit fed up with databases altogether, no matter whether NoSQL or SQL, which I worked with really intensively in the last week before Christmas.

I turned my attention to programming languages and looked for some practical hands-on experience. I was lucky and found Data Mining: Discovering and Visualizing Patterns with Python while I was looking for new articles and resources about data science. When I found it I was so excited! It's like hitting many data science skills with one stone. I can play with Python instead of the database, work with new data and visualize it. Good, let's see how it works.