Data Science Padawan: Result of fitness band experiment

Since January this year I haven't been able to write any articles. My PhD studies have kept me busy, and on top of that I have to take care of work and family.

Nevertheless, I should present the results of the fitness band experiment, which I introduced about a year and a half ago (Introduction fitness band experiment) and never followed up on.

What the experiment was about

The goal was to find a relation between sentiment (represented by text posted on Twitter, for practical reasons) and human activity represented by footsteps. In practice, it was about finding a link between soft data (sentiment) and hard, measured data.

Reading data: GitHub implementations

I will publish the data and the code for processing and analyzing it another time, because it is not on GitHub yet.

Twitter data reading implementation

What I do have is an implementation of reading tweets from the Twitter API through tweepy. It is a refactored version of the original; I refactored it because I am working on a second version of the fitness band experiment.

Jawbone data reading implementation

I also have an implementation of reading data from the Jawbone API through several different libraries. It is a bit of a mess, which I will need to clean up later when I use Jawbone again.

Storytelling

Long story short

All the data was extracted and processed, and a hypothesis was defined and tested. Unfortunately, I expected the null hypothesis to be rejected in favor of the alternative, and that did not happen. That is the result, and that is the long story short. The whole paper, presented at the HEALTHINF 2016 conference in February this year (2016), is available here:

If you are not interested in the paper itself right now, you can follow the whole story in the next section.

Complete story

While working on the paper, I made several crucial decisions about how to proceed. But first things first. The main goal was to provide evidence that movement (taking steps) is related to good (positive) mood rather than bad (negative) mood, where mood is represented by tweets (text). That is the basis for the later hypothesis definition and testing.

Searching for a relation between sentiment (represented by tweets) and steps was possible with two options for evaluating/extracting sentiment. First, sentiment was evaluated manually by people (several colleagues from the university and friends). Second, I also wanted to apply supervised machine learning for automatic sentiment extraction.

Choosing the right machine learning method for sentiment extraction is important for later applications, where human evaluation is not possible because of the large amount of data. However, I realized that my own corpus was too small to train a model properly and then test it on a significant amount of data.
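A minimal sketch of the hold-out idea behind this decision: a labelled corpus is shuffled and split so that one part trains the model and the rest tests it. The helper name `holdout_split`, the 20 % test fraction and the fixed seed are illustrative assumptions, not details of the original experiment.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the labelled items and split it into (train, test).

    holdout_split is a hypothetical helper; test_fraction and seed are
    illustrative defaults, not values from the experiment.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# With only 50 labelled tweets, just 10 would remain for testing.
train, test = holdout_split(range(50))
print(len(train), len(test))  # 40 10
```

With a small corpus the test part is too small to give reliable metrics, which is why larger external corpora are attractive for training.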

So I found two external corpora with already evaluated sentiment (Movie Review Data and Annotated Twitter Sentiment Dataset) and used them to train my sentiment extraction model with five different methods:

Decision Trees (DT)
Random Forest (RF)
Naive Bayes (NB)
Maximum Entropy (ME)
Support Vector Machines (SVM)
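The post does not include the classifier code. To illustrate one of the five methods, here is a minimal multinomial Naive Bayes sentiment classifier in plain Python with add-one (Laplace) smoothing. The training texts are hypothetical and the whitespace tokenizer is a deliberate simplification; in practice, libraries such as NLTK or scikit-learn provide ready-made implementations of these methods.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                smoothed = (self.word_counts[label][token] + 1) / (self.total_words[label] + v)
                score += math.log(smoothed)  # log likelihood of the token
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayesSentiment().fit(
    ["great run today feeling awesome", "love this sunny walk",
     "terrible day so tired", "hate this rain awful"],
    ["positive", "positive", "negative", "negative"],
)
print(model.predict("awesome sunny walk"))  # positive
```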

Standard metrics (precision, recall, accuracy, F-measure) were computed for each of these machine learning methods, and for the two most successful ones (RF and SVM) I built cross tables (confusion matrices) to see in detail how each method extracted sentiment. Based on the cross tables, RF was the better of the two.
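To make the evaluation step concrete, here is a sketch of how the four standard metrics and a cross table (confusion matrix) can be computed from actual and predicted labels. It assumes two classes with "positive" as the positive class; the example labels are made up.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Cross table: rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def metrics(actual, predicted, positive="positive"):
    """Precision, recall, accuracy and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": correct / len(pairs), "f_measure": f_measure}


actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
print(confusion_matrix(actual, predicted, ["positive", "negative"]))  # [[2, 1], [1, 1]]
print(metrics(actual, predicted))
```

The confusion matrix shows where the errors fall (positive read as negative or vice versa), which a single accuracy number hides.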

The next few steps narrow down to the main goal. I defined the null hypothesis "Movement does not affect mood" and wanted to reject it in favor of the alternative hypothesis "Movement results in positive mood".
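One possible way to write these hypotheses formally, assuming (this is not stated in the post) that the test compares the mean daily step counts $\mu_{\text{pos}}$ and $\mu_{\text{neg}}$ on days whose tweets carry positive and negative sentiment:

```latex
\begin{aligned}
H_0 &: \mu_{\text{pos}} = \mu_{\text{neg}} \quad \text{(movement does not affect mood)} \\
H_1 &: \mu_{\text{pos}} > \mu_{\text{neg}} \quad \text{(movement results in positive mood)}
\end{aligned}
```

The one-sided alternative reflects the expected direction of the effect.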

The exploratory data analysis looked promising and suggested that the null hypothesis could be rejected. In it, I compared the human-evaluated sentiment with the number of steps aggregated per sentiment value over time, and likewise the extracted sentiment (from the machine learning method) with the number of steps aggregated per sentiment value over time.
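The aggregation described above can be sketched as joining each tweet's sentiment with the step count of the same day and averaging the steps per sentiment value. The data layout (dates as ISO strings, sentiment as -1/0/1, a dict of daily steps), the function name and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def steps_by_sentiment(tweets, daily_steps):
    """Average daily steps per sentiment value.

    tweets: list of (date, sentiment) pairs, sentiment in {-1, 0, 1}
    daily_steps: dict mapping date -> total steps recorded by the band
    """
    groups = defaultdict(list)
    for day, sentiment in tweets:
        if day in daily_steps:  # skip days without band data
            groups[sentiment].append(daily_steps[day])
    return {s: mean(steps) for s, steps in sorted(groups.items())}


tweets = [("2015-03-01", 1), ("2015-03-02", -1), ("2015-03-03", 1),
          ("2015-03-04", 0), ("2015-03-05", -1)]
daily_steps = {"2015-03-01": 9000, "2015-03-02": 4000,
               "2015-03-03": 11000, "2015-03-04": 7000}
print(steps_by_sentiment(tweets, daily_steps))
```

A design choice worth noting: a day with several tweets contributes its step count once per tweet; aggregating sentiment per day first would be an alternative.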

That conclusion was false. Statistical inference with a t-test fully refuted it, showing that for the data I collected during the experiment the null hypothesis cannot be rejected.
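A sketch of the test statistic, assuming Welch's two-sample t-test (the post does not say which variant of the t-test was used). It returns the t statistic and the Welch-Satterthwaite degrees of freedom; the p-value is then read from Student's t distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name and the step counts are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: step counts on positive-mood and negative-mood days (hypothetical).
    """
    v_a = variance(a) / len(a)  # squared standard error of mean(a)
    v_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (len(a) - 1) + v_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([9000, 11000, 10000], [4000, 6000, 5000])
print(round(t, 3), round(df, 3))
```

Only a sufficiently large t (equivalently, a small p-value) would justify rejecting the null hypothesis; small, noisy samples make that much harder.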

The conclusion of the paper discusses possible causes of not rejecting the null hypothesis, such as bias, the size of the corpora, and the results of the hypothesis testing itself. It also summarizes a possible improvement: in the next experiment, use heart rate instead of steps, because it is a much more independent measure for such purposes.

And that's it. I spent about half a year writing it up as a paper, which was then presented at the conference. I am currently running a second experiment, which I will write about in future blog posts.


Wednesday, 15 June 2016
