Data Science Padawan: 2016

Friday, October 14, 2016

There are no shortcuts

During the last three months I have realized how difficult it is to make progress when you lack the knowledge and try to push ahead without proper basics and a good background. Because you don't have time to focus on them, you jump straight into wild water and swim. What a mistake.

I was dealing with many things simultaneously. I needed to start on the second publication for my PhD study, but I was still working on the second experiment, which had to be included in that publication. And of course I was taking two difficult and time-consuming courses at the same time, on Coursera and elsewhere. All of this came together with a heavy workload at work and a push from my company to learn German. Too much to process, too much work to do.

The problem is, nobody can control the dreams you have.
There is a reason why Magical Realism was born in Colombia.
It's a country where dreams and reality are conflated.
Where in their heads, people fly as high as Icarus.
But even Magical Realism has its limits
and when you get too close to the sun...
your dreams may melt away.

- Narcos

I haven't flown as high as Icarus yet. I haven't fallen and let my dreams melt away yet. But I have been close. My PhD was running away from me. I haven't finished the courses.

So, there are no shortcuts. What I needed was to organize all of those tasks and finish them one by one, with proper priority and timing. This helped me out of the bad mantra: I am too busy for basics, but I cannot manage the complex things because I lack the basics.

And I returned to basics (learning a particular technique, or setting up, installing and exploring a new technology) at least once a week. Instead of working on my PhD or watching a lot of videos from MOOC courses, I focus on small parts with hands-on experience.

Sunday, July 3, 2016

Heart rate and sentiment experiment design

Photo GrejGuide.dk @ Flickr

After I finished the first experiment and published the article at the HEALTHINF 2016 conference in Rome this year, I started thinking about the design of the next experiment.

As I mentioned in the improvements section of the previous paper, I tried to learn from previous mistakes and improve a lot. First, the step count increases during the day, so the readings are not really independent. It would be better to use heart rate, because it does not accumulate over the day and is therefore more independent. Second, I can improve how I record sentiment, both in timing and in evaluation. And last but not least is sentiment extraction: instead of the supervised learning used in the previous work, I would like to use unsupervised classification.

So, let's get to details.

Experiment Design

We are still looking for a relation between soft data (sentiment) and hard data (the measurand). In the first experiment it was text recorded via Twitter and footsteps. This time it is again text recorded via Twitter, but instead of footsteps it is heart rate, which is more independent.

What are the main objectives:

One month experiment (30 days)
20 tweets per day (600 tweets minimum)
Continuous heart rate measurement (24/7)
Effective time for heart rate measurement between 7:00 and 23:00, i.e. 16 hours a day, with the rest used for charging the wristband
No sleep activity monitoring
Steps monitoring? Perhaps.
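
The numbers in the objectives above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic, not code from the experiment; the constant and function names (EXPERIMENT_DAYS, minimum_tweets, effective_hours, in_measurement_window) are made up for illustration, and treating the window as 7:00 inclusive to 23:00 exclusive is an assumption.

```python
from datetime import time

# Design parameters taken from the objectives list; the names are hypothetical.
EXPERIMENT_DAYS = 30
TWEETS_PER_DAY = 20
MEASURE_START = time(7, 0)   # 7:00
MEASURE_END = time(23, 0)    # 23:00


def minimum_tweets(days=EXPERIMENT_DAYS, per_day=TWEETS_PER_DAY):
    """Total number of tweets the design asks for."""
    return days * per_day


def effective_hours(start=MEASURE_START, end=MEASURE_END):
    """Length of the daily heart rate measurement window in hours."""
    start_minutes = start.hour * 60 + start.minute
    end_minutes = end.hour * 60 + end.minute
    return (end_minutes - start_minutes) / 60


def in_measurement_window(moment, start=MEASURE_START, end=MEASURE_END):
    """True if a time of day falls in the window (start inclusive, end exclusive)."""
    return start <= moment < end


print(minimum_tweets())                     # 600
print(effective_hours())                    # 16.0
print(in_measurement_window(time(6, 30)))   # False
```

A helper like in_measurement_window could later be used to drop heart rate readings taken while the wristband is charging.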


Wednesday, June 15, 2016

Result of fitness band experiment

As I progress with my PhD study, I haven't been able to write any article here because I have been busy. And it is not only the PhD that has kept me busy the whole time since January this year; I also have to take care of work and family.

Nevertheless, I should report the results of the fitness band experiment, which I wrote about in Introduction fitness band experiment one and a half years ago and never followed up on.

What was the experiment about

The goal was to find a relation between sentiment (represented by text recorded on Twitter, just for practical reasons) and human activity represented by footsteps. In practice, it was about finding a link between soft data (sentiment) and hard, measured data.

Reading data: GitHub implementations

I will publish the data and the code for its processing and analysis another time, because I don't have them on GitHub yet.

Twitter data reading implementation

What I have is an implementation of reading tweets from the Twitter API through tweepy. It is a refactored version of the original. The reason for the refactoring is that I am working on the second version of the fitness band experiment.
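
To give an idea of what reading tweets through tweepy can look like, here is a minimal sketch. It is not the implementation mentioned above. It assumes the classic user timeline endpoint and OAuth 1 credentials stored in environment variables; the function names (to_records, fetch_tweets) and the variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical.

```python
import os


def to_records(statuses):
    """Reduce tweet objects to (created_at, text) pairs for later sentiment analysis."""
    records = []
    for status in statuses:
        # With tweet_mode="extended" the text is in full_text, otherwise in text.
        text = getattr(status, "full_text", None) or getattr(status, "text", "")
        records.append((status.created_at, text))
    return records


def fetch_tweets(screen_name, count=20):
    """Fetch recent tweets of one account; needs tweepy and valid credentials."""
    import tweepy  # imported here so that to_records() works without tweepy installed

    # The environment variable names are an assumption made for this sketch.
    auth = tweepy.OAuthHandler(
        os.environ["TWITTER_CONSUMER_KEY"],
        os.environ["TWITTER_CONSUMER_SECRET"],
    )
    auth.set_access_token(
        os.environ["TWITTER_ACCESS_TOKEN"],
        os.environ["TWITTER_ACCESS_SECRET"],
    )
    api = tweepy.API(auth)
    statuses = api.user_timeline(
        screen_name=screen_name, count=count, tweet_mode="extended"
    )
    return to_records(statuses)
```

Keeping the conversion in to_records separate from the network call means the part that prepares data for sentiment analysis can be tried out without API access.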

Jawbone data reading implementation

What I also have is an implementation of reading data from the Jawbone API through many different libraries. It is a bit of a mess, which I need to clean up later when I use Jawbone again.

Story telling

Long story short

All the data were extracted and processed, and a hypothesis was defined, which was not rejected. Unfortunately, rejection of the null hypothesis in favor of the alternative was expected, and it did not happen. That's the result and that's the long story short. The whole article, presented at the HEALTHINF 2016 conference in February this year (2016), is available here:

You can also follow the whole story in the following chapter, if you are not interested in the paper itself right now.


Sunday, January 3, 2016

Every Data Scientist must only pay taxes and die, the rest is just optional...

Picture from cacm.acm.org

I am just wondering. There are many articles about what every Data Scientist MUST know and do to be a real Data Scientist. There are many articles about what you MUST not do as a Data Scientist, what a real Data Scientist MUST read, and so on and so on. And honestly, I don't care. Why is that so?

Research

Let's take the last must: the "What a Data Scientist must read" list. I briefly took several results from Google; here is the list of 10 of them:

And what is this quick research good for? From these circa 80 books and a list of several articles you get a list of subjectively chosen resources which leads practically nowhere. Just a few books keep repeating, like Nate Silver's famous The Signal and the Noise and of course some R or other cookbooks. So, what is the conclusion?

Conclusion

These lists of books which someone else has read always lead me to Vincent Granville's article Fake data science. And what do you need to take from it? Pick any book you need for your field of expertise. And what should your field of expertise be? Choose some project; it doesn't matter whether it is a personal one, one from school or, for instance, one from Kaggle.com. Follow the approaches you need to achieve the goal, then pick a book or a course to support your path towards this goal. And through real work and life experience you will sooner or later become a Data Scientist.