Sunday, March 19, 2017

Continuous and discrete time series visualized together

The second publication for GigaScience is nearly finished, and the deadline for the third one is approaching quickly, so I needed to start on the data analysis. Because I had only a few ideas about how to analyze the experiment data, I asked a colleague from the university for advice and sent her the data. That was a few months ago.

Recently she sent back her ideas and notes, along with a recommendation for a book on time series analysis. As usual, it is good to start with several visualizations and to move from simple things to complex ones. So I want to share some notes about it.

Data

The two parts of the experiment produced two pairs of datasets:
  • Experiment #1 with Fitbit Charge HR
    • 1,029 tweets, an average of 20.56 tweets per day
    • 411,799 heart rate (HR) records at a frequency of 6-7 records per minute
  • Experiment #2 with Basis Peak
    • 1,017 tweets, an average of 20.32 tweets per day
    • 69,909 HR records at a frequency of 1 record per minute
Each tweet is simply a record of sentiment, labeled by the experiment participant with the hashtag #p for positive or #n for negative sentiment. As part of the publication, the sentiment will also be extracted with machine learning methods and compared with the human evaluation.
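As a rough illustration of the hashtag convention, the labels could be read from the tweet text like this. It is only a sketch: the function name `sentiment_from_tweet`, the example tweets, and the choice to return `None` for missing or contradictory tags are my assumptions here, not part of the experiment code.

```python
import re

# +1 / -1 encoding of the participant's own hashtags (#p positive, #n negative).
HASHTAG_LABELS = {"#p": 1, "#n": -1}


def sentiment_from_tweet(text):
    """Return +1 or -1 from the #p / #n hashtag, or None if the label is unclear."""
    tags = {tag.lower() for tag in re.findall(r"#\w+", text)}
    found = {HASHTAG_LABELS[tag] for tag in tags if tag in HASHTAG_LABELS}
    # Exactly one distinct label is required; no tag or both tags gives None.
    return found.pop() if len(found) == 1 else None


# Hypothetical example tweets, not taken from the dataset.
for tweet in ["Great run this morning #p", "Stuck in traffic again #n", "No tag here"]:
    print(tweet, "->", sentiment_from_tweet(tweet))
```

Matching whole hashtags (rather than searching for the substring) keeps a tag such as #positive from being counted as #p.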

Visualization

I took the data from experiment #1, since it has more HR records, and did the following data wrangling:

  • Each tweet, or rather its evaluated sentiment, needs to be extended so that it represents a whole time window instead of a single point in time. For this, we need to define a "breaking point" between two consecutive sentiments; it is simply the midpoint in time between them. In the following figure this is the gray line with dots on it, where positive sentiment is drawn at 150 bpm and negative sentiment at 0 bpm on the HR axis (this is only a scale adjustment; otherwise sentiment is represented by +1 and -1). The dots represent the true records, and the line is an extrapolation to the time window.
  • Heart rate is drawn as another gray line, which shows rapid changes over time. I applied a simple moving average (SMA) to it to get a much smoother line for further analysis. I then cut this SMA line by the extrapolated sentiment windows: the red parts represent negative sentiment and the blue parts positive sentiment.
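The two wrangling steps above can be sketched in plain Python. This is a minimal illustration under assumptions, not the code that will accompany the publication: the function names, the trailing form of the SMA, the window length, and the toy timestamps are all hypothetical.

```python
from bisect import bisect_right


def midpoints(tweet_times):
    """Breaking points: the midpoint in time between each pair of consecutive tweets."""
    return [(a + b) / 2 for a, b in zip(tweet_times, tweet_times[1:])]


def simple_moving_average(values, window):
    """Trailing SMA; the first window-1 outputs average only the samples seen so far."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def label_heart_rate(hr_times, hr_values, tweet_times, sentiments, window=5):
    """Return (time, smoothed HR, sentiment) for every HR sample.

    tweet_times must be sorted; sentiments[i] belongs to tweet_times[i].
    """
    breaks = midpoints(tweet_times)
    smoothed = simple_moving_average(hr_values, window)
    # bisect_right counts the breaking points at or before t, which is the
    # index of the tweet whose window contains t.
    return [
        (t, hr, sentiments[bisect_right(breaks, t)])
        for t, hr in zip(hr_times, smoothed)
    ]


# Toy data in seconds; assumed values, not the experiment data.
tweet_times = [0, 100, 200]
sentiments = [1, -1, 1]
hr_times = [10, 60, 140, 160]
hr_values = [70, 80, 90, 100]

for t, hr, s in label_heart_rate(hr_times, hr_values, tweet_times, sentiments, window=2):
    color = "blue" if s == 1 else "red"  # blue = positive, red = negative
    print(t, hr, color)
```

With the samples labeled this way, consecutive points that share a sentiment can be drawn as one colored segment of the SMA line.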
Here is the resulting figure:
It looks really promising, but that is all so far. I have a few ideas about how to continue, but since this is part of my third publication, I will publish that first and then link to the publication itself and to the complete code on GitHub. The point of sharing this was just the idea of how to start visualizing continuous and discrete time series together.
