I need to check whether the day of the week or the month of the year has an effect on stock returns. I decided to use an unobserved components model with the "rucm" package in R because it can extract seasonal characteristics from a time series. In my case, I want to determine whether there is daily and monthly seasonality. My dataset is just a time series of daily stock returns:
structure(list(Date = structure(c(1388966400, 1389139200, 1389225600, 1389312000, 1389571200, 1389657600, 1389744000, 1389830400, 1389916800, 1390176000, 1390262400, 1390348800, 1390435200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), LogReturn = c(-0.009, 0.016, 0.021, 0.036, 0.049, 0.092, 0.023, -0.05, 0.044, -0.018, 0.001, -0.021, -0.022)), .Names = c("Date", "LogReturn"), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
The code I used:
install.packages("rucm")
library(rucm)
model1 <- ucm(formula = LogReturn ~ 0, data = data, level = TRUE, slope = FALSE, season = TRUE, season.length = 30)
I chose season.length = 30 arbitrarily. I thought that if I have daily data and the seasonality is daily, season.length should be 1, but it doesn't accept 1.
And my output looks like this
Estimated variance: "Irregular_Variance" "Level_Variance" "Season_Variance"
As you can see, I didn't get much information on how the day of the week or the month of the year affects stock returns. Could you please help me with this problem?
Update 1. I added some features to my dataset: it now records the day of the week for each date, plus a dummy (proxy) variable for each day of the week.
structure(list(Date = structure(c(1388966400, 1389139200, 1389225600, 1389312000, 1389571200, 1389657600, 1389744000, 1389830400, 1389916800, 1390176000, 1390262400, 1390348800, 1390435200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), LogReturn = c(-0.009, 0.016, 0.021, 0.036, 0.049, 0.092, 0.023, -0.05, 0.044, -0.018, 0.001, -0.021, -0.022), Dayoftheweek = c("Monday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday"), proxymonday = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0), proxytuesday = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), proxywednesday = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), proxythursday = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), proxyfriday = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0)), .Names = c("Date", "LogReturn", "Dayoftheweek", "proxymonday", "proxytuesday", "proxywednesday", "proxythursday", "proxyfriday"), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
1 Answer
I won't cover rucm here because I am not familiar enough with unobserved components models. But it seems the question you posed can be answered in another manner.
It seems to me there are two separate questions you are interested in. The first takes the day of the week as the independent variable, and the second takes the month of the year as the independent variable. You may also want to look at the interaction of day of the week * month of the year. So let's break this up:
First, let's look at the day of the week. Start with the mean return for each day. The null hypothesis is that there are no day-to-day differences; the alternative is that at least one day differs significantly from the others. To test this, we use anova with LogReturn as a function of dayofweek (in this example, dat is the name of your dataset and dayofweek is the day-of-week column, called Dayoftheweek in your update):
dat$dayofweek <- factor(dat$Dayoftheweek)  # weekday as a factor
datsum <- anova(lm(LogReturn ~ dayofweek, data = dat))
This yields:
> datsum
Analysis of Variance Table

Response: LogReturn
          Df    Sum Sq   Mean Sq F value Pr(>F)
dayofweek  4 0.0066421 0.0016605   1.266 0.3587
Residuals  8 0.0104932 0.0013116

Based on the data you provided, there is no significant day effect (p = 0.3587). But, as you said, this is only a small portion of the data.
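For reference, the table above can be reproduced from the 13 rows posted in the question. This is a minimal sketch in base R; the data frame is rebuilt by hand from the posted values rather than read from your file, and the names dat, dayofweek and day_means are just labels chosen for this example.

```r
# Rebuild the 13-row sample from the question (values copied from the dput output).
dat <- data.frame(
  LogReturn = c(-0.009, 0.016, 0.021, 0.036, 0.049, 0.092, 0.023,
                -0.05, 0.044, -0.018, 0.001, -0.021, -0.022),
  dayofweek = factor(c("Monday", "Wednesday", "Thursday", "Friday", "Monday",
                       "Tuesday", "Wednesday", "Thursday", "Friday", "Monday",
                       "Tuesday", "Wednesday", "Thursday"))
)

# Mean log return per weekday (the group means the ANOVA compares).
day_means <- tapply(dat$LogReturn, dat$dayofweek, mean)
print(round(day_means, 4))

# One-way ANOVA: H0 = all weekday means are equal.
fit <- lm(LogReturn ~ dayofweek, data = dat)
datsum <- anova(fit)
print(datsum)
```

With only two or three observations per weekday the test has very little power, so the same call is worth rerunning on the full series before drawing any conclusion.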
Now, you can do the same by month, although since months don't have a uniform number of days, you will have to adjust for that. Once your data has a Month column, the call has the same form: anova(lm(LogReturn ~ Month, data = dat)).
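The sample in the question covers only January, so a month effect cannot be estimated from it. The sketch below therefore uses simulated returns (pure noise, not real data) purely to show the mechanics: deriving weekday and month factors from a date column, then fitting the month model and the weekday-by-month interaction mentioned earlier. The names sim, Month and dayofweek are illustrative assumptions.

```r
set.seed(1)  # simulated data only; replace `sim` with your full dataset

# One year of weekdays (Mon-Fri); "%u" gives the ISO weekday number, 1 = Monday.
dates <- seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "day")
dates <- dates[as.integer(format(dates, "%u")) <= 5]

sim <- data.frame(
  Date      = dates,
  LogReturn = rnorm(length(dates), mean = 0, sd = 0.02)  # noise, no true effect
)

# Derive the grouping factors from the date.
sim$dayofweek <- factor(format(sim$Date, "%u"),
                        labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
sim$Month     <- factor(format(sim$Date, "%m"))

# Month effect alone; lm() accepts unequal group sizes directly.
month_tab <- anova(lm(LogReturn ~ Month, data = sim))
print(month_tab)

# Weekday, month, and their interaction. anova() reports sequential (Type I)
# sums of squares, so with unbalanced data the order of the terms matters.
inter_tab <- anova(lm(LogReturn ~ dayofweek * Month, data = sim))
print(inter_tab)
```

Because the simulated returns contain no real effect, any small p-value here would be a chance result; the point is only the shape of the code.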
You can also look for seasonality with ARIMA modeling. Here is a tutorial on time series.
Another option is to use a double-seasonal time series model.
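As a rough illustration of the ARIMA route, here is a sketch that uses base R's arima() on simulated data with a small planted weekday pattern. Everything in it is an assumption made for the example: the data are not real returns, a five-observation season stands in for a trading week, and the AR(1) orders are arbitrary. The linked guides may use other packages or model choices.

```r
set.seed(42)  # simulated data only, not real returns

# 60 trading weeks of noise plus a small planted weekday pattern (Mon..Fri).
weekday_effect <- c(0.01, 0, 0, 0, -0.01)
n_weeks <- 60
x <- rep(weekday_effect, times = n_weeks) + rnorm(5 * n_weeks, sd = 0.01)

# frequency = 5 tells R that one seasonal cycle is five observations.
y <- ts(x, frequency = 5)

# Non-seasonal AR(1) versus AR(1) with a seasonal AR(1) term at lag 5.
fit_plain    <- arima(y, order = c(1, 0, 0), method = "ML")
fit_seasonal <- arima(y, order = c(1, 0, 0),
                      seasonal = list(order = c(1, 0, 0), period = 5),
                      method = "ML")

# A clearly lower AIC for the seasonal fit hints at a weekly pattern.
print(c(plain = AIC(fit_plain), seasonal = AIC(fit_seasonal)))
print(coef(fit_seasonal))
```

On real data, holidays break the strict five-day cycle, so the series would need to be aligned to a regular weekday grid before a fixed seasonal period makes sense.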
Since you have not provided more data, it is hard for me to demonstrate it here. But both of the linked guides will help you structure your code and your data to perform seasonal analysis. I provided the anova as a starting guide and a launch point for your thinking about this.