Interpretation of LOESS regression confidence intervals

advertisements

I have a data frame that looks like this:

days <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7)
values <- c(3,4,5,6,7,8,1,4,5,9,9,10,3,4,7,7,7,10,4,6,6,8,9,9,1,1,2,3,3,6,2,2,3,5,6,6,6,7,7,8,9,9)

df <- data.frame(days,values)

In order to identify a signal in my data, I decided to apply a simple loess smoothing:

loess <- loess(df$values~df$days, span=1)
pred <- predict(loess, loess$x, se=TRUE)

plot(days,values)

lines(loess$x, loess$fitted, col="red", lwd=1.5)

lines(loess$x,pred$fit - qt(0.975,pred$df)*pred$se, lty="dashed", col="blue", lwd=1)
lines(loess$x,pred$fit + qt(0.975,pred$df)*pred$se, lty="dashed", col="blue", lwd=1)

The results, including confidence intervals, looks like this:

I see that, for instance, day 3 has higher values that day 5, but is this difference significant? Should I compare the confidence intervals of each day (which, in this situation, overlap), or should I base my interpretation on the loess curve vs CIs (e.g. at day 3 the curve is outside the confidence intervals of day 5)?

You have misunderstood the confidence intervals here. For regression problems of type:

observations = signal + noise

we often produce point-wise confidence intervals for the estimated signal. Such convention includes linear models, generalized linear models, nonparametric regression, etc.

Point-wise confidence interval is easy to compute, and helps you assess how variable an estimate is on every point, but you can not make cross comparison. In your case, that point-wise confidence band for each day have overlap does not imply that there is no statistical difference between them.

If you really want to test whether there is significant difference between observations from different days, for example, test whether day3 and day5 have significant difference, you should do t-test:

day3 <- df$values[df$days == 3]
day5 <- df$values[df$days == 5]
## test whether two group have the same mean, i.e., mean difference = 0
t.test(day3, day5, mu = 0)$p.value
## 0.0177

Now, you can see that day5 and day3 are statistically different at 0.05 significance level.

Generally, you should do pairwise t-test:

pairwise.t.test(df$values, df$days, p.adjust.method = "none", pool.sd = FALSE)

#    Pairwise comparisons using t tests with non-pooled SD 

# data:  df$values and df$days 

#   1       2       3       4       5       6
# 2 0.62614 -       -       -       -       -
# 3 0.52954 1.00000 -       -       -       -
# 4 0.20951 0.69979 0.62189 -       -       -
# 5 0.02519 0.05772 0.01775 0.00307 -       -
# 6 0.19799 0.19597 0.10104 0.02372 0.24736 -
# 7 0.04247 0.41763 0.27750 0.50416 0.00044 0.00355

# P value adjustment method: none

Note the use of pairwist.t.test here:

by setting pool.sd = FALSE, no common standard error is estimated for all groups;
by setting p.adjust.method = "none", raw p-value returned by t.test is not adjusted.

In this way, you can see that the p-value at (5,3) matches what we see in our previous t-test.

Your data have tied values, so that you are sort of lucky. For real time series, you have no replicate for each day; in that case, there is no pairwise t-test for you to do.

Interpretation of LOESS regression confidence intervals

Interpretation of LOESS regression confidence intervals

Recommend

Gwt celltable invokes sorting on a column

估值40亿美元！“男版Lululemon” Vuori进军七个国际市场！

韵达3月新增超1000家网点全国网点已达上万家

快手：开展“营销养号及相关收徒教学活动”治理行动

雄安新区5周年，基础电信企业力促“未来之城”拔节生长

EF框架基础 - 農碼一生

核弹来了！NVIDIA RTX 3090 Ti首发评测：比RX 6900 XT快16%

Iframe does not work in PHP?

关于图论算法 - 你的小垃圾

64亿美元！MSC就收购Bolloré Africa Logistics达成协议！

About Joyk