Wednesday, May 4, 2016

How to extrapolate beyond the x points passed to `ksmooth`?


I have a kernel smoother fitted like so:

    x <- 1:100
    y <- rnorm(100, mean=(x/2000)^2)
    plot(x,y)
    kernel <- ksmooth(x,y, kernel="normal", bandwidth=10)
    print(kernel$y)

If I try to predict at a point outside the range of the x values, it gives me NA, because it is being asked to extrapolate beyond the data:

    x <- 1:100
    y <- rnorm(100, mean=(x/2000)^2)
    plot(x,y)
    kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, x.points=c(130))
    print(kernel$y)
    # [1] NA

Even when I change range.x it doesn't budge:

    x <- 1:100
    y <- rnorm(100, mean=(x/2000)^2)
    plot(x,y)
    kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200), x.points=c(130))
    print(kernel$y)
    # [1] NA

How do I get the ksmooth function to extrapolate beyond the data? I know this is a bad idea in theory, but in practice this issue comes up all the time.

1 Answer

To answer your side question: the source of ksmooth shows that range.x is only used when x.points is not supplied, which explains why changing it had no effect. Here is the code of ksmooth:

    function (x, y, kernel = c("box", "normal"), bandwidth = 0.5,
        range.x = range(x), n.points = max(100L, length(x)), x.points)
    {
        if (missing(y) || is.null(y))
            stop("numeric y must be supplied.\nFor density estimation use density()")
        kernel <- match.arg(kernel)
        krn <- switch(kernel, box = 1L, normal = 2L)
        x.points <- if (missing(x.points))
            seq.int(range.x[1L], range.x[2L], length.out = n.points)
        else {
            n.points <- length(x.points)
            sort(x.points)
        }
        ord <- order(x)
        .Call(C_ksmooth, x[ord], y[ord], x.points, krn, bandwidth)
    }
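As a quick check of that reading, the sketch below calls ksmooth twice with the same x.points, once with and once without range.x, and compares the results. The seed, the evaluation points in `pts`, and the names `fit_default` and `fit_ranged` are chosen here for illustration and are not part of the original question.

```r
set.seed(1)  # arbitrary seed, only to make the run repeatable
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)

# One point inside the data range and one well outside it (illustrative choice).
pts <- c(90, 130)

fit_default <- ksmooth(x, y, kernel = "normal", bandwidth = 10,
                       x.points = pts)
fit_ranged  <- ksmooth(x, y, kernel = "normal", bandwidth = 10,
                       range.x = c(1, 200), x.points = pts)

# If range.x is ignored whenever x.points is supplied, both fits should agree.
print(identical(fit_default$y, fit_ranged$y))

# Which of the requested points came back as NA?
print(is.na(fit_default$y))
```

If the two fits are identical, range.x played no part in the call, which matches the branch on `missing(x.points)` in the source above.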

From this we see that x.points must be omitted for range.x to be used. If you run:

    x <- 1:100
    y <- rnorm(100, mean=(x/2000)^2)
    plot(x,y)
    kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200))
    plot(kernel$x, kernel$y)

Now you'll see that the smoother is evaluated beyond 100, although not all the way up to 200: further out, no data points fall within reach of the kernel, so the estimate is still NA. Increasing the bandwidth parameter lets you get even further away from 100.
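To see how far the estimate actually reaches, the sketch below reports the largest grid point with a non-NA value for two bandwidths. It then shows one possible workaround, under stated assumptions: a hand-written Nadaraya-Watson estimator with an untruncated Gaussian kernel. The helpers `reach` and `nw_smooth` are hypothetical names introduced here, not part of the stats package. The scaling of the standard deviation follows the ksmooth documentation (quartiles at ± 0.25 * bandwidth). That ksmooth truncates the normal kernel at a few standard deviations is my reading of its C source, so treat it as an assumption.

```r
set.seed(1)  # arbitrary seed, only to make the run repeatable
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)

# Hypothetical helper: largest grid point at which ksmooth() is not NA.
reach <- function(bandwidth) {
  fit <- ksmooth(x, y, kernel = "normal", bandwidth = bandwidth,
                 range.x = c(1, 200))
  max(fit$x[!is.na(fit$y)])
}

reach_10 <- reach(10)
reach_40 <- reach(40)
print(c(bandwidth_10 = reach_10, bandwidth_40 = reach_40))

# Hypothetical helper: Nadaraya-Watson estimate with a Gaussian kernel that is
# NOT truncated, so every data point gets a (possibly tiny) positive weight.
nw_smooth <- function(x, y, x_new, bandwidth) {
  # Same scaling as documented for ksmooth: quartiles at +/- 0.25 * bandwidth.
  kernel_sd <- 0.25 * bandwidth / qnorm(0.75)
  sapply(x_new, function(x0) {
    w <- dnorm(x - x0, sd = kernel_sd)
    sum(w * y) / sum(w)
  })
}

est_130 <- nw_smooth(x, y, x_new = 130, bandwidth = 10)
print(est_130)
```

Far from the data the estimate is dominated by the last few observations, so it is essentially a flat continuation rather than a trend. Far enough away, all the weights underflow to zero and the result becomes NaN, so this only postpones the problem.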
