
ChatGPT4 writes Stan code so I don’t have to.

source link: https://statmodeling.stat.columbia.edu/2023/04/18/chatgpt4-writes-stan-code-so-i-dont-have-to/

Several months ago I (Phil Price) wrote a Stan model to do some time series forecasting. It took me almost a full day to get it running and debugged. Today I decided to test ChatGPT4 by seeing whether it could write a Stan model with the same functionality. I have been using ChatGPT to help me with coding, but in the past I’ve used it for little stuff, such as “I have a data frame with column names that all start with mm, and some of the names have ( and ) in them; write a function to remove all of that.” I have just started using it to write entire programs for me, and it looks like it is going to be enormously useful. Below, I present a verbatim transcript of the start of a ChatGPT4 session. At the end I got tired of cutting and pasting — and you would have gotten tired of reading it — but I tell you what happened. And at the very end, I’ll summarize my thoughts on the ChatGPT4 programming experience so far.

Phil: I have monthly time series data. I want to fit a model to the data (typically from the past five to eight years) and use the model to forecast future values. Let’s call the data y[t]. Here are some things I want to include:
1. Seasonality. If the value in March is high in one year it is probably also high in the following year, and so on for other months. So I want ‘month effects’.
2. Autocorrelation in month effects. The month effect, for a given month of the year, is not iid random. If the March effect is high for several years and then drops a lot, it is more likely to stay low than to jump back up.
3. Local trend. If the value has increased for the past several months it will probably keep increasing, although it might not.
4. Soft constraints. The value of y[t] is very unlikely to fall below a value y_low, or to go above y_high. I don’t want to impose the constraints after the fact. Instead, I want to repel the forecasts from these limits. I want a fairly short-range repulsion, so the forecasts aren’t affected unless they get close to the limits.

Please write a Stan program to do this, and include comments that show where each of those four requirements is implemented.

ChatGPT4: Here’s a Stan program that incorporates seasonality, autocorrelation in month effects, local trend, and soft constraints:

data {
  int<lower=1> T;          // number of time steps
  int<lower=1> T_future;    // number of future time steps for forecasting
  int<lower=1> M;           // number of months in a year
  real y[T];                // observed time series data
  real y_low;               // lower soft constraint
  real y_high;              // upper soft constraint
}

parameters {
  real mu;                  // overall mean
  vector[M] month_effect;   // month effect for each month
  real<lower=0> sigma_month;// standard deviation of month effect changes
  real<lower=0> sigma_trend;// standard deviation of trend changes
  real<lower=0> sigma_y;    // observation noise standard deviation
  real phi;                 // autoregressive trend parameter
  real trend[T];            // local trend
}

transformed parameters {
  real y_hat[T];            // fitted values
  
  for (t in 1:T) {
    int month = (t - 1) % M + 1;
    y_hat[t] = mu + month_effect[month] + trend[t];
  }
}

model {
  // 1. Seasonality: Month effects
  month_effect ~ normal(0, 1);

  // 2. Autocorrelation in month effects
  for (m in 2:M) {
    month_effect[m] ~ normal(month_effect[m-1], sigma_month);
  }

  // 3. Local trend
  trend[1] ~ normal(0, sigma_trend);
  for (t in 2:T) {
    trend[t] ~ normal(phi * trend[t-1], sigma_trend);
  }

  // 4. Soft constraints
  for (t in 1:T) {
    target += normal_lpdf(y[t] | y_hat[t], sigma_y) +
              normal_lccdf(y_low | y_hat[t], sigma_y) +
              normal_lccdf(y_high | y_hat[t], sigma_y);
  }
}

generated quantities {
  real y_future[T_future];  // forecasts of future values
  
  for (t in 1:T_future) {
    int month = (T + t - 1) % M + 1;
    real trend_future = phi * trend[T];
    y_future[t] = normal_rng(mu + month_effect[month] + trend_future, sigma_y);
  }
}

Let me explain where each of the four requirements is implemented:

  1. Seasonality: The month_effect vector stores the effect for each month (lines 15 and 29). The fitted value y_hat[t] is the sum of the overall mean mu, the month effect for the corresponding month, and the local trend at time t (line 34).
  2. Autocorrelation in month effects: The autocorrelation in month effects is modeled by specifying that each month effect (except the first one) is drawn from a normal distribution with the mean equal to the previous month effect (line 39).
  3. Local trend: The local trend is modeled by the trend vector. The trend at time t is drawn from a normal distribution with the mean equal to the phi * trend[t-1] (line 43). The parameter phi controls the autoregressive trend.
  4. Soft constraints: Soft constraints are incorporated in the model block by modifying the log probability density function (line 48). The normal_l

Phil: Your answer got truncated. Please explain how the soft constraints work.

ChatGPT4: I apologize for the truncation. I will explain how the soft constraints work in the Stan program:

In the model block, soft constraints are incorporated by modifying the log probability density function (line 48). Instead of using the usual likelihood y[t] ~ normal(y_hat[t], sigma_y), we add two additional terms:

  • normal_lccdf(y_low | y_hat[t], sigma_y): This term calculates the log of the cumulative distribution function (CDF) of a normal distribution with mean y_hat[t] and standard deviation sigma_y at the lower constraint y_low. The CDF gives the probability of a value being less than or equal to y_low. Since we want to penalize the forecasts close to or below the lower limit, we use the log complementary CDF (lccdf), which gives the log probability of a value being greater than y_low. This term increases when y_hat[t] is closer to y_low, effectively repelling the forecasts from the lower limit.
  • normal_lccdf(y_high | y_hat[t], sigma_y): This term works similarly to the previous one but for the upper constraint y_high. It calculates the log probability of a value being greater than y_high. This term increases when y_hat[t] is closer to y_high, repelling the forecasts from the upper limit.

By adding these two terms to the likelihood, the model will adjust its parameter estimates to avoid forecasting values too close to the soft constraint limits. The strength of the repulsion depends on the standard deviation sigma_y; smaller values of sigma_y will result in stronger repulsion.
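These lccdf terms are easy to check numerically outside of Stan. Here is a small Python sketch (standard library only; Stan’s normal_lccdf(x | mu, sigma) is log P(X > x), computable via erfc; the values of y_low and sigma_y are made up for illustration) showing how the lower-limit term behaves as the fitted value approaches the limit:

```python
import math

def normal_lccdf(x, mu, sigma):
    # log P(X > x) for X ~ Normal(mu, sigma), i.e. Stan's normal_lccdf(x | mu, sigma)
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0))))

y_low, sigma_y = 0.0, 0.1  # illustrative values, not from the article

# Lower-limit penalty term for fitted values at varying distance above y_low
for y_hat in [0.5, 0.2, 0.05, 0.0]:
    print(f"y_hat = {y_hat:5.2f}  ->  lccdf term = {normal_lccdf(y_low, y_hat, sigma_y):8.4f}")
```

With sigma_y = 0.1 the term is roughly -3e-7 at y_hat = 0.5 but about -0.69 (log 0.5) at y_hat = y_low, so the log density is only pulled down near the limit: a short-range repulsion of the kind Phil asked for.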

Phil: Will that model work for just about any input data, or should I standardize y[t]?

ChatGPT4: The model provided can work for a wide range of input data, but it is generally a good practice to standardize your input data, especially when using probabilistic models like those in Stan. Standardizing your data can lead to better convergence and more efficient sampling.

To standardize your input data y[t], you can calculate the mean and standard deviation of the input data, then subtract the mean and divide by the standard deviation for each data point. In the Stan program, you can add the following to the transformed data block:

transformed data {
  real y_mean = mean(y);
  real y_sd = sd(y);
  real y_standardized[T];

  for (t in 1:T) {
    y_standardized[t] = (y[t] - y_mean) / y_sd;
  }

  real y_low_standardized = (y_low - y_mean) / y_sd;
  real y_high_standardized = (y_high - y_mean) / y_sd;
}

Then, replace y with y_standardized and y_low and y_high with their standardized versions in the model block. After fitting the model, you can convert the forecasts back to the original scale in the generated quantities block:

generated quantities {
  real y_future[T_future];  // forecasts of future values

  for (t in 1:T_future) {
    int month = (T + t - 1) % M + 1;
    real trend_future = phi * trend[T];
    real y_future_standardized = normal_rng(mu + month_effect[month] + trend_future, sigma_y);
    y_future[t] = y_future_standardized * y_sd + y_mean;
  }
}

By standardizing your input data, you make the model more robust and ensure better convergence and efficiency during the sampling process.

Phil: I prefer to standardize my data outside the Stan model. Please provide Python code to standardize the data and to call the Stan model.

ChatGPT4: Certainly! Here is a Python script that standardizes your data outside the Stan model, calls the Stan model, and rescales the forecasts back to the original scale. This script uses the CmdStanPy library to interact with Stan, but you can use other libraries like PyStan if you prefer.

First, make sure you have the CmdStanPy package installed: pip install cmdstanpy

Now, let’s proceed with the Python script:

import numpy as np
import pandas as pd
from cmdstanpy import CmdStanModel

# Load your data (replace this with the appropriate code to load your time series)
y = np.array([...])  # Replace with your actual data

# Standardize the data
y_mean = np.mean(y)
y_sd = np.std(y)
y_standardized = (y - y_mean) / y_sd

# Set the other required data inputs
T = len(y)
T_future = 12  # Number of future time steps for forecasting
M = 12  # Number of months in a year
y_low = 10  # Lower soft constraint
y_high = 50  # Upper soft constraint

# Standardize y_low and y_high
y_low_standardized = (y_low - y_mean) / y_sd
y_high_standardized = (y_high - y_mean) / y_sd

# Set up data dictionary
data_dict = {
    'T': T,
    'T_future': T_future,
    'M': M,
    'y': y_standardized,
    'y_low': y_low_standardized,
    'y_high': y_high_standardized
}

# Compile the Stan model (use the provided Stan code in a file named "time_series_model.stan")
stan_model = CmdStanModel(stan_file="time_series_model.stan")

# Fit the Stan model
fit = stan_model.sample(data=data_dict)

# Extract the forecasts
y_future_standardized = fit.stan_variable("y_future")

# Rescale the forecasts back to the original scale
y_future = y_future_standardized * y_sd + y_mean

# Calculate the mean and percentiles of the forecasts
y_future_mean = np.mean(y_future, axis=0)
y_future_lower = np.percentile(y_future, 2.5, axis=0)
y_future_upper = np.percentile(y_future, 97.5, axis=0)

# Print the results
print("Mean forecasts:", y_future_mean)
print("Lower bound (2.5 percentile):", y_future_lower)
print("Upper bound (97.5 percentile):", y_future_upper)

Phil: [After running the model and generating forecasts] I thought the constraints would keep all or almost all of the forecasts to between y_low and y_high but that’s not the case at all, they go much higher and much lower. They go much lower than 0.5 times the minimum value in the data, and much higher than 2 times the max. Am I doing something wrong with the definition of y_low and y_high, or is there a mistake in the code?

ChatGPT4: I apologize for the confusion. The issue is due to the fact that y_low and y_high are defined based on the original data (y_kwh) but not adjusted to the standardized scale. When you use y_low and y_high in the Stan model, they should be defined relative to the standardized data.

To fix the issue, you need to standardize y_low and y_high by applying the same transformation as you did for y. [ChatGPT4 provided the code, not shown here]

Phil: That’s pretty much what I’m doing, but I still have the problem. Here’s the actual code I’m using: [code not shown, it’s the same as above except instead of y_low=10 and y_high=50 I set these to be 0.5 times the minimum value in the data, and 2 times the maximum.]

ChatGPT4: I apologize for the confusion earlier. I reviewed the Stan code again and realized that the soft constraints are not strong enough to keep the forecasts within the desired limits. The cauchy_lpdf in the Stan model does not repel the forecasts as strongly as you intended. To address this issue, we can use a different distribution for the soft constraints, such as a truncated normal distribution.

Please update the Stan code for the soft constraints as follows:

Replace the lines:

for (t in 1:(T + T_future))
    target += cauchy_lpdf(y_hat[t] | y_low, 0.01) + cauchy_lpdf(y_hat[t] | y_high, 0.01);

with:

real y_low_sd = 0.01;
real y_high_sd = 0.01;
for (t in 1:(T + T_future)) {
  target += normal_lcdf(y_low | y_hat[t], y_low_sd) + normal_lcdf(y_high | y_hat[t], y_high_sd);
}

This change replaces the Cauchy distribution with a truncated normal distribution for the soft constraints. The normal_lcdf function is the cumulative distribution function (CDF) of the normal distribution, which can be used to create a stronger repulsion effect.

After making this change, the forecasts should be more constrained within the desired limits defined by y_low and y_high. Please update your Stan model with this new code and re-run the analysis. If you still experience issues, please let me know, and we can explore other ways to strengthen the soft constraints.
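It is worth checking numerically what this normal_lcdf lower-limit term actually does. A standard-library Python sketch (log Φ computed via erfc; y_low and sd are made-up illustrative values):

```python
import math

def normal_lcdf(x, mu, sigma):
    # log P(X <= x) for X ~ Normal(mu, sigma), i.e. Stan's normal_lcdf(x | mu, sigma)
    return math.log(0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0))))

y_low, sd = 0.0, 0.1  # illustrative values, not from the article

# The lower-limit term for fitted values at varying heights above y_low
for y_hat in [0.05, 0.2, 0.5]:
    print(f"y_hat = {y_hat:4.2f}  ->  lcdf term = {normal_lcdf(y_low, y_hat, sd):8.3f}")
```

The term gets more negative the farther y_hat sits above y_low (roughly -1.2 at y_hat = 0.05 but about -15 at y_hat = 0.5 with these values), so it pulls fitted values toward the limit rather than repelling them from it. This is consistent with the “rather fundamental problem” with the truncated-normal approach that Phil mentions at the end.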

Phil: Unfortunately I am still getting an error: RuntimeError: Error during sampling: Exception: array[uni, ...] index: accessing element out of range. index 62 out of range; expecting index to be between 1 and 61 (in 'time_series_model.stan', line 48, column 4 to column 98)

ChatGPT4: I apologize for the oversight. The error occurs because the transformed parameter y_hat is defined only for the observed time series (of length T), but in the soft constraints loop, we are accessing elements up to T + T_future. To fix this issue, we should extend the y_hat array to include the future time steps as well. Here’s the corrected code:

transformed parameters {
  real y_hat[T + T_future];  // fitted values including future time steps
  
  for (t in 1:T) {
    int month = (t - 1) % M + 1;
    y_hat[t] = mu + month_effect[month] + trend[t];
  }
}

With this modification, y_hat is now an array with T + T_future elements, allowing the soft constraints loop to access future time steps without going out of range. Please update your Stan code with this change and re-run the analysis. If you still encounter any issues, please let me know, and we can explore further adjustments.

=====
At this point I will stop with the verbatim dialog. There were several problems with the code: loops that ran over the wrong range; variables calculated in the transformed parameters block that should have been in the generated quantities block; an error I introduced while cutting and pasting code from ChatGPT4; and a rather fundamental problem with the truncated-normal approach to the penalties (I ended up switching to a sort of exponential repulsion). It took about one additional hour to get the code to run and to confirm to my satisfaction that it’s doing what I want. All told, this took about 2.5 hours. And all I did was end up with a model very similar to one I wrote by hand several months ago. But that one took me something like six hours! So 2.5 hours is a huge speedup. And I am still learning how to get ChatGPT to write code for me. As I get better I expect to get faster.
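Phil doesn’t show his final penalty, but an “exponential repulsion” of the sort he describes could look something like the following Python sketch (the function name, the limits, and the length scale tau are hypothetical, not his actual code): a term that is essentially zero in the interior and blows up as a fitted value approaches either limit.

```python
import math

def exp_repulsion(y_hat, y_low, y_high, tau=0.05):
    # Hypothetical soft-constraint term, added to the log density like
    # Stan's `target +=`: ~0 away from the limits, sharply negative near them.
    return -math.exp(-(y_hat - y_low) / tau) - math.exp((y_hat - y_high) / tau)

y_low, y_high = 0.0, 1.0  # illustrative limits
for y_hat in [0.5, 0.1, 0.02, -0.05]:
    print(f"y_hat = {y_hat:5.2f}  ->  penalty = {exp_repulsion(y_hat, y_low, y_high):10.4f}")
```

Unlike the lcdf-based version, this penalty is monotone in the distance to each limit, so it only acts near the boundary and never rewards moving toward a limit.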

Working with ChatGPT4 on this task was like working with a good intern programmer who is rather forgetful and a bit scattered. At one point it gave me a line like target -= normal_lcdf(…). I said “target -= looks weird, is that really acceptable syntax?” and it said Yes! (It isn’t.) It also sometimes acts as though it can’t hold more than one idea in its head at a time. Considering it actually holds _zero_ ideas in its ‘head’ at any time, it’s very impressive that it can do this at all. Still, it’s frustrating to seemingly mutually decide to modify a chunk of code in a specific way, only to have it revert to the old way two minutes later. That is quite a common occurrence.

ChatGPT4 can be frustrating in its forgetfulness, its occasional syntax errors, etc. And on both this programming project and another one in which I’ve put ChatGPT4 to serious use rather than idle testing, I have had to be very attentive because it sometimes produces code that runs but does not do what I asked: the model is not the one I specified, or a pre-processing step is wrong, or whatever. I have not yet been able to simply tell it what I want and get the answer out of it directly.

I thought I had learned some lessons on my other project and that with this one here I would indeed be able to get good code right away. For instance, it seems to help if you say “please implement features X, Y, and Z, and identify where in the code these are implemented”, so I tried that above…and it still wasn’t enough.

But: This has already changed the way I code. I am not sure I am ever going to write code completely by hand again! And I expect these tools to get better fast.

