
Universal Prediction Algorithms

Source: http://bactra.org/notebooks/universal-prediction.html

22 Jul 2022 10:04


Given: a single time series, perhaps a very long one, from a stochastic process which is basically unknown; perhaps all we know is that it is stationary and ergodic.

Desired: a forecast which will converge on the best possible forecast, as the series becomes longer and longer. Or: the best possible forecast from within a fixed class of forecasting algorithms.

A solution is called a universal prediction algorithm because it applies equally to all the processes within the class, and is not tailored to any one of them.

This has connections to information theory (via universal compression algorithms), to the problem of finding Markovian representations and inference for Markov models, and to many other topics. Presumably they could be used for bootstrapping time series.
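To make the idea concrete, here is a minimal sketch (in Python) of one classical construction for binary sequences: a Bayesian mixture of Krichevsky-Trofimov estimators across Markov orders, whose forecast tracks whichever order predicts best. The class name, the default maximum order, and the implementation details are mine for illustration, not taken from any of the references below.

```python
import math
from collections import defaultdict

def kt_predict(counts):
    """Krichevsky-Trofimov estimate of P(next symbol = 1)
    from [number of 0s, number of 1s] seen in this context."""
    zeros, ones = counts
    return (ones + 0.5) / (zeros + ones + 1.0)

class MarkovMixture:
    """Bayesian mixture over Markov orders 0..max_order, each order
    using a KT estimator per context.  Each order's (unnormalized)
    weight is its sequential likelihood of the data so far, so the
    mixture forecast tracks whichever order predicts best."""

    def __init__(self, max_order=3):
        self.orders = range(max_order + 1)
        # per order: context tuple -> [count of 0s, count of 1s]
        self.counts = {k: defaultdict(lambda: [0, 0]) for k in self.orders}
        self.log_weights = {k: 0.0 for k in self.orders}  # uniform prior
        self.history = []

    def _context(self, k):
        # last k symbols (the empty context for the order-0 model)
        return tuple(self.history[-k:]) if k else ()

    def predict(self):
        """Mixture probability that the next symbol is 1."""
        m = max(self.log_weights.values())  # subtract max for stability
        total = sum(math.exp(w - m) for w in self.log_weights.values())
        return sum(
            math.exp(self.log_weights[k] - m) / total
            * kt_predict(self.counts[k][self._context(k)])
            for k in self.orders)

    def update(self, symbol):
        """Observe the next symbol (0 or 1); update counts and weights."""
        for k in self.orders:
            ctx = self._context(k)
            p1 = kt_predict(self.counts[k][ctx])
            self.log_weights[k] += math.log(p1 if symbol else 1.0 - p1)
            self.counts[k][ctx][symbol] += 1
        self.history.append(symbol)
```

On an alternating 0, 1, 0, 1, ... sequence, for instance, the order-1 component quickly dominates the mixture and the forecast approaches the deterministic one, while on i.i.d. coin flips the order-0 component wins.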

See also: Ergodic Theory; Learning in Games; Learning Theory; Low-Regret Learning; Machine Learning, Statistical Inference and Induction; Sequential Decisions Under Uncertainty; Time series

  • Recommended:
  • Alekh Agarwal and John C. Duchi, "The Generalization Ability of Online Algorithms for Dependent Data", arxiv:1110.2529 [If you have a learning algorithm which achieves low regret against some class of alternatives, and the data source is suitably strongly-mixing, your algorithm will generalize almost as well as the best rival.]
  • Paul H. Algoet
  • Pierre Alquier and Olivier Wintenberger, "Model selection and randomization for weakly dependent time series forecasting", Bernoulli 18 (2012): 883--913, arxiv:0902.2924
  • Nicolo Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games [Mini-review]
  • Imre Csiszar and Zsolt Talata, "On Rate of Convergence of Statistical Estimation of Stationary Ergodic Processes", IEEE Transactions on Information Theory 56 (2010): 3637--3641
  • Shane Legg, "Is There an Elegant Universal Theory of Prediction?", cs.AI/0606070 [A nice set of diagonalization arguments against the hope of a universal prediction scheme which has the nice features of Solomonoff-style induction, but is actually computable.]
  • Donald Ornstein and Benjamin Weiss, "How Sampling Reveals a Process", Annals of Probability 18 (1990): 905--930 [Open access. A truly beautiful and inspiring paper. — The negative results here would seem to depend on their very strong notion of what it means to reconstruct a process. For instance, while in their sense it is not possible to always discriminate between two processes (unless they are Bernoulli), Ryabko and Ryabko (arxiv:0804.0510) give a consistent test to do just this for ergodic processes, not necessarily Bernoulli, by employing a weaker notion of inter-process distance.]
  • Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
  • Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari, "Online Learning: Random Averages, Combinatorial Parameters, and Learnability", arxiv:1006.1138
  • Vladimir Vovk, Alex Gammerman and Glenn Shafer, Algorithmic Learning in a Random World [Mini-review]
  • To read:
  • Gérard Biau, Kevin Bleakley, László Györfi, György Ottucsák, "Nonparametric sequential prediction of time series", arxiv:0801.0327
  • Pierre Gaillard, Paul Baudin, "A consistent deterministic regression tree for non-parametric prediction of time series", arxiv:1405.1533
  • L. Gyorfi, G. Morvai, S. Yakowitz, "Limits to consistent on-line forecasting for ergodic time series", IEEE Transactions on Information Theory 44 (1998): 886--892 = arxiv:0712.2430
  • Marcus Hutter
    • "Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences," cs.LG/0106036
    • "Convergence and Loss Bounds for Bayesian Sequence Prediction," cs.LG/0301014
    • "General Loss Bounds for Universal Sequence Prediction," cs.AI/0101019
    • "Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet", cs.LG/0311014
  • D. Jones, M. Kohler and H. Walk, "Weakly Universally Consistent Forecasting of Stationary and Ergodic Time Series", IEEE Transactions on Information Theory 58 (2012): 1191--1202
  • Yuri Kalnishkan, Vladimir Vovk and Michael V. Vyugin, "How many strings are easy to predict?", Information and Computation 201 (2005): 55--71 ["It is well known in the theory of Kolmogorov complexity that most strings cannot be compressed; more precisely, only exponentially few (O(2^{n-m})) binary strings of length n can be compressed by m bits. This paper extends the 'incompressibility' property of Kolmogorov complexity to the 'unpredictability' property of predictive complexity. The 'unpredictability' property states that predictive complexity (defined as the loss suffered by a universal prediction algorithm working infinitely long) of most strings is close to a trivial upper bound (the loss suffered by a trivial minimax constant prediction strategy). We show that only exponentially few strings can be successfully predicted and find the base of the exponent."]
  • Gusztav Morvai, "Guessing the output of a stationary binary time series", arxiv:0710.3760
  • Gusztav Morvai, Sanjeev R. Kulkarni, Andrew B. Nobel, "Regression estimation from an individual stable sequence", Statistics 33 (1999): 99--118 = arxiv:0710.2496
  • Gusztáv Morvai and Benjamin Weiss
  • G. Morvai, S. Yakowitz, L. Gyorfi, "Nonparametric inference for ergodic, stationary time series", Annals of Statistics 24 (1996): 370--379, arxiv:0711.0367
  • Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni, "Density estimation from an individual numerical sequence", IEEE Transactions on Information Theory 44 (1998): 537--541, arxiv:0710.2500
  • Boris Ryabko, "Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes", Problems of Information Transmission 43 (2003): 367--379
  • Boris Ryabko and Jaakko Astola
    • "Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding", cs.IT/0504079
    • "Universal Codes as a Basis for Time Series Testing", cs.IT/0602084
    • "Universal Codes as a Basis for Nonparametric Testing of Serial Independence for Time Series", cs.IT/0506094
  • Daniil Ryabko
  • Daniil Ryabko and Marcus Hutter, "On Sequence Prediction for Arbitrary Measures", cs.LG/0606077
  • Alessio Sancetta, "Universality of Bayesian Predictions", Bayesian Analysis 7 (2012): 1--36
  • Nathan Srebro, Karthik Sridharan, Ambuj Tewari, "On the Universality of Online Mirror Descent", arxiv:1107.4080
  • Hayato Takahashi, "Computational limits to nonparametric estimation for ergodic processes", IEEE Transactions on Information Theory 57 (2011): 6995--6999, arxiv:1002.1559
  • T. Weissman, "How to Filter an `Individual Sequence with Feedback'", IEEE Transactions on Information Theory 54 (2008): 3831--3841
  • S. Yakowitz, L. Gyorfi, J. Kieffer, G. Morvai, "Strongly consistent nonparametric forecasting and regression for stationary ergodic sequences", J. Multivariate Analysis 71 (1999): 24--41 = arxiv:0712.2592
  • Jacob Ziv, "A Universal Prediction Lemma and Applications to Universal Data Compression and Prediction", IEEE Transactions on Information Theory 47 (2001): 1528--1532
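Several of the annotated entries above (Cesa-Bianchi and Lugosi's book, the Agarwal-Duchi paper) concern the low-regret route to the second goal stated at the top: compete with the best forecaster in a fixed class, rather than with the true conditional distribution. A minimal sketch of the standard tool, the exponentially weighted average forecaster; the class name and default learning rate here are illustrative, not drawn from any particular reference.

```python
import math

class ExponentialWeights:
    """Exponentially weighted average forecaster over a finite set of
    experts.  With learning rate eta and a convex loss taking values
    in [0, 1], its cumulative loss exceeds the best single expert's by
    at most log(N)/eta + eta*T/8 over T rounds: sublinear regret."""

    def __init__(self, n_experts, eta=0.5):
        self.weights = [1.0] * n_experts
        self.eta = eta

    def predict(self, expert_preds):
        """Weighted average of the experts' current predictions."""
        total = sum(self.weights)
        return sum(w * p for w, p in zip(self.weights, expert_preds)) / total

    def update(self, expert_preds, outcome):
        """Downweight each expert exponentially in its loss this round
        (squared loss, bounded in [0, 1] for values in [0, 1])."""
        for i, p in enumerate(expert_preds):
            self.weights[i] *= math.exp(-self.eta * (p - outcome) ** 2)
```

Against a constant outcome, for example, the weight of any persistently wrong expert decays geometrically and the forecast converges on the best expert's; the universal-prediction results above then ask when "best in the class" is also close to "best possible" for a stationary ergodic source.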
