Machine learning models and the market for lemons

The market for lemons is an economic concept in which buyers of a good cannot distinguish between quality products and poor products (the lemons). This lack of knowledge makes it so that people selling lemons can always underbid people with higher quality products. In the long run, all quality vendors are driven out, and only cheap lemon sellers remain.

I believe people selling predictive models (or machine learning models, or forecasting products, or artificial intelligence, to round out all the SEO terms) are highly susceptible to this dynamic. It occurs in markets in which buyers cannot easily evaluate the predictive models.

What reminded me of this is that I recently saw a vendor saying they have the “most accurate” population health predictive models. This is a patently absurd assertion (even if you hosted a Kaggle-style competition, the result would only apply to that Kaggle dataset; it would not support a more general claim about a particular institution’s population). But the majority of buyers (different healthcare systems) likely have no way to evaluate my company’s claims vs this vendor’s.

ChatGPT is another recent example. Although it can generate answers that are, on their face, “quality”, don’t use it to diagnose your illnesses. ChatGPT is very impressive at generating grammatically correct responses, so to a layman its output may appear to be high quality, but really it is very superficial in most domains (no different than using Google searches to do anything complicated, which can be useful sometimes but is very superficial).

So what is the solution? From a consumer perspective, here is my advice: ask the vendor to do a demonstration on your own data. So you ask the vendor, “here is my data for 2019, can you forecast the 2020 data?”, or something along those lines where you provide a training set and hold back a test set. Then you have the vendor generate predictions for the test set, and you do the evaluation yourself to see if their predictions are worth the cost of the product.
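Here is a minimal sketch of what that buyer-side evaluation could look like. It is only an illustration under some assumptions: the function names and the numbers are hypothetical, mean absolute error is just one reasonable metric (pick whatever matches your use case), and the naive baseline here is simply "predict last year's value again". The point is that you score the vendor yourself, on data they never saw, and compare against something cheap.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between what happened and what was predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def compare_to_baseline(actual, vendor_pred, baseline_pred):
    """Score the vendor and a naive baseline on the same held-out data."""
    vendor_mae = mean_absolute_error(actual, vendor_pred)
    baseline_mae = mean_absolute_error(actual, baseline_pred)
    # Fraction of the baseline's error that the vendor removes.
    # Undefined when the baseline is already perfect.
    if baseline_mae == 0:
        improvement = None
    else:
        improvement = 1 - vendor_mae / baseline_mae
    return {
        "vendor_mae": vendor_mae,
        "baseline_mae": baseline_mae,
        "relative_improvement": improvement,
    }


# Hypothetical numbers, made up for illustration only.
actual_2020 = [10, 12, 8, 15]      # the test set you held back
naive_forecast = [9, 14, 8, 11]    # e.g. just reuse the 2019 values
vendor_forecast = [10, 13, 9, 13]  # what the vendor sent back

result = compare_to_baseline(actual_2020, vendor_forecast, naive_forecast)
print(result)
```

If the vendor barely beats (or loses to) the naive baseline on your own data, that tells you far more about whether the product is worth the price than any "most accurate" marketing claim.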

This is a situation in which academic peer review has some value (as do data competitions). You can see that the method a particular group used was validated by its peers, but ultimately local tests on your own data will be needed. Even if my recidivism model is accurate for Georgia, it won’t necessarily generalize to your state.

If you are in a situation in which you do not have data to validate the results in the end, you need to rely on outside experts and on understanding the methodology used to generate the estimates. A good example of this is people selling aggregate crime data (that literally make numbers up). I have slated a blog post about that in the near future to go into more detail, but in short there is no legitimate seller of secondhand crime data in the US currently.

If you are interested in building or evaluating predictive models, please get in touch with my consulting services. While I say that markets for lemons can drive prices down, I still see quite a few ridiculous SaaS prices, like $900k for a black box, unevaluated early intervention system for police.

At least so far, many of these firms are using the Joel Spolsky six-figure sales approach for crappy products. My consulting firm can easily beat a six-figure price tag, so the lemons have not driven me out yet.
