Coupon Collector's Problem

Aug 23, 2018 • Tao He| Probability Theory

The coupon collector’s problem describes the following question: there are n different types of coupons and the collector will receive a random coupon every time, what is the expected trials that the collector can collect all n types of coupons? We can conclude that the expected trails grows as O(nlog⁡n).

The Distribution of Trails for a new Coupon

Let T be the time to collect all n types of coupons, and ti be the time to collect a new coupon after i−1 coupons have been collected. We can see that the probability of ti satisfies geometric distribution with pi=n−(i−1)n, then we conclude

E(ti)=1pi=nn−(i−1)

We also obviously know that ti and tj are independent. Thus

E(T)=∑E(ti)=1p1+1p2+⋯+1pn=nn+nn−1+⋯+n1=n⋅(1n+nn−1+⋯+11)≃nlog⁡n+γn+12+O(1n)

The variance of the random variable T can be calculated as (since ti are independent)

Var(T)=∑Var(ti)=1−p1p12+1−p2p22+⋯+1−pnpn2<=n2n2+n2(n−1)2+⋯+n212<=n2⋅(112+122+⋯+1n2)=n2⋅π26

A classical puzzle that can be modeled as Coupon Collector’s problem is that, for a cubic dice, we are expected to get all six points after how many trails? The answer for the question should be

∑166i≃14.7≃15

Generalization

The Coupon Collector’s Problem has been generalized as the expected trails of collect m copy of each n coupons. When m=2, the problem is also called The Double Dixie Cup Problem1. The expectation of Tm satisfies

E(Tm)=nlog⁡n+(m−1)nlog⁡log⁡n+O(n),asn→∞

In the more general case, when pi is nonuniform, Philippe Flajolet gives2

E(T)=∫0∞(1−∏i=1n1−e−pi⋅t)dt

If the collector receives d coupons each run3, in the asymptotic case the m copy of of each n coupons will be collected expected within time

E(Tm)=nlog⁡n/d+(m−1)(n/d)log⁡log⁡n+O(n⋅m)

The Coupon Collector’s Problem can also be generalized as stochastic process4 5 6. The paper from MAT7 also gives exhaustive analysis of this problem. The expectation can also be expressed using the Stirling numbers8 9.

Coupon Collector's Problem

Coupon Collector's Problem

The Distribution of Trails for a new Coupon

Generalization

References

Recommend

Use vector in ThreadScope to Reduce the Memory Usage

GSoC 2019 Round 1st Evaluation

实时通信 | pusher 如何使用私有频道（四） - Tinywan

End-to-end Argument in System Design

[OtterCTF 2018] - ReCurse Writeup

[OtterCTF 2018] - Birdman’s Data Writeup

长三角首个国产技术算力中心在南京江北新区上线！

星际旅行时把人类送入冬眠仓可行吗

Exploiting the xmlrpc.php on all WordPress versions

Wordpress Sliced Invoices <= 3.8.2 Authenticated SQL Injection

About Joyk