蓄水池抽样
This problem can be solved easily. You don't have to use reservoir sampling at all. You just need to:
calculate the length of the list
randomly draw a number from
1 ~ length
(let's sayi
), then return theith
element, and it's done.
But if you want to know more about reservoir sampling, take a looking at the following part.
Problem Desciption
Choose
k
entries fromn
numbers. Make sure each number is selected with the probability ofk/n
This problem is the special case where
k=1
Basic idea
Choose
1, 2, 3, ..., k
first and put them into the reservoir.For the next
k+1
element, pick it with a probability ofk/(k+1)
, and randomly replace a number in the reservoir.Same as above, for the following
k+i
element, pick it with a probability ofk/(k+i)
, and randomly replace a number in the reservoir.Repeat until
k+i
reachesn
Proof
For
k+i
, the probability that it is selected and will replace a number in the reservoir isk/(k+i)
(As we defined above)For a number in the reservoir before (let's say
X
), the probability that it keeps staying in the reservoir isP(X was in the reservoir last time)
×P(X is not replaced by k+i)
=
P(X was in the reservoir last time)
× (1
-P(k+i is selected and replaces X)
)=
k/(k+i-1)
× (1
-k/(k+i)
×1/k
)=
k/(k+i)
When
k+i
reachesn
, the probability of each number staying in the reservoir isk/n
Examples
Example1: Choose 3
numbers from [111, 222, 333, 444]
. Each number should be selected with a probability of 3/4
Choose
[111, 222, 333]
as the initial reserviorThen choose
444
with a probability of3/4
For
111
, it stays with a probability ofP(444 is not selected)
+P(444 is selected but it replaces 222 or 333)
=
1/4
+3/4
*2/3
=
3/4
The same case with
222
and333
Now all the numbers have the probability of
3/4
to be picked
Example2: Choose 1
number from [111, 222, 333]
. Each number should be selected with a probability of 1/3
Choose
[111]
as the initial reserviorNow we iterate to
222
, choose it with the probility of1/2
if it is chosen (
1/2
probility), we replace111
with222
. In this case,222
is picked with the probility of1/2
if it is not chosen (another
1/2
probility),111
stays. In this case,111
stays with the probility of1/2
Combine the above 2 cases, both
111
and222
have the probility of1/2
Now we iterate to
333
, choose it with the probility of1/3
if it is chosen (
1/3
probility), we replace the result of last iteration to333
. In this case,333
is picked with the probility of1/3
if it is not chosen (
2/3
probility), the result of last iteration stays, and in last iteration, both111
and222
have the probility of1/2
. In this round, their probability become2/3 * 1/2 = 1/3
.Now all the nums
111, 222, 333
have the probility of1/3
to be picked
Sample Code
Related Problems
Last updated