Sunday, September 14, 2008

Its been a while, but I am still not bad in probability

Few days back, one of my friend who like to solve Data structure problem asked me an interesting probability problem. I thought, ah probability, I am no more good in it. Its around 7 yrs I dint even touch its basics. And I am not in profession to make money by betting in stock market based on the probability of any stock to rise. 

Today is Sunday. I went to park for a walk with school friends and had a fantastic breakfast. I was feeling fresh. I had nothing much to do. Suddenly the thought to try out the problem ran into my mind, and as I was free and in good mood so I thought to give it a try. 

Here Is the problem:

Assume your computer is reading characters one by one from a stream (you don't know the length of the stream before ending). Note that you have only one character of storage space (so you can't save the characters you've read to a something like a strong). When you've finished reading you should return a character out of the stream with equal probability.

Everybody has different approaches but I always go to the route by taking example.

Question ask to select a character from stream in such a way that probability of selecting any element in the stream is same. It is conditioned by the one character extra space that we can use. 

Lets consider that the stream has 5 elements:
a, b, c, d, e

As the stream has 5 elements then everyelemet should have 1/5 probability to get selected.

First solution:
Read a and keep it in buffer. Read next char(b) . Get a random number from random generator, which returns 0 or 1 with equal probability. we have random generators built in in all the languages. If zero (0) comes, don't swap b with a but if 1 comes, swap b with a. Do the same for rest of the characters C, D and E.

Lets see the probability:
  • A: it would be the final choice if random generator returns zero(0) for all other characters in the stream.  p(a) = 1(a) * 1/2(b) * 1/2(c) * 1/2 (d) * (1/2)e = 1/16 
  • B: B final selection depends only on character c, d, E. Not on A. B would be the final choice if random generator returns (1) for b and (0) for characters c, d, e.                                 p(b)= 1/2(b) * 1/2(c) * 1/2 (d) * (1/2)e = 1/16
  • p(c) = 1/2(c) * 1/2 (d) * (1/2)e = 1/8
  • p(d) = 1/2 (d) * (1/2)e = 1/4
  • p(e) = (1/2)e = 1/2
You see, as we go down in the stream, probability increases. And the reason is there are less characters to replace it. E has the highest probability as there is no more character to replace.

This is definitely not required. What we have noticed,
For a, there are 4 more character to replace it if it get selected as final choice.
For b, there are 3 more character to replace it if it get selected as final choice.
..
..
For e, there are no more character to replace it if it get selected as final choice.

Lets modify it. If we our self make selection of a character difficult or less as we go down the stream, we can get the equal probability.

Final solution:
As we go down the stream, decrease the probability of swaping a character.
We can use random function like random (1, n) where the character will be swapped only if it random(1, n) returns n. So its probability will be 1/n.

Now, the probability for each alphabet to get selected for swapping will be
p(swap A) = 1/1
p(swap B) = 1/2, p(do not swap B) = 1 - p(swap B) = 1/2
p(swap C) = 1/3,  p(do not swap C) = 1 - p(swap C) = 2/3
p(swap D) = 1/4,  p(do not swap D) = 1 - p(swap D) = 3/4
p(swap E) = 1/5,  p(do not swap E) = 1 - p(swap E) = 4/5

We are almost done. Now lets calculate the probability of each character to come as random number as final answer from stream.

A: = A is selected as swap. And no other character is selected for swapping.
     = p(swap A) * p(do not swap B) * p(do not swap C) * p(do not swap D) * p(do not swap E)
     = 1/1 * 1/2 * 2/3 * 3/4 * 4/5 = 1/5

B: = p(swap B) * p(do not swap C) * p(do not swap D) * p(do not swap E)
     = 1/2 * 2/3 * 3/4 * 4/5 = 1/5

C: = p(swap C) * p(do not swap D) * p(do not swap E)
     = 1/3 * 3/4 * 4/5 = 1/5

D: = p(swap D) * p(do not swap E)
     = 1/4 * 4/5 = 1/5

E: = p(swap E) 
     = 1/5.

All the characters has equal probablity of getting selected. 

2 comments:

Anonymous said...

very clear explanation!

Anonymous said...

good work..nicely explained :)