How much is one bit of information gain?

Johannes Buchner
12.06.2012
From the Wikipedia article on Bayesian experimental design, we take the definition of the expected utility of an experiment design ξ under Bayesian inference:
U(\xi) = \int p(D \mid \xi)\, U(D, \xi)\, \mathrm{d}D
And using Shannon information as a measure of information gain:
U(D, \xi) = \int \log\left(p(\theta \mid D, \xi)\right) p(\theta \mid D, \xi)\, \mathrm{d}\theta - \int \log\left(p(\theta)\right) p(\theta)\, \mathrm{d}\theta = D_{\mathrm{KL}}(\mathrm{Posterior} \,\|\, \mathrm{Prior})
In this interpretation, Bayesian inference tells us the information gain of the knowledge update (going from the prior to the posterior distribution), measured in bits (when using the logarithm base 2) or nats (natural logarithm).
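As a minimal numerical sketch (not from the original text), this information gain can be evaluated on a parameter grid; the particular Gaussian prior and posterior below are illustrative choices:

import numpy as np

# Minimal sketch: the information gain of a knowledge update is
# D_KL(posterior || prior), here evaluated on a parameter grid.
theta = np.linspace(-5, 5, 10001)
dtheta = theta[1] - theta[0]

# illustrative choice: broad Gaussian prior, narrower Gaussian posterior
prior = np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)
posterior = np.exp(-0.5 * (theta / 0.5)**2) / (0.5 * np.sqrt(2 * np.pi))

# KL divergence in nats (natural log), converted to bits (log base 2)
kl_nats = np.sum(posterior * np.log(posterior / prior)) * dtheta
kl_bits = kl_nats / np.log(2)
print("information gain: %.3f nats = %.3f bits" % (kl_nats, kl_bits))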

How much is one bit?

The prior (q) and posterior (p) are taken to be Gaussians in the parameter x, centred on zero and with fixed widths: the prior has width σ and the posterior has width kσ.
p(x) = \frac{1}{k\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2(k\sigma)^2}\right), \qquad q(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)
The KL divergence is D_{\mathrm{KL}}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \ln\frac{p(x)}{q(x)}\, \mathrm{d}x, and applied here, the divergence between posterior and prior is thus:
\mathrm{IG} = \int_{-\infty}^{\infty} \frac{1}{k\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2(k\sigma)^2}\right) \left( -\frac{x^2}{2(k\sigma)^2} - \ln k + \frac{x^2}{2\sigma^2} \right) \mathrm{d}x = \frac{k^2-1}{2} - \ln k
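This closed form is easy to check numerically. A small sketch (assuming σ = 1 and a few illustrative values of k, not taken from the text) compares it against a direct grid integration of the KL integral above:

import numpy as np

# Check the closed form IG(k) = (k^2 - 1)/2 - ln(k) (in nats) against a direct
# numerical integration of D_KL(P || Q), with P = N(0, (k sigma)^2) the posterior
# and Q = N(0, sigma^2) the prior.
def info_gain_closed_form(k):
    return (k**2 - 1) / 2 - np.log(k)

def info_gain_numerical(k, sigma=1.0):
    x = np.linspace(-20 * sigma, 20 * sigma, 200001)
    dx = x[1] - x[0]
    p = np.exp(-0.5 * (x / (k * sigma))**2) / (k * sigma * np.sqrt(2 * np.pi))
    # ln(p/q) written out analytically, so tails where p underflows contribute zero
    log_ratio = -np.log(k) - 0.5 * (x / (k * sigma))**2 + 0.5 * (x / sigma)**2
    return np.sum(p * log_ratio) * dx

for k in [0.1, 0.5, 2.0]:
    print(k, info_gain_closed_form(k), info_gain_numerical(k))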

1.1 Sigma versus bits

image: infogain.png
As the graphic above shows, the more the Gaussian shrinks (the smaller k becomes, i.e. the lower the uncertainty), the larger the information gain in bits. This provides an intuitive, sigma-based view of nats and bits for physicists.
1 nat = 1/ln(2) bits ≈ 1.44 bits.
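To put numbers on this relationship, here is a small sketch tabulating the gain in bits as the posterior width shrinks to kσ (the k values are illustrative; the exact ones plotted in infogain.png are not given in the text):

import numpy as np

# Tabulate information gain in bits as the posterior shrinks to k * sigma
# (illustrative k values; reproduces the trend shown in infogain.png).
for k in [1.0, 0.5, 0.25, 0.1, 0.01]:
    ig_nats = (k**2 - 1) / 2 - np.log(k)
    print("posterior width %5.2f sigma -> gain %6.3f bits" % (k, ig_nats / np.log(2)))

Halving the posterior width (k = 0.5), for example, gains about 0.46 bits, and shrinking it by a factor of ten (k = 0.1) gains about 2.6 bits.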