The selection of a prior is, in my view, the weakest part of Bayesian inference, so we will sidestep the debate on the correct choice. Rather, let us view the situation as an opportunity, a license to explore the consequences of different priors on the ``true'' maps which emerge. This is easily done by simulation: take a plausible map, Fourier transform it, sample with a transfer function so that some information is now missing, then use your favourite prior and maximise the ``entropy'' to get a candidate for the true map. It is this kind of study which was responsible for the great initial interest in MEM. Briefly, what MEM seemed to do in simple cases was to eliminate the sidelobes and even resolve pairs of peaks which overlapped in the true map, i.e.\ it was sometimes ``better'' than the original! This last feature is called superresolution, and we will not discuss it, in the same spirit of modesty that prompted us to use a CLEAN beam. Unlike CLEAN, MEM did not seem to have a serious problem with extended structure, unless it had a sharp edge (like the image of a planet). In this last case, it was found that MEM actually enhanced the ripples near the edge which were sitting at high brightness levels, though it controlled the ripples which were close to zero intensity. This is perhaps not surprising if one looks at the graph of the function $-I\ln I$. There is much more to be gained by removing ripples near $I=0$ than at higher values of $I$, since the derivative of the function, $-(1+\ln I)$, diverges as $I\to 0$.
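The behaviour of the entropy integrand $-I\ln I$ near zero intensity is easy to verify numerically. A minimal sketch (the two sample intensities are arbitrary choices for illustration):

```python
import numpy as np

# Entropy integrand -I ln(I) and its derivative -(1 + ln I):
# the slope diverges as I -> 0+, so flattening a ripple sitting
# near zero intensity gains far more entropy than flattening one
# riding on a bright plateau.
def entropy_term(I):
    return -I * np.log(I)

def entropy_slope(I):
    return -(1.0 + np.log(I))

print(abs(entropy_slope(1e-3)))  # about 5.91
print(abs(entropy_slope(0.5)))   # about 0.31
```

This lopsided penalty is exactly why MEM suppresses the near-zero ripples while tolerating those at high brightness levels.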
Fortunately, these empirical studies of the MEM can be backed up by an
analytical/graphical argument due to Ramesh Narayan, which is outlined
below. The full consequences of this viewpoint were developed in a review
article (Narayan \& Nityananda, Annual Review of Astronomy and
Astrophysics, 24, 127, 1986), so they will not be elaborated here, but the basic
reasoning is simple and short enough. Take the expression for the entropy,
\[
H = -\sum_k I_k \ln I_k ,
\]
and differentiate it with respect to the free parameters at our disposal,
namely the unmeasured visibilities, and set the result to zero for
maximisation. Writing the map as the Fourier transform of the visibilities,
$I_k = \sum_m V_m\, e^{2\pi i km/N}$, the derivative of the entropy taken with
respect to a visibility $V_m$ is
\[
\frac{\partial H}{\partial V_m} = -\sum_k \left(1 + \ln I_k\right) e^{2\pi i km/N} .
\]
The understanding is that these $V_m$ have not been measured, so they are
free to vary. The condition for a maximum is
\[
\sum_k \left(1 + \ln I_k\right) e^{2\pi i km/N} = 0 \quad \mbox{for every unmeasured $m$,}
\]
i.e., $\ln I$ has no Fourier components at the unmeasured spatial
frequencies. The MEM map is therefore the exponential of a band-limited
function, one containing only the measured spatial frequencies.
More properties of the MEM solution are given in the references cited earlier. But one can immediately see that taking the exponential of a function with only a limited range of spatial frequencies (those present in the dirty beam) is going to generate all spatial frequencies, i.e., one is extrapolating and interpolating in the $u$-$v$ plane. It is also clear that the fitting is a nonlinear operation because of the exponential. Adding two data sets and obtaining the MEM solution will not give the same answer as finding the MEM solution for each separately and adding later! A little thought shows that this is equally true of CLEAN.
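Both points are easy to check numerically. A minimal one-dimensional sketch (the grid size, frequency cutoff, and random band-limited function below are arbitrary choices, not from the text): exponentiating a band-limited function generates power at all spatial frequencies, and such a map is indeed a stationary point of the entropy with respect to the unmeasured visibilities.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
cutoff = 8  # "measured" spatial frequencies: indices 0..cutoff

# A real band-limited function b: only low-frequency Fourier components.
B = np.zeros(N // 2 + 1, dtype=complex)
B[: cutoff + 1] = rng.normal(size=cutoff + 1) + 1j * rng.normal(size=cutoff + 1)
B[0] = B[0].real  # DC term of a real signal must be real
b = np.fft.irfft(B, n=N)

# The map I = exp(b) is positive, but its spectrum is no longer
# band-limited: the exponential has generated all spatial frequencies.
I = np.exp(b)
spectrum = np.fft.rfft(I)
print(np.abs(spectrum[cutoff + 1 :]).max())  # clearly nonzero

# Conversely, the entropy gradient with respect to an unmeasured
# visibility V_m is (up to a constant) the Fourier component of
# (1 + ln I) at frequency m; since ln I = b is band-limited, these
# components vanish to machine precision.
grad = np.fft.rfft(1.0 + np.log(I))
print(np.abs(grad[cutoff + 1 :]).max())
```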
If one has a default image $F$ in the definition of the entropy function, then the same algebra shows that the ratio $I/F$ is the exponential of a band-limited function. This could be desirable. For example, while imaging a planet, if the sharp edge is put into $F$, then the MEM does not have to do so much work in generating new spatial frequencies in the ratio $I/F$. The spirit is similar to using a window to help CLEAN find sources in the right place.
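The ``same algebra'' is worth one displayed line. With a default image $F_k$ entering the entropy as $H = -\sum_k I_k \ln (I_k/F_k)$ (notation as before), setting the derivative with respect to an unmeasured visibility to zero gives
\[
\frac{\partial H}{\partial V_m} = -\sum_k \left(1 + \ln\frac{I_k}{F_k}\right) e^{2\pi i km/N} = 0 ,
\]
so it is now $\ln(I/F)$ which has no unmeasured Fourier components, i.e., $I = F\, e^{b}$ with $b$ band-limited.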