The curly brackets around $I$ on the left side are meant to remind us that the entropy is a single number computed from the entire information about the brightness, i.e. the whole set of pixel values. Physicists will note that this expression seems inspired by Boltzmann's formula for entropy in statistical mechanics, and communication engineers will see the influence of Shannon's concept of information. It was E.T. Jaynes, writing in the Physical Review in 1957, who had a vision of a unified scheme into which physics, communication theory, and statistical inference would fall (with the last being the most fundamental!). In any case, the term ``entropy'' for the logarithm of the prior distribution of pixel values has stuck. One can see that if the only data given were the total flux, then the entropy as defined above is a maximum when the flux is distributed uniformly over the pixels. This is for the same reason that the Boltzmann entropy is maximised when a gas fills a container uniformly. This is the basis for the oft-heard remark that MEM produces the flattest or most featureless map consistent with the data - a statement we will see requires some qualification. But if one does not want this feature, one can define a modified entropy function, which is the integral over the map of $-I \ln(I/I_d)$. Here $I_d$ is called a ``default image''. One can now check that if only the total flux is given, the entropy is a maximum for $I = I_d$ (with $I_d$ scaled to the given total flux).
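The two statements about maxima can be checked numerically. The sketch below is only an illustration, and it assumes the discrete forms $-\sum_i I_i \ln I_i$ for the entropy and $-\sum_i I_i \ln(I_i/I_{d,i})$ for the modified entropy. The map size, the total flux, and the Gaussian-shaped default image are hypothetical choices, not taken from the text. It compares the flat map and the default image against many random positive maps of the same total flux.

```python
import numpy as np

def entropy(image, default=None):
    """Return -sum(I * ln(I / I_d)); with default=None, I_d is 1 everywhere."""
    image = np.asarray(image, dtype=float)
    if default is None:
        default = np.ones_like(image)
    return -np.sum(image * np.log(image / default))

rng = np.random.default_rng(0)
n_pix, total_flux = 64, 10.0  # hypothetical map size and total flux

# Flat map: the total flux spread uniformly over the pixels.
flat = np.full(n_pix, total_flux / n_pix)

# Hypothetical default image: a Gaussian bump on a pedestal,
# scaled so that it carries the same total flux.
x = np.arange(n_pix)
default = 0.1 + np.exp(-0.5 * ((x - 32) / 5.0) ** 2)
default *= total_flux / default.sum()

def random_map():
    """A random strictly positive map with the prescribed total flux."""
    m = rng.random(n_pix) + 1e-3
    return m * total_flux / m.sum()

trials = [random_map() for _ in range(1000)]
best_plain = max(entropy(m) for m in trials)
best_default = max(entropy(m, default) for m in trials)

print("flat map beats all trials:     ", entropy(flat) >= best_plain)
print("default image beats all trials:", entropy(default, default) >= best_default)
```

A random search of this kind cannot prove that these are the maxima, but no trial map should exceed them, which is what the Lagrange-multiplier argument with the total flux as the only constraint predicts.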

The selection of a prior is, in my view, the weakest part of Bayesian inference, so we will sidestep the debate on the correct choice. Rather, let us view the situation as an opportunity, a license to explore the consequences of different priors on the ``true'' maps which emerge. This is easily done by simulation - take a plausible map, Fourier transform it, sample the transform with a sampling function so that some information is now missing, then choose your favourite prior and maximise the ``entropy'' to get a candidate for the true map. It is this kind of study which was responsible for the great initial interest in MEM. Briefly, what MEM seemed to do in simple cases was to eliminate the sidelobes and even resolve pairs of peaks which overlapped in the true map, i.e. it was sometimes ``better'' than the original! This last feature is called superresolution, and we will not discuss it, in the same spirit of modesty that prompted us to use a CLEAN beam. Unlike CLEAN, MEM did not seem to have a serious problem with extended structure, unless it had a sharp edge (like the image of a planet). In this last case, it was found that MEM actually enhanced the ripples near the edge which were sitting at high brightness levels, though it controlled the ripples which were close to zero intensity. This is perhaps not surprising if one looks at the graph of the function $-I \ln I$. There is much more to be gained by removing ripples near $I = 0$ than at higher values of $I$, since the derivative of the function is higher near $I = 0$.
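The remark about ripples at low and high brightness can be made concrete with a small calculation. The sketch below assumes the entropy has the discrete form $-\sum_i I_i \ln I_i$. The ripple amplitude and the two brightness levels are hypothetical values chosen only for illustration. It computes how much entropy is gained by removing the same zero-mean ripple from a faint background and from a bright plateau.

```python
import numpy as np

def entropy(image):
    """Discrete entropy -sum(I * ln I) of a strictly positive map."""
    return -np.sum(image * np.log(image))

n_pix = 256
x = np.arange(n_pix)

# A zero-mean ripple (8 full cycles), so removing it leaves the total flux unchanged.
ripple = 0.05 * np.sin(2 * np.pi * 8 * x / n_pix)

def entropy_gain(level):
    """Entropy gained by replacing (level + ripple) with a smooth map at 'level'."""
    smooth = np.full(n_pix, level)
    return entropy(smooth) - entropy(smooth + ripple)

gain_low = entropy_gain(0.1)    # ripple sitting close to zero intensity
gain_high = entropy_gain(10.0)  # ripple sitting on a bright plateau

print("gain from smoothing near zero:  ", gain_low)
print("gain from smoothing at high I:  ", gain_high)
```

To second order in the ripple amplitude the gain scales inversely with the local brightness, so the maximisation has far more incentive to suppress ripples near zero than ripples riding on a bright region.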

Fortunately, these empirical studies of the MEM can be backed up by an
analytical/graphical argument due to Ramesh Narayan, which is outlined
below. The full consequences of this viewpoint were developed in a review
article (Annual Review of Astronomy and
Astrophysics, 24, 127, 1986), so they will not be elaborated here, but the basic
reasoning is simple and short enough. Take the expression for the entropy,
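and differentiate it with respect to the free parameters at our disposal,
namely the *un*measured visibilities, and set the derivative to zero for
maximisation. The derivative of the entropy $S$ taken with respect to a
visibility $V(u,v)$ is denoted by $\partial S/\partial V(u,v)$. The understanding is that
$(u,v)$ have *not* been measured. The condition for a maximum is

$$\frac{\partial S}{\partial V(u,v)} = -\int\!\!\int \left[1 + \ln I(l,m)\right] e^{2\pi i (ul + vm)}\, dl\, dm = 0 .$$

This can be interpreted as follows. The logarithm of the brightness has no Fourier components at the unmeasured spatial frequencies, i.e. it is a band-limited function, and the brightness itself is the exponential of this band-limited function.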

More properties of the MEM solution are given in the references cited earlier. But one can immediately see that taking the exponential of a function with only a limited range of spatial frequencies (those present in the dirty beam) is going to generate all spatial frequencies, i.e., one is extrapolating and interpolating in the $u$-$v$ plane. It is also clear that the fitting is a nonlinear operation because of the exponential. Adding two data sets and obtaining the MEM solution will not give the same answer as finding the MEM solution for each separately and adding later! A little thought shows that this is equally true of CLEAN.
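The claim that exponentiating a band-limited function generates new spatial frequencies is easy to verify in one dimension. The following sketch is an illustration only: the band limit `max_freq` and the three Fourier components of the test function are hypothetical choices. It measures what fraction of the Fourier amplitude lies outside the band, before and after taking the exponential.

```python
import numpy as np

n_pix = 128
max_freq = 4  # hypothetical band limit, in cycles across the map
theta = 2 * np.pi * np.arange(n_pix) / n_pix

# A band-limited function: only frequencies 1, 3 and 4 are present.
g = 1.0 * np.cos(theta) + 0.5 * np.sin(3 * theta) + 0.8 * np.cos(4 * theta)

def fraction_outside_band(f):
    """Fraction of the summed Fourier amplitude at frequencies above max_freq."""
    amplitude = np.abs(np.fft.rfft(f))
    return amplitude[max_freq + 1:].sum() / amplitude.sum()

frac_g = fraction_outside_band(g)            # rounding noise only
frac_exp = fraction_outside_band(np.exp(g))  # harmonics and beat frequencies appear

print("outside band, g:     ", frac_g)
print("outside band, exp(g):", frac_exp)
```

The function `g` has essentially nothing outside the band, while `exp(g)` has a clearly non-zero fraction there. This is the one-dimensional analogue of the extrapolation and interpolation of visibilities described above.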

If one has a default image $I_d$ in the definition of the entropy function, then the same algebra shows that $I/I_d$ is the exponential of a band-limited function. This could be desirable. For example, while imaging a planet, if the sharp edge is put into $I_d$, then the MEM does not have to do so much work in generating new spatial frequencies in the ratio $I/I_d$. The spirit is similar to using a window to help CLEAN find sources in the right place.
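The benefit of putting a sharp edge into the default image can be illustrated with a one-dimensional toy ``planet''. Everything in this sketch is a hypothetical construction: the top-hat default image, the smooth modulation of the true map, and the band limit `max_freq`. It compares the out-of-band Fourier content of the logarithm of the map with that of the logarithm of the ratio of the map to the default image.

```python
import numpy as np

n_pix = 256
max_freq = 4  # hypothetical band limit, in cycles across the map
x = np.arange(n_pix)
theta = 2 * np.pi * x / n_pix

# Hypothetical default image: a sharp-edged disk on a faint pedestal.
default = np.where((x >= 96) & (x < 160), 10.0, 0.1)

# Hypothetical true map: the same disk times a smooth, band-limited modulation.
true_map = default * np.exp(0.3 * np.cos(2 * theta))

def fraction_outside_band(f):
    """Fraction of the summed Fourier amplitude at frequencies above max_freq."""
    amplitude = np.abs(np.fft.rfft(f))
    return amplitude[max_freq + 1:].sum() / amplitude.sum()

frac_log_map = fraction_outside_band(np.log(true_map))
frac_log_ratio = fraction_outside_band(np.log(true_map / default))

print("outside band, ln(map):          ", frac_log_map)
print("outside band, ln(map / default):", frac_log_ratio)
```

In this constructed case the logarithm of the map needs many high spatial frequencies to represent the edge, whereas the logarithm of the ratio needs none, because the edge is already carried by the default image.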