MEM Images

Descending now from the sublime to aperture synthesis, think of $A$ as the true map and $B$ as the dirty map, or equivalently its Fourier transform, the set of measured visibilities. We usually want a single map, not a probability distribution of $A$, so we need the further step of maximising $P(A\vert B)$ with respect to $A$. All this is possible if $P(A)$ is available for a given true map $I(l,m)$. One choice, advocated by Gull and Daniell in 1978, was to take

\begin{displaymath}\log P(\{I(l,m)\}) \propto -\int \int I(l,m) \ln I(l,m)~ dl~ dm.\end{displaymath}

The curly brackets around $I$ on the left side are meant to remind us that the entropy is a single number computed from the entire information about the brightness, i.e. the whole set of pixel values. Physicists will note that this expression seems inspired by Boltzmann's formula for entropy in statistical mechanics, and communication engineers will see the influence of Shannon's concept of information. It was E.T. Jaynes, writing in the Physical Review of 1957, who saw a vision of a unified scheme into which physics, communication theory, and statistical inference would fall (with the last being the most fundamental!). In any case, the term ``entropy'' for the logarithm of the prior distribution of pixel values has stuck. One can see that if the only data given were the total flux, then the entropy as defined above is a maximum when the flux is distributed uniformly over the pixels. This is for the same reason that the Boltzmann entropy is maximised when a gas fills a container uniformly. This is the basis for the oft-heard remark that MEM produces the flattest or most featureless map consistent with the data - a statement we will see requires some qualification. If one does not want this feature, a modified entropy function can be defined as the integral over the map of $-I\ln (I/I^d)$, where $I^d(l,m)$ is called a ``default image''. One can now check that if only the total flux is given, the entropy is a maximum for $I\propto I^d$.
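The flat-map property is easy to verify numerically. The sketch below (in Python with NumPy; the array sizes and the entropy helper are invented here for illustration, not taken from the text) compares the entropy of a uniform map with that of a lumpy map of the same total flux.

```python
import numpy as np

def entropy(image, default=None):
    """Entropy of a pixelised map: -sum I ln(I/I_d).
    With no default image this reduces to -sum I ln I."""
    I = np.asarray(image, dtype=float)
    if default is None:
        return -np.sum(I * np.log(I))
    return -np.sum(I * np.log(I / np.asarray(default, dtype=float)))

rng = np.random.default_rng(0)

flat = np.full((8, 8), 1.0)           # uniform map, total flux 64
lumpy = rng.random((8, 8))
lumpy *= flat.sum() / lumpy.sum()     # rescale to the same total flux

# Among maps of fixed total flux, the uniform one has the largest entropy,
# for the same (Jensen's inequality) reason as the Boltzmann gas.
assert entropy(flat) > entropy(lumpy)
```

Because $-I\ln I$ is strictly concave, any non-uniform redistribution of the same flux lowers the sum, which is what the assertion checks.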

The selection of a prior is, in my view, the weakest part of Bayesian inference, so we will sidestep the debate on the correct choice. Rather, let us view the situation as an opportunity, a license to explore the consequences of different priors on the ``true'' maps which emerge. This is easily done by simulation - take a plausible map, Fourier transform, sample with a function $W$ so that some information is now missing, and use your favourite prior and maximise ``entropy'' to get a candidate for the true map. It is this kind of study which was responsible for the great initial interest in MEM. Briefly, what MEM seemed to do in simple cases was to eliminate the sidelobes and even resolve pairs of peaks which overlapped in the true map, i.e. it was sometimes ``better'' than the original! This last feature is called superresolution, and we will not discuss it, in the same spirit of modesty that prompted us to use a CLEAN beam. Unlike CLEAN, MEM did not seem to have a serious problem with extended structure, unless it had a sharp edge (like the image of a planet). In this last case, it was found that MEM actually enhanced the ripples near the edge which were sitting at high brightness levels, though it controlled the ripples which were close to zero intensity. This is perhaps not surprising if one looks at the graph of the function $-I \ln I$. There is much more to be gained by removing ripples near $I=0$ than at higher values of $I$, since the derivative of the function is higher near $I=0$.
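The simulation recipe above can be sketched end to end. Everything here is a toy: the two-Gaussian ``true'' map, the low-pass sampling function $W$, and the naive projected-gradient maximisation (step up the entropy gradient, then restore the measured visibilities) are all invented for illustration and are far from a production MEM solver.

```python
import numpy as np

n = 64
y, x = np.mgrid[:n, :n]

# A plausible "true" map: two Gaussian blobs on a small positive floor.
true_map = (np.exp(-((x - 24)**2 + (y - 30)**2) / 8.0)
            + 0.5 * np.exp(-((x - 40)**2 + (y - 34)**2) / 8.0)) + 1e-3

V_true = np.fft.fft2(true_map)

# Sampling function W: keep only low spatial frequencies (toy u-v coverage).
u = np.fft.fftfreq(n)
W = (np.abs(u)[:, None] < 0.15) & (np.abs(u)[None, :] < 0.15)

dirty_map = np.fft.ifft2(np.where(W, V_true, 0)).real   # sidelobes and all

# Crude MEM: gradient ascent on -sum I ln I over the map, re-imposing the
# measured visibilities after each step and clipping to keep positivity.
I = np.full((n, n), true_map.mean())
for _ in range(200):
    I = I + 0.01 * (-1.0 - np.log(I))      # gradient of -I ln I
    I = np.clip(I, 1e-6, None)
    V = np.fft.fft2(I)
    I = np.fft.ifft2(np.where(W, V_true, V)).real
    I = np.clip(I, 1e-6, None)
```

The interesting experiments are then visual: compare `dirty_map` and `I` against `true_map`, and repeat with different priors or sampling functions.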

Fortunately, these empirical studies of the MEM can be backed up by an analytical/graphical argument due to Ramesh Narayan, which is outlined below. The full consequences of this viewpoint were developed in a review article (Annual Review of Astronomy and Astrophysics, 24, 127, 1986), so they will not be elaborated here, but the basic reasoning is simple and short enough. Take the expression for the entropy, and differentiate it with respect to the free parameters at our disposal, namely the unmeasured visibilities, and set to zero for maximisation. The derivative of the entropy taken with respect to a visibility $V(u',v')$ is denoted by $M(u',v')$. The understanding is that $u',v'$ have not been measured. The condition for a maximum is

\begin{displaymath}M(u',v') = \int \int \left(-1-\ln I(l,m)\right) \exp(+2\pi i (lu'+mv'))~dl~dm = 0.\end{displaymath}

This can be interpreted as follows. The logarithm of the brightness is like a dirty map, i.e. it has no power at unmeasured baselines, and hence has sidelobes etc. But the brightness $I$ itself is the exponential of this ``band limited function'' (i.e. one with limited spatial frequency content). Note first of all that the positivity constraint is nicely implemented: exponentials are positive. Since the exponential varies rather slowly at small values of $I$, the ripples in the ``baseline'' region between the peaks are suppressed. Conversely, the peaks are sharpened by the steep rise of the exponential function at larger values of $I$. One could even take the extreme point of view that the MEM stands unmasked as a model fitting procedure with sufficient flexibility to handle the cases usually encountered. Högbom and Subrahmanya independently emphasised very early that the entropy is just a penalty function which encourages desirable behaviour and punishes bad features in the map (IAU Colloq. 49, 1978). Subrahmanya's early work on the deconvolution of lunar occultation records at Ooty (TIFR thesis, 1977) was indeed based on such penalties.
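A one-dimensional toy shows both effects at once. Here a sinc profile stands in for the band-limited function (the logarithm of the brightness); the scaling factor 4 is an arbitrary choice made for this illustration.

```python
import numpy as np

# A band-limited function with sidelobes: a sinc profile, main peak at x = 0,
# ripples about zero elsewhere.
x = np.linspace(-10, 10, 2001)
g = np.sinc(x)

# MEM-style map: the exponential of the band-limited function.
I = np.exp(4 * g)

# Positivity comes for free.
assert (I > 0).all()

# Relative to the main peak, the sidelobe ripples are more suppressed in I
# than in g: the peak-to-ripple ratio improves.
sidelobe = np.abs(x) > 2
ratio_g = g.max() / np.ptp(g[sidelobe])
ratio_I = I.max() / np.ptp(I[sidelobe])
assert ratio_I > ratio_g
```

The exponential stretches the peak (where $g$ is large) far more than the ripples (where $g$ oscillates near zero), which is exactly the sharpening-plus-suppression behaviour described above.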

More properties of the MEM solution are given in the references cited earlier. But one can immediately see that taking the exponential of a function with only a limited range of spatial frequencies (those present in the dirty beam) is going to generate all spatial frequencies, i.e., one is extrapolating and interpolating in the $u-v$ plane. It is also clear that the fitting is a nonlinear operation because of the exponential. Adding two data sets and obtaining the MEM solution will not give the same answer as finding the MEM solution for each separately and adding later! A little thought shows that this is equally true of CLEAN.
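The claim that the exponential generates new spatial frequencies can also be checked directly. In this sketch (the band limit of 5 and the grid size are arbitrary choices), $g$ is built from frequencies 1 to 5 only, yet $\exp(g)$ acquires power well outside that band.

```python
import numpy as np

n = 256
xgrid = np.arange(n)

# g: strictly band-limited, containing only spatial frequencies 1..5.
g = sum(np.cos(2 * np.pi * m * xgrid / n) for m in range(1, 6)) / 5

G = np.fft.fft(g)
F = np.fft.fft(np.exp(g))            # spectrum of the MEM-style map exp(g)

k = np.fft.fftfreq(n, d=1.0 / n)     # integer frequency labels
outside = np.abs(k) > 5

assert np.abs(G[outside]).max() < 1e-8   # g has no power outside the band...
assert np.abs(F[outside]).max() > 1e-2   # ...but exp(g) does
```

The harmonics come from the series $\exp(g) = 1 + g + g^2/2 + \cdots$: already the $g^2$ term contains frequencies up to twice the band limit, and higher powers extend it further. This is the extrapolation and interpolation in the $u$-$v$ plane, and also why the operation is nonlinear.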

If one has a default image $I^d$ in the definition of the entropy function, then the same algebra shows that $I/I^d$ is the exponential of a band-limited function. This could be desirable. For example, while imaging a planet, if the sharp edge is put into $I^d$, then the MEM does not have to do so much work in generating new spatial frequencies in the ratio $I/I^d$. The spirit is similar to using a window to help CLEAN find sources in the right place.
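The claim that, with only the total flux given, the modified entropy is maximised at $I\propto I^d$ can again be checked numerically; the random default image and array sizes here are invented for illustration.

```python
import numpy as np

def entropy(I, Id):
    """Modified entropy: -sum I ln(I/I_d)."""
    return -np.sum(I * np.log(I / Id))

rng = np.random.default_rng(0)
Id = rng.random(100) + 0.1          # a hypothetical default image
total = 50.0                        # the one "measurement": total flux

# The map proportional to the default image, with the required total flux:
I_best = Id * (total / Id.sum())

# Any other positive map with the same total flux scores lower.
other = rng.random(100) + 0.1
other *= total / other.sum()
assert entropy(I_best, Id) > entropy(other, Id)
```

Setting the constrained derivative to zero gives $-\ln(I/I^d) - 1 + \lambda = 0$, i.e. $I/I^d$ constant, which is the maximiser the assertion tests against.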
