Optimal Image Quantization, Perception and the Median Cut Algorithm

We study the perceptual problem related to image quantization from an optimization point of view, using different metrics on the color space. A consequence of the results presented is that quantization using histogram equalization provides optimal perceptual results. This fact is well known and widely used but, to our knowledge, a proof has never appeared in the literature of image processing.


INTRODUCTION
To represent an image by digital means it is necessary to replace its continuous range of colors with a finite subset of them. This process is called quantization. If the function f : U → C represents an image with color range in C, the quantization process consists of the choice of a function q : C → Q = {q_1, ..., q_n}, defined on the color space C and taking values in a finite subset Q ⊂ C. The function q is called a quantizer for f, while Q is the reproducing alphabet. The colors q_1, ..., q_n are called quantization levels. The subset Q, of known size, can be given a priori or may be part of the quantization problem.
The image quantization problem appears very early and at different stages of image processing. It first appears at the acquisition level and is a basic process at the coding level. Quantization always implies a perceptual loss of quality in the image, and most of the well-known quantization techniques exploit biological limitations of the human visual system, e.g., spatial acuity; nevertheless, they hardly touch the psychophysical aspects of perception (Lloyd 1957, Heckbert 1982, Linde et al. 1980). In this work, we use known psychophysical results on the human visual system (Fechner 1858, Stevens 1961, von Helmholtz 1891) to study quantization algorithms that are adapted to the way humans perceive colors. The resulting algorithms depend on the choice of a distortion measure for the color space; different choices lead to different strategies for color quantization. We have designed distortion measures based on the so-called psychophysical response function of the visual system, and some of them were implemented for comparison. Also, as a consequence of the results presented, we derive the well-known fact that quantization by histogram equalization has, from the viewpoint of information theory, optimal perceptual qualities for the resulting quantized image.

BACKGROUND
The problem of image quantization can easily be posed as an optimization problem, once we define a metric on the color space and correlate the quantization error with the distribution of colors present in the image. We then look for a quantizer that minimizes the expected error introduced during the process. Before we proceed to the n-dimensional problem, we present the scalar case, which was introduced by Lloyd (1957).

Optimal Scalar Quantization
When an image f : U → C is quantized, an error is introduced between a color x ∈ C and its corresponding quantized value q(x). This pointwise error can be measured by the choice of a distortion measure d in the color space C of the image; see the subsection on distortion measures below. The expected value of d(x, q(x)),

E(d, q) = ∫_C d(x, q(x)) p(x) dx,

where p(x) is the probability density function (pdf) of the colors in the image, provides a measure of the overall impact of the error introduced by the quantization process. The quantizer q is called optimal when it minimizes E(d, q). The particular choice d(x, y) = (x − y)^2 enables an elegant solution for the case of grayscale images (Lloyd 1957): if x_k and x_{k+1} are the extremes of the quantization cell corresponding to the level q_k, then one obtains, by simply computing derivatives, that the quantization interval boundaries are given by

x_k = (q_{k-1} + q_k) / 2,

and the quantization levels are computed from the expression

q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx.

The solution presented above was simplified by two fairly strong hypotheses: the assumption that the colors present in the image have a probability density function, and the particular choice of the distortion measure d.
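For concreteness, the following Python sketch implements the Lloyd iteration for this scalar case, replacing the pdf p(x) by the empirical distribution of a sample of gray values; the function name and parameters are ours, not part of the original formulation.

```python
import numpy as np

def lloyd_max(samples, n_levels, iterations=50):
    """Lloyd-Max scalar quantizer for d(x, y) = (x - y)**2.

    A minimal sketch: the pdf p(x) is replaced by the empirical
    distribution of `samples` (e.g. the gray values of an image).
    """
    samples = np.asarray(samples, dtype=float)
    # Initialize levels from evenly spaced quantiles of the data.
    q = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iterations):
        # Cell boundaries: midpoints between consecutive levels.
        b = (q[:-1] + q[1:]) / 2
        # Assign each sample to its cell ...
        cells = np.digitize(samples, b)
        # ... and move each level to the centroid (mean) of its cell.
        for k in range(n_levels):
            members = samples[cells == k]
            if members.size > 0:
                q[k] = members.mean()
    return q
```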
Nevertheless, neither hypothesis is common in practice. Most natural images present large areas of constant color, which invalidates the hypothesis of the existence of a probability density function. Besides that, the choice of d had no direct relation to the ultimate end of the quantization process, namely that the image will be seen by a human being. Thus, general methods that allow for direct experimentation with humans and that avoid the use of derivatives are desirable. In the following, we present a general class of such methods, as well as a description of the quantization cells for color images.

Weber's Law
A fundamental aspect of the human visual system is that it does not perceive light intensity continuously but in steps. If Δx_k = x_{k+1} − x_k is the smallest quantity such that it is possible to perceive a difference between the gray light intensities x_k and x_{k+1}, Weber's law (Weber 1834, Fechner 1858) states that

Δx_k / x_k = ε,

where ε is a constant. In consequence, if we start with an intensity value x_0, the k-th perceivable value will be x_k = x_0 (1 + ε)^k, which shows that the human visual system uses a non-uniform quantization of the color space.
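As a quick numerical illustration (the Weber fraction ε = 0.02 below is only an assumed value for the sketch; the text fixes no constant):

```python
# Successive just-noticeable intensities under Weber's law:
# x_k = x_0 * (1 + eps)**k. The value eps = 0.02 is an
# illustrative assumption, not a constant given in the text.
x0, eps = 1.0, 0.02
steps = [x0 * (1 + eps) ** k for k in range(5)]
print(steps)  # geometrically spaced, not uniformly spaced
```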
Intuitively, Weber's law states that the intensity values x_0 and x_k differ from each other by k perceptual units. Therefore, one can model the psychophysical response function by φ(x) = c log(x); indeed, φ(x_k) − φ(x_0) = c k log(1 + ε) grows linearly with k. The constant c incorporates a possible change of base of the logarithm.
The psychophysical response function can also be derived from the axioms of information theory (Resnikoff 1987). In this context, the only possibilities for φ are

φ(x) = c log(x)   or   φ(x) = c x^r.

The model φ(x) = c log(x) was proposed by Fechner (1858), while the model φ(x) = c x^r was proposed by Stevens (1961).

Distortion Measures
From a general viewpoint, we can use as a distortion measure on a color space C any function d : C × C → R such that

1) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.

When, in addition, we have

2) d(x, y) = d(y, x) (symmetry);

3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality),

the measure d is called a metric.

Property 3) is handy since it allows us to estimate the overall error of a multi-step quantization process from the errors introduced at each step.
Another way to interpret Weber's law is to think that the unit used to measure distances in the color space changes according to the color, this change being proportional to the inverse of the color intensity. The mathematical concept that abstracts this notion of an adaptive change of unit at each point of a surface is the Riemannian metric.
In the context of Riemannian metrics, Weber's law for color intensities suggests the metric ds² = α² (dx/x)² for the one-dimensional color space. For the three-dimensional color space, this metric generalizes to

ds² = α_1² (dx_1/x_1)² + α_2² (dx_2/x_2)² + α_3² (dx_3/x_3)²,

which is known as the metric of von Helmholtz (1891). In such a metric, the distance between the colors x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3) is

d(x, y) = [ Σ_i α_i² (log(y_i) − log(x_i))² ]^{1/2}.

In consequence, if two colors differ only in intensity, i.e., y = r x, one has

d(x, y) = (α_1² + α_2² + α_3²)^{1/2} log(r) = c log(r),

which is the psychophysical response function proposed by Fechner.
Suppose that the psychophysical response function is φ. We can easily construct distortion measures that take φ into account. One such choice is

d(x, y) = Σ_i |φ_i(x_i) − φ_i(y_i)|^r.

Observe that the particular choice φ = (φ_1, φ_2, φ_3), where φ_i(x) = α_i log(x) and r = 2, is equivalent to the choice of the von Helmholtz metric for the three-dimensional color space.
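A minimal sketch of such a φ-based distortion measure, assuming φ_i(x) = α_i log(x) as above; the helper name make_distortion and its parameters are ours:

```python
import numpy as np

# Sketch of the phi-based distortion measure
# d(x, y) = sum_i |phi_i(x_i) - phi_i(y_i)|**r discussed above.
# The weights alpha and the exponent r are free parameters.
def make_distortion(alphas, r):
    alphas = np.asarray(alphas, dtype=float)
    def d(x, y):
        # phi_i(t) = alpha_i * log(t); colors are assumed > 0,
        # so in practice a small offset guards against log(0).
        phi_x = alphas * np.log(np.asarray(x, dtype=float))
        phi_y = alphas * np.log(np.asarray(y, dtype=float))
        return np.sum(np.abs(phi_x - phi_y) ** r)
    return d

# r = 2 with phi_i = alpha_i * log reproduces (the square of)
# the von Helmholtz distance.
d_helmholtz = make_distortion(alphas=(1.0, 1.0, 1.0), r=2)
print(d_helmholtz((10, 10, 10), (20, 20, 20)))  # 3 * log(2)**2
```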

Voronoi Diagram
Given a set of points {x_1, ..., x_n} ⊂ C, the subsets

C_i = {x ∈ C | d(x, x_i) ≤ d(x, x_j), for all j = 1, ..., n}

together form a partition of C called the Voronoi diagram, which has a ubiquitous presence in the computational geometry literature due to its good computational properties and many applications. In particular, there are many known efficient algorithms to compute the Voronoi diagram (de Figueiredo and Carvalho 1991, Fortune 1987). As we will see, it also plays a role in the quantization problem.
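As an illustration, the following sketch computes the Voronoi cell of each color by brute-force nearest-site assignment, assuming the Euclidean metric; for large site sets, efficient algorithms such as Fortune's are preferable:

```python
import numpy as np

# Assign each color to its nearest site, i.e. determine which
# Voronoi cell it falls in. Any distortion measure d could be
# substituted for the squared Euclidean distance used here.
def voronoi_cells(colors, sites):
    colors = np.asarray(colors, dtype=float)  # shape (m, 3)
    sites = np.asarray(sites, dtype=float)    # shape (n, 3)
    # Pairwise squared distances, shape (m, n).
    d2 = ((colors[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # index of the cell of each color
```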

OPTIMAL QUANTIZATION
In the following we consider the problem of optimal quantization for a measurable function f : U → C. We define a measure µ on C by µ(X) = µ_0(f^{-1}(X)), where µ_0 is the standard Lebesgue measure on U; see (Fernandez 1976) for details on measure theory. For simplicity, we suppose µ_0(U) = 1, which means that µ is a probability measure on C.
The quantization problem thus consists of the following: given a distortion measure d, find a quantizer q : C → Q = {q_1, ..., q_n} that minimizes the expected error

E(d, q) = ∫_C d(x, q(x)) dµ(x).

It should be remarked that the computation of the quantization levels q_j, j = 1, ..., n, is part of the problem.
For each quantization level q_j we define the corresponding quantization cell C_j by C_j = q^{-1}(q_j), where q is the quantization function. In other words, C_j is the set of all colors in C which are quantized to the level q_j.
The problems of computing the quantization levels q_j, j = 1, ..., n, and the corresponding quantization cells C_j are closely related. If we have the quantization levels in Q, we can compute the quantization cells. In fact, let C_i be the cells of the Voronoi diagram for Q. Since every x ∈ C_i satisfies d(x, q_i) ≤ d(x, q_j) for all j, assigning q(x) = q_i minimizes the pointwise error, and hence

E(d, q) = Σ_i ∫_{C_i} d(x, q_i) dµ(x)

is the smallest expected error attainable with the levels of Q. Therefore, the Voronoi diagram is the best partition for this choice of the quantization levels. Reciprocally, suppose that we have the quantization cells C_j. Then the best value for q_j is certainly

q_j = arg min_{y ∈ C} ∫_{C_j} d(x, y) dµ(x).        (13)

From the above properties of the quantization levels and quantization cells, one can design a descent algorithm to compute the optimal quantizer. The basis for such an algorithm is presented below as Algorithm 1. We note that at each step of the algorithm, the value of E(d, q) decreases. It is possible to show that this algorithm belongs to a class of algorithms that converge under certain conditions, see (Gray et al. 1979).

Algorithm 1
1: Start with an arbitrary reproducing alphabet Q = {q_1, ..., q_n}
2: repeat
3:   Compute the Voronoi diagram C_1, ..., C_n for Q
4:   Compute the reproducing alphabet Q = {q_1, ..., q_n} according to equation (13)
5: until Stop criteria are met

The following propositions will be useful for the construction of Table I, to be used for implementation purposes.
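A minimal sketch of Algorithm 1 for vector colors, assuming the squared Euclidean distortion so that the arg min in equation (13) reduces to the cell centroid; the names and the stop criterion (a fixed iteration count) are our choices:

```python
import numpy as np

# Sketch of Algorithm 1 (generalized Lloyd) for vector colors,
# with d = squared Euclidean distance. `colors` is an (m, 3)
# array of image colors, `n` the size of the alphabet.
def algorithm1(colors, n, iterations=30):
    colors = np.asarray(colors, dtype=float)
    rng = np.random.default_rng(0)
    # 1: start with an arbitrary reproducing alphabet Q.
    Q = colors[rng.choice(len(colors), size=n, replace=False)].copy()
    for _ in range(iterations):                     # 2: repeat
        # 3: Voronoi diagram for Q (nearest-site assignment).
        d2 = ((colors[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
        cells = d2.argmin(axis=1)
        # 4: new alphabet; for this d, eq. (13) gives the centroid.
        for k in range(n):
            members = colors[cells == k]
            if len(members) > 0:
                Q[k] = members.mean(axis=0)
    return Q                                        # 5: until stop
```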
Proposition 1. Let q : C → Q = {q_1, ..., q_n} be an optimal quantizer for an image f : U → C, and let B be the common border of the quantization cells C_i and C_j. Then:

1) B is halfway between q_i and q_j, that is, if x ∈ B, then d(x, q_i) = d(x, q_j).

2) In particular, for scalar quantization, if d(x, y) = |φ(x) − φ(y)|^r and φ is one-to-one, then for x ∈ B we have

φ(x) = (φ(q_i) + φ(q_j)) / 2.

Proof. 1) Since the C_k are cells of the Voronoi diagram for Q, we have that x ∈ C_k if and only if d(x, q_k) ≤ d(x, q_s) for every s = 1, ..., n. Therefore, if x ∈ B, then d(x, q_i) ≤ d(x, q_j) and d(x, q_j) ≤ d(x, q_i), which shows that d(x, q_i) = d(x, q_j).

2) We may suppose, without loss of generality, that q_i ≠ q_j; otherwise, we join the cells C_i and C_j into a single one. Thus, for d(x, y) = |φ(x) − φ(y)|^r, it follows from d(x, q_i) = d(x, q_j) that |φ(x) − φ(q_i)| = |φ(x) − φ(q_j)|, and hence either

φ(x) − φ(q_i) = −(φ(x) − φ(q_j)),   i.e.,   φ(x) = (φ(q_i) + φ(q_j)) / 2,

or φ(x) − φ(q_i) = φ(x) − φ(q_j). But the second equation implies φ(q_i) = φ(q_j), and from the injectivity of φ we would have q_i = q_j, which has been excluded.

Proposition 2. Let q : C → {q_1, ..., q_n} be an optimal quantizer for an image f : U → C. Denote the quantization cells by C_k. Then:

1) if d(x, y) = |φ(x) − φ(y)|², then φ(q_k) is the mean of φ over C_k;

2) if d(x, y) = |φ(x) − φ(y)|, then φ(q_k) is the median of φ over C_k.

Proof. To prove the proposition, just observe that (James 1981)

mean(X) = arg min_a E[(X − a)²]   and   median(X) = arg min_a E[|X − a|].

Table I. Recurrence formulas for scalar quantization.
For the particular case of scalar quantization, a monotone φ maps the median of the values x onto the median of the values φ(x). Therefore, Proposition 2 presents a noticeable fact: if the distortion measure is given by d(x, y) = |φ(x) − φ(y)|, the quantization level is the median of the quantization interval and therefore does not depend on the function φ.
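A small numerical check of this invariance, using φ = log (any increasing φ would do):

```python
import numpy as np

# A monotone phi maps the median of x to the median of phi(x),
# so the optimal level for d = |phi(x) - phi(y)| is the plain
# median, independently of phi.
x = np.array([1.0, 2.0, 4.0, 9.0, 16.0])
phi = np.log                      # any increasing function works
m = np.median(x)
assert np.isclose(phi(m), np.median(phi(x)))
print(m)  # 4.0 -- the level does not depend on phi
```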
Table I presents the discrete formulas. They will be used for the implementation discussed in the Implementation section below. In this table, z_j denotes a color in the image range, p_j the total number of occurrences of the color z_j, and n_k the number of color occurrences in the cell C_k.

Quantization and Information Gain
In this section, we show that color quantization by histogram equalization provides optimal results in the sense that it maximizes the information retained by the quantization process. In the computer graphics literature, this quantization technique was introduced by P. Heckbert (1982) and is known as the median cut algorithm. Also, it is well known in the area of image processing that a simple histogram equalization improves the perceptual properties of an image. Although the good perceptual qualities of histogram equalization are well known and widely used, to our knowledge, a proof has never appeared in the literature.
Quantization by histogram equalization consists in choosing the quantization cells in such a way that each of them contains the same number of colors. This is equivalent to a constant color histogram in the quantized image. For scalar quantization, this means that we should choose the borders of the quantization cells as

x_k = F^{-1}(k/n),   k = 0, ..., n,

where F(x) = ∫_{−∞}^{x} dµ is the accumulated probability distribution of the colors in the image.
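A minimal sketch of this scalar histogram-equalization quantizer, estimating F from the empirical quantiles of the pixel values and using the median of each cell as its level (Proposition 2 with r = 1); the function name is ours:

```python
import numpy as np

# Scalar quantization by histogram equalization: cell borders are
# placed at evenly spaced quantiles of the empirical distribution,
# so every cell receives (roughly) the same number of pixels.
def equalized_quantizer(values, n_cells):
    values = np.asarray(values, dtype=float)
    borders = np.quantile(values, np.arange(1, n_cells) / n_cells)
    cells = np.digitize(values, borders)  # cell index per pixel
    # Represent each cell by its median (Proposition 2, r = 1).
    levels = np.empty(n_cells)
    for k in range(n_cells):
        members = values[cells == k]
        levels[k] = np.median(members) if members.size > 0 else 0.0
    return levels[cells]
```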
Given a measure µ on C and two measurable subsets C_2 ⊂ C_1 of C, the information gain from C_2 with respect to C_1 is defined by (Resnikoff 1987)

I(C_2, C_1) = log( µ(C_1) / µ(C_2) ).

In particular, when µ is a probability measure, the information gain of a subset C_1 relative to the whole set C is

I(C_1, C) = log( 1 / µ(C_1) ) = −log( µ(C_1) ).

From now on, we will be concerned only with probability measures. Let P = {C_1, ..., C_n} be a partition of C by measurable subsets, that is, C = ∪_k C_k and µ(C_i ∩ C_j) = 0 if i ≠ j. Denote p_k = µ(C_k); since µ(C) = 1, we have Σ_{k=1}^{n} p_k = 1. Therefore the p_k form a discrete probability distribution, and the expected gain of information, or entropy, related to this subdivision of C is defined by

I(P) = −Σ_{k=1}^{n} p_k log(p_k).

Intuitively, I is a measure of the information present in a given quantization of an image with cells C_k. Let us refine the partition P by introducing a new cell, that is, we split C_k = C_{k1} ∪ C_{k2} with µ(C_{k1} ∩ C_{k2}) = 0. Let us call this new partition P′ and write p_k = p_{k1} + p_{k2}, where p_{ki} = µ(C_{ki}).
We have

I(P′) − I(P) = −p_{k1} log(p_{k1}) − p_{k2} log(p_{k2}) + p_k log(p_k) ≥ 0,

since p_{ki} ≤ p_k implies −p_{ki} log(p_{ki}) ≥ −p_{ki} log(p_k), from where

I(P) ≤ I(P′).
The above equation corresponds to the intuitive notion that the perceptual quality of a quantized image improves if we use more quantization cells. On the other hand, each term of the sum in I is a function of the type g(x) = −x log(x), x > 0, and therefore satisfies lim_{x→0} g(x) = lim_{x→1} g(x) = 0. This in turn implies that cells with very high or very small probability produce a negligible contribution to the information gain: the former carry little information, while the latter have small probability. Therefore, a natural question arises: what are the best partitions of C in order to maximize the information gain?

Proposition 3. Let f : U → C be an image and P = {C_1, ..., C_n} a partition of C. Then the information gain is maximum for p_k = 1/n, k = 1, ..., n.

Proof. We want to maximize I = −Σ p_k log(p_k) subject to the restriction Σ p_k = 1. Therefore, we can apply the Lagrange method, searching for the critical points of I. Let L be defined by

L = −Σ_k p_k log(p_k) + λ ( Σ_k p_k − 1 ).

Differentiating with respect to p_k, we find

−log(p_k) − 1 + λ = 0.

Therefore p_k = e^{λ−1} for every k, and using the restriction, we have p_k = 1/n. Since I is a concave function, its restriction to the subspace {(p_1, ..., p_n) | Σ p_k = 1} is also concave; therefore the only critical point is actually the place where I reaches its global maximum.

Since the cells of a histogram-equalized quantizer satisfy p_k = 1/n by construction, Proposition 3 shows that histogram equalization, i.e., the median cut algorithm, maximizes the information retained by the quantization.
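A quick numerical check of Proposition 3, comparing the entropy of random partitions against the uniform one:

```python
import numpy as np

# Among partitions into n cells, the uniform probabilities
# p_k = 1/n maximize the entropy I(P) = -sum p_k log p_k.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # g(x) = -x log x vanishes at 0
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(1)
n = 8
uniform = entropy(np.full(n, 1 / n))       # equals log(8)
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))          # random partition of 1
    assert entropy(p) <= uniform + 1e-12
print(uniform, np.log(n))                  # both equal log 8
```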

IMPLEMENTATION
From the formulas in Table I, we see that the expressions for x_k and q_k are defined in a recursive way. Once we know the values x_1, ..., x_n, we can compute the values q_1, ..., q_n and vice versa. This suggests Algorithm 2.

Algorithm 2
1: Start with any values for q_1, ..., q_n
2: repeat
3:   Compute x_0, ..., x_{n+1} according to Table I
4:   Compute q_1, ..., q_n according to Table I
5: until Stop criteria are met

The stop criterion can be a fixed number of iterations, or one can halt the algorithm when E(d, q) stops decreasing; in the latter case some kind of threshold has to be used. The algorithm was implemented by Romildo Silva according to Table I and applied to the well-known image of Lena. Originally with 256 colors, the image was quantized to 16, 8, 4 and 2 colors for comparison between the different possibilities for d. The median cut algorithm was also implemented for comparison. The results are presented in Figures 1 to 5 at the end of the paper.
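Since Table I itself is not reproduced above, the following sketch instantiates Algorithm 2 for the classical case d(x, y) = (x − y)², where the recurrences reduce to midpoint boundaries and histogram-weighted cell means; for other choices of d the two update steps change accordingly:

```python
import numpy as np

# Sketch of Algorithm 2 for the discrete scalar case, assuming
# d(x, y) = (x - y)**2, for which the Table I recurrences reduce
# to: boundaries x_k = (q_k + q_{k+1}) / 2 and levels q_k = the
# mean color of cell C_k, weighted by the histogram counts p_j.
def algorithm2(z, p, n, iterations=100):
    z = np.asarray(z, dtype=float)    # distinct colors z_j
    p = np.asarray(p, dtype=float)    # occurrence counts p_j
    # 1: start with any values for q_1, ..., q_n.
    q = np.linspace(z.min(), z.max(), n)
    for _ in range(iterations):                   # 2: repeat
        # 3: boundaries from the current levels.
        x = (q[:-1] + q[1:]) / 2
        cells = np.digitize(z, x)
        # 4: levels from the current boundaries.
        for k in range(n):
            w = p[cells == k]
            if w.sum() > 0:
                q[k] = np.average(z[cells == k], weights=w)
    return q                          # 5: until stop criterion
```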

ACKNOWLEDGMENTS
This research was developed in the VISGRAF laboratory at IMPA. The laboratory is sponsored by CNPq, FAPERJ, FINEP, and IBM Brasil. C. M. thanks the Institute for Signal Processing, University of Luebeck, where this work was finished.