Some comments on integrals involving the Dirac delta function

There are two different but equivalent ways of interpreting the integration of the Dirac delta function within the context of the theory of distributions. We discuss both of them and argue that the lesser known one is the most suitable for handling problems in physics and engineering where integrals involving the Dirac delta function appear. Examples are given to illustrate this point.


Introduction
In a recent contribution to RBEF, Akamu et al. [1] discussed some interesting issues concerning the Dirac delta function δ(x), in particular the value of

I = ∫_0^∞ δ(x) dx.    (1)

The authors discussed two possible choices for I, namely I = 1 and I = 1/2, as well as some consequences of each choice. As we can conclude from that discussion, this is not a simple problem, but it is an important one, since the Dirac delta function is a fundamental object in physics, engineering and mathematics. We would like to contribute to this discussion by arguing that behind this problem lies the interpretation of equation (1). What does the integration symbol in equation (1) mean? What is its definition? Understanding the meaning of the quantity represented by the integration symbol in equation (1) is a key point in answering what the numerical value of I is. We believe that a clear understanding of the meaning of mathematical expressions is a necessary condition not only to work correctly with these expressions but also to properly exploit their use in applications, which of course also involves their use in teaching in areas where the Dirac delta function is a fundamental object. This is what we want to discuss in this work. We will show that there are two possible ways of interpreting equation (1) in the context of Schwartz's theory of distributions. Both interpretations give the same result, but one of them is very similar to the usual definite integral of the ordinary calculus of real variable functions.
In order to present our interpretation of the problem we first need to recall some concepts from the theory of distributions, which we will do in Section 2. There is no single theory proposed to describe the nature of the Dirac delta function, but the most important and popular one is Schwartz's theory of distributions, and therefore the focus of our discussion will be on that theory. In Section 3 we will look at a possible interpretation of equation (1) using Schwartz's definition of the Dirac delta function, and analyse its consequences. In Section 4 we will discuss the concept of the value of a distribution at a point. This is the key concept for the introduction of the definite integral of a distribution in Section 5. We will interpret equation (1) and other variations of it as a definite integral and see how the problem in equation (1) can be handled.

* Correspondence email address: vaz@unicamp.br

Some facts about distributions
We will assume that the reader has some familiarity with the Dirac delta function, and we will consider the problem only in one dimension, so as to avoid overwhelming the notation. In Dirac's words ([2], p. 58), "δ(x) is not a function according to the usual mathematical definition of a function, which requires a function to have a definite value for each point in its domain, but is something more general, which we may call an 'improper function'". Schwartz showed that the Dirac delta function can be interpreted as an object called a distribution, which is justified since δ(x) can be interpreted as the density of a point source. We note in passing that although the denomination Dirac distribution is more appropriate than Dirac delta function, we will keep the latter for historical reasons. In order to define distributions, Schwartz started by defining a set of functions called test functions with two properties, that is, (i) they are infinitely differentiable functions, and (ii) they have compact support (that is, they are identically null for |x| > R for some finite R) along the real line. The set of test functions over the real line is denoted by D(R).
When we are interested in an open subset U ⊂ R, we denote the space of test functions by D(U).
A distribution is defined as a continuous linear functional on the space of test functions. Recall that a functional acting on a set associates a number with each member of that set. Thus, if φ ∈ D(R) is a test function and f is a distribution, then the action of f on φ, denoted by (f, φ), is a number (which we assume to be a real number). The space of distributions is denoted by D′(R). The Dirac delta function is the distribution δ such that

(δ, φ) = φ(0),

that is, its action on a test function φ gives its value at x = 0. Among the properties of the space of distributions, one should be highlighted: given a sequence of distributions {f_n}, if for every φ ∈ D(R) the numerical sequence (f_n, φ) converges as n → ∞, then the functional f defined by (f, φ) = lim_{n→∞} (f_n, φ) is a distribution. This is called the completeness theorem, and we say that f is the weak limit of the sequence {f_n}. A proof of this theorem can be found in [3].
This definition of the Dirac delta function, while mathematically sound, is far from intuitive. After Schwartz's approach, Mikusiński [4] presented an alternative formulation of the theory of distributions that turned out to be slightly more intuitive. Given a so-called delta sequence [5] {δ_n(x)}, we can construct a numerical sequence (δ_n, φ) as

(δ_n, φ) = ∫_{−∞}^∞ δ_n(x)φ(x) dx.

The quantity on the right-hand side (RHS) is an ordinary integral. Although it is written as an improper integral, it is in fact an integral along a finite interval, because φ(x) has compact support, so its existence presents no issues. A delta sequence is such that

lim_{n→∞} (δ_n, φ) = φ(0).

Thus, because of the completeness theorem, we can write

(δ, φ) = lim_{n→∞} (δ_n, φ).

This expression usually appears as

∫_{−∞}^∞ δ(x)φ(x) dx = lim_{n→∞} ∫_{−∞}^∞ δ_n(x)φ(x) dx.    (2)

In spite of the integration symbol, the expression on the LHS is not an integral. It is just a notation for (δ, φ). It can be thought of as the limit of the sequence of integrals on the RHS. Using this notation, we can write

∫_{−∞}^∞ δ(x)φ(x) dx = φ(0).    (3)

This is the approach of many textbooks, such as [6–8].
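As a numerical illustration of the weak limit above (not from the original; the Gaussian delta sequence and the bump test function below are our own choices), the following sketch approximates (δ_n, φ) by a trapezoidal sum and checks that it approaches φ(0):

```python
import numpy as np

def delta_n(x, n):
    # Gaussian delta sequence: delta_n(x) = (n / sqrt(pi)) exp(-n^2 x^2)
    return n / np.sqrt(np.pi) * np.exp(-(n * x) ** 2)

def phi(x):
    # A smooth bump test function with compact support [-1, 1]
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

# Trapezoidal approximation of (delta_n, phi) on the support of phi
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def pairing(n):
    f = delta_n(x, n) * phi(x)
    return float(np.sum(0.5 * (f[:-1] + f[1:])) * dx)

for n in (5, 50, 500):
    print(n, pairing(n))  # approaches phi(0) = exp(-1) ≈ 0.3679
```

As n grows, the Gaussian concentrates at the origin and the pairing converges to φ(0), the defining property of a delta sequence.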
One important characteristic of distributions is that they have derivatives of all orders. Considering, for example, the first-order derivative δ′(x), it is not difficult to see that

∫_{−∞}^∞ δ′(x)φ(x) dx = −φ′(0),    (4)

where the LHS is understood as

lim_{n→∞} ∫_{−∞}^∞ δ′_n(x)φ(x) dx.

Another important concept for us is that of the support of a distribution. Let U be an open subset of R and consider the set of test functions D(U). A test function φ_0 ∈ D(U) can be extended to a test function φ ∈ D(R) by defining

φ(x) = φ_0(x) for x ∈ U,    φ(x) = 0 for x ∉ U.

Given a distribution f ∈ D′(R), we can define its restriction to the space of distributions D′(U), denoted by f_0, as

(f_0, φ_0) = (f, φ).

We say that f is null on U when this restriction is the null distribution, and the support of f, denoted supp f, is the complement of the largest open set on which f is null. The support of the Dirac delta function centred at x_0 = 0 clearly is supp δ = {0}.

What is ∫_0^∞ δ(x) dx?
Let us first consider the expression

∫_{−∞}^∞ δ(x) dx = 1.    (5)

Can we look at equation (5) as a particular case of equation (3) by taking φ(x) = 1? At first glance, it appears that the answer must be no. In fact, the function 1 is not a test function (as it has no compact support), and therefore one might say that it makes no sense to write equation (3) with 1 in place of φ(x). However, as discussed at the end of the previous section, the Dirac delta function has a single-point support. Thus, we just need a test function φ(x) such that φ(x) = 1 for x ∈ (a, b) with a < 0 < b for equation (3) to make sense. There are many examples of such test functions (see, for example, [3]), and so we conclude that equation (5) holds from the very definition of the Dirac delta function. Now we turn our attention to equation (1). The interpretation that seems natural to us from the perspective of Schwartz's distribution theory is to look at equation (1) in the same way that we interpreted equation (5). So in this case, like in equation (2), the expression in equation (1) should be understood as

∫_0^∞ δ(x) dx = lim_{n→∞} ∫_0^∞ δ_n(x) dx.    (6)

Let us consider, for example, the well-known delta sequence [6–8]

δ_n(x) = (n/√π) e^{−n²x²}.    (7)

We can evaluate the integral on the RHS of equation (6), and more generally the action of δ_n on a test function φ, by writing

∫_0^∞ δ_n(x)φ(x) dx = (n/√π) ∫_0^ε e^{−n²x²}φ(x) dx + (n/√π) ∫_ε^∞ e^{−n²x²}φ(x) dx,    (8)

where ε is an arbitrary number such that ε > 0. The second integral vanishes in the limit n → ∞, and the substitution y = nx in the first one gives

lim_{n→∞} ∫_0^∞ δ_n(x)φ(x) dx = φ(0) (1/√π) ∫_0^∞ e^{−y²} dy,    (9)

and then

lim_{n→∞} ∫_0^∞ δ_n(x)φ(x) dx = (1/2) φ(0),    (10)

where we recall that ∫_0^∞ e^{−y²} dy = √π/2. Taking φ(x) = 1 in a neighbourhood of the origin, this result suggests that we have I = 1/2 in equation (1). But this is not all! When working with the Dirac delta function, we surely want to give a numerical value to equation (1), but we also want to work with the delta function doing other operations like, for example, changing variables in its argument and calculating its derivatives. In fact, recalling the applications of the Dirac delta function in electromagnetism, while δ(x) is used to model a charged point particle, its derivative δ′(x) is used to model a point dipole, its second derivative δ′′(x) a point quadrupole, and so on [9]. Thus we also need to define δ′(x), δ′′(x), etc.
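The suggestion I = 1/2 can be checked in closed form: for the Gaussian delta sequence, the half-line integral ∫_0^∞ δ_n(x) dx equals 1/2 for every n, not only in the limit. A small sketch of ours, using the error function erf:

```python
import math

def gaussian_delta_integral(n, a, b):
    # ∫_a^b (n/√π) exp(-n² x²) dx = (erf(n·b) − erf(n·a)) / 2
    return 0.5 * (math.erf(n * b) - math.erf(n * a))

for n in (1, 10, 1000):
    # the half-line result is independent of n: exactly 1/2
    print(n, gaussian_delta_integral(n, 0.0, math.inf))
```

The substitution y = nx behind equation (9) is exactly what makes the result n-independent here.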
Since δ(x) is the weak limit of the delta sequence {δ_n(x)}, its derivative δ′(x) should be the weak limit of the delta sequence {δ′_n(x)}, like in equation (4), that is,

∫_0^∞ δ′(x)φ(x) dx = lim_{n→∞} ∫_0^∞ δ′_n(x)φ(x) dx,

where δ′_n(x) denotes the ordinary derivative of δ_n(x). Integration by parts gives

∫_0^∞ δ′_n(x)φ(x) dx = −δ_n(0)φ(0) − ∫_0^∞ δ_n(x)φ′(x) dx,    (11)

where we used the fact that, φ being a test function, there exists R > 0 such that φ(x) = 0 for x > R. Using equation (7), it follows that

lim_{n→∞} ∫_0^∞ δ′_n(x)φ(x) dx = −φ(0) lim_{n→∞} (n/√π) − (1/2)φ′(0),    (12)

where we used equation (10). Thus we see that, in order for the limit on the RHS of equation (11) to converge, the test function φ(x) must be identically null at x = 0, to avoid the divergence of the term φ(0) lim_{n→∞} (n/√π) in equation (12). Using φ(0) = 0 in equation (10) gives

∫_0^∞ δ(x)φ(x) dx = 0.    (13)

It is important to remark that the problem is not related to our choice of delta sequence in equation (7). Our choice was motivated not only because it is one of the main examples of a delta sequence, but also to facilitate the analysis of the problem (from equation (8) to equation (10)). Thus, besides being infinitely differentiable, the test functions φ(x) must be such that φ(x) = 0 for x > R and φ(0) = 0, that is, test functions must have compact support [ε, R] with ε > 0. It is appropriate therefore to rewrite the lower limit of integration in equation (13) as 0+ to recall the limit ε → 0+. Since equation (13) holds for every φ ∈ D(0, ∞), we conclude that

∫_{0+}^∞ δ(x) dx = 0.    (14)

If we interpret the quantity on the RHS of equation (1) as the quantity on the LHS of equation (14), then we obtain I = 0. In fact, if we look at the role of the test functions, we should not be surprised by the above result. The role of test functions within Schwartz's theory of distributions is to provide the technical conditions necessary to build a calculus with distributions, and these conditions are to be infinitely differentiable functions with compact support on the set U where the distributions are defined.
In the above case, we are considering the set U as the positive real line, and so test functions must be identically null outside intervals of the form [a, b] with a > 0 and finite b. Therefore, given φ ∈ D(0, ∞) and f ∈ D′(0, ∞), we will have (f, φ) = 0 if supp f = {0}, as is the case for the Dirac delta function.
We can think of a way around this problem by considering test functions on the interval (−ε, ∞) with ε > 0. In this setting, instead of equation (6), we can write

∫_{−ε}^∞ δ(x) dx = lim_{n→∞} ∫_{−ε}^∞ δ_n(x) dx.

Using the delta sequence in equation (7), then instead of equation (8) we have

∫_{−ε}^∞ δ_n(x) dx = (n/√π) ∫_{−ε}^∞ e^{−n²x²} dx,

and instead of equation (9),

∫_{−ε}^∞ δ_n(x) dx = (1/√π) ∫_{−nε}^∞ e^{−y²} dy.

Thus in the limit n → ∞ we have

lim_{n→∞} ∫_{−ε}^∞ δ_n(x) dx = 1,

where we used ∫_{−∞}^∞ e^{−y²} dy = √π. A slightly more compact notation for the above expression is

∫_{0−}^∞ δ(x) dx = 1.    (15)

In relation to δ′(x), since φ(−ε) = 0, an analogous calculation to the one following equation (11) gives

∫_{0−}^∞ δ′(x)φ(x) dx = −φ′(0).

So, in conclusion, since there are test functions on the interval (−ε, ∞) such that φ(x) = 1 for |x| < ε′ with ε′ < ε, we can indeed write equation (15). If we interpret the quantity on the RHS of equation (1) as the quantity on the LHS of equation (15), then we obtain I = 1.
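The effect of moving the lower limit slightly to the left can be seen with the same closed form: for any fixed ε > 0, ∫_{−ε}^∞ δ_n(x) dx → 1 as n → ∞. A sketch of ours (the particular value of ε is arbitrary):

```python
import math

def gaussian_delta_integral(n, a, b):
    # ∫_a^b (n/√π) exp(-n² x²) dx = (erf(n·b) − erf(n·a)) / 2
    return 0.5 * (math.erf(n * b) - math.erf(n * a))

eps = 0.01  # fixed, however small
for n in (10, 100, 10000):
    print(n, gaussian_delta_integral(n, -eps, math.inf))  # → 1 as n grows
```

For fixed ε the Gaussian eventually sits entirely inside (−ε, ∞), so the full unit mass is captured.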
In relation to the above interpretation of ∫_{0−}^∞ δ(x) dx, it is worth recalling a similar situation in ordinary differential and integral calculus courses. In a basic course on differential equations, when studying the theory of Laplace transforms, the student learns that [10]

L{δ(t)} = ∫_0^∞ δ(t) e^{−st} dt = 1.

This is a well-established result. Indeed, a result other than 1 would give a wrong solution for an initial value problem. As discussed, for example, in [11, 12], the Laplace transform in this case has to be interpreted as

L{δ(t)} = ∫_{0−}^∞ δ(t) e^{−st} dt = 1.

Then, for s = 0, we obtain equation (15).
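The Laplace-transform interpretation can be probed the same way: replacing δ(t) by a Gaussian delta sequence and integrating from −ε, the transform tends to 1 for any fixed s. A closed-form sketch of ours, obtained by completing the square in the exponent:

```python
import math

def laplace_delta_n(n, s, eps):
    # ∫_{-eps}^∞ (n/√π) e^{-n² t²} e^{-s t} dt
    # complete the square: -n² t² - s t = -(n t + s/(2n))² + s²/(4n²)
    return 0.5 * math.exp((s / (2 * n)) ** 2) * (1.0 - math.erf(-n * eps + s / (2 * n)))

for n in (10, 100, 1000):
    print(n, laplace_delta_n(n, s=2.0, eps=0.1))  # → 1, matching L{δ} = 1
```

With lower limit 0 instead of −ε the same computation gives 1/2 in the limit, which is exactly why the 0− interpretation is needed for the textbook result.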
So have we solved the problem concerning equation (1)? With the above interpretation, we can say yes. But are we satisfied with this interpretation? Or is there any other possible interpretation for equation (1)? As we will see, yes, there is another one, and we will discuss it below.

The value of a distribution at a point
The value of a distribution acting on a test function does not depend on a particular point but on a whole interval U, which can be the entire real line. In spite of this, Łojasiewicz [13] showed that it is possible to define a quantity that can be identified as the value of a distribution at a point.
Let us recall that, among the operations we can do with a distribution, the change of variable by an affine transformation is well defined and a standard result in the theory of distributions. Given a distribution f(x), we define the distribution f(ax + b), a ≠ 0, as [14]

(f(ax + b), φ(x)) = (1/|a|) (f(x), φ((x − b)/a)).

Łojasiewicz [13] defined the value of a distribution f at a point x_0 as follows: if there exists a constant c such that

lim_{ε→0+} (f(x_0 + εx), φ(x)) = c ∫_{−∞}^∞ φ(x) dx    (16)

for every φ ∈ D(R), then c is the value of the distribution f at the point x_0, and we write f(x_0) = c. We say that f has a jump behaviour [15] if

lim_{ε→0+} (f(x_0 + εx), φ(x)) = (c_− H(−x) + c_+ H(x), φ(x)),    (17)

with c_− ≠ c_+, for every φ ∈ D(R). Note that if c_− = c_+ then we recover the definition of Łojasiewicz with c = c_− = c_+. If we use the limit ε → 0− on the LHS of equation (17), then on the RHS we have to use (c_− H(x) + c_+ H(−x), φ(x)). We usually write c_± = f(x_0±).

Let us see some examples.
Example 1: the Dirac delta function. We have

(δ(x_0 + εx), φ(x)) = (1/ε) (δ(x), φ((x − x_0)/ε)) = (1/ε) φ(−x_0/ε).

Recall that there exists R > 0 such that φ(x) = 0 for |x| > R. Thus, if x_0 ≠ 0, then we have, for all ε < |x_0|/R, that φ(−x_0/ε) = 0, and then

lim_{ε→0+} (δ(x_0 + εx), φ(x)) = 0.

On the other hand, if x_0 = 0, then we have

(δ(εx), φ(x)) = (1/ε) φ(0).

Thus, according to the Łojasiewicz definition in equation (16), the Dirac delta function has the value 0 at every x_0 ≠ 0, but at x_0 = 0 the Łojasiewicz point value does not exist.
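The vanishing of the point value of δ at x_0 ≠ 0 is easy to see concretely: the pairing reduces to (1/ε)φ(−x_0/ε), which is identically zero once ε < |x_0|/R. A tiny sketch of ours, with a bump function supported on [−1, 1] (so R = 1):

```python
import math

def phi(x):
    # smooth bump test function, support [-1, 1] (so R = 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

x0 = 0.3
for eps in (0.5, 0.4, 0.2, 0.1):
    # the scaled pairing (1/eps) * phi(-x0/eps):
    # nonzero for eps > |x0|/R = 0.3, exactly zero below it
    print(eps, phi(-x0 / eps) / eps)
```

No limiting procedure is even needed: the quantity is exactly zero for all sufficiently small ε.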
Example 2: the Heaviside distribution. Let us consider the Heaviside step function, or better speaking, the Heaviside distribution, given by

(H, φ) = ∫_{−∞}^∞ H(x)φ(x) dx = ∫_0^∞ φ(x) dx.

Then we have

(H(x_0 + εx), φ(x)) = (1/|ε|) (H(x), φ((x − x_0)/ε)).

In order to proceed we need to distinguish the cases ε → 0+ and ε → 0−. Firstly, let us assume that ε > 0. Then we have

(H(x_0 + εx), φ(x)) = ∫_{−x_0/ε}^∞ φ(y) dy,

which, for x_0 > 0, tends to ∫_{−∞}^∞ φ(y) dy as ε → 0+, and, for x_0 < 0, tends to 0. The case ε < 0 is analogous. Since both limits ε → 0+ and ε → 0− give the same result, we can write

lim_{ε→0} (H(x_0 + εx), φ(x)) = H(x_0) ∫_{−∞}^∞ φ(x) dx,   x_0 ≠ 0,    (18)

and then the point value of the Heaviside distribution at x_0 > 0 is 1 and at x_0 < 0 is 0. This result can be generalized: for a smooth function f(x), the distribution f(x)H(x) has the point value f(x_0)H(x_0) at any x_0 ≠ 0. We leave the proof to the reader.
The case x_0 = 0 deserves special attention. Note that we have

(H(εx), φ(x)) = (H(x), φ(x)) for ε > 0,    (H(εx), φ(x)) = (H(−x), φ(x)) for ε < 0,

and then we obtain

lim_{ε→0+} (H(εx), φ(x)) = (H(x), φ(x)),    lim_{ε→0−} (H(εx), φ(x)) = (H(−x), φ(x)).

Thus from equation (17) we see that H(0+) = 1 and H(0−) = 0. Therefore there is no value of the Heaviside distribution at x_0 = 0. In fact, we recall that the value of the Heaviside step function H(x) at 0 is completely irrelevant for the sake of defining the Heaviside distribution H. Therefore, since there is no single possible value for the Heaviside step function at 0, it should come as no surprise that the process of assigning a value to the Heaviside distribution at x_0 = 0 does not provide a definite value.
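All three cases can be checked numerically: with a bump test function, the normalized pairing (H(x_0 + εx), φ)/∫φ equals 1 for x_0 > 0 and 0 for x_0 < 0 once ε is small enough, while at x_0 = 0 it reflects the jump H(0−) = 0, H(0+) = 1 (for our even bump it sits near the midpoint 1/2). A sketch of ours:

```python
import numpy as np

def bump(x):
    # smooth (even) bump test function supported on [-1, 1]
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
phi = bump(x)
phi_mass = np.sum(phi) * dx

def H_value(x0, eps):
    # approximates (H(x0 + eps*x), φ) / ∫φ
    H = (x0 + eps * x > 0).astype(float)
    return float(np.sum(H * phi) * dx / phi_mass)

print(H_value(0.5, 0.001))   # 1.0: point value of H at x0 > 0
print(H_value(-0.5, 0.001))  # 0.0: point value of H at x0 < 0
print(H_value(0.0, 0.001))   # ≈ 0.5 for this even φ: the jump splits the mass
```

Once ε is small enough that the rescaled support does not straddle the jump, the pairing is exact, not merely approximate.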

Example 3: regular distributions.
For a distribution f associated with an integrable function f(x), we have

(f(x_0 + εx), φ(x)) = ∫_{−∞}^∞ f(x_0 + εx) φ(x) dx.

Let us suppose that f(x) is differentiable. Using the Taylor theorem with remainder, we can write

f(x_0 + εx) = f(x_0) + εx f′(ξ),

where ξ is a number such that ξ ∈ [x_0, x_0 + εx]. Since ξ depends on x, we will write f′(ξ) as g(x), that is,

f(x_0 + εx) = f(x_0) + εx g(x).

Then

lim_{ε→0+} (f(x_0 + εx), φ(x)) = f(x_0) ∫_{−∞}^∞ φ(x) dx + lim_{ε→0+} ε ∫_{−∞}^∞ x g(x) φ(x) dx = f(x_0) ∫_{−∞}^∞ φ(x) dx,

and consequently we obtain f(x_0) as the value of the distribution f at x_0. If f(x) is not differentiable at a point x_0 but the one-sided limits f(x_0−) and f(x_0+) exist, then we can make use of equation (17). Let us write

(f(x_0 + εx), φ(x)) = ∫_{−∞}^0 f(x_0 + εx) φ(x) dx + ∫_0^∞ f(x_0 + εx) φ(x) dx.

Note that, for ε > 0, in the integral along the negative real axis the argument of the function f in the integrand takes values in the interval (−∞, x_0), and in the integral along the positive real axis the argument of the function f in the integrand takes values in the interval (x_0, ∞). Along the intervals (−∞, x_0) and (x_0, ∞) the function f(x) is by assumption differentiable, so we can use the Taylor expansion as we did above. Proceeding in an analogous way, we have

lim_{ε→0+} (f(x_0 + εx), φ(x)) = (f(x_0−) H(−x) + f(x_0+) H(x), φ(x)),

and therefore f has a jump behaviour at x_0 with c_− = f(x_0−) and c_+ = f(x_0+). To conclude this section, we would like to comment that there is an alternative and very interesting formulation of the problem of the value of a distribution at a point due to Ferreira. We will not discuss this approach, but we recommend the interested reader reference [16].
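The differentiable case above can be checked numerically: the normalized pairing (f(x_0 + εx), φ)/∫φ approaches f(x_0) as ε → 0+. A sketch of ours, with f(x) = sin x and a bump test function (both arbitrary choices):

```python
import numpy as np

def bump(x):
    # smooth test function supported on [-1, 1]
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

x = np.linspace(-1.0, 1.0, 400001)
dx = x[1] - x[0]
phi = bump(x)
phi_mass = float(np.sum(phi) * dx)  # ∫ φ(x) dx

def loja_value(f, x0, eps):
    # approximates (f(x0 + eps*x), φ) / ∫φ
    return float(np.sum(f(x0 + eps * x) * phi) * dx) / phi_mass

x0 = 0.7
for eps in (0.1, 0.01, 0.001):
    print(eps, loja_value(np.sin, x0, eps))  # → sin(0.7) ≈ 0.6442
```

As ε shrinks, the rescaled distribution sees only an infinitesimal neighbourhood of x_0 and the ordinary point value is recovered.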

The integral of a distribution
Suppose a distribution f has a primitive F, that is, a distribution F such that F′ = f. If we can define the value of a distribution at a point, then we can define an operation analogous to the definite integral of an ordinary function. Given a distribution f and a primitive F of f, we define

⨍_a^b f = F(b) − F(a),    (19)

if the distribution F has a value F(a) and a value F(b) at the points a and b, respectively. The use of a different integral symbol (⨍) to denote this integral of f is obviously intentional, as it is a different concept from the one used so far, like in equation (2). Moreover, the use of the integration symbol without reference to the integration variable is quite appropriate in the present case, although this is sometimes also used in ordinary differential and integral calculus, as for example in [17].
In [13] Łojasiewicz proved that if a distribution has a value at a point a, then its primitive also has a value at that point. So, if f is the distribution associated with an integrable function f (x), and as in Example 3 in the previous section the distribution f has a value f (x 0 ) at a point x 0 , then the primitive F of f will also have a value at point x 0 , which is F (x 0 ), and the integral in equation (19) gives the same numerical result as the integral of the integrable function f (x) in the ordinary calculus.
Let us see some examples of equation (19).

Example 1: the integral of the Dirac delta function. Since H′ = δ, a primitive of the Dirac delta function is the Heaviside distribution H, and then

⨍_a^b δ = H(b) − H(a) = 1,   a < 0 < b,

where we used the values of the Heaviside distribution as in equation (18). Taking the appropriate limits, we have

⨍_{−∞}^b δ = H(b) = 1,   b > 0,

and

⨍_{−∞}^∞ δ = 1.    (20)

Note that, although the quantities in equation (5) and in equation (20) have the same numerical value, they have different interpretations. In fact, we do not need to worry about interpreting the integral of the delta function for different intervals, even taking test functions defined along the whole real line. We need the test functions defined along the entire line to assign a value to the distribution at a point, but after that we only need the values of the distribution at the given points to calculate the integral of the distribution. In relation to the integral along the positive real line, we have

⨍_0^∞ δ = 1 − H(0).    (21)

As we have seen in the previous section, the value of the Heaviside distribution at x_0 = 0 is not defined. However, if one accepts to use its average value at x_0 = 0, that is, H(0) = 1/2, then

⨍_0^∞ δ = 1/2.    (22)

In our opinion, there is no justification for this choice based on what we have discussed so far, and consequently no alternative other than to understand that integral as not defined in the context of Schwartz's theory of distributions. However, we do not rule out the possibility that such a justification might exist. For example, recalling the Fourier theorem [18], we know that the inverse Fourier transform returns the average value of the original function at a discontinuity point, so from this point of view it is natural to use H(0) = 1/2. On the other hand, in Mikusiński's sequential approach to distributions [19, 20], where the Dirac delta function is defined as an equivalence class of delta sequences, and not as the weak limit of a delta sequence as in Schwartz's theory, it is possible to attribute a value to the Heaviside distribution at x_0 = 0, which is 1/2 as one might expect. However, this is a different theory, and each analysis has to be done within the scope of each theory.
A discussion of Mikusiński's theory is completely outside the objectives of this work, and we refer the interested reader to [19, 20].
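In code, equation (19) applied to the Dirac delta function reduces to bookkeeping with the point values of a primitive; the sketch below (our own, with hypothetical helper names) also makes the undefined case with lower limit 0 explicit:

```python
def H_value(x):
    # Łojasiewicz point value of the Heaviside distribution;
    # it does not exist at x = 0
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    raise ValueError("H has no point value at 0")

def delta_integral(a, b):
    # the integral of delta over [a, b] in the sense of equation (19):
    # H(b) - H(a), defined only when both point values exist
    return H_value(b) - H_value(a)

print(delta_integral(-1.0, 2.0))  # 1.0
print(delta_integral(1.0, 2.0))   # 0.0
try:
    delta_integral(0.0, 2.0)      # undefined in Schwartz's theory
except ValueError as e:
    print(e)
```

No test functions appear at this stage: they were needed only once, to establish the point values of H.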
Example 2: the filtering property of the Dirac delta function. Let us see how equation (3) fits in the definition of the integral in equation (19). Given a test function φ, we have δ(x)φ(x) = φ(0)δ(x), so that a primitive of the distribution δ(x)φ(x) is φ(0)H(x). Recalling that a < 0 and b > 0, we have

⨍_a^b δ(x)φ(x) = φ(0)(H(b) − H(a)) = φ(0).

Again we do not need to worry about a and b as long as a < 0 and b > 0, and so

⨍_{−∞}^∞ δ(x)φ(x) = φ(0).    (23)

Example 3: integral involving the derivative of the Dirac delta function. Given a test function φ, integration by parts carries over to the integral in equation (19), giving

⨍_a^b δ′(x)φ(x) = δ(b)φ(b) − δ(a)φ(a) − ⨍_a^b δ(x)φ′(x).

We have seen that the value of δ at any x_0 ≠ 0 is 0, and then, using equation (23) with φ′ in place of φ,

⨍_a^b δ′(x)φ(x) = −φ′(0),

as expected.

Conclusions
What is the integral of the Dirac delta function on the interval between points a and b? As we saw in this work, there are two ways to interpret it: one as

∫_a^b δ(x) dx = lim_{n→∞} ∫_a^b δ_n(x) dx,    (24)

and the other as

⨍_a^b δ = H(b) − H(a).    (25)
In Section 3 we discussed the subtleties involved in equation (24). This equation is not like a simple integral of a function. It lies at the foundation of all of Schwartz's theory of distributions. In other words, equation (24) must be seen as a particular case of

(δ, φ) = ∫_a^b δ(x)φ(x) dx.
All the mathematical operations that we do with distributions are done using the definition of these operations on test functions. So there is a lot more to equation (24) than the numerical value of the integral of the Dirac delta function on the interval between a and b.
On the other hand, in Section 5 we discussed the concept of integral in equation (25), based on the concept of the value of a distribution at a point discussed in Section 4. With these concepts, the integral in equation (25) is very similar to the ordinary integral of a real variable function. Note that we need the test functions to determine the values of the distributions at the points of interest, but not to define the integral itself, unlike in equation (24). Therefore, the concept of integral in equation (25) appears to us much simpler and much closer to what we already know from ordinary calculus.
Based on what we have discussed in this paper, we advocate the use of the definition of the integral in equation (25), which is much simpler and more intuitive than equation (24). Although both definitions give the same numerical result, the interpretations are different, and in the case of equation (25) it recalls the usual one in ordinary calculus. Using equation (25) we obtained equation (21), and the value for I in equation (1) depends only on how we understand the lower limit of integration. If the lower limit of integration is the limit to 0 from the left, then we have I = 1 because H(0−) = 0. On the other hand, if the lower limit of integration is the limit to 0 from the right, then we have I = 0 because H(0+) = 1. Since both one-sided limits are different, it is no surprise that the Łojasiewicz value of H at x = 0 does not exist. However, if there is a way to attribute a value to the Heaviside distribution at x = 0 (like using the Fourier transform, for example), then we would obtain equation (22), which is a result we do not see how to obtain using the definition in equation (24).