1. INTRODUCTION

In recent years, the advent of new sensors, measurement technologies, and social network infrastructure has provided huge opportunities to visualize complicated interconnected network structures, record data of interest at various locations in such networks, analyze such data, and make inferences and diagnostics. We can easily observe such network-based problems in truly diverse fields: biology and medicine (e.g., voltages at dendrites connected to neurons, blood flow rates in a network of blood vessels); computer science (e.g., Internet traffic, email correspondences among user accounts); electrical engineering (e.g., sensor networks); hydrology and geology (e.g., river flow measurements in a ramified river network); and civil engineering (e.g., traffic flow on a road network), to name just a few. Consequently, there is an explosion of interest in and demand for analyzing data sampled on such irregular grids, graphs, and networks, as evidenced by many recent special issues of journals.1–4 What about mathematical and computational tools for analyzing such datasets? Traditional harmonic analysis tools such as Fourier and wavelet transforms have been the ‘crown jewels’ for analyzing regularly-sampled data. They have found widespread use in a variety of applications, some of the most common being data compression, image analysis, and statistical signal processing. However, these conventional harmonic analysis tools were originally developed for functions on simple Euclidean domains (e.g., a rectangle) or signals sampled on regular lattices, and they cannot directly handle datasets recorded on general graphs and networks.
Hence, the community of applied and computational harmonic analysts has recognized the importance of transferring these tools to the graph setting, resulting in many efforts to extend classical wavelets to the ever-expanding realm of data on graphs.5–14 A fundamental difficulty in extending wavelets to the graph setting is that we lack a true notion of frequency. Indeed, much of classical signal processing relies on our ability to view a signal through two complementary lenses: time and frequency. Without a notion of frequency, we cannot apply the Littlewood-Paley theory (i.e., the dyadic partitioning of the frequency domain), which is the theoretical foundation of classical wavelets. Therefore, a common strategy has been to develop wavelet-like transforms rather than trying to directly transfer classical wavelets to the graph setting. We will review some of these transforms in Section 2.2. The objective of this article is twofold. First, we review some basic concepts for understanding data analysis on graphs and the challenges that lie therein. Second, we introduce our own contributions to harmonic analysis on graphs: generalizations of overcomplete, multiscale dictionaries from classical signal processing, including wavelet packets and local trigonometric transforms. We conclude with some applications.

2. NOTATION AND REVIEWS

In this section, we will first establish our notation and define some important quantities which will be used throughout this article. Next, we will review some of the recent wavelet-like transforms proposed by other groups, and discuss the fundamental problem of the lack of a true frequency domain in the graph setting. Finally, we will briefly review classical dictionaries of multiscale bases and the corresponding concept of a “best basis.”

2.1 Laplacians on graphs

Let us fix our notation and basic concepts in graph theory first. We mostly follow the commonly used notation; see e.g., Ref. 15 for undirected graphs.
Let G = (V, E) be a graph, where V = V(G) = {v1, v2,…, vn} is the vertex set of G and E = E(G) = {e1, e2,…, em} is its edge set, where ek connects two vertices vi, vj for some 1 ⩽ i, j ⩽ n. We only deal with finite n and m in this article. For simplicity, we often write i instead of vi. A subgraph H of a graph G is a graph such that V(H) ⊆ V(G) and E(H) ⊆ E(G). An edge connecting a vertex i ∈ V and itself is called a loop. If there exists more than one edge connecting some i, j ∈ V, then they are called multiple edges. A graph having loops or multiple edges is called a multiple graph (or multigraph); a graph with neither of these is called a simple graph. An edge e ∈ E connecting two distinct vertices i, j ∈ V may or may not have a direction. If e is directed from i to j, then we write e = (i, j), and i, j are called the tail and the head of e, respectively. A directed edge is also referred to as an arc. If e is an undirected edge between i and j, then we write e = ij, and i, j are called the endpoints of e. We also say an edge e = ij is incident with i and j, and e joins i and j. If e = ij, then i, j are said to be adjacent and we write i ~ j. If e = (i, j), then i is adjacent to j, but not vice versa, and we write i ↝ j. If each edge in E is directed, G is called a directed graph or digraph. An undirected graph is a graph in which none of its edges has a direction. A mixed graph is a graph with some directed and some undirected edges. If each edge e ∈ E has a weight (normally nonnegative), written as we, then G is called a weighted graph. G is said to be unweighted if we = 1 for each e ∈ E. For some problems edge weights are given, while for others one needs to define edge weights carefully. A path from i to j in a graph G is a subgraph of G formed by taking a sequence of distinct vertices, starting with i and ending with j, and a minimal set of edges such that consecutive vertices are adjacent.
A path starting from i that returns to i (but is not a loop) is called a cycle. For any two vertices in V, if there is a path connecting them, then such a graph is said to be connected. In the case of a directed graph, it is said to be strongly connected. A tree is a connected graph without cycles, and is often denoted by T instead of G. For a tree T, we have |E(T)| = |V(T)| − 1, where |·| denotes the cardinality of a set. We say the length (or cost) ℓ(P) of a path P is the sum of its corresponding edge weights, i.e., ℓ(P) := Σe∈E(P) we. Let Pij be the set of all possible paths from i to j in G. The graph distance from i to j is defined by d(i, j) := minP∈Pij ℓ(P). Clearly, for an undirected graph, we always have d(i, j) = d(j, i), but that is not the case for a directed graph in general. We say two graphs are isomorphic if there is a one-to-one correspondence between the vertex sets such that if two vertices are joined by an edge in one graph, the corresponding vertices are also joined by an edge in the other graph. Let us now define a few important matrices associated with a given graph G. The (weighted) adjacency matrix of G, denoted by W(G) = (wij) ∈ ℝn×n, is a matrix consisting of the edge weights in G. Note that wij = 0 if ij ∉ E(G) (undirected) or (i, j) ∉ E(G) (directed). It is clear that W(G) is symmetric if G is undirected. The degree of vertex i is defined as d(i) = di := Σj wij, i.e., the ith row sum of W(G). If G is directed, then the above definition of the degree is also called the outdegree of vertex i, denoted by dout(i). In that case, one can also define the indegree of vertex i by din(i) := Σj wji, i.e., the ith column sum of W(G). Occasionally, we need the unweighted versions of the adjacency matrix and degrees of a weighted graph. These versions are often called the combinatorial adjacency matrix A(G) and the combinatorial (out)degrees, respectively.
In other words, Aij(G) = 1 if Wij(G) > 0 and Aij(G) = 0 otherwise; the combinatorial (out)degree of i is the ith row sum of A(G), which is the number of edges incident with i. The combinatorial indegree of i is the ith column sum of A(G). Hence, the earlier definition of d(i) = di is also referred to as a weighted degree. The degree matrix D(G) of G is the diagonal matrix diag(d1,…, dn). Sometimes, each vertex is associated with spatial coordinates in ℝp. For example, if we want to analyze a network of sensors and build a graph whose vertices represent the sensors under consideration, then these vertices have spatial coordinates in ℝ2 or ℝ3 indicating their current locations. In that case, we write xi ∈ ℝp for the location of vertex i. Let f = (f(1),…, f(n))T ∈ ℝn be a data vector, where f(i) ∈ ℝ is the value measured at the vertex i of the graph. Let us also define 1n := (1,…, 1)T ∈ ℝn for future use. Similarly, for a subset of the vertices X ⊆ V(G), we define 1X to be the vector which is one at all positions corresponding to vertices in X and zero elsewhere. In order to analyze a graph, we often utilize the eigenvalues and eigenvectors of its (unnormalized) Laplacian matrix, defined as L(G) := D(G) − W(G). Two other versions of the Laplacian matrix which are often used in the literature are the random-walk normalized Laplacian and the symmetrically normalized Laplacian: Lrw(G) := D(G)^−1 L(G) and Lsym(G) := D(G)^−1/2 L(G) D(G)^−1/2. Which version should be used depends on the application. For example, von Luxburg16 recommends the use of Lrw for graph partitioning. We let λ0 ≤ λ1 ≤ … ≤ λn−1 be the sorted eigenvalues of L(G), and we denote by ϕ0, ϕ1,…, ϕn−1 their corresponding eigenvectors. Let mG(λ) denote the multiplicity of λ as an eigenvalue of L(G), and let mG(I) be the number of eigenvalues of L(G), multiplicities included, that belong to I ⊂ ℝ, an interval of the real line. The corresponding quantities for Lrw(G) are denoted by the superscript rw, i.e., λk^rw, mG^rw(λ), and mG^rw(I), and similarly for Lsym. It is easy to show that Lrw and Lsym have the same eigenvalues, and that ϕ is an eigenvector of Lrw if and only if D^1/2 ϕ is an eigenvector of Lsym with the same eigenvalue. (See Ref.
16 for additional details of the relationship between these three matrices and their spectral properties.) The Laplacian eigenvalues are useful because they reflect various intrinsic topological and geometric features of the graph (e.g., connectivity, the diameter, i.e., the maximum distance over all pairs of vertices, the mean distance, the Cheeger constant,17 etc.). Some simple yet important facts about the Laplacian eigenvalues are the following: for each of these three Laplacian matrices, all the eigenvalues are nonnegative, the smallest eigenvalue λ0 = 0, and its multiplicity mG(0) ≥ 1 equals the number of connected components of G. For L and Lrw, the zeroth eigenvector is constant: ϕ0 = 1n/√n. Moreover, the smallest positive eigenvalue of the unnormalized Laplacian L(G) is called the algebraic connectivity of G and serves as a quantitative measure of graph connectivity.18–20 For each of the three Laplacian matrices, the eigenvector corresponding to the smallest nonzero eigenvalue is referred to as the Fiedler vector, and it plays a key role in spectral clustering and spectral graph partitioning.16,21 (See Sections 3 and 4 for our use of the Fiedler vectors of graphs.) The full set of Laplacian eigenvectors provides us with additional information for graph partitioning via the celebrated Courant nodal domain theorem.22,23 Moreover, it also provides an orthonormal basis which can be used for representation, approximation, and analysis of data on G. The simple path Pn consisting of n vertices, shown in Figure 1, provides an important insight for the development of our new multiscale transforms. The graph Laplacian of such a path graph is easily obtained and is instructive: L(Pn) is the tridiagonal matrix whose diagonal is (1, 2, …, 2, 1) and whose sub- and superdiagonal entries are all −1. As pointed out in Ref. 24, the eigenvectors of this matrix are nothing but the DCT Type II basis vectors, which are used in the JPEG image compression standard; see e.g., Ref. 25 for more about the DCT.
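As a concrete illustration of these definitions, the following numpy sketch builds W, D, and the three Laplacian variants for a small weighted graph (the 4-vertex graph and its edge weights are invented purely for illustration) and verifies the spectral facts just stated:

```python
import numpy as np

# Hypothetical weighted undirected graph on 4 vertices (weights invented):
# edges 1-2 (w=2), 1-3 (w=1), 2-4 (w=3), 3-4 (w=1), written 0-based below.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 0., 3.],
              [1., 0., 0., 1.],
              [0., 3., 1., 0.]])
d = W.sum(axis=1)                        # weighted degrees d_i (row sums)
D = np.diag(d)

L = D - W                                # unnormalized Laplacian
Lrw = np.linalg.inv(D) @ L               # random-walk normalized Laplacian
Dih = np.diag(1.0 / np.sqrt(d))
Lsym = Dih @ L @ Dih                     # symmetrically normalized Laplacian

evals, evecs = np.linalg.eigh(L)         # eigenvalues in ascending order
assert evals[0] > -1e-8                  # all eigenvalues are nonnegative
assert abs(evals[0]) < 1e-8              # lambda_0 = 0
assert np.allclose(evecs[:, 0], evecs[0, 0])   # phi_0 is constant
```
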
More precisely, we have

λk = 4 sin²(kπ/2n), ϕk(x) = ak;n cos(kπ(x − 1/2)/n), x = 1,…, n,   (4)

where ϕk = (ϕk(1),…, ϕk(n))T is the eigenvector corresponding to λk, and ak;n is a normalization constant that ensures ‖ϕk‖2 = 1 for k = 0, 1,…, n − 1. From (4), it is clear that these eigenvectors are simple global oscillations and do not reveal wavelet-like localizations. In addition, it is straightforward to show that the eigenvectors of Lsym(Pn) are the DCT Type I basis vectors. Also, for the unweighted cycle Cn consisting of n vertices, it is easy to see that the eigenvectors of L(Cn) are exactly the n-point DFT basis vectors. Finally, for a graph which is the Cartesian product of two or more graphs, the Laplacian eigenvectors are the tensor products of the eigenvectors of those underlying graphs.

2.2 Multiscale wavelet-like basis functions on graphs

For both the development and theory of conventional wavelets on regular lattices and rectangular domains in ℝp, the Fourier series and transforms on those domains have played a significant role; see e.g., Ref. 26 (Chap. 2) and the references therein. Hence, when attempting to develop wavelet theory for graphs and networks, some researchers have used Laplacian eigenvalues and eigenfunctions in place of the frequencies and complex exponentials, respectively, used in classical Fourier theory. While it is tempting to do so, there are at least two fundamental problems in viewing the Laplacian eigenfunctions as the equivalent of the Fourier basis functions for building wavelet-like multiscale basis functions. First, it is difficult to know the essential support of the Laplacian eigenvectors a priori, which strongly depends on the structure of the graph: sometimes they are completely global, like those of Pn, whereas in other cases they may be quite localized, e.g., on dendritic trees of neurons.24,27 Hence, it is worth controlling the support of the eigenvectors explicitly. (In fact, this observation has led us to our HGLET construction using recursive graph partitioning, as discussed in Section 3.)
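Before turning to the second problem, we note that the eigenpairs of L(Pn) and their DCT Type II structure described in (4) are easy to verify numerically; a short sketch (eigenvector signs are arbitrary, so the comparison allows either sign):

```python
import numpy as np

n = 8
# Unnormalized Laplacian of the unweighted path P_n: tridiagonal, with
# diagonal (1, 2, ..., 2, 1) and -1 on the sub- and superdiagonals.
L = (np.diag([1.] + [2.] * (n - 2) + [1.])
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
evals, evecs = np.linalg.eigh(L)

k = np.arange(n)
assert np.allclose(evals, 4 * np.sin(k * np.pi / (2 * n)) ** 2)

# Eigenvectors coincide (up to sign) with the normalized DCT-II basis vectors
x = np.arange(1, n + 1)
for kk in range(n):
    dct2 = np.cos(kk * np.pi * (x - 0.5) / n)
    dct2 /= np.linalg.norm(dct2)
    phi = evecs[:, kk]
    assert np.allclose(phi, dct2) or np.allclose(phi, -dct2)
```
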
The second problem with viewing the Laplacian eigenfunctions as the equivalent of the Fourier basis functions is the intricate relationship between the frequencies and the Laplacian eigenvalues. For very simple 1-D graphs such as Pn and Cn, the eigenvectors are the Fourier basis vectors and the eigenvalues are a nondecreasing function of their corresponding frequencies, as shown in (4) in the case of Pn. Consequently, on Pn as well as Cn, we can develop the classical wavelets using the Littlewood-Paley theory [26, Chap. 2] by appropriately partitioning the eigenvalue axis into blocks and combining the corresponding eigenfunctions. However, as soon as the domain becomes even slightly more complicated, the situation completely changes: we cannot view the eigenvalues as a simple monotonic function of frequency anymore. For example, consider a long but thin strip in ℝ2, and suppose that the domain is discretized as Pm × Pn (m > n). In this case, using (4), its Laplacian eigenpairs are

λ(kx,ky) = 4 sin²(kxπ/2m) + 4 sin²(kyπ/2n), ϕ(kx,ky)(x, y) = akx;m aky;n cos(kxπ(x − 1/2)/m) cos(kyπ(y − 1/2)/n),   (5)

where k = 0,…, mn − 1; kx = 0,…, m − 1; ky = 0,…, n − 1; x = 1,…, m; and y = 1,…, n. As always, let {λk} be ordered in a nondecreasing manner. In this case, the smallest eigenvalue is clearly λ0 = λ(0,0) = 0, and the corresponding eigenvector is constant. The second smallest eigenvalue λ1 is λ(1,0) = 4sin²(π/2m), since π/2m < π/2n, and its eigenvector has one oscillation in the x-direction. But how about λ2? Even for such a simple situation there are several possibilities for λ2, depending on m and n. If m > 2n, then λ2 = λ(2,0) < λ(0,1). On the other hand, if n < m < 2n, then λ2 = λ(0,1) < λ(2,0). More generally, if Kn < m < (K + 1)n for some K ∈ ℕ, then λk = λ(k,0) = 4sin²(kπ/2m) for k = 0,…, K. Yet we have λK+1 = λ(0,1) = 4sin²(π/2n), and λK+2 is the smaller of λ(K+1,0) = 4sin²((K + 1)π/2m) and λ(1,1) = 4sin²(π/2m) + 4sin²(π/2n), again depending on m and n. As one can see from this, the mapping between k and (kx, ky) is quite nontrivial. Notice that ϕ(k,0) has k oscillations in the x-direction whereas ϕ(0,1) has only one oscillation in the y-direction.
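This eigenvalue interleaving is easy to observe numerically. The sketch below sorts the index pairs (kx, ky) of the strip by their eigenvalues for one assumed choice of dimensions (m = 7, n = 3, so m > 2n):

```python
import numpy as np

def lam(k, N):
    # 1-D eigenvalue of L(P_N), cf. (4)
    return 4 * np.sin(k * np.pi / (2 * N)) ** 2

# A "long thin strip" with assumed dimensions m = 7, n = 3 (m > 2n)
m, n = 7, 3
pairs = [(kx, ky) for kx in range(m) for ky in range(n)]
# Sort the 2-D indices by the eigenvalue lam(kx, m) + lam(ky, n)
pairs.sort(key=lambda p: lam(p[0], m) + lam(p[1], n))
print(pairs[:5])   # the mapping k -> (kx, ky) interleaves x- and y-oscillations
```
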
In other words, all of a sudden the eigenvalue of a completely different type of oscillation sneaks into the eigenvalue sequence. Hence, on a general domain or a general graph, by simply looking at the Laplacian eigenvalue sequence {λk}k=0,1,…, it is almost impossible to organize the eigenpairs into physically meaningful dyadic blocks and apply the Littlewood-Paley approach unless the underlying domain is of a very simple nature, e.g., Pn or Cn. For complicated domains, the notion of “frequency” is not well-defined anymore, and thus wavelet construction methods which rely on the Littlewood-Paley theory, such as the spectral graph wavelet transform (SGWT),11 may lead to unexpected problems on general graphs. We also note that the notion of “translations” or “shifts” on general graphs is not well-defined either, despite some efforts to generalize it.28 How about wavelets that rely less on the Fourier/Littlewood-Paley theory? A number of efforts have been made to build Haar-like wavelets on graphs and networks.7,8,10,29,30 Sharon and Shkolnisky use a subset of the Laplacian eigenvectors of subgraphs of a given graph to construct a multiresolution analysis and consequently multiwavelet bases, which can include the Haar wavelet basis.14 Coifman et al. take a different approach, using the diffusion/random walk on a graph to build diffusion wavelets5 and diffusion wavelet packets.6 Jansen et al. developed a wavelet-like transform for signals on graphs by generalizing the classical lifting scheme.9 Rustamov recently constructed two different wavelet-like transforms on graphs. The first one12 is based upon average-interpolation wavelets and also uses a top-down partitioning. The second one13 uses a lifting scheme whose update and prediction operators are adaptively learned from a given set of signals so that the resulting wavelet coefficients of signals belonging to the same signal class become sparse.
As we will describe in the next sections, we have developed the so-called Hierarchical Graph Laplacian Eigen Transform (HGLET) and Generalized Haar-Walsh Transform (GHWT) for undirected graphs. We can say these are complementary to the previously proposed methods listed above. For example, our algorithms utilize a top-down recursive graph partitioning to construct the wavelet-like basis functions, while others5–8 utilize the bottom-up approach. We use neither the lifting scheme9,13 nor the average-interpolation wavelet scheme.12 Perhaps our HGLET and GHWT are most closely related to the construction by Szlam et al.,30 which uses the top-down partitioning to build a Haar-like basis using average and difference operations on subgraphs, and a local cosine dictionary using the graph/manifold version of the folding and unfolding operators initially proposed by Coifman and Meyer for functions on the real line (or on the regular 1-D lattice);31 see also Refs. 26 (Chap. 6), 32 (Chap. 8), and 33 (Chap. 4). Unfortunately, based on our considerable experience with the local cosine dictionary on regular lattices,34 we can predict that such generalized local cosines may not work well in practice. In fact, the direct use of such folding/unfolding operations in the graph setting may be unnecessary or even harmful for many applications. As we will show later in Section 4, our Generalized Haar-Walsh Transform goes much further than the scope of Ref. 30.

2.3 Basis dictionaries and best bases for regularly-sampled signals

In this subsection, we will briefly review the so-called basis dictionaries for signals sampled on regular lattices and the concept of best basis selection from such dictionaries. For the details, we refer to Refs. 33, 35–39 as well as Refs. 32 (Chap. 8) and 26 (Chaps. 6, 7).
In this article, we associate with a basis dictionary a binary tree whose nodes are subspaces of ℝn with different localization characteristics in the original domain and in the “transformed” domain, as shown in Figure 2. Examples of such dictionaries previously developed for signals sampled on regular lattices include: the wavelet packet transform dictionary (WPT), the block discrete cosine transform dictionary (BDCT), and the local trigonometric/Fourier transform dictionary (LTT). It costs O(n[log n]^p) to generate a dictionary for a signal of length n, i.e., to compute the inner products between such a signal and each element in the dictionary, where p = 1 for the WPT and p = 2 for the BDCT/LTT. We would like to point out that for the practical implementation of these transform algorithms, the signal length n must be a dyadic number, i.e., n = 2^n0 for some n0 ∈ ℕ. On the other hand, our generalizations of these dictionaries to the graph setting, which will be described in Sections 3 and 4 in detail, do not have any such restriction on n; i.e., an arbitrary positive integer is allowed for n. Now, each dictionary contains up to n(1 + log2 n) basis vectors, but it also contains more than 2^{n/2} possible orthonormal bases from which to choose. Hence, a big question is how to select the best possible basis for a given task at hand (e.g., compression, denoising, classification, regression, etc.). To answer this question, Coifman and Wickerhauser proposed the best basis algorithm for the purpose of signal compression and denoising.35 Later, Saito, Coifman, and their groups generalized the original best basis algorithm for classification and regression problems.36–39 Let us define a cost functional ℒ such that (i) the lower the cost, the better, and (ii) the cost functional is additive, i.e., ℒ(x) = Σi ℒ(x(i)). As seen in Fig. 2, each non-leaf node Ω_{j,k} has two children nodes, Ω_{j+1,2k} and Ω_{j+1,2k+1}.
The best basis algorithm essentially boils down to the following comparison at each non-leaf node: is ℒ(Ω_{j,k}) ≤ ℒ(Ω_{j+1,2k}) + ℒ(Ω_{j+1,2k+1})? If the left-hand side is smaller than or equal to the right-hand side, then we keep the parent node and discard these two children nodes. Otherwise, we replace the parent node by the union of its two children nodes. The actual algorithm starts from the bottom level of the binary tree and proceeds upwards until it reaches the top level j = 0. The end result, i.e., the union of the retained nodes/subspaces, is called the best basis w.r.t. the cost functional ℒ. For various examples and success stories, we refer to the above references. In this article, we will generalize such basis dictionaries and the best basis selection algorithm for graphs and networks, as will be discussed in Sections 3 and 4.

3. HIERARCHICAL GRAPH LAPLACIAN EIGEN TRANSFORMS

In this section, we describe the Hierarchical Graph Laplacian Eigen Transform (HGLET), which we introduced in Ref. 40. This transform recursively partitions the graph into a binary tree using the Fiedler vectors of subgraphs, and then computes the graph Laplacian eigenvectors of each subgraph. Similar to the wavelet packet and local trigonometric dictionaries for regularly-sampled signals, this dictionary of bases on the graph allows one to select an orthonormal basis that is most suitable for one’s task at hand using a best basis type algorithm. See Ref. 40 for the full details. Below, for simplicity, we describe our algorithm using the unnormalized graph Laplacian matrix of an input graph as well as those of its subgraphs. However, one can use Lrw or Lsym instead of L, as will be discussed after we present the algorithm. Given a connected graph G (which may be weighted or unweighted), the first step of the HGLET algorithm computes the complete set of eigenvectors of L(G): ϕ0, ϕ1,…, ϕn−1 with corresponding eigenvalues 0 = λ0 < λ1 ≤ … ≤ λn−1. Then we partition the graph into two disjoint subgraphs according to the signs of the entries in the Fiedler vector, ϕ1.
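A single partitioning step of this kind can be sketched as follows; `fiedler_partition` is our own illustrative helper, not the implementation of Ref. 40:

```python
import numpy as np

def fiedler_partition(W):
    """Split the vertices of a connected undirected graph (given by its
    weighted adjacency matrix W) by the signs of the Fiedler vector of
    L = D - W.  Returns two arrays of vertex indices."""
    L = np.diag(W.sum(axis=1)) - W
    _, evecs = np.linalg.eigh(L)
    fiedler = evecs[:, 1]       # eigenvector of the smallest nonzero eigenvalue
    idx = np.arange(W.shape[0])
    return idx[fiedler >= 0], idx[fiedler < 0]

# Unweighted path P_4: the Fiedler vector changes sign at the midpoint,
# so the two halves of the path are recovered (in some order).
W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
part1, part2 = fiedler_partition(W)
```
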
Partitioning the graph in this manner is supported by the theory discussed in Ref. 21. We repeat this process recursively on each such subgraph. In order to state this process more precisely, let us introduce notation which is more appropriate for such multiscale transforms. Instead of the notation Ω_{j,k} used to denote coefficient subspaces for classical signals in the previous section, we will use the notation G^j_k for subgraphs in our graph setting. Let G^j_k denote a subgraph, and let {(λ^{j,k}_l, ϕ^{j,k}_l)} be the sorted eigenpairs of L(G^j_k), with j denoting the level (or depth) of the partition, k denoting the region (i.e., subgraph) number on level j, and l indexing the eigenvectors for region k on level j. Let Kj be the number of regions at level j. Hence, G^0_0 = G, and G^1_0 and G^1_1 are those two disjoint subgraphs of G obtained by the above partitioning process; note that V(G^1_0) ∪ V(G^1_1) = V(G) but E(G^1_0) ∪ E(G^1_1) ⊊ E(G), since the cut edges are discarded. The whole process can now be summarized as follows:

Algorithm 1 (HGLET)
Step 0: Set G^0_0 = G and initialize K0 = 1 and K1 = 0; set j = 0 and k = 0.
Step 1: Construct the Laplacian matrix L(G^j_k).
Step 2: Compute its eigenvectors, {ϕ^{j,k}_l}.
Step 3: If |V(G^j_k)| > 1, then partition G^j_k by the signs of its Fiedler vector ϕ^{j,k}_1 into G^{j+1}_{Kj+1} and G^{j+1}_{Kj+1+1}, and set Kj+1 = Kj+1 + 2; else set G^{j+1}_{Kj+1} = G^j_k, and Kj+1 = Kj+1 + 1.
Step 4: If k + 1 < Kj, then set k = k + 1 and go back to Step 1; else go to Step 5.
Step 5: If |V(G^{j+1}_k)| = 1 for k = 0,…, Kj+1 − 1, then finish; else set j = j + 1, k = 0, Kj+1 = 0, and go back to Step 1.

Several remarks on this algorithm are in order.
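As a complement to the pseudocode, here is a simplified numerical sketch of the recursion in Algorithm 1; it is our own illustration (assuming each subgraph remains connected), not the implementation of Ref. 40:

```python
import numpy as np

def hglet_dictionary(W):
    """Sketch of Algorithm 1: recursively bipartition a connected graph by the
    signs of Fiedler vectors and store the Laplacian eigenvectors of every
    subgraph encountered (assumes each subgraph stays connected)."""
    levels = [[np.arange(W.shape[0])]]   # level 0 holds the whole vertex set
    dictionary = {}                      # maps (j, k) -> eigenvectors of L(G^j_k)
    j = 0
    while True:
        next_level = []
        for k, verts in enumerate(levels[j]):
            Wk = W[np.ix_(verts, verts)]
            Lk = np.diag(Wk.sum(axis=1)) - Wk
            _, evecs = np.linalg.eigh(Lk)
            dictionary[(j, k)] = evecs
            if len(verts) > 1:           # split by the sign of the Fiedler vector
                f = evecs[:, 1]
                next_level.extend([verts[f >= 0], verts[f < 0]])
            else:                        # a single vertex is carried over as-is
                next_level.append(verts)
        if all(len(v) == 1 for v in levels[j]):
            break                        # every region is a single vertex: done
        levels.append(next_level)
        j += 1
    return levels, dictionary

# Unweighted path P_4 as an adjacency matrix
W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
levels, dictionary = hglet_dictionary(W)
```
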
Figure 3 shows some HGLET eigenvectors computed on the Minnesota road network.

4. GENERALIZED HAAR-WALSH TRANSFORM

We now introduce our Generalized Haar-Walsh Transform. As with the HGLET, the two main steps in this transform are 1) recursively partitioning the graph and 2) generating a full orthonormal basis on each level of the graph partitioning. Our notation for the basis vectors is ψ^j_{k,l}, where j ∈ [0, jmax] denotes the level, k ∈ [0, Kj) denotes the index of the region on level j, and l ∈ [0, 2^{jmax−j}) denotes the tag of the basis vector. A basis vector’s tag is an integer which, when expressed in binary, specifies the sequence of averaging and differencing operations that were used to generate it. Within a given region k on level j, the tags are never duplicated, and thus they serve as unique identifiers for the basis vectors within the region. We refer to basis vectors with tag l = 0 as scaling vectors, those with tag l = 1 as Haar-like vectors, and those with tag l ≥ 2 as Walsh-like vectors. The GHWT algorithm can be summarized as follows:

Algorithm 2 (GHWT)
Step 0: Generate a full recursive partitioning of the graph, as described in Section 3. This yields a set of regions G^j_k, with 0 ≤ j ≤ jmax and 0 ≤ k < Kj.
Step 1: Generate an orthonormal basis on level jmax. Since each region on that level contains a single vertex, we simply have ψ^{jmax}_{k,0} = 1_{V(G^{jmax}_k)}.
Step 2: For j = jmax,…, 1, use the orthonormal basis on level j to generate an orthonormal basis on level j − 1 as follows. For k = 0,…, Kj−1 − 1, let G^j_{k'} and G^j_{k''} be the children of G^{j−1}_k, with n' = |V(G^j_{k'})| and n'' = |V(G^j_{k''})| (if G^{j−1}_k was not split, it has the single child G^j_{k'} and we simply carry over its basis vectors, doubling their tags as in (8)).
Step 2a: Compute the scaling vector on G^{j−1}_k as
ψ^{j−1}_{k,0} = (√n' ψ^j_{k',0} + √n'' ψ^j_{k'',0}) / √(n' + n'').   (6)
Step 2b: If G^{j−1}_k has two children, compute the Haar-like vector on G^{j−1}_k as
ψ^{j−1}_{k,1} = (√n'' ψ^j_{k',0} − √n' ψ^j_{k'',0}) / √(n' + n'').   (7)
Step 2c: If the children contain basis vectors with tags l ≥ 1, compute the Walsh-like vectors on G^{j−1}_k. For each such tag l: if only one child, say G^j_{k'}, has a basis vector with tag l, then set
ψ^{j−1}_{k,2l} = ψ^j_{k',l};   (8)
if both children have basis vectors with tag l, then set
ψ^{j−1}_{k,2l} = (ψ^j_{k',l} + ψ^j_{k'',l}) / √2,   (9)
ψ^{j−1}_{k,2l+1} = (ψ^j_{k',l} − ψ^j_{k'',l}) / √2.   (10)

The result is an overcomplete dictionary of orthonormal bases, each of which we view as an orthogonal matrix in ℝn×n. The overall cost of generating these bases is O(n²), which is due to forming a dense n × n matrix via simple arithmetic operations.
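To make Steps 2a-2b concrete, the following sketch constructs, for a single region with two subregions, a pair of vectors with the properties stated in the text (a constant scaling vector and a two-valued Haar-like vector, orthonormal and supported only on the region). The helper name and the way the normalization is written are our own illustration rather than a verbatim transcription of (6)-(7):

```python
import numpy as np

def scaling_and_haar(idx1, idx2, n):
    """Given the vertex indices of a region's two subregions (inside a graph
    on n vertices), return the scaling vector (tag l = 0), constant on the
    region, and the Haar-like vector (tag l = 1), which takes one positive
    value on the first subregion and one negative value on the second."""
    n1, n2 = len(idx1), len(idx2)
    scaling = np.zeros(n)
    scaling[idx1] = scaling[idx2] = 1.0 / np.sqrt(n1 + n2)
    haar = np.zeros(n)
    haar[idx1] = np.sqrt(n2 / (n1 * (n1 + n2)))
    haar[idx2] = -np.sqrt(n1 / (n2 * (n1 + n2)))
    return scaling, haar

# A 5-vertex region of a 6-vertex graph, split into subregions {0,1,2} and
# {3,4}: both vectors are unit norm, mutually orthogonal, and vanish on
# vertex 5, which lies outside the region.
s, h = scaling_and_haar([0, 1, 2], [3, 4], n=6)
```
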
If we simply wish to expand a signal on the graph (i.e., f ∈ ℝn) in terms of these bases, the cost is O(n log n). This is done by expanding the signal in terms of the basis on level jmax (which merely amounts to reordering the signal), and then performing the operations in (6)–(10) on the coefficients rather than on the basis vectors. Fig. 4 shows some GHWT basis vectors on the MN road network. At this point we make several observations about the GHWT. First, note that the GHWT basis vectors on each level are localized; i.e., their support does not extend beyond the region to which they correspond. This is due to the way in which the basis vectors are defined on the finest level (where each region contains a single vertex) in Step 1, and to how the basis vectors on regions containing multiple vertices are formed by taking linear combinations of basis vectors on their two subregions in Step 2. Therefore, the basis vectors corresponding to two subgraphs G^j_k and G^j_{k'} on the same level have disjoint supports if k ≠ k'. Furthermore, if region G^j_k is divided into subregions G^{j+1}_{k'} and G^{j+1}_{k''}, then the basis vectors corresponding to G^j_k will span the same space as the union of those corresponding to G^{j+1}_{k'} and G^{j+1}_{k''}. By making use of these properties, we can select an orthonormal basis containing basis vectors corresponding to multiple levels of the graph partitioning. To demonstrate these points, we use the simple example of P6, that is, the unweighted path graph on six vertices. We group the basis vectors by region and arrange them as in Fig. 5a. This illustrates both the manner in which the graph is recursively partitioned and how the span of each block of basis vectors includes the span of all blocks of basis vectors directly beneath it. We refer to this ordering of basis vectors as the coarse-to-fine dictionary. In addition to grouping basis vectors by their corresponding region, we can also group them by their tag, l, and we call this the fine-to-coarse dictionary.
From Step 2, note that the basis vectors on level j with tag l are used to generate the basis vectors on level j − 1 with tags 2l and 2l + 1. Therefore, the vectors on level j with tag l span the same space as the vectors on level j − 1 with tags 2l and 2l + 1. Exploiting this relationship affords us more options for selecting a basis. Again, we use P6 as an example. Reversing the order of the levels in Fig. 5a and grouping the basis vectors by tag, we obtain Fig. 5b. As with the coarse-to-fine dictionary, the span of each block of basis vectors in the fine-to-coarse dictionary includes the span of all blocks of basis vectors directly beneath it. However, notice that the structure of the groups/blocks differs between the coarse-to-fine and fine-to-coarse dictionaries. Not only do the tags allow us to regroup the basis vectors, thereby providing more bases from which to choose, they also impart upon the basis vectors an approximate notion of frequency. From (6), we see that the scaling vectors (l = 0) are constant on their support. From (7), we see that the Haar-like vectors (l = 1) assume exactly two distinct values on their support. And from (8)–(10), we see that the tags of the Walsh-like vectors (l ≥ 2) specify the sequence of average and difference operations by which they were generated. Generally speaking, larger l values indicate more oscillation, with exceptions occurring when imbalances in the partitioning necessitate the use of (8), as opposed to (9) and (10).

5. BEST BASIS ALGORITHM

Our multiscale basis dictionaries, i.e., the HGLET and the two arrangements of the GHWT, allow us to generalize the best basis algorithm of Coifman-Wickerhauser35 to the graph setting in a straightforward manner. We first specify a cost functional ℒ. For example, if our objective is efficient approximation of the signal, we would choose a cost functional which favors sparse representations (e.g., ‖·‖1).
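The search itself is a simple recursion over a binary tree of coefficient blocks. The sketch below is our own illustration (with an assumed tree layout in which the children of node (j, k) are (j+1, 2k) and (j+1, 2k+1)), using the ℓ1 norm as the cost functional:

```python
import numpy as np

def best_basis(tree, node=(0, 0), cost=lambda c: np.abs(c).sum()):
    """Coifman-Wickerhauser-style best-basis search (a sketch).
    tree maps (j, k) -> the coefficient block of that node."""
    j, k = node
    left, right = (j + 1, 2 * k), (j + 1, 2 * k + 1)
    if left not in tree:                       # leaf: keep this node
        return cost(tree[node]), [node]
    cl, nl = best_basis(tree, left, cost)
    cr, nr = best_basis(tree, right, cost)
    if cost(tree[node]) <= cl + cr:            # parent is at least as cheap
        return cost(tree[node]), [node]
    return cl + cr, nl + nr                    # children (or descendants) win

# Toy two-level tree: the parent block has l1 cost 4, its children cost
# 2 + 0, so the best basis keeps the two children.
tree = {(0, 0): np.array([1., 1., 1., 1.]),
        (1, 0): np.array([2., 0.]),
        (1, 1): np.array([0., 0.])}
best_cost, best_nodes = best_basis(tree)
```
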
For the HGLET and the coarse-to-fine GHWT, we initialize the best basis as the union of the bases on the bottom level of the dictionary (j = jmax), and we proceed upwards one level at a time, using ℒ to compare the cost of a block of basis vectors to the cost of its descendant basis vectors in the current best basis, updating the best basis when necessary. Upon completion, this search yields the coarse-to-fine best basis. For the fine-to-coarse GHWT dictionary, we start at the bottom level (j = 0) and proceed upwards until we have obtained the fine-to-coarse best basis. In the case of the HGLET, each of the three graph Laplacians (L, Lrw, and Lsym) leads to a different best basis. In fact, the three HGLET dictionaries and the GHWT coarse-to-fine dictionary all conform to the same hierarchical structure. We can take advantage of this by choosing different transforms on the various regions. The result of searching among these four dictionaries is a “hybrid” best basis which is, by construction, orthonormal.

6. SIMULTANEOUS SEGMENTATION AND DENOISING OF CLASSICAL 1-D SIGNALS

In Ref. 44 we presented our preliminary denoising results using the HGLET and GHWT in conjunction with the best basis algorithm. In this article, we elect to showcase our recent results in which we use an iterative procedure involving the HGLET and the best basis algorithm to simultaneously segment and denoise classical 1-D signals, which we treat as signals on unweighted path graphs. This experiment is in the same spirit as Ref. 45, but in place of the polyharmonic local sine transform we use our HGLET. Doing so affords us more flexibility in our segmentation, as we no longer have to work within a dyadic constraint on the segment lengths. The objective here is threefold: 1) to divide the signal into segments of similar characteristics; 2) to reduce the noise in the signal; and 3) to achieve better approximation and compression of the underlying signal.
As our cost functional for the best basis algorithm, we use the minimum description length (MDL) criterion, which chooses between two or more models for input data by considering their costs in terms of bits.46,47 Specifically, for each model we compute the number of bits needed to encode the model parameters and the quantized expansion coefficients of the input data that are necessary to fit the model to the data. In accordance with the MDL principle, we then select the model that best captures the nature of the underlying signal in the input data using the fewest bits. Thus, the MDL provides an intelligible, objective, parameter-free means of choosing between competing models. Furthermore, the resulting quantized coefficients can be used to denoise the signal, since the MDL automatically selects the precision and threshold that best capture the noise-free portion of the signal. The first step in our algorithm is to recursively partition the path graph. The ordering of the vertices is preserved, and so each subgraph is associated with an integer subinterval of [1, n], where n is the length of the signal. Next, we analyze the graph using the HGLET with the eigenvectors of each of the three Laplacians, which are the variants of the DCTs, as we explained earlier. From the resulting three dictionaries, we select a hybrid best basis using the best basis algorithm equipped with the MDL criterion as its cost functional. In our experiments, the model parameters that the MDL determines are: (i) the segmentation configuration of the signal (i.e., the set of disjoint intervals whose union is [1, n], which we quantize via the levels list description method33) and (ii) a flag to specify the HGLET variation used for each segment. Thus, by using the MDL cost functional to perform the best basis search, we are searching for the segmentation, among those choosable from the current partitioning tree, that allows us to most efficiently represent the signal.
The MDL-guided best basis search yields two outputs: a segmentation of the signal and the corresponding set of quantized expansion coefficients. Utilizing the segmentation obtained by the best basis algorithm, we modify the edge weights of the graph. The purpose of doing so is to encourage edges between regions of similar characteristics to be preserved in the next iteration and edges between regions of different characteristics to be cut. We have tried several different schemes, and the one we have found most effective in our experiments is as follows. If two adjacent regions, one ending at vertex i and the next beginning at vertex i+1, are represented by the same HGLET variation (e.g., Lrw), then we double the weight of the edge e_{i,i+1} between them and halve the weights of the edges e_{i-1,i} and e_{i+1,i+2}. Whereas we began with an unweighted path graph, we now have a weighted one. We then iterate this process. We generate a new recursive partitioning of the signal, which will differ from the previous one due to the modified edge weights. We analyze the signal again using the three HGLET variations, although we treat the graph as unweighted at this stage: the purpose of modifying the edge weights is only to influence the partitioning, and treating the graph as unweighted preserves the relationship between the HGLET on a path graph and the block DCTs. As the recursive partitioning of the signal is different, the expansion coefficients will be different as well. We then find a new best basis and corresponding segmentation, and we modify the edge weights as before. We repeat this process until it converges to a particular basis, which gives us both a segmentation of the signal and a set of quantized coefficients; empirically, convergence occurs within 6 to 15 iterations. We denoise the signal by reconstructing with these quantized and thresholded HGLET coefficients. Fig. 6 shows some results obtained via this procedure.
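One pass of this reweighting rule can be sketched as follows; the function and its segment-boundary encoding are our illustrative choices, not notation from the article.

```python
import numpy as np

def reweight(weights, boundaries, flags):
    """One edge-reweighting pass. `weights[i]` is the weight of edge
    e_{i,i+1} of the path graph; `boundaries` lists the indices i at which
    one segment ends (vertex i) and the next begins (vertex i+1); `flags`
    gives the HGLET variation chosen for each segment, in order."""
    w = weights.copy()
    for s, i in enumerate(boundaries):
        if flags[s] == flags[s + 1]:      # same variation on both sides:
            w[i] *= 2.0                   # discourage cutting e_{i,i+1}
            if i - 1 >= 0:
                w[i - 1] *= 0.5           # encourage cutting e_{i-1,i}
            if i + 1 < len(w):
                w[i + 1] *= 0.5           # encourage cutting e_{i+1,i+2}
    return w

# Path on 6 vertices (5 edges); the single boundary sits between vertices
# 2 and 3, and both segments chose the same variation (e.g., Lrw):
w = reweight(np.ones(5), boundaries=[2], flags=["rw", "rw"])
# w is now [1.0, 0.5, 2.0, 0.5, 1.0]
```

If the two segments had chosen different variations, the weights would be left untouched, so the boundary remains likely to be cut again in the next partitioning.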
7. DISCUSSION

In this article, we have discussed applied and computational harmonic analysis on graphs and networks, highlighting key difficulties in extending classical wavelets to the graph setting. We have reviewed two transforms that we proposed, the HGLET and the GHWT, which generalize the hierarchical block DCTs and the Haar-Walsh wavelet packets, respectively, and yield overcomplete, multiscale dictionaries composed of orthonormal bases. To construct such basis dictionaries, we have used "sharp" (i.e., mutually exclusive) recursive graph partitionings generated via Fiedler vectors. It is important to note, however, that one can use other graph partitioning techniques, e.g., one based on the diffuse interface model.41 Moreover, it may be necessary to consider graph partitionings with overlaps, i.e., smoother cutoff functions, if one wants to construct smoother wavelet packet and local cosine dictionaries on graphs; see, e.g., Ref. 30 for an attempt in this direction. Due to the page limitation, we could not discuss more applications of our tools. We are currently conducting numerical experiments on various applications, including denoising and matrix data analysis, and hope to report them at a later date. In order to deepen our understanding of efficient approximation and denoising of data on graphs, theoretical considerations analogous to those carried out for wavelet bases on rectangular domains48,49 are indispensable. However, it will be quite a challenge to define appropriate "smoothness" classes of functions on graphs; some attempts in this direction appear in Refs. 10, 14. Of course, if one deals with the so-called quantum (or metric) graphs, certain function classes, such as the Sobolev spaces, have already been defined,50 and hence we can expect closer interactions with that field. Finally, we want to remark on directed graphs, which have become increasingly important in many applications.
If G is a directed graph, both W(G) and L(G) are asymmetric; hence their eigenpairs are complex-valued in general, and their interpretation and analysis become more difficult than in the undirected case. To resolve this problem, many different graph Laplacians for directed graphs have been proposed.51–53 Unfortunately, most of them depend strongly on specific properties of the directed graph under consideration (e.g., strong connectivity); hence a universal definition of a digraph Laplacian has yet to be established. Instead of searching for digraph Laplacians, we believe that it will be quite fruitful to consider the distance matrix of a given directed graph and an associated integral operator defined on it. Our viewpoint is the following: on a directed graph, the connectivity between any two vertices is not a local concept but a global one, i.e., the existence of paths connecting the two vertices, and in particular the length of the shortest such path, is the most important information about a directed graph. The latter author previously proposed a method of computing Laplacian eigenfunctions on a domain of general shape in R^p,54 and we believe similar operators should be developed for directed graphs.

ACKNOWLEDGMENTS

This research was partially supported by ONR grant N00014-12-1-0177 and NSF grant DMS-1418779, and was conducted with Government support under contract FA9550-11-C-0028, awarded by the Department of Defense, Air Force Office of Scientific Research, through the National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

REFERENCES
1. "Special Issue: Statistics on Networks," J. Comput. Graph. Statist. 21(4) (2012).
2. "Special Issue: Graph-Based Representations in Pattern Recognition," Pattern Recognition Lett. 33(15) (2012).
3. "Special Section: Adaptation and Learning over Complex Networks," IEEE Signal Processing Magazine 20(3) (2013).
4. "Special Issue: Graphs and Networks," Lincoln Laboratory Journal 20(1) (2013).
5. Coifman, R. R. and Maggioni, M., "Diffusion wavelets," Appl. Comput. Harm. Anal. 21(1), 53–94 (2006). https://doi.org/10.1016/j.acha.2006.04.004
6. Bremer, J. C., Coifman, R. R., Maggioni, M., and Szlam, A., "Diffusion wavelet packets," Appl. Comput. Harm. Anal. 21(1), 95–112 (2006). https://doi.org/10.1016/j.acha.2006.04.005
7. Murtagh, F., "The Haar wavelet transform of a dendrogram," J. Classification 24(1), 3–32 (2007). https://doi.org/10.1007/s00357-007-0007-9
8. Lee, A., Nadler, B., and Wasserman, L., "Treelets—an adaptive multi-scale basis for sparse unordered data," Ann. Appl. Stat. 2, 435–471 (2008). https://doi.org/10.1214/07-AOAS137
9. Jansen, M., Nason, G. P., and Silverman, B. W., "Multiscale methods for data on graphs and irregular multidimensional situations," J. R. Stat. Soc. Ser. B, Stat. Methodol. 71(1), 97–125 (2008). https://doi.org/10.1111/rssb.2008.71.issue-1
10. Gavish, M., Nadler, B., and Coifman, R. R., "Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning," in Proc. 27th Intern. Conf. Machine Learning, 367–374 (2010).
11. Hammond, D. K., Vandergheynst, P., and Gribonval, R., "Wavelets on graphs via spectral graph theory," Appl. Comput. Harm. Anal. 30(2), 129–150 (2011). https://doi.org/10.1016/j.acha.2010.04.005
12. Rustamov, R. M., "Average interpolating wavelets on point clouds and graphs," arXiv:1110.2227v1 [math.FA] (2011).
13. Rustamov, R. M. and Guibas, L., "Wavelets on graphs via deep learning," in Advances in Neural Information Processing Systems 26, 998–1006, Curran Associates, Inc., Red Hook, NY (2013).
14. Sharon, N. and Shkolnisky, Y., "A class of Laplacian multiwavelets bases for high-dimensional data," Appl. Comput. Harm. Anal. 38(3), 420–451 (2015). https://doi.org/10.1016/j.acha.2014.07.002
15. Godsil, C. and Royle, G., Algebraic Graph Theory, Graduate Texts in Mathematics 207, Springer, New York (2001). https://doi.org/10.1007/978-1-4613-0163-9
16. von Luxburg, U., "A tutorial on spectral clustering," Stat. Comput. 17(4), 395–416 (2007). https://doi.org/10.1007/s11222-007-9033-z
17. Chung, F. R. K., Spectral Graph Theory, CBMS Regional Conference Series in Mathematics (1997).
18. Fiedler, M., "Algebraic connectivity of graphs," Czechoslovak Math. J. 23, 298–305 (1973).
19. de Abreu, N. M. M., "Old and new results on algebraic connectivity of graphs," Linear Algebra Appl. 423(1), 53–73 (2007). https://doi.org/10.1016/j.laa.2006.08.017
20. Merris, R., "Laplacian matrices of graphs: A survey," Linear Algebra Appl. 197/198, 143–176 (1994). https://doi.org/10.1016/0024-3795(94)90486-3
21. Fiedler, M., "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory," Czechoslovak Math. J. 25, 619–633 (1975).
22. Davies, E. B., Gladwell, G. M. L., Leydold, J., and Stadler, P. F., "Discrete nodal domain theorems," Linear Algebra Appl. 336(1–3), 51–60 (2001).
23. Bıyıkoğlu, T., Leydold, J., and Stadler, P. F., Laplacian Eigenvectors of Graphs, Lecture Notes in Mathematics 1915, Springer-Verlag, New York (2007). https://doi.org/10.1007/978-3-540-73510-6
24. Nakatsukasa, Y., Saito, N., and Woei, E., "Mysteries around the graph Laplacian eigenvalue 4," Linear Algebra Appl. 438(8), 3231–3246 (2013). https://doi.org/10.1016/j.laa.2012.12.012
25. Strang, G., "The discrete cosine transform," SIAM Review 41(1), 135–147 (1999). https://doi.org/10.1137/S0036144598336745
26. Jaffard, S., Meyer, Y., and Ryan, R. D., Wavelets: Tools for Science & Technology, SIAM, Philadelphia, PA (2001). https://doi.org/10.1137/1.9780898718119
27. Saito, N. and Woei, E., "Analysis of neuronal dendrite patterns using eigenvalues of graph Laplacians," JSIAM Letters 1, 13–16 (2009). Invited paper.
28. Shuman, D. I., Ricaud, B., and Vandergheynst, P., "Vertex-frequency analysis on graphs," Appl. Comput. Harm. Anal. (2015).
29. Coifman, R. R. and Gavish, M., "Harmonic analysis of digital data bases," in Wavelets and Multiscale Analysis: Theory and Applications, Applied and Numerical Harmonic Analysis, 161–197, Birkhäuser, Boston, MA (2011). https://doi.org/10.1007/978-0-8176-8095-4
30. Szlam, A. D., Maggioni, M., Coifman, R. R., and Bremer, Jr., J. C., "Diffusion-driven multiscale analysis on manifolds and graphs: top-down and bottom-up constructions," in Wavelets XI, Proc. SPIE 5914 (2005). https://doi.org/10.1117/12.616931
31. Coifman, R. R. and Meyer, Y., "Remarques sur l'analyse de Fourier à fenêtre," Comptes Rendus Acad. Sci. Paris, Série I 312, 259–261 (1991).
32. Mallat, S., A Wavelet Tour of Signal Processing, third ed., Academic Press, Burlington, MA (2009).
33. Wickerhauser, M. V., Adapted Wavelet Analysis from Theory to Software, A K Peters, Ltd., Wellesley, MA (1994).
34. Saito, N. and Remy, J.-F., "The polyharmonic local sine transform: A new tool for local image analysis and synthesis without edge effect," Appl. Comput. Harm. Anal. 20(1), 41–73 (2006). https://doi.org/10.1016/j.acha.2005.01.005
35. Coifman, R. R. and Wickerhauser, M. V., "Entropy-based algorithms for best basis selection," IEEE Trans. Inform. Theory 38(2), 713–719 (1992). https://doi.org/10.1109/18.119732
36. Saito, N., "Local feature extraction and its applications using a library of bases," in Topics in Analysis and Its Applications: Selected Theses, 269–451, World Scientific Pub. Co., Singapore (2000). https://doi.org/10.1142/4240
37. Saito, N. and Coifman, R. R., "Local discriminant bases and their applications," J. Math. Imaging Vis. 5(4), 337–358 (1995). https://doi.org/10.1007/BF01250288
38. Saito, N., Coifman, R. R., Geshwind, F. B., and Warner, F., "Discriminant feature extraction using empirical probability density estimation and a local basis library," Pattern Recognition 35(12), 2841–2852 (2002). https://doi.org/10.1016/S0031-3203(02)00019-5
39. Marchand, B. and Saito, N., "Earth Mover's Distance based local discriminant basis," in Multiscale Signal Analysis and Modeling, Lecture Notes in Electrical Engineering, 275–294, Springer, New York (2013). https://doi.org/10.1007/978-1-4614-4145-8
40. Irion, J. and Saito, N., "Hierarchical graph Laplacian eigen transforms," JSIAM Letters 6, 21–24 (2014). https://doi.org/10.14495/jsiaml.6.21
41. Bertozzi, A. L. and Flenner, A., "Diffuse interface models on graphs for classification of high dimensional data," Multiscale Model. Simul. 10(3), 1090–1118 (2012). https://doi.org/10.1137/11083109X
42. Shi, J. and Malik, J., "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Machine Intell. 22(8), 888–905 (2000). https://doi.org/10.1109/34.868688
43. Hagen, L. and Kahng, A. B., "New spectral methods for ratio cut partitioning and clustering," IEEE Trans. Comput.-Aided Des. 11(9), 1074–1085 (1992). https://doi.org/10.1109/43.159993
44. Irion, J. and Saito, N., in Proc. 2014 IEEE Workshop on Statistical Signal Processing, 472–475 (2014).
45. Saito, N. and Woei, E. R., in Proc. 13th IEEE Workshop on Statistical Signal Processing, 315–320 (2005).
46. Rissanen, J., Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore (1989).
47. Grünwald, P. D., The Minimum Description Length Principle, The MIT Press, Cambridge, MA (2007).
48. DeVore, R. A., "Nonlinear approximation," Acta Numerica 7, 51–150 (1998). https://doi.org/10.1017/S0962492900002816
49. Donoho, D. L., "Unconditional bases are optimal bases for data compression and statistical estimation," Appl. Comput. Harm. Anal. 1, 100–115 (1993). https://doi.org/10.1006/acha.1993.1008
50. Berkolaiko, G. and Kuchment, P., Introduction to Quantum Graphs, Mathematical Surveys and Monographs 186, AMS, Providence, RI (2013).
51. Chung, F., "Laplacians and the Cheeger inequality for directed graphs," Ann. Comb. 9(1), 1–19 (2005). https://doi.org/10.1007/s00026-005-0237-z
52. Bauer, F., "Normalized graph Laplacians for directed graphs," Linear Algebra Appl. 436(11), 4193–4222 (2012). https://doi.org/10.1016/j.laa.2012.01.020
53. Bapat, R. B., Kalita, D., and Pati, S., "On weighted directed graphs," Linear Algebra Appl. 436(1), 99–111 (2012). https://doi.org/10.1016/j.laa.2011.06.035
54. Saito, N., "Data analysis and representation on a general domain via eigenfunctions of Laplacian," Appl. Comput. Harm. Anal. 25, 68–97 (2008). https://doi.org/10.1016/j.acha.2007.09.005