
Neural Networks Learn Sparse Solutions to Underdetermined Systems

Sparse approximation (also known as sparse representation) theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.

Sparse decomposition

Noiseless observations

Consider a linear system of equations $x = D\alpha$, where $D$ is an underdetermined $m \times p$ matrix ($m < p$) and $x \in \mathbb{R}^{m}$, $\alpha \in \mathbb{R}^{p}$. The matrix $D$ (typically assumed to be full-rank) is referred to as the dictionary, and $x$ is a signal of interest. The core sparse representation problem is defined as the quest for the sparsest possible representation $\alpha$ satisfying $x = D\alpha$. Due to the underdetermined nature of $D$, this linear system generally admits infinitely many possible solutions, and among these we seek the one with the fewest non-zeros. Put formally, we solve

$$\min_{\alpha \in \mathbb{R}^{p}} \|\alpha\|_{0} \quad \text{subject to} \quad x = D\alpha,$$

where $\|\alpha\|_{0} = \#\{i : \alpha_{i} \neq 0,\ i = 1, \ldots, p\}$ is the $\ell_{0}$ pseudo-norm, which counts the number of non-zero components of $\alpha$. This problem is known to be NP-hard, by reduction from NP-complete subset selection problems in combinatorial optimization.

Sparsity of $\alpha$ implies that only a few ($k \ll m < p$) of its components are non-zero. The underlying motivation for such a sparse decomposition is the desire to provide the simplest possible explanation of $x$ as a linear combination of as few columns from $D$ as possible, also referred to as atoms. As such, the signal $x$ can be viewed as a molecule composed of a few fundamental elements taken from $D$.
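As a concrete illustration, the following sketch builds such an underdetermined system in NumPy, assuming (purely for illustration) a random Gaussian dictionary and a synthetic $k$-sparse coefficient vector:

    import numpy as np

    rng = np.random.default_rng(0)
    m, p, k = 20, 50, 3                      # underdetermined: m < p, sparsity k << m

    D = rng.standard_normal((m, p))          # dictionary with p atoms (columns)
    D /= np.linalg.norm(D, axis=0)           # normalize atoms to unit norm

    alpha = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    alpha[support] = rng.standard_normal(k)  # k-sparse representation

    x = D @ alpha                            # observed signal: a combination of k atoms
    print("non-zeros (l0 pseudo-norm):", np.count_nonzero(alpha))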

While the problem posed above is indeed NP-hard, its solution can often be found using approximation algorithms. One such option is a convex relaxation of the problem, obtained by using the $\ell_{1}$-norm instead of $\ell_{0}$, where $\|\alpha\|_{1}$ simply sums the absolute values of the entries in $\alpha$. This is known as the basis pursuit (BP) algorithm, which can be handled by any linear programming solver. An alternative approximation method is a greedy technique, such as matching pursuit (MP), which finds the locations of the non-zeros one at a time.
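Basis pursuit can be posed as a linear program by splitting $\alpha$ into its positive and negative parts. A minimal sketch of this reduction follows; the use of scipy.optimize.linprog and the variable names are illustrative choices, not part of the formal problem statement:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(D, x):
        """min ||alpha||_1 subject to x = D @ alpha, via the LP split alpha = u - v."""
        m, p = D.shape
        c = np.ones(2 * p)                    # objective: sum(u) + sum(v) = ||alpha||_1
        A_eq = np.hstack([D, -D])             # equality constraint: D (u - v) = x
        res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None))
        u, v = res.x[:p], res.x[p:]
        return u - v

    # usage with the synthetic (D, x) from the sketch above:
    # alpha_hat = basis_pursuit(D, x)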

Surprisingly, under mild conditions on $D$ (expressed through its spark, its mutual coherence, or the restricted isometry property) and on the level of sparsity in the solution, $k$, the sparse representation problem can be shown to have a unique solution, and BP and MP are guaranteed to find it perfectly.[1][2][3]
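For example, the mutual coherence of $D$ is the largest absolute inner product between two distinct unit-norm atoms, and a classical sufficient condition along the lines of [1] and [2] guarantees uniqueness and exact recovery when $k < \tfrac{1}{2}\,(1 + 1/\mu(D))$. A small illustrative sketch, assuming unit-norm atoms:

    import numpy as np

    def mutual_coherence(D):
        """Largest absolute inner product between two distinct unit-norm atoms of D."""
        Dn = D / np.linalg.norm(D, axis=0)    # normalize columns (atoms)
        G = np.abs(Dn.T @ Dn)                 # absolute Gram matrix
        np.fill_diagonal(G, 0.0)              # ignore self-correlations
        return G.max()

    D = np.random.default_rng(0).standard_normal((20, 50))
    mu = mutual_coherence(D)
    print("mutual coherence:", mu)
    print("uniqueness/recovery guaranteed when k <", 0.5 * (1 + 1 / mu))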

Noisy observations

Often the observed signal $x$ is noisy. By relaxing the equality constraint and imposing an $\ell_{2}$-norm on the data-fitting term, the sparse decomposition problem becomes

$$\min_{\alpha \in \mathbb{R}^{p}} \|\alpha\|_{0} \quad \text{subject to} \quad \|x - D\alpha\|_{2}^{2} \leq \epsilon^{2},$$

or put in a Lagrangian form,

$$\min_{\alpha \in \mathbb{R}^{p}} \lambda \|\alpha\|_{0} + \frac{1}{2} \|x - D\alpha\|_{2}^{2},$$

where $\lambda$ takes the role of $\epsilon$ in balancing sparsity against the fidelity of the fit.

Just as in the noiseless case, these two problems are NP-hard in general, but can be approximated using pursuit algorithms. More specifically, changing the $\ell_{0}$ to an $\ell_{1}$-norm, we obtain

$$\min_{\alpha \in \mathbb{R}^{p}} \lambda \|\alpha\|_{1} + \frac{1}{2} \|x - D\alpha\|_{2}^{2},$$

which is known as basis pursuit denoising. Similarly, matching pursuit can be used to approximate the solution of the above problems, finding the locations of the non-zeros one at a time until the error threshold is met. Here as well, theoretical guarantees suggest that BP and MP lead to nearly optimal solutions, depending on the properties of $D$ and the cardinality of the solution, $k$.[4][5][6] Another interesting theoretical result refers to the case in which $D$ is a unitary matrix. Under this assumption, the problems posed above (with either $\ell_{0}$ or $\ell_{1}$) admit closed-form solutions in the form of non-linear shrinkage.[4]
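In the unitary case the Lagrangian problems decouple coordinate-wise: writing $\beta = D^{T}x$, the $\ell_{1}$ problem is solved by soft-thresholding and the $\ell_{0}$ problem by hard-thresholding. A minimal sketch of these shrinkage rules, following the standard derivation (variable names are illustrative):

    import numpy as np

    def soft_threshold(beta, lam):
        """Element-wise solution of min_a lam*|a| + 0.5*(b - a)^2."""
        return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

    def hard_threshold(beta, lam):
        """Element-wise solution of min_a lam*1[a != 0] + 0.5*(b - a)^2."""
        return np.where(np.abs(beta) > np.sqrt(2.0 * lam), beta, 0.0)

    # For a unitary D, the optimal alpha is obtained by thresholding beta = D.T @ x:
    # alpha_l1 = soft_threshold(D.T @ x, lam)
    # alpha_l0 = hard_threshold(D.T @ x, lam)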

Variations

There are several variations on the basic sparse approximation problem.

Structured sparsity: In the original version of the problem, any of the atoms in the dictionary can be picked. In the structured (block) sparsity model, instead of picking atoms individually, groups of them are to be picked. These groups can be overlapping and of varying size. The objective is to represent $x$ so that it is sparse while enforcing this block structure.[7]
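One common proximal building block for this model is block (group) soft-thresholding, which shrinks each group of coefficients by its Euclidean norm rather than entry by entry. A minimal sketch, assuming non-overlapping groups (an illustrative simplification; the general model allows overlaps):

    import numpy as np

    def block_soft_threshold(alpha, groups, lam):
        """Shrink each (non-overlapping) group of coefficients toward zero as a block."""
        out = np.zeros_like(alpha)
        for g in groups:                      # g is an index array for one group
            norm_g = np.linalg.norm(alpha[g])
            if norm_g > lam:
                out[g] = (1.0 - lam / norm_g) * alpha[g]
        return out

    # usage: groups of 5 consecutive coefficients out of p = 50
    # groups = [np.arange(i, i + 5) for i in range(0, 50, 5)]
    # alpha_blocked = block_soft_threshold(alpha, groups, lam=0.1)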

Collaborative (joint) sparse coding: The original version of the problem is defined for a single signal $x$. In the collaborative (joint) sparse coding model, a set of signals is available, each believed to emerge from (nearly) the same set of atoms from $D$. In this case, the pursuit task aims to recover a set of sparse representations that best describe the data while forcing them to share the same (or a closely related) support.[8]
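A common greedy approach in this setting is simultaneous orthogonal matching pursuit, in the spirit of [8]: at each step, pick the atom whose aggregate correlation with all current residuals is largest, so that every signal ends up using the same support. A rough sketch, with the aggregation rule and fixed sparsity level as illustrative choices:

    import numpy as np

    def simultaneous_omp(D, X, k):
        """Recover jointly sparse codes A (p x n) with a shared support for X (m x n)."""
        m, p = D.shape
        n = X.shape[1]
        support = []
        residual = X.copy()
        for _ in range(k):
            scores = np.abs(D.T @ residual).sum(axis=1)      # aggregate correlation per atom
            scores[support] = -np.inf                        # do not re-pick chosen atoms
            support.append(int(np.argmax(scores)))
            Ds = D[:, support]
            coeffs, *_ = np.linalg.lstsq(Ds, X, rcond=None)  # joint least-squares update
            residual = X - Ds @ coeffs
        A = np.zeros((p, n))
        A[support, :] = coeffs
        return A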

Other structures: More broadly, the sparse approximation problem can be cast while enforcing a specific desired structure on the pattern of non-zero locations in $\alpha$. Two cases of interest that have been extensively studied are tree-based structures and, more generally, a Boltzmann-distributed support.[9]

Algorithms

As already mentioned above, there are various approximation (also referred to as pursuit) algorithms that have been developed for addressing the sparse representation problem:

$$\min_{\alpha \in \mathbb{R}^{p}} \|\alpha\|_{0} \quad \text{subject to} \quad \|x - D\alpha\|_{2}^{2} \leq \epsilon^{2}.$$

Two of the main such methods, matching pursuit and basis pursuit, were discussed above; a few additional methods are mentioned below.

  • There are several other methods for solving sparse decomposition problems: the homotopy method, coordinate descent, iterative hard-thresholding, first-order proximal methods (which are related to iterative soft-shrinkage algorithms), and the Dantzig selector. A sketch of one such proximal method follows.
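As an illustration of the proximal/soft-shrinkage family, the following sketch implements iterative soft-thresholding (ISTA) for the $\ell_{1}$ Lagrangian problem above; the step size and iteration count are illustrative assumptions:

    import numpy as np

    def ista(D, x, lam, n_iter=200):
        """Iterative soft-thresholding for min_a lam*||a||_1 + 0.5*||x - D a||_2^2."""
        L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
        alpha = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = D.T @ (D @ alpha - x)         # gradient of the quadratic data term
            z = alpha - grad / L                 # gradient step
            alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage step
        return alpha

    # alpha_hat = ista(D, x, lam=0.05)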

Applications

Sparse approximation ideas and algorithms have been extensively used in signal processing, image processing, machine learning, medical imaging, array processing, data mining, and more. In most of these applications, the unknown signal of interest is modeled as a sparse combination of a few atoms from a given dictionary, and this is used as the regularization of the problem. These problems are typically accompanied by a dictionary learning mechanism that aims to fit $D$ so as to best match the model to the given data. The use of sparsity-inspired models has led to state-of-the-art results in a wide set of applications.[12][13][14] Recent work suggests that there is a tight connection between sparse representation modeling and deep learning.[15]
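As one concrete workflow, scikit-learn provides dictionary-learning estimators that alternate between sparse coding and dictionary updates. A small usage sketch, with the data and parameter values as illustrative assumptions:

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # X: each row is a training signal (e.g., a vectorized image patch)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 64))

    learner = MiniBatchDictionaryLearning(
        n_components=128,                 # number of atoms (overcomplete: 128 > 64)
        transform_algorithm="omp",        # sparse-code with orthogonal matching pursuit
        transform_n_nonzero_coefs=5,      # target sparsity k per signal
        random_state=0,
    )
    codes = learner.fit(X).transform(X)   # sparse representations, shape (500, 128)
    D_learned = learner.components_       # learned dictionary, shape (128, 64)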

See also

  • Compressed sensing
  • Sparse dictionary learning
  • K-SVD
  • Lasso (statistics)
  • Regularization (mathematics) and inverse problems

References

  1. Donoho, D.L. and Elad, M. (2003). "Optimally sparse representation in general (nonorthogonal) dictionaries via L1 minimization". Proceedings of the National Academy of Sciences. 100 (5): 2197–2202. doi:10.1073/pnas.0437847100. PMC 153464. PMID 16576749.
  2. Tropp, J.A. (2004). "Greed is good: Algorithmic results for sparse approximation". IEEE Transactions on Information Theory. 50 (10): 2231–2242. doi:10.1109/TIT.2004.834793.
  3. Donoho, D.L. (2006). "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution". Communications on Pure and Applied Mathematics. 56 (6): 797–829. doi:10.1002/cpa.20132.
  4. Elad, M. (2010). Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer. doi:10.1007/978-1-4419-7011-4. ISBN 978-1441970107.
  5. Donoho, D.L., Elad, M. and Temlyakov, V. (2006). "Stable recovery of sparse overcomplete representations in the presence of noise". IEEE Transactions on Information Theory. 52 (1): 6–18. doi:10.1109/TIT.2005.860430.
  6. Tropp, J.A. (2006). "Just relax: Convex programming methods for identifying sparse signals in noise". IEEE Transactions on Information Theory. 52 (3): 1030–1051. doi:10.1109/TIT.2005.864420.
  7. Eldar, Y.C., Kuppinger, P. and Bolcskei, H. (2010). "Block-sparse signals: Uncertainty relations and efficient recovery". IEEE Transactions on Signal Processing. 58 (6): 3042–3054. arXiv:0906.3173. doi:10.1109/TSP.2010.2044837.
  8. Tropp, J.A., Gilbert, A.C. and Strauss, M.J. (2006). "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit". Signal Processing. 86 (3): 572–588. doi:10.1016/j.sigpro.2005.05.030.
  9. Peleg, T., Eldar, Y.C. and Elad, M. (2012). "Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery". IEEE Transactions on Signal Processing. 60 (5): 2286–2303. arXiv:1010.5734. doi:10.1109/TSP.2012.2188520.
  10. Needell, D. and Tropp, J.A. (2009). "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples". Applied and Computational Harmonic Analysis. 26 (3): 301–321. arXiv:0803.2392. doi:10.1016/j.acha.2008.07.002.
  11. Zibulevsky, M. and Elad, M. (2010). "L1-L2 optimization in signal and image processing". IEEE Signal Processing Magazine. 27 (3): 76–88. doi:10.1109/MSP.2010.936023.
  12. Baraniuk, R.G., Candes, E., Elad, M. and Ma, Y. (2010). "Applications of sparse representation and compressive sensing". Proceedings of the IEEE. 98 (6): 906–909. doi:10.1109/JPROC.2010.2047424.
  13. Elad, M., Figueiredo, M.A.T. and Ma, Y. (2010). "On the role of sparse and redundant representations in image processing". Proceedings of the IEEE. 98 (6): 972–982. doi:10.1109/JPROC.2009.2037655.
  14. Plumbley, M.D., Blumensath, T., Daudet, L., Gribonval, R. and Davies, M.E. (2010). "Sparse representations in audio and music: From coding to source separation". Proceedings of the IEEE. 98 (6): 995–1005. doi:10.1109/JPROC.2009.2030345.
  15. Papyan, V., Romano, Y. and Elad, M. (2017). "Convolutional Neural Networks Analyzed via Convolutional Sparse Coding". Journal of Machine Learning Research. 18 (83): 1–52. arXiv:1607.08194.


Source: https://en.wikipedia.org/wiki/Sparse_approximation