Abstract: This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling, and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that statistical differences between real data obtained by collective sampling in time dimension from multiple servers and that of obtained from a single server are insignificant. Second, we show using the real data that collective data analysis of 3-way data arrays (users x keywords x time) known as high order tensors is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective constructions and analysis of high order data tensors to the choice of server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.
Keywords: tensors
Abstract: Tensors (also known as mutidimensional arrays or N-way arrays) are used in a variety of applications ranging from chemometrics to psychometrics. We describe four MATLAB classes for tensor manipulations that can be used for fast algorithm prototyping. The tensor class extends the functionality of MATLAB's multidimensional arrays by supporting additional operations such as tensor multiplication. The tensor_as_matrix class supports the matricization of a tensor, i.e., the conversion of a tensor to a matrix (and vice versa), a commonly used operation in many algorithms. Two additional classes represent tensors stored in decomposed formats: cp_tensor and tucker_tensor. We describe all of these classes and then demonstrate their use by showing how to implement several tensor algorithms that have appeared in the literature.
Abstract: A special, restricted variant of T2 - the weighted Euclidian distance model, called INDSCAL - is presented within the context of multidimensional scaling. The method to estimate the parameters proceeds via the minimization of a loss function. It is illustrated with the Miller & Nicely data and Wish' data on the perception of nations.
Keywords: tensors, CANDECOMP, PARAFAC
Generally cited as one of two primary references for PARAFAC (the other is Harshman, 1970)
Keywords: tensors
Cited for HO-SVD.
Keywords: tensors
Abstract: Several recently proposed algorithms for fitting the PARAFAC model are investigated and compared to more established alternatives. Alternating least squares (ALS), direct trilinear decomposition (DTLD), alternating trilinear decomposition (ATLD), self-weighted alternating trilinear decomposition (SWATLD), pseudo alternating least squares (PALS), alternating coupled vectors resolution (ACOVER), alternating slice-wise diagonalization (ASD) and alternating coupled matrices resolution (ACOMAR) are compared on both simulated and real data. For the recent algorithms, only unconstrained three-way models can be fitted. In contrast, for example, ALS allows modeling of higher-order data, as well as incorporating constraints on the parameters and handling of missing data. Nevertheless, for three-way data, the newer algorithms are interesting alternatives. It is found that the ALS estimated models are generally of a better quality than any of the alternatives even when overfactoring the model, but it is also found that ALS is significantly slower. Based on the results (in particular the poor performance of DTLD), it is advised that (a slightly modified) ASD may be a good alternative to ALS when a faster algorithm is desired.
Keywords: Trilinear; Overfactoring; Algorithm comparison; Speed; tensors
Abstract: Simple structure and other common principles of factor rotation do not in general provide strong grounds for attributing explanatory significance to the factors which they select. In contrast, it is shown that an extension of Cattell's principle of rotation to Proportional Profiles (PP) offers a basis for determining explanatory factors for three-way or higher order multi-mode data. Conceptual models are developed for two basic patterns of multi-mode data variation, system- and object-variation, and PP analysis is found to apply in the system-variation case. Although PP was originally formulated as a principle of rotation to be used with classic two-way factor analysis, it is shown to embody a latent three-way factor model, which is here made explicit and generalized from two to N "parallel occasions". As originally formulated, PP rotation was restricted to orthogonal factors. The generalized PP model is demonstrated to give unique "correct" solutions with oblique, non-simple structure, and even non-linear factor structures. A series of tests, conducted with synthetic data of known factor composition, demonstrate the capabilities of linear and non-linear versions of the mode, provide data on the minimal necessary conditions of uniqueness, and reveal the properties of the analysis procedures when these minimal conditions are not fulfilled. In addition, a mathematical proof is presented for the uniqueness of the solution given certain conditions on the data. Three-mode PP factor analysis is applied to a three-way set of real data consisting of the fundamental and first three formant frequencies of ll persons saying 8 vowels. A unique solution is extracted, consisting of three factors which are highly meaningful and consistent with prior knowledge and theory concerning vowel quality. The relationships between the three-mode PP model and Tucker's multi-modal model, McDonald's non-linear model and Carroll and Chang's multi-dimensional scaling model are explored.
Keywords: tensors PARAFAC
Generally cited as one of two primary references for PARAFAC (the other is Carroll and Chang, 1970).
Keywords: tensors
An excellent paper on notation for mulilinear algebra.
Abstract: As the size of the web increases, it becomes more and more important to analyze link structure while also considering context. Multilinear algebra provides a novel tool for incorporating anchor text and other information into the authority computation used by link analysis methods such as HITS. Our recently proposed TOPHITS method uses a higher-order analogue of the matrix singular value decomposition called the PARAFAC model to analyze a three-way representation of web data. We compute hubs and authorities together with the terms that are used in the anchor text of the links between them. Adding a third dimension to the data greatly extends the applicability of HITS because the TOPHITS analysis can be performed in advance and offline. Like HITS, the TOPHITS model reveals latent groupings of pages, but TOPHITS also includes latent term information. In this paper, we describe a faster mathematical algorithm for computing the TOPHITS model on sparse data, and Web data is used to compare HITS and TOPHITS. We also discuss how the TOPHITS model can be used in queries, such as computing context-sensitive authorities and hubs. We describe different query response methodologies and present experimental results.
Keywords: PARAFAC, multilinear algebra, link analysis, higher-order SVD
Application of PARAFAC to web data.
Abstract: Linear algebra is a powerful and proven tool in web search. Techniques, such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg, score web pages based on the principal eigenvector (or singular vector) of a particular non-negative matrix that captures the hyperlink structure of the web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of theSVD, to automatically identify topics in the collection along with the associated authoritative web pages.
Also available at http://csmr.ca.sandia.gov/~tgkolda/pubs/index.html#ICDM05.
Abstract: We propose two new multilinear operators for expressing the matrix compositions that are needed in the Tucker and PARAFAC (CANDECOMP) decompositions. The first operator, which we call the Tucker operator, is shorthand for performing an n-mode matrix multiplication for every mode of a given tensor and can be employed to consisely express the Tucker decomposition. The second operator, which we call the Kruskal operator, is shorthand for the sum of the outer-products of the columns of N matrices and allows a divorce from a matricized representation and a very consise expression of the PARAFAC decomposition. We explore the properties of the Tucker and Kruskal operators independently of the related decompositions. Additionally, we provide a review of the matrix and tensor operations that are frequently used in the context of tensor decompositions.
Keywords: PARAFAC, CANDECOMP, Tucker, HOSVD, multilinear algebra, higher-order factor analysis, Khatri-Rao product
Background on notation and terminolgy for multilinear algebra.
Keywords: tensors, three-mode principal component analysis
Keywords: PARAFAC
Available at http://publish.uwo.ca/~harshman/jbkrank.pdf.
Abstract: We demonstrate how non-negative matrix factorization (NMF) can be used to decompose the inter trial phase coherence (ITPC) of multi-channel EEG to yield a unique decomposition of time-frequency signatures present in various degrees in the recording channels. The NMF optimization is easily generalized to a parallel factor (PARAFAC) model to form a non-negative multi-way factorization (NMWF). While the NMF can examine subject specific activities the NMWF can effectively extract the most similar activities across subjects and or conditions. The methods are tested on a proprioceptive stimulus consisting of a weight change in a handheld load. While somatosensory gamma oscillations have previously only been evoked by electrical stimuli we hypothesized that a natural proprioceptive stimulus also would be able to evoke gamma oscillations. ITPC maxima were determined by visual inspection and these results were compared to the NMF and NMWF decompositions. Agreement between the results of the visual pattern inspection and the mathematical decompositions was satisfactory showing two significant coherent activities; the predicted 40Hz activity 60 ms after stimulus onset in the frontal-parietal region contralateral to stimulus side and additionally an unexpected 20Hz activity slightly lateralized in the frontal central region. Consequently, also proprioceptive stimuli are able to elicit evoked gamma activity.
Keywords: nonnegative tensor factorization
Application of non-negative tensor factorization.
Abstract: A time-efficient algorithm PMF3 is presented for solving the three-way PARAFAC (CANDECOMP) factor analytic model. In contrast to the usual alternating least squares, the PMF3 algorithm computes changes to all three modes simultaneously. This typically leads to convergence in 40–100 iteration steps. The equations of the weighted multilinear least squares fit are given. The optional non-negativity is achieved by imposing a logarithmic penalty function. The algorithm contains a possibility for dynamical reweighting of the data during the iteration, allowing a robust analysis of outlier-containing data. The problems typical of PARAFAC models are discussed (but not solved): multiple local solutions, degenerate solutions, non-identifiable solutions. The question of how to verify the solution is discussed at length. The program PMF3 is available for 486-Pentium based PC computers.
Keywords: Positive matrix factorization; Multilinear models; PARAFAC; CANDECOMP; Multiple solutions; Degeneracy
One of the first non-negative matrix/tensor factorization papers.
Abstract: This report is a masters thesis written at the Department of Mathematics, Linköping University. Two different classification algorithms for handwritten digit recognition have been thoroughly analyzed. The first algorithm uses Higher Order Singular Value Decomposition (HOSVD) of the training digits. The second algorithm relies on a specific distance measure, which is invariant to different transformations, called Tangent Distance (TD). This algorithm was modified in the implementation part by the use of numerical derivatives and an approximation of the blurring operator. Two more classification algorithms were constructed by combining the first two algorithms. All constructed algorithms have been tested with good performance for some of them. The best results were achieved by the Tangent Distance classifier with an error rate of 3%. Finally the results of a few other classifiers are presented and compared with the test results obtained in this report.
Keywords: tensors
Keywords: tensors
Abstract: As the competition of Web search market increases, there is a high demand for personalized Web search to conduct retrieval incorporating Web users' information needs. This paper focuses on utilizing clickthrough data to improve Web search. Since millions of searches are conducted everyday, a search engine accumulates a large volume of clickthrough data, which records who submits queries and which pages he/she clicks on. The clickthrough data is highly sparse and contains different types of objects (user, query and Web page), and the relationships among these objects are also very complicated. By performing analysis on these data, we attempt to discover Web users' interests and the patterns that users locate information. In this paper, a novel approach CubeSVD is proposed to improve Web search. The clickthrough data is represented by a 3-order tensor, on which we perform 3-mode analysis using the higher-order singular value decomposition technique to automatically capture the latent factors that govern the relations among these multi-type objects: users, queries and Web pages. A tensor reconstructed based on the CubeSVD analysis reflects both the observed interactions among these objects and the implicit associations among them. Therefore, Web search activities can be carried out based on CubeSVD analysis. Experimental evaluations using a real-world data set collected from an MSN search engine show that CubeSVD achieves encouraging search results in comparison with some standard methods.
Keywords: tensors
Abstract: A multitude of algorithms have been developed to fit a trilinear PARAFAC model to a three-way array. Limits and advantages of some of the available methods (i.e. GRAM-DTLD, PARAFAC-ALS, ASD, SWATLD, PMF3 and dGN) are compared. The algorithms are explained in general terms together with two approaches to accelerate them: line search and compression. In order to compare the different methods, 720 sets of artificial data were generated with varying level and type of noise, collinearity of the factors and rank. Two PARAFAC models were fitted on each data set: the first having the correct number of factors F and the second with F+1 components (the objective being to assess the sensitivity of the different approaches to the over-factoring problem, i.e. when the number of extracted components exceeds the rank of the array). The algorithms have also been tested on two real data sets of fluorescence measurements, again by extracting both the right and an exceeding number of factors. The evaluations are based on: number of iterations necessary to reach convergence, time consumption, quality of the solution and amount of resources required for the calculations (primarily memory).
Keywords: PARAFAC; ALS; SWATLD; PMF3; Levenberg–Marquadt,tensors
Keywords: tensors Tucker
Generally cited as the primary reference for the Tucker decomposition.
Keywords: tensors Tucker
http://mrl.nyu.edu/~dt/papers/eccv02/eccv02.pdf
Abstract: Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing ensembles of images resulting from the interaction of any number of underlying factors. We present a dimensionality reduction algorithm that enables subspace analysis within the multilinear framework. This N-mode orthogonal iteration algorithm is based on a tensor decomposition known as the N-mode SVD, the natural extension to tensors of the conventional matrix singular value decomposition (SVD). We demonstrate the power of multilinear subspace analysis in the context of facial image ensembles, where the relevant factors include different faces, expressions, viewpoints, and illuminations. In prior work we showed that our multilinear representation, called TensorFaces, yields superior facial recognition rates relative to standard, linear (PCA/eigenfaces) approaches. We demonstrate factor-specific dimensionality reduction of facial image ensembles. For example, we can suppress illumination effects (shadows, highlights) while preserving detailed facial features, yielding a low perceptual error.
Abstract: Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training image ensemble, but it cannot distinguish between the different factors, or modes, inherent to image formation, including scene structure, illumination, and imaging. We introduce a nonlinear, multifactor model that generalizes ICA. Our Multilinear ICA (MICA) model of image ensembles learns the statistically independent components of multiple factors. Whereas ICA employs linear (matrix) algebra, MICA exploits multilinear (tensor) algebra. We furthermore introduce a multilinear projection algorithm which projects an unlabeled test image into the N constituent mode spaces to simultaneously infer its mode labels. In the context of facial image ensembles, where the mode labels are person, viewpoint, illumination, expression, etc., we demonstrate that the statistical regularities learned by MICA capture information that, in conjunction with our multilinear projection algorithm, improves automatic face recognition.
Available at http://mrl.nyu.edu/~alex/mica/mica05.pdf.
Keywords: tensors
http://vision.ai.uiuc.edu/~wanghc/research/iccv03_fed.html
 
[an error occurred while processing this directive]