Tensors in Data Analysis

Block lecture, 5 ECTS credits, winter semester 2017–18

Organization

Lecturer

Dr. Pauli Miettinen

Time & Location

  • The lectures will take place in week 41 (9–13 October 2017) at 10:15–11:45 and 12:30–14:00 in room 016 of building E1.3 (N.B. the room has changed)
  • Tutorial sessions will take place Tuesday–Friday in week 41 (10–13 October 2017) at 14:30–15:45 in room 023 of building E1.4 (MPI-INF) (N.B. the time has changed)
  • The final exam will be on Friday, 20 October (tentative). Time and place to be announced.

Registration

  • Presence at the first lecture is mandatory
  • No pre-registration other than exam registration in HISPOS

News

  • Results for the exam are in HISPOS
  • The re-exam will take place on Friday, 10 November. If you want to reserve a time for the oral exam, you need to contact the lecturer by email before Friday, 27 October. You must also register in HISPOS before Friday, 3 November.

Problem sheets

  1. Tensor manipulations [problem sheet | sample solutions]
  2. CP decomposition and tensor rank [problem sheet | sample solutions]
  3. Variations and applications of the CP decomposition [problem sheet | sample solutions]
  4. Tucker decompositions [problem sheet | sample solutions]

Exam Information

The results for the final exam are in HISPOS. 

The re-exam will be organized on Friday, 10 November. You must contact the lecturer by email before Friday, 27 October, in order to book a time for the oral exam, and register in HISPOS by Friday, 3 November.

Course Contents

Tensors are multidimensional extensions of matrices. In the past decade, interest in using tensors in data analysis has increased significantly. Tensors can be used to store, for example, multi-relational data (subject-predicate-object triples, user-movie-tag triples, etc.), hyperspectral data (X-Y-spectrum images), or spatio-temporal data (X-Y-time data). Various tensor factorization methods have been proposed for analysing such data sets and for finding latent structure in them.
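As a minimal illustration of the first kind of data mentioned above, the sketch below stores subject-predicate-object triples as a dense binary three-way tensor, with one mode each for subjects, predicates, and objects. It is not course material: the function name `triples_to_tensor` and the toy triples are hypothetical, and the tensor is kept as plain nested lists so the example needs no third-party libraries.

```python
def triples_to_tensor(triples):
    """Encode (subject, predicate, object) triples as a dense binary I x J x K tensor.

    X[i][j][k] == 1 iff (subjects[i], predicates[j], objects[k]) is one of the triples.
    """
    # Give every distinct subject, predicate, and object its own integer index.
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    objects = sorted({o for _, _, o in triples})
    s_idx = {s: i for i, s in enumerate(subjects)}
    p_idx = {p: j for j, p in enumerate(predicates)}
    o_idx = {o: k for k, o in enumerate(objects)}

    # Start from an all-zero tensor stored as nested lists and set one entry per triple.
    X = [[[0] * len(objects) for _ in predicates] for _ in subjects]
    for s, p, o in triples:
        X[s_idx[s]][p_idx[p]][o_idx[o]] = 1
    return X, subjects, predicates, objects


# Hypothetical toy data, not taken from the course.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "carol"),
]
X, subjects, predicates, objects = triples_to_tensor(triples)
print(subjects, predicates, objects)  # ['alice', 'bob'] ['knows', 'likes'] ['bob', 'carol']
print(X)  # [[[1, 0], [0, 1]], [[0, 1], [0, 0]]]
```

This sketch uses separate index sets for subjects and objects; using one shared entity index for both modes is an equally valid choice. For realistically sized multi-relational data, which is typically very sparse, a sparse (coordinate) representation would usually be preferable to a dense one.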

This course will cover the use of tensor factorizations in data analysis. We will cover a number of different factorizations, their applications, their strengths and weaknesses, and algorithms for finding them. We will also cover other important topics related to tensors in data analysis, such as how to select which factorization to use and how to interpret the results.

This is a block course: all the lectures take place within one week (two 1:30 h lectures per day) before the teaching period starts. In addition, there will be four tutorial sessions after the lectures.

The tentative list of contents for the course is:

  • Tensor algebra and tensor operations
  • The CANDECOMP/PARAFAC (CP) decomposition and tensor rank
  • Variants and applications of and algorithms for the CP decomposition
  • The Tucker decompositions, their algorithms and applications
  • The tensor train decomposition
  • Choosing the factorization and rank
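To make the first two topics slightly more concrete, here is a small pure-Python sketch (not course material; all function names are hypothetical) of a rank-R CP model, which writes a three-way tensor as a sum of R rank-1 tensors, X[i][j][k] = Σ_r A[i][r]·B[j][r]·C[k][r], together with the mode-1 unfolding. The column ordering of the unfolding follows the convention of Kolda & Bader (2009) from the reading list; with that ordering, the unfolded CP tensor equals A times the transpose of the Khatri-Rao product C ⊙ B, which the last lines check.

```python
def cp_to_tensor(A, B, C):
    """Rank-R CP model: X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def unfold_mode1(X):
    """Mode-1 unfolding: an I x (J*K) matrix whose column index is j + k*J (0-based)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    # k is the outer loop and j the inner one, so column j + k*J holds X[i][j][k].
    return [[X[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]


def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x R) and B (J x R): a (K*J) x R matrix."""
    R = len(C[0])
    # Row k*J + j of column r is C[k][r] * B[j][r].
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(len(C)) for j in range(len(B))]


def matmul_transpose(M, N):
    """Return M times N-transpose for nested-list matrices with the same number of columns."""
    return [[sum(m * n for m, n in zip(row_m, row_n)) for row_n in N] for row_m in M]


# Hypothetical rank-2 factor matrices for a 3 x 2 x 2 tensor.
A = [[1, 2], [0, 1], [3, 0]]
B = [[1, 0], [2, 1]]
C = [[1, 1], [0, 2]]

X = cp_to_tensor(A, B, C)
X1 = unfold_mode1(X)
print(X1)  # [[1, 4, 0, 4], [0, 1, 0, 2], [3, 6, 0, 0]]
print(X1 == matmul_transpose(A, khatri_rao(C, B)))  # True
```

This identity is, for instance, what lets alternating least squares algorithms for the CP decomposition update one factor matrix at a time by solving an ordinary linear least-squares problem against an unfolded tensor.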

Course Format

This is a block lecture, that is, it takes place in a single week. There are 2 x 1:30 h of lectures every day, and 1 x 1:30 h of tutorial sessions from Tuesday to Friday. For the tutorials, written homework assignments are handed out every day except Friday, and they are due at the next day's tutorial session. Students should expect to spend at least eight hours per day on the course during the week.

At the beginning of each tutorial session, students have to mark which problem solutions they are willing to present. To be eligible to sit the final exam, students must mark at least half of the problems. Consequently, students must attend at least two tutorial sessions (assuming they mark all problems from the corresponding problem sheets). There are no bonus points for solving more problems, though trying to solve every problem on every problem sheet is strongly encouraged.

The lectures are "chalk talks", that is, the material is presented primarily on the blackboard. Handwritten lecture notes will be made available on the course home page after every lecture. Presence at the first lecture is mandatory, and voluntary for the rest (though highly recommended).

Prerequisites

The students are expected to have a good knowledge of linear algebra and matrix analysis. Basic knowledge of matrix factorization methods such as SVD, PCA, and NMF is expected. Taking the course Data Mining and Matrices is not mandatory, but knowing its contents is recommended.
 

Learning Objectives

The course aims to teach the theory behind tensor decompositions, as well as the practical side: which decomposition to use when, and how to interpret the results. After the course, the students should know the most common tensor decompositions in data analysis and be able to use them in their own work. They should also be able to understand new decompositions and how these relate to the ones they already know. The students should understand the basic algorithmic ideas used in computing the decompositions, and be able to read and implement basic tensor decomposition algorithms. Finally, they should be able to choose the correct decomposition for a given data analysis task, interpret the results, and know the strengths and weaknesses of the most common decompositions.

Reading Material

The following material provides background information on the topic. It is not mandatory for the course, but it is definitely helpful. 

  1. Skillicorn, D., 2007. Understanding Complex Datasets: Data Mining with Matrix Decompositions, Chapter 9. Boca Raton: Chapman & Hall/CRC.
  2. Kolda, T.G. & Bader, B.W., 2009. Tensor decompositions and applications. SIAM Review, 51(3), pp.455–500. [PDF]
  3. Cichocki, A. et al., 2009. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Chapters 1.4, 1.5, and 7. Chichester: John Wiley & Sons.