# Advanced Data Analysis with Matrices and Tensors

## Block seminar, 7 ECTS credits, winter semester 2015–16

## Basic information

- Type: Block seminar
- Lecturer: Dr. Pauli Miettinen
- Credits: 7 ECTS credits
- Registration: Registration opens on Wednesday, 7 October, at 9 am CET. Information on how to register will be posted on this page when registration opens.
- Seminar days: The seminar will take place on 18–19 January 2016

## Registration

To register for the seminar, send an email to the lecturer listing at least three (3) papers you would like to present, in order of preference. The email should also contain your full name and matriculation number. Places in the seminar are allocated on a first-come, first-served basis, depending on the availability of the listed papers; in case of conflicts, priority is given to students with a stronger background (e.g., those who have taken the Data Mining and Matrices course). All students who have been accepted must attend the kick-off meeting, or their place will be given away.

## Content

Graphs, binary relations, document corpora, shopping data, movie ratings, IP traffic data, and many other data types can be, and often are, represented using matrices or tensors. These representations allow the data analyst to find structures and regularities in the data using matrix and tensor factorization methods, such as the well-known SVD, NMF, or ICA factorizations. The recent popularity of data analysis methods based on linear algebra has increased the need for other kinds of factorizations; the growing use of tensor factorizations is perhaps the most obvious example.

In this seminar we cover recent work on data analysis using matrix and tensor factorizations. For matrices, we will see methods that improve on known approaches or apply them to novel problems; for tensors, we will see new types of decompositions and their applications.

## Format and Prerequisites

This is a block seminar. The seminar will take place over two full days in mid-January 2016, and participation on both days is mandatory. In addition, there is a mandatory kick-off meeting at the beginning of the semester and a recommended primer lecture on tensor factorizations. To pass the seminar, participants must give a presentation and hand in a short report on their topic. The reports and preliminary versions of the slides have to be handed in to the lecturer during the semester (exact dates TBA), and the reports will be distributed to the other participants. Grading is based on the report, the presentation, the student's knowledge of the subject (as evidenced in the discussion after the presentation), and his/her activity in the discussions.

Students taking this seminar are expected to know linear algebra, popular matrix factorization methods, and their applications to data analysis. Successfully passing the Data Mining and Matrices course is a recommended, but not mandatory, prerequisite.

## Schedule

| Date | Time | Topic | Location |
|---|---|---|---|
| 21 October | 12:15–14:00 | Kick-off meeting | Room 024, building E1.4 (MPI-INF) |
| 28 October | 12:15–14:00 | Info lecture | Room 024, building E1.4 (MPI-INF) |
| 7 December | 16:00 CET | Deadline: first draft of the written report | |
| 14 December | 16:00 CET | Deadline: first draft of the slides | |
| 10 January | 23:59 CET | Deadline: final written report | |
| 18–19 January | 9:50–15:00 | Seminar days | Room 630, building E1.5 (MPI-SWS) |

## Papers

The papers are listed below. Papers 1–6 cover matrix factorizations and papers 7–12 cover tensor factorizations.

When registering, students must list at least three papers in order of preference (more information will be published when registration opens). If you are interested in taking the seminar, we recommend studying the papers beforehand.

All papers have been selected and the seminar is fully booked!

1. Erdős, D., Gemulla, R. & Terzi, E., 2014. Reconstructing Graphs from Neighborhood Data. *ACM Transactions on Knowledge Discovery from Data*, 8(4), pp. 23:1–23:22.
2. Farahat, A.K. et al., 2014. Greedy column subset selection for large-scale data sets. *Knowledge and Information Systems*, 45(1), pp. 1–34.
3. Liberty, E., 2013. Simple and deterministic matrix sketching. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 581–588.
4. Miettinen, P., 2015. Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks. In 2015 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 36–52.
5. Teflioudi, C., Gemulla, R. & Mykytiuk, O., 2015. LEMP: Fast Retrieval of Large Entries in a Matrix Product. In 2015 ACM SIGMOD/PODS Conference, pp. 107–122.
6. Zhang, Z. et al., 2014. Preference preserving hashing for efficient recommendation. In 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 183–192.
7. Chi, E.C. & Kolda, T.G., 2012. On Tensors, Sparsity, and Nonnegative Factorizations. *SIAM Journal on Matrix Analysis and Applications*, 33(4), pp. 1272–1299.
8. Comarela, G. & Crovella, M., 2014. Identifying and Analyzing High Impact Routing Events with PathMiner. In The 2014 Internet Measurement Conference, pp. 421–434.
9. Erdős, D. & Miettinen, P., 2013. Walk’n’Merge: A Scalable Algorithm for Boolean Tensor Factorization. In 13th IEEE International Conference on Data Mining, pp. 1037–1042.
10. Hu, C. et al., 2015. Scalable Bayesian Non-negative Tensor Factorization for Massive Count Data. In 2015 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 53–70.
11. Khan, S.A. & Kaski, S., 2014. Bayesian Multi-view Tensor Factorization. In 2014 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 656–671.
12. Nickel, M., Tresp, V. & Kriegel, H.-P., 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In 28th International Conference on Machine Learning, pp. 809–816.