Recognizing Human Activity

Ulf Blanke & Bernt Schiele

Recognizing human activity from wearable sensors

Sensing and understanding the user’s context plays an essential role in human-computer interaction. It may enable natural communication, for instance with robots that understand the user’s goals and offer support at the right time. Our work focuses on one particular type of context: human activity.

While impressive progress has been made in recognizing short and atomic activities (such as shaking hands and walking), research on more complex human activity lasting for minutes or hours (such as a morning routine or an assembly task) is far less mature. We thus focus on several important aspects of human activity recognition related to complex and long-term activities.

Collecting human activity data

Wearable sensors attached to the body have great potential for sensing what the user is doing, at any given time or place, from a first-person perspective. Given the advances in micro-technology, inexpensive sensors are already becoming widely available in watches, cell phones, and even clothing. The motion data they collect can then be analyzed with machine learning techniques to understand the user's activity.
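A common first step when analyzing motion data is to cut the sensor stream into fixed-size windows and compute simple statistics per window. The following is a minimal sketch of that idea, not the pipeline used in this work; the function name, the window parameters, and the sample values are assumptions made for illustration.

```python
from statistics import mean, pvariance


def window_features(samples, window_size, step):
    """Slide a fixed-size window over a 1-D signal.

    Returns one (mean, variance) pair per window. Trailing samples that
    do not fill a complete window are ignored.
    """
    features = []
    for start in range(0, len(samples) - window_size + 1, step):
        window = samples[start:start + window_size]
        features.append((mean(window), pvariance(window)))
    return features


# Hypothetical acceleration magnitudes: a still phase followed by movement.
signal = [1.0, 1.0, 1.0, 1.0, 0.5, 1.5, 0.5, 1.5]
features = window_features(signal, window_size=4, step=4)
```

In this toy signal both windows have the same mean, and only the variance separates the still phase from the moving one, which is why several statistics are usually computed per window before a classifier is trained on them.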

Figure 1: Relevant parts for composite activity

Identifying and combining relevant activity events

Usually it is not the sequence of atomic activities that is interesting, but rather the higher-level goal at which these activities are directed. There are several ways to infer this goal from observed atomic activities. Since composite activities can contain large amounts of unrelated activity, using the complete observation can confuse the recognition and is therefore often suboptimal. For many composite activities, it is sufficient to spot only a few underlying activity events to allow their recognition [Figure 1]. For example, having lunch can be characterized by walking at a certain time of day, without even observing the actual eating activity. In a discriminative analysis, we observe that a surprisingly small fraction of relevant parts can be sufficient to recognize the higher-level composite activity, which in turn allows for efficient recognition algorithms.
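One simple way to illustrate what "relevant parts" means is to score each atomic event by how much more often it occurs in the target composite activity than in all others. The sketch below uses this frequency difference as a stand-in for the discriminative analysis; the scoring rule, the function name, and the toy recordings are assumptions, not the method or data of this work.

```python
def event_relevance(recordings, target):
    """Score each event by its occurrence rate in the target activity
    minus its occurrence rate in all other activities.

    recordings: list of (label, set_of_spotted_events) pairs.
    Assumes at least one recording with and one without the target label.
    """
    pos = [events for label, events in recordings if label == target]
    neg = [events for label, events in recordings if label != target]
    vocabulary = set().union(*(events for _, events in recordings))
    scores = {}
    for event in vocabulary:
        p = sum(event in r for r in pos) / len(pos)
        n = sum(event in r for r in neg) / len(neg)
        scores[event] = p - n
    return scores


# Hypothetical recordings, each reduced to the set of events spotted in it.
recordings = [
    ("lunch", {"walk", "sit", "type"}),
    ("lunch", {"walk", "sit", "eat"}),
    ("office", {"sit", "type"}),
    ("office", {"sit", "type", "phone"}),
]
scores = event_relevance(recordings, "lunch")
relevant = sorted(scores, key=scores.get, reverse=True)[:1]
```

In this toy data the single most relevant event for lunch is walking rather than eating, and sitting carries no information because it occurs in every recording.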

Figure 2: Composite activity for construction

Hierarchical model for composite activities

Preserving the structure of a hierarchical activity offers several benefits. Consider the construction manual for a mirror in Figure 2, where one of several tasks is to fix the frame to the panel. This seemingly simple task consists of various steps, and it becomes obvious that composite activities add significant variation: they can be interrupted, their duration can vary strongly across users, and the underlying activities can happen in a different order. Applying the same algorithms that recognize atomic activities can be suboptimal, as they would require prohibitive amounts of training data to cover this variation. Therefore, we propose a hierarchical model that observes relevant activity events and combines them to recognize composite activities, similar to the way in which letters form words. Experiments indeed show superior performance compared to the standard approaches usually used for activity recognition.
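To make the two-level idea concrete, the sketch below assumes that a lower layer has already spotted atomic events, and lets an upper layer recognize the composite activity from the set of events alone, ignoring order, duration, and unrelated events. This is a deliberately simplified stand-in, not the hierarchical model proposed here; the profile-overlap score and all event and activity names are hypothetical.

```python
from collections import Counter


def train(examples):
    """Build one event profile per composite activity.

    examples: list of (label, list_of_spotted_events) pairs. Each event
    counts once per example, so repetitions within an example are ignored.
    """
    profiles = {}
    for label, events in examples:
        profiles.setdefault(label, Counter()).update(set(events))
    return profiles


def classify(profiles, events):
    """Return the label whose profile is best covered by the observed events."""
    observed = set(events)

    def score(label):
        profile = profiles[label]
        return sum(profile[e] for e in observed) / sum(profile.values())

    return max(profiles, key=score)


# Hypothetical training sequences for two tasks of a mirror assembly.
examples = [
    ("fix_frame", ["pick_screw", "screw", "hold_frame"]),
    ("fix_frame", ["hold_frame", "pick_screw", "screw", "screw"]),
    ("hang_mirror", ["drill", "hammer", "lift_mirror"]),
]
profiles = train(examples)

# Different order and an unrelated interruption ("walk") do not matter.
result = classify(profiles, ["screw", "walk", "hold_frame", "pick_screw"])
```

Because the upper layer only sees which events occurred, it needs far fewer training examples than a model that must learn every ordering and timing of the underlying steps, at the cost of discarding temporal structure that a fuller model could exploit.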

Transferring knowledge

Parts that are similar in different composite activities can be shared, much like vocabulary. Instead of re-learning composite activities from scratch, transferring shared parts reduces the training effort for new composite activities significantly.

Ulf Blanke

DEPT. 2 Computer Vision and Multimodal Computing
Phone +49 681 9325-2000
Email blanke@mpi-inf.mpg.de

Bernt Schiele

DEPT. 2 Computer Vision and Multimodal Computing
Phone +49 681 9325-2000
Email schiele@mpi-inf.mpg.de