Sequence-Aware Recommender Systems

MASSIMO QUADRANA, ContentWise

PAOLO CREMONESI, Politecnico di Milano

DIETMAR JANNACH, AAU Klagenfurt

Recommender systems are one of the most successful applications of data mining and machine learning

technology in practice. Academic research in the eld is historically often based on the matrix completion

problem formulation, where for each user-item-pair only one interaction (e.g., a rating) is considered. In many

application domains, however, multiple user-item interactions of dierent types can be recorded over time.

And, a number of recent works have shown that this information can be used to build richer individual user

models and to discover additional behavioral patterns that can be leveraged in the recommendation process.

In this work we review existing works that consider information from such sequentially-ordered user-

item interaction logs in the recommendation process. Based on this review, we propose a categorization

of the corresponding recommendation tasks and goals, summarize existing algorithmic solutions, discuss

methodological approaches when benchmarking what we call sequence-aware recommender systems, and

outline open challenges in the area.

CCS Concepts:

• Information systems → Recommender systems

; Collaborative ltering;

• Computing

methodologies → Learning from implicit feedback;

Additional Key Words and Phrases: sequence, session, trend, algorithms, dataset, evaluation, recommendation

ACM Reference format:

Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems.

ACM Comput. Sur v. 1, 1, Article 1 (February 2018), 35 pages.

https://doi.org/0000001.0000001

1 INTRODUCTION

Recommender Systems (RS) are software applications that support users in nding items of interest

within larger collections of objects, often in a personalized way. Today, such systems are used

in a variety of application domains, including for example e-commerce or media streaming, and

receiving automated recommendations of dierent forms has become a part of our daily online

user experience. Internally, such systems analyze the past behavior of individual users or of a

user community as a whole to detect patterns in the data. On typical online sites, various types of

relevant actions of a user can be recorded, e.g., that a user views an item or makes a purchase, and

several of the actions of a single user may relate to the same item. These recorded actions and the

Author’s addresses: Massimo Quadrana, ContentWise, Milan, Italy ([email protected]). Paolo

Cremonesi, Politecnico di Milano, Dipartimento di Elettronica, Informatica e Bioingegneria (DEIB), Milan, Italy

([email protected]). Dietmar Jannach, Institute of Applied Informatics, AAU Klagenfurt, Austria (diet-

mar[email protected]). This work was mainly carried out when Massimo Quadrana was a Ph.D. student at Politecnico di

Milano.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee

provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the

full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.

Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires

prior specic permission and/or a fee. Request permissions from [email protected].

0360-0300/2018/2-ART1 $15.00

https://doi.org/0000001.0000001

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:2 M. adrana et al.

detected patterns are then used to compute recommendations that match the preference proles of

individual users.

In academic environments, the predominant problem abstraction is that of matrix completion

where we are given a user-item rating matrix and the goal is to predict the missing values. This

abstraction is generally well-suited to train machine learning models that aim to capture longer-term

user preference proles. The corresponding algorithms however typically implement no specic

means to take the users’ short-term behavior or intents into account in their recommendations;

nor are they designed to use the rich information that is contained in the sequentially-ordered user

interaction logs that are often available in practical applications.

In practice, however, there are many application scenarios where considering short-term user

interests and longer-term sequential patterns can be central to the success of a recommender.

A typical example problem setting is that of session-based recommendation [

], where no

longer-term user histories are available. Instead, we have to adapt the recommendations according

to the assumed short-term interests of an anonymous user. The goal in such scenarios usually is to

recommend objects that match a given sequence of user actions.

Typical algorithmic approaches in that context learn to predict the best next item from sequential

user interaction logs. Considering such sequences is however not only relevant for the short-term

adaptation of the recommendations. The sequential logs can also be used to derive longer-term

behavior patterns, e.g., to detect interest drifts of individual users over time [

], to identify short-

term popularity trends in the community that can be exploited by recommendation algorithms

[

], or to reason about the best point in time to remind users of certain items they have seen

or purchased before [

]. Finally, there are application domains where the recommendation of one

item (e.g., an accessory) only makes sense after some other object was purchased. Such weak or

strict ordering constraints might correspondingly be learned from the data and considered by a

sequence-aware recommender.

Overall, sequence-aware recommendation scenarios are highly relevant in practice and a number

of relevant works were proposed in the recent past. Research in the eld is however comparably

scattered and no common understanding of the dierent facets of the problem exists. In this

survey work, we therefore (i) categorize the various scenarios of sequence-aware recommendations

approaches in the academic literature, (ii) we review the various algorithmic approaches that were

proposed to extract and leverage patterns from interaction logs, and (iii) we nally discuss specic

issues when benchmarking dierent recommendation methods. One of the major goals of the

review in that context is to lay the path for more standardized and better reproducible research

works in the eld.

The paper is organized as follows. In Section 2 and Section 3 we characterize and categorize

dierent types of sequence-aware recommendation problems that can be found in the literature.

Section 4 reviews existing algorithms and Section 5 discusses methodological questions regarding

their evaluation and comparison. Section 6 nally gives a brief outlook on future research directions.

2 CHARACTERIZING SEQUENCE-AWARE RECOMMENDER SYSTEMS

Sequence-aware recommendation problems are dierent from the traditional matrix-completion

setup in a number of ways. Figure 1 gives a high-level overview of the problem, its inputs, outputs,

and specic computational tasks. In general, the ordering of the objects can be relevant both with

respect to the inputs and to the outputs. We will discuss these aspects in more detail next.

2.1 Inputs, Outputs, Computational Tasks, and Abstract Problem Characterization

Inputs. The main input to sequence-aware recommendation problems is an ordered and often

timestamped list of past user actions. Users can be already known by the system or anonymous

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:3

Fig. 1. High-level Overview of Sequence-Aware Recommendation Problems.

ones. Each action can be associated with one of the recommendable items. Finally, each action

can be of one of several pre-dened types and each action, user, and item may have a number of

additional attributes. Overall, the inputs can be considered as a sort of enriched clickstream data.

In the traditional matrix completion setup, all ratings are attached to one of the known users

and items. We do not require this to be the case for sequence-aware recommenders. Anonymous

user actions are not uncommon, e.g., in the e-commerce domain, where users are often not logged

in. Nonetheless, relevant information can be extracted from past anonymous sessions. We also do

not require that each action is related to an item, since, for example, relevant information can be

extracted from the users’ search or navigation behavior as well [

]. Finally, in most application

scenarios, each action will have an assigned action type (e.g., item-view, item-purchase, add-to-cart,

etc.). And, depending on the domain, additional information might be available that describes

further details of an action (e.g., whether an item was discounted when the action took place), the

users (e.g., demographics), or the items (e.g., metadata features).

Generally, such forms of input data are available in many practical applications, e.g., in the form

of application or web server logs. Usually, we do however not assume to have larger quantities of

explicit ratings available in sequence-aware recommender systems.

Outputs. The output of a sequence-aware recommender are ordered lists of items as will be

described more formally below. In this general form, the outputs are similar to those of a tradi-

tional “item-ranking” recommendation setup. However, in some sequence-aware recommendation

scenarios, the ordering of the objects in the recommendation list can be relevant as well. Instead of

considering the list of recommendations as a set of alternatives for the user, there are scenarios

where the user should consider all recommendations and do this in the provided order. Typical

examples include the recommendation of a sequence of tracks in music recommendation or the

recommendation of a series of learning courses. We will describe such application scenarios in

more detail later in the paper.

Computational Tasks. Dierent computational tasks of sequence-aware recommenders can be

identied in the literature. Most commonly, a task that is not present in traditional matrix completion

setups is the identication of sequence-related patterns in the recorded user actions. These can be

sequential patterns, where the order of the actions is relevant, or they can be co-occurrence patterns,

where it is only important that two actions happened together, e.g., within the same session. In

some cases, also distance patterns can be relevant, e.g., when the problem is to compute a good point

in time to remind the user of something through a recommendation. Note that the corresponding

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:4 M. adrana et al.

patterns do not have to be made explicit, as often done, e.g., in sequential pattern mining [

], but

can be implicitly encoded in complex machine learning models as well.

Besides the identication of such patterns that are subsequently used in the recommendation

task, another computational task of a sequence-aware recommender can be to reason about order

constraints. Such constraints can be either prescribed and given for an application domain as strict

constraints (e.g., in terms of a given curriculum for the learning course recommendation problem),

given as heuristics (e.g., in terms of track transition rules for next-track music recommendation),

or be implicitly derived from the given input data as a sort of weak constraints.

Finally, the patterns (or more generally, the learned models) that were identied in the data and

the constraints have to be related with the point in time for which the recommendation is sought

for. In a session-based recommender, one might consider the last few user actions and then look

for past sessions that were similar to the current one. On the other hand, when a recommender

is used as a reminder for the repeated purchase of consumables, the distance (in time) to the last

purchase action of the user to the present time might be relevant.

Abstract Characterization. Adopting the formalisms of [

], we can describe the problem at a

more formal, abstract level as follows. Let

be a set of users and

a set of recommendable items.

In contrast to matrix-completion problems, we are not interested in predicting a utility value for

each

i ∈ I

and for each

c ∈ C

, but in computing an ordered list of objects

of length

for each

user, where each element of

l ∈ L

corresponds to an element of

i ∈ I

. Technically, each sequence

is an element of the set of all permutations up to length

of the powerset of

, i.e.,

L ∈ S

(P (I ))

We denote this latter set of possible lists as L

∗

Let

be a function that returns a utility score of a given sequence

for a user

, i.e.,

C ×L

∗

→ R

The sequence-aware recommendation problem then consists of determining the sequence

′

∈ L

∗

that maximizes the score for the user, i.e.,

∀c ∈ C, l

′

= arg max

l ∈L

∗

u (c, l ) (1)

The main problem in recommender systems is to learn or extrapolate the utility function

from

some given data. In the matrix completion problem, which underlies the work in [

], the input is a

sparse matrix of user-item ratings. In sequence-aware recommender systems, we in contrast assume

that the underlying data is a dataset

consisting of sequence

of user actions where each user

action

A ∈ D

has a number of attributes. A sequence dataset

can be considered as an enriched

log of actions of a user community, where the attributes of each action

includes some sort of

user ID

and additional optional attributes like the action type (e.g., an item view or click event) or

a timestamp.

Overall, our function

is not limited to characterizing utility scores for individual items, but

for entire ordered lists of items. This makes it possible to consider additional aspects of utility in

sequence-aware recommendation problems, including the diversity of the set as a whole, the quality

of the ordering itself, e.g., in terms of transitions between objects, or the degree of fulllment of

weak or strict order constraints in

. How these quality factors are considered within existing

algorithms will be discussed later in this work in Section 4.

Generally, the design of the utility function

depends on the specic type of value the recom-

mender system should provide to the user, or its purpose in the sense of [

]. In the literature on

recommender systems, researchers often do not explicitly discuss the underlying purpose of the

system, which could be information ltering but also discovery support. Instead, they focus on

optimizing an abstract computational task like predicting a hidden rating. The situation in the

A sequence, as usual, is considered as an ordered set of objects.

In practice, the user ID can either refer to a known user or it is created from a cookie in an ongoing user session.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:5

context of sequence-aware recommenders is often similar with the dierence that the computa-

tional task is mainly to predict the hidden elements of a session given the session beginning. While

the performance of an algorithm that predicts the next hidden user action can be assessed with

standard measures from information retrieval (such as precision and recall), in other cases specic

measures (e.g., diversity metrics) are required in the evaluation process.

2.2 Relation to Other Areas

Implicit-Feedback Recommender Systems. Our characterization of the sequence-aware recommen-

dation problem mainly targets scenarios in which we observe the individual and collective behavior

of a user community over time instead of asking for explicit item ratings. A number of research

works exist that focus on implicit user feedback like purchase events. The problem formulation is

however often based again on matrix completion, where multiple interactions of one user with an

item are not taken into account. Explicit item ratings, on the other hand, can also be taken into

account in a sequence-aware recommender as one of several types of user actions. One potential

problem in that context however is that the point in time when users provide a rating can be quite

dierent from the point in time when they consumed or purchased an item (e.g., when registering

for a movie recommendation service, users initially rate a bunch of movies they have watched in

the past). The sequence and timestamp of the ratings might therefore mislead a sequence-aware

recommender.

Context-Aware and Time-Aware Re commender Systems. In some of the application scenarios

discussed in the next sections, sequence-aware recommender systems represent a special form

of context-aware recommender systems. In session-based recommendation, the users’ short-term

intents, which can be estimated from their very last actions, can represent an important piece of

context information to be taken into account when recommending [56].

Time-aware recommender systems (TARS) usually consider time information that is associated

with past user actions to adapt the recommendations accordingly, see [

] for an overview. TARS

share a number of commonalities with sequence-aware recommenders, e.g., in terms of how we

can compare dierent approaches in oine settings. The focus of sequence-aware recommenders

is however often less on the exact point of time of the past user interactions, but on the sequential

order of the events. Furthermore, a number of proposals on time-aware recommenders mainly rely

on the matrix completion problem setting when modeling temporal dynamics [68].

Research in Other Related Fields. Some aspects of sequence-aware recommender systems were

nally explored also in neighboring elds. Examples are the problem of query suggestion in the eld

of information retrieval or the problem of interest drift in the more general eld of user modeling.

In this paper, we concentrate on works where the recommendation problem itself is the main focus,

in contrast to works that, e.g., aim to develop methods to capture changes in the user preferences

over time. When searching for papers to consider in our survey, we therefore used a corresponding

search string and selection strategy when we queried a digital library, as will be described in more

detail in Section 3.5.

3 A CATEGORIZATION OF SEQUENCE-AWARE RECOMMENDATION TASKS

We identied four main goals in the academic literature that can be achieved with the help of

sequence-aware recommender systems in dierent application scenarios:

(1) Context Adaptation

(2) Trend Detection

(3) Repeated Recommendation

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:6 M. adrana et al.

(4) Consideration of Order Constraints and Sequential Patterns

We will discuss these four categories in more detail in the next sections and will then also look

at typical application domains for sequence-aware recommenders. Note that all types of problem

settings discussed next are based on the same formal problem characterization described in Equation

1, but require specic algorithmic approaches that use the sequence information in the input datasets

(see Section 4). The problems are also not mutually exclusive and multiple aspects (e.g., trends and

repetitions) can be considered in parallel, as was done, for example, in [

] for the e-commerce

domain.

3.1 Context Adaptation

In many domains, the relevance of a recommendable item not only depends on the users’ general

preferences, but also on their current situation and their short-term intents and interests. Context-

aware recommenders take such additional types of information into account. Typical contextual

factors in the literature include the user’s geographical position, the current weather, or the time of

the day [

]. Context factors like these are examples of what is called the representational context

[28], which is dened by a predened set of “observable” context variables.

Contextual factors like the user’s current shopping intent in an e-commerce setting or their

current mood are however not directly observable. These types of information, which represent

what is called the user’s interactional context, therefore have to be derived from the users most

recent actions and eventually on behavioral patterns of the user and the community as a whole

[

]. Considering interactional context factors is particularly important for systems where

there are many new or anonymous users. Since no historical data is available about their past

preferences, it is important to make full use of interactional context information, as representational

context information can only help to partition anonymous users into coarse-grained categories,

without any real personalization [

]. Overall, understanding the users’ situation and goals and

making context-adapted recommendations from past interaction data represents a main goal of

sequence-aware recommender systems.

Categorization based on importance of long- and short-term interactions. Depending on the avail-

ability of historical data for individual users and the importance of focusing on the most recent

interactions, we can dierentiate between the following types of context-adaptation situations.

•

Last-

interactions base d recommendation: In these scenarios, only the last

user actions

are considered. A typical problem setting is that of predicting the next location (or check-in)

in a location-aware recommender system [

]. The reason to limit oneself to the last

actions could be that not many past interactions of that type exist or that the other previous

actions of the same type (e.g., check-in events) are not predictive for the next action.

•

Session-based recommendation: In this problem scenario only the last sequence of actions

of a user is known and this sequence of actions is limited to a session, i.e., a limited period

of time when the user interacted with the site. Typical application examples include news

recommendation [

], e-commerce, video and classied advertisement recommendation [

•

Session-aware recommendation: Finally, there are situations in which we have knowledge

both about the users’ actions in the last session and about their past behavior. This type

of problem setting occurs if we have returning customers that can be identied. In this

situation, a sequence-aware recommender system can be based on a combination of long-

term and short-term interest models, e.g., in e-commerce settings or for app recommendation

[9, 40, 56, 91].

Note that our problem denition in Equation 1 covers all three scenarios, i.e., the output is a ranked

list of items. In the case of a session-aware adaptation problem, the underlying sequence dataset

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:7

is however usually split into two components, where one that contains the older interactions is

used for building a long-term model, and the other is used to consider short-term user intents. How

the dierent models are learned or combined then depends on the specic algorithmic approach

that is used to maximize the utility function

, which might for example return higher scores when

recommendations are a mix of familiar and novel items for the user.

On utility functions and recommendation purposes. Most of the papers on context-adaptation

problems in the literature do not make explicit statements about the characteristics of the utility

function in the application scenarios they considered. As mentioned above, they in most cases

implicitly dene the goal through the evaluation procedure and aim to predict hidden elements of

a given user session. Thereby, they implicitly assume that this next action is in some sense the best

recommendation for the given purpose.

The task of recommenders in context-adaptation scenarios can most often be characterized as

“nd matching items” for a given session beginning, without any further explicit specication

of what represents a good recommendation. In some works – and in practical environments – a

number of more specic purposes can be identied, see also [

]. The task of a recommender can

be, for example, to create a list of alternatives for the currently inspected items (similar items).

In other applications, in contrast, the task can be to determine complements, e.g., accessories to

a main shopping item in e-commerce. In yet other application domains the recommendations

should represent suitable or logical continuations of either the current session (e.g., next-track music

recommendations) or the user’s longer term behavior (e.g., next-basket recommendations). Finally,

we can dierentiate if the user is assumed to pick one of the recommendations (e.g., one alternative

in e-commerce scenarios), or consider all of them together (e.g., playlist recommendation for audio

and video streaming). This latter scenario was recently addressed in [

101

] for the news domain. In

their work, the authors model the user’s expected utility of an item during the course of session

and try to diversify the recommended content within a session accordingly.

3.2 Trend Detection

The detection of trends in a given sequence dataset is another potential, but less explored, goal

that can be accomplished by sequence-aware recommenders. We can distinguish between the

following types of information that can be extracted from sequential log information to be used in

the recommendation process.

•

Community trends. Considering the popularity of items within a user community can be im-

portant for successful recommendations in practice, e.g., in streaming media recommendation

[

]. Since the popularity of items can change over time in dierent domains, sequence-aware

recommenders can aim to detect and utilize popularity patterns in the interaction logs to

improve the recommendations. Such trends can be long-term (e.g., things becoming outdated

or out-of-fashion over time), seasonal, or reect short-term and one-time popularity peaks.

In the fashion domain, for example, considering the community trends of the last few days

can represent a successful strategy when selecting items for recommendation [54].

•

Individual trends. Changes in the interest in certain items can also happen at an individual

level. These interest changes can be caused when there is a “natural” interest drift, e.g., when

users grow up, or when their preferences change over time, e.g., due to the inuence of other

people, due to exceptional events, or when they discover something new. In the news domain,

for example, individual interests change over time and are inuenced by global and local

news trends [

]. Another example problem is the task of modeling the dynamics of the

musical taste of users [85].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:8 M. adrana et al.

3.3 Repeated Recommendation

In some application domains, recommending items that the user already knows or has purchased

in the past can be meaningful. Such scenarios are not considered at all in the traditional matrix

completion setup. We can identify the following categories of repeated recommendation scenarios.

•

Identifying repeated user behavior patterns. Past interaction logs can be used by sequence-

aware recommenders to identify patterns of repeated user behavior. A typical application

example could be the repeated purchase of consumables, like printer ink. Such patterns can

be both mined from the behavior of individual users, as in [

109

122

123

], or the community

as a whole. Repeated user actions are particularly relevant for app recommendation problems.

In this context, patterns of repeated user behavior can be used to provide shortcuts to

applications that are frequently launched in a certain sequence by the user. An example is to

suggest to launch the “e-mail” or “calls” app after opening the “contacts” app. The general

goal here is to enhance the user experience with the device [9, 78, 87].

•

Repeated recommendations as reminders. In a dierent scenario, repeated recommendations

can help to remind users of things they found interesting in the past. Depending on the

domain, these reminders could relate to objects that the user has potentially forgotten (e.g., an

artist that she or he liked in the past), or to objects that the user has recently interacted with

[

]. The latter scenario is particularly relevant in e-commerce, and the recommendation of

recently-inspected items is common on platforms like Amazon.com.

Note that from the viewpoint of our problem characterization in Section 2, the two scenarios

are identical. In the rst case, however, there is often an underlying “logical” reason why an item

should be recommended again, which is not the case for scenarios of the latter class, i.e., the

recommendation of assumedly “forgotten” or recently relevant items.

In both mentioned scenarios, besides the selection of items to repeatedly recommend, a sequence-

aware recommender has to reason about the timing of the recommendations. In the reminding

scenario in e-commerce, the time frame to remind users of previously seen items might be narrow

and objects may become obsolete soon, e.g., if they were not purchased after a few view events.

Nonetheless, always reminding users of items they have inspected in the last session might be

inappropriate if the user’s current shopping intent does not match that of the previous session. In

the context of the recommendation of consumables, items can be repeatedly recommended after

longer periods of time, e.g., weeks or even months. Proper timing can however still be important

and such types of repeated recommendations share similar problems as proactive recommenders

[26], e.g., that their recommendations might interrupt the user at the wrong time.

3.4 Consideration of Order Constraints and Observed Sequential Paerns

In Section 3.1 on short-term context adaptation, we have discussed dierent tasks of sequence-

aware recommender systems in which the order of objects – either in the logs, in the current

session, or in the recommendations – can play a role. Considering such orderings can be central to

sequence-aware recommendation tasks also outside the context of the user’s last session, which is

why we discuss this aspect in some more depth in this section.

Specically, there are two types of information about sequentiality one can additionally consider

when determining a suitable order of the recommendations.

(a)

First, there can be “external” domain knowledge in the form of strict or weak ordering con-

straints that prescribes the ordering. In the domain of recommending a sequence of learning

courses, for example, there might be strict requirements regarding the order of dierent courses

that have to be considered by the recommender, e.g., when one cannot attend one course before

another one was completed [

113

]. In the domain of movie recommendation, in contrast,

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:9

it might be reasonable to recommend a sequel to a movie only after a user has watched the

preceding episode. Such a constraint is however not necessarily strict.

(b)

Second, one can try to identify such sequential consumption patterns from the user behavior,

and, e.g., automatically infer that users who watched a certain movie later on watched its

sequel. From a technical perspective, sequential pattern mining techniques have, for example,

been applied in dierent application domains of recommenders, e.g., for predicting the next

navigation actions of users on websites [

] or the nd next tracks to play in music

recommendation problems [15].

When considering these factors that can inuence the ordering, a number of application-specic

variations might have to be considered, among them the following.

•

Importance of the order of the recommendations. Depending on the specics of the applica-

tion scenario and the goals of the recommendation service, the order of the recommended

items can be relevant or not, even in the same domain. In the context of next-track music

recommendation, one can try to determine a playlist continuation where all elements are

generally a good t for the current listening session [

]. In contrast, one could however also

try to make sure that also the transitions between the tracks are smooth or that the resulting

playlist has certain characteristics, e.g., a continuous increase of the tempo [80].

•

Importance of the exact order of the past events. Similarly, the exact order of the past events

may or may not be relevant for the recommendation task. Again in the domain of the music

recommendation, one can try to look for sequential patterns in past sessions as in [

] or

simply consider track co-occurrence patterns, as done in the neighborhood-based approach

in [

]. In cases where the order is relevant, one might additionally consider the age of the

individual events in the log and reduce the importance of older events.

•

Existence of implicit order constraints. When recommending complements (e.g., accessories for

a given item), there might be implicit constraints of what can be reasonably recommended and

these constraints can depend on the specic domain or product category. One can recommend

a memory card as an accessory to a camera, but not the other way round. For other categories,

however, recommendations in both directions can be plausible.

In general, besides the identication of sequential patterns from past data to select and rank items,

an additional task for the recommender is to ensure that such strict and weak order constraints are

respected in the resulting lists.

3.5 Categorization of Existing Works

In this section we will classify existing research works according to our categorization scheme

in order to obtain a better understanding of which aspects of sequence-aware recommenders are

comparably well explored and which areas need further investigation.

We selected the papers to be considered in this categorization as follows. We issued a query to

the ACM Digital Library using a set of relevant search terms

, sorted the rst search 1,000 results

by relevance and manually inspected the abstracts of the rst 100 papers. If a paper made explicit

use of sequences of user actions it was considered relevant and its referenced papers were scanned

as further potential candidates to consider. Table 1 shows a summary of the papers that were nally

considered relevant for this work.

In the table, the rst column shows the context-adaptation type of each approach according to our

categorization. The second column indicates which form of ordering constraints were considered;

the third column mentions the main application domain addressed in the papers.

The exact query was “sequence sequential trend next basket bundles repeat session order + recommend”

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:10 M. adrana et al.

Table 1. Categorization of works regarding Context-Adaptation Type, Order Constraints and Domain.

Paper Context Ord. Domain

Baeza-Yates et al. [9] SA I APP

Chen et al. [19] LI WI MUS

Chen et al. [20] LI WI MUS

Cheng et al. [21] LI I POI

Chou et al. [22] LI I MUS

Feng et al. [30] LI I POI

Garcin et al. [32] SB I NEWS

Grbovic et al. [37] LI I EC

Greenstein-Messica et al. [38] SB I EC

Hariri et al. [40] SB WI MUS

He et al. [42] SB I QRY

He and McAuley [43] LI I EC,POI

He et al. [41] LI I POI

Hidasi et al. [46] SB I EC,VID

Hidasi et al. [45] SB I EC

Hosseinzadeh Aghdam et al. [47] LI I MUS

Hsueh et al. [48] LI I EC

Jannach et al. [56] SA I EC

Jannach et al. [57] SA WI MUS

Jannach and Ludewig [54] SA — EC

Lerche et al. [70] SA I EC

Letham et al. [71] LI I EC

Lian et al. [72] LI I POI

Lim et al. [73] LI S POI

Liu et al. [76] LI SI POI

Liu et al. [77] LI I POI

Lu et al. [78] SB I APP

Maillet et al. [80] — W MUS

McFee and Lanckriet [81] LI WI MUS

Mobasher et al. [83] SB I WWW

Moling et al. [84] SB — MUS

Moore et al. [85] LI WI MUS

Nakagawa and Mobasher [86] SB I WWW

Natarajan et al. [87] SA I MUS,APP

Paper Context Ord. Domain

Parameswaran et al. [88] — S CS

Pauws et al. [89] — S MUS

Quadrana et al. [91] SA I EC,VID

Reddy et al. [92] — W CS

Rendle et al. [94] LI I EC

Rudin et al. [95] LI I EC

Shani et al. [97] LI I EC,WWW

Soh et al. [98] SB I OTH

Song et al. [99] SB I EC,VID

Song et al. [100] SB I NEWS

Sordoni et al. [102] SB I QRY

Tagami et al. [103] LI I ADS

Tavakol and Brefeld [104] SB I EC

Trevisiol et al. [105] SB I WWW,NEWS

Turrin et al. [106] SB I MUS

Twardowski [107] SB I EC

Vasile et al. [108] SB WI MUS

Wang and Zhang [109] LI I EC

Wang et al. [110] LI I EC

Wu et al. [111] SB WI MUS

Xiang et al. [112] SA I OTH

Xu et al. [113] — S CS

Yan et al. [114] SB I QRY

Yap et al. [115] LI I WWW

Yu and Riedl [117] LI S OTH

Yu et al. [116] LI I EC

Zang et al. [118] LI I EC

Zhang et al. [121] LI I ADS

Zhang et al. [120] LI I POI

Zhao et al. [122] LI I EC

Zhao et al. [123] LI I EC

Zheleva et al. [124] SA — MUS

Zhou et al. [125] LI I WWW

Zimdars et al. [126] LI I WWW

Context SA: Session-aware, SB: Session-based, LI: Last-N interactions; Order Constraints W: Weak, I: Inferred, S: Strong; Application

domain APP: App recommendation, MUS: Music domain, QRY: Query recommendation, EC: E-Commerce domain, CS: Learning courses

recommendations, AD: Advertisements, WWW: Web navigation recommendation, OTH: Others

3.5.1 Categorization based on Context-Adaptation Type. We dierentiate between session-aware,

session-based, and last-interaction adaptation approaches. Figure 2 shows the number of research

works that fall into the dierent categories.

Last-N

interaction(s)

Session-based Session-aware No context adapt

Fig. 2. Distribution of types of context adaptation in the analyzed research works.

We can see that a large fraction of the papers focuses only on the very last visited object when

determining the next object to be recommended. A slightly smaller number of papers consider

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:11

session-based recommendation scenarios. However, we can observe increased interest in such

session-based recommendation problems in the recent years, which is fueled both by the industry

and by the emergence of new deep learning based sequence-learning techniques, as we will discuss in

Section 4. In fact, more than half of the papers on session-based and session-aware recommendation

were published starting from 2014. Finally, even fewer papers also take the long-term preferences

of the user into account. While short-term user intents might in many cases be more relevant than

long-term preferences, some works – like the ones by [

] – show that considering long-term

preferences can be important to achieve more accurate recommendations.

The limited number of works on session-aware systems might partially be due to the lack of

publicly available benchmark datasets. Overall, however, the results shown in Figure 2 indicate

more research is required to better understand the interplay between long-term and short-term

preferences in sequence-aware recommenders.

3.5.2 Order Constraints. As the column labelled “Ord.” in Table 1 shows, most existing works in

sequence-aware recommendation rely on implicitly derived (and correspondingly weak) ordering

constraints and only a few works address scenarios where explicit and strong order constraints

have to be considered. This can again be caused by the focus of the research community on

certain application domains like media (movie) recommendation [

], where ordering constraints

exist – consider movie sequels or follow-up news stories – but are often not considered in the

corresponding algorithmic approaches.

3.5.3 Categorization based on Domain. Figure 3 nally shows in which application domain

sequence-aware recommenders were investigated. In the following, we give examples of typical

research works in each domain.

E-commerce Music POI WWW Apps Others

Fig. 3. Distribution of research works per application domain.

E-commerce is the most often investigated domain, which is not surprising as session-based

recommendation and session-aware recommendation, as dened above, are common tasks in

e-commerce scenarios and investigated, e.g., in [

] or [

]. Other problem settings in the e-

commerce domain include next-basket recommendation [

110

116

] and the problem of reminding

users, e.g., in [122, 123] and [70].

Music recommendation is another “natural” application domain of sequence-aware recommenders.

The consumption of music is often session-based and the listener’s interest can change strongly

from one session to another. The user experience can furthermore be inuenced by the order in

which the tracks are played, i.e., weak ordering constraints can exist between the tracks. Such

constraints can either be explicitly given by the user [

] or can be inferred from listening logs of

For example, a major e-commerce platform provided a new dataset in the context of the 2015 ACM RecSys Challenge about

session-based recommendation.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:12 M. adrana et al.

a user community as done, e.g., in [

] and [

]. Constraints can be inferred on the basis of the

transitions between pairs of songs [

], or based on metadata and other attributes of tracks that

are usually “played together” [

108

]. This knowledge can then be leveraged by sequence-aware

recommenders to generate playlists or plausible continuations of listening sessions, see, e.g., [

]

or [106].

POI recommendations make predictions on the user’s next location or make recommendations for

the next place to visit. Moving between places is a sequential process, where the user’s movements

are usually limited by distance, time, or budget constraints. Sequence-aware recommenders have

been applied in this context in dierent ways to predict the next user location, e.g., based on the

user’s current location [

]. Considering several past user locations has shown to be helpful,

for example, for travel planning [

], or when predicting which place the user will most probably

visit at a specic time [77].

Web navigation prediction is an early application area of sequence-aware recommenders [

125

]. Web browsing is usually a sequential process and next-page visit predictions can help to

recommend users interesting links that t their current browsing session or to pre-load web pages.

App recommendation or app usage prediction is a more recent application eld of sequence-aware

recommenders, where considering the user’s current context is crucial. A typical example of a

research work is described in [

], where repeated usage patterns are mined from activity logs to

pre-fetch applications or to make contextual suggestions on which app to use.

Others. Finally, there are a number of other application domains of sequence-aware recommenders

which have however only be explored by a few research works. Examples include advertisement

recommendation [

121

], news recommendation [

], job posting recommendation [

], the recom-

mendation of videos in streaming platforms [

], query recommendation [

102

], or the

recommendation of next activities to add during workow modeling [59].

3.5.4 Specific Tasks of Sequence-aware Recommendation. Table 2 lists works that considered the

problems of repeated recommendations and trend detection.

Table 2. Categorization of works regarding Repeated Recommendation and Trend Detection.

Repeated Recommendation

• Find items for repeated recommendation [54, 70]

• Find timing for repeated recommendation [76, 77, 109, 122, 123]

Trend detection

• Detect individual trends [85, 109]

• Detect community trend [100, 109]

• Detect seasonal trends [109]

Repeated recommendations, while useful in practical applications [

], were rarely the focus

of researchers in the context of sequence-aware recommenders. Similarly, the detection of trends

has also not been investigated to a large extent, even though a recent work suggest that taking

in particular short-term trends can be important in the e-commerce domain [

]. A few works

consider community trends and some investigate seasonal aspects. Interestingly, only two works

analyzed here considered individual consumer trends and potentially outdated user tastes [

109

Finally, there are only few works that explicitly mention that the goal is to recommend similar

items [

] or complements [

104

]. A large number of papers in the literature focus explicitly

on next-item recommendation [

102

121

125

] and list continuation

[

117

]. The problem of next-basket recommendation in e-commerce was also the focus of

only a few works [72, 94, 110, 116].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:13

Table 3. Taxonomy of algorithms for sequence-aware recommender systems.

Class Sub-class Family Examples of research works

Sequence Learning Frequent Pattern

Mining

Frequent itemsets [48, 56, 83, 86, 106, 118]

Frequent sequences [78, 83, 86, 95, 115, 125]

Sequence modeling Markov Models [32, 42, 47, 81]

Reinforcement

Learning

[84, 97, 104]

Recurrent Neural

Networks

[29, 45, 46, 77, 98, 100, 102, 107, 116, 121]

Distributed item

representations

Latent Markov

embeddings

[19, 20, 30, 111]

Distributional

embeddings

[9, 27, 37, 92, 103, 108, 124]

Supervised models w/

sliding window

[9, 110, 126]

Matrix factorization [107, 117, 122, 123]

Hybrid methods Factorized Markov

Chains

[21, 41, 43, 72, 94]

LDA/Clustering w/

sequence learning

[40, 87, 99]

Others Graph-based [77, 105, 112, 120]

Discrete optimization [57, 73, 89, 113]

4 ALGORITHMS FOR SEQUENCE-AWARE RECOMMENDER SYSTEMS

We can identify three main classes of algorithms in the literature that are used for the extraction

of patterns from the sequential log of user actions: sequence learning, sequence-aware matrix-

factorization and hybrids. Each of these classes can be further decomposed into subcategories,

leading to the taxonomy of algorithms in Table 3. A few comparably uncommon technical approaches

are listed under the category “Others”.

The majority of the reviewed works rely on some form of sequence learning methods. Such

approaches are a natural choice for the given problem and frequent pattern mining techniques,

for example, have been applied for a long time, e.g., for the prediction of user navigation patterns

on websites [

]. Other approaches, in particular the ones based on recurrent neural networks

and distributed item representations, were only recently successfully transferred from domains

like Natural Language Processing to sequential recommendation problems [

]. We discuss the

dierent approaches in more detail in the next sections.

4.1 Sequence Learning

Sequence learning methods are useful in application domains where the data to be analyzed has

an inherent sequential nature, like in Natural Language Processing, time-series prediction, DNA

modeling, and, as the focus of this work, sequence-aware recommendation.

4.1.1 Frequent Paern Mining (FPM).

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:14 M. adrana et al.

Methods. Frequent Pattern Mining (FPM) techniques were originally developed to discover user

consumption patterns within large transaction databases. Early Association Rule Mining approaches

[

] focused on identifying items that frequently co-occur in the same transaction, regardless of the

order of their appearance. Later on, Sequential Pattern Mining [

] techniques were developed that

considered item co-occurrences only as a pattern when the items appeared in the same order. An

extreme case nally are Contiguous Sequential Patterns, which require that the co-occurring items

are adjacent in the sequence of actions within a transaction.

Technically, in all approaches the patterns are mined in an oine process and usually translated

into a set of association rules or another compact form of knowledge representation, see, e.g.,

[

115

]. Usually, the resulting rules have values attached (e.g., condence and support) that express

their strength. In the prediction phase, we are given a partial transaction (e.g., an item that a user has

recently bought) for which we seek additional items. These items are then determined by scanning

the database for matching rules and by applying them on our partial transaction. In recommendation

scenarios, the most simple approach is to determine only pairwise item co-occurrence frequencies

in order to implement buying suggestions of the form “Customers who bought ... also bought”.

Application examples. In one of the earlier works on sequence-aware recommenders [

researchers compared Association Rules (AR), Sequential Patterns (SP) and Contiguous Sequential

Patterns (CSP) for a web usage mining scenario, where the problem is to predict a user’s next navi-

gation action for page prefetching or for context-adaptation in session-based scenarios. Technically,

they used a xed-sized sliding window of the current user session in the prediction phase. Given

the last

user actions in this window they look up and apply rules of size

N +

1 and rank the

recommendations (i.e., the elements of the rule consequents) based on the condence value of the

ring rules.

The obtained results showed that the less constrained patterns (AR and SP) lead to

better recommendations, whereas the usage of CSPs was more helpful for the page prefetching task.

Other kinds of sequential patterns, such as closed patterns and negative patterns, were explored

too. See, for example, the works of Zang et al.

[118]

and Hsueh et al.

[48]

for context-adaptation

based solely on the last few user actions.

When using frequent pattern approaches, personalization is obtained by matching the activity of

the user with the pre-extracted patterns. In a more recent work, Yap et al.

[115]

propose to further

personalize the method and to weight the patterns according to their estimated relevance for the

individual user. They designed three schemes to determine personalized pattern relevance scores

and compared it to a popularity-based method. Their empirical evaluation indicates that applying

personalized rule scoring schemes yields more accurate personalized next-item recommendations

for the target users.

As an example of a comparably recent application domain, Lu et al.

[78]

present the MASP (Mobile

Application Sequential Patterns) mining method for the problem of providing context-adaptation to

smartphones by predicting the user’s next used app. In their approach, transactions are composed

of sequences of app usages which are annotated with the location of the user at each time. The

MASP-mine algorithm takes into account both the user movements and app launches to discover

the sequential patterns. At prediction time, the single pattern with the maximal support value that

matches the recent user movements and activity is used to predict the app that is launched next.

Discussion. Frequent pattern mining techniques are well-explored and also easy to implement

and interpret. The main drawbacks include the limited scalability of some of the approaches and,

probably more importantly, the problem of nding suitable threshold values for the oine mining

task. Usually, the main parameter is the minimum support value. If it is set too low, too many (often

A technically similar approach was later on proposed by Zhou et al. [125].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:15

noisy) patterns are identied; if it is set too high, only rules for the most frequently occurring items

will be found. To deal with the problem, one can start to search for the highest-quality rules in

a database that was created with a xed minimum-support threshold, and iteratively relax the

quality constraints until a matching rule is found. Alternatively, dierent algorithm variants were

proposed that use multiple minimum support values or “adjusted” condence scores [95].

Another common challenge when using frequent pattern mining techniques is to decide between

the dierent variants (AR, SP, CSP). Determining sequential patterns (SP, CSP) is often not only

computationally more expensive given the more strict type of rules to be mined

, but it can also

easily lead to much smaller rule bases. This in turn can result in situations at prediction time where

no matching rule is found. Which method works best can furthermore depend on the application

domain. The ordering of the events can be very important, e.g., in the context of query or app

recommendation [

125

]. In other problem settings, e.g., web page recommendation or next-track

music recommendation, considering the item orderings might result in too small rule bases or

have only small positive eects [

] that probably do not outweigh the additional computational

complexity. Finally, in some domains, simple co-occurrence patterns were despite their simplicity

employed with good success, e.g., in e-commerce and music recommendation [52, 56, 106].

4.1.2 Sequence modeling.

Methods. The input to sequence-aware recommenders, i.e., the ordered and often time-stamped

log of past user actions, can be considered as a time series with discrete observations. Therefore, in

many cases existing and sometimes complex time series prediction methods could in principle be

applied. In RS applications, however, the time-stamps are often merely used to sort the actions

which allows us to apply “simpler” sequence models, that do not necessarily consider the complex

underlying temporal dynamics of the observed sequences of actions.

In general, sequence modeling techniques aim to learn models from past observations to predict

future ones, which are in our case user actions. Sequence modeling methods for sequence-aware

recommendation mainly belong to three categories: Markov Models, Reinforcement Learning and

Recurrent Neural Networks (see Table 3).

•

Markov Models consider sequential data as a stochastic process over discrete random variables

(or states). The Markov property limits the dependencies of the process to a nite history.

For example, in rst-order Markov Chains (MCs) the transition probability of every state

depends only on the previous state. Higher-order MCs use longer temporal dependencies to

model more complex relationships between the states

. In sequence-aware recommender

systems, the Markov property translates into assuming that the next user actions depend

only on a limited number of the most recent preceding actions.

•

Reinforcement Learning (RL) techniques learn by interacting with the environment, and are

sequential in nature. In a recommendation scenario, the interaction consists of a recommen-

dation of an item to the user (the action) for which the system then receives a feedback (the

reward)

. For instance, in the music domain, the system recommends a song and monitors

if the user listens to or skips the recommended song. In this example, we assign a positive

reward if the user listens to the song and zero otherwise. The problem is typically formulated

as a Markov Decision Process (MDP) and the goal of the system is to maximize the cumulative

reward computed over a number of interactions.

See [5, 6] for a detailed analysis on the computational complexity of FPM methods.

The work of Du et al. [29] represents an exception in that respect.

See [31] for details on Markov Models for pattern recognition.

See [65] for a comprehensive review on Reinforcement Learning.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:16 M. adrana et al.

•

Recurrent Neural Networks (RNNs) are distributed real-valued hidden state models with non-

linear dynamics

. At each time step, the hidden state of the RNN is computed from the

current input in the sequence and the hidden state from the last step. The hidden state is

then used to predict the probability of the next items in the sequence. The recurrent feedback

mechanism memorizes the inuence of each past data sample in the hidden state of the RNN,

hence overcoming the fundamental limitation of MCs. RNNs are therefore well suited for

modeling the complex dynamics in user action sequences. Variants of RNNs such as LSTM

[

] and GRU [

], by means of their sophisticated hidden dynamics, can model much longer

and complex temporal dependencies than other approaches like Hidden Markov Models [

Application examples (Markov Models). Markov Chains in most cases cannot be naively applied

to sequence-aware recommendation since data sparsity quickly leads to poor estimates of the

transition matrices. Shani et al.

[97]

therefore enhance their MC-based approach with several

heuristics – namely skipping, clustering and nite mixture modeling – to mitigate the impacts of

data sparsity. In their application, the input data consists of purchase records of an online book

store and their experiments show the superiority of sequential models over non-sequential ones in

predicting what books to recommend next given the sequence of the last few user interactions. In a

dierent application domain, McFee and Lanckriet

[81]

use Markov Chain mixtures for the problem

of music playlist generation, a typical list continuation problem, where a mixture is a weighted

ensemble of several MCs (uniform, weighted and k-nearest neighbors) whose weights are learned

via maximum-likelihood optimization on a training dataset of playlists.

Another challenge when applying MCs usually lies in the choice of the order of the model.

In the context of a session-based next-query recommendation problem, He et al.

[42]

therefore

use a mixture of Variable-order Markov Models (VMMs, sometimes called context trees

), which

use a context-dependent order to capture both large and small Markov dependencies. A dierent

approach is adopted by Garcin et al.

[32]

in the context of a news recommendation application.

They assign one predictor (expert) to each context (node) in the context-tree, where each node is

associated with a dierent order of the Markov model. Each predictor is then trained to predict the

article to suggest next given the last sequence of user actions. As the sequence of actions of the

user grows, deeper nodes in the tree become active and contribute to the nal recommendations.

Hidden-state models, in particular Hidden Markov Models (HMM), address some of the lim-

itations of MC models and are, for example, applied in [

] for the contextual next-track music

recommendation problem. In HMMs, each hidden (or latent) state is a discrete variable associated

with a probability distribution over the observed variables. In conformity with the Markov property,

every hidden state is conditionally dependent only on the previous one [

]. There already exist a

few applications of HMMs to time-aware collaborative ltering [

119

]. In [

], the hidden states

is used to model the (unobserved) context of the user. Discrete-valued hidden states are however

limited in terms of the contextual information they can store, which to some extent limits their

applicability for sequence-aware recommendation.

Application examples (Reinforcement Learning). Reinforcement learning based on MDPs are used

in [

] and [

] for sequence-aware recommendation in on-line shops. These approaches make it

possible to tailor recommendations not only to the recent user activity but also to the expected

reward (income) for the shop. Since the state space of MDPs can quickly become unmanageable in

realistic scenarios, Tavakol and Brefeld

[104]

factorize the space over a set of mutually independent

item attributes. In their application case of a clothing marketplace, dress characteristics such as

See [74] for an overview of Recurrent Neural Networks.

See [11] for a comprehensive analysis of algorithms for learning VMMs.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:17

category, color, or price can be considered; the sequential relationships between attributes are then

independently modeled by an MDP to predict the characteristics of the products the user will likely

search or buy next.

Application examples (RNNs). Zhang et al.

[121]

use RNNs for click prediction for online adver-

tisements. At each time step, they train the RNN to predict the next click of the users given their

last click and the previous state of the network using a classication loss measure (cross-entropy).

For the domains of e-commerce and media recommendation, Hidasi et al.

[45]

explore the

use of Gated Recurrent Units (GRUs) [

] – a variant of RNNs – for modeling user activity in a

session-based scenario. Technically, as in [

121

], the model is trained to predict the next item in a

sequence given the current one, but dierent loss functions were used, which at the end led to

better performance. Later on, Hidasi et al.

[46]

included item features into the sequence model. The

proposed parallel-RNNs model is able to capture the interdependencies between item identiers and

their features, leading to a further improved recommendation accuracy. Quadrana et al.

[91]

employ

hierarchical RNNs for modeling user preferences across sessions. The rst level of the hierarchy,

which represents the state of the user across sessions, is used to initialize and propagate information

to the second level of the hierarchy, which is used to generate recommendations within a session.

Transferring the information from prior sessions ultimately led to better recommendations. Yu et al.

[116]

nally employ RNNs for next-basket recommendation in e-commerce. In their case, the RNN

is fed with real-valued representations of the baskets as an input – i.e., the sets of items the user

has interacted with at each step of the sequence – and trained to rank the items in the next basket.

A number of further technical variants of RNNs and problem encodings were recently proposed

for dierent application domains like e-commerce, online ads, and query or POI recommendation.

Twardowski

[107]

, for example, addresses the problem of providing personalized recommendations

to anonymous users by using RNNs to generate a compact representation of the user’s interactional

context. Such representation is later combined with item representations to genrete contextual

recommendations. Liu et al.

[77]

aim at detecting interest trends of individual users over time in

an RNN-based POI recommender system. Both Sordoni et al.

[102]

and Soh et al.

[98]

address the

context adaptation problem. Sordoni et al.

[102]

propose a generative model for session-based query

recommendation based on Hierarchical Recurrent Encoder-Decoder neural networks. The higher

level of the hiercarchy encodes the latent intent of the user in the current search session, while the

lower level generates next-query suggestions tailored to this intent. Soh et al.

[98]

employ RNNs to

represent sequences of user interactions to personalize user interfaces.

Song et al.

[100]

propose a context adaptation approach to build a session-aware recommender

that can handle long-term (e.g. seasonal) and short-term changes in user preferences in the news

domain. In their work, the authors propose the Temporal Deep Semantic Structured Model to

combine user and item features with user temporal features into a joint model. Static features are

modeled by several feed-forward neural networks, whereas the temporal features are modeled by a

set of RNNs.

Du et al.

[29]

address the list continuation problem in a wide range of application domains,

from taxi drop-o to nancial transactions. Given data with timestamps for the actions, their

model combines RNNs with Marked Temporal Point Processes, a mathematical tool that models

the distance between subsequent events as a random process. The idea of modeling the distance

between subsequent events as a random process is used also by Wang and Zhang

[109]

, who

employ a hierarchical Bayesian framework based on hazard models to represent the probability of

purchasing a product at a given time, while modeling time-dependent patterns between follow-up

purchases at the same time. Both models allow to compute the utility of recommendations as a

function of the sequence of past user actions and of the recommendation timing.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:18 M. adrana et al.

Discussion. Sequence modeling approaches have been successfully applied for a variety of

sequence-aware recommendation problems. Our review however shows that Markov Models

in most cases cannot be directly applied in a naive manner for sequence-aware recommendation

problems, due to problems of data sparsity and computational complexity, which is why researchers

often rely on specic model variants or embed heuristics into the learning process. Still, it is not

always fully clear if such model variants scale to real-world problems.

Deep learning based techniques, on the other hand, have been increasingly explored in the past

few years. Among the factors that contributed to the recent revival of neural networks are the

availability of large data sets for training and the increased computational capacities of modern

computer hardware.

The computational demands of deep learning approaches can however still represent a barrier to

their practical use, e.g., because of the need for testing and optimizing dierent (hyper-)parameter

congurations. Despite these limitations, researchers should not stop exploring more advanced

algorithmic approaches in the future, given the sometimes low quality impression of today’s

recommender systems in practice [

] and the fact that some of today’s rst deep learning based

approaches are sometimes not yet much better than computationally more simple approaches

[

]. More research in this area is also particular important as naive methods can have certain

limitations like a bias to recommend mostly popular items, which are not captured by today’s

problem formulations and common performance measures like precision and recall.

4.1.3 Distributed Item Representations.

Methods. Distributed item representations are dense, lower-dimensional representations of the

items. The representations are derived from sequences of events and preserve the sequential

relationships between the items. Similar to latent factor models, every item is associated with a

real-valued embedding vector, which represents its projection into a lower-dimensional space in

which certain item transition properties are preserved. For example, some representations are based

on the co-occurrence of items in similar contexts [

108

]. Others translate pairwise transition

probabilities into distances in a Euclidean space [

]. In sequence-aware recommendation

problems, we can recommend the next item(s) given the user’s most recent actions by traversing

the embedding space in a stochastic fashion [

] or by searching the nearest neighbors to the last

item(s) that were explored by the user [37].

Application examples. Distributed Item Representations have been explored, for example, in the

domain of playlist generation. Zheleva et al.

[124]

build a session-level long-term Latent Dirichlet

Allocation (LDA) model based on the community of users and merge it with a short-term LDA

model based on the current user session. Chen et al.

[19]

propose the Latent Markov Embedding

(LME) approach, a regularized maximum-likelihood embedding of Markov Chains in the Euclidean

space. In their method, every item is projected into a space in a way that the distance between any

pair of items in this space is proportional to their transition probability in a rst-order Markov

Chain. The learning procedure directly exploits the weak ordering between tracks in playlists. The

resulting vector spaces can then be traversed stochastically to generate new sequences (in their

case, playlists) or continuations of existing ones. Later on, Chen et al.

[20]

extend LME by clustering

items and by adding cluster-level embeddings to account for locality in item transitions. Wu et al.

[111]

, on the other hand, propose a Personalized LME version where both items and users are

projected into a Euclidean space in a way that the strength of their relationship is reected by the

projection. This allows them to incorporate long-term user preferences into the model and to use

them for recommendation. More recently, [

] employ a similar approach for the recommendation

of learning courses. Students, lessons and assessments are embedded in a common latent skill space,

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:19

in which the evolution of the student knowledge is formalized by a Hidden Markov Model. The

goal of Feng et al.

[30]

is personalized POI recommendation based on the last-

points of interest

visited by the user. In their approach, they use a pairwise ranking function similar to BPR [

] to

condition the transition probabilities on the personalized ranking preferences of the users.

A dierent technical approach – in their case to deliver personalized product ads – was chosen

by Grbovic et al.

[37]

. Their work is based on leveraging the so-called “distributional hypothesis”,

which in the domain of linguistics states that semantically equivalent words occur frequently in the

same contexts

, for sequence-aware recommendation. The proposed Prod2Vec recommendation

method learns distributed item representations from sequences of emails containing purchase

receipts. Similar to the LME method, every item is projected into a lower-dimensional space.

Specically, Prod2Vec uses the skip-gram model, which projects items that tend to have similar

neighboring items (i.e., items that tend to be “surrounded” by the same set of items) close to each

other. Given the sequence of the last few observations for a user, the most similar items in the

lower-dimensional space represent the recommendation candidates. Dierent enhancements of the

recommendation scheme of Prod2Vec were proposed, including the usage of time decay factors for

older email receipts or the consideration of directed order dependencies between the interactions

[

]. Furthermore, Vasile et al.

[108]

propose Meta-Prod2Vec, an enhanced version of the skip-gram

model that conditions the item embeddings also on their metadata. Greenstein-Messica et al.

[38]

employ Word2Vec and Glove to create item embeddings based on click sequences and item metadata,

and use these embeddings to enhance session-based recommendation with RNNs.

A technically dierent approach to use distributed item representations to learn a model from

session-based user data is proposed by Tagami et al.

[103]

. The authors use the Paragraph Vector

(PV-DV) model [

], which learns an additional user vector along with the item representations. In

this approach, the user vectors computed from one day of interaction data are used as features in a

multi-label classication problem to predict the set of ads the user will click on the next day.

Discussion. Distributed representations were successfully applied in particular in the domain of

Natural Language Processing [

] and, in the context of recommender systems, as an alternative

method for representing textual data in content-based approaches [

]. While the applications

discussed in this section indicate that using such representations can be helpful for sequence-

aware recommendation scenarios as well, some embedding approaches can be computationally

demanding and sometimes require extensive parameter tuning to yield good results. Similar to

sequence modeling methods, approaches based on distributed representations require a substantial

amount of training data to be eective.

While embedding methods exploit the sequential relations between items to learn the item

representations, these methods do not make direct use of the sequence of recent user interactions to

generate the recommendations. They instead rely on approximate methods like time-decay nearest

neighbors [

] or on additional supervised learning layers [

103

] to predict the next interactions

of the user. This in turn can lead to limited eectiveness in domains with strict ordering constraints

for which sequence modeling methods can be preferable. Using an indirect approach can also

further increase the computational costs of these methods.

For some works, the signicance and practicability of some of the proposals in real-world

environments is not always fully clear. The playlist generation method of [

], for example, is not

only computationally very intensive, it was also evaluated with a specic measure (the average log

likelihood), which does not truly inform us about the quality of the resulting playlists [15].

The distributional hypothesis forms the basis of the recent Word2Vec [

] and GloVe [

] approaches for Natural Language

Processing.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:20 M. adrana et al.

…

Feature

extraction

Multi-class/

multi-label

learning method

Sliding window

…

Target

Fig. 4. Sequence-aware recommendation as supervised learning with sliding windows.

4.1.4 Supervised Learning with Sliding Windows.

Methods. Sliding window models convert the next-in-sequence prediction problem into a tradi-

tional supervised learning problem that can be solved with any classier such as decision trees,

feed-forward neural networks and learning-to-rank methods. The general idea of the approach,

which resembles autoregressive models, is as follows. A sliding window of size

is moved over

each sequence (see Figure 4). At each step, all items within the window are used to derive the

feature values of the supervised learning problem and the identier of the immediately next item is

used as target variable. As a result, the sequence prediction problem is turned into a multi-class

classication problem, or into a multi-label classication in case multiple target items are allowed.

Application examples. In an early approach of that type, Zimdars et al.

[126]

frame sequential

click prediction in a web usage mining scenario as a binary prediction problem. In their approach,

clickstream data is rst expanded by dening a set of “accumulator” variables (lagged and cache

variables) to represent the contextual and historical activity of the user. Then, a page-level proba-

bilistic decision-tree model is trained and at recommendation time the pages are ranked according

to the predicted probability of being the next page.

More recently, Baeza-Yates et al.

[9]

use contextual features that were extracted from the last

12 hours of the app usage log to predict which (mobile) app will be used next in a session-aware

scenario. Their contextual features include some basic features, such as the geolocation and phone

usage characteristics, as well as session features, which are basically the Word2Vec representations

[

] of the actions of the user in the sliding window. Finally, they use a parallelized version of the

Tree Augmented Naive Bayes algorithm for the classication (prediction) of the next used app.

In the e-commerce domain, Wang et al.

[110]

use Feed-forward Neural Networks for next-basket

recommendation. For the predictions, only the items in the previous basket were used to rank the

items for the next one (i.e., the window size is 1). The rst layer of the neural network encodes the

previous basket as xed-size real-valued feature vector that is computed by taking the element-wise

maximum (max-pooling) or average (mean-pooling) of the embeddings of the items of the basket.

The second and nal layer of the network learns to rank the items in the next basket by taking as

input the concatenation of the user’s current and previous basket’s embeddings.

Discussion. The main advantage of this class of approaches is that it is general, conceptually

simple, and that a variety of existing classication methods and libraries can in many cases be applied

o-the-shelf. There are, however, also a number of shortcomings. First, as in many classication

problems, the eectiveness of the approach depends on the quality of the feature engineering

phase. Finding the best features can be challenging and often the features are domain-dependent

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:21

and cannot be re-used across domains. Second, the obtained results can be highly sensitive on the

choice of the window size. Finally, setting up a multi-class (or multi-label) classication problem

can be computationally expensive when the set of items grows. Furthermore, the quality of the

learned model can be strongly aected by unbalanced distributions of the target variable, which is

a common situation in real-life recommendation scenarios.

4.2 Sequence-aware Matrix Factorization

Method. Sequence-aware recommendation is dierent in various ways from the traditional matrix-

completion problem formulation. However, there are a few cases where sequence information,

which is usually derived from timestamps, is also considered within algorithms that are designed

for matrix completion.

Application examples. Zhao et al.

[122

123]

focus on an e-commerce recommendation scenario,

where the problem is to ll the missing cells of a given user-item purchase matrix. Specically,

they study the problem of nding the timing for repeated recommendations. Their proposed method

factorizes the matrix using a weighted loss function that maximizes the expected utility for the

user. The novel aspect of their approach is that the utility is determined based on the observed time

intervals between purchases of each pair of items. The resulting model can predict the personalized

relevance of an item depending on the time at which the recommendations are generated and

thereby takes the available sequence information into account.

Yu and Riedl

[117]

analyze the problem of generating interactive personalized story continuations.

Stories are represented in a prex graph, in which each node represents a prex of a story, i.e., a

possible sequence of point plots. In their problem encoding, they are given a “prex-rating matrix”

to be completed, where the items are replaced by possible story prexes and users provide ratings

only to some of the prexes. Matrix factorization is used to predict the missing entries. The highest

scoring full story that descends from the current “story-so-far” of the user is used to suggest the next

plot point. The rating of the user is collected on the next plot point, and the process (factorization,

recommendation, rating) continues iteratively until the story reaches an end.

Finally, Twardowski

[107]

not only propose an RNN-based method as discussed above but

also a matrix factorization approach to session-based recommendation for the next-in-sequence

recommendation problem in e-commerce. In this approach, session events such as click, add-to-

cart, etc., as well as the items are encoded by separate latent feature vectors. Sessions are then

represented by the time-decayed sum of event vectors associated with the actions performed by the

user so far in the current session. The prediction of the next items is then based on the factorization

of the observed session-item tuples, which can be obtained with standard ranking loss functions.

Discussion. The advantage of the discussed approaches is that standard matrix factorization

algorithms from the literature can often be applied. A main challenge however lies in the denition

of a suitable and computationally feasible encoding of the given application problem. Considering,

for example, the proposal in [

117

], one not only has to collect user feedback at each step of the

sequence, but the given encoding can easily lead to a huge matrix due to the combinatorial explosion

of the possible sequences. It also requires continuous updates of the factorization model as users

have to provide new ratings for every suggested next step in the sequence.

The approaches discussed in this section can be seen as special cases of matrix factorization

algorithms for time-aware recommendation (e.g. [

]). These algorithms mostly focus on tracking

changes of user behavior over large time spans, usually through time-evolving user and item latent

In this section we focus on sequence-aware and time-interval-based algorithms and do not discuss general time-aware

collaborative ltering techniques as described, e.g., in [17] and [68].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:22 M. adrana et al.

factors. Such methods, as discussed in more detail on [

], can however often not be updated

in real-time. In some application domains they should therefore be extended or combined with

methods that support the short-term adaptation according to the user’s current preferences [56].

4.3 Hybrid methods

Methods. Hybrid models often combine the exibility of sequence learning methods with the

robustness to data sparsity of factorization-based matrix-completion techniques. Furthermore, such

forms of hybridization enable sequence learning methods to use the power of modern collaborative

learning-to-rank models such as BPR [

], which usually cannot be easily embedded in standard

sequence learning approaches.

Application examples. Among the rst proposals for such a hybrid technique is the Factorized

Personalized Markov Chain (FPMC) method of Rendle et al.

[94]

. The method combines matrix

factorization with Markov Chains for the problem of next-item recommendation given the last-

interactions of the user, e.g., in e-commerce settings. In the case of rst-order MCs, user interactions

can be represented as a three-dimensional (user, current item, next item ) tensor. Each entry in the

tensor corresponds to an observed transition between two items performed by a specic user.

The proposed method then uses pairwise factorization to predict the unobserved entries in the

sparse tensor, i.e., to predict personalized transitions between pairs of items. Overall, FPMC can be

seen as a rst-order Markov Chain whose transition matrix is jointly factorized with a standard

2-dimensional user-item matrix factorization approach. This joint factorization at the end makes it

possible to infer the unobserved transitions in the Markov Chain from the transition pairs of other

users. At recommendation time, the items are ranked according to their likelihood to be the next

item given the last item the user has interacted with.

In addition to e-commerce settings, the FPMC method has been successfully applied to other

problem settings, e.g., for POI recommendation or check-in prediction in location-based social

networks ([

]). He and McAuley

[43]

present a variation of the FPMC method where the matrix

factorization technique is replaced by a factored Item Similarity Model (FISM) [

]. The resulting

method, named Factorized Sequential Prediction with Item Similarity Model (Fossil), models users

as a combination of the factors of the items they have interacted with, which in turn allows Fossil to

provide sequential recommendations also for cold-start users – the FPMC method is limited in that

respect since it relies on user-item matrix factorization – as long as the item representations can be

estimated accurately. The method was tested in dierent application domains such as clothes, toy,

or electronic devices recommendation.

A number of other hybrid approaches were proposed in the literature. In the context of playlist

recommendation, Hariri et al.

[40]

use Latent Dirichlet Allocation (LDA) [

] to extract latent

topics from playlists, where playlists are taken as documents and tracks as words for the LDA

step. A sequential pattern mining technique is then applied to nd patterns of such latent topics

in the playlists. When generating continuations from existing sessions or playlists, the frequent

patterns are mapped to the topics extracted from the current listening session and used to lter the

recommendations that are generated by a classical nearest-neighbor-based recommender.

Song et al.

[99]

propose the States Transition Pair-Wise Ranking Model, which combines LDA

with rst-order MCs to simultaneously model the user’s long and short-term favorites. The user’s

long-term favorites, e.g., in the movie domain, are determined by the topics generated in the LDA

step with a user-specic prior. The short term favorites are captured by an MC transition matrix.

The combined model is then basically a HMM whose latent states are controlled by the personalized

The works presented in [45] and [46] represent exceptions that combine RNNs with BPR’s ranking loss criterion.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:23

LDA generative model, which can be trained via Markov Chain Monte Carlo (MCMC) Bayesian

inference and then be used to predict the next-item given the last few interactions of the user.

Finally, in the context of next-app recommendation, Natarajan et al.

[87]

propose a method based

on behavioral clustering to introduce personalization into session-aware models. In the behavioral

clustering step, users with similar sequential behavior are grouped by applying k-means clustering

on the per-user transition matrices of a rst-order MC. A personalized PageRank algorithm is then

used to build transition models at the cluster level. At recommendation time, the user is mapped

to its corresponding cluster and the cluster-level transition model is used to provide sequential

recommendations.

Discussion. Hybrid approaches are often used to build recommendation systems due to their

capability of overcoming the shortcomings of individual methods, e.g., in terms of limited content

discovery support or in the context of user- or item cold start situations [

]. The typical challenges

when designing hybrid recommender systems include the problem of nding the best way of

combining the predictions of the dierent recommendation channels and of determining importance

weights for each individual method in the nal prediction.

Some of the discussed approaches circumvent these issues by combining dierent competing

methods in a unied model, like the combinations of Markov Chains with matrix factorization [

]

or with LDA [

], in a way that the nal recommendations are computed by a single algorithm.

Other approaches rely on a cascade of algorithms [

], where a sequential model is used to lter

the predictions generated by a sequence-agnostic one, or a meta-level approach [

], where a rst

sequence-aware model is used as basis for deriving another one.

4.4 Other methods

Only a handful of the methods proposed in the literature do not t into the above categories, and

most of them are either graph-based or based on discrete optimization techniques.

Graph-based methods. Xiang et al.

[112]

focus on context adaption to detect short-term user

intents and interests. The paper describes a graph-based approach which is evaluated based on

implicit feedback datasets in the area of social bookmarking. In their approach, long-term and

short-term user preferences are fused into a two-sided bipartite graph, the Session-based Temporal

Graph. One side of the graph connects users with the items they have interacted with in the past.

The other side of the graph connects session identiers with the items the user has interacted

with in the current session. The edges are weighted to balance the inuence of long-term and

short-term preferences. The relationships between the items are propagated through the graph via

Injected Preference Fusion, and at recommendation time the graph is traversed via a random walk

to generate session-aware recommendations.

Liu et al.

[77]

aim to detect interest trends of individual users over time for POI recommendation.

Their method integrates static user interests and evolving sequential preferences based on temporal

interval assessments. Similar to [

122

], the time intervals between POI visits are assessed from

user check-in sequences in a POI-to-POI transition matrix. Then, a bi-weighted low-rank graph is

constructed to learn the individual user’s behavioral preferences by identifying a set of common

graph bases. As a result, the static interests and evolving sequential preferences of the user are

learned simultaneously with the graph. As in [

122

], recommendations are generated by ranking

the POIs by their expected relevance for a given time period.

In the same application domain, Zhang et al.

[120]

try to detect if there are order constraints when

visiting POIs. Their method, named LORE, mines sequential patterns from location sequences and

represents the patterns as a location-location transition graph. The probability of a user visiting

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:24 M. adrana et al.

a new location is modeled through an additive

-th order Markov Chain, in order to incorporate

sequential dependencies between locations.

Finally, Trevisiol et al.

[105]

apply contextual adaptation to address the new user problem in news

recommendation. In their approach, the users’ browsing behavior is represented by two graphs: the

BrowseGraph, which collects the behavior of all user sessions, and the ReferrerGraph, a subgraph

of the BrowseGraph which is induced by browsing sessions of users coming from the same referrer

domain

. When the user enters the news portal from the referrer domain, the recommendation

candidates are chosen among the neighbors of the article in the Referrer Graph, using the Browse

Graph as fallback. The article’s neighbors are ranked according to various strategies, such as by

random, by popularity, by content similarity and by edge weight.

Discrete optimization methods. Discrete optimization is often employed for sequence-aware

recommendation problems when weak or strict ordering constraints between items exist. Typical

application examples include travel planning, learning course sequence generation, and playlist

generation.

Both Jannach et al.

[57]

and Pauws et al.

[89]

address the list continuation problem to generate

playlists in the music domain. Jannach et al.

[57]

propose a two-stage method to determine suitable

continuations to a music listening session. In a rst step, the recommendable tracks are scored

based on a variety of features (including track co-occurrences and musical characteristics). Then,

the rst elements of the resulting list are re-ranked in a greedy approach in order to optimize the

coherence of the track continuations with the recently listened tracks. Pauws et al.

[89]

include

explicit order constraints, either global to all users or local to a single user, in their model. The

authors rst formulate the problem as an integer linear programming (ILP) problem, which is

NP-hard in its solution, and then implement a local search heuristic to make the solution scalable.

Similar to [

], Xu et al.

[113]

address the list continuation problem with explicit order constraints,

but in a dierent application domain: personalized course sequence recommendation. The specic

goal is to nd a sequence of courses that (a) will minimize the time-to-graduation of the students

and maximizes their GPA, (b) at the same time matches the interests of the user and (c) respects

the sequential constraints between the courses (called the prerequisite graph). Similar to [

they propose a two-stage algorithm. First, candidate sequences with short time-to-graduation are

extracted from the prerequisite graph using a Forward-Search Backward-Induction algorithm. Then,

an online regret minimization algorithm based on multi-armed bandits is used to select course

sequences that are expected to maximize the students’ GPAs.

Generally, most discrete optimization methods for sequence-aware recommendation do not rely

on exact or exhaustive search. Instead, due to the computational complexity of the underlying

problems, they usually resort to heuristic search or greedy optimization techniques, see also [63].

4.5 Summary and Pros and Cons of Selected Approaches

The main ideas of the most important families of algorithms along with a selection of their most

typical advantages and disadvantages are summarized in Table 4. Note that the entries in the table

mainly serve as a rough orientation and that specic pros and cons for individual algorithms within

each family can exist.

Examples of referrer domains considered in the paper are Facebook, Twitter and Reddit.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:25

Table 4. Summary of the Main ideas, Pros and Cons of the algorithms for sequence-aware recommendation.

Algorithm Main idea Pros Cons

FPM

Discover patterns in user action se-

quences

• Easy implementation

• Explainable results

• Complex conguration

• Suers from data sparsity

• Limited scalability

Compute transition probabilities

over xed-length sequences

• Explainable results • Fixed transition order

• Suers from data sparsity

• Limited scalability

VMM

Compute transition probabilities

over variable-length sequences

• Variable transition orders

• Explainable results

• Suers from data sparsity

HMM Model the causal factors in user se-

quences as transitions between dis-

crete hidden states

• Learns from variable-length inputs

• Robust to data sparsity

• Limited explainability

•

Huge number of discrete parame-

ters

Directly maximize the customer

and seller reward over time

•

Dynamically adapt recommenda-

tions to future (unknown) rewards

• Under active research

•

MDP-based approaches have same

issues as MCs

• Limited explainability

RNN Model the causal factors in user se-

quences with non-linear transitions

between continuous hidden states

• Learns from variable-length inputs

• Learns long-term dependencies

• Robust to data sparsity

• Compact hidden states

• Under active research

• Complex conguration

• Limited explainability

•

Benets not fully clear in some do-

mains

EMB

Embed items into latent spaces

that preserves sequential transition

properties

• Robust to data sparsity

• Visually interpetable embeddings

• Under active research

•

Need auxiliary methods to make

recommendations

• Limited explainability

Use supervised learning over fea-

tures extracted from xed-size slid-

ing windows over sequences

• Easy implementation

•

Use o-the-shelf supervised algo-

rithms

•

Explainability depends on the cho-

sen supervised method

• Feature engineering

Dene new inputs and loss func-

tions for MF to handle sequences

• Extensive literature available

• Robust to data sparsity

• Non-trivial input and loss design

• Concerns regarding scalability

Algorithm:

FPM: Frequent Pattern Mining, MC: Markov Chains, VMM: Variable-order Markov Models, HMM: Hidden

Markov Models, RL: Reinforcement Learning, RNN: Recurrent Neural Networks, EMB: Distributed Item Representations,

SL: Supervised Learning w/ Sliding Windows, MF: Matrix Factorization

5 EVALUATION OF SEQUENCE-AWARE RECOMMENDER SYSTEMS

5.1 Common Evaluation Approaches for Recommender Systems

In academic environments, the evaluation of recommender systems is dominated by “oine”

(simulation-based) experiments on historical rating or implicit feedback datasets. The common

oine evaluation methodology when using the matrix completion abstraction is to split the given

preference data into training and test splits, use the training data to learn a model and predict the

held-out preferences based on this model. The quality of the outputs of an algorithm can then be

assessed with the help of measures such as RMSE, or Precision and Recall. In addition one can

analyze a number of other quality factors of the recommendations, e.g., in terms of the diversity of

the list, the general popularity of the recommended items, or the algorithm’s catalog coverage.

User (laboratory) studies are an alternative to oine experiments. Such studies are often used to

assess the potential impact of recommendations on the behavioral intentions of users, an aspect

which cannot be determined based on simulation studies. Finally, eld studies (e.g., in the form of

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:26 M. adrana et al.

A/B tests) are used to analyze the eects of recommenders on their users in real-world environments.

This latter form of evaluation is however comparably rare in academic environments.

5.2 Oline Evaluation of Sequence-aware Recommenders

While in principle all three mentioned evaluation forms (oine studies, user studies, eld tests)

can be applied for sequence-aware recommenders, slightly dierent evaluation methodologies are

used in oine studies. Generally, several aspects of the evaluation protocols used for sequence-

aware recommenders are closely related to those that are used for Time-Aware Recommender

Systems (TARS), for which Campos et al.

[17]

provide a formalization and a detailed analysis. In

our subsequent analysis we will therefore take this framework as a reference.

5.2.1 Evaluation Methodologies. Campos et al.

[17]

discuss evaluation methodologies for TARS

along three dimensions: data partitioning, denition of the target items, and cross-validation.

Dataset partitioning. In standard matrix completion setups, the partitioning of the ratings into

training and test sets is often done by sampling ratings randomly. Common variants are to either

take samples from the entire dataset or sample ratings per user. For TARS, one can additionally

consider a rating order criterion and, e.g., select the most recent ratings (of the individual user or

the community) for the test set. For sequence-aware recommenders, the rating order criterion, i.e.,

in our case the sequence of actions, is already given by denition.

Two forms of splitting the data into training and test set are possible: event-level and session-level

splitting. When splitting the data at the event-level we can assign all events before a certain point in

time to the training set and the remaining events to the test set, as done in [

]. With this procedure,

an event is therefore assigned to one of the sets independent of the user or the session it might

belong to. Similarly, one can apply a time-based splitting criterion at the session-level, i.e., we do not

split up sessions but consider the timestamp of the rst event of a session to decide if the session

goes in the training or test set [45, 106].

Both event-level and session-level partitioning can be applied to either the whole set of users

(community-level partitioning) or to a subset of the users (user-level partitioning). In the latter case

one can select a number of training users and put all their data into the training set. Interactions

of the remaining test users are further split – either at event or session-level – into a user prole

and test data [

]. The user prole includes the less recent interactions and is used as input to the

recommender; the test data contains the most recent interactions to be predicted. Finally, mixed

approaches are in principle possible as well. One can, for example, select a number of users as test

users and put all data into the training set except for the most recent sessions of the test users [

Besides the question of the splitting criterion, dierent options exist of how to apply size conditions

[

]. Placing 80% into the training set and 20% in the test set is a common approach. In sequence-

aware recommenders, additional options exist. One can use time-based splits and place events (or

sessions) prior to a give time into the training set and use the rest for testing [

]. Or, we can use

xed-size splits and place the last

events of each sequence into the test set and the remaining into

the training set [21].

Our survey shows that no common standard exists in the community regarding data partitioning

procedures. In the context of session-based or session-aware recommendation problems, it is

however advisable to use a session-level partitioning procedure to avoid that individual user sessions

are split up. This is in particular necessary to mimic the behavior of session-based and session-aware

recommender systems that are batch-trained on instances of past sessions for eciency reasons. In

the specic case of session-aware recommendation problems, community-level partitioning has to

be applied to ensure that a recommender can also learn longer-term models for individual users.

In the case of session-based recommender systems, user-level partitioning has to be preferred to

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:27

ensure that a recommender is able to provide recommendations to new users (i.e., users with no

records of past interactions with the system).

Also with respect to the size conditions, no community standards exist so far. In some works,

for example in [

], all sessions except those of the last day are placed into the training set. While

the maximum amount of data is provided for learning in such a setting, only one single training-

test split can be used. The size of the training data can also be chosen based on domain-specic

requirements. In domains such as news recommendation or e-commerce, focusing on the most

recent interactions in some cases is sucient or even advisable, e.g., because news can quickly

become outdated or because e-commerce shoppers might concentrate on trending or recently

added items [

]. In general, when using time-based splitting, the evaluation should be performed

with dierent time splits, which allows us to evaluate the learning rate of the algorithm (i.e., the

minimum amount of training data that provide stable recommendations).

Denition of the target items. In traditional evaluation setups, we aim to predict ratings for a set

of target items or search for an optimal ranking of these items. In sequence-aware recommenders,

we are usually interested to predict future actions of a user (or, generally, events), where actions as

described above usually have an associated type and item. In addition, the input to the evaluation

step is not limited to a user or user-item pair for which a ranked list or rating prediction is sought

for, but the input can also include a sequence of events, e.g., an entire session from the test set in

session-based or session-aware recommendation scenarios. If such a sequence of events is given,

the general idea in most evaluation approaches is to hide a subset or all of the events of the given

sequence and predict the user’s (next) actions. Dierent variations of the evaluation scheme can be

found in the literature.

•

Sequence-agnostic prediction. In this case, the order of the hidden actions is not relevant and a

recommendation is considered successful if it correctly predicts one or more of the hidden

actions [122, 123].

•

Given-N next-item prediction. In this protocol variant, the rst

elements of the sequence

are “revealed” to the algorithm and the task of the recommender is to predict the immediate

next action. Jannach et al.

[56]

propose to use one specic value for

when evaluating. In

the domain of next-track music recommendation, often only the last element of a sequence

is hidden (i.e.,

N =

1) [

]. In [

], an approach is taken where

is incrementally

increased.

•

Given-N next-item prediction with look-ahead. This is a combination of the above protocols

and used in [

]: the set of revealed events is continuously increased. The order of the

hidden actions (which form the look-ahead set) is not relevant.

•

Given-N next-item prediction with look-back. In addition to the rst

items of the current

sequence, the idea of this protocol is to also reveal a set of actions that happened before the

current sequence [56, 70, 91].

Besides the dierent ways of evaluating the predictions for a given sequence or user session,

one can limit the evaluation to actions of certain types, e.g., to purchase actions, as done in [

]

and in the context of the 2015 ACM RecSys challenge

The choice of the best evaluation protocol depends both on the specic research question and on

the application domain. When evaluating session-based recommender systems, given-N next-item

prediction (with or without look-ahead) usually is preferred. In case the goal is to assess how quickly

a system is able to adapt its recommendations to the user’s assumed short-term intents, one might

use a given-N protocol where

is incrementally increased. Hiding only the last element of a

http://2015.recsyschallenge.com/

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:28 M. adrana et al.

sequence can be done in dierent application scenarios to assess the recommendation performance

when more information is available (“warm start”). Finally, when the goal is to evaluate the value

of combining short-term and long-term user preferences or the eectiveness of dierent reminding

strategies, the given-N next-item prediction with look-back protocol can represent an appropriate

choice.

Cross validation. Using a randomized cross-validation procedure is not possible in all application

areas of sequence-aware recommenders. In the case of next-track music recommendations, where

the goal is to predict the next track within a given playlist [

], the given sets of playlists can be

randomly split into training and test sets as long as the creation time of the playlists is not relevant.

In many other cases, however, sequence or time dependencies exist, which make it impossible to

randomly assign events to the training and test splits. Some works like [

] therefore only use one

single training-test split, which can however lead to biased results. Therefore, alternative validation

approaches like repeated random subsampling procedures should be applied. One can, for example,

split the data into several, potentially overlapping time slices of about the same length (“sliding

window”) and use the events at the end of these time slices as test sets [

]. Alternatively, in

case of large datasets, one can in addition repeatedly sample a subset of the users and consider

their last n actions in the test set.

5.2.2 Evaluation Metrics. For many application scenarios of sequence-aware recommenders,

standard classication and ranking metrics can be used for performance evaluation. The set of

metrics used in the reviewed papers include Precision, Recall, Mean Average Rank (MAR), Mean

Average Precision (MAP), Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain

(NDCG) and the F1 metric. Typically, most of the ranking metric are highly correlated when

evaluated with a xed-size split and their choice does not largely aect the outcomes [24, 44].

For some application domains, such as context adaptation, we can use standard classication

metrics (e.g., precision and recall) since the recommendation problem is a classical one and sequence-

based techniques are used to generate better recommendations. However, when the application

domain requires recommendations to fulll an explicit or implicit order constraint, ranking metrics

such as MAP have to be preferred over classication metrics. Error metrics were considered only in

[

117

] and, in general, do not t sequence-aware recommendation problems well, where in almost

all situations it is important to consider entire lists of recommended items.

In some application domains ranking metrics alone cannot fully inform us about the quality

of the recommendations. In playlist recommendation, for example, we may want to assess the

capability of the recommender in generating good quality transitions between subsequent songs.

Given a current track, there may be many other tracks that are almost equally likely good to be

played next, and ranking metrics to not take that aspect into account. This can in turn lead to the

eect that more popular tracks are favored by an algorithm [

]. Therefore, alternative metrics

derived from natural language processing, like the Average Log-Likelihood, can be adopted instead

[

]. However, several concerns on the actual quality of the recommendations and on the

interpretability of the results with these metrics have been raised [15].

Generally, when the goal of a recommender is not limited to predicting the next-best action but

a sequence or set of events that are related to each other, alternative “multi-metric” evaluation

approaches are required that can take multiple quality factors into account in parallel. They can

consider, as mentioned, the order of the recommendations or the internal coherence or diversity

of the recommended list as a whole. In the music domain, one might also be interested that the

set of next tracks is coherent not only in itself but with the last played tracks. In many cases, the

dierent quality factors can lead to trade-o situations like “accuracy vs. diversity”, which have to

be balanced by a recommendation algorithm [63].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:29

Table 5. Publicly available datasets for sequence-aware recommendation.

Short name Domain Users Items Events Sessions Reference

Amazon EC 20M 6M 143M N/A [43]

Epinions EC 40k 110k 181k N/A [43]

RecSys Chall. 2015 EC N/A 38k 34M 9.5M [45]

Ta-feng EC 32k 24k 829k N/A [110, 116]

TMall EC 1k 10k 5k N/A [110, 116]

AVITO EC N/A 4k 767k 32k [107]

Retailrocket EC 1.4M 235k 2.7M N/A [2]

Microsoft WWW 27k/13k 8k 13k/55k 32k/5k [125, 126]

MSNBC WWW 1.3M/87k 1k 476k/180k N/A [115, 126]

Delicious WWW 8.8k 3.3k 60k 45k [112]

CiteULike WWW 53k 1.8k 2.1M 40k [112]

Outbrain WWW 700M 560

∗

2B N/A [1]

AOL Query 650k N/A 17M 2.5M [102]

Adressa News 15k 923 2.7M N/A [39]

Foursquare 1 POI 31k N/A N/A N/A [41]

Foursquare 2 POI 225k N/A 22.5M N/A [21, 43]

Gowalla 1 POI 54k 367k 4M N/A [21, 43]

Gowalla 2 POI 196k N/A 6.4M N/A [76, 77]

NYC Taxi Dataset POI N/A 299 N/A 670k [29]

Global Terrorism DB POI N/A N/A N/A N/A [76]

Art Of the Mix Music N/A 218k N/A 29k [40, 57, 81]

30Music Music 40k 5.6M 31M 2.7M [106, 108]

EC: E-commerce, WWW: Web browsing; POI: points of interest;

∗

number of sites.

Dierently from conventional recommender systems, many sequence-aware application domains

recommend also items that the user already knows or has purchased in the past (see Section 3.3). In

these application scenarios, the evaluation protocol has to be adapted in order to consider already

known items when computing the metrics.

Finally, sequence learning methods are often computationally more complex than traditional

matrix-completion algorithms. Therefore, the evaluation of sequence-aware recommender systems

should also discuss the time and space complexity of the methods and report running times for

typical training data sets so that the scalability can be assessed.

Overall, a general open issue in the eld is the lack of “standard” metrics to assess quality criteria

when recommendation lists as a whole should be evaluated, e.g., with respect to their diversity

or coherence. Usually, a selection of meta-data features is used to measure, for example, intra-list

diversity or the smoothness of the transitions between the objects in the list. However, to which

extent such measures reect the users’ quality perceptions often remains largely open.

5.2.3 Data Sets. In recent years, more and more datasets to benchmark dierent sequence-

aware algorithms in a reproducible way have become available to researchers. Table 5 shows the

characteristics of a number of public datasets that were used in existing works.

Some researchers rely on traditional rating datasets with time stamps to evaluate sequence-

aware algorithms. In reality, however, the point in time when users rate an item might be quite

dierent from the one when the user experienced the item (e.g., watched a movie). The design of

corresponding algorithms and the interpretation of the results must therefore be done with care.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:30 M. adrana et al.

5.3 On User Studies and Field Tests

User studies, which for example aim to evaluate the subjective quality perception of users in

sequence-aware recommendation scenarios are comparably rare. In a recent study in the context of

next-track music recommendation [

], researchers compared the quality perception of dierent

algorithms. They designed an online experiment where the main task for the 277 participants

was to pick one of the recommendations that were provided by four dierent algorithms as the

best continuation of a given playlist.

One of the main results of the study was that considering

musical features not only helps to increase the homogeneity of the recommendations in terms

of computational (oine) measures, but also to a better quality perception by users. At the same

time, focusing on generally popular tracks, which are in many cases already known to the users,

showed to be a comparably “safe” strategy, at least in the short term. While the study provided

evidence that in this particular domain, the chosen oine measures can be suitable proxies for

certain aspects, user studies also have their limitations, e.g., in terms of the often articial setup.

The number of reports on eld tests for sequence-aware recommendation scenarios is limited as

well. The few papers that exist include [

], [

] and [

]. In [

], the authors for example

report the results of an A/B test, where one strategy was to simply remind users of previously

seen items in a session-aware recommendation scenario. In that experiment it turned out that

reminding users of recently seen items is not only leading to comparably high accuracy in oine

experiments, but also to a certain business value in the real world. Dierently from this work, the

studies of Garcin et al.

[33]

in the news domain revealed that the results of an oine experiment

were not indicative for the performance of algorithms when deployed in a real system. Specically,

a session-based algorithm showed to be much more eective than a popularity-based strategy

that worked best in the oine experiments. Overall, nding good proxy measures to predict the

success of a recommender based on oine simulations is similarly challenging for sequence-aware

approaches as for other types of recommendation problems.

6 SUMMARY AND FUTURE DIRECTIONS

Sequence-aware recommendation – in particular in the forms of session-based and session-aware

recommendation – is a highly relevant problem in practice. Researchers have developed a variety

of algorithmic proposals over the past fteen years, and with this survey, our goal is to categorize

these approaches and to review the research practice in the eld. Throughout the paper, we have

identied a number of open research questions, among them the following.

Intent detection. Context-adaptation is one major goal in session-aware recommendation, and

the most crucial task here is to estimate the users’ context and short-intents. However, these

intents cannot always be reliably estimated from the rst few interactions of a session. For example

in the music streaming domain, is the user in the mood to discover something new or rather

interested in consuming known things [

]? In e-commerce, is the user currently browsing the

catalog to understand the range of options, or is she or he interested in re-inspecting a short list

of candidates to purchase? To address these questions, future works could for example consider

existing works in the eld of query intent understanding from the information retrieval eld, using,

e.g., external knowledge [

]. In addition, research could build on existing theories from other

elds like Marketing, where models like AIDA (Attention, Interest, Desire and Action) are used

to describe the dierent phases of a consumer, from becoming aware of an oering to the actual

purchase. As discussed in [

101

], there are also domains where the user’s interest can change even

within a session, for example, when a user has read a number of news stories of a certain category

and then becomes satiated with the topic. Finally, in the context of intent detection, very little

A similar study in the music domain was presented in [10].

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:31

research exists on how to give users the opportunity to correct the system’s assumption in case

they are wrong. On Amazon.com, some basic mechanisms for user control exist, but it is unclear if

they are broadly used by consumers [60].

Combining short-term and longer-term proles. To be able to assess the consumers’ state in the

decision making process, information about their behavior during recent sessions and probably also

their longer-term behavior has to be taken into account. The most recent works in the eld focus on

the session-based recommendation problem, where this past information is not available. However,

there are many domains where longer-term user proles exist and more research is required to

better leverage this information. Some works [

] show that while the short-term intentions should

be predominant in the selection of the recommendations, considering longer-term behavioral

patterns and preferences of the individual user (e.g., toward certain brands in an e-commerce

scenario) can be important. Existing works like [

] use a simple static weighting of long-term and

short-term models or rely on on re-ranking the results of the long-term models based on assumed

short-term intents [

]. Better integrated models as used, e.g., in [

] are therefore still needed.

Leveraging additional data and general trends. Generally, in the context of user proling, re-

searchers often rely on one or a few types of specic user interactions like item view events or

check-ins at certain locations. In real-world applications, usually much richer types of informa-

tion are available. Not only are there multiple additional actions related to certain items (e.g.,

add-to-wishlist, add-to-cart in the e-commerce domain), there are also other relevant user actions

like search or category navigation which are not considered to a large extent in today’s research.

Also the information about how the users entered the site (e.g., through a search engine) can be

a valuable piece of information to assess the user intent from the rst few interactions. Going

beyond behavioral patterns of the individual, more research is also required in terms of detecting

behavioral trends and interest shifts in the entire user community. In dierent application domains

where item recency plays a role, including news, music and e-commerce, being able to detect and

leverage short-term trends in a community is highly important. For the e-commerce domain, works

like [

] or [

] show that focusing on items that were trending on the platform on the last few

days can be crucial to further improve the recommendation quality.

Toward standardized and more comprehensive evaluations. Finally, our review shows that today

a variety of protocol variants and computational metrics are used to compare sequence-aware

algorithms, making it dicult to assess how the eld progresses. Going beyond the subtle details of

the evaluation protocols (e.g., of how to specically determine the hold-out set), today’s research in

sequence-aware recommenders is still largely focused on accuracy measures. While this is a known

issue also for the matrix-completion formulation of the recommendation problem, in sequence-

aware recommendation, and in particular in session-aware recommendation, the usefulness of

certain recommendations can largely depend on the contextual situation of the users, e.g., if they

are exploring the item space, inspecting a smaller choice set, looking for complements to a recently

inspected items etc. These aspects call for more purpose-oriented evaluation approaches [

which take the users’ specic contextual situation and goals into account.

REFERENCES

[1] https://www.kaggle.com/c/outbrain-click-prediction/data, last accessed January 2018.

[2] https://www.kaggle.com/retailrocket/ecommerce-dataset, last accessed January 2018.

[3]

G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art

and possible extensions. IEEE Trans. on Knowl. and Data Eng., 17(6):734–749, 2005.

[4]

G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pages

217–253. Springer, 2011.

[5]

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB ’94, pages 487–499,

1994.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:32 M. adrana et al.

[6] R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE ’95, pages 3–14, 1995.

[7]

A. Almahairi, K. Kastner, K. Cho, and A. Courville. Learning distributed representations from reviews for collaborative

ltering. In RecSys ’15, pages 147–154, 2015.

[8]

R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In EDBT

’04, pages 588–596, 2004.

[9]

R. Baeza-Yates, D. Jiang, F. Silvestri, and B. Harrison. Predicting the next app that you are going to use. In WSDM ’15,

pages 285–294, 2015.

[10]

L. Barrington, R. Oda, and G. R. G. Lanckriet. Smarter than Genius? Human Evaluation of Music Recommender

Systems. In ISMIR ’09, pages 357–362, 2009.

[11]

R. Begleiter, R. El-Yaniv, and G. Yona. On prediction using variable order markov models. JAIR, 22(1):385–421, 2004.

[12] D. Billsus, M. J. Pazzani, and J. Chen. A learning agent for wireless news access. In IUI ’00, pages 33–36, 2000.

[13] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. In JMLR ’03, volume 3, pages 993–1022, 2003.

[14]

G. Bonnin and D. Jannach. Evaluating the quality of generated playlists based on hand-crafted samples. In ISMIR ’13,

pages 263–268, 2013.

[15]

G. Bonnin and D. Jannach. Automated generation of music playlists: Survey and experiments. ACM CSUR, 47(2):

26:1–26:35, 2014.

[16]

R. Burke. Hybrid web recommender systems. In P. Brusilovsky, A. Kobsa, and W. Nejdl, editors, The Adaptive Web,

chapter Hybrid Web Recommender Systems, pages 377–408. Springer, 2007.

[17]

P. G. Campos, F. Díez, and I. Cantador. Time-aware recommender systems: A comprehensive survey and analysis of

existing evaluation protocols. User Model. User-Adapt. Interact., 24(1-2):67–119, 2014.

[18]

H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through

and session data. In KDD ’08, pages 875–883, 2008.

[19]

S. Chen, J. L. Moore, D. Turnbull, and T. Joachims. Playlist prediction via metric embedding. In KDD ’12, pages

714–722, 2012.

[20] S. Chen, J. Xu, and T. Joachims. Multi-space probabilistic sequence modeling. In KDD ’13, pages 865–873, 2013.

[21]

C. Cheng, H. Yang, M. R. Lyu, and I. King. Where you like to go next: Successive point-of-interest recommendation.

In IJCAI ’13, pages 2605–2611, 2013.

[22]

S.-Y. Chou, Y.-H. Yang, J.-S. R. Jang, and Y.-C. Lin. Addressing cold start for next-song recommendation. In RecSys ’16,

pages 115–118, 2016.

[23]

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence

modeling. CoRR, abs/1412.3555, 2014.

[24]

P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In

RecSys ’10, pages 39–46, 2010.

[25]

P. Cremonesi, F. Garzotto, and R. Turrin. User-centric vs. system-centric evaluation of recommender systems. In IFIP

Conference on Human-Computer Interaction, pages 334–351. Springer, 2013.

[26]

N. Dali Betzalel, B. Shapira, and L. Rokach. "Please, Not Now!": A Model for Timing Recommendations. In RecSys ’15,

pages 297–300, 2015.

[27]

N. Djuric, V. Radosavljevic, M. Grbovic, and N. Bhamidipati. Hidden conditional random elds with deep user

embeddings for ad targeting. In ICDM ’14, pages 779–784, 2014.

[28] P. Dourish. What we talk about when we talk about context. Personal Ubiquitous Comput., 8(1):19–30, 2004.

[29]

N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, and L. Song. Recurrent marked temporal point processes:

Embedding event history to vector. In KDD ’16, pages 1555–1564, 2016.

[30]

S. Feng, X. Li, Y. Zeng, G. Cong, Y. M. Chee, and Q. Yuan. Personalized ranking metric embedding for next new POI

recommendation. In IJCAI ’15, pages 2069–2075, 2015.

[31] G. A. Fink. Markov models for pattern recognition: from theory to applications. Springer, 2014.

[32]

F. Garcin, C. Dimitrakakis, and B. Faltings. Personalized news recommendation with context trees. In RecSys ’13,

pages 105–112, 2013.

[33]

F. Garcin, B. Faltings, O. Donatsch, A. Alazzawi, C. Bruttin, and A. Huber. Oine and online evaluation of news

recommender systems at swissinfo.ch. In RecSys ’14, pages 169–176, 2014.

[34]

Z. Ghahramani. An introduction to hidden markov models and bayesian networks. In Hidden Markov models, chapter

An Introduction to Hidden Markov Models and Bayesian Networks, pages 9–42. World Scientic Publishing, 2002.

[35]

C. A. Gomez-Uribe and N. Hunt. The Netix recommender system: Algorithms, business value, and innovation. ACM

TMIS, 6(4):13:1–13:19, 2015.

[36] A. Graves. Generating sequences with recurrent neural networks. CoRR, 1308.0850, 2013.

[37]

M. Grbovic, V. Radosavljevic, N. Djuric, N. Bhamidipati, J. Savla, V. Bhagwan, and D. Sharp. E-commerce in your

inbox: Product recommendations at scale. In KDD ’15, pages 1809–1818, 2015.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:33

[38]

A. Greenstein-Messica, L. Rokach, and M. Friedman. Session-based recommendations using item embedding. In Proc.

IUI ’17, pages 629–633, 2017.

[39]

J. A. Gulla, L. Zhang, P. Liu, O. Özgöbek, and X. Su. The adressa dataset for news recommendation. In Proceedings of

the International Conference on Web Intelligence, pages 1042–1048, 2017.

[40]

N. Hariri, B. Mobasher, and R. Burke. Context-aware music recommendation based on latent topic sequential patterns.

In RecSys ’12, pages 131–131, 2012.

[41]

J. He, X. Li, L. Liao, D. Song, and W. Cheung. Inferring a personalized next point-of-interest recommendation model

with latent behavior patterns. In AAAI ’16, 2016.

[42]

Q. He, D. Jiang, Z. Liao, S. C. H. Hoi, K. Chang, E.-P. Lim, and H. Li. Web query recommendation via sequential query

prediction. In ICDE ’09, pages 1443–1454, 2009.

[43]

R. He and J. McAuley. Fusing similarity models with markov chains for sparse sequential recommendation. CoRR,

1609.09152, 2016.

[44]

J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative ltering recommender systems.

ACM Trans. Inf. Syst., 22(1):5–53, 2004.

[45]

B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk. Session-based recommendations with recurrent neural networks.

In ICLR ’16, 2016.

[46]

B. Hidasi, M. Quadrana, A. Karatzoglou, and D. Tikk. Parallel recurrent neural network architectures for feature-rich

session-based recommendations. In RecSys ’16, 2016.

[47]

M. Hosseinzadeh Aghdam, N. Hariri, B. Mobasher, and R. Burke. Adapting recommendations to contextual changes

using hierarchical hidden markov models. In RecSys ’15, pages 241–244, 2015.

[48]

S.-C. Hsueh, M.-Y. Lin, and C.-L. Chen. Mining negative sequential patterns for e-commerce recommendations. In

APSCC ’08, pages 1213–1218, 2008.

[49]

J. Hu, G. Wang, F. Lochovsky, J.-t. Sun, and Z. Chen. Understanding user’s query intent with wikipedia. In WWW ’09,

pages 471–480, 2009.

[50] D. Jannach and G. Adomavicius. Recommendations with a purpose. In RecSys ’16, pages 7–10, 2016.

[51]

D. Jannach and M. Ludewig. Determining Characteristics of Successful Recommendations from Log Data – A Case

Study. In SAC ’17, 2017.

[52]

D. Jannach and M. Ludewig. When recurrent neural networks meet the neighborhood for session-based recommen-

dation. In RecSys ’17, pages 306–310, 2017.

[53] D. Jannach and M. Ludewig. Investigating personalized search in e-commerce. In FLAIRS ’17, 2017.

[54]

D. Jannach and M. Ludewig. Determining characteristics of successful recommendations from log data âĂŞ- a case

study. In SAC ’17, 2017.

[55]

D. Jannach, M. Zanker, M. Ge, and M. Gröning. Recommender Systems in Computer Science and Information

Systems–A Landscape of Research. In EC-Web ’12, pages 76–87, 2012.

[56]

D. Jannach, L. Lerche, and M. Jugovac. Adaptation and evaluation of recommendations for short-term shopping goals.

In RecSys ’15, pages 211–218, 2015.

[57]

D. Jannach, L. Lerche, and I. Kamehkhosh. Beyond "hitting the hits": Generating coherent music playlist continuations

with the right tracks. In RecSys ’15, pages 187–194, 2015.

[58]

D. Jannach, L. Lerche, I. Kamehkhosh, and M. Jugovac. What recommenders recommend: an analysis of recommenda-

tion biases and possible countermeasures. User Modeling and User-Adapted Interaction, 25(5):427–491, 2015.

[59]

D. Jannach, M. Jugovac, and L. Lerche. Supporting the design of machine learning workows with a recommendation

system. ACM TiiS, 6(1), 2016.

[60]

D. Jannach, S. Naveed, and M. Jugovac. User control in recommender systems: Overview and interaction challenges.

In EC-Web 2016, Porto, Portugal, 2016.

[61]

D. Jannach, P. Resnick, A. Tuzhilin, and M. Zanker. Recommender systems - beyond matrix completion. Communica-

tions of the ACM, 59(11):94–102, 2016.

[62]

D. Jannach, M. Ludewig, and L. Lerche. Session-based item recommendation in e-commerce: On short-term intents,

reminders, trends, and discounts. UMUAI, 27(3–5):351–392, 2017.

[63]

M. Jugovac, D. Jannach, and L. Lerche. Ecient optimization of multiple recommendation quality factors according

to individual user tendencies. Expert Systems With Applications, 81:321–331, 2017.

[64]

S. Kabbur, X. Ning, and G. Karypis. FISM: Factored item similarity models for top-n recommender systems. In KDD

’13, pages 659–667, 2013.

[65]

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. J. Artif. Intell. Res, 4:237–285, 1996.

[66] I. Kamehkhosh and D. Jannach. User perception of next-track music recommendations. In UMAP ’17, 2017.

[67]

K. Kapoor, V. Kumar, L. Terveen, J. A. Konstan, and P. Schrater. "I Like to Explore Sometimes": Adapting to Dynamic

User Novelty Preferences. In RecSys ’15, pages 19–26, 2015.

[68] Y. Koren. Collaborative ltering with temporal dynamics. Commun. ACM, 53(4):89–97, 2010.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

1:34 M. adrana et al.

[69]

Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In ICML ’14, pages 1188–1196, 2014.

[70]

L. Lerche, D. Jannach, and M. Ludewig. On the value of reminders within e-commerce recommendations. In UMAP

’16, pages 27–25, 2016.

[71] B. Letham, C. Rudin, and D. Madigan. Sequential event prediction. Mach. Learn., 93(2-3):357–380, 2013.

[72]

D. Lian, V. W. Zheng, and X. Xie. Collaborative ltering meets next check-in location prediction. In WWW ’13, pages

231–232, 2013.

[73]

K. H. Lim, J. Chan, C. Leckie, and S. Karunasekera. Personalized tour recommendation based on user interests and

points of interest visit durations. In IJCAI ’15, pages 1778–1784, 2015.

[74]

Z. C. Lipton, J. Berkowitz, and C. Elkan. A critical review of recurrent neural networks for sequence learning. CoRR,

1506.00019, 2015.

[75]

J. Liu, P. Dolan, and E. R. Pedersen. Personalized news recommendation based on click behavior. In IUI ’10, pages

31–40, 2010.

[76]

Q. Liu, S. Wu, L. Wang, and T. Tan. Predicting the next location: A recurrent model with spatial and temporal contexts.

In AAAI ’16, 2016.

[77]

Y. Liu, C. Liu, B. Liu, M. Qu, and H. Xiong. Unied point-of-interest recommendation with temporal interval

assessment. In KDD ’16, pages 1015–1024, 2016.

[78]

E. H.-C. Lu, Y.-W. Lin, and J.-B. Ciou. Mining mobile application sequential patterns for usage prediction. In GrC ’14,

pages 185–190, 2014.

[79]

N. R. Mabroukeh and C. I. Ezeife. A taxonomy of sequential pattern mining algorithms. ACM CSUR, 43(1):3:1–3:41,

2010.

[80]

F. Maillet, D. Eck, G. Desjardins, and P. Lamere. Steerable playlist generation by learning song similarity from radio

station playlists. In ISMIR ’09, 2009.

[81] B. McFee and G. Lanckriet. The natural language of playlists. In ISMIR ’11, pages 537–541, 2011.

[82]

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and

their compositionality. In NIPS ’13, pages 3111–3119, 2013.

[83]

B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Using sequential and non-sequential patterns in predictive web usage

mining tasks. In ICDM ’02, pages 669–672, 2002.

[84]

O. Moling, L. Baltrunas, and F. Ricci. Optimal radio channel recommendations with explicit and implicit feedback. In

RecSys ’12, pages 75–82, 2012.

[85]

J. Moore, S. Chen, D. Turnbull, and T. Joachims. Taste over time: the temporal dynamics of user preferences. In ISMIR

’13, 2013.

[86]

M. Nakagawa and B. Mobasher. Impact of site characteristics on recommendation models based on association rules

and sequential patterns. In IJCAI ’03, 2003.

[87]

N. Natarajan, D. Shin, and I. S. Dhillon. Which app will you use next?: Collaborative ltering with interactional

context. In RecSys ’13, pages 201–208, 2013.

[88]

A. Parameswaran, P. Venetis, and H. Garcia-Molina. Recommendation systems with complex constraints: A course

recommendation perspective. ACM Trans. Inf. Syst., 29(4):20:1–20:33, 2011.

[89]

S. Pauws, W. Verhaegh, and M. Vossen. Fast generation of optimal music playlists using local search. In ISMIR ’06,

2006.

[90]

J. Pennington, R. Socher, and C. Manning. Glove: Global vectors for word representation. In EMNLP ’14, pages

1532–1543, 2014.

[91]

M. Quadrana, A. Karatzoglou, B. Hidasi, and P. Cremonesi. Personalizing session-based recommendations with

hierarchical recurrent neural networks. In RecSys ’17, 2017.

[92]

S. Reddy, I. Labutov, and T. Joachims. Learning student and content embeddings for personalized lesson sequence

recommendation. In ACM Learning @ Scale ’16, pages 93–96, 2016.

[93]

S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit

Feedback. In UAI ’09, pages 452–461, 2009.

[94]

S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized markov chains for next-basket recom-

mendation. In WWW ’10, pages 811–820, 2010.

[95]

C. Rudin, B. Letham, A. Salleb-Aouissi, E. Kogan, and D. Madigan. Sequential event prediction with association rules.

In COLT ’11, pages 615–634, 2011.

[96]

N. Sahoo, P. V. Singh, and T. Mukhopadhyay. A hidden markov model for collaborative ltering. MIS Q., pages

1329–1356, 2012.

[97]

G. Shani, D. Heckerman, and R. I. Brafman. An MDP-based recommender system. J. Mach. Learn. Res., 6:1265–1295,

2005.

[98]

H. Soh, S. Sanner, M. White, and G. Jamieson. Deep sequential recommendation for personalized adaptive user

interfaces. In IUI ’17, pages 589–593, 2017.

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.

Sequence-Aware Recommender Systems 1:35

[99]

Q. Song, J. Cheng, T. Yuan, and H. Lu. Personalized recommendation meets your next favorite. In CIKM ’15, pages

1775–1778, 2015.

[100]

Y. Song, A. M. Elkahky, and X. He. Multi-rate deep learning for temporal recommendation. In SIGIR ’16, pages

909–912, 2016.

[101]

Y. Song, N. Sahoo, and E. Ofek. When and how to diversify – a multi-category utility model of consumer response to

content recommendations. Available at SSRN: https://ssrn.com/abstract=2880779, 2017.

[102]

A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, and J.-Y. Nie. A hierarchical recurrent encoder-decoder

for generative context-aware query suggestion. In CIKM ’15, pages 553–562, 2015.

[103]

Y. Tagami, H. Kobayashi, S. Ono, and A. Tajima. Modeling user activities on the web using paragraph vector. In

WWW ’15, pages 125–126, 2015.

[104] M. Tavakol and U. Brefeld. Factored mdps for detecting topics of user sessions. In RecSys ’14, pages 33–40, 2014.

[105]

M. Trevisiol, L. M. Aiello, R. Schifanella, and A. Jaimes. Cold-start news recommendation with domain-dependent

browse graph. In RecSys ’14, pages 81–88, 2014.

[106]

R. Turrin, A. Condorelli, R. Pagano, M. Quadrana, and P. Cremonesi. Large scale music recommendation. In LSRS

2015, 2015.

[107]

B. Twardowski. Modelling contextual information in session-aware recommender systems with neural networks. In

RecSys ’16, pages 273–276, 2016.

[108]

F. Vasile, E. Smirnova, and A. Conneau. Meta-prod2vec: Product embeddings using side-information for recommen-

dation. In RecSys ’16, pages 225–232, 2016.

[109]

J. Wang and Y. Zhang. Opportunity model for e-commerce recommendation: Right product; right time. In SIGIR ’13,

pages 303–312, 2013.

[110]

P. Wang, J. Guo, Y. Lan, J. Xu, S. Wan, and X. Cheng. Learning hierarchical representation model for nextbasket

recommendation. In SIGIR ’15, pages 403–412, 2015.

[111]

X. Wu, Q. Liu, E. Chen, L. He, J. Lv, C. Cao, and G. Hu. Personalized next-song recommendation in online karaokes.

In RecSys ’13, pages 137–140, 2013.

[112]

L. Xiang, Q. Yuan, S. Zhao, L. Chen, X. Zhang, Q. Yang, and J. Sun. Temporal recommendation on graphs via long-

and short-term preference fusion. In KDD ’10, pages 723–732, 2010.

[113]

J. Xu, T. Xing, and M. van der Schaar. Personalized course sequence recommendations. IEEE Trans. Signal Process., 64

(20):5340–5352, 2016.

[114]

X. Yan, J. Guo, and X. Cheng. Context-aware query recommendation by learning high-order relation in query logs.

In CIKM ’11, pages 2073–2076, 2011.

[115]

G.-E. Yap, X.-L. Li, and P. S. Yu. Eective next-items recommendation via personalized sequential pattern mining. In

DASFAA ’12, pages 48–64, 2012.

[116]

F. Yu, Q. Liu, S. Wu, L. Wang, and T. Tan. A dynamic recurrent model for next basket recommendation. In SIGIR ’16,

pages 729–732, 2016.

[117]

H. Yu and M. O. Riedl. A sequential recommendation approach for interactive personalized story generation. In

AAMAS ’12, pages 71–78, 2012.

[118]

H. Zang, Y. Xu, and Y. Li. Non-redundant sequential association rule mining and application in recommender systems.

In WI-IAT ’10, pages 292–295, 2010.

[119]

H. Zhang, W. Ni, X. Li, and Y. Yang. Modeling the heterogeneous duration of user interest in time-dependent

recommendation: A hidden semi-markov approach. Trans Syst Man Cybern Syst., 2017.

[120]

J.-D. Zhang, C.-Y. Chow, and Y. Li. Lore: Exploiting sequential inuence for location recommendations. In SIGSPATIAL

’14, pages 103–112, 2014.

[121]

Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu. Sequential click prediction for sponsored

search with recurrent neural networks. In AAAI ’14, pages 1369–1375, 2014.

[122]

G. Zhao, M. L. Lee, W. Hsu, and W. Chen. Increasing temporal diversity with purchase intervals. In SIGIR ’12, pages

165–174, 2012.

[123]

G. Zhao, M. L. Lee, and H. Wynne. Utilizing purchase intervals in latent clusters for product recommendation. In

SNAKDD ’14, pages 4:1–4:9, 2014.

[124]

E. Zheleva, J. Guiver, E. Mendes Rodrigues, and N. Milić-Frayling. Statistical models of music-listening sessions in

social media. In WWW ’10, pages 1019–1028, 2010.

[125]

B. Zhou, S. C. Hui, and K. Chang. An intelligent recommender system using sequential web access patterns. In CIS

’04, pages 393–398, 2004.

[126]

A. Zimdars, D. M. Chickering, and C. Meek. Using temporal data for making recommendations. In UAI ’01, pages

580–588, 2001.

Received July 2017; revised January 2018; accepted February 2018

ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: February 2018.