Description of My Research

My research focuses on analyzing the large amount of information generated by users, so that we can deliver the most up-to-date, high-quality, and relevant information to them. I pursue this in three directions: (1) experimentally collecting and analyzing real-world data to gain better insight into what the real world is like, (2) mining the collected data to discover interesting patterns behind it, so that the discovered patterns can be used in other applications, such as recommendation systems and search engines, and (3) building new algorithms and mechanisms that help systems deliver the most up-to-date and high-quality information to interested users.

Some examples of the first thread of work, experimental analysis of real-world data, are the following papers.

In these papers, my group analyzed Twitter mention and retweet patterns to understand how users use these different interaction mechanisms (WSDM paper), examined a number of interesting characteristics of Wikipedia's evolution (ICWSM paper), and studied how RSS news feeds are updated and how their update behavior resembles or differs from that of the general Web (TODS paper).

Regarding pattern discovery, my group has been using a number of probabilistic models to analyze Web data and discover hidden patterns. The most recent examples are the following two papers, where we designed extensions to the well-known probabilistic topic model LDA (Latent Dirichlet Allocation), so that it can be applied effectively to social network graphs. In particular, these new models classify the users of a social network by their topic interests significantly more accurately than traditional methods.

Finally, the following papers are some examples of our work on developing new algorithms and mechanisms that systems can use to deliver the most relevant and high-quality information to users quickly.

For example, in the DKE paper, we propose a new hybrid-layer index structure (and associated algorithms) that can efficiently compute the top-K results based on arbitrary subsets of the features indexed by a system. In the WWW paper, we develop new ranking mechanisms that increase the diversity of topics in search results, so that users can explore relevant topics more easily without compromising the quality of the search results. The last two papers describe effective ways to automatically identify tags and keywords that can assist users in their information-seeking tasks, such as “query by example” and general keyword-based search.

The complete list of my publications, along with their PDF files, is available from my publication page. You can also get my papers from my DBLP page and Google Scholar page.