Description of My Research

My research mainly focuses on analyzing the large amounts of information generated by users, so that we can deliver the most up-to-date, high-quality, and relevant information to them. I pursue this goal in three directions: (1) experimentally collecting and analyzing real-world data to gain better insight into what the real world is like; (2) mining the collected data to discover the interesting patterns behind it, so that these patterns can be used in other applications, such as recommendation systems and search engines; and (3) building new algorithms and mechanisms that help systems deliver the most up-to-date, high-quality information to interested users.

Examples of the first thread of work, experimental analysis of real-world data, include the following papers.

Michael J. Welch, Uri Schonfeld, Dan He, Junghoo Cho, "Topical Semantics of Twitter Links." In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM), February 2011.
Rodrigo B. Almeida, Barzan Mozafari, Junghoo Cho, "On the Evolution of Wikipedia." In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), March 2007.
Ka Cheung Sia, Junghoo Cho, Hyun-Kyu Cho, "Efficient Monitoring Algorithm for Fast News Alerts." IEEE Transactions on Knowledge and Data Engineering (TKDE), 19(7), July 2007.

In these papers, my group analyzed Twitter mention and retweet patterns to understand how users employ these different interaction mechanisms (WSDM paper), a number of interesting characteristics of Wikipedia's evolution (ICWSM paper), and how RSS news feeds are updated and how their update behavior compares with that of the general Web (TKDE paper).

Regarding pattern discovery, my group has been using a number of probabilistic models to discover hidden patterns in Web data. The most recent examples are the following two papers, in which we designed extensions to the well-known probabilistic topic model LDA (Latent Dirichlet Allocation) so that it can be used effectively on social network graphs. In particular, these new models help classify users on a social network by their topic interests significantly more accurately than traditional methods.

Youngchul Cha, Bin Bi, Chu-Cheng Hsieh, Junghoo Cho, "Incorporating Popularity in Topic Models for Social Network Analysis." In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 2013.
Youngchul Cha, Junghoo Cho, "Social-Network Analysis Using Topic Models." In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), August 2012.

Finally, the following papers are examples of our work on developing new algorithms and mechanisms that systems can use to deliver the most relevant, high-quality information to users quickly.

Jun-Seok Heo, Junghoo Cho, Kyu-Young Whang, "Subspace Top-k Query Processing Using the Hybrid-Layer Index with a Tight Bound." Data and Knowledge Engineering (DKE), 83:1-19, 2013.
Michael Welch, Junghoo Cho, Christopher Olston, "Search Result Diversity for Informational Queries." In Proceedings of the 20th International World Wide Web Conference (WWW), March 2011.
Chu-Cheng Hsieh, Junghoo Cho, "Finding Similar Items by Leveraging Social Tag Clouds." In Proceedings of the ACM Symposium on Applied Computing (SAC), March 2012.
Michael J. Welch, Junghoo Cho, Walter Chang, "Generating Advertising Keywords from Video Content." In Proceedings of the 19th International Conference on Information and Knowledge Management (CIKM), October 2010.

For example, in the DKE paper, we propose a new hybrid-layer index structure, together with associated algorithms, that efficiently computes top-k results over an arbitrary subset of the indexed features. In the WWW paper, we develop new ranking mechanisms that increase the diversity of topics in search results, so that users can more easily explore relevant topics without any loss in the quality of the results. The last two papers describe effective ways to automatically identify tags and keywords that can assist users in their information-seeking tasks, such as “query by example” and general keyword-based search.

The complete list of my publications, with their PDF files, is available from my publication page. My papers are also available from my DBLP and Google Scholar pages.