WebArchive Project

The goal of this project is to track and store the history of the Web. We believe that Web history data will be very useful for many disciplines.

To store the history of the Web, we need to address many interesting technical challenges.

As a testbed for this project, we are currently storing the history of the blogs available on the Web.

Publications

  1. Ka Cheung Sia, Junghoo Cho "Efficient Monitoring Algorithm for Fast News Alert." Technical Report, UCLA Computer Science Department, June 2005.

  2. Alexandros Ntoulas, Petros Zerfos, Junghoo Cho "Downloading Textual Hidden Web Content by Keyword Queries." In Proceedings of the Joint Conference on Digital Libraries (JCDL), June 2005.

  3. Panagiotis G. Ipeirotis, Alexandros Ntoulas, Junghoo Cho, Luis Gravano "Modeling and Managing Content Changes in Text Databases." In Proceedings of the International Conference on Data Engineering (ICDE), March 2005.

  4. Alexandros Ntoulas, Junghoo Cho, Christopher Olston "What's New on the Web? The Evolution of the Web from a Search Engine Perspective." In Proceedings of the World-Wide Web Conference (WWW), May 2004.

  5. Junghoo Cho, Sourashis Roy "Impact of Web Search Engines on Page Popularity." In Proceedings of the World-Wide Web Conference (WWW), May 2004.

  6. Junghoo Cho, Alexandros Ntoulas "Effective Change Detection Using Sampling." In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), September 2002.


Junghoo (John) Cho, cho@cs.ucla.edu