CS246 Projects

As part of class assignments, students will have to complete a set of programming projects.


We assume that students are already sufficiently proficient with UNIX and have enough programming experience to use the Java programming language proficiently (or are willing to learn it). We expect that the resources and links on the project pages will give you enough information to get started even if you are not familiar with either one. Since the projects are a very important part of the final grade and will take a significant amount of time, we strongly discourage students from taking this class unless they feel comfortable with these requirements.

System Setup

To help students set up a uniform environment for the class project, we will be using VirtualBox to run the Linux operating system in a virtual machine. VirtualBox allows a single machine to share its resources and run multiple operating systems simultaneously. You will need to download the following files:

and follow our VirtualBox setup instructions to install VirtualBox on your own machine.

The provided virtual machine image is based on Ubuntu 16.04 and includes Oracle Java JDK 8, Gradle 4.1, ElasticSearch 5.6.2, and Mallet 2.0.8. If you have access to an equivalent machine with Java JDK, ElasticSearch, and Mallet installed, you may use it instead of the virtual machine image. However, please note that we cannot provide support for any system other than the virtual machine image, and that your project MUST run on the provided virtual machine. We will use the virtual machine image for grading, and if your submission does not work in this setup, you may get zero points. We cannot make any exceptions to the project schedule for problems caused by using your own computing facilities.


The programming project consists of four submissions. In the first two projects, students will learn how to use and customize ElasticSearch, a widely used open-source search engine, to provide fast and effective search over a small Wikipedia dataset. In the third project, students will build a simple spell checker that detects potential misspellings and suggests corrections. In the last project, students will use Mallet, an open-source text mining toolkit, to analyze a text corpus using Latent Dirichlet Allocation (LDA), a highly effective unsupervised topic model.

Late Submissions

We strongly encourage students to submit their work on time, but we do accept late submissions. If your submission is one day late, you will receive a 20% penalty. If your submission is two days late, you will receive a 50% penalty. We do not accept any submission more than two days after the deadline.
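To make the late policy concrete, here is a small illustrative sketch in Java. It assumes that lateness is counted in whole days and that each penalty is a percentage of the score the submission would otherwise have earned. The class and method names (`LatePenalty`, `scoreMultiplier`) and the example score are hypothetical and are not part of any course tooling.

```java
public class LatePenalty {
    // Hypothetical helper: returns the fraction of the earned score that is kept.
    // Assumes lateness is measured in whole days.
    static double scoreMultiplier(int daysLate) {
        if (daysLate <= 0) return 1.0;  // on time: no penalty
        if (daysLate == 1) return 0.8;  // one day late: 20% penalty
        if (daysLate == 2) return 0.5;  // two days late: 50% penalty
        return 0.0;                     // more than two days late: not accepted
    }

    public static void main(String[] args) {
        double raw = 90.0; // hypothetical score before any late penalty
        for (int d = 0; d <= 3; d++) {
            System.out.printf("%d day(s) late: %.1f%n", d, raw * scoreMultiplier(d));
        }
    }
}
```

Under these assumptions, a submission that would have earned 90 points keeps 72 points if it is one day late and 45 points if it is two days late. A submission that is three days late receives nothing, because it is not accepted.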