The advent of powerful personal computers and the Internet has resulted in exponential growth in digital information. Anyone can create rich content on a personal digital device and make it available online. Many high-quality data sources are also available online, just a click or touch away.
The growth of digital information, however, has brought tremendous challenges in managing, organizing, and accessing such information. The inherent heterogeneity of the information, which ranges from unstructured text to semi-structured and structured data, makes it challenging to handle. In this course, we will learn the foundational technologies behind the management of digital information. In particular, we will focus on the theory and practice behind information retrieval (IR) systems that allow intuitive “searches” on textual data.
The course consists of lectures, paper readings, software projects, and exams. In addition to attending lectures, students will read research papers throughout the quarter and write short summaries of them. Students will also complete a series of software projects in which they use open-source tools to construct a “search engine” on a reasonably large text dataset. The projects will give students hands-on experience using a highly popular IR system on a real dataset.
There are three high-level goals behind this class:
There is no required textbook, but the following book can be a useful reference in case you need to learn more details on a certain topic.
The final grade will be assigned based on the following criteria:
Final grades will be assigned on a curve. Roughly 30% of students will receive an A, 40% a B, and the remaining 30% a C or D.
At http://www.deanofstudents.ucla.edu/Academic-Integrity, the Office of the Dean of Students presents University policy on academic integrity, with special attention to cheating, plagiarism, and student discipline. The policy summaries don’t specifically address programming assignments in detail, so we state our policy here.
Each of you is expected to submit your own original work. On many occasions it may be useful and educational to ask others (the instructor, the TA, or other students) for hints or help, or to talk generally about programming strategies. Such activity is both acceptable and encouraged, but you must indicate any assistance (human or otherwise) that you received. Any assistance received without proper citation will be considered plagiarism. In addition, to avoid unintended sharing and copying of your work, publishing your work in a public repository, such as a public GitHub repository, is strictly prohibited.
So where do we draw the line? We’ll decide each case on its merits, but here are some categorizations:
In any event, you are responsible for coding, understanding, and being able to explain on your own or as a team all project work that you submit.
Be especially careful about giving a copy of your work to a friend who “just wants to look at it to get some ideas”. Frequently, that friend ends up panicking and simply copies your work, thus betraying you and putting you through the hassle of an academic discipline hearing.
You must abide by this policy in addition to the policies expressed in the UCLA Student Conduct Code. If a violation of the policies is suspected, in accordance with University procedures, we will have to submit the case to the Dean. A typical penalty for a first plagiarism offense is suspension for one or more quarters. A second offense usually results in dismissal from the University of California.
Please note that alternate exams will not be offered routinely. The university strongly discourages students from enrolling in two classes given at the same time, and instructors are under no obligation to accommodate such students. If you provide an extraordinarily compelling case, an alternate exam may be given, but alternate exams are always oral exams given by the instructor privately in his office.
All students must join and use the CS246 discussion group on Piazza by registering at https://piazza.com/ucla/winter2019/cs246. This online discussion group will be the primary channel for students to ask course- and project-related questions and for others, including the TA, to answer them. Note that some of your questions may have already been discussed and answered by others, so please search the board before asking a question. When you join the discussion group, you may choose to receive email notifications for new messages or to read them only on the board. You are responsible for all your posts to Piazza. Thus, please do NOT post any content that might be considered a violation of the honor code, such as your project source code. If you have any doubt or concern, please ASK the TA or lecturer before posting.