CS246: Web Information Systems – Fall 2020
Time and Place
Lectures
- Hours: Monday and Wednesday, 10:00 AM - 11:50 AM
- Location: Zoom https://ucla.zoom.us/j/96835293526
Discussion Section
- Hours: Friday 10:00 AM - 11:50 AM
- Location: Zoom https://ucla.zoom.us/j/96835293526
Course Staff
Instructor
- Name: Junghoo “John” Cho
- Email: cho@cs.ucla.edu
- Office: Zoom link available on CCLE
- Office hour: Tuesday 2:30 PM - 3:30 PM
TA
- Name: Manoj Reddy
- Email: mdareddy@cs.ucla.edu
- Office: Zoom link available on CCLE
- Office hour: Thursday 10 AM - 12 PM PST
Course Description
The advent of powerful personal computers and the Internet resulted in exponential growth in digital information. Anyone can create rich content on a personal digital device and make it available online. Many high-quality data sources are also available online, just a click or touch away.
The growth of digital information, however, has brought tremendous challenges in managing, organizing and accessing such information. The inherent heterogeneity of this information, which spans unstructured text, semi-structured data, and structured data, makes it challenging to handle. In this course, we will learn the foundational technologies behind the management of digital information. In particular, we will focus on the theory and practice behind information retrieval (IR) systems that allow intuitive “searches” on textual data.
Course Organization
The course consists of lectures, paper readings, software projects and exams. In addition to attending lectures, students will read research papers throughout the quarter and write short summaries of them. Students will also complete a series of software projects in which they use open-source tools to construct a “search engine” on a reasonably large text dataset. The projects will give students hands-on experience using a highly popular IR system on a real dataset.
Learning Outcomes
There are three high-level learning outcomes from this class:
- Learn the theory: By the end of the quarter, students will learn how IR systems (or web search engines) work.
- Learn to use tools: Theory without practice can be just an intellectual pastime. By getting their “hands dirty” with software projects, students will learn how to use existing software tools for building IR systems to implement what they learned in the class.
- Learn to read research papers: A big part of a software engineer or research scientist’s task is to read and understand research papers. Throughout the quarter, students will read a number of research papers on their own, and will hopefully get more “comfortable” reading them.
Course Logistics
Web site
http://oak.cs.ucla.edu/classes/cs246/
Textbook
There is no required textbook, but the following book can be a useful reference in case you need to learn more details on a certain topic.
- Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Assignments
- Paper summaries: Students will read research papers throughout the quarter and write short summaries of them.
- Projects: Students will complete a series of software projects in which they use open-source tools to construct a “search engine” on a reasonably large text dataset.
All homework and projects should be submitted via Gradescope.
Exams
- Final: Friday, December 11, 4-6 PM
- If you are in a different time zone, please submit your time zone info via https://forms.gle/LgeRTAwyhnqAwzNT8
Alternate Exams
Please note that routine alternate exams will not be offered. The university strongly discourages students from enrolling in two classes given at the same time, and instructors are under no obligation to accommodate such students. If you provide an extraordinarily compelling case, an alternate exam may be given, but alternate exams are always oral exams given by the instructor privately in his office.
Grading
The final grade will be assigned based on the following criteria:
- Paper summaries: 10%
- Project: 50%
- Final: 40%
Final grades will be assigned on a curve. Roughly 30% of students will receive an A, 40% a B, and the remaining 30% a C or D.
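As a rough illustration of how the three weights combine, the sketch below computes a weighted total before any curve is applied. It assumes each component is scored on a 0-100 scale, which this syllabus does not specify; the function name, the dictionary keys and the sample scores are all made up for the example.

```python
# Weights copied from the grading criteria listed above.
WEIGHTS = {"paper_summaries": 0.10, "project": 0.50, "final": 0.40}


def weighted_score(scores):
    """Combine per-component scores into one weighted total.

    Assumption: every score is on a 0-100 scale. The result is the raw
    total only; letter grades depend on the curve, not on this number alone.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"paper_summaries": 90.0, "project": 80.0, "final": 70.0}
print(round(weighted_score(example), 2))  # 0.10*90 + 0.50*80 + 0.40*70 = 77.0
```

Because the project carries half of the weight, a 10-point change in the project score moves the total by 5 points, compared with 1 point for the paper summaries.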
Online discussion group
All students must join and use the CS246 discussion group on Piazza by registering at https://piazza.com/ucla/fall2020/cs246. This online discussion group will be the primary channel for students to ask course and project related questions and for others, including the TA, to answer them. Note that some of your questions may have already been discussed and answered by others, so please search the board before asking a question. When you join the discussion group, you may choose to receive email notifications for new messages or just read them on the board. You are responsible for all your posts to Piazza. Thus, please do NOT post any content that might be considered a violation of the honor code, such as your project source code. If you have any doubt or concern, please ASK the TA or instructor before posting.
Academic Integrity
At http://www.deanofstudents.ucla.edu/Academic-Integrity, the Office of the Dean of Students presents University policy on academic integrity, with special attention to cheating, plagiarism, and student discipline. The policy summaries don’t specifically address programming assignments in detail, so we state our policy here.
Each of you is expected to submit your own original work. On many occasions it may be useful and have educational value to ask others (the instructor, the TA, or other students) for hints or help, or to talk generally about programming strategies. Such activity is both acceptable and encouraged, but you must indicate any assistance (human or otherwise) that you received. Any assistance received that is not properly cited will be considered plagiarism. In addition, to avoid unintended sharing and copying of your work, publishing your work in a public repository, such as a public GitHub repository, is strictly prohibited.
So where do we draw the line? We’ll decide each case on its merits, but here are some categorizations:
Acceptable:
- Clarifying what an assignment is requiring
- Discussing algorithms for solving a problem, perhaps accompanied by pictures, without writing any code
- Helping someone find a minor problem with their code, provided that offering such assistance doesn’t require examining more than a few lines of code
- Using code from the course text, from reference materials linked on the project page, or from the instructor or the TAs.
Unacceptable:
- Turning in any portion of someone else’s work without crediting the author of that work, if it is not from the sources mentioned above.
- Using project solutions from earlier offerings of this or any other class
- Soliciting help from an online source where not all potential respondents are subject to the UCLA Student Conduct Code
- Receiving from another person (other than the instructor or a TA) a code fragment that solves any portion of a programming assignment
- Writing for or with another student (except your partner) a code fragment that solves any portion of a programming assignment
In any event, you are responsible for coding, understanding, and being able to explain on your own or as a team all project work that you submit.
Be especially careful about giving a copy of your work to a friend who “just wants to look at it to get some ideas”. Frequently, that friend ends up panicking and simply copies your work, thus betraying you and putting you through the hassle of an academic discipline hearing.
You must abide by this policy in addition to the policies expressed in the UCLA Student Conduct Code. If a violation of the policies is suspected, in accordance with University procedures, we will have to submit the case to the Dean. A typical penalty for a first plagiarism offense is suspension for one or more quarters. A second offense usually results in dismissal from the University of California.