6.819/6.869: Advances in Computer Vision

Fall 2016


Final Project

The final project is an opportunity for you to apply what you have learned in class to a computer vision problem that interests you. We strongly recommend a team of 2-4 people.

Proposal (only for Option 3)
Due: Thu Oct 27. Upload to Stellar.

Due: Sun Dec 12

The report should be 4 pages for 6.819 and 6 pages for 6.869 (the upper limit of 6 pages is strict!) in CVPR format. It should be structured like a research paper, with sections for introduction, related work, approach/algorithm, experimental results, conclusions, and references. Project reports should be submitted individually, and the contributions of each team member should be clearly described.

Regarding the reports:

You should describe and evaluate what you did in your project, which may not necessarily be what you originally hoped to do. A small result described and evaluated well will earn more credit than an ambitious result where no aspect was done well. Be accurate in describing the problem you tried to solve. Explain your approach in detail, and specify any simplifications or assumptions you have made. Also demonstrate the limitations of your approach. When doesn’t it work? Why? What steps would you have taken had you continued working on it? Make sure to add references to all related work you reviewed or used.

You are allowed to submit any supplementary material that you think is important for evaluating your work. However, we do not guarantee that we will review all of that material, and you should not assume that we will. The report should be self-contained.

Submission: submit your report to Stellar as a PDF file named <YOUR_LAST_NAME>.pdf. Submit any supplementary material as a single zip file named <YOUR_LAST_NAME>.zip. Include a README file describing the supplementary content.

Option 1: Mini Places Challenge.


- Students must sign up with a team name and a list of team members to receive a team code and instructions for submitting to the leaderboard. This can be done here.

- After sign-up, upload the prediction results through here. Each team is allowed to upload at most one submission every 4 hours.

- The leaderboard is here.

Download: Data (460MB), Development Kit, Pre-trained network, Presentation Template

The goal of this challenge is to identify the scene category depicted in a photograph. The data for this task comes from the Places2 dataset, which contains 10+ million images belonging to 400+ unique scene categories. Specifically, the mini challenge data for 6.869 will be a subsample of the above data, consisting of 100,000 images for training, 10,000 images for validation, and 10,000 images for testing, drawn from 100 scene categories. The images will be resized to 128×128 pixels to make the data more manageable. Further, while the end goal is scene recognition, a subset of the data will contain object labels that might be helpful for building better models.
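As a rough planning aid, the split sizes above can be turned into an estimate of the uncompressed size of the data once decoded into memory. This is only a sketch: the helper name `raw_bytes` is hypothetical, and it assumes 3-channel RGB images stored as one byte per value, which the description above does not state. The compressed download (460MB) is much smaller than the decoded arrays.

```python
# Split sizes and image side length are taken from the challenge description.
NUM_TRAIN = 100_000
NUM_VAL = 10_000
NUM_TEST = 10_000
IMAGE_SIDE = 128


def raw_bytes(num_images, side=IMAGE_SIDE, channels=3, bytes_per_value=1):
    """Uncompressed size in bytes of num_images square images.

    ASSUMPTION: 3 channels (RGB) and 1 byte per value (uint8); neither is
    stated in the challenge description.
    """
    return num_images * side * side * channels * bytes_per_value


if __name__ == "__main__":
    train = raw_bytes(NUM_TRAIN)
    total = raw_bytes(NUM_TRAIN + NUM_VAL + NUM_TEST)
    print(f"training set, decoded: {train / 2**30:.2f} GiB")
    print(f"all splits, decoded:   {total / 2**30:.2f} GiB")
```

If the images are converted to 32-bit floats for training, pass `bytes_per_value=4` to see the fourfold increase.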

For each image, algorithms will produce a list of at most 5 scene categories in descending order of confidence. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple scene categories in an image, given that many environments fit more than one label (e.g. a bar can also be a restaurant) and that humans often describe a place using different words (e.g. forest path, forest, woods). The exact details of the evaluation are available on the Places2 challenge website.
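The criterion described above is commonly computed as a top-5 error: an image counts as correct if its ground truth label appears anywhere among the (at most) 5 predicted categories. The following is a minimal sketch under that reading, not the official evaluation code; the function name `top5_error` and the use of integer category ids are assumptions for illustration, and the Places2 challenge website remains the authority on the exact details.

```python
def top5_error(predictions, ground_truth):
    """Fraction of images whose true label is not among the top 5 predictions.

    predictions:  one list per image holding up to 5 category ids,
                  in descending order of confidence.
    ground_truth: one category id per image.

    ASSUMPTION: categories are represented as integer ids, and any entries
    beyond the first 5 in a prediction list are ignored.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("cannot compute an error rate over zero images")

    misses = 0
    for predicted, truth in zip(predictions, ground_truth):
        # A hit anywhere in the first 5 entries counts as correct.
        if truth not in predicted[:5]:
            misses += 1
    return misses / len(ground_truth)


if __name__ == "__main__":
    # Made-up ids for illustration: the first and third images are hits,
    # the second is a miss.
    preds = [[3, 1, 2, 0, 4], [7, 8, 9, 1, 2], [5]]
    truth = [2, 0, 5]
    print(top5_error(preds, truth))
```

Because only membership in the list matters under this reading, the order of the 5 predictions does not change the score.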

Students should improve the classification accuracy of their network models on the validation set of the Mini Places Challenge. The evaluation server will be available starting Mon 11/21/2016, so that students can submit their predictions on the test set for final evaluation and ranking on the challenge leaderboard.

We encourage students to use Amazon's EC2 for computation if they do not have access to their own GPUs to train deep networks. Students can sign up to receive a free $100 credit through the AWS Educate program. We recommend g2.2xlarge instances running Ubuntu for ease of installation. Note that $100 of Amazon credit allows you to run a g2.2xlarge GPU instance for approximately 6 days without interruption (you should keep it on only while using it). In a larger group, you will have more total compute time available.
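The figures above can be turned into a quick budget estimate. The sketch below derives the hourly rate implied by "$100 for approximately 6 days" and scales the available hours by team size. The function names are hypothetical, the rate is only what the stated numbers imply (check current AWS pricing for the real figure), and the scaling assumes each team member obtains a separate $100 credit.

```python
CREDIT_USD = 100.0       # credit per student, from the text above
CONTINUOUS_DAYS = 6.0    # approximate continuous run time, from the text above


def implied_hourly_rate(credit_usd=CREDIT_USD, days=CONTINUOUS_DAYS):
    """Hourly price implied by the stated credit and run time (an estimate)."""
    return credit_usd / (days * 24.0)


def team_gpu_hours(team_size, hourly_rate, credit_usd=CREDIT_USD):
    """Total GPU hours for a team.

    ASSUMPTION: every member receives their own credit and the team pools it.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    return team_size * credit_usd / hourly_rate


if __name__ == "__main__":
    rate = implied_hourly_rate()
    print(f"implied rate: ${rate:.2f} per hour")
    for size in (2, 3, 4):
        print(f"team of {size}: about {team_gpu_hours(size, rate):.0f} GPU hours")
```

Stopping the instance whenever it is idle stretches the same credit over more calendar days.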

Important: For the 6.869 challenge, the dataset is different from the Places2 Challenge dataset. You do not need to register for the Places2 Challenge or download that data. You may use only the data provided in this challenge to train your models. You may not use models that have been trained on other datasets, e.g., ImageNet or the full Places database.


Option 2: Choose from the suggested topics.

You can choose from the list of suggested topics (to be updated):

Option 3: Your own project.

You can select a topic in computer vision that interests you and work on it as your course project. Potential projects can be based on applications or models. Please clearly specify and justify: (1) what the approach will be; (2) why it is interesting; (3) how you will evaluate success. You can look at the Resources (image datasets and papers) in the Course Materials for inspiration. Before proceeding with this option, find teammates through Piazza, draft a summary of the project proposal together, and send it to the instructors for a plausibility check; then set up a meeting to discuss the project details.

Grading Policy.

The final project accounts for 40% of the course grade. The weights for its three parts are as follows:


The rules for forming groups are:
  1. Groups cannot mix students taking 6.819 and 6.869. All the members of the group should belong to the same course.
  2. Groups must have 2, 3 or 4 members.
  3. We do not allow groups of just 1 person. Collaborative work is part of the training.
  4. Reports should be submitted individually, and each report should highlight the contributions of each team member in a section of the paper.