MIT CSAIL 6.819/6.869: Advances in Computer Vision
The final project is an opportunity for you to apply what you have learned in class to a computer vision problem of your interest. We strongly recommend a team of 2-4 people (except for the survey option, which should be done individually). There are three project options you can pick from:
The report should be 4-6 pages (the upper limit of 6 pages is strict!) in CVPR format. It should be structured like a research paper, with sections for introduction, related work, the approach/algorithm, experimental results, conclusions, and references.
You should describe and evaluate what you did in your project, which may not necessarily be what you hoped to do originally. A small result described and evaluated well will earn more credit than an ambitious result where no aspect was done well. Be accurate in describing the problem you tried to solve. Explain your approach in detail, and specify any simplifications or assumptions you have made. Also discuss the limitations of your approach: When doesn't it work? Why? What steps would you have taken had you continued working on it? Make sure to add references to all related work you reviewed or used.
You may submit any supplementary material that you think is important for evaluating your work; however, we do not guarantee that we will review all of it, and you should not assume that we will. The report should be self-contained.
Submission: submit your report to Stellar as a PDF file named <YOUR_LAST_NAME>.pdf. Submit any supplementary material as a single zip file named <YOUR_LAST_NAME>.zip, and add a README file describing the supplemental content.
- Students have to sign up with a team name and the team members to receive a team code and instructions for submitting to the leaderboard. This can be done here.
- After signing up, upload your prediction results through here. Each team may upload at most one submission every 4 hours.
- The leaderboard is here.
The goal of this challenge is to identify the scene category depicted in a photograph. The data for this task comes from the Places2 dataset, which contains 10+ million images belonging to 400+ unique scene categories. Specifically, the mini challenge data for 6.869 is a subsample of the above data, consisting of 100,000 images for training, 10,000 images for validation, and 10,000 images for testing, drawn from 100 scene categories. The images are resized to 128×128 pixels to make the data more manageable. Further, while the end goal is scene recognition, a subset of the data contains object labels that may be helpful for building better models.
For each image, algorithms will produce a list of at most 5 scene categories in descending order of confidence. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple scene categories in an image given that many environments have multi-labels (e.g. a bar can also be a restaurant) and that humans often describe a place using different words (e.g. forest path, forest, woods). The exact details of the evaluation are available on the Places2 challenge website.
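The scoring rule described above can be sketched as follows. This is an illustrative reimplementation, not the official Places2 evaluation code, and the function name, data layout, and toy labels are all made up for the example:

```python
# Hedged sketch of top-5 scene classification scoring: a labeling counts
# as correct if any of the (at most) 5 predicted categories matches the
# ground-truth label for that image.

def top5_error(predictions, ground_truth):
    """predictions: dict mapping image id -> list of up to 5 category
    labels in descending order of confidence.
    ground_truth: dict mapping image id -> the true category label.
    Returns the fraction of images whose true label does not appear
    among the predictions (lower is better)."""
    errors = 0
    for image_id, true_label in ground_truth.items():
        guesses = predictions.get(image_id, [])[:5]  # cap at 5 guesses
        if true_label not in guesses:
            errors += 1
    return errors / len(ground_truth)

# Toy example with made-up image ids and category names:
preds = {
    "img1": ["forest_path", "forest", "woods"],
    "img2": ["bar", "restaurant"],
}
truth = {"img1": "forest", "img2": "kitchen"}
print(top5_error(preds, truth))  # img1 is correct, img2 is not -> 0.5
```

Because only the best-matching label is scored, listing plausible alternative categories (e.g. both "forest" and "forest_path") never hurts, which is exactly the multi-label ambiguity the metric is designed to tolerate.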
Students should improve the classification accuracy of their network models on the validation set of the mini places challenge. The evaluation server will be available from Tue 11/24/2015, so that students can submit their predictions on the test set for final evaluation and ranking on the challenge leaderboard.
We encourage students to use Amazon's EC2 for computation if they do not have access to their own GPUs to train deep networks. Students can sign up to receive a free $100 credit through the AWS Educate program. We encourage students to use g2.2xlarge instances running Ubuntu for maximal ease of installing and using popular deep learning packages such as MatConvNet and Caffe. Note that $100 of Amazon credit allows you to run a g2.2xlarge GPU instance for approximately 6 days without interruption, so keep the instance on only while you are using it. In a larger group, you will have more total compute time available.
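The "approximately 6 days" figure above follows from simple budget arithmetic. The hourly rate below is an assumption (on-demand g2.2xlarge pricing was roughly $0.65/hour in this period); check current AWS pricing rather than relying on this number:

```python
# Rough budget arithmetic for the $100 AWS credit mentioned above.
CREDIT_USD = 100.0
PRICE_PER_HOUR = 0.65  # assumed g2.2xlarge on-demand rate, USD/hour

hours = CREDIT_USD / PRICE_PER_HOUR      # ~154 hours of GPU time
days = hours / 24                        # ~6.4 days if run nonstop
print(f"{hours:.0f} hours, about {days:.1f} days of uninterrupted use")
```

Since the credit is billed per instance-hour, stopping the instance between training runs stretches the same $100 over the whole project.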
Important: For the 6.869 challenge, the dataset is different from the Places2 Challenge dataset. You do not need to register for the Places2 Challenge or download that data. You may only use the data provided in this challenge to train your models. You may not use models that have been trained on other datasets, e.g., ImageNet or the full Places database.
Download: Presentation Template

You could select a topic in computer vision that interests you most and work on it as your course project. Potential projects could be based on applications and models: