MIT CSAIL
6.819/6.869: Advances in Computer Vision
Fall 2015
Final Project
The final project is an opportunity for you to apply what you have learned in class to a computer vision problem that interests you. We strongly recommend a team of 2-4 people (except for the survey, which must be done individually). There are three project options you can pick from:
Report
Due: December 10, 2015
The report should be 4-6 pages (the upper limit of 6 pages is strict!) in CVPR format. It should be structured like a research paper, with sections for introduction, related work, approach/algorithm, experimental results, conclusions, and references.
You should describe and evaluate what you did in your project, which may not necessarily be what you originally hoped to do. A small result described and evaluated well will earn more credit than an ambitious result where no aspect was done well. Be accurate in describing the problem you tried to solve. Explain your approach in detail, and specify any simplifications or assumptions you have made. Also demonstrate the limitations of your approach. When doesn’t it work? Why? What steps would you have taken had you continued working on it? Make sure to add references to all related work you reviewed or used.
You are allowed to submit any supplementary material that you think is important for evaluating your work. However, we do not guarantee that we will review all of that material, and you should not assume that we will. The report should be self-contained.
Submission: submit your report to Stellar as a PDF file named <YOUR_LAST_NAME>.pdf. Submit any supplementary material as a single zip file named <YOUR_LAST_NAME>.zip. Add a README file describing the supplemental content.
Option 1: Mini Places Challenge.
Submission:
- Students have to sign up with a team name and the team members to receive a team code and instructions for submitting to the leaderboard. This can be done here.
- After sign-up, upload the prediction result through here. Each team is allowed to upload at most one submission every 4 hours.
- The leaderboard is here.
Download: Data (460MB), Development Kit, Pre-trained network, Presentation Template
CNN packages: MatConvNet (recommended), Caffe
The goal of this challenge is to identify the scene category depicted in a photograph. The data for this task comes from the Places2 dataset, which contains 10+ million images belonging to 400+ unique scene categories. Specifically, the mini challenge data for 6.869 will be a subsample of the above data, consisting of 100,000 images for training, 10,000 images for validation, and 10,000 images for testing, drawn from 100 scene categories. The images will be resized to 128x128 pixels to make the data more manageable. Further, while the end goal is scene recognition, a subset of the data will contain object labels that might be helpful for building better models.
For each image, algorithms will produce a list of at most 5 scene categories in descending order of confidence. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple scene categories in an image, given that many environments have multiple labels (e.g. a bar can also be a restaurant) and that humans often describe a place using different words (e.g. forest path, forest, woods). The exact details of the evaluation are available on the Places2 challenge website.
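The scoring rule described above can be sketched in a few lines. This is only an illustration, assuming the usual top-5 convention (an image counts as correct if any of its first 5 predicted labels equals the ground truth); the official scorer on the challenge website is authoritative. The function name, image ids, and category names below are hypothetical.

```python
def top5_error(predictions, ground_truth):
    """Fraction of images whose true label is absent from the top-5 guesses.

    predictions: dict mapping image id -> list of labels, most confident first.
    ground_truth: dict mapping image id -> the single true label.
    ASSUMPTION: a label either matches exactly (cost 0) or does not (cost 1).
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    misses = 0
    for image_id, true_label in ground_truth.items():
        # Only the first 5 guesses count; an image with no prediction is a miss.
        guesses = predictions.get(image_id, [])[:5]
        if true_label not in guesses:
            misses += 1
    return misses / len(ground_truth)


# Hypothetical image ids and category names, for illustration only.
predictions = {
    "val/0001.jpg": ["bar", "restaurant", "pub", "kitchen", "cafe"],
    "val/0002.jpg": ["forest", "woods", "park", "garden", "field"],
    "val/0003.jpg": ["beach", "coast", "harbor"],
}
ground_truth = {
    "val/0001.jpg": "restaurant",
    "val/0002.jpg": "forest_path",
    "val/0003.jpg": "beach",
}

error = top5_error(predictions, ground_truth)
print(f"top-5 error: {error:.3f}")
```

In this toy example the first image is scored as correct even though "restaurant" is only the second guess, which is exactly the leniency the top-5 rule is meant to provide.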
Students should improve the classification accuracy of their network models on the validation set of the Mini Places Challenge. The evaluation server will be available starting Tue 11/24/2015, so that students can submit their predictions on the test set for final evaluation and ranking on the challenge leaderboard.
We encourage students who do not have access to their own GPUs to use Amazon's EC2 to train deep networks. Students can sign up to receive a free $100 credit through the AWS Educate program. We encourage students to use g2.2xlarge instances running Ubuntu, which makes it easiest to install and use popular deep learning packages such as MatConvNet and Caffe. Note that $100 of Amazon credit allows you to run a g2.2xlarge GPU instance for approximately 6 days without interruption (you should keep it on only while using it). In a larger group, you will get more total available compute time.
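As a rough check of the "approximately 6 days" figure, the budget arithmetic can be written out as below. The hourly rate is an assumption (an on-demand g2.2xlarge price of about $0.65 per hour); actual prices vary by region and over time, so check the current AWS pricing before planning your runs. The function name is hypothetical.

```python
CREDIT_USD = 100.0       # AWS Educate credit per student (from the text above)
HOURLY_RATE_USD = 0.65   # ASSUMPTION: on-demand g2.2xlarge price per hour


def gpu_days(credit_usd, hourly_rate_usd, team_size=1):
    """Days of continuous GPU time a team can buy if every member pools credit."""
    total_hours = team_size * credit_usd / hourly_rate_usd
    return total_hours / 24.0


for team_size in (1, 2, 4):
    days = gpu_days(CREDIT_USD, HOURLY_RATE_USD, team_size)
    print(f"team of {team_size}: about {days:.1f} days of g2.2xlarge time")
```

Under this assumed rate a single student gets a little over 6 days, and the total scales linearly with team size as long as each member receives a separate credit.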
Important: For the 6.869 challenge, the dataset is different from the Places2 Challenge dataset. You do not need to register for the Places2 Challenge or download that data. You may only use the data provided in this challenge to train your models. You cannot use models that have been trained on other datasets, e.g., ImageNet or the full Places database.
Suggestion:
- It is very helpful to go through the MatConvNet examples for training on MNIST, CIFAR, and (optionally) ImageNet.
- Build the whole training pipeline first, then train the reference network to see if it matches the performance of the given pre-trained network (refNet1).
- Use data augmentation, deeper networks, object annotations, etc., to boost the classification accuracy.
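As one example of the data augmentation mentioned in the suggestions above, here is a minimal sketch of random cropping plus random horizontal flipping, written in plain Python on nested lists so it runs without any packages. The crop size of 112 and all function names are assumptions for illustration; in practice you would apply the same idea inside the data loader of MatConvNet or Caffe.

```python
import random


def random_crop(image, crop_size, rng):
    """Cut a crop_size x crop_size window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_size)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, crop_size=112, rng=random):
    """Random crop, then flip with probability 0.5.

    ASSUMPTION: crop_size=112 is an illustrative choice for 128x128 inputs.
    """
    crop = random_crop(image, crop_size, rng)
    if rng.random() < 0.5:
        crop = horizontal_flip(crop)
    return crop


# Dummy 128x128 "image" whose pixels record their own (row, column) position.
rng = random.Random(0)
image = [[(r, c) for c in range(128)] for r in range(128)]
sample = augment(image, crop_size=112, rng=rng)
print(len(sample), len(sample[0]))
```

Each call produces a different view of the same training image, which effectively enlarges the training set without using any outside data.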
If you have questions, please contact Aditya (khosla@csail.mit.edu) or Bolei (bolei@mit.edu).
Option 2: Your own project.
Download: Presentation Template
You can select the topic in computer vision that interests you most and work on it as your course project. Potential projects can be based on applications or models:
- Applications: Apply computer vision techniques to a specific application that matches your background and interests, such as an image processing mobile app or video recognition software.
- Models: Build new models, or improve existing models or methods, then evaluate the proposed models systematically on standard image datasets to show the improvement.
You can take a look at the Resources (image datasets and papers) in the Course Materials for some inspiration. Before proceeding with this option, please find teammates through Piazza, draft a summary of the project proposal together, and send it to the instructors Aude and Yusuf for a feasibility check. Then set up a meeting to discuss the project details.
Option 3: Survey (available to 6.819 students only).
Select a topic (to be discussed with one of the instructors), then select 10-12 papers and read them closely. Write a 2,500-word survey article. This option must be done individually. Contact Aude (oliva@mit.edu) for instructions on writing a survey.
Grading Policy.
The final project accounts for 40% of the course grade. The weights of its three parts are as follows:
- Project proposal (5%)
- Research component of final project (30%)
  - Abstract (3%)
  - Introduction (3%)
  - Related work (3%)
  - Approach (and technical correctness) (6%)
  - Experimental results (and technical correctness) (6%)
  - Conclusion (2%)
  - References (1%)
  - Overall clarity of the report (3%)
  - Reproducibility: can the work be reproduced from the information given in the report? (3%)
- Final presentation (5%) for 6.869 students only
Project Proposal
Due: October 24, 2013
The proposals should be just a page, and should describe what you plan to do (and who with, if appropriate). In the proposal, persuade us that it will be feasible for you to do it: lay out the tasks, and give a timeline for when you'll do each task. You can work by yourself or in pairs. Projects by pairs should be correspondingly more substantial.
Regarding the project topics: It should be something you're excited about. Anything related to computer vision is fine. We can help you with topics if you want ideas. We want it to be something new that you do for this class, so you can't submit a paper you've done for your RA or a project from another class. But something topically related to your RA is fine, and if it becomes a paper you submit for publication, that's ideal, of course.
Presentation
Due: November 26, 2013
The project presentation should be clear, informative, and short. You should briefly describe the problem you have chosen, and present an overview of your approach and results. The time allotted to each presentation is 5 minutes. We’ll have to be strict with the timing to accommodate all the students, so make sure your presentation fits within that time.
Submission: We will use one computer for the presentations in order to avoid the cost of everyone setting up their laptops. You should upload your presentation to Stellar by the due time above as a single ppt or pdf file named <YOUR_LAST_NAME>.ppt (or .pdf). If your presentation has additional files (e.g. videos), upload everything as a single zip file with the same naming convention. Late submissions are not allowed, and no further editing will be possible after submission.
Report
Due: December 11, 2013
The report should be 5-8 pages (the upper limit of 8 pages is strict!) in CVPR format. It should be structured like a research paper, with sections for introduction, related work, approach/algorithm, experimental results, conclusions, and references.
You should describe and evaluate what you did in your project, which may not necessarily be what you originally hoped to do. A small result described and evaluated well will earn more credit than an ambitious result where no aspect was done well. Be accurate in describing the problem you tried to solve. Explain your approach in detail, and specify any simplifications or assumptions you have made. Also demonstrate the limitations of your approach. When doesn’t it work? Why? What steps would you have taken had you continued working on it? Make sure to add references to all related work you reviewed or used.
You are allowed to submit any supplementary material that you think is important for evaluating your work. However, we do not guarantee that we will review all of that material, and you should not assume that we will. The report should be self-contained.
Submission: submit your report to Stellar as a PDF file named <YOUR_LAST_NAME>.pdf. Submit any supplementary material as a single zip file named <YOUR_LAST_NAME>.zip. Add a README file describing the supplemental content. Late submissions are not allowed.