Hopefully Final Proposal
- Plan Project (total 30% of Plan):
- Obtain many (200+) samples of English text, mainly from Marlboro community members
- Questionnaire – see page 2.
- Run each sample through Computer Aided Translation (CAT) programs (Google Translate, Yahoo! Babel Fish, Bing Translator)
- Using languages for which I can find near-native speakers (French… (and maybe one more – Italian?))
- Compare CAT to human translation
- How does the CAT program do the translation? Introduce each and pros/cons.
- Statistical? – (e.g. Google)
- Bayesian probabilities (Phrase-based translation)
- Rule-based? – (e.g. Systran, Yahoo! Babel Fish)
- Hybrid? – (e.g. Yahoo! Babel Fish, Bing Translator)
- Something else? Are there approaches outside these three? I think so.
- Linguistic perspective: Look at syntax, tree structures
- Discuss: What’s going on? Ambiguities, if applicable.
- Computer Science perspective: Bayesian probabilities
- Discuss: What’s going on?
- How are they being used?
- “Other Plan Components” (total 40% of Plan):
- Two papers:
- One on Phonology and its significance in language.
- One on Neural Networks, with toy code and other examples.
- Independent Portion (total 30% of Plan):
- In Bayesian probabilities, Markov models, and Artificial Intelligence (15%)
- In syntax and grammar (15%)
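To make the "Bayesian probabilities" item under the statistical approach more concrete, here is a minimal sketch of the noisy-channel idea: pick the English candidate e that maximizes P(f | e) * P(e), which by Bayes' rule is proportional to P(e | f). It is only an illustration and not how any of the programs named above actually work. The function names and every probability in the two tables are hypothetical, made up for the example. The French word "avocat" (which can mean "lawyer" or "avocado") stands in for the kind of ambiguity mentioned in the outline.

```python
# Toy noisy-channel sketch (all numbers below are invented for illustration).
# Bayes' rule: P(e | f) is proportional to P(f | e) * P(e).

# Hypothetical translation model: P(french | english)
TRANSLATION_MODEL = {
    ("avocat", "lawyer"): 0.6,
    ("avocat", "avocado"): 0.4,
}

# Hypothetical language model: P(english) in some imagined legal context
LANGUAGE_MODEL = {
    "lawyer": 0.02,
    "avocado": 0.001,
}


def score(french, english):
    """Unnormalized posterior P(f | e) * P(e); 0.0 for unseen pairs."""
    likelihood = TRANSLATION_MODEL.get((french, english), 0.0)
    prior = LANGUAGE_MODEL.get(english, 0.0)
    return likelihood * prior


def posterior(french, candidates):
    """Normalize the scores so they sum to 1 over the given candidates."""
    scores = {e: score(french, e) for e in candidates}
    total = sum(scores.values())
    if total == 0:
        # No candidate has any support in the toy tables.
        return {e: 0.0 for e in candidates}
    return {e: s / total for e, s in scores.items()}


def best_translation(french, candidates):
    """Return the candidate with the highest unnormalized posterior."""
    return max(candidates, key=lambda e: score(french, e))


if __name__ == "__main__":
    options = ["lawyer", "avocado"]
    print(best_translation("avocat", options))
    print(posterior("avocat", options))
```

With these made-up numbers the language model's prior pushes the choice toward "lawyer", even though the translation model alone treats the two senses as nearly equally likely. A phrase-based system applies the same reasoning to multi-word phrases rather than single words.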
Stuff to fill out on FPA
- Plan Summary:
"A study of Syntax and Bayesian probability with respect to Natural Language Processing and Machine Translation."
Jim's suggested language: "an investigation of natural language processing and machine translation" ... the rest of my wording is more specific than what's actually in (say) the project.