Ralf Brown
Principal Systems Scientist, Language Technologies Institute
- 5711 Gates & Hillman Centers
- 412-268-8298
Dr. Ralf Brown is a Principal Systems Scientist at Carnegie Mellon University's Language Technologies Institute (LTI). With a Ph.D. in Computer Science from Carnegie Mellon, Dr. Brown has been a pivotal member of the institute since 1993, contributing to groundbreaking advancements in machine translation, language identification, and computational linguistics. His distinguished career has been recognized with awards such as the IJCAI Distinguished Paper Award and Carnegie Mellon's Allen Newell Award for Research Excellence.
Dr. Brown's research spans the development of innovative tools and techniques in example-based machine translation, corpus linguistics, and low-resource language processing. He is the creator of CMU-EBMT, an open-source example-based machine translation system, and the LTI LangID Corpus, a resource for training language identifiers in over 2,000 languages. His work combines theoretical insights with practical applications, advancing the frontiers of natural language processing in diverse and multilingual contexts.
A dedicated educator and mentor, Dr. Brown leads courses on algorithm design and computational linguistics, emphasizing hands-on learning and real-world problem-solving. He has repeatedly served as Chair of the PhD Admissions Committee. He also actively contributes to open-source projects, including the widely used darktable image editor. With decades of experience and a passion for linguistic innovation, Dr. Brown continues to inspire the next generation of researchers at LTI while driving progress in language technologies worldwide.
Research Areas
- Information Extraction
- Summarization and Question Answering
- Information Retrieval
- Text Mining and Analytics
- Machine Translation
- Natural Language Processing and Computational Linguistics
Research Statement
My research interests primarily revolve around multilingual processing, with sidelines in text categorization and information extraction:
Machine Translation
Until 2011, I worked primarily on example-based machine translation (EBMT), a data-driven approach that originated a few years before statistical machine translation and is characterized by the use of individual examples from the training corpus during translation. I have also applied my EBMT system to cross-language information retrieval and speech-to-speech translation.
Digital Forensics
I have worked on reconstructing corrupted ZIP archives and on extracting text in arbitrary encodings from files and raw disk images. As part of this work, I developed language identification for more than 1,300 languages (since expanded to more than 2,000 languages), and I am continuing to improve the accuracy with which languages can be identified.
Information Extraction
I have also worked on extracting actions and affected components from aircraft maintenance records.