Carnegie Mellon University

Graham Neubig

Associate Professor, Language Technologies Institute

  • 5409 Gates & Hillman Centers

Research Areas

Machine Learning, Machine Translation, Natural Language Processing and Computational Linguistics, Spoken Interfaces and Dialogue Processing

Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on natural language processing and large language models, including both fundamental advances in model capabilities and applications to tasks such as software development. His ultimate goal is for every person in the world to be able to communicate with each other, and with computers, in their own language. He also works to make NLP research more accessible through open publishing of research papers, advanced NLP course materials and video lectures, and open-source software, all of which are available on his website.
  • Embodiment
  • Grounding
  • RoboNLP
  • Unsupervised Learning
  • Vision and Language

My research is concerned with language and its role in human communication. In particular, my long-term research goal is to break down barriers in human-human and human-machine communication through the development of natural language processing (NLP) technologies. This includes the development of technology for machine translation, which helps break down barriers in communication for people who speak different languages, and natural language understanding, which helps computers understand and respond to human language. Within this overall goal of breaking down barriers to human communication, I have focused on several aspects of language that both make it interesting as a scientific subject and hold potential for the construction of practical systems.

Advances in core NLP technology: Human language is complex and nuanced, and handling this complexity in a computational way requires sophisticated algorithms or learning techniques, the development of which is far from a solved problem. The first major area of my work focuses on advances in core NLP technology, which improve the accuracy with which we can analyze, translate, generate, or reply to textual inputs. 

Models of language associated with other modalities: Human language does not exist in a vacuum, and is often associated with context from the world around it. The second thread of my research focuses on models of natural language associated with other modalities, tackling the problems, and exploiting the fascinating possibilities, presented by having text associated with other types of information. Specifically, I have worked extensively with speech, and am also interested in associations between natural and formal languages such as source code.

Learning from naturally occurring data: Language data exists in abundance, particularly since the advent of the internet, and it is possible to acquire extremely large amounts of this data for scientific or engineering purposes. A third thread of my research focuses on learning NLP models from naturally occurring data through the use of unsupervised, semi-supervised, or active learning, which allows us to exploit this data to improve models while reducing the need for costly manual annotation.
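
To make the idea concrete, below is a minimal self-training sketch, one simple form of semi-supervised learning: a model trained on a small labeled set repeatedly labels unlabeled text and folds its most confident predictions back into the training data. It assumes scikit-learn; the function name, data names, and confidence threshold are illustrative assumptions, not a description of any specific system from my research.

    # Minimal self-training sketch (semi-supervised learning).
    # Illustrative only: names and the 0.9 threshold are assumptions.
    import numpy as np
    from scipy import sparse
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def self_train(labeled_texts, labels, unlabeled_texts, rounds=3, threshold=0.9):
        # Fit features on all available text, labeled and unlabeled.
        vec = TfidfVectorizer().fit(list(labeled_texts) + list(unlabeled_texts))
        X, y = vec.transform(labeled_texts), np.asarray(labels)
        X_un = vec.transform(unlabeled_texts)
        model = LogisticRegression(max_iter=1000).fit(X, y)
        for _ in range(rounds):
            if X_un.shape[0] == 0:
                break
            probs = model.predict_proba(X_un)
            keep = probs.max(axis=1) >= threshold  # confident predictions only
            if not keep.any():
                break
            # Treat confident predictions as extra (noisy) training labels.
            X = sparse.vstack([X, X_un[keep]])
            y = np.concatenate([y, model.classes_[probs[keep].argmax(axis=1)]])
            X_un = X_un[~keep]
            model = LogisticRegression(max_iter=1000).fit(X, y)
        return model, vec

Each round trades a small amount of label noise for a larger training set, which is the sense in which abundant naturally occurring data can reduce the need for manual annotation.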