Much of my research has focused on data-driven and linguistically motivated models of structure in natural language.

Topics in my current and past work include syntactic parsing, analysis of child language, incremental processing, interfacing shallow and deep syntactic analysis, parser ensembles, discriminative structured models, parsing efficiency, and descriptive adequacy of syntactic formalisms. I have applied this research to various areas, including child language development, bioinformatics, and virtual human dialogue systems.

Current projects at UC Davis

  • Neural models of sentence meaning and similarity for authoring of specialized content for domain-specific interactive language systems

    Funded by the Navy through a subcontract from Soartech

    We are using neural NLP approaches to leverage large generic language resources (e.g. Wikipedia) to help authors of interactive systems (e.g. spoken dialogue systems, chatbots) create content for specialized domains for which little or no data is available.

  • Data-driven cross-linguistic modeling of word ordering preferences

    We are investigating how various factors, such as dependency length, lexical frequency and co-occurrence, and syntactic complexity, affect preferences among different grammatical alternatives that express the same meaning, and what the similarities and differences among these preferences across languages tell us about language processing and syntax.

  • Empirical analysis of second language acquisition

    Funded by a UC Davis Faculty Research Grant

    In collaboration with the Department of Spanish and Portuguese, we are collecting a large corpus of essays authored by learners of Spanish as a second language at various levels, and analyzing various aspects of second language learning through large amounts of data collected in real time.

Recent projects at USC

  • More expressive models of linguistic structure

    Funded by the U.S. Army (2014 to 2015).

    We are investigating new parsing approaches suitable for joint representation of multiple levels of linguistic analysis, including efficient structured prediction for data-driven parsing involving arbitrary directed graphs as output.
  • Automated Analysis of Discourse Structure

    Funded by the U.S. Army (2013 to 2014).

    We are developing new parsing and structured prediction approaches for analysis of coherence relations in text, in the style of Rhetorical Structure Theory, and investigating new frameworks for evaluating discourse analysis tools. Michael Heilman (ETS) and I are nearing the release of a new (very fast) discourse parser with state-of-the-art performance.
  • Neurobiology of Narrative Framing

    Funded by DARPA (Narrative Networks program).
    In collaboration with USC colleagues in the Institute for Creative Technologies (Andrew Gordon and Morteza Dehghani), and in the Brain and Creativity Institute (Antonio Damasio, Hanna Damasio, Mary Helen Immordino-Yang and Jonas Kaplan).

    We are studying how people use narrative framing to describe events in their lives, and the psychological effects of different types of framing on readers. For more information, visit our project page.
  • Modeling Human Communication Dynamics

    Funded by NSF.
    In collaboration with Louis-Philippe Morency

    We are building computational models of face-to-face communication, taking into account the omnidirectional flow of information conveyed asynchronously through verbal and non-verbal channels. Under this project, we have started looking into multimodal sentiment analysis using YouTube videos.

Past Projects

  • Joint processing for speech recognition and natural language understanding in dialogue systems

    Funded by TATRC. In collaboration with David Traum's dialogue group, of which I am a member, and Shri Narayanan's Signal Analysis and Interpretation Laboratory.

    We are looking at several ways to achieve better integration and synergy between speech recognition and language understanding, using datasets collected from user sessions with ICT's virtual human dialogue systems. This work involves syntactic and semantic analysis and language modeling.
  • Semi-supervised discriminative language modeling

    In collaboration with Brian Roark and the JHU CLSP 2011 summer workshop team

    During the Johns Hopkins summer workshop in 2011, we developed several approaches for hallucinating speech n-best lists, which are necessary for training discriminative language models. Building on the progress we made during the workshop (described in our workshop report and three ICASSP 2012 papers), we are continuing to pursue training of application-specific discriminative language models from arbitrary text sources.
  • Dynamic programming for linear-time parsing

    Funded by Google. In collaboration with Liang Huang.

    The most widely used traditional approaches for natural language parsing use dynamic programming algorithms that explore the exponentially many possible analyses of a sentence in polynomial time. More recent research on linear-time parsing has produced faster but less accurate alternatives. We have merged these two threads of parsing research, producing a linear-time, left-to-right, data-driven parsing approach that searches an exponential space, achieving high accuracy very efficiently. We are exploring the use of this approach in dependency and constituent parsing and language modeling.

  • For other research and past projects (CHILDES syntactic analysis, HPSG parsing, biomedical text mining, etc.), see my publications.