Master 2 Internship in NLP, Nancy, 2024

Domain: The Semantics of Discourse Markers
Co-supervisors: Yannick Toussaint (Lab: LORIA; team: Orpailleur) & Mathieu Constant (Lab: ATILF)
Contact: Yannick.Toussaint@loria.fr and Mathieu.Constant@univ-lorraine.fr

Motivation and context
Discourse Markers (DMs) [Sch87, DCP13] play an important role in communication in general (text, speech, etc.) along two dimensions. On the one hand, DMs such as therefore or for instance are connectives that express the semantic and pragmatic relations between discourse units (clauses, sentences or groups thereof) within a hierarchical structure that represents the whole text. On the other hand, alongside connectives, there are many discourse particles through which speakers express their own attitudes, such as emotions, beliefs or, more generally, the way in which they perceive a situation. Particles such as ah, tu parles (you bet) or bel et bien (indeed) suggest various emotional and belief states. Combined with prosodic properties such as intonational contours [Pot05, BHMP15, DRPA+15], DMs aid interpretation by conveying attitudes or discourse planning. Thus, DMs are a central means of communicating both discourse structure and speakers’ attitudes.
The internship is part of the ANR project CODIM (COmpositionality and DIscourse Markers), funded by the Agence Nationale de la Recherche and involving three partners in Computational Linguistics. The intern will interact with the other partners and attend the project meetings. The internship will take place at LORIA for 5 months and may lead to a 3-year PhD contract on the same topic.

Goals and Objectives
The internship aims to study discourse markers from one of the following perspectives:
1) Compositionality: two approaches will be followed. The first is a statistical study of multiword association: pattern mining and association rules could provide good statistical measures of the degree of freezing (fixedness) versus syntactic and semantic variation. The second consists in training deep neural networks to tag discourse markers: we will fine-tune and compare different Large Language Models (LLMs) and study their ability to identify some form of freezing through word embeddings. This leads to the next point.
2) Semantics: the central question concerns the meaning of the expression. The internship will first deal with written texts; we may study the semantics of discourse markers by comparing their embeddings (is there a difference between d’autant que … (all the more since) and de plus … (moreover)?), by using probing tasks, or even by prompting the language model directly.
3) Prosody: how does meaning change once prosody is taken into account? The texts will be annotated with prosodic information, and the work should study how the semantic representation of a DM differs across intonations, e.g. d’autant que … ↗ (rising) versus d’autant que … ↘ (falling).
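
As an illustration of the association-rule measures mentioned in point 1, here is a minimal sketch, assuming a whitespace-tokenized corpus in which each adjacent token pair counts as one transaction. The function name adjacency_rule_measures and the toy sentences are hypothetical and only show how support, confidence and lift could quantify how strongly a word attracts its follower; the project may rely on other measures or tools.

```python
from collections import Counter

# Hypothetical toy corpus (illustration only), whitespace-tokenized;
# "d'autant" is kept as a single token.
TOY_CORPUS = [
    "je reste d'autant que il pleut".split(),
    "elle sait que tu viens".split(),
    "c'est d'autant plus facile".split(),
]


def adjacency_rule_measures(sentences, antecedent, consequent):
    """Support, confidence and lift of the rule `antecedent -> consequent`,
    where each adjacent token pair (left, right) counts as one transaction."""
    pairs = [(left, right) for tokens in sentences for left, right in zip(tokens, tokens[1:])]
    n = len(pairs)
    if n == 0:
        return {"support": 0.0, "confidence": 0.0, "lift": 0.0}
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    joint = sum(1 for pair in pairs if pair == (antecedent, consequent))

    # P(antecedent, consequent) over all adjacent pairs
    support = joint / n
    # P(consequent follows | antecedent in left position)
    confidence = joint / left_counts[antecedent] if left_counts[antecedent] else 0.0
    # P(consequent in right position)
    p_consequent = right_counts[consequent] / n
    # lift > 1 means the pair co-occurs more often than expected under independence
    lift = confidence / p_consequent if p_consequent else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    for follower in ("que", "plus"):
        print(follower, adjacency_rule_measures(TOY_CORPUS, "d'autant", follower))
```

On a real corpus, a confidence close to 1 for d'autant → que with a lift well above 1 would hint at a frozen sequence, whereas competing followers (here d'autant plus) lower the confidence of each individual rule. Longer markers and non-contiguous patterns would require proper pattern mining rather than this bigram-only sketch.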

The internship will require three main activities:
– Analysing corpora and studying discourse;
– Proposing a way to model discourse markers;
– Implementing and testing the different aspects.

Some references about Discourse Markers in Linguistics
[BHMP15] C. Beyssade, B. Hemforth, J.-M. Marandin, and C. Portes. Prosodic realizations of information focus in French. In L. Frazier and E. Gibson, editors, Explicit and Implicit Prosody in Sentence Processing, pages 39–61. Springer, 2015.
[DCP13] L. Degand, B. Cornillie, and P. Pietrandrea, editors. Discourse Markers and Modal Particles: Categorization and description. John Benjamins, 2013.
[DRPA+15] E. Delais-Roussarie, B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S.-A. Jun, Ph. Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H.-Y. Yoo. Intonational phonology of French: Developing a ToBI system for French. In S. Frota and P. Prieto, editors, Intonation in Romance, pages 63–100. Oxford UP, 2015.
[Pot05] C. Potts. The Logic of Conventional Implicatures. Oxford UP, 2005.
[Sch87] D. Schiffrin. Discourse Markers. Cambridge UP, 1987.