Keywords: corpus linguistics, phraseology, discourse markers, association measures, (non-)compositionality
Motivation

There is a rich literature on complex or multiword expressions (see Bhatia et al. 2023 for a recent overview) and phraseology (see Piirainen et al. 2020, Pastor & Mitkov 2022, Mel’čuk 2023 for different perspectives). Work in this domain aims in particular at identifying stable combinations and evaluating their possible degrees of semantic rigidity. Similar problems arise for co-occurring elements whose main function is discursive, in the sense that they contribute to discourse organization or to the manifestation of the speaker in the utterance, for instance ah+bon, non+mais+alors, donc+du coup, ah+ben+tu parles, mais+enfin (Waltereit 2007, Dostie 2013, Crible 2018, Crible & Degand 2019, Cuenca & Crible 2019, Haselow 2019, Dargnat 2022). Studying these co-occurrences raises a number of questions which, in spite of the different perspectives they require, all concern the status of complex discourse expressions. The present workshop will focus on the following topics.

1. Annotating the elements of co-occurrences

Some elements belong to several categories. For instance, bon can be an adjective (including in idioms like à bon escient, à bon droit), a noun, an adverb or a discourse marker. For lexically ambiguous discourse markers, probabilistic or LLM-based[1] taggers give mediocre to poor, and unstable, results on POS-recognition tasks. One can use finite-state automata to detect the category, but problems remain, most notably with elements whose discourse role is detected via unbounded dependencies. For instance, après is a preposition in (a) and a concessive adverb in (b), although the first eight words of the two sentences are the same.

              (a) [[après]PREP [le train qui est arrivé en retard]NP]PP [il y en avait un autre]S
                  [[after]PREP [the train which was late]NP]PP [there was another one]S
                  ‘After the train which was late there was another one’

              (b) [après]ADV [le train qui est arrivé en retard]NP [c’est pas la faute du conducteur]S
                  [after]ADV [the train which was late]NP [it’s not the driver’s fault]S
                  ‘This said, for the train which was late, the driver is not responsible’

Moreover, in the absence of a preliminary clause segmentation, and given the poor performance of sentence segmenters, the sentence-initial or clause-initial position cannot be reliably identified in spoken corpora. It is therefore necessary to combine computer-assisted extraction methods, manual annotation and, whenever relevant, phonetic/prosodic information (shortening, pauses, contours, duration, etc.). Lexical disambiguation is crucial for the next stage.

2. Evaluating mutual attraction between elements

Association measures (Desagulier 2017, Brezina 2018) are often considered the technique of choice for evaluating the tendency of elements to cluster. These measures are mainly sensitive to two dimensions: exclusivity, that is, the tendency of elements to occur together rather than separately, and frequency, that is, the difference between the expected and observed frequencies of the combinations. Directionality must also be taken into account. It corresponds to the possibility for an element to predict the occurrence of another element on its right or left. For instance, is ah a better predictor of a rightward bon than of another rightward marker (and conversely)? It is necessary to compare the results of various measures and test their efficiency and stability on different types of corpora. It is also useful to compare these results to those of LLMs on fill-mask tasks, where the goal is to propose a candidate to fill a blank (the mask) inside a given sequence of words. For example, is a model able to propose bon as a filler for an incomplete sentence like ah <mask> je ne savais pas and for other similar patterns?
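The two dimensions and the notion of directionality can be made concrete with a minimal sketch. It assumes a pre-tokenised corpus and uses two common measures, pointwise mutual information (PMI) for the observed/expected contrast and Delta P for directionality; the function name, the toy token list and the choice of measures are illustrative assumptions, not the workshop's prescribed method.

```python
import math

# Toy token sequence, invented purely for illustration (not corpus data).
TOY_TOKENS = ("ah bon je ne savais pas ah bon d'accord bon ben "
              "ah oui mais enfin ah bon ah oui").split()

def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def bigram_association(tokens, left, right):
    """Compute PMI and both directional Delta P scores for the bigram (left, right)."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    # Cells of the 2x2 contingency table over bigram positions.
    a = sum(1 for w1, w2 in bigrams if w1 == left and w2 == right)  # left + right
    b = sum(1 for w1, w2 in bigrams if w1 == left and w2 != right)  # left + other
    c = sum(1 for w1, w2 in bigrams if w1 != left and w2 == right)  # other + right
    d = n - a - b - c                                               # other + other
    # Frequency expected if left and right were independent.
    expected = _ratio((a + b) * (a + c), n)
    pmi = math.log2(a / expected) if a > 0 else float("-inf")
    # Does `left` predict `right` on its right?
    dp_forward = _ratio(a, a + b) - _ratio(c, c + d)
    # Does `right` predict `left` on its left?
    dp_backward = _ratio(a, a + c) - _ratio(b, b + d)
    return {"observed": a, "expected": expected, "pmi": pmi,
            "dp_forward": dp_forward, "dp_backward": dp_backward}

scores = bigram_association(TOY_TOKENS, "ah", "bon")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

When the two Delta P values differ, the attraction is asymmetric: one element is a better cue for the other than the reverse. On real data the scores would need to be compared across several measures and corpora, as noted above.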

3. Semantic contribution of elements in a combination

There are a priori two main questions.

Firstly, should we assume that the contribution of an element to the meaning of the combination in which it occurs is “additive”? In that case the different elements of the combination contribute separately to its meaning. This seems to be true for mais enfin, for instance. Conversely, must we, at least in some cases, consider that the combination has a specific meaning, either because it inherits only some of the features of its components or because it has a global, non-decomposable meaning, which seems to apply to pairs like ah bon?

Secondly, for markers in isolation, like ah, tu sais, du coup, tu plaisantes, etc., prosody often allows one to identify values such as surprise, irony, dissatisfaction, etc. With co-occurrences such as allez + bon, non + mais + oh, tiens + donc, are the observed prosodic contours the result of juxtaposing the contours of each constituent, or is the contour of one constituent dominant and extended to the whole co-occurrence? In that case, how could we describe the interaction, if any, between prosody and semantics? Does one of these two dimensions drive the combination?

4. Taking variation into account

Several variables can influence the production of discourse marker co-occurrences: the discourse genre (e.g. natural conversation, topic-controlled exchange, conference, debate, school presentation by pupils, fiction texts, etc.), the utterance situation (in particular the hierarchical relations between speakers), individual parameters (age, social status, sex, academic and professional profile, etc.), the period in which the corpus was compiled, etc. It is also important to study the short-term and long-term evolution of the co-occurrences to pinpoint their structure and possible idiomatization. This is for instance relevant when studying the emergence of du coup as a consequence marker and its combination with donc and alors, or the evolution of the verb + donc marker series (tiens donc, dis donc, va donc, coudonc in Quebec French). Such phenomena are discussed under general cover terms like grammaticalization, pragmaticalization and lexicalization (see Dostie 2004, Waltereit 2007, Heine et al. 2021).


[1] LLM = Large Language Model.

Submissions are closed
Conference Languages: English and French

Workshop organising committee: Mathilde Dargnat (ATILF, Université de Lorraine), Agnès Tutin (LIDILEM, Université Grenoble-Alpes)

Some references:
Bhatia A., Evang K., Garcia M., Giouli V., Han L., Taslimipoor S. 2023. Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023). Association for Computational Linguistics, Dubrovnik, Croatia.
Brezina V. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge UP.
Crible L. 2018. Discourse Markers and (Dis)fluency. Forms and functions across languages and registers. Amsterdam: John Benjamins.
Crible L. and Degand L. 2019. « Domains and Functions: A two-dimensional account of discourse markers ». Discours 24, 35 p. (online)
Cuenca M.-J. and Crible L. 2019. « Co-occurrence of discourse markers in English: From juxtaposition to composition ». Journal of Pragmatics 140, 171-184.
Dargnat M. 2022. « Mais enfin: construction et association ». Langages 225, 49-63.
Desagulier G. 2017. Corpus Linguistics and Statistics with R. New York: Springer.
Dostie G. 2004. Pragmaticalisation et marqueurs discursifs. Analyse sémantique et traitement lexicographique. Liège: De Boeck/Duculot.
Dostie G. 2013. « Les associations de marqueurs discursifs. De la cooccurrence libre à la collocation ». Linguistik 62(5), 15-45. (online)
Haselow A. 2019. « Discourse Marker Sequences: Insights into the Serial Order of Communicative Tasks in Real-Time Turn Production ». Journal of Pragmatics 146, 1-18.
Heine B., Kaltenböck G., Kuteva T. & Long H. 2021. The Rise of Discourse Markers. Cambridge: Cambridge UP.
Mel’čuk I. 2023. General Phraseology, Theory and Practice. Linguisticae Investigationes Supplementa 36, Amsterdam: John Benjamins.
Pastor G. C. and Mitkov R. (eds). 2022. Proceedings of the 4th International Conference on Computation and Corpus-Based Phraseology. Cham: Springer.
Piirainen E., Filatkina N., Stumpf S. and Pfeiffer C. (eds). 2020. Formulaic Language and New Data. Theoretical and Methodological Implications. Berlin: de Gruyter.
Waltereit R. 2007. « A propos de la genèse diachronique des combinaisons de marqueurs. L’exemple de bon ben et enfin bref ». Langue française 154, 94-109.

Domain: The Semantics of Discourse Markers
Co-supervisors: Yannick Toussaint (Lab: LORIA; team: Orpailleur) & Mathieu Constant (Lab: ATILF)
Contact: Yannick.Toussaint@loria.fr and Mathieu.Constant@univ-lorraine.fr

Motivation and context
Discourse Markers (DMs) [Sch87, DCP13] play an important role in communication in general (text, speech, etc.), along two dimensions. On the one hand, Discourse Markers (like therefore, for instance) are connectives which represent the semantic and pragmatic relation between discourse units, which are either clauses, sentences or groups thereof, within a hierarchical structure that represents the whole text. On the other hand, alongside connectives, there are many discourse particles, through which speakers express their own attitudes, like emotions, beliefs or, more generally, the way in which they perceive a situation. Particles like ah, tu parles (you bet), bel et bien (indeed) suggest various emotional and belief states. The combination of DMs with prosodic properties, such as intonational contours [Pot05, BHMP15, DRPA+15], aids interpretation by conveying attitudes or discourse planning. Thus, DMs constitute a central means of communicating discourse structure and speakers’ attitudes.
The internship is part of the CODIM ANR project (COmpositionality and DIscourse Markers, Agence Nationale de la Recherche), which involves three partners in Computational Linguistics. The intern will interact with the other partners and attend the meetings. The internship will take place at LORIA for 5 months. It may lead to a 3-year PhD contract on the same topic.

Goals and Objectives
The internship aims at studying discourse markers with respect to one of the following aspects:
1) Compositionality: this will be studied through two approaches. The first is the statistical study of multiword association. Pattern Mining and Association Rules could provide good statistical measures for studying the degree of freezing versus syntactic and semantic variation. The second approach consists in training deep neural networks to tag discourse markers. We will fine-tune and compare different Large Language Models (LLMs) and study their ability to identify some kind of freezing using word embeddings. This leads to the next point.
2) Semantics: The real question concerns the semantics of the expression. The internship will first deal with written texts, and we may study the semantics of discourse markers by studying and comparing their embeddings (is there a difference between d’autant que ... and de plus ...?), by using probing tasks, and even by using prompts to query the language model.
3) Prosody: How does the meaning change when prosody is introduced? The texts will be annotated with prosodic information. The thesis should study the differences in the semantic representation of DMs with various intonations, e.g. d’autant que ... ↗ and d’autant que ... ↘.
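As an illustration of the association-rule idea mentioned in point 1, the following minimal sketch treats each utterance as an unordered set of discourse markers and computes support, confidence and lift for every ordered pair. The utterance list is invented for illustration and the function name is hypothetical; this is one possible reading of "Association Rules", not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Toy "transactions": the set of DMs found in each utterance (invented data).
TOY_UTTERANCES = [
    ["donc", "du coup"],
    ["donc", "du coup", "voilà"],
    ["mais", "enfin"],
    ["donc", "voilà"],
    ["mais", "quand même"],
    ["du coup"],
]

def pair_rules(transactions, min_support=0.0):
    """Return association rules X -> Y for all co-occurring pairs of items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))           # ignore order and repetitions
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (x, y), n_xy in pair_counts.items():
        support = n_xy / n                         # share of utterances with both
        if support < min_support:
            continue
        for antecedent, consequent in ((x, y), (y, x)):
            # P(consequent | antecedent)
            confidence = n_xy / item_counts[antecedent]
            # Confidence relative to the consequent's baseline frequency;
            # a lift above 1 indicates attraction, below 1 repulsion.
            lift = confidence / (item_counts[consequent] / n)
            rules.append({"antecedent": antecedent, "consequent": consequent,
                          "support": support, "confidence": confidence,
                          "lift": lift})
    return rules

rules = pair_rules(TOY_UTTERANCES)
for rule in sorted(rules, key=lambda r: r["lift"], reverse=True):
    print(f'{rule["antecedent"]} -> {rule["consequent"]}: '
          f'support={rule["support"]:.2f}, confidence={rule["confidence"]:.2f}, '
          f'lift={rule["lift"]:.2f}')
```

Because the utterances are treated as sets, this sketch says nothing about linear order; studying the degree of freezing would additionally require sequence-aware patterns.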

The internship will require three main activities:
– Analysing corpora and studying discourse;
– Proposing a way to model discourse markers;
– Implementing and testing the different aspects.

Some references about Discourse Markers in Linguistics
[BHMP15] C. Beyssade, B. Hemforth, J.-M. Marandin, and C. Portes. Prosodic realizations of information focus in French. In L. Frazier and E. Gibson, editors, Explicit and Implicit Prosody in Sentence Processing, pages 39–61. Springer, 2015.
[DCP13] Liesbeth Degand, Bert Cornillie, and Paola Pietrandrea, editors. Discourse Markers and Modal Particles: Categorization and description. John Benjamins, 2013.
[DRPA+15] E. Delais-Roussarie, B. Post, M. Avanzi, C. Buthke, A. Di Cristo, I. Feldhausen, S.-A. Jun, Ph. Martin, T. Meisenburg, A. Rialland, R. Sichel-Bazin, and H.-Y. Yoo. Intonational phonology of French: Developing a ToBI system for French. In S. Frota and P. Prieto, editors, Intonation in Romance, pages 63–100. Oxford UP, 2015.
[Pot05] C. Potts. The Logic of Conventional Implicatures. Oxford UP, 2005.
[Sch87] D. Schiffrin. Discourse Markers. Cambridge UP, 1987.


Discourse markers have been examined in various models of grammar, such as Functional Discourse Grammar or Construction Grammar, which consider their status as markers in discourse. From a macro-syntactic point of view, they are seen as peripheral, independent and movable entities that perform a bracketing function. In interaction-based models of grammar, such as Interactional Linguistics, the focus of analysis has been on their status as markers on discourse. As such, these approaches examine interpersonal and interactional functions, for instance turn-taking and the positioning of interlocutors vis-à-vis each other, vis-à-vis their discursive contribution, and in terms of their attitude towards their own contribution.

The goal of this conference is to bring together research from different theoretical frameworks in order to find possible bridging points between grammar-based approaches to “markers in discourse”, function-based approaches to “markers in/on discourse” and discourse-based studies. We welcome studies which address at least one of the following issues:

1) Revisiting “in discourse” on a wider scale:
Studies would go beyond single discourse markers and examine for instance
· Clusters of discourse markers in their linearity
· Patterned co-occurrences of discourse markers and their syntactic slots (or macro-positions)
· Multiword units as discourse markers
· Discourse markers across discourse units (prospective and retrospective pointing function)

2) Revisiting “in / on discourse” on a wider scale:
Studies would need to make a link between position and function of discourse markers in and on language, considering for instance
· Discourse markers, sequentiality and cohesive function
· Discourse markers and discourse coherence
· Stretch of discourse and span of meaning negotiation
· Discourse markers, subjectification and intersubjectification
· Discourse markers and dynamics of interpersonal attitude

3) Revisiting markers “on discourse” on a wider scale:
Studies would need to be interaction-based and tackle the distribution of discourse markers in and across discourse genres and/or the ways in which discourse markers intertwine with context
· Discourse markers and linguistic accommodation
· Discourse markers, contextualisation cues and indexicality
· Discourse markers, linguistic variation and language varieties

Submissions are closed
Conference Languages: English and French

Invited Speakers
Maj-Britt Mosegaard Hansen, Professor of Linguistics and Pragmatics, University of Manchester, Great Britain
Graham Ranger, Professor, Université d’Avignon et des Pays du Vaucluse, France

Dates and venue: Metz, Île du Saulcy, France; Friday 21 June / Saturday 22 June 2024
Friday: building B, rooms B110 and B110; Saturday: building A, rooms A208 and A209

Organising committee: Florine Berthe (ALTER, Université de Pau et des Pays de l’Adour), Mathilde Dargnat (ATILF, Université de Lorraine), Anita Fetzer (University of Augsburg), Isabelle Gaudy-Campbell (IDEA, Université de Lorraine)

Scientific committee: Karin Aijmer, Florine Berthe, Bernard Combettes, Mathilde Dargnat, Stefan Diemer, Anita Fetzer, Isabelle Gaudy-Campbell, Günther Kaltenböck, Matthias Klumm, Laure Lansari, Ursula Lenker, Diana Lewis, François Nemo

Some references:
Aijmer, Karin. English Discourse Particles: Evidence from a Corpus. John Benjamins, 2002.
Auer, Peter and Aldo Di Luzio (eds.). The Contextualization of Language. John Benjamins, 1992.
Cuenca, Maria Josep and Ludivine Crible. “Co-occurrence of discourse markers in English: From juxtaposition to composition”. Journal of Pragmatics 140, 2019, p. 171-184.
Crible, Ludivine. Discourse Markers and (Dis)fluency. Forms and functions across languages and registers. John Benjamins, 2018.
Degand, Liesbeth et al. Discourse Markers and Modal Particles: Categorization and Description. John Benjamins, 2013.
Dostie, Gaétane. Pragmaticalisation et marqueurs discursifs. Analyse sémantique et traitement lexicographique. De Boeck/Duculot, 2004.
Fischer, Kerstin (ed.). Approaches to Discourse Particles. Brill, 2006.
Hansen, Maj-Britt M. Particles at the Semantics/Pragmatics Interface: Synchronic and Diachronic Issues, a Study with Special Reference to the French Phrasal Adverbs. Elsevier, 2008.
Heine, Bernd et al. The Rise of Discourse Markers. Cambridge University Press, 2021.
Jucker, Andreas H. and Yael Ziv. Discourse Markers: Descriptions and Theory. John Benjamins, 1998.
Kaltenböck, Gunther et al. Outside the Clause: Form and function of extra-clausal constituents. John Benjamins, 2016.
Lansari, Laure. A Contrastive View of Discourse Markers: Discourse Markers of Saying in English and French. Palgrave Macmillan, 2020.
Pichler, Heike. The Structure of Discourse-Pragmatic Variation. John Benjamins, 2013.
Schiffrin, Deborah. Discourse Markers. Cambridge University Press, 1987.

PhD in Linguistics (Doctorat en Sciences du langage): 2023-2026
French discourse markers and compositionality

Duration: 36 months
Beginning: Fall 2023 (ideally October 2023)
Place: ATILF, From syntax to discourse axis (Nancy) and LLF (Paris)
Salary (net): about 1750 euros per month
Co-advisors: Mathilde Dargnat, Université de Lorraine and ATILF-CNRS, http://mathilde.dargnat.free.fr
and Jonathan Ginzburg, Université Paris Cité and LLF-CNRS, http://www.llf.cnrs.fr/fr/Gens/Ginzburg

To apply:
The electronic submission should include:
1) A CV of at most 5 pages, including the M1-M2 courses and the L3 (= bachelor)-M1-M2 grades.
2) A motivation letter.
3) The Master thesis and/or the submitted/accepted publications, if any, which can help the committee to appreciate the candidate’s abilities.
4) The name of one or two referees.

To be sent to: mathilde.dargnat (at) univ-lorraine.fr and yonatan.ginzburg (at) u-paris.fr
Deadline: June 27 2023 / extended to July 7 2023
Online interviews (in French): from the first week of July 2023

Requirements:
Master 2 in Linguistics, Cognitive Sciences or NLP.
Excellent command of French.
A good background in formal and/or computational semantics would be welcome.

CODIM project short description: https://www.codim-project.org/object-objectives/

Dissertation content description
Topic: French discourse markers and compositionality
The dissertation will focus on a set of DM combinations and discuss their semantic and pragmatic properties and how these relate to the properties of the combined DMs. This includes:
1. When the DMs in a markedly frequent combination are intuitively close, accounting for the fact that their combination is not felt to be redundant (and, as a result, awkward). Examples: donc du coup, alors donc, mais pourtant, mais quand même, et alors, donc voilà, etc.
2. When the DMs are intuitively different, are they just complementary (which suggests that they introduce unconnected discourse relations/speaker manifestations), or do we need another type of analysis (see point 3)?
3. Is the combination compositional? Addressing this question requires discussing and possibly elaborating on existing compositional techniques. For instance, the current formal approaches in semantics are functional in an elementary mathematical sense: functions apply to arguments (which can themselves be functions) to deliver ‘interpretations’. Can we reduce the observed combinations to this type of mechanism? Also, in cases where such a reduction is possible, to what extent can it predict or motivate the strength of association between the DMs which cluster into the combination? Why is this particular association more frequent than others? Does it correspond, for example, to specific discourse moves which play a prominent role in interactions?
4. Are there cases of repulsion (DMs which do not occur together)? If so, why?

Indicative References
Couper-Kuhlen, E. & Kortmann, B. (Eds.). 2000. Cause-Condition-Concession-Contrast. Cognitive and Discourse Perspectives. Berlin: De Gruyter Mouton.
Crible, L. & Degand, L. 2019. Domains and Functions: A Two-Dimensional Account of Discourse Markers. Discours 24 (online), 35 p.
Crible, L. & Degand, L. 2021. Co-occurrence and ordering of discourse markers in sequences: A multifactorial study in spoken French. Journal of Pragmatics 177, 18-28.
Dargnat, M. 2020. Subjectivité et projection : le cas des particules discursives. In Actes du 7e Congrès Mondial de Linguistique Française, Montpellier, SHS Web of Conferences 78, EDP Sciences.
Dargnat, M. 2022. Mais enfin : construction et association. Langages 225, 49-63.
Degand, L., Cornillie, B. & P. Pietrandrea (Eds.). 2013. Discourse Markers and Modal Particles. Categorization and Description. Amsterdam: Benjamins.
Dostie, G. 2004. Pragmaticalisation et marqueurs discursifs. Analyse sémantique et traitement lexicographique. Liège : De Boeck/Duculot.
Ginzburg, J. 2012. The Interactive Stance. Meaning for Conversation. Oxford: Oxford University Press.
Tian, Y. & Ginzburg, J. 2016. No I am: What are you saying “no” to? In Proceedings of Sinn und Bedeutung 21, 1241-1252.
Haselow, A. 2019. Discourse marker sequences: Insights into the serial order of communicative tasks in real-time turn production. Journal of Pragmatics 146, 1-18.
Haselow, A. & S. Hancil (Eds.). 2021. Studies at the Grammar-Discourse Interface. Discourse markers and discourse-related grammatical phenomena. Amsterdam: Benjamins.
Heine, B., Kaltenböck, G., Kuteva, T. & H. Long. 2021. The Rise of Discourse Markers. Cambridge: Cambridge University Press.
Mosegaard Hansen, M.-B. 1998. The Function of Discourse Particles. Amsterdam: Benjamins.
Waltereit, R. 2007. A propos de la genèse diachronique des combinaisons de marqueurs. L’exemple de bon ben et enfin bref. Langue française 154, 94-109.
