
SemanticDistance Dialogues

Jamie Reilly, Hannah R. Mechtenberg, Emily B. Myers, Jonathan E. Peelle

2025-08-27

Dialogues

A dialogue can be a conversation transcript or any other language sample where talker/interlocutor information matters (e.g., computing semantic distance across turns in a conversation). Your dataframe should contain at least a text column and a speaker/talker column.

A sample dialogue transcript (Dialogue_Typical) is included in the package:

knitr::kable(head(Dialogue_Typical, 6), format = "pipe")

text	speaker
Hi Peter. It’s nice to see you	Mary
Hi Mary. Hot out today	Peter
It sure is.	Mary
Did you read that book?	Peter
No I haven’t had time.	Mary
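
A transcript of your own can be arranged in the same two-column layout before cleaning. The following is a minimal base R sketch; the object name my_dialogue is hypothetical, and the column names simply mirror the sample data shown above.

```r
# Hypothetical transcript: one row per utterance, with a text column
# and a speaker column (names chosen to match Dialogue_Typical)
my_dialogue <- data.frame(
  text = c("Hi Peter. It's nice to see you",
           "Hi Mary. Hot out today",
           "It sure is."),
  speaker = c("Mary", "Peter", "Mary"),
  stringsAsFactors = FALSE
)

# Inspect the structure: two character columns, one row per utterance
str(my_dialogue)
```

Any column names should work as long as you pass them (quoted) to the cleaning function.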

Step 1: Clean Dialogue Transcript (clean_dialogue)

Decide on your cleaning parameters (e.g., stopwords? lemmatization?). Specify these in the argument(s) to your function calls.

Arguments to clean_dialogue() are:
dat your raw dataframe with at least one column of text AND a talker column
wordcol column name (quoted) containing the text you want cleaned
who_talking column name (quoted) containing the talker ID (will convert to factor)
omit_stops omits stopwords, T/F default is TRUE
lemmatize transforms raw word to lemmatized form, T/F default is TRUE

Dialogue_Cleaned <- clean_dialogue(dat=Dialogue_Typical, wordcol="text", who_talking="speaker", omit_stops=TRUE, lemmatize=TRUE)
knitr::kable(head(Dialogue_Cleaned, 12), format = "pipe")

id_row_orig	text_initialsplit	speaker	word_clean	id_row_postsplit	turn_count
1	hi	Mary	NA	1	1
1	peter	Mary	peter	2	1
1	its	Mary	NA	3	1
1	its	Mary	NA	4	1
1	nice	Mary	nice	5	1
1	to	Mary	NA	6	1
1	see	Mary	see	7	1
1	you	Mary	NA	8	1
2	hi	Peter	NA	9	2
2	mary	Peter	mary	10	2
2	hot	Peter	hot	11	2
2	out	Peter	out	12	2
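
To make the shape of this output easier to follow, here is a rough base R sketch of the same idea: one word per row, stopwords set to NA rather than dropped, and a turn counter that advances when the speaker changes. It is a toy illustration only and not the package's implementation. The stopword list toy_stops, the turn rule, and all object names are assumptions, and the sketch skips lemmatization, so its rows will not match the package output exactly.

```r
# Toy illustration only; this is NOT how clean_dialogue() is implemented.
toy <- data.frame(
  text = c("Hi Peter. It's nice to see you", "Hi Mary. Hot out today"),
  speaker = c("Mary", "Peter"),
  stringsAsFactors = FALSE
)

# Assumed mini stopword list (the package uses its own list)
toy_stops <- c("hi", "its", "to", "you")

# Lowercase, strip everything except letters and spaces, split on whitespace
words <- strsplit(gsub("[^a-z ]", "", tolower(toy$text)), " +")

# One row per word, keeping track of the original row and the speaker
long <- data.frame(
  id_row_orig = rep(seq_along(words), lengths(words)),
  speaker = rep(toy$speaker, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Stopwords become NA instead of being removed, so word order is preserved
long$word_clean <- ifelse(long$word %in% toy_stops, NA_character_, long$word)

# Assumed turn rule: a new turn starts whenever the speaker changes
long$turn_count <- cumsum(c(TRUE, long$speaker[-1] != long$speaker[-nrow(long)]))

long
```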

Step 2: Compute Semantic Distances

Dialogue Distance Turn-to-Turn (dist_dialogue)

dist_dialogue() averages the semantic vectors of all content words in a turn, then computes the cosine distance to the averaged semantic vectors of the content words in the subsequent turn. Note: this function only works on dialogue samples marked by a talker variable (e.g., conversation transcripts). You just need to feed it a transcript formatted with clean_dialogue(). dist_dialogue() returns a summary dataframe with distance values aggregated by talker and turn (turn_count).

Arguments to dist_dialogue() are:
dat = dataframe w/ a dialogue sample cleaned and prepped using clean_dialogue()
who_talking = column name (quoted) containing the talker ID

DialogueDists <- dist_dialogue(dat=Dialogue_Cleaned, who_talking="speaker")
knitr::kable(head(DialogueDists, 12), format = "pipe", digits=2)

turn_count	speaker	n_words	glo_cosdist	sd15_cosdist
1	Mary	3	0.83	0.58
2	Peter	4	0.85	0.58
3	Mary	1	0.86	0.58
4	Peter	3	0.86	0.45
5	Mary	5	NA	NA
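
The turn-to-turn computation described in this section can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the three-dimensional vectors in toy_vecs are made up, and the names turns, cos_dist, and turn_dist are hypothetical. It also shows why the final turn has no distance value: there is no subsequent turn to compare it with.

```r
# Made-up 3-dimensional word vectors (real embeddings are much larger)
toy_vecs <- rbind(
  peter = c(1, 0, 0),
  nice  = c(0, 1, 0),
  see   = c(0, 1, 1),
  mary  = c(1, 0, 1),
  hot   = c(0, 0, 1)
)

# Content words grouped by turn (turn 1, then turn 2)
turns <- list(c("peter", "nice", "see"), c("mary", "hot"))

# Average the word vectors within each turn
turn_means <- lapply(turns, function(w) colMeans(toy_vecs[w, , drop = FALSE]))

# Cosine distance = 1 - cosine similarity
cos_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Distance from each turn to the next; the last turn has no successor, so NA
n <- length(turn_means)
turn_dist <- c(
  vapply(seq_len(n - 1),
         function(i) cos_dist(turn_means[[i]], turn_means[[i + 1]]),
         numeric(1)),
  NA_real_
)

round(turn_dist, 2)
```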
