---
title: "Psychological Text Analysis with `nalanda`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Psychological Text Analysis with nalanda}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Purpose

This vignette shows how to use `nalanda` for the kind of workflow described by
Rathje et al. (2024): apply a simple prompt to many short texts, ask for a
numeric response, and compare model outputs to human annotations.

The goal here is not to reproduce every benchmark in the paper, but to give a
simple getting-started pattern you can adapt for:

1. categorical sentiment,
2. discrete emotions,
3. offensiveness,
4. Likert-style sentiment or emotion ratings, and
5. multilingual datasets with a `language` column.

# 1. Set package options

As in the other live `nalanda` workflows, it is easiest to set model routing
once at the top of your script.

```{r, eval = TRUE}
library(nalanda)

options(
  nalanda.integration = "gpt-5-mini",
  nalanda.base_url = "https://ai-gateway.apps.cloud.rt.nyu.edu/v1/"
)

# In some Portkey/gateway setups the route slug is not the provider name.
# Verify the route with ellmer::models_portkey() or use a fully-qualified
# model string such as "@gpt-5-mini/gpt-5-mini" if that is the route that works
# in your gateway.
```

```{r, eval = FALSE}
ellmer::models_portkey(
  base_url = "https://ai-gateway.apps.cloud.rt.nyu.edu/v1/"
)
```

# 2. Create a small text dataset

The paper works row-wise over tweets or headlines. `run_text_analysis()` uses
the same pattern: one row per text.

```{r}
texts <- tibble::tibble(
  id = 1:4,
  language = c("English", "English", "Hindi", "Simplified Chinese"),
  text = c(
    "I love this new community project.",
    "This policy announcement is fine, I guess.",
    "\u092f\u0939 \u0916\u092c\u0930 \u092c\u0939\u0941\u0924 \u0905\u091a\u094d\u091b\u0940 \u0939\u0948\u0964",
    "\u6211\u4e0d\u559c\u6b22\u4ed6\u4eec\u5904\u7406\u8fd9\u4e2a\u95ee\u9898\u7684\u65b9\u5f0f\u3002"
  ),
  human_sentiment = c(1, 2, 1, 3)
)

texts
```

Here the human labels follow the same coding style used in the paper:

1. `1 = positive`
2. `2 = neutral`
3. `3 = negative`

# 3. Build the prompt

The screenshot tutorial shows a very direct prompt. You can build the same kind
of prompt with `make_annotation_prompt()`.

```{r}
sentiment_prompt <- make_annotation_prompt(
  question = "Is the sentiment of this {language} text positive, neutral, or negative?",
  labels = c("positive", "neutral", "negative")
)

cat(sentiment_prompt)
```

This returns a prompt template, not a final prompt. The `{language}` and
`{text}` placeholders will be filled separately for each row.
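
To make the per-row filling concrete, here is a minimal base R sketch of the
idea. `fill_template()`, `demo_template`, and `demo_rows` are hypothetical
stand-ins written for this illustration. They are not part of `nalanda`, and
the package's own interpolation may work differently.

```{r}
# Hypothetical helper (not a nalanda function): replace each {column}
# placeholder in a template with the value from a single data frame row.
fill_template <- function(template, row) {
  out <- template
  for (col in names(row)) {
    placeholder <- paste0("{", col, "}")
    out <- gsub(placeholder, as.character(row[[col]]), out, fixed = TRUE)
  }
  out
}

# Assumed template text, used only for this demo.
demo_template <- paste0(
  "Is the sentiment of this {language} text positive, neutral, or negative?\n",
  "Text: {text}"
)

demo_rows <- data.frame(
  language = c("English", "Hindi"),
  text = c("I love this new community project.", "\u092f\u0939 \u0916\u092c\u0930"),
  stringsAsFactors = FALSE
)

# One filled prompt per row.
demo_prompts <- vapply(
  seq_len(nrow(demo_rows)),
  function(i) fill_template(demo_template, demo_rows[i, , drop = FALSE]),
  character(1)
)

cat(demo_prompts[1])
```

The same template yields a different final prompt for every row, which is why
a single `sentiment_prompt` can serve a multilingual dataset.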

# 4. Run the analysis

Now apply the prompt to every row with `run_text_analysis()`. The result schema
is defined with `ellmer` just like in the other `nalanda` workflows.

```{r, eval = FALSE}
res <- run_text_analysis(
  data = texts,
  id_col = "id",
  text_col = "text",
  prompt = sentiment_prompt,
  response_type = ellmer::type_object(
    gpt = ellmer::type_number()
  ),
  n_simulations = 1,
  temperature = 0,
  model = "gpt-5-mini"
)
```

The important differences from the older chapter-based functions are:

1. the input is a data frame, not book/chapter text,
2. each row is analyzed directly,
3. any column can be interpolated into the prompt with `{column_name}`, and
4. the output stays aligned to the original row metadata.

# 5. Inspect the output

Each row of the result corresponds to one text and one simulation run. The
table below is a hard-coded illustration of that shape, not live model output.

```{r, echo = FALSE}
example_res <- tibble::tibble(
  id = 1:4,
  language = c("English", "English", "Hindi", "Simplified Chinese"),
  sim = 1L,
  human_sentiment = c(1, 2, 1, 3),
  gpt = c(1, 2, 1, 3),
  text = c(
    "I love this new community project.",
    "This policy announcement is fine, I guess.",
    "\u092f\u0939 \u0916\u092c\u0930 \u092c\u0939\u0941\u0924 \u0905\u091a\u094d\u091b\u0940 \u0939\u0948\u0964",
    "\u6211\u4e0d\u559c\u6b22\u4ed6\u4eec\u5904\u7406\u8fd9\u4e2a\u95ee\u9898\u7684\u65b9\u5f0f\u3002"
  )
)

knitr::kable(example_res)
```

This is the same basic structure as the screenshot workflow, but the parsing is
already handled for you because the response is extracted as a structured
numeric field.
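
Because the field is numeric, it can help to check the codes and map them back
to labels before scoring. The sketch below uses the 1/2/3 coding described
above. The vector `model_codes` is made-up demo data, including one
deliberately invalid value, and this check is a suggestion rather than
something `nalanda` is known to perform for you.

```{r}
# Coding from this vignette: 1 = positive, 2 = neutral, 3 = negative.
sentiment_levels <- c("positive", "neutral", "negative")

# Made-up model output for illustration; the final 4 is deliberately invalid.
model_codes <- c(1, 2, 1, 3, 4)

# Flag codes that fall outside the allowed set.
valid_code <- model_codes %in% seq_along(sentiment_levels)

# Decode to labels; out-of-range codes become NA.
model_labels <- factor(
  model_codes,
  levels = seq_along(sentiment_levels),
  labels = sentiment_levels
)

data.frame(code = model_codes, valid = valid_code, label = model_labels)
```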

# 6. Evaluate GPT against human labels

Rathje et al. compare GPT output to human annotations with metrics such as
accuracy, macro F1, and Spearman correlations. `evaluate_text_analysis()`
provides a simple package-native version of that step.

```{r, eval = FALSE}
scores <- evaluate_text_analysis(
  res,
  truth_col = "human_sentiment",
  estimate_col = "gpt",
  by = "language",
  metric = c("accuracy", "macro_precision", "macro_recall", "macro_f1")
)

scores
```

```{r, echo = FALSE}
example_scores <- tibble::tibble(
  language = c("English", "Hindi", "Simplified Chinese"),
  n = c(2L, 1L, 1L),
  accuracy = c(1.00, 1.00, 1.00),
  macro_precision = c(1.00, 1.00, 1.00),
  macro_recall = c(1.00, 1.00, 1.00),
  macro_f1 = c(1.00, 1.00, 1.00)
)

knitr::kable(example_scores, digits = 2)
```
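
If you want to see what these metrics measure, the following base R sketch
computes accuracy and macro-averaged precision, recall, and F1 by hand.
`macro_scores()` is a hypothetical helper written for this illustration, not
the implementation behind `evaluate_text_analysis()`. It assumes that a class
with no predictions or no true cases contributes 0 to the average, and the
package may handle that edge case differently.

```{r}
# Hypothetical helper (not part of nalanda): accuracy plus macro-averaged
# precision, recall, and F1 for two vectors of class codes.
macro_scores <- function(truth, estimate) {
  classes <- sort(unique(c(truth, estimate)))
  per_class <- vapply(classes, function(k) {
    tp <- sum(estimate == k & truth == k)
    fp <- sum(estimate == k & truth != k)
    fn <- sum(estimate != k & truth == k)
    # Assumption: undefined ratios (zero denominator) are scored as 0.
    precision <- if (tp + fp == 0) 0 else tp / (tp + fp)
    recall <- if (tp + fn == 0) 0 else tp / (tp + fn)
    f1 <- if (precision + recall == 0) 0 else
      2 * precision * recall / (precision + recall)
    c(precision = precision, recall = recall, f1 = f1)
  }, FUN.VALUE = c(precision = 0, recall = 0, f1 = 0))

  c(
    accuracy = mean(truth == estimate),
    macro_precision = mean(per_class["precision", ]),
    macro_recall = mean(per_class["recall", ]),
    macro_f1 = mean(per_class["f1", ])
  )
}

# Made-up labels for illustration: two of the six model codes are wrong.
demo_truth <- c(1, 2, 1, 3, 3, 2)
demo_estimate <- c(1, 2, 2, 3, 1, 2)

demo_metrics <- macro_scores(demo_truth, demo_estimate)
round(demo_metrics, 3)
```

Macro averaging weights every class equally, so a rare class that the model
handles badly lowers the score more than it would lower plain accuracy.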

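For Likert-style tasks, such as the headline ratings in section 7, switch the
metric set to something like the following. The call assumes a Likert result
(here `headline_res`) that carries a `human_rating` column with the human
ratings.

```{r, eval = FALSE}
evaluate_text_analysis(
  headline_res,
  truth_col = "human_rating",
  estimate_col = "gpt",
  metric = c("spearman", "weighted_kappa")
)
```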
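
As a point of comparison, a Spearman correlation can be computed directly with
base R's `cor()`. The two rating vectors below are made-up demo data. The
sketch covers only the Spearman metric and makes no claim about how the
package computes it or which weighting it uses for kappa.

```{r}
# Made-up 1 to 7 ratings for five headlines.
human_rating_demo <- c(1, 3, 4, 6, 7)
model_rating_demo <- c(2, 3, 5, 5, 7)

# Spearman's rho is the Pearson correlation of the ranks (ties get mean ranks).
rho_demo <- cor(human_rating_demo, model_rating_demo, method = "spearman")
rho_demo
```

Rank-based agreement is a natural fit for Likert ratings because it rewards
getting the ordering right even when the model's ratings are shifted by a
constant.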

# 7. Likert-style sentiment or emotion

The paper also evaluates headline sentiment and emotions on 1 to 7 scales.
That prompt style is also supported.

```{r}
likert_prompt <- make_annotation_prompt(
  question = "How negative or positive is this headline on a 1 to 7 scale?",
  scale = c(1, 7),
  anchors = c("very negative", "very positive"),
  text_label = "Here is the headline:"
)

cat(likert_prompt)
```

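The live call looks the same, except the response field now represents a Likert
rating instead of a class code. Here `headlines` stands for your own data frame
with `headline_id` and `headline` columns; it is not created in this vignette.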

```{r, eval = FALSE}
headline_res <- run_text_analysis(
  data = headlines,
  id_col = "headline_id",
  text_col = "headline",
  prompt = likert_prompt,
  response_type = ellmer::type_object(
    gpt = ellmer::type_number()
  ),
  temperature = 0,
  model = "gpt-5-mini"
)
```

# 8. Repeated runs for reliability

The paper also checks whether repeated runs produce similar outputs. To do that,
increase `n_simulations`.

```{r, eval = FALSE}
res_repeated <- run_text_analysis(
  data = texts,
  id_col = "id",
  text_col = "text",
  prompt = sentiment_prompt,
  response_type = ellmer::type_object(
    gpt = ellmer::type_number()
  ),
  n_simulations = 2,
  temperature = 0,
  model = "gpt-5-mini"
)
```

Then compare run 1 and run 2 with `evaluate_text_analysis()` after reshaping the
results into one column per run.
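
A minimal base R sketch of that reshaping step follows. It assumes the live
result has `id`, `sim`, and `gpt` columns, matching the example table in
section 5. The data frame `long_runs` is made-up demo data in which the two
runs disagree on one text.

```{r}
# Made-up long-format result: 4 texts x 2 simulation runs.
long_runs <- data.frame(
  id = rep(1:4, times = 2),
  sim = rep(1:2, each = 4),
  gpt = c(1, 2, 1, 3,  # run 1
          1, 2, 2, 3)  # run 2 (differs from run 1 for id 3)
)

# Split by run, then join on id so each run gets its own column.
run1 <- long_runs[long_runs$sim == 1, c("id", "gpt")]
run2 <- long_runs[long_runs$sim == 2, c("id", "gpt")]
wide_runs <- merge(run1, run2, by = "id", suffixes = c("_run1", "_run2"))

wide_runs

# Share of texts that received the same code in both runs.
run_agreement <- mean(wide_runs$gpt_run1 == wide_runs$gpt_run2)
run_agreement
```

With the data in this shape, one run column can presumably be passed as
`truth_col` and the other as `estimate_col` to `evaluate_text_analysis()`,
although that usage is an assumption based on the arguments shown earlier.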

# 9. When to use this workflow

Use this vignette's workflow when:

1. your unit is a row of text, not a chapter,
2. you want direct zero-shot annotation with a simple prompt,
3. you need multilingual prompt interpolation from dataset columns, or
4. you want agreement metrics against human labels.

Use the chapter-oriented workflows when your unit is still a book chapter and
you care about pre/post changes across simulated identities.

# Reference

Rathje, S., Mirea, D. M., Sucholutsky, I., Marjieh, R., Robertson, C. E., &
Van Bavel, J. J. (2024). *GPT is an effective tool for multilingual
psychological text analysis*. *Proceedings of the National Academy of
Sciences, 121*(34), e2308950121.
<https://doi.org/10.1073/pnas.2308950121>