Teaching Evaluation Analysis with IPAG


IPAG Package

2026-01-05

Introduction

This vignette illustrates the use of the IPAG package for statistical analysis using the Beauty dataset. This dataset, from Hamermesh and Parker (2005), examines the relationship between university instructors’ physical attractiveness and their student evaluations.

The IPAG package provides simple and pedagogical tools for:

Computing confidence intervals for means and proportions
Comparing means between groups
Performing linear regressions with concise output
Working with odds ratios

All confidence intervals are computed at the 99% level by default; other levels can be requested through the level argument (see Custom Confidence Intervals below).

Loading the package and data

library(IPAG)
data(Beauty)

The Beauty dataset contains 463 observations corresponding to university courses. Here’s an overview of the main variables:

str(Beauty, give.attr = FALSE)
#> Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 463 obs. of  22 variables:
#>  $ n            : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ score        : num  4.7 4.1 3.9 4.8 4.6 4.3 2.8 4.1 3.4 4.5 ...
#>  $ rank         : chr  "tenure track" "tenure track" "tenure track" "tenure track" ...
#>  $ ethnicity    : chr  "minority" "minority" "minority" "minority" ...
#>  $ gender       : chr  "female" "female" "female" "female" ...
#>  $ language     : chr  "english" "english" "english" "english" ...
#>  $ age          : num  36 36 36 36 59 59 59 51 51 40 ...
#>  $ cls_perc_eval: num  55.8 68.8 60.8 62.6 85 ...
#>  $ cls_did_eval : num  24 86 76 77 17 35 39 55 111 40 ...
#>  $ cls_students : num  43 125 125 123 20 40 44 55 195 46 ...
#>  $ cls_level    : chr  "upper" "upper" "upper" "upper" ...
#>  $ cls_profs    : chr  "single" "single" "single" "single" ...
#>  $ cls_credits  : chr  "multi credit" "multi credit" "multi credit" "multi credit" ...
#>  $ bty_f1lower  : num  5 5 5 5 4 4 4 5 5 2 ...
#>  $ bty_f1upper  : num  7 7 7 7 4 4 4 2 2 5 ...
#>  $ bty_f2upper  : num  6 6 6 6 2 2 2 5 5 4 ...
#>  $ bty_m1lower  : num  2 2 2 2 2 2 2 2 2 3 ...
#>  $ bty_m1upper  : num  4 4 4 4 3 3 3 3 3 3 ...
#>  $ bty_m2upper  : num  6 6 6 6 3 3 3 3 3 2 ...
#>  $ bty_avg      : num  5 5 5 5 3 ...
#>  $ pic_outfit   : chr  "not formal" "not formal" "not formal" "not formal" ...
#>  $ pic_color    : chr  "color" "color" "color" "color" ...

Key variables:

score: Average professor evaluation score (1 = very unsatisfactory, 5 = excellent)
bty_avg: Average beauty rating of the professor (scale from 1 to 10)
age: Age of the professor
gender: Gender of the professor (female/male)
rank: Academic rank (teaching/tenure track/tenured)

Descriptive Statistics

Average evaluation score

Let’s compute the 99% confidence interval for the average evaluation score:

mean_ci(Beauty$score)
#> 99% CI for mean: [4.1094, 4.2401]

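On average, courses receive an evaluation score of approximately 4.17 out of 5. This is a 99% confidence interval. The procedure that produced it captures the true population mean in 99% of repeated samples, so we can be 99% confident that the population mean score lies between 4.11 and 4.24.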
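Assume that mean_ci() computes the classical one-sample t-interval; t.test() is among the base functions IPAG lists in its design principles. Under that assumption, the 99% interval is xbar ± t(0.995, n - 1) · s / sqrt(n). The sketch below implements this formula in base R with a hypothetical helper, t_interval(). It then checks the result against t.test() on the first ten scores printed by str() above:

```r
# Hypothetical helper: classical t-interval for a mean,
# xbar +/- t_{1 - alpha/2, n - 1} * s / sqrt(n)
t_interval <- function(x, level = 0.99) {
  n <- length(x)
  alpha <- 1 - level
  half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# The first ten scores printed by str(Beauty) above
x <- c(4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5)

manual <- t_interval(x, level = 0.99)
reference <- t.test(x, conf.level = 0.99)$conf.int

manual
all.equal(unname(manual), as.numeric(reference))  # TRUE: same interval
```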

Average beauty rating

Similarly for beauty ratings:

mean_ci(Beauty$bty_avg)
#> 99% CI for mean: [4.2342, 4.6014]

The average beauty rating is approximately 4.42 out of 10.

Group Comparisons

Score difference by gender

Is there a significant difference in evaluations between male and female professors?

score_female <- Beauty$score[Beauty$gender == "female"]
score_male <- Beauty$score[Beauty$gender == "male"]

mean_diff_ci(score_male, score_female)
#> 99% CI for mean difference (x - y): [0.0084, 0.2747]

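The confidence interval for the difference in means (male minus female) lets us test whether male and female professors receive different scores. If the interval contains zero, the difference is not statistically significant at the chosen level. Here the interval lies entirely above zero. At the 1% significance level, male professors therefore receive higher evaluations, by roughly 0.01 to 0.27 points.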
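The zero-in-the-interval rule can be written as a small check. This sketch assumes that mean_diff_ci() behaves like t.test() with its default Welch (unequal-variance) interval, which is not confirmed here. The helper excludes_null() is hypothetical, and the two score vectors are made up:

```r
# Hypothetical helper: TRUE when the interval lies entirely above or below `null`
excludes_null <- function(ci, null = 0) {
  ci[1] > null || ci[2] < null
}

# Made-up scores for two groups (illustration only, not the Beauty data)
x <- c(4.5, 4.2, 4.8, 4.1, 4.6, 4.4)
y <- c(3.9, 4.0, 4.3, 3.8, 4.1, 3.7)

# Welch 99% interval for mean(x) - mean(y); var.equal = FALSE is t.test()'s default
ci <- t.test(x, y, conf.level = 0.99)$conf.int
ci
excludes_null(ci)

# The gender interval reported above
excludes_null(c(0.0084, 0.2747))  # TRUE: zero lies outside the interval
```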

Beauty rating by gender

Let’s also compare beauty ratings:

bty_female <- Beauty$bty_avg[Beauty$gender == "female"]
bty_male <- Beauty$bty_avg[Beauty$gender == "male"]

mean_diff_ci(bty_male, bty_female)
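#> 99% CI for mean difference (x - y): [-0.7894, -0.0435]

This interval lies entirely below zero. At the 1% level, male professors receive lower beauty ratings than female professors, by roughly 0.04 to 0.79 points. Because gender is related to both beauty ratings and evaluation scores, it is a natural control variable in the regressions below.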

Proportions and Categorical Comparisons

Proportion of highly-rated professors

Let’s count the courses with a score above 4 and compute a confidence interval for their proportion:

high_score <- sum(Beauty$score > 4)
total <- nrow(Beauty)

prop_ci(trials = total, successes = high_score)
#> 99% CI for true proportion: [0.5663, 0.6838]

Approximately 62.6% of courses receive a score above 4.
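If prop_ci() wraps binom.test(), which is listed among IPAG's base functions, the interval is the exact Clopper-Pearson interval. The sketch below uses the count of 290 courses (171 + 119 from the contingency table below). It contrasts the exact interval with the simpler normal-approximation (Wald) interval. Which method prop_ci() actually uses is an assumption, so neither result is claimed to match the output above digit for digit:

```r
# 290 of the 463 courses have a score above 4 (171 + 119 in the contingency table)
successes <- 290
trials <- 463

# Exact (Clopper-Pearson) interval, as computed by binom.test()
exact <- binom.test(successes, trials, conf.level = 0.99)$conf.int

# Normal-approximation (Wald) interval, for comparison
p_hat <- successes / trials
z <- qnorm(0.995)
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / trials)

round(p_hat, 4)  # 0.6263
round(as.numeric(exact), 4)
round(wald, 4)
```

With about 460 observations and a proportion far from 0 or 1, the two intervals should be close. The exact interval is the safer default for small samples.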

Contingency table: Beauty and evaluation

Let’s create categories to analyze the relationship between beauty and evaluation:

# Create categorical variables
Beauty$high_beauty <- Beauty$bty_avg > median(Beauty$bty_avg)
Beauty$high_eval <- Beauty$score > 4

# Contingency table
table_data <- table(Beauty$high_beauty, Beauty$high_eval)
print(table_data)
#>        
#>         FALSE TRUE
#>   FALSE   112  171
#>   TRUE     61  119
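Row proportions make the table easier to read. This sketch retypes the counts above into a matrix; the object name tab is hypothetical. It then uses prop.table() to get the share of high evaluations within each beauty group:

```r
# Counts from the table above: rows = high_beauty, columns = high_eval
tab <- matrix(c(112, 61, 171, 119), nrow = 2,
              dimnames = list(high_beauty = c("FALSE", "TRUE"),
                              high_eval   = c("FALSE", "TRUE")))

# Proportions within each row (each row sums to 1)
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

About 66% of courses whose instructors have above-median beauty ratings score above 4 (119 of 180), compared with about 60% of the other courses (171 of 283).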

Let’s compute the odds ratio to measure the association:

# Extract table cells
a <- table_data[2, 2]  # High beauty AND high evaluation
b <- table_data[2, 1]  # High beauty AND low evaluation
c <- table_data[1, 2]  # Low beauty AND high evaluation
d <- table_data[1, 1]  # Low beauty AND low evaluation

oddsratio_ci(a = a, b = b, c = c, d = d)
#> 99% CI for odds ratio: [0.7535, 2.1824]

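The sample odds ratio is (119 × 112) / (61 × 171) ≈ 1.28. Courses taught by instructors with above-median beauty ratings therefore have somewhat higher odds of a score above 4. However, the 99% interval contains 1, so this dichotomized comparison does not show a statistically significant association at the 1% level.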
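The odds ratio compares the odds of a high evaluation in the high-beauty group (a/b) with the same odds in the low-beauty group (c/d), giving OR = ad / bc. This sketch computes it from the counts above. It also computes two common 99% intervals: the Wald (Woolf) interval on the log scale and the exact conditional interval from fisher.test(). Which of these oddsratio_ci() reports is an assumption here, so the bounds need not match the output above exactly:

```r
n_a <- 119  # high beauty, high evaluation
n_b <- 61   # high beauty, low evaluation
n_c <- 171  # low beauty, high evaluation
n_d <- 112  # low beauty, low evaluation

# Sample odds ratio: (n_a / n_b) / (n_c / n_d) = (n_a * n_d) / (n_b * n_c)
or_hat <- (n_a * n_d) / (n_b * n_c)

# Wald (Woolf) interval, built on the log scale and then exponentiated
se_log <- sqrt(1 / n_a + 1 / n_b + 1 / n_c + 1 / n_d)
wald <- exp(log(or_hat) + c(-1, 1) * qnorm(0.995) * se_log)

# Exact conditional interval; the matrix is laid out so that OR = ad / bc
m <- matrix(c(n_a, n_c, n_b, n_d), nrow = 2)
fisher <- fisher.test(m, conf.level = 0.99)$conf.int

round(or_hat, 3)  # 1.278
round(wald, 3)
round(as.numeric(fisher), 3)
```

Both intervals include 1, in line with the interval reported above.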

Regression Analysis

Simple regression: Beauty and evaluation

Let’s examine the linear relationship between beauty and evaluation score:

linear_regress(score ~ bty_avg, data = Beauty)
#> Adjusted R^2: 0.0329
#> Overall F-test p-value: 5.083e-05
#> 
#>     Variable Estimate                CI p_value Signif
#>  (Intercept)   3.8803 [3.6834 ; 4.0773]  0.0000    ***
#>      bty_avg   0.0666 [0.0245 ; 0.1088]  0.0001    ***

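This simple regression measures the association between beauty and evaluation score. The bty_avg coefficient means that each additional point of average beauty rating goes with a score about 0.067 points higher on average. For example, moving from a beauty rating of 3 to 7 predicts a score about 0.27 points higher. The adjusted R² of 0.033 shows that beauty alone explains only about 3% of the variation in scores.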
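In a simple regression, the least-squares slope is cov(x, y) / var(x), and the intercept is mean(y) - slope · mean(x). This sketch assumes that linear_regress() is built on lm() and confint(); lm() is among IPAG's listed base functions. It verifies the closed-form formulas on made-up data and obtains 99% coefficient intervals:

```r
# Made-up (beauty, score) pairs, for illustration only
bty   <- c(2.3, 3.0, 3.7, 4.3, 5.0, 5.8, 6.5, 7.2)
score <- c(3.9, 4.2, 3.8, 4.4, 4.1, 4.6, 4.3, 4.7)

# Closed-form least-squares estimates
b1 <- cov(bty, score) / var(bty)
b0 <- mean(score) - b1 * mean(bty)

fit <- lm(score ~ bty)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE

# 99% intervals for intercept and slope, the vignette's default level
confint(fit, level = 0.99)
```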

Multiple regression: Controlling for characteristics

Let’s add control variables for a more complete analysis:

linear_regress(score ~ bty_avg + age + gender, data = Beauty)
#> Adjusted R^2: 0.0622
#> Overall F-test p-value: 4.068e-07
#> 
#>     Variable Estimate                 CI p_value Signif
#>  (Intercept)   4.0551  [3.6222 ; 4.4879]  0.0000    ***
#>      bty_avg   0.0641  [0.0205 ; 0.1077]  0.0002    ***
#>          age  -0.0058 [-0.0128 ; 0.0012]  0.0338      *
#>   gendermale   0.2009  [0.0669 ; 0.3349]  0.0001    ***

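This regression controls for the professor’s age and gender. The beauty coefficient barely changes (0.0641 versus 0.0666). Male professors score about 0.20 points higher than female professors of the same age and beauty rating. The age coefficient is negative, but its 99% interval contains zero (p = 0.034): it is significant at the 5% level but not at 1%. The adjusted R² is the share of the variance in score explained by the model, penalized for the number of predictors. Here the model explains only about 6%.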
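Adjusted R² equals 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. This sketch checks the formula against summary.lm() on R's built-in mtcars data, chosen only because it ships with R:

```r
# Three predictors, so p = 3
fit <- lm(mpg ~ wt + hp + am, data = mtcars)
s <- summary(fit)

n <- nobs(fit)
p <- length(coef(fit)) - 1  # exclude the intercept
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

all.equal(adj_manual, s$adj.r.squared)  # TRUE
```

Assuming all 463 courses enter the regression above (p = 3), its adjusted R² of 0.0622 corresponds to an unadjusted R² of roughly 0.068.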

Full model

Let’s include more explanatory variables:

linear_regress(score ~ bty_avg + age + gender + rank + cls_perc_eval + cls_students, 
               data = Beauty)
#> Adjusted R^2: 0.1016
#> Overall F-test p-value: 9.473e-10
#> 
#>          Variable Estimate                  CI p_value Signif
#>       (Intercept)   3.8552   [3.2635 ; 4.4468]  0.0000    ***
#>           bty_avg   0.0513   [0.0076 ; 0.0950]  0.0025     **
#>               age  -0.0076  [-0.0156 ; 0.0005]  0.0155      *
#>        gendermale   0.2108   [0.0768 ; 0.3448]  0.0001    ***
#>  ranktenure track  -0.2146 [-0.4201 ; -0.0091]  0.0072     **
#>       ranktenured  -0.1509  [-0.3121 ; 0.0104]  0.0159      *
#>     cls_perc_eval   0.0060   [0.0019 ; 0.0100]  0.0002    ***
#>      cls_students   0.0005  [-0.0004 ; 0.0014]  0.1831

This more complex model includes:

rank: The professor’s academic rank
cls_perc_eval: The percentage of students who participated in the evaluation
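cls_students: The total number of students in the course

The rank coefficients are measured relative to the reference category, teaching, which does not appear in the output. Tenure-track instructors score about 0.21 points lower than teaching faculty, all else equal. The beauty coefficient shrinks slightly to 0.0513 but remains significant at the 1% level, while class size (cls_students) is not significant.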
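Because rank is stored as character (see str() above), lm() converts it to a factor, and the first level becomes the reference category. The sketch below uses a small hand-made factor to show how the dummy columns are named and how relevel() changes the reference. On the real data, the equivalent step would be Beauty$rank <- relevel(factor(Beauty$rank), ref = "tenured"), which has not been run here:

```r
# Levels set explicitly so that "teaching" is the first (reference) level
rank <- factor(c("teaching", "tenure track", "tenured", "tenured", "teaching"),
               levels = c("teaching", "tenure track", "tenured"))

# Dummy columns created by lm(): one per non-reference level
colnames(model.matrix(~ rank))
# "(Intercept)" "ranktenure track" "ranktenured"

# Make "tenured" the reference category instead
rank2 <- relevel(rank, ref = "tenured")
colnames(model.matrix(~ rank2))
# "(Intercept)" "rank2teaching" "rank2tenure track"
```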

Subgroup Analyses

Effect of beauty by class level

Let’s compare the effect of beauty for lower vs upper level courses:

# Lower level courses
Beauty_lower <- Beauty[Beauty$cls_level == "lower", ]
linear_regress(score ~ bty_avg, data = Beauty_lower)
#> Adjusted R^2: 0.0462
#> Overall F-test p-value: 0.003961
#> 
#>     Variable Estimate                CI p_value Signif
#>  (Intercept)   3.8845 [3.5469 ; 4.2221]  0.0000    ***
#>      bty_avg   0.0788 [0.0085 ; 0.1490]  0.0040     **

# Upper level courses
Beauty_upper <- Beauty[Beauty$cls_level == "upper", ]
linear_regress(score ~ bty_avg, data = Beauty_upper)
#> Adjusted R^2: 0.0205
#> Overall F-test p-value: 0.006947
#> 
#>     Variable Estimate                CI p_value Signif
#>  (Intercept)   3.8974 [3.6520 ; 4.1427]  0.0000    ***
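#>      bty_avg   0.0559 [0.0026 ; 0.1092]  0.0069     **

The slope is positive and significant at the 1% level in both subsamples: 0.079 for lower-level courses and 0.056 for upper-level courses. The two intervals overlap heavily, so these separate fits give no clear evidence that the effect of beauty differs by course level.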

Differential effect by gender

Let’s analyze whether the effect of beauty differs between male and female professors:

# Male professors
Beauty_male <- Beauty[Beauty$gender == "male", ]
linear_regress(score ~ bty_avg, data = Beauty_male)
#> Adjusted R^2: 0.0933
#> Overall F-test p-value: 2.038e-07
#> 
#>     Variable Estimate                CI p_value Signif
#>  (Intercept)   3.7666 [3.5258 ; 4.0073]  0.0000    ***
#>      bty_avg   0.1103 [0.0566 ; 0.1639]  0.0000    ***

# Female professors
Beauty_female <- Beauty[Beauty$gender == "female", ]
linear_regress(score ~ bty_avg, data = Beauty_female)
#> Adjusted R^2: 0.0022
#> Overall F-test p-value: 0.2348
#> 
#>     Variable Estimate                 CI p_value Signif
#>  (Intercept)   3.9501  [3.6213 ; 4.2789]  0.0000    ***
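#>      bty_avg   0.0306 [-0.0362 ; 0.0975]  0.2348

For male professors, each additional beauty point is associated with a score about 0.11 points higher, and beauty explains about 9% of the variance in their scores. For female professors the slope is small (0.031) and its interval contains zero (p = 0.23), so no significant relationship is detected.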
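Separate subgroup fits do not test whether two slopes differ, but an interaction model does. In lm(score ~ bty_avg * gender, data = Beauty), the bty_avg:gendermale coefficient estimates the male-minus-female difference in slopes. Whether linear_regress() accepts interaction formulas is not confirmed here. The sketch below therefore uses lm() directly on simulated data, with slopes (0.03 and 0.11) chosen to resemble the estimates above:

```r
set.seed(1)
n <- 200

# Simulated data only: these are not the Beauty observations
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
bty <- runif(n, min = 1, max = 10)
slope <- ifelse(gender == "male", 0.11, 0.03)  # illustrative true slopes
score <- 3.9 + slope * bty + rnorm(n, sd = 0.5)
sim <- data.frame(score, bty, gender)

# bty * gender expands to bty + gender + bty:gender
fit <- lm(score ~ bty * gender, data = sim)

# The interaction row estimates the male-minus-female difference in slopes
coef(summary(fit))["bty:gendermale", ]
confint(fit, "bty:gendermale", level = 0.99)
```

If the 99% interval for the interaction term excludes zero, the data support a difference in slopes at the 1% level. This is a direct test, which comparing two separate regressions is not.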

Custom Confidence Intervals

By default, IPAG computes 99% confidence intervals. To use a different level (e.g., 95%):

mean_ci(Beauty$score, level = 0.95)
#> 95% CI for mean: [4.1251, 4.2244]

linear_regress(score ~ bty_avg + gender, data = Beauty, level = 0.95)
#> Adjusted R^2: 0.0550
#> Overall F-test p-value: 8.177e-07
#> 
#>     Variable Estimate                CI p_value Signif
#>  (Intercept)   3.7473 [3.5810 ; 3.9137]  0.0000    ***
#>      bty_avg   0.0742 [0.0422 ; 0.1061]  0.0000    ***
#>   gendermale   0.1724 [0.0737 ; 0.2711]  0.0007    ***
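Changing the level only changes the critical value, so interval widths scale by the ratio of t quantiles. This sketch assumes n = 463 scores with none missing. It compares the predicted ratio of 95% to 99% interval widths with the ratio of the two mean intervals reported above:

```r
n <- 463                      # courses in Beauty (assumes no missing scores)
t99 <- qt(0.995, df = n - 1)  # critical value for a 99% interval
t95 <- qt(0.975, df = n - 1)  # critical value for a 95% interval
predicted <- t95 / t99        # about 0.76

# Width ratio of the 95% and 99% intervals reported for mean_ci(Beauty$score)
observed <- (4.2244 - 4.1251) / (4.2401 - 4.1094)  # about 0.76

c(predicted = predicted, observed = observed)
```

A 95% interval is therefore about a quarter narrower than the corresponding 99% interval.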

Complementary Visualizations

While IPAG focuses on statistical inference, it’s useful to visualize the data:

# Scatter plot
plot(Beauty$bty_avg, Beauty$score,
     xlab = "Average beauty rating",
     ylab = "Evaluation score",
     main = "Relationship between beauty and evaluation",
     pch = 16, col = rgb(0, 0, 0, 0.3))

# Add regression line
abline(lm(score ~ bty_avg, data = Beauty), col = "red", lwd = 2)

# Score comparison by gender
boxplot(score ~ gender, data = Beauty,
        xlab = "Gender",
        ylab = "Evaluation score",
        main = "Distribution of scores by gender",
        col = c("pink", "lightblue"))

Interpretation and Conclusions

This analysis illustrates several important findings:

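Beauty effect: Instructors’ average beauty rating is positively associated with their evaluation scores, and the association remains significant at the 1% level after controlling for age, gender, rank, evaluation response rate, and class size. Because the data are observational, this is an association rather than a demonstrated causal effect, and beauty alone explains only about 3% of the variance in scores.
Control variables: Male instructors and classes with higher evaluation response rates receive significantly higher scores, and tenure-track instructors score significantly lower than teaching faculty. The age coefficient is negative but significant only at the 5% level, not at 1%.
Conservative inference: 99% confidence intervals are wider than the usual 95% intervals, so they give a conservative view of the uncertainty around each estimate.
Heterogeneity: The beauty slope is larger and clearly significant for male instructors (0.110) but small and not significant for female instructors (0.031).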

IPAG Package Design Principles

The IPAG package follows several design principles:

Transparency: All functions rely on well-established R functions (t.test(), binom.test(), lm(), fisher.test()).
Consistency: Uniform naming convention and clear display methods.
Readability: Interpretable results without requiring deep knowledge of R object structures.
Pedagogical use: Designed for teaching and applications where clarity takes precedence over extensibility.
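The transparency principle can be illustrated with a wrapper that delegates the statistics to a base R function and only formats the result. simple_mean_ci() below is a hypothetical sketch in that spirit, not IPAG's actual implementation:

```r
# Hypothetical wrapper illustrating the design idea; NOT IPAG's source code
simple_mean_ci <- function(x, level = 0.99) {
  ci <- t.test(x, conf.level = level)$conf.int  # base R does the statistics
  cat(sprintf("%g%% CI for mean: [%.4f, %.4f]\n", 100 * level, ci[1], ci[2]))
  invisible(as.numeric(ci))                     # bounds returned for reuse
}

simple_mean_ci(c(4.7, 4.1, 3.9, 4.8, 4.6))
simple_mean_ci(c(4.7, 4.1, 3.9, 4.8, 4.6), level = 0.95)
```

Returning the bounds invisibly keeps the console output short while still letting users store the interval for further calculations.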

References

Hamermesh, D. S., & Parker, A. (2005). Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24(4), 369–376. https://doi.org/10.1016/j.econedurev.2004.07.013

Going Further

For more information about the IPAG package:

Function documentation: ?mean_ci, ?linear_regress, etc.
Other available datasets: data(package = "IPAG")
Source code: https://github.com/gpiaser/IPAG

This vignette was created with IPAG package version 0.1.0.
