
Is GPT-4 Good Enough to Evaluate Jokes?

Conference contribution
Posted on 2023-11-06, 11:34, authored by Luis Fabricio Góes, Piotr Sawicki, Marek Grzes, Dan Brown, Marco Volpe

In this paper, we investigate the ability of large language models (LLMs), specifically GPT-4, to assess the funniness of jokes in comparison to human ratings. We use a dataset of jokes annotated with human ratings and explore different system descriptions in GPT-4 to imitate human judges with various types of humour. We propose a novel method to create a system description using many-shot prompting, providing numerous examples of jokes and their evaluation scores. Additionally, we examine the performance of different system descriptions when given varying amounts of instructions and examples on how to evaluate jokes. Our main contributions include a new method for creating a system description in LLMs to evaluate jokes and a comprehensive methodology to assess LLMs' ability to evaluate jokes using rankings rather than individual scores.
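To make the abstract's approach concrete, below is a minimal sketch of many-shot prompting for joke evaluation, followed by a rank-based comparison against human ratings. It assumes the OpenAI Chat Completions API; the model name, the 1-5 scoring scale, the example jokes, and the placeholder human scores are illustrative assumptions, not material from the paper or its dataset.

```python
# Hypothetical sketch: many-shot system description + rank-based evaluation.
# The scale, jokes, and scores below are placeholders, not the paper's data.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few rated examples standing in for the paper's "numerous examples
# of jokes and their evaluation scores" (scale assumed to be 1-5).
EXAMPLES = [
    ("I told my wife she was drawing her eyebrows too high. "
     "She looked surprised.", 4),
    ("Why did the chicken cross the road? To get to the other side.", 2),
]

def build_system_description(examples):
    """Fold rated examples into one system message (many-shot prompting)."""
    lines = ["You are a judge who rates how funny a joke is on a scale of 1 to 5."]
    for joke, score in examples:
        lines.append(f'Joke: "{joke}"\nScore: {score}')
    return "\n\n".join(lines)

def rate_joke(joke, system_description):
    """Ask the model for a single numeric funniness score."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_description},
            {"role": "user",
             "content": f'Rate this joke. Reply with only a number 1-5.\nJoke: "{joke}"'},
        ],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())

if __name__ == "__main__":
    system_description = build_system_description(EXAMPLES)
    jokes = [
        "What do you call a fish with no eyes? A fsh.",
        "Parallel lines have so much in common. It's a shame they'll never meet.",
        "I used to be a banker, but I lost interest.",
    ]
    human_scores = [3, 4, 2]  # placeholder human ratings
    model_scores = [rate_joke(j, system_description) for j in jokes]
    # Compare rankings rather than raw scores, as the abstract describes.
    rho, _ = spearmanr(human_scores, model_scores)
    print(f"Spearman rank correlation with human ratings: {rho:.2f}")
```

Comparing rankings rather than individual scores sidesteps the question of whether the model and human judges calibrate the 1-5 scale the same way; only the relative ordering of jokes matters.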

History

Author affiliation

School of Computing and Mathematical Sciences, University of Leicester

Source

14th International Conference on Computational Creativity 2023

Version

  • AM (Accepted Manuscript)

Published in

International Conference on Computational Creativity (ICCC)

Copyright date

2023

Available date

2023-11-06

Temporal coverage: start date

2023-06-19

Temporal coverage: end date

2023-06-23

Language

en
