University of Leicester
Is GPT-4 Good Enough to Evaluate Jokes?

conference contribution
posted on 2023-11-06, 11:34 authored by Luis Fabricio Góes, Piotr Sawicki, Marek Grzes, Dan Brown, Marco Volpe
<p>In this paper, we investigate the ability of large language models (LLMs), specifically GPT-4, to assess the funniness of jokes in comparison to human ratings. We use a dataset of jokes annotated with human ratings and explore different system descriptions in GPT-4 to imitate human judges with various types of humour. We propose a novel method to create a system description using many-shot prompting, providing numerous examples of jokes and their evaluation scores. Additionally, we examine the performance of different system descriptions when given varying amounts of instructions and examples on how to evaluate jokes. Our main contributions include a new method for creating a system description in LLMs to evaluate jokes and a comprehensive methodology to assess LLMs' ability to evaluate jokes using rankings rather than individual scores.</p>
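The abstract's two key ideas, a many-shot system description built from scored example jokes, and comparing model output to human judgments by rank rather than raw score, might be sketched as below. This is a minimal illustration only: the jokes, scores, scale, and function names are hypothetical, not the paper's actual data or code, and the GPT-4 call itself is omitted.

```python
# Hypothetical example jokes with human funniness scores (not from the paper's dataset).
EXAMPLE_JOKES = [
    ("I told my wife she should embrace her mistakes. She hugged me.", 3.8),
    ("Why don't scientists trust atoms? They make up everything.", 3.1),
    ("Parallel lines have so much in common. Shame they'll never meet.", 2.9),
]

def build_system_description(examples, scale=(1, 5)):
    """Compose a many-shot system message: instructions followed by
    numerous (joke, human score) pairs that teach the judge by example."""
    lo, hi = scale
    lines = [
        f"You are a human judge who rates jokes for funniness on a {lo}-{hi} scale.",
        "Here are jokes with scores given by human judges:",
    ]
    for joke, score in examples:
        lines.append(f"Joke: {joke}")
        lines.append(f"Score: {score}")
    lines.append("Rate each new joke the same way, replying with a single number.")
    return "\n".join(lines)

def spearman_rho(xs, ys):
    """Spearman rank correlation between two score lists (assumes no ties):
    a rank-based comparison like the abstract's ranking methodology."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Usage: the system description would be sent to the LLM; its scores for
# held-out jokes are then rank-correlated with the human scores.
system_msg = build_system_description(EXAMPLE_JOKES)
human_scores = [5, 3, 4, 1, 2]
model_scores = [4, 3, 5, 1, 2]   # stand-in for LLM output
print(spearman_rho(human_scores, model_scores))  # → 0.9
```

Comparing rankings rather than absolute scores sidesteps calibration differences between the model's scale and each human judge's.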

History

Author affiliation

School of Computing and Mathematical Sciences, University of Leicester

Source

14th International Conference on Computational Creativity 2023

Version

  • AM (Accepted Manuscript)

Published in

International Conference on Computational Creativity (ICCC)

Copyright date

2023

Available date

2023-11-06

Temporal coverage: start date

2023-06-19

Temporal coverage: end date

2023-06-23

Language

en
