
Is GPT-4 Good Enough to Evaluate Jokes?

Conference contribution
Posted on 2023-11-06, 11:34, authored by Luis Fabricio Góes, Piotr Sawicki, Marek Grzes, Dan Brown, Marco Volpe

In this paper, we investigate the ability of large language models (LLMs), specifically GPT-4, to assess the funniness of jokes in comparison to human ratings. We use a dataset of jokes annotated with human ratings and explore different system descriptions in GPT-4 to imitate human judges with various types of humour. We propose a novel method to create a system description using many-shot prompting, providing numerous examples of jokes and their evaluation scores. Additionally, we examine the performance of different system descriptions when given varying amounts of instructions and examples on how to evaluate jokes. Our main contributions include a new method for creating a system description in LLMs to evaluate jokes and a comprehensive methodology to assess LLMs' ability to evaluate jokes using rankings rather than individual scores.
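To make the abstract's approach concrete, below is a minimal sketch of many-shot prompting for joke evaluation, followed by a rank-based comparison against human ratings. It assumes the OpenAI Chat Completions API; the model name, the 1-5 scoring scale, the example jokes, and the placeholder human scores are illustrative assumptions, not material from the paper or its dataset.

```python
# Hypothetical sketch: many-shot system description + rank-based evaluation.
# The scale, jokes, and scores below are placeholders, not the paper's data.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few rated examples standing in for the paper's "numerous examples
# of jokes and their evaluation scores" (scale assumed to be 1-5).
EXAMPLES = [
    ("I told my wife she was drawing her eyebrows too high. "
     "She looked surprised.", 4),
    ("Why did the chicken cross the road? To get to the other side.", 2),
]

def build_system_description(examples):
    """Fold rated examples into one system message (many-shot prompting)."""
    lines = ["You are a judge who rates how funny a joke is on a scale of 1 to 5."]
    for joke, score in examples:
        lines.append(f'Joke: "{joke}"\nScore: {score}')
    return "\n\n".join(lines)

def rate_joke(joke, system_description):
    """Ask the model for a single numeric funniness score."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_description},
            {"role": "user",
             "content": f'Rate this joke. Reply with only a number 1-5.\nJoke: "{joke}"'},
        ],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())

if __name__ == "__main__":
    system_description = build_system_description(EXAMPLES)
    jokes = [
        "What do you call a fish with no eyes? A fsh.",
        "Parallel lines have so much in common. It's a shame they'll never meet.",
        "I used to be a banker, but I lost interest.",
    ]
    human_scores = [3, 4, 2]  # placeholder human ratings
    model_scores = [rate_joke(j, system_description) for j in jokes]
    # Compare rankings rather than raw scores, as the abstract describes.
    rho, _ = spearmanr(human_scores, model_scores)
    print(f"Spearman rank correlation with human ratings: {rho:.2f}")
```

Comparing rankings rather than individual scores sidesteps the question of whether the model and human judges calibrate the 1-5 scale the same way; only the relative ordering of jokes matters.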

History

Author affiliation

School of Computing and Mathematical Sciences, University of Leicester

Source

14th International Conference on Computational Creativity 2023

Version

  • AM (Accepted Manuscript)

Published in

International Conference on Computational Creativity (ICCC)

Copyright date

2023

Available date

2023-11-06

Temporal coverage: start date

2023-06-19

Temporal coverage: end date

2023-06-23

Language

en
