University of Leicester

How Do Automatically Generated Unit Tests Influence Software Maintenance?

conference contribution
posted on 2019-06-07, 09:00 authored by Sina Shamshiri, José Miguel Rojas, Juan Pablo Galeotti, Neil Walkinshaw, Gordon Fraser
Generating unit tests automatically saves time over writing tests manually and can lead to higher code coverage. However, automatically generated tests are usually not based on realistic scenarios, and are therefore generally considered to be less readable. This places a question mark over their practical value: every time a test fails, a developer has to decide whether the failure reveals a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that automatically generated tests are harder to read outweigh the time savings gained by their automated generation, and render them more of a hindrance than a help for software maintenance? To answer this question, we performed an empirical study in which participants were presented with an automatically generated or manually written failing test, and were asked to identify and fix the cause of the failure. Our experiment and two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with automatically generated tests, we found developers to be equally effective with manually written and automatically generated tests. This has implications for how automated test generation is best used in practice, and it indicates a need for research into the generation of more realistic tests.



2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST), 2018, pp. 250-261

Author affiliation

College of Science and Engineering, Department of Informatics


11th IEEE International Conference on Software Testing, Verification and Validation (ICST), Västerås, Sweden


  • AM (Accepted Manuscript)

Published in

2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST)


Publisher

Institute of Electrical and Electronics Engineers (IEEE)






To validate and generalize the results of our empirical study and our conclusions, further replications are important [37] (e.g., using larger code-bases, larger fixes, expert developers, other test-generation tools, complete test suites instead of exactly one failing test, etc.). To this end, we make all experimental material available as a comprehensive artifact package at
