Study and quality evaluation of LLM’s generated unit test sets for C Programs

dc.contributor.advisor1: Vincenzi, Auri Marcelo Rizzo
dc.contributor.advisor1Lattes: http://lattes.cnpq.br/0611351138131709
dc.contributor.advisor1orcid: https://orcid.org/0000-0001-5902-1672
dc.contributor.author: Macaia, Matheus
dc.contributor.authororcid: https://orcid.org/0009-0006-1539-6249
dc.date.accessioned: 2024-10-18T12:19:19Z
dc.date.available: 2024-10-18T12:19:19Z
dc.date.issued: 2024-10-04
dc.description.abstract: As technology becomes more integrated into our daily routines, reliable software becomes increasingly critical. However, the high cost of manual test generation often leads developers to neglect software quality practices. In this context, the growing demand for automated test generation is a crucial response to the potential negative consequences of inadequate software testing. Problem: Various tools designed explicitly for automated program testing exist for different programming languages, including C. However, learning and properly configuring these tools is often not trivial, and users must install and set them up before use. Solution: This work leverages the rapid rise of Large Language Models (LLMs) to evaluate their capability to generate unit tests for C programs, using code coverage and mutation score as metrics to assess the quality of the generated test sets. Method: This study selected 27 C programs from the literature. We grouped these programs into three non-overlapping categories, depending on how each one accepts inputs (Basic Input – inputs provided as program parameters; Driver Type 1 – each test case is a case option in a switch command and the inputs are hard-coded inside the case option; and Driver Type 2 – similar to Driver Type 1 but with the inputs encoded in external data files). For each program, we interactively asked the LLM to generate tests automatically. After generating the test sets, we collected metrics such as code coverage, mutation score, and test execution success rate to evaluate the efficiency and effectiveness of each set. We then used these metrics as new parameters to improve the test sets. Results: The test sets generated by LLMs are highly relevant, delivering substantial results given the ease of use and the low need for human intervention to adjust the necessary configurations. On average, the LLM-generated test sets reached 100% code coverage and a 98.7% mutation score when testing programs with basic inputs. The worst results occurred when testing programs requiring a Type 1 driver, reaching 91.8% code coverage and a 95.2% mutation score. Nevertheless, these results are very satisfactory, mainly due to the simplicity of the prompts and the small effort required for test case generation.
dc.description.sponsorship: No funding received
dc.identifier.citation: MACAIA, Matheus. Study and quality evaluation of LLM’s generated unit test sets for C Programs. 2024. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/20819.
dc.identifier.uri: https://repositorio.ufscar.br/handle/20.500.14289/20819
dc.identifier.url: https://github.com/aurimrv/c_programs
dc.language.iso: eng
dc.publisher: Universidade Federal de São Carlos
dc.publisher.address: Câmpus São Carlos
dc.publisher.course: Engenharia de Computação - EC
dc.publisher.initials: UFSCar
dc.rights: Attribution 3.0 Brazil
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/br/
dc.subject: Software testing
dc.subject: Automated test generation
dc.subject: Coverage testing
dc.subject: Mutation testing
dc.subject: Large language models
dc.subject: ChatGPT
dc.subject: Unit testing
dc.subject.cnpq: CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
dc.title: Study and quality evaluation of LLM’s generated unit test sets for C Programs
dc.title.alternative: Study and quality evaluation of LLM’s generated unit test sets for C Programs
dc.type: TCC
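For illustration only, the minimal C sketch below (not taken from the cataloged monograph) shows what a Driver Type 1 test driver, as described in the abstract, might look like: each test case is a case option in a switch command, with its inputs hard-coded inside the case option. The function under test, max_of_three, and the chosen inputs are hypothetical examples, not one of the 27 studied programs.

/* Illustrative sketch only: a Driver Type 1 test driver, in which each
 * test case is a case option in a switch command and its inputs are
 * hard-coded inside the case option. max_of_three() is a hypothetical
 * unit under test, not taken from the monograph. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical unit under test. */
static int max_of_three(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <test-case-number>\n", argv[0]);
        return 1;
    }

    switch (atoi(argv[1])) {
    case 1: /* all values distinct */
        printf("%d\n", max_of_three(1, 2, 3));
        break;
    case 2: /* duplicated maximum */
        printf("%d\n", max_of_three(5, 5, 0));
        break;
    case 3: /* all values negative */
        printf("%d\n", max_of_three(-7, -3, -9));
        break;
    default:
        fprintf(stderr, "unknown test case\n");
        return 1;
    }
    return 0;
}

In this layout each test case is selected by its number on the command line (e.g., ./driver 2), so coverage tools such as gcov and a mutation analysis tool can be run once per case; a Driver Type 2 variant, as described in the abstract, would instead read each case's inputs from an external data file.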

Files

Original Bundle

Name: monografia_matheus_macaia.pdf
Size: 701.73 KB
Format: Adobe Portable Document Format
Description: TCC monograph (undergraduate thesis)

Collections