Study and quality evaluation of LLM’s generated unit test sets for C Programs

dc.contributor.advisor1: Vincenzi, Auri Marcelo Rizzo
dc.contributor.advisor1Lattes: http://lattes.cnpq.br/0611351138131709
dc.contributor.advisor1orcid: https://orcid.org/0000-0001-5902-1672
dc.contributor.author: Macaia, Matheus
dc.contributor.authororcid: https://orcid.org/0009-0006-1539-6249
dc.date.accessioned: 2024-10-18T12:19:19Z
dc.date.available: 2024-10-18T12:19:19Z
dc.date.issued: 2024-10-04
dc.description.abstract: As technology becomes more integrated into our daily routines, reliable software becomes increasingly critical. However, the high cost of manual test generation often leads developers to neglect software quality practices. In this context, the growing demand for automated test generation is a crucial response to the potential negative consequences of inadequate software testing. Problem: Various tools designed explicitly for automated program testing exist for different programming languages, including C. However, learning and properly configuring these tools is often not trivial, and users must install and set them up before use. Solution: This work leverages the rapid rise of Large Language Models (LLMs) to evaluate their capability to generate unit tests for C programs, using code coverage and mutation score as metrics to assess the quality of the generated test sets. Method: This study selected 27 C programs from the literature. We grouped these programs into three non-overlapping categories, depending on how each one accepts inputs (Basic Input – inputs provided as program parameters; Driver Type 1 – each test case is a case option in a switch command and the inputs are hard-coded inside the case option; and Driver Type 2 – similar to Driver Type 1 but with the inputs encoded in external data files). For each program, we interactively asked the LLM to generate tests automatically. After generating the test sets, we collected metrics such as code coverage, mutation score, and test execution success rate to evaluate the efficiency and effectiveness of each set. We then used these metrics as new parameters to improve the test sets. Results: The test sets generated by LLMs are highly relevant, delivering substantial results given the ease of use and the low need for human intervention to adjust the necessary configurations. On average, the LLM-generated test sets reached 100% code coverage and a 98.7% mutation score when testing programs with basic inputs. The worst results occurred when testing programs requiring a Type 1 driver, reaching 91.8% code coverage and a 95.2% mutation score. Nevertheless, these results are very satisfactory, mainly due to the simplicity of the prompts and the small effort required for test case generation.
dc.description.sponsorship: No funding received
dc.identifier.citation: MACAIA, Matheus. Study and quality evaluation of LLM’s generated unit test sets for C Programs. 2024. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2024. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/20819.
dc.identifier.uri: https://repositorio.ufscar.br/handle/20.500.14289/20819
dc.identifier.url: https://github.com/aurimrv/c_programs
dc.language.iso: eng
dc.publisher: Universidade Federal de São Carlos
dc.publisher.address: Câmpus São Carlos
dc.publisher.course: Engenharia de Computação - EC
dc.publisher.initials: UFSCar
dc.rights: Attribution 3.0 Brazil
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/br/
dc.subject: Software testing
dc.subject: Automated test generation
dc.subject: Coverage testing
dc.subject: Mutation testing
dc.subject: Large language models
dc.subject: ChatGPT
dc.subject: Unit testing
dc.subject.cnpq: CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
dc.title: Study and quality evaluation of LLM’s generated unit test sets for C Programs
dc.title.alternative: Study and quality evaluation of LLM’s generated unit test sets for C Programs
dc.type: TCC
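For illustration only, the minimal C sketch below (not taken from the cataloged monograph) shows what a Driver Type 1 test driver, as described in the abstract, might look like: each test case is a case option in a switch command, with its inputs hard-coded inside the case option. The function under test, max_of_three, and the chosen inputs are hypothetical examples, not one of the 27 studied programs.

/* Illustrative sketch only: a Driver Type 1 test driver, in which each
 * test case is a case option in a switch command and its inputs are
 * hard-coded inside the case option. max_of_three() is a hypothetical
 * unit under test, not taken from the monograph. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical unit under test. */
static int max_of_three(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return (m > c) ? m : c;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <test-case-number>\n", argv[0]);
        return 1;
    }

    switch (atoi(argv[1])) {
    case 1: /* all values distinct */
        printf("%d\n", max_of_three(1, 2, 3));
        break;
    case 2: /* duplicated maximum */
        printf("%d\n", max_of_three(5, 5, 0));
        break;
    case 3: /* all values negative */
        printf("%d\n", max_of_three(-7, -3, -9));
        break;
    default:
        fprintf(stderr, "unknown test case\n");
        return 1;
    }
    return 0;
}

In this layout each test case is selected by its number on the command line (e.g., ./driver 2), so coverage tools such as gcov and a mutation analysis tool can be run once per case; a Driver Type 2 variant, as described in the abstract, would instead read each case's inputs from an external data file.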

Files

Original Bundle

Name: monografia_matheus_macaia.pdf
Size: 701.73 KB
Format: Adobe Portable Document Format
Description: TCC monograph (undergraduate thesis)

Collections