Generating SQL Queries In The Serbian Language: A Comparative Analysis of Open-Source and Commercial Language Models
DOI:
https://doi.org/10.62907/juuntics260101001s
Keywords:
Large language models (LLMs), Text-to-SQL
Abstract
This paper investigates the ability of large language models (LLMs) to generate SQL queries based on questions formulated in natural language (Text-to-SQL), with a particular focus on the Serbian language. The aim of the research is to empirically determine the extent to which the query language and task complexity affect the accuracy of the generated SQL code, as well as to identify the dominant sources of errors in the generation process. For this purpose, an experimental environment was developed, comprising a relational database that simulates an employee management system, an evaluation set of 200 tasks divided into four categories according to complexity level, natural-language formulations that the LLM translates into SQL code, and an automated system for executing evaluations in the selected language. The results suggest that the key challenge of Text-to-SQL systems is no longer syntactic validity, but semantic precision and the correct modeling of relationships between entities in the database, while the choice of natural language in which the question is formulated has a negligible impact on the final accuracy.
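The distinction the abstract draws between syntactic validity and semantic precision can be illustrated with a minimal sketch of an execution-based check. This is not the authors' evaluation system: the schema, the sample rows, and the names `make_db`, `run`, and `evaluate` are all assumptions made for illustration, and comparing result sets against a reference query is one common way to score Text-to-SQL output, not necessarily the one used in the paper.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and rows; the paper's actual employee database is not shown.
SCHEMA = """
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    salary REAL NOT NULL,
    department_id INTEGER REFERENCES departments(id)
);
INSERT INTO departments VALUES (1, 'IT'), (2, 'HR');
INSERT INTO employees VALUES
    (1, 'Ana', 3000, 1), (2, 'Marko', 2500, 1), (3, 'Jelena', 2000, 2);
"""


def make_db():
    """Create a fresh in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def run(conn, sql):
    """Return rows as an order-insensitive multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def evaluate(conn, generated_sql, reference_sql):
    """Classify one task as 'invalid', 'wrong_result', or 'correct'."""
    expected = run(conn, reference_sql)
    actual = run(conn, generated_sql)
    if actual is None:
        return "invalid"  # syntactic or schema-level failure
    return "correct" if actual == expected else "wrong_result"


if __name__ == "__main__":
    conn = make_db()
    reference = (
        "SELECT d.name, AVG(e.salary) FROM employees e "
        "JOIN departments d ON e.department_id = d.id GROUP BY d.name"
    )
    # Syntactically valid, but joins on the wrong key (a relationship error).
    wrong_join = (
        "SELECT d.name, AVG(e.salary) FROM employees e "
        "JOIN departments d ON e.id = d.id GROUP BY d.name"
    )
    print(evaluate(conn, reference, reference))
    print(evaluate(conn, wrong_join, reference))
    print(evaluate(conn, "SELEC name FROM employees", reference))
```

In this sketch, a query that joins on the wrong key still executes without error, so it only shows up as a failure when its result set is compared with the reference. That is the kind of relationship-modeling error the abstract identifies as the main remaining challenge.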
License
Copyright (c) 2026 Slaviša Sovilj, Jovo Marković, Igor Dugonjić (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
