Arvind Kumar Sharma and Kavya Nair (2023) Test-Driven Enterprise Data Engineering with PySpark and DBT. American Journal of Engineering, Mechanics and Architecture, 1 (1). pp. 21-28. ISSN 2993-2637
Abstract
Enterprises increasingly rely on large-scale data pipelines to deliver analytics and insights, but traditional development practices often leave data engineering projects vulnerable to errors, inefficiencies, and costly rework. Test-driven development (TDD), long established in software engineering, is now emerging as a critical discipline in modern data engineering. This article explores how PySpark and dbt (data build tool) can be combined to bring test-driven methodologies into enterprise-scale data ecosystems. By applying unit tests to PySpark transformations and leveraging dbt’s native testing and documentation framework, organizations can enforce data quality, detect schema drift, and validate business logic before deployment. The discussion highlights architectural patterns, integration workflows, and best practices for embedding testing across the data lifecycle, from ingestion through transformation to consumption. Future directions such as AI-assisted test generation and continuous testing in real-time pipelines are also considered. Ultimately, the article positions TDD not merely as a technical safeguard but as a strategic enabler of trustworthy, maintainable, and scalable enterprise data engineering.
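As a rough illustration of the schema-drift checks the abstract mentions, the following is a minimal sketch in plain Python rather than code taken from the article. The `EXPECTED_SCHEMA` contract, the `detect_schema_drift` helper, and the column names are all hypothetical. The observed schema is passed as a list of `(name, type)` pairs, which is the shape a PySpark DataFrame exposes through its `dtypes` attribute, so the check can be written and tested first without a Spark session.

```python
from typing import Dict, List, Tuple

# Hypothetical expected contract for an "orders" table (illustrative names only).
EXPECTED_SCHEMA: Dict[str, str] = {
    "order_id": "bigint",
    "customer_id": "bigint",
    "amount": "double",
}


def detect_schema_drift(
    expected: Dict[str, str], observed: List[Tuple[str, str]]
) -> Dict[str, list]:
    """Compare an expected column->type contract with observed (name, type) pairs."""
    observed_map = dict(observed)
    # Columns the contract requires but the data no longer provides.
    missing = sorted(set(expected) - set(observed_map))
    # Columns present in the data that the contract does not know about.
    unexpected = sorted(set(observed_map) - set(expected))
    # Columns present on both sides whose type differs: (name, expected, observed).
    changed = sorted(
        (name, expected[name], observed_map[name])
        for name in set(expected) & set(observed_map)
        if expected[name] != observed_map[name]
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Tests written in the test-first style: they state the expected behaviour
# of the check and can run under pytest or be called directly.
def test_no_drift_when_schema_matches() -> None:
    observed = [("order_id", "bigint"), ("customer_id", "bigint"), ("amount", "double")]
    report = detect_schema_drift(EXPECTED_SCHEMA, observed)
    assert report == {"missing": [], "unexpected": [], "changed": []}


def test_drift_is_reported() -> None:
    observed = [("order_id", "bigint"), ("amount", "string"), ("coupon", "string")]
    report = detect_schema_drift(EXPECTED_SCHEMA, observed)
    assert report["missing"] == ["customer_id"]
    assert report["unexpected"] == ["coupon"]
    assert report["changed"] == [("amount", "double", "string")]
```

In a pipeline, a check like this would presumably run against `df.dtypes` before a transformation is deployed, with dbt's own tests covering the resulting models; how the article itself wires the two together is not stated in the abstract.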
Item Type: | Article |
---|---|
Subjects: | T Technology > TA Engineering (General). Civil engineering (General) |
Divisions: | Postgraduate > Master's of Islamic Education |
Depositing User: | Journal Editor |
Date Deposited: | 10 Sep 2025 09:01 |
Last Modified: | 10 Sep 2025 09:01 |
URI: | http://eprints.umsida.ac.id/id/eprint/16339 |