Russian Federation
The article examines software tools for processing large volumes of data (Big Data). It focuses on the Apache NiFi platform, which is part of the Hadoop suite of tools for business ecosystems. Tools such as Hadoop Common, which includes libraries for managing the file systems supported by Hadoop as well as scripts for setting up the required infrastructure and managing distributed data processing, are discussed in detail. The article also covers the tools of the Apache NiFi platform, including a set of modern ETL (Extract, Transform, Load) tools for building a big data warehouse, and the platform's basic concepts, which are based on "Flow-Based Programming" (FBP). An evaluation of the efficiency of parallel data processing shows that as the share of sequential operations in a data processing program increases, the achievable speedup of the computation decreases. The topic is relevant because large data sets are now used everywhere, and processing them yields a significant positive effect in day-to-day practice.
software, large amounts of data, parallel data processing, Apache NiFi platform, ETL tools, Hadoop tools, business ecosystem, Flow-Based Programming concept, Hortonworks Data Platform distribution
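The abstract's finding on parallel processing (speedup falls as the share of sequential operations grows) can be illustrated with Amdahl's law. The abstract does not name the model used for the evaluation, so treating it as Amdahl's law is an assumption; the function name `amdahl_speedup`, the worker count of 8, and the sample fractions below are likewise hypothetical and chosen only for illustration.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's law (assumed model, not taken from the article).

    serial_fraction: share of the program that must run sequentially, in [0, 1].
    workers: number of parallel processing units, at least 1.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The sequential part is not accelerated; the parallel part is divided among workers.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Illustrative values only: the speedup shrinks as the sequential share grows.
    for share in (0.0, 0.1, 0.25, 0.5):
        speedup = amdahl_speedup(share, workers=8)
        print(f"sequential share {share:.2f}: speedup on 8 workers = {speedup:.2f}")
```

Under this model the speedup can never exceed the reciprocal of the sequential share, no matter how many workers are added, which is consistent with the trend the abstract reports.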