Data-Intensive Computing addresses the data-centric processing of very large amounts of data (data intensity). Two properties therefore characterize data-intensive computing. First, the data dominates the processing: the interpretation is encoded in the data and its structure, and the data or its structure determines, to a certain extent, the (optimal) way of processing it. Second, the intensive nature of the processing constantly and increasingly pushes the limits of technical systems and algorithms. This applies to the scalable processing, ingestion, and storage of the data, to technical operation (the 5 Vs of Big Data: Volume, Variety, Velocity, Veracity, Value), and to the management of the data itself. Data-intensive computing adopts techniques from other areas of research and develops them further for specific data-intensive applications; this includes, in particular, methods from artificial intelligence and high-performance computing.

Research within the topic focuses on:
Research and development for data management      

both generic (including FAIR principles, research data management, metadata, repositories, federated AAI, and compression) and application-oriented (especially for climate and materials research as well as particle physics, in connection with the corresponding KIT Centers for climate and environment, materials, and elementary particle and astroparticle physics)
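
To make the metadata aspect concrete, here is a minimal sketch of a machine-readable dataset record in Python. The field names loosely follow DataCite-style conventions and all values, including the identifier, are hypothetical; this is an illustration, not a schema prescribed by the research topic.

```python
import json

# A minimal, machine-readable dataset record. The field names loosely follow
# DataCite-style conventions; the identifier and all values are hypothetical.
record = {
    "identifier": {"value": "10.5281/example.000000", "type": "DOI"},
    "title": "Example climate simulation output",
    "creators": [{"name": "Doe, Jane", "affiliation": "KIT"}],
    "publicationYear": 2024,
    "resourceType": "Dataset",
    "rights": "CC-BY-4.0",
    # Links to related records help make the data Interoperable and Reusable.
    "relatedIdentifiers": [],
}
print(json.dumps(record, indent=2))
```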

Research and development of system-related algorithms, methods, and tools for data-intensive computing

such as algorithms for the opportunistic use of resources, the dynamic integration of large-scale cloud resources, and job/data scheduling.
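
As a toy illustration of data-aware job scheduling, the following Python sketch greedily places each job on the site that already holds its input data and falls back opportunistically to free capacity elsewhere. The Job/Site types and the greedy policy are illustrative assumptions, not the scheduling algorithms actually developed in this topic.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    cores: int
    data_site: str          # site that already holds the job's input data

@dataclass
class Site:
    name: str
    free_cores: int
    assigned: list = field(default_factory=list)

def schedule(jobs, sites):
    """Greedy, data-aware placement: prefer the site holding the input data;
    otherwise fall back opportunistically to the site with most free cores."""
    by_name = {s.name: s for s in sites}
    for job in sorted(jobs, key=lambda j: j.cores, reverse=True):
        local = by_name.get(job.data_site)
        target = local if local and local.free_cores >= job.cores else None
        if target is None:
            candidates = [s for s in sites if s.free_cores >= job.cores]
            target = max(candidates, key=lambda s: s.free_cores, default=None)
        if target is not None:
            target.free_cores -= job.cores
            target.assigned.append(job.name)

sites = [Site("gridka", 8), Site("cloud", 16)]
schedule([Job("a", 6, "gridka"), Job("b", 6, "gridka"), Job("c", 4, "cloud")], sites)
print({s.name: s.assigned for s in sites})   # {'gridka': ['a'], 'cloud': ['b', 'c']}
```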

Research and development for data analytics      

such as high-performance data analytics, modern hardware architectures (quantum and neuromorphic computing) for AI/ML, distributed data analysis, and anytime computing.
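
Anytime computing can be illustrated with a small sketch: the computation below can be stopped at any point and still returns its best result so far, with more budget improving expected accuracy. The Monte Carlo example and the time budget are illustrative choices, not a method specific to this research topic.

```python
import random
import time

def anytime_pi(budget_s=0.05):
    """Anytime Monte Carlo estimate of pi: a usable result is available
    whenever the time budget runs out; more budget improves accuracy."""
    inside = total = 0
    deadline = time.monotonic() + budget_s
    while total == 0 or time.monotonic() < deadline:
        x, y = random.random(), random.random()
        inside += x * x + y * y <= 1.0   # point falls inside the quarter circle
        total += 1
    return 4.0 * inside / total, total

estimate, samples = anytime_pi()
print(f"pi ~ {estimate:.4f} after {samples} samples")
```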

Research on and support of computational and data-enabled science and engineering

e.g., the Simulation and Data Labs in the Helmholtz program “Engineering Digital Futures” and the NHR (Nationales Hochleistungsrechnen, Germany's national high-performance computing alliance).

Operation and further development of large-scale research facilities for data-intensive computing

e.g., high-performance computing (HPC) systems / supercomputers such as HoreKa, and data and analysis centers such as GridKa.

Development and application of AI / ML methods for extremely large data sets      

This especially addresses deep learning methods (e.g., for 3D/3D+t images and videos), together with the KCIST field AI, and includes parallel and distributed training algorithms.
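
The core idea of synchronous data-parallel training can be sketched in a few lines of Python/NumPy: each (simulated) worker computes a gradient on its shard of the batch, and averaging the shard gradients emulates the all-reduce step of distributed training. The linear model and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data; w_true is what training should recover.
X = rng.normal(size=(1024, 8))
w_true = rng.normal(size=8)
y = X @ w_true + 0.01 * rng.normal(size=1024)

def local_gradient(w, Xs, ys):
    """Least-squares gradient on one worker's shard of the batch."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

def train(n_workers=4, lr=0.05, steps=200):
    w = np.zeros(8)
    shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
    for _ in range(steps):
        # Each worker computes a gradient on its shard; averaging the
        # results emulates an all-reduce in synchronous data parallelism.
        grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
        w -= lr * np.mean(grads, axis=0)
    return w

print(np.allclose(train(), w_true, atol=1e-2))   # True: weights recovered
```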

Development of basic algorithms and data structures for processing large amounts of data

e.g., sorting, index data structures, and graph algorithms
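
As an example of such a basic algorithm, the sketch below shows an external merge sort, the classic approach when the data no longer fits in memory: sorted runs are spilled to temporary files and then merged k-way. The chunk size and the line-based file format are illustrative choices.

```python
import heapq
import itertools
import random
import tempfile

def external_sort(values, chunk_size=4096):
    """Sketch of an external merge sort: sort fixed-size chunks in memory,
    spill them to temporary files, then k-way merge the sorted runs."""
    runs = []
    it = iter(values)
    while chunk := list(itertools.islice(it, chunk_size)):
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in chunk)
        f.seek(0)
        runs.append(f)
    # heapq.merge streams the runs; only one line per run is held in memory.
    yield from heapq.merge(*((int(line) for line in f) for f in runs))

data = [random.randrange(10**6) for _ in range(10_000)]
assert list(external_sort(data, chunk_size=1000)) == sorted(data)
```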

Algorithms, procedures and software for extracting knowledge from data      

and, in general, data-driven algorithms, systems, and methods.

Procedures for the highly distributed collection, buffering, and transmission of sensor data

especially systems and methods for distributing and processing streaming data.
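
A minimal sketch of windowed stream processing, assuming a hypothetical sensor feed: per sensor, only a bounded window of recent readings is buffered, never the whole stream, which is the key constraint in this setting.

```python
import collections
import random
import statistics

def sensor_stream(n=50):
    """Stand-in for a distributed sensor feed (hypothetical values)."""
    for i in range(n):
        yield {"sensor": f"s{i % 3}", "t": i, "value": random.gauss(20.0, 1.0)}

def windowed_mean(stream, window=10):
    """Streaming per-sensor mean over a sliding window of recent readings;
    a bounded deque per sensor evicts old values automatically."""
    buffers = collections.defaultdict(lambda: collections.deque(maxlen=window))
    for reading in stream:
        buf = buffers[reading["sensor"]]
        buf.append(reading["value"])
        yield reading["sensor"], reading["t"], statistics.fmean(buf)

for sensor, t, mean in windowed_mean(sensor_stream()):
    print(f"{sensor} @ t={t}: windowed mean = {mean:.2f}")
```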