KIT Center Information · Systems · Technologies (KCIST)

Data Intensive Computing deals with the data-centric processing of very large volumes of data (data intensity). Two key properties therefore characterize data-intensive computing. First, the data is dominant in processing: the interpretation of the data is embedded in the data and its structure, and the data or its structure determines, within certain limits, how it is (optimally) processed. Second, the intensive nature of the processing means that the limits of technical systems and algorithms must be pushed continually. This applies to the scalable processing, ingestion, and storage of data and to its technical operation (the 5 Vs of Big Data), as well as to the management of the data itself. Data-intensive computing draws on techniques from other fields that are specifically researched and developed further for this purpose, in particular methods from artificial intelligence and high-performance computing.

Second Level Research Topics

Research and development on data management: generic (including FAIR principles, research data management, metadata, repositories, federated AAI, compression) and application-oriented (especially in climate research, materials research, and particle physics, in connection with the corresponding KIT Centers Climate and Environment, Materials, and Elementary Particle and Astroparticle Physics)

Research and development of system-related algorithms, methods, and tools for data-intensive computing (algorithms for the opportunistic use of resources, dynamic integration of large-scale cloud resources, job and data scheduling)
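To make the scheduling topic above concrete, here is a minimal sketch of opportunistic job placement: a greedy scheduler that packs jobs onto whatever heterogeneous resources currently have free capacity. The `Job` and `Resource` classes and the largest-first heuristic are illustrative assumptions, not a description of any actual KIT scheduler.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cores: int

@dataclass
class Job:
    name: str
    cores: int

def schedule(jobs, resources):
    """Greedily place each job on the first resource with enough free cores.

    Jobs are considered largest-first so that big jobs claim capacity
    before fragmentation sets in; jobs that fit nowhere are returned
    as pending, to be retried when resources free up (the opportunistic
    part: whatever capacity appears gets used).
    """
    placements, pending = {}, []
    for job in sorted(jobs, key=lambda j: j.cores, reverse=True):
        for res in resources:
            if res.free_cores >= job.cores:
                res.free_cores -= job.cores
                placements[job.name] = res.name
                break
        else:
            pending.append(job)
    return placements, pending
```

Real schedulers additionally weigh data locality, queue fairness, and the cost of dynamically provisioned cloud nodes; this sketch only shows the core bin-packing step.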

Research and development on data analytics (high-performance data analytics, modern hardware architectures (quantum, neuromorphic) for AI/ML, distributed data analysis, anytime computing)

Research on and support of Computational and Data-enabled Science and Engineering (simulation and data labs in the Helmholtz program and the NHR)

Operation and further development of large research devices for data-intensive computing (HPC / supercomputers such as HoreKa, data and analysis centers such as GridKa)

Development and application of AI/ML methods (especially deep learning) for extremely large data sets (e.g. 3D / 3D+t images and videos), together with the KCIST field AI

Development of basic algorithms and data structures for processing large amounts of data (e.g. sorting, index data structures, graph algorithms)
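Sorting is the canonical example of such a basic algorithm rethought for large data: when the input exceeds main memory, it is sorted in chunks that are spilled to disk as sorted runs and then merged. A minimal external merge sort sketch (chunk size and file handling are illustrative choices):

```python
import heapq
import tempfile

def external_sort(values, chunk_size=1000):
    """Sort an iterable of ints that may not fit in memory.

    Values are sorted in fixed-size chunks, each chunk is spilled to a
    temporary file as a sorted run, and the runs are combined with a
    k-way heap merge -- only one line per run is in memory at a time
    during the merge.
    """
    run_files, chunk = [], []

    def spill():
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in sorted(chunk))
        f.seek(0)
        run_files.append(f)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    # Each run file is read lazily; heapq.merge interleaves them.
    runs = [(int(line) for line in f) for f in run_files]
    result = list(heapq.merge(*runs))
    for f in run_files:
        f.close()
    return result
```

Production-grade variants replace the line-based temporary files with binary formats and overlap I/O with merging, but the run-and-merge structure is the same.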

Algorithms, methods, and software for extracting knowledge from data

Methods for the highly distributed collection, buffering, and transmission of sensor data
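A common pattern behind such methods is edge-side buffering: readings are accumulated locally and transmitted in batches, so that many small sensor messages do not each incur network overhead. A minimal sketch, where `transmit`, the batch size, and the age threshold are all hypothetical stand-ins for a real transport:

```python
import time

class SensorBuffer:
    """Buffer sensor readings locally and flush them in batches.

    Readings accumulate until either the batch size or a maximum age
    is reached; then `transmit` (a user-supplied callable standing in
    for a network send) receives the whole batch at once.
    """

    def __init__(self, transmit, batch_size=100, max_age_s=5.0,
                 clock=time.monotonic):
        self.transmit = transmit
        self.batch_size = batch_size
        self.max_age_s = max_age_s
        self.clock = clock          # injectable for testing
        self._buffer = []
        self._oldest = None         # timestamp of the oldest buffered reading

    def add(self, reading):
        if not self._buffer:
            self._oldest = self.clock()
        self._buffer.append(reading)
        if (len(self._buffer) >= self.batch_size
                or self.clock() - self._oldest >= self.max_age_s):
            self.flush()

    def flush(self):
        """Send everything currently buffered, e.g. on shutdown."""
        if self._buffer:
            self.transmit(list(self._buffer))
            self._buffer.clear()
```

The age threshold bounds the latency a reading can spend in the buffer, while the batch size bounds memory use and per-message overhead; tuning this trade-off is exactly where research on intermediate storage and transmission comes in.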