Parallel anonymization of large data sets

Supervisor / Contact person

  • Hannes Grunert
  • Andreas Heuer

    Type of work

    • Conception
    • Prototyping

    Prior skills

    • Databases I
    • Database application programming
    • (Databases III)

    Description

    In smart environments, various sensors record the activities of users. Based on this data, the user's intentions are recognized, enabling smart systems such as the Smart Appliance Lab at the University of Rostock to perform actions autonomously.

    The data recorded in this way usually relate to a person, either directly or at least indirectly. Under the right to informational self-determination, this data may only be used for purposes to which the user has previously consented. Access to the data can be restricted through view concepts, fine-grained access rights, and the application of data protection algorithms (Privacy by Design).

    Data is often distributed across multiple data sources. When information has to be integrated, the sources are usually linked via a natural join (sources with different structure) or a union (sources with the same structure). The literature describes several methods that ensure the anonymity of the linked data before it is integrated on a central server.
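
    The two ways of linking sources can be illustrated with a minimal Java sketch. It is not taken from the cited literature: the class name IntegrationSketch, the representation of rows as attribute maps, and the sample attributes (person, room, pulse) are assumptions made purely for illustration, and attribute values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: rows are maps from attribute name to value.
final class IntegrationSketch {

    /** Union: both sources have the same structure, so their rows are concatenated. */
    static List<Map<String, String>> union(List<Map<String, String>> a,
                                           List<Map<String, String>> b) {
        List<Map<String, String>> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    /** Natural join: two rows are combined if they agree on every attribute they share. */
    static List<Map<String, String>> naturalJoin(List<Map<String, String>> a,
                                                 List<Map<String, String>> b) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> left : a) {
            for (Map<String, String> right : b) {
                boolean match = true;
                for (Map.Entry<String, String> e : left.entrySet()) {
                    String other = right.get(e.getKey());
                    // Attributes missing on the right side are not shared and do not block the match.
                    if (other != null && !other.equals(e.getValue())) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    Map<String, String> row = new LinkedHashMap<>(left);
                    row.putAll(right);
                    result.add(row);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data: one source knows locations, the other knows vital signs.
        List<Map<String, String>> locations = List.of(Map.of("person", "p1", "room", "kitchen"));
        List<Map<String, String>> vitals = List.of(Map.of("person", "p1", "pulse", "72"));
        System.out.println(naturalJoin(locations, vitals));
        System.out.println(union(locations, locations));
    }
}
```

    The join result contains attributes from both sources in a single row. This is why a source that looks harmless on its own can become identifying once it is linked, and why the anonymity of the linked data has to be considered before integration.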

    The aim of this thesis is to investigate how these methods can be adapted to parallelize anonymization on a large multi-node system. In particular, it should be tested how data can be distributed across the nodes in advance, before an anonymization strategy has been selected. In addition, the optimization capabilities of the database system are to be exploited.
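
    One possible reading of "distributing data in advance" is sketched below in Java. The sketch rests on several assumptions that are not part of the topic description: records are reduced to a string of quasi-identifier values, they are spread round-robin over the nodes, and k-anonymity serves only as an example criterion that is checked after the partial results have been merged. The class and method names are hypothetical, and the nodes are simulated with a parallel stream rather than a real database cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration: each record is its quasi-identifier string, e.g. "30-39|Rostock".
final class ParallelAnonymizationSketch {

    /** Distributes records round-robin, i.e. without looking at their content or at any strategy. */
    static List<List<String>> partition(List<String> records, int nodeCount) {
        List<List<String>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            nodes.get(i % nodeCount).add(records.get(i));
        }
        return nodes;
    }

    /** Work done on a single node: count how often each quasi-identifier combination occurs. */
    static Map<String, Long> localCounts(List<String> node) {
        return node.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    /** Runs the local counting in parallel and merges the partial counts into global group sizes. */
    static Map<String, Long> groupSizes(List<List<String>> nodes) {
        List<Map<String, Long>> partials = nodes.parallelStream()
                .map(ParallelAnonymizationSketch::localCounts)
                .collect(Collectors.toList());
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((group, count) -> merged.merge(group, count, Long::sum));
        }
        return merged;
    }

    /** Example criterion (an assumption): every group must contain at least k records. */
    static boolean isKAnonymous(Map<String, Long> sizes, int k) {
        return sizes.values().stream().allMatch(n -> n >= k);
    }

    public static void main(String[] args) {
        List<String> records = List.of("30-39|Rostock", "30-39|Rostock", "40-49|Berlin");
        Map<String, Long> sizes = groupSizes(partition(records, 2));
        System.out.println(sizes + " 2-anonymous: " + isKAnonymous(sizes, 2));
    }
}
```

    Because the per-node step only counts groups, the choice of criterion and of the repair step (for example generalization or suppression) is deferred until after the merge. Whether such a separation carries over to the methods from the literature is exactly what the thesis is meant to examine.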

    Work steps

    • Getting familiar with the area:
      • Foundations of privacy
      • Anonymization
      • Distributed and parallel data processing
    • Literature review:
      • Distributed data anonymization
      • Optimization
        • In general
        • PostgreSQL
        • Postgres-XL
    • Conception
      • Selection of a suitable distributed anonymization method
      • Adaptation of the method to parallelization
      • General optimization of the method
    • Prototyping
      • Implementation and modification of the chosen method
      • Optimization for Postgres-XL
    • Development of a test scenario
    • Test of the algorithm in the existing system environment

    Technologies

    • Java
    • SQL

    Literature

    • John, Bodo (2016) Vergleichende Analyse von Datenschutzalgorithmen und -konzepten. Bachelor's thesis, University of Rostock.
    • Grunert, Hannes and Heuer, Andreas (2016) Datenschutz im PArADISE. Datenbank-Spektrum, 16 (2). pp. 107-117. ISSN 1618-2162
    • Nielsen, Jan Hendrik, Janusz, Daniel, Taeschner, Jochen and Freytag, Johann-Christoph (2015) D2Pt: Privacy-Aware Multiparty Data Publication. BTW 2015. pp. 105-124.
    • Further literature will be announced at the start of the thesis.