Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
I am Antonio, and today with Francesco we are going to present "GDPR and beyond: demystifying data governance challenges." Francesco and I are data architects at Agile Lab, an Italian consulting firm specializing in large-scale data management. Agile Lab is an effective and dynamic company structured around a holacracy-inspired model with multiple business units. Through these business units, we are lucky enough to have several Fortune 500 companies as our customers.
            
            
            
OK, let's take a look at what we have on the agenda. Today we will talk about data privacy and GDPR, why this European regulation is so important, and why we should design systems to be compliant with it. We will have an overview of the different techniques that we can leverage to be compliant and secure, like anonymization and encryption. Then we will compare these techniques, focusing our attention on the pros and cons of each one. Finally, we will present a viable data sharing strategy for real use cases.
            
            
            
Data is the new oil, am I right? Data is the fuel for innovation. Machine learning, artificial intelligence and analytics simply wouldn't be possible without data. Just as oil powers engines, data fuels algorithms, enabling machines to learn and improve over time; this is the case for machine learning. Also, data is the lifeblood of AI, driving smart ecosystems that can mimic human intelligence. Finally, data analytics extracts valuable insights, just as refining oil produces useful products. Yeah, data is the new oil, and the analogy holds also for the bad parts of it.
            
            
            
For example, data breaches are very similar to oil spills: they cause extensive damage and leak sensitive information that erodes the trust of our customers. We can also have privacy violations, which are very similar to the pollution that can harm ecosystems: a privacy violation disrupts the digital environment and harms individuals. Then we have regulatory fines, which are comparable to environmental fines, meaning that when your data handling is not compliant with data protection regulations, you will get very large fines. Finally, you will suffer reputational damage, because like environmental damage, data breaches can severely impact brand trust and loyalty.
            
            
            
So let's reflect on the reality of data breaches over the past 20 years. This visualization showcases the top 50 biggest data breaches: from 2004 to 2020, 117 billion records were compromised. As we can see, the severity of breaches has been escalating over the years, particularly from 2016 onwards, with the web sector being the hardest hit, accounting for nearly 10 billion records lost. Significant breaches span various sectors including finance, government and tech, highlighting the widespread vulnerability. Notable breaches include Yahoo in 2013, losing 3 billion records, and Facebook in 2019, with 530 million records exposed. This growing trend underscores the critical need for robust data security measures. These breaches not only compromise personal information, but also erode public trust and pose severe financial risks. As we move forward, it is imperative to prioritize data protection and adopt stringent security protocols to safeguard these digital assets.
            
            
            
That's why the European Union came up with GDPR. GDPR stands for General Data Protection Regulation, and it is a regulation that requires businesses around the world to protect the personal data and privacy of European Union citizens. In force since 25 May 2018, GDPR puts in place certain restrictions on the collection, use and retention of personal data. Personal data is defined as information relating to an identified or identifiable natural person. This includes data such as name, email and phone number, in addition to data that may be less obvious, like IP addresses, GPS location, phone IDs and more.
            
            
            
GDPR is based on some key principles, and we will briefly run through them. First of all is lawfulness: personal data must be processed legally, adhering to established laws such as GDPR. Then we have fairness: data processing should be fair, protecting vital interests, performing tasks carried out in the public interest, or pursuing the legitimate interests of the data controller or a third party. Then we have transparency, because organizations must be open about their data processing activities; they should provide clear, accessible and understandable information to individuals about how their data is being used, who is collecting it and why. This is why you now have the cookie banner on every European website. Then we have purpose limitation: personal data must be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.

Then you have the data minimization principle, which means organizations should collect only the personal data that is necessary to achieve the specified purpose, and we will have more on this later. Then we have accuracy: personal data must be accurate and kept up to date, and inaccurate data should be corrected or deleted. Storage limitation means that personal data should not be kept longer than necessary for the purpose for which it was collected. An example of storage limitation is the right to be forgotten: if I ask to be forgotten by some company, they should delete all data about me. Then we have integrity and confidentiality: organizations must ensure the security of personal data, protecting it against unauthorized or unlawful processing, accidental loss, destruction or damage, like data breaches. And then you have accountability: data controllers are responsible for complying with GDPR principles and must be able to demonstrate their compliance.
            
            
            
In order to do that, GDPR creates some requirements around the regulation itself. It requires that companies run data protection impact assessments, which are tools used to identify and mitigate risks associated with data processing activities. They must follow data breach notification rules, so they should report data breaches in a timely manner to both authorities and individuals. Then they need to appoint a data protection officer, someone inside the company who ensures there is a person responsible for overseeing the data protection strategy and compliance with GDPR. Obviously, they need to implement data protection by design and by default, so every data initiative in the company should comply with this regulation without any need to integrate it afterwards. And then they have some record-keeping obligations, so companies need to maintain detailed records of their data processing activities for accountability and compliance purposes.
            
            
            
Obviously, GDPR had huge implications for data governance. But what is data governance? Data governance is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. GDPR had several implications for the internal data governance strategies of companies and enterprises: they had to enhance data security and privacy controls to be compliant; they needed to improve data quality and accuracy because of the principles we've seen before; they needed to increase accountability and transparency in how data was used; and GDPR put pressure on them and created the necessity for regular audits and assessments around data.
            
            
            
Today we will focus mostly on the data minimization principle, which in our opinion is one of the most important ones in GDPR. The data minimization principle is foundational to responsible data handling and privacy protection. Under GDPR, it mandates that organizations should only collect and process the personal data that is absolutely necessary for their specified purposes. Imagine you're building a house: you wouldn't order extra bricks that you'll never use, as it would be wasteful and clutter your space. Similarly, in data processing, we should avoid collecting excess data. By adhering to this principle, we not only streamline our data management practices, but also enhance security and compliance. Collecting minimal data reduces the risk of breaches and misuse, ensuring we respect our customers' privacy and build their trust. This also simplifies data management and can lead to more efficient processes. So let's commit to collecting only what we need, protecting privacy and fostering a culture of data responsibility.
            
            
            
So, does this meme look familiar? We know that people working on AI, machine learning and analytics need real data to do their job, but this clashes with the GDPR regulation 99% of the time. I will now leave the stage to Francesco, who will show you how we can build a compliant data sharing strategy and still allow data practitioners to be effective. Here we go, Francesco.

This meme will look familiar.
            
            
            
This is a quite common scenario, since most ML engineers and data scientists need to prototype their models every day. In the majority of cases they start using development data, but the risk is that when moving to production the performance of the model is low, so they fall back to using sampled data from production. This also increases the risk of sensitive data leakage and exposure in lower environments. In the next slides, let's see how we can untangle this issue and which techniques could help our case. The first thing we are going to talk about is anonymization.
            
            
            
Data anonymization is one of the techniques that organizations can use in order to adhere to strict data privacy regulations that require the security of personally identifiable information (PII), such as health records, contact information and financial details. It differs from pseudonymization, since anonymization is not a reversible operation: pseudonymization simply reduces the correlation of a dataset with the original identity of a data subject, and it is therefore a useful but not an absolute security measure. Now let's take a look at the most common techniques under the umbrella of anonymization.
            
            
            
The first one we are going to talk about is generalization. Generalization usually changes the scale of a dataset's attributes, or their order of magnitude. As you can see, we have a simple table with several columns like name, age, birth date, state and disease. In example one, a field that contains numbers, like age, can be generalized by expressing it as an interval: Mark's age has been put into the interval between 20 and 30. In example two, a field that contains dates, like 1993-10-19, can be generalized by using only the year, 1993. This is the very first method; a small sketch of both generalizations follows.
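
As a hedged illustration (not the presenters' code; column names and values are made up), here is how these two generalizations could look in pandas:

```python
# Minimal generalization sketch (illustrative data, pandas assumed available).
import pandas as pd

df = pd.DataFrame({
    "name": ["Mark", "John"],
    "age": [25, 34],
    "birth_date": pd.to_datetime(["1993-10-19", "1985-02-07"]),  # John's date is invented
})

# Example 1: generalize a numeric field into an interval (e.g. 20-30).
df["age_range"] = pd.cut(df["age"], bins=[0, 20, 30, 40, 120],
                         labels=["0-20", "20-30", "30-40", "40+"])

# Example 2: generalize a date by keeping only the year.
df["birth_year"] = df["birth_date"].dt.year

# Drop the precise attributes once the generalized ones are in place.
print(df.drop(columns=["age", "birth_date"]))
```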
            
            
            
The second one is randomization. Randomization involves changing attributes in a dataset so that they are less precise while maintaining their overall distribution. Under the umbrella of randomization we have techniques such as noise addition and shuffling. Noise addition injects some modifications into the dataset in order to make it less accurate, for example increasing or decreasing the age of a person, as we can see in the example on the slide, while shuffling simply swaps the ages of Mark and John. This is the second method; a small sketch follows.
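
A hedged sketch of both randomization techniques, again on made-up data (numpy and pandas assumed available):

```python
# Minimal randomization sketch: noise addition and shuffling (illustrative data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
df = pd.DataFrame({"name": ["Mark", "John"], "age": [25, 34]})

# Noise addition: perturb ages while roughly keeping the overall distribution.
df["age_noisy"] = df["age"] + rng.integers(-3, 4, size=len(df))

# Shuffling: permute the ages so they no longer line up with the right person.
df["age_shuffled"] = rng.permutation(df["age"].values)
print(df)
```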
            
            
            
The third method, and the most common one, is suppression, another useful technique and, in my opinion, the most used in the anonymization space. Suppression is the process of removing an attribute's value entirely from a dataset, while redaction removes only part of the attribute's value. With such techniques you can run into multiple issues. Warning number one: if the data is collected for the purpose of determining at which age individuals are most likely to develop a specific illness, suppressing the age data would make the data itself useless. Warning number two: the data type may change, for example from integer to string, and this will break the contract for all the consumers of that data asset. The small sketch below shows both operations and the type change.
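
A hedged illustration of suppression and redaction on the same toy table (not the presenters' code); note how suppressing the age turns an integer column into strings:

```python
# Minimal suppression/redaction sketch (illustrative data).
import pandas as pd

df = pd.DataFrame({"name": ["Mark", "John"],
                   "age": [25, 34],
                   "state": ["NY", "CA"]})

# Suppression: remove the attribute value entirely.
df["age"] = "***"                     # dtype changes from int64 to object

# Redaction: remove only part of the attribute value ("Mark" -> "M.").
df["name"] = df["name"].str[0] + "."

print(df)
print(df.dtypes)                      # the broken data contract is visible here
```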
            
            
            
In this slide, as you can see, we have a brief comparison of the methods explained previously. For each strategy we evaluated three main factors, secrecy, privacy and utility, and for each strategy and factor we assigned a rating ranging from poor to best. As you can see, every method has its weaknesses, so there is no evidence of a superior technique that addresses all the factors at once, in this case secrecy, privacy and utility. What we can say is that it depends a lot on the use case.
            
            
            
But now let's take a look at the encryption methods, in order to understand whether they could help in the context of compliance. The first method we are going to talk about is format-preserving encryption. Format-preserving encryption, or FPE, is a symmetric encryption algorithm which preserves the format of the information while it is being encrypted. FPE is weaker than the Advanced Encryption Standard (AES), but format-preserving encryption can preserve the length of the data as well as its format. FPE is covered by a NIST standard, and there are three different modes of operation: FF1, FF2 and FF3. FPE works very well with existing applications as well as new applications: if an application needs data of a certain length and format, then FPE is the way to go. In order to operate with this algorithm, you use a secret key and a tweak. One implementation is provided by Bouncy Castle, and others are available on Google Cloud and in various vendor toolkits.
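
As an example, here is a minimal sketch assuming the pyffx Python package, which implements a simplified FFX-style construction rather than the NIST FF1/FF3 modes, but still shows the format-preserving property; the key and values are illustrative:

```python
# Minimal format-preserving encryption sketch (assumes the pyffx package).
import pyffx

key = b"secret-key"  # illustrative only; real keys belong in a KMS

# A 16-digit card number encrypts to another 16-digit number.
card = pyffx.Integer(key, length=16)
ciphertext = card.encrypt(4111111111111111)
assert card.decrypt(ciphertext) == 4111111111111111

# A lowercase string encrypts to a string of the same length and alphabet.
name = pyffx.String(key, alphabet="abcdefghijklmnopqrstuvwxyz", length=4)
print(ciphertext, name.encrypt("mark"))
```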
            
            
            
Now that we have seen the first encryption method, let's take a look at another one: homomorphic encryption. Homomorphic encryption provides the ability to compute on data while the data is encrypted. It sounds like magic, doesn't it? There are three different modes in this case. Partially homomorphic encryption allows a single type of mathematical function to be used, for example addition or multiplication. With somewhat homomorphic encryption, some functions can be performed only a fixed number of times or up to a certain level of complexity. And finally we have fully homomorphic encryption, which allows all mathematical functions to be performed an unlimited number of times, up to any level of complexity, without requiring the decryption of the data. A hedged sketch of the partially homomorphic case follows.
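
For instance, here is a minimal sketch of partially homomorphic encryption assuming the python-paillier ("phe") package; Paillier is additively homomorphic, so the untrusted side can sum ciphertexts without ever seeing the plaintext values:

```python
# Minimal partially homomorphic encryption sketch (assumes python-paillier / "phe").
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Untrusted side (e.g. the cloud): sees only ciphertexts and the public key.
salaries = [42_000, 55_000, 61_000]
encrypted = [public_key.encrypt(s) for s in salaries]
encrypted_total = sum(encrypted[1:], encrypted[0])   # addition on ciphertexts

# Trusted side: only the private-key owner can decrypt the aggregate.
assert private_key.decrypt(encrypted_total) == sum(salaries)
```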
            
            
            
So suppose you want to offload some sensitive data to the cloud. In the picture, on the left you have the traditional approach: you encrypt the files before moving them to the cloud, for example with a standard algorithm like AES. Then, if you want to perform some transformation on these files, you have to decrypt them, apply the transformation and then encrypt them again. This exposes data to risk and also introduces complex operations. On the right side, instead, you use homomorphic encryption: once the data is on the cloud you can do computations directly on the ciphertext, and after decryption you obtain the same result as applying the function to the plaintext data. Unfortunately, it requires significant computational power to perform the intensive calculations, making this kind of strategy very slow and resource intensive. In addition to performance concerns, implementations of this kind of algorithm can be very challenging, involving highly complex techniques.
            
            
            
Is that all? No, we also have other strategies and methods to put on the table. One of these is tokenization. Tokenization involves substituting sensitive data, like a credit card number, with non-sensitive tokens, which are mapped back to the original values stored securely in a separate database called a token vault. A small sketch follows.
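
A hedged, minimal tokenization sketch (an in-memory dict stands in for the secured token vault; names and values are illustrative):

```python
# Minimal tokenization sketch: an in-memory dict stands in for the token vault.
import secrets

token_vault = {}      # token -> original value (keep in a secured database)
reverse_index = {}    # original value -> token (so repeated values reuse tokens)

def tokenize(value: str) -> str:
    if value in reverse_index:
        return reverse_index[value]
    token = secrets.token_hex(8)
    token_vault[token] = value
    reverse_index[value] = token
    return token

def detokenize(token: str) -> str:
    return token_vault[token]

card_token = tokenize("4111-1111-1111-1111")
print(card_token, "->", detokenize(card_token))
```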
            
            
            
Synthetic data is artificial data generated with the purpose of preserving privacy, testing systems or creating training data for machine learning algorithms. Synthetic data generation is critical and very complicated for two main reasons: quality and secrecy. Synthetic data that can be reverse-engineered to identify real data would not be useful in a privacy context. Faker is a Python package that generates fake data for you. There is also Mockaroo, another representative of the mock-data ecosystem, which allows you to quickly and easily download large amounts of randomly generated test data based on the specifications that you define. A minimal Faker sketch follows.
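
For example, a minimal sketch with Faker (the schema and row count are made up for illustration):

```python
# Minimal synthetic-data sketch using the Faker package.
from faker import Faker

Faker.seed(0)          # reproducible fake data
fake = Faker()

rows = [
    {
        "name": fake.name(),
        "email": fake.email(),
        "birth_date": fake.date_of_birth(minimum_age=18, maximum_age=90),
        "state": fake.state(),
    }
    for _ in range(5)
]
for row in rows:
    print(row)
```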
            
            
            
That's all for this part. Now let's take a look and summarize what we have learned in the previous slides. The first thing you could do in order to share data for machine learning and analysis purposes in lower environments is to use sampled data, probably coming from the production environment.
            
            
            
The first pro is realism: sampled production data provides a realistic representation of actual data, helping developers and testers identify issues that may not be evident with synthetic or mock data. You get improved testing, since using real data allows for more comprehensive and accurate testing of functionality, data integrity, performance and scalability. Then you have stakeholder confidence: using real data increases stakeholder confidence in the testing process and in the reliability of the development lifecycle. Among the cons, you have a lot of issues with privacy and compliance: even sampled data can contain sensitive information, raising privacy concerns and potential non-compliance with data protection regulations such as GDPR and others like HIPAA. Security risk: using production data in lower environments like development or QA increases the risk of data breaches and unauthorized access. Data freshness is another issue: sampled data might become outdated quickly, leading to scenarios where test environments are not completely aligned with the current production environment.
            
            
            
Now let's take a look at synthetic data. Privacy and security are for sure pros: synthetic data can be generated without any real-world personal data, significantly reducing privacy concerns and the risk of data leakage. Availability: synthetic data can be created on demand. Efficiency: generating synthetic data can be more cost effective than collecting and labeling large volumes of real-world data. On the contrary, there is a lack of realism: synthetic data may not capture all the complexities and nuances of real-world data, and we can also have problems with overfitting. There will also be validation challenges: validating the accuracy and reliability of synthetic data can be very challenging, as it requires ensuring that the synthetic data closely mimics real-world data distributions. There is also a concern about complexity: creating high-quality synthetic data requires sophisticated techniques and domain knowledge, making the initial setup complex and resource intensive.
            
            
            
Now let's talk about encrypted data sources, focusing our attention on standard algorithms. Privacy protection: anonymizing data with encryption reduces the risk of exposing personal information. Regulatory compliance: using anonymized data helps organizations comply with GDPR, CCPA and so on. We can easily enable data sharing mechanisms, and we can share this data with a little more freedom within departments, across the organization, or also with external partners. We also have risk mitigation, because we reduce the potential for data breaches, as the data no longer contains personally identifiable information.
            
            
            
On the other hand, we have some complexity: working with encrypted data, especially in the case of AES algorithms, can complicate the development phase and also testing activities, because the data loses any kind of meaning and only keeps its distribution. Key management challenges: effective key management is crucial and can be complex, especially in non-production environments where multiple teams and individuals may need access to encryption keys. Limited test accuracy: testing with encrypted data may not reflect true application behavior if the decryption process introduces delays or errors that wouldn't occur in production. For anonymized data, as we have seen before, there is complexity: the anonymization process can be very challenging, requiring sophisticated techniques and ongoing management to ensure data remains anonymous. There is also the problem of re-identification of the data subject: there is a risk that anonymized data can be reverted, especially if combined with other datasets, for example through linkage attacks and so forth. And in some cases we lose utility, as we have seen for suppression.
            
            
            
Now that we have summarized all the possible methods and techniques, at least the most important ones in the compliance context, let's take a look at the next slide, where we are going to present a possible strategy for sharing data in a reasonably secure way in lower environments. The practice I'm going to show you combines some of the methods that we have seen before in the context of a data lake.
            
            
            
So before moving forward, let's have a little bit of context. We are in the cloud and we have a data lake; in this specific case, we leverage the medallion architecture for our storage layer. Most of you already know what a medallion architecture is: it is also known as a multi-hop architecture, and data at each stage gets richer, increasing its intrinsic value. At each stage, PII can be present, and usually machine learning engineers and data scientists operate at the silver and gold layers.
            
            
            
In this slide, let's see what we can do to ship data to lower environments and enable safe data consumption. This is a recipe for a cloud-based scenario, for example AWS, but it can easily be replicated on other cloud vendors. You have a production account at the top and, for simplification purposes, a non-production account at the bottom. In each account we have the usual medallion architecture layers that we have seen before.
            
            
            
Step one requires that data teams update their jobs to anonymize data: the encryption process becomes a mandatory step in the data lifecycle made of data ingestion, data normalization and delivery. In step two, we open a read-only cross-account policy from production to non-production. The lower environment never writes to prod; it is only enabled for read operations. A hedged sketch of such a cross-account policy is shown below.
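
For example, a minimal boto3 sketch of a read-only cross-account bucket policy on a production data lake bucket; the bucket name and account ID are placeholders, not the setup from the talk:

```python
# Hedged sketch: read-only cross-account access to a production bucket (AWS).
import json
import boto3

PROD_BUCKET = "prod-datalake-bronze"     # hypothetical bucket name
NONPROD_ACCOUNT_ID = "111122223333"      # hypothetical non-prod account ID

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowNonProdReadOnly",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{NONPROD_ACCOUNT_ID}:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],   # read-only, no writes
        "Resource": [
            f"arn:aws:s3:::{PROD_BUCKET}",
            f"arn:aws:s3:::{PROD_BUCKET}/*",
        ],
    }],
}

# No kms:Decrypt is granted anywhere: the encryption key never leaves prod.
boto3.client("s3").put_bucket_policy(Bucket=PROD_BUCKET,
                                     Policy=json.dumps(policy))
```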
            
            
            
The encryption key is never shared with the lower environments. In this way, the user personas that work in the lower environment are still enabled to do their job. Data engineers can prototype new silver datasets reading from the encrypted bronze layer. Analysts can model new schemas and generate new anonymized reports. Data scientists can prototype their models on quite realistic datasets, since only the sensitive columns will be encrypted. Let's take a look at the benefits of this kind of practice and strategy.
            
            
            
Analysts say: format-preserving encryption guarantees referential integrity and no schema changes across different datasets, and it allows business logic to be reused; their joins still work after the encryption. Data engineers are allowed to read only the encrypted layers, thanks to an ad hoc IAM policy, and this simplifies data movement and the orchestration process between environments. Machine learning engineers can prototype and train their models on quite realistic data in a safe layer. DevOps practices stay in place, since deployments of new artifacts and models can follow the standard CI/CD flow. SecOps say: the minimization principle is respected in lower environments, since most of the time you only have encrypted information.
            
            
            
Now let's take a look at one additional feature and talk about the right to be forgotten within the GDPR ecosystem. In this slide we are going to present the crypto-shredding technique. Crypto shredding is the practice of deleting data by deleting or overwriting the encryption keys. It requires that the data has been encrypted beforehand: deleting the key will automatically, logically delete the record and all its existing copies, since the encrypted information is not reversible anymore. This approach is very useful when you have multiple copies of data, or multiple layers of data like in the medallion architecture. A hedged sketch follows.
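
As an illustration (per-subject keys in a plain dict here; in practice they would live in a KMS), this sketch uses the cryptography package's Fernet to show how shredding a key logically deletes every encrypted copy:

```python
# Minimal crypto-shredding sketch using cryptography's Fernet.
from cryptography.fernet import Fernet

key_store = {"user-42": Fernet.generate_key()}   # one key per data subject

def encrypt_record(subject_id: str, payload: bytes) -> bytes:
    return Fernet(key_store[subject_id]).encrypt(payload)

def decrypt_record(subject_id: str, token: bytes) -> bytes:
    return Fernet(key_store[subject_id]).decrypt(token)

record = encrypt_record("user-42", b"mark,1993-10-19,NY")

# Right to be forgotten: shred the key and every stored copy becomes unreadable.
del key_store["user-42"]
# decrypt_record("user-42", record)  # -> KeyError: no key, no data
```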
            
            
            
If you are in the early stages of creating your data lake and building the foundations, you can combine crypto shredding and format-preserving encryption in order to enable a very interesting scenario that covers both the secure data sharing practice we explained before and the deletion problem across multiple layers and environments.
            
            
            
It goes without saying that all these techniques and strategies only work with a strong data governance practice in place: knowing where PII is stored, and its lineage, is fundamental. But this is another story. Thank you everybody, and let's get in touch for any questions and answers.