With the new General Data Protection Regulation (GDPR) taking effect in May 2018, companies based in Europe, or holding personal data of people residing in Europe, are struggling to locate their most valuable assets: their sensitive data. The new regulation requires organizations to prevent any breach of personally identifiable information (PII) and to delete an individual’s data when that individual requests it. After removing all of the PII, the company must be able to prove to that person and to the authorities that the data has been entirely removed.
Most companies today understand their obligation to demonstrate accountability and compliance, and have therefore started preparing for the new regulation. There is so much information available about ways to protect sensitive data that it is easy to become overwhelmed and start pursuing several directions at once, hoping that one of them hits the target. If you plan your data governance ahead, you can still meet the deadline and avoid penalties.
Some organizations, mostly banks, insurance companies and manufacturers, possess an enormous amount of data. They produce data at an accelerating pace by changing, saving and sharing files, creating terabytes and even petabytes of data. The difficulty for these types of firms is finding their sensitive data among millions of files, in both structured and unstructured data, which is unfortunately, in most cases, an impossible mission.
The following personal identification data is classified as PII under the definition used by the National Institute of Standards and Technology (NIST):
- Full name
- Home address
- Email address
- National identification number
- Passport number
- IP address (when linked to an individual; not considered PII by itself in the US)
- Vehicle registration plate number
- Driver’s license number
- Face, fingerprints, or handwriting
- Credit card numbers
- Digital identity
- Date of birth
- Genetic information
- Telephone number
- Login name, screen name, nickname, or handle
Most organizations that possess PII of European citizens must detect and protect against PII data breaches, and must be able to delete PII from the company’s data (often referred to as the right to be forgotten). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016, published in the Official Journal of the European Union, states:
“The supervisory authorities should monitor the application of the provisions pursuant to this regulation and contribute to its consistent application throughout the Union, in order to protect natural persons in relation to the processing of their personal data and to facilitate the free flow of personal data within the internal market.”
To allow companies that possess PII of European citizens to facilitate a free flow of PII within the European market, they need to be able to identify their data and categorize it according to the sensitivity levels defined in their organizational policy. The regulation describes the flow of data and the market’s challenges as follows:
“Rapid technological developments and globalization have brought new challenges for the protection of personal data. The scale of the collection and sharing of personal data has increased significantly. Technology allows both private companies and public authorities to make use of personal data on an unprecedented scale in order to pursue their activities. Natural persons increasingly make personal information available publicly and globally. Technology has transformed both the economy and social life, and should further facilitate the free flow of personal data within the Union and the transfer to third countries and international organizations, while ensuring a high level of the protection of personal data.”
Phase 1 – Data Detection
The first step is to create a data lineage that shows where PII is spread across the organization and helps decision makers detect specific types of data. The EU recommends obtaining automated technology that can handle large amounts of data by scanning it automatically. No matter how large your team is, this is not a project that can be handled manually when facing millions of files of different types hidden in various areas: in the cloud, in storage systems and on on-premises desktops.
The main concern for these organizations is that if they are unable to prevent data breaches, they will not be compliant with the GDPR and may face heavy penalties. They need to appoint specific employees who will be responsible for the entire process, such as a Data Protection Officer (DPO), who mainly handles the technological solutions; a Chief Information Governance Officer (CIGO), usually a lawyer, who is responsible for compliance; and/or a Compliance Risk Officer (CRO). This person needs to be able to control the entire process from end to end and to provide management and the authorities with complete transparency.
“The controller should give particular consideration to the nature of the personal data, the purpose and duration of the proposed processing operation or operations, as well as the situation in the country of origin, the third country and the country of final destination, and should provide suitable safeguards to protect fundamental rights and freedoms of natural persons with regard to the processing of their personal data.”
PII can be found in all types of files: not only in PDFs and text documents, but also in image documents (for example, a scanned check), a CAD/CAM file that can contain the intellectual property (IP) of a product, a confidential sketch, code, a binary file, and so on.
Common technologies today can extract data from files, which makes data hidden in text easy to find. In some organizations, however, such as manufacturers, most of the sensitive data may reside in image files and other non-text formats. These types of files cannot be accurately scanned by text-based tools, and without technology that can detect PII in file formats other than text, one can easily miss this important information and cause the organization substantial damage.
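To make the text-scanning part of this phase concrete, here is a minimal sketch in Python. It only covers plain text and only two of the PII types listed earlier (email addresses and credit card numbers). The patterns, the `find_pii` function and the result keys are illustrative assumptions, not part of any specific product; image and CAD/CAM files would need OCR or format-specific parsers, which are out of scope here.

```python
import re

# Illustrative patterns only (assumptions); real scanners need much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Scan plain text and return candidate PII grouped by type."""
    emails = EMAIL_RE.findall(text)
    # The Luhn check filters out digit runs that merely look like card numbers.
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"email": emails, "credit_card": cards}
```

Calling `find_pii` on the extracted text of each file gives a per-file list of candidate PII that the later phases can categorize and tag. The checksum step is a design choice to reduce false positives, since long digit sequences such as order numbers are common in business documents.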
Phase 2 – Data Categorization
This stage consists of data mining actions performed behind the scenes by an automated system. The DPO/controller or the information security decision maker needs to decide whether to track certain data, block it, or send alerts of a data breach. To perform these actions, they need to view the data in separate categories.
Categorizing structured and unstructured data requires full identification of the data while maintaining scalability: effectively scanning all databases without “boiling the ocean”. The DPO is also required to maintain data visibility across multiple sources and to quickly present all files related to a certain person according to specific entities such as name, date of birth, credit card number, social security number, telephone number, email address, etc.
In case of a data breach, the DPO shall report directly to the highest management level of the controller or the processor, or to the information security officer, who will be responsible for reporting the breach to the relevant authorities. Article 33 of the GDPR requires reporting the breach to the authorities within 72 hours.
Once the DPO identifies the data, the next step should be labeling/tagging the files according to the sensitivity levels defined by the organization. As part of meeting regulatory compliance, the organization’s files need to be accurately tagged so that they can be tracked on premises and even when shared outside the organization.
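The tagging step and the 72-hour reporting window can be sketched as follows. The sensitivity levels, label names and PII type keys are hypothetical examples of an organizational policy, and the deadline helper simply adds 72 hours to the moment the breach was detected; treat it as an illustration, not as legal guidance.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy (assumption): sensitivity level per detected PII type.
SENSITIVITY = {
    "email": 1,
    "telephone": 1,
    "date_of_birth": 2,
    "credit_card": 3,
    "national_id": 3,
}
# Hypothetical label names for each level.
LABELS = {0: "public", 1: "internal", 2: "confidential", 3: "restricted"}


def tag_file(detected_types):
    """Return the label of the most sensitive PII type found in a file.

    Unknown PII types are treated as most sensitive, to err on the safe side.
    A file with no detected PII gets the lowest label.
    """
    level = max((SENSITIVITY.get(t, 3) for t in detected_types), default=0)
    return LABELS[level]


def notification_deadline(detected_at: datetime) -> datetime:
    """Latest time to notify the authorities: 72 hours after detection."""
    return detected_at + timedelta(hours=72)
```

For example, `tag_file(["email", "credit_card"])` yields the most restrictive of the two applicable labels, so a file is always handled according to its most sensitive content.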
Phase 3 – Knowledge
Once the data is tagged, you can map personal information across networks and systems, both structured and unstructured, and track it easily. This allows organizations to protect their sensitive data and lets their end users use and share files safely, thus enhancing data loss prevention.
Another aspect to consider is protecting sensitive information from insider threats: employees who try to steal sensitive data such as credit card numbers or contact lists, or who manipulate the data for their own benefit. These actions are hard to detect in time without automated tracking.
These time-consuming tasks apply to most organizations, prompting them to search for efficient ways to gain insights from their enterprise data on which they can base their decisions. The ability to analyze intrinsic data patterns helps organizations get a better view of their enterprise data and point out specific threats.
Integrating encryption technology enables the controller to track and monitor data effectively. By implementing an internal physical segregation system, the controller can create data geo-fencing through personal data segregation definitions across geographies and domains, with reports on sharing violations whenever a rule is broken. Using this combination of technologies, the controller can enable employees to send messages securely across the organization, between the right departments, and outside the organization without being over-blocked.
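As a rough illustration of the geo-fencing idea, the sketch below checks sharing events against a rule table and reports the ones that break a rule. The `ShareEvent` type, the region codes and the `ALLOWED_REGIONS` table are all hypothetical; a real deployment would take these definitions from the organization's own segregation policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareEvent:
    """One file-sharing action (hypothetical record shape)."""
    file_name: str
    tag: str               # sensitivity label assigned during categorization
    recipient_region: str  # where the recipient is located


# Hypothetical rules (assumption): label -> regions the file may be shared into.
ALLOWED_REGIONS = {
    "restricted": {"EU"},
    "confidential": {"EU", "UK"},
}


def is_violation(event: ShareEvent) -> bool:
    """A share violates the geo-fence if its label has a rule and the
    recipient's region is not in the allowed set."""
    allowed = ALLOWED_REGIONS.get(event.tag)
    if allowed is None:  # labels without a rule are not geo-fenced
        return False
    return event.recipient_region not in allowed


def report_violations(events):
    """Return only the events that break a geo-fencing rule."""
    return [e for e in events if is_violation(e)]
```

Because labels without a rule pass through, only tagged sensitive files are ever blocked or reported, which is one way to avoid the over-blocking mentioned above.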
Phase 4 – Artificial Intelligence (AI)
After the data has been scanned, tagged and tracked, a higher value for the organization is the ability to automatically screen for outlier behavior involving sensitive data and trigger protection measures to prevent these events from evolving into a data breach incident. This advanced technology is known as “Artificial Intelligence” (AI). Here the AI function usually comprises a strong pattern recognition component and a learning mechanism that enable the machine to take these decisions, or at least to recommend a preferred course of action to the data protection officer. This intelligence is measured by its ability to get wiser from every scan, user input or change in data cartography. Eventually, the AI function builds the organization’s digital footprint, which becomes the essential layer between the raw data and the business flows around data protection, compliance and data management.
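A very simple stand-in for this kind of outlier screening is a z-score rule over a user's history of sensitive-file accesses. This is a plain statistical baseline chosen for illustration, not the learning system described above; the function name, the threshold of 3 standard deviations and the idea of counting daily accesses are all assumptions.

```python
from statistics import mean, pstdev


def is_outlier(history, today, threshold=3.0):
    """Flag today's access count if it is far above the user's own baseline.

    history:   past daily counts of sensitive-file accesses for one user
    today:     today's count
    threshold: how many standard deviations above the mean counts as unusual
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase is unusual.
        return today > mu
    return (today - mu) / sigma > threshold
```

An alert from such a rule could trigger one of the protection measures from the earlier phases, such as blocking a share or notifying the DPO. A learning system would replace the fixed threshold with a model that is updated after every scan and every piece of user feedback.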