This is part of a series of reflective blogs about my research into the risks associated with disclosure of personal information online.
In this study, we ask the question: ‘What risks are individuals exposed to when they disclose personal data online?’ This could be when they fill out a form, when they post on social media, or even when their online behaviour is tracked by digital advertising agents.
One of the early objectives of this research project is to define what we mean by ‘risk’ when we talk about online behaviour. Classically, risk may be defined in terms of the outcome of an event where there is a degree of uncertainty. Very often people talk about the probability of an event taking place and its impact, or the consequence of the event. Risk is normally seen in terms of threats, so we would not normally hear talk of the “risk of winning the National Lottery”.
An earlier investigation resulted in the development of a typology of risk (Haynes & Robinson 2015). This could provide a basis for identifying the risks or threats faced by individuals when they disclose personal data online. Is it possible to use this typology (or a modified version of this typology) to develop a better prediction of the probability of a risk event occurring? This might also allow for a quantification of the consequences or impact of a risk event.
Ongoing exploration of concepts of risk with experts from the engineering, health and safety, cybersecurity and insurance industries is intended to refine the concept of risk and to see how it might apply to individual online safety. For instance, the industrial safety sector talks about ‘hazards’, which are defined as threats that could lead to adverse outcomes. Applied to online disclosure, an unsecured social media service might be regarded as a hazard that only turns into a risk when a user places their personal data on the social network. The consequence of doing so might be that a hacker steals their identity and uses that identity to make purchases in the victim’s name. The outcome could affect organizations (the bank may have to cover the loss) or individuals (loss of money, adverse effect on credit rating, inconvenience).
A further complication is that risks are rarely based on single events. For instance, a data breach could lead to a number of consequences, each of which might itself be a risk with multiple outcomes. The problem then becomes: ‘What is the event that we are analysing?’ A government risk expert suggested that “Any sequence of events can be represented as a branch of a tree, with different branches representing different possible event sequences.” It might then be possible to investigate a particular branch of the tree. The investigation would focus on the most likely outcomes in the tree, which would allow the development of a model based on a simplified case.
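To make the tree idea concrete, here is a minimal Python sketch of an event tree for a data breach. Everything in it is an assumption for illustration: the Node class, the enumerate_paths helper, the branch labels and especially the probabilities are invented, not taken from any data. Each branch carries the conditional probability of that step given the step before it, so multiplying along a branch gives the probability of the whole sequence of events.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One event in the tree (hypothetical structure, for illustration only)."""
    label: str
    probability: float  # conditional probability given the parent event
    children: list = field(default_factory=list)


def enumerate_paths(node, prefix=(), joint=1.0):
    """Yield (sequence of labels, joint probability) for every leaf of the tree."""
    path = prefix + (node.label,)
    joint = joint * node.probability
    if not node.children:
        yield path, joint
    else:
        for child in node.children:
            yield from enumerate_paths(child, path, joint)


# Made-up numbers: the initiating event is taken as given (probability 1.0).
tree = Node("data breach", 1.0, [
    Node("credentials exposed", 0.3, [
        Node("identity theft", 0.2),
        Node("no misuse", 0.8),
    ]),
    Node("no credentials exposed", 0.7),
])

paths = dict(enumerate_paths(tree))

# The branch we might choose to study first is the most likely one.
most_likely = max(paths, key=paths.get)

for sequence, p in paths.items():
    print(" > ".join(sequence), round(p, 4))
```

Because the branches at each level are mutually exclusive and exhaustive, the probabilities of all the leaf sequences should add up to one, which is a useful sanity check on any tree of this kind.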
Using fault trees (to deduce the events that led to a fault) and event trees (to track the consequences of an event) can provide a powerful way of analysing a particular risk event. For instance, an undesirable event such as a data breach will have a number of causes and could lead to one of a number of possible consequences.
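A fault tree works in the opposite direction, from the undesirable event back to its causes. The sketch below is a simplified illustration rather than a worked model from this research: the causes, the gate structure and the probabilities are all hypothetical, and the arithmetic assumes that the basic events are independent of one another, which real causes often are not. An AND gate requires all of its inputs to occur, while an OR gate requires at least one.

```python
from math import prod


def and_gate(*probabilities):
    """Probability that all independent input events occur."""
    return prod(probabilities)


def or_gate(*probabilities):
    """Probability that at least one independent input event occurs."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical basic events with made-up probabilities.
p_phishing = 0.05         # user falls for a phishing message
p_weak_password = 0.10    # password can be guessed
p_no_second_factor = 0.50 # account has no two-factor authentication
p_unpatched_server = 0.02 # service is running vulnerable software

# Credentials are compromised if they are obtained by either route
# AND there is no second factor to stop them being used.
p_credentials_compromised = and_gate(
    or_gate(p_phishing, p_weak_password),
    p_no_second_factor,
)

# Top event: a breach happens via compromised credentials OR a server flaw.
p_data_breach = or_gate(p_credentials_compromised, p_unpatched_server)

print(round(p_data_breach, 5))
```

The top event of a fault tree like this one could then serve as the initiating event of an event tree, linking causes to consequences.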
Another approach is to consider the probability of a particular harm occurring given what is already known about the probability of related phenomena. Privacy calculus research depends on Bayesian analysis to estimate probabilities given what is known about users’ attitudes to risks and benefits of disclosing personal data online. In general terms: for two events A and B, the probability of A given that B is true is the probability of A multiplied by the probability of B given that A is true, divided by the probability of B. That is, P(A|B) = P(A) × P(B|A) / P(B).
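This gives us a way of updating our beliefs (about the probability of occurrence of event A) given that we have information about a related event B. P(A) is the prior probability and P(A|B) is the posterior probability. If A and B were truly independent, knowing B would tell us nothing new and the posterior would simply equal the prior.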
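As a small worked illustration of this updating, the Python sketch below computes a posterior from a prior. The function name, the choice of events and all of the numbers are assumptions made up for the example. Here A is ‘the user suffers identity theft’ and B is ‘the user’s details appear in a known data breach’; P(B) is obtained from the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' rule: P(A|B) = P(A) * P(B|A) / P(B).

    P(B) is expanded as P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return prior_a * p_b_given_a / p_b


# Made-up figures for illustration only.
prior = 0.01              # P(A): baseline chance of identity theft
p_breach_if_theft = 0.6   # P(B|A): victims whose details were in a breach
p_breach_if_none = 0.1    # P(B|not A): non-victims whose details were in a breach

updated = posterior(prior, p_breach_if_theft, p_breach_if_none)
print(round(updated, 4))

# If B is equally likely whether or not A holds, B carries no information
# about A and the posterior stays at the prior.
unchanged = posterior(prior, 0.1, 0.1)
print(round(unchanged, 4))
```

With these invented numbers, learning that a user’s details were in a breach raises the estimated probability of identity theft several-fold, although it remains small in absolute terms.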
The next stage in this research is to attempt to develop some models of risk in terms of fault trees and event trees. The intention is to see whether it is possible to create a simplified model of risk associated with online disclosure of personal data. This might then be tractable for treatment in a Bayesian network analysis and allow for comparison of predictions based on attitudinal surveys that predominate in the privacy calculus literature.
Haynes, D. & Robinson, L., 2015. Defining User Risk in Social Networking Services. Aslib Journal of Information Management, 67(1), pp.94–115.