- Fuzziness
- Shading
- Colour bands
- Transparency
- Resolution
- Smoothness
- Continuity

Further reading identified two additional methods:

- Lightness to signify uncertainty about the classification of a post code (Slingsby et al. 2011) or for projected values (Wong 2010, p.63)
- Sketchiness to signify the fact of uncertainty (such as future projections) rather than indicating the level of uncertainty (Wood et al. 2012)

There are several aspects of uncertainty, which may relate to the accuracy of the data, its precision, the degree of ambiguity, and the level of confidence in the data. Where these can be quantified, they can be represented visually. Aidan has suggested that of these measures: “some may be considered more intuitive (high uncertainty being more transparent, less light, more fuzzy-looking, etc)”. Uncertainty may be due to poor data, an inappropriate model, or a poor understanding of the phenomenon being represented.

There is already an established visual vocabulary, described by Bertin (2011, pp.42–43). He identifies eight graphic variables which form the starting point for any discussion of visualization:

- Size
- Value
- Texture
- Colour
- Orientation
- Shape
- Two planar dimensions (2 variables)

We can use some of these visual variables to represent uncertainty. When we talk about risk there are two considerations:

- What is the probability of a risk event occurring?
- What is the impact of the risk event (consequence)?

In exploring the first of these we face two further questions: ‘How do we characterize a risk event?’ and ‘Are there established categories or do we have to develop a typology of our own?’ (Haynes & Robinson 2015). For example, if someone is browsing online what is the probability that they will enter a malicious site? Malicious sites could be defined by whether or not they are on a published list of known malicious sites. This is probably most meaningful across a large population so that different variables can be taken into account. For instance: frequency of online searches, online duration, experience, attitude to risk, whether or not there is anti-virus software on the device, operating system etc. Different population groups could be examined: nationality, age group, gender identity, socioeconomic group, educational attainment etc.

Estimates of the levels of occurrence of a particular risk event extrapolated to a general population would have an upper and a lower limit. These upper and lower estimates are likely to be imprecise and it might be useful to signal this visually. For example, in PowerPoint it is possible to generate a chart with a blurred effect:

The lower range is shown in a darker shade to emphasize that it applies to both the lower and upper limits of the estimate. The upper range is in a lighter colour because it applies only to the upper limit. An alternative might be to use texturing to indicate different levels of certainty:
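Outside PowerPoint, the same shading-and-transparency idea can be sketched programmatically. The following Python/matplotlib fragment (with invented, illustrative figures) draws a darker band between the lower and upper estimates and a lighter, more transparent fringe above the upper estimate to suggest its imprecision:

```python
# Sketch: using shading and transparency to signal uncertainty in
# upper/lower estimates of risk-event occurrence (figures are invented).
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

years = [2016, 2017, 2018, 2019, 2020]
lower = [10, 12, 15, 17, 20]   # lower estimate (incidents per 1,000 users)
upper = [14, 18, 23, 28, 34]   # upper estimate

fig, ax = plt.subplots()
# Darker band: the range bounded by both the lower and upper estimates.
ax.fill_between(years, lower, upper, color="steelblue", alpha=0.8,
                label="between lower and upper estimates")
# Lighter, more transparent fringe: the imprecision of the upper limit.
ax.fill_between(years, upper, [u * 1.15 for u in upper],
                color="steelblue", alpha=0.3,
                label="imprecision of upper estimate")
ax.set_xlabel("Year")
ax.set_ylabel("Estimated incidents per 1,000 users")
ax.legend()
fig.savefig("uncertainty_band.png")
```

The `alpha` parameter gives the transparency effect directly; a hatch pattern passed to `fill_between` would give the textured alternative.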

There is clearly further work to be done on presentation of results where there is a degree of uncertainty. A possible line of development might be to test different presentations of uncertainty to see how that would affect perceptions of risk.

**References**

Bertin, J., 2011. *Semiology of Graphics: diagrams, networks, maps*, trans. W. J. Berg, Redlands, CA: Esri Press.

Haynes, D. & Robinson, L., 2015. Defining User Risk in Social Networking Services. *Aslib Journal of Information Management*, 67(1), pp.94–115.

Slingsby, A., Dykes, J. & Wood, J., 2011. Exploring Uncertainty in Geodemographics with Interactive Graphics. *IEEE Transactions on Visualization and Computer Graphics*, 17(12), pp.2545–2554.

Wong, D.M., 2010. *The Wall Street Journal Guide to Information Graphics: the dos and don’ts of presenting data, facts and figures*, New York: W.W.Norton.

Wood, J. et al., 2012. Sketchy Rendering for Information Visualization. *IEEE Transactions on Visualization and Computer Graphics*, 18(12), pp.2749–2758.


In this study, we ask the question: ‘What risks are individuals exposed to when they disclose personal data online?’ This could be when they fill out a form, when they make a posting on social media, or even just when their online behaviour is tracked by digital advertising agents.

One of the early objectives of this research project is to define what we mean by ‘risk’ when we talk about online behaviour. Classically, risk may be defined in terms of the outcome of an event where there is a degree of uncertainty. Very often people talk about the probability of an event taking place and its impact, or the consequence of the event. Risk is normally seen in terms of threats, so one would not normally hear talk of the “risk of winning the National Lottery”.

An earlier investigation resulted in the development of a typology of risk (Haynes & Robinson 2015). This could provide a basis for identifying the risks or threats faced by individuals when they disclose personal data online. Is it possible to use this typology (or a modified version of this typology) to develop a better prediction of the probability of a risk event occurring? This might also allow for a quantification of the consequences or impact of a risk event.

Ongoing exploration of concepts of risk with experts from the engineering, health and safety, cybersecurity and insurance industries is intended to refine the concept of risk and to see how it might apply to individual online safety. For instance, the industrial safety sector talks about ‘hazards’, which are defined as threats that could lead to adverse outcomes. Another example might be: unsecured social media being regarded as a hazard, but only turning into a risk when a user places their personal data on the social network. The consequence of doing so might be that a hacker steals their identity and uses it to make purchases in the victim’s name. The outcome could affect organizations (the bank may have to cover the loss) or individuals (loss of money, adverse effect on credit rating, inconvenience).

A further complication is that risks are rarely based on single events. For instance, a data breach could lead to a number of consequences, each of which might itself be a risk with multiple outcomes. The problem then becomes: ‘What is the event that we are analysing?’ A government risk expert suggested that “Any sequence of events can be represented as a branch of a tree, with different branches representing different possible event sequences.” It then might be possible to investigate a particular branch of the tree. The focus of the investigation would be on the most likely outcomes in the tree that would allow the development of a model based on a simplified case.

Using fault trees (to deduce the events that led to that fault) and event trees (to track the consequences of an event) can provide a powerful way of analysing a particular risk event. For instance, an undesirable event such as a data breach, will have a number of causes and could lead to one of a number of possible consequences.
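As a minimal illustration of the event-tree side of this analysis, the following Python sketch (with hypothetical events and probabilities) represents branch points as nested dictionaries and computes the probability of each full event sequence as the product of the branch probabilities along its path:

```python
# Hypothetical event tree for a data breach. Each entry maps an event to
# (conditional probability given its parent, subtree or None if a leaf).
event_tree = {
    "data breach": (1.0, {
        "credentials leaked": (0.4, {
            "identity theft": (0.25, None),
            "no further harm": (0.75, None),
        }),
        "no credentials leaked": (0.6, None),
    }),
}

def path_probabilities(tree, prefix=(), p=1.0):
    """Enumerate every root-to-leaf event sequence with its joint probability."""
    for event, (prob, subtree) in tree.items():
        path, joint = prefix + (event,), p * prob
        if subtree is None:
            yield path, joint
        else:
            yield from path_probabilities(subtree, path, joint)

for path, prob in path_probabilities(event_tree):
    print(" -> ".join(path), f"p={prob:.3f}")
```

The leaf probabilities sum to 1, so focusing on the most probable branches (here, the sequence ending in no leaked credentials) is how a simplified model could be carved out of the full tree.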

Another approach is to consider the probability of a particular harm occurring given what is already known about the probability of related phenomena. Privacy calculus research depends on Bayesian analysis to estimate probabilities given what is known about users’ attitudes to the risks and benefits of disclosing personal data online. In general terms: for two events A and B, the probability of A given that B is true can be calculated as the product of the probability of A and the probability of B given that A is true, divided by the probability of B.

P(A|B) = (P(A) × P(B|A)) / P(B)

This gives us a way of updating our beliefs (about the probability of occurrence of event A) given that we have information about a related event B. P(A) is the prior probability and P(A|B) is the posterior probability.
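As a worked numeric sketch (the figures are invented for illustration), suppose A is “a user enters a malicious site” and B is “the user has no anti-virus software”:

```python
def posterior(p_a, p_b_given_a, p_b):
    """Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

p_a = 0.02          # prior: 2% of users enter a malicious site
p_b_given_a = 0.50  # half of those users had no anti-virus software
p_b = 0.25          # 25% of all users have no anti-virus software

p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(f"P(A|B) = {p_a_given_b:.2f}")  # prior of 0.02 updates to 0.04
```

Observing that a user has no anti-virus software doubles the estimated probability that they enter a malicious site, from the prior of 2% to a posterior of 4%.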

The next stage in this research is to attempt to develop some models of risk in terms of fault trees and event trees. The intention is to see whether it is possible to create a simplified model of risk associated with online disclosure of personal data. This might then be tractable for treatment in a Bayesian network analysis and allow for comparison of predictions based on attitudinal surveys that predominate in the privacy calculus literature.



The University takes its ethical responsibilities very seriously, particularly in light of recent developments in the use of research data for undeclared purposes. During the review I received very useful feedback from senior colleagues on the proposed methodology and this gave me an opportunity to refine my approach.
