Discovering Privacy

Antti Vähä-Sipilä, F-Secure /
Twitter: @anttivs
Available online at

This presentation has extra notes for non-interactive viewing. You can access the notes by pressing the down arrow, or by clicking the arrow in the bottom right corner.

You are standing in the basement. There is an arrow pointing up. You hear the muffled noise of the audience upstairs. There is a sign here that reads:

Congratulations! You found the explanations track. Each slide has a 'down' arrow, and pressing it will show some more detail of what I will be talking about. This is intended for those who are reading the slides outside the live presentation. Press the 'up' arrow to get back on the main slides track.

Types of privacy

  1. Regulatory requirements
  2. Privacy Enhancing Technologies (PETs)
  3. (Security) controls for either

I'm differentiating between legal requirements, which essentially are things that have to be done in order to comply with the regulations, and "privacy enhancing" functionality, which increases privacy without being a legal requirement. Often there is a difference in how these are discovered: when a PET is wanted, it is usually some sort of user promise - perhaps a market differentiator.

Usually, legal requirements in particular start as a policy. Policies describe how something should work. However, policies are not program code, and the system that is eventually implemented is governed only by its program code. If the system does not enforce its privacy aspects itself, and they are left at the policy (or human behaviour) level, then from an engineering perspective there is no "privacy" built into the system.

The closer you get to the code, then, the more any privacy aspects start to look like security controls. Are you using random identifiers for your customers? This means you have to have a cryptographically strong random number generator and sufficiently large identifiers - pure security engineering. Do you have requirements for data deletion after the retention period? The deletion aspects are purely security engineering.
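As an illustration of how such a privacy requirement reduces to plain security engineering, here is a minimal sketch of generating random customer identifiers with a cryptographically strong RNG. The function name and the 128-bit identifier size are my own assumptions for the example, not anything prescribed above.

```python
# Sketch: random customer identifiers from a CSPRNG.
# The 128-bit size is an illustrative assumption.
import secrets

def new_customer_id() -> str:
    # token_hex(16) draws 16 random bytes (128 bits) from the
    # operating system's CSPRNG, returned as 32 hex digits.
    return secrets.token_hex(16)
```

Using Python's `secrets` module (rather than `random`, which is not cryptographically strong) is exactly the kind of control that falls out of a privacy requirement.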

Privacy Impact
Assessment (PIA)

  • Assessment happens before implementation
    • Timing & scheduling depends on your work management system
  • Do not confuse with a "security assessment" that often happens just before release
  • "[a] tool that you can use to identify and reduce the privacy risks" (ICO)

A Privacy Impact Assessment (PIA) is an activity that happens at some point before implementation takes place. Typically, lawyers would approach a PIA very early in the process, which has the challenge that not everything that the system does is yet known - especially if the requirements are managed using an agile methodology.

However, a PIA should not be performed after-the-fact. The term "assessment" is somewhat tricky, because a "security assessment" (or even "audit") is usually something that happens late in the lifecycle. It is fairly common to confuse the terms.

A PIA is a risk management exercise, so it is justified to say that PIA is "threat modelling for privacy" or "privacy risk analysis". As we will see in a moment, most PIA activities fit in nicely with security-related risk modelling activities.

PIA building blocks

(Paraphrasing ICO)

  • Identify business goals
  • Describe information flows
  • Identify stakeholders
  • Identify risks & controls
  • Implement controls & accept residual risk

As is pretty apparent, on the surface, a PIA does not really differ from a security threat modelling exercise. ICO even mentions information flows, which neatly corresponds to data flow based threat modelling.

The difficult part in this process is actually identifying risks. If you are a privacy (or security) professional, you probably have a pretty good intuitive grasp of what to look for. However, in most organisations, privacy and security experts won't scale. There needs to be some way of finding privacy risks, and the most important consideration is that it is systematic. Systematic work provides a scaffolding where people with less experience can arrive at a good enough result. Leaving risk analysis completely ad hoc or open ended may be risky in itself.

So how can you go about it?

Side note: PIA in Agile

  • Run PIA against your backlog items
    • Iteratively, as they come
  • If design is still open, conduct "business level" PIA with lawyers
    • Results in privacy user stories and changed functional features
  • If you already know the design, conduct "technical" PIA using threat modelling
    • Results in new features (PETs) or acceptance criteria

Threat modelling

  • Many schools of thought; I do data flow analysis
  • Use a DFD or MSC diagram, consider all flows and storage
  • Use a framework such as STRIDE (Microsoft)

There are many ways to do threat modelling. Some people prefer threat trees, some have unstructured discussion. In my line of work, teaching engineers to do it is just as important as the results themselves, so I am using a data flow based threat modelling technique. I have successfully used it in dozens of facilitated sessions for components ranging from embedded device drivers to cloud-deployed web services.

In doing this sort of threat model, we would start with a Data Flow Diagram (DFD) or a Message Sequence Chart (MSC) depending on the complexity of interactions. Each data flow, data store, and processing entity, will be discussed from six aspects that make up the acronym STRIDE (Spoofing, Tampering [Integrity], (non-)Repudiation, Information disclosure [Confidentiality], Denial of Service [Availability], and Elevation of Privilege).

Findings are stored on the product backlog as tasks.
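The walk-through described above - each DFD element discussed against all six STRIDE aspects, with findings landing on the backlog - can be sketched as follows. The element names and the task wording are invented examples, not output of any real tool.

```python
# Sketch of the data-flow-based STRIDE walk: every element of the
# DFD is paired with every STRIDE aspect for discussion.
STRIDE = [
    "Spoofing",
    "Tampering (integrity)",
    "Repudiation",
    "Information disclosure (confidentiality)",
    "Denial of service (availability)",
    "Elevation of privilege",
]

# Illustrative DFD: one flow, one process, one data store.
dfd_elements = [
    "browser -> web service (flow)",
    "web service (process)",
    "customer database (store)",
]

def backlog_tasks(elements):
    # Each (element, aspect) pairing is a candidate finding,
    # recorded as a task on the product backlog.
    return [f"Discuss {threat} for {element}"
            for element in elements
            for threat in STRIDE]
```

Even this trivial three-element diagram yields eighteen discussion points, which is why a facilitated session and a systematic walk matter.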

For more information on STRIDE, and its nuances, have a look at Threat Modeling: Designing for Security by Adam Shostack, or my Software Security course at Aalto University in 2015 (see 2014 course at University of Helsinki).

(On how to run this in an agile / Continuous Delivery project, see

Your reading list

Data flow modelling "acronym extensions"

  • Extend a data flow diagram analysis (like STRIDE)
  • LINDDUN: Linkability, Identifiability, Non-Repudiation, Detectability, Disclosure of information, Unawareness of disclosing info, Non-Compliance
  • TRIM: Transfer, Retention, Informed disclosure, Minimisation

If you have tried STRIDE and find that it works, you can pimp the method by adding more letters (i.e., more points to consider). Sometimes this works, but admittedly this may be seen as somewhat mechanistic.

LINDDUN, as described in Shostack's book and academic papers, discusses a number of aspects that apply variously to data (linkability, identifiability, detectability), interaction logic (non-repudiation), user experience (unawareness), and compliance. It is interesting to note that non-repudiation is often a good thing for security and a bad thing for privacy.

TRIM is my own set of considerations, which I tried to make as simple and small as possible; I came up with it independently before I knew of LINDDUN. For each data flow, you consider whether you are allowed to transfer the data over a regulatory or contractual boundary; you discuss how long you retain the data and how you will delete it; you have the same user experience (UX) discussion about informed disclosure as in LINDDUN's unawareness; and finally, you determine whether you transfer only the minimum set of data that is technically required.
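Applied per data flow, the TRIM considerations amount to a small checklist. The sketch below paraphrases the four questions in my own words; the flow description is an invented example.

```python
# Sketch: the four TRIM questions applied to a single data flow.
# Question wording is paraphrased, not canonical.
TRIM_QUESTIONS = {
    "Transfer": "May this data cross a regulatory or contractual boundary?",
    "Retention": "How long is the data kept, and how will it be deleted?",
    "Informed disclosure": "Is the user aware of, and informed about, the disclosure?",
    "Minimisation": "Is only the technically required minimum transferred?",
}

def trim_checklist(flow: str):
    # One discussion item per TRIM aspect for the given flow.
    return [f"{flow}: {aspect} - {question}"
            for aspect, question in TRIM_QUESTIONS.items()]
```

The point of keeping the set this small is that a development team can run through it for every new flow without an expert in the room.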

Problems with "acronym extensions"

  • You have to know what you're looking for
  • Example: "Linkability": You have to understand the concepts of anonymity and pseudonymity on an information-theoretic level
  • You may miss the big picture (business level flaws)
  • If you don't know all your data assets and their metadata, the analysis fails
    • Don't skip the asset discovery phase!

Contextual Integrity

  • Specify context of use of personal data, and an "integrity promise"
  • Context: Actors, type of data, data transmission method, and (societal) norms
  • Evaluate: If the context or "promise" changes, trigger activities

Helen Nissenbaum's Contextual Integrity is described in her book Privacy in Context. I had a look, and for the purposes of this discussion, Shostack's summary in the Threat Modeling book probably suffices. The method ("Contextual Integrity Heuristic") could be shoehorned into data flow analysis, but I believe it would serve better as a repository of "approved" privacy contexts that business, legal, and security people have agreed on.

You could use the Contextual Integrity Heuristic as a "triage tool" to determine which new functionality would benefit from a more specific PIA. For example, if you do the same old stuff again and again, perhaps you don't need to trigger a PIA. If you break the integrity, then you will have to do a full PIA.
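The triage idea above can be sketched as a small lookup against a library of approved contexts: a context is the actors, type of data, transmission method, and norms, and any deviation breaks contextual integrity and triggers a full PIA. All the field values below are invented examples.

```python
# Sketch: Contextual Integrity as a PIA triage tool.
# Context fields follow the four elements listed on the slide;
# the concrete values are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    actors: frozenset
    data_type: str
    transmission: str
    norms: str

# The "context library" that business, legal, and security
# people have agreed on.
approved = {
    Context(frozenset({"user", "service"}), "email address",
            "TLS to EU data centre", "account management"),
}

def needs_full_pia(new_context: Context) -> bool:
    # Same old stuff: no full PIA. Any change to the context
    # breaks contextual integrity and triggers one.
    return new_context not in approved
```

A feature that reuses an approved context sails through; changing even one element, say the transmission method, flags the feature for a full PIA.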

A good thing about this is that the discussion also includes the set of societal norms. This goes beyond the bare technical and legal necessities, and takes the privacy discussion into the user experience (UX) realm. (LINDDUN and TRIM also have one UX-related consideration each, but fall short of considering ethical, political and societal norms.)

Problems with
Contextual Integrity

  • If used as a "context library", someone needs to maintain it
  • Evaluation still needs deep privacy area expertise

Potential solution?

  • Ensure you know your information assets
  • Convey privacy needs as "privacy user stories", PETs or acceptance criteria
    • Depending on how well you already know your upcoming features
  • Use Contextual Integrity Heuristic to triage new feature requests for PIA treatment
  • Use a data flow based threat modelling technique and standardise on a privacy add-on
    • TRIM - easy and quick
    • LINDDUN - more comprehensive
    • Classifications from Ian's book - when the area is new, and experts are present

Thank you

If you try any of this out, please let me know!

Twitter: @anttivs


Available online at