Making Public Information Secret

May 20, 2016

A way to recover and enforce privacy

Scott McNealy, when he was the CEO of Sun Microsystems, famously said over 15 years ago, “You have zero privacy anyway. Get over it.”

Today I want to talk about how to enforce privacy by changing what we mean by “privacy.”

We seem to see an unending series of database break-ins. There is of course a huge theoretical literature, and many practical methods, for protecting privacy. Yet people's records are still compromised and their information lost. We wish to explore whether this can be fixed. We believe the key to the answer is to change the question:

Can we protect data that has been illegally obtained?

This sounds hopeless—how can we make data that has been broken into secure? The answer is that we need to look deeper into what it means to steal private data.

After The Horse Leaves The Barn

The expression “the horse has left the barn” means:

Closing/shutting the stable door after the horse has bolted, or trying to stop something bad happening when it has already happened and the situation cannot be changed.

Indeed, our source gives as its main example: “Improving security after a major theft would seem to be a bit like closing the stable door after the horse has bolted.”


This strikes us as the nub of privacy. Once information is released on the Internet, whether by accident or by a break-in, there seems to be little that one can do. However, we believe that there may be hope to protect the information anyway. Somehow we believe we can shut the barn door after the horse has left, and get the horse back.

Suppose that some company makes a series of decisions. Can we detect whether those decisions depend on information that it should not be using? Let’s call this Post-Privacy Detection.

Post-Privacy Detection

Consider a database that stores values {(v,a)} where {v} is an {n}-bit vector of attributes and {a} is an attribute. Think of {a} as small, even a single bit such as the sex of the individual with attributes {v}. Let us also suppose that the database is initially secure for {a}, insofar as given many samples of the values of {v} only, it is impossible to gain any advantage in inferring the values of {a}. Thus a leak of {a} conveys meaningful information.
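
As a toy illustration, here is a minimal Python sketch of such a database; the synthetic records and names such as make_record are our own, not from any real system. Because {a} is drawn independently of {v}, no rule that sees {v} alone can guess {a} better than chance.

```python
import random

N_BITS = 8          # length of the attribute vector v (an arbitrary illustrative size)
N_RECORDS = 10_000  # number of simulated records

def make_record():
    """One record (v, a): v is an n-bit attribute vector, a a single secret bit.

    Here a is drawn independently of v, so this toy database is 'secure for a':
    seeing v alone gives no advantage in inferring a.
    """
    v = tuple(random.randint(0, 1) for _ in range(N_BITS))
    a = random.randint(0, 1)
    return v, a

database = [make_record() for _ in range(N_RECORDS)]

def advantage_over_chance(records, guesser):
    """How much better than coin flipping a guesser v -> {0, 1} predicts a."""
    hits = sum(1 for v, a in records if guesser(v) == a)
    return hits / len(records) - 0.5

# Any rule that looks only at v hovers around 0.0 advantage on this database.
print(advantage_over_chance(database, lambda v: v[0]))
```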

Now say a decider is an entity {D} that uses information from this database to make decisions. {D} has one or more Boolean functions {f(v,a)} of the attributes. Think of {f(v,a)} as a yes/no decision on some issue: granting a loan, selling a house, offering insurance at a certain rate, and so on. The idea is that while {a} may not be secret—the database has been broken into—we can check that, in aggregate, {a} is effectively secret.

The point here is that we can detect whether {a} is being used in an unauthorized manner to make some decision, given protocols for transparency that enable sampling the values {(v,f(v,a))}. If, given a polynomial number of samples, we cannot infer the values of {a} to within {\epsilon}, then we have large-scale assurance that {a} was not material to the decision. Our point is this: a leak of values about individuals is material only if someone uses them to make a decision that should not depend on their “private” information. Thus if a bank obtains values of {a} but does not use them to make a decision, then we would argue that the information, while public, was effectively private.
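
Here is a minimal Python sketch of the detection idea; the two decision rules and the naive auditor are hypothetical stand-ins we made up, not drawn from any real decider.

```python
import random

N_BITS = 8  # arbitrary illustrative size of the attribute vector v

def sample_record():
    """A fresh (v, a) pair with a independent of v, as in the secure database."""
    v = tuple(random.randint(0, 1) for _ in range(N_BITS))
    return v, random.randint(0, 1)

def decision_ignoring_a(v, a):
    """A decision rule that depends only on v (here: majority of the bits)."""
    return int(sum(v) > N_BITS // 2)

def decision_using_a(v, a):
    """A decision rule that copies a outright -- the unauthorized case."""
    return a

def recovery_rate(f, infer, trials=5_000):
    """How often an auditor who sees only (v, f(v, a)) recovers the secret a."""
    hits = 0
    for _ in range(trials):
        v, a = sample_record()
        hits += infer(v, f(v, a)) == a
    return hits / trials

infer = lambda v, u: u  # naive auditor: guess that a equals the decision bit
print(recovery_rate(decision_ignoring_a, infer))  # ~0.5: a stays effectively secret
print(recovery_rate(decision_using_a, infer))     # ~1.0: the decisions expose a
```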

Definition 1 Let a database contain values of the form {(v,a)}, and let {f(v,a)} be a Boolean function. Say that the {a} part is effectively private for the decision {f} provided there is another function {g(v)} so that

\displaystyle  \mathsf{Prob}[f(v,a)=g(v)] \ge 1-\epsilon,

where {\epsilon>0}. A decider {D} respects {a} if {a} is effectively private in all of its decision functions.
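
Definition 1 can be checked directly by sampling. The following toy Monte Carlo sketch, again with hypothetical decision rules of our own, estimates {\mathsf{Prob}[f(v,a)=g(v)]} and compares it against {1-\epsilon}.

```python
import random

def effectively_private(f, g, epsilon=0.05, trials=10_000, n_bits=8):
    """Estimate whether Prob[f(v, a) = g(v)] >= 1 - epsilon (Definition 1).

    f sees the full record (v, a); the candidate witness g may look only at v.
    """
    agree = 0
    for _ in range(trials):
        v = tuple(random.randint(0, 1) for _ in range(n_bits))
        a = random.randint(0, 1)
        agree += f(v, a) == g(v)
    return agree / trials >= 1 - epsilon

# A decision that never looks at a is witnessed by g = the same rule on v:
f_good = lambda v, a: int(sum(v) > 4)
print(effectively_private(f_good, lambda v: int(sum(v) > 4)))  # True

# When f copies the independent secret bit a, no function of v alone tracks it:
f_bad = lambda v, a: a
print(effectively_private(f_bad, lambda v: int(sum(v) > 4)))   # False (agreement ~0.5)
```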

We can prove a simple lemma showing that this definition implies that {a} is not compromised by sampling the decision values.

Lemma 2 If the database is secure for {a} and {a} is effectively private, then there is no function {h} such that {\mathsf{Prob}[h(v,f(v,a)) = a] \ge 1-\epsilon}.

Proof: Suppose for contradiction that such an {h} exists. Since {a} is effectively private, there is a function {g(v)} as above. Then given {v} alone, we compute {u = g(v)}, which equals {f(v,a)} with probability at least {1 - \epsilon}. Applying {h} then gives {h(v,u) = a} with overall probability at least {1 - 2\epsilon}, by a union bound. Since this procedure uses only {v}, it contradicts the initial security of the database for {a}. \Box
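
Spelled out, the {1-2\epsilon} step is just a union bound over the two failure events:

\displaystyle  \mathsf{Prob}[h(v,g(v)) = a] \ge 1 - \mathsf{Prob}[g(v) \ne f(v,a)] - \mathsf{Prob}[h(v,f(v,a)) \ne a] \ge 1 - 2\epsilon,

since whenever {g(v) = f(v,a)} and {h(v,f(v,a)) = a} both hold, we have {h(v,g(v)) = h(v,f(v,a)) = a}.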

Pulling in the Long Reins

To be socially effective, our detection concept should exert influence on deciders to behave in a manner that overtly does not depend on the unauthorized information. This applies to repeatable decisions whose results can be sampled. The sampling would use protocols that effect transparency while likewise protecting the data.

Thus our theoretical notion would require social suasion for its effectiveness. This includes requiring deciders to provide infrastructure by which their decisions can be securely sampled. It might not require them to publish their {a}-oblivious decision functions {g}, only that they could provide one if challenged. Most of this is to ponder for the future.

What we can say now, however, is that there do exist ways we can rein in the bad effects of lost privacy. The horses may have bolted, but we can still exert some long-range control over the herd.

Open Problems

Is this idea effective? What things like it have been proposed?
