*****************************
Security aspects of Bazaar-NG
*****************************


* Good security is required.

* Usability is required for good security.

    Being too strict "because it's the secure way" just means that people will
    disable you altogether, or start doing things that they know is wrong,
    because the right way of doing this may be secure, but [..] also very
    inconvenient. 

    -- Linus Torvalds

.. contents::

Requirements
============

David Wheeler gives some good requirements__:
 
    Problem is, the people who develop SCM tools often don't think about what kind of security requirements they need to support. This mini-paper describes briefly the kinds of security requirements an SCM tool should support. 

__ http://www.dwheeler.com/essays/scm-security.html

    confidentiality_
      Are only those who should be able to read information able to do so?

    integrity
      Are only those who should be able to write/change information able to do so? This includes not only limiting access rights for writing, but also protecting against repository corruption.

    availability
      Is the system available to those who need it? (I.E., is it resistant to denial-of-service attacks?)

    identification/authentication
      Does the system safely authenticate its users? If it uses tokens (like passwords), are they protected when stored and while being sent over a network, or are they exposed as cleartext?

    audit
      Are actions recorded?

    non-repudiation
      Can the system "prove" that a certain user/key did an action later?

    self-protection
      Does the system protect itself, and can its own
      data (like timestamps) be trusted?

    trusted paths
      Can the system make sure that its communication with users is
      protected?

Attacker categories
-------------------

* Unprivileged outsiders.

  (Almost always read-only, but people might want to allow them to
  write in some cases, e.g. for wikis.)

* Non-malicious developers with privilege.

* Malicious developers with privilege.

* Attackers who have stolen a privileged developer's identity.


Access control
--------------

Dan Nicolaescu gives these examples of access control:

  - security related code that is still embargoed: only a select few
    are allowed to see it, and it is not desirable to release this
    information to the public because a fix is still being worked
    on. It would be nice to be able to have this kind of code under
    the same version control system used for normal development for
    ease of use and easy merging, yet it is crucial to restrict access
    to branches, files or directories to certain people.

  - feature freeze before a release. It would be good if the release
    manager could disable writing to the release branch, so that the
    final tests can be run without someone committing something by
    mistake.

  - documentation/translation writers don't need write access to the
    whole source code, just to the documentation directories. 

  - For proprietary companies restricting access is even more
    important, for example only some engineers should access the
    latest development version of some code in order to keep some
    trade secrets, etc, etc.

In Bazaar-NG, the basic unit of access control is the branch.  If
people are not supposed to read a branch, or know of its existence,
put it somewhere where they can't see it.  If people are allowed to
read from but not write to a branch then set those permissions.  The
code can later be merged into a public branch if desired with no loss
of function.

We largely rely on lower-level security measures controlling who can
get read or write access to a branch.  If you have a branch that
should be confidential, then put it on an appropriately-secured
machine, with only people in a particular group allowed to read it.

Not having separate repositories is probably a feature here -- unlike
Subversion, no features depend on having branches be in the same
repository.  Each repository can have different group ownership.
(The directories should usually be setgid.)  It also makes it easier
to see just what the access control is; there is only one object that
can meaningfully have an ACL.
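
One way to do the "set those permissions" step on a Unix server is sketched below: walk the branch directory, give the owning group read (and optionally write) access, deny everyone else, and set the setgid bit on directories so new files keep the branch's group.  The helper ``set_branch_permissions`` and the particular modes are assumptions for illustration, not anything bzr provides.

```python
import grp
import os
import stat


def set_branch_permissions(root, group_can_write, group=None):
    """Restrict a branch directory tree to its owner and one Unix group.

    Hypothetical helper: directories get the setgid bit so files created
    later inherit the directory's group; "others" get no access at all,
    so they cannot even list the branch.
    """
    if group_can_write:
        dir_mode = stat.S_ISGID | 0o770   # rwxrws---
        file_mode = 0o660                 # rw-rw----
    else:
        dir_mode = stat.S_ISGID | 0o750   # rwxr-s---
        file_mode = 0o640                 # rw-r-----
    # -1 tells os.chown to leave the group unchanged.
    gid = grp.getgrnam(group).gr_gid if group is not None else -1

    for dirpath, _dirnames, filenames in os.walk(root):
        # chown may clear the setgid bit on some systems, so set
        # ownership first and the mode afterwards.
        os.chown(dirpath, -1, gid)
        os.chmod(dirpath, dir_mode)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would follow the link out of the branch
            os.chown(path, -1, gid)
            os.chmod(path, file_mode)
```

``group_can_write=False`` matches the release-freeze case, where the group may read but not commit; passing a group name such as ``"docs"`` (hypothetical) would also hand the tree to that group.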

The existence of a secret branch can be fairly well hidden from the
world.  When its changes are merged in, all that is visible is the
name, date, and branch name of the commit, not anything about the
location of the source branch.

I would handle the documentation case by having a separate
documentation branch, which could perhaps be checked out into a
subdirectory when it is required.  I think this is fairly common for
larger projects even in CVS.




Confidentiality
---------------

As dwheeler points out, this can be important even for open source
projects, such as when preparing a security patch.
Mechanisms that send email should have an option to encrypt the mail.

I can't think of anywhere that encrypted archives would be useful.  If
you want to store a branch on an encrypted filesystem, you can.  If you
want to store encrypted files in a branch, you can do that too, though
that will leak some information through the metadata and branch
structure.


Security in distributed systems
-------------------------------

If I have a branch on my laptop, the software ultimately cannot
prevent me from doing anything to that branch -- physical access trumps
software controls.  We can, at most, try to prevent non-malicious
mistakes.

The purpose of the software here is to protect other people, whose
machines I do not control.  In particular, it should be hard for me to
lie to them; the software should detect any false statements.

Specifically, these should be prevented:

 * Claiming to be someone else.

 * Attempting to rewrite history.


Revocation
----------

Suppose Alice's code-signing key is stolen by an attacker Charles.
Charles can sign changesets purporting to come from Alice.  

Alice needs to revoke that key; hopefully she has saved a copy of the
key elsewhere and can use that to revoke it.  Failing that she can
mail everyone and ask them to delete it.  This can propagate through
the usual GPG mechanism, which is very nice.

Alice also needs to make a new key and get it trusted.

This revocation does not distinguish between changesets genuinely
signed by Alice in the past, and changesets fraudulently signed by
Charles. 

What can Alice do now?  First of all, she needs to work out which
changesets signed by her key can still be trusted.  One good way to do
this is to check against another branch signed by Bob.  If Bob's key
is safe, we know that his copy of Alice's changesets is OK and that the
full tree at various points is OK.
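
The cross-check against Bob's branch might look roughly like this, assuming Alice can list her changeset texts by revision id and Bob's trusted copy yields a digest per revision.  ``partition_by_witness`` and the use of SHA-256 are illustrative assumptions, not part of bzr.

```python
from __future__ import annotations

import hashlib


def changeset_digest(text: bytes) -> str:
    """Digest used to compare two copies of the same changeset."""
    return hashlib.sha256(text).hexdigest()


def partition_by_witness(
    alice: dict[str, bytes], bob_digests: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split Alice's revisions into (confirmed, unconfirmed).

    A revision is confirmed only if Bob's trusted copy has the same
    revision id with an identical digest; everything else needs manual
    review or restoring from a trusted backup.
    """
    confirmed, unconfirmed = [], []
    for rev_id, text in alice.items():
        if bob_digests.get(rev_id) == changeset_digest(text):
            confirmed.append(rev_id)
        else:
            unconfirmed.append(rev_id)
    return sorted(confirmed), sorted(unconfirmed)
```

Only the unconfirmed list then needs the slower treatment described next.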

Then:

 * Go through her old changesets, check that they're OK -- perhaps
   restore from a trusted backup.  Re-sign those changesets with a new
   key bound to the same email address.  Publish the new signatures
   instead.

   (This seems to indicate it is a good idea to bind signatures to
   changesets by author name/address rather than by key ID.)

 * Roll up all previous development into a new tree, then sign that.
   This means there is no safe access to the previous individual
   changes, but in some cases that may be acceptable.

If a key is revoked at a particular time then perhaps we could still
trust commits made before that time.  I don't know if GPG revocations
can support that.


Old keys
--------

Keys also expire, rather than being revoked.  What does this mean?

Ideally we would check that the date when a changeset claims to have
been signed is within the validity period of the key.  This requires
more GPG integration than may at the moment be possible, but in theory
we can do it.
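
A minimal sketch of the validity-period check, assuming the keyring can report each key's creation, expiry and (optional) revocation time.  The same test covers the idea of still trusting commits made before a revocation, with the caveat that the claimed date is written by the signer, so someone holding a stolen key can simply backdate it.  ``KeyValidity`` and ``signed_within_validity`` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class KeyValidity:
    """Hypothetical summary of what the keyring says about one key."""
    created: datetime
    expires: Optional[datetime] = None     # None: the key never expires
    revoked_at: Optional[datetime] = None  # None: the key is not revoked


def signed_within_validity(claimed: datetime, key: KeyValidity) -> bool:
    """True if the date a changeset claims falls inside the key's lifetime.

    The claimed date comes from the signer, so a thief holding the key can
    backdate it; treat True as "not obviously wrong", not as proof.
    """
    if claimed < key.created:
        return False
    if key.expires is not None and claimed >= key.expires:
        return False
    if key.revoked_at is not None and claimed >= key.revoked_at:
        return False
    return True
```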

We also need to make sure that commits are in order by date, or at
least reasonably close to being in order (to allow for some clock skew).
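
The ordering check could be as simple as the following sketch, which flags any commit dated more than a tolerance earlier than the newest commit before it.  The ten-minute default is an arbitrary assumption; the right value depends on how badly developers' clocks drift.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def out_of_order(
    timestamps: list[datetime], skew: timedelta = timedelta(minutes=10)
) -> list[int]:
    """Return indices of commits dated too far before an earlier commit.

    ``timestamps`` are in branch order, oldest first.  A commit is flagged
    if it is more than ``skew`` earlier than the latest date seen so far.
    """
    flagged = []
    latest = None
    for index, stamp in enumerate(timestamps):
        if latest is not None and stamp < latest - skew:
            flagged.append(index)
        if latest is None or stamp > latest:
            latest = stamp
    return flagged
```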

One interesting case is when a revision is committed for which both
the public and private keys have been lost.  This will always be
untrusted, but that should not prevent people from continuing to use
the archive if they can accept that.

This suggests that perhaps we should allow for multiple signatures on
a single revision.


Encumbrance attacks
-------------------

This is a special case where we need to be able to destroy history to
avoid a legal problem.  It is allowed as discussed elsewhere: either
destroy commits from the tail backwards, or equivalently branch from a
previous revision and replace the branch with that.
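
Treating the mainline as a list of revision ids, oldest first (an assumption about representation), "branch from a previous revision and replace" amounts to truncating that list.  Note that this only drops revisions from the history; their stored texts would also have to be purged for the encumbered material to actually disappear.

```python
from __future__ import annotations


def truncate_history(revision_history: list[str], keep_through: str) -> list[str]:
    """Return the mainline history ending at ``keep_through``.

    ``revision_history`` is assumed to be a linear list of revision ids,
    oldest first; every revision after ``keep_through`` is dropped.
    """
    try:
        cut = revision_history.index(keep_through)
    except ValueError:
        raise ValueError(f"{keep_through!r} is not in this history") from None
    return revision_history[: cut + 1]
```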

People who saw the original branch can still prove it happened; people
who look in the future will not see any record.  

Either way, this probably requires physical access to the branch.


Multiple signature keys
-----------------------

Should we allow for several signatures on a single changeset?  What
would that mean?  How do we know what signatures are meaningful or
worthwhile?


Forensics
---------

dwheeler:

   [O]nce you find out who did a malicious act, the SCM should make it
   easy to identify all of their actions. In short, if you make it
   easy to catch someone, you increase the attackers' risk... and that
   means the attacker is less likely to do it.

dwheeler asks that the committer's IP address be recorded.  Putting
this in the changeset seems to cause too much of a
privacy/confidentiality problem.  However, an active server might
reasonably record the IPs of all clients.
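
A sketch of that server-side record, kept in the server's own log rather than in changesets.  The in-memory ``AccessLog`` class and its field layout are hypothetical; a real server would presumably append to a protected log file.  Once an attacker is identified, ``actions_by`` gives every revision they pushed.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class AccessLog:
    """In-memory stand-in for a server-side write log.

    Each entry is (unix_time, client_ip, authenticated_committer, revision_id).
    The IP addresses live only here, never inside the changesets themselves.
    """
    entries: list[tuple[float, str, str, str]] = field(default_factory=list)

    def record(self, client_ip: str, committer: str, revision_id: str) -> None:
        self.entries.append((time.time(), client_ip, committer, revision_id))

    def actions_by(self, committer: str) -> list[str]:
        """All revision ids pushed by one committer, in arrival order."""
        return [rev for _, _, who, rev in self.entries if who == committer]

    def addresses_of(self, committer: str) -> set[str]:
        """Every client address a committer has pushed from."""
        return {ip for _, ip, who, _ in self.entries if who == committer}
```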


Non-repudiation
---------------

If a changeset has propagated to Bob, signed by Alice's key, then Bob
can prove that someone possessing Alice's key signed it.  Alice's only
way out is to claim her key was stolen.


Trusted review
--------------

This can be handled by importing onto another branch.  There can be
various levels, such as "quickly checked", "deeply trusted", etc.

(Is it really necessary to import onto a new branch rather than add
annotations to existing branches?  Copying the whole text seems a bit
redundant.  This might be a nice place for arch-style tags, where
we just add a reference to another branch.)
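
If reviews are recorded as references rather than copies, each review can be a small record pointing at a revision in another branch.  The names and the two-level scale below are assumptions for illustration; for such a tag to mean anything, the reviewer would presumably need to sign it too.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class ReviewLevel(IntEnum):
    QUICKLY_CHECKED = 1
    DEEPLY_TRUSTED = 2


@dataclass(frozen=True)
class ReviewTag:
    """A reviewer's statement about a revision that lives in another branch.

    Only a reference is stored; the reviewed text is not copied.
    """
    branch_url: str
    revision_id: str
    reviewer: str
    level: ReviewLevel


def reviewed_at_least(tags: list[ReviewTag], level: ReviewLevel) -> set[str]:
    """Revision ids that some reviewer has marked at ``level`` or higher."""
    return {tag.revision_id for tag in tags if tag.level >= level}
```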


Hooks
-----

Automatically running hooks downloaded from someone else is
dangerous.  In particular, the user may not have the chance to check
that the hooks are reasonable before they are run.

Conversely, users can subvert client-side hooks.  If we want to run a
check before accepting code onto a shared branch, that must run on the
server.

Enforcing checks on the server gives a good way to run build,
formatting, and suspiciousness checks.  This implies that write
access to a repository goes through a mediating daemon rather than
direct writes to the files.
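
A minimal sketch of the checks a mediating daemon could run before accepting a changeset onto a shared branch.  The dictionary shape of a changeset and the single formatting check are assumptions; build or suspiciousness checks would be added as further functions with the same signature.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical changeset shape: {"files": {path: new_text}}.
Changeset = dict
# A check returns None to accept, or a reason string to reject.
Check = Callable[[Changeset], Optional[str]]


def no_trailing_whitespace(changeset: Changeset) -> Optional[str]:
    """Example formatting check."""
    for path, text in sorted(changeset["files"].items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line != line.rstrip():
                return f"{path}:{lineno}: trailing whitespace"
    return None


def accept_changeset(changeset: Changeset, checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every server-side check; accept only if none of them objects."""
    reasons = []
    for check in checks:
        reason = check(changeset)
        if reason is not None:
            reasons.append(reason)
    return (not reasons, reasons)
```

Because this runs inside the daemon, a client cannot skip it the way it could skip a local hook.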



Signing
-------

We use signing to prove that a particular person (or 'principal',
possibly a robot) committed a particular changeset.

It is the job of external signing software to help work out whether
this is true or not.  This has several parts:

 * Mathematical verification that a signature on a particular 
   changeset header document is correct

 * Determining that the signature corresponds to a particular public
   key

 * Determining that the public key corresponds to the person claimed
   to have authored the changeset (identified by email address).

The last two are really PKI functions, and are somewhat harder than
the first.

The canonical implementation is to use GPG/OpenPGP, but anything will
do.  There are simpler RSA/DSA implementations which assume each user
manually builds a list of trusted keys.
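
The three parts of checking a signature can be separated as below, assuming the simple model where each user keeps a hand-built list of trusted keys.  ``math_ok`` stands in for the real RSA/DSA/OpenPGP verification and is passed in; the other names are hypothetical.  The key lookup comes first in the code because the maths needs the public key.

```python
from __future__ import annotations

from typing import Callable

# Stand-in for the real RSA/DSA/OpenPGP maths:
# (document, signature, public_key) -> bool
MathCheck = Callable[[bytes, bytes, bytes], bool]


def verify_author(
    document: bytes,
    signature: bytes,
    key_id: str,
    claimed_email: str,
    trusted_keys: dict[str, tuple[str, bytes]],
    math_ok: MathCheck,
) -> str:
    """Return "ok" or the first reason the claimed authorship fails.

    ``trusted_keys`` is the user's hand-built list: key id -> (email, public key).
    """
    # Part 2: map the signature's key id to a public key we chose to trust.
    entry = trusted_keys.get(key_id)
    if entry is None:
        return "unknown key"
    email, public_key = entry
    # Part 1: the signature must be mathematically valid for this document.
    if not math_ok(document, signature, public_key):
        return "bad signature"
    # Part 3: that key must belong to the person claimed as the author.
    if email != claimed_email:
        return "key does not belong to the claimed author"
    return "ok"
```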

This leaves open the question of which people should be trusted to
provide software on a particular branch, or at all.  This is not a very
easy question for software to answer; we assume that people will know
by other means.  For public code, it may be that all changesets are
re-signed by, say, samba-team@samba.org.

I think it is fair to distinguish people by an email address, or at
least by $ID@$DOMAIN.  There is no need to have this actually receive
email, so spam need not be a problem.

The signing design is inspired by the very usable security afforded by
OpenSSH: it automatically protects where it can, and allows higher
security to users who want to do some work (by offline verification of
signatures).

Using a signing mechanism other than GPG seems undesirable when key
developers already have GPG and there is a big infrastructure to
support it, even though GPG is admittedly quite complex.

The purpose of signing is to protect against unauthorized modification
of archives. 

Bazaar-NG can apply a GPG signature to both patches and manifests.  This
allows a later proof that the revision and the changeset were produced
by the author who claims to have written them.
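
One way to cover both the patch and the manifest with a single signature is to sign a short header that contains their digests.  The field names and the use of SHA-256 are assumptions for illustration, not Bazaar-NG's actual header format.

```python
import hashlib


def changeset_header(revision_id: str, author: str, patch: bytes, manifest: bytes) -> bytes:
    """Build a small document that commits to both the patch and the manifest.

    Signing only this header (for example with ``gpg --clearsign``) covers
    both, because altering either one changes its digest.
    """
    for value in (revision_id, author):
        if "\n" in value:
            raise ValueError("newlines would let one field forge another")
    lines = [
        f"revision-id: {revision_id}",
        f"author: {author}",
        f"patch-sha256: {hashlib.sha256(patch).hexdigest()}",
        f"manifest-sha256: {hashlib.sha256(manifest).hexdigest()}",
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")
```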

We cannot cryptographically prove that a particular patch was merged
into a branch, because the person doing the merge might have subverted
the patch in the process of merging it.  All we can prove
cryptographically is that the merge committer asserts they took the
patch.

GPGME and PyMe seem to give a reasonable interface for doing this:
there is a function to check a signature, and the return indicates the
signing name, with possible errors including a missing key, etc.
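
GPGME and PyMe are not shown here.  As a swapped-in alternative that needs only the ``gpg`` command, the sketch below parses the machine-readable lines that ``gpg --status-fd`` writes (``GOODSIG``, ``BADSIG``, ``ERRSIG``, ``NO_PUBKEY`` and so on) into a signer name or an error.  Treating expired and revoked keys as plain failures is a simplifying assumption; the date rules under "Old keys" would refine that.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

PREFIX = "[GNUPG:] "


@dataclass
class VerifyResult:
    ok: bool
    signer: Optional[str] = None
    error: Optional[str] = None


def parse_gpg_status(status: str) -> VerifyResult:
    """Interpret the status lines written by ``gpg --status-fd 1 --verify``.

    A good result only means the maths checked out for the named key;
    whether that key really belongs to the author is a separate question.
    """
    could_not_check = False
    for line in status.splitlines():
        if not line.startswith(PREFIX):
            continue
        keyword, _, rest = line[len(PREFIX):].partition(" ")
        if keyword == "GOODSIG":
            _key_id, _, name = rest.partition(" ")
            return VerifyResult(ok=True, signer=name)
        if keyword in ("BADSIG", "EXPKEYSIG", "REVKEYSIG"):
            _key_id, _, name = rest.partition(" ")
            return VerifyResult(ok=False, signer=name, error=keyword.lower())
        if keyword == "NO_PUBKEY":
            return VerifyResult(ok=False, error="missing key " + rest)
        if keyword == "ERRSIG":
            # gpg usually follows ERRSIG with a more specific line.
            could_not_check = True
    if could_not_check:
        return VerifyResult(ok=False, error="signature could not be checked")
    return VerifyResult(ok=False, error="no signature found")
```

The status text might come from ``subprocess.run(["gpg", "--status-fd", "1", "--verify", "sig.asc", "data"], capture_output=True, text=True).stdout``, where the two file names are placeholders.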


Sign branches, not revisions
''''''''''''''''''''''''''''

Aaron Bentley suggested the interesting idea of signing the mapping of
revisions onto branches, rather than revisions themselves.  For
example a branch could contain just a signed pointer to the most
recent revision. 

(It probably is useful to be able to check signatures on previous
revisions, for example when recovering from an intrusion.)
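
Under the signed-pointer scheme, a branch might carry just a small document naming its tip revision, plus a signature over it.  ``sign`` and ``verify`` are injected (for example thin wrappers around GPG, an assumption), and the document format is hypothetical.  Checking the branch name as well as the signature stops a valid tip signed for one branch from being replayed onto another.

```python
from __future__ import annotations

from typing import Callable, Optional

# Injected signing functions, e.g. thin wrappers around GPG (assumed).
Sign = Callable[[bytes], bytes]
Verify = Callable[[bytes, bytes], bool]


def tip_document(branch_name: str, revision_id: str) -> bytes:
    """Canonical text saying "this branch currently ends at this revision"."""
    for value in (branch_name, revision_id):
        if "\n" in value:
            raise ValueError("newlines would let one field forge another")
    return f"branch: {branch_name}\ntip: {revision_id}\n".encode("utf-8")


def sign_tip(branch_name: str, revision_id: str, sign: Sign) -> tuple[bytes, bytes]:
    document = tip_document(branch_name, revision_id)
    return document, sign(document)


def verified_tip(
    document: bytes, signature: bytes, expected_branch: str, verify: Verify
) -> Optional[str]:
    """Return the signed tip revision, or None if anything fails to check."""
    if not verify(document, signature):
        return None
    fields = {}
    for line in document.decode("utf-8").splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    # Refuse a valid signature that was made for some other branch.
    if fields.get("branch") != expected_branch:
        return None
    return fields.get("tip")
```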


Protocol attacks
----------------

Both client and server should be resistant to malicious changesets,
network requests, etc.  There's no easy solution.

* Defense in depth.  Check reasonableness at various points.

* Disallow changesets that try to change files outside of the branch.
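
A sketch of the "no files outside the branch" rule, assuming changesets name files by '/'-separated relative paths.  It rejects absolute paths, ``..`` components, backslashes and NUL bytes, and (as an extra assumed policy) anything under bzr's ``.bzr`` control directory.  As defense in depth, the code that writes files should still refuse to follow symlinks out of the tree, which a path check alone cannot catch.

```python
from pathlib import PurePosixPath


def is_safe_branch_path(path: str) -> bool:
    """Accept only relative paths that stay inside the branch's working tree.

    Assumes changesets name files with '/'-separated relative paths.
    """
    if not path or "\x00" in path or "\\" in path:
        return False
    p = PurePosixPath(path)
    if p.is_absolute() or not p.parts:
        return False
    if ".." in p.parts:
        return False
    # Assumed policy: changesets must never touch bzr's own control directory.
    if p.parts[0] == ".bzr":
        return False
    return True
```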


Availability
------------

bzr can be configured so that a denial-of-service attack has no single
point of failure to target (or at least nearly none):

* Can have any number of mirrors of a branch.

* If a central server is taken out, developers can continue working
  with state they already have (unbind their branches), and can
  collaborate by email or other means until the server is repaired or
  replaced.

* The origin branch can be on a machine whose location is secret and
  which is not directly publicly accessible.

* Branches can be moved between machines or IP addresses without
  disrupting anything else.

* Branches can be moved around out-of-band, as tarballs, over
  BitTorrent, etc.
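
Fetching from whichever mirror is reachable could be sketched as below; ``fetch`` is an injected function (hypothetical) that raises ``OSError`` on network failure.  Assuming revisions are checked against their signatures after download, the mirrors themselves need not be trusted.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_from_mirrors(urls: Sequence[str], fetch: Callable[[str], T]) -> T:
    """Return the result from the first mirror that answers.

    ``fetch`` is expected to raise OSError (or a subclass) when a mirror
    is unreachable; any other exception propagates unchanged.
    """
    failures = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            failures.append(f"{url}: {exc}")
    raise ConnectionError("all mirrors failed: " + "; ".join(failures))
```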

I think the only possible denial-of-service attacks are those that aim
to shut down the entire network, or block communication with
individual developers, for example by flooding their email address.
But if those people can get connected through some other means, they
can continue.

409	can continue.