than just that use case.

In particular, this document currently focuses almost exclusively on the
streaming case, and not the on-disk storage case. It also does not
discuss the APIs used to manipulate containers and their records.

This describes just a basic layer for storing simple series of "records".
This layer has no intrinsic understanding of the contents of those

* a **container lead-in**, "``bzr pack format 1\n``",
* followed by one or more **records**.

* a 3 byte **kind marker**.
* 0 or more bytes of record content, depending on the record type.

An **End Marker** record:

* has a kind marker of "``EM\n``",

End Marker records signal the end of a container.

A **Full Text** record:

* has a kind marker of "``FT\n``",
* followed by zero or more optional **name headers**:
  "``name:`` *name*\ ``\n``", e.g.::

    name: revision:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5

* followed by a mandatory **content length header**:
  "``length:`` *number*\ ``\n``", where *number* is in decimal, e.g.::

* followed by an **end of headers** byte: "``\n``",
* followed by some **bytes**, exactly as many as specified by the length

So a Full Text record looks a bit like an RFC 822-formatted message.

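
The Full Text layout listed above can be sketched in a few lines of
Python. This is only an illustration, not code taken from Bazaar: the
function name ``serialize_full_text_record`` and its arguments are
hypothetical, and the whitespace check reflects the restriction on names
that this document states.

.. code:: python

    def serialize_full_text_record(content, names=()):
        """Encode one Full Text record: kind marker, headers, blank line, body."""
        parts = [b"FT\n"]  # 3 byte kind marker
        for name in names:
            # Hypothetical validation: names must not contain whitespace.
            if any(ch.isspace() for ch in name):
                raise ValueError("names must not contain whitespace")
            parts.append(b"name: " + name.encode("utf-8") + b"\n")
        # Mandatory content length header, in decimal.
        parts.append(b"length: " + str(len(content)).encode("ascii") + b"\n")
        parts.append(b"\n")  # end of headers byte
        parts.append(content)
        return b"".join(parts)

Because the length header is written before the body, the caller has to
hold the complete content in memory before the record can be emitted.
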
Names should be UTF-8 encoded strings, with no whitespace. Names should
be unique within a single container, but no guarantee of uniqueness
outside of the container is made by this layer.

Some key aspects of this format are discussed in this section.

No length-prefixing of entire container
---------------------------------------

The overall container is not length prefixed. Instead there is an end
marker so that readers can determine when they have read the entire
container. This also does not conflict with the goal of allowing

Structured as a self-contained series of records
------------------------------------------------

The container contains a series of *records*. Each record is
self-delimiting. Record markers are lightweight. The overhead in terms
of bytes and processing for records in this container vs. the raw contents
of those records is minimal.

There is a requirement that each object can be given an arbitrary name.
Some revision control systems address all content by the SHA-1 digest of
that content, but this scheme is unsatisfactory for Bazaar's revision
objects. We can still allow addressing by SHA-1 digest for those content
types where it makes sense.

Some proposed object names:

* to name a revision: "``revision:``\ *revision-id*". e.g.,
  ``revision:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5``.
* to name an inventory delta: "``inventory.delta:``\ *revision-id*". e.g.,
  ``inventory.delta:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5``.

It seems likely that we may want to have multiple names for an object.
This format allows that (by allowing multiple ``name`` headers in a Full

Although records are in principle addressable by name, this specification
alone doesn't provide for efficient access to a particular record given
its name. It is intended that separate indexes will be maintained to

To create a low-level file format which is suitable for solving the smart
server latency problem and whose layout and requirements are extendable in
future versions of Bazaar, and with no requirements that the smart server

A **container** is a streamable file that contains a series of
**records**. Records may have **names**, and consist of bytes.

* annotation cache

Some key aspects of the described format are discussed in this section.

No length-prefixing of entire container
---------------------------------------

The overall container is not length prefixed. Instead there is an end
marker so that readers can determine when they have read the entire
container. This also does not conflict with the goal of allowing

Structured as a self-contained series of records
------------------------------------------------

The container contains a series of *records*. Each record is
self-delimiting. Record markers are lightweight. The overhead in terms
of bytes and processing for records in this container vs. the raw contents
of those records is minimal.

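
To illustrate why self-delimiting records and an end marker suit
streaming, here is a minimal reader sketch. It assumes the layout this
document specifies for format 1 (a lead-in line, one-byte kind markers
``B`` and ``E``, a decimal length line, name lines, and a blank line
before the body). The name ``iter_records`` and the error handling are
hypothetical and are not Bazaar's API.

.. code:: python

    import io

    LEAD_IN = b"bzr pack format 1\n"

    def iter_records(stream):
        """Yield (names, content) per Bytes record until the End Marker."""
        if stream.readline() != LEAD_IN:
            raise ValueError("not a format 1 container")
        while True:
            kind = stream.read(1)
            if kind == b"E":  # End Marker: the container is complete
                return
            if kind != b"B":
                raise ValueError("unknown or missing kind marker: %r" % kind)
            # Content length line: decimal digits terminated by "\n".
            length = int(stream.readline().decode("ascii"))
            names = []
            while True:
                line = stream.readline()
                if line == b"\n":  # end of headers byte
                    break
                if not line.endswith(b"\n"):
                    raise ValueError("truncated headers")
                names.append(line[:-1].decode("utf-8"))
            content = stream.read(length)
            if len(content) != length:
                raise ValueError("truncated record body")
            yield names, content

    # Usage: parse a small in-memory container holding two records.
    data = LEAD_IN + b"B5\nexample\n\nhelloB0\n\nE"
    records = list(iter_records(io.BytesIO(data)))

The reader never needs the total container size. It consumes one record
at a time and stops when it meets the end marker.
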
There is a requirement that each object can be given an arbitrary name.
Some revision control systems address all content by the SHA-1 digest of
that content, but this scheme is unsatisfactory for Bazaar's revision
objects. We can still allow addressing by SHA-1 digest for those content
types where it makes sense.

Some proposed object names:

* to name a revision: "``revision:``\ *revision-id*". e.g.,
  ``revision:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5``.
* to name an inventory delta: "``inventory.delta:``\ *revision-id*". e.g.,
  ``inventory.delta:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5``.

It seems likely that we may want to have multiple names for an object.
This format allows that (by allowing multiple ``name`` headers in a Bytes

Although records are in principle addressable by name, this specification
alone doesn't provide for efficient access to a particular record given
its name. It is intended that separate indexes will be maintained to

It is acceptable to have records with no explicit name, if the expected
use of them does not require them. For example:

* a record's content could be self-describing in the context of a
  particular container, or
* a record could be accessed via an index based on SHA-1, or
* when streaming, the first record could be treated specially.

Reasonably cheap for small records
----------------------------------

The overhead for storing fairly short records (tens of bytes, rather than
thousands or millions) is minimal. The minimum overhead is 3 bytes plus
the length of the decimal representation of the *length* value (for a
record with no name).

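
The overhead figure can be checked with a small calculation. The sketch
below assumes the Bytes record layout given in this document (kind
marker, length line, optional name lines, end of headers byte). The
function name ``record_overhead`` is hypothetical.

.. code:: python

    def record_overhead(content_length, names=()):
        """Count the framing bytes that surround a Bytes record's content."""
        kind_marker = 1  # the single byte "B"
        length_line = len(str(content_length)) + 1  # decimal digits plus "\n"
        # Each name costs its UTF-8 length plus a terminating "\n".
        name_lines = sum(len(name.encode("utf-8")) + 1 for name in names)
        end_of_headers = 1  # the lone "\n"
        return kind_marker + length_line + name_lines + end_of_headers

For an unnamed 26-byte record this gives 5 framing bytes: the fixed 3
bytes plus the two digits of ``26``.
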
This describes just a basic layer for storing simple series of "records".
This layer has no intrinsic understanding of the contents of those

* a **container lead-in**, "``bzr pack format 1\n``",
* followed by one or more **records**.

* a 1 byte **kind marker**.
* 0 or more bytes of record content, depending on the record type.

An **End Marker** record:

* has a kind marker of "``E``",

End Marker records signal the end of a container.

* has a kind marker of "``B``",
* followed by a mandatory **content length** [1]_:
  "*number*\ ``\n``", where *number* is in decimal, e.g.::

* followed by zero or more optional **names**:
  "*name*\ ``\n``", e.g.::

    revision:pqm@pqm.ubuntu.com-20070531210833-8ptk86ocu822hjd5

* followed by an **end of headers** byte: "``\n``",
* followed by some **bytes**, exactly as many as specified by the length

So a Bytes record is a series of lines encoding the length and names (if
any) followed by a body.

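
The writing side of the layout above can be sketched as follows. The
helper names ``serialize_bytes_record`` and ``serialize_container`` are
hypothetical, not Bazaar's API. The whitespace check follows the naming
rule this document gives. The empty-name check is an added assumption of
this sketch, since an empty name line would read as the end of headers
byte.

.. code:: python

    LEAD_IN = b"bzr pack format 1\n"
    END_MARKER = b"E"

    def serialize_bytes_record(content, names=()):
        """Encode one Bytes record: kind marker, length, names, blank line, body."""
        parts = [b"B", str(len(content)).encode("ascii") + b"\n"]
        for name in names:
            # Assumed validation: non-empty and free of whitespace.
            if not name or any(ch.isspace() for ch in name):
                raise ValueError("invalid record name: %r" % (name,))
            parts.append(name.encode("utf-8") + b"\n")
        parts.append(b"\n")  # end of headers byte
        parts.append(content)
        return b"".join(parts)

    def serialize_container(records):
        """Join (content, names) pairs into a container ending with an End Marker."""
        body = b"".join(serialize_bytes_record(c, n) for c, n in records)
        return LEAD_IN + body + END_MARKER

    # Usage: a container holding a single named record.
    container = serialize_container([(b"abcdefghijklmnopqrstuvwxyz", ["example"])])
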
For example, this is a possible Bytes record (including the kind marker)::

    abcdefghijklmnopqrstuvwxyz

Names should be UTF-8 encoded strings, with no whitespace. Names should
be unique within a single container, but no guarantee of uniqueness
outside of the container is made by this layer. Names need to be at least

.. [1] This requires that the writer of a record knows the full length of
   the record up front, which typically means it will need to buffer an
   entire record in memory. For the first version of this format this is
   considered to be acceptable.

vim: ft=rst tw=74 ai