The underlying message format is::

  MESSAGE := "bzr message 3 (bzr 1.6)" NEWLINE HEADERS MESSAGE_PARTS
  MESSAGE := MAGIC NEWLINE HEADERS CONTENTS END_MESSAGE
  MAGIC := "bzr message 3 (bzr 1.6)"
  HEADERS := LENGTH_PREFIX bencoded_dict
  MESSAGE_PARTS := MESSAGE_PART [MORE_MESSAGE_PARTS]
  MORE_MESSAGE_PARTS := END_MESSAGE_PARTS | MESSAGE_PARTS
  END_MESSAGE_PARTS := "e"
  BODY := MESSAGE_PART+
  MESSAGE_PART := ONE_BYTE | STRUCTURE | BYTES
  ONE_BYTE := "o" byte
  STRUCTURE := "s" LENGTH_PREFIX bencoded_structure
  BYTES := "b" LENGTH_PREFIX bytes

(Where ``+`` indicates one or more.)

This format allows an arbitrary sequence of message parts to be encoded
in a single message. The contents of a MESSAGE carry a higher-level
message, but this much structure alone is enough to deserialize and
consume a message, so implementations can respond to messages sent by
later versions.

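As a rough illustration of the framing described above, the following Python sketch encodes and decodes the message parts without interpreting them. It makes several assumptions that the grammar shown here does not confirm: LENGTH_PREFIX is taken to be a 4-byte big-endian unsigned integer, END_MESSAGE is taken to be the single byte ``e``, and the bencoded headers and structures are passed in as already-encoded bytes. The function names are hypothetical.

```python
import struct

MAGIC = b"bzr message 3 (bzr 1.6)"
END_MESSAGE = b"e"  # assumption: same marker as END_MESSAGE_PARTS


def length_prefixed(payload):
    # Assumption: LENGTH_PREFIX is a 4-byte big-endian unsigned integer.
    return struct.pack(">L", len(payload)) + payload


def encode_message(bencoded_headers, parts):
    """Frame (kind, payload) parts; kind is b"o", b"s" or b"b"."""
    out = [MAGIC, b"\n", length_prefixed(bencoded_headers)]
    for kind, payload in parts:
        if kind == b"o":
            if len(payload) != 1:
                raise ValueError("ONE_BYTE carries exactly one byte")
            out.append(b"o" + payload)
        elif kind in (b"s", b"b"):
            out.append(kind + length_prefixed(payload))
        else:
            raise ValueError("unknown part kind: %r" % (kind,))
    out.append(END_MESSAGE)
    return b"".join(out)


def decode_message(data):
    """Split a complete message into (bencoded_headers, parts)."""
    prefix = MAGIC + b"\n"
    if not data.startswith(prefix):
        raise ValueError("bad magic")
    pos = len(prefix)

    def read_prefixed(pos):
        # Read one LENGTH_PREFIX and the payload that follows it.
        (length,) = struct.unpack_from(">L", data, pos)
        start = pos + 4
        return data[start:start + length], start + length

    headers, pos = read_prefixed(pos)
    parts = []
    while True:
        kind = data[pos:pos + 1]
        pos += 1
        if kind == END_MESSAGE:
            return headers, parts
        if kind == b"o":
            parts.append((kind, data[pos:pos + 1]))
            pos += 1
        elif kind in (b"s", b"b"):
            payload, pos = read_prefixed(pos)
            parts.append((kind, payload))
        else:
            # Also reached when the data ends before END_MESSAGE.
            raise ValueError("unknown part kind: %r" % (kind,))
```

Because every variable-length part is length-prefixed, the decoder never needs to understand the bencoded payloads, which is what lets an older implementation consume a message from a newer peer.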
describes how such messages are encoded. All requests and responses
defined by earlier protocol versions must be encoded in this way.

Conventional requests will send a sequence of:

* Arguments (a STRUCTURE of a tuple)
* Single body (BYTES), or
* Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)
* if status is "E", followed by an Error (STRUCTURE)

Conventional responses will send a sequence of:

* Arguments (a STRUCTURE of a tuple)
* Single body (BYTES), or
* Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)
* if status is "E", followed by an Error (STRUCTURE)

In all cases, the ONE_BYTE status is either "S" for Success or "E" for
Error. Note that the streamed body from version two is now just multiple

Conventional requests will send a CONTENTS of ::

  CONV_REQ := ARGS SINGLE_OR_STREAMED_BODY?
  SINGLE_OR_STREAMED_BODY := BYTES
  ARGS := STRUCTURE(argument_tuple)
  TRAILER := SUCCESS_STATUS | ERROR
  SUCCESS_STATUS := ONE_BYTE("S")
  ERROR := ONE_BYTE("E") STRUCTURE(argument_tuple)

Conventional responses will send CONTENTS of ::

  CONV_RESP := RESP_STATUS ARGS SINGLE_OR_STREAMED_BODY?
  RESP_STATUS := ONE_BYTE("S") | ONE_BYTE("E")

If the RESP_STATUS is success ("S"), the arguments are the
method-dependent result.

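One way to picture how a recipient might walk a conventional response is the sketch below. It works on parts that have already been decoded into ``(kind, value)`` pairs, where ``kind`` is ``"o"``, ``"s"`` or ``"b"``. The definition of SINGLE_OR_STREAMED_BODY is incomplete in the text above, so the sketch assumes a body is zero or more BYTES parts optionally followed by a TRAILER; that assumption and the function name are illustrative only.

```python
def split_conventional_response(parts):
    """Return (status, args, body_chunks, trailer_error) from decoded parts."""
    if len(parts) < 2:
        raise ValueError("need at least RESP_STATUS and ARGS")
    (kind, status), (args_kind, args) = parts[0], parts[1]
    if kind != "o" or status not in (b"S", b"E"):
        raise ValueError("response must start with ONE_BYTE S or E")
    if args_kind != "s":
        raise ValueError("RESP_STATUS must be followed by a STRUCTURE")

    rest = list(parts[2:])
    body_chunks = []
    # Assumption: the body is any run of BYTES parts.
    while rest and rest[0][0] == "b":
        body_chunks.append(rest.pop(0)[1])

    trailer_error = None
    if rest:
        # Assumption: anything after the body is a TRAILER.
        kind, value = rest.pop(0)
        if kind != "o" or value not in (b"S", b"E"):
            raise ValueError("expected a ONE_BYTE trailer status")
        if value == b"E":
            if not rest or rest[0][0] != "s":
                raise ValueError("error trailer needs a STRUCTURE")
            trailer_error = tuple(rest.pop(0)[1])
    if rest:
        raise ValueError("unexpected parts after trailer")
    return status, tuple(args), body_chunks, trailer_error
```

A success trailer carries no information beyond its presence, so the sketch reports only an error trailer to the caller.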
For errors (where the Status byte of a response or a streamed body is
"E"), the situation is analogous to requests. The first item in the
encoded sequence must be a string giving the error name. The other
arguments supply details about the error, and their number and types
will depend on the type of error (as identified by the error name).

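The error convention above could be handled on the receiving side roughly as follows. This is a minimal sketch: the exception class, the function name and the error name used in the usage note are all hypothetical, and a real implementation would presumably translate known error names into more specific exceptions.

```python
class ErrorFromPeer(Exception):
    """Hypothetical exception carrying an error tuple sent by the peer."""

    def __init__(self, error_name, error_args):
        Exception.__init__(self, error_name, error_args)
        self.error_name = error_name
        self.error_args = error_args


def error_from_tuple(error_tuple):
    """Build an exception from a decoded error STRUCTURE."""
    if not error_tuple:
        raise ValueError("error tuple must contain at least the error name")
    name = error_tuple[0]
    if not isinstance(name, bytes):
        raise ValueError("error name must be a string")
    # The remaining items are error-specific details; their number and
    # types depend on the error name, so they are kept as an opaque tuple.
    return ErrorFromPeer(name, tuple(error_tuple[1:]))
```

For example, ``error_from_tuple((b"ExampleError", b"/some/path"))`` yields an exception whose ``error_name`` is ``b"ExampleError"`` and whose ``error_args`` is ``(b"/some/path",)``.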
Note that the streamed body from version two is now just multiple

The end of the request or response is indicated by the lower-level
END_MESSAGE. If there's only one BYTES element in the body, the TRAILER
may or may not be present, depending on whether it was sent as a single
chunk or as a stream that happens to have one element.

*(Discussion)* The success marker at the end of a streamed body seems
redundant; it doesn't have space for any arguments, and the end of the
body is marked anyhow by the end of the message. Recipients shouldn't
take any action on it, though they should map an error into raising an

1.10 clients don't assert that they get a status byte at the end of the
message. They will complain (in
``ConventionalResponseHandler.byte_part_received``) if they get an
initial success and then another byte part with no intervening bytes.
If we stop sending the final success message and only flag errors,
they'll only get one if the error is detected after streaming starts but
before any bytes are actually sent. Possibly we should wait until at
least the first chunk is ready before declaring success.

For new methods, these sequences are just a convention and may be varied
if appropriate for a particular request or response. However, each
request should at least start with a STRUCTURE encoding the arguments
bencoded. As a result, unlike previous protocol versions, arguments in
this version are 8-bit clean.)

For errors (where the Status byte of a response or a streamed body is
"E"), the situation is analogous to requests. The first item in the
encoded sequence must be a string of the error name. The other arguments
supply details about the error, and their number and types will depend on
the type of error (as identified by the error name).

*(Discussion)* We're discussing having the byte segments be not just a
method for sending a stream across the network, but actually having them
be preserved in the rpc from end to end. This may be useful when
there's an iterator on one side feeding into an iterator on the other,
if it avoids doing chunking and byte-counting at two levels, and if
those iterators are a natural place to get good granularity. Also, for
cases like ``insert_record_stream`` the server can't do much with the
data until it gets a whole chunk, and so it'll be natural and efficient
for it to be called with one chunk at a time.

On the other hand, there may be times when we've got some bytes from the
network but not a full chunk, and it might be worthwhile to pass them up.
If we promise to preserve chunks, then to do this we'd need two separate
streaming interfaces: "we got a chunk" and "we got some bytes but not
yet a full chunk". For ``insert_record_stream`` the second might not be
useful, but it might be good when writing to a file where any number of
bytes can be processed.

If we promise to preserve chunks, it'll tend to make some RPCs work only
in chunks, and others just on whole blocks, and we can't so easily
migrate RPCs from one to the other transparently to older

The data inside those chunks will be serialized anyhow, and possibly the
data inside them will already be able to be serialized apart without
understanding the chunks. Also, we might want to use these formats e.g.
for pack files or in bundles, and so they don't particularly need
lower-level chunking. So the current (unmerged, unstable) record stream
serialization turns each record into a bencoded tuple and it'd be
feasible to parse one tuple at a time from a byte stream that contains a

So we've decided that the chunks won't be semantic, and code should not
count on them being preserved from client to server.

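A consumer that does not rely on chunk boundaries might look like the sketch below. It is only an illustration: instead of the bencoded tuples mentioned above, the records here are plain bencoded byte strings (``<length>:<data>``), and the function name is hypothetical. The point is that the parser rebuffers whatever chunks arrive, so the same records come out however the byte stream was split.

```python
def iter_records(chunks):
    """Yield bencoded byte strings from an arbitrarily chunked byte stream."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            colon = buf.find(b":")
            if colon == -1:
                break  # length not complete yet; wait for more bytes
            length = int(buf[:colon])
            end = colon + 1 + length
            if len(buf) < end:
                break  # record body not complete yet
            yield buf[colon + 1:end]
            buf = buf[end:]
    if buf:
        raise ValueError("stream ended inside a record: %r" % (buf,))
```

Feeding the stream ``b"5:hello3:abc"`` as one chunk, as ``[b"5:he", b"llo3:", b"abc"]``, or one byte at a time gives the same two records in each case.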
*(Discussion)* It would be nice if the server could notify the client of
errors even before a streaming request has finished. This could cover
situations such as the server not understanding the request, it being
unable to open the requested location, or it finding that some of the
revisions being sent are not actually needed.

Especially in the last case, we'd like to be able to gracefully notice
the condition while the client is writing, and then have it adapt its
behaviour. In any case, we don't want to have to drop and restart the

It should be possible for the client to finish its current chunk and
then its message, possibly with an error to cancel what's already been

This relies on the client being able to read back from the server while
it's writing. This is technically difficult for http but feasible over

We'd need a clean way to pass this back to the request method, even
though it's presumably in the middle of running its body iterator.
Possibly the body iterator could be manually given a reference to the
request object, and it can poll it to see if there's a response.

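The polling idea in the last paragraph could be sketched as follows. Everything here is hypothetical: ``PollableRequest`` merely simulates a request object that starts reporting a response after a fixed number of polls, whereas a real one would have to read from the medium without blocking, and the error name is made up.

```python
class PollableRequest:
    """Hypothetical request object that the body iterator can poll."""

    def __init__(self, respond_after_polls):
        self._polls_left = respond_after_polls
        self.early_response = None

    def poll_response(self):
        """Return the server's early response, or None if none has arrived."""
        if self._polls_left > 0:
            self._polls_left -= 1
            return None
        # Simulated early error from the server (the name is made up).
        self.early_response = (b"ExampleError", b"not needed")
        return self.early_response


def polling_body(request, chunks):
    """Yield body chunks, stopping once the request reports a response."""
    for chunk in chunks:
        if request.poll_response() is not None:
            # Stop producing chunks so the caller can finish its message.
            return
        yield chunk
```

With ``PollableRequest(2)`` and four chunks, only the first two chunks are produced, and the caller can then inspect ``request.early_response`` to decide how to end the message.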
Perhaps we need to distinguish error conditions, which should turn into
a client-side error regardless of the request code, from early success,
which should be handled only if the request code specifically wants to

Full-duplex operation
~~~~~~~~~~~~~~~~~~~~~

The code is not geared to pipelined requests, and this might require
doing asynchrony within bzrlib. We might want to go fully pipelined
and asynchronous, but there might be a profitable middle ground.

The particular case where duplex communication would be good is in
working towards the common points in the graphs between the client and
server: we want to send speculatively, but detect as soon as they've

So we could for instance have a synchronous core, but rely on the OS
network buffering to allow us to work on batches of say 64kB. We can
also pipeline requests and responses, without allowing for them
happening out of order, or mixed requests happening at the same time.

It is worth wondering how our network performance would have turned out
if we'd done full-duplex from the start, and ignored hpss over http. We
have pretty good (readonly) http support just over dumb http, and that
may be better for many users.

On the client, the bzrlib code is "in charge": when it makes a request, or
asks for data from the network, that causes network IO. The server is
event driven: the network code tells the response handler when data has
been received, and it takes back a Response object from the request
handler that is then polled for body stream data.
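The event-driven server side can be pictured with a small sketch. The class and method names below are hypothetical and are not bzrlib's API: ``drive`` stands in for the network code, pushing events into a handler as data arrives and then polling the body stream of the resulting response.

```python
class Response:
    """Hypothetical response: result arguments plus an optional body stream."""

    def __init__(self, args, body_stream=None):
        self.args = args
        self.body_stream = body_stream


class EchoRequestHandler:
    """Hypothetical handler; the network code calls the *_received methods."""

    def __init__(self):
        self.args = None
        self.body = []
        self.response = None

    def args_received(self, args):
        self.args = args

    def bytes_received(self, data):
        self.body.append(data)

    def end_received(self):
        # Echo the request body back as a streamed response body.
        self.response = Response((b"ok",), iter(self.body))


def drive(handler, events):
    """Stand-in for network code: push events, then poll the body stream."""
    for name, value in events:
        if name == "args":
            handler.args_received(value)
        elif name == "bytes":
            handler.bytes_received(value)
        else:
            raise ValueError("unknown event: %r" % (name,))
    handler.end_received()
    response = handler.response
    return response.args, list(response.body_stream)
```

The handler never performs IO itself; it only reacts to calls, which is the inverse of the client side, where a call into the library is what triggers network IO.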