diff --git a/Sources/MaxMindDBReader/ControlByte.swift b/Sources/MaxMindDBReader/ControlByte.swift
index a12e515..e8d5f7a 100644
--- a/Sources/MaxMindDBReader/ControlByte.swift
+++ b/Sources/MaxMindDBReader/ControlByte.swift
@@ -1,5 +1,83 @@
 import Foundation
+/**
+# Control Byte
+Primary source of truth: [MaxMindDB Spec](https://maxmind.github.io/MaxMind-DB/)
+## Data Field Format
+Each field starts with a control byte. This control byte provides information
+about the field's data type and payload size.
+
+The first three bits of the control byte tell you what type the field is. If
+these bits are all 0, then this is an "extended" type, which means that the
+*next* byte contains the actual type. Otherwise, the first three bits will
+contain a number from 1 to 7, the actual type for the field.
+
+We've tried to assign the most commonly used types as numbers 1-7 as an
+optimization.
+
+With an extended type, the type number in the second byte is the number
+minus 7. In other words, an array (type 11) will be stored with a 0 for the
+type in the first byte and a 4 in the second.
+
+Here is an example of how the control byte may combine with the next byte to
+tell us the type:
+
+    001XXXXX          pointer
+    010XXXXX          UTF-8 string
+    110XXXXX          unsigned 32-bit int (ASCII)
+    000XXXXX 00000011 unsigned 128-bit int (binary)
+    000XXXXX 00000100 array
+    000XXXXX 00000110 end marker
+
+### Payload Size
+
+The next five bits in the control byte tell you how long the data field's
+payload is, except for maps and pointers. Maps and pointers use this size
+information a bit differently. See below.
+
+If the five bits are smaller than 29, then those bits are the payload size in
+bytes. For example:
+
+    01000010          UTF-8 string - 2 bytes long
+    01011100          UTF-8 string - 28 bytes long
+    11000001          unsigned 32-bit int - 1 byte long
+    00000011 00000011 unsigned 128-bit int - 3 bytes long
+
+If the five bits are equal to 29, 30, or 31, then use the following algorithm
+to calculate the payload size.
+
+If the value is 29, then the size is 29 + *the next byte after the type
+specifying bytes as an unsigned integer*.
+
+If the value is 30, then the size is 285 + *the next two bytes after the type
+specifying bytes as a single unsigned integer*.
+
+If the value is 31, then the size is 65,821 + *the next three bytes after the
+type specifying bytes as a single unsigned integer*.
+
+Some examples:
+
+    01011101 00110011 UTF-8 string - 80 bytes long
+
+In this case, the last five bits of the control byte equal 29. We treat the
+next byte as an unsigned integer. The next byte is 51, so the total size is
+(29 + 51) = 80.
+
+    01011110 00110011 00110011 UTF-8 string - 13,392 bytes long
+
+The last five bits of the control byte equal 30. We treat the next two bytes
+as a single unsigned integer. The next two bytes equal 13,107, so the total
+size is (285 + 13,107) = 13,392.
+
+    01011111 00110011 00110011 00110011 UTF-8 string - 3,421,264 bytes long
+
+The last five bits of the control byte equal 31. We treat the next three bytes
+as a single unsigned integer. The next three bytes equal 3,355,443, so the
+total size is (65,821 + 3,355,443) = 3,421,264.
+
+This means that the maximum payload size for a single field is 16,843,036
+bytes.
+ */
 struct ControlByte {
   let type: DataType