- pack TEMPLATE,LIST
Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines an integer may be represented by a sequence of 4 bytes that will be converted to a sequence of 4 characters.
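As a minimal sketch of the "machine-level representation" idea, the snippet below packs one integer with the "N" format (unsigned 32-bit, big-endian) and inspects the resulting 4-character string. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Pack the integer 1 as an unsigned 32-bit value in big-endian order.
my $packed = pack("N", 1);

# Show the length of the result and the value of each byte in hex.
my @bytes = map { sprintf "%02x", $_ } unpack("C*", $packed);
printf "%d bytes: %s\n", length($packed), join(" ", @bytes);

# unpack() with the same template recovers the original value.
my ($value) = unpack("N", $packed);
print "round trip: $value\n";
```

The packed string is ordinary Perl string data, so it can be written to a file or socket like any other string.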
The TEMPLATE is a sequence of characters that give the order and type of values, as follows:
    a  A string with arbitrary binary data, will be null padded.
    A  A text (ASCII) string, will be space padded.
    Z  A null-terminated (ASCIZ) string, will be null padded.

    b  A bit string (ascending bit order inside each byte, like vec()).
    B  A bit string (descending bit order inside each byte).
    h  A hex string (low nybble first).
    H  A hex string (high nybble first).

    c  A signed char (8-bit) value.
    C  An unsigned char (octet) value.
    W  An unsigned char value (can be greater than 255).

    s  A signed short (16-bit) value.
    S  An unsigned short value.

    l  A signed long (32-bit) value.
    L  An unsigned long value.

    q  A signed quad (64-bit) value.
    Q  An unsigned quad value.
       (Quads are available only if your system supports 64-bit
       integer values _and_ if Perl has been compiled to support those.
       Causes a fatal error otherwise.)

    i  A signed integer value.
    I  An unsigned integer value.
       (This 'integer' is _at_least_ 32 bits wide. Its exact
       size depends on what a local C compiler calls 'int'.)

    n  An unsigned short (16-bit) in "network" (big-endian) order.
    N  An unsigned long (32-bit) in "network" (big-endian) order.
    v  An unsigned short (16-bit) in "VAX" (little-endian) order.
    V  An unsigned long (32-bit) in "VAX" (little-endian) order.

    j  A Perl internal signed integer value (IV).
    J  A Perl internal unsigned integer value (UV).

    f  A single-precision float in the native format.
    d  A double-precision float in the native format.

    F  A Perl internal floating point value (NV) in the native format.
    D  A long double-precision float in the native format.
       (Long doubles are available only if your system supports long
       double values _and_ if Perl has been compiled to support those.
       Causes a fatal error otherwise.)

    p  A pointer to a null-terminated string.
    P  A pointer to a structure (fixed-length string).

    u  A uuencoded string.
    U  A Unicode character number. Encodes to a character in character
       mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.

    w  A BER compressed integer (not an ASN.1 BER, see perlpacktut for
       details). Its bytes represent an unsigned integer in base 128,
       most significant digit first, with as few digits as possible. Bit
       eight (the high bit) is set on each byte except the last.

    x  A null byte.
    X  Back up a byte.
    @  Null fill or truncate to absolute position, counted from the
       start of the innermost ()-group.
    .  Null fill or truncate to absolute position specified by value.
    (  Start of a ()-group.
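The description of the "w" format (base 128, most significant digit first, high bit set on every byte but the last) can be checked against a hand-written encoder. The sketch below assumes small non-negative integers; `ber_encode` is a hypothetical helper written for this illustration, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative integer in base 128,
# most significant digit first, following the description of "w".
sub ber_encode {
    my ($n) = @_;
    my @digits = ($n & 0x7f);                  # last byte: high bit clear
    while ($n >>= 7) {
        unshift @digits, 0x80 | ($n & 0x7f);   # earlier bytes: high bit set
    }
    return pack("C*", @digits);
}

for my $n (0, 127, 128, 300, 16384) {
    my $packed = pack("w", $n);
    my @bytes  = map { sprintf "%02x", $_ } unpack("C*", $packed);
    printf "%6d -> %s\n", $n, join(" ", @bytes);
    die "mismatch for $n" unless $packed eq ber_encode($n);
}
```

Values below 128 occupy a single byte, which is why this format is attractive for length prefixes that are usually small.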
One or more of the modifiers below may optionally follow some letters in the TEMPLATE (the second column lists the letters for which the modifier is valid):
    !   sSlLiI     Forces native (short, long, int) sizes instead
                   of fixed (16-/32-bit) sizes.

        xX         Make x and X act as alignment commands.

        nNvV       Treat integers as signed instead of unsigned.

        @.         Specify position as byte offset in the internal
                   representation of the packed string. Efficient but
                   dangerous.

    >   sSiIlLqQ   Force big-endian byte-order on the type.
        jJfFdDpP   (The "big end" touches the construct.)

    <   sSiIlLqQ   Force little-endian byte-order on the type.
        jJfFdDpP   (The "little end" touches the construct.)
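The effect of the modifiers in the table above can be seen by dumping the packed bytes. This is a small sketch that assumes a Perl recent enough to support the modifiers; `hex_bytes` is a hypothetical helper used only for display.

```perl
use strict;
use warnings;

# Hypothetical display helper: render a packed string as hex bytes.
sub hex_bytes { join " ", map { sprintf "%02x", $_ } unpack("C*", shift) }

# "l>" and "l<" are fixed 32-bit values with an explicit byte order.
print hex_bytes(pack("l>", 1)), "\n";    # 00 00 00 01
print hex_bytes(pack("l<", 1)), "\n";    # 01 00 00 00

# For unsigned values, "L>" matches "N" and "L<" matches "V".
print "L> matches N\n" if pack("L>", 4711) eq pack("N", 4711);
print "L< matches V\n" if pack("L<", 4711) eq pack("V", 4711);

# "n!" reads the same 16 bits as "n" but interprets them as signed.
my $bytes = pack("n", 0xFFFE);
my ($unsigned) = unpack("n",  $bytes);   # 65534
my ($signed)   = unpack("n!", $bytes);   # -2
print "$unsigned $signed\n";
```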
The ">" and "<" modifiers can also be used on ()-groups, in which case they force a certain byte-order on all components of that group, including subgroups.

The following rules apply:
-
Each letter may optionally be followed by a number giving a repeat count. With all types except "a", "A", "Z", "b", "B", "h", "H", "@", ".", "x", "X" and "P" the pack function will gobble up that many values from the LIST. A "*" for the repeat count means to use however many items are left, except for "@", "x", "X", where it is equivalent to 0, for ".", where it means relative to string start, and "u", where it is equivalent to 1 (or 45, which is the same). A numeric repeat count may optionally be enclosed in brackets, as in pack 'C[80]', @arr.

One can replace the numeric repeat count by a template enclosed in brackets; then the packed length of this template in bytes is used as a count. For example, x[L] skips a long (it skips the number of bytes in a long); the template $t X[$t] $t unpack()s twice what $t unpacks. If the template in brackets contains alignment commands (such as x![d]), its packed length is calculated as if the start of the template has the maximal possible alignment.

When used with "Z", "*" results in the addition of a trailing null byte (so the packed result will be one longer than the byte length of the item).

When used with "@", the repeat count represents an offset from the start of the innermost ()-group.

When used with ".", the repeat count is used to determine the starting position from where the value offset is calculated. If the repeat count is 0, it's relative to the current position. If the repeat count is "*", the offset is relative to the start of the packed string. And if it's an integer n, the offset is relative to the start of the n-th innermost ()-group (or the start of the string if n is bigger than the group level).

The repeat count for "u" is interpreted as the maximal number of bytes to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat count should not be more than 65.
-
The "a", "A", and "Z" types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. When unpacking, "A" strips trailing whitespace and nulls, "Z" strips everything after the first null, and "a" returns data verbatim.

If the value-to-pack is too long, it is truncated. If too long and an explicit count is provided, "Z" packs only $count-1 bytes, followed by a null byte. Thus "Z" always packs a trailing null (except when the count is 0).
-
Likewise, the "b" and "B" fields pack a string that many bits long. Each character of the input field of pack() generates 1 bit of the result. Each result bit is based on the least-significant bit of the corresponding input character, i.e., on ord($char)%2. In particular, characters "0" and "1" generate bits 0 and 1, as do characters "\0" and "\1".

Starting from the beginning of the input string of pack(), each 8-tuple of characters is converted to 1 character of output. With format "b" the first character of the 8-tuple determines the least-significant bit of a character, and with format "B" it determines the most-significant bit of a character.

If the length of the input string is not exactly divisible by 8, the remainder is packed as if the input string were padded by null characters at the end. Similarly, during unpack()ing the "extra" bits are ignored.

If the input string of pack() is longer than needed, extra characters are ignored. A "*" for the repeat count of pack() means to use all the characters of the input field. On unpack()ing the bits are converted to a string of "0"s and "1"s.
-
The "h" and "H" fields pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, 0-9a-f) long.

Each character of the input field of pack() generates 4 bits of the result. For non-alphabetical characters the result is based on the 4 least-significant bits of the input character, i.e., on ord($char)%16. In particular, characters "0" and "1" generate nybbles 0 and 1, as do bytes "\0" and "\1". For characters "a".."f" and "A".."F" the result is compatible with the usual hexadecimal digits, so that "a" and "A" both generate the nybble 0xa==10. The result for characters "g".."z" and "G".."Z" is not well-defined.

Starting from the beginning of the input string of pack(), each pair of characters is converted to 1 character of output. With format "h" the first character of the pair determines the least-significant nybble of the output character, and with format "H" it determines the most-significant nybble.

If the length of the input string is not even, it behaves as if padded by a null character at the end. Similarly, during unpack()ing the "extra" nybbles are ignored.

If the input string of pack() is longer than needed, extra characters are ignored. A "*" for the repeat count of pack() means to use all the characters of the input field. On unpack()ing the nybbles are converted to a string of hexadecimal digits.
-
The "p" type packs a pointer to a null-terminated string. You are responsible for ensuring the string is not a temporary value (which can potentially get deallocated before you get around to using the packed result). The "P" type packs a pointer to a structure of the size indicated by the length. A NULL pointer is created if the corresponding value for "p" or "P" is undef, similarly for unpack().

If your system has a strange pointer size (i.e. a pointer is neither as big as an int nor as big as a long), it may not be possible to pack or unpack pointers in big- or little-endian byte order. Attempting to do so will result in a fatal error.
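The pointer behaviour described above can be tried out with a short sketch. It assumes the packed string is kept in a named variable for as long as the pointer is in use, as the text requires; the variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

# A named variable, so the string outlives the packed pointer.
my $string = "hello";
my $ptr    = pack("p", $string);

# The packed value is as wide as a native pointer.
print length($ptr), " bytes (ptrsize is $Config{ptrsize})\n";

# unpack("p") follows the pointer and reads up to the terminating null.
my ($copy) = unpack("p", $ptr);
print "$copy\n";

# undef packs as a NULL pointer, which unpacks back to undef.
my ($null) = unpack("p", pack("p", undef));
print defined $null ? "defined\n" : "undef\n";
```

Unpacking a pointer that no longer refers to live memory is undefined behaviour at the C level, so a round trip like this is only safe within one process and while `$string` is unchanged.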
-
The "/" template character allows packing and unpacking of a sequence of items where the packed structure contains a packed item count followed by the packed items themselves.

For pack you write length-item/sequence-item and the length-item describes how the length value is packed. The ones likely to be of most use are integer-packing ones like "n" (for Java strings), "w" (for ASN.1 or SNMP) and "N" (for Sun XDR).

For pack, the sequence-item may have a repeat count, in which case the minimum of that and the number of available items is used as argument for the length-item. If it has no repeat count or uses a "*", the number of available items is used.

For unpack an internal stack of integer arguments unpacked so far is used. You write /sequence-item and the repeat count is obtained by popping off the last element from the stack. The sequence-item must not have a repeat count.

If the sequence-item refers to a string type ("A", "a" or "Z"), the length-item is a string length, not a number of strings. If there is an explicit repeat count for pack, the packed string will be adjusted to that given length.

    unpack 'W/a', "\04Gurusamy";            gives ('Guru')
    unpack 'a3/A A*', '007 Bond  J ';       gives (' Bond', 'J')
    unpack 'a3 x2 /A A*', '007: Bond, J.';  gives ('Bond, J', '.')
    pack 'n/a* w/a','hello,','world';       gives "\000\006hello,\005world"
    pack 'a/W2', ord('a') .. ord('z');      gives '2ab'

The length-item is not returned explicitly from unpack.

Adding a count to the length-item letter is unlikely to do anything useful, unless that letter is "A", "a" or "Z". Packing with a length-item of "a" or "Z" may introduce "\000" characters, which Perl does not regard as legal in numeric strings.
-
The integer types "s", "S", "l", and "L" may be followed by a "!" modifier to signify native shorts or longs. As you can see from above, a bare "l" means exactly 32 bits, while the native long (as seen by the local C compiler) may be larger. This is an issue mainly on 64-bit platforms. You can see whether using "!" makes any difference by

    print length(pack("s")), " ", length(pack("s!")), "\n";
    print length(pack("l")), " ", length(pack("l!")), "\n";

"i!" and "I!" also work, but only for completeness; they are identical to "i" and "I".

The actual sizes (in bytes) of native shorts, ints, longs, and long longs on the platform where Perl was built are also available via Config:

    use Config;
    print $Config{shortsize},    "\n";
    print $Config{intsize},      "\n";
    print $Config{longsize},     "\n";
    print $Config{longlongsize}, "\n";

($Config{longlongsize} will be undefined if your system does not support long longs.)
-
The integer formats "s", "S", "i", "I", "l", "L", "j", and "J" are inherently non-portable between processors and operating systems because they obey the native byte order and endianness. For example, a 4-byte integer 0x12345678 (305419896 decimal) would be ordered natively (arranged in and handled by the CPU registers) into bytes as

    0x12 0x34 0x56 0x78    # big-endian
    0x78 0x56 0x34 0x12    # little-endian
Basically, the Intel and VAX CPUs are little-endian, while everybody else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray are big-endian. Alpha and MIPS can be either: Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses them in big-endian mode.
The names `big-endian' and `little-endian' are comic references to the classic "Gulliver's Travels" (via the paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and the egg-eating habits of the Lilliputians.
Some systems may have even weirder byte orders such as
    0x56 0x78 0x12 0x34
    0x34 0x12 0x78 0x56
You can see your system's preference with
print join(" ", map { sprintf "%#02x", $_ } unpack("W*",pack("L",0x12345678))), "\n";
The byteorder on the platform where Perl was built is also available via Config:
use Config; print $Config{byteorder}, "\n";
Byteorders '1234' and '12345678' are little-endian, '4321' and '87654321' are big-endian.

If you want portable packed integers you can either use the formats "n", "N", "v", and "V", or you can use the ">" and "<" modifiers. These modifiers are only available as of perl 5.9.2. See also perlport.
-
All integer and floating point formats as well as "p" and "P" and ()-groups may be followed by the ">" or "<" modifiers to force big- or little-endian byte-order, respectively. This is especially useful, since "n", "N", "v" and "V" don't cover signed integers, 64-bit integers and floating point values. However, there are some things to keep in mind.

Exchanging signed integers between different platforms only works if all platforms store them in the same format. Most platforms store signed integers in two's complement, so usually this is not an issue.

The ">" or "<" modifiers can only be used on floating point formats on big- or little-endian machines. Otherwise, attempting to do so will result in a fatal error.

Forcing big- or little-endian byte-order on floating point values for data exchange can only work if all platforms are using the same binary representation (e.g. IEEE floating point format). Even if all platforms are using IEEE, there may be subtle differences. Being able to use ">" or "<" on floating point values can be very useful, but also very dangerous if you don't know exactly what you're doing. It is definitely not a general way to portably store floating point values.

When using ">" or "<" on a ()-group, this will affect all types inside the group that accept the byte-order modifiers, including all subgroups. It will silently be ignored for all other types. You are not allowed to override the byte-order within a group that already has a byte-order modifier suffix.
-
Real numbers (floats and doubles) are in the native machine format only; due to the multiplicity of floating formats around, and the lack of a standard "network" representation, no facility for interchange has been made. This means that packed floating point data written on one machine may not be readable on another - even if both use IEEE floating point arithmetic (as the endian-ness of the memory representation is not part of the IEEE spec). See also perlport.
If you know exactly what you're doing, you can use the ">" or "<" modifiers to force big- or little-endian byte-order on floating point values.

Note that Perl uses doubles (or long doubles, if configured) internally for all numeric calculation, and converting from double into float and thence back to double again will lose precision (i.e., unpack("f", pack("f", $foo)) will not in general equal $foo).
-
Pack and unpack can operate in two modes: character mode ("C0" mode), where the packed string is processed per character, and UTF-8 mode ("U0" mode), where the packed string is processed in its UTF-8-encoded Unicode form on a byte-by-byte basis. Character mode is the default unless the format string starts with a "U". You can switch mode at any moment with an explicit "C0" or "U0" in the format. A mode is in effect until the next mode switch or until the end of the ()-group in which it was entered.
-
You must yourself do any alignment or padding by inserting, for example, enough "x"es while packing. There is no way that pack() and unpack() could know where the characters are going to or coming from. Therefore pack (and unpack) handle their output and input as flat sequences of characters.
-
A ()-group is a sub-TEMPLATE enclosed in parentheses. A group may take a repeat count, both as postfix, and for unpack() also via the "/" template character. Within each repetition of a group, positioning with "@" starts again at 0. Therefore, the result of

    pack( '@1A((@2A)@3A)', 'a', 'b', 'c' )

is the string "\0a\0\0bc".
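The group behaviour described above can be verified with a short sketch: it first re-runs the positioning example from the text, then shows a repeat count applied to a whole group. The field layout in the second part is made up for illustration.

```perl
use strict;
use warnings;

# The example from the text: "@" counts from the start of the innermost group.
my $s = pack('@1A((@2A)@3A)', 'a', 'b', 'c');
print $s eq "\0a\0\0bc" ? "matches\n" : "differs\n";

# A repeat count on a group repeats the whole sub-template:
# here, three (one character, one byte) pairs.
my $pairs = pack('(a C)3', 'x', 1, 'y', 2, 'z', 3);

# With "*", unpack() repeats the group until the input is used up.
my @fields = unpack('(a C)*', $pairs);
print "@fields\n";    # x 1 y 2 z 3
```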
-
"x" and "X" accept the "!" modifier. In this case they act as alignment commands: they jump forward/back to the closest position aligned at a multiple of count characters. For example, to pack() or unpack() C's struct {char c; double d; char cc[2]} one may need to use the template W x![d] d W[2]; this assumes that doubles must be aligned on the double's size.

For alignment commands, a count of 0 is equivalent to a count of 1; both result in no-ops.
-
"n", "N", "v" and "V" accept the "!" modifier. In this case they will represent signed 16-/32-bit integers in big-/little-endian order. This is only portable if all platforms sharing the packed data use the same binary representation for signed integers (e.g. all platforms are using two's complement representation).
-
A comment in a TEMPLATE starts with "#" and goes to the end of line. White space may be used to separate pack codes from each other, but modifiers and a repeat count must follow immediately.
-
If TEMPLATE requires more arguments to pack() than actually given, pack() assumes additional "" arguments. If TEMPLATE requires fewer arguments to pack() than actually given, extra arguments are ignored.
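The last rule, on too few or too many arguments, is easy to see with string formats. This is a minimal sketch; the hex dump via unpack("H*") is used only for display.

```perl
use strict;
use warnings;

# Two "a2" fields but only one value: the missing one is treated as "",
# which "a2" then pads with nulls.
my $short = pack("a2 a2", "ab");
print unpack("H*", $short), "\n";    # 61620000

# One "a2" field but two values: the extra value is ignored.
my $long = pack("a2", "ab", "cd");
print unpack("H*", $long), "\n";     # 6162
```

Because neither case raises an error, a template that drifts out of step with its argument list can go unnoticed; comparing length($packed) with the expected record size is one cheap check.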
Examples:
    $foo = pack("WWWW",65,66,67,68);
    # foo eq "ABCD"
    $foo = pack("W4",65,66,67,68);
    # same thing
    $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9);
    # same thing with Unicode circled letters.
    $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
    # same thing with Unicode circled letters. You don't get the UTF-8
    # bytes because the U at the start of the format caused a switch to
    # U0-mode, so the UTF-8 bytes get joined into characters
    $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9);
    # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9"
    # This is the UTF-8 encoding of the string in the previous example

    $foo = pack("ccxxcc",65,66,67,68);
    # foo eq "AB\0\0CD"

    # note: the above examples featuring "W" and "c" are true
    # only on ASCII and ASCII-derived systems such as ISO Latin 1
    # and UTF-8. In EBCDIC the first example would be
    # $foo = pack("WWWW",193,194,195,196);

    $foo = pack("s2",1,2);
    # "\1\0\2\0" on little-endian
    # "\0\1\0\2" on big-endian

    $foo = pack("a4","abcd","x","y","z");
    # "abcd"

    $foo = pack("aaaa","abcd","x","y","z");
    # "axyz"

    $foo = pack("a14","abcdefg");
    # "abcdefg\0\0\0\0\0\0\0"

    $foo = pack("i9pl", gmtime);
    # a real struct tm (on my system anyway)

    $utmp_template = "Z8 Z8 Z16 L";
    $utmp = pack($utmp_template, @utmp1);
    # a struct utmp (BSDish)

    @utmp2 = unpack($utmp_template, $utmp);
    # "@utmp1" eq "@utmp2"

    sub bintodec {
        unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
    }

    $foo = pack('sx2l', 12, 34);
    # short 12, two zero bytes padding, long 34
    $bar = pack('s@4l', 12, 34);
    # short 12, zero fill to position 4, long 34
    # $foo eq $bar
    $baz = pack('s.l', 12, 4, 34);
    # short 12, zero fill to position 4, long 34

    $foo = pack('nN', 42, 4711);
    # pack big-endian 16- and 32-bit unsigned integers
    $foo = pack('S>L>', 42, 4711);
    # exactly the same
    $foo = pack('s<l<', -42, 4711);
    # pack little-endian 16- and 32-bit signed integers
    $foo = pack('(sl)<', -42, 4711);
    # exactly the same
The same template may generally also be used in unpack().
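A round trip through pack() and unpack() with one shared template might look like the sketch below. The record layout (an 8-character space-padded name, a 16-bit big-endian id, a flag byte and a null-terminated note) is hypothetical and chosen only to mix several formats.

```perl
use strict;
use warnings;

# Hypothetical record layout shared by pack() and unpack().
my $template = "A8 n C Z*";

my @record = ("alice", 1042, 1, "free-form note");
my $packed = pack($template, @record);
my @back   = unpack($template, $packed);

print length($packed), " bytes\n";
print join("|", @back), "\n";    # alice|1042|1|free-form note
```

The round trip is exact here because "A" strips the padding it added and "Z*" stops at the null it appended; a name with trailing spaces of its own would not survive "A8" unchanged.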
-