Python’s struct.unpack
2009-04-14
Is it just me, or does struct.unpack seem mostly useless for unpacking real-world binary data? Real-world binary data usually has a fixed number of bytes for each value to read out of a binary file. struct.unpack does not allow you to specify how many bytes to read for each value to unpack; rather, it has format specifiers that map to the C type declarations.
The only way I found that works to read, for example, sets of 32-bit signed integers is to read them with one call per variable, slicing the data, like so:
self.trackCount = struct.unpack("B", data[0])[0]
self.discId1 = "%08x" % struct.unpack("
Surely there's an obvious better way to do this?
NOTE: for some reason WordPress does not allow the first <L format string to show up as just that; instead it corrects it to '<I ' (different letter, and a space). FML.
You can use struct.unpack_from, which takes an offset and only requires that len(buffer[offset:]) >= struct.calcsize(fmt).
Your sample would thus be rewritten as:
self.trackCount = struct.unpack_from("B", data, 0)[0]
self.discId1 = "%08x" % struct.unpack_from("<L", data, 1)[0]
self.discId2 = "%08x" % struct.unpack_from("<L", data, 5)[0]
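For anyone who wants to try this outside the class, here is a minimal self-contained sketch in Python 3. The sample bytes and the variable names are made up purely for illustration; only the 1 + 4 + 4 byte layout comes from the post.

```python
import struct

# Hypothetical sample record: a 1-byte track count followed by two
# little-endian unsigned 32-bit disc ids (values invented for the demo).
data = struct.pack("<BLL", 12, 0xA00B2C0D, 0x0012F3E4)

# unpack_from reads at the given offset without slicing the buffer first.
track_count = struct.unpack_from("B", data, 0)[0]
disc_id1 = "%08x" % struct.unpack_from("<L", data, 1)[0]
disc_id2 = "%08x" % struct.unpack_from("<L", data, 5)[0]

print(track_count, disc_id1, disc_id2)
```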
Comment by sdefresne — 2009-04-14 @ 10:35
What is wrong with this?
(self.trackCount, self.discId1, self.discId2) = struct.unpack("<BlL", data[:9])
Comment by Guillaume — 2009-04-14 @ 11:09
@Guillaume: what guarantees that struct.unpack reads an ‘L’ as a 32-bit integer, and not a 64-bit integer?
Comment by Thomas — 2009-04-14 @ 11:17
http://python.org/doc/current/library/struct.html might help; in particular, see the paragraphs explaining what “standard size” means for int and long fields, plus the table indicating that “<” forces the standard size interpretation.
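A quick way to check this on your own machine is struct.calcsize. The sketch below (Python 3) only prints sizes; the native results are deliberately not annotated because they vary by platform.

```python
import struct

# With a byte-order prefix ("<", ">", "=" or "!") sizes are standard and
# no padding is inserted, so these hold on every platform.
print(struct.calcsize("<L"))    # 4
print(struct.calcsize("<BLL"))  # 9

# Without a prefix (or with "@") the native C sizes and alignment apply,
# so the results depend on the platform and compiler.
native_long = struct.calcsize("L")
native_record = struct.calcsize("BLL")
print(native_long, native_record)
```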
Comment by Marius Gedminas — 2009-04-14 @ 11:30
@Thomas: because the documentation says L is a 32-bit integer?
Comment by tahorg — 2009-04-14 @ 11:36
@Thomas: same as tahorg here. Please notice the difference between the “native” and “standard” modes.
Comment by Guillaume — 2009-04-14 @ 13:12
If you have multiple or many values to unpack, the question is: what sort of data structure do you want to unpack these into? In most cases, I doubt you want the flat tuple that unpack provides. You probably want to unpack the data into an array structure. Both the stdlib array module and numpy permit this (using the same type-codes as struct).
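As a rough illustration with the stdlib array module (Python 3): one caveat is that array always uses native sizes and native byte order, so the sketch checks the item width and swaps bytes where needed. The payload is invented for the example.

```python
import array
import struct
import sys

# Hypothetical payload: four little-endian unsigned 32-bit values.
payload = struct.pack("<4I", 1, 2, 3, 4)

values = array.array("I")
# array uses native C sizes, so check the item width before trusting it.
if values.itemsize != 4:
    raise RuntimeError("type code 'I' is not 4 bytes on this platform")
values.frombytes(payload)
# array also assumes native byte order; swap on big-endian machines.
if sys.byteorder == "big":
    values.byteswap()

print(values.tolist())
```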
Comment by BC — 2009-04-14 @ 15:58
It gets progressively messier if you have structs inside of structs. Really, it would be great if there were some sort of grouping inside the struct format.
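Lacking such grouping, one possible workaround is to give each level its own struct.Struct object and thread an offset through. The sketch below (Python 3) assumes a made-up layout, and the names HEADER, POINT and parse_shape are hypothetical.

```python
import struct

# Hypothetical layout: a header (version, point count) followed by
# that many (x, y) points. Each level gets its own Struct object.
HEADER = struct.Struct("<BH")
POINT = struct.Struct("<hh")

def parse_shape(buf, offset=0):
    """Return (version, points, next_offset) for one shape record."""
    version, count = HEADER.unpack_from(buf, offset)
    offset += HEADER.size
    points = []
    for _ in range(count):
        points.append(POINT.unpack_from(buf, offset))
        offset += POINT.size
    return version, points, offset

blob = HEADER.pack(1, 2) + POINT.pack(3, -4) + POINT.pack(5, 6)
print(parse_shape(blob))
```

Returning the next offset lets the caller parse several such records back to back from the same buffer.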
Comment by Edward Z. Yang — 2009-04-14 @ 16:41
Only a learner here, but I’ve been looking at diveintopython.org (I think that’s the site) and this looks similar to the MP3 ID3v1 tag example on there…
Comment by TGM — 2009-04-14 @ 16:46
self.trackCount, self.discId1, self.discId2 = struct.unpack("<BLL", data)
self.discId1 = "%08x" % self.discId1
self.discId2 = "%08x" % self.discId2
Comment by ΤΖΩΤΖΙΟΥ — 2009-04-14 @ 18:48
One other benefit of struct.unpack_from is that it doesn’t require any string copying to extract data from part of a larger byte string.
If you’re stuck on Python 2.4, you can emulate it with something like:
def unpack_from(fmt, buf, offset=0):
    slice = buffer(buf, offset, struct.calcsize(fmt))
    return struct.unpack(fmt, slice)
And as others have said, it will be more efficient if you unpack as much data as you can in one go rather than lots of little calls.
Comment by James Henstridge — 2009-04-15 @ 03:54
The Hachoir core library might be a useful alternative to the struct module.
Comment by Jonathan — 2009-04-24 @ 12:01
That’s http://hachoir.org/
Comment by Jonathan — 2009-04-24 @ 12:05
You can have even more fun if you construct the unpack string on the fly…
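One common case is a count-prefixed list, where the format string depends on a value read earlier. A minimal sketch in Python 3, assuming a hypothetical layout of a 1-byte count followed by that many little-endian unsigned 32-bit values (unpack_counted is an illustrative name):

```python
import struct

def unpack_counted(buf):
    """Read a 1-byte count, then that many little-endian uint32 values."""
    (count,) = struct.unpack_from("B", buf, 0)
    fmt = "<%dL" % count          # e.g. "<3L" when count is 3
    return struct.unpack_from(fmt, buf, 1)

sample = struct.pack("<B3L", 3, 10, 20, 30)
print(unpack_counted(sample))
```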
Comment by Gadget Steve — 2010-06-01 @ 15:17
A complete Python code example using struct.unpack is posted at the following reference.
>>> magic = struct.unpack("<14b", test)
See http://www.fullchipdesign.com/read_binary_rb_bin_struct_unpack.htm
Comment by Atul — 2010-08-31 @ 02:30