Often enough during Python programming, you need to read a file or perform actions on a file. Before doing so, it is often necessary to find out what type of file you are dealing with - an mp3, a text file, an image file - whatever.
Using the command 'file' to determine the file type
Linux comes with a wickedly good utility called 'file'. It is fast and can handle lots of different file types. Using it is as simple as:
cf@opensuse:~/bin/python/playground> file myjpeg.jpg
myjpeg.jpg: JPEG image data, EXIF standard
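If you just want this output inside a Python program, one option is to call the 'file' command through the standard subprocess module. The following is only a minimal sketch: the helper name file_type is made up for illustration, and it assumes the 'file' utility is on your PATH (it returns None otherwise).

```python
import shutil
import subprocess


def file_type(path):
    """Return the description printed by `file` for path, or None if unavailable.

    file_type is a hypothetical helper name, not part of any library.
    """
    if shutil.which("file") is None:
        return None  # the 'file' utility is not installed on this system
    # --brief drops the leading "filename:" prefix from the output
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(file_type("myjpeg.jpg"))
```

Starting a new process for every file is slow if you have thousands of files to check, which is one reason to look at the in-process options further down.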
Try to get the filetype by looking at the file extension
One way of doing this is to look at the file's extension, though this sucks for a number of reasons (particularly on Linux/Unix, where a file does not need an extension at all). To do so, you would take the filename as a string and use a regular expression to try to get the part of the string after the first dot (e.g. tar.gz from myfile.tar.gz or mp3 from myfile.mp3). This breaks for some filenames - I'm thinking of files with version numbers etc - something like myfile-1.0.4-1.src.rpm, where the first dot sits inside the version number. However, Python brings an inbuilt helper called mimetypes, which could make life a bit easier.
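To make the problem with extensions concrete, here is a small sketch that compares two ways of cutting up the filename. It uses os.path.splitext from the standard library instead of a regular expression; the helper names extension and after_first_dot are made up for this example.

```python
import os.path


def extension(filename):
    """Return the text after the last dot, lowercased, or '' if there is none."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def after_first_dot(filename):
    """Naive variant: everything after the first dot in the file's base name."""
    name = os.path.basename(filename)
    return name.split(".", 1)[1] if "." in name else ""


for name in ("myfile.mp3", "myfile.tar.gz", "myfile-1.0.4-1.src.rpm"):
    print(name, "|", extension(name), "|", after_first_dot(name))
```

For myfile-1.0.4-1.src.rpm the first-dot variant returns 0.4-1.src.rpm, which is useless, while the last-dot variant returns rpm but loses the "tar" part of myfile.tar.gz. Neither says anything about what is actually inside the file.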
Try to get the filetype using the inbuilt mimetypes
If we have a JPEG image file called myjpeg.jpg, we can try to guess the file type using mimetypes. Consider the following Python code (after starting the interpreter):
>>> import os,mimetypes
>>> mimetypes.guess_type(os.path.join(os.getcwd(),"myjpeg.jpg"))
('image/jpeg', None)
As you can see, mimetypes reports the file as a JPEG image. Note that guess_type only looks at the filename (its extension); it never opens the file. The second entry in the returned tuple is the encoding (for example, gzip for a compressed file), or None if guess_type cannot determine one. Compared to the output of the 'file' command shown above, this is pretty feeble, but apparently it is possible to get better results by using some of mimetypes' other functions to map additional types and encodings. Try the following for more information:
pydoc mimetypes
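A few more calls show what mimetypes can and cannot do. This is only a sketch: the filenames are invented, and the .cfdata extension and its application/x-cfdata type are hypothetical values picked for illustration.

```python
import mimetypes

# guess_type() only inspects the name, so the file does not need to exist
print(mimetypes.guess_type("does-not-exist.jpg"))

# For compressed archives the second tuple entry carries the encoding
print(mimetypes.guess_type("myfile.tar.gz"))

# An extension that mimetypes does not know gives no type at all
# (.cfdata is a made-up extension)
print(mimetypes.guess_type("notes.cfdata"))

# Register a mapping for the made-up extension, then try again
mimetypes.add_type("application/x-cfdata", ".cfdata")
print(mimetypes.guess_type("notes.cfdata"))
```

Because the guess depends only on the name, renaming a text file to something.jpg is enough to fool it.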
Using python-magic as an alternative to mimetypes
As mentioned above, Linux has a really good utility for determining file type and other useful information from files - the 'file' utility. python-magic is a kind of Python interface to the file utility - it brings its own shared library 'magic.so' and thus provides more information than mimetypes (though I emphasise that I haven't spent too much time on mimetypes - it's probably better than I'm describing). Anyhow, consider the following code:
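#!/usr/bin/env python
import os
import magic

jpg = os.path.join(os.getcwd(), "myjpeg.jpg")

# Open a magic handle with default flags and load the default magic database
ms = magic.open(magic.MAGIC_NONE)
ms.load()

# Identify the file directly by its path
print(ms.file(jpg))

# Identify the same file from the first 4096 bytes of its contents
with open(jpg, "rb") as f:
    data = f.read(4096)
print(ms.buffer(data))

ms.close()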
This code output the following (when used on the same 'myjpeg.jpg' file as 'file' above):
cf@opensuse:~/bin/python/playground> ./testmagic.py
JPEG image data, EXIF standard
JPEG image data, EXIF standard
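Both calls give the same answer because the detection works on the contents of the file, not on its name. As a rough illustration of that idea (and nothing more), the sketch below compares the first few bytes against a tiny, hand-written table of well-known file signatures. The table and the function names sniff and sniff_file are made up for this example; the real magic database holds far more rules, and far more elaborate ones.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, description)
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"%PDF-", "PDF document"),
    (b"ID3", "MP3 audio with ID3 tag"),
]


def sniff(data):
    """Return a description for the leading bytes in data, or None if unknown."""
    for magic_bytes, description in SIGNATURES:
        if data.startswith(magic_bytes):
            return description
    return None


def sniff_file(path, size=16):
    """Read the first few bytes of path and pass them to sniff()."""
    with open(path, "rb") as f:
        return sniff(f.read(size))


if __name__ == "__main__":
    # The first bytes of a typical JPEG file with EXIF data
    print(sniff(b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00"))
```

This also shows why reading only the first 4096 bytes into a buffer, as in the script above, is usually enough for an identification.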
The information is more or less the same as provided by 'file'. The major problem with python-magic is actually finding and installing the module. There are some Ubuntu and Debian packages available, but I was looking for an openSUSE package. After searching for ages, I decided to package it myself, from the Ubuntu sources. You can download the package from my openSUSE Build Service project. If you are using openSUSE, you can use the 1-Click Install button below.