Saturday, 19 April 2008



openSUSE 11.0 Beta1 has been out since 2008-04-18 - you can grab a live CD version for KDE or GNOME over at http://en.opensuse.org/Development_Version#Downloads - see also: http://news.opensuse.org/

Probably the best feature of 11.0 is the vastly improved speed of package management, thanks mainly to changes in libzypp. `zypper install package` or `zypper search package` is rocket fast compared to openSUSE 10.3.

As always, KDE users get a polished KDE4 or KDE3 desktop.

The latest 2.6.25 kernel makes life a bit easier for many wlan card users, as the latest rt2xx drivers (along with many other wlan drivers) have been directly integrated into the kernel - no more messing about with compiling kernel modules.

Monday, 11 February 2008

Changed the location of pyocrhelper project


Because of my somewhat stupid idea to call pyocrhelper python-ocr-helper in my first attempt to use Google's code hosting, I started running into trouble with naming scripts and tarballs etc. It was getting quite confusing, so I figured I'd start off openSUSE Hack Week 2 with a clean sheet and replace the old project with a new one, as well as taking the opportunity to clean up the code a bit and iron out some nasty bugs which I hadn't had time to look at until now.

So, now the deed is done. The new, old project is now up at http://code.google.com/p/pyocrhelper/ and is looking good. The checkin rate is quite impressive for a Monday morning, even if it is only me contributing. I'm actually fairly happy with the script at the moment - there's still a lot of work to be done - features to be added etc, and I'll probably break everything a couple of times.

Anyway, form your own impression of the script and post any bugs or feature requests you might have - this week is a good week to fix stuff.

Friday, 8 February 2008

amarok 1.4.8 can't submit tracks to last.fm

In the last couple of days/weeks/months, amarok seemed to choke when trying to send tracks to last.fm to refresh my profile. This was very annoying. After quite a bit of searching and monkeywork grepping through log files, I found a workaround here.

The workaround isn't very elegant or scalable but, hell, it works.

The solution to the problem is:
  1. Log into last.fm and change your password
  2. Change the password in amarok
  3. Profit

Monday, 4 February 2008

Brother DCP-110C on openSUSE 10.3 Linux


My neighbour asked me to give him a hand with a Brother DCP-110C multifunction printer under openSUSE 10.3. Since it was me who convinced him to install openSUSE in the first place, I figured I pretty much had to - not that he needed that much convincing. Anyhow, after half an hour at his place, I still hadn't got the printer to spit out anything, so I took the hardware all the way down two flights of stairs and now it's sitting at my feet purring happily with openSUSE goodness - kind of.

I browsed through some forums trying to get an idea of what steps were needed to get the Brother beast working. After about two minutes, I saw that it could be done, but wasn't going to be easy. Before I describe how I got it done, there is a caveat which might bite me tomorrow - my machine is an AMD64 and my neighbour's is a 32bit Intel. Without messing about, here are the steps I needed to get it working:

  1. Forget YaST. Don't even try setting anything up. It won't work. Period.
  2. Download the lpr driver from here: http://solutions.brother.com/linux/sol/printer/linux/lpr_drivers.html
  3. Install the driver from step 2 (as root): rpm -i DCP110Clpr-1.0.2-1.i386.rpm
  4. Download the cups wrapper from: http://solutions.brother.com/linux/sol/printer/linux/cups_drivers.html
  5. Install the cups wrapper from step 4 (as root): rpm -i cupswrapperDCP110C-1.0.0-1.i386.rpm
  6. The above rpms only install the cups filter in /usr/lib/cups/filter/brlpdwrapperDCP110C, so on my 64-bit system I needed to create a symlink: ln -s /usr/lib/cups/filter/brlpdwrapperDCP110C /usr/lib64/cups/filter/
  7. Now try setting up the printer in YaST. It should more or less work
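
If you end up repeating step 6 on several machines, the symlink can also be created from a small script. This is only a sketch of the same `ln -s` call in Python; the function name `link_filter` is made up for illustration, and the paths in the trailing comment are the ones from the steps above.

```python
import os


def link_filter(source, target_dir):
    """Symlink source into target_dir unless something with that name exists."""
    target = os.path.join(target_dir, os.path.basename(source))
    if os.path.lexists(target):
        # Already linked, or a real file is in the way: leave it alone.
        return False
    os.symlink(source, target)
    return True


# As root on the 64-bit machine, this would mirror the ln -s call in step 6:
# link_filter("/usr/lib/cups/filter/brlpdwrapperDCP110C", "/usr/lib64/cups/filter")
```

Unlike a bare `ln -s`, the function can be run twice without complaining, which is handy when you are still fiddling with the setup.
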
As you might have guessed from what I was saying about caveats (above), the printer is working on an AMD64 at the moment - it will probably be a(nother) chore to get it up and running on the 32bit Intel. Here are a couple of links which might help:

http://solutions.brother.com/linux/sol/printer/linux/linux_faq-2.html

http://solutions.brother.com/linux/en_us/

Tuesday, 29 January 2008

New project in Google code!

New app - pyOcrHelper!
I just checked a new project into Google Code - I called it pyOcrHelper (because I couldn't think of anything else). Basically, it's a python class which makes access to OCR software such as Tesseract or Ocropus easier, because you don't have to think about converting the image/document you have into the format required by Tesseract or by Ocropus - pyOcrHelper takes care of this for you.

What it can do currently:
The first release provides the basic functionality that I required - simply to be able to OCR scan any image file and (importantly) also PDFs (seeing as scanned documents are often sent as images embedded in PDF). As mentioned, this works (kind of). It badly needs documentation and probably also needs to be packaged in the openSUSE build service. There are a couple of loose dependencies which could probably be deleted altogether.

Next steps:
The next steps are to tighten up the code (a lot), to make the code readable and to start raising worthwhile exceptions instead of having class member functions bail out with sys.exit() after doing a sys.stderr.write(). I also want to do some work on output formats. Currently, Ocropus produces half usable HTML, but this could easily be improved upon - and XML can't be that hard to output either. Apart from that, there are other things that I might consider, like taking the opportunity to get to grips with pyqt4 and KDE4/Plasma. I'm thinking of a nice plasma desktop app where you can drop any file and have the OCRd version jump back out at you...
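
To make the exception idea concrete, here is a minimal sketch of what "raising worthwhile exceptions" could look like. None of these names come from pyOcrHelper itself: `OcrHelperError`, `UnsupportedFormatError`, `check_input` and the `SUPPORTED` list are all made up for illustration.

```python
class OcrHelperError(Exception):
    """Base class for errors raised by the (hypothetical) helper."""


class UnsupportedFormatError(OcrHelperError):
    """Raised when an input file cannot be converted for the OCR engine."""


# Assumed list of accepted extensions, purely for the example.
SUPPORTED = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf")


def check_input(path):
    """Return the path unchanged, or raise instead of calling sys.exit()."""
    if not path.lower().endswith(SUPPORTED):
        raise UnsupportedFormatError("cannot handle %r" % path)
    return path


# The caller decides what to do with the failure.
try:
    check_input("notes.docx")
except OcrHelperError as err:
    print("OCR failed: %s" % err)
```

The point of the common base class is that calling code can catch everything the helper raises with one `except` clause, while a script that really wants to exit can still do so at the top level.
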

Similar applications:
Just spotted another python project on Google code - Clarify - which is aimed at doing more or less the same as what I'm aiming at, but possibly with multithreading as well. Must have a look at the code and the results. Maybe I can learn something from it.

Saturday, 19 January 2008

Determining file type with python using python-magic as an alternative to the inbuilt mimetypes

Often enough during python programming, you need to read a file, or perform actions on a file. Before doing so, it is often necessary to find out what type of file it is you are dealing with - an mp3, a text file, an image file - whatever.

Using the command 'file' to determine the file type
Linux comes with a wickedly good utility called 'file'. It is fast and can handle lots of different file types. Using it is as simple as:
cf@opensuse:~/bin/python/playground> file myjpeg.jpg
myjpeg.jpg: JPEG image data, EXIF standard
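
'file' gets its answer from the content of the file rather than from its name. To show the principle only, here is a toy sketch in Python that checks the first few bytes against a tiny hand-picked table. The `SIGNATURES` table and the `sniff` function are my own illustration and have nothing to do with the real database that 'file' ships.

```python
# Toy content sniffer: a hand-picked table, NOT the database that 'file' uses.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"%PDF-", "PDF document"),
]


def sniff(path):
    """Return a rough description based on the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for signature, description in SIGNATURES:
        if head.startswith(signature):
            return description
    return "unknown"
```

Because only the content is inspected, a JPEG saved without any extension is still recognised, and a text file renamed to .jpg is not.
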

Try to get the filetype by looking at the file extension
One way of doing this is to look at the file's extension, though this sucks, for a number of reasons (particularly on linux/unix). To do so, you would take the filename as a string and use a regular expression to try to get the part of the string after the first dot (e.g. myfile.tar.gz or myfile.mp3). This might not work for some filenames - I'm thinking of files with version numbers etc - something like myfile-1.0.4-1.src.rpm. However, python brings an inbuilt helper called mimetypes, which could make life a bit easier.
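
A quick sketch of why guessing from the name is fragile. It compares a naive split on the first dot with `os.path.splitext` from the standard library, which only looks at the last suffix. The two helper names are made up for the example.

```python
import os.path


def after_first_dot(name):
    """Naive 'extension': everything after the first dot in the name."""
    return name.split(".", 1)[1] if "." in name else ""


def last_suffix(name):
    """What os.path.splitext considers the extension (last suffix only)."""
    return os.path.splitext(name)[1]


for name in ("myfile.mp3", "myfile.tar.gz", "myfile-1.0.4-1.src.rpm"):
    print(name, "|", after_first_dot(name), "|", last_suffix(name))
```

For myfile-1.0.4-1.src.rpm the naive split yields 0.4-1.src.rpm, which is useless, while splitext yields .rpm but drops the "src" part. Neither approach tells you anything about what is actually in the file.
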

Try to get the filetype using the inbuilt mimetypes
If we have a JPEG image file called myjpeg.jpg, we can try to read in information about the file type using mimetypes. Consider the following python code (after starting the interpreter):
>>> import os,mimetypes
>>> mimetypes.guess_type(os.path.join(os.getcwd(),"myjpeg.jpg"))
('image/jpeg', None)
As you can see, mimetypes was clever enough to see that the file is a jpeg file. The second entry in the tuple returned by mimetypes.guess_type would normally be the encoding (if the guess_type function can actually determine the encoding). Compared to the output of the 'file' command shown above, this is pretty feeble, but apparently it is possible to get better results by using some of the other mimetypes functions to map different encodings. Try using the following for more information:
pydoc mimetypes
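
A small sketch of what the second tuple entry looks like when it is filled in, and of the main limitation: `mimetypes.guess_type` only looks at the name and never opens the file. The filenames are invented and do not need to exist.

```python
import mimetypes

# A compressed tarball: the type is the tar archive, the encoding is gzip.
print(mimetypes.guess_type("myfile.tar.gz"))

# No extension, no guess.
print(mimetypes.guess_type("README"))

# The file does not exist, but the name is enough for a (possibly wrong) answer.
print(mimetypes.guess_type("not_really_a_picture.jpg"))
```

The first call returns ('application/x-tar', 'gzip'), the second (None, None), and the third ('image/jpeg', None), regardless of what the file contains.
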

Using python-magic as an alternative to mimetypes

As mentioned above, linux has a really good utility for determining file type and other useful information from files - the 'file' utility. python-magic is a kind of python interface to the file utility - it brings its own shared library 'magic.so' and thus provides more information than mimetypes (though I emphasise that I haven't spent too much time on mimetypes - it's probably better than I'm describing). Anyhow, consider the following code:
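#!/usr/bin/env python
import os

import magic  # the bindings that provide magic.open() and MAGIC_NONE

jpg = os.path.join(os.getcwd(), "myjpeg.jpg")

# Create a magic cookie and load the default magic database
ms = magic.open(magic.MAGIC_NONE)
ms.load()

# Identify the file by path
file_type = ms.file(jpg)
print(file_type)

# Identify the same file from its first 4096 bytes (read in binary mode)
f = open(jpg, "rb")
data = f.read(4096)
f.close()

file_type = ms.buffer(data)
print(file_type)
ms.close()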
This code output the following (when used on the same 'myjpeg.jpg' file as 'file' above):
cf@opensuse:~/bin/python/playground> ./testmagic.py
JPEG image data, EXIF standard
JPEG image data, EXIF standard

The information is more or less the same as provided by 'file'. The major problem with python-magic is actually finding and installing the module. There are some Ubuntu and Debian packages available, but I was looking for an openSUSE package. After searching for ages, I decided to package it myself, from the Ubuntu sources. You can download the package from my openSUSE Build Service project. If you are using openSUSE, you can use the 1-Click Install button below.


Saturday, 5 January 2008

Messing about with a python implementation of wget


wget is a really wicked tool - I really miss it when I have to use other systems. So I thought, "why not try to implement wget in python?" - seeing as python is easy to install on lots of systems - even Nokia's S60 can handle it. Of course, writing something with anywhere near the functionality of wget is pretty much impossible - and anyway, I don't particularly need all of its functionality. The concrete requirement I have is to build a downloader module for currxchange (to download xml files containing currency exchange rates).

After messing about with python's urllib, I decided to use urllib2 - I only really needed urllib2.urlopen as this provides the info() function which can spit out the metadata about the upstream file - things like file_descriptor.info()["Content-Length"] are thus easy to access.
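
A minimal sketch of the `info()` lookup. It is written for Python 3, where `urllib2.urlopen` lives on as `urllib.request.urlopen`; the function name `remote_size` and the sample file are made up, and a file:// URL stands in for the upstream server so the example runs offline.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def remote_size(url):
    """Return the Content-Length reported for url as an int, or None."""
    with urlopen(url) as response:
        length = response.info()["Content-Length"]
    return int(length) if length is not None else None


# Offline demo: a local file plays the part of the upstream XML file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "rates.xml"
    sample.write_bytes(b"<rates/>")
    print(remote_size(sample.as_uri()))
```

Keep in mind that a real server is free to leave Content-Length out, which is why the function returns None in that case instead of blowing up.
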

I have the script kind of working with Kelvie Wong's cool ProgressBar module from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/168639 (note: the class posted as a comment right down at the bottom of the page). There is another pretty cool python class available for doing more or less exactly what I want to do (even with gui if you're into that) at http://www.python-forum.de/topic-9647.html. It's threaded and downloads the file in three different parts. Methinks it can resume downloads as well - pretty neat. Still, I'm going to keep messing about with my own class so I can tailor it for currxchange. From all this messing about you really appreciate how much work went into a utility like wget - amazing!
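
For the record, the core of such a downloader is just a loop that reads the response in chunks and reports how far it has got. The sketch below is neither my actual script nor Kelvie Wong's ProgressBar class: `download` and `text_bar` are made-up names, the bar is a bare-bones stand-in, and it uses Python 3's `urllib.request` in place of urllib2.

```python
from urllib.request import urlopen


def download(url, dest, chunk_size=8192, report=None):
    """Copy url to dest in chunks; call report(done, total) after each chunk."""
    done = 0
    with urlopen(url) as response, open(dest, "wb") as out:
        length = response.info()["Content-Length"]
        total = int(length) if length is not None else None
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if report is not None:
                report(done, total)
    return done


def text_bar(done, total, width=30):
    """Render a one-line text progress bar, or a byte count if total is unknown."""
    if not total:
        return "%d bytes" % done
    filled = width * done // total
    percent = 100 * done // total
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)
```

Hooking the two together is a one-liner, e.g. `download(url, "rates.xml", report=lambda d, t: print(text_bar(d, t), end="\r"))`. Resuming and multi-part downloads, as in the threaded class mentioned above, are deliberately left out.
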


Friday, 4 January 2008


Medion MD96360 on openSUSE 10.3


Sandra got a brand new laptop before Christmas - a stylish shiny Medion in black with Swarovski studs on the front panel. Niiice... Anyhow, she didn't particularly want the Windows Vista Home Premium which came with it, so I happily clicked on "Hell no - I don't agree with this EULA" and popped in the openSUSE 10.3 GM 64 bit DVD.

Installing openSUSE 10.3 is ... well ... dead easy. In fact, since 10.1, I haven't really had any problems with openSUSE - now and again there are some quirky hardware parts which just don't work (tm) but nothing that has particularly bothered me. With this laptop, X wasn't detected properly and I got dropped back into the terminal. Installing the fglrx driver solved this (except for an annoying little problem which pops up now and again - more on this later). I was happy to see stuff like the inbuilt card readers working. Ouch - one more thing, now that I think of it - WLAN was a bitch to configure - I still haven't got it working properly - though the inbuilt Ralink r73 card was detected and can be used with iwconfig/iwlist - it seems to have problems with wpa2 authentication. A USB WLAN stick later and WLAN was working... though this really isn't the proper solution...

Anyhow, those were just a few thoughts on the laptop - the overall verdict is "niiice". Small, light, comfortable to type and wickedly fast. More to come on this...