DTW instead of Hausdorff Distance

DTW (dynamic time warping) is normally applied to time series and similar 1D signals. Text contours, however, are 2D, and we could not find a 2D DTW formulation in the literature. Applying a 1D comparison method to a 2D feature raises two main problems.

  1. DTW assumes the feature is ordered. In histogram/projection features the signal has a natural order, but contours do not. Components may be slightly rotated, so their contours are not aligned, and in such cases DTW fails to match similar elements.
  2. Two coordinates can be compared in more than one way. Euclidean, Manhattan or any other distance metric could be used, and these should be tested.

The first problem is related to the lack of a natural ordering in the feature descriptors. We can solve it by sorting the contour points with a common criterion. However, since the points lie on a closed curve with no natural starting point, there is no guarantee that a slight rotation of a component won't shift where the sorted sequence begins and hurt the feature's robustness.

The following algorithm is applied to overcome these problems.

  1. Points are sorted clockwise with respect to the center of the component.
  2. The first 20% of the points are repeated at the end of the sequence to alleviate the slight rotation problem.
  3. The distance between two points is computed with either the Euclidean or the Manhattan metric.
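The three steps can be sketched in plain Python as follows. This is not our Cython implementation: it assumes contours are lists of (x, y) tuples, takes the "center of the component" to be the centroid of its contour points, and uses the textbook O(nm) DTW recurrence with no warping window. The function names are illustrative.

```python
import math

def sort_clockwise(points):
    """Sort (x, y) points by angle around their centroid, clockwise for y-up axes."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Decreasing polar angle is clockwise when y grows upward; in image
    # coordinates (y grows downward) the same order appears counter-clockwise.
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx), reverse=True)

def wrap(points, ratio=0.2):
    """Repeat the first `ratio` of the points at the end to tolerate slight rotations."""
    k = int(len(points) * ratio)
    return points + points[:k]

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def dtw(a, b, dist=euclidean):
    """Classic DTW: cumulative cost of the cheapest monotone alignment of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest neighbouring alignment (insertion, deletion or match)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def contour_distance(a, b, dist=euclidean):
    """Sort both contours clockwise, wrap 20% of each, then compare them with DTW."""
    return dtw(wrap(sort_clockwise(a)), wrap(sort_clockwise(b)), dist)

square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# The input order does not matter because both contours are sorted first
print(contour_distance(square, list(reversed(square))))
```

Passing `dist=manhattan` switches the point metric, which makes it easy to test both options mentioned above.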

We used Cython to implement this algorithm and evaluated its performance on our Ottoman Lithography dataset.

Mounting Android Phones from the Terminal in Linux

In the good old days, accessing the files on an Android phone was easy: plugging it into a USB port was enough to show the phone as an external drive. Nowadays phones use MTP (Media Transfer Protocol), a more secure and reliable protocol, so mounting them as disks takes a few extra commands.

I’m using jmtpfs for mounting phones. First install it via

sudo aptitude install jmtpfs

then create a mount point in your $HOME, for example

mkdir ~/myphone

Then, after plugging in the phone, type

jmtpfs ~/myphone

and the phone should be mounted under that directory. Note that if your phone is protected by a password or pattern, you need to unlock it first. Otherwise, cd ~/myphone will fail with "Input/output error" or something similar.

Once the phone is unlocked, you can use this directory like any external disk. When you are done and want to unmount it, simply type

fusermount -u ~/myphone

and your phone is ready to be unplugged.

Adventures with Tesseract

This is an example post about my adventures with Tesseract.

Typing Ottoman in Emacs

Since Emacs 24 added right-to-left text support, it has been possible in principle to write Ottoman in it. However, there is no standard input method for Ottoman Turkish, and writing Ottoman with, say, a Farsi input method is not a fluid experience.

Emacs is probably the most customizable and extensible piece of software in the world. So it should be very easy to set up an input method specifically for Ottoman, right?

Right. I checked the sources and found an input method for Farsi. It is aimed at Westerners who typically don't have a Farsi keyboard. I decided to base my solution on it.

However, since I already had a visual transliteration scheme of my own, I decided to use it as the basis of the input method instead. It is a bit verbose, but easier to remember for infrequent use, and a set of capital-letter shortcuts speeds up frequent use.

Letters are entered as in the visual transliteration table, with a few capital-letter shortcuts added.

The following table shows the basic letter inputs. Most of the three-character sequences have a capital-letter alternative, shown in parentheses.

Letter / Input | Letter / Input | Letter / Input | Letter / Input
ء c | ا e | ب bu1 (B) | پ bu3 (P)
ت bo2 (T) | ث bo3 | ج xu1 (C) | چ xu3 (Ç)
ح x | خ xo1 (X) | د d | ذ do1
ر r | ز ro1 (Z) | ژ ro3 (J) | س s
ش so3 (Ş) | ص z | ض zo1 (D) | ط t
ظ to1 | ع a | غ ao1 (Ğ) | ف fo1 (F)
ق fo2 (Q) | ك lo5 (K) | ك k | ك lo5
گ ko7 (G) | ڭ ko3 | ڭ lo5o3 | ڭ ko5o3
ل l | م m | ن bo1 (N) | و w
ؤ wo5 | ه h | ة ho2 | ی y
ي bu2 | ئ yo5 | آ eo6 (A) | أ eo5 (E)

All digit inputs start with n:

Digit / Input | Digit / Input | Digit / Input | Digit / Input | Digit / Input
۰ n0 | ۱ n1 | ۲ n2 | ۳ n3 | ۴ n4
۵ n5 | ۶ n6 | ۷ n7 | ۸ n8 | ۹ n9

Because letters in Ottoman (and Farsi) sometimes don't follow the usual joining rules, a zero-width non-joiner (ZWNJ, U+200C) is sometimes required between two letters. It is an invisible character that prevents the two letters from connecting as they normally would. In this input method it can be entered as &zwnj;, || or <>. This matters when writing the suffixes of some words.
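To illustrate how a table like the one above turns typed keys into letters, here is a small Python sketch that converts a finished key sequence using greedy longest match. It is not the .el file: RULES is a hypothetical subset covering only a handful of letters, the digits and the three ZWNJ sequences. Emacs resolves key sequences incrementally as you type, so the result is only roughly equivalent.

```python
# Hypothetical subset of the visual-transliteration table (not the full .el file)
DIGITS = "۰۱۲۳۴۵۶۷۸۹"
ZWNJ = "\u200c"  # zero-width non-joiner

RULES = {
    "e": "ا", "eo6": "آ", "A": "آ",
    "bu1": "ب", "B": "ب",
    "bo2": "ت", "T": "ت",
    "d": "د", "r": "ر", "l": "ل", "m": "م",
    "k": "ك", "lo5": "ك", "K": "ك",
    "ko7": "گ", "G": "گ",
    "bo1": "ن", "N": "ن",
    "w": "و", "h": "ه",
    # Three alternative key sequences for the ZWNJ
    "&zwnj;": ZWNJ, "||": ZWNJ, "<>": ZWNJ,
}
# Digits are typed as n0 ... n9
RULES.update({f"n{i}": digit for i, digit in enumerate(DIGITS)})

def transliterate(keys, rules=RULES):
    """Convert a typed key sequence, always preferring the longest matching key."""
    longest = max(len(k) for k in rules)
    out, i = [], 0
    while i < len(keys):
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(keys[i])  # keys not in the table pass through unchanged
            i += 1
    return "".join(out)
```

Longest match is what lets "lo5" produce ك instead of ل followed by the stray keys "o5".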

The other punctuation marks are listed in the .el file itself.

The file can be downloaded from Teknokrat’s GitHub page.

Converting Python Files to Cython

We build our prototypes in Python. The point of a prototype is that it can be produced quickly. Sometimes, however, this goal conflicts with fast execution, and we need to rewrite parts in C. Recently, instead of C, I tried Cython as a way to convert Python programs partially to C.

Suppose we have an ordinary Python program that runs without errors on Linux. (On Windows, a similar approach should work with MinGW or Visual C.) The first step is to save example.py as example.pyx and create a Makefile like the following:

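all: example.so

# Translate the Cython source to C (-a also writes an annotated example.html)
example.c: example.pyx
	cython -a example.pyx

example.o: example.c
	gcc -g -O2 -fPIC -c example.c -o example.o `python3-config --includes`

example.so: example.o
	gcc -g -O2 -shared -o example.so example.o `python3-config --libs`

clean:
	rm -f example.c example.html *.o *.so

.PHONY: all clean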

Save this as Makefile and now you should be able to run

$ make

in the directory containing the Makefile and example.pyx, and get example.so as an importable Python module. Since Cython compiles ordinary Python code as well, anything that worked before renaming the file to .pyx should still work.

You can import the module as normal.

import example

According to the Cython docs, this trivial conversion should yield a 5% increase in performance. However, that is not the point of Cython: the real gains come from declaring C types in the hot code.

Suppose we have some computationally heavy functions that are called over and over in loops. We have code like this in one of our feature comparison modules. It compares lines in a figure by their length, angle and midpoint relative to a center.

Previously, the code computed all of these in pure Python, and the tests (on a large dataset) took days. I then converted the two functions that perform the actual comparison as follows:

# Module-level imports used elsewhere in this module
import cv2
import numpy as np
from . import dtw
from pyemd import emd

# C math functions, called without going through Python objects
cdef extern from "math.h":
    double log(double)
    double fabs(double)
    double sqrt(double)
    double pow(double, double)

cdef _cmp_length_midpoint_angle(double a_len,
                                int a_mp_x,
                                int a_mp_y,
                                double a_angle,
                                double b_len,
                                int b_mp_x,
                                int b_mp_y,
                                double b_angle):
    # Log-length difference + weighted midpoint distance + weighted angle difference
    len_diff = fabs(log(a_len) - log(b_len))
    mid_diff = 4 * sqrt(pow(a_mp_x - b_mp_x, 2) + pow(a_mp_y - b_mp_y, 2))
    ang_diff = 2 * fabs(a_angle - b_angle)
    return len_diff + mid_diff + ang_diff


cdef _cmp_length_angle(double a_len,
                       double a_angle,
                       double b_len,
                       double b_angle):
    len_diff = fabs(log(a_len) - log(b_len))
    ang_diff = 2 * fabs(a_angle - b_angle)
    return len_diff + ang_diff

# Python-visible wrappers; a and b are (length, (mid_x, mid_y), angle) tuples
def cmp_length_midpoint_angle(a, b):
    return _cmp_length_midpoint_angle(a[0], a[1][0], a[1][1], a[2],
                                      b[0], b[1][0], b[1][1], b[2])

def cmp_length_angle(a, b):
    return _cmp_length_angle(a[0], a[2], b[0], b[2])

In the previous version, cmp_length_midpoint_angle and cmp_length_angle were standard Python functions. The conversion took about half an hour and reduced the running time by more than half. It paid for itself within the first few hours.
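For comparison, a pure-Python version of the two functions might look like the sketch below. It is a reconstruction from the Cython code above, not the original Python code; the (length, (mid_x, mid_y), angle) tuple layout is inferred from the indexing in the wrappers, and math.hypot computes the same Euclidean distance as sqrt(pow(...) + pow(...)).

```python
import math

def cmp_length_midpoint_angle(a, b):
    # a, b: (length, (mid_x, mid_y), angle) tuples
    len_diff = abs(math.log(a[0]) - math.log(b[0]))
    mid_diff = 4 * math.hypot(a[1][0] - b[1][0], a[1][1] - b[1][1])
    ang_diff = 2 * abs(a[2] - b[2])
    return len_diff + mid_diff + ang_diff

def cmp_length_angle(a, b):
    len_diff = abs(math.log(a[0]) - math.log(b[0]))
    ang_diff = 2 * abs(a[2] - b[2])
    return len_diff + ang_diff

a = (1.0, (0, 0), 0.0)
b = (1.0, (3, 4), 0.5)
print(cmp_length_midpoint_angle(a, b), cmp_length_angle(a, b))
```

Keeping such a reference version around is handy for checking that the compiled module returns the same values, and timing both with timeit on identical inputs shows the actual speedup on your own data.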

Cython is fantastic.

Adding Links Through jQuery

I prefer jQuery’s append() to the plain JavaScript innerHTML approach for adding elements to a page. It makes errors easier to spot and produces clean DOM trees without hand-writing markup.

On dervaze.com I added a page for each entry in the dictionary. These pages should be reachable from the main page, so I decided to link each word in the results to its word page.

I just added

.append($('<a>')
    .attr("href", link))

at the appropriate place in the tree, and the link was added.

Parameter Expansion in zsh

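zsh offers several parameter expansion conveniences that bash and other shells lack. For example, the :h (head), :t (tail), :e (extension) and :r (root) modifiers split a path into its parts, and they can be chained: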

% filename=$HOME/mydir/myfile.txt
% print ${filename:h}
/home/iesahin/mydir
% print ${filename:t}
myfile.txt
% print ${filename:e}
txt
% print ${filename:h:t}
mydir
% print ${filename:t:r}
myfile
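For readers more at home in Python, the same modifiers map roughly onto os.path functions. The mapping below is an illustration, not an exact equivalence; the two can differ in edge cases such as names without a directory or without an extension.

```python
import os.path

filename = "/home/iesahin/mydir/myfile.txt"

head = os.path.dirname(filename)                              # ${filename:h}
tail = os.path.basename(filename)                             # ${filename:t}
ext = os.path.splitext(filename)[1][1:]                       # ${filename:e}, without the dot
head_tail = os.path.basename(os.path.dirname(filename))       # ${filename:h:t}
tail_root = os.path.splitext(os.path.basename(filename))[0]   # ${filename:t:r}

print(head, tail, ext, head_tail, tail_root)
```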

PostgreSQL with Django

On Debian, install the following packages: libpq-dev, postgresql-9.3, postgresql-contrib-9.3. Django’s PostgreSQL backend also needs the psycopg2 Python package.

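Switch to the postgres user with sudo su - postgres

Create a database with createdb mydb

Create a new user with createuser -P myuser (-P prompts for the new user’s password)

Open the PostgreSQL shell with psql and grant all privileges on mydb to myuser: GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;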

Then change the database settings in the Django project’s settings.py:

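DATABASES = {
    'default': {
        # Renamed to 'django.db.backends.postgresql' in Django 1.9
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'thepassword',  # the password entered for createuser -P
        'HOST': 'localhost',
        'PORT': '',  # empty string means the default port
    }
}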

Now you can run python manage.py syncdb (python manage.py migrate on Django 1.7 and later) in your Django project to create the tables for your models in PostgreSQL.

The date Command in Linux

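When passing a format with more than one specifier to the date command in a Linux terminal, such as

date +"%F %R"

you need to wrap the format in " (double quotes). Otherwise the space between %F and %R splits it into two separate arguments; single quotes work as well.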
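To see what this format produces, here is the same timestamp formatted with Python’s strftime. %F and %R are spelled out as %Y-%m-%d and %H:%M, their standard expansions; date_f_r is a hypothetical helper name used only for this illustration.

```python
from datetime import datetime

def date_f_r(moment):
    # Same layout as `date +"%F %R"`: %F is %Y-%m-%d and %R is %H:%M
    return moment.strftime("%Y-%m-%d %H:%M")

print(date_f_r(datetime.now()))
```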

dervaze.com updated

Finally, we have finished adding another 17,000 words from Belviranli’s Ottoman spelling dictionary, and dervaze.com now has about 27,500 Ottoman words.

We will keep adding more words, especially proper names, as much as possible.