I was creating a locale file for Sindhi in the Arabic script. A locale file contains the hex codes of all its characters, and I got the strings from a linguist.
Checking each character in the Unicode code chart and writing down its Unicode value was really time-consuming. Maybe there is a better method, but this is what I did for the sd_IN@Devanagari locale file, since I am very familiar with the Devanagari script.
But the Arabic code chart is very confusing, since letters in Arabic words appear in initial, medial, and final forms, while the Unicode chart shows only the standalone shape.
For some time now I have been working with Python, so I tried the following quick method:
In a terminal, at the Python 2 prompt:
>>> w = 'شريمتي'
>>> w.decode("utf8")
I did the same thing for almost 30 strings.
It is so fast and accurate :)
Whenever you get confused by Unicode characters, just do this and look up the character information directly in the chart using the Unicode value.
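
The `decode` call above works on Python 2 byte strings only. As a rough sketch of the same trick for Python 3 (where strings are already Unicode), the standard `unicodedata` module can list the code point and name of each character. The helper name `describe` is my own and only for illustration:

```python
import unicodedata

def describe(word):
    """Return (character, code point, Unicode name) for each character in word."""
    rows = []
    for ch in word:
        code_point = "U+{:04X}".format(ord(ch))
        # Without a default, unicodedata.name raises ValueError for
        # characters that have no name, so pass a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, code_point, name))
    return rows

# The word from the post; each printed value can be looked up in the chart.
for ch, code_point, name in describe("شريمتي"):
    print(code_point, name)
```

Each printed line gives the value to check in the Unicode chart, whatever shape the letter takes inside the word.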


Zabeeh Khan said...

I may not have understood you correctly, but I will comment anyway. You don't have to define all the forms of an Arabic-script letter, wherever you define them. For example, the initial, medial and final forms don't have to be defined. When you define the original complete form of an Arabic letter, all the other forms are recognized through it.

I actually had the same problem with the Pashto script, asked the ML, and they answered me. Just wanted to ....

Pravin Satpute said...

Zabeeh, thanks for your comment.
Yeah, you are right.
What I mean is that once a syllable is formed, the original character turns into its initial, medial, or final shape.
For someone who does not write the Arabic script, it is sometimes difficult to find the original character in a fully formed syllable.
So it is good to use such tricks to identify the Unicode characters used.
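
To illustrate the point about forms, here is a small sketch assuming Python 3 and its standard `unicodedata` module (the helper name `base_letter` is hypothetical). Unicode does carry separate compatibility code points for positional shapes in the Arabic Presentation Forms-B block, but ordinary text stores the base letter and leaves the shaping to the font. NFKC normalization maps a presentation form back to that base letter:

```python
import unicodedata

def base_letter(ch):
    """Map an Arabic presentation-form character to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)

# U+FEB7 is the compatibility code point for the initial form of sheen.
initial_sheen = "\uFEB7"
print(unicodedata.name(initial_sheen))

# After normalization only the base letter remains.
base = base_letter(initial_sheen)
print("U+{:04X}".format(ord(base)), unicodedata.name(base))
```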

Zabeeh Khan said...

yeah, of course they are good.. :)

