Audio Programming In Python: April 2014

Wednesday, April 30, 2014

12. OCR Example

This is an example of a Python application. OCR stands for Optical Character Recognition. Typically you have some image files, perhaps from scanning or from using the Print Screen key on the keyboard. The goal is to run all of them through an OCR engine and convert them to text files.

A good OCR engine is tesseract-ocr. You may get it at this webpage. The information on that page says the program was developed at HP Labs between 1985 and 1995 and that, after 2006, it was further developed at Google. It is also a very accurate OCR engine, and it is free.

On this page, there are links to download the setup files for all operating systems. For Windows, this is the setup file.

After we agree to the license and select the defaults, the OCR is installed.

Next, we open the command prompt and check that the program was installed correctly by running tesseract with the -v flag, which reports its version.

You can do the same thing in Python. You have to import the os module, which is for interacting with the operating system. One of its most useful functions is system(), which passes any String to the command prompt for execution.
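
As a minimal sketch of os.system(), the snippet below passes a harmless echo command to the shell. The echo command is only an assumption chosen so the example runs without tesseract installed. The commented line shows how the same call could check the OCR installation, assuming tesseract is on the PATH.

```python
import os

# os.system() hands the String to the operating system's command
# interpreter and returns an exit status (0 usually means success).
status = os.system("echo hello from the shell")
print("exit status: %d" % status)

# Assuming tesseract is installed and on the PATH, the same call can
# check the installation; a non-zero status means the command failed.
# status = os.system("tesseract -v")
```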

To use tesseract-ocr, we write tesseract, then the image file name, and finally the text file name. The image file name must include its extension. The text file name must not, since tesseract always appends .txt itself.
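
The command format can be sketched by building the String first and printing it, without running anything. The filename p09.png is a hypothetical example.

```python
# Build the command String that would later be passed to os.system().
image_name = "p09.png"               # hypothetical image file, with extension
text_name = image_name[:-4]          # drop ".png"; tesseract appends ".txt"
command = "tesseract %s %s" % (image_name, text_name)
print(command)
```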

We can think of tesseract-ocr as the backend, and here we are using Python as the frontend. That is, we interact with the frontend and it deals with the backend.

In our example, this is the content of a folder. Here we have ocr.py, the Python file to be explained later and some image files.

We run the program using the %run command in IPython. We get output from our program as well as from the OCR program.

Now, the program ocr.py is described. It first reads the names of all image files and puts them in a Python List. This is done in Lines 1 to 11. The image filenames consist of some prefix, some number, and finally the image extension, which is png in our example.

We have to find the maximum size, in characters, of the number field. We also have to find the prefix. We generate some helpful printouts so we know everything is working correctly. Then a loop runs once for each image file; this step generates as many text files as there are input image files. Next, all text files are read into a List. Finally, that List is saved to the file 'out.txt'.

After the os module is imported, the current directory is found, and the List of files in that directory is assigned to the fils List. The fils List is then filtered so that only the files with the correct extension end up in the filIm List.
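
The filtering step can be tried on its own with a made-up directory listing. The filenames below are hypothetical, and endswith() is used here as an equivalent of the slice comparison fil[-4:]=='.png' in ocr.py.

```python
# Hypothetical result of os.listdir() for the example folder.
fils = ["ocr.py", "p1.png", "p2.png", "notes.txt", "p10.png"]

# Keep only the names that end with the image extension.
filIm = [fil for fil in fils if fil.endswith(".png")]
print(filIm)
```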

Next we find the maximum length, in characters, of the image filenames. This is needed so that sorting works later, as we shall see. A temporary List, built by a List Comprehension, holds the string length of each image filename. We take the maximum length and subtract 4 from it, because the String '.png' takes 4 characters. The resulting Size covers the prefix as well as the number.

Now we find the prefix. An empty List is created, and then we iterate over the characters of a filename. We use the first image file, filIm[0], which assumes at least one image file exists; testing for that first would be a better design. Each character that is not a digit is appended to the List. At the first digit, we break out of the for loop and use the String function join to join all the characters in the List.
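
Here is a small sketch of the Size and prefix steps on a hypothetical List of filenames. The helper find_prefix() and the empty-List guard are not part of ocr.py; they are one possible way to add the existence test mentioned above.

```python
filIm = ["p9.png", "p10.png", "p11.png"]   # hypothetical image files

# Longest filename, minus the 4 characters of ".png".
Size = max(len(fil) for fil in filIm) - 4

def find_prefix(filename):
    """Return the leading characters of filename up to the first digit."""
    chars = []
    for ch in filename:
        if ch.isdigit():
            break
        chars.append(ch)
    return "".join(chars)

# Guard against an empty List before indexing filIm[0].
prefix = find_prefix(filIm[0]) if filIm else ""
print("prefix is %s, Size is %d" % (prefix, Size))
```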

Now we have two print statements, which would serve as a check. We could immediately see if the program calculated the prefix correctly as well as found the correct number of Image files. You might want to print everything, such as the contents of the List filIm if you want.

Now the OCR is run for each image file in the List filIm. The Tuple called tup has four elements. The first is fil, the current looping variable, which is one of the image files. The second is the prefix, such as 'p'. The third is the size of the number field in characters. The fourth is the number extracted from the filename. This Tuple supplies the four values consumed by the S String expression. The first %s takes the first element, the filename. The second %s takes the prefix. Then the number is written, zero-padded. The symbol * indicates that the field width is taken from the Tuple at runtime, since it may differ depending on the filenames. It receives the maximum number-field length, so 9 is written as '09' in our example. The %d takes the integer in the fourth element of the Tuple.
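
The formatting expression can be checked in isolation. In this sketch the values of prefix, Size and fil are hypothetical stand-ins for what ocr.py computes from the folder.

```python
prefix = "p"
Size = 3                            # longest name without ".png", e.g. "p10"
fil = "p9.png"                      # the image file being processed

width = Size - len(prefix)          # width of the number field
number = int(fil[len(prefix):-4])   # digits between the prefix and ".png"

# The * in %0*d consumes width from the Tuple; %d then consumes number.
tup = (fil, prefix, width, number)
S = "tesseract %s %s%0*d" % tup
print(S)
```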

With the new files created, we again inquire about what files exist and then we keep only the text files in the List filTxt. Finally this List is sorted to be in the correct order, like '09', '10', '11'.
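
The reason for the zero padding shows up when the names are sorted, because sort() compares Strings character by character. The two Lists below are hypothetical.

```python
unpadded = ["p9.txt", "p10.txt", "p11.txt"]
padded = ["p09.txt", "p10.txt", "p11.txt"]

unpadded.sort()   # "1" sorts before "9", so p9 ends up last
padded.sort()     # equal-length number fields sort in page order

print(unpadded)
print(padded)
```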

Next, all the text files are read, and their lines are added to a List. The length of the List is not important in itself; it is simply the total number of lines in all the text files.

Finally, that List is written to a new file called 'out.txt'. The source code for ocr.py is at pythonaudio.blogspot.com.

# ocr.py
import os

# Find current directory
CurDir=os.getcwd()

# List of files in current directory
fils=os.listdir(CurDir)

# List of png files
filIm=[fil for fil in fils if fil[-4:]=='.png']

# Size - number of maximum characters in the filename (not including .png)
Size = max([len(fil) for fil in filIm])-4

# Find Prefix of filenames
# Each character is looped, loop ended when digit found
prefixList=[]
for prefixChar in filIm[0]:
    if prefixChar.isdigit(): break
    else: prefixList.append(prefixChar)
prefix="".join(prefixList)

# Parenthesized single-argument prints behave the same in Python 2.7 and 3
print("prefix is %s" % prefix)
print("There are %d image files" % len(filIm))

# Run tesseract for each image
for fil in filIm:
    tup=(fil,prefix,Size-len(prefix),int(fil[len(prefix):-4]))
    S='tesseract %s %s%0*d' % tup
    print("--> " + S)
    os.system(S)

# List of files in directory after text files created
fils=os.listdir(CurDir)

# List of Text Files (skip out.txt, which an earlier run may have left behind)
filTxt=[fil for fil in fils if fil[-4:]=='.txt' and fil!='out.txt']

# Arrange Page Numbers
filTxt.sort()

L=[]
# Read each text file and add to L
for fil in filTxt:
    fIn=open(fil)
    LTemp=fIn.readlines()
    fIn.close()
    L.extend(LTemp)

# Save L
fOut=open("out.txt","w")
fOut.writelines(L)
fOut.close()

You will find additional information, including a larger image of the slides and text of the audio, at pythonaudio.blogspot.com.

Tuesday, April 29, 2014

11. Python Files

Most of the file operations in Python will involve text files. Here we will go over statements for writing and reading of text files.

For reading or writing, we need a File object, which is created by the open() function. For writing, the mode is 'w'; to append to the end of a file, the mode is 'a'. We can only write out Strings. There are two functions for writing to a File object, write() and writelines(). Here we write the String “Hello Python” to the File object. Finally, we should close the File object. Closing will finalize the writing to the actual file.
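
A minimal sketch of the write and append modes follows. The filename hello.txt and the use of a temporary folder are assumptions made so the example does not overwrite any existing file.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

fOut = open(path, "w")          # mode 'w' creates or truncates the file
fOut.write("Hello Python")      # only Strings can be written
fOut.close()                    # closing finalizes the write

fOut = open(path, "a")          # mode 'a' appends to the end
fOut.write("\nHello again\n")
fOut.close()

fIn = open(path)
content = fIn.read()
fIn.close()
print(content)
```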

To write a multiline String, we have to include \n (which stands for the newline character) at the end of each line.

This is the text which was created in the last program.

For writing out multi-line Strings, it is better to use the writelines() function. Here, we append the four lines to a List. Since each String is a single line, each has only one \n character, at its end.

We can also define the List at once. Notice Python easily understands this is the same statement, since it has not found a closing square bracket. This should be used often to make the program structure readable.
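
As a sketch of writelines() with a List defined in a single statement, the example below writes four hypothetical lines to a temporary file. Note that writelines() does not add newline characters, so each String carries its own \n.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")   # hypothetical file

# The List literal continues over several lines until the closing bracket.
lines = [
    "First line\n",
    "Second line\n",
    "Third line\n",
    "Fourth line\n",
]

fOut = open(path, "w")
fOut.writelines(lines)          # writes the Strings one after another
fOut.close()

fIn = open(path)
text = fIn.read()
fIn.close()
print(text)
```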

For reading, there are three functions. If the function read() is used, it will read the entire file and return it in a String.

If the read() function is used with a number inside the parenthesis, it will only read that number of characters. This is important if we have fields of certain width.
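
The two forms of read() can be compared on a small file with fixed-width fields. The file contents and field widths here are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fields.txt")   # hypothetical file
fOut = open(path, "w")
fOut.write("A4  440.00\nA5  880.00\n")
fOut.close()

# read() with no argument returns the whole file as one String.
fIn = open(path)
whole = fIn.read()
fIn.close()

# read(n) returns the next n characters, continuing where the last read ended.
fIn = open(path)
name = fIn.read(4)      # first field: 4 characters
freq = fIn.read(6)      # second field: the next 6 characters
fIn.close()

print(repr(name), repr(freq))
```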

We can read an entire line, that is, up to and including the next newline character, with the readline() function. Note that the newline character is kept at the end of each String returned; only the last line of a file may lack one, if the file does not end with a newline.

Finally, we have the readlines() function. This will read all the lines at once into a List. Most of the time this function is used in reading text files, as it breaks down the file into smaller strings.
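
A short sketch of readline() and readlines() on a hypothetical three-line file is shown below. The last line is deliberately written without a newline to show what the functions return in that case.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "three.txt")   # hypothetical file
fOut = open(path, "w")
fOut.write("one\ntwo\nthree")    # no newline after the last line
fOut.close()

fIn = open(path)
first = fIn.readline()           # reads up to and including the first \n
rest = fIn.readlines()           # reads the remaining lines into a List
fIn.close()

# rstrip("\n") removes the trailing newline when it is not wanted.
clean = [line.rstrip("\n") for line in [first] + rest]
print(clean)
```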

Sunday, April 27, 2014

10. Python Dictionaries

Dictionaries are Python's unordered data structure. They offer more powerful indexing options than the integers that are the only indexing option for Lists and Tuples. Like Lists, they are Mutable.

To create an empty Dictionary, we write a pair of curly braces. We can initialize individual elements by putting them inside the braces. Unlike Lists or Tuples, each element needs two objects: one is the index and the other is the value. Extracting an element is, as usual, done with square brackets, and the len() function gives the number of elements in a Dictionary.

Once a Dictionary has been created, new elements are added by assignment statements. Notice the index is inside the square brackets, and the value is the term on the right-hand side. Next, a Dictionary with three elements is defined.
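
These steps can be sketched as follows. The note names and frequencies are hypothetical sample data, not the values on the slide.

```python
D = {}                          # empty Dictionary: a pair of curly braces
E = {"A4": 440.0}               # one element, written as index: value

value = E["A4"]                 # extracting uses square brackets
count_before = len(E)           # number of elements

E["A5"] = 880.0                 # a new element is added by assignment
count_after = len(E)

# A Dictionary with three elements defined at once.
notes = {"A3": 220.0, "A4": 440.0, "A5": 880.0}
print(value, count_before, count_after, len(notes))
```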

Besides being numbers such as integers, the index can be a String or a Tuple. Note an index must be immutable.

However, a List cannot be used as an index, since a List is Mutable.
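
The rule about index types can be tried directly. In this sketch the indices and values are made up; the try statement catches the TypeError that Python raises for a List index.

```python
D = {}
D[49] = "A4"                    # integer index
D["A4"] = 440.0                 # String index
D[("A", 4)] = 440.0             # Tuple index

# A List is Mutable, so Python refuses it as an index.
try:
    D[["A", 4]] = 440.0
    list_index_accepted = True
except TypeError:
    list_index_accepted = False

print(len(D), list_index_accepted)
```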

We can create a Dictionary from two sequences. Here we have two Lists, A and B. The elements of A will be our keys and the elements of B will be our values. The zip() function pairs them up, creating a List of (Index, Value) Tuples. Finally, passing this List to the dict() function creates the Dictionary.
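
A minimal sketch with two hypothetical Lists follows. The list() call around zip() is only there so the snippet also runs under Python 3, where zip() returns an iterator rather than a List.

```python
A = ["A3", "A4", "A5"]          # keys
B = [220.0, 440.0, 880.0]       # values

pairs = list(zip(A, B))         # List of (Index, Value) Tuples
D = dict(pairs)

print(pairs)
print(D["A4"])
```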

We can use a for statement to iterate over the keys. Here we use a List Comprehension to make a List of Indices in a Dictionary. This is similar to the keys() member function of Dictionaries.

Besides the keys() member function, we also have the values() function to return a List of values, and the items() function to return a List of index, value pairs.
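
The sketch below compares the List Comprehension with keys(), values() and items() on a hypothetical Dictionary. Since a Dictionary is unordered in Python 2.7, sorted() is applied before printing so the output order is predictable.

```python
D = {"A3": 220.0, "A4": 440.0, "A5": 880.0}

# Looping over a Dictionary visits its keys.
keys_from_loop = [key for key in D]

keys = sorted(D.keys())
values = sorted(D.values())
items = sorted(D.items())       # (index, value) pairs

print(keys)
print(values)
print(items)
```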

We can use the in statement to check whether an index is in a Dictionary. We may also use the get() function to get the value for a particular index, and we may give it a default value. The default value is returned if the index is not found in the Dictionary. If we give no default, it is Python's None object, so nothing is displayed for line 8, but in line 9 a 0 is returned because we supply a default value.
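
Here is a short sketch of the in statement and get(), using a hypothetical Dictionary and the made-up index 'C9' as the missing one.

```python
D = {"A3": 220.0, "A4": 440.0}

has_a4 = "A4" in D              # True: the index exists
has_c9 = "C9" in D              # False: the index is missing

found = D.get("A4")             # value of an existing index
missing = D.get("C9")           # no default given, so None is returned
fallback = D.get("C9", 0)       # default supplied, so 0 is returned

print(has_a4, has_c9, found, missing, fallback)
```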

Like Lists, Dictionaries have a Comprehension statement. We write the index and the value of each element in terms of the looping variable, all enclosed in curly braces.
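
As a minimal sketch, the Dictionary Comprehension below maps a few integers to their squares. The example data is an assumption and may differ from the one on the slide.

```python
# index: value, both written in terms of the looping variable n.
squares = {n: n * n for n in range(5)}
print(squares[3])
```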

Now we have, a Dictionary example, which is similar to the example in the tutorial 8 on Python List Comprehensions. There, we had an example to get frequency for each of the 88 piano keys. The List created there had an index going from 0 to 87. It would be easier to have index terms like A0 and A4.

To analyze a problem, it often helps to lay it out in a table. For each index from 0 to 87, the note has two parts: the name, given by the Pos term in this table, and the octave number, given by the Octave term. The first note is A0, and there are only 3 notes in Octave 0, so the first 9 of that octave's 12 positions are missing. That is the offset of 9, and it is the reason we have a 9 in the formulas for Pos and Octave. In Python 2.7, dividing one integer by another gives integer division. The modulus operation gives the remainder.

First we define the 12 note names. Then we use a for loop to go over all 88 keys. We find the Octave and Pos variables according to the formulas we found earlier. In the print statement, the string formatting codes %d (for integer) and %s (for String) are used; we covered this kind of printing earlier. The final comma, to the right of the closing parenthesis, tells print to suppress the usual newline character. Without it, we would have to scroll up and down, since the list would not fit on one page.
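
The loop can be sketched as below. The exact formulas are on the slide and not in this text, so the ones used here are an assumption: with the note names starting at C, an offset of 9 makes index 0 come out as A0. The // operator is used so the integer division is the same in Python 2.7 and Python 3, and the names are joined into one String instead of relying on the trailing comma of the Python 2 print statement.

```python
# The 12 note names, assumed to start at C as octave numbering does.
Notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

names = []
for i in range(88):
    Octave = (i + 9) // 12      # integer division
    Pos = (i + 9) % 12          # remainder selects the note name
    names.append("%s%d" % (Notes[Pos], Octave))

# Print all names on one line as a quick check.
print(" ".join(names))
```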

The result of the print is shown here. Checks like this are usually worthwhile, since errors that go unnoticed tend to accumulate.

Finally, we can create our Dictionary. Instead of the formatting flags, we may also use the str() function, which converts most Python objects, such as integers, to Strings. Notes is a List, but each of its elements is already a String. Now we can use index terms like 'A4'.

If we loop over a Dictionary with the in statement, we are looping over its keys. Here, for example, we find the number of black keys. The function count() returns either 0 or 1, since each key String has at most one such symbol.

We can also test the values of a Dictionary. For example, we may find the number of notes with frequencies between 100 and 300. This number is found to be 19, which lies between 12 (1 octave) and 24 (2 octaves), as expected for a frequency ratio of 3, which lies between 2 (one octave) and 4 (two octaves).
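
The Dictionary and both counts can be sketched together. Two things here are assumptions, since the slides are not reproduced in this text: the frequency formula is the standard equal-tempered tuning with A4 (index 48) at 440 Hz, and black keys are identified by the '#' symbol in their names.

```python
Notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

Freq = {}
for i in range(88):
    Octave = (i + 9) // 12
    Pos = (i + 9) % 12
    # Assumed tuning: A4 is index 48 at 440 Hz, 12 equal steps per octave.
    Freq[Notes[Pos] + str(Octave)] = 440.0 * 2.0 ** ((i - 48) / 12.0)

# Looping over the Dictionary visits its keys; count() gives 0 or 1 per key.
black = sum(key.count("#") for key in Freq)

# Testing the values: notes with frequencies between 100 and 300.
inRange = len([f for f in Freq.values() if 100 <= f <= 300])

print(Freq["A4"], black, inRange)
```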