Technical Tips & Tricks: Ephesoft

Wednesday, October 11, 2017

Auto-Complete in Ephesoft

Adding a semicolon-separated list of values to an individual index field can improve overall accuracy while decreasing the processing time required to complete an Ephesoft batch. The auto-completion of the index field happens during the Validation process. A simple example is a field where the possible responses are ‘Positive’, ‘Negative’ or ‘No Response’. The person processing the form can simply enter a character or two that match the start of an entry in the list of values. Once the corresponding value appears as the first entry in the list, the user presses the Down Arrow key to populate the index field with the selected value. Pressing Tab then takes the person to the next field so they can continue to process the form.

Setup for an Auto-complete Field
First, collect the values that are valid for a particular field. Now open the Doc Type in which the index field resides. Under the header ‘Field Option Values List’, enter the list of values, using a semicolon to separate them. See Screenshot #1.

Screenshot #1:

Now look for the header label ‘Field Type’. Select the value ‘Combo’. See Screenshot #2. Apply your changes and you can test the results.

Screenshot #2:

Interacting With an Auto-complete Field

Open a batch in the Validation module that matches the Doc Type where you added your list of values. Tab down to the field where you entered your list of values. Once there, enter a character or two that match the first few characters of the value you want applied to this field. Once you see your value appear at the top of the list, simply hit your Down Arrow key and this value will be used to populate the field. The Tab key will take you to the next field. This simple change can greatly decrease the processing time associated with a Doc Type within the Ephesoft application.

NOTE: The characters you enter are NOT case sensitive; a match will result even if the case of the characters does not match.
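
The matching behavior described above can be illustrated with a small Python sketch. This is not Ephesoft's implementation, only a model of what the user sees; the function names are made up for the illustration, and it assumes the match is a case-insensitive comparison against the start of each value.

```python
def parse_option_values(raw: str) -> list[str]:
    """Split a semicolon-separated 'Field Option Values List' string into values."""
    return [value.strip() for value in raw.split(";") if value.strip()]


def matching_values(options: list[str], typed: str) -> list[str]:
    """Return the options that start with the typed characters, ignoring case."""
    prefix = typed.casefold()
    return [option for option in options if option.casefold().startswith(prefix)]


options = parse_option_values("Positive;Negative;No Response")
print(matching_values(options, "n"))   # ['Negative', 'No Response']
print(matching_values(options, "NO"))  # ['No Response']
```

Typing a second character narrows the list to a single entry, which is why a character or two is usually enough.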

Monday, April 3, 2017

How can I reset the Batch Instance Identifier (ID) value in Ephesoft?

Sometimes, for an Ephesoft deployment, it is advantageous to reset the Batch Instance Identifier (ID), or BI#. This might also be required because there is a "feature", some would say a bug, in MariaDB where the auto-increment value will reset to '1' when there are no batches in the queue and the DB is restarted.

First, we should determine the current BI# using the following command:

select max(ID) from BATCH_INSTANCE;

NOTE: This returns the highest ID currently in the table, which is the current BI#.

Remember that the Batch Instance Identifier is displayed in hexadecimal format when viewed in the Ephesoft interface. However, we will use a decimal integer when updating the BI# at the DB level, and Ephesoft will convert it to a hexadecimal value for display. In this first example, we will set the BI# to '40001'. Ephesoft will display this as 'BI9C41'.

alter table BATCH_INSTANCE auto_increment = 40001;

You can check the value using the previous select statement after you submit your next batch.

If you want to start with a specific hexadecimal value, say A100, you will first need to convert it to decimal. In this case, the decimal value '41216' would be used.

alter table BATCH_INSTANCE auto_increment = 41216;

NOTE: To verify that the change has occurred you will need to process a new Ephesoft batch.
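
If you don't have a hex calculator handy, the conversions above can be checked with a few lines of Python. This is only a convenience sketch; the function names are made up, and it assumes the displayed identifier is simply 'BI' followed by the uppercase hexadecimal form of the ID, as in the examples above.

```python
def batch_id_to_display(batch_id: int) -> str:
    """Format a decimal batch instance ID the way the examples above show it (e.g. BI9C41)."""
    return f"BI{batch_id:X}"


def display_to_batch_id(display: str) -> int:
    """Convert a hexadecimal value such as 'A100' or 'BIA100' to the decimal auto_increment value."""
    text = display.strip().upper()
    if text.startswith("BI"):
        text = text[2:]  # drop the BI prefix; 'I' is not a hex digit, so this is unambiguous
    return int(text, 16)


print(batch_id_to_display(40001))     # BI9C41
print(display_to_batch_id("A100"))    # 41216
print(display_to_batch_id("BI9C41"))  # 40001
```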

Wednesday, November 30, 2016

Using Luke to Troubleshoot Ephesoft FuzzyDB Searches

Ephesoft’s FuzzyDB feature is a great way to use values extracted from a document to retrieve an index value from a database table.

We’ve made great use of this feature to identify a vendor name in a document when the vendor’s name only appears inside a logo, and doesn’t appear in actual text. We configured our FuzzyDB table with one column that contains the vendor name, and several other columns that contain the vendor’s address and/or unique strings from the vendor's paperwork (like a company slogan).

When Ephesoft processes a document, it compares the text of the document to an indexed version of the contents of the address column, and returns the vendor name for the row with the best match.

Sometimes the results returned by the FuzzyDB search aren’t what we expected to find, and it can be helpful to troubleshoot the search using Luke, a Lucene tool included in the Ephesoft installation.

Open Luke by running luke.bat from the following directory: <Ephesoft-Home>\Dependencies\luke
The following window should appear by default. If not, choose File->Open Lucene Index.
Browse to the path of the Lucene index you want to view, then click OK. For Ephesoft FuzzyDB work, choose the path to your FuzzyDB table name in the following folder: <Ephesoft-Home>\SharedFolders\BC<Number>\fuzzydb-index\ephesoft\<table-name> (It’s best to open the index in Read-Only mode, just to be safe.)
This will take you to the following window:
Click on the Search tab at the top, then go to the Analysis tab on the right and change the drop-down list to “org.apache.lucene.analysis.standard.StandardAnalyzer”.
Enter some search criteria in the upper-left text box and click Search (ensure that there are no punctuation characters in your search string).
Results will be displayed in the Results list. Note the "rowID" values that are returned:
Open your FuzzyDB table in a SQL tool like Heidi, and you can see that the rowID values returned from the search above map to rowID values in the FuzzyDB table:

The above procedure is useful for generic testing. However, if you have a document in a specific batch that is giving you problems, you can use the following steps to test FuzzyDB retrieval for an entire page.

On the Ephesoft server, find the HOCR.xml file for that page, then copy the HOCR value from the last line of the XML file.
Paste that entire line into an editor like Notepad++ and remove all special characters (see the end of this page for an easy way to clean out special characters in Notepad++).
Convert the entire string to lowercase (Edit -> Convert Case to -> Lowercase).
Paste the cleaned, lowercase string into the Luke search dialog, and click Search.
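
If you do this often, the cleanup steps can also be scripted. The Python sketch below is one way to do it, not part of Ephesoft or Luke; the function name is made up. Note one deliberate difference from the manual steps: it replaces each punctuation character with a space instead of deleting it, so that words separated only by punctuation don't run together. It only handles ASCII punctuation, so characters like curly quotes would still need to be removed by hand.

```python
import string


def clean_for_luke(text: str) -> str:
    """Strip ASCII punctuation, lowercase the text, and collapse whitespace."""
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    without_punctuation = text.translate(table)
    # split()/join collapses runs of spaces, tabs, and line breaks into single spaces.
    return " ".join(without_punctuation.lower().split())


print(clean_for_luke("ACME Corp., 123 Main St. AND Co."))
# acme corp 123 main st and co
```

Paste the output into the Luke search box as described in the last step.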

What to do with Your Search Results?
When you do your page-specific search test, you may find that multiple vendor rows are being returned, or maybe you're simply getting the wrong vendor. Typically this means that the values you've entered in the searchable columns of your FuzzyDB table aren't clear enough to return the proper result.

Edit the values in the searchable columns of the FuzzyDB table, and repeat your search. Continue editing/searching until you get the desired results. Look for values that are unique to that vendor's paperwork, such as company name, address, ZIP code, company slogans, or distinct wording that appears on each of that vendor's documents. You can even use personal names if the vendor's paperwork always has the same person's name on it.

When you're satisfied with the results inside Luke, make sure to do a Learn DB inside Ephesoft to recreate the Lucene indexes within Ephesoft based on the changes you've made to the FuzzyDB table.

Avoid Reserved Words in your Luke Search Expression

The words “AND” and “OR” in uppercase are reserved words (Boolean operators) in the Lucene query syntax that Luke uses. Ending a search string with either of those words will cause an error. Using AND or OR inside a search string is likely to return unexpected results, because those words will be treated as query operators rather than as search terms.
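
If you need to keep the rest of a search string as-is, one option is to lowercase only the reserved words so they are no longer read as operators. The Python sketch below shows the idea; the function name is made up, and it only covers the two words discussed here.

```python
RESERVED_WORDS = {"AND", "OR"}


def neutralize_reserved_words(query: str) -> str:
    """Lowercase standalone uppercase AND/OR so they are treated as plain search terms."""
    words = query.split()
    return " ".join(word.lower() if word in RESERVED_WORDS else word for word in words)


print(neutralize_reserved_words("Smith AND Sons Heating OR"))
# Smith and Sons Heating or
```

Words that merely contain those letters, such as ANDERSON, are left alone because only whole words are compared.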

Removing Special Characters in Notepad++

To remove special characters from a string in Notepad++, open the Find/Replace dialog (Ctrl+F) and click on the Replace tab. Select the Regular Expression radio button at the bottom left, type [[:punct:]] in the Find What field, leave the Replace With field empty, and click the Replace All button to remove those characters ([[:punct:]] is a regular expression character class that matches all punctuation characters in the string).

Monday, October 3, 2016

Exporting Documents from Ephesoft to Alfresco via CMIS: Identifying Alfresco Aspects and Properties

Configuring Ephesoft to export documents to Alfresco via CMIS is a fairly straightforward process:

Enable and configure the CMIS export plug-in for your batch class inside Ephesoft
Define your aspect mappings here: <Ephesoft-Installation-Folder>\SharedFolders\<Batch Class #>\cmis-plugin-mapping\aspects-mapping.properties
Define your Alfresco document type and property mappings here: <Ephesoft-Installation-Folder>\SharedFolders\<Batch Class #>\cmis-plugin-mapping\DLF-Attribute-mapping.properties
Create a drop location in Alfresco to receive the files, and configure rules or scripts to move the files to their final destination

(Both of the folders above represent the default locations of those properties files, but your environment may be configured differently.)

Ephesoft has this overall process well-documented on their public wiki (http://wiki.ephesoft.com/cmis-export-plugin). However, when defining your mappings, it can be difficult to determine whether you’re dealing with aspects or properties on the Alfresco side. You’ll need to look through the content model to know for sure, and this article will tell you how.

If the documents are going to be identified as a custom content type in Alfresco, you’ll need access to the custom content model definition file. Once you find the custom content model definition file, you can use the steps below to understand how to read it.

If you’re going to simply use Alfresco’s standard content model, the content model configuration file can be found inside the following jar file:

<Alfresco-Installation-Folder>\tomcat\webapps\alfresco\WEB-INF\lib\alfresco-repository-5.1.jar

(the actual name may differ depending on your Alfresco version)

Copy that jar file to a temporary location, and change its extension from .jar to .zip. Extract the zip file to a temporary folder, or use a zip reader to navigate through the structure to find the alfresco\model\contentModel.xml file. Open the contentModel.xml file in an editor and look for the Alfresco aspect/property that you want to populate. For example, if you want to populate the cm:description field inside Alfresco, search for “cm:description” in the xml file. It will look something like this:

Work your way up the xml structure from the cm:description line, and you can see that cm:description is a property under the cm:titled aspect. This means that the mapping to the Description field in Alfresco must be defined in the aspects-mapping.properties file in Ephesoft. If you’re using CMIS 1.1, you can define that mapping like this:

PartsManuals= P:cm:titled|ManualDescription::cm:description

(Refer to the Ephesoft wiki page for details on how to set up the same mapping for CMIS 1.0.)

In the example above, PartsManuals is the document type in Ephesoft, and ManualDescription is the index field. With this mapping, Ephesoft will create the cm:titled aspect on the document when it’s exported to Alfresco, then it’ll assign the value from the ManualDescription index field in Ephesoft into cm:description in Alfresco.
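
For large content models, working your way up the XML by eye gets tedious, so the lookup can be scripted. The Python sketch below is one way to do it. The embedded XML is a heavily trimmed sample modeled on the layout of contentModel.xml (the real file contains many more elements), and the function name is made up for the illustration. To run it against a real model file, read the file's contents and pass them in place of SAMPLE_MODEL.

```python
import xml.etree.ElementTree as ET

# Namespace used by Alfresco content model (data dictionary) files.
NS = {"d": "http://www.alfresco.org/model/dictionary/1.0"}

# Trimmed sample for illustration only; not the full Alfresco content model.
SAMPLE_MODEL = """\
<model name="cm:contentmodel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <types>
    <type name="cm:content">
      <properties>
        <property name="cm:content"><type>d:content</type></property>
      </properties>
    </type>
  </types>
  <aspects>
    <aspect name="cm:titled">
      <properties>
        <property name="cm:title"><type>d:mltext</type></property>
        <property name="cm:description"><type>d:mltext</type></property>
      </properties>
    </aspect>
  </aspects>
</model>
"""

# Which Ephesoft mapping file applies, depending on where the property is defined.
MAPPING_FILE = {
    "aspect": "aspects-mapping.properties",
    "type": "DLF-Attribute-mapping.properties",
}


def find_owner(model_xml: str, property_name: str):
    """Return ('aspect' or 'type', owner name) for a property, or None if it is not found."""
    root = ET.fromstring(model_xml)
    for kind, path in (("type", "d:types/d:type"), ("aspect", "d:aspects/d:aspect")):
        for owner in root.iterfind(path, NS):
            for prop in owner.iterfind("d:properties/d:property", NS):
                if prop.get("name") == property_name:
                    return kind, owner.get("name")
    return None


kind, owner = find_owner(SAMPLE_MODEL, "cm:description")
print(kind, owner, MAPPING_FILE[kind])
# aspect cm:titled aspects-mapping.properties
```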

You can also combine multiple mappings onto a single line by separating them with a semicolon, like this:

PartsManuals= P:cm:titled|ManualDescription::cm:description;P:cm:auditable|Author::cm:creator

This example (also for CMIS 1.1) executes the same mapping as above, and also adds the cm:auditable aspect, populating the cm:creator value with the Author index field from Ephesoft.
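
When these lines get long, it's easy to drop a separator. The Python sketch below splits a mapping line into its parts so you can eyeball them before restarting Ephesoft. It is not Ephesoft's own parser; the function name is made up, and it only understands the CMIS 1.1 format shown in the two examples above (one index field per aspect entry).

```python
def parse_aspect_mapping(line: str):
    """Split 'DocType=P:aspect|IndexField::property;...' into (doc_type, [(aspect, field, property)])."""
    doc_type, separator, value = line.partition("=")
    if not separator:
        raise ValueError("expected '<DocumentType>=<mappings>'")
    mappings = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        aspect, bar, field_and_property = entry.partition("|")
        index_field, colons, alfresco_property = field_and_property.partition("::")
        if not (bar and colons):
            raise ValueError(f"malformed mapping entry: {entry!r}")
        mappings.append((aspect.strip(), index_field.strip(), alfresco_property.strip()))
    return doc_type.strip(), mappings


line = "PartsManuals= P:cm:titled|ManualDescription::cm:description;P:cm:auditable|Author::cm:creator"
doc_type, mappings = parse_aspect_mapping(line)
print(doc_type)
for mapping in mappings:
    print(mapping)
```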

Most well-designed custom content models will use aspects instead of properties, so you’ll rarely have to add anything to the DLF-Attribute-mapping.properties file other than the Alfresco document type. However, older versions of Alfresco didn’t provide good support for mapping values into aspects, so it may be necessary to use properties instead of aspects in those cases.

For another example, say that you have a custom content model, and you want to populate the hvac:modelNum field in Alfresco with the model number extracted from Ephesoft. If you search for hvac:modelNum in the content model XML file for your custom content model, it could look something like this:

Work your way back up the xml structure, and you’ll see that hvac:modelNum is a property of the hvac:hvacDocuments type. This means that the mapping to the hvac:modelNum value in Alfresco must be defined in the DLF-Attribute-mapping.properties file in Ephesoft. You can define that mapping like this:

PartsManuals=D:hvac:hvacDocuments
PartsManuals.ModelNumber=hvac:modelNum

(The syntax for the DLF-Attribute-mapping.properties file is the same for both CMIS 1.0 and 1.1.)

In the example above, the first line states that documents of the PartsManuals type in Ephesoft should be imported into Alfresco as the custom document type hvacDocuments. The second line says that the ModelNumber index field in Ephesoft should be inserted into the modelNum property of the hvacDocuments document type.

Even if you aren’t importing values into Alfresco properties, the first line is necessary to tell Alfresco what custom document type these imported documents should enter Alfresco as. If you don’t specify a custom document type, documents will be imported into Alfresco using the standard Alfresco document type.

With the combined aspects and properties mappings described above, you can successfully map a variety of Ephesoft index fields into Alfresco fields via CMIS.

Wednesday, August 17, 2016

Ephesoft Queries - Compare Validation Patterns, Compare OCR Confidence Values

Sometimes it's easier to compare settings across index fields by looking at the values in the database instead of navigating down to each field in the Ephesoft UI. However, changes to the values still need to be made in the Ephesoft UI.

Compare Validation Patterns
The following query can be used to view validation rules across index fields, sorted by index field name. This is helpful for making sure that the same validation rules are applied to the same fields across different document types. Query the batch_class table to find the batch_class_id value for your batch class (12 in the example below).

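-- a = index fields, b = validation patterns, c = document types
select
    a.field_type_name,
    b.pattern,
    c.document_type_description
from
    field_type a
    join regex_validation b on b.field_type_id = a.id
    join document_type c on c.id = a.document_type_id
where
    c.batch_class_id = '12'  -- replace with your batch_class_id
order by
    a.field_type_name;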

Compare OCR Confidence Threshold Values
The following query can be used to view field-level OCR Confidence Threshold values on a form. Higher values will result in more batches stopping in Validation, but overall accuracy should be improved as a result. For Ephesoft installations where multiple forms have similar fields, this query can be used to ensure that the same index field has consistent OCR confidence values across different forms. Query the batch_class table to find the batch_class_id value for your batch class (12 in the example below).

-- a = index fields, c = document types
select
    a.field_type_name,
    a.is_hidden,
    a.ocr_confidence_threshold,
    c.document_type_description
from
    field_type a
    join document_type c on c.id = a.document_type_id
where
    c.batch_class_id = '12'  -- replace with your batch_class_id
order by
    a.field_type_name;

Additionally, care should be taken to ensure that hidden fields (is_hidden = 1) have OCR confidence threshold values of 0 (zero). I've seen situations where a hidden field with a high OCR confidence value can cause a batch to stop in Validation even though all of the non-hidden fields have good confidence values, causing the user to wonder why the batch stopped in Validation.
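
If you export the results of the OCR confidence query, both checks described above (hidden fields with a non-zero threshold, and the same field having different thresholds on different forms) can be automated. The Python sketch below works on rows in the same column order as that query. The sample rows are made up, and it assumes is_hidden comes back as 0 or 1 and that the threshold is never NULL; adjust it if your database driver returns something different.

```python
from collections import defaultdict

# Hypothetical rows in the column order of the OCR confidence query:
# (field_type_name, is_hidden, ocr_confidence_threshold, document_type_description)
SAMPLE_ROWS = [
    ("InvoiceNumber", 0, 90.0, "Invoice"),
    ("InvoiceNumber", 0, 80.0, "Credit Memo"),
    ("InternalCode", 1, 85.0, "Invoice"),
    ("VendorName", 0, 75.0, "Invoice"),
]


def hidden_fields_with_threshold(rows):
    """Return (field, document type, threshold) for hidden fields whose threshold is not 0."""
    return [
        (name, doc_type, threshold)
        for name, is_hidden, threshold, doc_type in rows
        if int(is_hidden) == 1 and float(threshold) != 0
    ]


def inconsistent_thresholds(rows):
    """Return {field: sorted thresholds} for fields that have more than one distinct threshold."""
    seen = defaultdict(set)
    for name, _is_hidden, threshold, _doc_type in rows:
        seen[name].add(float(threshold))
    return {name: sorted(values) for name, values in seen.items() if len(values) > 1}


print(hidden_fields_with_threshold(SAMPLE_ROWS))  # [('InternalCode', 'Invoice', 85.0)]
print(inconsistent_thresholds(SAMPLE_ROWS))       # {'InvoiceNumber': [80.0, 90.0]}
```

Any changes still need to be made in the Ephesoft UI; the script only points out which fields to look at.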