Technical Tips & Tricks: November 2016

Ephesoft’s FuzzyDB feature is a great way to use values extracted from a document to retrieve an index value from a database table.

We’ve made great use of this feature to identify a vendor name in a document when the vendor’s name only appears inside a logo, and doesn’t appear in actual text. We configured our FuzzyDB table with one column that contains the vendor name, and several other columns that contain the vendor’s address and/or unique strings from the vendor's paperwork (like a company slogan).

When Ephesoft processes a document, it compares the text of the document to an indexed version of the contents of the searchable columns (the address and any other unique strings), and returns the vendor name for the row with the best match.
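
To make the idea of a "best match" concrete, here is a deliberately simplified Python sketch. It is not how Lucene scores results (Lucene uses its own relevance ranking); it only counts the words shared between a page and each row's searchable text. The table rows, column names, and the `best_vendor` helper are all hypothetical and exist only for illustration.

```python
def tokens(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())

def best_vendor(page_text, rows):
    """Return the vendor whose searchable text shares the most words with the page.

    Toy word-overlap scoring only; Lucene's real ranking is more sophisticated.
    """
    page = tokens(page_text)
    best = max(rows, key=lambda row: len(page & tokens(row["search_text"])))
    return best["vendor"]

# Hypothetical FuzzyDB rows: one vendor-name column plus searchable text.
rows = [
    {"vendor": "Acme Supply", "search_text": "123 main st springfield quality you can trust"},
    {"vendor": "Globex", "search_text": "42 harbor rd shelbyville building tomorrow today"},
]

page = "invoice 1001 ship to 42 harbor rd shelbyville total due 99"
print(best_vendor(page, rows))  # prints "Globex"
```

Even in this toy version, a row with generic wording can win or tie for the wrong reasons, which is why the searchable columns should hold strings that are unique to each vendor.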

Sometimes the results returned by the FuzzyDB search aren’t what we expected to find, and it can be helpful to troubleshoot the search using Luke, a Lucene tool included in the Ephesoft installation.

Open Luke by running luke.bat from the following directory: <Ephesoft-Home>\Dependencies\luke
The following window should appear by default. If not, choose File->Open Lucene Index.
Browse to the path of the Lucene index you want to view, then click OK. For Ephesoft FuzzyDB work, choose the path to your FuzzyDB table name in the following folder: <Ephesoft-Home>\SharedFolders\BC<Number>\fuzzydb-index\ephesoft\<table-name> (It’s best to open the index in Read-Only mode, just to be safe.)
This opens the index in Luke's main window.
Click on the Search tab at the top, then go to the Analysis tab on the right and change the drop-down list to “org.apache.lucene.analysis.standard.StandardAnalyzer”
Enter some search criteria in the upper-left text box and click Search (make sure the search string contains no punctuation characters).
Results will be displayed in the Results list. Note the "rowID" values that are returned:
Open your FuzzyDB table in a SQL tool like Heidi, and you can see that the rowID values returned from the search above map to rowID values in the FuzzyDB table.

The above procedure is useful for generic testing. However, if you have a document in a specific batch that is giving you problems, you can use the following steps to test FuzzyDB retrieval for an entire page.

On the Ephesoft server, find the HOCR.xml file for that page, then copy the HOCR value from the last line of the XML file.
Paste that entire line into an editor like Notepad++ and remove all special characters (see the end of this page for an easy way to clean out special characters in Notepad++).
Convert the entire string to lowercase (Edit -> Convert Case to -> Lowercase).
Paste the cleaned, lowercase string into the Luke search dialog, and click Search.
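
If you run this test often, the cleanup steps above can be scripted instead of done by hand in Notepad++. The following is a minimal Python sketch, assuming you have already copied the page text out of the HOCR.xml file; the `clean_for_luke` function name and the sample string are made up for illustration.

```python
import string

def clean_for_luke(text: str) -> str:
    """Replace ASCII punctuation with spaces, lowercase, and collapse whitespace."""
    # Map every ASCII punctuation character to a space.
    table = str.maketrans({ch: " " for ch in string.punctuation})
    # split() + join() collapses the runs of spaces left behind.
    return " ".join(text.translate(table).lower().split())

sample = "ACME Supply Co., Inc. -- 123 Main St. (Suite #4)"
print(clean_for_luke(sample))  # prints "acme supply co inc 123 main st suite 4"
```

This sketch swaps punctuation for spaces rather than deleting it outright, so that words joined only by punctuation (for example "net/30") are not fused into a single term. It handles ASCII punctuation only; curly quotes and other non-ASCII symbols would need to be added to the table.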

What to do with Your Search Results?
When you do your page-specific search test, you may find that multiple vendor rows are being returned, or maybe you're simply getting the wrong vendor. Typically this means that the values you've entered in the searchable columns of your FuzzyDB table aren't clear enough to return the proper result.

Edit the values in the searchable columns of the FuzzyDB table, and repeat your search. Continue editing/searching until you get the desired results. Look for values that are unique to that vendor's paperwork, such as company name, address, ZIP code, company slogans, or distinct wording that appears on each of that vendor's documents. You can even use personal names if the vendor's paperwork always has the same person's name on it.

When you're satisfied with the results inside Luke, make sure to do a Learn DB inside Ephesoft to recreate the Lucene indexes within Ephesoft based on the changes you've made to the FuzzyDB table.

Avoid Reserved Words in your Luke Search Expression

The words “AND” and “OR” in uppercase are reserved words in the Lucene query syntax that Luke uses. Ending a search string with either of those words will cause an error. Using AND or OR inside a search string is likely to return unexpected results, because those words will be treated as Boolean operators rather than as text to search for.
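
One way to sidestep the problem is to lowercase any standalone operator words before pasting the string into Luke. The sketch below assumes the classic Lucene query syntax, in which uppercase AND, OR, and NOT act as operators; the `neutralize_operators` helper is a hypothetical name, not part of Luke or Ephesoft.

```python
# Uppercase words that the classic Lucene query syntax treats as operators.
RESERVED = {"AND", "OR", "NOT"}

def neutralize_operators(query: str) -> str:
    """Lowercase standalone AND/OR/NOT so they are read as ordinary words."""
    return " ".join(w.lower() if w in RESERVED else w for w in query.split())

print(neutralize_operators("Smith AND Sons OR"))  # prints "Smith and Sons or"
```

If you already lowercase the whole string, as in the page-level test described earlier, this step is unnecessary. Depending on the analyzer, the lowercased words may simply be ignored as common stop words.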

Removing Special Characters in Notepad++

To remove special characters from a string in Notepad++, open the Find/Replace dialog (Ctrl+F) and click on the Replace tab. Select the Regular Expression radio button at the bottom left, type [[:punct:]] in the Find What field, leave the Replace With field empty, and click the Replace All button to remove those characters ([[:punct:]] is a regular expression character class that matches any punctuation character in the string).

In a previous post, we discussed viewing AutoCAD® drawings in Alfresco. But how can design engineers working in AutoCAD keep their base drawings, along with the associated XRefs, in sync with what is in the Alfresco repository?

The solution is Formtek’s EDM Connector for AutoCAD. The EDM Connector for AutoCAD is a plug-in installed on each AutoCAD client. It integrates AutoCAD and the Alfresco repository, via the Formtek EDM Module.

The EDM Connector for AutoCAD allows users to interact with Alfresco directly from AutoCAD

The EDM Connector utilizes the predefined engineering content model provided by the Formtek EDM Module to provide secure management of engineering drawings directly from the AutoCAD application. You can also configure the EDM Connector to use your own custom content model.

With the EDM Connector, you can:

Upload a new engineering drawing or drawing revision to the repository and optionally store the AutoCAD drawing properties and block attributes as repository properties
Browse the repository to download an engineering drawing from the repository to your local machine and open it in AutoCAD
View or edit a drawing’s repository properties
Control versions and revisions between Alfresco and AutoCAD
View a drawing’s revision history in the repository
Lock or unlock a drawing in the repository
Synchronize your local copy of a drawing by checking the repository for a newer version and downloading it if one exists
Automatically maintain the integrity of a drawing’s external references


The EDM Connector for AutoCAD works with AutoCAD 2013 through 2017.

Wednesday, November 30, 2016

Using Luke to Troubleshoot Ephesoft FuzzyDB Searches

Wednesday, November 9, 2016

Synchronizing AutoCAD Drawings in Alfresco with AutoCAD