Copyright © 2006 Together Teamlösungen EDV-Dienstleistungen GmbH
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the Together Teamlösungen EDV-Dienstleistungen GmbH.
Together Teamlösungen EDV-Dienstleistungen GmbH DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Table of Contents
Together Search Server
Admin application
- Database reconnection attempt added.
- Option for reading special characters in filenames added.
- Logging of the current operation added (reading metadata list, include list, ...).
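A database reconnection attempt is commonly implemented as a bounded retry loop with a pause between attempts. The sketch below is a minimal illustration under that assumption: the class `Reconnect`, the attempt count and the delay are hypothetical and not taken from the Snapper sources, and the `Callable` stands in for whatever actually opens the connection.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a connection attempt a bounded number of times. */
final class Reconnect {
    private Reconnect() {}

    static <T> T withRetry(Callable<T> connect, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();              // success: return the open connection
            } catch (Exception e) {
                last = e;                           // remember the failure and try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);  // pause before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while waiting to reconnect", ie);
                    }
                }
            }
        }
        throw new IllegalStateException("Could not connect after " + maxAttempts + " attempts", last);
    }
}
```

A fixed delay keeps the sketch short; an exponential backoff would be a drop-in change inside the loop.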
Preview application
- Zoom-in or zoom-out option removes header from image preview page.
Table of Contents
Together Search Server
Admin application
- New application parameter is introduced:
'Snapper.Parser/word/ConverterClassName'
Together Document Viewer
- New application parameter is introduced:
'Snapper.Parser/Word/ConverterClassName'
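The 'ConverterClassName' parameters name a converter class that is loaded at runtime. A minimal sketch of that pattern, assuming a hypothetical `DocumentConverter` interface (the real Snapper converter contract is not described in these notes), reads the class name from the parameters and instantiates it through its no-argument constructor:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical converter contract; not the actual Snapper interface. */
interface DocumentConverter {
    String convert(byte[] input);
}

/** Hypothetical converter used only to demonstrate loading by class name. */
final class PlainTextConverter implements DocumentConverter {
    public PlainTextConverter() {}

    @Override
    public String convert(byte[] input) {
        return new String(input, StandardCharsets.UTF_8);
    }
}

final class ConverterLoader {
    private ConverterLoader() {}

    /** Reads the class name stored under {@code key} and instantiates it. */
    static DocumentConverter load(Map<String, String> params, String key) {
        String className = params.get(key);
        if (className == null || className.isEmpty()) {
            throw new IllegalArgumentException("Missing application parameter: " + key);
        }
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass throws ClassCastException if the class is not a DocumentConverter
            return cls.asSubclass(DocumentConverter.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate converter " + className, e);
        }
    }
}
```

Loading by name keeps the parser independent of any particular converter implementation; swapping converters only requires changing the parameter value.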
Table of Contents
Together Search Server
Admin application
- Support for indexing WebDAV file metadata ('created') and FTP file metadata ('owner') is introduced.
- DODS removed.
- Support for indexing the compound file metadata 'Author' and 'Last saved By' (Word, Excel and PowerPoint documents) and the e-mail metadata 'From', 'To', 'CC', 'BCC' and 'Subject'; new fields are added to the Lucene index accordingly.
Search application
- The search result page shows the search parameters and refills the search input field and the site search boxes.
- E-mail search is introduced: search over the e-mail-specific properties 'From', 'To', 'CC', 'BCC' and 'Subject'.
- Sorting of e-mail search results: by newest/oldest sent or received e-mail, and by the 'From'/'To' address.
- Advanced search over the file metadata 'Author' and 'Last saved By' is introduced.
- New look is introduced.
Parser
PowerPoint, Word and Excel parsers: support for extracting compound file metadata ('Author', 'Last saved By', ...).
Table of Contents
Together Search Server
Admin application
- Support for indexing WebDAV.
- FTP indexing reworked; subfolders are now indexed.
- File size added (new field in index).
- File owner added (new field in index).
- New application parameter is introduced:
'Snapper.Parser/PowerPoint/ConverterClassName'
Search application
- Search over the file metadata 'created', 'accessed' and 'owner' is introduced.
- Search by the sent/received dates of e-mails is introduced.
Together Document Viewer
- New application parameter is introduced:
'Snapper.Parser/PowerPoint/ConverterClassName'
Table of Contents
Together Search Server
Admin application
- Support for modification of configuration files.
- Support for running the index process without 'TAS' running.
- New application parameters are introduced:
- 'Snapper.Parser/Excel/ConverterClassName'
- 'Snapper.SaveConvertedFile'
- 'Snapper.PathOfConvertedFiles'
With these, the plain text of Excel files is stored in the index, and converted text that is in HTML form can be saved to the file system as an HTML document.
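The combination of 'Snapper.SaveConvertedFile' and 'Snapper.PathOfConvertedFiles' can be sketched as a small saver that writes converted text to disk only when saving is enabled and the text is HTML. The class name, the `.html` file naming and the "starts with an `<html>` tag" test are assumptions for illustration, not the documented Snapper behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;

final class ConvertedFileSaver {
    private final boolean saveConvertedFile;  // value of 'Snapper.SaveConvertedFile'
    private final Path pathOfConvertedFiles;  // value of 'Snapper.PathOfConvertedFiles'

    ConvertedFileSaver(boolean saveConvertedFile, Path pathOfConvertedFiles) {
        this.saveConvertedFile = saveConvertedFile;
        this.pathOfConvertedFiles = pathOfConvertedFiles;
    }

    /** Assumed heuristic: the converted text counts as HTML if it starts with an <html> tag. */
    static boolean looksLikeHtml(String text) {
        return text.trim().toLowerCase(Locale.ROOT).startsWith("<html");
    }

    /**
     * Writes the text to <pathOfConvertedFiles>/<documentName>.html when saving is enabled
     * and the text is HTML. documentName is assumed to already be a safe file name.
     */
    Optional<Path> saveIfHtml(String documentName, String convertedText) {
        if (!saveConvertedFile || !looksLikeHtml(convertedText)) {
            return Optional.empty();
        }
        try {
            Files.createDirectories(pathOfConvertedFiles);
            Path target = pathOfConvertedFiles.resolve(documentName + ".html");
            Files.write(target, convertedText.getBytes(StandardCharsets.UTF_8));  // overwrites
            return Optional.of(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```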
Search application
- Configuration files are reread every x minutes (accordingly, the new application parameter 'Snapper.ReReadConfigFilesEveryMinutes' (defined in application configuration file - 'web.xml') is introduced).
- Search suggestion logic is introduced.
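Periodic rereading, as configured by 'Snapper.ReReadConfigFilesEveryMinutes', can be sketched with a `ScheduledExecutorService` that reloads a configuration object in the background. The `loader` supplier stands in for whatever actually parses the configuration files; it and the class name `PeriodicConfigReloader` are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class PeriodicConfigReloader<T> implements AutoCloseable {
    private final AtomicReference<T> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread t = new Thread(runnable, "config-reloader");
                t.setDaemon(true);  // do not keep the JVM alive just for reloading
                return t;
            });

    PeriodicConfigReloader(Supplier<T> loader, long everyMinutes) {
        if (everyMinutes <= 0) {
            throw new IllegalArgumentException("everyMinutes must be positive");
        }
        current.set(loader.get());  // initial load at startup
        scheduler.scheduleAtFixedRate(() -> {
            try {
                current.set(loader.get());
            } catch (RuntimeException e) {
                // keep the previous configuration; an uncaught exception would cancel future runs
            }
        }, everyMinutes, everyMinutes, TimeUnit.MINUTES);
    }

    /** Returns the most recently loaded configuration. */
    T get() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Readers call `get()` on every request, so they always see the latest successfully loaded configuration without locking.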
Together Document Viewer
Container files (archives: zip and tgz) are parsed and displayed with the subpaths of their contents, shown as directories.
Picture "zoom" links are added to picture previews.
Excel file content can be displayed separately from the indexed data if a converted 'html' file exists on the file system.
Configuration files are reread every x minutes (accordingly, the new application parameter 'Snapper.ReReadConfigFilesEveryMinutes' (defined in application configuration file - 'web.xml') is introduced).
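Showing archive contents with their subpaths as directories amounts to grouping entry names by the part before the last '/'. The sketch below covers zip archives only, using `java.util.zip`; reading tgz would need a tar reader, which the JDK does not provide. The class name `ArchiveTree` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ArchiveTree {
    private ArchiveTree() {}

    /** Groups the file entries of a zip stream by their parent "directory" subpath ("" = root). */
    static Map<String, List<String>> listByDirectory(InputStream zipData) {
        Map<String, List<String>> tree = new TreeMap<>();  // sorted directory names
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;  // directories appear implicitly through their files
                }
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                String dir = slash < 0 ? "" : name.substring(0, slash);
                String file = name.substring(slash + 1);
                tree.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tree;
    }
}
```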
Table of Contents
EML Parser
Additional properties (Signed, Priority, Read Receipt Requested, Delivery Receipt Requested, Expires and Sensitivity) are introduced.
MSG Parser
Additional properties (Signed, Priority, Read Receipt Requested, Delivery Receipt Requested, Expires and Sensitivity) are introduced.
Note: headers (Signed, Priority, ...) are extracted from an 'msg' file only if they exist in that file.
Additional file information (creation time and last accessed time) is introduced.
Accordingly, the new application parameter 'Snapper.IndexOSspecific' (defined in application configuration file - 'web.xml') is introduced.
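Creation time and last access time are file-system specific, which is presumably why they sit behind 'Snapper.IndexOSspecific'. A modern way to read them is `java.nio.file` (Java 7), which postdates these notes, so this is an equivalent sketch rather than the original implementation; the class name and the mapping of the flag to a boolean argument are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.Optional;

final class OsSpecificFileInfo {
    final FileTime creationTime;
    final FileTime lastAccessTime;

    private OsSpecificFileInfo(FileTime creationTime, FileTime lastAccessTime) {
        this.creationTime = creationTime;
        this.lastAccessTime = lastAccessTime;
    }

    /** Returns empty when OS-specific indexing is switched off (hypothetical flag mapping). */
    static Optional<OsSpecificFileInfo> read(Path file, boolean indexOsSpecific) {
        if (!indexOsSpecific) {
            return Optional.empty();
        }
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            // Where the file system has no creation timestamp, creationTime() returns an
            // implementation-specific default (typically last-modified time or the epoch).
            return Optional.of(new OsSpecificFileInfo(attrs.creationTime(), attrs.lastAccessTime()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```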
Table of Contents
Additional file information (creation time and last accessed time) is introduced.
Accordingly, the new application parameter 'Snapper.IndexOSspecific' (defined in application configuration file - 'web.xml') is introduced.
Table of Contents
Removed the Java source files generated by 'Enhydra Zeus'; 'Together Search Server' and 'Together Document Viewer' now use 'XMLBeans' (Apache XML project) to generate and validate 'xml' files.
Lucene
Version 2.0.0 is introduced.
Removed support for indexing data into a database. Accordingly, the application parameter 'Snapper.IndexType' (defined in application configuration file - 'web.xml') is removed.
New functionality to index only 'MetaData' is introduced.
Accordingly, the new application parameters 'Snapper.DocumentUpdate' and 'Snapper.DocumentUpdatePattern' (defined in application configuration file - 'web.xml') are introduced.
Accordingly, a new file type ('NULL') and a new document group ('Meta Data') are introduced.
DocBook stylesheets release 1.70.1 is introduced.
Table of Contents
Enhydra Snapper no longer depends on a database.
Snapper Admin
- New application parameter 'Snapper.DocumentGroupConfFile' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.SiteConfFile' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.StatisticActive' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.StatisticDirectory' (defined in application configuration file - 'web.xml') is introduced.
Snapper
- New application parameter 'Snapper.DocumentGroupConfFile' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.SiteConfFile' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.StatisticActive' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.StatisticDirectory' (defined in application configuration file - 'web.xml') is introduced.
Snapper Previewer
- New application parameter 'Snapper.DocumentGroupConfFile' (defined in application configuration file - 'web.xml') is introduced.
- New application parameter 'Snapper.SiteConfFile' (defined in application configuration file - 'web.xml') is introduced.
Table of Contents
Snapper Admin
New application functionality: re-index mode (ability to continue indexing).
New application parameter 'Snapper.Indexer.ReIndexMode' (defined in application configuration file - 'web.xml') is introduced.
Table of Contents
Snapper Logging
New class 'MonologLoggingManager' is introduced.
Snapper Admin
New application parameter 'Snapper.MaxPropertiesLength' (defined in application configuration file - 'web.xml') is introduced.
Table of Contents
PDF Parser
Removed extraction of title.
Snapper
New application parameter 'Snapper.ResultDatePattern' (defined in application configuration file - 'web.xml') is introduced.
URL parameter 'resultDatePattern' is introduced. If defined, it overrides application parameter 'Snapper.ResultDatePattern'.
URL parameter 'datePattern' is renamed to 'searchDatePattern'.
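The override rule above (the URL parameter 'resultDatePattern' wins over the application parameter 'Snapper.ResultDatePattern') can be sketched as a small resolver. In a servlet the two maps would correspond to `request.getParameter(...)` and `getServletContext().getInitParameter(...)`, assuming the application parameters are web.xml context parameters; the class name, the fallback pattern and the UTC time zone are hypothetical choices for the sketch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.TimeZone;

final class ResultDatePattern {
    private ResultDatePattern() {}

    /** Hypothetical fallback used only when neither parameter is set. */
    static final String FALLBACK = "yyyy-MM-dd";

    static String resolve(Map<String, String> urlParams, Map<String, String> appParams) {
        String fromUrl = urlParams.get("resultDatePattern");
        if (fromUrl != null && !fromUrl.isEmpty()) {
            return fromUrl;  // the URL parameter overrides the application parameter
        }
        String fromApp = appParams.get("Snapper.ResultDatePattern");
        return (fromApp != null && !fromApp.isEmpty()) ? fromApp : FALLBACK;
    }

    static String format(Date date, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);  // not thread-safe: one per call
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }
}
```

The same resolution order applies to 'datePattern' versus 'Snapper.SearchDatePattern' and to the Snapper Previewer rule that URL parameters override configuration parameters.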
Table of Contents
Snapper
New application parameter 'Snapper.SearchDatePattern' (defined in application configuration file - 'web.xml') is introduced.
URL parameter 'datePattern' is introduced. If defined, it overrides application parameter 'Snapper.SearchDatePattern'.
Table of Contents
Snapper Previewer
The AbsoluteFilePath element of the resulting XML is more structured.
Configuration parameters are overridden by URL parameter settings.
Table of Contents
The indexer can now add document metadata from the metadata database to the index content. New Snapper Admin parameter introduced:
Indexer.MountMetaDataInContent
Snapper supports Google search. New parameters are introduced:
GoogleSearcherURL
GoogleSearcherKey
GoogleResultLimit
New Snapper Previewer parameters for document parser limits are introduced:
ParserPageLimit
PageLimitForParser
ParserCharacterLimit
CharacterLimitForParser
Document group
New document types and document groups.
Parsers
Detection of files that could not be parsed.
Snapper Previewer
Translation of parsed content is introduced.
Google search results are introduced in Snapper Searcher.
Table of Contents
Improved logging
New Snapper Admin parameters
Indexer.MountFilePathInContent
Setting it to true puts the file path in the content
Indexer.MountPropertiesInContent
Setting it to true puts the properties in the content
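The two flags control what text is added to the indexed content besides the parsed document body. A minimal sketch, assuming the file path and the properties are simply prepended as lines (the actual layout Snapper uses is not stated here, and the class `ContentBuilder` is hypothetical):

```java
import java.util.Map;
import java.util.StringJoiner;

final class ContentBuilder {
    private ContentBuilder() {}

    /**
     * Builds the text handed to the indexer. mountFilePath and mountProperties mirror
     * Indexer.MountFilePathInContent and Indexer.MountPropertiesInContent.
     */
    static String build(String parsedText, String filePath, Map<String, String> properties,
                        boolean mountFilePath, boolean mountProperties) {
        StringJoiner content = new StringJoiner("\n");
        if (mountFilePath) {
            content.add(filePath);                  // makes the path searchable as content
        }
        if (mountProperties) {
            properties.forEach((key, value) -> content.add(key + "=" + value));
        }
        content.add(parsedText);
        return content.toString();
    }
}
```

Putting the path and properties into the content lets a plain full-text query match them without a field-specific search.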
New Snapper search application parameter
Snapper.PreviewURL
URL used to create document preview links; it is the URL of the Snapper Previewer application.
Document group
Document types can be grouped for quick search
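Grouping document types for quick search can be pictured as expanding a group name into a query clause over its file types. The sketch below assumes a Lucene-style query string and a field called `type`; both the field name and the class `DocumentGroups` are hypothetical, and the real groups are defined in the document group configuration file rather than in code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

final class DocumentGroups {
    private final Map<String, Set<String>> groups = new HashMap<>();

    /** Registers a group name with the file types (extensions) it contains. */
    void define(String group, String... types) {
        Set<String> set = groups.computeIfAbsent(group, g -> new TreeSet<>());  // sorted
        for (String t : types) {
            set.add(t.toLowerCase(Locale.ROOT));
        }
    }

    /** Expands a group into a clause such as "(type:doc OR type:xls)". */
    String queryClause(String group) {
        Set<String> types = groups.get(group);
        if (types == null || types.isEmpty()) {
            throw new IllegalArgumentException("Unknown document group: " + group);
        }
        StringJoiner clause = new StringJoiner(" OR ", "(", ")");
        for (String t : types) {
            clause.add("type:" + t);
        }
        return clause.toString();
    }
}
```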
New indexing options
Directory indexing
Content indexing
Parsers
File type mapping
Additional file types can be mapped to a parser, e.g. map .properties files to the text parser.
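File type mapping can be sketched as a registry from extension to parser function, so mapping `.properties` to the text parser is one extra entry. The registry class and the UTF-8 `textParser` stand-in are hypothetical; Snapper's actual parser classes and configuration format are not shown in these notes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class ParserRegistry {
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    /** Maps an extension such as "properties" or ".properties" to a parser. */
    void map(String extension, Function<byte[], String> parser) {
        byExtension.put(normalize(extension), parser);
    }

    /** Looks up the parser for a file name by its last extension, ignoring case. */
    Optional<Function<byte[], String>> parserFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();  // no extension: no parser
        }
        return Optional.ofNullable(byExtension.get(normalize(fileName.substring(dot + 1))));
    }

    private static String normalize(String extension) {
        String e = extension.startsWith(".") ? extension.substring(1) : extension;
        return e.toLowerCase(Locale.ROOT);
    }

    /** Hypothetical stand-in for the text parser: decodes the bytes as UTF-8. */
    static String textParser(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

Usage: `registry.map("properties", ParserRegistry::textParser)` makes `app.properties` go through the same parser as `.txt` files.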
Word and Excel parser
Removed extraction of title.
New Enhydra application: Snapper Previewer
XML based document preview, containing content and other relevant data of the document.
Preview for files from file system.
Document properties, title, filepath are removed from document content in a preview.
Table of Contents
Improved logging in the snapperAdmin application (explains why a document was not parsed).
Excel Parser
- Fixed a problem where some regular files were not parsed.
New Application parameter 'Indexer.MountTitleInContent' (SnapperAdmin application).
Parameter SimpleSearch.Type in web.xml of the Snapper application is no longer supported. As a result, the search.xsl and searchResult.xsl files were changed, and the HTTP request parameter 'typeOfSearch' of the Search.po object was removed (Simple and Advanced search options are merged).
Improved time of search.
Experimental indexing into a database.
New Application parameter 'IndexType' (SnapperAdmin and Snapper application).
Table of Contents
Fixed problem with indexing FTP files.
Application parameter RelativeIndexPath (SnapperAdmin and Snapper).
Application parameter FileSeparator (SnapperAdmin and Snapper).
Table of Contents
Configurable Simple search: search across the title and content of the document, or only in the content of the document. Defined in the web.xml file of the Snapper searcher (parameter SimpleSearch.Type).
Changed search.xsl and searchResult.xsl files to support configurable Simple search.
Table of Contents
Snapper documentation is changed to DocBook format. It is shipped in HTML and PDF format.
Solved a ConnectionAllocation problem
Solved a possible problem between DODS, Enhydra and Snapper when indexing in threads
New authentication type: Tomcat authentication
Additional build targets in build.xml
Table of Contents
Solved a possible problem between DODS, Enhydra and Snapper when indexing in threads
Increased indexing speed by optimizing database calls
Table of Contents
Solved the memory leak problem
Fixed a small bug in the presentation of non-indexed files
Table of Contents
Changes on the SiteList page: links instead of buttons for IndexAll
Fixed small bugs in indexing the include list
Table of Contents
Documented and tested Metadata feature of application
Smaller changes in the look of HTML pages
Refactoring and solving smaller bugs
Table of Contents
Index All functionality: all sites at once, or one by one
Column name in the NotIndexed table changed from FILE to FILENAME due to MSQL constraints
Included scripts for all DODS supported database servers
Table of Contents
When filled out, the Index Dir attribute of a Site is the *exact* location of the index
Include/Exclude list as Site attributes
Application parameter Snapper.LogicalNameFromDatabase - 0/no, 1/yes: use metadata's DocumentLogicalName field
Application parameter Snapper.DocumentLogicalName - Metadata field to use as document title
Application parameter Snapper.DBFetchSize - maximal DB fetch size
Other extensions indexing ("other" checkbox/attribute for a Site)
New Menu look
Threaded indexing
The number of documents per Site is displayed on the indexing history page
Multiple real-time checks using XMLRequest when creating a Site
Search results XML includes 'configuration' section with all information about Sites
Search results XML includes all document paths (relative, absolute...)
Search results XML includes 'searched parameters' section containing all request parameters
Using Zeus created classes to manipulate results XML files
Snapper Search application is made up of one presentation object: Search.po
Search.po object uses request parameters to form an XML file defined by a search.dtd
The po object uses an xsl file defined in xsl request parameter to transform the xml file and display the results
Three xsl files as examples: search.xsl, advancedsearch.xsl, searchresults.xsl
New version of PDFBox (0.7.1) included
Table of Contents
Application split into two applications (wars): snapperAdmin, snapper (search)
Sorting of search results (sorting types: by relevance, by newest modified files and by oldest modified files)
DB Parameter "Search" as an attribute of Site, indicating a Site to be automatically included in search
History of indexed sites: start time, stop time, length, type (index)
History log of unindexed files per site with the possibility of file downloading
Searching for property values (key=value)
Search results page displays a "new search" dialog
DB Parameter "IndexDir" as an attribute of Site. If not entered, the default "Snapper/IndexDir" in web.xml is used for indexing a selected site
Searching by document type
Sample filterDB for filtering documents not to be indexed during indexing.
Login dialog for snapperAdmin. By default, username: admin password: snapper
Capability to update Site DB attributes - Update Site window
Search results page displays a Site as well to which the file belongs
Modular design, separate modules for: API, Parsers, Indexer, Searcher, Kernel, Logging, Util
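The three result orderings (by relevance, by newest modified files, by oldest modified files) map naturally onto comparators. The sketch below sorts an in-memory list of hits; `SearchHit`, `SortType` and `ResultSorter` are hypothetical names, and the real application presumably sorts through the search engine rather than after the fact.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SearchHit {
    final String path;
    final float score;        // relevance score from the search engine
    final long lastModified;  // milliseconds since the epoch

    SearchHit(String path, float score, long lastModified) {
        this.path = path;
        this.score = score;
        this.lastModified = lastModified;
    }
}

enum SortType { RELEVANCE, NEWEST_MODIFIED, OLDEST_MODIFIED }

final class ResultSorter {
    private ResultSorter() {}

    /** Returns a sorted copy; the input list is left unchanged. */
    static List<SearchHit> sort(List<SearchHit> hits, SortType type) {
        Comparator<SearchHit> order;
        switch (type) {
            case NEWEST_MODIFIED:
                order = Comparator.comparingLong((SearchHit h) -> h.lastModified).reversed();
                break;
            case OLDEST_MODIFIED:
                order = Comparator.comparingLong((SearchHit h) -> h.lastModified);
                break;
            default:  // RELEVANCE: highest score first
                order = Comparator.comparingDouble((SearchHit h) -> h.score).reversed();
        }
        List<SearchHit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        return sorted;
    }
}
```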
Table of Contents
Solved "OutOfMemory" exception while parsing Excel file(s)
Added support for PPT and PPS file types
Results page, bottom: current page link not displayed
Table of Contents
First release!
API (SnapperCore) released
Database released
Indexing Implementation. Full indexing of a site (delete if exists and recreate)
Site management Implementation: adding a new site, defining maximal size and age of files to be indexed as well as file types being indexed
Site-Path management Implementation. Adding paths to a Site and deleting paths
Three path types and therefore three indexing methods/protocols implemented: local FileSystem, UNC, FTP
Basic search Implementation. Complex searches "AND", "OR", "title:" etc. implemented
Advanced search Implementation. Searching for title, modification date, custom parameters (MS Word files) implemented
Configurable XSL transformations for search results. Configuration of XSL transformation can be performed in runtime by changing the configuration file (see documentation)
Lucene indexing engine version 1.4.3 implementation
LuceneIndexer wrapper for Lucene's IndexWriter
LuceneSearcher wrapper for Lucene's IndexSearcher
LuceneReader wrapper for Lucene's IndexReader
Implemented parsers for the following file types: Plain text (.txt, .java, .ini), MS Outlook Express (.eml), MS Word (.doc), MS Excel (.xls), Open Office Writer (.sxw), Open Office Calc (.sxc), HTML (.html, .htm), Adobe Portable Document Format (.pdf), Rich Text Format (.rtf).
Basic indexing statistics: number of documents a site contains, date and time of last indexing
Basic search statistics: number of hits (searches) for a site
Subfolder indexing for UNC and FileSystem
Mapping of site paths for FileSystem, UNC and FTP
Trigger event (logging a message) after the index size has crossed the given size (MaxIndexLength parameter in web.xml) during indexing
Enabling file download on the search results page through a "Download" parameter in web.xml
Complex search explanations on both (basic, advanced) Search pages
Number of search results per page can be selected from a combo box on the search page (10, 20, 30...)
web.xml configurable parameters
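The MaxIndexLength trigger logs a message once the index grows past a configured size during indexing. A minimal sketch, assuming the limit is in bytes and the index is a directory of files; the class `IndexSizeMonitor` and the use of `java.util.logging` are illustrative choices, not the Snapper implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Logger;
import java.util.stream.Stream;

final class IndexSizeMonitor {
    private static final Logger LOG = Logger.getLogger(IndexSizeMonitor.class.getName());

    private final long maxIndexLength;  // assumed to be in bytes
    private boolean reported;

    IndexSizeMonitor(long maxIndexLength) {
        this.maxIndexLength = maxIndexLength;
    }

    /** Sums the sizes of all regular files below the index directory. */
    static long sizeOf(Path indexDir) {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(IndexSizeMonitor::fileSize)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Logs a warning the first time the size crosses the limit; returns true only on that call. */
    boolean check(long currentIndexSize) {
        if (!reported && currentIndexSize > maxIndexLength) {
            reported = true;
            LOG.warning("Index size " + currentIndexSize + " exceeded MaxIndexLength " + maxIndexLength);
            return true;
        }
        return false;
    }
}
```

During indexing, `monitor.check(IndexSizeMonitor.sizeOf(indexDir))` would be called after each batch of documents, so the message is emitted once rather than on every subsequent document.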
Might get an error after path is deleted and new search over the site containing the path is performed
Parser constraints:
Some MS Word documents cannot be parsed (due to complex entries)
Some MS Excel documents cannot be parsed (due to complex entries)
RTF files created within MS Word cannot be parsed (not a clean RTF)
PPT, PPS parsers in beta testing, not released
Some Firefox browser issues