Abstract
URL Canonicalization, Proxy support (experimental).
First release candidate for the 0.4.2 version.
Using WERA in combination with a proxy server takes care of some of the redirect issues. Work remains to determine which issues are handled and which are not (for the 0.4.2 release).
For further details, see the 0.4.1 release notes.
Canonicalization of URLs has been added. If a link points to a URL that is indexed in a different form (e.g. http://www.nb.no instead of http://nb.no), WERA will now find it in the index. Canonicalization is configurable: individual rules may be disabled, and the order in which the rules are applied may be changed.
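The idea of an ordered, individually switchable rule list can be sketched as follows. This is only an illustration in Python, not WERA's PHP implementation: the rule names (`lowercase_host`, `strip_www`, `strip_fragment`) and the `RULES` structure are hypothetical and are not taken from the WERA configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rules; WERA's actual rule set is not described in these notes.
def lowercase_host(url):
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc.lower()))

def strip_www(url):
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit(parts._replace(netloc=host))

def strip_fragment(url):
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment=""))

# Ordered list of (name, function, enabled). Reordering entries or
# flipping the flag changes the result without touching the rules.
RULES = [
    ("lowercase_host", lowercase_host, True),
    ("strip_www", strip_www, True),
    ("strip_fragment", strip_fragment, False),  # disabled rule is skipped
]

def canonicalize(url, rules=RULES):
    """Apply every enabled rule in list order and return the result."""
    for _name, rule, enabled in rules:
        if enabled:
            url = rule(url)
    return url
```

With these assumed rules, `canonicalize("http://www.nb.no")` and `canonicalize("http://nb.no")` yield the same string, so a lookup keyed on the canonical form would match either spelling. The order matters: if `strip_www` ran before `lowercase_host`, an uppercase `WWW.` prefix would not be removed.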
The JavaScript that WERA inserts before the HTML page is delivered to the user's browser does not catch all links. To work around this, the web server hosting WERA can be set up as a proxy server so that all requests for hosts other than the WERA host are redirected back to WERA. The user must then change the browser's proxy setting so that all requests go to the WERA host.
Abstract
Improved URL encoding, added metadata view in Timeline, and Google-like result presentation.
Currently, WERA does no canonicalization of URLs. If a link points to a URL that is indexed in a different form (e.g. http://www.nb.no instead of http://nb.no), WERA will not find it in the index and will report: Sorry, no documents with the given uri were found. See [1312202] exacturl needs to canonicalize.
WERA does nothing to handle redirects. Depending on the nature of the redirect, either the actual resource is not displayed at all in WERA, or a redirect to the live web is executed within the WERA view without any notice to the user. See bugtracker issues [1312200] Pages at end of redirects not found and [1312214] More redirects to live web.
The old index_encode algorithm used for encoding URLs in WERA has been replaced by PHP's native urlencode function. See [1354276] Still have URL encoding issues and [1247134] Beautify the wera URL.
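As a rough illustration of what form-style URL encoding does to an archived URL passed as a query parameter, here is a sketch using Python's `quote_plus`. It is only an approximate analogue of PHP's `urlencode` (both turn spaces into `+`, but the two may differ for a few characters); the helper name `encode_for_query` and the sample URL are made up for the example.

```python
from urllib.parse import quote_plus, unquote_plus

def encode_for_query(url):
    """Percent-encode a whole URL so it can travel as one query value."""
    return quote_plus(url)

archived_url = "http://www.nb.no/search?q=web archive"
encoded = encode_for_query(archived_url)
# encoded == "http%3A%2F%2Fwww.nb.no%2Fsearch%3Fq%3Dweb+archive"

# Decoding restores the original string unchanged.
round_trip = unquote_plus(encoded)
```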
Metadata from the Retriever's getmeta request can now be displayed. A metadata check box has been added to the timeline view. When it is checked, the metadata is shown below the timeline (instead of the archived web page).
The presentation of results has been changed so that the default view shows one hit per site. Each hit has a link to 'more from this site', which presents all hits within that site in the same way as the old WERA result list.
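The one-hit-per-site view can be pictured as grouping a ranked hit list by host. The sketch below is an assumption about the general technique, not WERA's code: it treats the URL's host name as the "site", and the function names are hypothetical.

```python
from collections import OrderedDict
from urllib.parse import urlsplit

def group_by_site(hits):
    """Group ranked hit URLs by host, keeping the original rank order."""
    groups = OrderedDict()
    for url in hits:
        site = urlsplit(url).hostname or ""
        groups.setdefault(site, []).append(url)
    return groups

def default_view(hits):
    """One row per site: (site, best hit, number of further hits)."""
    return [(site, urls[0], len(urls) - 1)
            for site, urls in group_by_site(hits).items()]

hits = ["http://nb.no/a", "http://example.org/x", "http://nb.no/b"]
rows = default_view(hits)
```

Here `group_by_site(hits)["nb.no"]` would back the 'more from this site' link, since it holds every hit for that host in rank order.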
Table 2. Changes
| ID | Type | Summary | Open Date | By | Filer |
|---|---|---|---|---|---|
| 1354276 | Fix | Still have URL encoding issues | 2005-11-11 19:27 | sverreb | stack-sf |
| 1401204 | Add | Possibility to display metadata | 2006-01-10 08:42 | sverreb | sverreb |
| 1346889 | Add | Google-like result presentation | 2005-11-03 13:34 | sverreb | sverreb |
| 1247134 | Fix | Beautify the wera URL | 2005-07-28 23:47 | sverreb | stack-sf |
| 1403277 | Add | Query term from search ui to timeline | 2006-01-11 21:33 | sverreb | sverreb |
| 1403742 | Fix | Non-localized string in code | 2006-01-12 10:56 | sverreb | sverreb |
Abstract
Improved exacturl handling, error handling and encoding issues. Bug fixes and documentation.
Currently, WERA does no canonicalization of URLs. If a link points to a URL that is indexed in a different form (e.g. http://www.nb.no instead of http://nb.no), WERA will not find it in the index and will report: Sorry, no documents with the given uri were found. See [1312202] exacturl needs to canonicalize.
WERA does nothing to handle redirects. Depending on the nature of the redirect, either the actual resource is not displayed at all in WERA, or a redirect to the live web is executed within the WERA view without any notice to the user. See bugtracker issues [1312200] Pages at end of redirects not found and [1312214] More redirects to live web.
The handling of exacturl searches has been improved considerably on both the WERA and the NutchWAX side. WERA uses the exacturl search functionality extensively, both for counting versions of a given URL and for determining the mapping between a given URL/timestamp and its ARC name and offset.
WERA's error messages have been improved. Instead of printing cryptic PHP warnings and errors, it prints more meaningful error messages that enable the user to understand what is wrong.
Queries containing non-ISO8859 characters caused major problems. To solve this, changes were made to both WERA and NutchWAX.
WERA now sets the encoding in the header of a given web page prior to sending the page to the user's browser. The encoding sent is the encoding detected by NutchWAX at index time.
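A minimal sketch of this step, assuming the detected encoding arrives as a plain string alongside the MIME type. The function name and the `ISO-8859-1` fallback are assumptions for illustration; the notes do not say what WERA does when no encoding was detected.

```python
def content_type_header(mime_type, detected_encoding, fallback="ISO-8859-1"):
    """Build a Content-Type header line carrying the index-time encoding.

    The fallback value is hypothetical and only used when nothing
    was detected at index time.
    """
    charset = detected_encoding or fallback
    return "Content-Type: {}; charset={}".format(mime_type, charset)

header = content_type_header("text/html", "EUC-JP")
```

Sending the charset in the header lets the browser decode the archived bytes the same way the indexer interpreted them, which is what a page in an encoding such as EUC-JP needs in order to display correctly.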
Table 3. Changes
| ID | Type | Summary | Open Date | By | Filer |
|---|---|---|---|---|---|
| 1312159 | Add | wera overview doc based on dokuwiki text | 2005-10-03 11:29 | sverreb | stack-sf |
| 1246834 | Add | Move arc path to retreiver (WAS Path...lib/seal/nutch.inc) | 2005-07-28 08:06 | sverreb | stack-sf |
| 1244879 | Add | Add display of text snippets to wera search results page | 2005-07-25 17:32 | sverreb | stack-sf |
| 1333042 | Fix | Search result list - Bad handling of dedup result list | 2005-10-20 03:08 | sverreb | stack-sf |
| 1322601 | Fix | search ui - time param not set | 2005-10-10 05:32 | sverreb | sverreb |
| 1324757 | Fix | debug on messes up displayed web page | 2005-10-12 04:37 | sverreb | sverreb |
| 1249970 | Fix | Installer requires X though claimed not needing it | 2005-08-01 21:23 | stack-sf | stack-sf |
| 1324161 | Fix | euc-jp page not displayed properly in wera | 2005-10-11 12:34 | sverreb | stack-sf |
| 1322668 | Fix | wera help need update | 2005-10-10 05:59 | sverreb | sverreb |
| 1324755 | Fix | Header sent from wera documentdispatcher of wrong format | 2005-10-12 04:32 | sverreb | sverreb |
| 1322554 | Fix | exacturl query returnns 0 of X versions in result list | 2005-10-10 05:13 | sverreb | sverreb |
| 1322594 | Fix | When time param not set url is not found | 2005-10-10 05:29 | sverreb | sverreb |
| 1312442 | Fix | Date range missing in querystring | 2005-10-03 17:26 | sverreb | sverreb |
| 1314403 | Fix | Use newly added 'encoding' in search results | 2005-10-05 19:34 | sverreb | stack-sf |
| 1314098 | Fix | Encoding issue, wera displaying archived web page | 2005-10-05 10:53 | stack-sf | sverreb |
| 1244894 | Fix | Cannot query for non-ISO8859 characters | 2005-07-25 18:38 | stack-sf | stack-sf |
| 1312208 | Fix | Query time encoding issues | 2005-10-03 12:11 | stack-sf | stack-sf |
| 1314360 | Fix | Remove all, any or phrase selection in search ui | 2005-10-05 18:14 | sverreb | sverreb |
| 1312479 | Fix | indexSearch.inc need cleanup | 2005-10-03 18:40 | sverreb | sverreb |
| 1313251 | Fix | Wera search, ugly and/or not useful error messages | 2005-10-04 13:32 | sverreb | sverreb |
| 1282042 | Fix | WERA - Timeline - Warning when URL not found | 2005-09-05 03:28 | sverreb | stack-sf |
| 1312484 | Fix | [wera] Ugly complaint about invalid argument | 2005-10-03 18:47 | sverreb | stack-sf |
| 1312299 | Fix | WERA - Exacturl search not always working | 2005-10-03 13:51 | sverreb | sverreb |
| 1281697 | Fix | searching czech words not working | 2005-09-04 10:36 | stack-sf | kranach |
| 1277376 | Fix | WERA - Duplicate hits in result list | 2005-08-31 05:45 | sverreb | sverreb |
Abstract
Bug fixes
Fixed 1277376, duplicate hits in result list. WERA now uses NutchWAX's dedup functionality to suppress duplicate hits in the result list, which also improves performance.
Abstract
First release of WERA
When X is not installed, the Java-based installer should fall back to console mode. There have been some reports of problems with this. If it fails, install WERA manually (see the manual).
WERA does not work properly with PHP5. This has to do with PHP5's new object model. When the 'NEAR' mode of the documentLocator is used, it returns a result set that concatenates the result sets for 'BEFORE' and 'AFTER' instead of returning the one closest in time. This results in the wrong aid being passed to the documentRetriever when presenting inline objects.
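The intended 'NEAR' behaviour, as opposed to the concatenation seen under PHP5, can be sketched like this. It is an illustration in Python with a hypothetical `locate_near` function operating on plain timestamps; the real documentLocator works on index results and is not reproduced here.

```python
from datetime import datetime

def locate_near(versions, target):
    """Return the single archived timestamp closest to target, or None.

    Takes the latest version at or before the target and the earliest
    version after it, then keeps whichever is nearer in time. On a tie
    the earlier version wins (an arbitrary choice for this sketch).
    """
    before = [v for v in versions if v <= target]
    after = [v for v in versions if v > target]
    candidates = []
    if before:
        candidates.append(max(before))
    if after:
        candidates.append(min(after))
    if not candidates:
        return None
    return min(candidates, key=lambda v: abs(v - target))

versions = [datetime(2005, 1, 1), datetime(2005, 6, 1), datetime(2005, 12, 1)]
nearest = locate_near(versions, datetime(2005, 7, 1))
```

The point of the sketch is that the result is one version, not the 'BEFORE' and 'AFTER' lists joined together.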
Support for the NutchWAX search engine added.
Support for nwalucene search removed (replaced by the above).
Support for Fast Search Engine currently not working (will be added in a later version).
Advanced search removed (may be added in a later version).
Server-side link rewriting replaced by JavaScript client-side link rewriting.