
Freakingly Fast Online OCR with OcrGeek.com

Sitting on a ton of scanned documents which you want to digitize but don't know how? Sure, there are many free online OCR services around, but most of them are either rather slow or have upload limits, which can be annoying. A recently launched service might make a difference, since it does away with such obstacles.

At OCRGeek.com you can submit as many pages as you want. The results are available quickly and usage is straightforward. In terms of speed, the online OCR offered by this site is more than competitive; you might be surprised how fast it all goes.

As output formats you can choose plain text, PDF and the handy DjVu format. DjVu is compressed considerably better than typical PDFs and can therefore save you some disk space. The PDF output, on the other hand, preserves the original layout of the document while remaining searchable. This is especially useful for documents which contain a lot of graphics: the document looks the way it should, but you can still search it.

Of course there are also plenty of expensive tools for optical character recognition, but the question is whether it is really worth spending a lot of money on such software nowadays. The trend clearly goes towards free online services and tools. I would not be surprised if expensive OCR software disappeared from the market in the future.

If you know any other services, feel free to drop a comment here. Of the services tested here, OcrGeek.com won the competition.

You can check it out here: OcrGeek – Free Online OCR

Apache: Relative Link Problem using Rewrite Engine (mod_rewrite)

When using the Apache Rewrite Engine, relative links can break, e.g. if you jump between directories or if you use the Rewrite Engine as an internal proxy. The reason is that the browser resolves relative links against the URL it requested, not against the URL the content was actually fetched from. In the following example, relative links in the page will not work properly:

[sourcecode language="text"]
<Directory "/documentroot/de">
RewriteEngine on
RewriteBase /de/
RewriteRule ^/? http://localhost/?language=de [P]
</Directory>
[/sourcecode]

There is a very easy way to avoid this problem: define a base URL using the HTML base tag in the head section of your pages. Relative links are then resolved against that URL instead of the URL shown in the browser's address bar.

For the example above, add something like the following to the head section of every page whose links do not work properly:

[sourcecode language="html"]

<head>
<base href="http://localhost/">
</head>

[/sourcecode]

That’s it!
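To see why the base tag helps, here is a small Python sketch that resolves relative links with urllib.parse.urljoin, which follows the RFC 3986 resolution rules; for simple cases like these, browsers resolve links the same way. The page URL http://localhost/de/ and the link images/logo.png are assumptions chosen to match the example above, not part of the original setup.

[sourcecode language="python"]
from urllib.parse import urljoin

# Assumed setup: the browser requested /de/, while the proxied content
# actually came from http://localhost/?language=de.
PAGE_URL = "http://localhost/de/"
BASE_HREF = "http://localhost/"  # value of the base tag's href


def resolve(link, base=None):
    """Resolve a relative link the way a browser does.

    Without a base tag, the URL of the page itself is the base;
    with one, the base tag's href takes its place.
    """
    return urljoin(base or PAGE_URL, link)


print(resolve("images/logo.png"))             # http://localhost/de/images/logo.png
print(resolve("images/logo.png", BASE_HREF))  # http://localhost/images/logo.png
[/sourcecode]

Without the base tag the image is looked for below /de/, where it does not exist; with it, the link points back to the root the content was served from.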

Apache: Convert URL parameter strings to directories for search engine friendly URLs

Imagine you have a URL like http://www.whatever.com/?language=en, which looks pretty ugly. For search engine optimization it would be nicer to have a search engine friendly URL like http://www.whatever.com/en.

Apache offers an easy way to hide the parameter string behind the second URL format. Internally the page is still served as if the first URL had been requested, but users and search engines never see the old URL format.

To implement the example above, simply add the following lines to your httpd.conf.

In this example the directory en is forwarded internally to page.php?language=en. If the user enters http://www.whatever.com/en he will see the content of http://www.whatever.com/page.php?language=en without noticing the parameter string.
[sourcecode language="text"]
<Directory "/documentroot/en">
Options Indexes FollowSymLinks
AllowOverride All

Order allow,deny
Allow from all

RewriteEngine on

# Only rewrite when exactly /en/ is requested
RewriteCond %{REQUEST_URI} =/en/

# The leading slash points at page.php in the document root,
# not at /en/page.php
RewriteRule ^/? /page.php?language=en [PT]
</Directory>
[/sourcecode]

In the Directory line (/documentroot/en) you define the directory to which the rule applies. It has to be the physical path of that directory on your server, not its URL.

The line “RewriteCond %{REQUEST_URI} =/en/” tells Apache to apply the rule only if exactly /en/ is requested; the = prefix makes it a plain string comparison instead of a regular expression. Other URLs in the directory, like http://www.whatever.com/en/anypage.html, are not touched.
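The effect of the rule can be modelled in a few lines of Python. This is a hypothetical illustration, not how Apache implements it: the REWRITES table and the internal_target function are made up for this sketch, and only the exact path /en/ is mapped, just like the = comparison in the RewriteCond.

[sourcecode language="python"]
from urllib.parse import parse_qs, urlsplit

# Hypothetical model of the rewrite rule: exact request path -> internal target.
REWRITES = {"/en/": "/page.php?language=en"}


def internal_target(request_uri):
    """Return the URL that is served internally for a requested path.

    Mirrors RewriteCond %{REQUEST_URI} =/en/: an exact string match,
    so every other path passes through unchanged.
    """
    return REWRITES.get(request_uri, request_uri)


target = internal_target("/en/")
print(target)                               # /page.php?language=en
print(parse_qs(urlsplit(target).query))     # {'language': ['en']}
print(internal_target("/en/anypage.html"))  # /en/anypage.html
[/sourcecode]

page.php still receives language=en as a normal query parameter, so the script itself needs no changes; only the URL visible to the user differs.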

How to use Apache as a proxy server

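The following configuration makes Apache forward requests for your domain to a server listening on port 8081. If the user enters “www.testdomain.com” in his browser, he will see the content provided by the server listening at http://www.testdomain.com:8081/. The backend does not have to use the same hostname; it can just as well be another domain. Note that mod_proxy and mod_proxy_http must be loaded for this to work.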

[sourcecode language="text"]

<VirtualHost *:80>
ServerName www.testdomain.com
ServerAlias testdomain.com *.testdomain.com
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>

ProxyPass / http://www.testdomain.com:8081/
ProxyPassReverse / http://www.testdomain.com:8081/
<Location />
Order allow,deny
Allow from all
</Location>
</VirtualHost>

[/sourcecode]
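ProxyPass is the easy part to understand; ProxyPassReverse is less obvious. It rewrites the Location, Content-Location and URI headers of backend responses, so that a redirect issued by the backend does not send the browser directly to port 8081. The Python sketch below is a simplified model of that prefix swap, not mod_proxy's actual code. It assumes the visitor reached the site as www.testdomain.com, and the function names are made up for illustration.

[sourcecode language="python"]
# Simplified model of the two directives (assumption: the visitor used
# www.testdomain.com, and the backend is the one from the example above).
FRONTEND = "http://www.testdomain.com/"
BACKEND = "http://www.testdomain.com:8081/"


def proxy_pass(path):
    """ProxyPass / BACKEND: map a request path on port 80 to the backend URL."""
    return BACKEND + path.lstrip("/")


def proxy_pass_reverse(location):
    """ProxyPassReverse / BACKEND: rewrite a backend redirect header so
    the browser comes back through the proxy instead of port 8081."""
    if location.startswith(BACKEND):
        return FRONTEND + location[len(BACKEND):]
    return location  # redirects to other servers are left alone


print(proxy_pass("/shop/cart"))  # http://www.testdomain.com:8081/shop/cart
print(proxy_pass_reverse("http://www.testdomain.com:8081/login"))
# http://www.testdomain.com/login
[/sourcecode]

Without ProxyPassReverse, the second call would hand the browser the port 8081 URL unchanged, and the user would bypass the proxy on the next request.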