Smart Document Archive
- Freemium • Open Source
What is Ambar?
Ambar is a smart document archive with automated crawling, OCR, deduplication and ultra-fast full-text search. Imagine having billions of files in different formats such as xls, doc, txt, pdf and ppt, in any encoding. Ambar securely stores them and lets you search through their content and metadata in milliseconds. It is lightweight, simple and intuitive, yet fast and powerful in terms of data volume and scaling. All the rocket science is hidden behind a simple UI.
The last release was in 2018, and the GitHub repository of the project is archived, so no further development will be done.
- 1,920 Stars
- 369 Forks
- 2 Open Issues
Comments and Reviews
- Search Engine
Categories: Office & Productivity • Online Services
Just wanted to post a quick review in case anyone was interested in trying out the self-hosted option. I got the new OCR-lite version downloaded this morning and it works pretty well. The directions in the FAQ and Getting Started Guide are well written and easy to follow. Literally just download the VM and go. So far it has done a good job scraping a couple of gigs' worth of documents (txt, pdf, xls, doc and the like). I have also dumped about 5GB of meme images for it to scrape. So far results have been mixed. The documents worked flawlessly and crazy fast. The OCR is really limited to strict typed text and appears to support only a very small font set. Even then it often misses easy-to-read text. Meme text is what I would consider a little extreme for OCR testing, but some of it actually works. Overall, the performance is stellar. It almost feels magical. Search is instant and supports tons of refinement options including wildcards, phrase search, fuzzy logic, filter distance, filename, file type, and source search.
Pros: Fast, easy to set up (even with the JSON editing for shares), and comes with basic OCR. Cons: OCR is a little too basic, no VM customization (I want to give it more resources and its own IP address).
It feels, at least to me, as if there is a lot behind the curtain. The VM is nice and easy to set up, but I don't know what's actually going on in there. I wish there was more overt, plain English, "you host it yourself and all the privacy is yours, we don't send info back home" language on the Ambar page. Other than that, a feature request would be the possibility of adding tags to documents. A poor man's way of "improving" search.
[Edited by rd17ambar, March 21]
Originally posted on Reddit
[Edited by rd17ambar, March 21]
Hi, great service. It enables me to search quickly through all my text notes on Dropbox. A possible enhancement would be if I could forward mail to Ambar to save and index it.
Regards and keep up the good work,
Originally submitted via email
[Edited by rd17ambar, April 03]
Ambar scans folders for documents, indexes them, runs OCR across them, and lets you find documents via keyword search. It gives you direct links and even displays some content as a preview in the search results.
While some OCR results are not exactly pleasing to the eye, especially formatted data such as tables, it really does a good job at finding everything quickly. What I am missing is some control over the resource use of the provided VM, as well as some more customization of the interface. I assume that is currently reserved for Enterprise customers or in the making.
Great job, best OCR search package I've seen!
The best I've found, and it is still being developed, meaning that the remaining opportunities for improvement might just be fulfilled!