Version: 20.14.0
Revision: 252
Size: 376.3 MB
License: AGPL-1.0-or-later
Confinement: strict
Base: core20

Datashare is a self-hosted search engine for documents.


Datashare is a self-hosted search engine for documents. It uses Apache Tika and Tesseract OCR to read thousands of file formats. The tool is developed by the International Consortium of Investigative Journalists (ICIJ), known for its investigations into the offshore world (Pandora Papers, Panama Papers, etc.).

It also provides:

  • Many search filters (file types, creation date, languages, tags, etc.)
  • Search in batch (with a CSV)
  • Search results download
  • Tagging and recommendation
  • Named entity recognition with CoreNLP
  • Optical character recognition with Tesseract OCR

After installation, open a terminal and run the following command to start Datashare:

datashare

Datashare should now be available at http://localhost:8080 🚀
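Startup can take a little while, so it may help to check from a script whether the web interface is answering yet. The following is a minimal sketch, not part of Datashare itself: it assumes the default address quoted above, and the helper names `is_up` and `wait_until_up` are hypothetical, chosen only for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request

# Default address taken from the text above; change it if your setup differs.
DEFAULT_URL = "http://localhost:8080"


def is_up(url: str = DEFAULT_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is reachable.
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a broken response.
        return False


def wait_until_up(url: str = DEFAULT_URL, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_up()` after launching `datashare` in another terminal returns `True` once the page responds, or `False` if nothing answers within the allotted attempts.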

Update History

20.5.2 (240) → 20.14.0 (252)
26 Mar 2026, 17:53 UTC
20.1.4 (232) → 20.5.2 (240)
18 Dec 2025, 10:21 UTC

Published: 2 Jan 2023, 12:28 UTC

Last updated: 26 Feb 2026, 06:48 UTC

First seen: 13 Dec 2025, 09:47 UTC