Textricator

Textricator is a tool for extracting text from computer-generated PDFs and generating structured data. If you have a bunch of PDFs with the same format (or one big, consistently formatted PDF) and you want to extract the data to CSV or JSON, Textricator can help.

Cost / License

  • Free
  • Open Source

Platforms

  • Mac (requires Java)
  • Windows (requires Java)
  • Linux (requires Java)
Tags

  • harvesting
  • clean-data
  • extractor
  • pdf-web-harvesting
  • web-harvesting
  • Scraping
  • extractor-pdf
  • extractor-text
  • text-harvesting
  • Data Scraping
  • extract-words-from-text

Textricator information

  • Developed by

    Measures for Justice (MFJ)
  • Licensing

    Open Source (AGPL-3.0) and Free product.
  • Alternatives

    12 alternatives listed
  • Supported Languages

    • English

GitHub repository

  •  350 Stars
  •  38 Forks
  •  12 Open Issues

Our users have written 1 comment or review about Textricator, and it has received 2 likes.

Textricator was added to AlternativeTo by Hugo Albarracin.

Comments and Reviews

   
Top Positive Comment
Hugo Albarracin

Textricator is a tool to extract text from documents and generate structured data. https://textricator.mfj.io

Featured in Lists

A list with 34 apps by Francisco Ferioli Marco (6feriolimarco), without a description.

What is Textricator?

Textricator is a tool to extract text from documents and generate structured data.

If you have a bunch of PDFs with the same format (or one big, consistently formatted PDF) and you want to extract the data to CSV or JSON, Textricator can help! It can even work on OCR'ed documents!

Textricator is released under the GNU Affero General Public License Version 3.

Textricator is deployed to Maven Central with GAV io.mfj:textricator.

This application is actively used and developed by Measures for Justice. We welcome feedback, bug reports, and contributions. Create an issue, send a pull request, or email us at textricator@mfj.io. If you use Textricator, please let us know. Send us your mailing address and we will mail you a sticker.

io.mfj.textricator.Textricator is the main entry point for library usage.

io.mfj.textricator.cli.TextricatorCli is the command-line interface.

The CLI has three subcommands, one for each of Textricator's three main features:

  • text - Extract text from the PDF and generate JSON.
  • table - Parse the text that is in columns and rows. See Table section.
  • form - Parse the text with a configured finite state machine. See Form section.
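To make the relationship between the CLI main class and its subcommands concrete, here is a minimal Java sketch that assembles a JVM command line for one of the three subcommands. It is not part of Textricator. The helper class TextricatorCommand, the jar name, and the trailing input/output arguments are hypothetical placeholders. The options each subcommand actually accepts are not described here, so the sketch simply passes any extra arguments through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TextricatorCommand {
    // Main class of the command-line interface, as named in the text above.
    static final String CLI_MAIN = "io.mfj.textricator.cli.TextricatorCli";

    // The three subcommands listed above.
    enum Subcommand { TEXT, TABLE, FORM }

    /**
     * Builds (but does not run) a command line that starts the CLI main class
     * with one subcommand. The classpath and extraArgs come from the caller;
     * this sketch makes no claim about which options a subcommand accepts.
     */
    static List<String> build(String classpath, Subcommand sub, List<String> extraArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(CLI_MAIN);
        cmd.add(sub.name().toLowerCase(Locale.ROOT)); // TEXT -> "text"
        cmd.addAll(extraArgs);
        return cmd;
    }

    public static void main(String[] args) {
        // "textricator.jar", "input.pdf" and "output.json" are hypothetical
        // placeholders; check the CLI's own help for the real arguments.
        List<String> cmd = build("textricator.jar", Subcommand.TEXT,
                List.of("input.pdf", "output.json"));
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder once the classpath and arguments have been replaced with real values.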

Official Links