Why I Like Reducto's AI PDF Extraction
A few days ago, a friend told me about Reducto. The company just raised over $24 million from venture capital firms to make unstructured data in PDFs usable. It structures the text in PDF documents so you can feed it into other systems, such as LLMs or your own applications.
Some companies build products that need information from PDFs as input. They spend a significant amount of time figuring out how to extract unstructured information from PDFs, which isn’t as easy as it sounds. Because PDF extraction isn’t what these companies specialize in, it takes them a long time, and in many cases they end up with something that’s decent but not great. Reducto’s PDF-processing service lets them skip mastering PDF extraction: they take Reducto’s output and focus on the more important parts of the product they’re building.
I spent some time playing with the tool’s free version. A few notes:
- Reducto can scan any document and turn the entire document into structured JSON. The free version has a 30-page document limit.
- The tool is incredibly good at extracting information from charts, which isn’t easy.
- There are many useful settings, including extraction method, chunking method, and more.
- The tool has an API, which is great for automation.
- If you need the extracted data formatted in a particular way, you can create a schema in Reducto, and the output will be formatted using your schema. I love this feature.
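To make the schema idea concrete, here is a minimal sketch of how an application might check schema-shaped JSON before using it. It does not call Reducto at all. The schema format, the field names, and the sample output are all hypothetical, made up for illustration, and are not Reducto's actual schema format or response shape.

```python
import json

# Hypothetical schema: field name -> expected Python type.
# This is NOT Reducto's schema format, only an illustration of the idea.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "vendor": str,
    "total": float,
    "line_items": list,
}


def check_against_schema(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# Made-up example of what schema-shaped JSON output could look like.
SAMPLE_OUTPUT = """
{
  "invoice_number": "INV-001",
  "vendor": "Acme Corp",
  "total": 1250.5,
  "line_items": [{"description": "Widgets", "amount": 1250.5}]
}
"""

record = json.loads(SAMPLE_OUTPUT)
problems = check_against_schema(record, INVOICE_SCHEMA)
print(problems or "record matches the schema")
```

The point of the sketch is the workflow: when the output always follows a shape you defined, the code that consumes it can be a simple check like this instead of custom PDF-parsing logic.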
This is an impressive tool that solves a boring but big problem for companies. I can see companies quickly agreeing to pay for this service. That likely drove rapid revenue growth, which in turn caught the attention of venture capital investors.
I get the impression this service is built for enterprise clients. I wonder if Reducto plans to offer something that’s more suited to small or midsize companies.