OCR, NLP, Computer Vision & generative AI

The unparalleled performance of GPT, for all your documents.

Document capture, recognition, data extraction, verification, and fraud detection have been radically transformed by the latest Artificial Intelligence models.

To shed light on the reality of existing solutions, we ran a test protocol with real data on the latest solutions from Kofax, ABBYY, Nanonets, and others.
The combination of the latest OCR models, Transformers, and generative AI delivers unmatched performance.

Read the performance benchmark

Observed performance in a production environment:

98.9%: Automatic document splitting
99.7%: Automatic document identification
99.8%: Compliance point checking accuracy
98.4%: Extraction of typewritten data
94.2%: Extraction of handwritten data
0.09 seconds: Processing time per page

GDPR and Data Security

The confidentiality of your sensitive data

GDPR, HIPAA, cybersecurity... you can't always trust a cloud solution to guarantee the security of your confidential documents, personal data, or health data.

Our on-premise solution uses a local architecture that ensures the protection of your most sensitive data. All processing operations are performed on your own infrastructure, thus guaranteeing the confidentiality and security of your documents.

Download our solution (Docker container)

Our solution is free to download, and all its features can be used in your environment. Training is unlimited, but restrictions apply once inference exceeds 5,000 documents.

Simplified Integration

Once your AI model is trained and deployed, you can simply call your API from your applications to perform segmentation, verification, classification, or extraction to customize your business workflows.

from provenceAI import Model

# Connect to your trained model with your API key.
my_model = Model(
    api_key="my_key",
    model="my_model",
)

# Process a document with the deployed model.
results = my_model.process("my_document.pdf")

print(results)

{
  "pages": [
    {
      "page": 0,
      "class": {
        "value": "Contract",
        "confidence": 0.99
      },
      "extractions": {
        "date": {
          "value": "<date>",
          "confidence": 0.90
        },
        "signed": {
          "value": true,
          "confidence": 0.95
        }
      }
    }
  ]
}
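
As an illustration of how an application might consume this response, the sketch below separates fields that can be accepted automatically from those that should go to manual review. It assumes the response is available as a Python dict in which `extractions` maps each field name to a `value` and a `confidence`. The `split_by_confidence` helper and the 0.92 threshold are hypothetical and are not part of the provenceAI package.

```python
from typing import Any

# Sample response shaped like the example output (assumed structure).
sample = {
    "pages": [
        {
            "page": 0,
            "class": {"value": "Contract", "confidence": 0.99},
            "extractions": {
                "date": {"value": "<date>", "confidence": 0.90},
                "signed": {"value": True, "confidence": 0.95},
            },
        }
    ]
}


def split_by_confidence(results: dict[str, Any], threshold: float):
    """Return (accepted, to_review) lists of (page, field, value) tuples."""
    accepted, to_review = [], []
    for page in results["pages"]:
        for field, item in page["extractions"].items():
            row = (page["page"], field, item["value"])
            # Fields at or above the threshold are accepted as-is;
            # everything else is queued for a human to check.
            if item["confidence"] >= threshold:
                accepted.append(row)
            else:
                to_review.append(row)
    return accepted, to_review


accepted, to_review = split_by_confidence(sample, threshold=0.92)
print("accepted:", accepted)
print("to review:", to_review)
```

The right threshold depends on the document type and on how costly an uncorrected error is, so it is worth tuning per field rather than fixing one global value.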

Whether it's automating tedious manual tasks or providing actionable insights, our platform is designed to meet the diverse needs of organizations in various sectors.

Public Services

Automate the processing of documents containing personal and health data to free up staff and better serve citizens and patients.

Financial Services

Automate the processing of confidential financial documents (consumer, corporate) and conduct anti-fraud checks.

Insurance

Speed up the processing of claims, enrollments and underwriting to streamline customer relations and reduce the burden on underwriters or claims handlers.

Businesses

Simplify the automated processing of HR, accounting, and other massive document flows.

CAPABILITIES

Our platform is used by large organizations that need to process massive volumes of documents for identification, classification, complex data extraction, or fraud prevention.

Features in the free version, Frioul v.1.2

Creation of classification, extraction, or combined IDP models.
Unlimited annotations (document and extraction labels).
Model training and inference (unlimited trainings, inference limited to 5,000 pages).
Image correction (orientation, skew, noise...).
scoring: readability scores for images, confidence scores for classification and data extraction.
User arbitration interface: you can correct the output according to a confidence limit.
Performance analysis interface by labels.
Retrieval of extracted data (.csv).
splitter: automatic splitting of files into individual documents, for trained document types.
splitter^AI: agnostic zero-shot document splitting (alpha).
quickLearn^AI: similarity classification engine.
fewShot^AI: generative AI applied to annotated documents, so that a few annotations are enough to multiply performance.
quality: module that validates the quality of extracted data against criteria defined for each data type.
API access.
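
For integrations that go through the API rather than the built-in .csv retrieval, a similar export can be produced on the client side. The sketch below is a minimal example in plain Python, assuming a results dict in which each page carries a `class` and an `extractions` mapping of field names to `value` and `confidence`. The `results_to_csv` helper and the column layout are hypothetical and do not describe the platform's own export format.

```python
import csv
import io


def results_to_csv(results: dict) -> str:
    """Flatten a results dict into CSV text, one row per extracted field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "class", "field", "value", "confidence"])
    for page in results["pages"]:
        for field, item in page["extractions"].items():
            writer.writerow([
                page["page"],
                page["class"]["value"],
                field,
                item["value"],
                item["confidence"],
            ])
    return buffer.getvalue()


# Hypothetical results, shaped like the API example (assumed structure).
example = {
    "pages": [
        {
            "page": 0,
            "class": {"value": "Contract", "confidence": 0.99},
            "extractions": {
                "signed": {"value": True, "confidence": 0.95},
            },
        }
    ]
}

csv_text = results_to_csv(example)
print(csv_text)
```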

Download

Additional features in the custom version, Massilia v.1.1

rights: advanced user role and rights management module for the platform.
reTrain: intelligent model retraining planning.
autoImprove^AI: the model automatically self-feeds to improve performance.
arbitrage^AI: the arbitration interface includes advanced features such as model retro-improvement, arbitration of files (sessions of several documents), workflow replay, suggestion of new labels for unrecognized documents, and correction of historical documents.
flows: build advanced data processing workflows in-app.
compute: build calculation rules for data validation.
lookup: universal connector for querying third-party services (API, SQL).
truthOrDare^AI: document authenticity validation module (image forensics).
fraud^AI: fraud detection models (documentary inconsistency, intra-file inconsistencies, inter-file suspicions) based on business rules.
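
To make the idea of a calculation rule for data validation more concrete, here is a minimal sketch in plain Python. It does not use the compute module, whose interface is not described here. The field names (`net_amount`, `tax_amount`, `total_amount`) and the `check_total` helper are hypothetical, chosen only to illustrate a consistency check between extracted values.

```python
from decimal import Decimal


def check_total(fields: dict[str, str], tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when net + tax matches the extracted total within tolerance."""
    net = Decimal(fields["net_amount"])
    tax = Decimal(fields["tax_amount"])
    total = Decimal(fields["total_amount"])
    return abs(net + tax - total) <= tolerance


# Hypothetical extracted values, kept as strings to avoid float rounding.
consistent = {"net_amount": "100.00", "tax_amount": "20.00", "total_amount": "120.00"}
inconsistent = {"net_amount": "100.00", "tax_amount": "20.00", "total_amount": "125.00"}

print(check_total(consistent))
print(check_total(inconsistent))
```

A document that fails such a rule would typically be sent to arbitration rather than rejected outright, since the mismatch may come from an extraction error rather than from the document itself.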

Automate your documents with the power of AI, confidentially.