Integrate Document AI and Hugging Face to automate workflows with a scalable backend

Connect Document AI and Hugging Face nodes in your workflow. Integrate with any tool or database and ship powerful backend logic and APIs instantly, with no code required!

getting started

How To Connect Document AI and Hugging Face

Supported Triggers & Actions

Document AI NODES

Document AI Form Parser

Form Parser allows you to automatically extract fields, values, and generic entities like names, addresses, and prices from standard forms, structuring data in tables. It's ready to use without the need for training or customization, suitable for various document types. **To use this node you must first enable the [Document AI API](https://console.cloud.google.com/apis/library/documentai.googleapis.com?project=_)**
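
To make "fields and values" more concrete, here is a minimal Python sketch that flattens the form fields of a Document AI response into a plain dictionary. It assumes the JSON shape returned by the REST API (`text`, `pages[].formFields[]`, and `textAnchor.textSegments` offsets into the full text). The helper names `anchor_text` and `form_fields_to_dict` and the sample document are hypothetical, written only for illustration.

```python
def anchor_text(full_text, layout):
    """Resolve a layout's text anchor into the substring(s) it points at."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # Assumption: the REST JSON encodes offsets as strings and omits a zero startIndex.
        start = int(seg.get("startIndex", 0))
        end = int(seg["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts).strip()


def form_fields_to_dict(document):
    """Collect every form field on every page as {field name: field value}."""
    text = document.get("text", "")
    fields = {}
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(text, field.get("fieldName", {})).rstrip(":")
            value = anchor_text(text, field.get("fieldValue", {}))
            if name:
                fields[name] = value
    return fields


# Hypothetical, hand-written response fragment (not real parser output).
sample_document = {
    "text": "Name: Ada Lovelace\nCity: London\n",
    "pages": [{
        "formFields": [
            {"fieldName": {"textAnchor": {"textSegments": [{"endIndex": "5"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [
                 {"startIndex": "6", "endIndex": "18"}]}}},
            {"fieldName": {"textAnchor": {"textSegments": [
                {"startIndex": "19", "endIndex": "24"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [
                 {"startIndex": "25", "endIndex": "31"}]}}},
        ]
    }],
}

print(form_fields_to_dict(sample_document))
# {'Name': 'Ada Lovelace', 'City': 'London'}
```

In a workflow, a dictionary like this is typically what you would pass on to a database or API node.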

Document AI Invoice Parser

Extract text and values from invoices, such as the invoice number, supplier name, invoice amount, tax amount, invoice date, and due date. **To use this node you must first enable the [Document AI API](https://console.cloud.google.com/apis/library/documentai.googleapis.com?project=_)**
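
As a rough sketch of how the extracted invoice values might be consumed downstream, the Python snippet below turns the `entities` list of a Document AI response into a flat record, keeping the highest-confidence mention per entity type. It assumes each entity carries `type`, `mentionText`, and `confidence` keys. The function name `invoice_entities_to_record`, the 0.5 threshold, and the sample entity types and values are hypothetical.

```python
def invoice_entities_to_record(document, min_confidence=0.5):
    """Keep the most confident mention for each entity type above a threshold."""
    best = {}
    for entity in document.get("entities", []):
        kind = entity.get("type")
        confidence = float(entity.get("confidence", 0.0))
        if not kind or confidence < min_confidence:
            continue  # skip untyped or low-confidence entities
        if kind not in best or confidence > best[kind][1]:
            best[kind] = (entity.get("mentionText", "").strip(), confidence)
    return {kind: text for kind, (text, _) in best.items()}


# Hypothetical, hand-written response fragment (not real parser output).
sample_invoice = {
    "entities": [
        {"type": "invoice_id", "mentionText": "INV-1001", "confidence": 0.98},
        {"type": "supplier_name", "mentionText": "Acme Corp", "confidence": 0.91},
        {"type": "total_amount", "mentionText": "118.00", "confidence": 0.95},
        {"type": "total_amount", "mentionText": "18.00", "confidence": 0.40},
        {"type": "due_date", "mentionText": "2024-07-01", "confidence": 0.30},
    ]
}

print(invoice_entities_to_record(sample_invoice))
# {'invoice_id': 'INV-1001', 'supplier_name': 'Acme Corp', 'total_amount': '118.00'}
```

Filtering by confidence is a design choice: a stricter threshold drops more uncertain values, which may be preferable when the record feeds an accounting system without human review.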

Document AI OCR

Identify and extract text, including handwritten text, from many types of documents in over 200 languages. **To use this node you must first enable the [Document AI API](https://console.cloud.google.com/apis/library/documentai.googleapis.com?project=_)**

Document AI US License Parser

Extract fields from US driver's licenses, such as names and dates. **To use this node you must first enable the [Document AI API](https://console.cloud.google.com/apis/library/documentai.googleapis.com?project=_)**
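
For readers curious what a Document AI call looks like outside the node editor, the Python sketch below builds (but does not send) a request to a processor's `process` endpoint once the API is enabled. It assumes the regional REST endpoint pattern and a `rawDocument` body with base64 content. The project, location, processor ID, and token values are placeholders, and `build_process_request` is a hypothetical helper, not how the nodes are necessarily implemented.

```python
import base64
import json
import urllib.request


def build_process_request(project_id, location, processor_id,
                          file_bytes, mime_type, access_token):
    """Build a POST request for a Document AI processor without sending it."""
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:process"
    )
    body = {
        "rawDocument": {
            # The document bytes travel as base64 text inside the JSON body.
            "content": base64.b64encode(file_bytes).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_process_request(
    "my-project", "us", "my-processor-id",
    b"%PDF-1.4 sample", "application/pdf", "ACCESS_TOKEN",
)
print(request.full_url)
# Sending it would be urllib.request.urlopen(request), which needs a real
# access token and an existing processor.
```

The same request shape applies to each processor type; only the processor ID changes.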

Hugging Face NODES

Caption Image

Generate a caption for an image using Hugging Face's [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) model, an image-captioning model pretrained on the COCO dataset (base architecture with a ViT large backbone).

Image Classification

Get classification labels for your image using Hugging Face's [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) model, a transformer encoder model (BERT-like) pretrained in a supervised fashion on ImageNet-21k, a large collection of images, at a resolution of 224x224 pixels. The model was then fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at a resolution of 224x224.
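
Classification labels usually arrive as a list of label and score pairs. The Python sketch below shows one way to keep only the most relevant ones. It assumes a response shaped like `[{"label": ..., "score": ...}]`; the function name `top_labels`, its defaults, and the sample predictions are hypothetical.

```python
def top_labels(predictions, k=3, min_score=0.0):
    """Return up to k (label, score) pairs, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    kept = [(p["label"], round(p["score"], 3)) for p in ranked
            if p["score"] >= min_score]
    return kept[:k]


# Hypothetical predictions for illustration (not real model output).
sample_predictions = [
    {"label": "Egyptian cat", "score": 0.27},
    {"label": "tabby, tabby cat", "score": 0.62},
    {"label": "lynx, catamount", "score": 0.01},
    {"label": "tiger cat", "score": 0.08},
]

print(top_labels(sample_predictions, k=2, min_score=0.05))
# [('tabby, tabby cat', 0.62), ('Egyptian cat', 0.27)]
```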

Text Summarization

Summarize long text using Hugging Face's [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) model, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
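
If you want to see roughly what a summarization call involves, the Python sketch below builds (but does not send) a request for the model and reads the summary out of a response. The base URL is an assumed pattern for Hugging Face's hosted inference endpoint and may differ for your account or change over time. The helper names, the `max_length` and `min_length` values, and the sample response are hypothetical.

```python
import json
import urllib.request

# Assumption: hosted inference URL pattern; check your Hugging Face account docs.
HF_API_BASE = "https://api-inference.huggingface.co/models"


def build_summarization_request(text, api_token,
                                model_id="facebook/bart-large-cnn",
                                max_length=130, min_length=30):
    """Build a POST request carrying the text and generation parameters."""
    payload = {
        "inputs": text,
        "parameters": {"max_length": max_length, "min_length": min_length},
    }
    return urllib.request.Request(
        f"{HF_API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_summary(response_json):
    """Assumes a response shaped like [{"summary_text": "..."}]."""
    return response_json[0]["summary_text"]


summary_request = build_summarization_request("A long article ...", "HF_TOKEN")
print(summary_request.full_url)

# Hypothetical response, shown only to illustrate the parsing step.
print(extract_summary([{"summary_text": "A short summary."}]))
```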

Text-To-Image

Generate an image from text using Hugging Face's [openskyml/dalle-3-xl](https://huggingface.co/openskyml/dalle-3-xl) model, a test model very similar to DALL·E 3.
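
Text-to-image models generally hand back raw image bytes rather than JSON. As a small sketch of handling that output, the Python snippet below guesses the file type from the leading bytes and writes the image to disk. It assumes the bytes are already in hand; the function names `guess_image_extension` and `save_generated_image` and the fake bytes are hypothetical.

```python
import tempfile
from pathlib import Path


def guess_image_extension(data):
    """Pick a file extension from well-known image signatures."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return ".bin"  # unknown format: keep the bytes, flag the type


def save_generated_image(data, directory, stem="generated"):
    """Write the bytes to <directory>/<stem><extension> and return the path."""
    path = Path(directory) / f"{stem}{guess_image_extension(data)}"
    path.write_bytes(data)
    return path


# Fake bytes that only carry a PNG signature, for illustration.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

with tempfile.TemporaryDirectory() as tmp:
    saved = save_generated_image(fake_png, tmp)
    print(saved.name)  # generated.png
```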

Text-To-Music

Generate music from text using Hugging Face's [facebook/musicgen-small](https://huggingface.co/facebook/musicgen-small) model, which can generate high-quality music samples conditioned on text descriptions or audio prompts.

blog posts & tutorials

Recommended Reads

Below are recommended blog posts that will help you on your journey.