Highest Voted 'document-understanding' Questions

3

votes

0 answers

Multi-page document image classification

Sorry for the long post, but I needed the space to capture all the details and questions. I am working on a multi-page document image classification problem and am unsure which approach or model architecture to follow. Here is the…

asked Jun 15 '23 at 00:26

asanoop24

141
1

2

votes

0 answers

What is the meaning of, or explanation for, having multiple tags in a Doc2Vec model's TaggedDocuments?

I've tried reading the other answers on this topic but I'm unsure if I understand completely. For my dataset, I have a series of tagged documents, "good" or "bad." Each document belongs to an entity, and each entity has a different number of…

python nlp word2vec doc2vec document-understanding

asked Mar 08 '21 at 16:06

Jayke

21
1

2

votes

2 answers

"Object" Detection in Textual Data

I have a task where the input is a parsed document (i.e., full text in 1 string or tokens) and I need to classify parts of the text into say 5 classes (i.e., 5 tokens from the entire text are labeled into 5 different classes). Example: Document #1:…

nlp object-detection document-understanding

asked Jan 11 '21 at 09:32

leed

145
4

1

vote

1 answer

Document clustering to merge common labels

I am building a recommendation system and I have to clean up some of the labels that I have. For example, df['resolution_modified'].value_counts() gives: 105829 It is recommended to replace scanner …

machine-learning clustering data-cleaning unsupervised-learning document-understanding

asked Dec 11 '20 at 17:11

Wolfy

237
2
9
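One possible starting point for merging near-duplicate labels like those in the question above is plain string similarity. The sketch below is only an illustration under stated assumptions: `group_similar_labels`, the 0.85 threshold, and the sample labels are all hypothetical and do not come from the question, and greedy grouping is a stand-in for a proper clustering method.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def group_similar_labels(labels, threshold=0.85):
    """Greedily group labels by similarity to each group's first member.

    The threshold is an assumed value and would need tuning on real data.
    """
    groups = []  # the first element of each group acts as its representative
    for label in labels:
        for group in groups:
            score = SequenceMatcher(None, normalize(label), normalize(group[0])).ratio()
            if score >= threshold:
                group.append(label)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([label])
    return groups


# Hypothetical labels, invented for illustration.
labels = [
    "Replace the scanner",
    "replace the  scanner",
    "Replace the scaner",
    "Update the firmware",
]
groups = group_similar_labels(labels)
for group in groups:
    print(group)
```

Each group could then be mapped to one canonical label before counting values again. Character-level similarity misses paraphrases, so embedding-based clustering may be a better fit when labels differ in wording rather than spelling.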

1

vote

0 answers

Highlight specific paragraphs from documents

I have a bunch of documents in which I want to highlight certain paragraphs/keyphrases. I have a list of the most frequently appearing sentences and I want to search for these paragraphs/keyphrases in the document and if they appear, highlight them.…

python python-3.x opencv ocr document-understanding

asked Jun 27 '23 at 06:40

spectre

1,831
1
9
29
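For the text side of the highlighting question above, a minimal sketch might look like the following. It assumes the documents have already been converted to plain text (for example by OCR) and marks matches with bracket markers instead of drawing on an image; `highlight_phrases`, the markers, and the sample text are hypothetical names and data chosen for illustration.

```python
import re


def highlight_phrases(text, phrases, open_mark="[[", close_mark="]]"):
    """Wrap every case-insensitive occurrence of each phrase in markers."""
    # Longest phrases first, so a longer phrase wins over one it contains.
    ordered = sorted((p for p in phrases if p.strip()), key=len, reverse=True)
    if not ordered:
        return text
    # Escape each word and allow any whitespace between words, because
    # extracted text often breaks lines in the middle of a sentence.
    alternatives = [r"\s+".join(re.escape(w) for w in p.split()) for p in ordered]
    pattern = re.compile("|".join(alternatives), flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)


# Hypothetical sample text and phrase list.
sample = "Payment is due within\n30 days. Late fees may apply."
highlighted = highlight_phrases(sample, ["payment is due within 30 days"])
print(highlighted)
```

Exact matching like this will miss phrases that OCR has misread, so a fuzzy matcher may be needed for scanned input. Highlighting on the page image itself would also require the word bounding boxes from the OCR step, which this sketch does not cover.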

1

vote

1 answer

Data Analytics Documentation

I am working as a data analyst in a company. My colleagues and I use different tools and software to analyze the data and make the reports (e.g., Excel, Python, R, Alteryx, SQL, Tableau). Each one uses his or her favourite tools to do the tasks.…

machine-learning data-mining predictive-modeling data-analysis document-understanding

asked Dec 09 '22 at 06:46

N.IT

1,975
4
17
35

1

vote

0 answers

Entity Linking for Receipts

I am building a model for reading receipts from their mobile snapshots. After the receipt is OCR'd, I plan to use a variation on LayoutLM for entity extraction. Entities are: "quantity", "price-per-unit", "product-name", "items-price", etc. What is…

nlp ocr document-understanding spatial-transformer

asked Sep 21 '21 at 19:10

fierval

11
1

0

votes

1 answer

Identify Resume Structure

I am trying to build a resume parser (from PDF to JSON). After extracting text from a PDF as one long string, how would you split the string into different sections, as the red lines show? Resumes have different formats and people use different…

machine-learning document-understanding

asked Dec 02 '20 at 19:01

E.K.

405
4
6

0

votes

1 answer

How to extract handwritten phone numbers from a huge set of documents?

Say you have a lot of PDF documents, say K documents. Each document Di is Ni pages long. In one of the Ni pages (don't know which, say Pi), there is the information you need to extract. I am thinking about building a three-step pipeline for…

ocr document-understanding

asked Aug 25 '23 at 16:48

Anmol Deep

101
3

0

votes

0 answers

What is the most efficient way of image document classification?

So, I am working on a project where I have to extract the sales tax invoice from a PDF document that contains other files along with the invoice. I researched the topic and am considering two solutions. Converting the PDF to images and then…

image-classification transformer ocr document-understanding

asked Mar 06 '23 at 13:58

Sardar Arslan

1
1
