I have developed an NER model that independently detects all addresses and property prices in a PDF document, which describes properties and their prices in natural language. There is a lot of variation in how addresses and prices are mentioned: sometimes in descriptive sentences, sometimes in a structured layout, among many other formats.

One possible layout:

address 1
details about address 1
details about address 1
price 1

address 2
details about address 2
details about address 2
price 2

So for a given document, the model might predict, say, 5 different addresses and 5 different property prices.

Questions

  1. How can I build a model that assigns each price to the correct address?
  2. How can I encode this link in the training data so that the model can learn it?
GeorgeOfTheRF
  • Maybe I don't understand something, but if the price always follows the description of the property then wouldn't it be possible to simply assign the next price in the text to the property? Also you might want to use formatting information: if the address and price always belong to the same paragraph, it solves the issue. – Erwan Jul 08 '22 at 15:37
  • The example layout I shared above is just one variation. Many other variations/formats are possible, depending on the entity that creates the document. Is there a way to detect a pair of entities using an ML/statistical model? I am using a CRF to detect each of them independently. – GeorgeOfTheRF Jul 11 '22 at 07:59
  • The question is what kind of indication would be used in the features: is it textual like "the price of this property is X"? Is it formatting, like the price appearing in the same paragraph? Is it just the closest price found near the property description? In other words, how does a human understand the link between property and price? I think answering these questions would give a clue about how to design the model. – Erwan Jul 11 '22 at 08:49
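
The two ideas raised in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution: the `Entity` record, both function names, the feature names, and the sample addresses and prices are all hypothetical, and the code assumes the NER step already returns character offsets and a paragraph index for each entity. The first function is the "next price in the text" baseline; the second builds one feature row per (address, price) candidate, which could be labeled linked / not linked and fed to any binary classifier.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    label: str      # "ADDRESS" or "PRICE" (assumed NER label names)
    text: str       # surface text of the entity
    start: int      # character offset in the document
    paragraph: int  # index of the paragraph containing the entity


def pair_with_next_price(entities):
    """Baseline: link each address to the first price that follows it
    and precedes the next address. Addresses with no such price map to None."""
    pairs = {}
    current = None
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.label == "ADDRESS":
            current = ent
            pairs[current.text] = None
        elif ent.label == "PRICE" and current is not None:
            if pairs[current.text] is None:
                pairs[current.text] = ent.text
    return pairs


def candidate_pair_features(entities):
    """Build one feature row per (address, price) candidate pair.
    Each row would get a 0/1 'linked' label in the training data."""
    addresses = [e for e in entities if e.label == "ADDRESS"]
    prices = [e for e in entities if e.label == "PRICE"]
    rows = []
    for addr, price in product(addresses, prices):
        lo, hi = sorted((addr.start, price.start))
        rows.append({
            "address": addr.text,
            "price": price.text,
            "char_distance": hi - lo,
            "price_after_address": price.start > addr.start,
            "same_paragraph": addr.paragraph == price.paragraph,
            # other addresses lying between the two entities
            "addresses_between": sum(lo < a.start < hi for a in addresses),
        })
    return rows


# Hypothetical NER output for a two-property document.
entities = [
    Entity("ADDRESS", "12 Oak St", 0, 0),
    Entity("PRICE", "$500,000", 80, 0),
    Entity("ADDRESS", "7 Elm Rd", 120, 1),
    Entity("PRICE", "$350,000", 200, 1),
]
baseline_links = pair_with_next_price(entities)
pair_rows = candidate_pair_features(entities)
```

The baseline only works when a price follows its address. The pairwise rows make no such assumption, so layouts where the price comes first, or where it is introduced by a phrase like "the price of this property is X", can be covered by adding features (for example, the words between the two entities) rather than by changing the rule.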