working mechanism

#11

by BoccheseGiacomo - opened Apr 4, 2023

Apr 4, 2023

I have a question: does this only work with text documents, or also with images? If I have a PDF formatted as an image, does this work? And if I have a PDF with tables, does it convert everything to raw UTF-8 text, or is it able to process structures (images, tables, HTML text) as they are?

Thanks

billa1972

May 14, 2023

As far as I can tell, it's just the text from the images, and it needs to be in a "segmentId" format.

However, check out katanaml here, and also the GitHub repo: https://github.com/katanaml/sparrow

BoccheseGiacomo

May 15, 2023

Thanks for the GitHub repo, that's really cool.

BoccheseGiacomo changed discussion status to closed May 15, 2023
