The documents in the online demo dataset were crawled from publicly available sources. Their content is mixed, so a wide range of topics is covered. Sensitive content was filtered out in advance as far as possible.
Because PDF documents are the easiest to obtain, they make up the largest part of the dataset. However, the filter functions can be used to narrow the dataset down to specific file types.
To give a sense of how many files of each type are in the demo, here is a short list:
|Scanned PDFs||~100 files|
|3D Models||>30,000 files|
In the demo, we have limited ourselves to a selection of our connectors. These include:
- Network drives
In production, however, we can support many more systems.