The documents in the online demo dataset were crawled from publicly available sources. Their content is mixed, so a wide range of topics is covered. Critical content was filtered out in advance as far as possible.
File types
Since PDF documents are the easiest to access, they make up the largest part of the dataset. However, the filter functions can be used to narrow the dataset down to specific file types.
To give an impression of how many files of each type are in the demo, here is a short overview:
File type | Quantity |
 | >40,000 files |
Scanned PDFs | ~100 files |
PowerPoint | ~150 files |
Word | >11,500 files |
Excel | >1,000 files |
 | >2,000 files |
Images | >100,000 files |
3D Models | >30,000 files |
Tickets | >6,000 files |
Data sources
In the demo we have limited ourselves to a selection of our connectors. These include:
- Network drives
- SharePoint
- OneDrive
- Teams
- Outlook
- OneNote
- Jira
- Confluence
- d.velop
- GitLab
Outside the demo, however, many more systems can be supported.