Introduction
When you upload content to your platform, the system analyzes the file to extract textual information. Depending on the format, the platform retrieves this information through text extraction (for documents, images, web files, and similar formats) or transcript generation (for audio, video, and supported learning packages).
Only content from which the system can successfully extract or generate text can be used by platform features that rely on textual analysis, such as global search, Harmony, and auto tagging.
This article outlines all supported content types and the conditions required for successful text extraction and transcript generation.
Supported content types for content analysis
The following table lists all file types that can be analyzed by the platform.
| Category | Types | Extracted content | Training materials / Assets |
| --- | --- | --- | --- |
| Text files | .txt, .csv | Text | Training materials and assets |
| Document files | .doc, .docx, .odt, .ppt, .pptx, .pdf, .xls, .xlsx | Text | Training materials and assets |
| Image files | .bmp, .jpeg, .png, .tiff | Text in the image | Training materials and assets |
| Web files | .html, .htm (Note: when a web page URL is provided, text is extracted only from that specific page. Content from links embedded within the page is not extracted.) | Text | Training materials and assets |
| Audio files | .aac, .mpeg, .wav | Audio transcription | Training materials and assets |
| Video files | .mp4, .mov | Audio transcription | Training materials and assets |
| Google Workspace files✴ | Docs, Sheets, Slides | Text | Training materials and assets |
| Linked online videos✴ | YouTube, Vimeo, Wistia | Subtitles | Training materials and assets |
| E-learning packages✴ | SCORM and xAPI/TinCan (Articulate Rise and Articulate Storyline) | Text and audio transcription | Training materials |
| Docebo files | Creator lessons | Text and audio transcription | Training materials |
✴ Private content (content that requires authentication to access) is not supported.
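To illustrate how the table can be read, the sketch below maps a file name to its category and the kind of content extracted from it. It is only an illustration: the dictionary covers a handful of the extensions listed above, and the names `EXTRACTION_BY_EXTENSION` and `describe_extraction` are hypothetical, not part of any platform API.

```python
from pathlib import PurePath

# Hypothetical lookup built from a few rows of the table above.
# It is NOT a platform API and does not cover every supported type.
EXTRACTION_BY_EXTENSION = {
    ".txt": ("Text files", "Text"),
    ".csv": ("Text files", "Text"),
    ".pdf": ("Document files", "Text"),
    ".docx": ("Document files", "Text"),
    ".png": ("Image files", "Text in the image"),
    ".html": ("Web files", "Text"),
    ".wav": ("Audio files", "Audio transcription"),
    ".mp4": ("Video files", "Audio transcription"),
}


def describe_extraction(filename):
    """Return (category, extracted content) for a file name, or None if unlisted."""
    # Compare extensions case-insensitively, so "Report.PDF" matches ".pdf".
    extension = PurePath(filename).suffix.lower()
    return EXTRACTION_BY_EXTENSION.get(extension)
```

For example, `describe_extraction("Report.PDF")` looks up the `.pdf` row, while a file such as `archive.zip` returns `None` because its extension is not in the lookup.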
Unsupported content and extraction limitations
Content types not listed in the table above are not supported for text extraction or transcript generation. These include assignment, Docebo Learning Impact (DLI), LTI, observation checklist, survey, test, Elucidat, archive, playlist, Shape, and AICC.
Being a supported file type is not enough on its own: the system must also be able to extract text or generate a transcript from the content. If text extraction or transcript generation fails, the content cannot be used by features that rely on textual analysis.
Text extraction or transcript generation may fail in the following cases:
- Audio or video files that contain no speech (for example, background music only)
- Transcripts shorter than 30 words, which are discarded
- Private content that requires authentication to be accessed
- Images compressed to a degree that prevents accurate Optical Character Recognition (OCR)
Only content from which the platform can successfully extract text or generate a transcript can be used by features such as global search, Harmony, and auto tagging.
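The 30-word rule from the list of failure cases can be sketched as a simple check. This is a minimal illustration, assuming that words are counted by splitting on whitespace; the platform's actual counting method is not documented here, and the names `MIN_TRANSCRIPT_WORDS` and `is_transcript_usable` are hypothetical.

```python
# Threshold stated in this article: transcripts shorter than 30 words are discarded.
MIN_TRANSCRIPT_WORDS = 30


def is_transcript_usable(transcript):
    """Return True if the transcript has at least MIN_TRANSCRIPT_WORDS words."""
    # Assumption: a "word" is any whitespace-delimited token.
    # The platform may count words differently.
    word_count = len(transcript.split())
    return word_count >= MIN_TRANSCRIPT_WORDS
```

Under this assumption, an audio file with only background music produces an empty transcript and fails the check, as does a short clip with fewer than 30 spoken words.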