Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
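As a minimal sketch of the header format described above, the helper below builds the request headers from a token. The function name `build_auth_headers` is hypothetical and not part of this API; only the `Bearer <token>` form comes from the description above.

```python
from typing import Dict


def build_auth_headers(token: str) -> Dict[str, str]:
    """Return HTTP headers carrying the token as a Bearer credential.

    Hypothetical helper for illustration; only the header format
    ("Bearer <token>") is taken from the reference text.
    """
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.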
Path Parameters
Response
Successfully retrieved document
Document object
Optional ID of the connection the document was created from. This is useful for identifying the source of the document.
22
"conn_123"
The content to extract and process into a document. This can be a URL to a website, a PDF, an image, or a video.
Plaintext: Any plaintext format
URL: A URL to a website, PDF, image, or video
We automatically detect the content type from the URL's response format.
"This is a detailed article about machine learning concepts..."
"https://example.com/article"
"https://youtube.com/watch?v=abc123"
"https://example.com/audio.mp3"
"https://aws-s3.com/bucket/file.pdf"
"https://example.com/image.jpg"
Creation timestamp
"1970-01-01T00:00:00.000Z"
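The example timestamp above appears to be an ISO 8601 string in UTC with a trailing `Z`. A minimal sketch of parsing it on the client side, assuming that format holds for all timestamps (the helper name `parse_timestamp` is hypothetical):

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "1970-01-01T00:00:00.000Z".

    Assumption: timestamps use the same format as the example value.
    The trailing "Z" is rewritten as an explicit UTC offset because
    datetime.fromisoformat only accepts "Z" from Python 3.11 onward.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

The result is a timezone-aware `datetime`, so it can be compared safely with other aware values.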
Optional custom ID of the document. This could be an ID from your database that will uniquely identify this document.
255
"mem_abc123"
Unique identifier of the document.
22
"acxV5LHMEsG2hMSNb4umbn"
Optional metadata for the document. Use it to store any additional information you need about the document. Metadata can be used for filtering. Keys must be strings and are case-sensitive. Values can be strings, numbers, or booleans. Nested objects are not allowed.
{
"category": "technology",
"isPublic": true,
"readingTime": 5,
"source": "web",
"tag_1": "ai",
"tag_2": "machine-learning"
}
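The metadata rules above (string keys; string, number, or boolean values; no nesting) can be checked before sending a request. The following is a client-side sketch only; `validate_metadata` is a hypothetical helper, and the server's own validation may differ.

```python
from typing import Any, Dict, List


def validate_metadata(metadata: Dict[Any, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata is flat and valid.

    Hypothetical client-side check based on the rules stated in the reference text.
    """
    problems: List[str] = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        # bool is a subclass of int in Python, so it is covered by int here;
        # it is listed explicitly for readability.
        if not isinstance(value, (str, int, float, bool)):
            problems.append(
                f"value for key {key!r} has unsupported type {type(value).__name__}"
            )
    return problems
```

Nested dictionaries and lists are reported as unsupported types, which matches the rule that objects cannot be nested.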
Source of the document
255
"web"
Status of the document
unknown, queued, extracting, chunking, embedding, indexing, done, failed
"done"
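When polling a document, a client typically needs to know whether processing has finished. The sketch below lists the status values from this reference and treats `done` and `failed` as terminal; that classification is an assumption based on the names, not something this page states. The helper `is_terminal` is hypothetical.

```python
# Status values as listed in this reference.
STATUSES = (
    "unknown",
    "queued",
    "extracting",
    "chunking",
    "embedding",
    "indexing",
    "done",
    "failed",
)

# Assumption: only these two statuses mean processing has stopped.
TERMINAL_STATUSES = frozenset({"done", "failed"})


def is_terminal(status: str) -> bool:
    """Return True if the document will not change status again (assumed)."""
    if status not in STATUSES:
        raise ValueError(f"unrecognized status: {status!r}")
    return status in TERMINAL_STATUSES
```

A polling loop could stop as soon as `is_terminal` returns `True` and then branch on whether the status is `done` or `failed`.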
Summary of the document content
"A comprehensive guide to understanding the basics of machine learning and its applications."
1536 elements
Title of the document
"Introduction to Machine Learning"
Type of the document
text, pdf, tweet, google_doc, google_slide, google_sheet, image, video, notion_doc, webpage, onedrive
"text"
Last update timestamp
"1970-01-01T00:00:00.000Z"
Raw content of the document
"This is a detailed article about machine learning concepts..."
URL of the document
"https://example.com/article"
Optional tags this document should be containerized by. This can be an ID for your user, a project ID, or any other identifier you wish to use to group documents.
["user_123", "project_123"]