This document details the enhanced G4F File API, allowing users to upload files, download files from web URLs, and process a wider range of file types for integration with language models.
Key Improvements:
- Web URL Downloads: Upload a `downloads.json` file to your bucket containing a list of URLs. The API will download and process these files. Example: `[{"url": "https://example.com/document.pdf"}]`
- Expanded File Support: Added support for additional plain text file extensions: `.txt`, `.xml`, `.json`, `.js`, `.har`, `.sh`, `.py`, `.php`, `.css`, `.yaml`, `.sql`, `.log`, `.csv`, `.twig`, `.md`. Binary file support remains for `.pdf`, `.html`, `.docx`, `.odt`, `.epub`, `.xlsx`, and `.zip`.
- Server-Sent Events (SSE): SSE is now used to provide asynchronous updates on file download and processing progress. This improves the user experience, particularly for large files and multiple downloads.
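The first two items above can be sketched on the client side as follows. This is a minimal illustration, not part of the G4F package: the helper names `is_supported` and `build_downloads_json` are hypothetical, the extension sets are copied from the lists above, and restricting URLs to absolute HTTP(S) addresses is an assumption rather than documented server behavior.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension lists copied from the "Expanded File Support" item above.
PLAIN_TEXT_EXTENSIONS = {
    ".txt", ".xml", ".json", ".js", ".har", ".sh", ".py", ".php",
    ".css", ".yaml", ".sql", ".log", ".csv", ".twig", ".md",
}
BINARY_EXTENSIONS = {".pdf", ".html", ".docx", ".odt", ".epub", ".xlsx", ".zip"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension appears in either list (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix in PLAIN_TEXT_EXTENSIONS or suffix in BINARY_EXTENSIONS


def build_downloads_json(urls: list) -> str:
    """Serialize URLs into the [{"url": ...}] shape shown in the example above."""
    entries = []
    for url in urls:
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs are worth sending.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute HTTP(S) URL: {url!r}")
        entries.append({"url": url})
    return json.dumps(entries)
```

The string returned by `build_downloads_json` is what would be uploaded under the filename `downloads.json`.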
API Endpoints:
- Upload: `/backend-api/v2/files/{bucket_id}` (POST)
  - Method: POST
  - Path Parameters: `bucket_id` (generated by the client, for example a UUID)
  - Body: multipart/form-data with files OR a `downloads.json` file containing URLs.
  - Response: JSON object with `bucket_id`, `url`, and a list of uploaded/downloaded filenames.
- Retrieve: `/backend-api/v2/files/{bucket_id}` (GET)
  - Method: GET
  - Path Parameters: `bucket_id`
  - Query Parameters:
    - `delete_files`: (Optional, boolean, default `true`) Delete files after retrieval.
    - `refine_chunks_with_spacy`: (Optional, boolean, default `false`) Apply spaCy-based refinement.
  - Response: Streaming response with extracted text, separated by ``` markers. SSE updates are sent if the `Accept` header includes `text/event-stream`.
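As a rough sketch of how the Retrieve query parameters might be assembled, the helper below builds the request URL. The function name `build_retrieve_url` is hypothetical, the host and port are taken from the Python example in this document, and encoding booleans as lowercase `true`/`false` strings is an assumption that should be checked against the server.

```python
from urllib.parse import urlencode

# Host and port as used in the Python example in this document.
BASE_URL = "http://localhost:1337"


def build_retrieve_url(bucket_id, delete_files=True,
                       refine_chunks_with_spacy=False, base_url=BASE_URL):
    """Build the GET URL for the Retrieve endpoint (hypothetical helper)."""
    # Assumption: the server accepts lowercase "true"/"false" strings.
    query = urlencode({
        "delete_files": str(delete_files).lower(),
        "refine_chunks_with_spacy": str(refine_chunks_with_spacy).lower(),
    })
    return f"{base_url}/backend-api/v2/files/{bucket_id}?{query}"
```

Passing `delete_files=False` would keep the bucket contents available for a later retrieval, going by the parameter description above.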
Example Usage (Python):
import json
import uuid

import requests

def upload_and_process(files_or_urls, bucket_id=None):
    # Generate a bucket ID on the client if none was given.
    if bucket_id is None:
        bucket_id = str(uuid.uuid4())
    if isinstance(files_or_urls, list):  # URLs: send them as a downloads.json file
        files = {'files': ('downloads.json', json.dumps(files_or_urls), 'application/json')}
    elif isinstance(files_or_urls, dict):  # Files: pass the multipart mapping through
        files = files_or_urls
    else:
        raise ValueError("files_or_urls must be a list of URLs or a dictionary of files")

    upload_response = requests.post(f'http://localhost:1337/backend-api/v2/files/{bucket_id}', files=files)
    if upload_response.status_code == 200:
        upload_data = upload_response.json()
        print(f"Upload successful. Bucket ID: {upload_data['bucket_id']}")
    else:
        print(f"Upload failed: {upload_response.status_code} - {upload_response.text}")

    # Retrieve the processed content; the Accept header requests SSE progress updates.
    response = requests.get(f'http://localhost:1337/backend-api/v2/files/{bucket_id}', stream=True, headers={'Accept': 'text/event-stream'})
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data:'):
                try:
                    data = json.loads(line[5:])  # Remove the "data:" prefix
                    if "action" in data:
                        print(f"SSE Event: {data}")
                    elif "error" in data:
                        print(f"Error: {data['error']['message']}")
                    else:
                        print(f"File data received: {data}")  # Assuming it's file content
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON: {e}")
            else:
                print(f"Unhandled SSE event: {line}")
    response.close()
    return bucket_id

# Example with URLs
urls = [{"url": "https://github.com/xtekky/gpt4free/issues"}]
bucket_id = upload_and_process(urls)

# Example with files (the file is closed once the upload has finished)
with open('document.pdf', 'rb') as document:
    files = {'files': ('document.pdf', document)}
    bucket_id = upload_and_process(files)
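The line handling inside the loop above can also be pulled out into a small function, which makes it testable without a running server. The sketch below assumes the same `data:`-prefixed JSON lines the example reads; the name `parse_sse_line` and the returned kind labels are illustrative choices, not part of the API.

```python
import json


def parse_sse_line(raw: bytes):
    """Classify one line from response.iter_lines() (hypothetical helper).

    Returns a (kind, payload) tuple where kind is one of
    "skip", "other", "invalid", "action", "error" or "data".
    """
    if not raw:
        return ("skip", None)  # blank separator line between events
    line = raw.decode("utf-8")
    if not line.startswith("data:"):
        return ("other", line)  # comments or other SSE fields
    try:
        data = json.loads(line[len("data:"):])
    except json.JSONDecodeError as exc:
        return ("invalid", str(exc))
    if not isinstance(data, dict):
        return ("data", data)  # plain JSON value, treated as content
    if "action" in data:
        return ("action", data)
    if "error" in data:
        return ("error", data)
    return ("data", data)
```

Inside the loop, each `line` would be passed to `parse_sse_line` and the caller would branch on the returned kind instead of nesting the checks.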
Usage of Uploaded Files:
from g4f.client import Client

# Enable debug mode
import g4f.debug
g4f.debug.logging = True

client = Client()

# Upload example file
files = {'files': ('demo.docx', open('demo.docx', 'rb'))}
bucket_id = upload_and_process(files)

# Send request with file:
response = client.chat.completions.create(
    [{"role": "user", "content": [
        {"type": "text", "text": "Describe this file."},
        {"bucket_id": bucket_id}
    ]}],
)
print(response.choices[0].message.content)
Example Output:
This document is a demonstration of the DOCX Input plugin capabilities in the software ...
Example Usage (JavaScript):
function uuid() {
    return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
        (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
    );
}

async function upload_files_or_urls(data) {
    let bucket_id = uuid(); // Use a randomly generated key for your bucket
    let formData = new FormData();
    if (typeof data === "object" && data.constructor === Array) { // URLs
        const blob = new Blob([JSON.stringify(data)], { type: 'application/json' });
        const file = new File([blob], 'downloads.json', { type: 'application/json' }); // Create File object
        formData.append('files', file); // Append as a file
    } else { // Files
        Array.from(data).forEach(file => {
            formData.append('files', file);
        });
    }
    await fetch("/backend-api/v2/files/" + bucket_id, {
        method: 'POST',
        body: formData
    });

    function connectToSSE(url) {
        const eventSource = new EventSource(url);
        eventSource.onmessage = (event) => {
            const data = JSON.parse(event.data);
            if (data.error) {
                console.error("Error:", data.error.message);
            } else if (data.action === "done") {
                console.log("Files loaded successfully. Bucket ID:", bucket_id);
                // Use bucket_id in your LLM prompt.
                const prompt = `Use files from bucket. ${JSON.stringify({"bucket_id": bucket_id})} to answer this: ...your question...`;
                // ... Send prompt to your language model ...
            } else {
                console.log("SSE Event:", data); // Update UI with progress as needed
            }
        };
        eventSource.onerror = (event) => {
            console.error("SSE Error:", event);
            eventSource.close();
        };
    }

    connectToSSE(`/backend-api/v2/files/${bucket_id}`); // Retrieve and refine
}

// Example with URLs
const urls = [{"url": "https://github.com/xtekky/gpt4free/issues"}];
upload_files_or_urls(urls);

// Example with files (using a file input element)
const fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', () => {
    upload_files_or_urls(fileInput.files);
});
Integrating with ChatCompletion:
To incorporate file uploads into your client applications, include the bucket in your chat completion requests, using inline content parts.
{
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Answer this question using the files in the specified bucket: ...your question..."},
                {"bucket_id": "your_actual_bucket_id"}
            ]
        }
    ]
}
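The request body above can be produced from Python with a small helper. This is only a sketch: `build_bucket_messages` is a hypothetical name, and the structure simply mirrors the JSON shown above.

```python
def build_bucket_messages(question: str, bucket_id: str) -> list:
    """Build a messages list that references a bucket (hypothetical helper)."""
    if not bucket_id:
        raise ValueError("bucket_id must be a non-empty string")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # Inline content part pointing at the uploaded bucket.
                {"bucket_id": bucket_id},
            ],
        }
    ]
```

The returned list can be passed as the messages argument of `client.chat.completions.create`, as in the "Usage of Uploaded Files" example.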
Important Considerations:
- Error Handling: Implement robust error handling in both Python and JavaScript to gracefully manage potential issues during file uploads, downloads, and API interactions.
- Dependencies: Ensure all required packages are installed (`pip install -U g4f[files]` for Python).
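One common way to approach the error-handling point is to retry transient failures with a growing delay. The wrapper below is a generic sketch and not something the G4F File API provides: `with_retries`, its parameters, and the doubling backoff are all illustrative assumptions.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
```

An upload could then be wrapped as `with_retries(lambda: upload_and_process(urls))`, ideally with `retry_on` narrowed to the network exceptions you actually expect.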
`markitdown` is a simple and lightweight alternative to the G4F File API for extracting plain or markdown-formatted text from uploaded files. While the G4F File API supports bucket-based multi-file workflows and streaming, `markitdown` is ideal for quick, direct conversion of individual files (e.g. `.pdf`, `.docx`, `.wav`).
- 🔄 Converts a wide range of files (PDF, DOCX, TXT, AUDIO, etc.) to markdown/plain text.
- 📤 Simple POST API: Send a file, receive extracted text.
- ⚡ Fast, no bucket, SSE, or URL fetch needed.
- 🎯 Ideal for use-cases where full document text is needed inline in chat prompts.
pip install markitdown[all]
import requests

def convert_with_markitdown(file_path):
    with open(file_path, 'rb') as file:
        response = requests.post('http://localhost:8080/api/markitdown', files={'file': file})
    if response.status_code == 200:
        data = response.json()
        return data['text']
    else:
        raise Exception(f"Conversion failed: {response.status_code} - {response.text}")

# Usage
text = convert_with_markitdown('example.pdf')
print(text)
<input type="file" id="fileInput" />
<script>
async function convertToMarkdown(file) {
    const formData = new FormData();
    formData.append('file', file);
    try {
        const response = await fetch('http://localhost:8080/api/markitdown', {
            method: 'POST',
            body: formData
        });
        if (!response.ok) throw new Error(`HTTP ${response.status}`);
        const data = await response.json();
        console.log("Converted Text:", data.text);
        // You can now inject data.text into a prompt or display in the UI.
    } catch (error) {
        console.error('Conversion failed:', error);
    }
}

document.getElementById('fileInput').addEventListener('change', async (e) => {
    const file = e.target.files[0];
    if (file) await convertToMarkdown(file);
});
</script>
Once you retrieve `text` from `markitdown`, you can insert it into your LLM prompt as inline content:
{
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "<text_from_markitdown>" },
                { "type": "text", "text": "Answer this question using the above content: ...your question..." }
            ]
        }
    ]
}
Example in Python:
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "text", "text": "Summarize this document."}
            ]
        }
    ]
)
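Because the extracted text is placed inline, long documents can make the prompt very large. The sketch below builds the same message structure with an optional character cap; `build_inline_messages` and its `max_chars` parameter are hypothetical, and simple truncation is just one possible policy.

```python
def build_inline_messages(document_text, question, max_chars=None):
    """Build a messages list with the document text inline (hypothetical helper)."""
    # Assumption: cutting the text at max_chars is acceptable for the use case.
    if max_chars is not None and len(document_text) > max_chars:
        document_text = document_text[:max_chars]
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": document_text},
                {"type": "text", "text": question},
            ],
        }
    ]
```

The result can be passed as `messages=` in the call shown above, for example `build_inline_messages(text, "Summarize this document.")`.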
Feature | G4F File API | MarkItDown API |
---|---|---|
Upload Files | ✅ Yes | ✅ Yes |
Web URL Downloads | ✅ Yes via `downloads.json` | ❌ No |
SSE Progress Streaming | ✅ Yes | ❌ No |
Markdown/Text Output | Raw/structured | Clean markdown/plain text |
Bucket/File Management | ✅ Multi-file | ❌ Single-file only |
Use Case | Multi-step pipelines, large workflows | Quick extraction, inline usage |
- Use G4F File API when:
  - You need to upload/download many files.
  - You want streamed SSE progress.
  - You're building a multi-step or large workflow.
- Use MarkItDown when:
  - You want quick markdown/plain text extraction from a single file.
  - You plan to inject the text directly into an LLM prompt.
  - You prefer a simple one-call API.