Automating Task Uploads with Python
For large-scale operations, uploading files manually via the Dashboard is inefficient. This tutorial shows how to use Python to upload thousands of files programmatically.
Prerequisites
- Python 3.8+
- `requests` library (`pip install requests`)
- AuditorIA Backend running at `http://localhost:8000` (a quick reachability check follows this list)
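Before kicking off a large batch, it is worth confirming the backend is actually reachable. The probe below is a minimal sketch: any HTTP response (even a 404) proves the server is up, while a `ConnectionError` means nothing is listening at that address.

```python
import requests

# Quick reachability probe before starting a large batch.
try:
    requests.get("http://localhost:8000", timeout=5)
    print("Backend is reachable.")
except requests.exceptions.ConnectionError:
    print("Backend is not reachable -- is it running?")
```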
Step 1: Authentication (Optional)
If your instance is protected by Keycloak, you first need to obtain an access token and send it with each request. (You can skip this step for local development if auth is disabled.)
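For reference, here is a minimal sketch of fetching a token from Keycloak's standard OpenID Connect token endpoint using the password grant. The Keycloak URL, realm name, client ID, and credentials below are placeholders, not values shipped with AuditorIA; substitute whatever your instance is configured with.

```python
import requests

# Placeholders: adjust host, realm ("auditoria"), client, and credentials
# to match your Keycloak configuration.
TOKEN_URL = "http://localhost:8080/realms/auditoria/protocol/openid-connect/token"

def get_access_token():
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "password",          # direct access grant
        "client_id": "auditoria-client",   # placeholder client ID
        "username": "uploader",            # placeholder user
        "password": "secret",              # placeholder password
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

# Attach the token to every upload request:
# headers = {"Authorization": f"Bearer {get_access_token()}"}
# requests.post(API_URL, files=files, data=data, headers=headers)
```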
Step 2: The Upload Script
We will iterate over a directory of MP3 files and upload them one by one.
```python
import os
import requests

API_URL = "http://localhost:8000/tasks"
AUDIO_DIR = "./recordings"

def upload_file(filepath):
    print(f"Uploading {filepath}...")
    with open(filepath, 'rb') as f:
        files = {'file': f}
        # Optional: Add metadata like 'speakers_count'
        data = {'speakers_count': 2}
        try:
            response = requests.post(API_URL, files=files, data=data)
            response.raise_for_status()
            print(f"Success: {response.json()['id']}")
        except Exception as e:
            print(f"Failed: {e}")

if __name__ == "__main__":
    for filename in os.listdir(AUDIO_DIR):
        if filename.endswith(".mp3"):
            upload_file(os.path.join(AUDIO_DIR, filename))
```

Step 3: Handling Concurrency
To speed this up significantly, use `ThreadPoolExecutor` to upload several files in parallel.
```python
from concurrent.futures import ThreadPoolExecutor

# ... (previous code)

if __name__ == "__main__":
    files = [os.path.join(AUDIO_DIR, f) for f in os.listdir(AUDIO_DIR) if f.endswith(".mp3")]

    # Upload 5 files in parallel
    with ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(upload_file, files)
```

⚠️ Be mindful of the `max_workers` setting. Setting it too high might overwhelm your GPU worker if the queue fills up too fast.
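If the backend does start pushing back under load, a small retry wrapper keeps the batch alive instead of silently dropping files. The sketch below makes two assumptions worth flagging: it presumes the server answers with HTTP 429 or 503 when overloaded (verify what your instance actually returns), and `upload_with_retry` is a hypothetical helper written for this tutorial, not part of AuditorIA.

```python
import time
import requests

API_URL = "http://localhost:8000/tasks"  # same endpoint as above

def upload_with_retry(filepath, retries=3, backoff=5):
    """Upload one file, retrying when the backend looks overloaded."""
    for attempt in range(1, retries + 1):
        with open(filepath, 'rb') as f:
            response = requests.post(API_URL, files={'file': f},
                                     data={'speakers_count': 2})
        # Assumption: 429/503 signal transient overload -- adjust as needed.
        if response.status_code in (429, 503):
            time.sleep(backoff * attempt)  # wait a little longer each attempt
            continue
        response.raise_for_status()
        return response.json()['id']
    raise RuntimeError(f"Gave up on {filepath} after {retries} attempts")
```

To combine this with the parallel version, pass `upload_with_retry` instead of `upload_file` to `executor.map`.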