
Automating Task Uploads with Python

For large-scale operations, uploading files manually via the Dashboard is inefficient. This tutorial shows how to use Python to upload thousands of files programmatically.

Prerequisites

  • Python 3.8+
  • requests library (pip install requests)
  • AuditorIA Backend running at http://localhost:8000

Step 1: Authentication (Optional)

If your instance is protected by Keycloak, you first need to obtain an access token. (This step can be skipped for local development if authentication is disabled.)
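A minimal sketch of fetching a token from Keycloak's password-grant endpoint. The token URL, realm name, and client ID below are illustrative assumptions; substitute the values from your own Keycloak configuration.

```python
import requests

# Assumed Keycloak settings -- adjust realm and client_id to your setup.
KEYCLOAK_TOKEN_URL = "http://localhost:8080/realms/auditoria/protocol/openid-connect/token"

def build_token_request(username, password):
    """Form-encoded payload for the OAuth2 password grant."""
    return {
        "grant_type": "password",
        "client_id": "auditoria-client",  # assumed client ID
        "username": username,
        "password": password,
    }

def get_access_token(username, password):
    resp = requests.post(KEYCLOAK_TOKEN_URL, data=build_token_request(username, password))
    resp.raise_for_status()
    return resp.json()["access_token"]
```

Once you have a token, pass it on every upload with `headers={"Authorization": f"Bearer {token}"}`.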

Step 2: The Upload Script

We will iterate over a directory of MP3 files and upload them one by one.

import os
import requests

API_URL = "http://localhost:8000/tasks"
AUDIO_DIR = "./recordings"

def upload_file(filepath):
    print(f"Uploading {filepath}...")
    with open(filepath, 'rb') as f:
        # Send the original filename and MIME type along with the bytes
        files = {'file': (os.path.basename(filepath), f, 'audio/mpeg')}
        # Optional: add metadata like 'speakers_count'
        data = {'speakers_count': 2}

        try:
            response = requests.post(API_URL, files=files, data=data, timeout=60)
            response.raise_for_status()
            print(f"Success: {response.json()['id']}")
        except requests.RequestException as e:
            print(f"Failed: {e}")

if __name__ == "__main__":
    for filename in os.listdir(AUDIO_DIR):
        if filename.endswith(".mp3"):
            upload_file(os.path.join(AUDIO_DIR, filename))

Step 3: Handling Concurrency

Uploads are I/O-bound, so running several in parallel with ThreadPoolExecutor speeds things up significantly.

from concurrent.futures import ThreadPoolExecutor
 
# ... (previous code)
 
if __name__ == "__main__":
    files = [os.path.join(AUDIO_DIR, f) for f in os.listdir(AUDIO_DIR) if f.endswith(".mp3")]
    
    # Upload 5 files in parallel
    with ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(upload_file, files)
⚠️ Be mindful of the max_workers setting. Setting it too high might overwhelm your GPU worker if the queue fills up too fast.
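One drawback of `executor.map` is that a single failed upload can go unnoticed. A minimal sketch of a wrapper that bounds concurrency and tallies per-file outcomes (`upload_all` and the `upload_fn` parameter are illustrative names, not part of the AuditorIA API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_all(files, upload_fn, max_workers=5):
    """Run upload_fn over files in parallel; return (succeeded, failed) counts."""
    ok = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its file path for error reporting.
        futures = {executor.submit(upload_fn, f): f for f in files}
        for future in as_completed(futures):
            try:
                future.result()  # re-raises any exception from the worker
                ok += 1
            except Exception as e:
                print(f"Failed: {futures[future]}: {e}")
                failed += 1
    return ok, failed
```

Keeping the counts lets you decide at the end of a batch whether a re-run is needed, instead of scrolling back through thousands of log lines.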