I recently needed to upload an entire directory, including subdirectories, while also preserving the local file paths.

Here's an example of something I might upload:


example
├── bar
│   └── a.txt
├── b.txt
├── c.txt
└── foo
    └── a.txt

In this case, what I need to receive on the back-end (in addition to the file data itself) would resemble the following.


example/bar/a.txt
example/foo/a.txt
example/b.txt
example/c.txt

Implementing the HTML Form for Directory Upload

There is nothing special about the Django form itself. You can start with something as simple as the below form class.


class UploadForm(forms.Form):
    file = forms.FileField()

When implementing the template, however, we'll define the upload input ourselves instead of allowing the form class to render it.


<form id="upload-form" method="post" enctype="multipart/form-data">
    <input type="file" name="files" id="directory-upload" webkitdirectory multiple required>
    <button type="submit">Submit</button>
</form>

The above form will allow the user to select directories. This is a great start, but there is a problem.

Although the upload will include subdirectories, only filenames will be included! In the first example, the files foo/a.txt and bar/a.txt will both be identified only as a.txt on the back-end.

This doesn't really matter if all you care about is the file data. In my case, however, I needed the paths too.

The webkitRelativePath attribute of the File object is what we need.

We can access the relative path of each file in JavaScript like so:


const input = document.getElementById('directory-upload');
const formData = new FormData();

Array.from(input.files).forEach(file => {
    const relativePath = file.webkitRelativePath;
    formData.append('files', file, relativePath);
});

fetch('/upload-endpoint', {
    method: 'POST',
    body: formData
})
.then(response => response.json())
.then(data => {
    console.log('Success:', data);
})
.catch((error) => {
    console.error('Error:', error);
});

In the above code, we send the relative path instead of only the filename. During my first attempt at solving this problem, I thought I was almost done at this point.

The problem, however, is that Django will strip the path and preserve only the filename.

The UploadedFile class calls os.path.basename on the received path. I also tried extending the UploadedFile class, and implementing my own FileUploadHandler, but in both cases it seems the path is already stripped before my code is called.

Using JSZip to Preserve Relative File Paths

My solution involves using JSZip to compress the selected directory into a zip archive that is then uploaded as a single file.


document.getElementById("upload-form").onsubmit = function (e) {
  const form = this;
  const files = document.getElementById("directory-upload").files;
  const zip = new JSZip();

  for (let i = 0; i < files.length; i++) {
    zip.file(files[i].webkitRelativePath || files[i].name, files[i]);
  }

  zip.generateAsync({ type: "blob" }).then(function (content) {
    const formData = new FormData();
    formData.append("file", content, "project.zip");

    const csrftoken = document.querySelector(
      "[name=csrfmiddlewaretoken]",
    ).value;

    fetch(form.action, {
      method: "POST",
      headers: { "X-CSRFToken": csrftoken },
      body: formData,
    })
      .then((response) => response.json())
      .then((data) => {
        console.log("Success!");
      })
      .catch((error) => {
        console.error("Error:", error);
      });
  });
};

Processing the Upload with Django

How you handle the files on the back-end largely depends on the problem you are solving, but will probably involve the zipfile module.

Here's a simplified example from my own code for handling the uploaded zip file:


    zip_bytes = read_from_s3(
        settings.AWS_STORAGE_BUCKET_NAME, upload_s3_key, decode=False
    )
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_zip_path = os.path.join(temp_dir, "project.zip")
        with open(temp_zip_path, "wb") as temp_zip_file:
            temp_zip_file.write(zip_bytes)

        with zipfile.ZipFile(temp_zip_path, "r") as zip_ref:
            zip_contents = zip_ref.namelist()

            # Iterate the relative paths contained within the zip file.
            for member in zip_contents:
                # Open each file within the zip archive for reading.
                with zip_ref.open(member) as source:
                    pass

Remember to use caution if writing the contents of the zip file onto your own filesystem. In particular, be sure to sanitize the paths and prevent any unwanted directory traversals. Otherwise, expanding the zip archive could end up overwriting files elsewhere.

I hope this short guide has been useful to you! I know I spent more time than I'd like on this problem, until I came up with the solution here.