Uploading and validating an image from a URL with Django

The full code is available on my GitHub page and has everything needed to see the example running live.

In this post I’m going to show how we can create a small app that allows the user to upload an image via a URL. The user is presented with a single form field in which they can enter an image URL.

Both the URL and the corresponding image (if present) will be validated. The image will then be downloaded to our application server and finally assigned to a Django model instance. The main steps in this process are as follows:

  • Show the user a form.
  • Check that the URL is valid (via Django’s Form class validation).
  • Check that the URL extension is image-like.
  • Check that the URL has a valid image mimetype.
  • Check that the image exists on the remote server.
  • Check that the image is within a certain size.
  • Check that the mimetype after opening the file is image-like.
  • Download the image from the server and assign it to an ImageField in a model.
  • Show the uploaded image.
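Conceptually, the steps above form a chain of checks ordered from cheapest (string checks on the URL) to most expensive (network requests and decoding the image), where the first failure stops the process. The sketch below shows that pattern in plain Python 3 with no Django involved; `run_checks`, `uses_http` and `has_image_extension` are hypothetical names for illustration, not part of the app's code.

```python
# A framework-free sketch of the validation chain: each step is a
# (check, error message) pair and we stop at the first check that fails.
# The check functions are hypothetical stand-ins, not the post's helpers.

def run_checks(value, checks):
    """Return (True, None) if every check passes, else (False, message)."""
    for check, message in checks:
        if not check(value):
            return False, message
    return True, None


def uses_http(url):
    return url.lower().startswith(("http://", "https://"))


def has_image_extension(url):
    return url.lower().endswith((".jpg", ".jpeg", ".png", ".gif"))


# Cheap string checks go first; network and decoding checks would follow.
URL_CHECKS = [
    (uses_http, "URL must start with http:// or https://"),
    (has_image_extension, "URL must end with an image extension"),
]

print(run_checks("http://example.com/cat.png", URL_CHECKS))  # (True, None)
print(run_checks("ftp://example.com/cat.png", URL_CHECKS))
```

Ordering matters mainly for cost: a malformed URL is rejected before any request is made.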

There are a few requirements we need to install before we get started:

# We will be installing Pillow, which needs some OS prerequisites. You can see
# the full installation guide here:
# http://pillow.readthedocs.org/en/latest/installation.html

# OSX (requires brew)
brew install libtiff libjpeg webp little-cms2

# Ubuntu
sudo apt-get install libtiff4-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.5-dev tk8.5-dev python-tk

pip install pillow requests python-magic

Now we’ll go through the various functions we are going to use to validate the image file before and after it is downloaded. Some of these have been taken from Stack Overflow answers, so I’ve added the source where relevant. Afterwards we will look at the model, form and view required to tie it all together. If you want to browse the code in full, it can be seen on my GitHub page.

Validating the URL’s extension

We first look at the URL that the user has supplied and make sure that it ends with a valid image-like extension: .jpg, .jpeg, .png etc. This has the obvious limitation that only URLs that end with a file-like tail such as foo.com/bar.jpeg will be accepted; a URL with a query string, such as foo.com/bar.jpeg?w=200, will be rejected.

VALID_IMAGE_EXTENSIONS = [
    ".jpg",
    ".jpeg",
    ".png",
    ".gif",
]

def valid_url_extension(url, extension_list=VALID_IMAGE_EXTENSIONS):
    # http://stackoverflow.com/a/10543969/396300
    return any(url.endswith(e) for e in extension_list)
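URLs such as foo.com/bar.jpeg?w=200 fail the check above only because of the query string. One way to relax that is to test just the path component of the URL and compare case-insensitively. This is a Python 3 sketch; `valid_url_path_extension` is a hypothetical name, not a function used elsewhere in the app.

```python
from urllib.parse import urlparse

VALID_IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def valid_url_path_extension(url, extensions=VALID_IMAGE_EXTENSIONS):
    # Look only at the path, so "?w=200" or "#top" no longer hide the
    # extension, and lowercase it so ".JPG" is accepted too.
    path = urlparse(url).path.lower()
    return path.endswith(tuple(extensions))


print(valid_url_path_extension("http://example.com/cat.JPG?w=200"))  # True
print(valid_url_path_extension("http://example.com/cat.jpg.html"))   # False
```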

Validating the URL mimetype

This step is slightly redundant, but it is a useful alternative to the file extension check above. We use Python’s mimetypes module to guess the type from the URL and make sure it is image-like.

import mimetypes

VALID_IMAGE_MIMETYPES = [
    "image"
]

def valid_url_mimetype(url, mimetype_list=VALID_IMAGE_MIMETYPES):
    # http://stackoverflow.com/a/10543969/396300
    mimetype, encoding = mimetypes.guess_type(url)
    if mimetype:
        return any([mimetype.startswith(m) for m in mimetype_list])
    else:
        return False
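It is worth remembering that mimetypes.guess_type() only looks at the extension in the string; it never contacts the server, which is why the post follows it up with checks on the downloaded bytes. A quick Python 3 check of the same logic (the loop and the example URLs are illustrative only):

```python
import mimetypes


def valid_url_mimetype(url, mimetype_list=("image",)):
    # guess_type() derives the type from the extension alone.
    mimetype, _encoding = mimetypes.guess_type(url)
    return bool(mimetype) and any(mimetype.startswith(m) for m in mimetype_list)


for url in ("http://example.com/cat.png",
            "http://example.com/index.html",
            "http://example.com/download"):
    # Prints the guessed type (or None) and whether it counts as an image.
    print(url, mimetypes.guess_type(url)[0], valid_url_mimetype(url))
```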

Validating that the image exists on the server

Next we check whether the resource at the URL supplied by the user actually exists. To do this we make a HEAD request instead of a GET request, which avoids downloading the file.

The following code uses httplib (renamed http.client in Python 3), but you could alternatively use urllib to make the request.

import httplib
import socket

def image_exists(domain, path):
    # http://stackoverflow.com/questions/2486145/python-check-if-url-to-jpg-exists
    try:
        # Plain HTTP only; the timeout stops a slow server from hanging the request
        conn = httplib.HTTPConnection(domain, timeout=10)
        conn.request('HEAD', path)
        response = conn.getresponse()
        conn.close()
    except (httplib.HTTPException, socket.error):
        return False
    return response.status == 200
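If you are on Python 3, httplib lives on as http.client. The sketch below is one way to write the same HEAD check there, under a few assumptions of mine: it takes the full URL rather than a (domain, path) pair, picks HTTPS or HTTP from the scheme (the httplib version above always speaks plain HTTP), and treats any network error as "does not exist".

```python
import http.client
from urllib.parse import urlparse


def image_exists(url, timeout=10):
    """Return True if a HEAD request for url is answered with 200 OK."""
    parts = urlparse(url)
    # Choose the connection class from the scheme so https URLs work too.
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        return False

    # Keep the query string, which split_url() in the post discards.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = None
    try:
        # Constructing the connection can itself fail (e.g. a non-numeric port).
        conn = conn_class(parts.netloc, timeout=timeout)
        conn.request("HEAD", path)
        # http.client does not follow redirects, so a 301/302 counts as missing.
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        if conn is not None:
            conn.close()

# Example (needs network access): image_exists("https://example.com/cat.png")
```

Not following redirects is a deliberate simplification; a server that moves images around would need a GET-based check or a redirect-following client instead.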

Validating the image file size remotely

While we are checking that the file exists on the remote server with a HEAD request, we can also try to check the size of the file before actually downloading it, which saves us from fetching a file that exceeds our limits. We do this by reading the Content-Length header, which gives the size of the image in bytes. Note that this is a best-effort check: the remote server is not guaranteed to return a Content-Length header, nor is its value guaranteed to be correct, so use it with caution.

import httplib
import socket

def image_exists(domain, path, check_size=False, size_limit=4*1024*1024):
    # size_limit is in bytes and is only enforced when check_size is True
    try:
        conn = httplib.HTTPConnection(domain, timeout=10)
        conn.request('HEAD', path)
        response = conn.getresponse()
        conn.close()
    except (httplib.HTTPException, socket.error):
        return False

    if check_size:
        # getheader() returns None when the header is missing
        try:
            length = int(response.getheader('content-length'))
        except (TypeError, ValueError):
            length = 0
        if length > size_limit:
            return False

    return response.status == 200
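The size decision can be pulled out into a small helper that works on the raw header value, which makes the "header missing or malformed" cases explicit. This is a Python 3 sketch; `remote_size_ok` is a hypothetical name, and the choice to let missing or malformed headers through (deferring to the local checks after download) is an assumption that matches the `length = 0` fallback above. With http.client, `response.getheader("Content-Length")` returns None when the header is absent, so its result can be passed straight in.

```python
DEFAULT_SIZE_LIMIT = 4 * 1024 * 1024  # bytes; an assumed limit, adjust to taste


def remote_size_ok(content_length, size_limit=DEFAULT_SIZE_LIMIT):
    """Decide from a Content-Length value (str or None) whether to download.

    Returns False only when the server clearly reports a file that is too big.
    """
    if content_length is None:
        return True  # header missing: defer to the checks after downloading
    try:
        length = int(content_length)
    except ValueError:
        return True  # malformed header: also defer
    return length <= size_limit


print(remote_size_ok("2048"))                  # True
print(remote_size_ok(str(10 * 1024 * 1024)))   # False
print(remote_size_ok(None))                    # True
```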

Fetching the image

There are numerous ways of downloading the file from the remote server. We want to grab the image from the URL without saving it to disk, because we are going to perform some validation on the image before writing it. For this reason, we keep the downloaded bytes in memory in a StringIO object, which behaves like a file.

The easiest approach for downloading the image is to use python-requests:

import requests
import StringIO

def retrieve_image(url):
    # The timeout stops a slow server from tying up the request indefinitely
    response = requests.get(url, timeout=10)
    return StringIO.StringIO(response.content)

but you could just as easily use urllib2:

import urllib2
import StringIO

def retrieve_image(url):
    return StringIO.StringIO(urllib2.urlopen(url, timeout=10).read())
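One caveat with both versions above is that they download the whole response before any size check runs, so a huge file is fetched in full. A sketch of one way around that in Python 3: read the response in chunks and give up as soon as a byte limit is passed. `read_limited` and its 4 MB default are hypothetical; any object with a `read(n)` method works, including the response returned by `urllib.request.urlopen`, and io.BytesIO is the Python 3 replacement for StringIO when holding binary data.

```python
import io


def read_limited(stream, max_bytes=4 * 1024 * 1024, chunk_size=64 * 1024):
    """Copy a file-like stream into a BytesIO, raising ValueError past max_bytes."""
    buffer = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("remote file is larger than %d bytes" % max_bytes)
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the next reader starts at the beginning
    return buffer


# Usage with a real download (not run here):
#     import urllib.request
#     with urllib.request.urlopen(url, timeout=10) as response:
#         fobject = read_limited(response)
```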

Validating the file mimetype

Once we have the image downloaded we can do a more detailed check for the mimetype. Here we are using the python-magic library to look inside the downloaded file.

import magic

VALID_IMAGE_MIMETYPES = [
    "image"
]

def get_mimetype(fobject):
    mime = magic.Magic(mime=True)
    mimetype = mime.from_buffer(fobject.read(1024))
    fobject.seek(0)
    return mimetype

def valid_image_mimetype(fobject, mimetype_list=VALID_IMAGE_MIMETYPES):
    # http://stackoverflow.com/q/20272579/396300
    mimetype = get_mimetype(fobject)
    if mimetype:
        return any(mimetype.startswith(m) for m in mimetype_list)
    else:
        return False
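If installing python-magic (and the libmagic library it wraps) is not an option, a rougher check is to compare the first few bytes of the download against the known signatures ("magic numbers") of the formats we accept. This is a much narrower stand-in for python-magic, which recognises far more formats; the signature table below only covers JPEG, PNG and GIF, and `sniff_image_mimetype` is a hypothetical name.

```python
import io

# Leading bytes ("magic numbers") of the formats accepted earlier in the post.
IMAGE_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_mimetype(fobject):
    """Return an image mimetype based on the file signature, or None."""
    header = fobject.read(16)
    fobject.seek(0)  # rewind so PIL can read the file from the start
    for signature, mimetype in IMAGE_SIGNATURES:
        if header.startswith(signature):
            return mimetype
    return None


print(sniff_image_mimetype(io.BytesIO(b"GIF89a" + b"\x00" * 10)))  # image/gif
print(sniff_image_mimetype(io.BytesIO(b"<html><body>")))           # None
```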

Validating the image dimensions locally

Now that we have the file in memory, we can check that the image dimensions are definitely within our limits. We may already have checked the file’s byte size remotely using the Content-Length header, but that header is not guaranteed to exist or be correct, and a small file can still decode to a very large image, so this is a useful secondary check. Note that it limits the number of pixels (width * height), not the number of bytes.

The following function takes a PIL image as a parameter. The conversion from StringIO to a PIL image takes place in our class-based view, which we will look at shortly.

MAX_SIZE = 4*1024*1024  # maximum number of pixels (width * height)

def valid_image_size(image, max_size=MAX_SIZE):
    width, height = image.size
    if (width * height) > max_size:
        return (False, "Image is too large")
    return (True, image)
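To make the arithmetic concrete: MAX_SIZE is 4 * 1024 * 1024 = 4,194,304 pixels, so a 2048 x 2048 image (exactly 4,194,304 pixels) passes, while 2049 x 2048 (4,196,352 pixels) is rejected. If you also want a local limit on bytes rather than pixels, the in-memory buffer can be measured directly. Both helpers below are hypothetical Python 3 sketches, not part of the post’s utils, and the byte limit is an assumed value.

```python
import io

MAX_PIXELS = 4 * 1024 * 1024  # same value as MAX_SIZE above
MAX_BYTES = 4 * 1024 * 1024   # hypothetical byte limit, chosen to match


def valid_pixel_count(width, height, max_pixels=MAX_PIXELS):
    return width * height <= max_pixels


def buffer_size(fobject):
    """Size in bytes of a seekable file-like object; leaves it rewound."""
    fobject.seek(0, io.SEEK_END)
    size = fobject.tell()
    fobject.seek(0)
    return size


def valid_byte_size(fobject, max_bytes=MAX_BYTES):
    return buffer_size(fobject) <= max_bytes


print(valid_pixel_count(2048, 2048))  # True
print(valid_pixel_count(2049, 2048))  # False
```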

Other Utility Functions

The following are some other utility functions that we will make use of in our form and view.

import os
import StringIO
from urlparse import urlparse
from django.core.files.base import ContentFile

def split_url(url):
    parse_object = urlparse(url)
    return parse_object.netloc, parse_object.path

def get_url_tail(url):
    return url.split('/')[-1]  

def get_extension(filename):
    return os.path.splitext(filename)[1]

def pil_to_django(image, format="JPEG"):
    # http://stackoverflow.com/questions/3723220/how-do-you-convert-a-pil-image-to-a-django-file
    fobject = StringIO.StringIO()
    image.save(fobject, format=format)
    return ContentFile(fobject.getvalue())
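In Python 3 the urlparse module became urllib.parse, and StringIO should be replaced by io.BytesIO for binary data such as images. The URL helpers translate directly; the sketch below (Python 3, with pil_to_django left out since it needs Pillow and Django) also shows that split_url drops the query string, so only the path reaches get_url_tail. Using posixpath rather than os.path in get_extension is my own choice, since URL paths always use forward slashes.

```python
import posixpath
from urllib.parse import urlparse


def split_url(url):
    parse_object = urlparse(url)
    return parse_object.netloc, parse_object.path


def get_url_tail(url):
    return url.split("/")[-1]


def get_extension(filename):
    # posixpath: URL paths always use "/" regardless of the server's OS
    return posixpath.splitext(filename)[1]


domain, path = split_url("https://example.com/photos/cat.JPG?size=large")
print(domain, path, get_url_tail(path), get_extension(get_url_tail(path)))
# example.com /photos/cat.JPG cat.JPG .JPG
```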

The Model

Our Django model is very basic: we just use a models.ImageField to hold a reference to our downloaded image. Again, we will need to do some conversions in our view to get from our StringIO to a Django file. We also dynamically modify the filename of our downloaded image file, adding a date and time value to make filename collisions unlikely. This is really down to your own requirements.

from django.db import models
import os, datetime
from django.utils.text import slugify

UPLOAD_PATH = "image_uploader"  # no trailing slash: generate_upload_path adds the "/"

class UploadedImage(models.Model):
    def generate_upload_path(self, filename):
        filename, ext = os.path.splitext(filename.lower())
        filename = "%s.%s%s" % (slugify(filename),datetime.datetime.now().strftime("%Y-%m-%d.%H-%M-%S"), ext)
        return '%s/%s' % (UPLOAD_PATH, filename)

    image = models.ImageField(blank=True, null=True, upload_to=generate_upload_path)
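The naming scheme in generate_upload_path is easiest to see with a fixed timestamp. The sketch below reproduces it outside Django in Python 3; `simple_slugify` is a simplified, hypothetical stand-in for django.utils.text.slugify (ASCII-only, no Unicode normalisation), and the `now` parameter is an addition so the output is predictable. posixpath.join inserts exactly one separator, whether or not the upload prefix ends in a slash.

```python
import datetime
import posixpath
import re

UPLOAD_PATH = "image_uploader"


def simple_slugify(value):
    # Hypothetical, simplified stand-in for django.utils.text.slugify.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def generate_upload_path(filename, now=None):
    now = now or datetime.datetime.now()
    name, ext = posixpath.splitext(filename.lower())
    stamped = "%s.%s%s" % (simple_slugify(name), now.strftime("%Y-%m-%d.%H-%M-%S"), ext)
    return posixpath.join(UPLOAD_PATH, stamped)


print(generate_upload_path("My Cat.JPG", now=datetime.datetime(2015, 3, 1, 12, 30, 5)))
# image_uploader/my-cat.2015-03-01.12-30-05.jpg
```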

The Form

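In validating the form, we want to make sure that the submitted URL is well-formed, which is handled by the underlying Django URLField, and that it looks like an image URL, which our clean_url method checks using the extension and mimetype helpers from above. Note that we don’t do any checks on the remote file yet; that happens at the view level.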

from .utils import *

from django.utils.translation import ugettext as _
from django import forms


class UploadURLForm(forms.Form):
    url = forms.URLField(required=True,
        error_messages={
            "required": "Please enter a valid URL to an image (.jpg .jpeg .png)"
        },
    )

    def clean_url(self):
        url = self.cleaned_data['url']
        # Lowercase only for the checks; the returned URL keeps its case
        # because paths on many servers are case-sensitive
        lowered = url.lower()
        if not valid_url_extension(lowered) or not valid_url_mimetype(lowered):
            raise forms.ValidationError(_("Not a valid image. The URL must have an image extension (.jpg/.jpeg/.png/.gif)"))
        return url

The Views

We add our custom validation to the form_valid method of our view. Because we have overridden the form’s clean_url method, we can be sure that by the time form_valid is called we have a valid URL that at least looks like it points to an image.

We then use our validation functions from above. We check that the image exists on the server, make sure it is a valid image by checking its mimetype, and make sure it is within our dimension limits. We then save it to our model’s ImageField.

Finally, we have a very simple second view for showing the image once it has been downloaded:


from PIL import Image
from django.views.generic.edit import FormView
from django.views.generic import DetailView
from django.utils.translation import ugettext as _
from django.core.urlresolvers import reverse

from .forms import UploadURLForm
from .utils import *
from .models import UploadedImage


class UploadURLView(FormView):
    form_class = UploadURLForm
    template_name = "image_uploader/upload.html"

    def get_success_url(self):
        return reverse("upload-detail", args=[self.uploaded_image.pk, ])

    def form_valid(self, form):
        def _invalidate(msg):
            form.errors['url'] = [msg, ]
            return super(UploadURLView, self).form_invalid(form)

        url = form.data['url']
        domain, path = split_url(url)
        filename = get_url_tail(path)

        if not image_exists(domain, path):
            return _invalidate(_("Couldn't retrieve image. (There was an error reaching the server)"))

        fobject = retrieve_image(url)
        if not valid_image_mimetype(fobject):
            return _invalidate(_("Downloaded file was not a valid image"))

        pil_image = Image.open(fobject)
        if not valid_image_size(pil_image)[0]:
            return _invalidate(_("Image is too large (> 4 megapixels)"))

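        # Keep the original format (e.g. PNG transparency); fall back to JPEG
        django_file = pil_to_django(pil_image, format=pil_image.format or "JPEG")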
        self.uploaded_image = UploadedImage()
        self.uploaded_image.image.save(filename, django_file)
        self.uploaded_image.save()

        return super(UploadURLView, self).form_valid(form)

class UploadDetailView(DetailView):
    model = UploadedImage
    context_object_name = "image"
    template_name = "image_uploader/detail.html"

This is a great start in offering the user a dialog for uploading images from various sources. It could be integrated into a popup dialog that allows users to upload images from Google image searches, traditional uploads and Ajax multi-uploads as well as URLs. Examples of these may even come in later blog posts.