Validating IP Addresses

UPDATE: Added terminating ‘$’ in ipv4 regex as noted in comment from raorn.

I’ve been working on a fix to a system script that passes around and manipulates IP addresses. With IPv6 becoming more prevalent, this script must work with IPv6 addresses, not just v4. While working on this and digging around the web I ran across some stuff that I think is worth sharing.

The first thing I always do when I’m working with a new data format is write a script or function that can be used to validate it. Here’s what I came up with for IPv4 and IPv6.

IPv4 Regex

With IPv4 this is pretty boring and can be done with a one-line regular expression (regex) that’s all over the web. I clean things up a bit by using shell variables, but the regex should be clear:

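#!/bin/sh
# One dotted-quad field: 0-255, leading zeros tolerated.
QUAD="(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"

# Succeeds (exit status 0) if $1 is a valid dotted-quad IPv4 address.
is_ipv4 () {
    # -E enables the extended regex syntax (groups, alternation, intervals);
    # \. matches a literal dot instead of any character.
    printf '%s\n' "$1" | grep -Eq "^${QUAD}(\.${QUAD}){3}\$"
}

if is_ipv4 "$1"; then
    exit 0
else
    echo "Invalid IPv4 address." >&2
    exit 1
fi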

Nothing earth shattering.
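A quick way to sanity-check the IPv4 pattern is to run it over a handful of good and bad inputs. This is only a minimal sketch: it assumes a grep that supports the POSIX -E and -q options, it escapes the dot so that only a literal '.' separates the fields, and the sample addresses are ones I made up for illustration.

```shell
#!/bin/sh
# One dotted-quad field: 0-255, leading zeros tolerated.
QUAD="(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"

# Succeeds (exit status 0) if $1 looks like a dotted-quad IPv4 address.
is_ipv4 () {
    printf '%s\n' "$1" | grep -Eq "^${QUAD}(\.${QUAD}){3}\$"
}

# Illustrative samples only: two valid addresses, then an out-of-range
# field, too few fields, too many fields, and a string with no dots at all.
for addr in 192.168.0.1 255.255.255.255 256.1.1.1 1.2.3 1.2.3.4.5 1a2b3c4; do
    if is_ipv4 "$addr"; then
        echo "$addr: valid"
    else
        echo "$addr: invalid"
    fi
done
```

The last sample is the interesting one: with an unescaped '.', which matches any character, 1a2b3c4 would be accepted.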

IPv6 Regex

Working with IPv6 addresses is a bit more complex. To compensate for the larger address size when representing IPv6 addresses in text, RFC 4291 defines a textual representation with rules that allow for compression (called “zero folding”). Addresses represented in this compressed format are more difficult to validate with just one regex, and the regex is much longer:

#!/bin/sh
WORD="[0-9A-Fa-f]{1,4}"
# flat address, no compressed words
FLAT="^${WORD}(:${WORD}){7}$"
# ::'s compressions excluding beginning and end edge cases
COMP2="^(${WORD}:){1,1}(:${WORD}){1,6}$"
COMP3="^(${WORD}:){1,2}(:${WORD}){1,5}$"
COMP4="^(${WORD}:){1,3}(:${WORD}){1,4}$"
COMP5="^(${WORD}:){1,4}(:${WORD}){1,3}$"
COMP6="^(${WORD}:){1,5}(:${WORD}){1,2}$"
COMP7="^(${WORD}:){1,6}(:${WORD}){1,1}$"
# trailing :: edge case, includes case of only :: (all 0's)
EDGE_TAIL="^((${WORD}:){1,7}|:):$"
# leading :: edge case
EDGE_LEAD="^:(:${WORD}){1,7}$"
# Succeeds (exit status 0) if $1 is a valid IPv6 address, flat or compressed.
is_ipv6 () {
    # -E is required: the patterns rely on extended regex groups, alternation, and intervals.
    printf '%s\n' "$1" | grep -Eq "(${FLAT})|(${COMP2})|(${COMP3})|(${COMP4})|(${COMP5})|(${COMP6})|(${COMP7})|(${EDGE_TAIL})|(${EDGE_LEAD})"
}

if is_ipv6 "$1"; then
    exit 0
else
    echo "Invalid IPv6 address: $1" >&2
    exit 1
fi

Folks on the web have gotten this right too, and I definitely took a cue from Vernon Mauery. I got a bit caught up in the differences between the addresses allowed by RFC 4291 and the recommendations in RFC 5952. The former allows zero folding of a single 16-bit 0 field, while the latter says “::” must not be used to shorten just one such field. As the “robustness principle” dictates, this validation script will identify addresses with a zero-folded single 16-bit 0 field as valid, but tools producing addresses should not emit them.
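To see the “accept liberally” behavior in practice, the IPv6 patterns can be run over a few samples, including one that folds a single 0 field. This is a rough sketch under the same assumptions as before (a grep with the POSIX -E and -q options); the sample addresses are my own picks, built on the 2001:db8::/32 documentation prefix.

```shell
#!/bin/sh
WORD="[0-9A-Fa-f]{1,4}"
# flat address, no compressed words
FLAT="^${WORD}(:${WORD}){7}$"
# ::'s compressions excluding beginning and end edge cases
COMP2="^(${WORD}:){1,1}(:${WORD}){1,6}$"
COMP3="^(${WORD}:){1,2}(:${WORD}){1,5}$"
COMP4="^(${WORD}:){1,3}(:${WORD}){1,4}$"
COMP5="^(${WORD}:){1,4}(:${WORD}){1,3}$"
COMP6="^(${WORD}:){1,5}(:${WORD}){1,2}$"
COMP7="^(${WORD}:){1,6}(:${WORD}){1,1}$"
# trailing :: edge case, includes case of only :: (all 0's)
EDGE_TAIL="^((${WORD}:){1,7}|:):$"
# leading :: edge case
EDGE_LEAD="^:(:${WORD}){1,7}$"

# Succeeds (exit status 0) if $1 matches any of the patterns above.
is_ipv6 () {
    printf '%s\n' "$1" | grep -Eq "(${FLAT})|(${COMP2})|(${COMP3})|(${COMP4})|(${COMP5})|(${COMP6})|(${COMP7})|(${EDGE_TAIL})|(${EDGE_LEAD})"
}

# Illustrative samples only. 2001:db8::1:1:1:1:1 folds a single 0 field:
# allowed by RFC 4291, not something a producer should emit under RFC 5952.
for addr in 2001:db8:0:0:0:0:0:1 2001:db8::1 :: ::1 fe80:: \
            2001:db8::1:1:1:1:1 1::2::3 1:2:3:4:5:6:7:8:9 12345::1; do
    if is_ipv6 "$addr"; then
        echo "$addr: valid"
    else
        echo "$addr: invalid"
    fi
done
```

The last three samples should be rejected: a second “::”, nine fields, and a five-digit field each fall outside every pattern.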

I haven’t taken on any of the weirdness that is mixed hexadecimal and dotted-decimal notation … that will remain an exercise for the interested reader.
