UPDATE: Added terminating ‘$’ in ipv4 regex as noted in comment from raorn.
I’ve been working on a fix to a system script that passes around and manipulates IP addresses. With IPv6 becoming more prevalent, this script must work with IPv6 addresses, not just IPv4. While working on this and digging around the web, I ran across some things that I think are worth sharing.
The first thing I always do when I’m working with a new data format is write a script or function that can be used to validate it. Here’s what I came up with for IPv4 and IPv6.
IPv4 Regex
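With IPv4 this is pretty boring and can be done with a one-line regular expression (regex) that’s all over the web. I clean things up a bit by using shell variables, but the regex should be clear. Note that grep needs -E (extended regex) for the grouping, alternation and repetition to work, and the dot between octets has to be escaped:
#!/bin/sh
# One decimal octet, 0-255 (leading zeros are accepted).
QUAD="25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9]"

# Returns 0 if $1 is a dotted-quad IPv4 address, non-zero otherwise.
is_ipv4 () {
    printf '%s\n' "$1" | grep -Eq "^(${QUAD})(\.(${QUAD})){3}$"
}

if is_ipv4 "$1"; then
    exit 0
else
    echo "Invalid IPv4 address." >&2
    exit 1
fi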
Nothing earth shattering.
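A quick way to sanity-check a pattern like this is to run it over a handful of sample addresses. The following is only a sketch: it assumes a POSIX grep that supports -E and -q, it redefines the function so that it returns 0 on a match, and the sample addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch: exercise the IPv4 pattern against a few sample addresses.
# Assumes a POSIX grep with -E (extended regex) and -q (quiet).
QUAD="25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9]"

# Returns 0 (success) when $1 looks like a dotted-quad IPv4 address.
is_ipv4 () {
    printf '%s\n' "$1" | grep -Eq "^(${QUAD})(\.(${QUAD})){3}$"
}

# Illustrative inputs: two good addresses, then an out-of-range octet,
# too few octets, and too many octets.
for addr in 192.168.0.1 255.255.255.255 169.252.12.257 1.2.3 1.2.3.4.5; do
    if is_ipv4 "$addr"; then
        echo "valid:   $addr"
    else
        echo "invalid: $addr"
    fi
done
```

Without the trailing `$` anchor, the out-of-range 169.252.12.257 would slip through because the pattern would be satisfied by the prefix 169.252.12.25.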
IPv6 Regex
Working with IPv6 addresses is a bit more complex. To compensate for the larger address size when representing IPv6 addresses in text, the RFCs define a textual representation with rules that allow runs of zero fields to be compressed to “::” (called “zero folding” here). Addresses in this compressed format are more difficult to validate with a single simple pattern, and the regex is much longer. As with IPv4, grep needs -E for these patterns to work:
#!/bin/sh
# One 16-bit field: 1 to 4 hex digits.
WORD="[0-9A-Fa-f]{1,4}"
# flat address, no compressed words
FLAT="^${WORD}(:${WORD}){7}$"
# ::'s compressions excluding beginning and end edge cases
COMP2="^(${WORD}:){1,1}(:${WORD}){1,6}$"
COMP3="^(${WORD}:){1,2}(:${WORD}){1,5}$"
COMP4="^(${WORD}:){1,3}(:${WORD}){1,4}$"
COMP5="^(${WORD}:){1,4}(:${WORD}){1,3}$"
COMP6="^(${WORD}:){1,5}(:${WORD}){1,2}$"
COMP7="^(${WORD}:){1,6}(:${WORD}){1,1}$"
# trailing :: edge case, includes case of only :: (all 0's)
EDGE_TAIL="^((${WORD}:){1,7}|:):$"
# leading :: edge case
EDGE_LEAD="^:(:${WORD}){1,7}$"

# Returns 0 if $1 matches any of the patterns above, non-zero otherwise.
is_ipv6 () {
    printf '%s\n' "$1" | grep -Eq "(${FLAT})|(${COMP2})|(${COMP3})|(${COMP4})|(${COMP5})|(${COMP6})|(${COMP7})|(${EDGE_TAIL})|(${EDGE_LEAD})"
}

if is_ipv6 "$1"; then
    exit 0
else
    echo "Invalid IPv6 address: $1" >&2
    exit 1
fi
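To see how the individual patterns divide up the work, it helps to run a few sample addresses through the combined expression. This is a sketch under the same assumptions as before (POSIX grep with -E and -q, a function that returns 0 on a match); the `IPV6` variable and the sample addresses are introduced here purely for illustration.

```shell
#!/bin/sh
# Sketch: exercise the IPv6 patterns against a few sample addresses.
WORD="[0-9A-Fa-f]{1,4}"
FLAT="^${WORD}(:${WORD}){7}$"
COMP2="^(${WORD}:){1,1}(:${WORD}){1,6}$"
COMP3="^(${WORD}:){1,2}(:${WORD}){1,5}$"
COMP4="^(${WORD}:){1,3}(:${WORD}){1,4}$"
COMP5="^(${WORD}:){1,4}(:${WORD}){1,3}$"
COMP6="^(${WORD}:){1,5}(:${WORD}){1,2}$"
COMP7="^(${WORD}:){1,6}(:${WORD}){1,1}$"
EDGE_TAIL="^((${WORD}:){1,7}|:):$"
EDGE_LEAD="^:(:${WORD}){1,7}$"
# Hypothetical helper variable: all patterns joined into one alternation.
IPV6="(${FLAT})|(${COMP2})|(${COMP3})|(${COMP4})|(${COMP5})|(${COMP6})|(${COMP7})|(${EDGE_TAIL})|(${EDGE_LEAD})"

# Returns 0 (success) when $1 matches one of the patterns.
is_ipv6 () {
    printf '%s\n' "$1" | grep -Eq "$IPV6"
}

# Illustrative inputs: the edge cases, a mid-address "::", a flat address,
# then a double "::", an over-long field, and nine fields.
for addr in "::" "::1" "fe80::" "2001:db8::8a2e:370:7334" \
            "2001:0db8:0000:0000:0000:ff00:0042:8329" \
            "1::2::3" "2001:db8::12345" "1:2:3:4:5:6:7:8:9"; do
    if is_ipv6 "$addr"; then
        echo "valid:   $addr"
    else
        echo "invalid: $addr"
    fi
done
```

The first five addresses hit EDGE_TAIL, EDGE_LEAD, EDGE_TAIL, COMP3 and FLAT respectively; the last three match nothing.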
Folks on the web have got it right too, and I definitely took a cue from Vernon Mauery. I got a bit caught up in the differences between the addresses allowed by RFC 4291 and the recommendations in RFC 5952. The former allows zero folding of a single 16-bit 0 field, while the latter discourages it. As the “robustness principle” dictates, this validation script accepts addresses with a zero-folded single 16-bit 0 field as valid, but tools producing addresses should not emit them.
I haven’t taken on the weirdness that is mixed hexadecimal and dotted-decimal notation; that will remain an exercise for the interested reader.
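To make the single-field case concrete, here is a minimal sketch of a check that a tool producing addresses could run before emitting one. The function name `folds_single_field` is hypothetical and not part of the scripts above. It assumes its argument has already passed IPv6 validation and contains no dotted-decimal part, so a “::” next to seven explicit words can only stand for one 16-bit field.

```shell
#!/bin/sh
# Sketch: detect a "::" that replaces exactly one 16-bit field.
# Assumes $1 is already known to be a valid IPv6 address without an
# embedded IPv4 part. The function name is made up for this example.
folds_single_field () {
    # No "::" at all means nothing was folded.
    case $1 in
        *::*) ;;
        *) return 1 ;;
    esac
    # Put each colon-separated piece on its own line and count the
    # lines that contain a hex digit, i.e. the explicit words.
    words=$(printf '%s\n' "$1" | tr ':' '\n' | grep -c '[0-9A-Fa-f]')
    # Eight fields total: seven explicit words leave one for "::".
    [ "$words" -eq 7 ]
}

for addr in "2001:db8:0:1:1:1:1:1" "2001:db8::1:1:1:1:1" "2001:db8::1"; do
    if folds_single_field "$addr"; then
        echo "single field folded: $addr"
    else
        echo "ok:                  $addr"
    fi
done
```

Only the second address is flagged: it is accepted by the validator but is a form that RFC 5952 asks producers to avoid.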
IPv4 Regex…
169.252.12.257 is also showing as a valid IP.
EOL anchor ($) is missing at the end of regexp.
echo $1 | grep --silent "^(${QUAD})(.(${QUAD})){3}$"
Great catch. Updated & Thanks!
Super Thanks, you helped me a lot 🙂
Thank you for the script, I use it in a project of mine. I modified the code to combine both of the scripts together and didn’t touch the rest of the code, so if someone is interested, it can be found here: https://git.hackerspace.org.il/itzhak/Cert_View/blob/master/module.certview.ip.sh