Extracting table schema from database

As part of rehabilitating an old project I’ve had to go through a fairly sizable MySQL database and extract the table structure. I’m doing this for knowledge capture. Currently the schema is only known to the internals of the software that operates on the database, there’s no documentation to speak of (oh what I’d do to get an ER diagram and some notes from the original developers).

In the scheme of things this is a pretty small database (well two of them actually), with only a few dozen tables. Doing the extraction by hand wouldn’t be overly time consuming but it would be overly mind numbing. And as the adage goes: “anything worth doing once is worth scripting”, so I wrote up a quick shell script that will dump the CREATE TABLE statements for each table in a set of databases.

#!/bin/sh
#  extract table schema from some set of databases
#
DB_USER=you
DB_PASSWD=yourpasswd
PARAMS="--skip-column-names -u ${DB_USER} -p${DB_PASSWD}"
DB_SELECTOR='^(db0|db1|db2)$'
TABLE_SELECTOR='.*'

echo "SHOW DATABASES;" 
    | mysql ${PARAMS} 
    | grep ${DB_SELECTOR} 
    | while read DB; do
    echo "CONNECT ${DB}; SHOW TABLES;" 
        | grep "${TABLE_SELECTOR}" 
        | mysql ${PARAMS} 
        | while read TABLE; do
        echo "CONNECT ${DB}; SHOW CREATE TABLE ${TABLE}" 
            | mysql ${PARAMS} 
            | sed -e  "s/\\n/\n/g;s/^${TABLE}[t]//;s/AUTO_INCREMENT=[0-9]*//
g" > ${DB}-${TABLE}.sql;
    done;
done;

I’m a big fan of piping things into while read VAR statements. I’m sure some hate it since it’s a bit hard to read / see where the loop begins but it seems elegant to me and it’s very useful.

The script is basically three sql commands:

  • The first gets the names of all databases the user has access to. This could feasibly be a large number of databases so we add a “selector” to give us a regex we can use to narrow this set down.
  • The second SQL command gets the tables from the selected database. This is within a while loop so it is executed once for each database. This set of tables is also filtered through another regex.
  • Finally in the third SQL command we dump the create table statement for each table (within yet another while loop).

This gets us the CREATE TABLE statements but MySQL puts some stuff in here we don’t want:

  • It prefixes each CREATE TABLE statement with the table name for some reason
  • It also includes the AUTO_INCREMENT value which we don’t care about either
  • It also doesn’t put new line characters in the output but does put the string “n”.

We pipe this output through a pretty gnarly looking sed find/replace to clean it up. The output is then redirected to a file with a name formatted to contain both the name of database and the name of the table. That’s all there is to it. I’ve put up dummy values for some of the selectors as an example.

The script can be downloaded here.

NOTE: The grep and sed regexs can be pretty confusing because of some bash double quoting and escaping. Knowing the rules for bash escaping is essential … and not covered here 🙂

Cheers!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s