a field guide to sql database audits
how to inventory what’s inside your legacy system’s database before you decide whether to rebuild, extend, or resurrect it.
If your old vendor is gone and your custom software is still running, the most valuable thing in your company is almost certainly your SQL database. This piece is a step-by-step guide to figuring out what’s in it, what’s worth keeping, and how to use that knowledge to make the next decision.
It’s written for the person inside the company most likely to care — the project manager or operations lead, usually one generation younger than the owner, comfortable with a spreadsheet, possibly a little intimidated by the word “SQL” itself. You do not need to be a developer to do this. You do need to take a few afternoons and be methodical.
step 1 — locate the database
Find the actual machine it runs on. Possibilities, in descending order of frequency:
- A server in a closet at the office. Usually a box with a sticker that says “DO NOT TURN OFF.”
- A virtual machine at a hosting provider. Ask the old developer or IT contractor which one; check for monthly invoices if nobody knows.
- A cloud database (AWS RDS, Azure SQL, Google Cloud SQL). Look for credit-card charges with those names.
- On the same machine that runs the application, often under SQL Server Express, MySQL, or PostgreSQL.
Write down the hostname, the database engine (SQL Server? MySQL? Postgres?), the version, and who has login credentials.
If the answer to “who has credentials” is one person, and that person is not on your payroll, that is the highest-priority problem to fix this week.
step 2 — back it up before touching anything
Before anyone runs a single query, produce a full database backup and copy it to a second location. This is non-negotiable and it’s why we’re calling it step 2 instead of step 4.
Specifics depend on the engine (pg_dump, mysqldump, SQL Server BACKUP DATABASE), but the deliverable is a single file (or compressed archive) that a competent person could use to restore the entire database to a fresh server. Put it on a drive that isn’t the same machine as the database. Put another copy somewhere offline.
You want this for two reasons: one, you’re about to read from the database and you want to be sure nothing you do accidentally changes anything. Two, you now have the single artifact that any future developer or vendor — including us — would need to quote a resurrection project.
step 3 — list the tables
Every SQL database has a way to list its own tables. Here’s the query in the three engines you’re most likely to encounter:
- PostgreSQL:
SELECT table_name FROM information_schema.tables WHERE table_schema='public'; - MySQL:
SHOW TABLES; - SQL Server:
SELECT name FROM sys.tables;
Copy the list into a spreadsheet. One row per table. You’re about to annotate this.
step 4 — count rows and date-range each table
For each table in your list, get three numbers:
- Row count (
SELECT COUNT(*) FROM tablename;). - Earliest record (whatever the created-date column is — it’s usually
created_at,date_created, orinsert_date). - Latest record (same column,
MAXinstead ofMIN).
This tells you two things at a glance. First, which tables are alive (recent dates, growing row counts) vs. which are dead (no rows added in three years). Second, which tables have real data at scale (tens of thousands or millions of rows) vs. lookup tables (50 rows of state abbreviations).
A typical small-business database has 50-200 tables. Usually fewer than 30 of them are load-bearing. The rest are either long-abandoned features, lookup data, or audit logs.
step 5 — find the business entities
Identify the five to fifteen tables that correspond to the things your business actually talks about: customers, jobs, invoices, employees, vendors, time cards. These are the core entities, and they’re the ones that matter for any future project.
The tell is a combination of: high row count, growing recently, and a name that matches what your team calls things. tbl_customer, Customers, or client_master is obvious. So is invoice_header or jobs.
Mark these rows in your spreadsheet. Everything else is either supporting data (line items, attachments, notes linked to the core entities) or junk.
step 6 — dump one row of each core table
For each core table, run SELECT * FROM tablename LIMIT 1; and paste the result into your spreadsheet. You’re doing this to catalog the columns and see, with your own eyes, what information is stored.
You’re looking for three kinds of surprises:
- columns that are always empty. Someone planned a feature, added a column, never wired it up. Fine — but you now know it’s empty.
- columns whose names you don’t recognize. These are usually the most interesting. They’re business logic the original developer encoded in a field name that made sense to them. Find out what they mean.
- free-text “notes” columns. These almost always contain the most valuable data in the database. Customer context, special instructions, the story of the relationship. Treat them as a goldmine, and be alarmed if they’re large and unstructured, because that’s knowledge nobody is currently searching.
step 7 — check for rot
Run the following cheap checks on your core tables:
- How many customer rows have no email address? No phone? No address? A certain amount is fine; more than 40% is a sign of a data-quality crisis brewing.
- How many jobs / invoices / POs have status fields whose meaning nobody currently working there can explain?
- Are there records that reference customer IDs or vendor IDs that don’t exist anymore? (Orphan foreign keys.)
- Are there visible test rows — “John Test,” “Acme Co,” “DELETE ME” — still in production?
The point isn’t to fix these now. The point is to know what you have before you let anyone quote you on a rebuild.
step 8 — write the two-page summary
When you’re done, write two pages. One page of tables: engine, version, host, row counts, dates, core-entity flag. One page of prose: what are the five most important tables, what’s in the notes fields, what’s broken, what’s growing, what’s dead. Include the last-modified date of each core table.
That document is what you take into any conversation about what to do next — whether that’s calling us, calling a successor vendor, or deciding to live with what you have for another year. Without it, every quote you receive will be padded with discovery time. With it, you’ll get honest, bounded answers.
what to avoid
- Don’t let anyone change the database as part of the audit. Reads only.
- Don’t let anyone walk off with the backup copy until you have two.
- Don’t rush. An audit done in an afternoon produces a document you can’t trust.
- Don’t skip the notes columns. They’re the least structured part of the data and the most business-critical.
what we do with it
When a prospect comes to us with a completed two-page audit, quoting becomes twenty-minute exercise instead of a three-meeting process. We can look at the core entities, the row counts, and the notes, and tell you within an hour whether the cleanest path is to put an AI front end on what you have, to connect it to a newer system you already use, or — honestly, sometimes — to let it retire and consolidate onto something off-the-shelf.
Those three options lead to very different projects at very different prices. The audit is how you figure out which one you’re actually in.
Want to talk through this in the context of your shop? Talk to a builder. No pitch deck, no sales motion — just a conversation.