Should I merge all these tables into one?


#1

I’ve got a legacy site with about a dozen tables containing “articles”.

10 have exactly the same fields. 1 has additional fields that make an article an “event” with a start/end date and time. 1 has a few other additional fields that hold unimportant text.

I was considering merging all of them into a single table and putting the extra custom fields that 2 of them use into some kind of extra* or meta* table.


#2

There is definitely redundancy there. I would check where the ‘event’ articles are and see if those can be moved into a separate table entirely; and merge the others into a single table.

It’s likely that they are in a separate table for a reason, so researching what the reason is would be needed. It may require a spin off table that holds article types.


#3

Events are in their own events table. Only 2 additional fields differ from the other tables.

The reason is simply that we had no idea what we were doing at the time.


#4

Are the other fields relevant to the article table or completely different?


#5

Learn “Database Normalization”. You will know how to handle this once you do.


#6

I looked through the project some more and threw away some parts and tried to make sense of the rest.

Here’s a list of all tables and fields:

department field was related to role based access control stuff which I might throw away because they never have more than 1 web administrator.

If I did merge everything into a single table I’d made another table just for storing the id’s of articles to include in the next auto-generated newsletter, so I could drop the newsletter field. I’d do the same for articles featured on the home page so can drop the main_feature and hot fields.

The tables that contain content that resembles a real magazine article are communitycorner, corporatebriefs, news, news_members, and reports. Sometimes an outside person or company will submit their own articles for publication which is why there is both author and author_firm fields on the corporatebriefs table but might be useful to have them on all tables listed above. Sometimes an outsider or government agency will also send over a PDF file or video that needs to be published, so the author_firm field could also apply to download and download_members and video.

So that leaves me with the 5 custom fields on associations table and 5 on events table (I’m dropping annual field). I’m wondering if there’s an elegant way of keeping just those 10 fields in 1 or 2 separate tables and combining them with the main table as needed.

I’m considering making one big table named posts that contains all the articles similar fields and then making mysql “views” named events and associations that join the 10 custom fields to their corresponding records in posts table. Good idea? No?

If I merge them all into 1 posts table I’ll also drop the disgusting category field and use a cross reference table so any post can be in as many categories as necessary.

I’m also wondering if I should move addedwhen, addedby, last_edited, and editedby to a separate table.


#7

I am seeing a lot of areas that could/ should be pulled out into separate tables and joined.


#8

such as?

something different from what I was thinking of doing?


#9

Category and Department all seem like they should be foreign key constraints. Feature and main_feature seems like it should be on a linking table. Keywords looks like a bloated column, but that may be needed. I don’t know what Newsletter, Hot, or Credit means in this context. I would also follow a common naming convention. Rather that editedby and author_firm, just stay consistent.

These tables are almost identical and I am trying to understand why. So, every table should have base columns, such as: an ID, dateadded, addedBy, EditedDate, editedBy. That keeps some normalcy to the table and lets you know you can at least check those stats. Those are all good.


#10

I’m not sure what “foreign key constraints” means.

I’m dropping department and moving category to a cross reference / lookup / whatever it is called type of table to better allow a post to be in multiple categories.

Feature and main_feature seems like it should be on a linking table.

main_feature will be dropped and I’ll use a different table for that. I’m not sure yet what the feature field was actually used for. Might drop it entirely.

Keywords looks like a bloated column

Was only used to define contents of meta keyword tag in html head. Is not searched on.

I don’t know what Newsletter, Hot, or Credit means in this context.

Newsletter will be moved to another table. Hot will be dropped. Credit is for when you want to indicate that the article was written by some outside entity that is not a registered user in the system like “This article originally appeared in the Bureau of Commerce quarterly announcement bla bla bla”.

These tables are almost identical and I am trying to understand why.

The core of the problem was the categories field. Back in the day we had no clue how to handle multiple categories sensibly and came up with the worst solution imaginable. To avoid headaches we split different types of articles into separate identical tables and only used certain sections of the categories table with each article table. We also didn’t know about indexing or proper fulltext search back then and having all articles in a single table doing LIKE searches on all fields was slow as hell. After I learned about cross reference tables as a good approach for one-to-many categories I wanted to go back and refactor the systems but I would always get stuck righ where I am right now, not knowing what to do about those extra fields that differ from the norm. My brain just gets stuck here every time and I don’t know what to do.


#11

Oh and that screenshot of the tables is actually from an older project where the differences are less different. I’d like to eventually refactor the tables on this more recent project that unfortunately has slightly more differences between tables http://urca.bahama.land/consultations.php

On that project having a unified posts table with a unified categories list would have been very beneficial because they were extremely indecisive about how to categorize their content. They would often want to move large numbers of posts from one section (db table) to another and I’d have to do that manually for them.


#12

A foreign key constraint means that it references another table.

For instance,
http://sqlfiddle.com/#!9/8354ca/3


#13

The other option, depending, is to have a joining table if articles may have a many to many relationship.


#14

Oh yeah, constraints, yet another thing I somehow never got around to using.

On this system attempts to delete a category that is still used by articles is interrupted by a form that requires you to move all articles to a different category before proceeding with category deletion.

The other option, depending, is to have a joining table if articles may have a many to many relationship.

Not sure what you mean there. The articles do not have any relationship with each other. 1 article can have many categories assigned (will do this with a cross reference table).