
15 Common Mistakes When Studying Databases (And How to Fix Them)

Database knowledge is essential for virtually every software engineering role. The subject combines mathematical theory (relational algebra, normalization) with practical engineering (query optimization, indexing, concurrency). Here are 15 mistakes that commonly trip up database students.

#1 · Critical · Conceptual

Designing Schemas Without Proper Normalization

Students create tables that mix unrelated data, leading to update anomalies, insertion anomalies, and deletion anomalies. Normalization eliminates these problems by organizing data according to functional dependencies.

Storing customer name, address, and order details in a single table, so updating a customer's address requires modifying every row for every order they have ever placed — and missing one creates inconsistent data.

How to fix it

Learn normalization through 3NF step by step: identify functional dependencies, eliminate partial dependencies (2NF), and eliminate transitive dependencies (3NF). Normalize your schema first, then selectively denormalize only where performance requires it and you understand the tradeoffs.
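A minimal sketch of the fix, using Python's stdlib sqlite3 as a stand-in engine (the table and column names are invented for illustration). The denormalized table repeats the address on every order; the 3NF version stores it once, so an address change touches exactly one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: the customer's address repeats on every order row.
cur.execute("""CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT, customer_address TEXT, item TEXT)""")
cur.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
                [(1, "Ada", "1 Elm St", "keyboard"),
                 (2, "Ada", "1 Elm St", "mouse")])

# 3NF: the address lives in exactly one row, keyed by customer_id.
cur.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)""")
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT)""")
cur.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Elm St')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)",
                [(1, "keyboard"), (2, "mouse")])

# Updating the address now modifies a single row, not every order.
cur.execute("UPDATE customers SET address = '2 Oak Ave' WHERE customer_id = 1")
rows = cur.execute("""SELECT o.item, c.address FROM orders o
                      JOIN customers c USING (customer_id)
                      ORDER BY o.order_id""").fetchall()
print(rows)  # both orders see the new address, with no risk of missing a row
```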

#2 · Critical · Conceptual

Writing Queries Without Understanding JOINs

JOINs are the fundamental mechanism for combining data from multiple tables. Students who do not understand inner, left, right, and full outer JOINs write incorrect queries that either exclude needed data or produce duplicate rows.

Using an INNER JOIN to find all customers and their orders, which excludes customers who have never placed an order. A LEFT JOIN would include all customers, with NULL for the order columns where no orders exist.

How to fix it

Draw Venn diagrams for each JOIN type: INNER (intersection), LEFT (all from left table), RIGHT (all from right table), FULL OUTER (union). Practice with a small dataset where you can verify results manually. Always ask: should rows without a match be included or excluded?
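The customers-and-orders example above can be verified on a dataset small enough to check by hand (a sketch using stdlib sqlite3; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])   # Grace has never ordered
cur.execute("INSERT INTO orders VALUES (1, 1)")

# INNER JOIN: only rows with a match on both sides survive.
inner = cur.execute("""SELECT c.name, o.id FROM customers c
                       INNER JOIN orders o ON o.customer_id = c.id
                       ORDER BY c.id""").fetchall()

# LEFT JOIN: every customer survives; unmatched order columns become NULL.
left = cur.execute("""SELECT c.name, o.id FROM customers c
                      LEFT JOIN orders o ON o.customer_id = c.id
                      ORDER BY c.id""").fetchall()

print(inner)  # [('Ada', 1)]                   -- Grace silently dropped
print(left)   # [('Ada', 1), ('Grace', None)]  -- Grace kept, order is NULL
```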

#3 · Critical · Conceptual

Not Understanding Indexes and Query Performance

Without indexes, every query performs a full table scan. Students who design schemas and write queries without considering indexing create applications that become unusably slow as data grows.

Querying a million-row orders table with WHERE customer_id = 12345 without an index on customer_id, causing a full table scan that takes seconds instead of the milliseconds an indexed lookup would require.

How to fix it

Understand that indexes are B-tree (or hash) structures that enable fast lookup by specific columns. Create indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY. Use EXPLAIN ANALYZE to verify that the query planner uses your indexes.
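A runnable sketch of the before/after, using stdlib sqlite3 (whose analogue of PostgreSQL's EXPLAIN ANALYZE is EXPLAIN QUERY PLAN; the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 1000) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM orders WHERE customer_id = 123"
before = plan(q)   # something like 'SCAN orders' -- a full table scan

cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(q)    # now 'SEARCH ... USING INDEX idx_orders_customer ...'

print(before)
print(after)
```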

#4 · Major · Conceptual

Confusing Transaction Isolation Levels

Students memorize isolation level names without understanding which anomalies each level prevents and allows. This leads to either overly strict isolation (poor performance) or overly lax isolation (data inconsistency).

Not understanding the difference between a dirty read (reading uncommitted data), a non-repeatable read (data changes between two reads in the same transaction), and a phantom read (new rows appear between two identical queries).

How to fix it

Create a grid: rows are isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable), columns are anomalies (dirty read, non-repeatable read, phantom read). Mark which anomalies each level prevents. Then construct a concrete scenario for each anomaly type.

#5 · Major · Conceptual

Writing N+1 Query Patterns

The N+1 problem occurs when code executes one query to fetch a list, then one additional query per item to fetch related data. This turns what should be one or two queries into hundreds or thousands.

Fetching all 100 orders with one query, then executing a separate query for each order to get the customer name — 101 queries total, when a single JOIN query would return all data at once.

How to fix it

Recognize the N+1 pattern: a loop that executes a query per iteration. Replace it with a JOIN that fetches all related data in a single query, or use IN clauses to batch lookups. Always watch for loops containing database calls.
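Both patterns side by side, as a sketch with stdlib sqlite3 (100 orders, invented data). The anti-pattern issues 101 queries; the JOIN issues one and returns identical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, f"customer_{i}") for i in range(100)])
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i) for i in range(100)])

# N+1 anti-pattern: 1 query for the list, then 1 query per order = 101 queries.
orders = cur.execute("SELECT id, customer_id FROM orders ORDER BY id").fetchall()
names_n_plus_1 = [
    cur.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()[0]
    for _, cid in orders]

# Fix: a single JOIN fetches all related data in one query.
joined = cur.execute("""SELECT o.id, c.name FROM orders o
                        JOIN customers c ON c.id = o.customer_id
                        ORDER BY o.id""").fetchall()
names_joined = [name for _, name in joined]

print(names_n_plus_1 == names_joined)  # same result, 1 query instead of 101
```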

#6 · Major · Conceptual

Over-Normalizing Schemas

While normalization prevents anomalies, excessive normalization creates schemas with so many small tables that every query requires multiple JOINs, degrading both readability and performance.

Normalizing a user's first name and last name into separate tables because they are theoretically independent attributes, requiring a three-table JOIN to display a user's full name.

How to fix it

Normalize to 3NF as a starting point, then evaluate whether specific denormalizations improve query performance for your most common access patterns. Denormalization is a deliberate tradeoff — document why each denormalized element exists.

#7 · Major · Study Habit

Not Using EXPLAIN to Analyze Queries

Students write queries and check only whether the result is correct. They never look at the execution plan to understand how the database actually processes the query, missing opportunities for optimization.

Writing a query with a WHERE clause on a non-indexed column that performs a full sequential scan on a large table, never realizing it because the query returns correct results (just slowly).

How to fix it

Run EXPLAIN ANALYZE on every query you write against a non-trivial dataset. Learn to read execution plans: identify sequential scans vs. index scans, sort operations, hash joins vs. nested loop joins. This skill separates competent engineers from novices.

#8 · Major · Conceptual

Misunderstanding NULL Semantics

NULL in SQL is not zero, not an empty string, and not false. NULL represents unknown or missing data, and it has unique comparison behavior that catches students off guard.

Writing WHERE status != 'active' and expecting it to include rows where status is NULL, when in fact NULL != 'active' evaluates to NULL (not TRUE), so NULL rows are excluded from the result.

How to fix it

Remember: any comparison with NULL returns NULL, not TRUE or FALSE. Use IS NULL and IS NOT NULL for explicit NULL checks. When filtering, consider whether NULL values should be included and handle them explicitly with COALESCE or IS NULL conditions.
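The status example from above, runnable with stdlib sqlite3 (table and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, status TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)",
                [("Ada", "active"), ("Grace", "inactive"), ("Edsger", None)])

# NULL != 'active' evaluates to NULL, not TRUE, so Edsger is excluded.
wrong = cur.execute("SELECT name FROM users WHERE status != 'active'").fetchall()
print(wrong)   # [('Grace',)]  -- the NULL row silently disappears

# Handle NULL explicitly with IS NULL (or COALESCE(status, '') != 'active').
right = cur.execute("""SELECT name FROM users
                       WHERE status != 'active' OR status IS NULL""").fetchall()
print(right)   # [('Grace',), ('Edsger',)]
```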

#9 · Major · Study Habit

Not Practicing SQL on Real Datasets

SQL is a practical skill that requires hands-on practice. Students who only study SQL syntax from textbooks without writing and executing queries cannot develop the fluency that professional work requires.

Knowing the syntax of a window function (ROW_NUMBER, RANK, SUM OVER) from the textbook but being unable to write one from scratch to calculate a running total or rank customers by lifetime value.

How to fix it

Install PostgreSQL, load a non-trivial dataset (Northwind, Sakila, or a Kaggle dataset), and practice writing queries daily. Start with basic CRUD, progress to JOINs, subqueries, CTEs, and window functions. The only way to learn SQL is to write SQL.
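As a taste of the window-function practice this recommends, here is a running total written against stdlib sqlite3 (window functions need SQLite 3.25+, which modern Python builds typically bundle; the sales data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 5)])

# SUM(...) OVER an ordered window accumulates row by row: a running total.
rows = cur.execute("""SELECT day, amount,
                             SUM(amount) OVER (ORDER BY day) AS running_total
                      FROM sales ORDER BY day""").fetchall()
print(rows)  # [(1, 10, 10), (2, 20, 30), (3, 5, 35)]
```

Without the window function, the same result needs a correlated subquery or application-level accumulation.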

#10 · Major · Conceptual

Choosing SQL vs NoSQL Based on Trend Rather Than Requirements

Students default to NoSQL because it seems modern, or stick with SQL because it is familiar, without analyzing their specific data model, query patterns, and consistency requirements.

Choosing MongoDB for an application with highly relational data (users, orders, products with many-to-many relationships) because 'NoSQL scales better,' when the relational model fits the data naturally and SQL would simplify the application significantly.

How to fix it

Choose based on data characteristics: highly relational data with complex queries favors SQL. Document-oriented data with flexible schemas favors document stores. High-throughput key-value lookups favor key-value stores. The right choice depends on the problem, not the trend.

#11 · Minor · Study Habit

Ignoring Database Constraints

Foreign keys, unique constraints, check constraints, and NOT NULL constraints enforce data integrity at the database level. Students who rely on application code alone for validation create systems where bad data can enter through any access path.

Not adding a foreign key constraint between orders.customer_id and customers.id, allowing orders to reference non-existent customers when data is inserted directly or through an alternative code path.

How to fix it

Define constraints in the schema: primary keys, foreign keys, unique constraints, NOT NULL, and check constraints. These are the database's contract for data integrity. Application-level validation is complementary, not a substitute.
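A sketch of the database rejecting bad data regardless of which code path inserts it, using stdlib sqlite3 (note SQLite needs foreign keys switched on per connection; schema names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE)""")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL CHECK (total >= 0))""")

conn.execute("INSERT INTO customers VALUES (1, 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (1, 1, 9.99)")   # valid

caught = None
try:
    # Rejected at the database level: customer 42 does not exist.
    conn.execute("INSERT INTO orders VALUES (2, 42, 5.0)")
except sqlite3.IntegrityError as e:
    caught = str(e)
print(caught)  # FOREIGN KEY constraint failed
```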

#12 · Minor · Conceptual

Not Understanding Write-Ahead Logging

Write-ahead logging (WAL) is how databases provide durability and crash recovery. Students skip this topic because it seems like an implementation detail, but it explains fundamental behavior around performance and reliability.

Not understanding why a committed transaction is guaranteed to survive a crash (because the WAL is flushed to disk before the commit is acknowledged), or why write-heavy workloads can be I/O-bound.

How to fix it

Study the WAL protocol: all changes are written to the log before being applied to data pages. This enables redo (replaying committed changes after a crash) and undo (rolling back uncommitted changes). Understanding WAL explains transaction durability and performance characteristics.
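You can see a WAL in action with stdlib sqlite3, which supports a write-ahead-log journal mode (a sketch; SQLite's WAL differs in detail from PostgreSQL's, but the principle — changes hit the log before the main data file — is the same):

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database has no log to replay.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()  # durable once the WAL is synced to disk

# A '-wal' file now sits next to the database: committed changes land
# there first and are merged into the main file later, at a checkpoint.
print(os.path.exists(path + "-wal"))  # True
```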

#13 · Minor · Study Habit

Writing Insecure Queries with String Concatenation

Building SQL queries by concatenating user input creates SQL injection vulnerabilities — one of the most common and devastating web application attacks.

Writing query = "SELECT * FROM users WHERE name = '" + userInput + "'", which allows an attacker to input "'; DROP TABLE users; --" and delete the entire table.

How to fix it

Always use parameterized queries (prepared statements). Never concatenate user input into SQL strings. This is not just a best practice — it is the only acceptable approach. Learn your language's parameterized query API and use it consistently.
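Both versions side by side, as a sketch with stdlib sqlite3 (a classic tautology payload stands in for the DROP TABLE example above). The parameterized query sends the value to the driver separately, so it is never parsed as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT)")
cur.execute("INSERT INTO users VALUES ('alice')")

user_input = "x' OR '1'='1"  # attacker-controlled string

# UNSAFE: concatenation lets the payload rewrite the WHERE clause,
# turning it into  name = 'x' OR '1'='1'  -- which matches every row.
unsafe = cur.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'").fetchall()
print(unsafe)  # [('alice',)] -- the payload dumped the whole table

# SAFE: the ? placeholder binds the value as data, never as SQL.
safe = cur.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()
print(safe)    # [] -- treated as a literal string, no match
```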

#14 · Minor · Conceptual

Not Understanding the CAP Theorem

The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Students who do not understand this cannot reason about distributed database design choices.

Expecting a distributed NoSQL database to provide both strong consistency and high availability during a network partition, when the CAP theorem proves this is impossible.

How to fix it

Understand each property: Consistency (all nodes see the same data), Availability (every request gets a response), Partition tolerance (system works despite network failures). Since network partitions are inevitable, the real choice is between CP (consistent but may be unavailable) and AP (available but may be inconsistent).

#15 · Minor · Conceptual

Designing Schemas Without Considering Access Patterns

A schema that is perfectly normalized may perform poorly if it does not align with how the application actually reads and writes data. Schema design should be driven by query patterns, not just theoretical correctness.

Normalizing a user profile into five tables (name, address, preferences, settings, activity) when the application always loads the complete profile, requiring a five-table JOIN for the most common operation.

How to fix it

List your application's most common queries before designing the schema. Optimize the schema for these access patterns. Start normalized, then denormalize strategically where the most frequent queries would otherwise require expensive multi-table JOINs.

Quick Self-Check

  1. Can I normalize a denormalized table to 3NF and explain each step?
  2. Can I write a query with JOINs, GROUP BY, HAVING, and a window function without looking up syntax?
  3. Can I read an EXPLAIN ANALYZE output and identify whether my query uses indexes or performs sequential scans?
  4. Can I explain the difference between Read Committed and Repeatable Read isolation levels with concrete examples?
  5. Can I explain why parameterized queries prevent SQL injection?

Pro Tips

  • ✓Set up PostgreSQL locally and load a real dataset — Kaggle has excellent datasets for SQL practice. Write increasingly complex queries and analyze their execution plans.
  • ✓Learn window functions (ROW_NUMBER, RANK, LAG, LEAD, SUM OVER) early. They appear in interviews and solve entire categories of problems that would otherwise require subqueries or application-level code.
  • ✓For schema design, start with an ER diagram showing entities and relationships before writing CREATE TABLE statements. The visual representation catches design problems before they become schema problems.
  • ✓Practice writing CTEs (Common Table Expressions) for complex queries — they dramatically improve readability compared to nested subqueries and are the modern approach to complex SQL.
  • ✓Study how one real database engine works internally (PostgreSQL's documentation is excellent). Understanding B-tree indexes, the buffer pool, and the query planner transforms SQL from a magic incantation into a tool you truly understand.

More Databases Resources

Avoid database mistakes by teaching the subject

Upload your notes and explain database concepts to AI students. They'll catch the gaps you didn't know you had.

Try LearnByTeaching.ai — It's Free