Database Optimization Techniques for High-Performance Applications
Parijat Anand
CTO at D2 Enterprises
Database performance can make or break your application. A slow database means slow page loads, frustrated users, and lost revenue. Yet many applications suffer from preventable performance issues due to poor query design, missing indexes, or suboptimal configurations. Let's explore proven techniques to optimize database performance.
Understanding Database Performance
Before optimizing, you need to understand what you're optimizing for:
- Query response time: How quickly individual queries execute
- Throughput: Number of queries processed per second
- Concurrency: Handling multiple simultaneous connections
- Resource utilization: CPU, memory, disk I/O, network
- Scalability: Performance as data volume grows
1. Indexing Strategies
Indexes are the most powerful tool for improving query performance. They allow the database to find data without scanning entire tables.
When to Create Indexes
- WHERE clauses: Columns frequently used in filtering
- JOIN conditions: Foreign key columns
- ORDER BY clauses: Columns used for sorting
- GROUP BY clauses: Columns used for grouping
- Unique constraints: Enforce uniqueness efficiently
Index Types
B-Tree Indexes (Default):
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);
Composite Indexes:
CREATE INDEX idx_users_status_created ON users(status, created_at);
-- Efficient for:
WHERE status = 'active' AND created_at > '2024-01-01'
WHERE status = 'active'
-- NOT efficient for:
WHERE created_at > '2024-01-01' -- Doesn't use first column
Partial Indexes:
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Smaller, faster for specific queries
SELECT * FROM users WHERE status = 'active' AND email = '[email protected]';
Full-Text Indexes:
CREATE FULLTEXT INDEX idx_products_search ON products(name, description);
SELECT * FROM products WHERE MATCH(name, description) AGAINST('laptop');
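The same idea can be tried out locally with SQLite's FTS5 extension, which is analogous to the MySQL FULLTEXT index above. This is a minimal sketch with illustrative data; it assumes your Python build ships with FTS5 enabled (most do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table maintains a full-text index over its columns
conn.execute("CREATE VIRTUAL TABLE products USING fts5(name, description)")
conn.executemany(
    "INSERT INTO products (name, description) VALUES (?, ?)",
    [("Gaming Laptop", "16GB RAM, fast SSD"),
     ("Desk Lamp", "LED, adjustable arm")],
)

# MATCH queries the full-text index instead of scanning every row
rows = conn.execute(
    "SELECT name FROM products WHERE products MATCH ?", ("laptop",)
).fetchall()
print(rows)  # [('Gaming Laptop',)]
```

Unlike a `LIKE '%laptop%'` scan, the MATCH lookup stays fast as the table grows because it consults the inverted index.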
Index Best Practices
- Don't over-index: Each index slows down writes and uses storage
- Monitor index usage: Remove unused indexes
- Consider column order: Most selective columns first in composite indexes
- Index foreign keys: Essential for JOIN performance
- Use covering indexes: Include all columns needed by query
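You can see a covering index at work with SQLite's `EXPLAIN QUERY PLAN` (table and index names here are illustrative). When the index contains every column the query needs, the plan reports a covering index and the table itself is never touched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, email TEXT)")
# Composite index that covers the query below (status for filtering, email for output)
conn.execute("CREATE INDEX idx_users_status_email ON users(status, email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE status = 'active'"
).fetchall()
# The detail column mentions "COVERING INDEX": the query is answered
# entirely from the index, with no table lookup
print(plan[0][3])
```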
2. Query Optimization
Well-written queries can be orders of magnitude faster than poorly written ones.
Use EXPLAIN to Analyze Queries
EXPLAIN SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.name;
Avoid SELECT *
-- Bad: Retrieves every column, including ones you don't need
SELECT * FROM users WHERE id = 123;
-- Good: Only retrieve needed columns
SELECT id, name, email FROM users WHERE id = 123;
Optimize JOINs
-- Ensure JOIN columns are indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Use appropriate JOIN type
-- INNER JOIN: Only matching rows
-- LEFT JOIN: All left table rows + matches
-- Avoid CROSS JOIN unless intentional
Limit Result Sets
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;
-- Use pagination for large result sets
SELECT * FROM products
ORDER BY id
LIMIT 20 OFFSET 40;
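Note that `OFFSET` still reads and discards all the skipped rows, so deep pages get slower. A common alternative is keyset (seek) pagination: filter on the last id seen so the database seeks directly to the next page via the index. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (id, name) VALUES (?, ?)",
                 [(i, f"product-{i}") for i in range(1, 101)])

def fetch_page(last_id, page_size=20):
    # WHERE id > ? replaces OFFSET: an index seek instead of
    # scanning and discarding the skipped rows
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = fetch_page(0)
page2 = fetch_page(page1[-1][0])  # pass the last id of the previous page
print(page2[0])  # (21, 'product-21')
```

The trade-off: keyset pagination can't jump to an arbitrary page number, but "next page" links stay fast no matter how deep the user scrolls.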
Avoid N+1 Query Problem
-- Bad: 1 query for users + N queries for orders
SELECT * FROM users;
-- Then for each user:
SELECT * FROM orders WHERE user_id = ?;
-- Good: Single query with JOIN
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
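The cost of N+1 is easy to demonstrate by counting queries from application code. A sketch using sqlite3 with illustrative table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'John'), (2, 'Jane'), (3, 'Bob');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# Bad: 1 query for users, then one more query per user
queries = 0
users = conn.execute("SELECT id, name FROM users").fetchall()
queries += 1
for user_id, _ in users:
    conn.execute("SELECT id FROM orders WHERE user_id = ?", (user_id,)).fetchall()
    queries += 1
print(queries)  # 4 round trips for 3 users (1 + N)

# Good: a single JOIN with an aggregate
rows = conn.execute("""
    SELECT u.name, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()
print(rows)  # [('John', 2), ('Jane', 1), ('Bob', 0)]
```

With 10,000 users the loop version issues 10,001 round trips; the JOIN version still issues one.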
Use EXISTS Instead of COUNT
-- Bad: Counts every matching row just to test existence
SELECT * FROM users WHERE (SELECT COUNT(*) FROM orders WHERE user_id = users.id) > 0;
-- Good: Stops at first match
SELECT * FROM users WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = users.id);
3. Database Schema Design
Good schema design prevents performance problems before they start.
Normalization vs Denormalization
Normalization (reduce redundancy):
- Eliminates data duplication
- Ensures data consistency
- Requires JOINs for related data
- Good for write-heavy workloads
Denormalization (optimize reads):
- Stores redundant data
- Reduces JOINs
- Faster reads, slower writes
- Good for read-heavy workloads
Choose Appropriate Data Types
TINYINT -- -128 to 127, or 0 to 255 unsigned (1 byte)
SMALLINT -- -32,768 to 32,767 (2 bytes)
INT -- -2B to 2B (4 bytes)
BIGINT -- Very large numbers (8 bytes)
-- Use VARCHAR instead of CHAR for variable-length strings
VARCHAR(255) -- Only uses needed space
CHAR(255) -- Always uses 255 bytes
-- Use ENUM for fixed sets
status ENUM('pending', 'active', 'inactive')
Partition Large Tables
CREATE TABLE orders (
    id BIGINT,
    user_id INT,
    created_at DATE,
    ...
) PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);
4. Caching Strategies
Reduce database load by caching frequently accessed data.
Application-Level Caching
- Redis/Memcached: Cache query results, computed values
- Cache invalidation: Update or remove when data changes
- TTL (Time To Live): Automatic expiration for stale data
Query Result Caching
# MySQL's query cache was removed in 8.0, so cache query results at the
# application level instead. Example with Redis (Python-style sketch):
key = "user:123:profile"
cached = redis.get(key)
if cached:
    return cached
else:
    data = db.query("SELECT * FROM users WHERE id = 123")
    redis.setex(key, 3600, data)  # Cache for 1 hour
    return data
Materialized Views
-- PostgreSQL syntax (MySQL has no native materialized views)
CREATE MATERIALIZED VIEW user_order_stats AS
SELECT
    user_id,
    COUNT(*) AS total_orders,
    SUM(total_amount) AS total_spent,
    MAX(created_at) AS last_order_date
FROM orders
GROUP BY user_id;
-- Refresh periodically
REFRESH MATERIALIZED VIEW user_order_stats;
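On engines without native materialized views (MySQL, SQLite), the same pattern can be implemented with an ordinary summary table rebuilt on demand. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL);
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
    CREATE TABLE user_order_stats (user_id INTEGER PRIMARY KEY,
                                   total_orders INTEGER, total_spent REAL);
""")

def refresh_user_order_stats():
    # Equivalent of REFRESH MATERIALIZED VIEW: rebuild the precomputed rows
    conn.execute("DELETE FROM user_order_stats")
    conn.execute("""
        INSERT INTO user_order_stats
        SELECT user_id, COUNT(*), SUM(total_amount) FROM orders GROUP BY user_id
    """)

refresh_user_order_stats()
stats = conn.execute("SELECT * FROM user_order_stats ORDER BY user_id").fetchall()
print(stats)  # [(1, 2, 80.0), (2, 1, 20.0)]
```

Reads against the summary table are a cheap primary-key lookup instead of an aggregate over the full orders table; the cost is that the data is only as fresh as the last refresh.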
5. Connection Pooling
Reuse database connections instead of creating new ones for each request.
Connection Pool Configuration
const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'app_user',
  password: 'password',
  database: 'myapp',
  connectionLimit: 10,      // Max connections
  queueLimit: 0,            // Unlimited queue
  waitForConnections: true, // Wait if all busy
  acquireTimeout: 10000     // 10 second timeout
});
Pool Sizing Guidelines
- Start small: 10-20 connections often sufficient
- Monitor usage: Increase if connections frequently maxed out
- Consider database limits: Don't exceed max_connections
- Multiple app servers: Total connections = pool_size × servers
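To make the mechanism concrete, here is a minimal pool built on a thread-safe queue, illustrating what libraries like mysql's `createPool` do internally (greatly simplified: no health checks, timeouts, or reconnection; sqlite3 stands in for a real server connection to keep the sketch runnable):

```python
import queue
import sqlite3

class ConnectionPool:
    def __init__(self, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Real pools open server connections up front; we use sqlite3 here
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        return self._pool.get()  # blocks if all connections are busy

    def release(self, conn):
        self._pool.put(conn)  # return the connection for reuse

pool = ConnectionPool(size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()
pool.release(conn)
print(result)  # (1,)
```

The key property is that `acquire` blocks rather than opening a new connection, which is what caps the load on the database at the configured pool size.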
6. Database Configuration Tuning
Optimize database server settings for your workload.
MySQL/MariaDB Key Settings
# Buffer pool size (typically 70-80% of RAM on a dedicated server)
innodb_buffer_pool_size = 8G
# Log file size (larger = better write performance)
innodb_log_file_size = 512M
# Connection limits
max_connections = 200
# Query cache (removed in MySQL 8.0; disable on older versions)
query_cache_size = 0
# Temporary tables
tmp_table_size = 64M
max_heap_table_size = 64M
PostgreSQL Key Settings
# Shared buffers (typically 25% of RAM)
shared_buffers = 4GB
# Effective cache size (50-75% of RAM)
effective_cache_size = 12GB
# Work memory per operation
work_mem = 64MB
# Maintenance work memory
maintenance_work_mem = 512MB
# WAL settings
wal_buffers = 16MB
checkpoint_completion_target = 0.9
7. Monitoring and Profiling
You can't optimize what you don't measure. Implement comprehensive monitoring.
Key Metrics to Monitor
- Slow query log: Queries exceeding threshold
- Query execution time: Average, P95, P99
- Connection count: Active, idle, waiting
- Cache hit ratio: Buffer pool effectiveness
- Disk I/O: Reads, writes, latency
- Lock contention: Deadlocks, wait times
- Replication lag: For read replicas
Enable Slow Query Log
-- MySQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- PostgreSQL
ALTER SYSTEM SET log_min_duration_statement = 1000; -- 1 second
SELECT pg_reload_conf();
Monitoring Tools
- MySQL: MySQL Workbench, Percona Monitoring and Management
- PostgreSQL: pgAdmin, pg_stat_statements
- General: Datadog, New Relic, Prometheus + Grafana
8. Read Replicas and Sharding
Scale beyond single-server limits with replication and sharding.
Read Replicas
- Primary: Handles all writes
- Replicas: Handle read queries
- Replication lag: Slight delay in replica data
- Use cases: Analytics, reporting, read-heavy apps
Sharding
- Horizontal partitioning: Split data across multiple databases
- Shard key: Determines which shard holds data (e.g., user_id)
- Challenges: Cross-shard queries, rebalancing
- When to shard: Single database can't handle load
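Shard routing on a user_id key can be sketched in a few lines. This uses a stable hash (MD5, not Python's process-randomized `hash()`) so routing stays consistent across application servers; the shard count is illustrative:

```python
import hashlib

NUM_SHARDS = 4  # illustrative; changing this requires rebalancing data

def shard_for(user_id):
    # Stable hash of the shard key, mapped onto the shard range
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# All of one user's rows land on the same shard, so single-user queries
# never cross shards; cross-user analytics still have to fan out.
shard = shard_for(12345)
print(0 <= shard < NUM_SHARDS)  # True
```

Production systems usually add a level of indirection (consistent hashing or a shard lookup table) so that adding shards doesn't remap every key, which is the rebalancing challenge noted above.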
9. Batch Operations
Process multiple records in single operations instead of loops.
Bulk Inserts
-- Bad: One INSERT statement per row
INSERT INTO users (name, email) VALUES ('John', '[email protected]');
INSERT INTO users (name, email) VALUES ('Jane', '[email protected]');
-- ... 1000 more times
-- Good: Single bulk insert
INSERT INTO users (name, email) VALUES
('John', '[email protected]'),
('Jane', '[email protected]'),
... -- All 1000 rows
('Bob', '[email protected]');
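From application code, the standard way to get a batched insert is the DB-API's `executemany`, which avoids a round trip per row. A sketch with generated placeholder data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

# Hypothetical rows for illustration
rows = [(f"user{i}", f"user{i}@example.com") for i in range(1000)]

# Bad: one execute() call (one round trip) per row
# for name, email in rows:
#     conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))

# Good: a single batched statement
conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1000
```

Wrapping the batch in one transaction matters too: committing per row forces a disk flush per row, while one commit for the whole batch flushes once.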
Batch Updates
-- Bad: One UPDATE per row
UPDATE users SET status = 'active' WHERE id = 1;
UPDATE users SET status = 'active' WHERE id = 2;
-- ...
-- Good: Single update with IN clause
UPDATE users SET status = 'active' WHERE id IN (1, 2, 3, ..., 1000);
10. Regular Maintenance
Keep your database healthy with regular maintenance tasks.
Maintenance Checklist
- Analyze tables: Update statistics for query planner
- Optimize tables: Defragment and reclaim space
- Update indexes: Rebuild fragmented indexes
- Purge old data: Archive or delete obsolete records
- Monitor growth: Plan for storage expansion
- Review slow queries: Optimize problematic queries
- Test backups: Ensure recovery procedures work
-- MySQL/MariaDB maintenance
ANALYZE TABLE users;
OPTIMIZE TABLE users;
-- PostgreSQL maintenance
VACUUM ANALYZE users;
REINDEX TABLE users;
Common Performance Pitfalls
- Missing indexes: Most common cause of slow queries
- Over-indexing: Slows down writes unnecessarily
- N+1 queries: Fetching related data in loops
- Large transactions: Holding locks too long
- Unbounded queries: No LIMIT on large tables
- Ignoring monitoring: Not knowing where problems are
- Premature optimization: Optimizing before measuring
Conclusion
Database optimization is an ongoing process, not a one-time task. Start with proper indexing and query optimization, monitor performance continuously, and scale strategically as your application grows.
Remember the 80/20 rule: often 20% of your queries cause 80% of performance issues. Focus on identifying and optimizing those critical queries first.
At D2 Enterprises, we've optimized databases handling billions of records and millions of queries per day. Whether you're experiencing performance issues or planning for scale, these techniques provide a solid foundation for building high-performance applications.
About Parijat Anand
Parijat is the Chief Technology Officer at D2 Enterprises. Our database specialists have optimized systems handling billions of records across MySQL, PostgreSQL, MongoDB, and other platforms, combining deep technical knowledge with practical, results-driven solutions.