CSV loads to MySQL fast: Loading large datasets in few minutes

October 27, 2024

You might encounter situations where you need to pre-populate test data for performance testing but you need huge records of data which is taking ages if runnig with normal insert query.

OK, I got you. The following is the steps on how to update MySQL configuration and loading fast using CSV file.

Optional Config for Better Performance Before Loading

The following configurations are good to set for better performance before loading.

SET GLOBAL max_allowed_packet = 1G;
SET GLOBAL sql_mode = '';
SET GLOBAL innodb_buffer_pool_size = 1G;
SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;

Create the Table

For this demo, user table will be created. This can be any table of your choice.

    CREATE TABLE user (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    age INT NULL,
    email VARCHAR(255),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

Prepare the CSV File

For this demo, the following file will be saved as data.csv. This can be any name.

id,name,age,email,created_at,updated_at
1,John Doe,30,johndoe@example.com,2024-06-25 10:00:00,2024-06-25 10:00:00
2,Jane Smith,25,janesmith@example.com,2024-06-25 11:00:00,2024-06-25 11:00:00

Load it

LOAD DATA INFILE 'C:/path/to/your/data.csv'
INTO TABLE user
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, age, email, created_at, updated_at);

PS secure-file-priv issues can be occurred when loading. This is the security feature which restricts the location from which files can be loaded. If that the case, you have two options:

  • Option 1. Put the data.csv file in the configured directory, if there is one.
  • Option 2. Turn this feature off temporarily, which you should not do in production.

To check the configured directory:

SHOW VARIABLES LIKE 'secure-file-priv';

To turn off the settings. Open my.ini or my.cnf of your MySQL configuration.

[mysqld]
secure-file-priv=""

Restart the MySQL server to apply the changes.

Handling Null Values

You need to specify how MySQL should interpret missing or blank values in the CSV file to handle null values.

The following example shows that @age is a user variable that temporarily stores the value from the CSV file. The NULLIF function replaces empty strings with NULL.

LOAD DATA INFILE 'C:/path/to/your/data.csv'
INTO TABLE big_dataset
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, @age, email, created_at, updated_at)
SET age = NULLIF(@age, '');

By now your data should be imported as expected.

Post-Load Steps

If you make changes for the Step 1, now, it is time to revert back the changes.

SET GLOBAL max_allowed_packet = 64M;
SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';
SET GLOBAL innodb_buffer_pool_size = 128M;
SET FOREIGN_KEY_CHECKS = 1;
SET UNIQUE_CHECKS = 1;

Conclusion

This is all you need but properly configuring your MySQL server and handling null values correctly are the minimum requirements to load a smooth and error-free data.

Happy Data Importing!