PHP
Processing Large Datasets Efficiently with Eloquent `chunk` and `chunkById`
Discover how to efficiently iterate and process large numbers of Eloquent records using `chunk` and `chunkById` to reduce memory consumption and prevent timeouts.
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Product extends Model
{
    // ... model definition ...
}
// Scenario: Process thousands or millions of products

// 1. Using `chunk()`: retrieves records in fixed-size chunks via LIMIT/OFFSET.
// A good fit when the table has no suitable unique, ordered key, or when the
// data matching the query will not change while the chunks are being processed.
Product::chunk(1000, function ($products) {
    foreach ($products as $product) {
        // Process each product in the chunk
        $product->update(['status' => 'processed']);
    }

    // Log::info('Processed a chunk of products.');
});
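
// Tip (a sketch of documented behaviour): returning false from the closure stops
// any remaining chunks from being processed, which is useful for aborting early.
// The 'failed' status value below is purely illustrative.
Product::chunk(1000, function ($products) {
    foreach ($products as $product) {
        if ($product->status === 'failed') {
            return false; // stop iterating over further chunks
        }
    }
});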
// 2. Using `chunkById()`: constrains each chunk by the primary key instead of
// using OFFSET. Generally preferred for large tables, and the safer choice when
// records are inserted, updated, or deleted while the iteration is running.
Product::where('category_id', 5)
    ->chunkById(500, function ($products) {
        foreach ($products as $product) {
            // Process each product in the chunk.
            // For example, dispatch a job per product:
            // ProcessProductJob::dispatch($product->id);
            echo "Processing product ID: {$product->id}\n";
        }
    }, 'id'); // third argument: the column to chunk by (defaults to the primary key)
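
// To make the mechanism concrete, here is a rough, hand-rolled equivalent of what
// chunkById() does internally (a sketch, not the framework's actual implementation):
// each query filters on the last seen ID and orders by it, so no OFFSET is needed.
$lastId = 0;

do {
    $products = Product::where('category_id', 5)
        ->where('id', '>', $lastId)
        ->orderBy('id')
        ->limit(500)
        ->get();

    foreach ($products as $product) {
        // ... process $product ...
        $lastId = $product->id;
    }
} while ($products->isNotEmpty());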
// Example of more complex processing within a chunk: batch operations per chunk
Product::chunk(200, function ($products) {
    $productIds = $products->pluck('id')->toArray();

    // Perform a bulk update or other batch operation on the whole chunk at once
    Product::whereIn('id', $productIds)->update(['processed_at' => now()]);

    echo "Updated a chunk of " . count($productIds) . " products.\n";
});
How it works: When dealing with extremely large datasets, fetching all records into memory at once can exhaust memory and cause script timeouts. Eloquent's `chunk()` and `chunkById()` methods solve this by retrieving a small subset of records at a time and passing each subset to a closure for processing. `chunk()` pages through results with `LIMIT` and `OFFSET`, which slows down on deep pages and can skip or repeat rows if the records matching the query change mid-iteration. `chunkById()` instead orders by the primary key and constrains each query to IDs greater than the last one seen, so it stays fast on very large tables and is the safer choice whenever the underlying data changes during iteration, in particular when the closure itself updates or deletes records in a way that affects the query's `WHERE` clause. Either way, your application can process extensive data without performance bottlenecks.
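To illustrate why that distinction matters, the sketch below assumes a `status` column and shows the classic pitfall: filtering on a column with `chunk()` while updating that same column inside the closure shifts the OFFSET window and silently skips records, whereas `chunkById()` visits every matching row exactly once.

// Pitfall (sketch): chunk() + updating the filtered column skips rows,
// because processed rows drop out of the result set and shift the OFFSET window.
Product::where('status', 'pending')->chunk(1000, function ($products) {
    foreach ($products as $product) {
        $product->update(['status' => 'processed']);
    }
});

// Safe alternative: chunkById() pages by primary key, so updating the
// filtered column does not change which rows are visited next.
Product::where('status', 'pending')->chunkById(1000, function ($products) {
    foreach ($products as $product) {
        $product->update(['status' => 'processed']);
    }
});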