Understanding performance differences: JavaScript vs WebAssembly in CSV parsing — A deep dive
When migrating a CSV parser from JavaScript to WebAssembly (Rust), I encountered some surprising results. In this post I’ll dive into the technical details of both implementations and explore why the JavaScript version performed slightly better.
The Implementations
Both parsers handle similar tasks:
- Split input into lines
- Parse headers
- Process data rows
- Convert values to appropriate types (numbers/strings)
Redacted/Simplified JavaScript Version
function parseCSV(csvString) {
  const lines = csvString?.trim()?.split('\n');
  // Match either a quoted field or a run of non-comma characters,
  // then strip surrounding quotes and whitespace from each header
  const headers = lines[0]?.match(/(".*?"|[^,]+)/g)
    ?.map(header => header.replace(/^"(.*)"$/, '$1').trim());
  return lines?.slice(1).map(line => {
    const values = line.match(/(".*?"|[^,]+)/g) || [];
    const row = {};
    headers?.forEach((header, index) => {
      const value = values[index]?.replace(/^"(.*)"$/, '$1').trim() || '';
      // Coerce numeric fields to numbers, keep everything else as strings
      const numValue = Number(value);
      row[header] = isNaN(numValue) ? value : numValue;
    });
    return row;
  });
}
Redacted/Simplified Rust Version
use serde_wasm_bindgen::to_value;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn parse_csv(csv_string: Option<String>) -> Result<JsValue, JsValue> {
    // Treat a missing input as an empty string
    let csv_string = csv_string.unwrap_or_default();
    let lines: Vec<&str> = csv_string.trim().split('\n').collect();
    let headers: Vec<String> = lines[0].split(',')
        .map(|h| h.trim().trim_matches('"').to_string())
        .collect();
    let rows: Vec<Row> = lines[1..].iter()
        .map(|line| {
            let values: Vec<&str> = line.split(',')
                .map(|v| v.trim().trim_matches('"'))
                .collect();
            // Create row with type conversion...
        })
        .collect();
    Ok(to_value(&rows)?)
}
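The Row type and the per-field type conversion are redacted above. As a rough sketch of what that step can look like, assuming the result is a serde-serializable map keyed by header (Cell, Row, and build_row below are illustrative names, not the original code):
use serde::Serialize;
use std::collections::BTreeMap;

// Illustrative cell type: serializes as either a JS number or a JS string
#[derive(Serialize)]
#[serde(untagged)]
enum Cell {
    Num(f64),
    Text(String),
}

// Illustrative row type: header name -> typed value, mirroring the JS objects
type Row = BTreeMap<String, Cell>;

fn build_row(headers: &[String], values: &[&str]) -> Row {
    headers.iter()
        .zip(values.iter())
        .map(|(header, value)| {
            // Roughly mirrors the JS Number()/isNaN check:
            // numeric fields become numbers, everything else stays a string
            let cell = match value.parse::<f64>() {
                Ok(n) => Cell::Num(n),
                Err(_) => Cell::Text(value.to_string()),
            };
            (header.clone(), cell)
        })
        .collect()
}
In that version, the closure in parse_csv would end with something like build_row(&headers, &values).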
Performance Analysis
The JavaScript version consistently performed 1–2ms faster than the WebAssembly version. Here’s why:
1. Serialization Overhead (Most Significant Factor)
The biggest performance hit comes from the serialization/deserialization required every time data crosses the JS-WASM boundary. As the sketch after the list below shows, the Rust code needs to:
// This conversion has measurable cost
#[wasm_bindgen]
pub fn parse_csv(csv_string: Option<String>) -> Result<JsValue, JsValue>
- Convert JavaScript string input to Rust String
- Convert Rust structures back to JavaScript objects
- Use serde for serialization (via the serde-wasm-bindgen implementation)
- Use wasm-bindgen for type conversion (wasm-bindgen string conversion)
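To make those boundary costs concrete, here is a rough sketch of where they land in a wasm-bindgen export (the Demo struct and boundary_demo function are illustrative, not part of the parser):
use serde::Serialize;
use serde_wasm_bindgen::to_value;
use wasm_bindgen::prelude::*;

#[derive(Serialize)]
struct Demo {
    name: String,
    value: f64,
}

#[wasm_bindgen]
pub fn boundary_demo(input: Option<String>) -> Result<JsValue, JsValue> {
    // 1. Before this body runs, wasm-bindgen has already copied the incoming
    //    JS string (UTF-16) into linear memory and re-encoded it as UTF-8
    let input = input.unwrap_or_default();

    // ... the actual Rust work happens here, entirely inside linear memory ...
    let result = Demo { name: input, value: 42.0 };

    // 2. serde-wasm-bindgen then walks the Rust value and builds the
    //    corresponding JS object on the JS heap, one field at a time
    Ok(to_value(&result)?)
}
Note that the second step scales with the size of the parsed output: every row in the CSV becomes a separate JS object that has to be built across the boundary.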
2. V8’s String Optimization
V8’s string handling and garbage collector are highly optimized for exactly this kind of string-heavy work:
const values = line.match(/(".*?"|[^,]+)/g) || [];
V8’s string handling benefits from:
- Rope-like data structures for string concatenation (V8 Blog: String data structures)
- An optimized, backtracking regular expression engine (V8 Blog: Regexp optimization)
The Rust version, in contrast:
let values: Vec<&str> = line.split(',')
    .map(|v| v.trim().trim_matches('"'))
    .collect();
Each trim() and trim_matches() creates new string slices, requiring additional bounds checking.
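Note that trim() and trim_matches() return borrowed slices rather than fresh allocations; the per-field copies come later, when those slices are turned into owned Strings for serialization. A small sketch (the function names are illustrative):
// Borrowed slices into `line`: no per-field string allocation yet, but each
// sub-slice involves boundary computation and bounds checks
fn split_fields(line: &str) -> Vec<&str> {
    line.split(',')
        .map(|v| v.trim().trim_matches('"'))
        .collect()
}

// The copies happen here: each field becomes an owned String so it can
// outlive `line` and be handed to serde when crossing the boundary
fn own_fields(line: &str) -> Vec<String> {
    split_fields(line)
        .into_iter()
        .map(|v| v.to_string())
        .collect()
}
Both implementations eventually pay for per-field strings; the WASM version pays a second time when those Strings are converted into JS strings at the boundary.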
3. Memory Layout and Access Patterns
JavaScript strings are immutable and optimized for:
- Fast substring operations
- Efficient character access through inline caching
// JS: Direct access to string data
const headers = lines[0].split(',');
// WASM: Must copy between linear memory and JS heap
let headers: Vec<String> = lines[0]
    .split(',')
    .map(|h| h.to_string())
    .collect();
WebAssembly memory access requires:
- Bounds checking on linear memory accesses
- Translation of data between the JS heap and linear memory
WebAssembly Memory Model: https://webassembly.github.io/spec/core/syntax/modules.html#memories
V8’s memory management: https://v8.dev/blog/trash-talk
Conclusion
The performance difference (1–2ms slower in WASM) is primarily due to:
- Serialization costs when crossing JS-WASM boundary
- V8’s highly optimized string handling
- Additional memory management overhead in WASM
I, for one, remain hopeful about the adoption of WASM. While this use case might not have been the best fit for it, there are certainly longer-running, compute-heavy tasks such as photo/video editing, as well as memory-intensive applications such as Figma, that benefit greatly from WebAssembly!