Hello all,
I was wondering, can FasterCSV be used to clean a csv file of unwanted
characters, endline spaces, tab spaces and other characters?
I have a csv file 6,000 lines long and it comes from another department.
I have been working to read this file into a database via FasterCSV.
When I start to read the file in, I get the dreaded
"FasterCSV::MalformedCSVError" error message. I've worked with this in
the past and the solution was to "clean" the data. EOL spaces and other
undesired characters were scattered within the file.
> I was wondering, can FasterCSV be used to clean a csv file of unwanted
> characters, endline spaces, tab spaces and other characters?
I believe the answer to this question is yes. I've already shown you how to modify headers as they are read. There's a similar system for normal fields.
Now, this does not modify the file on disk, obviously. If you want to do that, it would be best to write out a new and corrected file. You could then rename the new file to replace the old, if needed.
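A rough sketch of that field-converter approach, writing the cleaned rows to a new file. The file names are made up for the example, and the `FasterCSV = CSV` fallback relies on FasterCSV having become Ruby 1.9's standard csv library:

```ruby
begin
  require "fastercsv"      # the gem, on Ruby 1.8
rescue LoadError
  require "csv"            # FasterCSV became 1.9's standard csv library
  FasterCSV = CSV
end

# Fake up some messy data: stray spaces and tabs around the fields.
File.open("dirty.csv", "w") { |f| f << "name , dept \t\nBob ,  Sales\n" }

# A converter is just a block called with each field as it is parsed.
clean = lambda { |field| field && field.strip }

FasterCSV.open("clean.csv", "w") do |out|
  FasterCSV.foreach("dirty.csv", :converters => clean) do |row|
    out << row
  end
end

puts File.read("clean.csv")  # name,dept / Bob,Sales
```

Note that a converter only runs on fields the parser could already read, so truly malformed lines will still raise the error before any cleaning happens.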
> I have a csv file 6,000 lines long and it comes from another department.
> I have been working to read this file into a database via FasterCSV.
> When I start to read the file in, I get the dreaded
> "FasterCSV::MalformedCSVError" error message. I've worked with this in
> the past and the solution was to "clean" the data. EOL spaces and other
> undesired characters were scattered within the file.
FasterCSV does need valid CSV data, yes. If you have non-valid CSV data, it will need to be cleaned before FasterCSV can read it.
> Is it possible to use FasterCSV for this task?
If your data is valid CSV, FasterCSV can read and transform it. If your data is not valid CSV, you will need to fix it before FasterCSV can read it.
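For the curious, the failure mode is easy to reproduce with something like an unclosed quote. This is just an illustrative sketch, not data from the file in question:

```ruby
begin
  require "fastercsv"      # the gem, on Ruby 1.8
rescue LoadError
  require "csv"            # FasterCSV became 1.9's standard csv library
  FasterCSV = CSV
end

# Line 2 opens a quoted field that is never closed:
bad = "name,notes\nBob,\"an unclosed quote\n"

err = begin
        FasterCSV.parse(bad)
        nil
      rescue FasterCSV::MalformedCSVError => e
        e
      end

puts err.message   # e.g. "Unclosed quoted field in line 2."
```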
I have read and imported simple CSV data into a DB in the past and never
had problems like this file.
> FasterCSV does need valid CSV data, yes. If you have non-valid CSV
> data, it will need to be cleaned before FasterCSV can read it.
The question of validity is confusing. When a file is saved via
MS-Excel, isn't it valid?
I understand that there are several different formats (MS-DOS, Windows,
MAC) when considering saving a CSV. Which one should be chosen?
>> FasterCSV does need valid CSV data, yes. If you have non-valid CSV
>> data, it will need to be cleaned before FasterCSV can read it.
> The question of validity is confusing. When a file is saved via
> MS-Excel, isn't it valid?
I expect FasterCSV would read the files Excel saves, yes. (Hope I don't end up regretting that.)
> I understand that there are several different formats (MS-DOS, Windows,
> MAC) when considering saving a CSV. Which one should be chosen?
I doubt it matters. The only difference should be line endings, which FasterCSV is going to try to guess anyway.
I'm betting your problem is one of encodings. Do you have some non-ASCII data in the spreadsheet? If that data is not UTF-8, it could be tripping up FasterCSV's parser.
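If the culprit does turn out to be a single-byte encoding such as Latin-1, the file can be transcoded to UTF-8 before FasterCSV ever sees it. A sketch using Ruby 1.9's String#encode; the file names and the Latin-1 guess are assumptions:

```ruby
# Fake up a Latin-1 file for the demonstration (0xE9 is e-acute):
File.open("report.csv", "wb") { |f| f << "name,city\nRen\xE9,Qu\xE9bec\n" }

# Read the raw bytes, declare their real encoding, transcode to UTF-8:
raw  = File.open("report.csv", "rb") { |f| f.read }
utf8 = raw.force_encoding("ISO-8859-1").encode("UTF-8")
File.open("report-utf8.csv", "w:UTF-8") { |f| f << utf8 }

puts File.read("report-utf8.csv")  # name,city / René,Québec
```

The hard part in practice is knowing the source encoding; `encode` will raise if a byte has no mapping, which at least fails loudly instead of producing garbage.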
Well,
As you suggested below, it was a problem with encoding.
> I'm betting your problem is one of encodings. Do you have some
> non-ASCII data in the spreadsheet? If that data is not UTF-8, it could
> be tripping up FasterCSV's parser.
What I did was I took the file and opened it in 'TextEdit', a Mac version
of MS-Notepad. Saved the file as UTF-8 and everything took off.
I'm going to try the same task in MS-Notepad (when I fire up the Windows
VM).
> Well,
> As you suggested below, it was a problem with encoding.
>> I'm betting your problem is one of encodings. Do you have some
>> non-ASCII data in the spreadsheet? If that data is not UTF-8, it could
>> be tripping up FasterCSV's parser.
> What I did was I took the file and opened it in 'TextEdit', a Mac version
> of MS-Notepad.
TextEdit isn't a Mac version of Notepad. It's a text editor that comes
with the OS, but the similarities end there.
> Saved the file as UTF-8 and everything took off.
> I'm going to try the same task in MS-Notepad (when I fire up the Windows
> VM).
It may not work. Notepad is ridiculously underpowered compared to
TextEdit.