Monday, May 28, 2012

Import Data, Parse It, and Crunch It

These days I import a lot of data and massage it one way or another. We are unraveling an interesting technical issue: How does mild electrical stimulation of the spinal cord (aka neuromodulation) stop chronic, debilitating pain signals? No one knows. We are close to presenting the most detailed theory yet elucidated. The project was conceived by Dr. Jeffrey Arle (neurosurgeon) and Dr. Jay Shils (electrical engineer and neurophysiologist), pioneers of neuromodulation. Jeff Arle came up with the key insight into how neuromodulation might work, and it appears that, in concept, he is right (look for our article later this year).




Researchers around the world are exploring neuromodulation for diverse disorders such as chronic pain, movement disorders, depression and drug dependence.

Drs. Arle and Shils lead the way with their new book, Essential Neuromodulation (Academic Press 2011, 504 pp), which compiles critical work by the field's major contributors and provides a springboard for new applications.







Nick Iftimia did the initial, very detailed research on the spinal cord. I built the initial 2200-connection model and then Nick built the more sophisticated 11,000-connection model. Then I used SolidWorks computer-aided design (CAD) software to build an accurate 3D model of three spinal cord segments and various electrical stimulators, of which the "business end" is an array of platinum-iridium electrodes.

Jay Shils selected COMSOL for our finite-element analysis and I imported the SolidWorks model into it and modeled the physics, that is, the electric potential field generated by the stimulator electrode array. Then I export part of the model, import it into Mathematica, and crunch the data for further processing by Longzhi Mei, our chief programmer, who works in C++ and Java.

All that is a prelude to how I use Mathematica in just this one portion of the project (I use it extensively elsewhere, too). Here is how I typically proceed. First I set a file path.

COMSOLFilePath="C:\\Users\\82V\\Documents\\Neuroscience\\Arle-Shils\\COMSOL\\Dura Thickening-Scar Studies\\Multi-Level-Scar-Studies\\5-Level Scar Model with Revised Geometry\\Calibration to Lee";

Then I set the working directory, using FileNameDrop to strip the last element of the path.

SetDirectory@FileNameDrop[COMSOLFilePath,-1]

C:\Users\82V\Documents\Neuroscience\Arle-Shils\COMSOL\Dura Thickening-Scar Studies\Multi-Level-Scar-Studies\5-Level Scar Model with Revised Geometry\Calibration to Lee

I might check the FileNames in the directory.

FileNames@"*.txt"

{New-Control-WhiteMatter-1p5V-300x20x300.txt,New-Control-WhiteMatter-1p65V-300x20x300.txt,New-Control-WhiteMatter-1p6V-300x20x300.txt,New-Control-WhiteMatterV-Cathode-3v_Anodes3v_300x20x300.txt,Old-Control-WhiteMatterV-300x20x300.txt,Old-Control-WhiteMatterV-30x20x30.txt}
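FileNames accepts wildcard string patterns, so when the directory gets crowded I can narrow the listing. For example (the pattern here is just for illustration), this would pick out just New-Control-WhiteMatter-1p5V-300x20x300.txt from the list above:

FileNames@"New-Control-*1p5V*"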

Then, for in-Notebook use, I just use FileNameSetter to create a convenient button that brings up the familiar Windows dialog box for navigating to, and selecting, a file to process.

FileNameSetter@Dynamic@COMSOLFilePath
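A small variation I sometimes use (a sketch; see the FileNameSetter documentation for the exact file-pattern argument) restricts the dialog to the COMSOL .txt exports:

FileNameSetter[Dynamic@COMSOLFilePath,"Open",{"COMSOL exports"->{"*.txt"}}]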







FileNameSetter with Dynamic allows using the button repeatedly to select different files. Therefore I might need to check which file path is currently assigned to the variable COMSOLFilePath.

COMSOLFilePath

C:\Users\82V\Documents\Neuroscience\Arle-Shils\COMSOL\Dura Thickening-Scar Studies\Multi-Level-Scar-Studies\5-Level Scar Model with Revised Geometry\Calibration to Lee\VMax@-0p65.txt

Now, obviously all those housekeeping functions are preliminary to doing something interesting with the data in the file. I Import it; I've learned that the "Table" format usually gives me the cleanest Mathematica representation. (A great feature of Mathematica is that it supports well over a hundred Import and Export formats.)

mesh1p12=Import[COMSOLFilePath,"Table"];
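As an aside, if you are curious about that "well over a hundred formats" claim, the registered format names are available directly, so you can count them yourself:

{Length@$ImportFormats, Length@$ExportFormats}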

I take a quick look at the data file header and the first few lines of data, which in this case are {x, y, z, V} records, where V is the electric potential in volts.

mesh1p12[[1;;15]]

{{%,Model:,5-level-Scar-Model-revised-Geometry-Control-Finer.mph},{%,Version:,COMSOL,4.2.0.150},{%,Date:,May,28,2012,,15:57},{%,Dimension:,3},{%,Nodes:,274151},{%,Expressions:,1},{%,Description:,Electric,potential},{%,x,y,z,V,(V)},{16.3066,4.64461,-1.77884,0.001474},{16.5566,4.47377,-1.76414,0.001851},{16.3066,4.36936,-1.74021,0.00151},{16.3066,4.48662,-1.96791,0.001486},{16.8066,4.36807,-1.73984,0.002314},{16.5556,4.17127,-1.65972,0.001872},{16.6095,4.24711,-1.89929,0.001917}}

I might check the Length of the file; here the Length measures the number of finite-element mesh nodes plus the header lines (8 of them).

mesh1p12//Length

274159
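A quick sanity check I sometimes run (my own habit, not part of the original workflow): the COMSOL header rows all begin with "%", so I can count them and confirm that the remaining rows match the Nodes count reported in the header.

Cases[mesh1p12,{"%",__}]//Length

8

Length@mesh1p12-8

274151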

Now I tell Mathematica to skip the header (start at Part 9, go to the end) and then find the Min of just column 4, the electric potential.

Min@mesh1p12[[9;;,4]]

-1.12
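In the same one-liner spirit, here is a sketch (not in the original post) that gives the Min and Max of each coordinate column, i.e. the bounding box of the imported mesh:

{Min@#,Max@#}&/@Transpose@mesh1p12[[9;;,1;;3]]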

Those are just typical Mathematica "one-liners." Now I need to do something more sophisticated, and repeatedly. The resulting function is still concise, thanks to Mathematica incorporating a high-level programming language. Just to give you an idea of what is going on, here is a slice plot of the data at one y-value. This is what the computer in your spinal cord looks like. The gray matter (the neurons, or processors) is the internal symmetric silhouette with the "horns"; the white matter (the wiring, or bus lines) is the ellipse surrounding the gray matter. The white matter, that is, the wires, receives signals from the peripheral nerves and the brain, feeds them to the gray matter (the computer that processes them), and then relays them back to the brain or to muscle servomotors.
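(slicePlotXZ isn't defined above; here is one way it might be built from mesh1p12, with the slice value and tolerance chosen purely for illustration.)

ySlice=4.5;tol=0.05; (* assumed values, for illustration only *)
slicePlotXZ=Cases[mesh1p12[[9;;]],{x_,y_,z_,_}/;Abs[y-ySlice]<tol:>{x,z}];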

ListPlot[slicePlotXZ,PlotRange->All,AspectRatio->Automatic]



Here's an overview of what the function does, as it's written in my Notebook. "NaN" is COMSOL-speak for "not a number," i.e., no numerical value, so I first want to get rid of all those records. Then, since we are only interested in the dorsal (back) side of the spinal cord, I drop all records with x<10, leaving the dorsal-side records to Export:

"This will 1) remove all NaN entries and 2) all records in the ventral side of the cord, as determined by the reductionCriterion, which has to be in the form, for example, {x_, ___} /; x<10 . Remember the reduction criterion is taken by DeleteCase, not Cases!"

And here's the function. For my convenience and edification, I ask it to Print various things I want to watch: how long it takes to Import these large files, the Length of the initial file, the Length of the reduced file, a peek at the first dozen reduced records, and how long it takes to Export the much smaller crunched file. It returns the size of the reduced file in bytes.

COMSOL File Reducer

Clear@COMSOLFileReducer2;
COMSOLFileReducer2[COMSOLFilePath_String,reductionCriterion_]:=
Module[{COMSOLOutputFile=COMSOLFilePath,COMSOLwhiteMatterDataFile,COMSOLwhiteMatterDataFileNoNaN,COMSOLwhiteMatterDataFileReducedDomain,exportReducedFileName},
SetDirectory@FileNameDrop[COMSOLFilePath,-1]; (* work in the file's own folder *)
Print@Timing[COMSOLwhiteMatterDataFile=Import[COMSOLOutputFile,"Table"];]; (* Import time *)
Print@Length@COMSOLwhiteMatterDataFile; (* Length of the initial file *)
COMSOLwhiteMatterDataFileNoNaN=DeleteCases[COMSOLwhiteMatterDataFile,{___,"NaN"}]; (* drop NaN records *)
COMSOLwhiteMatterDataFileReducedDomain=DeleteCases[COMSOLwhiteMatterDataFileNoNaN,reductionCriterion]; (* drop records matching the reductionCriterion, e.g. the ventral side *)
Print@Length@COMSOLwhiteMatterDataFileReducedDomain; (* Length of the reduced file *)
Print@COMSOLwhiteMatterDataFileReducedDomain[[1;;12]]; (* peek at the first dozen reduced records *)
exportReducedFileName="Reduced"<>FileNameTake@COMSOLOutputFile;
Print@Timing[Export[exportReducedFileName,COMSOLwhiteMatterDataFileReducedDomain,"Table"];]; (* Export time *)
FileByteCount@exportReducedFileName] (* return the size of the reduced file in bytes *)

And finally, here is the function in operation, starting with the command line, followed by the Print output, which as mentioned is just for my purview. The essential function reduces the data file to essentials and Exports it in the appropriate format. You can see how much less time it takes to Export the crunched file of just 1.5 MB than to Import the original, which typically is 90 MB.

In[25]:= COMSOLFileReducer2[COMSOLFilePath,{x_,y_,z_,V_}/;x<10]

During evaluation of In[25]:= {113.881,Null}
During evaluation of In[25]:= 1800008
During evaluation of In[25]:= 36942
During evaluation of In[25]:= {{%,Model:,5-level-Scar-Model-revised-Geometry-Control-Finer.mph},{%,Version:,COMSOL,4.2.0.150},{%,Date:,May,28,2012,,12:29},{%,Dimension:,3},{%,Nodes:,1800000},{%,Expressions:,1},{%,Description:,Electric,potential,in,White,Matter},{%,x,y,z,V,(V)},{10.0067,-55.9804,-6.09667,0.0000541505},{10.1067,-55.9804,-6.09667,0.0000542645},{10.2067,-55.9804,-6.09667,0.0000543788},{10.3067,-55.9804,-6.09667,0.0000544941}}
During evaluation of In[25]:= {3.62,Null}
Out[25]= 1559151
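If I wanted to crunch every COMSOL export in the folder in one pass, a natural extension (a sketch, not part of the original workflow; note that a second run would also pick up the "Reduced" files it wrote the first time) is to Map the reducer over the file list:

COMSOLFileReducer2[#,{x_,y_,z_,V_}/;x<10]&/@FileNames["*.txt",FileNameDrop[COMSOLFilePath,-1]]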








Examples of Prefix Usage and Precedence Analysis

As stated in the Mathematica archives and this blog (two parts), one of the unique aspects of my book is extensive, pervasive usage of Prefix to simplify code and save keystrokes. Like any usage of abbreviated operators, this can require knowledge of operator Precedence.


First, here's a common example of abbreviated-operator precedence without Prefix. Do we need parens around the exponent, -r t?

Plot[E^-r t/.{r->.5},{t,-2,10}]














The graph shows something is wrong, and Precedence analysis tells us why: Power binds more tightly than Times, so E^-r t parses as (E^-r) t rather than E^(-r t). Yes, we need parens.

Precedence/@{Power,Minus,Times,ReplaceAll}

{590.,480.,400.,110.}
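Another quick check (not in the original post) is to wrap the expression in Hold and look at its FullForm, which shows exactly how it parses:

Hold[E^-r t]//FullForm

Hold[Times[Power[E,Times[-1,r]],t]]

The t multiplies the whole exponential instead of sitting in the exponent, so we add the parens: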

Plot[E^(-r t)/.{r->.5},{t,-2,10}]















Here's a typical example using Prefix. Will this function work as intended? (It's from S11: What's New in Mathematica 8 by Paul Wellin.)

f1@x_ := 1/x Cos[Log@x/x]; Plot[f1@x, {x, 0, .01}]

What would speed up Precedence analysis is seeing an operator's Precedence pop up on Mouseover, like a Tooltip. A more minor recommendation to WRI is to make Precedence Listable to avoid needing Map, but that only saves one keystroke, and I'm after much bigger game in recommending extensive usage of Prefix.
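In the meantime, a homemade stopgap is easy. This little helper (my own sketch, not a built-in feature) wraps a symbol in a Tooltip that shows its Precedence on Mouseover in the Notebook:

precedenceTip[op_Symbol]:=Tooltip[op,Precedence@op]
precedenceTip/@{Divide,Times,Prefix}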

A tip is to list the operators in the order they appear in your function; then it's easier to see which are 'stickier' than if they are out of order. Using Precedence, we see that the function will work as intended: Prefix binds more tightly than Divide, which binds more tightly than Times, so Log@x/x parses as Log[x]/x and 1/x Cos[...] parses as (1/x) Cos[...].


Unprotect@Precedence; SetAttributes[Precedence,Listable]; Protect@Precedence; Precedence@{Divide,Times,Prefix}

{470.,400.,640.}


f1@x_ := 1/x Cos[Log@x/x]; Plot[f1@x, {x, 0, .01}]














Now that we know the Precedence of Prefix and Divide, we can write the following interesting usage of Range with impunity; we don't need Table or Map. Range@5/6 parses as (Range@5)/6, and it works because Divide is Listable.

Range@5/6

{1/6,1/3,1/2,2/3,5/6}
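For comparison, the fully parenthesized form gives the same result; Listable Divide simply threads the division over the list.

Range[5]/6

{1/6,1/3,1/2,2/3,5/6}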