Friday, March 21, 2014

String Processing: Compose Last Name + First Initial and Create Email Address

I've imported a List of teachers' names. Now I need to create usernames and email addresses for them to upload to an app. Here is the List; note there are nested Lists within it.

In[226]:= teacherNames={{{"Bazan, Dorothy","Bleakney, Tom"},{"Calorio, Marie","Cardoso, Brian","Carpenito, Courtney","Carton, Jacqueline"},{"Coffey, Tara","Conaty, Thomas"}},{{"Davenport, Joseph","DiSabatino, Jennifer"},{"Doran, Robert","Dsida, Terri"},"Eramo, Paula","Needle, Julie",{"O'Connell, Marisa","O'Neil, Christopher"}}};

The nesting is just going to get in the way, so let's remove the extra braces with Flatten. Incidentally, you need not hesitate to build up deeply nested Lists in a function and then Flatten them all at once, which is very fast. To keep seeing these names as Strings, we apply InputForm in postfix form (//) to display the quotation marks.
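
As a minimal sketch of what Flatten does (the short Lists below are made up for illustration and are not part of the teacher data), the default call removes every level of nesting, while an optional second argument limits how many levels are removed:

In[ ]:= Flatten[{{"a",{"b"}},"c",{{{"d"}}}}]
Out[ ]= {a,b,c,d}

In[ ]:= Flatten[{{"a",{"b"}},"c"},1]
Out[ ]= {a,{b},c}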

In[227]:= teacherNamesFlattened=Flatten@teacherNames;teacherNamesFlattened//InputForm

{"Bazan, Dorothy", "Bleakney, Tom", "Calorio, Marie", "Cardoso, Brian",
 "Carpenito, Courtney", "Carton, Jacqueline", "Coffey, Tara", "Conaty, Thomas",
 "Davenport, Joseph", "DiSabatino, Jennifer", "Doran, Robert", "Dsida, Terri",
 "Eramo, Paula", "Needle, Julie", "O'Connell, Marisa", "O'Neil, Christopher"}

Let's lose the apostrophes. It's as simple as using a replacement Rule that sends the apostrophe to the empty String ("'" -> ""). Without InputForm we don't see the quotation marks, but a String remains a String unless we explicitly convert it, for example with ToExpression.

In[228]:= teacherNamesFlattenedCleaned=StringReplace[teacherNamesFlattened,"'"->""]
Out[228]= {Bazan, Dorothy,Bleakney, Tom,Calorio, Marie,Cardoso, Brian,Carpenito, Courtney,Carton, Jacqueline,Coffey, Tara,Conaty, Thomas,Davenport, Joseph,DiSabatino, Jennifer,Doran, Robert,Dsida, Terri,Eramo, Paula,Needle, Julie,OConnell, Marisa,ONeil, Christopher}
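
StringReplace also accepts a List of Rules, so several unwanted characters can be removed in one pass. The following is only a sketch on an invented name (the hyphen Rule is an assumption about other cleanup you might want, not something the data above requires); InputForm confirms that the result is still a String:

In[ ]:= StringReplace["O'Neil-Smith, Pat",{"'"->"","-"->""}]//InputForm
Out[ ]//InputForm= "ONeilSmith, Pat"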

To compose usernames, we'll use a standard format, for instance lastName + firstInitial. We use StringReplace, which takes one or more replacement Rules as arguments. We construct a pattern that splits each name into its last-name and first-name pieces. Note that named patterns such as lastName__ (a named BlankSequence) work in this type of String pattern matching.

Since I'm now going to perform several operations on the source List, for clarity I will use Postfix with Function syntax and avoid deeply nested, hard-to-read code. The rationale behind this syntax is in A New Mathematica Programming Style Part 2.

This code returns the right usernames, but it also generates messages. What is going wrong?

In[229]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__<>", "<>firstName__:>lastName<>First@Characters@firstName]&

During evaluation of In[229]:= StringJoin::string: String expected at position 1 in lastName__<>, <>firstName__. >>
During evaluation of In[229]:= StringJoin::string: String expected at position 3 in lastName__<>, <>firstName__. >>

Out[229]= {BazanD,BleakneyT,CalorioM,CardosoB,CarpenitoC,CartonJ,CoffeyT,ConatyT,DavenportJ,DiSabatinoJ,DoranR,DsidaT,EramoP,NeedleJ,OConnellM,ONeilC}

The messages arise because StringJoin ("<>") is used correctly on the right-hand side of the Rule to compose the extracted Strings, but incorrectly on the left-hand side to build the pattern: StringJoin expects Strings, not patterns. Using the correct operator, StringExpression ("~~"), solves the problem. Let's also convert the usernames to lower case. Since ToLowerCase is Listable, we don't need to Map it over the usernames.

In[230]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>lastName<>First@Characters@firstName]&//ToLowerCase

Out[230]= {bazand,bleakneyt,caloriom,cardosob,carpenitoc,cartonj,coffeyt,conatyt,davenportj,disabatinoj,doranr,dsidat,eramop,needlej,oconnellm,oneilc}
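
To see the pattern at work on a single name, here is a small sketch using StringMatchQ and StringCases, two other functions that accept StringExpression patterns. The names last and first are arbitrary pattern names chosen for this illustration:

In[ ]:= StringMatchQ["Bazan, Dorothy",__~~", "~~__]
Out[ ]= True

In[ ]:= StringCases["Bazan, Dorothy",last__~~", "~~first__:>{last,first}]
Out[ ]= {{Bazan,Dorothy}}

By default __ matches as much as it can, so a name containing more than one ", " would be split at the last one.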

We can just as easily compose usernames with a reversed form, firstInitial + lastName.

In[231]:= userNames2=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>First@Characters@firstName<>lastName]&//ToLowerCase

Out[231]= {dbazan,tbleakney,mcalorio,bcardoso,ccarpenito,jcarton,tcoffey,tconaty,jdavenport,jdisabatino,rdoran,tdsida,peramo,jneedle,moconnell,coneil}
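
An alternative way to get the first initial, offered here as a variation rather than as the method used above, is StringTake[firstName,1], which returns the first character directly without building a List of characters first:

In[ ]:= StringReplace["ONeil, Christopher",lastName__~~", "~~firstName__:>StringTake[firstName,1]<>lastName]//ToLowerCase
Out[ ]= coneil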

To compose email addresses in a similar fashion, we'll use a template form:

In[232]:= emailAddresses=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>firstName<>"."<>lastName<>""]&//ToLowerCase

Out[232]= {,,,,,,,,,,,,,,,}

Now I'd like to make one List with both username and email address. Here is a typical use of Thread; writing the List with braces, {userNames2,emailAddresses}, instead of Thread[List[userNames2,emailAddresses]] makes the code easier to read.

In[233]:= namesAndEmails=Thread[{userNames2,emailAddresses}]

Out[233]= {{dbazan,},{tbleakney,},{mcalorio,},{bcardoso,},{ccarpenito,},{jcarton,},{tcoffey,},{tconaty,},{jdavenport,},{jdisabatino,},{rdoran,},{tdsida,},{peramo,},{jneedle,},{moconnell,},{coneil,}}
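
As a small sketch of what Thread does here (the two short Lists are made up for illustration), it pairs up corresponding elements; for two Lists of equal length, Transpose gives the same result:

In[ ]:= Thread[{{"a","b","c"},{1,2,3}}]
Out[ ]= {{a,1},{b,2},{c,3}}

In[ ]:= Transpose[{{"a","b","c"},{1,2,3}}]
Out[ ]= {{a,1},{b,2},{c,3}}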

Finally I'll Export the List to a file in my Documents directory. It is safest to use FileNameJoin to compose the file path, since it assembles the parts with the correct separators for your operating system.

In[234]:= Export[FileNameJoin@{$UserDocumentsDirectory,"HomeTownUserNames.xlsx"}, namesAndEmails]

Out[234]= C:\Users\kwcarlso\Documents\HomeTownUserNames.xlsx
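
To see what FileNameJoin does with the separators, here is a sketch using its OperatingSystem option on made-up path parts (the directory names are placeholders, not real locations):

In[ ]:= FileNameJoin[{"dir","sub","file.xlsx"},OperatingSystem->"Windows"]
Out[ ]= dir\sub\file.xlsx

In[ ]:= FileNameJoin[{"dir","sub","file.xlsx"},OperatingSystem->"Unix"]
Out[ ]= dir/sub/file.xlsx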
