Friday, March 21, 2014

String Processing: Compose Last Name + First Initial and Create Email Address

I've imported a List of teachers' names. Now I need to create usernames and email addresses for them to upload to an app. Here is the List; note that there are nested Lists within it.

In[226]:= teacherNames={{{"Bazan, Dorothy","Bleakney, Tom"},{"Calorio, Marie","Cardoso, Brian","Carpenito, Courtney","Carton, Jacqueline"},{"Coffey, Tara","Conaty, Thomas"}},{{"Davenport, Joseph","DiSabatino, Jennifer"},{"Doran, Robert","Dsida, Terri"},"Eramo, Paula","Needle, Julie",{"O'Connell, Marisa","O'Neil, Christopher"}}};

The nesting is just going to get in the way so let's remove extra braces with Flatten. Incidentally, you need not hesitate to build up deeply nested Lists in a function and then Flatten them all at once, which is very fast. To keep seeing these names as Strings, we Postfix InputForm to display the quotation marks.

In[227]:= teacherNamesFlattened=Flatten@teacherNames;teacherNamesFlattened//InputForm

Out[227]//InputForm=
{"Bazan, Dorothy", "Bleakney, Tom", "Calorio, Marie", "Cardoso, Brian",
 "Carpenito, Courtney", "Carton, Jacqueline", "Coffey, Tara", "Conaty, Thomas",
 "Davenport, Joseph", "DiSabatino, Jennifer", "Doran, Robert", "Dsida, Terri",
 "Eramo, Paula", "Needle, Julie", "O'Connell, Marisa", "O'Neil, Christopher"}

Let's lose the apostrophes. It's as simple as using a replacement Rule that sends the apostrophe to the empty String ("'" -> ""). Without InputForm we don't see the quotation marks, but unless we use ToExpression to convert it, a String remains a String forever.

In[228]:= teacherNamesFlattenedCleaned=StringReplace[teacherNamesFlattened,"'"->""]
Out[228]= {Bazan, Dorothy,Bleakney, Tom,Calorio, Marie,Cardoso, Brian,Carpenito, Courtney,Carton, Jacqueline,Coffey, Tara,Conaty, Thomas,Davenport, Joseph,DiSabatino, Jennifer,Doran, Robert,Dsida, Terri,Eramo, Paula,Needle, Julie,OConnell, Marisa,ONeil, Christopher}

To compose usernames, we'll use a standard format, for instance, lastName + firstInitial. We use StringReplace, which takes one or more replacement Rules as arguments. We construct a pattern using deconstructed pieces of the first and last name. Note that named Blank patterns work in this type of String pattern matching.

Since I'm now going to perform several operations on the source List, I will, for clarity, use Postfix with Function syntax and avoid deeply nested, hard-to-read code. The rationale behind this syntax is in A New Mathematica Programming Style Part 2.

This code produces the right output, but it also generates error messages. What is going wrong?

In[229]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__<>", "<>firstName__:>lastName<>First@Characters@firstName]&

During evaluation of In[229]:= StringJoin::string: String expected at position 1 in lastName__<>, <>firstName__. >>
During evaluation of In[229]:= StringJoin::string: String expected at position 3 in lastName__<>, <>firstName__. >>

Out[229]= {BazanD,BleakneyT,CalorioM,CardosoB,CarpenitoC,CartonJ,CoffeyT,ConatyT,DavenportJ,DiSabatinoJ,DoranR,DsidaT,EramoP,NeedleJ,OConnellM,ONeilC}

The messages arise because StringJoin ("<>") is used correctly on the right-hand side of the Rule to compose the captured Strings, but incorrectly on the left-hand side to build the pattern: StringJoin expects actual Strings, not patterns. Using the correct operator for String patterns, StringExpression ("~~"), solves the problem. Let's also convert the usernames to all lower case. Since ToLowerCase is Listable, we don't need to Map it over the usernames.

In[230]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>lastName<>First@Characters@firstName]&//ToLowerCase

Out[230]= {bazand,bleakneyt,caloriom,cardosob,carpenitoc,cartonj,coffeyt,conatyt,davenportj,disabatinoj,doranr,dsidat,eramop,needlej,oconnellm,oneilc}
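To make the distinction concrete, here is a minimal sketch that inspects the Head of each kind of expression. It assumes lastName and firstName have no values assigned in the session; StringTake[firstName, 1] is shown only as an alternative to First@Characters@firstName, not as the method used above.

```mathematica
(* StringJoin concatenates actual Strings and returns a String. *)
joinedHead = Head["Bazan" <> ", " <> "Dorothy"];            (* String *)

(* StringExpression holds patterns and stays a symbolic String pattern. *)
patternHead = Head[lastName__ ~~ ", " ~~ firstName__];       (* StringExpression *)

(* Alternative way to take the first initial: StringTake instead of Characters. *)
oneUserName = StringReplace["Bazan, Dorothy",
   lastName__ ~~ ", " ~~ firstName__ :> lastName <> StringTake[firstName, 1]]
(* "BazanD" *)
```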

We can just as easily compose usernames with a reversed form, firstInitial + lastName.

In[231]:= userNames2=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>First@Characters@firstName<>lastName]&//ToLowerCase

Out[231]= {dbazan,tbleakney,mcalorio,bcardoso,ccarpenito,jcarton,tcoffey,tconaty,jdavenport,jdisabatino,rdoran,tdsida,peramo,jneedle,moconnell,coneil}

To compose email addresses, in a similar fashion, we'll use a template form, firstName.lastName@schoolDomain.org.

In[232]:= emailAddresses=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>firstName<>"."<>lastName<>"@hometownschools.org"]&//ToLowerCase

Out[232]= {dorothy.bazan@hometownschools.org,tom.bleakney@hometownschools.org,marie.calorio@hometownschools.org,brian.cardoso@hometownschools.org,courtney.carpenito@hometownschools.org,jacqueline.carton@hometownschools.org,tara.coffey@hometownschools.org,thomas.conaty@hometownschools.org,joseph.davenport@hometownschools.org,jennifer.disabatino@hometownschools.org,robert.doran@hometownschools.org,terri.dsida@hometownschools.org,paula.eramo@hometownschools.org,julie.needle@hometownschools.org,marisa.oconnell@hometownschools.org,christopher.oneil@hometownschools.org}

Now I'd like to make one List that pairs each username with its email address. This is a typical use of Thread; writing the List with braces, as in Thread[{userNames2,emailAddresses}], rather than as Thread[List[userNames2,emailAddresses]], makes the code simpler to read.

In[233]:= namesAndEmails=Thread[{userNames2,emailAddresses}]

Out[233]= {{dbazan,dorothy.bazan@hometownschools.org},{tbleakney,tom.bleakney@hometownschools.org},{mcalorio,marie.calorio@hometownschools.org},{bcardoso,brian.cardoso@hometownschools.org},{ccarpenito,courtney.carpenito@hometownschools.org},{jcarton,jacqueline.carton@hometownschools.org},{tcoffey,tara.coffey@hometownschools.org},{tconaty,thomas.conaty@hometownschools.org},{jdavenport,joseph.davenport@hometownschools.org},{jdisabatino,jennifer.disabatino@hometownschools.org},{rdoran,robert.doran@hometownschools.org},{tdsida,terri.dsida@hometownschools.org},{peramo,paula.eramo@hometownschools.org},{jneedle,julie.needle@hometownschools.org},{moconnell,marisa.oconnell@hometownschools.org},{coneil,christopher.oneil@hometownschools.org}}
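As a small cross-check on what Thread does here, the sketch below pairs two short Lists and compares the result with Transpose, which gives the same pairing for two Lists of equal length. The variable names users and emails are illustrative only.

```mathematica
(* Two short sample Lists taken from the results above. *)
users = {"dbazan", "tbleakney"};
emails = {"dorothy.bazan@hometownschools.org", "tom.bleakney@hometownschools.org"};

(* Thread pairs the elements position by position. *)
pairs = Thread[{users, emails}];
(* {{"dbazan", "dorothy.bazan@hometownschools.org"},
    {"tbleakney", "tom.bleakney@hometownschools.org"}} *)

(* For equal-length Lists, Transpose gives the same result. *)
samePairing = pairs === Transpose[{users, emails}]   (* True *)
```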

Finally, I'll Export the List to a file in my Documents directory. It is safest to use FileNameJoin to compose a file path, since it inserts the separators for you and returns the path in the correct format for your operating system.

In[234]:= Export[FileNameJoin@{$UserDocumentsDirectory,"HomeTownUserNames.xlsx"}, namesAndEmails]

Out[234]= C:\Users\kwcarlso\Documents\HomeTownUserNames.xlsx
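The following sketch shows what FileNameJoin does with a List of path components. The folder names are hypothetical, and the joined String is not shown because it depends on the operating system's separator.

```mathematica
(* Hypothetical path components; no file is created or read. *)
parts = {"Documents", "Exports", "HomeTownUserNames.xlsx"};

(* FileNameJoin inserts the separator appropriate to the current OS. *)
path = FileNameJoin[parts];

(* FileNameSplit recovers the original components on any OS. *)
roundTrip = FileNameSplit[path]
(* {"Documents", "Exports", "HomeTownUserNames.xlsx"} *)
```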

Sunday, March 9, 2014

Example of a File of Functions to Pre-Load at Startup

Here is a portion of my file of functions that I load on startup so they're available in every session. See also Load Functions at Startup Using Initialization Files.

Functions to Pre-Load


String Processing

Cleaners

trimWhiteSpace::usage="trimWhiteSpace is a simple string-trimming function that goes further than StringTrim: it removes every WhitespaceCharacter from the beginning, end, and middle of a String. It is Listable and is intended to Map over all fields in all records where all fields are Strings, no matter how deeply they are nested in Lists."

Clear@trimWhiteSpace;
SetAttributes[trimWhiteSpace,Listable];
trimWhiteSpace@stringFile_:=StringReplace[stringFile,WhitespaceCharacter->""];
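Here is a short usage sketch contrasting trimWhiteSpace with StringTrim. The definition is repeated so the cell runs on its own, and the sample data is made up for illustration.

```mathematica
(* Same definition as above, repeated so this cell is self-contained. *)
Clear@trimWhiteSpace;
SetAttributes[trimWhiteSpace, Listable];
trimWhiteSpace@stringFile_ := StringReplace[stringFile, WhitespaceCharacter -> ""];

(* Hypothetical sample: a nested List of Strings with stray whitespace. *)
sample = {" a b ", {"c\td", "e "}};

(* StringTrim only strips the ends, so the interior space survives. *)
trimmedEnds = StringTrim[" a b "];      (* "a b" *)

(* The Listable attribute threads the function through every level. *)
trimmedAll = trimWhiteSpace[sample]     (* {"ab", {"cd", "e"}} *)
```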

cleanseText::usage="cleanseText is a moderately thorough string-trimming function. The regular expression removes whitespace characters from the beginning and end of the String, including non-breaking spaces. StringReplace then removes apostrophes and parentheses and converts each remaining whitespace character to an ordinary space. It is Listable and is intended to Map automatically over all fields in all records where all fields are Strings, no matter how deeply they are nested in Lists. These rules should be expanded to include more data cleansing functions as needed."

Clear@cleanseText;
SetAttributes[cleanseText,Listable];
cleanseText@text_String:=StringReplace[text,RegularExpression["^(\\s)*|(\\s)*$"]->""]//StringReplace[#,{"'"->"",WhitespaceCharacter->" ","("|")"->""}]&;
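To see the two stages of cleanseText separately, this sketch applies the same regular expression and the same replacement rules to a made-up sample String, one step at a time.

```mathematica
(* Hypothetical sample with leading/trailing spaces, an apostrophe, and parentheses. *)
sample = "  O'Neil (Chris)  ";

(* Stage 1: the regular expression strips whitespace at both ends. *)
trimmed = StringReplace[sample, RegularExpression["^(\\s)*|(\\s)*$"] -> ""];
(* "O'Neil (Chris)" *)

(* Stage 2: drop apostrophes and parentheses; normalize whitespace to a space. *)
cleaned = StringReplace[trimmed,
   {"'" -> "", WhitespaceCharacter -> " ", "(" | ")" -> ""}]
(* "ONeil Chris" *)
```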

Web Page Functions


Make a URL from a String

makeURL::usage="makeURL takes a String to use as the prefix of the URL, cleans it up (removing whitespace and non-breaking spaces, which are often invisible, from both ends), puts the String in lower case, replaces spaces with hyphens, and finally appends '.html'. Example: 'My Home Page' -> 'my-home-page.html'."

Clear@makeURL;
SetAttributes[makeURL,Listable];
makeURL@urlName_String:=ToLowerCase@cleanseText@urlName//StringReplace[#,Whitespace->"-"]&//StringReplace[#,"---"->"-"]&//FileNameJoin[{#<>".html"}]&;
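The pipeline in makeURL can be traced one step at a time. The sketch below uses a hypothetical page name that already contains a hyphen, which shows why the "---" rule is there; the cleanseText step is left out because this sample has nothing for it to remove.

```mathematica
name = "Math - Grade 5";                                  (* hypothetical page name *)

lowered = ToLowerCase[name];                               (* "math - grade 5" *)

(* Each run of whitespace becomes a hyphen, so " - " turns into "---". *)
hyphenated = StringReplace[lowered, Whitespace -> "-"];    (* "math---grade-5" *)

(* Collapse the triple hyphen back to a single one. *)
collapsed = StringReplace[hyphenated, "---" -> "-"];       (* "math-grade-5" *)

url = collapsed <> ".html"                                 (* "math-grade-5.html" *)
```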

Load Functions at Startup Using Initialization Files

See also Example of a File of Functions to Pre-Load at Startup.

Mathematica forgets everything that has happened in a session, such as variable assignments and function definitions, when you quit the kernel. But you can use init.m files to automatically load anything from simple functions, to arbitrarily complex programs, or even external program calls that you want it to know or do on startup. When you accumulate functions that you want to always be available, you can save them in a text file or Package and load them on startup.

There are init.m files in a number of locations; the two intended for users are in the Kernel and FrontEnd folders of your $UserBaseDirectory. Kernel/init.m runs when you start the Kernel, while FrontEnd/init.m runs when you first open a Notebook during a Mathematica session. I recommend using the Kernel init.m, since the Front End init.m typically has a lot of stuff in it and since you may Quit the Kernel during a session and want to reload what's in your init.m file when you restart it.

First, locate your user init.m files. "Environment variables", which are parameters set for your local environment, i.e. your computer and user account, are one category of $commands. $UserBaseDirectory is for your personal use.

In[1]:= $UserBaseDirectory

Out[1]= "C:\\Users\\kwcarlso\\AppData\\Roaming\\Mathematica"

We can search for any init.m file in this directory. Note the inclusion of Infinity; without it, the search would leave out subdirectories.

In[2]:= initFiles = FileNames["init.m", $UserBaseDirectory, Infinity]

Out[2]= {"C:\\Users\\kwcarlso\\AppData\\Roaming\\Mathematica\\FrontEnd\\init.m", \
"C:\\Users\\kwcarlso\\AppData\\Roaming\\Mathematica\\Kernel\\init.m"}

Let's take a quick look at the contents of the Kernel init.m using FilePrint, which displays the contents of a text file as a text editor would, without executing anything. Here is my init.m file after I modified it. You can see that it will Print a message saying hello and telling me its location, then load some functions that I want available in every session.

In[3]:= FilePrint@Last@initFiles

(** User Mathematica initialization file **)

Print@"Hello World from your kernel init.m file! I'm located in:\n
C:\\Users\\kwcarlso\\AppData\\Roaming\\Mathematica\\Kernel."

<<"C:\\Users\\kwcarlso\\Dropbox\\Mathematica\\Initialization\\Pre-Load Functions.txt"


To set this up similarly for yourself:

  1. Locate your Kernel/init.m file as shown above.
  2. Copy and paste its filepath into your file manager (e.g. Windows Explorer) or otherwise navigate there.
  3. Open the init.m file with a text editor (e.g. Notepad).
  4. Add the Print statement with your init.m location.
  5. Add the filepath to your pre-load functions file.
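The steps above can also be sketched from inside Mathematica. This is only an outline under stated assumptions: the pre-load file location is hypothetical, nothing is written to disk, and the SystemOpen line is commented out so the cell has no side effects.

```mathematica
(* Step 1: build the expected location of the user Kernel init.m. *)
kernelInit = FileNameJoin[{$UserBaseDirectory, "Kernel", "init.m"}];
initExists = FileExistsQ[kernelInit];

(* Steps 2-3: uncomment to open the containing folder in your file manager. *)
(* SystemOpen[DirectoryName[kernelInit]] *)

(* Steps 4-5: compose the text to paste into init.m.
   The pre-load file path below is hypothetical; substitute your own. *)
preLoadFile = FileNameJoin[{$HomeDirectory, "Mathematica", "Pre-Load Functions.txt"}];
linesToAdd = StringJoin[
   "Print@", ToString["Hello from " <> kernelInit, InputForm], "\n",
   "<<", ToString[preLoadFile, InputForm]]
```

ToString with InputForm adds the quotation marks and escapes any backslashes, so the result can be pasted into init.m as is.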

The easiest way to prepare functions for automatic re-use is to save them in a text file. Two other methods are using Initialization Cells or Packages. Functions stored in a text file or a Package are loaded with Get (<<), which does execute them as it reads them in. Initialization Cells (menu: Cell/Cell Properties/Initialization Cell or /Initialization Group) are the most common method used by beginners.

Saving functions in a text file is a good intermediate step for beginners who aren't ready to create Packages. Just take care to start your function names with lowercase letters so you don't accidentally use the name of a built-in function, and Clear your definitions before defining them.

Clear@functionName; functionName[parameter1_,parameter2_,...]:=definition
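As one concrete instance of this template, here is a sketch of a two-parameter function that could live in a pre-load file. The name makeEmail and its usage text are hypothetical, chosen only to illustrate the lowercase-name and Clear-first conventions.

```mathematica
(* Clear any earlier definition, then document and define the function. *)
Clear@makeEmail;
makeEmail::usage = "makeEmail[name, domain] turns \"Last, First\" into first.last@domain in lower case.";

(* Hypothetical helper following the template: lowercase name, typed parameters. *)
makeEmail[name_String, domain_String] := ToLowerCase@StringReplace[name,
    last__ ~~ ", " ~~ first__ :> first <> "." <> last <> "@" <> domain];

oneEmail = makeEmail["Bazan, Dorothy", "hometownschools.org"]
(* "dorothy.bazan@hometownschools.org" *)
```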

Appendix: Locations of All Users' init.m Files


You can execute this cell to see where the initialization directories are located, first the one shared by all users and then the one for the current user:

In[4]:= {$BaseDirectory,$UserBaseDirectory}//TableForm

Out[4]//TableForm=
C:\ProgramData\Mathematica
C:\Users\kwcarlso\AppData\Roaming\Mathematica

There are four possible init.m files to modify, depending on whether you want to initialize on Kernel or Front End startup, and for all users or just the user who is logged in.
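A minimal sketch for listing those four candidate locations follows. It assumes the standard Kernel and FrontEnd subfolder names described earlier and simply reports whether each file exists on your machine.

```mathematica
(* Combine both base directories with both subfolders: four candidate paths. *)
initPaths = Flatten@Table[
    FileNameJoin[{base, sub, "init.m"}],
    {base, {$BaseDirectory, $UserBaseDirectory}},
    {sub, {"Kernel", "FrontEnd"}}];

(* Pair each path with True or False depending on whether the file is present. *)
initStatus = {#, FileExistsQ[#]} & /@ initPaths;
initStatus // TableForm
```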