What is SMILES?

SMILES is a way of representing chemical structures using (fairly) short ASCII strings. It stands for Simplified Molecular Input Line Entry System and was developed as a way for computer systems to store chemical structure information in a database.

Many SMILES strings may map to the same molecule. For example, CCO, OCC, and C(O)C all represent ethanol, and it gets even more complicated for highly branched compounds or those that incorporate ring structures. For this reason you cannot tell whether two SMILES strings describe the same molecule by comparing them directly as text; the strings must first be converted into graph representations and then reduced back to SMILES using a deterministic algorithm that always produces the same string for a given graph, no matter how the individual atoms (nodes) are ordered. There are several such algorithms, and their output is usually called "Canonical SMILES", though the "canonicity" usually holds only within that particular algorithm; there's no overall "official" canonical SMILES string for any given molecule.
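To make the idea concrete, here is a rough Python sketch of the parse-to-graph-then-rewrite approach. It is a toy, not any toolkit's actual algorithm: it assumes a tiny SMILES subset (single-letter atoms such as C, N, O, single bonds, parenthesised branches, no rings, charges, or aromaticity), and the function names parse_smiles, encode, and toy_canonical_smiles are made up for this illustration. Real cheminformatics toolkits use far more general canonicalization schemes.

```python
def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, neighbours).

    Assumption: only single-letter uppercase atoms, implicit single
    bonds, and parenthesised branches are supported.
    """
    atoms = []        # element symbol for each atom index
    neighbours = []   # adjacency list, one list per atom
    stack = []        # atoms at which currently open branches started
    prev = None       # atom the next atom will bond to
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            neighbours.append([])
            idx = len(atoms) - 1
            if prev is not None:
                neighbours[prev].append(idx)
                neighbours[idx].append(prev)
            prev = idx
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, neighbours


def encode(atoms, neighbours, atom, parent):
    """Write the subtree below `atom` as SMILES, with branches sorted."""
    children = sorted(
        encode(atoms, neighbours, n, atom)
        for n in neighbours[atom]
        if n != parent
    )
    if not children:
        return atoms[atom]
    # All but the last child become parenthesised branches.
    branches = "".join("(" + c + ")" for c in children[:-1])
    return atoms[atom] + branches + children[-1]


def toy_canonical_smiles(smiles):
    """Return one fixed string per acyclic molecule, whatever the input order."""
    atoms, neighbours = parse_smiles(smiles)
    if not atoms:
        raise ValueError("empty SMILES string")
    # Try every atom as the starting point and keep the smallest string.
    return min(encode(atoms, neighbours, root, None) for root in range(len(atoms)))


ethanol_forms = {toy_canonical_smiles(s) for s in ("CCO", "OCC", "C(O)C")}
print(len(ethanol_forms))  # the three spellings of ethanol collapse to one string
print(toy_canonical_smiles("COC") in ethanol_forms)  # dimethyl ether stays distinct
```

The string this sketch settles on is "canonical" only with respect to this particular toy scheme, which is exactly the caveat above: a different algorithm would be equally deterministic and could still choose a different string for the same molecule.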

There are extensions to SMILES called SMARTS (which allows "wildcards" in the string, so that it can describe substructure patterns) and SMIRKS (which allows chemical reactions to be represented).