| |
| ------------------------------------------------------------------------------- |
| This document is too incomplete to be of much use. |
| Patches are welcome! |
| |
| |
| Theory of operation |
| ------------------- |
| |
| Uncrustify goes through several steps to reformat code. |
| The first step, parsing, is the most complex and important. |
| |
| |
| Step 1 - Tokenize |
| ----------------- |
| C code must be understood to some degree to be able to be properly indented. |
| The parsing step reads in a text buffer and breaks it into chunks and puts |
| those chunks in a list. |
| |
| When a chunk is parsed, the original column and line are recorded. |
| |
| These are the chunks that are parsed: |
| - punctuators |
| - numbers |
| - words (keywords, variables, etc) |
| - comments |
| - strings |
| - whitespace |
| - preprocessors |
| |
| See token_enum.h for a complete list. |
| See punctuators.cpp and keywords.cpp for examples of how they are used. |
| |
| In the code, chunk types are prefixed with 'CT_'. |
| The CT_WORD token is changed into a more specific token using the lookup table |
| in keywords.cpp |
| |
| |
| Step 2 - Tokenize Cleanup |
| ------------------------- |
| |
| The second step is to change the token type for certain constructs that need |
| to be adjusted early on. |
| For example, the '<' token can be either a CT_COMPARE or CT_ANGLE_OPEN. |
| Both are handled very differently. |
| If a CT_WORD follows CT_ENUM/CT_STRUCT/CT_UNION, then it is marked as a CT_TYPE. |
| Basically, anything that doesn't depend on the nesting level can be done at this |
| stage. |
| |
| |
| Step 3 - Brace Cleanup |
| ------------------------- |
| |
| This is possibly the most difficult step. |
| do/if/else/for/switch/while bodies are examined and virtual braces are added. |
| Brace parent types are set. |
| Statement start and expression starts are labeled. |
| And #ifdef constructs are handled. |
| |
| This step determines the levels (brace_level, level, & pp_level). |
| |
| REVISIT: |
| The code in brace_cleanup.cpp needs to be reworked to take advantage of being |
| able to scan forward and backward. The original code was going to be merged |
| into tokenize.cpp, but that was WAY too complex. |
| |
| |
| Step 4 - Fix Symbols (combine.cpp) |
| ---------------------------------- |
| |
| This step is no longer properly named. |
| In the original design, neighboring chunks were to be combined into longer |
| chunks. This proved to be a silly idea. But the name of the file stuck. |
| |
| This is where most of the interesting identification stuff goes on. |
| Colons type are detected, variables are marked, functions are labeled, etc. |
| Also, all the punctuators are classified. Ie, CT_MINUS become CT_NEG or CT_ARITH. |
| |
| - Types are marked. |
| - Functions are marked. |
| - Parenthesis and braces are marked where appropriate. |
| - finds and marks casts |
| - finds and marks variable definitions (for aligning) |
| - finds and marks assignments that may be aligned |
| - changes CT_INCDEC_AFTER to CT_INCDEC_BEFORE |
| - changes CT_STAR to either CT_PTR_TYPE, CT_DEREF or CT_ARITH |
| - changes CT_MINUS to either CT_NEG or CT_ARITH |
| - changes CT_PLUS and CT_ADDR to CT_ARITH, if needed |
| - other stuff? |
| |
| |
| Casts |
| ----- |
| Casts are detected as follows: |
| - paren pair not part of if/for/etc nor part of a function |
| - contains only CT_QUALIFIER, CT_TYPE, '*', and no more than one CT_WORD |
| - is not followed by CT_ARITH |
| |
| Tough cases: |
| (foo) * bar; |
| |
| If uncertain about a cast like this: (foo_t), some simple rules are applied. |
| If the word ends in '_t', it is a cast, unless followed by '+'. |
| If the word is all caps (FOO), it is a cast. |
| If you use custom types (very likely) that aren't detected properly (unlikely), |
| the add them to the config file like so: (example Using C-Sharp types) |
| type UInt32 UInt16 UInt8 Byte |
| type Int32 Int16 Int8 |
| |
| |
| Step 6+ Everything else |
| ------------------------- |
| |
| From this point on, many filters are run on the chunk list to change the |
| token columns. |
| |
| indent.cpp sets the left-most column. |
| align.cpp set the column for individual chunks. |
| space.cpp sets the spacing between chunks. |
| Others insert newlines, change token position, etc. |
| |
| |
| Last Step - Output |
| ------------------------- |
| |
| At the final step the list is printed to the output. |
| Everything except comments are printed as-is. |
| Comments are reformatted in the output stage. |
| |