
Misspellings of Words

Problem: The OCR model sometimes misreads characters, especially in certain fonts or noisy images. This can result in words being misclassified or misspelled, which then causes the automation to fail when it searches for exact matches. Example:

✅ Expected Behavior
Text is correctly spelled:
✅ Hallo ✅
👍 Works with click().text("Hallo")

❌ Actual Issue
Text is misspelled:
❌ HaII0 ❌
👎 Can't find click().text("Hallo") because of recognition issues (l → I and o → 0).
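The confusion shown above (l read as I, o read as 0) is a classic OCR look-alike error. As a rough illustration of the failure mode (a plain-Python sketch, not part of the AskUI API; the helper name is hypothetical), a client-side comparison can fold commonly confused glyphs before matching:

```python
# Map commonly confused OCR glyphs onto a canonical character.
CONFUSIONS = str.maketrans({"I": "l", "1": "l", "0": "o", "O": "o"})

def looks_like(ocr_text: str, expected: str) -> bool:
    """Compare two strings case-insensitively after folding look-alike glyphs."""
    fold = lambda s: s.translate(CONFUSIONS).lower()
    return fold(ocr_text) == fold(expected)

print(looks_like("HaII0", "Hallo"))  # the misread from the example still matches
```

Training the workspace model, as described below, fixes the recognition itself and is the recommended path; this sketch only shows why the exact-match lookup fails.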

Solutions

You can directly correct OCR predictions and improve OCR model accuracy by training your workspace-specific model. Steps:
  1. Start the AskUI shell:
    askui-shell
    
  2. Launch the OCR Teaching App:
    AskUI-StartOCRTeaching
    
  3. Upload a screenshot containing the misclassified word (e.g., "Hallo").
  4. Switch to Trained Model for precise corrections.
  5. Select the wrongly detected word (HaII0) and replace it with the correct label: Hallo.
  6. Press the Train Correction button.
  7. Click "Copy Model" to copy the newly trained model ID.
  8. In your automation code, update the model config at the global level or at the step level to use the new model:
Global Level Model Composition
    from askui import ModelComposition, ModelDefinition, VisionAgent

    with VisionAgent(model={
        "locate": ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["trained"]
            )])
    }) as agent:
        ...
Step Level Model Composition
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["trained"]
            )]))
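The same model definition is repeated at the global and step level. A small helper (hypothetical, plain Python, not part of the AskUI API) keeps the fields in one place so the workspace ID and tags cannot drift apart between the two snippets:

```python
def trained_ocr_definition(workspace_id: str, tags=("trained",)) -> dict:
    """Keyword arguments for a ModelDefinition targeting a trained OCR model."""
    return {
        "task": "e2e_ocr",
        "architecture": "easy_ocr",
        "version": "1",
        "interface": "online_learning",
        "useCase": workspace_id,
        "tags": list(tags),
    }

# Pass it to ModelDefinition at either level, e.g.:
# ModelDefinition(**trained_ocr_definition("<your-workspace-id>"))
```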

Text Detection Issues

1. Icon Text Merging

Problem: Sometimes the text detector / annotation tool merges an icon and a text into one detection, even though they are clearly separate on screen. Example: Say you want to click just the "Name" label in an interface - but OCR detects the icon and the text as one element:

✅ Expected Behavior
🖼️ Icon and text are detected separately:
Icon and text detected separately
🧑 ✅ Name ✅ 🤖 ✅ Role ✅
👍 Works with click().text("Name") or click().text("Role")

❌ Actual Issue
🖼️ Icon and text are detected together:
Icon and text merged together
🧑 Name ❌ 🤖 Role ✅
👎 Can't find click().text("Name").

Solution

You can train the OCR recognition model to ignore the OCR detection error. Steps:
  1. Start the AskUI shell:
    askui-shell
    
  2. Launch the OCR Teaching App:
    AskUI-StartOCRTeaching
    
  3. Upload a screenshot containing the wrongly merged detection.
  4. Switch to Trained Model for precise corrections.
  5. Select the wrongly detected element and replace it with the correct label.
  6. Press the Train Correction button.
  7. Click "Copy Model" to copy the newly trained model ID.
  8. In your automation code, update the model config at the global level or at the step level to use the new model:
Global Level Model Composition
    from askui import ModelComposition, ModelDefinition, VisionAgent

    with VisionAgent(model={
        "locate": ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["trained"]
            )])
    }) as agent:
        ...
Step Level Model Composition
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["trained"]
            )]))
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["word_level"]
            )]))

2. Merged Texts

Problem: Sometimes the text detector / annotation tool merges two separate texts into one, even though they are clearly split on screen. Example: Say you want to click just the name "Alice Johnson" field or just the position field in an interface - but OCR detects them as one long string:

✅ Expected Behavior
🖼️ Text fields detected separately:
Text fields detected separately
Alice Johnson ✅ Software Engineer ✅
👍 Works with text("Alice Johnson") or text("Software Engineer")

❌ Actual Issue
🖼️ Texts merged into one block:
Texts merged into one block
Alice Johnson Software Engineer ❌
👎 Can't find either one on its own.
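To see why the merged detection breaks exact matching, consider where a click would have to land: the target text occupies only part of the merged box. A rough geometric sketch (hypothetical box coordinates, not an AskUI feature; it assumes roughly uniform glyph widths) of approximating the target's center inside a merged detection:

```python
def click_point_in_merged_box(box, merged_text, target):
    """Approximate the center of `target` inside a merged OCR box.

    `box` is (x, y, width, height); assumes evenly spaced glyphs,
    which is only an approximation for proportional fonts.
    """
    x, y, w, h = box
    start = merged_text.index(target)  # raises ValueError if absent
    mid = start + len(target) / 2
    return (x + w * mid / len(merged_text), y + h / 2)

# "Alice Johnson Software Engineer" detected as one 310 px wide box at (100, 40)
point = click_point_in_merged_box((100, 40, 310, 20),
                                  "Alice Johnson Software Engineer",
                                  "Alice Johnson")
```

The word_level model shown below avoids this workaround entirely by returning separate boxes per word.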

Solutions

    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="00000000_0000_0000_0000_000000000000",
                tags=["word_level"]
            )]))
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["word_level"]
            )]))
This command shows how you can use an anchor element to move the mouse relative to another element.
  await aui.moveMouseRelativeTo(100, 0).containsText("Name").exec()
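Geometrically, moveMouseRelativeTo(100, 0) reduces to adding a pixel offset to the anchor element's center. A minimal sketch of that arithmetic (the function name and coordinates are made up for illustration):

```python
def offset_from_anchor(anchor_box, dx, dy):
    """Return the anchor box's center shifted by (dx, dy) pixels.

    `anchor_box` is (x, y, width, height) of the element matched by
    containsText("Name") in the example above; coordinates are hypothetical.
    """
    x, y, w, h = anchor_box
    return (x + w / 2 + dx, y + h / 2 + dy)

# Move 100 px to the right of a "Name" label detected at (40, 200, 60, 20)
target = offset_from_anchor((40, 200, 60, 20), 100, 0)  # → (170.0, 210.0)
```

This is why an anchor element works even when the target text itself is undetectable: only the anchor needs a reliable detection.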

3. Text Separation

Problem: Sometimes the text detector / annotation tool splits one text into two, even though it clearly reads as a single text on screen. Example: Say you want to click just the name "Alice Johnson" field in an interface - but OCR detects it as two words:

✅ Expected Behavior
🖼️ Words are detected as one sentence:
Words detected as one sentence
Alice Johnson ✅
👍 Works with text("Alice Johnson")

❌ Actual Issue
🖼️ Words are detected as separate texts:
Words detected separately
Alice ❌ Johnson ❌
👎 Can't find text("Alice Johnson").
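Conceptually, word boxes split like this can be re-joined by grouping detections that sit on the same line with only a small horizontal gap between them. A rough sketch of that grouping (hypothetical box data; not part of the AskUI API, which handles this through the model tags shown below):

```python
def join_words(words, max_gap=15):
    """Join word-level boxes into phrases.

    `words` is a list of (text, x, y, width, height) tuples on one line,
    sorted left to right; boxes closer than `max_gap` px are merged.
    """
    phrases, current = [], [words[0]]
    for word in words[1:]:
        _, px, _, pw, _ = current[-1]
        if word[1] - (px + pw) <= max_gap:   # small gap: same phrase
            current.append(word)
        else:                                # large gap: start a new phrase
            phrases.append(" ".join(w[0] for w in current))
            current = [word]
    phrases.append(" ".join(w[0] for w in current))
    return phrases

# "Alice" and "Johnson" detected separately but only 8 px apart
print(join_words([("Alice", 10, 5, 50, 14), ("Johnson", 68, 5, 70, 14)]))
```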

Solution

    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="00000000_0000_0000_0000_000000000000",
                tags=["word_level"]
            )]))
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["word_level"]
            )]))

4. Vertical Text Merging

Problem: Sometimes the text detector / annotation tool merges two lines into one text, even though they clearly appear as two lines on screen. Example: Say you want to click just the name "Alice Johnson" field in an interface - but OCR detects both lines as one:

✅ Expected Behavior
🖼️ Texts are detected as two lines:
Texts detected as two lines
Alice Johnson ✅
👍 Works with text("Alice Johnson")

❌ Actual Issue
🖼️ Texts are detected as one text:
Texts merged vertically
<no words recognized> ❌
👎 Can't find text("Alice Johnson").

Solution

    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="00000000_0000_0000_0000_000000000000",
                tags=["word_level"]
            )]))
    agent.click("Alice Johnson", model=ModelComposition([ModelDefinition(
                task="e2e_ocr",
                architecture="easy_ocr",
                version="1",
                interface="online_learning",
                useCase="<your-workspace-id>",
                tags=["word_level"]
            )]))

5. Single Character not Detected

Problem: Sometimes the text detector / annotation tool does not detect single characters, even though they are clearly visible on screen. Example: Say you want to click just the character "2" - but OCR does not detect it:

✅ Expected Behavior
🖼️ Single characters are detected:
Single characters detected
1 ✅ 2 ✅ 3 ✅
👍 Works with text("2")

❌ Actual Issue
🖼️ Character 2 is not detected:
Character 2 not detected
1 ✅ 2 ❌ 3 ✅
👎 Can't find text("2").

Solution

Single-character detection is sometimes flaky, so it is better to rely on an AI Element. Steps:
  1. Start the AskUI shell:
    askui-shell
  2. Create a new AI Element:
    # Capture elements from your screen
    AskUI-NewAiElement -Name "my-element-name"
  3. Use the captured AI Element in your code:
from askui import VisionAgent
from askui import locators as loc
...
with VisionAgent() as agent:
    agent.click(loc.AiElement("my-element-name"))
If you cannot use the AskUI-NewAiElement command, activate experimental commands by running AskUI-ImportExperimentalCommands in your terminal.

6. Text not Detected

Problem: Sometimes, for no apparent reason, the text detector / annotation tool does not detect a text, even though you can see it clearly on screen. Example: Say you want to click just the name "Alice Johnson" field - but OCR does not detect the text at all:

✅ Expected Behavior
🖼️ Text was detected:
Text detected properly
Alice Johnson ✅
👍 Works with text("Alice Johnson")

❌ Actual Issue
🖼️ Text wasn't detected:
Text not detected
Alice Johnson ❌
👎 Can't find text("Alice Johnson").

Solution

If the text was not detected at all, use an AI Element instead. Steps:
  1. Start the AskUI shell:
    askui-shell
  2. Create a new AI Element:
    # Capture elements from your screen
    AskUI-NewAiElement -Name "my-element-name"
  3. Use the captured AI Element in your code:
from askui import VisionAgent
from askui import locators as loc
...
with VisionAgent() as agent:
    agent.click(loc.AiElement("my-element-name"))
If you cannot use the AskUI-NewAiElement command, activate experimental commands by running AskUI-ImportExperimentalCommands in your terminal.