
Lab 3: Implementation - Extract Data

Prerequisites

Before starting

If you have not completed the previous Module/Lab then you should import the solution into your RPA Builder to ensure that you begin this Lab from the right place.

Importing previous Lab solution

You should have received the Google Drive link along with the credentials and other workshop material. That drive contains the solutions for all Labs. If you don’t have it, ask the instructor.

  1. From the Windows box, download the Module_1_Lab_2_completed.crpa file, which contains the solution to the previous Lab.

  2. Open RPA Builder and open the project you want to overwrite with the Lab solution. Make sure you open YOUR project so that you don’t overwrite someone else’s.

    module01 lab03 001
  3. Once opened, click the Import Project option under the File menu:

    module01 lab03 002
  4. Click Yes in the confirmation dialog, locate the .crpa file you just downloaded and click Open to import it.

You can now continue with the Lab.

Step 1: Extract Data Workflow Initialization

  1. Double-click the Extract Data activity to open its properties window below.

    First, include all of the following existing Activity Parameters in the activity:

    module01 lab03 007
    module01 lab03 008

Step 2: Extract Data from Invoice file

There are many ways to extract data (OCR) from a document with MuleSoft RPA:

  • If the document is a PDF file and contains raw text, you can use the Read PDF step

  • Leverage the built-in OCR capabilities:

    • AI OCR (file based): apply the OCR to a whole file (must be an image file like JPG or PNG) or to a certain section of it

    • AI OCR (screen based): apply the OCR to anything that is displayed on the screen. It could be an open document, a web page, an application label or menu, etc.

  • Use the existing integration with AWS Textract: Amazon Textract is a service that automatically extracts text, handwriting, and data from scanned documents. MuleSoft RPA comes with an out-of-the-box integration with AWS Textract services, but you must purchase the service from AWS yourself:

module01 lab03 009

For this Module, we will use the AI OCR (file based) capability.

  1. We are going to extract the data from the file that came attached to the email. This invoice is in image format (PNG file).

    In order to make everything more readable, let’s create a group of steps for the data that we will retrieve (Invoice Info).

    Drag and drop one Group step and name it as Extract Invoice information:

    module01 lab03 018

    The Group step has no effect on the bot execution; it only gives you a clearer overview when designing your Workflows.

  2. Drag and drop an AI OCR (file-based) step into the just created group:

    module01 lab03 061

    Open it to set its configuration. Name it as Get Invoice Details.

    As the OCR is file based, we need to provide the path and name of the file the OCR will be applied to. Remember that this file is the attachment that was extracted in the previous Activity and saved in the attachments temporary folder.

    Pin the Directory Path to Activity Parameters.attachmentsTemporaryFolder and the File name to Activity Parameters.attachmentFilename.

    Note: Remember that these two activity parameters have a default value, pointing to the Learners_Repository\Student Files folder, that you should have unzipped previously, and the Automation Workshop Invoice.png file.

  3. We don’t want to extract all the information contained in the invoice (image file). The AI OCR (file-based) step allows you to set a scan area where the data will be extracted from.

    Click on the Load design-time file button and select the Automation Workshop Invoice.png file located in the Learners_Repository\Student Files folder.

    module01 lab03 065
  4. Click the Define scan area button to set the area the data will be extracted from, and select the area as shown below:

    module01 lab03 062

    For better results, make sure to set the Scaling factor to x2.5:

    module01 lab03 063

    Click Save to continue and Test to test the settings. You should get the following results:

    module01 lab03 064

    The exact data extracted should be:

    100↲
    09.05.2022↲
    4559-658↲
    24.05.2022

    Take note of the exact text and format extracted by the step. Notice that it is a single string that also includes CRLF sequences (new lines).

    Click OK to close the AI OCR (file-based) wizard window.

  5. Now, we need to handle the result of the OCR extraction. As seen in the Test above, the OCR step returns a single string containing the four fields that we need. These fields are separated by a CRLF (the carriage return + line feed pair that is usual on Windows platforms). In order to get each field separately, we need to split that single string into four strings, using the CRLF as the separator.

    Drag and drop a String to Array step just after the AI OCR (file-based) one and name it as Convert to Array - Invoice.

    For the Input String pin it to Get Invoice Details.Recognized Text and for the Separator pin it to Activity Parameters.invoiceSeparator:

    module01 lab03 066

    This step converts a single string into an array of elements based on the separator used. In this case, we’re using the CRLF.

  6. Now, we can access each field (string line), as each one is now an element in an array.

    Drag and drop a Read from Array step and a Set Variable step:

    module01 lab03 067
  7. Open the Read from Array step, name it as Get Invoice Number, pin the Array to read from to Convert to Array - Invoice.ResultAsArray and set the Index to read at to 1:

    module01 lab03 068

    Click OK to save.

  8. Open the Set Variable step, name it as Set Invoice Number, pin the Value to Get Invoice Number.AsString and pin the variable to Activity Parameters.invoiceNumber:

    module01 lab03 069
  9. We must repeat the last two steps for each remaining invoice field. As this is a tedious task, we’re going to use the Template feature in RPA Builder to ease your work. Templates allow developers to reuse excerpts of RPA activities (one or multiple steps) built previously in other processes. This helps us follow MuleSoft’s philosophy of reusability.

    If you would like to learn more about how to create them, please review this link.

    Two templates have been prepared for you to import. Follow the steps below to import them into MuleSoft RPA Builder.

  10. In MuleSoft RPA Builder go to Tools/Templates Manager

    module01 lab03 046

    Click on the Import button

    module01 lab03 047

    From the Student Files\Module_1_Saved_Templates folder in the Google Drive shared with you, select the Insert Data.tptx file and click OK to import it.

    Repeat the same operation for the Invoice Details.tptx file.

    module01 lab03 048

    You can click on any of the templates to see what they contain:

    module01 lab03 049

    Click Cancel to close the Templates Manager window.

  11. Once imported, you can make use of these templates in your activities from the Toolbox (User Templates section):

    module01 lab03 050
  12. Drag and drop the Invoice Details template step just after the existing Extract Invoice information group.

    You’ll notice that a new Extract Invoice information group has been added with the extra steps inside:

    module01 lab03 070
  13. We could leave this as-is, but let’s move the newly added steps inside the first group. Just select all of them and move them to the end of the first group:

    module01 lab03 071
  14. Remove the empty group

  15. We still need to re-attach the variables to the steps. In each Get… and Set… step, relink them as follows:

    Activity                      Setting               Value
    Get Invoice Date              Array to read from    Convert to Array - Invoice.ResultAsArray
    Set Invoice Date              Variable              Activity Parameters.invoiceDate
    Get Invoice Purchase Order    Array to read from    Convert to Array - Invoice.ResultAsArray
    Set Invoice Purchase Order    Variable              Activity Parameters.invoicePurchaseOrder
    Get Invoice Due Date          Array to read from    Convert to Array - Invoice.ResultAsArray
    Set Invoice Due Date          Variable              Activity Parameters.invoiceDueDate
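
The String to Array and Read from Array steps used in this Step behave much like splitting a string and indexing the result in a general-purpose language. The Python sketch below is only an analogy, not MuleSoft RPA code. The function and variable names are hypothetical, and it assumes that Activity Parameters.invoiceSeparator holds a CRLF, that Index to read at is 1-based (index 1 returned the first line in this lab), and that the template steps read indexes 2, 3 and 4.

```python
# Analogy only: plain Python, not MuleSoft RPA code.
CRLF = "\r\n"  # assumed value of Activity Parameters.invoiceSeparator

# The single string returned by the OCR test in this lab.
recognized_text = "100\r\n09.05.2022\r\n4559-658\r\n24.05.2022"


def string_to_array(text, separator):
    """Split one string into a list of fields, like the String to Array step."""
    return text.split(separator)


def read_from_array(array, index):
    """Read one element using a 1-based index (an assumption based on the lab,
    where index 1 returns the first line)."""
    if not 1 <= index <= len(array):
        raise IndexError(f"index {index} is outside 1..{len(array)}")
    return array[index - 1]


fields = string_to_array(recognized_text, CRLF)

# Indexes 2 to 4 are assumed; the lab only states index 1 explicitly.
invoice_number = read_from_array(fields, 1)
invoice_date = read_from_array(fields, 2)
invoice_purchase_order = read_from_array(fields, 3)
invoice_due_date = read_from_array(fields, 4)

print(invoice_number, invoice_date, invoice_purchase_order, invoice_due_date)
```

In this sketch, a text that used a bare line feed instead of CRLF would come back from the split as a single element, which is why it is worth noting the exact format shown in the Test window.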

Step 3: (Optional) Add Debug Logic

We have completed this Lab. However, if you run the activity for testing, you only see the bot performing its steps, without any visible result.

Full debugging capabilities, including breakpoints, are covered in Module 2. But we have some tools to check what’s going on during development, similar to the System.out.println statements we usually include in our code.

MuleSoft RPA provides the Message Box step, which can be added wherever the developer wants. It displays a modal popup window with any text, which makes it very convenient for showing the result of one or multiple steps at a certain point in the execution of an activity.

For this lab, let’s add a Message Box step at the very end of the activity, that displays the values of all the variables we have been setting before.

The Message Box step only supports the link to a single variable as the displayed text. We may add as many Message Box steps as existing variables, or even better, we could build a single string that contains the value of all of the existing variables we want to display.

  1. To do so, we will first drag and drop a Combine Strings step into the Workflow run succeeded section at the bottom:

    module01 lab03 041

    As the names suggest, the steps in the Workflow run succeeded section are executed when the main activity steps finish with no errors, the steps in the Workflow run failed section are executed when any unhandled error occurs, and the steps in the Common finalization handling section are always executed.

  2. The Combine Strings step allows you to concatenate up to 20 external variables into a single string. Set the pattern as follows:

    Invoice Number:{1}{@CRLF}Date:{2}{@CRLF}Purchase Order:{3}{@CRLF}Due Date:{4}

    Click OK and then click its pin button to bind the parameters ({n}) to the actual variables as follows:

    Parameter    Variable
    {1}          Activity Parameters.invoiceNumber
    {2}          Activity Parameters.invoiceDate
    {3}          Activity Parameters.invoicePurchaseOrder
    {4}          Activity Parameters.invoiceDueDate

  3. Finally, drag and drop a Message Box step and set it up as follows:

    • Title: DEBUG

    • Text: Combine Strings.Combined String

  4. Now, you can run the whole activity and see the actual results.

    Click the play button on the open Extract Data activity:

    module01 lab03 042

    You will see the bot perform its steps and, at the end, a Message Box is shown with the result of the execution:

    module01 lab03 043
    Note on Message Box

    The Message Box step is always executed, and by default it closes itself after a timeout of 10 seconds. You can change that timeout or even remove it, in which case the Message Box remains on the screen and the execution of the activity flow stops there.

    Take this into account when rolling out a process to a TEST or PRODUCTION phase. Make sure to disable or remove all Message Box steps you may have in your activities, or to set a short timeout value on them.

    Another best practice is to wrap any Message Box step in a Select Case step and run it based on the value of a condition (e.g. a Debug boolean Activity Parameter). When you deploy an Automation Process to a TEST or PRODUCTION phase, you can set that condition to false to make sure the Message Box steps don’t get executed.
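
The debug logic in this Step can be pictured in a general-purpose language as a pattern substitution followed by a guarded print. The Python sketch below is only an analogy, not MuleSoft RPA code. The combine_strings function and the DEBUG flag are hypothetical stand-ins for the Combine Strings step and for a boolean Debug Activity Parameter, and the substitution rules ({n} as a 1-based parameter, {@CRLF} as a line break) are inferred from the pattern used in this lab.

```python
# Analogy only: plain Python, not MuleSoft RPA code.
import re

CRLF = "\r\n"

# The pattern used in this lab for the Combine Strings step.
PATTERN = (
    "Invoice Number:{1}{@CRLF}Date:{2}{@CRLF}"
    "Purchase Order:{3}{@CRLF}Due Date:{4}"
)


def combine_strings(pattern, *values):
    """Replace {@CRLF} with a line break and {n} with the n-th value (1-based).

    A single regex pass is used so that substituted values are never
    re-scanned for placeholders.
    """
    def substitute(match):
        token = match.group(1)
        if token == "@CRLF":
            return CRLF
        return values[int(token) - 1]

    return re.sub(r"\{(@CRLF|\d+)\}", substitute, pattern)


# Hypothetical stand-in for a boolean Debug Activity Parameter.
DEBUG = True

combined = combine_strings(PATTERN, "100", "09.05.2022", "4559-658", "24.05.2022")

# Equivalent of wrapping the Message Box in a Select Case on the debug flag.
if DEBUG:
    print(combined)
```

Setting DEBUG to False in this sketch skips the output entirely, which mirrors switching the condition off before deploying to a TEST or PRODUCTION phase.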
