
Lab 2: Extracting Data from Unstructured Documents Using Einstein for IDP

Overview

Now that you have learned to extract data from a document using NLP and machine learning, let's see what generative AI can do.

This lab will guide you through the process of using Einstein for Intelligent Document Processing (IDP) to extract specific data from unstructured documents such as contracts, handwritten notes, or receipts. You will also learn how to refine extraction methods to improve accuracy through prompt engineering.

Step 1: Create a New Document Action

Log out of any existing Anypoint Platform session.

  1. Log into Anypoint Platform with the Einstein credentials provided.

  2. In the Anypoint Home menu, under the Automation section, click on the Get Started button to access the Intelligent Document Processing tool.

  3. Click on the Create New button to start creating a new IDP Document Action.

  4. A configuration window will open where you can configure the new Document Action.

  5. Select the document type as Generic.

  6. Enter the following details:

    • Name: NTO Einstein

    • Description: NTO Einstein

  7. Click the Create button.


Step 2: Craft Your Initial Prompt - Document Summary

This view lets you configure, review, and test the new Document Action before you publish it.

There are three main areas:

[Figure: the three main areas of the Document Action view]
  1. On the right-hand side in the Configurator section, click "Add New".

  2. Enter the following details:

    • Name: summarize

    • Instructions: Summarize this document in a couple of sentences.

  3. Upload the Service Contract provided.

  4. Click the Save button, then click the Run button to test the extraction.

  5. Review the results.

Step 3: Extract Specific Information

  1. Click Add New under the Outputs section on the right-hand side.

  2. Enter the following details:

    • Name: simple_prompt

    • Instructions: Extract the start date, the names of the contracting parties, and the payment amount from the attached document.

  3. Upload the Service Contract provided.

  4. Click the Save button, then click the Run button to test the extraction.

  5. Review the results, including the formatting and the data.

Step 4: Refine the Prompt

  1. Click Add New under the Outputs section.

  2. Enter the following details:

    • Name: medium_prompt

    • Instructions:

      Provide the following details:
      1. Contract start date
      2. Full names of both parties
      3. Total payment amount.
      Output the response in JSON format.
  3. Click Save and Run again to view the outputs.

  4. Compare the new results with the previous extraction.
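
Once the output is JSON, you can check it with a script as well as by eye. The following is a minimal Python sketch, assuming the response is a single JSON object. The key names (contract_start_date, parties, total_payment_amount) and the sample response are hypothetical; the keys Einstein actually returns depend on the prompt and the document.

```python
import json

# Hypothetical key names; the real keys depend on the model's response.
EXPECTED_KEYS = {"contract_start_date", "parties", "total_payment_amount"}


def check_extraction(raw_response: str) -> list[str]:
    """Return a list of problems found in a JSON extraction result."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # Report keys that are absent, then keys that are present but empty.
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - data.keys())]
    problems += [
        f"empty value for: {key}"
        for key in sorted(EXPECTED_KEYS & data.keys())
        if data[key] in ("", None, [])
    ]
    return problems


# Made-up sample response, for illustration only.
sample = '{"contract_start_date": "2024-01-15", "parties": ["Party A", "Party B"]}'
print(check_extraction(sample))
```

An empty list means the response parsed cleanly and contained every expected field. Running the same check on the simple_prompt output will usually fail at the parsing step, which shows why asking for JSON makes results easier to compare.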

Step 5: Add More Context and Logic

  1. Click Add New under the Outputs section.

  2. Enter the following details:

    • Name: advanced_prompt

    • Instructions: Extract the payment amount in USD if the document contains the term 'invoice' and provide the date only if it is in the future. Provide the response in JSON format. Format the dollar amount in USD and the date as MM/DD/YYYY.

  3. Click Save and Run again to view the outputs.

  4. Compare the new results with the previous extraction.
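
The conditional logic in advanced_prompt can also be written as ordinary code, which gives you a reference to compare against the model's answer. The following Python sketch assumes you already have the document text, an amount, and a date as plain values. The function name, the output keys, and the sample inputs are hypothetical and are not part of Einstein for IDP. Matching 'invoice' case-insensitively is also an assumption.

```python
from datetime import date
from typing import Optional


def expected_output(text: str, amount: float, doc_date: date,
                    today: Optional[date] = None) -> dict:
    """Apply the same rules that the advanced prompt describes."""
    today = today or date.today()
    result = {}
    # Rule 1: report the amount only when the document mentions "invoice"
    # (case-insensitive match is an assumption of this sketch).
    if "invoice" in text.lower():
        result["payment_amount"] = f"${amount:,.2f}"
    # Rule 2: report the date only when it lies in the future,
    # formatted as MM/DD/YYYY.
    if doc_date > today:
        result["date"] = doc_date.strftime("%m/%d/%Y")
    return result


# Illustrative values only; "today" is pinned so the result is repeatable.
print(expected_output("Invoice for services", 12500, date(2030, 3, 1),
                      today=date(2025, 1, 1)))
```

If the model's JSON differs from what these rules produce for the same document, the prompt may need tighter wording, for example an explicit statement of what to return when a condition is not met.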

In this lab, you learned how to create, configure, and refine prompts to extract data from unstructured documents using Einstein for IDP. You also saw how to adjust prompts for better accuracy, add logic, and output results in structured formats such as JSON.
