Hi there, and welcome to the Future of Documents, a new series from Google Cloud to help developers make the most out of their unstructured data with Document AI. My name is Holt Skinner. I'm a developer advocate for Google Cloud, and I'll be helping you throughout this journey. The world's businesses rely heavily on documents to convey information. Think of all the PDFs, emails, forms, and contracts that you interact with on a daily basis. There's an enormous amount of data in these files. The problem with this type of data is that it is unstructured, or dark, data.
Dark data is information that businesses collect, process, and store during regular activities but generally fail to use for other purposes. In other words, businesses are sitting on a document goldmine, full of data that could be used to automate processes or gather analytics if it could be extracted into a machine-readable format. Let's take a look at the ways companies can begin to structure this data. There are three methods currently used to extract data from documents. First, we have manual data entry: humans read the documents and then manually enter the data they see into a system. This
requires a significant time investment and is prone to mistakes. Next, there are semi-automated solutions: applications can parse documents that have a fixed layout and then extract the text using optical character recognition, or OCR, technology. This method can be effective, but it's limited in the types of documents that can be processed. The third option is to analyze documents and extract information using artificial intelligence and machine learning. AI/ML technology has progressed rapidly in the last few years, and it's now possible to use it to read documents, parse the content, and extract valuable information from many different document types.
This helps remove much of the data entry toil and can reduce document processing time. Running these applications in the cloud also allows for flexible scalability as the volume of documents changes. Document understanding is an incredibly complex field in machine learning because it combines a lot of different techniques and algorithms: for example, OCR, image recognition, natural language processing, entity extraction, machine translation, and data loss prevention, just to name a few. And this is where Document AI comes in. Doc AI is Google Cloud's managed service to turn your unstructured content into structured data. Document AI is an
end-to-end, cloud-based platform for document processing. Not only does it read and ingest your documents, it understands the spatial structure of the document. For example, if you run a general form through a form parser, it recognizes there are questions and answers in your form, and you'll get those back as key-value pairs. We'll dive into the details in a later video. With this data in a structured format, you can begin to make it useful. Maybe you want to run analytics on customer feedback, maybe you're processing large multi-page application forms, or maybe you're trying to add more
data sources to your dashboards. You can easily incorporate document data into your applications simply by calling an API, no data science expertise required. Now let's break down the main categories of Document AI. There is general Doc AI, which is designed to work with just about any document you can throw at it. This includes OCR, a structured form parser, and document quality analysis. For specialized Document AI, Google has pre-built models for common business document types. We have models for standardized forms, such as W-2s, driver's licenses, etc., and we have models for high-variance document types, such
as invoices and receipts. Google trains and maintains the general and specialized models so you don't have to. Soon, the platform will also offer the ability to build models for your own document types. You'll be able to train custom models from scratch or uptrain existing models without having to write any machine learning code. In this video, we talked about how Document AI can help you make sense of your data so that you can go from unstructured, or dark, data to useful structured data using AI. In the next video, we'll explore how to start using Document AI
processors with the Cloud Console and API. Then we can start digitizing and getting insights from our documents. If you'd like to get started, you can go to the Document AI codelabs linked in the description for step-by-step guides. Thanks for tuning in, and don't forget to like and subscribe.