Welcome, engineers! If you're aiming to pass your DP-600 exam, you're in the right place. We're diving into case study 1. Here are the general instructions for the case study; you can pause the video and read them. Let's dive in. Here's the overview: Contoso, Ltd. is a US-based health supplements company. Contoso has two divisions named Sales and Research. The Sales division contains two departments named Online Sales and Retail Sales. The Research division assigns internally developed product lines to individual teams of researchers and analysts. The existing environment states that Contoso has a Microsoft Entra tenant named contoso.com; the tenant contains two groups named ResearchReviewersGroup1 and ResearchReviewersGroup2. Coming to the data environment, Contoso has the following: the Sales division uses a Microsoft Power BI Premium capacity. The semantic model of the Online Sales department includes a fact table named Orders that uses Import mode; in the system of origin, the OrderID value represents the sequence in which orders are created. The Research department uses an on-premises third-party data warehousing product. Fabric is enabled for contoso.com. An Azure Data Lake Storage Gen2 storage account named storage1 contains Research division data for a product line named ProductLine1; the data is in Delta format. A Data Lake Storage Gen2 storage account named storage2 contains Research division data for a product line named ProductLine2; the data is in CSV format. Here are the requirements and planned changes. Contoso plans to make the following changes: enable support for Fabric in the Power BI Premium capacity used by the Sales division; make all the data for the Sales division and the Research division available in Fabric; for the Research division, create two Fabric workspaces named
ProductLine1WS and ProductLine2WS; in ProductLine1WS, create a lakehouse named Lakehouse1; in Lakehouse1, create a shortcut to storage1 named ResearchProduct. Data analytics requirements: Contoso identifies the following data analytics requirements: all the workspaces for the Sales division and the Research division must support all Fabric experiences; the Research division workspaces must use a dedicated, on-demand capacity that has per-minute billing; the Research division workspaces must be grouped together logically to support OneLake data hub filtering based on the department name; for the Research division workspaces, the members of ResearchReviewersGroup1 must be able to read lakehouse and warehouse data and shortcuts by using SQL endpoints; for the Research division workspaces, the members of ResearchReviewersGroup2 must be able to read lakehouse data by using Lakehouse Explorer; all the semantic models and reports for the Research division must use version control that supports branching. Data preparation requirements: Contoso identifies the following data preparation requirements: the Research division data for ProductLine1 must be retrieved from Lakehouse1 by using Fabric notebooks; all the Research division data in the lakehouses must be presented as managed tables in Lakehouse Explorer. Semantic model requirements: Contoso identifies the following requirements for implementing and managing semantic models: the number of rows added to the Orders table during refreshes must be minimized; the semantic models in the Research division workspaces must use Direct Lake mode. General requirements: Contoso identifies the following high-level requirements that must be considered for all solutions: follow the principle of least privilege when applicable; minimize implementation and maintenance effort when possible. Here comes our question: you need to ensure that Contoso can use version control to meet the data analytics requirements
and the general requirements what should you do a store At the semantic models and reports in data Lake Gen 2 storage B modify the settings of the research workspaces to use a GitHub repository C modify the settings of the research division workspaces to use an Azure repost repository D store all the semantic models and reports in Microsoft One Drive please pause the video and take a moment to think about it the answer is option C modify the settings of the research Division workspaces to use an Azure repos repository why because if we look at the
options option A storing semantic models and reports in data Lake Gen 2 storage might provide a storage solution but it does not address the requirement for Version Control option b while GitHub is is a popular version control platform and supports branching it will not integrate seamlessly with contoso as your based Environment option D Microsoft One Drive is primarily a file hosting service and is not well suited for Version Control of complex data sets semantic models and reports and has nothing to do with Version Control therefore the correct answer is C modify the settings of the
research division workspace to use an Azure repos repository also to support our answer let's look at the Microsoft documentation that says currently only Git in Azure repos with the same tenant as the fabric tenant is supported let's move on to the next question you need to recommend a solution to group the research division workspaces what should you include in the recommendation to answer select the appropriate options in the answer area please pause the video and take a moment to think about it the grouping method is domain because in fabric a domain logically groups together all
the data in an organization that is relevant to a particular area or field; you can read more about it in the Microsoft documentation linked in the description box. The tool for this task is the Fabric admin portal: you can create domains using the admin portal. While you might argue that the OneLake data hub can be used, it is primarily for viewing domains and their underlying artifacts, not for creating them; since the question asks for a solution to group the Research division workspaces, I would recommend the Fabric admin portal. Question 3: you need to refresh the Orders table of
the Online Sales department. The solution must meet the semantic model requirements. What should you include in the solution? A. An Azure Data Factory pipeline that executes a stored procedure activity to retrieve the maximum value of the OrderID column in the destination lakehouse. B. An Azure Data Factory pipeline that executes a stored procedure activity to retrieve the minimum value of the OrderID column in the destination lakehouse. C. An Azure Data Factory pipeline that executes a dataflow to retrieve the minimum value of the OrderID column in the destination lakehouse. D. An Azure Data Factory pipeline that executes a dataflow to retrieve the maximum value of the OrderID column in the destination lakehouse. The correct answer is D. Firstly, options B and C talk about the minimum value, so those two options are ruled out. In option A, the stored procedure activity does not return the retrieved value to the pipeline, so the correct answer is D. There is a detailed explanation of incremental load in the Microsoft documentation, which I will link in the description box for your reference; the sketch below shows the idea.
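To make the high-water-mark pattern concrete, here is a minimal T-SQL sketch of what the dataflow retrieves and how the source query uses it. Orders and OrderID come from the case study; the parameter name and the assumption that the source exposes the same table name are hypothetical:

    -- Step 1: in the destination lakehouse, find the highest OrderID loaded so far
    SELECT MAX(OrderID) AS WatermarkOrderID
    FROM dbo.Orders;

    -- Step 2: against the source system, copy only rows created after that watermark
    -- (@WatermarkOrderID is supplied by the pipeline from step 1)
    SELECT *
    FROM dbo.Orders
    WHERE OrderID > @WatermarkOrderID;

Because OrderID reflects the sequence in which orders are created, filtering on the maximum loaded value minimizes the rows added during each refresh, which is exactly what the semantic model requirements ask for.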
Now let's move on to the next question for this case study: which syntax should you use in a notebook to access the Research division data for ProductLine1? Please pause the video and take a moment to think about it. The correct answer is option B, because it has the correct syntax; here is the syntax from the documentation, and a hedged sketch of it follows.
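Since the exam screenshot isn't reproduced here, this is only a hedged reconstruction of the documented pattern for reading Delta data through a lakehouse shortcut in a Fabric notebook; the shortcut name ResearchProduct comes from the case study, while the relative Tables/ path is an assumption:

    # spark is pre-defined in a Fabric notebook session
    # Read the Delta table exposed by the ResearchProduct shortcut in Lakehouse1
    df = spark.read.format("delta").load("Tables/ResearchProduct")
    display(df)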
Question 64: you need to recommend which type of Fabric capacity SKU meets the data analytics requirements for the Research division. What should you recommend? A. A. B. EM. C. P. D. F. The correct answer is option D, F: the Fabric capacity SKUs start with F and support on-demand, per-minute billing, and all the other options are incorrect. Question 65: you need to migrate the Research division data for ProductLine1. The solution must meet the data preparation requirements. How should you complete the code? To answer, select the appropriate options in the answer area. The correct answers are Delta for the format and Tables/productline1 for the save location. Why choose Delta over CSV or Parquet? CSV isn't suitable for our needs because it doesn't support creating managed tables, which are crucial for maintaining metadata and the data lifecycle within a lakehouse. Parquet, while it does support managed tables and could technically be used, doesn't align with the specific requirement of this case study: the need to query data using SQL endpoints. The Delta format supports this capability seamlessly, making it the preferred choice. Why the specific path Tables/productline1? That path is correct because it points directly to where we want the data stored within the lakehouse's structured directory system; without needing additional subdirectories, it simplifies access and management. An additional note on usage: when using the saveAsTable method, you do not need to specify the Tables/ prefix, as Spark handles this internally when managing metadata for the table; when using save, the full path needs to be explicitly defined. A short sketch of both follows.
Question 66: what should you use to implement calculation groups for the Research division semantic models? A. Microsoft Power BI Desktop. B. The Power BI service. C. DAX Studio. D. Tabular Editor. Options A, B, and D can all be correct in practice, because you can create a calculation group with Microsoft Power BI Desktop, the Power BI service, or Tabular Editor. Since the question asks you to pick one, I would select Tabular Editor, since it is easier to create a calculation
group in it compared to the other tools, according to the Microsoft community. Question 67: which workspace role assignments should you recommend for ResearchReviewersGroup1 and ResearchReviewersGroup2? To answer, select the appropriate options in the answer area. ResearchReviewersGroup1 requires Viewer access because, according to the requirements, they need to be able to read lakehouse and warehouse data and access shortcuts via SQL endpoints. ResearchReviewersGroup2 needs Contributor access, as they are required to read lakehouse data through Lakehouse Explorer. For a better understanding of roles within Microsoft Fabric workspaces, I recommend you review the workspace roles documentation prior to your exam. Now we're diving into case study 2. Litware, Inc. is a manufacturing company that has offices throughout North America. The analytics team at Litware contains data engineers, analytics engineers, data analysts, and data scientists. In their existing environment, Litware has been using a Microsoft Power BI tenant for three years; Litware has not enabled any Fabric capacities or features. In the available data, Litware has data that must be analyzed, as shown in the following table, and the product data contains a single table with the following
columns. The customer satisfaction data contains the Survey, Question, and Response tables. For each survey submitted, the following occurs: one row is added to the Survey table, and one row is added to the Response table for each question in the survey. The Question table contains the text of each survey question; the third question in each survey response is an overall satisfaction score. Customers can submit a survey after each purchase. User problems: the analytics team has large volumes of data, some of which is semi-structured. The team wants to use Fabric to create a new data store. Product data is often classified into three pricing groups: high, medium, and low. This logic is implemented in several databases and semantic models, but the logic does not always match across implementations. Under requirements, we have planned changes: Litware plans to enable Fabric features in the existing tenant. The analytics team will create a new data store as a proof of concept (POC); the remaining Litware users will only get access to the Fabric features once the POC is complete. The POC will be completed by using a Fabric trial capacity. The following three workspaces will be created: one, AnalyticsPOC, which will contain the data store, semantic models, reports, and the pipelines, dataflows, and notebooks used to populate the data store; two, DataEngPOC, which will contain all the pipelines, dataflows, and notebooks used to populate OneLake; three, DataSciPOC, which will contain all the notebooks and reports created by the data scientists. The following will be created in the AnalyticsPOC workspace: a data store (type to be decided), a custom semantic model, a default semantic model, and interactive reports. The data engineers will create data pipelines to load data to OneLake
either hourly or daily, depending on the data source. The analytics engineers will create processes to ingest, transform, and load the data to the data store in the AnalyticsPOC workspace daily. Whenever possible, the data engineers will use low-code tools for data ingestion; the choice of which data cleansing and transformation tools to use will be at the data engineers' discretion. All the semantic models and reports in the AnalyticsPOC workspace will use the data store as the sole data source. Technical requirements: the data store must support read access by using T-SQL or Python, semi-structured and unstructured data, and row-level security (RLS) for users executing T-SQL queries. Files loaded by the data engineers to OneLake will be stored in the Parquet format and will meet Delta Lake specifications. Data will be loaded without transformation into one area of the AnalyticsPOC data store; the data will then be cleansed, merged, and transformed into a dimensional model. The data load process must ensure that the raw and cleansed data is updated completely before populating the dimensional model. The dimensional model must contain a date dimension; there is no existing data source for the date dimension. The Litware fiscal year matches the calendar year, and the date dimension must always contain dates from 2010 through the end of the current year. The product pricing group logic must be maintained by the analytics engineers in a single location, and the pricing group data must be made available in the data store for T-SQL queries and in the default semantic model. The following logic must be used: list prices that are less than or equal to 50 are in the low pricing group; list prices that are greater than 50 and less than or equal to 1,000
are in the medium pricing group; list prices that are greater than 1,000 are in the high pricing group. Security requirements: only Fabric administrators and the analytics team must be able to see the Fabric items created as part of the POC. Litware identifies the following security requirements for the Fabric items in the AnalyticsPOC workspace: Fabric administrators will be the workspace administrators. The data engineers must be able to read from and write to the data store; no access must be granted to datasets or reports. The analytics engineers must be able to read from, write to, and create schemas in the data store; they also must be able to create and share semantic models with the data analysts and to view and modify all reports in the workspace. The data scientists must be able to read from the data store but not write to it; they will access the data by using a Spark notebook. The data analysts must have read access to only the dimensional model objects in the data store; they also must have access to create Power BI reports by using the semantic models created by the analytics engineers. The date dimension must be available to all users of the data store. The principle of least privilege must be followed. Both the default and custom semantic models must include only tables or views from the dimensional model in the data store. Litware already has the following Microsoft Entra security groups: FabricAdmins (the Fabric administrators), AnalyticsTeam (all the members of the analytics team), DataAnalysts (the data analysts on the analytics team), DataScientists (the data scientists on the analytics team), DataEngineers (the data engineers on the analytics team), and AnalyticsEngineers (the analytics engineers on the analytics team). Report requirements: the data analysts must
create a customer satisfaction report that meets the following requirements: enables a user to select a product to filter customer survey responses to only those who have purchased that product; displays the average overall satisfaction score of all the surveys submitted during the last 12 months up to a selected date; shows data as soon as the data is updated in the data store; ensures that the report and the semantic model only contain data from the current and previous year; ensures that the report respects any table-level security specified in the source data store; and minimizes the execution time of report queries. Here is the question: you need to assign permissions for the data store in the AnalyticsPOC workspace. The solution must meet the security requirements. Which additional permissions should you assign when you share the data store? To answer, select the appropriate options in the answer area. Pause the video, take a moment to think it through, and then check out the answer. The correct answer for the data engineers is Read All Apache Spark, which is the ReadAll permission. This permission is necessary because the data engineers need to read from and write data to the data store, as specified in the security requirements; according to the Microsoft documentation, the ReadAll permission should be provided only if the shared recipient wants complete access to your warehouse's files using the Spark engine. For the data analysts, it is the Build reports on the default dataset permission, because with the Build permission the shared recipient can create reports using the default semantic model connected to your warehouse, and this fulfills the analytics requirement mentioned in the case study. The data scientists would also need the Read All Apache Spark permission, because with this permission the shared recipient has read access
to the underlying Parquet files in OneLake, which can be consumed using Spark, and the case study mentions that the data scientists will access the data by using Spark notebooks. You might have thought the SQL endpoint permission was the correct answer, since SQL endpoints are read-only, but it isn't, because SQL endpoints do not support Spark notebooks. The next question: you need to create a DAX measure to calculate the average overall satisfaction score. How should you complete the DAX code? Please take a moment to review the code and verify your answer. The correct answers are AVERAGE over the survey response value and the period variable: the response value is the only value that can be aggregated (the customer key does not make sense, as it is merely an identifier and cannot provide a score), and the period is correct because it defines the date range, pre-calculated as a variable. A hedged sketch of the measure follows.
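Since the exam's code box isn't shown here, this is only a hedged reconstruction of the measure's shape; the table and column names (Response[Value], 'Date'[Date]) are assumptions:

    Avg Overall Satisfaction =
    VAR Period =
        DATESINPERIOD ( 'Date'[Date], MAX ( 'Date'[Date] ), -12, MONTH )
    RETURN
        CALCULATE ( AVERAGE ( Response[Value] ), Period )

The variable pre-computes the rolling 12-month window up to the selected date, and the aggregate averages the numeric response value, matching the reasoning above.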
Next, you need to resolve the issue with the pricing group classification. How should you complete the T-SQL statement? Take a moment to review the code, think, and check the answer. The correct answer is a view, because it meets the requirement of showing data as soon as the data is updated in the data store. If you thought a table because it should be part of the default semantic model, you are mistaken: within the warehouse, a user can add warehouse objects (tables or views) to their default Power BI semantic model. Therefore the correct answer is a view, as it displays data immediately after updates in the data store; a link is provided in the description to support this answer. The next correct answer option is a basic SQL CASE statement, sketched below.
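Here's a minimal sketch of what that view could look like; the thresholds come straight from the case study, while the dbo.Product table and ListPrice column names are assumptions:

    -- Centralizes the pricing-group logic in one warehouse view, so it is
    -- available to T-SQL queries and to the default semantic model
    CREATE VIEW dbo.ProductPricingGroup
    AS
    SELECT
        ProductID,
        ListPrice,
        CASE
            WHEN ListPrice <= 50   THEN 'Low'
            WHEN ListPrice <= 1000 THEN 'Medium'
            ELSE 'High'
        END AS PricingGroup
    FROM dbo.Product;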
The next question: what should you recommend using to ingest the customer data into the data store in the AnalyticsPOC workspace? A. A stored procedure. B. A pipeline that contains a KQL activity. C. A Spark notebook. D. A dataflow. Take a moment to think and check the answer. The correct answer is D, a dataflow, because it is the best option. Why? Let's check the other options. Option A, a stored procedure: it doesn't make sense to use a stored procedure for loading the data. Option B is incorrect because KQL is for real-time data. Option C, a Spark notebook: although this could work, the text states that whenever possible the data engineers will use low-code tools for data ingestion. Option D, a dataflow, might not be the optimal solution, as a copy activity could be used for this; but since a copy activity is not among the answer options, and a dataflow can get this job done and is low-code, the correct answer is a dataflow. Last question of the case study: which type of data store should you recommend in the AnalyticsPOC workspace? A. A data lake. B. A warehouse. C. A lakehouse. D. An external Hive metastore. Take a moment to think, go back and read the technical requirements if needed, and then answer the question. The correct answer is a lakehouse, because a lakehouse supports unstructured, structured, and semi-structured data, and the technical requirements mention semi-structured and unstructured data, which should hint at choosing a lakehouse. Let's look at the other options. A data lake could be valid, but it is not really a concept that we have in Fabric. B, a warehouse, is not valid because it supports only structured data. D, an external Hive metastore, doesn't make sense, as it is a centralized repository that stores metadata related to the
tables, partitions, columns, schemas, and other data structures used by Hive and Spark. Question 60: you need to recommend a solution to prepare the tenant for the POC. Which two actions should you recommend performing from the Fabric admin portal? Each correct answer presents part of the solution. A. Enable the "Users can try Microsoft Fabric paid features" option for the entire organization. B. Enable the "Users can try Microsoft Fabric paid features" option for specific security groups. C. Enable the "Allow Azure Active Directory guest users to access Microsoft Fabric" option for specific security groups. D. Enable the "Users can create Fabric items" option and exclude specific security groups. E. Enable the "Users can create Fabric items" option for specific security groups. Pause the video, think about the answer, and let's discuss. The correct options are B and E; let me explain why. Option A, enabling paid features for the entire organization, risks unnecessary exposure and costs, since not everyone needs these features for the POC; this is why it's not suitable. Option B is correct because it restricts the trial of paid features to the specific groups involved in the POC, aligning with the principle of least privilege; this POC-focused approach limits potential risks and concentrates resources where they are most needed. Option C is incorrect because allowing Azure Active Directory guest users access is not relevant to the POC's internal team requirements and introduces unnecessary security risks. Option D, excluding specific security groups from creating Fabric items, could hinder the progress of team members who need these capabilities for the POC, making this approach inefficient. Option E is also correct: it enables specific security groups to create the necessary Fabric items, ensuring that only authorized personnel are involved, which enhances security and efficiency. That's why B and E
are the best choices, focusing on security and efficiency. Question 61: you need to design a semantic model for the customer satisfaction report. Which data source, authentication method, and mode should you use? To answer, select the appropriate options in the answer area. The correct answers for this setup involve SSO authentication and DirectQuery mode; let's explore why these choices are optimal for the scenario. Authentication method: single sign-on (SSO) is the preferred authentication method because it allows users to authenticate using their existing organizational credentials; this not only simplifies access across multiple systems but also bolsters security by reducing the number of separate credentials users need to manage. Basic authentication, which relies on usernames and passwords, doesn't align with the robust security requirements of enterprise-level applications within Fabric. Service principal authentication is generally used for non-interactive applications or automated tools, which does not fit the interactive needs of analytics team members who will be working directly with the reports and semantic models; thus it is an incorrect choice. Mode: DirectQuery is chosen because it ensures that the data displayed in reports is as current as possible by querying the data source in real time whenever the report is accessed. While Direct Lake is a viable option due to its efficiency and support for row-level security, it does not guarantee immediate data updates, as it may rely on cached data; this can be a limitation when the requirements specify that the data needs to reflect updates immediately. By selecting SSO authentication and DirectQuery mode, we ensure both secure and efficient access to the most current data, aligning with the project's security and performance requirements. For a deeper understanding, I recommend checking out the links provided in the video description for detailed guidance on setting up
these features in Fabric. Question 62: you need to implement the date dimension in the data store. The solution must meet the technical requirements. What are two ways to achieve the goal? Each correct answer presents a complete solution. Note: each correct selection is worth one point. A. Populate the date dimension table by using a dataflow. B. Populate the date dimension table by using a copy activity in a pipeline. C. Populate the date dimension view by using T-SQL. D. Populate the date dimension table by using a stored procedure activity in a pipeline. Pause the video and consider the options before checking the answer. The correct answers are option A and option D. Option A: dataflows are excellent tools in Fabric for managing data transformations and manipulations before storage; they are particularly adept at handling date calculations and transformations, making them ideal for creating a date dimension from scratch. Thus it is a correct answer choice. Option B: while the copy activity is efficient for moving data from one place to another, it isn't typically used for generating new data from scratch; since there's no existing data source for a date dimension, this is an incorrect option. Option C is incorrect because it involves a view, which does not store data persistently; views only display data based on existing tables and cannot serve as standalone data storage, especially not for a date dimension that requires physical storage for efficient querying and reporting. Thus this is also an incorrect choice. Option D: stored procedures provide a robust method for encapsulating business logic for data manipulation directly within the SQL environment, so it is our correct answer choice; a sketch of such a procedure follows.
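To make option D tangible, here is a hedged T-SQL sketch of a procedure that builds the required 2010-through-current-year range; the DimDate table and its columns are assumptions, and the row-by-row loop is used only for clarity (a set-based insert would be faster):

    CREATE PROCEDURE dbo.LoadDimDate
    AS
    BEGIN
        -- Full reload: the range is small, so simplicity wins
        DELETE FROM dbo.DimDate;

        DECLARE @d date = '2010-01-01';
        DECLARE @lastDay date = DATEFROMPARTS(YEAR(GETDATE()), 12, 31);

        WHILE @d <= @lastDay
        BEGIN
            INSERT INTO dbo.DimDate (DateKey, [Date], [Year], [Month], [Day])
            VALUES (YEAR(@d) * 10000 + MONTH(@d) * 100 + DAY(@d),
                    @d, YEAR(@d), MONTH(@d), DAY(@d));
            SET @d = DATEADD(day, 1, @d);
        END;
    END;

A stored procedure activity in a pipeline can then call dbo.LoadDimDate on a schedule.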
Question 63: you need to ensure the data loading activities in the AnalyticsPOC workspace are executed in the appropriate sequence. The solution must meet the technical requirements. What should you do? A. Create a dataflow that has multiple steps and schedule the dataflow. B. Create and schedule a Spark notebook. C. Create and schedule a Spark job definition. D. Create a pipeline that has dependencies between activities and schedule the pipeline. Pause the video and consider the options before checking the answer. The correct choice here is option D, which involves using a pipeline with dependencies between activities. This approach is ideal because it ensures that all data loading activities are executed one after the other, in sequence. The other options fall short for the following reasons. Option A suggests creating a dataflow with multiple steps; while dataflows are great for managing sequential data processing within their own scope, they lack the comprehensive orchestration capabilities required for handling dependencies that span multiple workflows or different types of operations. Option B involves scheduling a Spark notebook; Spark notebooks are better suited for exploratory data analysis and interactive data manipulation than for the structured and repeatable data processing tasks required in production environments. Option C recommends scheduling a Spark job definition; although this is effective
for specific tasks, Spark jobs do not inherently offer a mechanism to manage complex dependencies between various tasks as effectively as a pipeline does. Next question: you have a Fabric warehouse that contains a table named Staging.Sales. Staging.Sales contains the following columns. You need to write a T-SQL query that returns data for the year 2023, displays the product ID and product name, and has a summarized amount that is higher than 10,000. Which query should you use? Pause the video, look at the answer options, and check the answer. The correct answer is option A, because it uses the HAVING clause with the aggregate SUM function; the HAVING clause was introduced in SQL to work with aggregate functions, which the WHERE keyword cannot do. While option D might seem similar, it uses the alias TotalAmount, which is not supported in the HAVING clause but is supported in the ORDER BY clause. Options B and C are incorrect because the HAVING clause cannot be used with non-aggregate columns. The sketch below pulls these rules together.
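Here's a minimal sketch of the winning shape; the date and amount column names are assumptions:

    SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount
    FROM Staging.Sales
    WHERE YEAR(SaleDate) = 2023          -- WHERE filters rows before grouping
    GROUP BY ProductID, ProductName
    HAVING SUM(Amount) > 10000           -- HAVING must repeat the aggregate
    ORDER BY TotalAmount DESC;           -- the alias is only usable here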
The next question: you have a data warehouse that contains a table named Customers in the Stage schema. Customers contains all the customer record updates from a customer relationship management system, and there can be multiple updates per customer. You need to write a T-SQL query that returns the customer ID, name, postal code, and the last updated time of the most recent row for each customer ID. How should you complete the code? To answer, select the appropriate options in the answer area. Pause the video, give it a think, and then check the answer. The correct answers are ROW_NUMBER and WHERE X = 1; the sketch below shows the full pattern.
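A hedged reconstruction of the completed query; the X alias mirrors the answer, while the column names are assumptions:

    SELECT CustomerID, Name, PostalCode, LastUpdated
    FROM (
        SELECT CustomerID, Name, PostalCode, LastUpdated,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY LastUpdated DESC) AS X
        FROM Stage.Customers
    ) AS Ranked
    WHERE X = 1;

ROW_NUMBER restarts at 1 for each customer, ordered newest first, so keeping only row 1 per partition returns the most recent update for every customer.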
Question 12: you have a Fabric tenant. You plan to create a Fabric notebook that will use Spark DataFrames to generate Microsoft Power BI visuals. You run the following code. For each of the following statements, select Yes if the statement is true; otherwise, select No. Take a moment to review the code and think carefully, then check your answer. Let's first look at what the code does and then check the answers. The line "from powerbiclient import QuickVisualize, get_dataset_config" imports QuickVisualize and get_dataset_config, which are necessary to work with DataFrames in your notebook. Then the second line of code creates a new Power BI report object from a Spark DataFrame, and the final line renders the new report. Now, going back to our question. One, "the code embeds an existing Power BI report": No, because we saw that the code creates a new Power BI report. Two, "the code creates a Power BI report": Yes. Three, "the code displays a summary of the DataFrame": No, because it is the display(df, summary=True) function that gives you a summary of the DataFrame. Though you might say that QuickVisualize, which creates a report, also gives a quick summary, as shown in the image, I would still go with No for the third statement, because the code snippet clearly renders a new report. According to the Microsoft documentation, the display(df, summary=True) function should be used to check the statistical summary of a given DataFrame; the summary includes the column name, column type, unique values, and missing values for each column, and you can also select a specific column to see its minimum value, maximum value, mean value, and standard deviation. For more information, please read the link in the description about notebook visualization in Microsoft Fabric. A hedged reconstruction of the snippet follows.
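Since the exam screenshot isn't reproduced here, this is only a hedged reconstruction of the snippet being discussed, based on the powerbiclient package docs (df is assumed to be a Spark DataFrame):

    from powerbiclient import QuickVisualize, get_dataset_config

    # Create a new Power BI report object from the DataFrame's data
    PBI_visualize = QuickVisualize(get_dataset_config(df))
    # Render the new report in the notebook output
    PBI_visualize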
Next question: you are the administrator of a Fabric workspace that contains a lakehouse named Lakehouse1. Lakehouse1 contains the following tables: Table1, a Delta table created by using a shortcut; Table2, an external table created by using Spark; and Table3, a managed table. You plan to connect to Lakehouse1 by using its SQL endpoint. What will you be able to do after connecting to Lakehouse1? A. Read Table3. B. Update the data in Table3. C. Read Table2. D. Update the data in Table1. Read the question again, give it a think, and check the answer. The correct answer is A, read Table3. Let's break down why this is the case. Firstly, since we are using a SQL endpoint, it's important to know that SQL endpoints are read-only; this means that options B and D, which involve updating data, are incorrect. Now we are left with options A and C. Let's consider option C, read Table2: Table2 is an external table created using Spark, and typically external tables created by Spark cannot be accessed through SQL endpoints. Now let's look at option A, read Table3:
Table3 is a managed table. Managed tables are fully controlled by the lakehouse, including storage and metadata management, which means they are fully surfaced through the SQL endpoint for querying; so with the SQL endpoint you can read Table3 without any issues. Therefore the correct answer is A, read Table3. Next question: you have a Fabric tenant that contains a warehouse. You use a dataflow to load a new dataset from OneLake to the warehouse. You need to add a Power Query step to identify the maximum values for the numeric columns. Which function should you include in the step? A. Table.MaxN. B. Table.Max. C. Table.Range. D. Table.Profile. Look through the options carefully and answer the question. The correct answer is Table.Profile; if you said Table.Max, you are wrong. Initially I also thought Table.Max might be correct, but it's not: Table.Max returns the largest row in the table given the comparison criteria; it doesn't provide a summary of the maximum values for each numeric column. Notice that the question asks for the maximum values for numeric columns, plural; if it asked for a single column's maximum value, Table.Max could have been an option. Let's see the other options. Option A, Table.MaxN: this function returns the largest row or rows in the table given the comparison criteria; it's not designed to return the maximum values for each numeric column. Option C, Table.Range: this function returns a specified number of rows from a table starting at a specified offset; it's unrelated to finding the maximum values for numeric columns. Option D, Table.Profile, provides a statistical summary of a table, including the maximum values for each numeric column; this makes it the right choice for our requirement. So the correct answer is option D, Table.Profile, demonstrated briefly below. Just a reminder: make sure you read the questions in the exam carefully so that you don't make silly mistakes.
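A tiny, self-contained Power Query M example of what Table.Profile returns (the sample table is made up):

    let
        // Two numeric columns to profile
        Source  = #table({"Price", "Qty"}, {{10, 1}, {25, 3}, {18, 2}}),
        // One row per column: Min, Max, Average, StandardDeviation,
        // Count, NullCount, DistinctCount
        Profile = Table.Profile(Source)
    in
        Profile

The Max column of the output holds the maximum value of every profiled column in one step, which is exactly what the question asks for.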
Question 15: you have a Fabric tenant that contains a machine learning model registered in a Fabric workspace. You need to use the model to generate predictions by using the PREDICT function in a Fabric notebook. Which two languages can you use to perform model scoring? Each correct answer presents a complete solution. A. T-SQL. B. DAX. C. Spark SQL. D. PySpark. Pause the video, take a moment to think, and check the answer. The correct answers are Spark SQL and PySpark: according to the Microsoft documentation, you can invoke the PREDICT function by using the Transformer API from SynapseML, the Spark SQL API, or a PySpark user-defined function (UDF). A hedged sketch of the Transformer route follows.
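This is a hedged sketch of the SynapseML Transformer route described in the Fabric PREDICT docs; the model name, version, and column lists are assumptions, and df is an existing Spark DataFrame of features:

    from synapse.ml.predict import MLFlowTransformer

    model = MLFlowTransformer(
        inputCols=["feature1", "feature2"],  # columns fed to the model
        outputCol="prediction",              # column that will hold the scores
        modelName="sales-forecast-model",    # registered model in the workspace
        modelVersion=1,
    )
    scored_df = model.transform(df)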
Question 16: you are analyzing the data in a Fabric notebook. You have a Spark DataFrame assigned to a variable named df. You need to use the chart view in the notebook to explore the data manually. Which function should you run to make the data available in the chart view? A. displayHTML. B. show. C. write. D. display. Pause the video, think it through, and verify your answer. The correct function to use is display. Let's look at our options in order. Option A, displayHTML: this function is used to display HTML content in the notebook, not to visualize DataFrames in the chart view. The show function prints the DataFrame to the console, which is useful for quick text-based inspections but does not provide the interactive chart view. The write function is used to write the DataFrame to external storage, such as a file or a database, and doesn't help in visualizing the DataFrame within the notebook. The correct answer is the display function, because it outputs the DataFrame in a way that allows interactive exploration and visualization; it's integrated with the notebook environment, providing features like the chart view, making it the ideal choice for this task. So remember: to interactively explore your data using the chart view in a Fabric notebook, you need to use the display function, for example display(df). For more information, you can check out the Microsoft documentation on using display in notebooks. Question 17: you have a Fabric tenant that contains a Microsoft Power BI report named Report1. Report1 includes a Python visual. Data displayed by the visual is grouped automatically, and duplicate rows are not displayed. You need all rows to appear in the visual. What should you do? A. Reference the columns in the Python code
by index. B. Modify the Sort column by property for all columns. C. Add a unique field to each row. D. Modify the Summarize by property for all columns. Pause the video, think it through, and verify your answer. The correct answer is option C, add a unique field to each row. If you thought option A was correct, let me show you why it's not. Let's first rule out options B and D. Option B, modify the Sort column by property for all columns: this option changes the sorting order of the columns but does not prevent Power BI from automatically grouping rows that have identical values in all displayed columns. Option D, modify the Summarize by property for all columns: this option is relevant for aggregating numerical data within standard Power BI visuals; it does not affect the grouping of rows in a Python visual, where you need to ensure that all rows, including duplicates, are displayed. Option A, reference the columns in the Python code by index, is incorrect. This option means accessing the data in the DataFrame by using numerical indices, for example df.iloc[index], instead of column names, like df["field_name"]. This method does not affect how Power BI groups or displays the data, because it only changes how you access data within the Python script: accessing by column names and accessing by numerical indices both reach the same data after Power BI's pre-processing, which means the method used to reference the data in the Python script doesn't affect whether Power BI groups the rows or not. So clearly option A is incorrect, because referencing columns by index does not affect how Power BI processes data. The correct approach is to add a unique field to each row (option C), because by adding a unique field to each row you ensure that all rows, including duplicates, are passed to the Python visual; this prevents Power BI from grouping rows and ensures all of them are displayed. This solution is supported by the Microsoft documentation, which states that adding an index field makes all rows unique and prevents grouping. I hope this explanation was helpful.
Question 18: you have a Fabric tenant that contains a semantic model. The model contains data about retail stores. You need to write a DAX query that will be executed by using the XMLA endpoint; the query must return a table of stores that have opened since December 1, 2023. How should you complete the DAX expression? Please check the code closely, give it a think, and verify your answer. The correct answer options are DEFINE, EVALUATE, and SUMMARIZE; please check the links in the description for more information, and see the hedged sketch below.
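A hedged sketch of the query shape those keywords produce; the Store table and its column names are assumptions:

    DEFINE
        VAR _OpenedSince = DATE ( 2023, 12, 1 )
    EVALUATE
    SUMMARIZE (
        FILTER ( Store, Store[OpenDate] >= _OpenedSince ),
        Store[StoreID],
        Store[StoreName]
    )

DEFINE declares the reusable variable, EVALUATE returns the table, and SUMMARIZE groups the filtered rows into the requested store list.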
Question 19: you have a Fabric workspace named Workspace1 that contains a dataflow named Dataflow1. Dataflow1 has a query that returns 2,000 rows. You view the query in Power Query, as shown in the following exhibit. What can you identify about the PickupLongitude column? A. The column has duplicate values. B. All the table rows are profiled. C. The column has missing values. D. There are 935 values that occur only once. The correct answer is A, the column has duplicate values. Here's why. Option B, all the table rows are profiled, is incorrect because the exhibit shows that only 1,000 rows are being profiled, not all 2,000. Option C, the column has missing values, is incorrect because the null count is zero, indicating there are no missing values in the column. Option D, there are 935 values that occur only once, is incorrect because having 935 distinct values does not necessarily mean each occurs only once. And remember, the data is being profiled on only 1,000 rows; with 935 distinct values among 1,000 profiled rows, the column must have duplicate values, which is option A. Always read the answer options carefully to avoid confusion; if you got this answer wrong, I strongly recommend watching the YouTube video mentioned in the description. Question 20: you have a Fabric tenant named Tenant1 that contains a workspace named WS1. WS1 uses a capacity named C1 and contains a dataset named DS1. You need to ensure that read/write access to DS1 is
available by using the XMLA endpoint. What should be modified first? A. The DS1 settings. B. The WS1 settings. C. The C1 settings. D. The tenant settings. Pause the video, look at the answer options, and check the answer. The correct answer is C, the C1 settings: to ensure read/write access to a dataset by using the XMLA endpoint, you need to configure the capacity settings, because the XMLA endpoint read/write setting is enabled at the capacity level. The other options are incorrect. Question 21: you have a Fabric tenant that contains a workspace named Workspace1. Workspace1 is
assigned to a Fabric capacity. You need to recommend a solution to provide users with the ability to create and publish custom Direct Lake semantic models by using external tools. The solution must follow the principle of least privilege. Which three actions in the Fabric admin portal should you include in the recommendation? Each correct answer presents part of the solution. Note: each correct answer is worth one point. A. From the tenant settings, set "Allow XMLA endpoints and Analyze in Excel with on-premises datasets" to Enabled. B. From the tenant settings, set "Allow Azure Active Directory guest users to access Microsoft Fabric" to Enabled. C. From the tenant settings, select "Users can edit data models in the Power BI service". D. From the capacity settings, set "XMLA endpoint" to Read Write. E. From the tenant settings, set "Users can create Fabric items" to Enabled. F. From the tenant settings, enable "Publish to web". Pause the video, think it through, and verify your answer. The correct answer options are A, D, and E. Let's see why. Option A, from the tenant settings, set "Allow XMLA endpoints and Analyze in Excel with on-premises datasets" to Enabled: this setting enables the use of XMLA endpoints, which are necessary for external tools like SSMS, Excel, Tabular Editor, or any other tool that can connect via XMLA to interact with semantic models; it also allows users to analyze data in Excel using live connections to these endpoints. For more details, refer to the Microsoft documentation. Option B, from the tenant settings, set "Allow Azure Active Directory guest users to access Microsoft Fabric" to Enabled: this setting allows guest users from Azure Active Directory to access Microsoft Fabric; however, it is not related to enabling or managing XMLA endpoints, creating semantic
models, or providing the necessary permissions for internal users to create and publish custom Direct Lake semantic models, so it is incorrect. Option C, from the tenant settings, select "Users can edit data models in the Power BI service": while this setting allows users to edit data models within the Power BI service, it does not specifically enable the creation and management of custom Direct Lake semantic models using external tools or XMLA endpoints; it focuses more on editing existing models rather than creating new ones or managing them externally. According to the documentation, this setting doesn't apply to Direct Lake datasets or to editing a dataset through an API or XMLA endpoint, which are considered external tools; thus this option is incorrect. Refer to the data model tenant settings for more information. Option D, from the capacity settings, set "XMLA endpoint" to Read Write: this setting allows read and write operations through the XMLA endpoint, which is essential for creating and publishing semantic models; therefore this is a correct answer option. Option E, from the tenant settings, set "Users can create Fabric items" to Enabled: this setting permits users to create items within Fabric, including custom semantic models, which is necessary for users to have the capability to manage their models; therefore this is a correct answer option. Option F, from the tenant settings, enable "Publish to web": this setting allows users to publish reports to the web, making them publicly accessible; it does not pertain to the internal creation, modification, or management of semantic models using XMLA endpoints or external tools, and it is more about sharing reports publicly than providing permissions for model creation and management, so it is incorrect. So the correct options are A, D, and E; hope you understood the reasons why. See you in
the next video, and make sure to check the links provided for a deeper understanding. Question 22: you are creating a semantic model in Microsoft Power BI Desktop. You plan to make bulk changes to the model by using the Tabular Model Definition Language (TMDL) extension for Microsoft Visual Studio Code. You need to save the semantic model to a file. Which file format should you use? A. PBIP. B. PBIX. C. PBIT. D. PBIDS. Pause the video, think it through, and verify your answer. The correct file format to use is A, PBIP. Why? The PBIP (Power BI project) file format allows you to save your Power BI Desktop project in a way that is compatible with external tools like Visual Studio Code for bulk editing using TMDL. PBIP files are particularly useful when working with tabular models and performing bulk modifications, making them ideal for this scenario. Let's see why the other options are incorrect. Option B, PBIX: this is the standard file format for Power BI Desktop files; it contains the complete report, including the data model, visualizations, and report definitions. While PBIX files are essential for regular report development and sharing, they are not specifically designed for bulk
editing with TMDL in Visual Studio Code; therefore it is incorrect. Option C, PBIT: this format is used for Power BI templates; it includes the report layout and queries but does not contain the actual data. PBIT files are useful for creating reusable templates but are not suited for direct bulk editing of the semantic model using TMDL; therefore it is incorrect. Option D, PBIDS: this file format is used for Power BI data source files; it defines the data sources and their connection details but does not contain the data model or report elements. PBIDS files are not intended for semantic model modifications or bulk editing; therefore it is incorrect. By saving your semantic model in the PBIP format, you can effectively leverage the TMDL extension in Visual Studio Code for bulk changes, so our answer is A, PBIP. Question 23: you have a Fabric tenant that contains a warehouse named Warehouse1. Warehouse1 contains three schemas named SchemaA, SchemaB, and SchemaC. You need to ensure that a user named User1 can truncate tables in SchemaA only. How should you complete the T-SQL statement? Please pause the video, look at the answer area,
and then verify your answer. The correct answer is GRANT ALTER ON SCHEMA::SchemaA TO User1. Let's see why this is the correct answer. The ALTER permission: we need to grant ALTER because it allows a user to perform data definition language (DDL) operations on objects within a schema, which includes creating, altering, or truncating tables; the Microsoft documentation says the minimum permission required to truncate a table is ALTER. CONNECT is incorrect because the CONNECT permission allows a user to connect to the database; while it is necessary for accessing the database, it does not provide the ability to modify objects within the database, such as truncating tables. EXECUTE is incorrect because the EXECUTE permission allows a user to execute stored procedures and functions; it does not grant permissions to modify or truncate tables. The correct schema specification is ON SCHEMA::SchemaA, which specifies that the permission applies only to the schema named SchemaA, and TO User1 indicates that the permission is being granted to the user User1. Why not the other options? ON DATABASE::SchemaA is incorrect because SchemaA is a schema, not a database; that syntax would be used if you were granting permissions on the entire database. ON OBJECT::SchemaA is incorrect because OBJECT is used for specific database objects like tables, views, and so on. If you read the question, the requirement is to allow User1 to truncate tables within SchemaA only, so permissions need to be granted at the schema level; the complete statement is shown below. Hope you understood the reasoning behind the answer.
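For reference, here is the completed statement in full (the object names come from the question):

    -- ALTER on just SchemaA lets User1 truncate tables in that schema and nothing else
    GRANT ALTER ON SCHEMA::SchemaA TO User1;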
Question 24: you plan to deploy Microsoft Power BI items by using Fabric deployment pipelines. You have a deployment pipeline that contains three stages named development, test, and production; a workspace is assigned to each stage. You need to provide Power BI developers with access to the pipeline. The solution must meet the following requirements: ensure that the developers can deploy items to the workspaces for development and test; prevent the developers from deploying items to the workspace for production; follow the principle of least privilege. Which three levels of access should you assign to the developers? Each correct answer presents part of the solution. A. Build permission to the production semantic models. B. Admin access to the deployment pipeline. C. Viewer access to the development and test workspaces. D. Viewer access to the production workspace. E. Contributor access to the development and test workspaces. F. Contributor access to the production workspace. Please take a moment to pause the video, review the answer options, and then verify your answer. The correct answer options are B, D, and E. Now let's look at all the answer options and see why these are correct. Option A, Build permission to the production semantic models: this permission is incorrect because the Microsoft documentation states that workspace members assigned the Viewer role without Build permission can't access the semantic model or edit workspace content; this means that with Build permission you can access the semantic model, which we do not want to give developers in production. Thus this option is incorrect. Option B, Admin access to the deployment pipeline: admin access is necessary for developers to manage the deployment process, move items between stages, and oversee the entire pipeline; without this access they wouldn't be able to deploy items to any workspace. You might think that giving admin access allows developers to move items between the development, test, and production stages; however, we can
control the permission on the production workspace by assigning the developers as viewers (option D), so they cannot deploy to production. Option C, Viewer access to the development and test workspaces: viewer access is too restrictive; to ensure that developers can deploy items to the development and test workspaces, they need Contributor access, which is option E, making that the correct answer. Option D, Viewer access to the production workspace: assigning developers as viewers in the production workspace ensures they can see, but not deploy or modify, content in production; this adheres to the principle of least privilege, making it a correct choice. Option E, Contributor access to the development and test workspaces: contributor access allows developers to make changes and deploy items within the development and test workspaces; this access level is necessary for their development and testing tasks, making it a correct answer. Option F, Contributor access to the production workspace: this violates the principle of least privilege by allowing developers to make changes in the production environment; therefore it is incorrect. So by selecting options B, D, and E, we ensure that developers have the necessary permissions to perform their tasks without having more access than required, particularly in the production environment. I hope this explanation was helpful; please review the roles in workspaces in Microsoft Fabric before your exam. Question 25: you have a Fabric workspace that contains a DirectQuery semantic model. The model queries a data source that has 500 million rows. You have a Microsoft Power BI report named Report1 that uses the model; Report1 contains visuals on multiple pages. You need to reduce the query execution time for the visuals on all the pages. What are two features that you can use? Each correct answer presents
a complete solution. A. User-defined aggregations. B. Automatic aggregations. C. Query caching. D. OneLake integration. Pause the video and think about what the correct options might be, then resume to check your answer. The correct answers are options A and B. Let's see why. Option A, user-defined aggregations: user-defined aggregations allow you to pre-aggregate data at different levels of granularity; by doing so, Power BI can query the pre-aggregated data rather than the raw data, significantly reducing query times and improving performance. According to the documentation, aggregations in Power BI can improve query performance over large DirectQuery semantic models: by using aggregations, you cache data at the aggregated level in memory. Thus it is a correct answer. Option B, automatic aggregations, is correct because automatic aggregations use machine learning (ML) to continuously optimize DirectQuery semantic models for maximum report query performance based on usage patterns; this feature optimizes query performance by pre-aggregating data and minimizing the need to query large datasets directly. Read more about it from the links given in the description box. If you thought option C, query caching, was correct, like me, we were both incorrect: while query caching can help improve performance by storing query results, it is typically less effective for reports with highly dynamic data or user interactions, and it does not directly address the scale and complexity of querying 500 million rows. Additionally, the documentation states that query caching is not applicable to DirectQuery or live-connection semantic models, and our question says we have a DirectQuery semantic model, so this option is incorrect. Option D, OneLake integration, is also incorrect: OneLake integration is not specifically aimed at optimizing query performance within Power BI reports; it is more focused
on providing a unified data lake solution, and it does not directly impact query execution times for DirectQuery models. Question 26: you have a Fabric tenant that contains 30 CSV files in OneLake; the files are updated daily. You create a Microsoft Power BI semantic model named Model1 that uses the CSV files as a data source. You configure incremental refresh for Model1 and publish the model to a Premium capacity in the Fabric tenant. When you initiate a refresh of Model1, the refresh fails after running out of resources. What is a possible cause of the failure? A. Query folding is occurring. B. Only refresh complete days is selected. C. The XMLA endpoint is set to read-only. D. Query folding is not occurring. E. The data type of the column used to partition the data has changed. The answer options are tricky, so please review the question and options, then verify your answer. The correct answer is option E. If you thought it was option D, query folding is not occurring, let me explain why that is incorrect: query folding does not happen with CSV files, and supporting documentation from Microsoft confirms that query folding is not applicable to flat-file sources like CSVs; therefore the refresh failure is not due to the absence of query folding, as folding would never be possible in this scenario. Option A, query folding is occurring: query folding means that data transformations are being pushed down to the data source, significantly reducing the workload on Power BI; given that the data source is CSV files, query folding cannot occur, making this scenario incorrect. Option B, only refresh complete days is selected: this setting ensures that all rows for an entire day are included in the refresh operation; it does not
directly address the issue of running out of resources during the refresh so this option is incorrect option C xmla endpoint is set to read only this setting controls the ability to read data models via the xmla Endpoint but does not directly impact resource usage during a data refresh operation therefore it is also incorrect option e the data type of the column used to partition the data has changed a mismatch in data types where date time is required for the range start and range end parameters but the table column is not in the datetime format can
cause the incremental refresh to fail both the parameter data types and the filtered data column must be in the Datetime format if not the query cannot be folded properly leading to Resource exhaustion during the refresh given that CSV files do not support query folding we can confidently rule out option D the refresh failure for model one in our question is likely due to a data type change in the partition column making option e the best answer question 27 you have a fabric tenant that uses a Microsoft power bi premium capacity you need to enable scaleout
Question 27: You have a Fabric tenant that uses a Microsoft Power BI Premium capacity. You need to enable scale-out for a semantic model. What should you do first? A, at the semantic model level, set Large dataset storage format to Off; B, at the tenant level, set Create and use Metrics to Enabled; C, at the semantic model level, set Large dataset storage format to On; D, at the tenant level, set Data Activator to Enabled. Pause the video and take a moment to think and answer. The correct answer is option C: to enable scale-out for a semantic model, we follow the steps in the documentation, which say to go to the semantic model settings and set Large dataset storage format to On. All the other options are incorrect. Question 28: You have a Fabric tenant that contains a warehouse. The warehouse uses row-level security (RLS). You create a Direct Lake semantic model that uses the Delta tables and RLS of the warehouse. When users interact with a report built from the model, which mode will be used by the DAX queries? A, DirectQuery; B, Dual; C, Direct Lake; D, Import. Please pause the video, give it a think, and answer the question. The correct answer is A, DirectQuery. If you thought option C, Direct Lake, was correct because the RLS applies to queries on a warehouse, you're on the right track, but there's an important limitation to note: when Power BI queries a warehouse in Direct Lake mode, there is a known behavior where it falls back to DirectQuery mode to comply with row-level security. Microsoft documentation supports this: queries using row-level security against tables in the warehouse, including the lakehouse SQL analytics endpoint, will fall back to DirectQuery mode. Thus option A is the correct answer. Why are the other options incorrect? Let's check. Option B, Dual: Dual mode allows tables to operate in both Import and DirectQuery modes depending on the query context. While this can optimize performance, Dual mode does not inherently address the enforcement of RLS for Direct Lake semantic models; consistent enforcement of RLS requires queries to be executed directly against the source where RLS is defined, which DirectQuery ensures. Dual mode's primary function is performance optimization, not security enforcement. Option D, Import, is also incorrect because Import mode loads data into Power BI's in-memory storage, which allows fast query performance since the data is readily available in memory. However, it does not interact with the live data source after the initial load, meaning it cannot dynamically enforce RLS. RLS needs to be applied at query time to ensure security policies are respected, and Import mode's static nature makes it unsuitable for scenarios requiring real-time security enforcement like RLS. For more details, you can pause the video and read the summary, or visit the links in the description to explore the Microsoft documentation.
Question 29: You have a Fabric tenant that contains a complex semantic model. The model is based on a star schema and contains many tables, including a fact table named Sales. You need to create a diagram of the model. The diagram must contain only the Sales table and related tables. What should you use from Microsoft Power BI Desktop? A, data categories; B, Data view; C, Model view; D, DAX query view. This is a pretty simple and straightforward question. The correct answer is option C, Model view, because Model view allows you to visually see the tables and relationships in your data model. It will clearly display the fact table Sales in the center, with the related dimension tables connected around it, providing a clear and accurate diagram of the model. Why are the other options incorrect? Option A, data categories, is incorrect because data categories are used to specify the type of data in a column (e.g., address, city, country); they do not provide a visual representation of the data model or the relationships between tables. Option B, Data view, is also incorrect because Data view is used for viewing and editing the data in the tables themselves. It does not offer a visual representation of the relationships between tables, which is necessary for creating a diagram of the model. Option D, DAX query view, is also incorrect because DAX query view is used for writing and running DAX queries to analyze data, not for visualizing the structure of the data model or the relationships between tables. Hope this explanation is clear.
Question 30: You have a Fabric tenant that contains a semantic model. The model uses Direct Lake mode. You suspect that some DAX queries load unnecessary columns into memory. You need to identify the frequently used columns that are loaded into memory. What are two ways to achieve the goal? Each correct answer presents a complete solution. A, use the Analyze in Excel feature; B, use the VertiPaq Analyzer tool; C, query the $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS dynamic management view; D, query the DISCOVER_MEMORYGRANT dynamic management view. Read all the answer options carefully and then verify your answer. The correct options are B and C. Let's look at the answer options in order. Option A, use the Analyze in Excel feature: this feature is primarily used for ad hoc analysis and creating pivot tables based on the Power BI dataset. It does not provide detailed information about the memory usage of columns, so this option is incorrect. Option B, use the VertiPaq Analyzer tool: VertiPaq Analyzer is a tool specifically designed to analyze the memory usage of tables and columns in Power BI models. It helps identify which columns are consuming the most memory, providing insight into optimization opportunities. This tool is widely used for detailed memory analysis and optimization in Power BI and Analysis Services models, and it provides a detailed breakdown of memory usage by tables and columns, helping you pinpoint exactly which columns consume the most memory. So this is one of our correct choices. Option C, query the $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV: this DMV provides detailed information about the storage and memory usage of table columns within a Power BI model. By querying this view, you can identify which columns are loaded into memory and how much memory they consume, so this is also a correct answer choice. Option D, query the DISCOVER_MEMORYGRANT DMV: this DMV is used to analyze memory grants and resource allocations for executing queries, not specifically to identify which columns are loaded into memory, so this option is incorrect. To refresh your memory on how to use VertiPaq Analyzer and the DMVs, I recommend you watch the videos linked in the description and read the blogs mentioned there.
Question 31: You have the source data model shown in the following exhibit. The primary keys of the tables are indicated by a key symbol beside the columns involved in each key. You need to create a dimensional data model that will enable the analysis of order items by date, product, and customer. What should you include in the solution? To answer, select the appropriate options in the answer area. Pause the video and give it a think, as all the options presented here might seem correct at first glance. Now let's take a moment to analyze each one carefully. The relationship between order item and product must be: option A, based on the ProductID column alone, might seem straightforward, but it carries a risk: if ProductID is not unique across different companies, this assumption could lead to significant errors, such as incorrectly matching products to orders, which would compromise the reliability of your data analysis. Option B, based on both CompanyID and ProductID, is a robust choice because it ensures that each product is uniquely identified by both its ProductID and the company it belongs to. This method eliminates the risk of ProductID collisions between different companies, enhancing the accuracy of your data relationships. Option C, based on a new key combining CompanyID and ProductID, offers a streamlined approach: creating a single composite key can simplify query operations by reducing the complexity of joins, and it is particularly beneficial for performance optimization. However, it requires additional setup, like adjusting the database schema, which could introduce complexity. Given these points, option B is often the most practical: without additional overhead, it guarantees accuracy without necessitating schema modifications, making it an effective balance between simplicity and data integrity. The company entity must be: option A, omitted — simplifying the schema by omitting the company entity could seem appealing if company information is deemed irrelevant. However, omitting it could restrict your analytical capabilities, particularly if any future business questions require insights in company-specific contexts. Just because the question does not explicitly call for company data does not mean it should be excluded, especially considering that ProductID and CompanyID form a composite primary key in the product table. Option B, denormalized into the product entity only: while this setup streamlines product-related queries by embedding company data directly within the product table, it neglects the benefits that company information could also provide in analyzing customer behavior. Option C, denormalized into the customer and product entities: this is the most comprehensive approach. It ensures that company details are accessible across both the product and customer dimensions, supporting a wide range of queries and analyses. This method not only enhances the model's flexibility but also ensures that all necessary information is readily available, optimizing both performance and the depth of analytical insights. From this discussion, I would say the best choice is option C, as it supports a holistic view of the data, allowing for richer and more contextual analyses across multiple dimensions. In summary, for setting up a dimensional data model in this scenario, the best practices would be to link order item and product using both CompanyID and ProductID (option B) and to denormalize the company information into both the customer and product tables (option C). This setup ensures both robust data architecture and comprehensive analytical capabilities. I hope this explanation has been helpful to you. When faced with questions like this, it's important to assess which option is the best choice. If a question is complex, mark it for review and proceed with the others, then revisit it later. Always keep an eye on the timer to manage your exam time efficiently.
Question 32: You have a Fabric tenant that contains a semantic model named Model1. Model1 uses Import mode and contains a table named Orders. Orders has 100 million rows and the following fields. You need to reduce the memory used by Model1 and the time it takes to refresh the model. Which two actions should you perform? Each correct answer presents part of the solution. A, split OrderDateTime into separate date and time columns; B, replace TotalQuantity with a calculated column; C, convert Quantity into the Text data type; D, replace TotalSalesAmount with a measure. Pause the video, take a moment to think about it, and then check the answer. The correct options are A and D. Let's break down why. Option A, split OrderDateTime into separate date and time columns: this is beneficial because splitting the datetime into separate date and time components often results in better compression. Each component, date and time, has fewer unique values than a combined datetime column, leading to reduced memory usage and potentially faster query performance if you frequently query by only one component. Thus this is correct. Option B, replace TotalQuantity with a calculated column: this option is incorrect. Although calculated columns can be useful, they are computed and stored within the model during refreshes, consuming additional memory, so replacing a source column like TotalQuantity with a calculated column would actually increase memory usage, not decrease it. Option C, convert Quantity into the Text data type: this is also incorrect. Converting a numerical column like Quantity into text would likely increase memory usage; text data types generally compress worse than numerical types and could slow down processing during refreshes and queries, because operations on text are less efficient than on numbers. Option D, replace TotalSalesAmount with a measure: this is a correct choice. Using a measure for TotalSalesAmount, instead of storing it as a calculated column or directly in the table, can significantly reduce memory usage. Measures are calculated at query time and do not store data within the model, meaning they don't add to the memory load during a model refresh, which can speed up the process. Here's a concise summary of the differences between calculated columns and measures.
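The question is about an Import-mode Power BI model, but the column-splitting idea in option A is easy to demonstrate in a Fabric notebook as well. Here's a minimal PySpark sketch; the table shape and the OrderDateTime column name are just assumptions for illustration, not the exam's actual exhibit.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, date_format

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the Orders table from the question
orders = spark.createDataFrame(
    [("SO1", "2024-01-15 09:30:00")], ["OrderID", "OrderDateTime"]
).withColumn("OrderDateTime", col("OrderDateTime").cast("timestamp"))

# Split the combined datetime into lower-cardinality date and time parts
split_orders = (
    orders.withColumn("OrderDate", to_date("OrderDateTime"))
          .withColumn("OrderTime", date_format("OrderDateTime", "HH:mm:ss"))
          .drop("OrderDateTime")  # the combined high-cardinality column is removed
)
split_orders.show()
```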
Question 33: You have a Fabric tenant that contains a semantic model. You need to prevent report creators from populating visuals by using implicit measures. What are two tools that you can use to achieve the goal? Each correct answer presents a complete solution. A, Microsoft Power BI Desktop; B, Tabular Editor; C, Microsoft SQL Server Management Studio (SSMS); D, DAX Studio. Pause the video, take a moment to think, and then check the answer. The correct options are A and B. Option A, Microsoft Power BI Desktop: Power BI Desktop allows you to set properties within the data model to discourage or disable implicit measures. This prevents automatic summation or other aggregations when fields are dragged into visuals, ensuring that only explicitly defined measures are used in reports. Therefore this is a correct answer. Option B, Tabular Editor: Tabular Editor is an external tool that allows detailed manipulation of Power BI models, much more extensively than what is possible within Power BI Desktop. It can access and modify model properties, including those that control the usage of implicit measures; you can set the DiscourageImplicitMeasures property to true. So this is also a correct answer. Options C and D are incorrect. SSMS is predominantly used for managing SQL Server databases and is more aligned with traditional SQL databases and SQL Server Analysis Services (SSAS) than with Power BI models directly; while it can connect to and manage SSAS models, its functionality for directly changing Power BI-specific settings like discouraging implicit measures is limited. Option D, DAX Studio: DAX Studio is a tool primarily used for writing, testing, and analyzing DAX queries. While it is excellent for optimizing performance and debugging calculations, it does not provide capabilities to manage model properties or settings that affect how measures are used within Power BI reports.
Question 35: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a table named Table1. You are creating a new data pipeline and plan to copy external data to Table1. The schema of the external data changes regularly. You need the copy operation to meet the following requirements: replace Table1 with the schema of the external data, and replace all the data in Table1 with the rows in the external data. You add a Copy data activity to the pipeline. What should you do for the Copy data activity? A, from the Source tab, add additional columns; B, from the Destination tab, set Table action to Overwrite; C, from the Settings tab, select Enable staging; D, from the Source tab, select Enable partition discovery; E, from the Source tab, select Recursively. Pause the video, take a moment to think, and then check the answer. The correct choice is option B. Here's why: setting Table action to Overwrite on the Destination tab ensures that each time the pipeline runs, the existing table Table1 in your lakehouse is completely replaced by the new schema and data from the external source. This approach handles schema changes effectively, recreating the table with the new schema and replacing all the old data, perfectly meeting your requirements. Let's explore why the other options don't fit. Option A, from the Source tab, add additional columns: this is used to enhance the output with extra columns, not to replace an existing schema or dataset. Option C, from the Settings tab, select Enable staging: this improves data movement performance but does not influence the schema or full dataset replacement. Option D, from the Source tab, select Enable partition discovery: this is useful for partitioned data processing but irrelevant for schema replacement in the destination table. Option E, from the Source tab, select Recursively: this is for retrieving data from nested folders in a hierarchical file system and doesn't relate to overwriting a table's schema or data. Therefore, choosing Overwrite (option B) ensures your destination table in Lakehouse1 consistently reflects the latest structure and content from the external data source.
Question 36: You have a Fabric tenant that contains a lakehouse. You plan to query sales data files by using the SQL endpoint. The files will be in an Amazon Simple Storage Service (Amazon S3) bucket. You need to recommend which file format to use and where to create a shortcut. Which two actions should you include in the recommendation? Each correct answer presents part of the solution. A, create a shortcut in the Files section; B, use the Parquet format; C, use the CSV format; D, create a shortcut in the Tables section; E, use the Delta format. The correct answer options are D and E, and here's why. To utilize the SQL endpoint effectively, it's essential to create the shortcut in the Tables section, as highlighted in option D. This setup is necessary because it enables the SQL endpoint to recognize and interact with the data as structured tables, which is crucial for executing SQL queries directly. Furthermore, the data needs to be in the Delta format, as indicated in option E. This requirement stems from the fact that only tables in the Delta format are recognized as managed tables by the SQL analytics endpoint; attempting to use other formats like Parquet or CSV (options B and C) results in those files not being recognized as table objects, rendering them non-queryable via the SQL endpoint. The documentation supports these choices by clearly stating that only tables in Delta format are available in the SQL analytics endpoint, confirming that Parquet, CSV, and other formats cannot be queried through it. Regarding option A, creating a shortcut in the Files section is useful for managing files directly, but for SQL queries it's more effective to organize the data in the Tables section, so it is incorrect.
Question 37: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a subfolder named Subfolder1 that contains CSV files. You need to convert the CSV files into the Delta format with V-Order optimization enabled. What should you do from Lakehouse Explorer? A, use the Load to Tables feature; B, create a new shortcut in the Files section; C, create a new shortcut in the Tables section; D, use the Optimize feature. Pause the video, give it a think, and check the answer. The correct answer is option A, and here's why. Option A, use the Load to Tables feature: in Lakehouse Explorer, this allows you to load data directly from files, like the CSVs in Subfolder1, into tables while converting them to the Delta format. During this process, the tables are always loaded using the Delta Lake table format with V-Order optimization enabled, which optimizes the layout of the Delta files for more efficient querying. Why are the other options incorrect? B, create a new shortcut in the Files section: while creating a shortcut in the Files section might help in accessing the files more easily, it does not convert them to Delta format, nor does it apply V-Order optimization. C, create a new shortcut in the Tables section: although a shortcut in the Tables section is useful for accessing existing tables or data already in a structured format, it does not enable the conversion of CSV files to Delta format. D, use the Optimize feature: the Optimize feature is typically used to improve the performance of existing Delta tables by compacting files and optimizing their layout; it is not used for converting file formats or applying V-Order during an initial conversion. Thus, to convert CSV files to Delta format with V-Order optimization, the Load to Tables feature is the most direct and effective approach in Lakehouse Explorer. For more information, you can read the Load to Tables capabilities overview in the documentation.
Question 38: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains an unpartitioned table named Table1. You plan to copy data to Table1 and partition the table based on a date column in the source data. You create a Copy activity to copy the data to Table1. You need to specify the partition column in the destination settings of the Copy activity. What should you do first? A, from the Destination tab, set Mode to Append; B, from the Destination tab, select the partition column; C, from the Source tab, select Enable partition discovery; D, from the Destination tab, set Mode to Overwrite. Pause the video and take a moment to think about your answer. The correct choice is option D. If you selected option C, revisit the question: it asks what you need to do first. To be able to specify a partition column on the destination, you first need to set the mode to Overwrite on the Destination tab, so option C is not the first step. Option A is incorrect because setting the mode to Append merely adds data to the existing dataset without reconfiguring or partitioning it anew, which does not meet the requirement to partition based on the date column. Option B, while necessary for specifying which column to partition on, logically follows the first step of enabling the right mode (Overwrite), so it too is an incorrect option.
Question 39: You have a Fabric tenant that contains a warehouse named Warehouse1. Warehouse1 contains a fact table named FactSales that has 1 billion rows. You run the following T-SQL statement: CREATE TABLE test.factsales AS CLONE OF dbo.factsales. For each of the following statements, select Yes if the statement is true; otherwise, select No. Pause the video, take a moment to think, and check the answer. The answer to the first statement, "a replica of dbo.factsales is created in the test schema by copying the metadata only," is Yes, because in Microsoft Fabric the AS CLONE OF syntax creates a zero-copy clone, meaning only the metadata of the original table is duplicated. The actual data, stored as Parquet files, remains shared between the original and the cloned table; both tables point to the same physical data files, hence the term zero-copy. The answers to the second and third statements are No because, upon creation, a table clone behaves as an independent and separate copy of the data from its source. Any changes made through DML or DDL on the source of the clone, meaning any schema or data changes, are not reflected in the clone; similarly, any changes made through DDL or DML on the clone are not reflected in its source.
Question 40: You have source data in a folder on a local computer. You need to create a solution that will use Fabric to populate a data store. The solution must meet the following requirements: support the use of dataflows to load and append data to the data store, and ensure that Delta tables are V-Order optimized and compacted automatically. Which two types of data store should you use? A, a lakehouse; B, an Azure SQL database; C, a warehouse; D, a KQL database. Pause the video and take a moment to reflect before checking the answer. The correct choices are A, a lakehouse, and C, a warehouse. First, we can eliminate options B and D because an Azure SQL database and a KQL database are not suitable: Azure SQL Database primarily handles relational data and doesn't natively support Delta tables or optimizations like V-Order that are inherent to Delta Lake technology. Similarly, KQL, used with Azure Data Explorer, is geared toward analytics on large volumes of streaming data and lacks support for Delta tables and similar optimizations. Now, option A, a lakehouse, is ideal because it uses the Delta Lake table format, which is central to Microsoft Fabric; this ensures that tables are optimized for analytics, as detailed in the Microsoft Fabric documentation. Therefore it is correct. Option C, a warehouse, is also a good choice because it applies V-Order write optimization to Parquet files regardless of the data ingestion method. This optimization greatly enhances read speeds under Microsoft Fabric's compute engines, such as Power BI, SQL, and Spark; notably, V-Order is a permanent feature in the Synapse Data Warehouse, further enhancing performance.
Question 41: You have a Fabric tenant that contains a lakehouse. You are using a Fabric notebook to save a large DataFrame by using the following code. For each of the following statements, select Yes if the statement is true; otherwise, select No. Pause the video and take a moment to reflect before we dive into the answers. For statement one, the answer is Yes: using partitionBy, Spark organizes data into a hierarchical directory structure based on the specified partition keys. This means you can expect to see your data sorted into directories formatted by year, month, and day in your designated output path. This structure not only simplifies data management but also enhances accessibility, especially with large datasets. Statement two: again the answer is Yes. Parquet, the preferred file format for these partitions, is engineered for efficient querying and supports parallel processing. This capability allows Spark to read file partitions simultaneously across multiple nodes, facilitating distributed query execution and boosting both performance and scalability. Statement three: once more, the answer is Yes. Parquet files employ Snappy compression by default, which means compression is handled automatically without requiring any additional code. However, if needed, you can specify the compression type explicitly using option("compression", "snappy"), or choose a different compression codec to suit your needs.
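To make the partitioned write concrete, here's a minimal PySpark sketch of the pattern those statements describe. The exam's actual exhibit isn't reproduced in the video, so the DataFrame contents and the output path are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with the partition key columns already present
df = spark.createDataFrame(
    [(2024, 1, 15, 100.0), (2024, 1, 16, 250.0)],
    ["year", "month", "day", "amount"],
)

(
    df.write
      .mode("overwrite")
      .partitionBy("year", "month", "day")   # creates year=/month=/day= directories
      # .option("compression", "snappy")     # optional: Snappy is already the default
      .parquet("Files/sales_partitioned")    # assumed lakehouse output path
)
```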
Question 42: You have a Fabric workspace named Workspace1 that contains a dataflow named Dataflow1. Dataflow1 contains a query that returns the data shown in the following exhibit. You need to transform the data columns into attribute-value pairs, where columns become rows. You select the VendorID column. Which transformation should you select from the context menu of the VendorID column? A, Group by; B, Unpivot columns; C, Unpivot other columns; D, Split column; E, Remove other columns. This is a pretty straightforward question, and the correct answer is option C, Unpivot other columns. If you initially thought the correct answer was Unpivot columns, I encourage you to pause the video and take a moment to review the differences between these two options.
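The exam answer lives in Power Query, but if you want to see the same semantics in a notebook, here's a small PySpark sketch that uses stack to emulate "Unpivot other columns": the selected column stays fixed and every other column becomes an attribute-value row. The column names here are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: one row per vendor, one column per measure
df = spark.createDataFrame([(1, 10, 20), (2, 30, 40)], ["VendorID", "Q1", "Q2"])

# "Unpivot other columns": VendorID stays fixed; Q1 and Q2 become
# (Attribute, Value) pairs, one row each
unpivoted = df.selectExpr(
    "VendorID",
    "stack(2, 'Q1', Q1, 'Q2', Q2) as (Attribute, Value)",
)
unpivoted.show()
```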
Question 43: You have a Fabric tenant that contains a data pipeline. You need to ensure that the pipeline runs every 4 hours on Mondays and Fridays. To what should you set Repeat for the schedule? A, Daily; B, By the minute; C, Weekly; D, Hourly. Pause the video, take a moment to think, and check the answer. The correct answer is C, Weekly. Here's why this is the best choice for the requirement. Daily (option A) is not suitable because it implies the pipeline would run every day, which does not align with the requirement of running only on Mondays and Fridays. By the minute (option B) and Hourly (option D) are also inappropriate because they specify a frequency without any way to restrict the runs to specific days of the week. Weekly (option C) is the ideal setting because it allows you to restrict the pipeline to specific days of the week, which can be set to Mondays and Fridays; additionally, within the weekly schedule you can specify that the pipeline triggers every 4 hours on those days. This aligns perfectly with the requirement that the pipeline run every 4 hours, but only on the specified days.
Question 44: You have a Fabric tenant that contains a warehouse. Several times a day, the performance of all warehouse queries degrades. You suspect that Fabric is throttling the compute used by the warehouse. What should you use to identify whether throttling is occurring? A, the capacity settings; B, the Monitoring hub; C, dynamic management views (DMVs); D, the Microsoft Fabric Capacity Metrics app. Pause the video for a moment to think and verify the answer. The correct option is D: the Microsoft Fabric Capacity Metrics app is specifically tailored to monitor and manage capacity-related issues within your Microsoft Fabric environment. It offers detailed insights into resource usage, particularly highlighting when operations are throttled because capacity limits have been reached. The app provides metrics such as the number of operations rejected during throttling events, enabling you to pinpoint both the timing and the exact causes of these issues. Now let's explore why the other options are not suitable. Option A, the capacity settings: while this lets you view and adjust the capacity settings of your warehouse, it lacks the real-time monitoring and historical data that could indicate throttling events. It's useful for setting limits but not for diagnosing operational issues like throttling. Option B, the Monitoring hub: this tool is excellent for general monitoring of activities and performance across various items in Fabric, but it isn't specifically designed to identify throttling events, which are crucial for understanding resource limitations. Option C, DMVs, is incorrect because, just like most warehouse operations, DMV queries are classified as background operations and are covered by the background rejection policy. As a result, DMVs cannot be queried while the capacity is throttled. Even though the DMVs are unavailable, capacity admins can go to the Microsoft Fabric Capacity Metrics app to understand the root cause — which is exactly option D, the choice that stands out as optimal.
Question 45: You have a Fabric workspace that uses the default Spark starter pool and Runtime version 1.2. You plan to read a CSV file named sales_raw.csv in a lakehouse, select columns, and save the data as a Delta table to the managed area of the lakehouse. sales_raw.csv contains 12 columns. You have the following code. For each of the following statements, select Yes if the statement is true; otherwise, select No. All right, let's pause the video right here. I'd like you to take a moment to look over the code and think about the answers to these statements; once you have your thoughts, we'll discuss the correct responses. Statement one: the Spark engine will read only the SalesOrderNumber, OrderDate, CustomerName, and UnitPrice columns from the sales_raw.csv file. The answer here is actually No. Why? Although the code uses .select() to specify certain columns, Spark initially reads all columns from the CSV, because .select() is applied only after the data is loaded into memory. This means all columns of sales_raw.csv are read before any selection is made. Statement two: removing the partitioning will reduce the execution time of the query. The answer here is generally No, but it can depend on the context. Partitions in Spark help divide data into manageable chunks that can be processed in parallel across different cluster nodes. If you remove the partitioning, Spark processes the entire dataset as one bulk, which might decrease performance, especially with large datasets. However, in scenarios where the dataset is small or the cluster isn't fully utilized, removing partitions might reduce some overhead. Overall, though, for larger datasets and well-configured environments, partitioning is beneficial for performance. Statement three: adding inferSchema as true will increase the execution time of the query. The answer is Yes. When inferSchema is set to true, Spark must sample the data to infer the column types. This additional step requires reading the data twice — once to infer the schema and once to actually process the data — increasing the initial load time, particularly for large datasets. And a little side note, though it's not directly asked here: if inferSchema as true were included in the code, Spark could attempt to read only the specified columns directly, provided it can efficiently parse the CSV structure on the first read; this optimization could potentially change the answer to statement one to Yes, as it would allow Spark to bypass unnecessary columns right from the start. Thanks for thinking this through with me; I hope this breakdown helps clarify how Spark handles data operations.
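Since the exam exhibit itself isn't shown in the video, here's a rough PySpark sketch of the kind of code the statements are reasoning about; the path and column names are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Without inferSchema, the CSV is read in a single pass with string columns;
# .select() prunes columns only after the load (see statement one)
df = (
    spark.read
         .format("csv")
         .option("header", "true")
         # .option("inferSchema", "true")  # would add a schema-inference pass (statement three)
         .load("Files/sales_raw.csv")      # assumed lakehouse path
         .select("SalesOrderNumber", "OrderDate", "CustomerName", "UnitPrice")
)

# Save to the managed area of the lakehouse as a Delta table
df.write.mode("overwrite").format("delta").saveAsTable("sales")
```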
Question 46: You have a Fabric tenant that contains a warehouse. A user discovers that a report that usually takes 2 minutes to render has been running for 45 minutes and still has not rendered. You need to identify what is preventing the report query from completing. Which dynamic management view (DMV) should you use? A, sys.dm_exec_requests; B, sys.dm_exec_sessions; C, sys.dm_exec_connections; D, sys.dm_pdw_exec_requests. The correct answer is sys.dm_exec_requests. If you are considering sys.dm_pdw_exec_requests, remember that this DMV is tailored for Azure Synapse Analytics and Analytics Platform System, which use a massively parallel processing (MPP) architecture. The Microsoft documentation clearly says that for a serverless SQL pool or Microsoft Fabric you should use sys.dm_exec_requests; thus option A is correct. Let's briefly look at the other options. Option B, sys.dm_exec_sessions: it offers information about user sessions, which doesn't directly help identify why a specific query is stalled. Option C, sys.dm_exec_connections: this DMV provides details about each connection to the server — useful, but not specific enough for our needs today. For your reference, here's a quick rundown of these DMVs; feel free to pause the video and take a closer look if you need a deeper understanding of how each DMV can be utilized in a monitoring scenario within Microsoft Fabric.
Question 47: You are creating a dataflow in Fabric to ingest data from an Azure SQL database by using a T-SQL statement. You need to ensure that any foldable Power Query transformation steps are processed by the Microsoft SQL Server engine. How should you complete the code? Pause the video, review the provided code, and check the answer. The completed code is shown here. The Value.NativeQuery function is used to run a native SQL query directly against the database; this function is particularly useful when you need to execute complex queries or use specific SQL features that are not natively supported by the Power Query user interface. We use the EnableFolding = true option because we need to ensure that any foldable Power Query transformation steps are processed by the Microsoft SQL Server engine, as the question requires. A brief overview of query folding: it is a feature in Power Query that attempts to push as much of the data transformation logic as possible back to the data source. Instead of importing large amounts of data and then filtering and processing it within Power Query, enabling query folding allows these transformations to be executed directly at the database level, which can drastically reduce the amount of data transferred and improve overall performance. You can read more about it in the documentation; the link is in the description.
Question 48: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Readings from 100 IoT devices are appended to a Delta table in Lakehouse1. Each set of readings is approximately 25 KB, and approximately 10 GB of data is received daily. All the table and Spark session settings are set to the defaults. You discover that queries are slow to execute; in addition, the lakehouse storage contains data and log files that are no longer used. You need to remove the files that are no longer used and combine the small files into larger files with a target size of 1 GB per file. What should you do? Pause the video, review the answer options, and then check the correct response. To remove the unused files, schedule the VACUUM command; to combine the small files, schedule the OPTIMIZE command. These are the correct methods for managing file operations efficiently; all the other options provided do not correctly address these tasks. For detailed guidance on how to use the VACUUM and OPTIMIZE commands effectively, please refer to the documentation linked in the description below.
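If you want to try the two commands from a Fabric notebook, here's a minimal sketch; the table name iot_readings is an assumption, and in practice you would schedule a notebook or pipeline to run these regularly.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact many small files into fewer, larger ones (bin compaction)
spark.sql("OPTIMIZE iot_readings")

# Remove data files that the Delta log no longer references;
# the default retention window applies unless you specify otherwise
spark.sql("VACUUM iot_readings")
```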
Question 49: You need to create a data loading pattern for a Type 1 slowly changing dimension (SCD). Which two actions should you include in the process? Each correct answer presents part of the solution. A, update rows when the non-key attributes have changed; B, insert new rows when the natural key exists in the dimension table and the non-key attribute values have changed; C, update the effective end date of rows when the non-key attribute values have changed; D, insert new records when the natural key is a new value in the table. Please pause the video, think through the answer options, and then check the answer. The correct answers are options A and D. Option A, update rows when the non-key attributes have changed, is correct because this is the core mechanism of a Type 1 SCD: the objective is to update existing records with new data when there are changes in non-key attributes, overwriting the old data. This process ensures the dimension table always reflects the most current state of the data, without retaining historical versions. Option D, insert new records when the natural key is a new value in the table: when a new data entry arrives with a natural key that doesn't yet exist in the dimension table, we need to add it as a new record. This action isn't about managing changes to existing data but is crucial for adding new elements to the dimension, so this is also a correct answer choice. Option B is incorrect because in a Type 1 SCD the practice is to update existing rows rather than duplicate them when non-key attributes change; inserting new rows under those circumstances aligns with Type 2 SCD practices, which preserve historical data by maintaining different versions of records. If you considered option C correct, you would first have to be sure that the SCD table in the question even has an effective end date column — and managing effective dates is a characteristic of a Type 2 SCD, where it's necessary to track the historical period during which certain data values were valid. In a Type 1 SCD there is no concept of tracking historical periods with start and end dates, because data is simply overwritten. In the documentation you may see a modified-date column in an SCD Type 1 example, and if the question had provided that image we might be tempted to say option C is also correct; but the question asks you to select two actions, and the most important ones are options A and D. One: updates are performed on existing records when there are changes to non-key attributes. Two: insertions occur only for completely new keys that do not exist in the dimension table. Sometimes it is better not to overthink and just answer the question. To refresh your memory on SCD types, please pause the video and read through the summary, or visit the link mentioned in the description.
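Both of the correct actions map naturally onto a single Delta Lake MERGE, which you could run from a Fabric notebook. Here's a hedged sketch: the table name dim_customer, its columns, and the incoming batch are all assumptions, and the dimension table is assumed to already exist as a Delta table.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical incoming batch with the latest attribute values
updates_df = spark.createDataFrame(
    [(1, "Alice", "DE"), (2, "Bob", "US")],
    ["CustomerKey", "Name", "Country"],
)

# Assumes a Delta dimension table named dim_customer already exists
dim = DeltaTable.forName(spark, "dim_customer")

(
    dim.alias("t")
       .merge(updates_df.alias("s"), "t.CustomerKey = s.CustomerKey")
       .whenMatchedUpdate(set={"Name": "s.Name", "Country": "s.Country"})  # action A: overwrite in place
       .whenNotMatchedInsertAll()                                          # action D: insert new keys
       .execute()
)
```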
Question 50: You have a Fabric workspace named Workspace1 and an Azure Data Lake Storage Gen2 account named storage1. Workspace1 contains a lakehouse named Lakehouse1. You need to create a shortcut to storage1 in Lakehouse1. Which connection and endpoint should you specify? To answer, select the appropriate options in the answer area. The correct answer for the connection protocol is https, and for the endpoint it is dfs. If you initially thought abfs or abfss were correct for connections to Azure Data Lake Storage Gen2 within Fabric, it's important to note that, despite abfss typically being used for secure connections, it encounters issues in Fabric. There have been instances reported within the Fabric community where users experienced errors when using these protocols, leading to the recommendation to use https instead. This preference is supported by the documentation and likely stems from specific compatibility or configuration needs within the Fabric platform. As this technology evolves, Microsoft may address these issues, so it's wise to stay updated; for now, based on the current guidelines and the date of this video, https remains the recommended choice. The dfs endpoint is correct because, for Data Lake Storage Gen2, it supports the hierarchical namespace, which is crucial for data lake operations, enabling better performance for directory management and file operations. The blob endpoint is incorrect because it is generally used for Azure Blob Storage, which does not support the hierarchical namespace capabilities needed for optimal data lake operations; using this endpoint would not leverage the full features of Data Lake Storage Gen2 and could lead to limitations in managing and organizing data efficiently. Similarly, the file endpoint is incorrect for this use case, as it is primarily associated with Azure Files, designed for managing file shares in cloud or on-premises environments; using it would fail to harness the full capabilities of Data Lake Storage Gen2 and might restrict efficient data management. To learn how to create a shortcut to Azure Data Lake Storage Gen2 in Fabric, please visit the link in the description below.
Question 51: You are analyzing customer purchases in a Fabric notebook by using PySpark. You have the following DataFrames: transactions, which contains five columns named transaction_id, customer_id, product_id, amount, and date, and has 10 million rows, each row representing a transaction; and customers, which contains customer details in 1,000 rows and three columns named customer_id, name, and country. You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling. You write the following code. Which code should you run to populate the results DataFrame? Pause the video, carefully review each option and its corresponding code, and then check the answer. The correct answer is option A. Why is answer A correct? It uses the broadcast join function, a technique used in Spark to minimize data shuffling. In a broadcast join, the smaller DataFrame (customers in this case, with only 1,000 rows) is broadcast to each node in the cluster: Spark sends a copy of the smaller DataFrame to every executor, so the larger DataFrame (transactions, with 10 million rows) does not need to be shuffled across the network. Since customers is much smaller than transactions, broadcasting it significantly reduces the amount of data that must be shuffled during the join, making the operation faster and more efficient. Let's see why the other options are incorrect. Option B: first, it does not use a broadcast join, which means there could be significant data shuffling as Spark tries to colocate records with the same customer_id from both DataFrames on the same node; also, the distinct() in the code does not reduce shuffling and is aimed at removing duplicate records, which is not a stated requirement. Option C uses a simple join: similar to option B, it performs the join without broadcasting, leading to potential inefficiencies due to data shuffling across the cluster, so this is incorrect. Option D uses a cross join: it computes the Cartesian product of the two DataFrames and then filters the result on the customer_id match. This approach is highly inefficient because it first combines every row of transactions with every row of customers, incurring significant computational overhead before filtering, making it the least efficient method for this requirement. By broadcasting the smaller DataFrame and joining directly on the matching column, option A leverages Spark's optimization techniques to handle large datasets efficiently, reducing network traffic and execution time. Thus option A is the best and correct answer choice.
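Here's what that winning pattern looks like end to end — a minimal, self-contained sketch with tiny stand-in DataFrames in place of the 10-million-row fact table from the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Stand-ins for the DataFrames described in the question
transactions = spark.createDataFrame(
    [(1, 101, 200, 55.0, "2024-01-15"), (2, 102, 201, 12.5, "2024-01-16")],
    ["transaction_id", "customer_id", "product_id", "amount", "date"],
)
customers = spark.createDataFrame(
    [(101, "Alice", "DE"), (102, "Bob", "US")],
    ["customer_id", "name", "country"],
)

# The small dimension is shipped to every executor, so the large fact
# table is never shuffled across the network for this join
results = transactions.join(broadcast(customers), on="customer_id", how="inner")
results.show()
```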
Question 52: You have a Microsoft Power BI report and a semantic model that uses Direct Lake mode. From Power BI Desktop, you open Performance Analyzer, as shown in the following exhibit. Use the drop-down menus to select the answer choice that completes each statement, based on the information presented in the graphic. Pause the video and take a moment to consider the answer options. Let's explore why the correct answers are Automatic and Direct Lake. First, let's analyze the visuals from Performance Analyzer: the first card, linked to Table1, uses Direct Lake. This is inferred because there's no DirectQuery entry in Performance Analyzer for this card, which aligns with the documentation stating that the absence of a DirectQuery entry implies Direct Lake. The second card, linked to View1, clearly uses DirectQuery, as indicated in Performance Analyzer. Now back to our question: "the Direct Lake fallback behavior is set to Automatic" is correct because this setting allows the model to use either Direct Lake or DirectQuery. "The query for the table visual is executed by using Direct Lake" is correct, as confirmed by the lack of a DirectQuery tag in its performance analysis. This example is from Microsoft's documentation, which you can review for a deeper understanding through the link provided in the description.
Question 53: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a table named nyc_taxi_raw, with the columns shown in the following table. You create a Fabric notebook and attach it to Lakehouse1. You need to use PySpark code to transform the data. The solution must meet the following requirements: add a column named pickup_date that contains only the date portion of pickup_datetime, and filter the DataFrame to include only rows where fare_amount is a positive number less than 100. How should you complete the code? To answer, select the appropriate options in the answer area. The correct answers are as follows. To add a column named pickup_date, df.withColumn is the appropriate choice because it lets you specify both the new column's name and the expression that computes its values. The other options do not fit our needs: df.columns simply lists the names of the columns in a DataFrame and cannot modify them; df.select, while it selects specific columns, doesn't by itself add or modify a column in place; and df.withColumnRenamed changes the names of existing columns, which isn't required here. For extracting the date, cast("date") is correct because it converts the timestamp to a date type, keeping only the date part of pickup_datetime. alias("date") is incorrect, as alias is used to rename columns in DataFrame outputs, not for type casting; cast("pickup_date") is incorrect because pickup_date is mistakenly used as if it were a data type; and getField("date") is also inappropriate, as it's meant for accessing fields of complex types like structs, not for simple type conversion. For filtering, the correct syntax is the first option, because it correctly applies the conditions to the fare_amount column, ensuring that only positive values less than 100 are included; all the other options have syntax errors, so they are incorrect.
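Putting the three chosen pieces together, a version of the transform might look like the following sketch; the sample rows are invented, and the column names follow the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Invented rows shaped like the nyc_taxi_raw table in the question
df = spark.createDataFrame(
    [("2024-01-15 08:12:00", 23.5), ("2024-01-15 09:40:00", -4.0)],
    ["pickup_datetime", "fare_amount"],
).withColumn("pickup_datetime", col("pickup_datetime").cast("timestamp"))

result = (
    df.withColumn("pickup_date", col("pickup_datetime").cast("date"))   # date portion only
      .filter((col("fare_amount") > 0) & (col("fare_amount") < 100))    # positive and below 100
)
result.show()
```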
Note: this question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals; some question sets might have more than one correct solution, while others might not have a correct solution at all. After you answer a question in this section, you will not be able to return to it; as a result, these questions will not appear in the review screen. Question 54: You have a Fabric tenant that contains a new semantic model in OneLake. You use a Fabric notebook to read the data into a Spark DataFrame. You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns. Solution: you use the following PySpark expression. Does this meet the goal? A, yes; B, no. The correct answer is no, because the explain function in PySpark is used to display the execution plan of a DataFrame, which shows how Spark will execute the query, but it does not compute or return statistical calculations. Question 55: Solution: you use the following PySpark expression. Does this meet the goal? A, yes; B, no. The correct answer is no: the df.show() method is used to display the contents of the DataFrame, typically the first few rows, which helps in visually inspecting the data but does not perform any statistical analysis. Question 56: Solution: you use the following PySpark expression. Does this meet the goal? A, yes; B, no. The answer is yes: the df.summary() expression is suitable for evaluating the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns in a DataFrame, and df.describe() likewise computes basic statistics for numeric and string columns.
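For a quick look at what those two expressions return, here's a tiny self-contained example; the DataFrame contents are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34, 120.5), ("Bob", 29, 80.0)],
    ["name", "age", "spend"],
)

df.describe().show()                               # count, mean, stddev, min, max
df.summary("min", "max", "mean", "stddev").show()  # pick exactly the statistics you need
```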
Note: this question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals; some question sets might have more than one correct solution, while others might not have a correct solution at all. After you answer a question in this section, you will not be able to return to it; as a result, these questions will not appear in the review screen. Question 57: You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named Customer. When you query Customer, you discover that the query is slow to execute, and you suspect that maintenance was not performed on the table. You need to identify whether maintenance tasks were performed on Customer. Solution: you run the following Spark SQL statement: DESCRIBE HISTORY customer. Does this meet the goal? A, yes; B, no. The correct answer is yes, because the Spark SQL statement DESCRIBE HISTORY customer provides a detailed history of all operations performed on the table, such as updates, merges, deletes, and optimizations, which are considered maintenance tasks. By reviewing the history, you can confirm whether any maintenance tasks that could affect query performance have been executed. Question 58: Solution: you run the following Spark SQL statement: REFRESH TABLE customer. Does this meet the goal? A, yes; B, no. The answer is no: running REFRESH TABLE customer updates the metadata of a table and ensures that any changes to the underlying data are recognized by the Spark engine; however, this command does not provide information about whether maintenance tasks were performed on the Delta table. Question 59: Solution: you run the following Spark SQL statement: EXPLAIN TABLE customer. Does this meet the goal? A, yes; B, no. The answer is no; the solution does not meet the goal. The EXPLAIN command is used to show the logical and physical plans for a query, helping you understand how Spark will execute it, but it does not provide information about whether maintenance tasks have been performed on a table.
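Here's what the working solution from question 57 looks like in a notebook cell; the table name follows the question, and trimming the output to a few columns just makes it easier to read.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row of the Delta history is one operation on the table
# (WRITE, MERGE, DELETE, OPTIMIZE, VACUUM, ...)
history = spark.sql("DESCRIBE HISTORY customer")
history.select("version", "timestamp", "operation").show(truncate=False)
```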
Question 68: You have a Fabric tenant. You need to configure OneLake security for the users shown in the following table. The solution must follow the principle of least privilege. Which permission should you assign to each user? To answer, select the appropriate options in the answer area. Pause the video and consider the options before checking the answer. User1 requires the ReadAll permission: this permission grants comprehensive access across the entire lakehouse, allowing User1 to read all data by using Apache Spark, which is essential for their role. User2 requires the ReadData permission: this specific permission is crucial for accessing data via the SQL analytics endpoint; it allows User2 to read all data available through SQL endpoints without the restriction of SQL policies.
Question 69: You have an Azure Repos repository named Repo1 and a Fabric-enabled Microsoft Power BI Premium capacity. The capacity contains two workspaces named Workspace1 and Workspace2. Git integration is enabled at the workspace level. You plan to use Microsoft Power BI Desktop and Workspace1 to make version-control changes to a semantic model stored in Repo1. The changes will be built and deployed to Workspace2 by using Azure Pipelines. You need to ensure that report and semantic model definitions are saved as individual text files in a folder hierarchy. The solution must minimize development and maintenance effort. In which file format should you save the changes? A, PBIP; B, PBIDS; C, PBIT; D, PBIX. This is a straightforward question: the correct answer is PBIP. When you save work as a PBIP, Power BI Desktop organizes report and semantic model items into individual text files within folders, which aligns perfectly with what the question requires. To provide further clarity, the Power BI documentation states that saving as a Power BI project (PBIP) allows for better authoring, collaboration, and project management, as all item definitions are stored as plain text files in an intuitive folder structure. Here's why the other options are not suitable: PBIDS files are specific to Power BI data sources and are used primarily for managing connection settings, not for saving project files; PBIT files are templates that provide a starting point for reports but do not include actual data, making them unsuitable for complete project versioning; and PBIX files contain both the report structure and the data imported from various sources, which is more than what's needed for simple project management and version control. Below is a summary of the key differences between these file formats.
Question 70: You are implementing a medallion architecture in a single Fabric workspace. You have a lakehouse that contains the bronze and silver layers and a warehouse that contains the gold layer. You create the items required to populate the layers, as shown in the following table. You need to ensure that the layers are populated daily, in sequential order, such that silver is populated only after bronze is complete and gold is populated only after silver is complete. The solution must minimize development effort and complexity. What should you use to execute each set of items? Each option may be used once, more than once, or not at all. Pause the video, give it a think, and check the answer. The appropriate triggers and activities are as follows. For the orchestration pipeline: a schedule, because you need a schedule to automatically trigger the sequence of processes daily, ensuring that the data in each layer is refreshed and processed on a daily basis. For the bronze layer: an Invoke pipeline activity — since the bronze layer uses pipelines with copy activities, an Invoke pipeline activity is suitable, as it will trigger the pipeline that contains the copy activities needed to populate the bronze layer. For the silver layer: a Dataflow activity, because the silver layer uses dataflows; this activity will specifically trigger the dataflow processes that transform and refine the data after the bronze layer completes. For the gold layer: a Stored procedure activity, because the gold layer uses stored procedures, and a pipeline Stored procedure activity is the most direct way to execute them; this activity ensures that the stored procedures that build the gold layer run only after the silver layer is fully processed. I hope this explanation was helpful.
to either add a new column to the data frame or replace an existing column with the same name in this case it's Modifying the age column cast in in this part of the code casts or converts the values in the age column to integers if the original values in age are in a different data type like string or float they will be converted to integers show method displays the rows of the data frame in a tabular form it's used here to visually inspect the results of the Cast Operation question 72 you have an Azure data
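Here's a rough sketch of the pattern; the sample data, column names other than age, and variable names are illustrative, not taken from the exam exhibit:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-in for the exam's DataFrame: four columns, age as string.
df = spark.createDataFrame(
    [("C1", "Alice", "34", "NY"), ("C2", "Bob", "41", "WA")],
    ["id", "name", "age", "state"],
)

# withColumn replaces the existing "age" column with the same values cast
# from string to integer; the other three columns pass through unchanged.
df = df.withColumn("age", col("age").cast("int"))

df.printSchema()  # age is now IntegerType
df.show()         # visually inspect the result of the cast
```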
Question 72: You have an Azure Data Lake Storage Gen2 account named storage1 that contains a Parquet file named sales.parquet. You have a fabric tenant that contains a workspace named Workspace1. Using a notebook in Workspace1, you need to load the content of the file to the default lakehouse. The solution must ensure that the content will display automatically as a table named sales in Lakehouse Explorer. How should you complete the code? To answer, select the appropriate options in the answer area. Pause the video and think about this. The correct format is Delta. If you considered Parquet, remember that the requirement specifically says the content must display automatically as a table named sales in Lakehouse Explorer; this is achievable only with Delta tables, as they automatically integrate with the lakehouse environment. Use the saveAsTable method with the name sales to directly register the table in the Tables section of Lakehouse Explorer, meeting the explicit requirements of the scenario. An additional note on usage: when using the saveAsTable method, you do not need to specify the Tables prefix, as Spark handles this internally when managing the table's metadata. When using save instead of saveAsTable, the full path needs to be defined explicitly.
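As a minimal sketch, assuming the notebook's built-in spark session and the default lakehouse (the relative file path below is a placeholder, not the exam's actual path):

```python
# Read the Parquet file; "Files/sales.parquet" is an illustrative location.
df = spark.read.parquet("Files/sales.parquet")

# saveAsTable registers a managed Delta table named "sales", so it appears
# automatically under Tables in Lakehouse Explorer -- no "Tables/" prefix needed.
df.write.format("delta").mode("overwrite").saveAsTable("sales")

# With save() you would instead have to define the destination path yourself:
# df.write.format("delta").save("Tables/sales")
```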
Question 73: You have a fabric workspace named Workspace1 that contains a lakehouse named Lakehouse1. In Workspace1 you create a data pipeline named Pipeline1. You have CSV files stored in an Azure storage account. You need to add an activity to Pipeline1 that will copy data from the CSV files to Lakehouse1; the activity must support Power Query M formula language expressions. Which type of activity should you add? A. Dataflow, B. Notebook, C. Script, D. Copy data. Pause the video, give it a think, and check the answer. The correct answer is A, dataflow. Here's why: dataflow activities in fabric support the Power Query M formula language, which allows advanced data transformation during the ingestion process; by using dataflows, you can efficiently transform data from the CSV files and load it into Lakehouse1, with the transformations defined in Power Query M. The other options are less appropriate. B, notebook, typically involves Python, Scala, or SQL for data processing tasks within Spark environments and does not natively support the Power Query M formula language. C, script, can involve various scripting activities but does not offer direct use of Power Query M as part of its processing. D, the copy data activity, is mainly used for copying data without the advanced transformations that Power Query M provides; while it can move data from CSV files to Lakehouse1, it lacks that transformation capability. Thus, to use Power Query M for transformations as part of the data ingestion into Lakehouse1, a dataflow is the most suitable and direct approach.
Question 74: You have a fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table with eight columns. You receive new data that contains the same eight columns and two additional columns. You create a Spark DataFrame and assign it to a variable named df; the DataFrame contains the new data. You need to add the new data to the Delta table while meeting the following requirements: keep all the existing rows, and ensure that all the new data is added to the table. How should you complete the code? To answer, select the appropriate options in the answer area. Pause the video, give it a think, and check the answer. The correct answers are append and mergeSchema true. The append mode is used because you need to add the new rows to the existing Delta table without overwriting or removing any existing data, and the mergeSchema option is crucial when the incoming data has new columns not present in the existing table: setting mergeSchema to true allows the schema of the Delta table to be updated to accommodate the new columns, ensuring that all the new data is added correctly. Why the other options are incorrect: the error and errorifexists modes would stop the operation if the table exists, which is not suitable since you want to add data to an existing table; the ignore mode would skip the write operation if the table already exists, adding no new data; overwrite mode would delete all existing data before adding new data, which contradicts the requirement to keep all existing rows; and the overwriteSchema options pertain to completely replacing the schema, which isn't necessary here and could lead to data loss or schema mismatches, since only adding new columns is required, not a full schema replacement.
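A quick sketch of the write (the target table name "customer" is a placeholder; df holds the new ten-column data as in the scenario):

```python
# Append the new rows and let the Delta table's schema evolve to take the
# two extra columns; existing rows are untouched.
(
    df.write.format("delta")
      .mode("append")                 # add rows, never overwrite
      .option("mergeSchema", "true")  # accept the two new columns
      .saveAsTable("customer")        # placeholder name for the Delta table
)
```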
Question 75: You have a fabric warehouse that contains a table named sales.orders. sales.orders contains the following columns. You need to write a T-SQL query that will return the following columns. How should you complete the code? To answer, select the appropriate options in the answer area. This is a pretty straightforward T-SQL question; the answers are COALESCE and LEAST. COALESCE is designed to return the first non-null expression among its arguments: in this case you want to return the weight if it's not null, otherwise the quantity if it's not null, and if both are null, return 1. CHOOSE selects an item from a list based on an index, which doesn't suit the requirement; it doesn't handle null values or provide fallback logic like COALESCE, so it is incorrect. IIF is a conditional function that works well for simple true/false scenarios, but it would require nested conditions to achieve what COALESCE does easily, so it is also incorrect. LEAST is the correct answer because it compares two or more expressions and returns the smallest value; in this scenario you need to return the lowest price between ListPrice and SalePrice while ignoring null values. COALESCE(ListPrice, SalePrice) could return ListPrice when SalePrice is null, but it does not compare the two values to find the smaller one, which is what the question asks for. MIN is typically used as an aggregate function to return the minimum value from a set of rows, not to compare individual column values within the same row, so it would not work correctly here either.
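A sketch of the shape of the answer — the column names follow the descriptions above, and the output aliases are my own illustration:

```sql
-- COALESCE returns the first non-NULL of Weight, Quantity, then the literal 1;
-- LEAST returns the smaller of ListPrice and SalePrice, ignoring NULLs.
SELECT
    COALESCE(Weight, Quantity, 1) AS ShippingWeight,
    LEAST(ListPrice, SalePrice)   AS LowestPrice
FROM sales.orders;
```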
Question 76: You have a fabric tenant that contains a lakehouse. You plan to use a visual query to merge two tables. You need to ensure that the query returns all the rows in both tables. Which type of join should you use? A. Inner, B. Full outer, C. Left outer, D. Right anti, E. Right outer, F. Left anti. This is another simple question: the correct answer is B, a full outer join.

Question 77: You have a fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table that has 1 million Parquet files. You need to remove files that were not referenced by the table during the past 30 days. The solution must ensure that the transaction log remains consistent and the ACID properties of the table are maintained. What should you do? A. From OneLake file explorer, delete the files. B. Run the OPTIMIZE command and specify the Z-order parameter. C. Run the OPTIMIZE command and specify the V-order parameter. D. Run the VACUUM command. Pause the video, give it a think, and check the answer. The correct answer is D, run the VACUUM command, because VACUUM removes files that are no longer needed by the Delta table — typically files that are not referenced by the table in the transaction log for a specified period, such as the past 30 days. This ensures that the transaction log remains consistent and that the ACID (atomicity, consistency, isolation, durability) properties of the Delta table are maintained. By running VACUUM you effectively clean up the data storage, removing old and unused files while preserving the integrity of the table. Option A is incorrect because manually deleting files from the file explorer would not maintain the consistency of the transaction log or the ACID properties of the Delta table; it could lead to data corruption, as the transaction log would still reference files that no longer exist. Option B is also incorrect because OPTIMIZE with the Z-order parameter is used to optimize the layout of the data files, physically organizing the data for faster query performance based on query patterns, but it does not delete unreferenced files. Option C is incorrect because, similar to Z-order, the V-order parameter of OPTIMIZE is focused on optimizing the data layout; it reorganizes data for query performance but does not address file cleanup. Feel free to pause the video and take a moment to read about the table maintenance operations if you'd like.
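As a sketch (the table name "customer" is a placeholder, since the question doesn't name the table), the 30-day window is expressed as a retention period of 720 hours:

```python
# VACUUM removes files no longer referenced by the Delta transaction log
# that are older than the retention window -- here, 30 days = 720 hours.
spark.sql("VACUUM customer RETAIN 720 HOURS")

# OPTIMIZE, by contrast, compacts/reorders data files for query performance
# but never deletes unreferenced files:
# spark.sql("OPTIMIZE customer ZORDER BY (customer_id)")
```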
Question 78: You are implementing two dimension tables named Customers and Products in a fabric warehouse. You need to use slowly changing dimensions (SCD) to manage the versioning of data. The solution must meet the requirements shown in the following table. Which type of SCD should you use for each table? Each SCD type may be used once, more than once, or not at all. Pause the video, give it a think, and check the answer. For Customers, "create a new version of the row" refers to SCD type 2: in SCD type 2 a new record is created whenever there is a change in the source data, and the previous record is preserved, allowing you to track historical changes. For Products, "overwrite the existing value in the latest row" refers to SCD type 1: in SCD type 1, changes in the source data are reflected by overwriting the existing data in the dimension table, with no historical data preserved. Here's a quick summary of SCD types that I've put together; take a moment to read through it.
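To make the contrast concrete, here is a minimal T-SQL sketch — the table and column names are made up for illustration and are not from the exam item:

```sql
-- SCD type 1: overwrite the value in place; no history is kept.
UPDATE dbo.DimProduct
SET    ListPrice = 19.99
WHERE  ProductKey = 42;

-- SCD type 2: close the current row and insert a new version, preserving
-- history via effective dates and a current-row flag.
UPDATE dbo.DimCustomer
SET    EffectiveEndDate = SYSDATETIME(), IsCurrent = 0
WHERE  CustomerAltKey = 'C-1001' AND IsCurrent = 1;

INSERT INTO dbo.DimCustomer (CustomerAltKey, City, EffectiveStartDate, IsCurrent)
VALUES ('C-1001', 'Seattle', SYSDATETIME(), 1);
```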
Question 79: You have a fabric workspace named Workspace1 and an Azure SQL database. You plan to create a dataflow that will read data from the database and then transform the data by performing an inner join. You need to ignore spaces in the values when performing the inner join. The solution must minimize development effort. What should you do? A. Append the queries by using fuzzy matching. B. Merge the queries by using fuzzy matching. C. Append the queries by using a lookup table. D. Merge the queries by using a lookup table. Pause the video, give it a think, and answer the question. Option B is the correct answer. Why merging the queries with fuzzy matching is correct: merging queries is the right operation when you want to join two data sets on a common column, and in this scenario you want to perform an inner join. Fuzzy matching allows approximate matching between the values in the columns, including ignoring spaces or slight variations, which is particularly useful when data has inconsistencies like extra spaces, different casing, or small typographical errors. By merging the queries with fuzzy matching, you can effectively ignore spaces in the values during the join, achieving the desired outcome with minimal development effort (see the sketch after this explanation). Why the other options are incorrect: option A, append the queries by using fuzzy matching — appending combines data sets by stacking one on top of the other, similar to a union operation; it does not join on a common column, so fuzzy matching is not relevant there, and you need an inner join, not a union. Option C, append the queries by using a lookup table, is not a common operation: lookup tables are generally used to find corresponding values in another table (for example, matching IDs to names), but appending would simply combine the data sets without any merging or matching logic, so it doesn't apply when you need to join tables on a common column. Option D, merge the queries by using a lookup table: while merging is the correct approach for an inner join, a lookup table does not address the issue of ignoring spaces in the values; it is typically used to find a corresponding value, and it wouldn't ignore spaces or perform the fuzzy matching necessary to correctly join the data in this scenario.
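A sketch in Power Query M of what the merge could look like — the query names (Orders, Customers) and key column are placeholders I've invented for illustration:

```m
// Merge two queries with an inner join, letting fuzzy matching ignore
// spaces (and casing) in the key values.
let
    Merged = Table.FuzzyNestedJoin(
        Orders, {"CustomerName"},
        Customers, {"CustomerName"},
        "CustomerDetails",
        JoinKind.Inner,
        [IgnoreCase = true, IgnoreSpace = true]
    )
in
    Merged
```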
Question 80: You have a fabric tenant that contains a warehouse named Warehouse1. Warehouse1 contains two schemas named schema1 and schema2, and a table named schema1.City. You need to make a copy of schema1.City in schema2. The solution must minimize the copying of data. Which T-SQL statement should you run? Pause the video, go through the answer options, give it a think, and check the answer. The correct answer is option C: CREATE TABLE schema2.City AS CLONE OF schema1.City is correct because it creates a new table with the same structure as schema1.City without immediately duplicating the data, minimizing data copying. Option A fully duplicates the data from schema1.City into schema2.City, which doesn't minimize data copying, so it is incorrect. Option B also fully copies the data into the new table, which is likewise incorrect. Option D, similar to A and B, duplicates all data from schema1.City, not minimizing data copying.
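For reference, the winning statement looks like this:

```sql
-- Zero-copy clone: schema2.City gets the same structure and data as
-- schema1.City without physically duplicating the underlying data.
CREATE TABLE schema2.City AS CLONE OF schema1.City;
```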
Question 81: You have a fabric tenant that contains a lakehouse named Lakehouse1. You need to prevent new tables added to Lakehouse1 from being added automatically to the default semantic model of the lakehouse. What should you configure? A. The SQL analytics endpoint settings. B. The semantic model settings. C. The workspace settings. D. The Lakehouse1 settings. Pause the video and take a moment to consider the answer options carefully before checking the correct one. The correct answer is A, the SQL analytics endpoint settings. If you thought it was option B, that's understandable: the semantic model settings do manage the behavior and structure of the model, but they don't control the synchronization between lakehouse tables and the Power BI semantic model. That synchronization is handled in the SQL analytics endpoint settings, which determine how data from the lakehouse or warehouse is exposed and synced with tools like Power BI. To turn off the default behavior of automatically adding tables to the semantic model, you need to configure the SQL analytics endpoint settings, which is why option A is correct. Option C, the workspace settings, deals with broader aspects like permissions and capacity, but it doesn't affect the automatic synchronization of tables to the Power BI semantic model, making it incorrect. Option D, the lakehouse settings, focuses on managing the lakehouse itself, such as data storage and security, but it doesn't control the interaction with the Power BI semantic model, so it's also incorrect.

Question 82: You have a fabric tenant that contains a workspace named Workspace1. Workspace1 contains a lakehouse named Lakehouse1 and a warehouse named Warehouse1. You need to create a new table in Warehouse1 named pos_customers by querying the customer table in Lakehouse1. How should you complete the T-SQL statement? To answer, select the appropriate options in the answer area. The correct answer for the first dropdown is CREATE TABLE dbo.pos_customers AS SELECT: this creates the new table dbo.pos_customers from the results of a SELECT query, which is the appropriate choice when you want to create a table from query results. The plain CREATE TABLE dbo.pos_customers option is incorrect because it would create an empty table with the specified name; it doesn't use a SELECT statement to populate the table with data from another table, which is required in this scenario. CREATE TABLE dbo.pos_customers AS CLONE OF is also incorrect because AS CLONE OF creates an exact copy of another table's structure and data; it's not appropriate here because the requirement is to create a new table based on a query against Lakehouse1.dbo.customer, not to clone an existing table. For the second dropdown, FROM Lakehouse1.dbo.customer is correct: since you need to query the customer table located in Lakehouse1, this option correctly specifies the source of the data for the SELECT statement. FROM dbo.customer is incorrect because it would refer to a customer table within the same database (likely Warehouse1) and doesn't specify Lakehouse1 as the source, while the question explicitly states that the data must be queried from Lakehouse1. FROM dbo.pos_customers incorrectly references the table you are supposed to create as the source; since that table does not exist yet, it can't be the data source for the new table, so it is also an incorrect option.
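Putting the two dropdowns together, the full statement has this shape (exact identifier casing follows the exam exhibit; I'm writing the new table as dbo.pos_customers):

```sql
-- CTAS: create the table in Warehouse1 from a cross-database query against
-- the lakehouse's SQL analytics endpoint.
CREATE TABLE dbo.pos_customers
AS
SELECT *
FROM Lakehouse1.dbo.customer;
```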
Question 83: You have a fabric tenant. You are creating an Azure Data Factory pipeline. You have a stored procedure that returns the number of active customers and their average sales for the current month. You need to add an activity that will execute the stored procedure in a warehouse; the returned values must be available to the downstream activities of the pipeline. Which type of activity should you add? A. Switch, B. Copy data, C. Append variable, D. Lookup. You'll find this question straightforward if you've worked with Azure Data Factory or Synapse pipelines. The correct answer is option D, lookup: this activity is used to execute a stored procedure or query and retrieve the results, making them available to downstream activities in the pipeline. It's ideal when you need to fetch data — like the number of active customers and their average sales — and pass it to other activities in your pipeline. Option A, the switch activity, is used for branching logic based on conditions, not for executing stored procedures or retrieving data. Option B, the copy data activity, is used for data movement (for example, copying data between data stores), not for executing stored procedures or retrieving output for use within the pipeline. Option C, append variable, appends values to an existing variable in the pipeline; it does not execute stored procedures or retrieve data for use by downstream activities.
Question 84: You have a fabric tenant that contains two workspaces named Workspace1 and Workspace2. Workspace1 contains a lakehouse named Lakehouse1; Workspace2 contains a lakehouse named Lakehouse2. Lakehouse1 contains a table named dbo.sales, and Lakehouse2 contains a table named dbo.customers. You need to ensure that you can write queries that reference both dbo.sales and dbo.customers in the same SQL query without making additional copies of the tables. What should you use? A. A shortcut, B. A dataflow, C. A view, D. A managed table. Pause the video, give it a think, and check the answer. The correct answer is A, a shortcut. In Microsoft Fabric, a shortcut lets you reference data from one workspace or lakehouse in another without copying the data, so you can query both dbo.sales from Lakehouse1 and dbo.customers from Lakehouse2 in the same SQL query seamlessly. All the other options are incorrect: option B, dataflows, are used for transforming and loading data, not for querying tables across different workspaces without copying data; option C, a view, would let you create a virtual table based on a query, but it still requires the tables to be in the same database or workspace; and option D, a managed table, stores data within a workspace, but using it would involve making copies of the data, which does not meet the requirement of avoiding additional copies.
Question 85: You have a fabric tenant that contains a warehouse. You are designing a star schema model that will contain a customer dimension. The customer dimension table will be a type 2 slowly changing dimension (SCD). You need to recommend which columns to add to the table; the columns must not already exist in the source. Which three types of columns should you recommend? Each correct answer presents part of the solution; each correct answer is worth one point. A. A foreign key, B. A natural key, C. An effective end date and time, D. A surrogate key, E. An effective start date and time. Pause the video, go through the answer options carefully, and then check your answer. The correct options are C, D, and E. Let's go through the options in order. Option A, a foreign key, would typically exist in a fact table to link to the dimension table; it is not a column type specific to handling type 2 SCDs in the dimension table itself, so it is incorrect. Option B, a natural key, is an identifier that comes from the source system and already exists in the source data; the question asks for columns that do not already exist in the source, so a natural key would not meet this requirement. Option C, an effective end date and time: in a type 2 SCD, an effective end date indicates when a particular version of a record becomes inactive, which allows tracking the history of changes — a correct choice. Option D, a surrogate key, is a unique identifier for each record in the dimension table, which is essential in a type 2 SCD to uniquely identify different versions of the same entity (for example, a customer) as the data changes over time — another correct choice. Option E, an effective start date and time, is necessary to record when a particular version of the record became active, allowing accurate tracking of historical data in the dimension; therefore it's also a correct choice.
Question 86: You have a fabric tenant. You plan to create a data pipeline named Pipeline1. Pipeline1 will include two activities that will execute in sequence. You need to ensure that a failure of the first activity will not block the second activity. Which conditional path should you configure between the first activity and the second activity? A. Upon failure, B. Upon completion, C. Upon skip, D. Upon success. The correct answer is option B, upon completion, because this conditional path ensures that the second activity executes regardless of whether the first activity succeeds or fails; it allows the pipeline to continue even if the first activity encounters an error. The others are incorrect: A, upon failure, triggers the second activity only if the first activity fails, which does not meet the requirement that the second activity runs regardless of the first activity's outcome; C, upon skip, triggers the second activity only if the first activity is skipped, which isn't applicable here; and D, upon success, triggers the second activity only if the first activity succeeds, so it would block the second activity if the first one fails, which doesn't satisfy the requirement.

Question 87: You have a Microsoft Power BI semantic model. You need to identify any surrogate key columns in the model that have the Summarize By property set to a value other than None. The solution must minimize effort. What should you use?
A. DAX Formatter in DAX Studio, B. Model Explorer in Microsoft Power BI Desktop, C. Model view in Microsoft Power BI Desktop, D. Best Practice Analyzer in Tabular Editor. The correct answer is D, Best Practice Analyzer in Tabular Editor. This tool is specifically designed to automatically check your Power BI model against best practices, including identifying any surrogate key columns with the Summarize By property set to something other than None; it efficiently pinpoints potential issues in your model, saving you the effort of inspecting each column manually. Here's why the other options are not correct: A, DAX Formatter in DAX Studio — while DAX Studio is excellent for formatting and analyzing DAX code, it isn't intended for inspecting the properties of columns in a semantic model, so it wouldn't help identify the Summarize By settings for surrogate keys. B, Model Explorer in Microsoft Power BI Desktop — although you can use Model Explorer to inspect your model, it doesn't provide automated checks or best-practice recommendations like the Best Practice Analyzer, meaning you would need to manually search for surrogate key columns with incorrect summarization settings, which is less efficient. C, Model view in Microsoft Power BI Desktop — similar to Model Explorer, the model view lets you manually examine the model, but it lacks the automated best-practice checks, so identifying summarization issues would require more manual effort, making this option less ideal.

Question 88: You have a fabric tenant that contains a Microsoft Power BI report named Report1. Report1 is slow to render, and you suspect that an inefficient DAX query is being executed. You need to identify the slowest DAX query and then review how long the query spends in the formula engine as compared to the storage engine. Which five actions should you perform in sequence? Pause the video, go
through the answer options carefully, and then check your answer. Here are the actions in sequence: 1. From Performance Analyzer, capture a recording — start by capturing the performance data using Performance Analyzer in Power BI Desktop. 2. Sort the Duration (milliseconds) column in descending order by DAX query time — sort the results to identify the slowest DAX query. 3. Copy the first query over to DAX Studio — take the slowest query and copy it into DAX Studio for further analysis. 4. Enable Query Timings and Server Timings, and run the query to analyze it in detail. 5. View the Server Timings tab — finally, examine the Server Timings tab to review how long the query spends in the formula engine compared to the storage engine. If you got this answer wrong, and for a step-by-step walkthrough, check out the video linked in the description.

Question 89: You have a fabric tenant that contains a semantic model. The model contains 15 tables. You need to programmatically change each column whose name ends in the word "key" to meet the following requirements. What should you use? A. Microsoft Power BI Desktop, B. ALM Toolkit, C. Tabular Editor, D. DAX Studio. Pause the video, give it a think, and check your answer.
The correct answer is C, Tabular Editor, because Tabular Editor is a powerful tool designed for managing and modifying Power BI and Analysis Services models. It allows you to programmatically apply changes to multiple columns — such as hiding them, setting properties like Is Nullable, Summarize By, and Is Available In MDX, and marking columns as key columns. You can write a C# script, or use advanced object-level actions within Tabular Editor, to apply these settings across all the relevant columns in your model efficiently. All the other options are incorrect. A, Microsoft Power BI Desktop, is primarily used for building and designing reports; while you can manually adjust column properties, it is not designed for batch processing or scripting changes across multiple columns, so applying these changes to each column individually would take a lot of manual effort. B, ALM Toolkit, is mainly used for comparing and deploying changes between different Power BI models or Analysis Services projects; it does not offer the same level of programmability or scripting as Tabular Editor for modifying multiple columns within a model. D, DAX Studio, is a tool primarily for writing, formatting, and analyzing DAX queries; it does not provide features for programmatically changing column properties in a semantic model.
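Here's a sketch of what such a Tabular Editor C# script could look like. It runs inside Tabular Editor (where the Model object is provided for you), and the exact property values depend on the requirements table, which isn't shown here — treat the assignments below as illustrative:

```csharp
// Tabular Editor script: apply the same property changes to every column
// whose name ends in "Key".
foreach (var c in Model.AllColumns)
{
    if (c.Name.EndsWith("Key"))
    {
        c.IsHidden = true;                      // hide from report field list
        c.IsNullable = false;
        c.SummarizeBy = AggregateFunction.None; // no implicit aggregation
        c.IsAvailableInMDX = false;
    }
}
```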
Question 90: You have a Microsoft Power BI semantic model. You plan to implement calculation groups. You need to create a calculation item that will change the context from the selected date to month-to-date (MTD). How should you complete the DAX expression? To answer, select the appropriate options in the answer area. Take a moment to think and verify your answer. The correct answers are shown here, and here's the reasoning. CALCULATE is used to change the context of the calculation, which is necessary when you want to apply a filter or modify the current filter context — as in the case of changing to month-to-date. SELECTEDMEASURE dynamically refers to the measure selected in the calculation group: when creating calculation items within a calculation group, SELECTEDMEASURE refers to the measure being affected by the calculation group. DATESMTD('Date'[Date]) returns a table containing the dates from the beginning of the month up to the last date in the current context; used inside CALCULATE, it adjusts the context to month-to-date. Why the other options are incorrect: GENERATE is used to generate combinations of rows from two tables and is not suitable for filtering contexts as needed here. MEASURE is the keyword for defining a measure in a DAX query, not something used within CALCULATE to refer to a measure dynamically. COMBINEVALUES concatenates values, which is irrelevant to the task of changing the context to month-to-date. SELECTEDVALUE returns the value of a column in the current context, not a measure, and is not used for dynamic
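Assembled, the calculation item expression looks like this (the 'Date'[Date] column name is an assumption for the model's date table):

```dax
// Calculation item: re-evaluate the selected measure over month-to-date.
CALCULATE (
    SELECTEDMEASURE (),
    DATESMTD ( 'Date'[Date] )
)
```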
measure selection within calculation groups. The correct selection aligns with the task of creating a calculation item for a month-to-date context using a calculation group in Power BI.

Question 91: You have a Microsoft Power BI report named Report1 that uses a fabric semantic model. Users discover that Report1 renders slowly. You open Performance Analyzer and identify that a visual named Orders by Date is the slowest to render. The duration breakdown for Orders by Date is shown in the following table. What will provide the greatest reduction in the rendering duration of Report1? A. Enable automatic page refresh. B. Optimize the DAX query of Orders by Date by using DAX Studio. C. Change the visual type of Orders by Date. D. Reduce the number of visuals in Report1. Take a moment to think and verify your answer. The correct answer is option D. This choice is based on the fact that, in the provided scenario, the Other category in the performance analysis takes up the most time, which often correlates with the overall complexity of the report, including the number of visuals; reducing the number of visuals can significantly decrease the workload and improve the report's rendering performance. Let's look at the other options. A, enable automatic page refresh, would not reduce rendering time; instead it could increase the load on the report and slow it down further, especially if the refresh frequency is high, so it is an incorrect answer choice. B, optimize the DAX query of Orders by Date by using DAX Studio: the DAX query takes only 27 milliseconds, a minimal part of the total rendering time, so optimizing it wouldn't have a significant impact — also incorrect. C, change the
visual type of Orders by Date: while this might help in some cases, the performance issue here is likely due to the overall complexity of the report, as indicated by the Other category, rather than the specific visual type, making this an incorrect choice.

Question 92: You have a custom Direct Lake semantic model named Model1 that has 1 billion rows of data. You use Tabular Editor to connect to Model1 by using the XMLA endpoint. You need to ensure that when users interact with reports based on Model1, their queries always use Direct Lake mode. What should you do? A. From Model, configure the Default Mode option. B. From Partitions, configure the Mode option. C. From Model, configure the Storage Location option. D. From Model, configure the Direct Lake Behavior option. Take a moment to think and verify your answer. Option D, from Model configure the Direct Lake Behavior option, is the correct choice because configuring the Direct Lake behavior ensures that the model uses Direct Lake mode for queries, with fallback to DirectQuery mode disabled. Why the other options are incorrect: A, the Default Mode option, sets the default mode for new tables, not enforcement of Direct Lake mode for existing queries; B, configuring the mode on partitions is not how you ensure Direct Lake mode across the entire model; and C, the Storage Location option deals with physical storage, not with enforcing query modes, so it is an incorrect answer choice.

Question 93: You create a semantic model by using Microsoft Power BI Desktop. The model contains one security role named Sales Region Manager and the following tables: Sales, Sales Region, and Sales Address. You need to modify the model to ensure that users assigned the
Sales Region Manager role cannot see a column named Address in Sales Address. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. Please pause the video now, review the action list carefully, give it some thought, and then check the answer. These are the correct actions in sequence: 1. Open the model in Tabular Editor — this gives you access to advanced settings like object-level security (OLS). 2. Select the Address column in Sales Address — identify the specific column you want to secure. 3. Set Object Level Security to None for the Sales Region Manager role — this ensures that the Address column is not visible to users in that role. This sequence hides the column from view based on the security role; to learn how to configure object-level security using Tabular Editor, you can visit the official documentation or a tutorial.

Question 94: You have a Microsoft Power BI semantic model that contains measures. The measures use multiple CALCULATE functions and a FILTER function. You are evaluating the performance of the measures. In which use case will replacing the FILTER function
with the KEEPFILTERS function reduce execution time? A. When the FILTER function uses a nested CALCULATE function. B. When the FILTER function references a measure. C. When the FILTER function references columns from multiple tables. D. When the FILTER function references a column from a single table that uses import mode. Pause the video, give it a think, and check the answer. The correct answer is D, when the FILTER function references a column from a single table that uses import mode. This is because using KEEPFILTERS lets you optimize measure definitions by employing Boolean expressions rather than table expressions; import mode, where data is preloaded into memory, benefits significantly from this setup by reducing execution times, and KEEPFILTERS ensures that any existing filters on the column are preserved, enhancing the performance of the query. If you look at the documentation, it says this guidance is especially relevant for model calculations that apply filters to import tables. In its example, the CALCULATE function accepts a table expression returned by the FILTER DAX function, which evaluates its filter expression for each row of the Product table; it achieves the correct result — the sales result for red products — but it could be achieved much more efficiently by using a Boolean expression. In the improved measure definition, which uses a Boolean expression instead of the table expression, the KEEPFILTERS DAX function ensures that any existing filters applied to the Color column are preserved and not overwritten. It's recommended that you pass filter arguments as Boolean expressions whenever possible: because import model tables are in-memory column stores, they are explicitly optimized to filter columns efficiently in this way. Why the other options are incorrect: A, when the FILTER function uses a nested CALCULATE function — using KEEPFILTERS in this
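A sketch of the two measure shapes the documentation contrasts — the [Sales] measure and 'Product'[Color] column follow that documentation example rather than the exam exhibit:

```dax
// Table-expression filter: FILTER iterates the Product table row by row.
Red Sales (table filter) =
CALCULATE (
    [Sales],
    FILTER ( 'Product', 'Product'[Color] = "Red" )
)

// Boolean filter with KEEPFILTERS: preserves any existing filters on Color
// and lets the in-memory column store filter the column directly.
Red Sales (boolean filter) =
CALCULATE (
    [Sales],
    KEEPFILTERS ( 'Product'[Color] = "Red" )
)
```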
scenario doesn't inherently improve performance; the complexity of nested calculations is not mitigated by KEEPFILTERS, so it is incorrect. B, when the FILTER function references a measure — KEEPFILTERS is designed to handle filter context on columns, not measures, so it won't reduce execution time in this case; also incorrect. C, when the FILTER function references columns from multiple tables — KEEPFILTERS is typically most effective on single-table filters; multi-table references may not benefit as much and could even complicate the filter context, so this is also incorrect.

Question 95: You have a semantic model named Model1. Model1 contains five tables that all use import mode. Model1 contains a dynamic row-level security (RLS) role named HR. The HR role filters employee data so that HR managers only see the data of the department to which they are assigned. You publish Model1 to a fabric tenant and configure RLS role membership, then share the model and related reports with users. An HR manager reports that the data they see in a report is incomplete. What should you do to validate the data seen by the HR manager? A. Select Test as role to view the data as the HR role. B. Filter the data in the report to match the intended logic of the filter for the HR department. C. Select Test as role to view the report as the HR manager. D. Ask the HR manager to open the report in Microsoft Power BI Desktop. Pause the video, give it a think, and check the answer. The correct answer is C, select Test as role to view the report as the HR manager. This option is preferred because it enables a precise simulation of the HR manager's experience, showing exactly what data they see under their specific RLS settings. If you thought of option A, select Test as role to view the data as the HR role: while useful, it does not capture the specific user's experience; testing as the HR manager, rather than just the role, allows a more tailored review that addresses user-specific views and potential discrepancies more effectively. Let's see why the other options are incorrect: B, manually filtering the data in the report doesn't replicate the
actual RLS behavior as it applies to specific users; and D, asking the HR manager to open the report in Microsoft Power BI Desktop doesn't help diagnose the issue, because the RLS settings may differ between Power BI Desktop and the service.

Question 96: You have a Microsoft Fabric tenant that contains a dataflow. You are exploring a new semantic model from Power Query. You need to view column information as shown in the following exhibit. Which three Data view options should you select? Each correct answer presents part of the solution. A. Show column value distribution. B. Enable details pane. C. Enable column profile. D. Show column quality details. E. Show column profile in details pane. I suggest pausing the video to give it some thought before checking the answer. The correct answers are A, C, and D. Let's rule out options B and E first: if you look at the exhibit, you'll notice that the details pane is not visible, so options B and E are not applicable. Now let's discuss why A, C, and D are correct. Option C, enable column profile, is necessary because it activates the column data profiling tools, which would otherwise be grayed out; without enabling the column profile, you wouldn't be able to access the data profiling features. Option D, show column quality details, is visible in the exhibit because we can see the categories valid (green), error (red), and empty (dark gray). Option A, show column value distribution, provides visuals under the column names showing the frequency and distribution of values in each column, sorted in descending order by frequency, which is exactly what we see in the exhibit. If you got this answer wrong, I
recommend reviewing the documentation on the data profiling tools, or pausing the video to read through it carefully.

Question 97: You have a fabric warehouse that contains a table named sales.products. sales.products contains the following columns. You need to write a T-SQL query that will return the following columns. How should you complete the code? To answer, select the appropriate options in the answer area. This is a simple and straightforward T-SQL query; the correct answer is shown here. First, for the highest selling price column, we want to return the highest value among three prices: list price, wholesale price, and agent price. The correct function to use is GREATEST, which compares the values and gives us the highest one. Other options like COALESCE, IIF, or MAX aren't suitable here: COALESCE returns the first non-null value among its arguments but does not compare their values; IIF is a conditional function, not suited to finding the highest value; and MAX works on aggregated data in a group, not on values within a single row. Next, for the trade price column, our goal is to return the agent price if it exists; if not, we move to the wholesale
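As shown in the sketch below, the two answers sit side by side in the SELECT list (column names follow the transcript's description; the output aliases are my own):

```sql
SELECT
    GREATEST(ListPrice, WholesalePrice, AgentPrice) AS HighestSellingPrice,
    COALESCE(AgentPrice, WholesalePrice, ListPrice) AS TradePrice
FROM sales.products;
```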
price, and if that's also null, we fall back to the list price. The perfect function for this logic is COALESCE, which returns the first non-null value in the sequence. CHOOSE returns the item at a specified index from a list of values in SQL Server, which does not meet the requirement of prioritizing agent price, then wholesale price, then list price; IIF is a conditional function that requires a true/false condition, making it less suitable for this purpose; and MAX, again, aggregates data and is not suitable here.

Question 98: You have a fabric notebook that has the Python code and output shown in the following exhibit. Which type of analytics are you performing? A. Descriptive, B. Diagnostic, C. Prescriptive, D. Predictive. Pause the video, give it a think, and then check the answer. The correct answer is A, descriptive, because this type of analytics involves summarizing historical data to understand what has happened. In this case the histogram summarizes the distribution of tip amounts, which is a classic example of descriptive analytics: you're looking at past data and presenting it in a way that helps you understand the distribution of tips. Why the other options are incorrect: B, diagnostic analytics, goes beyond description and tries to understand why something happened, analyzing data to identify patterns or relationships that explain the results; the histogram doesn't explore why the tip amounts vary, it just shows how they are distributed. C, prescriptive analytics, involves recommending actions based on the data, for example using optimization algorithms to suggest the best course of action; the histogram doesn't suggest any actions, it only describes the data. D, predictive analytics, uses historical data to make predictions about
future events, with techniques like regression analysis, machine learning, or time-series forecasting; the histogram doesn't predict future tip amounts, it just shows past data. An easy way to remember the differences: descriptive — what happened (summarizing past data); diagnostic — why did it happen (finding causes or reasons); predictive — what will happen (forecasting future outcomes); prescriptive — what should we do (recommending actions). By associating each type of analytics with a key question, you can more easily remember their purposes and when to apply them.

Question 99: You have a fabric warehouse that contains a table named sales.orders. sales.orders contains the following columns. You need to write a T-SQL query that will return the following columns. How should you complete the code? To answer, select the appropriate options in the answer area. Pause the video, review the code, and then check the answer. To determine the correct answers, let's analyze each dropdown option. For the period date column, the requirement is to return a date representing the first day of the month for the order date. The correct function to use here is DATETRUNC(month, OrderDate), because this function truncates the order date to the start of the specified date part — in this case, the month. Now let's consider the incorrect options: DATE_BUCKET is designed to group data into buckets based on intervals, which isn't suitable for getting the first day of the month; DATEFROMPARTS constructs a date from individual parts like year, month, and day, but it doesn't truncate a date to the first day of the month; and DATEPART extracts a part of the date, such as the year or month, but it doesn't provide the first day of the month. Next, let's move to the second dropdown, for the day name column: the requirement here is to return the name of the day for the order date, such as Wednesday. The correct choice is DATENAME(weekday, OrderDate); this function returns the name of the day for a given date, which is exactly what we need. Now let's take a look at the incorrect options: DATENAME(day, OrderDate) returns the day of the month as a number, not the name of the day, and DATENAME(dayofyear, OrderDate) returns the day of the year, which is the sequential day count from the
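Here's a compact sketch of the resulting query (the OrderDate column name and the output aliases are assumed from the question's description):

```sql
SELECT
    DATETRUNC(month, OrderDate)  AS PeriodDate,  -- first day of the month
    DATENAME(weekday, OrderDate) AS DayName      -- e.g. 'Wednesday'
FROM sales.orders;
```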
start of the year. To help clarify, here's an example showing how the output would look for each of these options. I understand that these functions can be a bit confusing, so I recommend reviewing the provided resources before your exam to solidify your understanding.

Question 100: You have a fabric tenant that contains JSON files in OneLake. The files have 1 billion items. You plan to perform time-series analysis of the items. You need to transform the data, visualize the data to find insights, perform anomaly detection, and share the insights with other business users. The solution must meet the following requirements: use parallel processing, minimize the duplication of data, and minimize how long it takes to load the data. What should you use to transform and visualize the data? A. The PySpark library in a fabric notebook. B. The pandas library in a fabric notebook. C. A Microsoft Power BI report that uses core visuals. Pause the video, give it a think, and check the answer. The correct answer is option A: the PySpark library in a fabric notebook is the best choice for this scenario because it meets all the key requirements. Parallel processing: PySpark is built on Apache Spark, which is designed for distributed computing and can efficiently handle large data sets by processing them in parallel across multiple nodes, making it ideal for working with 1 billion items. Minimizing data duplication: PySpark can work directly with data in OneLake, minimizing the need to duplicate data for processing; it can efficiently read, process, and analyze large JSON files without creating unnecessary copies of the data. Minimizing load time: PySpark is optimized for large-scale data processing and can handle large data sets quickly, reducing the time it takes to load and transform the data for analysis. Why the other options are incorrect: B, the pandas library in a fabric notebook, has scalability issues — while pandas is a powerful data manipulation library, it is not designed for parallel processing or for handling very large data sets like 1 billion items; it operates in memory on a single machine, which would lead to significant performance bottlenecks and long processing times, and it would hit resource constraints, struggling with memory limitations, compared to PySpark. C, a Microsoft Power BI report that
uses core visuals: Power BI is primarily a tool for data visualization and reporting; while it can handle some data transformations, it is not suitable for performing large-scale data transformations or time-series analysis directly on a billion items, and Power BI does not offer the parallel processing capabilities needed to efficiently manage and analyze such a large data set.

Question 101: You have a fabric tenant that contains customer churn data stored as Parquet files in OneLake. The data contains details about customer demographics and product usage. You create a fabric notebook to read the data into a Spark DataFrame. You then create column charts in the notebook that show the distribution of retained customers as compared to lost customers based on geography, the number of products purchased, age, and customer tenure. Which type of analytics are you performing? A. Diagnostic, B. Descriptive, C. Prescriptive, D. Predictive. Pause the video and take a moment to think it through. The correct answer is B, descriptive. Descriptive analytics involves summarizing and describing historical data to understand what has happened; in this scenario you are creating column charts to show the distribution of retained versus lost customers across factors like geography, the number of products purchased, age, and customer tenure, and these charts help you understand the current state or past behavior of your customers — the essence of descriptive analytics. Why the other options are incorrect: A, diagnostic — diagnostic analytics goes a step further than descriptive analytics by attempting to explain why something happened, analyzing data to identify the patterns, relationships, or causes behind an event; it is inapplicable here because you're only summarizing and visualizing the data, not analyzing the underlying reasons for customer churn. C, prescriptive — prescriptive analytics involves recommending actions based on the data, typically using optimization techniques, simulations, or algorithms to suggest the best course of action; it is inapplicable here because you are not recommending any actions or decisions based on the data, you're simply summarizing and visualizing it. D, predictive — predictive analytics uses historical data to make predictions about future events, often via statistical models or machine learning; it is inapplicable here because you are not making any predictions or forecasts about future customer behavior, only describing
past and current data. Again, the easy way to remember the differences: descriptive — what happened (summarizing past data); diagnostic — why did it happen (finding causes or reasons); predictive — what will happen (forecasting future outcomes); prescriptive — what should we do (recommending actions). By associating each type of analytics with a key question, you can more easily remember their purposes and when to apply them.

Question 102: You have a fabric tenant that contains a semantic model. The model contains data about retail stores. You need to write a DAX query that will be executed by using the XMLA endpoint. The query must return the total amount of sales from the same period last year. How should you complete the DAX expression? To answer, select the appropriate options in the answer area. Pause the video, review the code, and check the answer. The correct answer for the first dropdown is CALCULATE, and for the second dropdown, _LYSales. The CALCULATE function is used to modify the filter context of a calculation, allowing you to evaluate an expression under a specific set of conditions; in this case it's used to calculate the total sales for the same period last year, which is exactly what you need. _LYSales is the variable that stores the calculated value of total sales for the same period last year, and returning _LYSales at the end ensures that the calculated result is output by the query. Why the other options are incorrect — first dropdown: CALCULATETABLE returns a table rather than a single value, and since you're interested in a single scalar value for total sales, it is not appropriate here; FILTER returns a table of filtered rows, not a calculated value, and is not used to directly calculate the total sales;
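As a sketch of the overall query shape — the [Total Sales] measure, the 'Date'[Date] column, and the use of SAMEPERIODLASTYEAR and ROW are my assumptions, since the transcript only names the CALCULATE and _LYSales dropdowns:

```dax
DEFINE
    VAR _LYSales =
        CALCULATE (
            [Total Sales],
            SAMEPERIODLASTYEAR ( 'Date'[Date] )
        )
EVALUATE
    ROW ( "LY Sales", _LYSales )
```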
SUMMARIZE is used to create a summary table, not to calculate a specific scalar value like total sales; and SUMMARIZECOLUMNS, similar to SUMMARIZE, creates a table rather than calculating a scalar value. Second dropdown: Total Sales would return the total sales for the current context, not the calculated value for the same period last year; { _LYSales } with curly braces is not valid in this context and would cause an error; and VAR is the keyword used to declare variables in DAX — it cannot itself be returned as a value.

Question 103: You have a fabric workspace named Workspace1 that contains a dataflow named Dataflow1. Dataflow1 returns 500 rows of data. You need to identify the minimum and maximum values for each column in the query results. Which three Data view options should you select? Each correct answer presents part of the solution. A. Show column value distribution. B. Enable column profile. C. Show column profile in details pane. D. Show column quality details. E. Enable details pane. I suggest pausing the video to give it some thought before checking the answer. The correct answers are B, C, and E. Let's discuss why these options are correct. Option B, enable column profile, is necessary because it activates the column data profiling tools, which would otherwise be grayed out; without enabling the column profile, you wouldn't be able to access the data profiling features. Option C, show column profile in details pane, displays detailed column profiling information in the details pane, including the min and max values for each column. Option E, enable details pane, is required to view the column profile in the details pane; without it you wouldn't be able to see the detailed column statistics. Now the incorrect options: option A, show column value distribution, provides visuals under the column names showing the frequency and distribution of values in each column, which is not what we need; and option D, show column quality details, focuses on data quality metrics like valid, error, and empty percentages, which are not directly related to finding the min and max values. Here's a summary of the key data profiling tools in Power Query.
Question 104: You have a fabric tenant that contains a Microsoft Power BI report. You are exploring a new semantic model. You need to display the following column statistics: count, average, null count, distinct count, and standard deviation. Which Power Query function should you run? A. Table.Schema, B. Table.View, C. Table.FuzzyGroup, D. Table.Profile. This is a straightforward question, and the correct answer is option D, Table.Profile. This function generates detailed statistics for each column in a table — including count, average, null count, distinct count, and standard deviation — and is specifically designed for profiling data and obtaining these types of summary statistics. Why the other options are incorrect: Table.Schema returns the schema information of a table, such as column names and data types, but it does not provide statistical information like counts or averages; Table.View is used to create a virtual table or view of data, which can control how data is accessed, but it does not generate column statistics; and Table.FuzzyGroup groups data based on fuzzy matching criteria, useful for handling data with slight variations or misspellings, but it does not provide statistical summaries.

Note: this question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals; some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will not be able to return to it; as a result, these questions will not appear in the review screen. This is one of the questions I missed in the question series of 57, 58, and 59, so here it is. You have a fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named customer. When you query customer, you discover that the query is slow to execute, and you suspect that maintenance was not performed on the table. You need to identify whether maintenance tasks were performed on customer. Solution: you run the following Spark SQL statement: DESCRIBE DETAIL customer. Does this meet the goal? A. Yes, B. No. The answer is no. Running the DESCRIBE DETAIL customer command in Spark SQL gives you metadata about the Delta table, such as file count, table size, and the last operation timestamp; however, it doesn't indicate whether maintenance tasks like vacuuming, compaction, or optimization were performed. To check whether maintenance tasks have been executed, use the DESCRIBE HISTORY customer command: it provides a detailed history of all operations on the table, including those considered maintenance tasks. If you missed the question series, I recommend watching them for more insights.
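The contrast in one sketch, run from a notebook:

```python
# DESCRIBE DETAIL: a metadata snapshot (file count, size in bytes, last
# modified) -- it does not tell you which operations were run on the table.
spark.sql("DESCRIBE DETAIL customer").show()

# DESCRIBE HISTORY: one row per table operation (VACUUM, OPTIMIZE, MERGE, ...),
# which is what actually reveals whether maintenance was performed.
spark.sql("DESCRIBE HISTORY customer").show(truncate=False)
```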
compaction or Optimization were performed to check if maintenance tasks have been executed use the describe history customer command this provides a detailed history of all operations on the table including those considered maintenance tasks if you missed the question series I recommend watching them for more insights note this question is part of a series of questions that present the same scenario each question in the series Contains a unique solution that might meet the stated goals some question sets might have more than one correct solution While others might not have a correct solution after you answer a
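To make the difference concrete, here is a minimal sketch of both statements run from a notebook cell; the table name customer comes from the question, while the filter values are standard Delta Lake operation names:

```python
# DESCRIBE DETAIL returns a one-row snapshot of current metadata
# (file count, size, last modified) - no record of past maintenance.
spark.sql("DESCRIBE DETAIL customer").show(truncate=False)

# DESCRIBE HISTORY returns the Delta transaction log, one row per
# operation, so maintenance runs show up in the `operation` column.
history = spark.sql("DESCRIBE HISTORY customer")
history.filter(
    history.operation.isin("OPTIMIZE", "VACUUM START", "VACUUM END")
).select("version", "timestamp", "operation").show(truncate=False)
```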
Note: this question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will not be able to return to it; as a result, these questions will not appear in the review screen.

This is one of the questions I missed in the question series of 54, 55, and 56, so here it is. You have a Fabric tenant that contains a new semantic model in OneLake. You use a Fabric notebook to read the data into a Spark DataFrame. You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns. Solution: you use the following PySpark expression: df.explain().show(). Does this meet the goal? A. Yes; B. No.

The correct answer is no. Firstly, this code would result in an error, because the explain method prints the execution plan of a DataFrame and does not return a DataFrame itself, so it cannot be chained with show. Even setting the error aside, explain is the wrong tool for this task: df.explain() only displays the execution plan and does not calculate or display statistics like min, max, mean, or standard deviation for the columns. The expression df.summary() is suitable for evaluating the data and calculating the min, max, mean, and standard deviation values for all the string and numeric columns in a DataFrame, and df.describe() computes basic statistics for numeric and string columns, as sketched below. If you missed the question series, I recommend watching it for more insights.
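Here is a minimal sketch of the three expressions side by side; the sample DataFrame and its column names are made up purely for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided in a Fabric notebook

# A tiny illustrative DataFrame; in the question the data comes from OneLake.
df = spark.createDataFrame(
    [("alice", 34, 120.5), ("bob", 29, 98.0), ("carol", 41, 150.25)],
    ["name", "age", "balance"],
)

# explain() prints the physical plan and returns None, so chaining
# .show() onto it raises an error - and it never produces statistics.
df.explain()

# summary() returns a DataFrame of statistics for string and numeric
# columns; you can request exactly the ones the question asks for.
df.summary("min", "max", "mean", "stddev").show()

# describe() is the fixed-set variant: count, mean, stddev, min, max.
df.describe().show()
```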
This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will not be able to return to it; as a result, these questions will not appear in the review screen.

Question 107: you have a Fabric tenant that contains a semantic model named Model1. You discover that the following query performs slowly against Model1. You need to reduce the execution time of the query. Solution: you replace line 4 by using the following code: ISEMPTY(RELATEDTABLE('Order Item')). Does this meet the goal? A. Yes; B. No. Pause the video, give it a think, and then check the answer.

The correct answer is no. Replacing line 4 with ISEMPTY(RELATEDTABLE('Order Item')) does not significantly reduce execution time; both the ISEMPTY-based condition and the original COUNTROWS-based condition perform similar operations, and using ISEMPTY doesn't inherently optimize performance in this context. Therefore, the solution does not meet the goal of reducing the query's execution time.
Question 108: solution: you replace line 4 by using the following code: NOT ISEMPTY(CALCULATETABLE('Order Item')). Does this meet the goal? A. Yes; B. No. Pause the video, give it a think, and then check the answer.

The correct answer is yes. Using NOT ISEMPTY(CALCULATETABLE('Order Item')) effectively filters out customers with no related Order Item rows. This approach is efficient because it directly checks whether any related rows exist in the Order Item table and returns true only if they do. It meets the goal of reducing the query's execution time by simplifying the condition and making the query more efficient.

Question 109: you replace line 4 by using the following code: CALCULATE(COUNTROWS('Order Item')) >= 0. Does this meet the goal? A. Yes; B. No.

The correct answer is no. Using CALCULATE(COUNTROWS('Order Item')) >= 0 is logically incorrect for filtering, because COUNTROWS always returns a non-negative number, including zero, so this condition always evaluates to true and never filters out customers with no orders. The original intent of the query is to filter out customers with no related Order Item rows, and this condition does not achieve that; therefore, it does not meet the goal of reducing the query's execution time.

If you've completed this playlist, I want to congratulate you on your dedication. I believe you're ready and confident for the exam. Best of luck! Please leave a comment if this playlist helped, so I can create more content like this for you.