Thursday, June 18, 2026

DAIS 2026 — Announcements, Business & Productivity Impact

Ordered by business + productivity impact (broadest reach / biggest value first). 

Ranking is a judgment call: #1–#9 are broad-reach/platform items; #10–#13 are high-value but more vertical/role-specific; #14–#20 are strategic, incremental, or not-yet-GA.

| # | Announcement | Short description | Business / productivity impact | URL |
|---|---|---|---|---|
| 1 | Genie One | GA "data-smart AI coworker": single NL UI to query data, view dashboards, run apps; native in Slack/Teams + mobile | Puts self-serve data + action in every business user's hands; collapses tool sprawl into one conversational front door | https://www.databricks.com/blog/introducing-genie-one-genie-ontology-and-genie-agents |
| 2 | Unity AI Gateway | One runtime governance layer over models, agents, MCP, tools: access policies, guardrails, spend caps, routing, tracing | The control plane that lets enterprises actually run agentic AI in production with cost + risk under control | https://www.databricks.com/blog/ai-governance-data-ai-summit-2026-whats-new-unity-ai-gateway |
| 3 | Genie Code | Autonomous data-engineering agent: builds pipelines, debugs, ships dashboards, maintains prod; >2× leading coding agents | Large productivity lift for data teams; less hand-written notebook work, faster delivery | https://www.databricks.com/company/newsroom/press-releases/databricks-launches-genie-code-bringing-agentic-engineering-data |
| 4 | Agent Bricks (+ Sandboxes) | Production agent platform: any model/harness, secure VM sandboxes, deploy/eval/monitor; 100k+ agents, 1+ quadrillion tokens/yr | Makes production agents buildable + safe at enterprise scale (token, deploy, security, eval) | https://www.databricks.com/blog/agent-bricks-dais-2026 |
| 5 | LTAP | Lake Transactional/Analytical Processing: OLTP + OLAP + streaming on one copy of storage, no ETL/replicas; on Lakebase | Removes the transactional↔analytical copy + CDC pipelines; major architecture simplification and cost/TCO reduction | https://www.databricks.com/company/newsroom/press-releases/databricks-launches-ltap-first-lake-transactionalanalytical |
| 6 | Genie Agents | Domain-specific autonomous agents spun up from a single prompt; act via MCP, schedules, doc-gen, external writes | Lowers agent-building to a prompt while keeping governance; broad automation of multi-step work | https://www.databricks.com/blog/introducing-genie-one-genie-ontology-and-genie-agents |
| 7 | Lakehouse//RT (Reyden) | Real-time lakehouse; ms latency (10ms–sub-100ms) at 10,000s of concurrent users/agents on Delta + Iceberg; up to 16× faster | Real-time analytics + agent serving without a separate serving tier; new low-latency use cases, less infra | https://www.databricks.com/company/newsroom/press-releases/databricks-launches-lakehousert-bring-real-time-analytics-directly |
| 8 | Genie Ontology | Automatic business knowledge graph / context layer; authority-ranked, UC-ACL governed; 84.5% vs 52.4% first-attempt accuracy | Foundational accuracy boost for every Genie surface; trustworthy NL-to-answer at scale | https://www.databricks.com/blog/introducing-genie-one-genie-ontology-and-genie-agents |
| 9 | Genie ZeroOps | Background agent that monitors, investigates, root-causes + sandbox-tests fixes (approval-gated, no auto-apply to prod) | Cuts ops toil + MTTR by automating the analysis-heavy part of incident response | https://www.databricks.com/blog/introducing-genie-zeroops |
| 10 | Industry Data Models | Pre-built, rule-validated, Silver-ready models for 40+ industries; model.json → governed UC Delta tables | Compresses industry data modeling from months to hours; fast vertical time-to-value | https://www.databricks.com/blog/jumpstart-your-data-modeling-databricks-industry-data-models |
| 11 | CustomerLake | Agentic CDP: Customer360, identity resolution, Profile + Campaign agents; open activation ecosystem | Replaces a separate CDP for marketing orgs; always-on personalization at scale on governed data | https://www.databricks.com/company/newsroom/press-releases/databricks-enters-marketing-industry-customerlake-agentic-customer |
| 12 | Unity Catalog: Glossary / Domains / Governance Hub | Business glossary, domain marketplace, central governance console; cross-cloud/region namespace, ABAC/RBAC | Shared governed meaning for people + agents and one console to run governance at scale | https://www.databricks.com/blog/whats-new-unity-catalog-data-ai-summit-2026 |
| 13 | Lakewatch | Open, agentic SIEM: defensive security agents (Claude + Genie) for triage/detection/response; Antimatter + SiftD.ai | Agentic SOC at lakehouse scale; less alert fatigue, faster threat response | https://www.databricks.com/company/newsroom/press-releases/databricks-enters-security-market-launch-lakewatch-new-open-agentic |
| 14 | Migrate & Modernize | Specialized partner program: AI-assisted migrate + modernize (not lift-and-shift), code conversion + DQ validation | Shorter migrations, lower double-bubble cost, earlier value; prevents tech debt relocating | https://www.databricks.com/blog/skip-learning-curve-rethinking-data-migration-real-outcomes |
| 15 | OpenSharing / Storage Ecosystem | Open, vendor-neutral protocol (→ Linux Foundation) to query/share data + AI agent skills in place, no migration | Unlocks previously inaccessible data + cross-org skill sharing without copies: "govern, don't migrate" | https://www.databricks.com/company/newsroom/press-releases/databricks-announces-opensharing |
| 16 | Panther acquisition | Intent to acquire Panther (AI SOC): 100+ integrations, detection-as-code, agentic SOC; 3rd security buy | Consolidates the "security lakehouse" category vs Splunk/CrowdStrike | https://www.databricks.com/company/newsroom/press-releases/databricks-agrees-acquire-panther-further-establishing-security |
| 17 | Azure / Genie connectors | Genie for Teams + M365 Copilot (Beta); Excel add-in (PP) + native Excel ingestion (GA); managed SharePoint connector (Beta) | Brings data/agents into everyday Office tools + adds managed ingestion for ubiquitous Office/SharePoint sources | https://www.databricks.com/blog/unifying-data-and-governance-agentic-era-whats-new-azure-databricks |
| 18 | Omnigent | Open-source (Apache 2.0) agent meta-harness: compose/control/share agents across harnesses; Managed Omnigent (Beta) | Standardizes governance across a heterogeneous agent ecosystem; developer-centric, early (alpha/OSS) | https://www.databricks.com/blog/introducing-omnigent-meta-harness-combine-control-and-share-your-agents |
| 19 | Genie App Builder | NL building of data apps / Databricks Apps via the Genie family (exact scope unconfirmed) | Potential to collapse app-building to a conversation; impact pending confirmed scope | https://www.databricks.com/blog/next-generation-databricks-genie |
| 20 | Databricks Ontology (teaser) | Native semantic/knowledge-graph layer over the lakehouse | Forward-looking; impact unclear until GA (Genie Ontology is the first productized piece) | ??? |

Notes

  • A few items share a source blog (the Genie One / Agents / Ontology launch post).
  • #20 is a teaser with no standalone announcement URL.

Tuesday, October 15, 2024

Data Models in the Lakehouse

This is an excerpt from an upcoming whitepaper on Lakehouses.

Data Models

A common question is where common data models should reside within the Lakehouse. There are three basic answers: Gold, Silver, or a mix of Silver and Gold.

One answer is that all common data model items should be put in the Gold zone. This is because they were created for a specific purpose, namely, to enjoy the benefits of standardization. For example, healthcare has many common data models and standards, such as the Observational Health Data Sciences and Informatics (OHDSI) program's Observational Medical Outcomes Partnership (OMOP) model, Health Level 7’s (HL7) Fast Healthcare Interoperability Resources (FHIR), and those of the Clinical Data Interchange Standards Consortium (CDISC). This is in addition to Data Vault, Snowflake Schema, and other modeling approaches, as shown in Figure 4.


Figure 4: Gold-Focused Data Modeling

The counterargument is that while they are created for a named reason, that reason is vague, so they should be in the Silver zone as they are fit for use but not created for a specific question, as shown in Figure 5.


Figure 5: Silver-Oriented Data Modeling

The first two answers place all common data models in either the Gold or the Silver zone. The third answer, which I discussed in my book, Databricks Lakehouse Platform Cookbook (https://www.amazon.com/dp/9355519567), is that the decision depends on whether the model in question was created to be served or to act as an intermediate construct, as shown in Figure 6.



Figure 6: Use-Driven Modeling Approach

We have discussed several approaches to using data modeling techniques within a Lakehouse. This approach can be extended to cover any items created in a specific way, but not necessarily for a specific business purpose.

Data Modeling Recommendation

I recommend a pragmatic approach. From a technical perspective, it does not matter if you place common data models in Silver, in Gold, or in a mixture of the two. Both zones are fit for use. Instead, pick an approach, document it, and stick with it. The same guidance applies to other design decisions, such as mapping workspaces, environments, business functions, and other criteria to Unity Catalog catalogs. If you find yourself (and your organization) engaged in multiple meetings belaboring this topic, review the approach I recommended in my book (discussed in the previous section). Gold items are typically more focused on consumption than Silver. As such, if your modeling output is directly consumable in a performant fashion – call it Gold; otherwise, put it in Silver.

Thursday, September 26, 2024

Lakehouse: You’re probably doing it wrong! How the Lakehouse should really work

Databricks has been championing the Lakehouse and Medallion Architectures for some time. While the approach is familiar, people’s understandings often differ from best practices. This discussion aims to clarify how to use the Medallion Architecture and discuss best practices.

Medallion Architecture in the Lakehouse

At this point, most people have heard of the Medallion architecture. The terms Bronze, Silver, and Gold are intuitive and gaining adoption. That said, people often assume Bronze maps to raw, Silver to refined or curated, and Gold to serving. This is a natural assumption, but it is important to understand that the Lakehouse is new and different. Figure 1 contains a visual representation of the typical flow. A key concept is that we prefer to skip landing data from a source system in a raw/landing zone. Instead, the preference is to connect to the source system and land the data directly in the Bronze zone.




Figure 1: Medallion Architecture

Bronze is an append-only Delta table. It serves to capture and memorialize the history of a given data asset. Assuming the system being read includes updates and supports incremental extraction, Bronze will contain duplicates based on the table’s primary key. Additionally, the only operation performed during the population of Bronze is restoring the data to the format it was in within the source system. For example, if we are ingesting data from an Event Hub, we typically need to convert the encoded body to a string and then convert that string to a JSON object. We will likely need to change data types from the external system to Delta and Spark alternatives, such as VARCHAR becoming a String. We should not update tables in Bronze or prune and transform columns. Instead, keep it as close as possible to the structure of the source system.
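As a rough illustration of the Event Hub example above, the sketch below restores a message body to the structure it had at the source, without pruning or transforming any fields. It assumes the body arrives as UTF-8 encoded bytes containing JSON; the function name `restore_event_body` and the sample payload are hypothetical. In a Spark job, the same two steps would typically be expressed as a cast to string followed by JSON parsing.

```python
import json


def restore_event_body(body: bytes) -> dict:
    """Decode a raw event body (assumed to be UTF-8 JSON) back to its source structure."""
    text = body.decode("utf-8")  # step 1: encoded body -> string
    return json.loads(text)      # step 2: string -> JSON object


# Hypothetical payload, shaped like something a source system might emit.
raw_body = b'{"order_id": 42, "status": "shipped", "amount": "19.99"}'
record = restore_event_body(raw_body)

# Bronze keeps every source field as-is: nothing is dropped, renamed, or recalculated.
print(sorted(record.keys()))
```

Note that `amount` stays a string here because that is how the hypothetical source sent it; converting it to a numeric type would be a refinement step rather than part of populating Bronze.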

When we refine Bronze tables to Silver, we first apply business rules to filter invalid records. Once we have applied data quality rules, we can resolve duplicates and differences by upserting the remaining records into Silver with the MERGE INTO construct. Note that there should be a one-to-one mapping from the source table to a Bronze version of that table and to a refined, fit-for-use Silver table, as presented in Figure 2. Additionally, we avoid changing the columns during this transformation.
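The filter-then-merge flow described above can be sketched in a few lines. This is not Delta Lake code: it uses Python's built-in `sqlite3` module, with SQLite's `INSERT ... ON CONFLICT DO UPDATE` standing in for `MERGE INTO`. The table name `silver_customer`, the sample rows, and the "email is required" rule are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_customer (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)"
)
conn.execute("INSERT INTO silver_customer VALUES (1, 'old@example.com', '2024-01-01')")

# Hypothetical batch of new Bronze rows with the same columns as Silver.
bronze_batch = [
    (1, "new@example.com", "2024-02-01"),     # update to an existing key
    (2, "second@example.com", "2024-02-01"),  # brand-new key
    (3, None, "2024-02-01"),                  # fails the data quality rule
]


def is_valid(row):
    """Hypothetical business rule: a customer row must carry an email address."""
    _id, email, _updated_at = row
    return email is not None and "@" in email


# Step 1: apply data quality rules before touching Silver.
valid_rows = [row for row in bronze_batch if is_valid(row)]

# Step 2: upsert. New keys are inserted, existing keys are overwritten.
conn.executemany(
    """
    INSERT INTO silver_customer (id, email, updated_at)
    VALUES (?, ?, ?)
    ON CONFLICT(id) DO UPDATE SET
        email = excluded.email,
        updated_at = excluded.updated_at
    """,
    valid_rows,
)

silver = conn.execute(
    "SELECT id, email, updated_at FROM silver_customer ORDER BY id"
).fetchall()
print(silver)
```

The Silver table ends up with one row per key and the same columns as the incoming rows, which mirrors the one-to-one mapping and the "avoid changing the columns" guidance.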



Figure 2: Zones

Once a table has been refined for the Silver zone, we can use it in data engineering activities. A common activity is to combine Silver tables to prepare for their use in Gold items. For example, we may combine normalized tables to reduce effort later. We sometimes refer to this operation as Silver-to-Silver refinement, indicating that one or more Silver tables are used to construct other Silver tables. While these tables are fit-for-use, they were not created for a specific business purpose. When we create a table to address a business question, we place that table in the Gold zone. Gold tables are often created by applying aggregations or joining multiple Silver or Gold tables, as presented in Figure 3.



Figure 3: Zone Movements

One key thing to consider when developing solutions for this pattern is that the Bronze table will grow to be large. That means that if we identify new records by scanning the Bronze table itself, our search time will increase as the table grows. This challenge is addressed through the use of Delta’s Change Data Feed (CDF).

CDF allows us to identify the records associated with each version quickly. This avoids searching the Bronze table and can greatly improve performance. Note that a single primary key might appear in multiple change records within one application of CDF to Silver. This occurs when the Source-to-Bronze load is not tied to the Bronze-to-Silver load, so several Bronze versions can accumulate between Silver runs. Once you have applied business rules to the records in the CDF, use a RANK OVER operation to select the most recent valid update for each key before writing to the Silver table.
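To make the RANK OVER step concrete, here is a minimal sketch that again uses Python's built-in `sqlite3` module as a stand-in for Spark SQL (it assumes a SQLite build with window-function support, version 3.25 or later). The table `bronze_changes` and its rows are hypothetical; the `commit_version` column plays the role of the `_commit_version` metadata column that CDF exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bronze_changes (id INTEGER, status TEXT, commit_version INTEGER)"
)

# Hypothetical change records: key 1 changed twice between two Silver runs.
conn.executemany(
    "INSERT INTO bronze_changes VALUES (?, ?, ?)",
    [
        (1, "created", 5),
        (1, "shipped", 6),
        (2, "created", 6),
        (3, None, 6),  # invalid under the hypothetical "status is required" rule
    ],
)

latest_valid = conn.execute(
    """
    SELECT id, status, commit_version
    FROM (
        SELECT id, status, commit_version,
               RANK() OVER (PARTITION BY id ORDER BY commit_version DESC) AS rnk
        FROM bronze_changes
        WHERE status IS NOT NULL  -- apply business rules before ranking
    )
    WHERE rnk = 1
    ORDER BY id
    """
).fetchall()
print(latest_valid)
```

Filtering before ranking matters: if the newest change for a key is invalid, the most recent valid one still wins. RANK assigns the same rank to ties, so if two changes for a key could share a version, a tie-breaking column in the ORDER BY would be needed.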

Summary

When constructing or evaluating a framework to populate the tables in your Lakehouse, ensure that CDF, data quality rules, and RANK OVER operations are being used. If they are not, your Lakehouse will likely perform well initially, only to gradually (if not suddenly) take longer to move records from Source to Bronze to Silver.

Saturday, October 21, 2023

 It has been far too long since I have posted anything here. I will attempt to do better, but we will see...

Monday, December 07, 2020

Traditional and Competency-Based Education


Competency-Based Education (CBE) is an emerging alternative to traditional educational approaches. To effectively engage in CBE, it is essential to understand how it differs from traditional educational approaches. To that end, the potential issues related to CBE will be explored, followed by the mitigation strategies. Next, the strengths of CBE will be discussed, followed by mechanisms to leverage those strengths. Lastly, the conclusion will be shared.

Potential Issues

To effectively discuss CBE’s potential issues, it is necessary first to evaluate the high-level differences between CBE and traditional educational approaches. The primary difference between traditional education and CBE is the focus on demonstrating skills or knowledge (Gervais, 2016). The CBE approach is based on the concept that once a student has demonstrated the skills associated with a given course, they receive credit. This approach contrasts with traditional approaches, which have a fixed timeline and require students to work at roughly the same pace.

Because CBE enables students to move at different speeds, it could be challenging to have a lecture that is beneficial to all members of the class. For example, if one student enters the course with mastery of the course topics, the introductory discussions will be of little value to that student. A more complex challenge is that students may have gaps in their knowledge. They may have a great deal of depth in one area while missing fundamental concepts. In this case, the students might be tempted to skip early work to complete the more advanced curriculum items.

 

 

Mitigation Strategies

As with many things, a single action cannot address the challenges previously discussed. Instead, a shift from traditional solutions to need-based instruction must occur. Additionally, CBE may require an instructor to teach in a different style than they have previously taught. The level of instructor engagement with an individual student must be much higher, given the customized and targeted educational experience associated with CBE.

For example, one approach would be to break instructional elements down into smaller segments. This point speaks to the challenges that course designers must face when shifting to CBE (Cunningham, Key, & Capron, 2016). Rather than giving a half-hour to hour-long lecture, an instructor might employ five- to ten-minute miniature lectures, each targeted to a specific activity or unit of knowledge. Recording these mini-lectures lets a student skip over the things they already know. Additionally, documenting the high-level concepts that each miniature lecture covers might enable students with gaps in their knowledge to identify those elements.

Strengths

From the student’s perspective, CBE’s primary strength is that it leverages their existing knowledge and background. Rather than assuming all students enter the course with the same level of knowledge, the assumption is that students may differ vastly in their foundational knowledge. This shift in focus allows a student to quickly complete work they already know how to do, enabling them to focus on improvement areas. While a student could often work ahead in a flexible traditional course, they could not complete the course early.

 

 

Leveraging Strengths

One way to leverage CBE’s strengths is to work closely with the student to ensure they have an accurate self-assessment of their knowledge. This leveraging of the strengths ties into CBE’s fundamental nature, where the instructor serves as a mentor and guide in the educational journey, not an oracle issuing lectures from on high.

Conclusion

A brief comparison of CBE to traditional educational approaches was presented. Understanding the differences between CBE and traditional educational approaches is essential to engaging in CBE as an instructor. An assessment of the potential issues and associated mitigation strategies was shared. The strengths and ways to maximize those strengths were explored. CBE is a powerful educational approach. The level of instructor engagement, targeted instruction, and student-driven timelines make CBE a compelling alternative to traditional education.

 


 

References

Cunningham, J., Key, E., & Capron, R. (2016). An evaluation of competency-based education programs: A study of the development process of competency-based programs. The Journal of Competency-Based Education, 1(3), 130-139.

Gervais, J. (2016). The operational definition of competency-based education. The Journal of Competency-Based Education, 1(2), 98-106.

Monday, August 03, 2020

Discrete Event Simulation with SimPy using Databricks



Databricks is a notebook-based unified solution for performing various types of processing. It supports multiple programming languages, including Python. Databricks offers a free community version suitable for educational and training purposes (Databricks, n.d.). Discrete Event Simulation is a type of simulation focused on the occurrence of events. SimPy is a Python library that enables discrete event workloads (Team SimPy, 2020).  Using Databricks is a viable alternative to installing Python and Jupyter.

Getting Started

The first step is to go to community.cloud.databricks.com and create an account.

You will likely be asked if you want a free trial on AWS or Azure or if you want to use the community edition. For educational purposes, the community edition is sufficient (and free).


Once you’ve selected Getting Started under Community Edition, you will receive a notification that an email is being sent to you. Opening the link in that email lands you on a page where you are asked to reset your password (in practice, you are setting it for the first time).

After you’ve assigned your password, you land in the Databricks environment.

The landing page contains links to common tasks, along with a left navigation bar. If you are using a library like SimPy, it is a good idea to import it and set it to always be installed when a cluster is created. The community edition seems to delete clusters after periods of inactivity. Importing the SimPy library and selecting that it should always be installed on clusters means you do not have to reimport it each time. Selecting Import Library brings up a screen named “Create Library.” Don’t be confused: you are actually importing it into the environment.
To import SimPy, select PyPI for the Library Source and enter SimPy as the package name. Then click Create.

You will likely see a screen saying that there are no clusters available.  Click the option which states “Install automatically on all clusters”

You will be prompted to ensure you really want to enable this feature. As long as you are using versions less than 7, you do.
At this point, you have told Databricks that when you create a cluster, you want SimPy to always be installed and available. The next step is to use it. To do so, you need to return to the landing page.
You might think you do this by clicking the Home link in the left navigation bar. Instead, click the Databricks icon above it. The Home icon is used to navigate your notebooks and libraries. It is also where you can import and export notebooks and create notebooks, libraries, folders, and ML experiments.

Creating and Running a Notebook

Databricks uses the notebook paradigm made popular by Jupyter. From the landing page, click New Notebook. Alternatively, notebooks can be created using the Home link mentioned above. Clicking New Notebook brings up a dialog where you supply the name of the notebook and select the language and cluster to use. In this case, we have not created any clusters, so you will leave that blank. Since we are exploring the use of SimPy, ensure that Python is selected.

After clicking Create, you will land in a notebook with one cell and one line in that cell.

To add more cells, hover in the middle of the box on the lines above or below the cell.

Alternatively, click the chevron (Edit Menu) and select the desired location for the new cell.

Most of the time, the first thing you’ll want to do is import libraries so they are available for use within your code.

While the program is not very interesting, you can run it by doing a Shift-Enter while in the cell, or selecting the play button from the menu at the top.  Since we have no clusters, you will see a prompt asking if you would like to Launch a new cluster and run your notebook.  You most likely want to check the box that launches and attaches without prompting. Select Launch and Run.

At this point, Databricks is creating a cluster with a single driver node and no worker nodes, because the Community Edition does not provide worker nodes. Since we are writing Python that is not using Spark, that does not matter. If we were addressing Big Data problems, then we would want to use a version of Databricks hosted in AWS or Azure.
After the cluster is created, the command is executed.  You can tell the results of the execution by looking below the cell.  You should see something like the following:

Now that we have a cluster, we need to write the rest of our simplistic simulation.  Since SimPy uses the concept of a generator to produce events, we need to write one. Below the first cell, add another cell and write code similar to the following:

The key instruction in this code is the yield statement, which pauses the generator and hands control back to the simulation. Next, we need to use this generator within SimPy. To do that, add a cell below the one you just created and put the following in it.
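The generator idea described above can be sketched with only the standard library, which makes the role of yield easy to see. This is a stand-in rather than SimPy itself: the names `die_roller` and `run` are hypothetical, and the small scheduler loop plays the part that SimPy's environment plays. In SimPy, the generator would instead take the environment and `yield env.timeout(...)`, and it would be driven with `env.process(...)` and `env.run(until=...)`.

```python
import heapq
import random


def die_roller(name, rng, interval):
    """Generator process: roll a die, then yield how long to wait before the next roll."""
    while True:
        roll = rng.randint(1, 6)
        yield interval, (name, roll)  # pause here; the scheduler resumes us later


def run(processes, until):
    """Tiny event loop: always resume the process whose next event is earliest."""
    log = []
    queue = []  # entries are (wake_time, tie_breaker, process)
    for order, proc in enumerate(processes):
        heapq.heappush(queue, (0, order, proc))
    while queue:
        now, order, proc = heapq.heappop(queue)
        if now >= until:
            break  # every remaining event is at or after the stop time
        delay, (name, roll) = next(proc)  # resume the generator up to its next yield
        log.append((now, name, roll))
        heapq.heappush(queue, (now + delay, order, proc))
    return log


rng = random.Random(7)  # seeded so that reruns produce the same rolls
events = run([die_roller("die", rng, interval=1)], until=5)
for t, name, roll in events:
    print(f"t={t}: {name} rolled {roll}")
```

Each call to `next` runs the generator only until its next yield, which is why simulated time can advance between rolls without any real waiting.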

The final step is to run all of the cells. You can do that by hitting Shift-Enter in each cell or by selecting Run All (the play button) at the top of the screen.

Generally, we would want to add documentation to our notebook, so that others can understand it better. To do this, add a cell above your first cell, and start it with %md.  This is a magic string that turns the cell into a markdown cell. Enter a description using markdown syntax and then run the cell.

After execution, the cell converts to the output associated with the markdown.

Publishing the Notebook

Often you need to share your finished notebook. To do this, you publish the notebook. This action results in a shareable link. Note that anyone who has the link can view it. The AWS or Azure hosted version of Databricks has a richer sharing model where users can collaborate on a notebook in real time.

When you click the publish button you will be prompted to ensure you really want to publish the notebook.  Also, note that the share links are active for six months.

Once published, you will be given the link which you can share.

The link for the notebook used in this example is:

Additional Resources

Databricks has a Get Started page, which quickly walks a new user through the environment.
On that page, the quick start is very useful.

Conclusion

I walked through the process of signing up for Databricks Community Edition. I then shared the process of importing a library (SimPy) and using that library on a cluster. We created a die-rolling example and talked about how to publish that example. While this is not an exhaustive coverage of the topic, hopefully this information will help you get started with SimPy in Databricks.

References

Databricks. (n.d.). Databricks Community Edition. https://community.cloud.databricks.com/

Team SimPy. (2020). SimPy: Discrete event simulation for Python. https://simpy.readthedocs.io/en/latest/



Tuesday, July 02, 2019

Learner-Centered Teaching 

Learner-centered teaching is a form of instruction that focuses on the needs and interests of the learner (Keengwe, Onchwari, & Onchwari, 2009). The idea is to engage the learner in the learning process, help them take responsibility for their learning, and help them learn how to learn. In this form of instruction, engagement is critical.

The role of an educator in a learner-centered environment is that of guide and facilitator (Weimer, 2002).  That guidance should be customized and based on the unique identity of the student.  Rather than lecturing to a class, our role should be to answer questions, point out potential pitfalls, and ensure the students are engaged.

In a computer science course, this can be applied by assessing each student’s strengths and weaknesses and providing specific guidance to each. While this will be challenging, the benefits to the student make such effort worthwhile. As with other disciplines, establishing a trusting relationship built on mutual respect will provide a clear channel of communication. The students must be comfortable saying “I don’t get it” without fear of embarrassment or loss of stature.

References

Keengwe, J., Onchwari, G., & Onchwari, J. (2009). Technology and student learning: Towards a learner-centered teaching model. AACE Journal, 17(1), 11-22.

Weimer, M. (2002). Learner-centered teaching: Five key changes to practice. John Wiley & Sons.