Google Cloud Data Summit 2022
by Meliska Meintjes on Apr 12, 2022 3:56:29 PM
Google continues to be an industry leader in data and analytics, and they have once again outdone themselves (if you ask us) at the newest Cloud Data Summit 2022 event.
Those participating shared insights on how you can advance your business using the next generation of data solutions. Using key innovations in AI, machine learning, analytics, and databases, you and your organization can solve complex challenges and make smarter decisions.
Although the digital event has ended, the sessions can still be viewed on demand.
Our key takeaways from the event
BigLake
Organizations store, manage, and distribute more data than ever before, and the volume is still growing by the day. The BigLake session is also available to watch on demand.
This new Google Cloud Platform (GCP) service lets customers integrate data lakes and warehouses, manage access at the row and column level, and analyze that data with the GCP-native tool BigQuery or with an open-source processing engine such as Spark (through the BigQuery Storage Read API).
This extends a decade of BigQuery innovations to data lakes, with support for:
- multi-cloud storage
- open formats
- unified security and governance
BigLake is based on BigQuery (BQ) and allows you to examine files in familiar formats (CSV, JSON, Avro, Parquet, and ORC) that may be spread over several cloud storage systems (Google Cloud Storage, Amazon S3, and Azure Blob Storage) from a centralized place. This enables a single source of data "truth" to be shared across numerous cloud platforms without duplicating or copying your data.
BigLake extends BigQuery to data lakes, so in the fullness of time, you can get the same functionality as BigQuery (even through the BigLake storage APIs for external engines). It does not necessarily help transfer data from Snowflake to BigQuery, though.
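One BigQuery capability that carries over is the row-level access control mentioned above. The following is a minimal sketch, not an official recipe: it only builds a BigQuery `CREATE ROW ACCESS POLICY` statement as a string. The helper name `row_access_policy_ddl`, the table `my-project.lake.orders`, the group address, and the `region` column are all hypothetical placeholders, and the sketch assumes a BigLake table with that name already exists.

```python
from typing import Sequence


def row_access_policy_ddl(
    policy: str, table: str, grantees: Sequence[str], filter_expr: str
) -> str:
    """Build a CREATE ROW ACCESS POLICY statement for a BigQuery table.

    All arguments are illustrative; nothing here talks to Google Cloud.
    """
    # Each grantee is a quoted IAM principal such as 'group:name@example.com'.
    grantee_list = ", ".join(f"'{g}'" for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr});"
    )


# Hypothetical example: EU analysts may only see rows where region = 'EU'.
policy_sql = row_access_policy_ddl(
    policy="eu_only",
    table="my-project.lake.orders",
    grantees=["group:eu-analysts@example.com"],
    filter_expr="region = 'EU'",
)
print(policy_sql)
```

The generated statement can be pasted into the BigQuery console or submitted through any BigQuery client once the names are replaced with real ones.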
To start using BigLake features in BigQuery, you must first enable the BigQuery Connection API.
- Once enabled, navigate to BigQuery and add a new external data source by clicking the "+ ADD DATA" button and selecting "External data source."
- As the Connection Type, choose "Cloud Resource (for BigLake tables)." Give it a distinct connection ID and a data location and, if desired, a friendly name and a description. To connect to Amazon S3 or Azure Blob Storage, you must use the "via BigQuery Omni" connection type.
- Once the new external data source has been added, it should be visible in the left menu under "External connections."
- Go to the new connection and copy the service account information. BigQuery uses this service account to access the data stored in your cloud storage. In IAM & Admin, grant the service account the "Storage Object Viewer" role on your project. This enables the connection to read data from your Cloud Storage buckets.
- You can now create your first BigLake table, which will read data from your Cloud Storage bucket. To make a new table, go to an existing dataset (or make a new one) and select Create Table. Select "Google Cloud Storage" under "Create a table from." To read files from an external cloud provider, you can pick Amazon S3 or Azure Blob Storage here instead.
- After specifying the file's location, format, and desired destination, pick the table type as External table, which instructs BigQuery not to import the file but rather read it from an external data source. Then, check the box next to "Use Cloud Resource connection to establish approved external table," which will open the Connection ID drop-down menu and allow you to pick the connection you built in the previous step.
- Complete the remaining options (Schema, Partition/Cluster settings, and advanced options) and click "CREATE TABLE."
- You should now be able to query the table. These tables are read-only (no DML statements) and can be combined with BigQuery's native tables in queries. You can also begin applying access controls to these tables to make full use of BigLake's capabilities.
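The table-creation steps above can also be expressed as a SQL statement instead of console clicks. The sketch below builds a `CREATE EXTERNAL TABLE ... WITH CONNECTION` statement in Python. It assumes the connection already exists and its service account has read access to the bucket. The helper names, the project `my-project`, the dataset `lake`, the connection `biglake_conn` in location `eu`, and the bucket path are all hypothetical. Running the statement additionally assumes the `google-cloud-bigquery` package is installed and credentials are configured, so that call is left commented out.

```python
from typing import Sequence


def biglake_table_ddl(
    table: str, connection: str, file_format: str, uris: Sequence[str]
) -> str:
    """Build a CREATE EXTERNAL TABLE statement that uses a connection.

    `connection` is expected in the form project.location.connection_id.
    """
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = [{uri_list}]\n"
        f");"
    )


def run_statement(sql: str):
    """Submit a statement to BigQuery (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily so the sketch runs without it

    client = bigquery.Client()
    return client.query(sql).result()


# Hypothetical names; replace them with your own project, dataset, and bucket.
table_sql = biglake_table_ddl(
    table="my-project.lake.orders",
    connection="my-project.eu.biglake_conn",
    file_format="PARQUET",
    uris=["gs://my-bucket/orders/*.parquet"],
)
print(table_sql)
# run_statement(table_sql)  # uncomment once the names above point at real resources
```

Keeping the statement builder separate from the execution step makes it easy to review the SQL before anything is created in the project.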
Further announcements
Google also announced updates to Analytics Hub, introduced Spark everywhere, and announced Dataflow Prime GA.
They also launched Dataplex GA.
With Dataplex's intelligent data fabric, companies can centrally manage, monitor, and administer their data across data lakes, data warehouses, and data marts with uniform rules, enabling access to reliable data and powering analytics at scale.
Security and governance are centralized, allowing for distributed ownership while maintaining global control.
Built-in data intelligence helps harmonize scattered data without requiring data migration.
It is an open platform that supports open source technologies and has a robust partner ecosystem.
ABOUT CRYSTALLOIDS
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids creates crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients and let them focus more on decision making and less on programming.