

Fundamentals of Data Engineering: Plan and Build Robust Data Systems

C**N
Excellent beginner’s data engineering book
Excellent beginner’s data engineering book—clear, practical, and well-structured. Covers core concepts, tools, and workflows to build real-world pipelines confidently.
A**R
Must read for anyone in data
I finished reading “Fundamentals of Data Engineering” by Joe Reis and Matt Housley. The book is wonderful and a must read for any data professional. My favorite part is that the book is tool-agnostic. While there are many books that teach data engineering in one specific tool or language, Fundamentals of Data Engineering succeeds in explaining data engineering concepts without being attached to a tool. What also caught my attention is that this book is very well designed for a broad audience. There is value for anyone, whether a seasoned professional or someone relatively new to the field (like me). This field is constantly evolving at a very fast pace, and I can see how Fundamentals of Data Engineering is one of the books that will stand the test of time. I highly recommend it to anyone in the data industry.
G**E
Good book on Data engineering!
Good book on data engineering! Highly recommend. Good overview / summary for hands-on practitioners and managers of what data engineering is and the different phases it goes through.
Y**P
Worth reading, but only once.
Decent book and I liked the writing style a lot. The authors are not afraid to use words like “wild”, “catastrophic” and other strong phrases often, and that is very compelling to me. But if you’ve been in the industry for some time or in an adjacent role (SWE for example), I feel it doesn’t bring that much value. I’m content that I read it, but the chances I’ll read it again are very slim.
S**S
Great Introductory Book with Flaws in Prose and Content
I read this book end-to-end over a 2-month period, making notes and reviewing chapters as I went. I'm coming at it from the perspective of a data analyst with a reasonable amount of exposure to the concepts and practises mentioned.
One of the best attributes of this book is that it is one of, if not the, first high level introductions that tries to remain technology agnostic. Unlike many others that define data engineering as "use of pyspark" or "use of Hadoop", this book tries and (mostly) succeeds in setting up a universally applicable framework or "lifecycle" through which the data flows. It tries to instil in the reader a respect for good architecture, security, and ROI, beyond just playing with the latest toys. In doing so it casts a wide net, giving information ranging from the very "guts" of the hardware (HDD, SSD, etc.) right up to finops and stakeholder management. There's no doubt reading this book will give you a strong framework for how to view data engineering and a serviceable exposure to many of the terms, concepts and technologies therein.
So this book is worth reading, but it's far from perfect. The first major thing is that this book is not a "holistic" product. It doesn't read end-to-end as a cohesive narrative with each chapter building on the ones prior. Some concepts are mentioned before they are described in any detail. Others are repeated multiple times across different chapters (I estimate the length of this book could be decreased by 15-20% without impacting the flow by removing redundancy). The choice of what to focus on is a bit odd. Some subjects are given enormous amounts of coverage and explained in detail, while others get a paragraph. The level of technical detail also varies greatly. Some concepts are described as if the book is talking to someone with no background in data/software eng., while others are a struggle to comprehend unless you've been exposed to them before.
An additional thing worth noting is that, perhaps because of the background of the authors, it definitely reads like a book aimed at those who work in "big tech". The major way this manifests is in the book's incessantly repeated assertion that streaming data is the future, and the enormous amount of ink spilt detailing it. From the position of someone working in a non-FAANG company that doesn't need down-to-the-second data, this seems far more like an opinion than a fact. If their prediction comes true then they get my praise, but if it doesn't, it's a massive and wearying drag for a topic that probably less than 5% of data engineers really need to be concerned about.
In summary, this is a good, necessary book that provides a great introduction to the field while also clearly being a first edition by authors who haven't quite perfected the art of relaying information efficiently. Recommended to people trying to break into the field and those who want to catch up on key concepts. Might be worth waiting for a 2nd or 3rd edition though.
S**V
Recommendation
I consider this book a must read for getting the data engineering basics right.
E**Y
Great Starting Point for A Software Engineer
It is a popular book for a reason. Very theory based. I read it in conjunction with the data engineering class Joe has online. The class filled in a lot of the practical gaps.
A**R
Above and Beyond expectations
Many books in the computer science domain are informative, readable and useful. This book excels in all those categories and, in my opinion, achieves something more difficult: it is enjoyable to read. The concise and oftentimes clever analogies efficiently compress complex ideas into digestible concepts. Though the writing style is similar to other O'Reilly books in the same genre, you get a little more personality coming through the pages.