SSIS and MSBuild: Resolving ambiguous remote build errors

At my current client we use MSBuild and Octopus to set up automated builds and continuous integration for our solutions. On every commit, a build server grabs the latest code, MSBuild compiles it, a NuGet package lands in a drop folder, and Octopus pulls the package from a package feed. From there, we deploy manually as needed. This usually works pretty well, except… SSIS.

Builds for one of our SSIS packages started failing in July. A couple of developers took some stabs at getting them working again, but were unable to do so. Then the backlog started filling. Seven months later, here I am trying to resolve the build. I’m basically starting from scratch, so I don’t know much about what caused the initial failures. But no problem!

I was able to resolve the build problems by logging directly into the build server and running the build commands locally to get more specific errors. Eventually, despite a couple of red herrings, I identified the culprit: a third-party module that was not present on our build server. I discuss my process below.
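Running the build by hand on the build server is the quickest way to see the real error instead of whatever summary the CI tooling reports. A minimal sketch of that step, from a command prompt on the build server (the solution name, paths, and log file name here are hypothetical; substitute your own):

```shell
# Run the same build the CI server runs, but with maximum logging.
# /verbosity:diagnostic surfaces the exact task and assembly that fail.
msbuild MySsisSolution.sln /t:Rebuild /p:Configuration=Release ^
    /verbosity:diagnostic /fileLogger /fileLoggerParameters:LogFile=build.log

# Scan the log for the first real error rather than trusting the tail summary.
findstr /i /c:"error" build.log
```

A missing third-party component typically shows up in the diagnostic log as an unresolved assembly reference or a failed task load, which is far more actionable than a generic "build failed" status.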


Developing a practical understanding of internal and external tables in HDInsight

You have two options when creating Hive tables in HDInsight: internal, which is the default in a CREATE TABLE statement, and external, which is created with CREATE EXTERNAL TABLE.

An internal table is one whose data is managed by Hive, so if you were to drop the table, both the table metadata and the data itself would go.

An external table is one whose data is NOT managed by Hive, so if you were to drop the table, the table metadata and any references to the data would go, but the data itself would stay. Hive essentially becomes blind to the data, no matter where it is stored. There are certain misconceptions around internal tables and whether their data is also stored in the Hive warehouse, which we will explore below.

So, how is Hive internal and external data stored in HDInsight? Let’s figure it out!

In this tutorial we will load sample data into Azure Blob storage. From there, we will create an external table and an internal table using Hive. In theory, an external table should keep our data in its original spot, while an internal table should move the data into the Hive warehouse. Let’s look at these scenarios in practice.
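The two table types boil down to one keyword and, for the external case, a LOCATION clause. A minimal sketch of the DDL, run with `hive -e` from an SSH session on the cluster head node (the table names, columns, container, and storage account below are hypothetical):

```shell
# Internal (managed) table: Hive owns the data.
# DROP TABLE removes both the metadata and the underlying files.
hive -e "CREATE TABLE logs_internal (id INT, msg STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"

# External table: Hive only tracks metadata pointing at the files.
# DROP TABLE removes the metadata; the files stay put in Blob storage.
hive -e "CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION 'wasb://mycontainer@myaccount.blob.core.windows.net/sampledata/';"
```

The wasb:// URI is how HDInsight addresses Blob storage, which is what lets us check afterwards whether the data moved into the warehouse or stayed where we put it.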


Creating a Local Archive of WordPress Installations

In college, I spent many evenings working on personal projects. These included “The Bear Trade,” a Craigslist alternative for Baylor students; “NaturalStudyAids.com,” which dropshipped natural ADHD supplements and was my first exposure to split-testing; “onlyatwalmart.net,” a site that automatically ripped content off of PeopleofWalmart, republished it, and made money off of ad clicks and good SEO; and various other personal blogs and sites. Trust me, there were a lot of projects. Ask me about “Sperrysocks.com” some day.

Anyway, after a number of failed projects and the realization that a lot of my information was still public on the internet, I decided it was time to purge my webserver. Doing so would reduce my exposure to exploits in outdated software and allow me to tailor my personal web footprint, and therefore my brand. Before deleting, I wanted to make a local replica of my websites, a number of which ran on WordPress.

WordPress is tricky, of course, as pages are built dynamically on the server side before being rendered in the user’s browser. Thus, the only way to keep a WordPress installation viewable locally would be to install a server environment (MAMP or WAMP), copy your PHP files, database, and scripts to it, and then deal with the hassle of renaming the WordPress installation paths. (Note: a serialization-aware find-and-replace script on your SQL database makes this part a lot easier.) Not only that, you would have to spin up the server instance any time you wanted to view your sites, and worry about keeping your software up to date.

Instead of going through this arduous process, I opted for a mirroring utility. Such a utility starts at a root URL and recursively mirrors each hyperlink as a static local HTML file. The benefits are obvious: viewing your local copy of your WordPress site later is instant and requires no server-side processing. The downside is increased storage, as every potential dynamic page must be rendered and saved. Then again, storage cost hasn’t been the controlling variable in decision-making for years.

Wget is the obvious choice for recursive web downloads. This utility is part of the GNU project, runs on Linux (among other platforms), and can traverse over FTP or HTTP. After downloading your files, all links are changed to their relative paths on your local machine. Thus, clicking around the website never makes a remote HTTP request, and all resources remain local as well. I tried WinHTTrack, but could not get the utility to change resource paths from remote to relative/local; otherwise it would have been the utility of choice for someone unfamiliar or uncomfortable with a command-line interface. I eventually used WinWGet, a Windows GUI port of wget.
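For anyone comfortable skipping the GUI, a sketch of the kind of invocation this workflow uses (the domain and log file name are placeholders; all flags are standard wget options):

```shell
# --mirror            recursive download with timestamping and infinite depth
# --convert-links     rewrite URLs to local relative paths (the -k step)
# --adjust-extension  save pages with an .html extension so they open locally
# --page-requisites   also grab the CSS, JS, and images each page needs
# --no-parent         don't climb above the starting directory
# -o mirror.log       write a full log, which we'll need to verify the job
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent -o mirror.log https://example.com/
```

Keeping the `-o` log around matters more than it looks, given the failure modes described below.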

Wget is pretty straightforward to use, but here are the hiccups I ran into:

Conversion of links to relative paths (the -k parameter) is the final step of the wget job
If your job fails or is cancelled for any reason, the URIs within a page remain pointed to their remote locations. Resources, however, appeared to be rendered using their new local, relative paths (if I remember correctly). If you use WinWGet, as I did, a failed job will sometimes show as complete. So don’t be quick to keep the local mirror and delete the remote! Always consult the log to ensure the -k conversion actually ran; the log will tell you how many files were “converted.”
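A quick way to perform that check, assuming the job was logged to a file as in the invocation above (`mirror.log` is an assumed name):

```shell
# The link-conversion pass runs last and reports a "Converted ... files" line.
# If this grep finds nothing, -k never ran and the mirror still points remote.
grep -i "converted" mirror.log
```

Only once that line appears in the log is it safe to treat the local mirror as complete.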

404s can lead to failed jobs
If the utility encounters a 404, the job will typically (though not always) stop and fail, without any clear indication of success or failure. Once again, WinWGet does not make this clear: a job that failed due to 404s will still show a green checkmark next to the job name, misleading you into believing it succeeded.

The Agile Auditor: Project Management Insights for Assurance Services

If programmers in 2008 found it acceptable to call themselves “Code Ninjas,” perhaps it wouldn’t be too much of a stretch to give some sort of heroic nomenclature to other professions. Auditors, for instance, may be the “Batman” of the financial services industry. Batman, as the Dark Knight, works tirelessly to keep order in Gotham, to allow the city to operate as it was intended. Gotham, meanwhile, distrusts this hero and would sooner outlaw Batman than embrace the costs associated with his heroism. Similarly, auditors find themselves serving those who often forget the importance of the role they perform. The client can temporarily forget that all the effort and externalities associated with an audit opinion ultimately allow the company to raise capital.

Auditors bring liquidity to capital markets. How? Auditing allows the public to invest in public companies with confidence, providing assurance that a company does not engage in corrupt business practices or falsify financial information. It’s a noble profession, and a company could not raise capital without an auditor’s opinion. At the same time, however, companies can be rightly frustrated at the tremendous operational burden and cost of a year-end audit. While necessary, the audit is often a headache for the company trying to assert its financial position to the public.

The frustration involved in an audit is shared by both the company and the auditors, and is certainly understandable. The race to file the 10-K for an F100 company on time each year involves the review of thousands of client-provided documents, 16-hour days, and no weekends for the auditors. Clients, meanwhile, find themselves frustrated when documents are reviewed weeks after having been provided (or lost altogether). The audit team and the client end up confused about the actual status of the audit, with all of its moving parts, which results in a tremendous final effort to complete the project at filing time.

At the end of the day, an audit is a huge project management exercise that shares many similarities with software development projects. Like software developers, auditors work in small teams. Auditors also perform their work in silos and review it together at completion. Like software developers, auditors plan their project after gaining an understanding of the requirements specific to the engagement at hand.

Unlike software developers, auditors are engaged in a highly traditional profession that dates back to the birth of capital markets. Software developers, on the other hand, are iconoclasts, parading through a nascent industry and toppling any idealistic pillar that has become irrelevant or ineffective. Auditors, of course, are willing to adopt new practices as well; they simply do so more slowly.

Software developers… are iconoclasts, parading through a nascent industry and toppling any idealistic pillar that has become irrelevant.

I would posit that auditors could reduce the burden on their staff and clients, while still forming a valid opinion, by adopting project management principles that were born of and have evolved with the software development industry since 2000. In this series, I will compare the audit life cycle to traditional and Agile life cycles in software development. In Part 3, I will offer practical and theoretical considerations for how audit project managers can adopt Agile principles to increase project profitability and efficiency while reducing the burden on the client. Stay tuned!

Jump to parts in this series:

Part 1: The Agile Auditor: Project Management Insights for Assurance Services (current)
Part 2: The Agile Auditor: Comparing the SDLC to the Audit Life Cycle (coming soon)
Part 3: The Agile Auditor: Applying Agile Principles to Audit Projects (coming soon)
Part 4: The Agile Auditor: Consulting Opportunities in Public Accounting (coming soon)