Summary
The x2a-convertor migrate command enters an infinite loop during Ansible code generation when it encounters ansible-lint issues it cannot resolve. The LLM agent repeatedly rewrites the same file and re-runs ansible-lint, never succeeding, until it hits the LangGraph recursion limit of 500.
Steps to Reproduce
- Clone the x2ansible chef-examples repo:
    git clone https://github.com/x2ansible/chef-examples.git
    cd chef-examples
- Run init and analyze for the nginx-multisite cookbook
- Run the migrate command:
    podman run --rm -ti \
      -v $(pwd)/:/app/source:Z \
      -e LLM_MODEL=anthropic.claude-3-7-sonnet-20250219-v1:0 \
      -e AWS_REGION=eu-west-2 \
      -e AWS_BEARER_TOKEN_BEDROCK=$AWS_BEARER_TOKEN_BEDROCK \
      quay.io/x2ansible/x2a-convertor:latest \
      migrate --source-dir /app/source/ --source-technology Chef \
      --high-level-migration-plan migration-plan.md \
      --module-migration-plan migration-plan-nginx-multisite.md \
      "Convert the 'nginx-multisite' module"
Expected Behavior
The migrate agent should detect when it has failed to fix the same ansible-lint issues after a reasonable number of retries (e.g., 3-5 attempts) and either:
- Accept the code with known lint warnings and proceed
- Report the unfixable issues and move on to the next task
- Bail out gracefully with a clear error message
Actual Behavior
The agent gets stuck in a loop on ansible/roles/nginx_multisite/tasks/security.yml:
- Generates/rewrites the file using ansible_write tool
- Runs ansible_lint tool - finds 2 issues
- Rewrites the file to try to fix the issues
- Runs ansible_lint again - still finds 2 issues
- Repeats this cycle hundreds of times
- Eventually hits the LangGraph recursion limit of 500:
Error: Recursion limit of 500 reached without hitting a stop condition. For troubleshooting, visit: https://docs.langchain.com/oss/python/langgraph/errors/GRAPH_RECURSION_LIMIT
Additionally, ansible-lint itself is throwing internal errors during each iteration:
    ERROR:ansiblelint.skip_utils: Error trying to append skipped rules
    RuntimeError: Error in matching skip comment to a task
This suggests the lint issue may be caused by an ansible-lint bug with skip comments, making it impossible for the LLM to fix by rewriting the Ansible code.
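If the skip-comment error is indeed an internal ansible-lint failure, the agent could tell it apart from ordinary rule violations before attempting another rewrite. The following is a minimal sketch, not the x2a-convertor implementation: the function name `is_linter_internal_error` is hypothetical, and it assumes the captured lint output contains the logger prefix in the same form as the log line above.

```python
import re

# Assumption: internal ansible-lint failures show up in the captured output as
# log lines of the form "ERROR:ansiblelint.<module>:", as in the log above.
INTERNAL_ERROR_PATTERN = re.compile(r"^ERROR:ansiblelint\.[\w.]+:", re.MULTILINE)


def is_linter_internal_error(lint_output: str) -> bool:
    """Return True if the output contains an ansible-lint internal error line."""
    return bool(INTERNAL_ERROR_PATTERN.search(lint_output))


if __name__ == "__main__":
    sample = (
        "ERROR:ansiblelint.skip_utils: Error trying to append skipped rules\n"
        "RuntimeError: Error in matching skip comment to a task"
    )
    if is_linter_internal_error(sample):
        print("ansible-lint reported an internal error; rewriting the file may not help")
```

A check like this would let the agent report the linter failure directly instead of treating it as something the generated YAML can fix.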
Impact
- Wasted LLM API credits: Hundreds of Bedrock API calls for the same failing operation
- CI pipeline failure: The Jenkins job fails after running for an extended period
- No usable output: The migration does not complete, so no Ansible code is produced for the module
Environment
- x2a-convertor image: quay.io/x2ansible/x2a-convertor:latest
- LLM Model: anthropic.claude-3-7-sonnet-20250219-v1:0
- Source cookbook: nginx-multisite from https://github.com/x2ansible/chef-examples.git
- Jenkins job: flightpath-x2ansible build #7
Suggested Fix
Add a maximum retry counter for lint-fix cycles in the migrate agent. After N failed attempts to fix the same lint issues on the same file, the agent should move on rather than retrying indefinitely.
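One possible shape for such a counter is sketched below. It is an illustration under stated assumptions, not the project's actual code: the class `LintRetryGuard`, its method `record_failure`, and the issue identifier strings are all hypothetical, and it assumes the agent can obtain the list of remaining issues for a file after each lint run.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LintRetryGuard:
    """Counts failed lint-fix attempts per (file, set of issues)."""

    max_attempts: int = 3  # N from the suggestion above; the value is an assumption
    _attempts: Counter = field(default_factory=Counter, init=False)

    def record_failure(self, path: str, issues: list[str]) -> bool:
        """Record one failed fix attempt.

        Returns True if the agent may try again, False once the same issues
        on the same file have failed max_attempts times.
        """
        # The same file with the same unresolved issues maps to the same key,
        # so progress on any issue starts a fresh count.
        key = (path, frozenset(issues))
        self._attempts[key] += 1
        return self._attempts[key] < self.max_attempts


if __name__ == "__main__":
    guard = LintRetryGuard(max_attempts=3)
    path = "ansible/roles/nginx_multisite/tasks/security.yml"
    issues = ["issue-1", "issue-2"]  # placeholder identifiers, not real rule names
    while guard.record_failure(path, issues):
        print("retrying lint fix")
    print(f"giving up on {path}; unresolved issues: {issues}")
```

Keying on the issue set rather than on the file alone means the agent is stopped only when it makes no progress; a rewrite that resolves one of the two issues would start a new count.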